November 29, 2005

Football's F-word

After the Indianapolis Colts beat the Pittsburgh Steelers on Monday night, Indianapolis coach Tony Dungy was asked about the Colts being called a "f****** team." Dungy replied:

"We don't think we are. We never thought we were. It's something that, yeah, (the players) don't like to be called that. But you can't change people's perception."

On Sunday, after the Seattle Seahawks beat the New York Giants, Seattle center Robbie Tobeck was quoted as saying:

"People think that because we're from the Northwest that we're a f****** team. We're not a f****** team."

And earlier this month after the Cincinnati Bengals beat the Baltimore Ravens, a Columbus Dispatch article led off with this outraged sentiment from the Bengals' right tackle:

Willie Anderson has heard the Cincinnati Bengals described as a f****** team, and it rankles him.

What is this terrible aspersion against the character of football teams that requires such a spirited defense?

The unspeakable word is finesse.

Charles Robinson of Yahoo! Sports refers to it as "the F-word," while Len Pasquarelli prefers to call it "the F-bomb." Clearly the word has a tremendous ability to infuriate NFL coaches and players alike.

King Kaufman, sports columnist for Salon, sarcastically explains the subtext: finesse is "a football code word" implying that a team is "a bunch of sissies who listen to show tunes in the locker room if you know what I mean." Kaufman first picked up on football's anti-finesse posturing in a 2002 column discussing a game between the St. Louis Rams and the New York Giants:

Leading up to the game, the Giants had used the F word, which in St. Louis is "finesse." See, teams use that word to describe the Rams because the Rams are built for speed, not power. Which is to say that the Rams are a bunch of flaming homosexuals. Well, not really. But, you know, sort of. And the Rams say, We are not a finesse team. We are in fact a rough, gritty, hard-hitting group of butch boys, and even though our receivers run very fast, they are all real men in every way if you know what I mean.

The dictionaries may define finesse as "elegant ability and dexterity," but NFLers hear something quite specific in the word: an unmanly suggestion of softness or delicacy, inappropriate to football's pervasive atmosphere of machismo. No doubt the French pedigree of the word, with its dainty -esse ending, contributes to this perception.

Finesse isn't such a dirty word in baseball, where its attributive usage is limited mostly to pitchers and pitching. The New Dickson Baseball Dictionary defines a finesse pitcher as "a pitcher who relies on placement, deception, change of speed, and guile rather than velocity and power." The Proquest database has examples going back to 1967 at least, as in this quote from Dodgers pitcher Don Sutton after completing a two-hitter:

Los Angeles Times, June 28, 1967, p. III2, col. 4
"I had decided to stop trying to be a finesse pitcher. Most of the times I've been hit hard, I've been trying to finesse the hitters. Throwing them changeups on 3-and-2 counts and things like that. I decided I should just go out there and throw the ball and try to get ahead of the hitters. I'm just not a cutie-pie pitcher. I don't know how to pitch like a 35-year-old man."

Sutton, then only 22 years old, disparages the "cutie-pie" tactics relied upon by older pitchers no longer able to overpower opposing batters with an intimidating fastball. (One is reminded of Nuke LaLoosh in Bull Durham wanting to "bring heat" against every batter he faces.) But by the time Sutton finished his Hall of Fame career more than two decades later, I think he would have had fewer concerns about being categorized as a finesse pitcher.

The earliest attributive usage I've found in football is from 1969, in a quote from Los Angeles Rams defensive tackle Merlin Olsen:

Washington Post, Nov. 23, 1969, p. C4, col. 2
"Before then we were a clawing, biting, scratching team. Now we have developed finesse," Olsen said. "Now we can win in three or four different ways. You can't win consistently on just power or just finesse. Eventually, a finesse team is going to out-finesse itself. And a power team will eventually run into a team it can't overpower."

Olsen takes a balanced view on football's power vs. finesse dichotomy: over the long run, a team can succeed neither as a finesse team nor as a power team, but must ideally have elements of both. Nowadays, however, it seems that even the suggestion of finesse is enough to make a player or coach sue for defamation of character.

In the car: a 0.1-act play

E is six years old. M is his mother. M is driving, E is in the back seat.

E:     How many vowels are in "yay"?
M:     Well, that's interesting.  Yay has a diphthong, which is when two vowels are stuck together.  You can hear it if you say it slowly:  Yaaaaaeeeee.
E:     Yaaaaeeeee.  Yaaaaeeee.  Yeah, I hear it.  Are there any diphthongs in "diphthong"?
M:     No, there are only the vowels [ɪ] in dip  and [ɔ] in thong.
E:     Oh, that's too bad.  You know that there's an er-sister (the kindergarten term for an r-colored vowel) in 'sister'.  That's cool.  What are some other diphthongs?
M:     Well, there's the ow in cow--hear it?  Kaaaaaaauuuuuu   and there's .....
E:     Are we almost home?

[Script submitted by Mark Seidenberg. Another phonemic awareness story is here.]

How was your Cyber Monday? In case you missed out on the avalanche of hype, online retailers promoted yesterday as "Cyber Monday," a brand-new coinage. While "Black Friday," the day after Thanksgiving, kicks off the traditional holiday shopping season, online shopping is supposed to follow suit on the following Monday with a big spike in purchases. But the Monday after Thanksgiving has in the past proven to be only the twelfth-biggest online shopping day of the year, according to Business Week, which exposes the new name as little more than a clever marketing ploy:

It turns out that, an association for retailers that sell online, dreamed up the term just days before putting out a Nov. 21 press release touting Cyber Monday as "one of the biggest online shopping days of the year."

The idea was born when a few people at the organization were brainstorming about how to promote online shopping, says Executive Director Scott Silverman, who answered his phone, "Happy Cyber Monday." They quickly discarded suggestions such as Black Monday (too much like Black Friday), Blue Monday (not very cheery), and Green Monday (too environmentalist), and settled on Cyber Monday. "It's not the biggest day," Silverman concedes. "But it was an opportunity to create some consumer excitement."

Still, as far as marketing ploys go, the coinage of "Cyber Monday" has by all accounts been an overwhelming success. (The neologism-hunters have been hot on its trail: Paul McFedries of The Word Spy reported it on Nov. 25, Barry Popik of the American Dialect Society on Nov. 26, and Grant Barrett of Double-Tongued Word Wrester on Nov. 27.) The first known media mention of "Cyber Monday" was in a Nov. 19 New York Times article, as the Times evidently had the inside line on the machinations at the retailers' association. After the official press release two days later, "Cyber Monday" spread extremely quickly, aided by numerous mentions from both traditional and online media outlets.

And how might we measure the initial spread of the coinage? Not surprisingly, the Business Week writer takes the easy way out and trumpets a Googlecount in the article's lead paragraph:

Do a Google search on "Cyber Monday," and you get as many as 779,000 results. Not a bad haul for a term that was created just a week and a half ago to describe the jump in online shopping activity following the U.S. Thanksgiving holiday.

A count of 779,000 is impressive to be sure (though the article is careful to hedge its bets by saying "as many as 779,000 results"). But as we've seen time and again, Googlecounts of any sizable magnitude simply aren't trustworthy. There's no reason to expect this number to be particularly meaningful.

At the moment, Google tells me there are about 740,000 results for "Cyber Monday." No matter: Googlecounts can vary quite a bit according to time and place accessed, and that's only 5 percent off from the Business Week count. I then cross-checked the count against Yahoo, which search-engine observers such as Jean Véronis find to be generally more consistent in its reporting of counts returned (though it too has its shortcomings). Yahoo currently yields about 751,000 results for "Cyber Monday," so there's no big discrepancy between the two search engines.
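The "5 percent off" figure is simply the relative gap between two counts. Here is a minimal sketch of that arithmetic in Python, using only the counts quoted above; the function name relative_gap is my own illustrative choice and has nothing to do with either search engine.

```python
def relative_gap(reference: int, observed: int) -> float:
    """Return the distance of `observed` from `reference` as a fraction of `reference`."""
    if reference <= 0:
        raise ValueError("reference count must be positive")
    return abs(reference - observed) / reference


# Counts quoted in the text above.
business_week_count = 779_000
google_count = 740_000
yahoo_count = 751_000

print(f"Google vs. Business Week: {relative_gap(business_week_count, google_count):.1%}")
print(f"Yahoo vs. Google: {relative_gap(google_count, yahoo_count):.1%}")
```

Both gaps are small, which is all the comparison claims; it says nothing about whether either raw count is trustworthy.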

If we skim through the results on both search engines, however, we'll see a lot of repeated text. Theoretically, this should be offset by the search strategy I previously discussed, where one finds the count for "most relevant results" by appending "&start=950" to the end of the URL (for Yahoo, "&b=1000" does the trick). But this yields some odd conclusions. Though both search engines max out their "most relevant results" at about 1,000, only Yahoo reaches that limit for "Cyber Monday" — Google reports only about 500 distinct results!
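For anyone who would rather not edit URLs by hand, the trick can be sketched in a few lines of Python. This is only an illustration: the host search.example and the query parameter names q and p are placeholders I have assumed, and only the start=950 and b=1000 offsets come from the discussion above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_offset(url: str, param: str, offset: int) -> str:
    """Return `url` with `param` set to `offset`, replacing any existing value."""
    parts = urlsplit(url)
    # Drop any earlier value of the offset parameter, keep everything else.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, str(offset)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URLs (assumed for illustration, not real endpoints).
google_style = "https://search.example/search?q=%22Cyber+Monday%22"
yahoo_style = "https://search.example/search?p=%22Cyber+Monday%22"

print(with_offset(google_style, "start", 950))
print(with_offset(yahoo_style, "b", 1000))
```

Rebuilding the query string rather than blindly concatenating avoids ending up with two conflicting offsets when the URL already carries one.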

Surely there must be more than 500 appearances of the term that Google considers non-identical. I'd expect this to be yet another spurious figure, based on some quirk in the algorithm Google uses to compare pages to judge their similarity. Even if this number is far too low, the repetitions of text involving "Cyber Monday" are extremely frequent. For instance, many sites reprint a Nov. 25 article by Reuters under the headline, "Online retailers await 'Cyber Monday.'" How many of the Google and Yahoo results use this text? Quite a large proportion, if the total results have even a modest degree of reliability. Here is what I currently get (though of course I cannot guarantee that these figures will come close to approximating anyone else's results):

Google Yahoo
"Cyber Monday"
"await Cyber Monday" 402,000
percentage using "await"

Disjunctive queries to determine the non-Reuters appearances are reasonably consistent with these numbers: <"Cyber Monday" -"await Cyber Monday"> at the moment yields 381,000 results on Google and 478,000 on Yahoo, roughly what one would expect by subtracting the above figures. Could a third to a half of the search results for "Cyber Monday" really reproduce the Reuters article, or at least its headline?

A look at some of the results for "await Cyber Monday" indicates that a huge number of websites include automatic feeds of news stories from Reuters and other wire services, mostly via RSS services provided by such portals as Yahoo News or CNET. The Reuters headline was included in the feeds for top news stories, and it thus appeared automatically on countless sites.

Beyond the reproduction of the Reuters headline, it's also important to consider the provenance of this particular term. "Cyber Monday" was invented by online retailers for the express purpose of boosting their Web-based sales. It wouldn't be at all surprising if various websites were flooding the search engines with text on "Cyber Monday" in the lead-up to the big day as a way of building the hype.

This is yet another object lesson in the unreliability of counts provided by Google and other search engines: words and phrases thrust suddenly into circulation may show very misleading counts, at least until an initial period of volatility passes. (Further evidence for this volatility: I waited a few hours and checked Google again, and now it gives me 1,010,000 results for "Cyber Monday" and 585,000 for "await Cyber Monday"! I would expect the number of raw hits to continue fluctuating wildly over the next few days. Still, the "await" ratio doesn't seem to have changed much, with 58 percent of Google's results now apparently derived from the Reuters headline — not that I put much stock in that figure either.)
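The "await" ratio is nothing more than one count divided by another. A minimal Python sketch, using the later Google figures quoted in the parenthesis above (the helper name share is my own, chosen for illustration):

```python
def share(part: int, total: int) -> float:
    """Return `part` as a fraction of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total


# Later Google counts quoted in the text above.
cyber_monday_hits = 1_010_000
await_hits = 585_000

ratio = share(await_hits, cyber_monday_hits)
print(f"Share of results containing the Reuters headline phrase: {ratio:.0%}")

# Implied number of results without the headline phrase, by simple subtraction.
print(f"Implied non-Reuters results: {cyber_monday_hits - await_hits:,}")
```

The ratio can stay roughly stable while the raw counts swing, which is why it is marginally more informative than either number alone, though it inherits whatever unreliability the counts have.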

When the retailers count up their profits, they'll find out the extent to which the "Cyber Monday" hype paid off. Whether the marketing campaign had any lasting effect on the lexicon remains to be seen. Many of the results returned by the search engines (particularly those generated by news feeds) will soon fade away, and "Cyber Monday" may quickly be forgotten — at least until next year, when the inevitable marketing machinery kicks into gear once again.

November 28, 2005

Presidential self-repair

If you missed that odd annual Thanksgiving ritual, the presidential turkey pardon, Bruce Reed provides a sarcastic blow-by-blow on Slate today. Reed contrasts Bush's buoyant mood at last year's ceremony, shortly after his reelection, with the more somber atmosphere at the White House this year. Bush even had trouble making it through his routine introductory jokes, Reed notes:

Bush labored through the ceremony, stammering, "This is what we call—the White House is called the people's house, and we're going to call Marshmallow and Yam the people's turkey ...s."

This is perhaps further proof of Bush's growing disfluency over the course of his administration, but this particular stumble is worth some extra attention.

The video of the ceremony accompanies the official White House transcript, but I have isolated the audio for the relevant sentence in an MP3 file here. My transcription follows:

the uh [pause 1.26]
this is what we call th- [pause 0.64]
uh the White House is called the people's house [pause 1.10]
and uh we're going to call Marshmallow and Yam the people's turkey [pause 0.68]
zuh [pause 1.20]

After a couple of false starts, Bush performs "self-repair" (as it's called in discourse analysis) and introduces the first half of his coordinate structure ("X is called the people's Y"). But things go amiss once again in the second half, as Bush runs into a conflict between a plural noun phrase ("Marshmallow and Yam") and a singular coreferent ("the people's turkey"). I suspect that the source of Bush's confusion was twofold: first, he was swept along by the coordination between "the people's house" and "the people's turkey," and second, only one of the two pardoned turkeys was actually present at the event. (Yam, the president revealed, was "in a pickup truck hanging out by the South Lawn.")

Unlike other cases where Bush has run into problems with agreement in number, it was possible to rectify this error without repeating all or part of the sentence. All he needed to do was to add [z] to the end of "turkey" to indicate the plural form. But Bush pauses a bit too long to perform this self-repair seamlessly. And once he does so, he chooses neither to repeat the corrected word ("turkeys") nor to add a syllabic [z̩], as appears in onomatopoetic words like bzzz. (The latter strategy is often used to contrast a plural form as distinct from a singular, e.g.: "Did you say 'your friend' or 'your friends' were coming over?") Rather, Bush appends an exaggerated monosyllable, which could be transcribed as [zə] or [zʌ], and then moves on to the next sentence.

It's possible that Bush was hamming up his speech a bit for the crowd, which included students from an elementary school in Clarksville, Maryland. But it wasn't a very good example to set for the kids, discourse-wise.

[Then again, maybe Bush was simply deploying an interjection from The Simpsons: "Zuh," defined by Wikipedia as an "exclamation used when one cannot comprehend a complex situation or statement."]

November 27, 2005

Waiter, there's a metaphor in my soup!

Mark Liberman wonders about the origins of the expression in the soup, meaning "in great difficulty," noting that an animal (or human) would prefer to be out of the soup than in it. But there's one modest creature that I think could have provided the basis for the figure of speech: the fly, which was constantly finding its way into people's soup in the humor of the late 1880s (when the "soup = difficulty" metaphor first arose).

Though the OED gives a first citation of April 1889 for in the soup, Gerald Cohen and Barry Popik of the American Dialect Society have discovered two examples from the previous year, both in sporting contexts (the first baseball and the second horse-racing):

The World, Apr. 26, 1888, p. 3
The photographers were slow in getting ready and the boys on the bleaching-boards encouraged them to speed by yelling: 'Play ball.' 'Quit talking through your hat!' 'That picture machine is in the soup — it can't work!' and all sorts of similar comments."
[quoted in "Old baseball columns as a repository of slang: reading through The World." In Studies in Slang, Part 2, edited by Gerald Leonard Cohen. Frankfurt am Main: Peter Lang, 1989, pp. 11-84.]

New York Times, Sep. 1, 1888, p. 8
McLaughlin won with King Crab in the easiest possible fashion, and Speedwell finished "in the soup."

All of the early cites for the expression are clustered around 1888-1889, when sportswriters and others began using it in a rather faddish manner. My best guess is that the metaphor arose from the popularity of jokes at the time in which a restaurant customer exclaims, "Waiter, there's a fly in my soup!" (or something along those lines). The earliest example I've found of this old chestnut comes from 1872:

Appletons' Journal, Aug. 13, 1872, p. 140 (via Making of America)
Guest — "How comes this dead fly in my soup?"
Waiter — "In fact, sir, I have no positive idea how the poor thing came to its death. Perhaps it had not taken any food for a long time, dashed upon the soup, ate too much of it, and contracted an inflammation of the stomach that brought on death. The fly must have a very weak constitution, for when I served the soup it was dancing merrily on the surface. Perhaps — and the idea presents itself only at this moment — it endeavored to swallow too large a piece of vegetable; this, remaining fast in his throat, caused a choking in the windpipe. This is the only reason I could give for the death of this insect."

But it took another decade before magazines and newspapers began running variations on the "fly in the soup" joke. Here is a selection of jokes from the 1880s, culled from the Proquest and Newspaperarchive databases:

Saturday Evening Post, July 15, 1882, p. 14
"Here's a fly in my soup, waiter."
"Yes sir; very sorry, sir; but you can throw away the fly and eat the soup, can't you?"
"Of course I can; you didn't expect me to throw away the soup and eat the fly, did you?"

Decatur (Ill.) Morning Review, Oct. 24, 1885, p. 2
"Look here, waiter, quick," called out a gentleman in an Austin restaurant.
"What is it, sir?"
"Here is a dead fly in my soup."
"So I see. It seems to be quite dead."
"Well, by thunder, I want you to understand that I consider it an outrage."
"I am sorry, sir, but if you are opposed to eating dead animals, you should patronize one of the vegetarian restaurants."

Chicago Daily Tribune, Nov. 28, 1886, p. 7
Jakey — "Fader, dere's a fly in dor soup."
Mr. Cohn — "Vell, eat all but der fly before you show it to der waiter; den you can get some more."

Life, Feb. 17, 1887, p. 100
Customer (in restaurant): "Waiter, isn't it strange that I should find several flies in my soup?"
Waiter (somewhat amazed): "It is strange at this season of the year." —Harper's Bazaar.

Life, Dec. 13, 1888, p. 336
Customer (to waiter): I say, waiter, confound you, there's a fly in this soup!
Waiter (amazed): Well, I do decla', ef it yain't surprisin'! Eberything seems to be gittin' in de soup nowadays.

The last joke from December 1888 hinges on the double meaning of in the soup, indicating that readers of Life were expected to pick up on both the clichéd joke and the then-new slang expression. The combination of literal and figurative soups in the joke leads me to believe that the expression in the soup was originally meant to be understood from the perspective of the lowly fly, discovering itself again and again in the dire circumstances provided by the jokesters of the era.

Elite individuals

I've always thought of elite as a collective noun -- when people talk about "an elite," I assume they're referring to a particular group and not simply a person who has elite characteristics.

Shows how much I know. The other day I was looking at the conservative talk-show star Laura Ingraham's book Shut Up and Sing: How Elites from Hollywood, Politics, and the UN are Subverting America and was brought up short by a whole clutch of sentences like:

As I said, being an elite is not necessarily about being a liberal and/or a Democrat. There are plenty of capitalist elites atop some of American's great corporations...

[Vicente Fox] can divert Mexico's excess labor force northwards to work in such minimum-wage jobs as looking after the elites' children, painting elites' houses, mowing the elites' lawns, and cleaning elites' homes.

What's up with that? Had I stumbled on a right-wing plot to subvert the semantics of English collective nouns?

As it happens, Ingraham isn't the only conservative who's in on this. John Leo writes:

We are seeing the bitterness of elites who wish to lead, confronted by multitudes who do not wish to follow.

National Review's Ramesh Ponnuru writes:

For these people, the trouble with the federal government is not that it is too big but that it is run by elites who are disloyal to them.

In the Weekly Standard, Peter Berkowitz says:

They charge, for instance, that such programs appeal to white elites who wish to separate their children from blacks and to religious parents who wish to separate their children from the secular world.

(True, not all of these are unquestionably references to individuals, but that seems to be the most obvious reading.)

It turns out, though, that you can also find the usage among writers on the other side of the table, like Jim Hightower and Thomas Frank:

Excuse me, but where is the morality in cutting back on granny's small retirement check while rushing to pad the huge inheritances of Paris Hilton and other elites who live on their parents' wealth?

According to market populism, elites are not those who, say, watch sporting events from a skybox, or spend their weekends tooling about on a computer-driven yacht, or fire half their work force and ship the factory south

But then we've been down this road before. It used to be that you could only use minority as a collective noun; now it's all over the place as an individual-denoting term:

The fact that Asians, who are overrepresented on many of the best campuses, are minorities who contribute to campus diversity is downplayed. (Beth Henary in The Weekly Standard)

Of the ten sports offered, two have more than one or two minorities on the entire roster. (Kathryn Jean Lopez in National Review)

But I did hear from a handful of minorities who said they've been verbally abused at rock concerts. (Richard Roeper, in the Chicago Sun-Times)

Maybe it's a natural semantic change. Time was when cohort was restricted to use as a collective noun, and so was comrade (at least if you take it back to its Spanish etymon camarada, "group sharing a room"). But with political terms, you have to wonder why it has happened selectively -- why don't we see majority used this way? Beats me.

Update, 11/28: Mark points out that there are in fact some uses of majority as an individual-denoting term out there; he gives this cite, for example:

Well its all about being a minority, even minority experience, in my case, I grew up as a majority.

But Mark suggests that "such usage is not very common yet," and the numbers seem to bear that out. Google turns up 493 hits for "handful of minorities," almost all of them (including 19 of the first 20) involving the usage in question, whereas a search on "handful of majorities" turns up no hits at all.

Update, 12/3: Ramesh Ponnuru responds that the use of elite for individuals "rings false" to him, and that he doesn't think he's using the word in that sense in the sentence I linked to (which is not the same one I cited above -- I got my links crossed). The sentence in question is:

Among the small band of Catholic elites who are pro-life liberals, it may be that the felt imperative to maintain friendly terms with pro-abortion Catholic liberals sways people's views on the communion question.

I interpreted this as an individual-denoting use of elite because a band of seems to suggest a group of individuals rather than a group of groups; to my mind, "a small band of elite groups" would sound odd in this context. But Ponnuru is a pretty careful and lucid writer, so I'd be inclined to give him the benefit of the doubt on this one.

For another point about Ponnuru's response, see this.

Deep soup

An 11/27/2005 NYT article by Sheryl Gay Stolberg, "Look Who's Talking About Making a Comeback in the Senate", piles up quotes from politicians about Senator Trent Lott's activities and plans. One of them:

"He has to be in the soup," Mr. Livingston said, "and I think he's been frustrated over the last couple of years, not being in the position of leadership that he once was."

For me, "in the soup" is one of the various locative idioms for being in trouble -- up the creek, in a fix, in deep gumbo -- and the AHD agrees:

IDIOM: in the soup Slang Having difficulties; in trouble.

So does the OED, glossing it as "in a difficulty. orig. U.S.", with citations like

1889 Lisbon (Dakota) Star 26 Apr. 4/2 After collecting a good deal of money, the scoundrels suddenly left town, leaving many persons in the soup.
1939 H. G. WELLS Holy Terror I. ii. 38 We're in the soup... We've got to do 1914 over again.

James Briggs, in a discussion forum, gives an explanation (attributed to a tour guide) in terms of the humiliating effects of access to soup kitchens during the Irish potato famine in the 1840s. This might be true, but having occasionally worked as a tour guide myself in college, I'm skeptical of the scholarly value of tour guide stories, and I should have thought that there are at least two obvious metaphorical routes to the expression that don't have any particular historical reference.

In any case, Robert L. Livingston ("a Republican from Louisiana who was due to become House speaker in 1998 but left Congress amid revelations of an extramarital affair") apparently meant "in the soup" to be one of the various locative idioms for being in the center of the action, like in the swim or in the mix. Certainly that's the way that the NYT article frames it, by putting it immediately after this:

Some thought Mr. Lott would quietly slink away, but instead he rebuilt his career as sort of a Republican Greek chorus. On any given Tuesday in the Capitol, when Republicans meet for their policy luncheons, Mr. Lott can be found afterward lingering in the corridors, surrounded by reporters eager for sharp sound bites from the former leader.

Livingston's usage might be a Louisiana regionalism, but it's probably just a mistake -- that is, a word substitution in the process of speaking -- or an individual misunderstanding of the meaning of the idiom, a sort of phrasal malapropism. (Of course, it could also be a transcription error by the reporter -- in fact, previous experience suggests that this has a high probability.)

Whatever the source of the mistake, it's a natural one. It's true that if you're an animal, then you'd prefer to stay in the woods or the farmyard rather than being killed, cut up and boiled. And if you're a human being, the idea of floundering in a mass of thick organic liquid is one that probably holds little appeal. On the other hand, if you're an ingredient, it's obviously better to be in the soup, mixing it up with all the others, rather than on the shelf, left unused because of your poor quality or the cook's lack of interest in your flavor.

The "good to be in the soup" metaphor is natural and reasonable, but it's normally blocked by the fact that our culture has chosen a "bad to be in the soup" connotation, whatever its historical or metaphorical underpinnings, as the conventional idiomatic force of the phrase. This tension between creativity and convention is one of the forces that drives language onward. When people complain about usage, it's often because these forces are felt to be out of balance. Someone uses a metaphor creatively but in a way that is discordant with conventional interpretations, if you happen to know them, or chooses a conventional expression that is felt to be over-used, or is used in a context where its literal meaning feels foolish, if you happen to think about it. In fact, it's hard to say or write anything that is immune to all criticism of this sort. When you add the goal of transmitting an intended message, it's amazing that any of us ever manages to make it through a paragraph unscathed.

Churchill vs. editorial nonsense

For a while I've been on the trail of a saying usually attributed to Winston Churchill: "This is the sort of arrant nonsense up with which I will not put" (or some variation thereof). Typically the line appears in an anecdote where an officious clerk or editor tries to correct something Churchill has written by "fixing" his trailing prepositions, and Churchill then scribbles the famous comment in the margin of the revised text. I had previously found this anecdote circulating without reference to Churchill as early as 1942, with the first attributions to Churchill appearing in various forms in 1948. (A version in Sir Ernest Gowers' Plain Words that year played a large part in the story's dissemination.) Now I think I've found the original attribution to Churchill, though it differs in some important ways from later retellings.

The source is a short news story that was wired by a correspondent in London to both the New York Times and the Chicago Tribune in February 1944. Even though the same story reached both newspapers, the New York Times editors made a few small but critical revisions, as a side-by-side comparison reveals:

Chicago Tribune, Feb. 28, 1944, p. 1 New York Times, Feb. 28, 1944, p. 9

I presume that the Tribune editors made few or no changes to the correspondent's original copy. (Notably, when the Los Angeles Times printed the article in the same day's paper, they used the exact same wording as the Tribune, even though they credited the New York Times. This suggests that the Tribune's version was the one that made the wires.)  The New York Times editors made a few sensible revisions (such as changing the odd adjectival usage of embryo to embryonic), but they made one change that seems to undercut Churchill's humor completely: they "fixed" the quote so that there are no fronted prepositions. (Technically speaking, up doesn't count as a preposition here; rather, as Geoffrey Pullum explains, it is considered an adverb in traditional grammatical analyses.)

Here are the three versions of the crucial relative clause:

up with which I will not put (attributed to an unnamed writer by the Strand Magazine in 1942, later to Churchill)
with which I will not put up (attributed to Churchill by the Chicago Tribune and L.A. Times, Feb. 28, 1944)
which I will not put up with (attributed to Churchill by the New York Times, Feb. 28, 1944)

Let's suppose that the correspondent's story isn't completely apocryphal and that Churchill actually made such an annotation. My suspicion is that Churchill saw the 1942 version appearing in the Strand Magazine (to which he frequently contributed) and created his own variation on the theme. If he wrote "with which I will not put up," then the line would still retain some of the derisive flavor of the original anecdote, since there is still at least one inappropriately fronted preposition, with. But the version in the New York Times does not feature any preposition-fronting, thus defeating the purpose of the joke (which ridicules the convoluted steps that bad writers take to avoid sentence-final prepositions).

Presaging modern spellchecker-generated errors, someone at the Times apparently committed an editorial hypercorrection. I would surmise that the offending editor thought that the point of the squib was merely Churchill's strong castigation of the "tedious nonsense" in ministerial minutes. The relative clause was seen as secondary, rather than the entire point of the remark, and thus was subject to redaction. The final sentence, mentioning the underscoring of up ("just to make his intention plain," in the Times version), then appears to be nothing more than added emphasis on Churchill's part, rather than driving home the witticism.

It is of course deeply ironic that an anecdote about editorial intrusion (especially in other tellings involving an overeager Foreign Office clerk or book editor) should itself be foiled by an intrusive editor. But even when the prepositional humor is maintained, there are other possibilities for spoiling the anecdote. Here is the earliest example I've found where Churchill is credited with the canonical form, "up with which I will not put":

Los Angeles Times, Apr. 7, 1946, p. C11
"Things About Which Women Are Talking"
Women are passing along a bon mot in the current issue of Counter-Point. Winston Churchill, after laboring through the circumlocution and trailing prepositions of a governmental report, exploded, "This is the sort of stilted English up with which I will not put."

As with the 1944 version, Churchill is bemoaning the tortured prose of official government reports. (Churchill was much in the news at the time as a proponent of "Basic English.") Though we get the joke properly told this time, the context is confusing. Why would Churchill make his comment after laboring through text with "trailing prepositions"? Surely the whole point of the quip is to draw attention to the unnecessary contortions a writer goes through to avoid trailing prepositions. (And as a further bit of delicious irony, the rubric of this column features a daintily fronted preposition: "Things About Which Women Are Talking"!)

Finally, in September 1946, the anecdote appeared in its more familiar form, as a battle between Churchill and a "stuffy Foreign Office secretary" over the editing of the Prime Minister's speeches (note that the story must have been set before Churchill and the Conservatives lost the general election in July 1945):

Washington Post, Sep. 30, 1946, p. 12
"Town Talk," by Eva Hinton
Latest Churchill story going the rounds has to do with a stuffy young Foreign Office secretary who had the job of "vetting" the then Prime Minister's magnificent speeches. The young man disliked the P.M.'s habit of ending sentences with prepositions and corrected such sentences whenever he found them.
Finally, Mr. Churchill had enough of this! So he recorrected his own speech and sent it back to the Foreign Office with a notation in red ink, "This is the kind of pedantic nonsense up with which I will not put!"

In this telling, the anecdote resembles the original 1942 version in the Strand Magazine where the line is credited to an unnamed writer. Gone is Churchill's opprobrium toward the "tedious nonsense" of ministerial minutes; his ire is directed instead toward a would-be improver of his own prose (though Churchill's "red ink" remains constant). It would seem, then, that the story that inspired Churchill to make his purported 1944 annotation in the first place came to be credited to Churchill himself by 1946. And this would be the version that would become more firmly linked to Churchill in later years, through Gowers' Plain Words and other promulgators of the story. I can't help thinking that Churchill would have been quite happy to get credit for the original anecdote, since it was more memorable (and less confusing to editors!) than his actual comment of 1944.

If you have a PlayStation Portable, for $69.90 you can apparently buy a "game" called TalkMan that uses speech recognition (and synthesis?) software to do speech-to-speech translation among English, Chinese, Japanese and Korean. There's a description and review here. This page suggests that it's actually "a voice-activated translation software application designed to teach English to Japanese PSP owners". Somehow Language Log's review copy was lost in transit :-), but if I get access to one, I'll post a review.

Posted by Mark Liberman at 02:15 PM

Trends in presidential disfluency

Remember the town meetings about social security reform? It wasn't all that long ago, really. This was back in those halcyon days before the presidency was beset by the worst of its recent public relations troubles, and so I thought it would be a good point of comparison for evaluating a possible uptick in POTUS disfluency. And after analyzing the recordings, I have to say, it looks like we're talking about deep-seated tendencies, not just the expression of a transitory problem.

In the first of the town hall meetings that I looked at, the moderator begins the session by asking about a just-released USA Today poll showing that most voters were against having the government invest social security funds in the stock market. The president starts his answer this way (with blank lines indicating significant pauses):


I think first- I think there

a couple of explanations, first of all

we live in a time where

people are

using technology

to become more and more self sufficient and to get more and more information directly.

I mean that the- the internet is the fastest growing

uh communications organism in human history.
And I think that- so I think


The relevance of the internet is unclear -- people are skeptical because they're too well informed? And it's hard to imagine a lamer bit of closing rhetoric than "And I think that- so I think [pause] *that*".

The President's answer continues:

Secondly I think there's uh always been a healthy scepticism of


And thirdly

government hasn't been in very great favor over the last uh

seventeen or eighteen years, although it's doing better now than it was uh
a few years ago.


I think-

in public esteem, all the surveys also show that.

So his first point is the internet, his second point is the general public skepticism about government, and his third point is the general public skepticism about government. A compelling logical structure, here.

He mentions in his third point that the government is doing better now, starts to go on to say something substantive, and then jumps back to clarify that the dimension in which the government was doing better is the dimension of public esteem. It's good to have that cleared up -- otherwise we might have thought that the government was doing better at, say, governing.

The president continues:

I think the real question is

from my point of view -- we ought to get down to the merits of this --

first question you have to ask yourself is


a portion of the social security



go into

securities? Now- into stocks?


if they should go into stocks, or into corporate bonds

should that decision be made according to individual accounts...

OK, good, 54 seconds and 146 words into the answer, we get to the first question you have to ask yourself. Having gotten to the point, the president expands on it:

Would um would the g- uh would- and I think- but I think most people just think uh

if the risk is gonna b- if there's gonna be a risk taken, I'd rather take it than have the government take it for me, I don't think it's very complicated, so I think that

those who believe that- that it's safer and better

for people

to have the public do the investment, or the government do the investment, have the-

have to bear that burden.

Well, that certainly clarifies everything.

When I wrote about President Bush's 11/16/2005 Kyoto press conference, I observed that the passage I quoted was "surprisingly chaotic, given that the question was a predictable one and the answer is a routine piece of diplomatic boilerplate". But you could say the same thing about the social-security Q&A just discussed. The question was a predictable softball right over the center of the plate, an opportunity to bang a prepared answer over the rhetorical fence. Instead, the president's answer was an incoherent mass of rambling disfluency.

However, at this point, I have to confess that I may have misled you. At least, I've tried.

The author of the quoted remarks was President William J. Clinton. The occasion was a Town Hall Meeting on Social Security Reform, held at the University of New Mexico, in Albuquerque, on July 7, 1998. (The recording, which alas is not available on line, was sent to me by the William J. Clinton Presidential Library.)

Now, President Clinton is widely believed to be a first-rate public speaker. For example, at the biannual Public Address Conference in September of 1998, Prof. John Murphy of the University of Georgia said that

Clinton's expertise at public speaking ... may be likened to that of someone with a great ear for music -- someone who can hear a song once and then play it perfectly on a piano, according to Dr. John Murphy.

"Rhetorical style is quite important to a presidency, and Clinton is a master at it, and is, with the possible exception of Ronald Reagan, the most effective public speaker as president since Franklin Roosevelt," said Murphy ...

... Clinton has a striking ability virtually to become part of any audience he is addressing, so that his message is enormously effective. ...

"Clinton's rhetorical success is a melding of all kinds of discourse ... " said Murphy. "Now he's doing what he's very good at indeed -- trying to change the conversation from issues of impeachment to the problems of the American people."

And his former director of speechwriting, David Kusnet, underlines the role of Clinton's perceived ability to think and talk on his feet:

Clinton's talent for extemporaneous speaking influenced the work of his speechwriters, according to Kusnet. "All we really did was just take notes on what Clinton was saying and just type it up and give it back to him and then he'd change it again."

Let's note for the record that the 7/7/1998 Albuquerque Town Meeting took place during a key period of the Monica Lewinsky scandal's timeline. The Drudge Report broke the story on 1/19/1998, and the Washington Post first reported it on 1/21/1998. Paula Jones' case was dismissed by the judge on 4/1/1998. Linda Tripp began testifying before the grand jury on 6/30/1998, and Lewinsky was granted immunity by Starr's office on 7/28/1998. It's reasonable to guess that President Clinton was under considerable stress, and may not have had time to prepare properly for the discussion.

Based on my own experience as an interested citizen, I've always shared the view of Bill Clinton as a master of rhetoric, and it's no part of my purpose here to try to debunk that perspective. As Aristotle told us, there's more to rhetoric than words, sentences and logical arguments. You can't measure ethos and pathos by counting pauses, self-corrections, awkward sentences, illogical progressions and unhappy word choices. And there are stretches later in the same Albuquerque meeting when Clinton also handles the logos part very well indeed.

My point is that people who don't like George W. Bush, and have focused that dislike onto the (I believe untrue) stereotype that he is an incoherent and disfluent speaker, should beware of confirmation bias. As I wrote earlier, "You can make any public figure sound like a boob, if you record everything he says and set hundreds of hostile observers to combing the transcripts for disfluencies, malapropisms, word formation errors and examples of non-standard pronunciation or usage." Most of the time, you don't need hundreds of observers or lots of recordings -- I found my evidence in the first two minutes of the first Clinton recording I listened to.

Of course, if the alternatives are having Ken Starr spend four years and $40 million combing through your financial records and personal activities in search of scandal, or having Jacob Weisberg publish a regular magazine feature, four books and a yearly calendar detailing your "accidental wit and wisdom", I guess the choice is clear.

In my opinion, both Starr and Weisberg could learn something from John 8:7. A deep-seated tendency to disfluency and incoherence is part of the human condition. The fact that we can sometimes transcend our nature and achieve moments of eloquence is a miracle for which we should give daily thanks.

[For completeness, here is the moderator's initial question in full:

I guess I'd like to take the mod- moderator's prerogative to uh ask the president the first question
and it's about an interesting poll that appeared in uh USA Today this morning, Mr. President,
which voters said,
two thirds uh of the voters said
that they liked this idea of private investment accounts
but most of them also say that they don't want the government investing their money for them.
So how do you explain that?

An mp3 of the whole Q&A is here.

It's also worth noting that Clinton (as I understand it) then favored some form of private investment accounts as part of social security reform. However, the fact that this position was controversial even within his administration, and certainly within the Democratic Party, may also have made it difficult for him to give a straightforward answer to the question.]

November 25, 2005

Going it together

In a post earlier today, Geoff Pullum observed that his grammatical intuitions don't countenance the forms goes, went or gone in the idiom "go it alone". I agree with Geoff about "went it alone", but my intuitions about goes and gone are different. Thus I don't notice any grammatical difficulties when I read that Tony Blair told the House of Commons on March 12, 2003

"What is at stake here is not whether the US goes it alone or not, but whether the international community is prepared to back up the instructions it gave to Saddam Hussein..."

or when I learn that George W. Bush told CNN on August 13, 2004 that

"I think to say we've gone it alone really does denigrate the contributions of other countries."

However, I know better than to trust my grammatical intuitions very far. If Geoff were nearby, I'd suggest that we try his proposal to measure whiskey-sipping rates as a proxy for grammaticality judgments. I can see all sorts of practical difficulties in experimental design, but over the course of a weekend, I have no doubt that we could resolve them. Since he isn't here, I'll have to fall back on the easiest grammatical proxy to explore from one's armchair: corpus frequency.

Rather than fight my way through the difficulties with using Google counts for this purpose, I decided to use the counts from a corpus of 2.84 billion words of news text available at LDC Online. My method is to compare the relative frequency of various forms of to go in the idiom "go it alone" to the frequency of the same forms in other frames, such as "go shopping", "go fishing" or "go home".

__ it alone
__ shopping
__ fishing
__ home

Note that these numbers are consistent with the counts that Geoff got on a smaller news corpus: 48 for "go it alone", 8 for "going it alone", and zero for "goes it alone", "gone it alone" and "went it alone". However, with a corpus of several billion words rather than several million words, we have a more powerful grammatical telescope, so to speak.

Note also that the first row of the table provides little comfort for my intuitions -- "went it alone" (which feels ungrammatical to me) is actually 32% commoner than "gone it alone" (which feels grammatically OK to me). However, that's not the end of the story. Since the base rates of the frames are quite different, let's express the counts in each frame as a fraction of the count for the form go in that frame (e.g. "go it alone"). Plotting the result, we can see that "went it alone" is enormously less frequent than we would expect, given the relative frequency of went in other frames.

In fact, the relative frequencies of go, going, goes and gone are similar across the four frames, except that "gone fishing" (which is a fixed expression in its own right) is commoner than the others.

Does this mean that my perceptions of relative frequency are closer to the truth than Geoff's? Not really: if we look more closely at the table of relative frequencies, we can see that goes and gone are quite a bit rarer in the frame __ it alone than before shopping, fishing and home. Perhaps Geoff is just setting his judgmental thresholds at a more discriminating level than I am:

frame         go   going   goes    gone    went
__ it alone   1    0.220   0.036   0.011   0.014
__ shopping   1    0.198   0.083   0.056   0.437
__ fishing    1    0.310   0.111   0.277   0.354
__ home       1    0.230   0.054   0.066   0.237
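
For readers who want to poke at these numbers, here is a minimal sketch (not the analysis actually run for this post) that compares each form's relative frequency in the frame __ it alone with its typical relative frequency in the other three frames, on a log scale. It assumes the table's columns are go, going, goes, gone, went, in that order, which is inferred from the surrounding discussion rather than stated in the table; the function name and the use of a geometric mean over the comparison frames are choices made for illustration only.

```python
import math

# Assumed column order, inferred from the discussion around the table.
FORMS = ["go", "going", "goes", "gone", "went"]

# Relative frequencies copied from the table: each value is the count of a
# form in a frame divided by the count of "go" in that same frame.
REL_FREQ = {
    "it alone": [1, 0.220, 0.036, 0.011, 0.014],
    "shopping": [1, 0.198, 0.083, 0.056, 0.437],
    "fishing":  [1, 0.310, 0.111, 0.277, 0.354],
    "home":     [1, 0.230, 0.054, 0.066, 0.237],
}


def log_ratio_vs_others(target="it alone"):
    """Return, for each form, log10 of its relative frequency in the target
    frame minus the mean log10 relative frequency in the other frames
    (i.e. the log of the ratio to their geometric mean).

    Negative values mean the form is rarer in the target frame than the
    other frames would lead us to expect.
    """
    others = [frame for frame in REL_FREQ if frame != target]
    result = {}
    for i, form in enumerate(FORMS):
        mean_log = sum(math.log10(REL_FREQ[f][i]) for f in others) / len(others)
        result[form] = math.log10(REL_FREQ[target][i]) - mean_log
    return result


if __name__ == "__main__":
    for form, value in log_ratio_vs_others().items():
        print(f"{form:6s} {value:+.2f}")
```

On these figures, going comes out close to zero, goes is somewhat negative, and gone and went are the most strongly negative, with went lowest. That is only a restatement of the table, not a fitted model.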

We can see this better in the graph if we use a log scale on the y axis:

[You shouldn't trust this analysis too far -- we need to investigate the cross product of more frames and more verbs, and we should fit a model, not just look at some tables and graphs. And the results should definitely be cross-validated with those whiskey-sipping rates that Geoff mentioned.]

Going it alone

On page 160 of Jonathan Kellerman's novel Flesh and Blood (Ballantine, 2001) a character says, "Far as I recall, she went it alone." I was struck by the fact that I could never use that latter phrase. I don't remember previously noticing this, but for me the idiom go it alone is extremely limited in its contexts of occurrence: there can be no tense inflection on go, nor past participle inflection. That is, I find *went it alone ungrammatical; *gone it alone is also terrible; and I think *goes it alone is bad as well. As a double-check, I searched the Wall Street Journal corpus, and found 48 tokens of go it alone, 8 of going it alone, and nothing else. Of course, Google as usual provides examples of things you never thought you'd find: 15,000 hits for went it alone, and nearly as many (13,500) for gone it alone. There's a lot of text out there, much of it written by people who apparently don't speak quite the same variety of English that I do. So I'm not saying that these forms don't occur, or that they shouldn't. Rather, the puzzle is why the inflected forms of this idiom should sound so terrible to me when (clearly) there is no general grammatical principle ruling them out and some people use them. The only serious possibility is that I, like other human beings, must be quite sensitive to unconsciously noted and tabulated statistics about frequency of patterns. It has become well known since the work of Jenny Saffran and others from about 1996 that even newborn babies are highly sensitive to statistical patterns in speech that they hear. (You test them by such stratagems as detecting a shift of gaze, or a change in sucking rhythm while feeding, when a stimulus that breaks the pattern is presented.) Clearly, it isn't just babies that unconsciously keep track of how frequently they hear what. And in this case I just happened to notice explicitly the reaction in myself, so no experimental detection of gaze shift or whisky-sipping rate was needed.

By the way, the reason we really have to consider this an idiom is that the verb go is strictly intransitive: it can't normally take a direct object: *Let's get our coats on and go it! And in the special usage go it alone, the it cannot be replaced by an ordinary noun phrase (*Let's get our coats on and go it!). There are all sorts of other syntactic peculiarities, like that there's no passive (*It has been gone alone by many people); and of course the meaning is somewhat unpredictable: I think go it alone means, very roughly, something like "take up or continue some endeavor despite lack of the assistance of others that would in some sense have been preferable". You don't refer to it as going it alone if you just decide to get lunch at a diner without inviting anyone else along; that's just lunch. Going it alone has got to involve something a little bit bold or unusual in that normally there would have been an expectation that others would be needed. I think that's accurate (but hey, don't trust me on semantics, it is not my normal schtick).

Parli Berluschese?

American political discourse may have spawned Scalito, Fitzmas, and Miered, but it looks like Italy is way ahead of the U.S. in the neologizing game. The Telegraph reports on the publication of a new dictionary of Italian neologisms (2006 parole nuove by Valeria Della Valle and Giovanni Adamo), which includes dozens of coinages based on the names of political leaders. Not surprisingly, the largest number of neologisms have been derived from the name of Prime Minister Silvio Berlusconi, who has dominated the country's political landscape since his election in 2001 (an unusually long ministerial tenure for Italy). According to Guido Bonsaver, fellow of Italian at Pembroke College, Oxford, "All these new words are very much the result of the spirit of resentment that has evolved around Berlusconi. It is a way of expressing anger or irony."

Here are the Berlusconi neologisms listed in the Telegraph article and an accompanying sidebar:

berluschese: the populist political language Berlusconi speaks
berluschista: a supporter of Berlusconi 
berlusconardo: someone who is in the close circle of Berlusconi
berlusconeide: relating to Berlusconi's "epic" political journey or career
berlusconite: a condition, illness or syndrome of someone who is excessively optimistic, who tends to distort fact and reality to paint a rosier picture, typical, some say, of Berlusconi
berlusconizzante: seeking to behave, act like, or carry oneself like Berlusconi
berlusconizzare: to turn something into Berlusconi's property; or, to adopt the strategies of Berlusconi
berlusconizzarsi: to behave like Berlusconi; or, to succumb to the style of Berlusconi
berlusconizzato: "berlusconised," used to describe a convert to Berlusconi's ideas or an entrant to his circle
neoberlusconismo: Berlusconi's latest brand of political thought
postberlusconiano: that which comes after Berlusconi

A full list in Italian, including a number of neologisms based on the names of Berlusconi's colleagues and rivals, is provided by the newspaper Corriere Della Sera.

November 24, 2005

Oops, we lost North America

Happy Thanksgiving to all Language Log readers. I just received a message from the Treasurer of the International Phonetic Association (who is in Scotland, where they know nothing of this Thanksgiving holiday), who told me something that made me thankful:

Dear Colleague

Thank you for letting us know if you had received JIPA 35, i, 2005 or not.

CUP have confirmed that there was a glitch in their distribution system in North America, as a result of which very few American members received their copy of the journal.

CUP have told us that they will be despatching copies as a matter of priority on Monday of next week.

So that's good; the missing journal issue will soon be on its way. But I have to confess that (looking a gift horse in the mouth) I felt strangely dissatisfied with the wording. It's that bit about it being a "glitch". Surely a glitch is an inherently trivial and sporadic departure from normal operation of some process?

Here's what Webster's says:

1 a : a usually minor malfunction <a glitch in a spacecraft's fuel cell>; also : ²BUG 2
  b : a minor problem that causes a temporary setback : SNAG
2 : a false or spurious electronic signal

The sole, defining, task of a distributor, the absolute sine qua non, is to distribute. Cambridge University Press can't say that failure to distribute a whole issue of a journal to North America (the largest and richest market in the world) is a glitch. It's a complete collapse of the only thing that could entitle them to be called distributors at all.

Consider: suppose the hairdresser accidentally nicks one of your eyebrows with the clippers. That might be called a glitch. But now suppose they attach a plastic sheet round your neck and sit you in a corner for an hour, but nobody ever comes to see you, and then they tell you that you have to go. That isn't a "glitch". It's a dereliction. Let's look that in the eye. Call a spade a spade.

I think I know what Lady Bracknell of Oscar Wilde's "The Importance of Being Earnest" would have said about the situation: for CUP to lose one issue of a journal may be regarded as a misfortune; to lose virtually the whole North American subscribership looks like carelessness.

Posted by Geoffrey K. Pullum at 11:43 AM

Life in these, uh, this United States

It's Thanksgiving, the high holy day of the American civil religion, and as good a time as any to reflect on the terms America and the United States. In her always enlightening column "The Word," Jan Freeman of the Boston Globe recently examined a peculiar usage by President Bush in a speech at the end of his Latin America trip earlier this month:

Freedom is the gift of the Almighty to every man and woman in this world — and today this vision is the free consensus of a free Americas. It is a vision ... that puts what was once a distant dream within our reach: an Americas wholly free and democratic and at peace with ourselves and our neighbors.

Bush's usage of Americas is not simply the latest example of the President's penchant for construing ostensibly plural nouns as grammatically singular. (That quirk is mostly limited to combining plural nouns with singular copulas; see, for instance, the most recent Bushism catalogued by Jacob Weisberg, in which Bush informs his South Korean hosts, "I know relations between our governments is good.") This was a prepared speech, not spontaneous discourse, and the choice of singular Americas was a conscious one by Bush's speechwriters (the usage also appeared in a Bush speech in June at a meeting of the Organization of American States).

As Freeman writes, singular Americas is being pressed into service by speechwriters and policymakers in need of a simple, unifying term for the nations of the Western Hemisphere now intertwined in various political and trade agreements. It's possible to avoid the singular vs. plural question by using Americas in contexts without overt grammatical markers for number (as in Bush's statement, "Ensuring social justice for the Americas requires choosing between two competing visions"). But when Americas appears in concord with determiners, verbs, and pronouns marked as singular, it sounds odd to those outside of hemispheric policy circles.

Freeman rightly observes that this transformation from plural to singular mirrors the history of the phrase the United States. The change from the United States are to the United States is was not at all smooth, and has even served as a linguistic emblem for the nation's own turbulent history: "the Civil War is often credited with (or blamed for) transforming 'the United States' into a singular noun," Freeman writes. But how much truth is there to the claim that the Civil War was the watershed moment for the singularization of the United States, and how did that idea get spread around in the first place?

Other milestones for the shift in usage have been proposed (such as the War of 1812), but it's the Civil War theory that has had the most resonance in the popular imagination. The claim received a great deal of attention when it was made by the historian Shelby Foote (who passed away in June) in Ken Burns' much-watched PBS documentary series The Civil War, first broadcast in 1990. In an interview for the documentary that appeared in the companion book The Civil War: An Illustrated History, Foote said:

Before the war, it was said "the United States are." Grammatically, it was spoken that way and thought of as a collection of independent states. And after the war, it was always "the United States is," as we say today without being self-conscious at all. And that sums up what the war accomplished. It made us an "is."

Foote's assertion echoes one made in 1909 by the renowned classics scholar (and former Confederate soldier) Basil Lanneau Gildersleeve in a lecture collected in Hellas and Hesperia; or, The vitality of Greek studies in America:

It was a point of grammatical concord which was at the bottom of the Civil War — "United States are," said one, "United States is," said another.
—quoted in Soldier and Scholar: Basil Lanneau Gildersleeve and the Civil War by Ward W. Briggs Jr. (1998), p. 22

Though Gildersleeve's quote circulated widely (and may have been the basis for Foote's argument), he wasn't the first to put forth this idea. The earliest example I've found so far appeared in the Washington Post in 1887:

There was a time a few years ago when the United States was spoken of in the plural number. Men said "the United States are" — "the United States have" — "the United States were." But the war changed all that. Along the line of fire from the Chesapeake to Sabine Pass was settled forever the question of grammar. Not Wells, or Green, or Lindley Murray decided it, but the sabers of Sheridan, the muskets of Sherman, the artillery of Grant. ... The surrender of Mr. Davis and Gen. Lee meant a transition from the plural to the singular.
—The Washington Post, Apr. 24, 1887, p. 4

Four years later, an article by G. H. Emerson (available on the American Periodicals Series database) elaborated on the argument:

The many histories are careful to distinguish between the Colonies and the States, but they have failed to impress the distinction, the immense and radical distinction, between the States and the United States. Early in the period of the Revolution there was, as just noted, a feeble incipiency of a Union in the Articles of Confederation, proposed in 1777 and ratified in March, 1781. For about a decade the states, under the technical name, "The United States of America," were a Confederacy; but when the Constitution was adopted the United States was. "They" gave place to "it." And as Mr. Fiske in his latest book, "Civil Government in the United States," has noted, the change from the plural to the singular was vital, though it has taken a War of Rebellion to make the difference unmistakable. The sovereign States were consolidated into a unit — a unit indeed with important limitations — when the Federal Constitution was adopted. The United States began not their but its history with the first inauguration of Washington as Chief Magistrate.
—"The Making of a Nation," by G. H. Emerson, in The Universalist Quarterly and General Review, Vol. 28 (Jan. 1891), p. 49

Emerson argues that the United States became notionally singular with the ratification of the Constitution and the inauguration of Washington, though it took the Civil War to "make the difference unmistakable." But this doesn't really tell us anything about usage. Indeed, not only does the Constitution consistently use the plural construal, but so do official texts in the immediate aftermath of the Civil War — as with the pronominal anaphora used in the 13th Amendment:

Neither slavery nor involuntary servitude, except as a punishment for crime whereof the party shall have been duly convicted, shall exist within the United States, or any place subject to their jurisdiction.

So how "unmistakable" could the shift from plural to singular have been? In any case, Emerson at least provides a source for the claim: Civil Government in the United States by John Fiske (1890). The text is available on Project Gutenberg, but there's nothing specifically linking the shift in usage to the Civil War:

From 1776 to 1789 the United States were a confederation; after 1789 it was a federal nation. The passage from plural to singular was accomplished, although it took some people a good while to realize the fact. The German language has a neat way of distinguishing between a loose confederation and a federal union. It calls the former a Staatenbund and the latter a Bundesstaat. So in English, if we liked, we might call the confederation a Band-of-States and the federal union a Banded-State. There are two points especially in our Constitution which transformed our country from a Band-of-States into a Banded-State. [etc.]

So Fiske only said "it took some people a good while" to move from the plural to singular construal, while Emerson explicitly pointed to the Civil War as the crucial moment. Again, like the Gildersleeve quote, none of this tells us anything about actual changes in usage. The earliest analysis I can find that tackles the usage question with actual research is a May 4, 1901 column in the New York Times book review by John W. Foster, secretary of state under Benjamin Harrison. The headline for Foster's piece reads: "ARE OR IS? Whether a Plural or a Singular Verb Goes With the Words United States." This is evidence that four decades after the Civil War, the plural vs. singular question was still open to debate. (Jan Freeman notes that usage guides of the early 20th century were divided on the issue, with Ambrose Bierce coming out against the singular in 1909, the same year as Dr. Gildersleeve's pronouncement.)

Secretary Foster's column was in response to a book review of A Century of American Diplomacy that took issue with the book's singular construal of United States on the grounds that the Constitution treats it as plural. Foster first notes that the Constitution also construes such nouns as House of Representatives, Senate, and Congress as plural, a usage later abandoned in American English. He then writes:

The fact that the plural use of the verb occurs in the Constitution in connection with that phrase is not of itself a controlling reason. It must have a deeper cause. Is it found in the fact that this Nation is made up of a collection of States, and that they cannot be ignored in the use of the phrase? It is naturally suggested that an event occurred in the sixties which relieved our language from that servitude. I do not, however, think that event was the only, or the controlling, reason why the use of the singular verb is permissible, and even more proper. The oneness of our Government was proclaimed long before the first shot was fired at the flag over Sumter.

Foster disputes the already circulating claim that the Civil War was entirely responsible for the change in usage. He then examines the writings of various antebellum statesmen and finds that figures such as Hamilton, Jefferson, Clay, and Webster did indeed tend to use the plural form, but more often tried to avoid the problem by using a singular substitute like the Union, the Republic, or the Government of the United States. Foster conjectures that earlier writers were more concerned with euphony, while later writers focused on "the true significance of the words." He also notes that nations like Great Britain, France, and Germany have often been treated as feminine singular entities, based on the Latin forms Britannia, Gallia, Germania, etc. The feminine pronoun she was often used for the United States as well, but he says that "of late years we have gradually drifted into the custom of adopting the neuter it, which makes necessary the use of the singular verb."

After providing a long list of public figures who used the singular form both before and after the Civil War, Foster concludes:

The result of my examination is that, while the earlier practice in referring to the "United States" usually followed the formula of the Constitution, our public men of the highest authority gave their countenance, by occasional use, to the singular verb and pronoun; that since the civil war the tendency has been toward such use; and that to-day among public and professional men it has become the prevailing practice.

As it turned out, Foster's research had some important policy implications. A Jan. 8, 1902 article in the Washington Post reported that Foster's work (which evidently was reprinted as a pamphlet) had persuaded the House of Representatives' Committee on Revision of the Laws to rule that the United States should be treated as singular, not plural.

But Foster's careful, gradualist argument did not capture the public imagination the way that Gildersleeve's more forceful version did. A New York Times article about Gildersleeve on Oct. 21, 1923 (shortly before his death) restated his comment from 1909:

A Confederate soldier and officer during the Civil War, which he used to say was fought to settle a question of grammar (that is, the question as to whether "the United States" was singular or plural), he carried his pocket Homer till the day that he lost not only it but his pistol and his horse and all but his life.

Still, even Gildersleeve's formulation seems to be more of a rhetorical device than an observation based on the study of usage. It took Foote and others to transmogrify this rhetoric into the unsupportable claim that the United States was always construed as plural before the Civil War and always as singular afterwards.

Nowadays the plural form still lingers in certain set idioms, such as these United States. A common antebellum designation for the country, these United States survived in the 20th century in folksy idiomatic usage. It had something of a revival in the years after World War II, as evidenced by the Reader's Digest feature begun around that time, "Life in These United States." Harry Truman seemed particularly fond of the construction — searching on the website for the Truman Presidential Museum & Library turns up two dozen citations for these United States in speeches during his administration (mostly in the election years of 1948 and 1952). Even official government agencies occasionally use the phrase: the EPA issued a study on "How We Use Water In These United States," while the FBI reports on "Crime in These United States." So even now, the pluribus sometimes outweighs the unum.

When Ambrose Bierce made his plea against singular United States, he wrote:

It would be pretty hard on a foreigner skilled in the English tongue if he could not venture to use our national name without having made a study of the history of our Constitution and political institutions. Grammar has not a speaking acquaintance with politics, and patriotic pride is not schoolmaster to syntax.

Bierce is usually quite reliable in such matters, but in this case he misread the situation. Sometimes grammar has more than a speaking acquaintance with politics, even if they make strange bedfellows.

(Much of the above appeared in a post I wrote last year for the alt.usage.english newsgroup. Thanks to Donna Richoux and other a.u.e regulars for their contributions.)

Ab surd

Eugene Volokh writes:

Reading a book about the history of math, I came across the word surd. Never heard of it before, despite my many years of math education. I probably won't use it, precisely because if it's obscure to a fairly math-savvy person like me, it's probably obscure to others, too. But it's good to know, if only for Boggle purposes.

The AHD gives two definitions:

1. Mathematics An irrational number, such as √2.
2. Linguistics A voiceless sound in speech.

and a two-step etymology:

Medieval Latin surdus, speechless, surd (translation of Arabic (jaḏr) ’aṣamm, deaf (root), surd, translation of Greek alogos, speechless, surd), from Latin.

The OED explains the etymology as

[ad. L. surdus (in active sense) deaf, (in pass. sense) silent, mute, dumb, (of sound, etc.) dull, indistinct. The mathematical sense ‘irrational’ arises from L. surdus being used to render Gr. ἄλογος (Euclid bk. x. Def.), app. through the medium of Arab. açamm deaf, as in jaðr açamm surd root. ]

Pat Ballew's page "Origins of some arithmetic terms" attributes the following more extended account to Jeff Miller:

The Arabic translators in the ninth century translated the Greek rhetos (rational) by the Arabic muntaq (made to speak) and the Greek alogos (irrational) by the Arabic asamm (deaf, dumb). See e. g. W. Thomson, G. Junge, The Commentary of Pappus on Book X of Euclid's Elements, Cambridge: Harvard University Press, 1930 [Jan Hogendijk].

This was translated as surdus ("deaf" or "mute") in Latin.

As far as is known, the first known European to adopt this terminology was Gherardo of Cremona (c. 1150).

Fibonacci (1202) adopted the same term to refer to a number that has no root, according to Smith.

Surd is found in English in Robert Recorde's The Pathwaie to Knowledge (1551): "Quantitees partly rationall, and partly surde" (OED2).

According to Smith (vol. 2, page 252), there has never been a general agreement on what constitutes a surd. It is admitted that a number like sqrt 2 is a surd, but there have been prominent writers who have not included sqrt 6, since it is equal to sqrt 2 X sqrt 3. Smith also called the word surd "unnecessary and ill-defined" in his Teaching of Elementary Mathematics (1900).

G. Chrystal in Algebra, 2nd ed. (1889) says that "...a surd number is the incommensurable root of a commensurable number," and says that sqrt e is not a surd, nor is sqrt (1 + sqrt 2).

The extraordinary Liddell & Scott entry for logos explains that it is the verbal noun of legô (pick up; count, tell; say, speak) and gives this dizzying array of senses and sub-senses (with most cross-references, examples, citations and details edited out):

I. computation, reckoning
  1. account of money handled
    b. public accounts, i. e. branch of treasury
  2. generally, account, reckoning
  3. measure, tale
  4. esteem, consideration, value put on a person or thing
II. relation, correspondence, proportion
  1. generally
  2. Math., ratio, proportion
  3. Gramm., analogy, rule
III. explanation
  1. plea, pretext, ground
    b. plea, case, in Law or argument
    c. in Logic, proposition, whether as premiss or conclusion
    d. rule, principle, law, as embodying the result of logismos
  3. law, rule of conduct
  4. thesis, hypothesis, provisional ground
  5. reason, ground
  6. formula (wider than definition, but freq. equivalent thereto), term expressing reason
  7. reason, law exhibited in the world-process
    b. spermatikos l. generative principle in organisms
    c. in Neo-Platonic Philos., of regulative and formative forces, derived from the intelligible and operative in the sensible universe
IV. inward debate of the soul
  1. thinking, reasoning
  2. reason as a faculty
    b. creative reason
V. continuous statement, narrative (whether fact or fiction), oration, etc.
  1. fable
  2. legend
  3. tale, story
  4. speech, delivered in court, assembly, etc.
VI. verbal expression or utterance ..., rarely a single word, ... never in Gramm. signf. of vocable ..., usu. of a phrase
    a. pl., without Art., talk
    b. sg., expression, phrase
    c. coupled or contrasted with words expressed or understood signifying act, fact, truth, etc., mostly in a depreciatory sense
  2. common talk, report, tradition
    c. mention, notice, description
    d. the talk one occasions, repute, mostly in good sense, good report, praise, honour ... less freq. in bad sense, evil report
  3. discussion, debate, deliberation
    b. right of discussion or speech
    c. dialogue, as a form of philosophical debate ... hence, dialogue as a form of literature
    d. section, division of a dialogue or treatise ... branch, department, division of a system of philosophy
    e. in pl., literature, letters ... but, also in pl., treatises
VII. a particular utterance, saying
  1. divine utterance, oracle
  2. proverb, maxim, saying
  3. assertion, opp. oath
  4. express resolution
  5. word of command, behest
VIII. thing spoken of, subject-matter
  2. plot of a narrative or dramatic poem
    b. in Art, subject of a painting
  3. thing talked of, event
IX. expression, utterance, speech regarded formally
  2. of various modes of expression, esp. artistic and literary
    b. of the constituents of lyric or dramatic poetry, words ...; dramatic dialogue
  3. Gramm., phrase, complex term ... l. onomatôdês noun-phrase
    b. sentence, complete statement
    c. language
X. the Word or Wisdom of God, personified as his agent in creation and world-government

So apparently the Arabic translators of Greek mathematical texts took alogos in L&S sense II.2, meaning "not a ratio (of integers)", and re-interpreted it as (an extended sense of) something like "not speaking", by reference to the L&S senses VI to IX of logos. (I'm sure they understood the concept of irrationality; the point at issue is just what they took the base sense of alogos to be, for the purpose of creating a calque.) Then [I falsely guessed -- ed.] they connected that to the idea of "not vocalized", as in the Arabic orthographic distinction between "vocalized" text in which short vowels are written, and "unvocalized" (i.e. normal) text in which they are not. (Was the traditional Arabic terminology really the same for rational vs. irrational numbers and vocalized vs. unvocalized text? I don't know... [But Ben Zimmer did -- see below]) Then when the Italians translated Arabic mathematics, they apparently used a literal translation of the Arabic term aṣamm for the irrational side of the opposition, but reverted to a lexical derivation of Latin ratio for the rational side.

The linguistic term surd is not used any more, at least not in any literatures that I read. For me, it evokes the days when phonetics involved instruments of glass, wood, leather, ivory and brass.

For the punchline, we go back to the Volokh post, where jc comments that

the link provides a good antonym: "that's not a surd, that's a sonant!" (that's for the other definition of surd - i'm not sure i understand what a voiceless sound in speech would be, though - if one's communication were reduced to such voiceless sounds, however, would it be a reductio ad surdum?)

[via email from Linda Seebach]

[Though it spoils the joke to explain the terminology, voiceless speech is of course speech during which the vocal cords are not vibrating; and a voiceless sound is one like [s] or [k] where this condition normally obtains.]

[Update: Karen Davis writes:

I first ran across this word in high school, in Cordwainer Smith (pen name of Paul Linebarger)'s short story "Alpha Ralpha Boulevard", in a quatrain apparently original with him:

She wasn't the woman I went to seek;
I met her by the merest chance.
She did not speak the French of France,
But the surded French of Martinique.

LION is ignorant of any works containing "surded", French or otherwise, but surd does turn up in Erasmus Darwin's 1799 Botanic Garden, in Part II ("Containing the Loves of Plants"), Canto II, in a sort of ode to the invented goddess Papyra:

119 ---Three favour'd youths her soft attention share,
120 The fond disciples of the studious Fair,
121 Hear her sweet voice, the golden process prove;
122 Gaze, as they learn; and, as they listen, love.
123 The first from Alpha to Omega joins
124 The letter'd tribes along the level lines;
125 Weighs with nice ear the vowel, liquid, surd,
126 And breaks in syllables the volant word.

Papyra's other two disciples inscribe mathematics ("in deepening ranks his dexterous cypher-train") and music ("on four [sic] concordant lines / prints the lone crotchet, and the quaver joins").

And Delmore Schwartz used surd several times, always aspiring to the same Euclidian metaphor as in this passage about Coriolanus (from the seasonally appropriate work Act Four: "A Goodly House, the Feast Smells Well"):

176 And now the Volscian camp. As he shakes Rome,
177 He gnaws Aufidius with every tooth,
178 Unknowingly, though vowed to show, in all,
179 Humility, and move with modesty,
180 And loyalty.
181                               But the true surd
182 Is irreducible. The individual
183 Is uncontrollable. To him, to him
184 The soldiers draw, forget Aufidius,
185 Render him virtual kingship.

But it seems that Schwartz has forgotten his lessons, because the true surd is not an irreducible ratio, i.e. a ratio of mutually prime integers, but rather a number that can't be expressed as a ratio of integers in the first place.]

[Ben Zimmer corrects my guess that the Arabic deaf/mute vs. speaking metaphor might have been connected to the terminology of orthographic vocalization:

As far as I know, Arabic doesn't apply the deaf-mute vs. speaking metaphor to the voweling of texts as you suggest. Vowel markings are called Haraka:t, from the root حرك , and the act of voweling is denoted by a verb from the same root, literally meaning 'to set in motion'.

Now that I check the Hans Wehr dictionary, I see that the word for 'deaf' (aSamm or أصم) does actually have a linguistic as well as a mathematical sense, but it has nothing to do with voweling. The word can refer to a geminate verb, i.e., a triliteral verb where the second and third radicals are the same — also called mediae geminatae.


[ John Cowan sent in this quatrain by Lewis Carroll:

And what are all such gaieties to me,
   Whose thoughts are full of indices and surds?
x² + 7x + 53
    = 11/3

This is one of those rare poems that includes a metrically scanned equation. The only other one that comes to mind at the moment is the limerick

    . #     .   .    #    .   .  #
    I used to think math was no fun
      .    .  #   .    .   #  .   .   #
    'Cause I couldn't see how it was done
     .  #  .     .  #
    Now Euler's my hero
     .  .  #   .    .  #
    For I now see why zero
    .  #  .   .  # .   .  #
    Is e to the pi i plus 1.

Carroll's quatrain includes a quadratic iambic pentameter (perhaps the only one so far explicitly composed as such?):

  .    #      .   # .  #   .   #  .    #
  x squared plus seven x plus fifty three

and then a trimeter with an inverted initial foot:

#  .   . # .    #
equals eleven thirds

Since Dodgson was a mathematician who loved puzzles, I wonder if there's a message in there? A bit of searching on LION reveals that the quotation comes from the first of Four Riddles, and Carroll's note says that

No. I. was written at the request of some young friends, who had gone to a ball at an Oxford Commemoration---and also as a specimen of what might be done by making the Double Acrostic a connected poem instead of what it has hitherto been, a string of disjointed stanzas, on every conceivable subject, and about as interesting to read straight through as a page of a Cyclopedia. The first two stanzas describe the two main words, and each subsequent stanza one of the cross "lights."

The whole poem is a bit long for this post, so you can find it here.

Maybe someone (John?) can tell me what word the equation is a clue to, or more precisely why Carroll chose that particular equation, among the large number that would have scanned and rhymed equally well.

A bit of high-school algebra yields

x² + 7x + 148/3 = 0

x = (-7 ± sqrt(7² - 4*148/3))/2 = (-7 ± sqrt(49 - 592/3))/2 = (-7 ± sqrt(-148 1/3))/2
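
The arithmetic above can be checked mechanically. What follows is only a minimal Python sketch added for illustration, not anything from Carroll or from the correspondence; the variable names are arbitrary, and it assumes nothing beyond the standard library's fractions and cmath modules. Exact fractions keep the 148/3 and the discriminant free of rounding, and floating point is used only for the final complex roots.

```python
from fractions import Fraction
import cmath

# Carroll's equation x^2 + 7x + 53 = 11/3, moved into the
# standard form a*x^2 + b*x + c = 0 (names a, b, c are illustrative).
a = Fraction(1)
b = Fraction(7)
c = Fraction(53) - Fraction(11, 3)   # exactly 148/3

# Discriminant b^2 - 4ac, kept as an exact fraction.
discriminant = b * b - 4 * a * c     # exactly -445/3, i.e. -148 1/3

# A negative discriminant means two complex-conjugate roots.
root_of_disc = cmath.sqrt(float(discriminant))
roots = [(-float(b) + sign * root_of_disc) / (2 * float(a)) for sign in (1, -1)]

for r in roots:
    # Substituting each root back in should reproduce 11/3 up to rounding.
    print(r, r * r + 7 * r + 53)
```

Both roots have real part -7/2 and an imaginary part of plus or minus half the square root of 148 1/3, which is the "i times a surd" observation made in the text.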

And since sqrt(-148 1/3) will be i * sqrt(148 1/3), there's a glimmer of something about "I surd one..." -- but maybe it doesn't mean anything after all, at least not gone at that way.]


[I thought I was done with surd, but here is another update.

Trevor ap Patnarthur observed by email that "most Brits who went to downtown schools will be aware that a Surd is in fact a Sikh who wears the full rig. Hence Surd jokes."

This is apparently a re-spelling of Sardar -- Google indexes 54,100 {"Sardar jokes"} pages against only 634 for {"Surd jokes"}. The OED thinks that Sardar is a version of Sirdar, from [Urdū (Pers.) sardār, f. Pers. sar head + dār possessor.], meaning "in India and other Eastern countries, a military chief, a leader or general of a force or army". Hobson-Jobson has an entry for sirdar, "s. Hind. from Pers. sardar, and less correctly sirdar, 'leader, a commander, an officer'; a chief, or lord; the head of a set of palankin-bearers, and hence the 'sirdar-bearer,' or elliptically 'the Sirdar,' is in Bengal the style of the valet or body-servant, even when he may have no others under him." However, there are no pages of "Sirdar jokes", spelled as such.

I surmise that Sardar became a term for Sikhs because of their traditional role in the Indian army. The honorific form Sardarji is also apparently used -- there are 45,100 pages of {"Sardarji jokes"}.

The vowel change from Sardar to Surd is analogous to the change from pandit to pundit -- it reflects the Indic pronunciation of short /a/. The reason for the loss of the -ar is less clear to me.

Under whatever spelling, Surd/Sardar jokes seem to be a variant of Newfie/Polish/Chelm/Blonde/Aggie/men jokes:

Q: Why do men like surd jokes??
A: Because they can understand them.

OK, back to mashing the sweet potatoes. ]

Further adventures in "self" expression

I've received a number of interesting responses to my recent post on the expression, (So) I say(s) to myself, "Self...", which has still only been documented since the 1980s, surprisingly enough. [*] Many readers are convinced that they've heard it used in old movies or comedy routines. It does have the feel of a well-worn vaudeville line, something that might have been introduced to film or radio audiences by a former vaudevillian like Jimmy Durante, William Demarest, or Red Skelton. But if an old-time comic actually used the line, I've found no record of it in the various online databases. (One might expect it to turn up on a database like ProQuest Historical Newspapers, which covers the New York Times, Los Angeles Times, Chicago Tribune, Washington Post, and several other major papers. Surely some entertainment columnist would have alluded to it somewhere along the line.)

One tipster suggests that Bill Cosby used So I said to myself, "Self..." in a standup routine captured on one of his celebrated live albums from the 1960s. This matches the recollection of a few others:

Years ago Bill Cosby had a comedy routine which portrayed self-talk very humorously with the line, "So, I said to myself, 'Self'. . ."
—"The Good Side of Talking to Yourself" by Donna R. Vocate, Boston University College of Communication

I like that Bill Cosby record, you remember that? Where he [says], "I said to myself, 'Self.'"
—"The Cross Pt. 2" by Pastor Star R. Scott, Sword of the Spirit Ministries

I've had the opportunity to listen to several of Cosby's albums from the '60s, and I have yet to come across an instance of this exact expression. So far, the closest approximation I've found appears on the 1966 album Wonderfulness, in a routine called "Niagara Falls." In it, Cosby impersonates the producer/director Sheldon Leonard (who cast Cosby in his breakout role on the television series "I Spy"). Cosby delights in imitating Leonard's vocal mannerisms through a story about a young Leonard taking his wife on a honeymoon to Niagara Falls. Cosby mimics Leonard recalling:

So I said to my bride, "Bride... why don't we take a little dip in the wonderful lake?"

Leonard swan-dives into the lake, only to find it's freezing cold ("My body turned into a giant goose pimple"). At the point in the story where his wife is about to jump in, Cosby-as-Leonard says:

And I said to myself, "Why should I tell her?"

The first example of reported speech in the routine is quite similar to the pattern we're looking for, only with vocative bride rather than vocative self. But when Leonard, voiced by Cosby, recounts his self-address in the second example, vocative self does not appear.

It's possible that Cosby performed another, perhaps more extended, version of "Niagara Falls" where he has Leonard using vocative self. Or he might have used self-talk in another routine. It's also possible that the combined force of the above two examples has led to faulty recollections of Cosby making Leonard address himself as "Self...". (Anyone with a more intimate knowledge of Cosby's oeuvre is welcome to email me at: bgzimmer at-sign ling dot upenn dot edu.)

Regardless of Cosby's role in the popularization of self-talk, the turn of phrase became widespread by about the mid-1980s. Here are the first four appearances on the Usenet archive:

Sep. 6, 1983
If it is a Saturday, though, I would say to myself, "Self, it will undoubtedly be noisy, but it *is* Saturday, you can catch up on sleep later".

Jan. 6, 1985
Then I found out he was moving to a job in a different building, so I said to myself "Self", I said, "This is a nice person who you have liked for a while."

Mar. 11, 1986
Then I saw your Mr. Video's posting that his cable company had it for a while and I said to myself, "Self, what gives?".

July 10, 1986
So I then sez to myself "self, there can't be THAT much difference between RAM chips, can there?"

In these early examples, it's interesting to note that only the last one combines vocative self with the mock-dialectal form says (or sez in its eye-dialect spelling). In my previous post I mentioned that I says is common in American dialect writing, including in forms of self-address like I says to myself, I says... (or, again, with eye-dialect spelling: I sez to myself, I sez..., as it appears in Bret Harte's 1894 work The Bell-Ringer of Angel's and elsewhere). The modern pattern (So) I says to myself, "Self..." thus melds together the much older usage of I says with the newer vocative usage of self.

Based on some additional database research, it appears that there have been a number of popular variations on I says to myself, I says... dating back to the late 18th century, if not earlier. One version inverts the second I says to says I, as in this line attributed to Andrew Jackson in an anecdote that supposedly took place in 1798 (when Jackson was a judge in Tennessee):

And so I says to myself, says I, hoss, it's about time to sing small, and so I did.
—quoted in The Life of Andrew Jackson by Robert V. Remini (1990), p. 44

This variant was common enough to appear in the works of a number of famous 19th-century American and British authors:

After you left me, I began to generalize over my sitiation, and I says to myself, says I, 'Moses Marble, them lads will never consent to sail and leave you here, on this island, alone like a bloody hermit,' says I.
Afloat and Ashore by James Fenimore Cooper (1844)

I'd got money enough, wi' only one daughter to leave it to, an' I says to myself, says I, it's time to leave off moitherin' myself wi' this world so much, an' give more time to thinkin' of another.
Scenes of Clerical Life by George Eliot (1857)

"Often I says to myself, says I, 'I used to mend all the boys' kites and things, and show 'em where the good fishin' places was, and befriend 'em what I could, and now they've all forgot old Muff when he's in trouble; but Tom don't, and Huck don't—THEY don't forget him, says I, 'and I don't forget them.'"
Adventures of Tom Sawyer by Mark Twain (1876)

"Sometimes I says to myself, says I, 'Well, I'll be jiggered!'"
Little Lord Fauntleroy by Frances Hodgson Burnett (1886)

This expression was so pervasive that Charles S. Peirce, one of the founders of modern semiotics, used it at least twice as a "vernacular" elucidation of the exchange of signs within the mind:

Now it is needless to say that conversation is composed of signs. Accordingly, we find the sort of mind that is least sophisticated and is surest to betray itself by its language is given to such expressions as "I says to myself, says I."
—1907 manuscript, quoted in "Charles S. Peirce on Objects of Thought and Representation" by Helmut Pape, Noûs 24:3 (June 1990), pp. 383

Meditation is dialogue. 'I says to myself, says I' is a vernacular account of it; and the most minute and tireless study of logic only fortifies this conception.
—review of biography of Alfred Russel Wallace (date unknown), quoted in Peirce's Approach to the Self by Vincent Michael Colapietro (1989), p. xiv

What about inverting both the first and second appearances of I says in the expression to says I? This version is not quite as common in 19th-century texts, but some digging turns up a number of literary examples:

Says I to myself, says I—' that's twice you've done it, my buzzum friend and sweet-scented shrub—but you doesn't do that 'ere again.'
—"The Fastest Funeral on Record" by F. A. Durivage, in A Quarter Race in Kentucky, and Other Sketches, edited by William Trotter Porter (c. 1854), p. 50

Well, I hadn't read a page hardly afore she was asleep, and then I laid down the book; and says I to myself, says I, "What shall I do next?" ... Well, says I to myself, says I, "Suppose it was the devil or a Britisher that was there."
The Sayings and Doings of Samuel Slick by Thomas Chandler Haliburton (c. 1866), p. 249

Says I to myself, says I, 'This poor fellow's got no capital; and he hasn't the head to git capital.'
—"How Sharp Snaffles Got His Capital and Wife" by William Gilmore Simms (1870)

But the best-known example of this variant came in the 20th century, in a line from Ulysses:

Hoho begob, says I to myself, says I.
Ulysses by James Joyce (1922)

In their 1959 book Song in the Works of James Joyce, Matthew Hodgart and Mabel Worthington suggest that this is an allusion to the refrain of the Lord Chancellor's song in Gilbert and Sullivan's 1882 opera Iolanthe:

When I went to the Bar as a very young man,
(Said I to myself—said I),
I'll work on a new and original plan,
(Said I to myself—said I),
I'll never assume that a rogue or a thief
Is a gentleman worthy implicit belief,
Because his attorney has sent me a brief,
(Said I to myself—said I!).

Given the numerous variations on this expression in American and British literature, I'm not so sure that Joyce had Iolanthe in mind when he wrote the line in Ulysses. In fact, it might have been modeled on an earlier song with says I, perhaps predating Gilbert and Sullivan, as suggested by this quote from the author A.C. Benson:

The essay is the reverie, the frame of mind in which a man says, in the words of the old song, "Says I to myself, says I."
—A.C. Benson, "The Art of the Essayist" in Modern English Essays, edited by Ernest Rhys (1922), pp. 50-51

(A later song elaborates the pattern further, with one says I inversion: "I Says To Myself Says I, Say There's The One For Me," by Harry Akst and Jack Yellen, used in the 1929 film Bulldog Drummond.)

These endless variations on the theme demonstrate that the narration of an interior dialogue using the (mock-)dialectal forms I says and says I long ago achieved cliché status. Versions with says I, though once quite common, dropped out of usage, but the I says version has stuck around long enough to join together in new forms with vocative self.

And so I says to myself, "Self," I says, "Have I spent far too much time pursuing this?"

[* Update #1: Lance Nathan comes through with a pre-1980 example of self-talk, in a form that is slightly different from what I had been looking for. It appears in a song performed by Three Dog Night and composed by Allen Toussaint:

So I said to myself
I said "Self, do you see what is sailin' through my soul?"
And I gotta have some more, don't ya know.
—"Play Something Sweet (Brickyard Blues)" by Three Dog Night (1974)

According to All Music Guide, Three Dog Night wasn't the first to perform Toussaint's song; in 1973, the Scottish R&B singer Frankie Miller sang a version of it. (Toussaint himself never released a studio version of the song, though he recorded a live version on the album New Orleans Jazz & Heritage Festival, 1976.)

Nathan also suggests that this might be patterned on a humorous template for making puns on people's names, e.g.: "So I says to the man floating in the water, I says, 'Bob...'"]

[Update #2: Two more uncorroborated leads on comedic sources for the expression: Henny Youngman and Art Carney (as Ed Norton on "The Honeymooners").]

Book review review review

Q Pheevr has begun to review book reviews, calling the results "book review reviews". According to Q, "My hope is that these reviews will help you decide whether to take the time to read the reviews they review, or at least that they will give me an opportunity to make snarky remarks." Q's review of Gale Zoë Garnett's review of Lynne Truss's Talk to the Hand accomplishes something more, raising interesting questions about the nature of imperatives, the meaning of "it", and the evolution of formulaic politeness.

Q's post is by no means the blogosphere's first book review review, nor even the first one to name itself as such. Beatrix at is subtitled "A Book Review Review", and Portifex has a weekly "New York Times Book Review review" in his Daily Blague, and so on. But I had hoped that this post might be the web's first explicit "book review review review", and here the situation is less clear.

If I ask technorati about "book review review review", it tells me that

There are no posts that contain that text yet. Please try again later or add it to your watchlist to track future conversation.

No, thanks all the same. If I search Google for "book review review review", the first few strings that I find either cross punctuation marks

"Just review the Video or DVD, read your book, review, review, review and you will be on your way to learning how to become a professional mixologist."

or reflect a more profound limitation in Google's text indexing algorithm, which allows it to find strings that cross not only punctuation but major formatting divisions like these:


Review of the first edition "Blah blah." The New York Times Book Review


Review of the first edition "… Blah blah." The Economist

I don't care enough to look at more than a page or two of those, so I'll just say that this post may or may not be the first use on the web of the phrase "book review review review", in the sense of a review of a review of a review of a book.

Returning briefly to Q's review review, in reference to Lynne Truss' jeremiad against the formulaic politeness of waitstaff:

"Enjoy!" cannot reasonably be interpreted as a command, because enjoyment is not something one can decide to do, but rather an involuntary mental state. The imperative is not generally compatible with such states, as illustrated in (1) and (2):
1. *Suffer from clinical depression!
2. *Desire a glass of beer!
The only rational interpretation of "Enjoy!" is something like "I hope that you will enjoy [the good or service I have just provided you with]."

Fair enough so far. But there are several other conventional imperative-form formulas of benevolence that seem to command states or processes not under voluntary (or for that matter involuntary) control, e.g.

Be well!
Have fun!
Have a safe trip!

It's plausible that these really mean things like "I hope you will be well/have fun" etc., but the relationship is not systematic. It's reasonable to say something like "I hope you will be treated fairly (by the court)" to someone who is about to go on trial, but the pseudo-imperative form "be treated fairly!" doesn't work at all. This is not just a matter of avoiding imperative passives, since "be the beneficiary of a large inheritance" is no better.

I'm sure that there is an extensive literature on this, which I'll link to as soon as someone tells me about it.

[Update: Richard Hershberger writes:

You forgot to check usenet (setting aside for the moment whether or not that counts as "on the web"). Checking Google Groups comes up with a clear hit from 1998, where a thread in comp.lang.python with the subject line "Book review review" includes a reply with the subject line "Book review review review". See The content of the message is not in fact a review of the review of the book review, but that clearly is the sense intended in the subject line.


November 21, 2005

Alphabet wars: an update

When it was first revealed that an abecedary from the 10th century BCE had been unearthed near Tel Zayit, Israel, initial coverage in the New York Times and Pittsburgh Post-Gazette (as well as a follow-up from the Chicago Tribune wire service) suggested that a major scholarly conflict over the artifact's interpretation was looming. The showdown was supposed to have taken place this past week when Ron E. Tappy of the Pittsburgh Theological Seminary and his colleagues presented their findings at conferences in Philadelphia, first for the American Schools of Oriental Research on Nov. 16, and then for the Society of Biblical Literature on Nov. 20. From the initial reports of conference attendees, it appears that Tappy and his co-presenter, P. Kyle McCarter of Johns Hopkins University, took a cautious approach, largely avoiding the contentious debates over biblical history seized upon by the media.

As before, the "biblioblogs" have kept the rest of us informed about the latest news. The first presentation at ASOR did not generate much discussion among the bibliobloggers, though Paul Nikkel of the Deinde blog shared his rough notes on the session. (Among the tantalizing details mentioned by Nikkel is McCarter's explanation that the Tel Zayit stone is not, in fact, an abecedary. Nonetheless, McCarter continued to refer to it as an abecedary for the rest of his presentation on the inscription's paleographical aspects. [*]) The ASOR presentation was apparently rather short (part of a general panel on ASOR-affiliated excavations), without much time for discussion afterwards.

The SBL presentation, however, was the main attraction for those interested in the Tel Zayit stone. Jim Davila reports on his PaleoJudaica blog that the session was held "on Sunday evening from 7:00 to 9:00 in a stifling, standing-only crowded room with several hundred attending." Davila gives a fine summary of the presentation and discussion, as does Christopher Heard on Higgaion. According to Heard, "both Tappy and McCarter stayed away from 'interpreting' the inscription in their prepared remarks, though they were urged to do so, in different ways, during the Q&A." Some audience members were vocal in claiming that the abecedary is proof of a literate state, identified by biblical scholars as the "united monarchy" of Israel and Judah under the reign of David and Solomon. Heard characterizes these interpretations as "jumping the gun," but credits Tappy and McCarter with taking a more even-tempered approach with regard to the artifact. However, Jim West on Biblical Theology asserts that "Tappy seems to draw conclusions that reach beyond the evidence." Joseph Cathey, meanwhile, feels that Tappy and his colleagues will ultimately be vindicated, despite objections from "minimalists" who question the historical reliability of biblical narratives. (Heard responds to Cathey here.)

Finally, one of the participants in the excavation, Michael Homan, shares his first-hand account of the Tel Zayit stone's discovery. Homan does a good job of conveying the excitement that the team experienced when they realized that the scratches on the stone actually constituted a complete Hebrew (or proto-Hebrew) alphabet.

[* Update: On the Ancient Near East mailing list, Peter Daniels questions Nikkel's account and says that McCarter never suggested that the inscription wasn't an abecedary. But the purpose of the abecedary remains open to interpretation, Daniels notes.]

[Update 11/22/05: The Jewish Exponent, a Philadelphia-based newspaper, reports on the Nov. 20 presentation and provides some background on the minimalist/maximalist debate.]

[Update 11/25/05: Christopher Heard provides a thoroughgoing critique of the Jewish Exponent article here.]

Posted by Benjamin Zimmer at 03:02 PM

November 20, 2005

So I says to myself, "Self, what's up with these Googlecounts?"

In my recent post on the difficulties of Googlinguistics, I heeded Mark Liberman's warning to be suspicious about the reliability of Googlecounts much greater than 100,000. But an attempt at some Google-aided snowclone research suggests that the upper limit for reliability may in some cases be on the order of 1,000 or less.

First, let me explain a quirk in the way that Google displays search results. Overall Googlecounts are especially meaningless if one searches on bits of text that appear verbatim on multiple websites, as with song lyrics, poems, public-domain literature, etc. This is particularly evident with disjunctive queries, which use the minus sign to exclude particular search terms. Compare these search results, for instance, on pages with lyrics to the song "Junco Partner" (a traditional New Orleans song, also known as "Junker Partner," recorded by many performers including Harry Connick, Jr.):

A. "junco partner" lyrics 9,440
B. "junco partner" lyrics connick 279
C. "junco partner" lyrics -connick 930

Googlecounts are notorious for exhibiting variations based on the time and place that the search engine is accessed. Still, results from one user at one time, such as mine above, should at least be internally consistent. These results clearly are not, since we would expect A to be roughly equal to the sum of B (mostly pages referring to Connick's cover version) and C (mostly pages referring to versions by other performers, such as the Clash). Ideally, of course, A should be exactly equal to the sum of B and C.

But now try adding "&start=950" to the end of the URL for each search. At the bottom of the page, Google gives a message like this:

In order to show you the most relevant results, we have omitted some entries very similar to the N already displayed.
If you like, you can repeat the search with the omitted results included.
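For anyone who would rather script this step than edit URLs by hand, here is a minimal Python sketch. It only builds the URLs and does not fetch anything. The endpoint and the `q` and `start` parameter names are assumptions based on the search URLs Google was serving at the time, and the helper name `search_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: Google's search endpoint and its "q" and "start" query parameters.
BASE_URL = "https://www.google.com/search"


def search_url(query: str, start: int = 0) -> str:
    """Build a search URL; a nonzero start asks for results beginning at that offset."""
    params = {"q": query}
    if start:
        params["start"] = start
    return BASE_URL + "?" + urlencode(params)


# The three queries compared above, each pushed to offset 950.
for query in ('"junco partner" lyrics',
              '"junco partner" lyrics connick',
              '"junco partner" lyrics -connick'):
    print(search_url(query, start=950))
```

Opening each printed URL in a browser should land on the last page of results, where the "omitted some entries" message appears.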

For these queries, Google has found many search results that are more or less identical; such pages most likely contain the same song lyrics with slightly different surrounding content. Focusing on the counts for only the "most relevant" results, i.e., the ones that Google deems to be non-identical, I currently get:

A. "junco partner" lyrics 413
B. "junco partner" lyrics connick 85
C. "junco partner" lyrics -connick 374

Using this search method, the sum of B and C at least approximates A without an enormous margin of error. The lesson here is that the "most relevant" results should provide more trustworthy numbers when dealing with text that appears frequently on the Web with only minor differences across sites. However, this technique will not help if the search string appears non-identically on more than 1,000 pages. For more commonly appearing search strings, the "most relevant" results will cut off at some number under 1,000 (generally between 800 and 950).
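The consistency check itself is simple arithmetic, and a short Python sketch makes the contrast concrete. The figures are the ones reported above; the function name `discrepancy` and the choice to express the gap as a fraction of A are illustrative assumptions, not a standard measure.

```python
def discrepancy(a: int, b: int, c: int) -> tuple[int, float]:
    """Return B + C and the gap between A and B + C as a fraction of A."""
    total = b + c
    return total, abs(a - total) / a


# Counts as reported in this post: raw Googlecounts and "most relevant" counts.
raw = {"A": 9440, "B": 279, "C": 930}
most_relevant = {"A": 413, "B": 85, "C": 374}

for label, counts in (("raw", raw), ("most relevant", most_relevant)):
    total, gap = discrepancy(counts["A"], counts["B"], counts["C"])
    print(f"{label}: A={counts['A']}, B+C={total}, gap={gap:.0%}")
```

For the raw counts, B + C comes to 1,209 against an A of 9,440; for the "most relevant" counts, B + C comes to 459 against an A of 413.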

Now to the snowclone research I mentioned. I was curious about a snowclonish turn of phrase that is often used to indicate a jokey interior monologue (or dialogue, actually): (So) I says to myself, "Self (I says)..." In September, I mentioned this expression on the American Dialect Society mailing list, asking if anyone knew of its origin. Surprisingly, despite the fact that it sounds like it comes from some old vaudeville routine, no one was able to find an example before the 1980s. John Baker tracked down this example from the Boston Globe of May 31, 1981, quoting a New Hampshire gardener:

"Becuz of the mild weatha at the end of the winta, every blossom and blade of grass was weeks ahead of schedule. Back about the middle of this month, I says to myself, 'Self, mebbe this is the yea to plant early,' but then I hud the ghost of my fatha and his fatha sayin', 'Plant the tumatuz on Memorial Day.' I held off."

It's notable that the example uses New England "dialect writing." The use of says with a first-person singular pronoun is common in representations of reported speech in numerous American dialects. Here are examples of I says from two of our most illustrious (and perceptive) dialect writers, Mark Twain and Ring Lardner:

"Geewhillikins," I says, "but what does the rest of it mean?"
"We ain't got no time to bother over that," he says; "we got to dig in like all git-out."
"Well, anyway," I says, "what's SOME of it? What's a fess?"
—Mark Twain, The Adventures of Huckleberry Finn, Ch. 38

I says Well I won the pot didn't I? He says Yes and he called me something. I says I got a notion to take a punch at you.
He says Oh you have have you? And I come back at him. I says Yes I have have I? I would of busted his jaw if they hadn't stopped me. You know me Al.
—Ring Lardner, You Know Me Al, Ch. 1

It's not surprising, then, that the dialectal form I says turns up not just in representations of "authentic" American speech but also in jocular expressions like (So) I says to myself, "Self..." Very often with this snowclone, though, the standard forms of the verb (i.e., present-tense I say or past-tense I said) are used instead, as in this example from Buffy the Vampire Slayer:

Willow: Yeah.. I- I know I've been sort of a party-poop lately, so I said to myself, "Self!" I said, "It's time to shake and shimmy it off."
Buffy the Vampire Slayer, "Something Blue," Season 4 (aired Nov. 30, 1999)

It's also common to see the snowclone appear with other verbs appropriate for the self-reporting of an interior monologue, such as think/thought or ask/asked. (However, the mock-dialectal equivalents of I says, namely I thinks and I asks, appear very rarely.)

Here is where Googlecounts would be particularly valuable to calculate the relative frequencies of variant forms. Below are the results that I found in searches conducted back in September and now, for both total Googlehits and "most relevant" results:

Search string                      9/26 total   9/26 most relevant   11/20 total   11/20 most relevant
"so I said to myself, self"        1,480        493                  4,680         585
"so I says to myself, self"        3,240        169
"so I say to myself, self"         996
"so I thought to myself, self"     1,850
"so I think to myself, self"       377
"so I asked myself, self"          495
"so I ask myself, self"            370

We would expect the "total" results to be skewed by the repetition effect noted above. Interestingly, though, the total figures have stayed relatively constant since September, except for an expansion of results for said (perhaps in part due to various fan sites repeating the bit of dialogue from Buffy the Vampire Slayer, or an undercounting of previous such repetitions).

But if we discount the total figures and focus on the "most relevant" results, the figures have not been particularly stable. Results for said have increased, though more moderately compared to the jump in "total" results. Raw hits for says, say, and thought, however, have increased significantly, and it's highly doubtful that there were actually that many more examples for Google to count in the past two months. Only the low-frequency results for think, asked, and ask have kept their results at constant levels, and they all return fewer than 200 "most relevant" results. (Note that the automatic stemming discussed in the previous post wouldn't have an effect here, since Google applies stemming only to individual search terms, not to words in a string with quotation marks.)

Perhaps it's better to ignore the raw numbers and simply rank the results. Then we find a jump in the rankings for said in the total results and for say in the most relevant results:

Total results:
9/26: (says-thought)-said-(say-asked-think-ask)
11/20: said-(says-thought)-(say-asked-think-ask)

Most relevant results:
9/26: (said-thought-says)-(think-asked-ask)-say
11/20: (said-thought-says)-say-(think-asked-ask)

The jump for say in the most relevant results possibly rectifies a previous undercounting, since it brings all the rankings into rough alignment: said/says/thought in the top three spots, followed by say, rounded out by think/asked/ask in the bottom three spots.
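Ranking by count is easy to reproduce. The Python sketch below sorts the September totals and prints the resulting order. It assumes that the single figure listed for each search string in the results above is its 9/26 total, which is an inference from the rankings and not something the listing states outright; the names `totals_sept` and `rank` are made up for illustration.

```python
# Assumption: each figure is the 9/26 total Googlecount for
# "so I <verb> (to) myself, self", as read off the results listed above.
totals_sept = {
    "said": 1480,
    "says": 3240,
    "say": 996,
    "thought": 1850,
    "think": 377,
    "asked": 495,
    "ask": 370,
}


def rank(counts: dict[str, int]) -> list[str]:
    """Return the verbs ordered from most hits to fewest."""
    return sorted(counts, key=counts.get, reverse=True)


print("-".join(rank(totals_sept)))
```

The order that comes out (says, thought, said, say, asked, think, ask) agrees with the 9/26 ranking for total results; the parenthesized groupings in the rankings are a judgment about which counts are close together and are not computed here.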

My suspicion is that the shifts in Googlecounts since September are largely due to various "invisible" factors, such as changes in Google's searching algorithms and its methods of extrapolating results based on small samples. It's a bit distressing, though, that only the search strings with the lowest frequencies (under 500 total results, under 200 most relevant results) show much stability. But I suppose these are matters of interest only to reporters and computational linguists.

[Update #1: Just to be clear, the turn of phrase under investigation is crucially marked by the vocative use of the word self. Dialect writing has plenty of examples of "(so) I says to myself (I says)..." without vocative self, e.g.:

"'Where be the stoat?' he says — 'I ain't seen 'em,' I says. Well, next day we goos again — and I says to myself, I says, — 'I wunt be afeared of a stoat,' I says — so I caught 'em that time — gor' how he did bite surely — they be wonderful bitten things, stoats."
— "A Summer Stroll in Sussex" by Edward Clayton, The Living Age, June 7, 1890, p. 637

So I says to myself I says, there you are, greedyguts, I says, if that pot had smashed Friday night so's you couldn't eat the business cooked in it they wouldn't be smashing all the pots and pans and plates this Saturday night.
— "A Pot Story" by S.J. Agnon, in A Golden Treasury of Jewish Literature, edited by Leo W. Schwarz (1937), p. 358

The use of self as a playful mode of self-address adds an extra layer of self-conscious irony to the expression. In fact, the 1981 example from the Boston Globe given above is the only example I've seen with vocative self that purports to be a genuine representation of unironic speech.]

[Update #2: It's possible that children's rhymes could have provided an early template for this sort of pattern. Here are two examples that seem like they could be related to the snowclone, though neither uses vocative self:

As I walked by myself,
And talked to myself,
Myself said unto me:
"Look to thyself,
Take care of thyself,
For nobody cares for thee."
I answered myself,
And said to myself
In the selfsame repartee:
"Look to thyself,
Or not look to thyself,
The selfsame thing will be."
The Real Mother Goose by Blanche Fisher Wright (1916)

James James
Morrison Morrison
Weatherby George Dupree
Took great
Care of his Mother,
Though he was only three.
James James
Said to his Mother,
"Mother", he said, said he;
"You must never go down to the end of the town,
if you don't go down with me."
James James
Morrison's Mother
Put on a golden gown,
James James
Morrison's Mother
Drove to the end of the town.
James James
Morrison's Mother
Said to herself, said she:
"I can get right down to the end of the town and be
back in time for tea."
When We Were Very Young by A.A. Milne (1924)

Thanks to Mark Liberman for reminding me of the latter.]

[Update #3: Barbara Zimmer points out that So I say(s) to myself, "Self..." has been popularized in recent years by the chef Emeril Lagasse. On his show on the Food Network, Emeril has developed the catchphrase into a call-and-response between him and his studio audience, with the audience expectantly chiming in "Self!" (much as Johnny Carson's audience would chime in "How cold IS it?") ]

[Final update: See further commentary here.]

The Holy Open Source Media Empires (Eastern and Western variants)

Every student knows (as they say) that the Holy Roman Empire was neither holy, nor Roman, nor an empire. Well, maybe it was sort of holy, and maybe it was sort of an empire, but in any case, students have to learn to distinguish it from the original Roman Empire, which was definitely Roman and definitely an empire, and also from the Eastern Roman Empire, which was certainly eastern and imperial, but no longer Roman, and the Western Roman Empire, which was definitely western, Roman and imperial, but existed only briefly and intermittently. And of course, there's also at least one rock band called Holy Roman Empire, which I don't expect to be in the least bit holy, Roman or imperial, though I guess you never know.

This is the sort of branding confusion that modern trademark law is intended to avoid. But ironically, confusion is now rampant with respect to the name "Open Source Media". For those of you who haven't already been following this story, there's a brief history below.

Somewhere in the lexicographical background, there is the term "open sources", used to refer to non-classified sources of information. The OED's earliest published citation for this term is

1980 Sci. Amer. Apr. 36/1 The size of the U.S. stockpile of lethal chemical munitions is classified information, but estimates can be made from open sources.

I'm sure that I heard this phrase used much earlier than 1980, and the New York Times archive shows a headline from 12/26/1951 "Open sources Give U.S. Data on Soviet: Washington Relies Chiefly on Russian Press and Radio -- Envoys, Refugees Help", with sentences in the body of the article like "...those familiar with the problem know that the overwhelming quantity of such information comes from open sources, primarily those available even to diligent private scholars having no Governmental connection."

The first brand in this arena was the Open Source Initiative, "a non-profit corporation dedicated to managing and promoting the Open Source Definition ... specifically through the OSI Certified Open Source Software certification mark and program". This is a software licensing thing, started by Bruce Perens, Eric Raymond and others as "a marketing program for free software", where free is used as in "free speech", following the lead of the Free Software Foundation. According to the OSI's on-line history, the phrase "open source" (in the current software licensing sense) was coined by Chris Peterson on 2/3/1998 at a strategy session in Palo Alto, California. "Open source" has become a very common phrase, with 329 million Google hits, used in all sorts of ways in all sorts of endeavors.

One of these endeavors is a radio show called "Open Source", hosted by Christopher Lydon, operated from Lowell, MA, by a non-profit corporation called Open Source Media, launched in the spring of 2005, and distributed by PRI. Chris Lydon's discussion of this project's origins references Tom Paine, I.F. Stone, and the Internet as God: "It’s invisible. It’s everywhere. It knows everything. Sing it now: It’s got the whole world in its hands. Its eye is on the sparrow, paraphrasing the Ethel Waters song, and I know it watches me." Chris opines that "American institutional journalism looks to me broken beyond repair", while "the redemptive energy of the new media seems suddenly to be gathering real force". He asks whether the program is "only ripping off a trendy phrase" by calling itself Open Source, and asserts that the answer is "no":

...we are serious followers of the “social gospel” of open source. We believe in fact that the critical work ahead is to extend open-source ideas, so effective in computer world, deeper into politics, culture, media and the rebuilding of civil society.

Everything we do at Open Source will be “open to inspection, improvement, adoption and reuse,” in Doc Searls’ neat formulation. We will make all the content of Open Source available under a Creative Commons license for non-commercial use, with the standard proviso that our work is credited and further use is open.

OK, now the scene shifts to a different revival tent, this one on the west coast. During the summer of 2005, two noted bloggers, Roger L. Simon and Charles Johnson, started putting together an enterprise called Pajamas Media, named of course for the extraordinary crystallization of ancien-regime arrogance by Jonathan Klein (former CBS News VP) in his 9/9/2004 sound bite on the Rathergate scandal:

"You couldn't have a starker contrast between the multiple layers of checks and balances [at 60 Minutes] and a guy sitting in his living room in his pajamas writing."

The Pajamas Media web site went live in September (I think), and early in the process, they decided to change the name:

... as we have gone forward putting together this company, it has become clear to us that we do not wish to be defined merely as gadflies in opposition to mainstream media. We owe our readers and our colleagues something bigger, an alternative to the structures we have lived with all our lives. It's not enough to criticize. We also have to build something new. To do that, we needed a name that would allow us to grow. And that name we are in the process of deciding.

What they came up with was "Open Source Media". The open letter from the founders starts by (mis?)quoting the FSF's slogan and the OSI's rephrasing:

"Free speech, not free beer!"

In 1985, that’s how the Free Software Foundation first described an idealized world wherein innovative ideas would flow freely though the collaborative environment of the internet. In casting about for a term that would denote freedom, not freebies, those who followed FSF coined the term "Open Source," intending it merely as a reference to the "source" code in which they programmed. It turned out to be much more than that.

They observe that "the term 'Open Source' had a ring to it", and quote Linus Torvalds as saying "The future is open source everything." They don't go so far as to deify the net, but they do offer a stirring vision of the free flow of digital grace:

...freedom, openness and transparency in media is an inevitable result of the technological advances that have given every citizen the chance to breathe deeply of the news, thought and opinion that hovers in the ether between us.

So, if you take a deep breath of that ether, you can see the associative connection to the ideas of free software and open source. But the connection to free (re-)distribution of information has been lost: all the (Western) Open Source Media stuff, as far as I can see, is asserted to be "Copyright © 2005 OSM Media, LLC All Rights Reserved".

Apparently the founders have gotten some complaints along this line, both from people who like the FSF/OSI ethos and from those who don't, because they further explain their choice as follows:

Some OSM readers have expressed consternation over our new company name, so please let us take a moment to explain--in the spirit of full disclosure--the story of its origin. At the outset, we formed a company under the masthead "Pajamas Media," after that now-famous remark about bloggers being "just a bunch of people sitting around in their pajamas." Then, as the idea for the company grew, we cast about for a new name that would reflect our ethos long after the joke grew old. Some of the unsuccessful names rejected along the way were "Alpha Media" and "Jellyfish Media," so don't be so hard on us about "OSM"--it could have been worse.

The goal of our enterprise is to bring gravitas and legitimacy to the blogosphere, to amplify the individual voices that compose it, and bring you the best of blogging as we know it, and to do so, we felt it wise and appropriate to arm ourselves with all of the conventional tools of business--including a trademark. To that end, we have filed an application for a trademark on the name OSM, and our legal corporate name is OSM Media, LLC. We have not trademarked the term "Open Source Media," and agree with those who point out the irony inherent in any attempt to do so. We consider Open Source Media to be a description of what we are and do, not a trade name.

This is puzzling. "Open source" is a term with a very specific meaning, defined and defended ad nauseam by the folks at OSI. What OSM Media LLC aims to do, admirable as it may be, is clearly outside that definition. It's plausible to say that OSM was inspired by certain characteristics of the open source movement, just as the Holy Roman Empire was inspired by memories of the old Western Roman Empire. But the phrase "Holy Roman Empire" doesn't refer to the rock band of that name by virtue of being a description, but rather because the band chose to adopt it, and the phrase "Open Source Media" seems to have a similar relation to the business enterprise recently started by Roger Simon and Charles Johnson.

[For more discussion from various points of view, see these two pages at Open Source Media (Eastern), and blog posts by Charles Johnson, Roger Simon, Dan Gilmore, Jeff Jarvis, Strategic Public Relations, Ann Althouse, Kevin Drum, Private Radio, James Joyner, David Corn, protein wisdom, Tony Pierce, The Talent Show, The Poor Man, Wonkette, Dennis the Peasant, Monty Python, etc.]

[Update: OSM has folded their hand -- read the latest on the OSM site, by Roger Simon and Charles Johnson, under the title "Excuse us while we change back into our pajamas". Money quote:

So how did this happen in the first place? Back at the beginning, certain, shall we say, paternalistically minded parties (i.e., the guys in suits) decided that we should act like grownups, and being as yet somewhat immature--at least as businesspeople--we did as we were told.

Which is how, one day, we ended up sitting around a conference table listening to representatives from a "branding" company. What followed is still a bit of a nightmarish blur, but it involved a PowerPoint presentation on the history of names, and such probing questions as, "If you were an animal, what animal would you be?" (Which is how we almost ended up as Jellyfish Media.)

Enough said. So, in the spirit of "open source," we thought we'd tell you the real story behind the reason for our name change.

Fair enough, and a lesson to us all. Except that "the spirit of 'open source'" doesn't really mean "telling everyone a lot of all-rights-reserved things about yourself". At least that's not what it means to me. ]

November 19, 2005

Snowclone blindness

Journalists and others often make an ethnographic or political point by observing that a particular language or culture "has N words for X", where N is either zero or some number viewed as excessively large. This trope has been around at least since the 18th century, when it was the supposed 500 Arabic words for lion, rather than the usual modern counts of Eskimo words for snow or for robin. As Geoff Pullum periodically reminds us, these rhetorical flights are hardly ever true in linguistic terms, and their logic would be suspect even if the facts were correct.

Over the past few weeks, a number of readers have sent in examples of the arctic form of the trope, which I've laid out below. The last example is especially interesting: the mind-clouding power of this rhetorical device apparently led an eminent scientist to say something completely illogical about his area of specialization. And he said it in a television interview, embedded in a passage that is long enough that we can be sure that he hasn't been journalistically misrepresented.

On the large-number side, Linda Seebach sent in the Independent's review of The Meaning of Tingo:

Everyone knows that Inuit-speaking races can call on 30-odd words for snow. Adam Jacot de Boinod first became entranced by language when he discovered 27 words for "moustache" in an Albanian dictionary - and another 27 for "eyebrows". A world of bushy machismo and stolid dignity sprang to life before his eyes.

In the "no word for X" category, Ray Girvan sent a link to a BBC News piece on global warming, explaining that

I just *knew* the moment I saw the topic that they'd do it again.

Sure enough. About halfway through: "He says that their language, which has evolved over thousands of years, has no word for the new climate".

Hugo Quene submitted a link to a profile in Salon:

Sheila Watt-Cloutier's people, who have lived in the Arctic for thousands of years, recognized the threat posed by global warming long before science confirmed their observations. When robins and barn owls began showing up in the North's frozen reaches, the Inuit had no name for them.

And finally, the promised (and puzzling) finale. Matthew Hutson sent a link to a Discovery Channel special on disappearing Arctic ice ("Examining the Arctic Melt"), observing that

About 6 minutes into the first online segment, someone says that the Inuit never had a word for sunburn, but now they do.

True enough. The segment is about an important topic: the loss of arctic ice because of climate change. The speaker is David Barber, Canada Research Chair in Arctic System Science at the University of Manitoba. What he says is:

It's remarkable when a society like the Inuit, in northern Canada, develop a word for sunburn. They never had a word for that before, and now they have a word for it.

I'm reluctant to suppose that Prof. Barber doesn't know what he's talking about, but this surprises me. First, though I'm not sure about the implications for Inuit lexicography, excess UV exposure has been an issue in the arctic as long as there have been living things there. And second, while greenhouse gases are apparently raising mean global temperature, their effect on UV levels must be trivial at best.

Any skier knows that the sunshine reflected from snow as well as received directly can be dangerous, causing not only sunburn of the skin but also snow blindness. This problem is increased in polar regions, where the summer sun shines all day long. From a description of the diary of one of the members of Shackleton's 1914 expedition:

On 12 November, Hussey records his first wash since leaving the Endurance, and the effects of polar sunburn: 'My nose & face are peeling as though I'd been at Margate for a month'.

Arctic invertebrates have had to adapt to the problems of polar sunburn. And this article on "Skin Color as an Adaptation" notes that the Inuit have the skin color typical of people living at much lower latitudes:

Nature has selected for people with darker skin in tropical latitudes, especially in nonforested regions, where ultraviolet radiation from the sun is usually the most intense. Melanin acts as a protective biological shield against ultraviolet radiation. By doing this, it helps to prevent sunburn damage that could result in DNA changes and, subsequently, melanoma ...

People who live in far northern latitudes ... have an advantage if their skin has little shielding pigmentation. Nature selects for less melanin when ultraviolet radiation is weak. In such an environment, very dark skin is a disadvantage because it can prevent people from producing enough vitamin D, potentially resulting in rickets disease in children and osteoporosis in adults. ...

The Inuit people of the American Subarctic are an exception. They have moderately heavy skin pigmentation despite the far northern latitude at which they live. While this is a disadvantage for vitamin D production, they apparently made up for it by eating fish and sea mammal blubber that are high in D. In addition, the Inuit have been in the far north for only about 5,000 years. This may not have been enough time for significantly lower melanin production to have been selected for by nature.

The Inuit's skin color is certainly appropriate for summer sunburn protection in the situation they have been living in, as this discussion of UV in polar regions by Jack Williams suggests (from the Answers Archive of the Weather section at USA Today):

... sunburn and snow blindness were problems during the polar summer because ice reflects both visible light and ultraviolet energy. This more than makes up for the lower intensity of sunlight in the polar regions than in the tropics, where the sun is always nearly overhead at noon.

Here's something that makes you hurt when you think about it: While doing research for my latest book, The Complete Idiot's Guide to the Arctic and Antarctic, which was published in June 2003, I found accounts of early explorers getting the roofs of their mouths sunburned from sunlight reflected from the snow as they pulled sleds, with their mouths open, gasping for air.

When I went to Antarctica in January 1999, I was told to bring plenty of high-powered sunblock, and good sunglasses with UV protection.

Williams discusses "ozone hole" effects, and says that they exist but are secondary -- in any case, the Discovery Channel segment containing Barber's commentary was about global warming, not ozone layer issues.

Excess UV exposure has traditionally been a problem for the Inuit, as suggested by this passage from Jean Malaurie's wonderful memoir The Last Kings of Thule. The year is 1950, and the author is doing geomorphological fieldwork in Greenland.

I did some useful snow cartography of slopes and took measurements of scree slopes. But however reassuring the brilliant sun might be, the moving ice and flowing rivulets kept reminding us of the danger of being cut off from our base in Greenland if the sea should suddenly become an expanse of free water. Not one day could be lost, so morning and night we made trips right and left. We saw the star-shaped footprints of the qupannaaq, some gulls, many polar hares, but not the slightest trace of musk ox or reindeer. Qaaqqutisaq's health prevented my keeping up the pace I would have liked. At each geomorphological station, he would lie, face down, on the floor of his sledge, his head in his arms. On the morning of June 4, after fourteen hours of uninterrupted work along the southeastern coast, he took me aside and complained of a headache and pain in his eyes. All he could see in front of him was a sort of halo. Although he had been wearing sunglasses, he was the first of us to suffer from the painful affliction of snow blindness. He begged me not to let that stop me: "Just let me sleep," he said, leaning against the napariaq, "and let's not go back to the camp until the work is finished." Before we rejoined the group, his wife suggested the powerful but painful remedy commonly used by the Canadian Eskimos and sometimes by those of Thule -- a few drops of oil in the eyes -- but he refused vehemently.

I don't know whether the Inuit traditionally had a word for sunburn, but the phenomenon has clearly been around for them to talk about for as long as they have lived in the arctic. And in any case, global warming and the melting of arctic ice don't cause greater amounts of sunburn, unless there's some aspect of this situation that I'm not seeing. So why did Barber make this probably false and surely illogical point, right out loud on a major television program?

One clue: Barber's interview is not the only place where this idea can be found. In a statement by Kelly Reinhardt and Tooker Gomberg, (said to have been) delivered on 10/25/2000:

Yesterday we burned our Canadian passports in outrage at the behaviour of the Canadian Government at the World Conference on Climate Change in Den Haag, Netherlands. We are investigating renouncing our Canadian citizenship.

Canada emerged from the UN Climate Conference in Den Haag as the WORST country on planet earth, according to a world-wide coalition of environmental Non-Governmental Organizations ...

Canada is already feeling extreme impacts of climate change, and should be at the head of the pack, not at the back. The arctic is in the midst of a massive meltdown. Northern people are reporting climate unknown in their oral history. The Inuit have no word for sunburn, for thunderstorms, or for robins because they have never had these experiences before in their history. [emphasis added]

And a 5/19/2000 article in the Nunatsiaq News by Jane George interviews one Graham Ashford, the director of a video project called "Inuit Observations of Climate Change":

Ashford said the video should bring home how Inuvialuit are trying cope with this new, unstable environment, and even illustrate some of the health problems that they’re suffering now. These include sunburn and allergies as a result of more light, heat and plant growth.

Leaving thunderstorms and robins out of it, it seems pretty clear that sunburn has a long history among arctic peoples, and also that climate change is not responsible for increases in UV levels. It's understandable that environmental and Inuit activists, looking for ways to dramatize the issues they care about, should seize on the idea that warmer temperatures would somehow connect with sunburn. It's less understandable that an eminent physical scientist like Barber would repeat the argument. Sometimes rhetoric clouds the mind, I guess.

Posted by Mark Liberman at 06:52 PM

November 18, 2005

Contrast vs. emphasis in Kyoto

There's a clip from the 11/16/2005 Bush/Koizumi press conference, widely quoted in the media -- at least I heard it on an NPR broadcast, and found the recording online in a VOA radio piece -- which some listeners are likely to hear in a way that President Bush certainly did not intend.

... the Senate did ask that we report
on progress being made in Iraq, which we're more than willing to do. That's-
that's to be expected. That's what the
Congress expects. They expect us to keep them abreast
of *A* PLAN
that is going to WORK.
(audio clip)

I've used capital letters and bold-face type and asterisks to mark what I initially heard as a pair of contrastive accents, used to indicate that one alternative is being selected from among a contextually salient set. And along with the pitch accents on "plan" and "work", the president uses a lengthened and unreduced form of the indefinite article "a". Construed this way, the apparently contrastive "*A* PLAN" invokes alternatives like "NO PLAN" or "a SERIES of PLANS", while the phrase "that is going to WORK" invokes an alternative like "that has turned out to FAIL".

Hearing the quoted phrase with the president's prosody, I immediately heard the voice of Jon Stewart from The Daily Show in my mind's ear:

George W. Bush: They expect us to keep them abreast of *A* PLAN that is going to WORK
Jon Stewart:
-- and as soon as we can think of one, we will.

But of course this continuation makes no sense as part of the president's message -- his pitch accents and his fluently unreduced "a" indicated emphasis, not contrast. His next two sentences, left out of the radio quotes that I heard, were:

It's a plan that we have made very clear to the Senate and the House, and that is the plan that we will train Iraqis, Iraqi troops to be able to take the fight to the enemy. And as I have consistently said, as the Iraqis stand up, we will stand down. [from the transcript at]

And in fact, what Jon Stewart chose to do with the Iraq reference in Bush's Kyoto news conference (video clip available as "Moments of Zen" on the Comedy Central web site) was to scan one of these later clauses as a haiku:

Jon Stewart: Bush also spoke about Iraq. Now here's how good Bush has gotten with the Iraq talking points. He's in Japan, talking about the war. Listen to what he said:
George W. Bush: I've consistently said, as the-
as the Iraqis stand up, we will stand down.
Jon Stewart: Did you catch that? That was seventeen syllables. That's a mother****ing haiku!

In order to make the sentence scan, Stewart removes the um and the repetition, and arranges the phrasing like this:

(5) I've consistently
(7) said as the Iraqis stand
(5) up we will stand down

And they say a degree in Comparative Literature has no real-world applications!
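
For anyone who wants to check the arithmetic of the scan above, here is a minimal sketch. The per-word syllable counts are hand-assigned assumptions (there is no automatic syllabifier here), and the names SYLLABLES and scan are purely illustrative.

```python
# Hand-assigned syllable counts for the words in Stewart's trimmed sentence.
# These are assumptions about ordinary American pronunciation.
SYLLABLES = {
    "I've": 1, "consistently": 4, "said": 1, "as": 1, "the": 1,
    "Iraqis": 3, "stand": 1, "up": 1, "we": 1, "will": 1, "down": 1,
}

# Stewart's phrasing, one haiku line per string.
HAIKU = [
    "I've consistently",
    "said as the Iraqis stand",
    "up we will stand down",
]

def scan(lines):
    """Return the syllable count of each line, using the table above."""
    return [sum(SYLLABLES[word] for word in line.split()) for line in lines]

counts = scan(HAIKU)
print(counts, sum(counts))
```

The counts only come out this way if "Iraqis" is given three syllables and "consistently" four; a different pronunciation would break the scan.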

Let me observe in passing that I take Jon Stewart's voice in my mind's ear as a subconscious echo of a broader shift in public opinion. For what it's worth, my own opinion on these issues is about halfway between those of Christopher Hitchens and Brendan O'Leary. And I'm not a person who normally talks back to news broadcasts, whether in my own voice or someone else's.

For those interested in the phonetics of contrast and emphasis, here's a display of Bush's crucial phrase, with an audio waveform, a spectrogram and a pitch track:

Here are several earlier Language Log posts that discuss the tendency of President Bush (and plenty of other people) to use unreduced "a" to mark emphasis on the following word or phrase:

And for completeness, here is the whole Q & A from the Kyoto press conference, in the transcription offered on the White House web site:

Q Thank you, sir. Sir, as you probably know, the Senate rejected earlier today measures that would have required a timetable for withdrawal in Iraq, but a Republican resolution was overwhelmingly passed that called for more information from your information to clarify and recommend changes to U.S. policy in Iraq. So is that evidence that your party is increasingly splitting with you, sir, on Iraq? And is it an open challenge to you -- is that open challenge to you embarrassing while you're traveling abroad?

PRESIDENT BUSH: I, first of all, appreciated the fact that the Senate, in a bipartisan fashion, rejected an amendment that would have taken our troops out of Iraq before the mission was complete. To me that was a positive step by the United States Senate.

Secondly, the Senate did ask that we report on progress being made in Iraq, which we're more than willing to do. That's to be expected. That's what the Congress expects. They expect us to keep them abreast of a plan that is going to work. It's a plan that we have made very clear to the Senate and the House, and that is the plan that we will train Iraqis, Iraqi troops to be able to take the fight to the enemy. And as I have consistently said, as the Iraqis stand up, we will stand down.

I view this as a -- as an amendment consistent with our strategy, and look forward to continue to work with the Congress. It is important that we succeed in Iraq. A democracy in Iraq will bring peace for generations to come. And we're going to. The Iraqi people want us to succeed. The only reason we won't succeed is if we lose our nerve, and the terrorists are able to drive us out of Iraq by killing innocent lives. But I view this as positive developments on the Hill.

Posted by Mark Liberman at 08:39 AM

November 17, 2005

Has George W. Bush become more disfluent?

George W. Bush, in his 11/16/2005 press conference in Kyoto:

Obviously, the extent to which uh [0.295]
the Japanese government wants to give reconstruction money to Iraq is up to the Japanese government, and [pause 0.187]
to- to the- and I- as to the- [pause 0.205]
the- the uh deployment of troops, it's up to- [0.421] it's up to the government. [pause 1.237]
's what happens in democracies -- government makes decisions that uh [pause 0.598]
that uh that they're uh capable of living with, and that's [pause 1.966]
that's what we said, ((we)) said, do the best you can do; [pause 0.530]
make up your own mind, it's your decision, not mine.
(audio clip)

These are exemplary sentiments, but their expression is surprisingly chaotic, given that the question was a predictable one and the answer is a routine piece of diplomatic boilerplate.

Regular readers of this blog know that I think that George W. Bush has gotten a bad rap on the question of fluency and verbal facility. (You can find more discussion of this issue by searching Language Log for "Bushisms".) In reference to the 10/8/2004 presidential debate, for instance, I observed that both the moderator (Charlie Gibson) and John Kerry committed verbal flubs that were more striking than any of George W. Bush's miscues. One of Kerry's disfluencies:

And I'm gonna [pause 0.258]
put ((in)) place a better homeland security. [pause 0.239]
Effort. [pause 0.259]
Look at it. [pause 0.256]
95% of our containers coming into this country are not inspected today. [pause 0.769]
When you get on an airplane, your car- your [pause 0.421]
bag is- is- is [pause 0.184]
((b-)) x-rayed [pause 0.494]
but the cargo hold isn't x-rayed. Do you feel safer? [pause 0.686]

(audio clip)

As stress and distraction increase, people are more likely to break down verbally. Because we know this, we may pay more attention to disfluency when we suspect that the speaker is distracted and under stress. So my impression that George W. Bush has become more disfluent recently may be a fact about me and the conventional assessment of the political situation, rather than a fact about him. Still, I can add one small quantitative comparison to the subjective impressions. Compared to his 10/3/2005 nomination speech for Harriet Miers, his 10/31/2005 nomination of Samuel Alito had twice as many self-corrections in barely half the time. (More exactly: my transcription of Bush's speech nominating Harriet Miers shows two self-corrections in 1202 words over 547.7 seconds; my transcription of Bush's speech nominating Samuel Alito shows four self-corrections in 672 words over 314.8 seconds.)
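
To put the two speeches on a common footing, the counts quoted above can be turned into rates. This is only a sketch of the arithmetic: the numbers are the ones given in the parenthesis, while the dictionary layout and the function name rates are illustrative assumptions.

```python
# Self-corrections, word counts and durations as reported in the text.
SPEECHES = {
    "Miers": {"corrections": 2, "words": 1202, "seconds": 547.7},
    "Alito": {"corrections": 4, "words": 672, "seconds": 314.8},
}

def rates(speech):
    """Return self-corrections per 1000 words, per minute, and words per minute."""
    per_1000_words = 1000 * speech["corrections"] / speech["words"]
    per_minute = 60 * speech["corrections"] / speech["seconds"]
    words_per_minute = 60 * speech["words"] / speech["seconds"]
    return per_1000_words, per_minute, words_per_minute

results = {name: rates(data) for name, data in SPEECHES.items()}
for name, (per_kw, per_min, wpm) in results.items():
    print(f"{name}: {per_kw:.2f} per 1000 words, {per_min:.2f} per minute, {wpm:.0f} wpm")

# Ratio of the Alito rate to the Miers rate, by words and by time.
ratio_by_words = results["Alito"][0] / results["Miers"][0]
ratio_by_time = results["Alito"][1] / results["Miers"][1]
print(f"Alito/Miers: {ratio_by_words:.2f}x by words, {ratio_by_time:.2f}x by time")
```

On either measure the Alito rate comes out at roughly three and a half times the Miers rate, though with only two and four events the comparison is suggestive at best.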

I don't know how these self-correction rates compare to those typical of other public figures in similar situations (reading short speeches to a small live audience and a large broadcast audience). I do know that the duration of W's pauses in such speech reading, at least in recent times, is much longer than I would expect in presentations of this type (and longer than his pauses are in unscripted speech). But on this measure, at least, his Miers and Alito nomination speeches are entirely consistent, as this boxplot comparing the distribution of within-sentence and between-sentence pauses in the two speeches shows:

Within each "box", the horizontal line shows the median value, while the top and bottom of the box are the 75th and 25th percentiles. The "whiskers" show the extreme values (as long as they are within 1.5 times the interquartile range of the box -- outliers beyond that range are shown as individual points).
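
The quantities behind such a boxplot can be computed directly. The sketch below uses Python's standard statistics module; the pause durations are invented for illustration, the function name boxplot_summary is hypothetical, and the "inclusive" quartile method is one of several conventions, so other plotting software may give slightly different box edges.

```python
import statistics

def boxplot_summary(values):
    """Median, quartiles, whisker ends and outliers under the 1.5 * IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme values that are still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Invented pause durations in seconds, NOT measurements from either speech.
pauses = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 3.0]
summary = boxplot_summary(pauses)
print(summary)
```

With these made-up values the single long pause falls outside the upper fence and would be drawn as an individual point, while the upper whisker stops at the longest pause inside the fence.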

As I pointed out in an earlier post, W's pauses in the Miers nomination are roughly twice the duration of those in Miers' remarks on the same occasion, and I'm confident that his current pausing in read speeches is in general much longer than the norm for American public figures. It's easy to speculate that this is a tactic he's adopted in defense against being criticized for making mistakes in speaking. It might be interesting to model the variation in pause durations in terms of local properties of the message, other than the obvious within-sentence vs. across-sentence distinction made above. In the case of the first presidential campaign debate back in 10/2004, I found that the duration of Bush's pauses was not significantly correlated with the duration of the immediately following spoken phrases (r=-0.05), but did correlate with the duration of the immediately preceding phrase (r=0.54).
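
The correlations mentioned above are ordinary Pearson coefficients between each pause and the phrase on either side of it. Here is a minimal sketch of that calculation; the durations are invented, and the assumption that the recording has been segmented into alternating phrases and pauses (phrase, pause, phrase, ...) is mine, as are the function and variable names.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented durations in seconds: pauses[i] sits between phrases[i] and phrases[i + 1].
phrases = [2.1, 3.4, 1.2, 4.0, 2.8, 3.1]
pauses = [0.5, 0.9, 0.3, 1.1, 0.7]

r_preceding = pearson_r(pauses, phrases[:-1])  # pause vs. the phrase before it
r_following = pearson_r(pauses, phrases[1:])   # pause vs. the phrase after it
print(f"preceding: r={r_preceding:.2f}  following: r={r_following:.2f}")
```

The same two slices of the phrase list give the "preceding" and "following" comparisons, so modeling other local properties of the message would mainly mean swapping in a different predictor sequence.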

With access to a complete set of recordings over time, it would be fairly easy to sample various relevant quantities and rates for different public figures in different sorts of material at different times and to explore various hypotheses about what was going on in their minds as well as in their rhetoric. I'm a bit surprised that more of this has not been done.

[For completeness, here's the whole Q&A where the Bush remarks came from, as transcribed on the web site. Obviously (and appropriately, in my opinion) most of the false starts and self-corrections have been edited out.

Q Concerning the dispatch of self-defense forces to Iraq, the 14th of next month is the time limit of the stationing. What kind of explanation did you make to the President about that? And how did President Bush evaluate that-- appreciate Japan's position on this? And what do you expect Japan to do further in Iraq on this issue?

PRIME MINISTER KOIZUMI: Concerning Japan's assistance toward Iraq, including the activities of the self-defense forces, we will want to see that the Iraqi people, themselves, bring democratic and stable nation by the power of the Iraqis, themselves. And they are making efforts toward that goal. Certainly there are political difficulties, but they are making progress.
So, against that background, as a responsible member of the international community, Japan should seriously consider what we could do to help the situation there. That has been our position, and there is no change in our basic stance.
What kind of assistance we are going to make in December? First, toward the reconstruction of Iraq, what we can do -- that, first, we have to think about, and then multilateral forces and other nations are involved in helping reconstruct Iraq. As a member of the international community, we have to join them. And further, on the basis of the importance of the U.S.-Japan alliance, we have to take all those things in a comprehensive manner, so that we seriously think what we could do to help the Iraq situation, and we make judgment on that basis.

PRESIDENT BUSH: Obviously, the extent to which the Japanese government wants to give reconstruction money to Iraq is up to the Japanese government. And as to the deployment of troops, that's up to the government. That's what happens in democracies -- governments make decisions that they're capable of living with. And that's -- that's what we said, said, do the best you can do; make up your own mind, it's your decision, not mine.]


[Update: As an example to show that George W. Bush has often been extremely fluent and clear in presenting complex ideas extemporaneously, listen to this clip from one of his town meetings on Social Security reform (remember those?), held on June 2, 2005, in Hopkinsville KY. I can't provide any valid evidence that the pattern of the Kyoto clip has become stronger and more common, and the pattern of the Hopkinsville clip weaker and less common, but that's my subjective impression. ]

[Update #2: More on this topic here.]

November 16, 2005

The past, present, and future of readying

There are more developments on tracking the incipient verb ready, both in the prehistory and the present and the likely future. Chris Waigl has researched the palaeontology of the transitive construction and has come up with some examples showing that it was coming into the language a hundred or even two hundred years ago. And Marilyn Martin at Cornell University has been searching for further modern cases of ready as an infinitival complement-taking verb, and has found oodles of them.

One of Chris Waigl's old transitive examples is from a poem in London's Cockney dialect, and has readied up in the sense of "prepared":

I've crawled; I've eaten dirt; I've lied a treat;
I've dodged the cops an' led a double life;
I've readied up wild tales to tell me wife...
(C.J. Dennis (1924), Rose of Spadgers)

And another case of transitive ready with this meaning, even older, is found in a 19th-century reference to the practice of fixing a warm steak for supper by (and I admit this sounds unappetizing) putting it under the saddle of your horse and riding hard on it for a day:

Can a Tartar be said to cook, when he only readies his steak by riding on it?
(Thomas Carlyle (1831), Sartor Resartus: The Life and Opinions of Herr Teufelsdrockh)

(The method, which I have heard gauchos in Argentina have also used, lends new meaning to the term "rump steak". Nude bareback riders, please do not try this at home.)

Meanwhile, Marilyn Martin has been searching for further modern cases of ready as an infinitival complement-taking verb, and has found as many as 244 Google hits just for the phrase "as they ready to". Here are some of her examples:

As they sit down, as they rest, as they watch TV, as they read, as they get nervous, as they get excited, as they ready to go.

Faith is that which brings joy to the heart's of people who lie on their death beds as they ready to make their imminent voyage into Eternity.

Chekov volunteers himself and a few journalists as a fill-in medical staff as they ready to help the stricken El-Aurians.

The Bowery Ballroom is packed for stellastarr*, and it's only 9 pm While the band seems a bit nervous as they ready to rip into their opening numbers, ...

IT staff will set up specific installation dates with each area as they ready to install in an area.

But as they ready to embark on the journey, Lackland is unaware that Barlennan has his own secret motives for wanting to find the rocket.

sound check as they ready to rock The Joint in the city of sin - Las Vegas.

The fish gather at the mouth of Nit Nat Lake where it opens into the ocean as they ready to go up to the river to spawn.

. . . and in full song as they ready to breed

What do we learn from all this? That there is always more going on as regards changes in the English language than you can shake a stick at. That language truly is always in flux. That everything you thought was new has older roots. That these days anyone with a serious interest can make new linguistic discoveries making use of web resources. That lexical investigations need to be done by collaborative teams of people. That there are huge amounts of energy and intelligence to be tapped into out there in cyberlingspace. That these days you can post on Language Log, go get a latte at the Starbucks on the ground floor of One Language Log Plaza, and come back to find three emails have come in correcting errors you made or adding to the data you found. These and all sorts of other things are what we learn. We learn such things every day. Even while we're downstairs getting a cup of coffee.

Googlinguistics: the good, the bad, and the ugly

Language Log popped up rather unexpectedly today in an entry on Peter Suber's Open Access News, an informative blog collecting reports on the open access movement. Suber links to an article in the Cornell Daily Sun by Elise Kramer on the potential uses of Google for social scientists. Here's the relevant passage:

If linguistics is more your thing, Google's index of billions of webpages provides insight into how people across the globe use written language. A variety of academics at the Language Log, a linguistics blog, use Google to assess common usage — for example, how often the word "guttural" is used incorrectly (pretty often), or whether people more frequently say "in the circumstances" or "under the circumstances" (it depends on the circumstances, so to speak).

It's interesting that the writer singled out these particular posts ("Guttural politics" and "In or under") as her examples of Googlinguistics in action. Both entries do indeed illustrate the possibilities of using Google for rough-and-ready corpus linguistics (as previously discussed in The Economist back in January), but they also point to some severe limitations that go unmentioned in the article.

In my post on the shifting semantics of the word guttural under influence from both gutter and gut, I invoked Google in a very limited way, only mentioning the search engine in a parenthetical note about instances of "guttural/gutteral reaction" and "guttural/gutteral instinct." Of course, Google-searching informed other parts of the post in a more implicit manner (for instance, I Googled to find examples of guttural used as a pejorative tag for unpleasant speech patterns, and also as a description of nonlinguistic vocalizations). But I wouldn't go so far as to say I was using Google to "assess common usage" to determine "how often the word 'guttural' is used incorrectly." Google could do no more for me than provide some anecdotal (though enlightening) evidence about changes in usage of this particular term. (Also, good descriptivist that I am, I never weighed in on which particular usage should be labeled "incorrect" — and even if I did, I wouldn't necessarily rely on Google to tell me "how often" it occurs!)

Arnold Zwicky's post on "in the circumstances" vs. "under the circumstances" did use Google in a more systematic manner, with calculated ratios of Googlehits for "in..." vs. "under..." in a variety of collocational contexts. Arnold used the Google data to form some preliminary conclusions, but he acknowledged that his analysis "just scratches the surface of the phenomenon." Though I found the conclusions quite intriguing, I would have preferred some additional caveats, since many of the Googlecounts extend into the hundreds of thousands or even millions, orders of magnitude that have proven to be quite unreliable. (See the roundup of Language Log commentary on Googlecount problems at the end of this post.)

When Mark Liberman and Jean Véronis were investigating shortcomings in Googlecounts back in January, they cautioned against expecting much reliability in counts above about 100,000, particularly when boolean operators were at play. Since then, the situation has only become more dire, in terms of the prospects for even the roughest kind of Googlinguistics.

In August, Google announced that it was using "softer pattern matching" to make searching more effective. It was never explained what exactly this entails, beyond a mysterious change in the already troublesome use of the asterisk as a full-word wildcard. (Try searching on strings like "fourscore * ago" and "fourscore * * fathers" and figuring out how many "filler" words an asterisk can stand for!) They also introduced automatic stemming for plurals (try "seven years ago our" + father), tense endings (try "conceived in liberty and" + dedicate), and some derivational suffixes (try "all men are created" + equally). All this may indeed improve the results of the average searcher, but it makes large Googlecounts even more meaningless from a computational standpoint.

Despite these criticisms, I'm actually an enormous Google fan (developments in Google Print [*] are particularly exciting for linguists and lexicographers). But unless Google starts offering search services specifically designed for researchers concerned with standards of precision, I despair for the future of Googlinguistics. Signs are not particularly positive from Google's camp. In Carl Bialik's Sep. 15 "Numbers Guy" column in the online Wall Street Journal, Peter Norvig, Google's director of search quality, had this to say about the unreliability of Googlecounts: "It's only reporters and computational linguists who care if it's really precise." Well, at least they know that computational linguists care! Perhaps that's the first step.

[* Update 11/17/05: Make that Google Book Search.]

[Update 11/18/05: It's been pointed out to me that Google's automatic stemming can be avoided by prefixing a search term with a plus sign. So, for example, <"all men are created" +equally> will not return matches for "all men are created equal." But there doesn't seem to be any way around the algorithm that allows an asterisk to stand for two or three words in a search string rather than a single one.]

Eating, drinking, sleeping snowclones, part 2: the early years

In our last installment we catalogued the efflorescence of the "X eats, drinks, and sleeps Y" snowclone in its multitudinous forms, culled from a century or so of American newspaper appearances. Now we take a look at some "proto-snowclones" that illustrate the early roots of the formation.

What distinguishes the modern form of the snowclone is the piling together of various common verbs relating to everyday life (eat, drink, sleep, think, dream, live, breathe, etc.) in a transitive formation, where the object of the conjoined verbs constitutes some sort of obsessive fixation, one that overwhelms the subject's daily activities. So far the earliest clear-cut examples I've found come from the 1870s:

"The Risks of the Stocking Trade," Chester (Pa.) Daily Times, Aug. 5, 1878, p. 1/3
My whole existence is one elongated hose. I eat, sleep, drink and think stockings.

Julian Hawthorne, "A Feast of Blood," The Galaxy, Vol. 16, Issue 3, Sep. 1873, p. 405/2
The Schläger is the life of the corps system; the corps student talks, eats, sleeps, drinks Schläger.
[Text available via Making of America at Cornell; the "Schläger" is a German fencing blade.]

A citation from 1863 doesn't quite fit the canonical modern form but bears a strong family resemblance:

John Cumming, Moses right, and Bishop Colenso wrong. New York: J. Bradburn, 1863, pp. 152-3
The unhappy prelate breathes doubts, and eats doubts, and lives in doubts, till doubts seem to be assimilated to, and incorporated with his very nature.
[Text available via Making of America at Univ. of Michigan.]

Here we have "doubts" as the consuming force, overtaking the breathing and eating of Bishop Colenso (the unhappy prelate). The object is repeated for emphasis across the conjuncts, a variant pattern I noted in the previous post. But the third conjunct, "lives in doubts," is intransitive (or quasi-transitive), somewhat spoiling the parallelism with "breathes doubts" and "eats doubts." This is therefore a hybrid form, featuring the idiomatic transitive usage for the first two conjuncts and a more typical intransitive for the third.

Continuing our reverse chronology, here is an 1814 example (from Chadwyck's Literature Online database) that also appears to be a hybrid case:

Maria Edgeworth, Patronage, 1814, p. 270
I am sure the profession of the law has not contracted his heart, and yet you never saw or can conceive a man more intent upon his business.—I believe he eats, drinks, and sleeps upon law; he has the reputation, in consequence, of being one of the soundest of our lawyers—the best opinion in England.
[Text also available in a later anthology via Google Print.]

As with the 1863 example, the first two transitive conjuncts, "eats (law)" and "drinks (law)," are matched by a quasi-transitive, "sleeps upon law." The idiomatic transitive use of sleep appearing in the modern versions of the snowclone does not seem to have developed yet. It's also possible to interpret "upon" as shared by all three conjuncts: "eats (upon law)," "drinks (upon law)," and "sleeps upon law."

Earlier forerunners rely entirely on intransitive or quasi-transitive verbs, with the object preceded by a preposition. Here are two more citations from Literature Online (thanks to Mark Liberman for the latter):

Edward Thompson, "The Courtesan," from The Court of Cupid, 1770, p. 70
Be thou my Muse; in spite of pedant fools
Who walk, eat, drink, and sleep by college rules.

Catherine Jemmat, "The Choice," from The Memoirs, 1762, p. 115
And let him be no learned fool,
That nods o'er musty books;
Who eats and drinks and lives by rule,
And waves my words and looks.

The similarity of these two versified examples is striking, with the everyday activities of learned or pedant fools (and their musty books!) governed entirely by rules. Have we reached the ultimate source of the snowclone? Probably not, as the historical horizon for such research only seems to recede further and further, thanks to ever-expanding digital databases.

One final note: Brett Reynolds points out via email one of the pitfalls of relying solely on electronic databases of mainstream newspapers and print literature. A common variant on the "eating, drinking, sleeping" snowclone has "shitting" as a conjunct (usually in final position). Fortunately, Google readily fills the gap:

Your entire country eats, drinks and shits rugby and yet you haven't won a World Cup since '87.
If we lose, we blame it on the computer, not on the 54 year-old man who eats, sleeps, drinks, and shits video games.
We found a retired guy who lives in northern Wisconsin who lives, breathes, eats, sleeps and shits classic Mustangs.
He loves music and he lives, breathes, eats and shits music so God Bless Him.
Metallica lives, eats, breathes, sleeps, and shits their music.

[Update: At least one reference book has taken notice of the snowclone: Brewer's Dictionary of Modern Phrase & Fable has an entry for eat, drink and sleep something. No discussion of the expression's history or variant forms, though it does mention the 1996 ad campaign by Coca Cola during the European (Football) Championship: "Eat football, sleep football, drink Coca Cola."]

November 15, 2005

Hold on, I'm readying

Here's a change in progress: the well-established adjective lexeme ready has for some time been slowly turning into a verb. Webster's only gives it as a transitive verb (as in They were readying the plane for departure), and even in that use it's relatively new: texts from early in the 20th century contain absolutely no forms like readying, readies, or readied. More recently there is something newer — new enough that I don't recall noticing it before: the other morning I heard an NPR announcer talk about a Boston sports team (don't ask; I don't do sports reporting) that was "readying for" a game against some other team (Miami?). That's an intransitive use of the verb. Intransitive uses are in fact attested from the 1980s: a scan of the 44 million words of the useful (and inexpensive!) 1987-89 Wall Street Journal corpus comes up with four occurrences of readying for (I exhibit all four of them at the end of the post below). But readies for does not occur at all, and the six occurrences of readied for are all passives (so they illustrate the transitive verb).

This intransitive verb is new. It has hardly been born, in linguistic terms (where changes normally come only as generation follows generation). It is not just an ordinary verb meaning "get ready". Imagine this: your partner is waiting for you to finish dressing and preening so the two of you can go out, and shouts up the stairs, "Come on! We need to get going!". I don't think you can call back, "Just a minute; I'm readying!"

It seems to me that the way things are right now, you can ready something for use, as most large modern dictionaries acknowledge; and to a very limited extent you can ready for something (a use that most dictionaries do not seem to have caught up with yet). But that's just about it, I think. It's obligatory to have either a direct object or a preposition phrase with for. The unfolding verb ready has a few more decades of slow spreading before it is more fully available for intransitive use (if it ever is; it is quite possible that it will never happen).

The situation might be compared with a similar case, the slow spread of grow as a transitive verb. It seems to me that a few decades ago the only transitive use of this verb was in agricultural and biological contexts: you could grow carrots or geraniums (there the verb meant "cultivate"), and a lizard could grow a new tail (there it meant "come to have as a part of the body through a biological process"), and that was about it. You couldn't describe blowing up a balloon as "growing a balloon", even though by inflating it you were causing it to grow in size. But at some point a new use began to emerge: businessmen started talking about "growing a business" (there are 19 occurrences in 1987-89 Wall Street Journal articles). And then in the presidential campaign of 1991 one of the candidates, William Jefferson Clinton, started talking about how we had to "grow the economy". The press picked it up, and soon there was a new possible object of the verb. I imagine it may now slowly spread out through the economic realm picking up other suitable direct objects as the sense "cause to become bigger by means of economic management" becomes entrenched. (Added later: Dirk Scheuring points out to me that by 1998 Guy Steele (the author of the SCHEME programming language) had written a paper called "Growing a language".)

So as ready slowly increases its reach as an intransitive, grow is slowly increasing its reach as a transitive. Though let me stress that what I have offered here is not an expert opinion; I have done no serious quantitative work on this topic, and I have no real expertise in diachronic lexical semantics. I have no doubt that the blogosphere will be able either to correct me or to extend what I've said.

Here are the four late-1980s examples of intransitive readying for as promised:

  1. If enough chief financial officers expect the window to slam shut shortly, the pipeline (where deals are readying for market) may get clogged.

  2. And out on Long Island Sound, a number of dinghies from the Larchmont Yacht Club jockey for position, readying for another afternoon of racing.

  3. Against this uncertain background, Europe's steelmakers are readying for 1992.

  4. The Navy had been readying for a grapefruit-sized satellite called Vanguard, and this was rushed to Florida for a quick blast-off in December 1957.

P.S., next morning: And the blogosphere has already come through. Eliah Hecht at Reed College went all the way through the Oxford English Dictionary entry (I was working last night without access to the OED), and found intransitive occurrences from as early as the late 60s, and one of them introduces something really new: verbal ready, in the present tense (from a photo caption, I think), with an infinitival complement clause!

b. intr. or absol. To make oneself ready or prepare in any way. U.S. 1967 Wall St. Jrnl. 12 Dec. 1 Machinists Union President Roy Siemiller, readying for aerospace bargaining, and Steelworkers chief I. W. Abel feel they must match the big rubber and auto settlements. 1972 Time 17 Apr. 22 (caption) In a cloud of catapult steam, a U.S. jet readies to attack Viet Nam.

Posted by Geoffrey K. Pullum at 10:20 PM

"Twice as long" is 50% longer, or maybe 42% longer, or was it 21% longer?

Simeon Yates and colleagues at Sheffield Hallam University have recently been reported to have done a study on gender differences in mobile phone text messages. I say that they "have recently been reported to have done a study" because I haven't been able to find a publication or preprint describing the study, so I'm forced to rely on reports in the mass media, which I know from experience are often spectacularly untrustworthy. In this case, the information spread across the mainstream media is primarily an exercise in the expression of gender stereotypes. Some sample headlines:

BBC "Men write short, sarcastic texts"
Scotsman "Men keep phone texts short, sexy and full of swearing"
The Sun "Girls are going txt crazy"
Times of India "Women just can't keep it brief"

This would be less troublesome if the extraordinary quantitative illiteracy of the fourth estate were not so strikingly on display in the associated stories.

The Mirror's headline reads

"SEXES ARE DIVIDED ON JOYS OF TEXT: WOMEN'S text messages are twice as long as men's, new research reveals."

But a few sentences into the body of the article, we learn that

The study by Dr Simeon Yates of Sheffield Hallam University, says women enjoy in-depth texting while men use just a few words.

Dr Yates said: "Women's texts are 50 per cent longer. They text to communicate while men text to inform."

So suddenly "twice as long" is "50 per cent longer".

But it gets worse: The Sun's story gives us some (alleged) actual numbers:

Men’s texts average 60 characters, while women’s are 85.

That's more like 42% -- but I guess "50 per cent longer" sounds better, not to say "twice as long". And The Times of India tells us that

Dr Simeon Yates observed the mobile phone habits of hundreds of people. He found that female text messages typically go on for 82 characters, while men get their point across with just 68.

That's just 21% longer. Nevertheless, the same story leads with the trenchant observation that

As any long-suffering boyfriend or husband knows, some women just love to talk. Now research shows that females talk more than men even when communicating by text message.

That's a lot of interpretation to put on 82 vs. 68 characters, especially when another article tells us that men texting to women use an average of 80 characters. So even without knowing anything about the study's methodology, much less the actual facts found as opposed to the intoxicated exaggerations that seem to be the MSM's norm in reporting such stories, I'm reserving judgment on the other generalizations that are offered. These are things like

Women offer their friends love and support, while men often resort to sarcasm, sexual humour and swearing.

Men keep their texts short and snappy while women have seized on the mobile as a new way of expressing support and affection...

This certainly has the flavor of tabloid journalism, but it's not limited to the tabloids -- that last quote was from The Guardian. And some version of it might all be true -- or it might be journalistic froth that's quite different from what the study actually found, and also different from what Prof. Simeon Yates told the media. But as I observed a couple of months ago with respect to the ridiculous "email is worse than pot" story, scientists who go to the media with a story like this one -- or are taken up by the media without intending to be -- have a responsibility to make a responsible and factual version of their work available, either as a scientific publication, or at least as a responsible lay-oriented description on a web site somewhere. I looked at Simeon Yates' web site and found nothing; nor could I find anything on the web site of the journal he edits, Discourse Analysis Online, nor did searches on Google Scholar or Scirus turn anything up.
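For anyone who wants to check the arithmetic, here is a minimal Python sketch of the "percent longer" calculation applied to the character counts quoted above. The function name is my own, and the figures are the ones reported in the press, not numbers verified against the study itself.

```python
def percent_longer(longer, shorter):
    """Return how much longer `longer` is than `shorter`, as a percentage."""
    return (longer / shorter - 1) * 100


# Average character counts (women, men) as reported in the newspapers;
# these are the press figures, not values confirmed by the study.
reports = {
    "The Sun": (85, 60),
    "The Times of India": (82, 68),
}

for source, (women, men) in reports.items():
    print(f"{source}: {percent_longer(women, men):.1f}% longer")
```

By the same formula, "twice as long" corresponds to 100% longer, which neither set of reported figures comes close to.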

[By the way, I wonder: are copy editors expected -- or allowed -- to correct elementary errors in arithmetic?]

[Update: Linda Seebach writes

... yes, copy editors are expected to spot and correct errors in arithmetic, but actually at most papers (obviously they're not all the same) the line editor -- the primary editor who works directly with the reporter -- would have the chief responsibility. By the time the story gets to the copy desk, which may be hours later when the reporter is not available, it could be too late. If the copy editor sees "Eight is 40 percent of 73" he knows one of the three figures has to be wrong, but not which one.

It is also true that proofreading tends to focus on the local. Spotting inconsistencies in the text, perhaps widely separated, is harder than it would seem when you're looking for "Paris in the the spring" stuff.

That makes perfect sense.]

Posted by Mark Liberman at 01:49 PM

But the small buckets don't have enough leg room

Here's another one for the Fellowship of the Predicative Adjunct, this time from Science Daily: "People Eat More Stale Popcorn If Served In A Big Bucket".

[Via Tom Gilson]

Posted by Mark Liberman at 09:46 AM

Eating, drinking, sleeping snowclones

In an attempt to parse the Tom Paine quote "It sleeps obedience," Eric Bakovic ended up chasing a tangent, but what a very interesting tangent it is. He brought up the snowclone "X eats, drinks, and sleeps Y," understood to mean "X has an all-consuming dedication to Y." It turns out this is a remarkably "modular" snowclone, allowing a wide variety of conjoined verbs beyond eat, drink, and sleep, in numerous permutations and combinations.

Based on some quick-and-dirty corpus analysis using the Newspaper Archive database, I have isolated ten main verbs that can be used in the snowclone, suitable for paradigmatic alternation and syntagmatic combination:


These verbs tend to cluster in series of three, though the number can range anywhere from two to six. Rules for ordering are a bit hard to determine, beyond tendencies toward certain collocations, e.g., EAT-DRINK, EAT-SLEEP, and SLEEP-DREAM. The most common alternant is EAT, though it is not obligatory — indeed, the oldest example I have found so far, from 1882, uses DRINK SLEEP THINK, with no EAT. The second-oldest example thus far uncovered, from 1890, is the more familiar series EAT SLEEP DRINK, while EAT DRINK SLEEP appears as early as 1908.

As can be seen from the citations below, there are a number of variations on the canonical form of the snowclone. For instance, the object of the conjoined verbs may be repeated for emphasis in the form "X V1 Y, V2 Y, and V3 Y," as in the 1903 cite, "He eats railroad, drinks railroad, sleeps railroad and dreams railroad." Furthermore, the verbs are usually transitive, though occasionally they are used intransitively with a preposition like for, of, or about introducing the object. Because the conjuncts may belong to different verb classes, intransitive usage can lead to WTF coordinations as in the 1969 cite, "Why does a boy suddenly give up something he outwardly eats, thinks and sleeps about and then elect to do something else?"
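A search of this kind can be roughly sketched in Python. What follows is only an illustration, not the procedure actually used with the Newspaper Archive database: the verb list is simply the set of verbs attested in the citations in this post, and the function name is hypothetical. It finds the canonical form, a run of two or more third-person-singular verbs joined by commas or "and", and returns labels of the kind used in the list of citations. It will miss the repeated-object variant ("eats railroad, drinks railroad") and anything not in the third-person singular.

```python
import re

# Verb stems attested in the citations in this post (an illustrative list,
# not necessarily the ten main verbs referred to above).
VERBS = ["act", "breathe", "dream", "drink", "eat", "live",
         "sleep", "speak", "talk", "think", "walk"]

# One third-person-singular verb form, e.g. "eats" or "breathes".
VERB = r"(?:%s)s" % "|".join(VERBS)

# Two or more such verbs joined by commas and/or "and".
SERIES = re.compile(
    r"\b(%s(?:(?:,\s*(?:and\s+)?|\s+and\s+)%s)+)\b" % (VERB, VERB),
    re.IGNORECASE,
)


def verb_series(text):
    """Return verb-sequence labels such as 'EAT DRINK SLEEP' found in text."""
    labels = []
    for match in SERIES.finditer(text):
        forms = re.findall(VERB, match.group(1), re.IGNORECASE)
        # Strip the final "s" to recover the stem, then upper-case it.
        labels.append(" ".join(form[:-1].upper() for form in forms))
    return labels


print(verb_series("The politician eats, drinks and sleeps politics."))
```

A pattern this simple will also pick up ordinary literal coordinations ("he eats and sleeps at the office"), so the hits still need to be read by hand.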

As for the object of the consuming passion or dedication, it is very often "politics" or a sport like baseball or football. (This may reflect a preponderance of articles reporting on sports and politics in the corpus of newspaper articles, or perhaps journalists reporting on those subjects are more prone to clichés!) The object doesn't necessarily have to be an activity, though an activity is usually implied metonymically (e.g., "horses" for equestrian sports).

The list below is certainly not meant to be exhaustive. A bigger corpus (like webpages indexed by Google) will turn up countless more variations on the theme. But these examples provide a historical glimpse at the many guises a particularly flexible snowclone can take.

ACT THINK EAT SLEEP (Frederick, Md.) News, Aug 27, 1964, p. 8/1
Sen. Hubert Horatio Humphrey acts, thinks, eats and sleeps politics.
BREATHE EAT SLEEP Trenton (N.J.) Evening Times, Oct 14, 1910, p. 15/1
That peculiar brand of sport-loving inhabitant who breathes, eats, and sleeps for baseball first, last and all the time.
DRINK EAT SLEEP DREAM Bridgeport (Conn.) Telegram, Feb 11, 1918, p. 4/1
He drinks, eats, sleeps, and dreams continuously of notes, bills, bonds, stocks and business.
DRINK SLEEP EAT (Reno) Nevada State Journal, Apr. 28, 1946, p. S2/1
The victim practically lives in a bowling alley; he drinks, sleeps and eats bowling.
DRINK SLEEP LIVE Chillicothe (Mo.) Constitution Tribune, Oct 6, 1978, p. 9/3
Mr. Haney practically eats, drinks, sleeps, and lives sports.
DRINK SLEEP THINK Atlanta (Ga.) Constitution, Sep 28, 1882, p. 4/1
The native Georgian drinks, sleeps and thinks politics.
EAT BREATHE SLEEP Oakland (Cal.) Tribune, Apr 3, 1960, p. B23/1
When he is working on a new film, he literally eats, breathes and sleeps the role.
EAT BREATHE TALK Indiana (Pa.) Evening Gazette, Aug 18, 1950, p. 14/4
The trouble with Hollywood is that everybody eats, breathes and talks movies here.
EAT BREATHE THINK DREAM Frederick (Md.) Post, May 14, 1953, p. 14/2
He eats, breathes, thinks, dreams Army.
EAT DREAM LIVE Coshocton (Ohio) Tribune, May 11, 1967, p. 12/1
Baseball Reds' manager eats, dreams, lives the game.
EAT DREAM THINK Western Kansas Press, June 25, 1964, p. 4/1
Burt is different. He eats, dreams and thinks horses.
EAT DRINK SLEEP (Ada, Okla.) Evening News, Apr 21, 1908, p. 8/1
The politician who is following the calling for a livelihood eats, drinks and sleeps politics. You cannot get him off the track. He finds little interest in anything else.
EAT DRINK SLEEP BREATHE (Saint George, Utah) Color Country Spectrum, Apr 6, 1977, p. 4/3
She nearly eats, drinks, sleeps and breathes her native homeland — the red hills of Brigham Young's Dixie of the desert.
EAT DRINK SLEEP DREAM (Oshkosh, Wisc.) Daily Northwestern, Mar 12, 1903, p. 4/4
There is little doubt that the governor has railroadophobia in an acute form. In fact, there is reason for believing that he eats railroad, drinks railroad, sleeps railroad and dreams railroad.
EAT DRINK SLEEP LIVE Council Bluffs (Iowa) Nonpareil, Nov 29, 1947, p. 7/7
Dyer is a fine manager; he eats, drinks, sleeps and lives baseball.
EAT DRINK SLEEP THINK Havre (Mont.) Daily News Promoter, Aug 30, 1929, p. 4/2
Farrell Macdonald eats, drinks, sleeps, and thinks one thing — acting.
EAT SLEEP (Connellsville, Pa.) Daily Courier, Sep 6, 1932, p. 1/7
Pinchot eats and sleeps politics.
EAT SLEEP BREATHE Portsmouth (N.H.) Herald, Mar 10, 1954, p. 8/1
He eats, sleeps and breathes football.
EAT SLEEP DREAM Stevens Point (Wisc.) Daily Journal, July 25, 1913, p. 5/5
He eats, sleeps and dreams baseball all the time.
EAT SLEEP DREAM LIVE (Elyria, Ohio) Chronicle Telegram, Mar 20, 1980, p. D11/7
The vital element of any series is having one person who eats, sleeps, dreams and lives the show 24 hours a day.
EAT SLEEP DREAM WALK TALK ACT Lancaster (Ohio) Daily Eagle, May 6, 1918, p. 5/2
She eats, sleeps, dreams, walks, talks and acts pictures.
EAT SLEEP DRINK (Portland, Or.) Morning Oregonian, Mar. 11, 1890, p. 7/6
He eats sleeps and drinks his educational bill and he talks it upon every possible occasion.
EAT SLEEP LIVE Marion (Ohio) Star, Nov 8, 1940, p. 19/3
The man eats, sleeps, lives football.
EAT SLEEP TALK Bucks County (Pa.) Gazette, June 17, 1910, p. 2/4
The county seat just now eats, sleeps and talks Fourth of July celebration, the big, big day of the year.
EAT THINK DREAM Sheboygan (Wisc.) Press, July 23, 1913, p. 6/4
He eats, thinks and dreams baseball when he is not dreaming of the little cottage in Sheboygan.
EAT THINK SLEEP Edwardsville (Ill.) Intelligencer, June 28, 1969, p. 2/5
Why does a boy suddenly give up something he outwardly eats, thinks and sleeps about and then elect to do something else?
LIVE BREATHE SLEEP EAT Lincoln (Neb.) Evening Journal, Feb 14, 1967, p. 6/3
Miss Runn says she lives, breathes, sleeps and eats skating.
LIVE BREATHE THINK Oakland (Cal.) Tribune, Sep 30, 1940, p. 10D/1
Intense, he lives, breathes, thinks football all his waking hours.
LIVE EAT BREATHE Syracuse Herald Journal, Aug 15, 1965, p. 52/8
This is a man who really lives, eats and breathes football.
LIVE EAT BREATHE SLEEP (Zanesville, Ohio) Times Recorder, Nov 25, 1968, p. 4D/9
He lives, eats, breathes and sleeps music.
LIVE EAT DRINK SLEEP DREAM Fort Pierce (Fla.) News Tribune, Mar 20, 1952, p. 6/6
But the balding brown-eyed gentleman is described by friends as one who "lives, eats, drinks, sleeps and dreams politics."
LIVE EAT SLEEP Monessen (Pa.) Daily Independent, Feb 26, 1954, p. 7/1
Kovey is the popular umpire from Monessen who lives, eats and sleeps baseball.
LIVE EAT THINK SLEEP ACT San Mateo (Cal.) Times, Nov. 6, 1926, p. 1/1
He not only sells Studebakers, but he actually lives, eats, thinks, sleeps and acts Studebaker.
LIVE SLEEP Mansfield (Ohio) News Journal, July 21, 1941, p. 8/4
Greasy Neale, the Philadelphia Eagle coach, lives and sleeps football.
LIVE SPEAK THINK EAT SLEEP Fitchburg (Mass.) Sentinel, June 9, 1945, p. 8/1
The man lives, speaks, thinks, eats and sleeps baseball.
LIVE THINK EAT SLEEP BREATHE DREAM Wisconsin Rapids (Wisc.) Daily Tribune, Aug 03, 1929, p. 1/1
Everybody in Seattle lives, thinks, eats, sleeps, breathes and dreams Seattle and this state of Washington.
TALK EAT SLEEP Mansfield (Ohio) News, June 7, 1917, p. 12/5
King talks, eats and sleeps baseball.
TALK THINK EAT Appleton (Wisc.) Post Crescent, Oct 18, 1939, p. 4/5
Joe talks football, thinks football, eats football.
TALK THINK EAT SLEEP DREAM Lincoln (Neb.) Evening News, Jan 05, 1912, p. 3/5
In New York there is a man who talks, thinks, eats, sleeps and dreams peanuts.
THINK ACT BREATHE Key West (Fla.) Citizen, March 18, 1931, p. 3/5
He thinks, acts, breathes in headlines, slogans.
THINK DREAM Wisconsin Rapids (Wisc.) Daily Tribune, Nov 25, 1927, p. 7/2
He thinks, dreams football most of the year.
THINK EAT BREATHE (North Hills, Pa.) News Record, Aug 17, 1985, p. 5/3
With those credentials you would think Dunn is Mr Music. He thinks, eats and breathes the stuff, right?
THINK EAT DREAM Mansfield (Ohio) News, June 26, 1915, p. 10/2
Buck thinks, eats and dreams baseball.
THINK EAT DRINK (Elyria, Ohio) Chronicle Telegram, Oct 19, 1955, p. 16/8
Eddie Fisher...thinks, eats and drinks the product he sells on TV.
THINK EAT LIVE Oakland (Cal.) Tribune, Dec 9, 1927, p. 46/3
Helen Wills...thinks, eats and lives tennis.
THINK EAT SLEEP (Connellsville, Pa.) Daily Courier, August 19, 1926, p. 2/4
One of our customers is a merchant who thinks, eats and sleeps in terms of business.

[Update: For the early history of the snowclone, see this post.]

November 14, 2005

Snowclone shortening

After reading Geoff's post yesterday on "it sleeps obedience", I'm thinking that this might be a deliberately shortened version of the snowclone "X eats, drinks, and sleeps Y", which generally means that all X does is (related to) Y. The first page of ghits for "eats, drinks, and sleeps" displays some typical examples.

[ Update: Aidan Kehoe rightly comments:

From the Tom Paine of two hundred years ago? The OED doesn't mention the "eats, drinks, and sleeps" construction under any of the verbs, which to me says it's recent

Somehow, I missed that it was a Paine quote. Thanks for the correction, Aidan -- that's what I get for posting before coffee. ]

  • When he's not out scaling mountains (he's a world-class rock climber), author Jim Collins eats, drinks, and sleeps business. (link)
  • He eats, drinks, and sleeps movies. Fortunately, he lives in New York City, the best place in the country for disorders of this type. (link)
  • He eats, drinks, and sleeps football, until Walter miscalculates and the roof caves in. (link)
  • A visit to Dr. Dre's recording studio reveals that he eats, drinks and sleeps rap--and rarely rests. (link)
  • Scientific openmindedness and thinking does not stop at 5 pm. One eats, drinks and sleeps scientific exploration. But how to get this across to students! (link)
  • We are seeking someone who works, eats, drinks, and sleeps the user's experience. (link)
  • My son eats, drinks, and sleeps basketball. By coming here, he has improved SO much. (link)

The shortened version "it sleeps obedience" indicates to me both the sense of the full snowclone that all "it" (parliament) does is to obey, and also that it does so completely passively (like sleeping), not actively (like eating and drinking), due to the PM's "opium wand".


Alphabet wars

Controversy has been brewing since last week's announcement that a team of archaeologists had discovered an ancient alphabetic inscription on a stone unearthed near Tel Zayit, Israel. The leader of the excavation project, Ron E. Tappy of the Pittsburgh Theological Seminary, has claimed that the inscription is evidence of an Israelite state with widespread literacy in the 10th century BCE, a time that biblical scholars associate with the kingships of David and Solomon. As Ron Grossman reports in an article for the Chicago Tribune wire service, this interpretation plays directly into a volatile debate between "minimalists," who view biblical narrative as an unreliable guide to the history of the era, and "maximalists," who seek to bring the archaeological record into alignment with the stories of the Bible.

Rather too glibly, Grossman breaks down the significance of Tappy's claim not just for biblical minimalism and maximalism, but for opposing ideologies in the ongoing Israeli-Palestinian conflict:

By the Old Testament account, the 10th century was an era of the great kings David and Solomon, who built a mighty temple in Jerusalem. To Israeli nationalists, that version of the story gives their cause title to the Holy Land.

But minimalist scholars think the biblical account inflated; they argue that, in the 10th century, the Hebrews were wandering tribes, not nation or temple builders.

That account suits Palestinian nationalists just fine, because they claim Jerusalem as theirs.

The terms of this dispute should perhaps be unsurprising given the highly politicized nature of archaeology in the Holy Land (see, for instance, the work of Nadia Abu El-Haj on archaeology's role in shaping Israeli national identity). Nonetheless, it's a bit disturbing to read of the dueling accusations sketched out by Grossman (with no actual attributions): "One camp, 'the maximalists' implies the other harbors anti-Semites. The 'minimalists,' in turn, charge their accusers with confusing Zionism with scholarship."

It's difficult for someone outside of this debate to know how overstated Grossman's characterizations might be. But we should know soon enough, as Tappy and his colleagues will be presenting their findings at the meetings of two scholarly organizations in Philadelphia this week: the American Schools of Oriental Research on Nov. 16, and the Society of Biblical Literature on Nov. 20. Some attendees are already girding themselves for a serious clash, according to Grossman:

Philip Davies, professor emeritus at the University of Sheffield in England, is generally considered the founding father of the minimalists — most of whom are European-based. He is coming to the Philadelphia meetings prepared for battle with his American colleagues.

"When I fly the Atlantic, I feel like a gladiator," Davies said. "Tappy's research is going to be a football, kicked around from one side to the other."

There are some "biblioblogs" that are useful for keeping tabs on the debate, such as Tyler F. Williams' Codex Blogspot, Jim Davila's PaleoJudaica, Jim West's Biblical Theology, and Christopher Heard's Higgaion. On Joseph Cathey's blog there has been some interesting back-and-forth on the relevance of the discovery to the minimalist/maximalist debate. Scholarly mailing lists have also been active in discussing the Tel Zayit artifact, particularly ANE, b-hebrew, and biblical-studies. For first-hand accounts of the excavation, see the blog of dig participant Michael Homan, as well as this article in the Colorado State University newspaper about Dan Rypma, the CSU undergrad who actually uncovered the Tel Zayit stone.

Finally, I should note that in the online version of the article that broke the news of the discovery, the New York Times eventually put up an informative photo and caption that wasn't available at the time of my first post (in which I included a cropped version of a wire-service photo):

Courtesy of The Zeitah Excavations and Israel Antiquities Authority

Detail of the "ABC" Inscription from Tel Zayit, showing the letters waw through tet. Note that the letters are out of the traditional order: going (right-to-left) waw, he, het, zayin, tet rather than the expected he, waw, zayin, het, tet.

November 13, 2005

It sleeps obedience

Said Tom Paine (quoted by The Economist, 12 November 2005, p.13), speaking of the often somnolent state of the British parliament and its reluctance to rise up against a sitting prime minister by voting to defeat his proposed legislation:

"The minister, whoever he at any time may be, touches it as with an opium wand, and it sleeps obedience."

And although I did a bit of a double-take, I soon got the idea of what was meant by that stunningly ungrammatical sleeps obedience — with its intransitive verb assigned a direct object in defiance of all syntactic decency. It must mean "Parliament shows its obedience by sleeping" — it's somewhat like He nodded his agreement or She smiled her approval. Proof positive, were it needed, that we are capable of doing something quite remarkable with our native language: we can follow what is meant by sentences that do not have a prayer of being characterized as grammatical by the principles we normally use for our production and interpretation of utterances. I suppose one could instead say that there are no intransitive verbs, or that far from it being the case that (as I believe) nearly all strings of words are ungrammatical, rather, everything is correct, nothing is ungrammatical. But that seems to me to be a considerably less sensible view than the one I hold, which is that the principles that tell us what is grammatical tell us quite a bit about sequences of words that are not grammatical. Not every theory of grammar permits that to be the case. I think the ones that don't have got a problem.

Of course, I speak only of modern English. Steve, over at Language Hat, has already (within about an hour or so of me first posting this) mailed me to point out first that sleep can take what are called "cognate objects" (as in to sleep the sleep of the just), which I knew, and second, something I did not know, that the use of sleep as a transitive verb involved here is covered in the Oxford English Dictionary, complete with a somewhat different Thomas Paine quote, in a rather weird passage where what they say about the meaning can't be right:

7. To put off or delay; to disregard, pay no attention to. Also with out. Obs. 1470 Paston Lett. II. 398, I pray yow let not thys mater be slept. 1523 LD. BERNERS Froiss. I. cclxi. 385 So these companyons..slept nat their purpose, but rode in a day and a night. a1548 HALL Chron., Hen. VI, 123 These valeaunt capitaines, not myndyng to slepe their busines, environed the toune with a strong siege. 1600 HOLLAND Livy XXIII. xiv. 482 They might not sleepe their affaires and go slowly about their businesse. 1624 HEYWOOD Gunaik. IV. 179 To persuade men to too much remisnes in wincking at and sleeping out the adulteries of their wives. 1792 T. PAINE Writ. (1895) III. 79 It appeared to me extraordinary that any body of men..should commit themselves so precipitately, or 'sleep obedience'.

They cannot possibly mean that sleep means "disregard" in these examples. Surely sleep obedience doesn't mean "disregard obedience". My interpretation makes more sense. But Steve may be right that there is more historical research to be done here. Research that I haven't done.

Posted by Geoffrey K. Pullum at 04:40 PM

Distributed outsourcing

A couple of weeks ago, Amazon Web Services introduced the Mechanical Turk, which inverts the usual relationship in interactive computing by providing "a web services API for computers to integrate Artificial Artificial Intelligence directly into their processing by making requests of humans". The name is a reference to Wolfgang von Kempelen's Turk, an 18th-century chess automaton which pretended to be a sort of clockwork computer, but in fact incorporated a small, hidden, human player.

Amazon's Mechanical Turk (whose welcome page is here) is "currently experiencing extremely heavy traffic", so it's going to be hard to really sign up to

Complete simple tasks that people do better than computers. And, get paid for it. Choose from thousands of tasks, control when you work, and decide how much you earn.

I presume that most of the current traffic is rubbernecking, but in principle, this sort of thing could turn into a new kind of labor exchange, in which a large pool of workers can connect with a large number of (small or large) tasks.

Of local interest, this kind of labor exchange can be an efficient way to create training data for machine translation, speech recognition and various sorts of pattern-recognition and pattern-classification systems. There are obvious issues of training and quality control, but there are equally obvious solutions. Some colleagues at Johns Hopkins used a similar technique on a small scale a few years ago, to get translations done for a pilot project on machine translation in a language for which little parallel text was available. The main problem in extending their (quite effective) experiment was the issue of tax and employment regulations. It's not so easy for an American organization to pay a large number of individuals from around the world, without running afoul of various IRS regulations. As I understand it, the issues in payment for services are different in this respect from the issues in auction or sales sites like eBay. I wonder how Amazon deals with this problem?

I also wonder how Amazon prevents this from being used for the most obvious single application, namely helping spammers circumvent captchas?

A post at Bitporters media gives one Turk-worker's experience:

So four days and 505 HITs later I'm sitting at a cool $6.84. Note: More than half of my HITs are still in the pending state. I'm getting pretty quick at cracking these off, with my tabletPC I'm down to < 5-10 seconds pet HIT. At 3 cents a hit, I'm not really sure if this is a waste of time or not, for now I'm just going to do enough to buy a book I've been wanting.

If you can really keep up 10 HITs per minute at $.03 per HIT, that's $18/hour, which would appeal to a lot of people, especially for a job that you can do from any location, whenever and for however long you like. I wonder whether the numbers really do work out that way -- this person claims to have worked for 30 minutes at a rate of $10.20/hour, minus (an unknown number of) disallowed HITs -- but in any event, a system like this will presumably bring supply and demand into a world-wide equilibrium at a rate of pay that reflects the value that employers put on the product, and the value that workers with the relevant skills put on their time. If employment regulations actually permit such a marketplace to develop...
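The back-of-the-envelope arithmetic here can be checked with a minimal sketch. The function names are made up for illustration, and the only inputs are figures quoted in this post (3 cents per HIT, 10 HITs per minute, $10.20/hour).

```python
def hourly_rate(hits_per_minute, pay_per_hit):
    """Dollars earned per hour at a steady pace of hits_per_minute."""
    return hits_per_minute * 60 * pay_per_hit


def hits_per_minute_needed(target_hourly, pay_per_hit):
    """Pace (HITs per minute) needed to reach a target hourly rate."""
    return target_hourly / (60 * pay_per_hit)


# 10 HITs per minute at 3 cents each
print(round(hourly_rate(10, 0.03), 2))
# pace implied by the worker's claimed $10.20/hour
print(round(hits_per_minute_needed(10.20, 0.03), 2))
```

On these figures, $10.20/hour at 3 cents per HIT works out to a bit under 6 HITs per minute, or roughly one HIT every 10 to 11 seconds.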

Moral evaluation in the news

In response to yesterday's post on "Evil", several readers wrote in to ask what planet I've been living on. How could it surprise me that journalists put opinions into news reports?

But it didn't, of course. The opinions of reporters and editors are reflected in a whole sequence of choices, starting with what to write about. Given that the topic is chosen, you get to pick whom to quote, and how to get interviewees to give you your conclusion back in quotable form, or at least to pretend that they did. You can decide whether or not to specify the agents responsible for a given event. Finally, you can choose your words, and choose which of them to put in scare quotes or to set off with qualifiers like "What X's call ___".

However, this is Language Log, and not Politics of Journalism Log. We've generally commented on this sort of thing only when it deals with linguistic topics, or uses linguistically-relevant techniques. In yesterday's post, I was commenting on two things. The first thing that caught my eye was the use of the word "evil", in Michael Slackman's own voice. I'm not used to seeing this word used in papers like the New York Times, in a news story outside of a quotation, even when the subject is something like a politically unconnected rape and murder. The usual practice, when strong and explicit moral judgments are to be made, is to find someone to quote. And as I thought about this, and read the Reuters coverage and the BBC piece by Jon Leyne, I was reminded that news stories about political bombings, especially in the Middle East, often don't even take the step of expressing opinion by evoking morally evaluative quotes.

In the case of this particular bombing, there seems to have been an unusual amount of moral evaluation in the rhetoric of politicians, whom journalists can't entirely avoid quoting. I surmise this is a symptom of the same cause that led to the NYT's choices: the ethnic identity of the innocent civilians who were murdered, and the political issues that are thereby (not) invoked.

I've noticed that even Reuters, contrary to its stated policy against using the words terrorist and terrorism outside of quotation marks, sometimes uses these words in headlines and news stories: "Australia foils terrorist attack"; "Asia terrorist suspect may be dead, but threat remains"; "Bosnian police last month arrested a Turk and a Bosnian-born Swedish citizen suspected of terrorist-related activities"; "Warfare wanes and terrorism rises, new study says". In most of these cases, there is a quasi-quotative context lurking around, involving legal charges or official suspicion or an authoritative report that uses the words in question. Still, I might have expected Reuters to use scare quotes in such cases, if their policy were being consistently applied. Though I'll confess that I don't know the facts about how often Reuters or other outlets have historically used such scare quotes in their headlines and reporting over time and space -- perhaps someone interested in journalistic bias has done such a study?

Posted by Mark Liberman at 09:07 AM

November 12, 2005

Bierce's Law?

Mark Liberman exposes a new victim of the "Law of Prescriptive Retaliation" — the Murphy-esque principle that corrections of linguistic error are themselves inevitably prone to error. The law was independently discovered around 1999 by Jed Hartman, Erin McKean, and alt.usage.english contributor Skitt. But it looks like an earlier observer of prescriptivist pitfalls has them all beat by a mile: Ambrose Bierce, in his slender, unjustly neglected volume of 1909, Write it Right: A Little Blacklist of Literary Faults.

Unlike the venerable Mr. Strunk and other early 20th-century contemporaries in the verbal hygiene racket, Bierce was acutely aware of the need for humility in any critique of English usage:

In neither taste nor precision is any man's practice a court of last appeal, for writers all, both great and small, are habitual sinners against the light; and their accuser is cheerfully aware that his own work will supply (as in making this book it has supplied) many "awful examples" — his later work less abundantly, he hopes, than his earlier. He nevertheless believes that this does not disqualify him for showing by other instances than his own how not to write. The infallible teacher is still in the forest primeval, throwing seeds to the white blackbirds.

Thanks to Jason Streed of the Finches' Wings blog for unearthing this quote. Streed further explains Bierce's allusion to "white blackbirds":

I was curious as to what "white blackbirds" could be referring to. Googling turned up its usage in various proverbs as a figure of improbability — "There'll be white blackbirds before an unwilling woman ties the knot" — as well as this a bit from the fascinating Aberdeen Bestiary: "In the regions of Achaia, according to Isidore, there are white blackbirds. A white blackbird represents purity of will. But by Achaia we understand the industrious sister. There are two sisters, Rachel and Leah, namely the active and the contemplative life. Leah we take to be the industrious one. The active life teaches us to devote ourselves to works of charity, to teach men who lack discernment, to have the purity of chastity, to work with our own hands. This is Achaia, the active life. In Achaia, therefore, like the white blackbirds, live those who live chastely the active life."

Dvorkin dangles

Another example of the Decline of Western Civilization (or rather, of the linguistic ignorance of today's intellectuals), deftly critiqued at Headsuptheblog and Language Hat. This time it's Jeffrey Dvorkin, the NPR ombudsman, who is taken to task for assuming that Roberts' is a "plural possessive", among other sins.

Let me add a pointer to a nice dangling modifier in the second paragraph of Dvorkin's piece:

Visitors to Washington, D.C. may note that upon entering a D.C. taxicab, the car radio is often tuned to one of the two local NPR member stations -- WAMU or WETA.

In addition to pleasing the Fellowship of the Predicative Adjunct, Dvorkin's dangler is also a prime example of the Hartman/McKean/Skitt Law of Prescriptive Retaliation, according to which this post must itself contained at least one error.

Posted by Mark Liberman at 11:04 AM


A sentence in Michael Slackman's 11/11/2005 NYT article on the Amman bombings took me aback:

As investigators searched for the identities of the three attackers - and for evidence that they hope will lead to those who helped plan the terrorist strike - Jordanians, especially those who survived the explosions, were struggling to deal with the sheer evil of what happened.

It surprised me to see the word evil used like that in a news story, rather than in an editorial or an opinion piece. Even in the news, it's common enough in quotations, or as an ironic modifier in expressions like "the evil Baron Bomburst in Chitty Chitty Bang Bang", or in fixed expressions like "evil spirits". But Michael Slackman's own words, in this news story, refer to the bombing as a "terrorist strike", and presuppose that it is an instance of "sheer evil".

By comparison, Reuters refers as usual to "blasts" in "closely synchronized attacks" by "suspected suicide bombers". Similarly, BBC News wrote about "explosions" and "bombings" due to "attacks by radical Islamic militants". Use of terms such as "terrorist" or "terrorism" in stories from such sources was as usual limited to quotations, and (as far as I can tell by internet searching) the only other instance of the word evil being used in reference to the events in Amman was this quote:

"This is a worldwide evil," Foreign Secretary Jack Straw, visiting Jordan on his way to Iraq, told reporters at the devastated Hyatt hotel. "Jordan's determination to fight this terrorism is our determination too," he said.

In an interesting contrast, the BBC's Jon Leyne wrote a piece describing Jordanian reactions to the bombings that covers most of the same ground as Slackman's piece did, but with very different attitudes and words. Leyne does observe that people who are blown up generally don't like it much, so that "these attacks really do seem to have changed attitudes", but he describes the Jordanian protesters as

...enjoying themselves, gathering for candlelit vigils, driving around waving flags and hooting horns, sitting together singing patriotic songs.

Leyne does say that

They even chanted swear-words against Abu Musab al-Zarqawi, the Jordanian militant apparently behind the attacks, who had a fair bit of sympathy here before they happened.

But that's still a long way from "struggling to deal with the sheer evil of what happened". Leyne manages to write 900 words about reactions among journalists and Jordanians without making or describing any moral evaluations at all, except to suggest that there is now somewhat less sympathy for Zarqawi in Jordan than there used to be.

On the internet at large, the word evil is more commonly used than it is in news stories. So on a whim, I looked at the frequency of terms in the frames "__ is evil" and "__ isn't evil", and sorted the results by the ratio. By this (statistically unstable as well as morally misguided) measure, Israel and France are the most evil entities I found, while China and North Korea are the least:

Google counts
[Table not preserved: its columns were "__ isn't evil", "__ is evil", and ratio (is/isn't); the only surviving row labels are Al Qaeda and North Korea.]

So much for the wisdom of crowds.

Posted by Mark Liberman at 09:49 AM

November 11, 2005


On her Abecedaria blog, Suzanne E. McCarthy draws our attention to the title of a new film adapting Jane Austen's most famous work: Pride & Prejudice. One can interpret the ampersand as a visual indicator of the movie's high-spirited take on the Austen novel, though McCarthy notes that many in the media have simply represented the title as Pride and Prejudice. She suggests that those typing or editing copy may be "making a grammar correction along the lines of 'In this context the ampersand really should not be used.'" Yet another editorial "correction" that turns out to be not so correct.

This isn't the first Austen adaptation to deploy an ampersand. Just last year there was Bride & Prejudice, a Bollywood-style reimagining of the novel set mostly in Amritsar, India. The use of the ampersand, along with the punning change of Pride to Bride, let viewers know that this was no ordinary adaptation. The same could be said for Baz Luhrmann's stylish 1996 film, Romeo + Juliet, which moved the action of Shakespeare's play to "Verona Beach" in contemporary southern California. For Luhrmann, an ampersand wasn't sufficient; only the ultramodern plus sign would do the trick.

So far, critical appraisal of the ampersand in Pride & Prejudice has been mixed. On Slate, David Edelstein calls the ampersand one of the "ominous first impressions" that he had to get over in order to like the movie. The Toronto Globe and Mail (or is it "Globe & Mail"?) says the ampersand signals a "contracted, contemporary approach" to the novel. The San Francisco Chronicle finds the typographical choice to be indicative of the movie's "jaunty approach." And the Detroit Free Press says "the only thing really new" in the film is "the hip ampersand of the title."

Contemporary! Jaunty! Hip! That's a lot of stereotypical baggage to put on a modest piece of punctuation that has been kicking around in one form or another for about two thousand years.

[Update #1: Duane Dudek of the Milwaukee Journal Sentinel picked up on the recent cinematic trend of using the ampersand, as in this summer's Hustle & Flow, Kicking & Screaming, and Mr. & Mrs. Smith. The choice by filmmakers is an aesthetic one, based on connotations of hipness and modernity (despite the typographic element's long history back to Roman times). Graphic artists trace the origin of the ampersand as a "modern" design element to the art director Herb Lubalin, known for his work on Avant Garde magazine and the journal U&lc (not U&L, as the Journal Sentinel has it), which stands for 'upper and lower case.' Dudek further notes that marketers looking for a "cutting-edge" visual approach see the ampersand as a way of attracting a younger, technologically savvy demographic, who know it from messaging/texting shorthand.]

[Update #2: Suzanne McCarthy has more on Abecedaria about the typographic details of the italic ampersand used in Pride & Prejudice advertising. She also takes a look at the connector in Romeo + Juliet, which appears as more of a cross than a plus sign in the movie poster.]

[Update #3: Several readers email to question the idea of the ampersand as a signifier of hip modernity. Carrie Shanafelt notes that the ampersand was actually quite common in the typography of Jane Austen's era, though it was largely restricted to use as a symbol for Latin et. Richard Mason detects a "retro air" in ampersand usage, pointing to "the old-timey flavor" of Smith & Company as compared to its more modern equivalents, Smith, Inc., Smith LLC, and (See Mason's blog for further comments.) Clearly what is old is new again.]

Posted by Benjamin Zimmer at 02:17 PM

Sex doesn't matter

At least it doesn't seem to affect conversational speech rate, and the effect on the relative time of conversational contributions is small (in mixed-sex conversations, men use about 5-6% more talk time than women). That's the result of this morning's Breakfast Experiment, in which I ran a few little perl scripts over the transcripts from a published corpus of conversational speech (Fisher English Training Speech Part 1).

This corpus comprises 5,850 conversations of about 10 minutes each, for a total of 11,700 conversational sides. It forms about a third of the set involved in my two previous Breakfast Experiments, in which I looked at the effects of speaker (and interlocutor) sex and age on the frequency of filled pauses and assenting murmurs. This time I was interested in overall word counts and time spans, which were not provided by the interactive search program that I used before, and so I had to do a bit of programming.

A typical transcript fragment (this one is from conversation 5189) looks like this:

67.53 69.74 B: my best friend is my wife
69.41 80.38 A: is that right well that's just about the same the same way with me [laughter] i can depend on her [laughter]
70.63 71.69 B: (( yes ))
82.37 89.12 A: yeah [lipsmack] well that's why i think i say i have a few special friends 'cause the ones i have are
89.45 93.62 A: special because i can always depend on 'em and and
93.05 93.70 B: (( [noise] ))
94.23 100.98 A: ah they're always there when you need 'em if you need 'em and 'course i never hardly ever need 'em so that makes it nice
101.84 103.32 B: that's how i feel

There's a relational table telling us that in conversation 5189, the A side was speaker 35043 and the B side was speaker 75769, and there's another table telling us that speaker 35043 was a 66-year-old man with 16 years of education, raised in Oregon, while speaker 75769 was a 42-year-old man with 14 years of education, raised in California.

So for this morning's exercise, I wrote a little script that added up the time and the word count on the A side and the B side of each conversation, looked up the conversation number in the tables that define which speakers were involved in which conversations, and printed out the results in a long list like this:

00001 A 149 773 208.5 222.446 B 138 876 212.26 247.621 2602 m.a 1790 f.a
00002 A 113 632 159.69 237.46 B 204 1451 382.64 227.525 2152 f.a 9998 m.a
00003 A 161 502 205.9 146.285 B 209 1043 344.35 181.734 5897 f.a 9997 m.a
00004 A 117 627 209.57 179.51 B 165 1001 343.32 174.939 4775 f.a 4612 f.a
00005 A 106 760 221.01 206.33 B 121 483 167.14 173.388 5334 m.a 2066 f.o

where (for example) the fifth line means that the A side of conversation #5 involved 106 segments, 760 words, and 221.01 seconds, for a speech rate of 206.33 words per minute; and was produced by speaker #5334, who was a male native speaker of American English. The B side of the same conversation involved 121 segments, 483 words, 167.14 seconds, for a speech rate of 173.388 wpm; and was produced by speaker #2066, who is a female whose dialect is "other" than American English (in fact the caller table tells us that she is a native speaker of Japanese).
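The per-side bookkeeping described above can be sketched in a few lines. This is not the script used for the experiment (that one was in perl); it is a minimal Python illustration. The names are invented here, and the word-counting convention (skipping bracketed events like [laughter] and the (( )) markers) is an assumption, since the post doesn't say exactly how words were counted.

```python
import re
from collections import defaultdict

# One transcript segment: start time, end time, side (A or B), then the text.
SEGMENT = re.compile(r"^(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+([AB]):\s*(.*)$")


def is_word(token):
    # Assumption: bracketed events such as [laughter] and the (( )) markers
    # around uncertain stretches are not counted as words.
    return not (token.startswith("[") or token in ("((", "))"))


def side_totals(lines):
    """Sum segments, words and seconds separately for each side."""
    totals = defaultdict(lambda: {"segments": 0, "words": 0, "seconds": 0.0})
    for line in lines:
        match = SEGMENT.match(line.strip())
        if match is None:
            continue  # skip anything that is not a segment line
        start, end, side, text = match.groups()
        entry = totals[side]
        entry["segments"] += 1
        entry["words"] += sum(1 for tok in text.split() if is_word(tok))
        entry["seconds"] += float(end) - float(start)
    return dict(totals)


def words_per_minute(words, seconds):
    return 60.0 * words / seconds


# The B-side segments from the fragment of conversation 5189 shown earlier.
SAMPLE = [
    "67.53 69.74 B: my best friend is my wife",
    "70.63 71.69 B: (( yes ))",
    "93.05 93.70 B: (( [noise] ))",
    "101.84 103.32 B: that's how i feel",
]

for side, t in sorted(side_totals(SAMPLE).items()):
    print(side, t["segments"], t["words"], round(t["seconds"], 2),
          round(words_per_minute(t["words"], t["seconds"]), 3))
```

The rate is just 60 times words divided by seconds, so 760 words in 221.01 seconds gives the 206.33 wpm shown for the A side of conversation #5.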

The distribution of speech rates across the 11,700 conversational sides in the corpus looks like this:

The sample mean of this distribution is 172.7 wpm, and the standard deviation is 27.23.

If we then calculate the overall speech rate of the male native speakers of American English, we get 174.3 wpm; and for female native speakers of American English, we get 172.6. (Note that these were calculated in a different way, based on the total words and the total time for a given category of speakers, instead of averaging the speech rates calculated for the individual speakers in the category.)
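The difference between the two ways of averaging can be seen in a small sketch, using only the four "m.a" sides that appear in the five-line sample listing. The function names are invented for illustration, and four sides say nothing about the corpus-wide figures; the point is only that the two calculations need not agree.

```python
# (words, seconds) for the four "m.a" sides in the five-line sample listing.
MALE_ANS_SIDES = [(773, 208.5), (1451, 382.64), (1043, 344.35), (760, 221.01)]


def pooled_rate(sides):
    """Total words over total time, in words per minute."""
    total_words = sum(words for words, _ in sides)
    total_seconds = sum(seconds for _, seconds in sides)
    return 60.0 * total_words / total_seconds


def mean_of_rates(sides):
    """Average of the per-side speech rates, in words per minute."""
    rates = [60.0 * words / seconds for words, seconds in sides]
    return sum(rates) / len(rates)


print(round(pooled_rate(MALE_ANS_SIDES), 2))
print(round(mean_of_rates(MALE_ANS_SIDES), 2))
```

The pooled figure weights each speaker by how long he talked, while the mean of rates gives every speaker equal weight, so long-winded speakers count for more in the first calculation.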

The overall speech rate of the men whose dialect was "other" (which was mostly non-native speakers, but also includes some native speakers of variants such as British and Australian) was 162 wpm, while for "other" women it was 160 wpm.

What about the effect of the conversational partner's sex? There wasn't much.

American native-speaker (ANS) women speaking to ANS women produced on average 907 words in 315 sec., at an average rate of about 173 wpm. ANS women speaking to ANS men produced 872 words in 303 sec., at the same average rate (173 wpm).

ANS men speaking to ANS men produced on average 940 words in 324 sec., at an average rate of about 174 wpm. ANS men speaking to ANS women produced an average of 934 words in 320 sec., for an average rate of about 175 wpm.

In mixed-sex conversations, the men on average used about 5.7% more time than the women did, and produced about 7% more words.

There were 1,694 mixed-sex conversations between native speakers of American English in this corpus, and in 931 of them (55%) the male participant took more overall speech time than the female participant. (And of course in the other 45%, the female participant took more overall speech time.)

These results, though basically negative, are not without interest. Given the general (and reasonably well supported) belief that women have greater verbal facility than men, we might have expected to see women's speech rates significantly higher than men's. However, there was essentially no difference. Given the various available stereotypes about sex and talk, we might have expected to see a large difference in average talk time in one direction or another. Of course, it's not clear which way it should go -- maybe women's stereotypical chattiness should make them talkier, or maybe men's stereotypical drive to dominate should make them the winners. Here as often, stereotypes are like horoscopes -- after the fact, they can be seen to have predicted just about any outcome. Anyhow, the effect in this case was a small one, so neither stereotype has much work to do.


This corpus was created for the purpose of training speech recognition systems, not for linguistic or sociological research. The transcripts are not perfect, and neither is their alignment with the audio time lines -- in both cases, the process was designed to optimize cost-benefit tradeoffs for ASR research.

Also, the demographic variables are by no means guaranteed to be orthogonally varied -- the distribution of geography, age, educational level, conversational topic and sex may contain some relevant partial correlations. (I don't know that they do, but I also don't know that they don't.) Finally, different portions of the transcripts were done by different groups using different methods -- although nominally the same specifications.

For all of these reasons, and some other ones, conclusions based on this corpus should be checked for the possible influence of uncontrolled demographic variables, and a random sample of the associated data should be re-transcribed and/or re-aligned in order to estimate the relevant rates of error and/or bias.

In my experience, essentially all sources of empirical evidence, in linguistics and in other subjects, are subject to similar sorts of questions. In this case, the good news is that all the relevant raw information -- the audio, the transcripts, and the demographic tables -- has been published, and so the empirical foundation of any results can be checked in these and other ways.

I should not need to stress that the patterns shown in telephone conversations between pairs of American strangers speaking about assigned topics are not necessarily characteristic of other kinds of conversations or other groups of speakers.


November 10, 2005

"I don't think that's accurate"? I don't think that's accurate

The official transcripts archived at the White House website tend to be relatively trustworthy representations of public speaking by President Bush and other officials. The transcribers apparently feel no compulsion to clean up the notorious disfluencies of the President (as with his recent substitution of "marriage" for "merits" when introducing Samuel Alito, or his frequent use of the singular copula with a plural noun phrase). But there's a controversy over the White House transcript of a press briefing by the President's mouthpiece, Scott McClellan, and it's not over some nitpicky grammatical point like subject-verb agreement. Rather, it's over a statement that could have serious legal implications in the ongoing CIA leak investigation.

At issue is a short response McClellan interjected into a question from NBC News correspondent David Gregory at the Oct. 31 press briefing. First, here is a transcript of the exchange as provided by the Federal News Service and archived by LexisNexis:

Q Whether there's a question of legality, we know for a fact that there was involvement. We know that Karl Rove, based on what he and his lawyer have said, did have a conversation about somebody who Patrick Fitzgerald said was a covert officer of the Central Intelligence Agency. We know that Scooter Libby also had conversations.

MR. MCCLELLAN: That's accurate.

Now here is the official White House transcript:

Q Whether there's a question of legality, we know for a fact that there was involvement. We know that Karl Rove, based on what he and his lawyer have said, did have a conversation about somebody who Patrick Fitzgerald said was a covert officer of the Central Intelligence Agency. We know that Scooter Libby also had conversations.

MR. McCLELLAN: I don't think that's accurate.

The question is transcribed exactly the same, but the answer is obviously quite a bit different. Congressional Quarterly agreed with the FNS transcript of "that's accurate" and also published it that way. But when White House officials discovered the diverging transcriptions, they asked CQ and FNS to change their version to match the official one, according to a Nov. 7 article in CQ by Chris Lehmann (text provided by Wonkette, aka Ana Marie Cox, who happens to be married to Lehmann):

When the White House noted the discrepancy, officials asked CQ editors to revisit the wording of McClellan's reply. This was curiouser still, since while one could conceivably argue that McClellan tripped over his intention to say "That's inaccurate," his delivery is far too rapid-fire for the expansive wording "No, I don't think that's accurate."

CQ Transcriptions has declined to alter its account; FNS has not done so, either.

The video supplied on the White House website certainly seems to support Lehmann's account. (You can find the relevant portion at about 5:30 in the White House video, or you can just watch the excerpt given on the Think Progress blog.) There is simply no way that McClellan could have fit "I don't think..." before "...that's accurate," unless the video is somehow missing a crucial second or two. Nonetheless, the White House continues to stand behind the official transcript, as reported by Editor and Publisher:

White House press office spokeswoman Dana Perino confirmed that her office had requested a review of the transcripts, noting, "it was simply to point out that the official transcript by the White House stenographer had it as it was released and that is all it was," she said, saying the White House transcript was never altered.

When asked about the fact that the White House version contradicts video accounts of the briefing, Perino added, "the White House stenographer was in the room and I was in the room" and they heard McClellan say "I don't think that's accurate."

It's a mysterious case, one that already has bloggers invoking Orwell and vanishing commissars. It doesn't seem possible that the White House video could have skipped over the first segment of McClellan's response, if there are indeed multiple recordings of the exchange that have him saying only "That's accurate." So how else to explain the account of Perino and the stenographer? Did they hear what they wanted to hear?

In his CQ article, Lehmann begins by noting that "semantics can loom large in the history of a White House scandal" and compares the transcript dispute to Bill Clinton's famous rumination over "what the meaning of the word 'is' is." Lehmann concludes that "complaints about inaccuracy may ultimately depend on what the meaning of 'accurate' is." But this is not a semantic question about the meaning of the word accurate. It's a question of how the government and the media go about making authoritative representations of public discourse in print and audiovisual forms, and who we trust to make those representations. It just so happens that these issues of representational accuracy hinge on a legally charged piece of discourse about accuracy itself.

[Update #1: On the general subject of Scott McClellan's pragmatic opacity, see this post from Polyglot Conspiracy detailing McClellan's wanton disregard for Gricean maxims.]

[Update #2: Nancy Wiegand of the University of Michigan sends along an excellent analysis of the situation, which is worth reproducing in its entirety:

There's just no way that McClellan ever wanted to admit in a press briefing that the situation as described by David Gregory in the opening sentence of his question was "accurate". So why would he say it? There's just something way the hell wrong with that picture, that any explanation of the discrepancy between the CQ/FNS transcripts and the WH version has to take into account.

You (and others) are absolutely right -- this is no slip of the tongue or grammatical error, he does have a rapid-fire delivery, and there is absolutely no room for him to have included "I don't think" between the end of the question and "that's accurate". But it's also true that turn-taking is not that cut and dried in many if not most heated conversations, ones where one or both speakers have a position to maintain or a point to prove and are feeling under considerable pressure to do so. So off I went to look at the video clip, thinking it was perhaps possible that McClellan had begun talking, and uttered the "missing" 3 words, before the questioner had finished.

Well, having watched the video excerpt several times, it's pretty hard to tell whether his mouth starts moving before the end of the question, though there's certainly a case to be made that it does. And I don't think you can actually hear him speaking. (But I also grew up as the daughter of a sound recording engineer, who taught a couple generations of Hollywood sound men, so the mantra "your soundtrack is only as good as your boom man" comes immediately to mind. Okay, I'm dating myself, but nevermind.) Gregory's voice is louder and clearer than McClellan's. I don't know how these things are taped, but isn't it possible that a comment by McClellan "underneath" the voice of the questioner wouldn't be picked up?

So, as I said, the evidence for mouth movements is difficult to interpret. It's a small picture on my computer screen, not close up, and it goes by fast without me being able to pause it.

But there was other evidence that is in fact much clearer support for the "interjected comment" hypothesis. It's the body language.

Look at him. He starts out listening readily. Then his mouth opens, and closes again. He's almost immediately got something in his mind that he wants to say, and he's just waiting for the question to stop so he can say it. He nods. ("Yes, I've got your gist, I'm ready to reply."). At this point, his expression is still open — he's listening, though he's also blinking more. He nods again. But as the question keeps going on, he gets impatient to be able to make his point. He gets more tight lipped. He shifts his balance from one foot to the other, then back. (If I wanted to be unkind, I'd say he looks like my 8-year-old son when he really needs to pee.) He kind of twists his head around a bit, as if his collar is bothering him. And then, the most telling point, he gets tight lipped, Gregory stops for a breath, and as McClellan speaks, he shakes his head. It's a quick shake, so it goes by really fast, but it's there.

And in fact — more evidence still — it wasn't the end of the question at all. McClellan's comment, whichever it was, is interjected into the middle of Gregory's speech, at basically the first possibility, and there is hardly a split second between when he finishes and Gregory continues with the rest of his question ("So aside from the question of legality here, you were wrong.") McClellan is bound and determined to take the very first chance to contradict what Gregory is saying.

There is simply no way that McClellan looks and acts like a person who is agreeing with the person who is speaking to him so aggressively, aiming his question in the hard-hitting tone of someone wanting to put the questionee on the defensive. And, come to think of it, Gregory doesn't appear to take McClellan's comment as agreement — he goes right on, ... "you were wrong".

I agree with Wiegand's dissection of McClellan's delivery, and I think that all of the paralinguistic cues may have informed the White House stenographer's interpretation of McClellan's interjection. But based on the video evidence I still don't see how the beginning of McClellan's comment could have occurred "underneath" Gregory's voice. I chalk it up to a slip of the tongue, perhaps due to overeagerness on McClellan's part to make an interjection. It's just perplexing that Dana Perino could not have simply said that McClellan intended to say that he didn't think Gregory's point was accurate. There seems to be a great deal at stake in fixing an apparent slip in the transcript itself.]

[Update #3: McClellan has addressed the issue in his typically opaque manner in an interview with Newsweek:

On Friday, McClellan told NEWSWEEK that he had reviewed the video and requested White House stenographers to "take another look." "If there's something wrong, we'll correct it immediately," he said, denying the White House had intentionally altered the transcript. McClellan would not say if he misspoke, but told NEWSWEEK he "disagreed" with Gregory's statement that Fitzgerald had described Plame as a "covert officer."]

November 09, 2005

Disentangling the entanglements

The announced "retirement" of Judith Miller from the New York Times helps to resolve a couple of loose ends from my Oct. 24 post, "Semantic entanglements." Appended to the memo that executive editor Bill Keller sent to the Times staff this afternoon is the text of a letter from Keller to Miller. In it, Keller addresses two points that Miller had complained about in his earlier staff memo regarding the CIA leak case.

First, you are upset with me that I used the words "entanglement" and "engagement" in reference to your relationship with Scooter Libby. Those words were not intended to suggest an improper relationship. I was referring only to the series of interviews through which you -- and the paper -- became caught up in an epic legal controversy.

Second, you dispute my assertion that "Judy seems to have misled" Phil Taubman when he asked whether you were one of the reporters to whom the White House reached out with the Wilson story. I continue to be troubled by that episode. But you are right that Phil himself does not contend that you misled him; and, of course, I was not a participant in the conversation between you and Phil.

Once again it's helpful to use the lens of speech act theory. In the first case, Keller asserts that his use of entanglement and engagement was perfectly innocent, lacking the illocutionary force of an indirect speech act accusing Miller and Libby of an improper relationship. Nonetheless, Keller's use of those words had the unintended perlocutionary effect of offending Miller. In the second case, Keller modifies his claim that Miller's statements to Washington bureau chief Philip Taubman constituted an apparent illocutionary act of misleading, by acknowledging that there was no perlocutionary effect of Taubman feeling misled.

There's enough fodder for a whole thesis on journalistic pragmatics lurking in those memos.

One for the next edition

This looks like a pretty good model for the process behind many of the entries in Adam Jacot De Boinod's The Meaning of Tingo:

Of course, PartiallyClips has a LiveJournal, so that Rob Balder knows whereof he writes.

November 08, 2005

The oldest Hebrew alphabet?

The New York Times reports on a fascinating archeological discovery made in Tel Zayit, southwest of Jerusalem: a stone dated to the 10th century BCE inscribed with an abecedary (the letters of the alphabet written in their traditional order). According to Ron E. Tappy of the Pittsburgh Theological Seminary, who directed the dig, this is the earliest known rendering of the Hebrew alphabet, distinct from Phoenician predecessors. Other experts are not so sure, though everyone seems to agree that the Tel Zayit stone is an important find.

Tappy's interpretations of this artifact and others from the excavation project fit into his controversial theory about the Israelite kingdom of the era, which he argues was a sophisticated political entity with extensive literacy. His arguments are apparently predicated on the Biblical history of David and Solomon. Tappy will report on his findings in Philadelphia next week at the meetings of the American Schools of Oriental Research and the Society of Biblical Literature. According to the Times, Tappy's critics are expected to challenge his conclusions at the meetings.

(The Pittsburgh Post-Gazette has more on the controversy.)

[Update, 11/9/05: Tappy has held a news conference and has added information about the Tel Zayit inscription on the excavation website.]

[Update, 11/10/05: Tappy's hometown papers continue to provide the best coverage. See articles about the official announcement of the discovery in Pittsburgh's Post-Gazette and Tribune-Review. The latter identifies the actual discoverer of the stone: Dan Rypma, an undergraduate volunteer from Colorado State University.]

Another breakfast experiment

This time the subject is sex and murmurs of assent. Read on for the details, which are less interesting than you think. Or maybe more interesting, depending on your outlook and ... Never mind, you'll see what I mean.

Thanks to a nice database access program written by Mike Schultz (now at Microsoft Research), with a web interface written by Bill Clark and Shawn Medero, I can submit a query like

"uh-huh" & sex:female & sex_opp:male

and 57 milliseconds later, learn that my query "returned 11749 hits in 2719 documents" from a large collection of transcribed telephone conversations. The meaning of the query is "tell me about the search string 'uh-huh', looking only at female speakers who are talking to male conversational partners". I can also read the transcripts of the hits, and listen to the associated audio, but that takes too much time for current purposes -- this morning, I've only got a half an hour over a couple of cups of breakfast coffee.
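For readers who want the query's meaning spelled out, here is a minimal sketch in Python. It is not the actual database program, whose internals aren't described here; the record layout, the field names (borrowed from the query syntax) and the toy data are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Side:
    """One conversational side (hypothetical record layout)."""
    doc_id: str
    sex: str        # sex of the speaker: "female" or "male"
    sex_opp: str    # sex of the conversational partner
    text: str       # transcript of this speaker's turns

def search(sides, term, sex, sex_opp):
    """Return (hits, documents) for `term` among sides matching both filters."""
    hits = 0
    docs = set()
    for side in sides:
        if side.sex != sex or side.sex_opp != sex_opp:
            continue
        n = side.text.lower().split().count(term)
        if n:
            hits += n
            docs.add(side.doc_id)
    return hits, len(docs)

# Invented toy data, just to exercise the function.
toy = [
    Side("d1", "female", "male", "uh-huh yeah uh-huh i think so"),
    Side("d1", "male", "female", "well uh-huh"),
    Side("d2", "female", "female", "uh-huh right"),
    Side("d3", "female", "male", "no not really"),
]
hits, docs = search(toy, "uh-huh", sex="female", sex_opp="male")
print(f"returned {hits} hits in {docs} documents")
```

The two filters are simply conjoined, which is how I read the "&" in the query; a "document" here is counted once no matter how many hits it contains.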

In an earlier post, I looked at the frequency of uh, um, uh-huh, mm-hmm etc. as a function of speaker sex and age. This morning, I thought I'd check the effect of the sex of conversational partners.

For the first mini-experiment, I checked the counts of uh-huh and um-hum (which is how these transcripts mostly represent the sound more commonly transcribed as "mm-hmm"). Summing the various ways of transcribing assenting murmurs, and normalizing by the counts for the word the for the same speakers in each category, the results look like this:

Male speakers produced assenting murmurs 25% more often when talking with a woman than when talking with another man. Female speakers murmured assent 20% more often in cross-sex than in same-sex conversations. (The question of statistical significance will have to wait for another time -- but the Ns are pretty big here.)
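The arithmetic behind a statement like "25% more often" is simple, and a small Python sketch may make it concrete. The function names and the counts below are invented for illustration; they are not the corpus figures.

```python
def rate(murmurs, the_count):
    """Assenting murmurs per occurrence of 'the' for one group of speakers."""
    return murmurs / the_count

def percent_more(cross_sex_rate, same_sex_rate):
    """How much higher the cross-sex rate is, as a percentage of the same-sex rate."""
    return 100.0 * (cross_sex_rate - same_sex_rate) / same_sex_rate

# Invented counts, chosen only to make the arithmetic easy to follow.
same = rate(murmurs=400, the_count=10_000)
cross = rate(murmurs=500, the_count=10_000)
print(round(percent_more(cross, same), 1))
```

Dividing by the count of "the" first is what makes the two conditions comparable, since the groups differ in how much total speech they contributed.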

Andrew Gelman over at Statistical Modeling, Causal Inference and Social Science (and by the way, I think that's the longest name among the blogs that I read regularly) suggested some improvements in my earlier um/uh/mmm-hmm plots. So in the plot above, I've started the y-axis at 0, and made the males blue and the females red. And here are some plots of the effects on "uh", "um", "yes" and "no", arranged in Andrew's recommended par(mfrow=c(2,2)) format (that's R-ese, for those of you who are not in the R subculture).

The counts for uh and um have had the counts for uh-huh and um-hum subtracted from them, as in my earlier post.

Note that these conversations involve strangers having telephone conversations on an assigned topic, and the results may not necessarily generalize to other sorts of interactions. In fact, that lesson is implicit in the results themselves, and in my opinion is the most important result of this sort of exercise.

The counts that we get from analyzing people's behavior can be significantly affected by many properties of the people and their context, and also by many interactions among these properties. We need to be careful about jumping to conclusions, especially when dealing with variables like sex that have a wide range of associated stereotypes with strong emotional loading.

Posted by Mark Liberman at 09:03 AM

Making Yucatec Maya "cool again"

Back in July we heard the intriguing news that Mel Gibson's next film project, Apocalypto, would be shot entirely in "Mayan." At the time it wasn't clear which of the many Mayan languages this might refer to, but now the situation has been clarified — sort of — by Gibson himself.

In an Oct. 28 news conference in Veracruz, Mexico, Gibson gave a few more details about his "action-adventure film of mythic proportions." AP reports:

The movie is scheduled to begin production Nov. 14 and will be shot almost entirely in the jungle of Mexico's Veracruz state.

The film's stars will be unrecognizable to most moviegoers, and they will speak in the Mayan tongue of Yucateco, Gibson said. It will be light on dialogue and heavy on images and action. It's set 600 years ago, prior to the 16th-century Spanish conquest of Mexico and Central America.

Yucateco, or maya yucateco, is the Spanish term for Yucatec, or Yucatec Maya, a language spoken by about a million people in Mexico's Yucatán Peninsula (with some additional speakers in Belize and northern Guatemala). It's the most obvious candidate for a Mayan-language historical drama set in pre-Conquest Mexico, since modern Yucatec is still widely spoken and is a direct descendant of Classic Maya. But that doesn't stop journalists from calling it an "obscure Mayan dialect" (as in a photo caption for the AP article as well as a July Variety article), though it is neither particularly obscure nor a dialect.

The AP article also says that the movie will star "unknown Mexican actors speaking in an ancient tongue." Despite the fact that modern Yucatec has retained many features of Classic Maya, its "ancientness," like that of any living language, is highly disputable. This could mean, however, that the translators hired by Gibson have attempted to render the dialogue in a reconstruction of Classic Maya as spoken six centuries ago, with Yucatec Maya as the nearest modern approximation. From a linguistic point of view, one hopes that the end result is a bit more skillfully done than the "authentic" dialogue that ended up in Gibson's last project, The Passion of the Christ. (For discussion of the Aramaic and Latin used in The Passion, see Language Log here and here, and also Language Hat here and here.)

Gibson's motivation for linguistic verisimilitude is not tied to his religious devotion, as it was with The Passion. But according to the Los Angeles Times, Gibson is on a different kind of mission with this film: to make it "cool again" to speak a Mayan language (as cool as it was 600 years ago?).

Gibson said that the plot of "Apocalypto" — a Greek word that translates as "new beginning" — concerns an Indian family man who "has to overcome tremendous odds to preserve what he values the most." The movie will employ relatively unknown actors along with hundreds of extras and will utilize Mayan dialect.

Gibson hopes that one effect may be to bolster a threatened idiom that is frequently treated with disrespect, in Latin America. "My hope is that it [the movie] makes this language cool again and that they [indigenous people] speak it with pride," he said.

It's unclear whether Gibson has been in contact with any activists or linguists involved in the Mayan language revitalization movement. (For an overview of revitalization efforts, see Nora England's article in the Dec. 2003 issue of American Anthropologist.) But he claims to have immersed himself in Mayan culture and history, a process he described at the news conference as "kind of this anthropological journey." And a Reuters report says that his research for the film has drawn on "input from indigenous groups and Spanish mission texts from the 1700s and Mayan language translators."

Beyond that, Gibson has remained cryptic about the film's content. He did mention, however, that one of his major inspirations is the Popol Vuh, a sacred manuscript of mythic stories written in Quiché (a Guatemalan Mayan language) shortly after the Conquest. There has already been speculation from the apocalyptically inclined about how a Gibsonian reading of the Popol Vuh might dovetail with Christian millenarian prophecies of the End Times. So perhaps Apocalypto won't be so far removed from Passion territory after all.

[Update #1: Here is a more extended report of Gibson's linguistic comments, from the EFE News Service via Factiva (apparently translated from English to Spanish and back to English):

The director, who said his fascination with the mysteries of the Maya drew him to the project, told reporters that the production would treat Maya culture with great respect.

He described the plot of the film as "a universal story" and said that an additional incentive for filming the movie in a Maya dialect was so that the Maya people and other Indian groups could feel pride in their languages.

"In Mexico and other parts of the world, there are languages that are becoming extinct. I hope 'Apocalypto' in Maya generates an interest in indigenous languages and helps preserve them," he said.

Gibson noted that one of the translators who worked with him on the script told him that many Maya-speaking schoolchildren are laughed at because of their language and feel ashamed.

EFE also reports that Gibson's Mayan reading list included "texts by 16th century bishop Friar Diego de Landa y Calderon, who wrote the book 'La relacion de las cosas de Yucatan' (The relation of things of the Yucatan)."]

[Update #2: Language Hat says the translation of Greek apocalypto (ἀποκαλύπτω) as 'new beginning' is "ridiculous," since it is a verb meaning 'uncover; disclose, reveal.' That was the gloss given in the Los Angeles Times article, while the AP article provides some more context:

The film's title, "Apocalypto," a Greek word for an unveiling or new beginning, "just expresses so well that I want to convey," Gibson said. "I think it's just a universal word. In order for something to begin, something has to end. All of those elements are involved. But it's not a big doomsday picture or anything like that."

Gibson's "new beginning" interpretation seems to resonate with certain New Age readings of the Popol Vuh and other Mayan sacred texts. One popular theory, as mentioned on the apocalyptic page linked above, is that a great cataclysmic event will occur on December 21, 2012, when the Mayan calendrical cycle is said to end. There is a whole body of mystical New Age literature on this subject, as a Web search on 2012 quickly reveals. One blurb for the book Maya Cosmogenesis 2012 says that in the fateful year "a new age is expected, one in which humanity will mutate spiritually into a new relationship with space-time and the material universe." Whether Apocalypto relates to these fanciful theories is perhaps known only by Mr. Gibson himself.]

[Update #3: For more on Friar Diego de Landa and his mixed legacy, see Suzanne E. McCarthy's post on Abecedaria.]

[Update #4: James Terry takes issue with my description of Yucatec Maya in a comment on Language Hat:

Yucatec Maya is not a direct descendent of Classic Maya, nor is it the modern Maya language most closely related to Classic Maya (often referred to as Ch'olan). The Yucatecan languages (Yucatec, Itza, Lacandon) are part of a northern branch that split off about 3000 years ago from the lines that formed the southern Mayan languages. Ch'olan was part of the southern branch. The descendents of Ch'olan are Cholti (extinct), Chorti, Chol, and Chontal. These are the languages most closely related to Ch'olan/Classic Maya; closest of all is possibly Chorti, spoken today in a small area near the Guatemalan/Honduran border.

In my defense, I never claimed that Yucatec Maya is the language "most closely related" to Classic Maya. Also, as I understand it, epigraphic evidence suggests that the Classic Maya spoke both Ch'olan and Yucatecan languages, so neither branch can claim to have a more direct line. But it was misleading for me to mention "Classic Maya" in the first place, since that usually refers to the "golden age" of Mayan civilization ending around AD 900, long before the time depicted in Gibson's film. I should have said that modern Yucatec Maya is descended from the "classical" language of Yucatán as spoken in the 15th-16th century, which is presumably what the Apocalypto translators have tried to approximate.]

Tawk of the Town

In the 11/14/2005 issue of the New Yorker, John Seabrook has a Talk of the Town piece (titled "Talking the Tawk") that begins

Professor William Labov is to American dialect what Lewis and Clark are to American geography.

The occasion is the launch of The Atlas of North American English, which Bill Labov and others (especially Sherry Ash and Charles Boberg) have been working on for the past decade. More on this later -- for now, just go read what Seabrook has to say!

Well, I'll add one small thing. When I first put in the Seabrook quote, I thought that the singular "American dialect" must be a typo. But no, I got it by cut-and-paste, and there it is in the first sentence in the magazine, singular as can be. I guess it's being treated as a strange kind of mass noun, sort of like talking about "American film".

I'm not used to seeing dialect used that way -- for me, Seabrook's sentence is like "Francis Ford Coppola is to American movie what X is to Y..." The many other uses of dialect in Seabrook's article are all either plural ("the speakers of Southern dialects"), specifically singular ("the Inland North dialect") or internal to complex nominals ("the most extreme dialect change in the country").

Posted by Mark Liberman at 05:00 PM

November 06, 2005

Young men talk like old women

Well, in a couple of specific respects, anyway. Details follow...

Over the past few years, the Linguistic Data Consortium (LDC) has collected and transcribed a large number of telephone conversations for the purpose of speech recognition research. Some of this has already been published (sample catalog entries are here and here), and the rest will be published soon. The collection is an interesting basis for some new sorts of linguistic research, in my opinion, and below I present a small example of a suggestive result -- about the interaction of age, sex and fluency -- that took me about half an hour to produce.

First, let me try to explain why I think collections like this one represent a new research opportunity. One new thing is simply the fact of access to an existing corpus: because the audio and transcripts are already done, and some demographic data is available about the speakers, it's easy to ask and answer many simple questions that wouldn't motivate the time, effort and funding needed to create a special-purpose data collection. Another new thing is the scale of this particular collection: combining various publications into a single database, we have 28,000 conversational sides (and therefore about 14,000 conversations) whose transcripts comprise more than 26 million words, involving about 12,000 speakers from all over the U.S. (with some Canadians and a few speakers of other kinds of English). Finally, the fact that the material is published means that research results can easily be checked, replicated (or challenged) and extended by others.

OK, here's my little result. I took a quick look at demographic variation in the frequency of the filled pauses conventionally written as "uh" and "um". For technical reasons that I won't go into here, I used the frequency of the definite article "the" as the basis for comparison. Thus I selected a group of speakers (e.g. men aged 60-69), counted how often they were transcribed as saying "uh", and to normalize that count (since the number of people in each category was different) I divided by the number of times the same speakers were transcribed as saying "the".

If we take the relative frequency of "uh" as a measure of disfluency, then the graph above shows that

  • disfluency (or at least uh-usage) increases with age;
  • at a given age, men are more disfluent than women (or at least they use uh more than women).

As a result, men 20-39 have roughly the same uh/the ratio as women 60-69.
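The normalization described above (the count of "uh" divided by the count of "the" for the same group of speakers) can be sketched in a few lines of Python. This is only an illustration under assumptions: the tuple layout, the whitespace tokenization and the toy transcripts are mine, not the format of the actual collection.

```python
from collections import Counter, defaultdict

def ratio_by_group(sides, word="uh", base="the"):
    """Map each (sex, age band) group to count(word) / count(base).

    `sides` is an iterable of (sex, age_band, transcript) tuples,
    a hypothetical layout invented for this sketch.
    """
    counts = defaultdict(Counter)
    for sex, age_band, transcript in sides:
        # Pool word counts over every side belonging to the same group.
        counts[(sex, age_band)].update(transcript.lower().split())
    return {
        group: c[word] / c[base]
        for group, c in counts.items()
        if c[base] > 0  # skip groups with no "the" at all
    }

# Invented transcripts, just to exercise the function.
toy = [
    ("male", "60-69", "uh the dog uh chased the cat"),
    ("male", "60-69", "the uh weather"),
    ("female", "20-39", "the bus the train uh the car the bike"),
]
ratios = ratio_by_group(toy)
print(ratios)
```

Pooling counts within a group before dividing, rather than averaging per-speaker ratios, matches the description in the text; whether that is the better choice statistically is a separate question.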

The facts for "um" are quite different:

The graph above shows that

  • the frequency of "um" decreases with age;
  • at a given age, women use "um" more than men.

Again, the rate of "um" usage for the younger men is almost the same as the rate of "um" usage for the older women.

It's not entirely surprising that "uh" and "um" pattern differently. For some background, read this 1/5/2004 post on statistical language modeling of filled pauses, which also references an excellent 1/3/2004 NYT article by Michael Erard entitled "Just Like, Er, Words, Not, Um, Throwaways". Erard cites a 2001 Language and Speech article by Heather Bortfeld which he describes as finding that "Men say uh and um more than women, though their overall disfluency rate was the same." I'm not sure why Bortfeld's results on um are different from what I found -- more on this later.

[Update: the paper is Bortfeld H.; Leon S. D.; Bloom J. E.; Schober M. F.; Brennan S. E., "Disfluency Rates in Conversation: Effects of Age, Relationship, Topic, Role, and Gender", Language and Speech, 2001, 44(2), 123-147. There is no disagreement after all -- the Bortfeld et al. paper just aggregated all "fillers", including both uh and um, as a single count. Since the effects of age and sex on uh and um are apparently opposite, this may have blurred the results. One reason for this approach might have been that their total corpus size was about 192,000 words, and the counts of fillers for some of the demographic categories may have been fairly small.]

The paper featured in Erard's article is Herbert H. Clark and Jean E. Fox Tree, "Using uh and um in spontaneous speaking", Cognition 84 (2002) 73-111. Their notion is that "speakers use uh and um to announce that they are initiating what they expect to be a minor (uh), or major (um), delay in speaking. ... The evidence shows that speakers monitor their speech plans for upcoming delays worthy of comment. When they discover such a delay, they formulate where and how to suspend speaking, which item to produce (uh or um), whether to attach it as a clitic onto the previous word (as in “and-uh”), and whether to prolong it. The argument is that uh and um are conventional English words, and speakers plan for, formulate, and produce them just as they would any word."

The general idea is a persuasive one, but it doesn't explain the striking (apparent) effects of age and sex.

One last plot deals with the frequency of the assenting murmurs conventionally transcribed as "uh huh", "um hum" or "mm hmm". I needed to make these measurements in order to subtract "uh huh" counts from the counts for "uh", and "um hum" counts from the counts for "um". Indeed, in interpreting all of these graphs you should be concerned about the possibility that some other transcriptional or demographic issue is NOT being controlled for. The evidence should be regarded as provisional until I (or someone else) has the time to examine the demographics more carefully, and check a large enough sample of the original audio -- but the nice thing about published corpora is that these are straightforward if tedious tasks!

This graph shows that

  • the frequency of assenting murmurs increases with age;
  • at a given age, women use assenting murmurs more than men.
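The subtraction step mentioned above (removing "uh huh" from the "uh" counts and "um hum" from the "um" counts) might look something like the following Python sketch. It assumes the murmurs are transcribed as two separate tokens, so that each one adds a spurious "uh" or "um" to a naive token count; that assumption, the function name and the sample utterance are all mine.

```python
def adjusted_counts(tokens):
    """Count filled pauses after removing tokens that belong to assenting murmurs.

    Assumes murmurs are transcribed as two tokens ("uh huh", "um hum"),
    so each one contributes a spurious "uh" or "um" to the raw count.
    """
    raw_uh = tokens.count("uh")
    raw_um = tokens.count("um")
    # Adjacent token pairs, used to find the two-token murmurs.
    pairs = list(zip(tokens, tokens[1:]))
    uh_huh = pairs.count(("uh", "huh"))
    um_hum = pairs.count(("um", "hum"))
    return {
        "uh": raw_uh - uh_huh,
        "um": raw_um - um_hum,
        "murmurs": uh_huh + um_hum,
    }

# Invented utterance, just to exercise the function.
tokens = "uh huh i uh think um hum so um yeah uh huh".split()
result = adjusted_counts(tokens)
print(result)
```

If the transcripts instead hyphenate the murmurs, the same correction applies whenever the search matches "uh" inside "uh-huh", but the counting code would need to change accordingly.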

I guess that these results are more or less consonant with the current conventional wisdom about language and gender, which is probably a good reason to distrust them. And there are a half a dozen obvious caveats, which I'll discuss another time. Still...

A bit more about the data I used:

The overall collection paradigm was pioneered by the Switchboard project, carried out at Texas Instruments in 1990-91. Other conversations were collected at LDC between 1999 and 2004. Calls were bridged digitally through a "robot operator", which recorded them as well as keeping track of the participants (by assigned PIN and by phone number). Participants were paid to take part in one or more conversations on specified topics with randomly-selected partners. Participants could indicate interest in available topics when they enrolled, and could opt out at call time if they didn't want to discuss a given topic. There were more than 100 topics, involving short instructions like "Discuss recent social changes. How is life in America different today compared to living ten, twenty, or thirty years ago?" or "Do you believe that the US government should provide universal health insurance, or should at least make it a long term goal? How far in that direction would you be willing to go? What do you see as the most important pros and cons of such a program?" or "The topic is clothing. Please find out how the other caller typically dresses for work. How much variation is there from day to day? How much variation is there from season to season?".

The transcripts were produced in several different ways, but the largest number were done by a professional transcription service, to specifications intended to make the results as consistent as possible across the collection. In that portion of the data, the transcribers were encouraged to work quickly rather than to be absolutely accurate in disfluent regions, and so the filled pauses were probably somewhat undercounted.

The collection that I used can be searched at LDC Online, including the ability to read individual transcripts and listen to the associated audio. The full collection is available to LDC members, but anyone can get a guest account to search the "Switchboard" part of the corpus, comprising about 5000 conversational sides.

Posted by Mark Liberman at 08:35 AM

Guttural politics

On Friday, in the waning days of a nasty gubernatorial race in New Jersey, Democratic candidate Jon Corzine was confronted by reporters about allegations of an extramarital affair with one of his former staffers. Corzine angrily replied:

I'm not going to comment on that kind of low, guttural politics going on in this state.

And back in April, Rush Limbaugh issued the following bizarre on-air quasi-apology for using the term blow jobs during a rant about Al Gore and Bill Clinton (as transcribed by Billboard Radio Monitor):

I meant to say 'oral sex' throughout, but the guttural term escaped my pouty lips in a moment of pure, unbridled passion.

What's happened to the word guttural? A phonetic (or folk-phonetic) term for the articulation of consonants near the back of the vocal tract now gets applied to everything from sexual obscenities to New Jersey politics. How did it end up in the metaphorical gutter?

The simple explanation is that guttural has fused in many people's minds with gutter, particularly in the attributive sense of 'low-down, dirty, vulgar,' as in gutter politics or gutter mouth. So this is an eggcornic confusion. But in which direction is the eggcorn heading? Is it simply a substitution of gutter with the similar-sounding guttural? Or has guttural already changed its sense under influence from gutter, to the point that the word might more accurately be spelled as gutteral? Note that both of the above examples are from media transcripts of public speech; even if a transcriber wanted to represent the word as gutteral, an officious spellchecker, computerized or human, would quickly "correct" it to guttural. (For examples of gutteral in unedited text, see the Eggcorn Database.) 

Perhaps it doesn't matter whether we understand this phenomenon as a replacement of gutter with guttural, or as a reanalysis of guttural as gutteral (meaning 'of or in the gutter').  Either way, guttural and gutter have been phonetically and semantically conflated. A more interesting question is how this conflation developed in the first place.

For more than four centuries, guttural (from Latin guttur 'throat' via Medieval Latin gutturalis) has been used to describe consonants articulated towards the back of the oral cavity. Modern phoneticians would more precisely categorize such consonants into velar, uvular, pharyngeal, and glottal articulations. The term arose as a way to describe certain Hebrew consonants, particularly those represented by the letters het (voiceless pharyngeal or velar fricative), ayin (voiced pharyngeal fricative or approximant), alef (glottal stop), he (voiceless glottal fricative), and sometimes resh (voiced uvular fricative). (There is a great deal of variation in the phonetic realization of these consonants, as outlined here and here.)

Guttural came to be used as a descriptor not just for consonants in Hebrew and other Semitic languages like Arabic, but also for some sounds in European languages, such as the voiceless velar fricative /x/ in German Bach, Dutch van Gogh, and Scottish loch. Of course, English has consonants with velar articulation (the stops /k/ and /g/ and the nasal /ŋ/), not to mention a glottal fricative (/h/), but guttural has been inexactly associated with foreign consonants that sound "throaty" to English speakers. With the advent of modern articulatory phonetics, the term has largely dropped out of use among linguists (though it still retains some currency in studies of Hebrew).

The folk-linguistic sense lives on, however, in English speakers' impressionistic portrayals of the perceived "harshness" of languages like German or Arabic, or even of other English dialects. In contemporary usage it's one of those words that gets thrown around whenever a speaker finds an alien speech pattern somehow displeasing. (Merriam-Webster aptly defines this sense as "being or marked by utterance that is strange, unpleasant, or disagreeable.") A quick Web search turns up such examples as "a guttural English/Chinese mishmash," "a guttural Yorkshire accent," "a guttural Southern drawl," "guttural Ebonics," and countless others. Very often, of course, guttural modifies nonlinguistic vocalizations (roar, laugh, squawk, purr, growl, yell, cackle, groan, etc.). Such collocations only underscore the fact that speech described as guttural may be deemed not just substandard but sublinguistic (at times even subhuman).

The value of speech patterns labeled guttural, in other words, is already quite low in the estimation of many, even without the help of the similar-sounding but etymologically unrelated gutter.  Add to this the fact that gutter is often applied attributively to indicate coarse speech ("gutter language," "gutter talk," "gutter slang," etc.), and the conflation of guttural and gutter to describe vulgar or distasteful forms of communication seems practically inevitable. From there it's a short step to Jon Corzine's "guttural politics."

Sometimes guttural is parasitized not just by gutter but by gut as well. Thus we find many examples of guttural (or gutteral) with a sense of 'visceral' or 'intense.' (See, for instance, the hundreds of Google hits for "guttural/gutteral reaction" and "guttural/gutteral instinct.") What is happening, then, is that as the articulatory sense of guttural becomes obscured over time, the word gets pressed into service as a readymade adjectival form for either gut or gutter, especially in contexts where those words are used attributively (e.g., gut reaction, gutter politics).

It's not always easy to pick apart the tangled semantic web of gut, gutter, and guttural. When Howard Dean emitted his famous scream in his concession speech after the 2004 Iowa caucuses, it was often described at the time as "guttural." Did that mean the scream was throaty, vulgar, or visceral? For many observers, it was all three at once. Now that's guttural politics.

Posted by Benjamin Zimmer at 01:36 AM

November 05, 2005

Better to x than to not y

I read the following in this article on condensed Bibles in this weekend's NYT (emphasis added):

"But if the man in the street is not reading the Bible," Mr. Budd continued, "you have to ask, isn't it better to read a short version than not to read the long version?"

Obviously, what Mr. Budd means is: "isn't it better to read a short version of the Bible than not to read (any version of) the Bible at all?" But what he actually said isn't as clear as that, and the ambiguity of his statement seems to be what's at the core of the controversy over condensed Bibles reported in the article.

First, let's get rid of the interrogative format and focus on the gist of the claim couched in Mr. Budd's statement:

it's better to read a short version than not to read the long version (of the Bible)

This statement makes two distinctions among people: those who read a short version of the Bible (vs. those who don't) and those who don't read the long version (vs. those who do). Let's give these groups some names:

  • Short-version readers = SV
  • Short-version non-readers = ¬SV
  • Long-version non-readers = ¬LV
  • Long-version readers = LV

We can thus re-write the essence of Mr. Budd's statement as follows, where '>' means "is better than" (or "is greater than on the goodness scale"), and SV / ¬LV stands for being a member of the relevant group.

SV > ¬LV

In other words, being a short-version reader is better than being a long-version non-reader (all else being equal).

Putting aside the relatively uninteresting (and quite likely null) set of folks who read both versions (the intersection of SV and LV), the problem here is that SV is a proper subset of ¬LV: if you're a short-version reader, you're (likely) a long-version non-reader. So what Mr. Budd's statement means depends on how goodness is measured.

Here's what I mean. Suppose everyone starts out with 50 points on the goodness scale. Long-version non-readers (members of ¬LV) get -10 points, and short-version readers (members of SV) get +2 points. So if you're a member of both groups, you'll have 42 points (all else being equal), whereas if you're only a member of ¬LV you'll have 40 points. Assuming (a) that SV and LV don't intersect, so SV is really shorthand for SV&¬LV, and (b) that ¬LV refers to the complement within ¬LV of SV&¬LV, namely ¬SV&¬LV, then under this conception of things, the statement [SV > ¬LV] above is basically true, because 42 > 40.

However, suppose that long-version non-readers simply get all their goodness points erased -- that is, they have 0 points no matter what. Then [SV > ¬LV] is false, because SV = 0 (again, assuming SV and LV don't intersect) and ¬LV = 0. In other words, there's no substitute for reading the long version of the Bible, which is exactly what folks who don't approve of condensed Bibles seem to be saying.
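For anyone who'd rather see the two scoring schemes spelled out, here is a minimal sketch in Python. The function names are my own labels, invented for illustration; the point values are the ones used above, and SV and LV are assumed not to intersect.

```python
BASE = 50  # everyone starts out with 50 points on the goodness scale


def additive_score(reads_short, reads_long):
    """First scheme: -10 for not reading the long version, +2 for reading a short one."""
    score = BASE
    if not reads_long:
        score -= 10
    if reads_short:
        score += 2
    return score


def erasing_score(reads_short, reads_long):
    """Second scheme: long-version non-readers have all their points erased."""
    if not reads_long:
        return 0
    return additive_score(reads_short, reads_long)


# SV here is shorthand for SV&¬LV, and ¬LV for ¬SV&¬LV, as in the text.
sv = (True, False)       # reads a short version, not the long one
not_lv = (False, False)  # reads neither

print(additive_score(*sv) > additive_score(*not_lv))  # compares 42 with 40
print(erasing_score(*sv) > erasing_score(*not_lv))    # compares 0 with 0
```

Under the first scheme the comparison comes out true; under the second it comes out false, which is the whole disagreement in miniature.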

Like I said, it's clear what Mr. Budd meant by what he said. But somehow, it's not clear that he said what he meant.


Shooting too good

In response to my question about word rage outside the Anglosphere, Bob Yates suggested the Zwiebelfisch feature at Der Spiegel. This is "Bastian Sicks Kolumne zur Sprachpflege" ("Bastian Sick's column on language hygiene"). Sick's latest book is Der Dativ ist dem Genitiv sein Tod, which features complaints about sporadic failures to use dative case marking according to traditional (?) principles. The particular example of "word rage" that Bob cited involves one of these missing datives:

Wenn unser Bundeskanzler nach Washington fliegt, hört man garantiert auf irgendeinem Kanal, dass er sich "mit dem US-Präsident" treffen werde. Jedem Korrespondenten dürfte es dabei eiskalt über den Rücken laufen.

When our Chancellor flies to Washington, we hear some television network promising us that he will meet with the US President. That should send shivers up every correspondent's spine.

(It should be "mit dem US-Präsidenten", not "mit dem US-Präsident".)

Sorry, but I'm not going to count this as "word rage". We're looking for some over-the-top anger, preferably with threats of physical attack, mutilation or death. (I'm assuming that the eiskalt über den Rücken business means that journalists should feel embarrassment for their profession, not that they should feel fear due to impending violent revenge...)

Another reader referred me to Leo, "an English-German forum sponsored by the University of Munich that serves professional translators between these languages (as well as French-German), and also attracts questions and responses from amateurs and interested bystanders". According to Martin,

... every now and then a flame war erupts over German usage, and it gets just as virulent as with English. Two issues that keep coming up are "Denglish" (the German equivalent of Franglais) and, just as with English, the wrong use of the apostrophe. The German's call it the "Deppenapostroph", i.e. the idiot's apostrophe.

But the same reader wrote back an hour later to say

I may have to retract my claim about having seen violent expressions in discussions of perceived misuse of German.

I just scanned through a few archives of the Leo website for examples of threatened violence, and didn't find a single suggestion of chopping or stabbling or smashing any language villains. The most violent acts that I saw was someone who said that they erased an "idiot's apostrophe" by spitting on it, and another who said they had to puke at some example of Denglish. So maybe this violence *is* something peculiar to us English speakers.

The score so far: disgust yes, violent anger no.

Adding to the examples in my earlier post, let me give a couple of classical quotations expressing English language rage. Professor Henry Higgins, in Shaw's Pygmalion, says to Eliza Doolittle

A woman who utters such depressing and disgusting sounds has no right to be anywhere—no right to live. Remember that you are a human being with a soul and the divine gift of articulate speech: that your native language is the language of Shakespear and Milton and The Bible; and dont sit there crooning like a bilious pigeon.

In "Why can't the English", from Lerner and Loewe's musical comedy version of Pygmalion, Higgins' implicit threat is stated more plainly:

By rights she should be taken out and hung
For the cold-blooded murder of the English tongue

Threats of summary execution are popular among real-life English prescriptivists as well. For example, with respect to a notice that a legal document "Does not need notarized", a livejournal denizen comments that "Someone need to be taken out back and shot for that!" Back at the Guardian's talk forums, in response to the question "Has 'per say' become an acceptable spelling?", someone responds "No, shoot on sight". One of the webmob suggests that users of another proscribed expression should be "wheeled out and shot", and another chimes in with "The wheeling part is to [sic] good for them Just shoot", and someone else opines "Shooting too good."

Again, I understand that this is just ritual japery. The question is, do members of any other culture carry on like this about the violent punishment of linguistic offenders? Do they even talk about their rage rather than their displeasure or disgust, as when one commenter writes

"Hot Dog's and Coke's". Makes me insane with rage.

I'm still waiting for examples.

[By the way, how did Zwiebelfisch = "onion fish" come to mean "misprint" or "typographical error" in German? And why is a column on "language hygiene" entitled "typographical error", anyhow? Is the implication that any linguistic sins must be slips of the fingers?

The Zwiebelfisch forum ought to be a good place to find word rage: its sysop kicks it off by asking "Woran orientiert sich Ihr Sprachgefühl? Welche Sprachsünden ärgern Sie am meisten?" ("By what does your language sense orient itself? What language sins anger you the most?") Someone who reads German more fluently than I do ought to find some examples there, if this cultural pattern exists among German-speakers today.]


November 04, 2005

Word rage outside the Anglosphere?

When Lynne Truss wrote that "people who put an apostrophe in the wrong place ... deserve to be struck by lightning, hacked up on the spot and buried in an unmarked grave", many in the Anglo-Saxon world cheered and bought her book. But even without publications to peddle, English speakers often threaten violence in support of linguistic norms.

Poke the ground cover in places like The Guardian's Talk forums, and out slither things like

Perhaps we should cut out manager's tongues. Then we wouldn't have to put up with their hideous mutilation of the language?

Yes, perhaps we should cut their fingers off at the oxters. And paralyse them from the neck down as well, just to be on the safe side?


"Let's touch base on that."

No, let's touch your bloody face with my knuckles, repeatedly.

Of course this is all in good fun. No mayhem or mutilation is committed or even seriously intended. But still, there's an impulse of genuine anger behind the jokes, just as there seems to be genuine disgust in other negative reactions to linguistic variation.

I've always assumed that such reactions are a cultural universal. But a few days ago, I read something that made me wonder. AA Gill wrote in The Times that

A simmering, unfocused lurking anger is the collective cross England bears with ill grace. ...

The English aren’t people who strive for greatness, they’re driven to it by a flaming irritation. It was anger that built the Industrial Age, which forged expeditions of discovery. It was the need for self-control that found an outlet in cataloguing, litigating and ordering the natural world. It was the blind fury with imprecise and stubborn inanimate objects that created generations of engineers and inventors. The anger at sin and unfairness that forged their particular earth-bound, pedantic spirituality and their puce-faced, finger-jabbing, spittle-flecked politics. ...

Anger has driven the English to achievement and greatness in a bewildering pantheon of disciplines. At the core of that anger is the knowledge that they could go absolutely berserk with an axe if they didn’t bind themselves with all sorts of restraints, of manners, embarrassment and awkwardness and garden sheds.

Gill is a humorist, not a social psychologist, and I'm no friend of broad-brush stereotypes. But as I ticked off in my mind a list of counterexamples to Gill's position, it occurred to me that I can't recall any examples of "word rage" among other cultures.

I don't mean scorning people for having unsophisticated hick accents, or for using the despised dialects of the urban masses. Nor do I mean raising an eyebrow at the ill-considered innovations of the young. I'm not even talking about feeling disgusted at the way someone speaks or writes. I'm talking about reacting to perceived violations of linguistic norms with talk of chopping and stabbing and smashing. Does anyone but the English and their spiritual heirs do this?

If you know examples of language rage in other languages and cultures, or you're confident that your language and culture lack this feature, please let me know.

Posted by Mark Liberman at 10:25 AM

Dzongkha and Tsong-kha-pa, Voicing and Aspiration

Some readers are understandably having a hard time interpreting George van Driem's explanation, cited in a previous post, of how Chinese speakers could confuse Dzongkha, the name of the national language of Bhutan, with Tsong-kha-pa, the name of the founder of the dGe-lugs-pa school of Buddhism, which is currently headed by the Dalai Lama. I repeat the relevant portion here:

Such confusion could only arise in the minds of speakers of Mandarin Chinese or Tibetan who are not literate in either Tibetan or Dzongkha. Neither Mandarin Chinese nor Tibetan distinguishes phonologically between voiced and voiceless obstruent initials, unlike Dzongkha and, for example, English.

Van Driem's point turns on the distinction between voicing and aspiration. If the vocal folds vibrate during the production of a speech sound it is said to be voiced. Otherwise it is voiceless. When making the transition from a consonant to the following (voiced) vowel, the onset of voicing may occur immediately or it may be delayed by some amount. If voicing is delayed, the voiceless region at the beginning of the vowel is known as aspiration. The aspiration is the puff of air that you can feel if you wet your finger and hold it in front of your mouth when you say pot in English. To experience the contrast between aspiration and its absence, first say pot, then say spot. You'll notice that there is not much of a puff of air in spot but a noticeable one in pot.

If the consonant is truly voiced, the vocal folds will vibrate during the consonant, with the result that the voice onset occurs prior to the end of the consonant. The Voice Onset Time (VOT) is therefore said to be negative. If voicing starts up right at the transition from consonant to vowel, the VOT is 0. If voicing is delayed and there is aspiration, the VOT is positive. The result is that we can talk about voicing and aspiration as aspects of a single dimension of voice onset time.

Different languages divide the VOT continuum up in different ways. Some languages distinguish between voiced consonants, with negative VOT, and voiceless consonants, with zero VOT. These languages have a true voicing contrast. Other languages distinguish a relatively small (but non-negative) VOT from a larger VOT. These languages have an aspiration contrast. And some languages have three categories: negative (voiced), small (voiceless unaspirated), and large (voiceless aspirated).
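The three-way division of the VOT continuum can be sketched as a small Python function. This is only an illustration: the function name and the 30 millisecond cutoff between "small" and "large" VOT are assumptions of mine, not measured values, and real category boundaries vary by language and by place of articulation.

```python
def vot_category(vot_ms, aspiration_threshold_ms=30.0):
    """Classify a stop by its voice onset time (VOT), given in milliseconds.

    Negative VOT: voicing begins before the end of the consonant.
    Zero or small positive VOT: voiceless with little or no aspiration.
    Large positive VOT: voiceless with aspiration.
    The default threshold is a hypothetical value chosen for illustration.
    """
    if vot_ms < 0:
        return "voiced"
    if vot_ms < aspiration_threshold_ms:
        return "voiceless unaspirated"
    return "voiceless aspirated"


# Illustrative (not measured) VOT values
for vot in (-80.0, 0.0, 10.0, 70.0):
    print(vot, vot_category(vot))
```

A language with a true voicing contrast cares only about the first test, a language with an aspiration contrast cares only about the second, and a three-category language uses both.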

One language that distinguishes all three categories is Thai. You can see the distinction in the following three images, which show the waveforms and spectrograms of the Thai syllables [tʰa], [ta], and [da]. (You can find the audio files here.) In the first image I've highlighted the aspiration region. You can see that there is no voicing (which shows up as energy near the bottom of the frequency range) until the onset of the vowel, but there is a long (70 millisecond) noise segment between the release of the stop closure and the onset of voicing. In the second image there is very little aspiration and no voicing during the stop closure. In the third image you can see voicing during the stop closure as well as some higher frequency noise.

Acoustic analysis of the Thai syllable [tʰa]

Acoustic analysis of the Thai syllable [ta]

Acoustic analysis of the Thai syllable [da]

Mandarin Chinese has just two series of stops and affricates, one aspirated, the other unaspirated. There is no voicing contrast. You can see this in the spectrograms and waveforms below which show the syllables written pi and bi in pinyin, which phonetically are [pʰi] and [pi]. In the first there is a very long aspiration region; in the second there is no appreciable aspiration nor any voicing prior to the onset of the vowel.

Acoustic analysis of the Chinese syllable [pʰi]

Acoustic analysis of the Chinese syllable [pi]

With this background, we can look again at what van Driem is saying. He is saying that to those who see the words Dzongkha and Tsong-kha-pa in print, it is obvious that they are different. Similarly, if one hears these words and can distinguish the voiced [dz] from the voiceless [ts], it is clear that they are different. If, however, one does not know how they are written and is unable to perceive the phonetic distinction, due to speaking a language that has no voicing distinction, then they may sound the same. A speaker of Mandarin Chinese might, he thinks, fail to distinguish these two words because Mandarin has only an aspiration distinction, so its speakers are not attuned to hearing a voicing distinction. For the Mandarin speaker, both [dz] and [ts] fall into the unaspirated category and so do not contrast.
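Van Driem's argument can be put as a toy model in Python: each language places its category boundaries at different points on the VOT continuum, and a listener files an incoming sound under whichever of their own categories it lands in. Everything numeric here is an assumption for illustration (the boundary values, the VOT figures for [dz] and [ts], and the `perceive` helper itself); only the number of categories per language comes from the discussion above.

```python
import bisect

# Hypothetical category boundaries (in ms of VOT) and category labels.
LANGUAGES = {
    "Thai": ([0.0, 30.0], ["voiced", "unaspirated", "aspirated"]),
    "Mandarin": ([30.0], ["unaspirated", "aspirated"]),
    "voicing-contrast language": ([0.0], ["voiced", "voiceless"]),
}


def perceive(language, vot_ms):
    """Return the category a listener of `language` assigns to a given VOT."""
    cuts, labels = LANGUAGES[language]
    # bisect_right counts how many boundaries lie at or below vot_ms,
    # so a VOT of exactly 0 falls on the voiceless side of a 0 ms boundary.
    return labels[bisect.bisect_right(cuts, vot_ms)]


# Illustrative (not measured) VOT values for the two initials
DZ_VOT = -80.0  # voiced [dz]
TS_VOT = 10.0   # voiceless unaspirated [ts]

for language in LANGUAGES:
    print(language, perceive(language, DZ_VOT), perceive(language, TS_VOT))
```

In this sketch the Mandarin listener assigns both initials to the unaspirated category, while listeners with a voicing boundary keep them apart.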

[Incidentally, voice onset time also enters into the explanation for the existence of so many different names for the city of Beijing, which I discussed some time ago.]

Posted by Bill Poser at 02:42 AM

There ain't no sanity clause...

So saith the renowned legal scholar Chico Marx. There is, however, a "liberty clause."

In an Oct. 31 news conference indicating at least tentative support for Samuel Alito's nomination to the Supreme Court, Senate Judiciary Committee chairman Sen. Arlen Specter led off with his favorable impressions from a meeting with Alito:

I start with his statement that he believes there is a right to privacy under the liberty clause of the United States Constitution.

Specter was referring to Section 1 of the Fourteenth Amendment (emphasis mine):

Section 1. All persons born or naturalized in the United States, and subject to the jurisdiction thereof, are citizens of the United States and of the state wherein they reside. No state shall make or enforce any law which shall abridge the privileges or immunities of citizens of the United States; nor shall any state deprive any person of life, liberty, or property, without due process of law; nor deny to any person within its jurisdiction the equal protection of the laws.

Crucially, the right to personal liberty guaranteed by the Fourteenth Amendment was invoked by the Supreme Court in upholding its 1973 decision in Roe v. Wade. It's become a tradition for pro-choice senators like Specter to ask Supreme Court nominees about the "liberty clause," as an indirect way of finding out their thoughts on the validity of Roe. (Last month, there was some controversy over what Harriet Miers told Specter about the "liberty clause." Before that, John Roberts was grilled about the clause; he was also asked by Specter whether Roe should be considered not just a "super precedent" but a "super-duper precedent.")

But what is the "liberty clause" exactly? Is it the entirety of Section 1 of the amendment? Just the second sentence? The part of the second sentence between the two semicolons? Or is it simply the word liberty itself?

This last possibility was raised by Ramesh Ponnuru, a contributor to The Corner, the group blog of National Review Online. After hearing Specter's comments about Alito, Ponnuru posed a question:

I'm not sure when people started talking about a "liberty clause" of the Constitution--I hadn't noticed this until the Roberts hearings. It seems like a strange way to refer to one word in the Fourteenth Amendment. Are there other one-word "clauses"?

This is, on the face of it, further proof of the cultural divide between legal studies and linguistic studies. A syntactician might think a "one-word clause" in English would need to be an unmodified imperative intransitive verb like "Surrender!" (See Geoffrey Pullum's post on very short sentences.) But of course, the definition of clause in the world of law has nothing to do with syntactic structures. The legal sense, meaning "a distinct article, stipulation, or proviso in a legal document," has a long history in English. (The Oxford English Dictionary offers a quote from Chaucer's Troylus And Criseyde: "He shall me never binde in soche a clause.") Still, could the "liberty clause" really consist of a single word?

Ponnuru posted a reply from Princeton professor of politics Robert P. George:

If there is a "liberty clause," then there is also a "life clause" (that should make Arlen Specter shudder), and even a "property clause" (which might not go over well with Democrats). Of course, what we actually have is a due process clause.

George seems to consider the possibility of a "one-word clause" to be rather silly, and then explains that what Specter is really talking about is a "due process clause." The "due process clause" of the Fourteenth Amendment is generally understood to be the part in between the semicolons of the second sentence, reading: "nor shall any state deprive any person of life, liberty, or property, without due process of law." (Hey, it's a syntactic clause, too!)

So is this provision "actually" the "due process clause," and "liberty clause" is a misnomer? Either way, it's a bit of synecdoche, naming the clause by one of its key elements, either "due process" or "liberty." Indeed, the Merriam-Webster Dictionary of Law has no problem defining "liberty clause" as "the due process clause found in the Fourteenth Amendment."

The phrase "due process clause" does have a longer history (it is, for instance, what is used in the Roe decision), but "liberty clause" isn't simply an invention by senators like Specter involved in this year's series of Supreme Court nominations, as Ponnuru implies. It's been in use at least since 1987, according to the Nexis and Factiva news databases. On September 6 of that year, in advance of the contentious Senate hearings for Robert Bork, Anthony Lewis wrote a column in the New York Times detailing Bork's opposition to cases that rely on "the liberty clause of the 14th Amendment." When the Judiciary Committee voted to report the nomination to the full Senate with a negative recommendation, the report of the majority (eight Democrats plus Sen. Specter) read:

Had Judge Bork's views been the governing rule on the Supreme Court at the critical moments of the last generation, principles that most Americans have come to accept would have been rejected. There would be no right to privacy. There would be no substantive content to the liberty clause of the 14th Amendment.

Bork was of course rejected by the Senate, and his replacement, Anthony Kennedy, was also asked about the "liberty clause" during his December hearings. In contrast to Bork, Kennedy testified approvingly about the clause and its application to a right of privacy, and he was later confirmed. But it wasn't just pro-choice senators and pundits who were using the phrase "liberty clause." The National Law Journal and the Legal Times both used it in 1989, and Justice John Paul Stevens gave the phrase the thumbs-up in a speech at the University of Chicago, as reported by the Oct. 28, 1991 Chicago Daily Law Bulletin:

The construction of the due process clause or as I prefer to call it, the liberty clause, has transformed the Bill of Rights from a mere constraint on federal policy into a source of federal authority to constrain state powers.

Stevens expanded on this speech in a 1992 article, "The Bill of Rights: A Century of Progress." Since then, "liberty clause" has occasionally found its way into case law, but "due process clause" is still overwhelmingly preferred in court decisions. Nonetheless, "liberty clause" remains the phrase of choice (forgive the pun) among pro-abortion-rights senators, who are probably seeking to lend more rhetorical weight to the justification for Roe than the sterile-sounding "due process clause."

The question remains, though: when pro-choice politicians refer to the "liberty clause," is it always exactly synonymous with "due process clause"? This 2003 quote from Sen. Dianne Feinstein about her Republican colleague Rick Santorum would seem to indicate otherwise:

The Senator has talked about the liberty clause. And Roe v. Wade, yes, did come from the liberty clause of the due process clause of the 14th amendment and other parts of the Constitution.

For Feinstein, the "liberty clause" is evidently a subset of the "due process clause," which in turn is a subset of the Fourteenth Amendment. So perhaps for some users of the phrase, "liberty clause" does indeed refer to the single word liberty!

[Update #1: Sean Barrett suggests "unpacking" the due process clause into three components (mini-clauses?)...

"nor shall any state deprive any person of life without due process of law;
nor shall any state deprive any person of liberty without due process of law;
nor shall any state deprive any person of property without due process of law"

This would supply "a fairly natural reading" for "liberty clause," Barrett argues. I do believe that this approaches what Feinstein, for instance, is implying with her use of "liberty clause" as a proper subset of the "due process clause." But as Prof. George points out, shouldn't there then be a "life clause" and "property clause" accompanying the "liberty clause"? Of course, "life" and "property" have not been subject to the contentious reading that "liberty" has in Roe and other cases, so those two elements fade to the background. I prefer thinking of this as a case of synecdoche, where the "liberty" element takes on the greatest significance in the clause and thus names it, even outstripping the "due process" that would normally have the greatest legalistic impact.]

[Update #2: I thought I might be reading a bit too much into Feinstein's reference to "the liberty clause of the due process clause of the 14th amendment" -- could this formulation merely be a typo (or speako) for "the liberty clause or the due process clause of the 14th amendment"? But here is Sen. Orrin Hatch using the same phrasing during the Clarence Thomas hearings in 1991:

As you know, the first substantive due process case was the Dred Scott case in 1857. That is where the Supreme Court held that the "Liberty Clause" of the Due Process Clause prevented Congress from forbidding slavery in the territories.

Hatch, by the way, is certainly no pro-choicer; rather, his use of "liberty clause" may have been somewhat ironic, perhaps hinting that he believes the appeal to a constitutional right to "liberty" was applied illiberally in Roe, just as it was in the Dred Scott case. For more on how pro-life conservatives use Dred Scott as a code for talking about their opposition to Roe, see this piece by Timothy Noah on Slate.]

[Update #3: John O'Neil points out an obvious problem with Hatch's use of "liberty clause" (or my initial interpretation of it): the Dred Scott case was decided in 1857, more than a decade before the ratification of the Fourteenth Amendment. The right to substantive due process at issue in Dred Scott derives from the Fifth Amendment ("No person shall ... be deprived of life, liberty, or property, without due process of law"). This is also commonly referred to as a "due process clause" and is the obvious model for the similar passage in the Fourteenth Amendment, which applied due process restrictions to the states. Hatch seems to be on his own, though, in using the phrase "liberty clause" in reference to the Fifth Amendment rather than the Fourteenth.]

Posted by Benjamin Zimmer at 12:47 AM

November 03, 2005

"Hone in on" before "home in on"?

Google Print went live today, and so I'd like to be (among?) the first to use it for linguistic research. As a first try, I thought I'd check to see if I could find someone using "hone in on" before George Plimpton did it in 1965, and bingo:

"Right!" Dunk nodded. "By coming here we make contact with something that's been dramatically affected by the storms. Whether it will actually help us hone in on the perpetrators or not, I don't know. But it's worth a try." [emphasis added]

[Raymond Buckland, Cardinal's Sin, Llewellyn Worldwide, Jan. 1, 1951, ISBN 1567181023 (p. 113)]

The earliest citation that I've seen for "home in on" is 1956, so this raises the bizarre possibility that "hone in on" actually appeared first.

But no -- I've been misled by poor quality control at the source: a quick peek at the copyright page shows that Cardinal's Sin was actually published in 1996! It's not clear to me how it got entered in the Google Print database with a "Publication Date" of 1951 -- this is not a likely OCR or keyboarding error, and we can't blame it on MSWord's spellchecker.

I tried searching Google Print for "home in on", but the earliest examples available there so far are from 1976.

In fact, I still believe that "home in on" was a WWII-era coinage, and that "hone in on" quickly appeared as an eggcorn for it, but the evidence for this view is just as unclear as it was before.

[Update: well, Google Print was disappointing (though I look forward to great things), but Ben Zimmer checked ProQuest and came through with 1947 and 1944 citations for "home in on":

1947 Washington Post 23 Sep. 2/5 Approaching Brize Norton, the auto-pilot "homed" in on the selected radio compass station.
1944 Chicago Daily Tribune 7 Dec. 15/2 The Oahu radio was coming in strong. They had left the station on all night so we could "home in" on its frequency.]


[Update #2: Ben Zimmer reports that

Looks like Google Print has a large number of books with a spurious pub date of 1951. Searching on that year will only return the first 50 matches (too bad!), but of those 50, most are not from 1951.

And Michael Cramer points out that Amazon lists "Llewellyn Publications (January 1, 1951)" as the Publisher for _Cardinal's Sin_. Google Print and Amazon appear to get their publication data from the same source -- or Google's been screen-scraping to populate its new service.

Probably there's a source in common, but I wonder what it is?]

[Update #3: Elizabeth Zwicky writes:

Google and Amazon are frequent partners (when you search on Amazon, you're using Google) so it's not surprising for them to have a common data source. However, in this case they probably bought the same bad data separately; Amazon's worst data comes direct from Books In Print, which is the major holder of this sort of data, and the obvious (perhaps only) place for Google to have gotten it. As far as I can tell, they have an effective monopoly and the sort of quality control that comes with it. Not that I'm bitter about having the editor listed as the sole author of my book, or anything.]


The Trent Reznor prize for tricky embedding

Matthew Hutson, noting my interest in embedding, has observed by email that Trent Reznor of Nine Inch Nails is responsible for "the most tricky and yet correct and clear sentence by a rockstar in an interview that I have ever seen":

"When I look at people that I would like to feel have been a mentor or an inspiring kind of archetype of what I'd love to see my career eventually be mentioned as a footnote for in the same paragraph, it would be, like, Bowie."

Matthew suggests that Reznor deserves extra points because his sentence is "finished with a flourish of 'like.'" It surely is, even if Reznor seems to be a bit confused about where footnotes go, and so I hereby inaugurate the Trent Reznor Prize for Tricky Embedding, to be awarded intermittently.

Posted by Mark Liberman at 06:44 AM

User Friendly on Localization

User Friendly has a comment on the failure to recognize the need to translate computer resources into people's own languages.

November 02, 2005

Microsoft Outlaws Dzongkha

According to the Tibet News

Microsoft has barred the use of the Bhutanese government's official term for the Bhutanese language, Dzongkha, in any of its products, citing that the term had affiliations with the Dalai Lama. In an internal memorandum, Microsoft employees were told not to use the term Dzongkha in any Microsoft software, language lists or promotional materials since "Doing so implies affiliation with the Dalai Lama, which is not acceptable to the government of China. In this instance, replace 'Dzongkha' with 'Tibetan - Bhutan'."

What adds insult to injury is that, according to the Bhutanese news site Kuenselonline, the government of Bhutan, with the assistance of the Swiss Development Corporation, paid US$523,000 to add support for Dzongkha. It didn't cost Microsoft a penny. Bhutan should have spent its money on free software. It would probably have been much cheaper, and they would have control over it.

It simply isn't true that Dzongkha is a dialect of Tibetan in the sense in which dialect is usually used. It isn't particularly closely related. There's more information about Dzongkha at the Himalayan Languages Project. The Ethnologue provides this family tree. Nor is there any relationship between Dzongkha and the Dalai Lama. A reader's comment on the Pinyin News post on this topic contains this explanation by Dr. George van Driem, Director of the Himalayan Languages Project, Department of Comparative Linguistics at Leiden University:

The language Dzongkha, literally "language of the fortress", is a South Bodish language related to Dränjoke [a language of Sikkim] and, more distantly, to Tibetan. Tibetan, however, belongs to a distinct sub-branch and is a Central Bodish language. The word rDzong (pronounced Dzong) denotes the citadels which served as the centres of military power and higher learning throughout Bhutan since the mediaeval period. The word rDzong has nothing to do with the name Tsong-kha-pa, literally "man from the onion district" (1357-1419), who founded the dGe-lugs-pa (pronounced Gelukpa or Gelup) school of Tibetan Buddhism currently headed by the Dalai Lama. Such confusion could only arise in the minds of speakers of Mandarin Chinese or Tibetan who are not literate in either Tibetan or Dzongkha. Neither Mandarin Chinese nor Tibetan distinguishes phonologically between voiced and voiceless obstruent initials, unlike Dzongkha and, for example, English.

Why is it that China would object to a term that they mistakenly associate with the Dalai Lama, one of the great men in the world today, recipient of the 1989 Nobel Peace Prize? It is because as head of the legitimate government of Tibet he is the symbol of Tibetan resistance to the colonial rule initiated by the Chinese invasion of 1950. In other words, Microsoft is refusing to recognize the existence of the national language of Bhutan so as not to offend China's sensibilities over its colonization of Tibet.

Now, I know from previous experience that I'm going to get outraged email and comments elsewhere from apologists for colonialism complaining that I don't know what I'm talking about, that Tibet has been part of China for thousands of years, that when China invaded in 1950 it was merely repossessing a part of China, and that Tibetans are much better off under enlightened Chinese rule, so I'll say a few words about this issue here in an attempt to forestall this. If you're not familiar with it, you can get a good idea of the Chinese government's position here.

Some of the arguments might be valid if the underlying facts were true, but others are simply infantile. One argument is that prior to the Chinese invasion Tibet was an oppressive, feudal society. That was, in many ways, true, but it hardly justifies colonization. Here's an identical argument: in the nineteenth century, China was an oppressive, corrupt, feudal society. The European powers would therefore have been justified in invading China and incorporating it permanently into their countries.

The people who take the opposite view include the Nobel Prize Committee. Here are a couple of excerpts from the Dalai Lama's Nobel Prize citation, with my emphasis added:

The Norwegian Nobel Committee has decided to award the 1989 Nobel Peace Prize to the 14th Dalai Lama, Tenzin Gyatso, the religious and political leader of the Tibetan people.
The Committee wants to emphasize the fact that the Dalai Lama in his struggle for the liberation of Tibet consistently has opposed the use of violence. He has instead advocated peaceful solutions based upon tolerance and mutual respect in order to preserve the historical and cultural heritage of his people.

There are actually two issues here. First, has Tibet historically been a part of China, and second, even if Tibet has been part of China, are Tibetans entitled to national self-determination? As for the first issue, the claim that Tibet has been part of China since time immemorial, or even for the past seven hundred years, is utter nonsense. Tibet has been independent of China for most of its history. Imperial China claimed nominal sovereignty over every state with which it had diplomatic relations, including Japan, Okinawa (an independent country until 1609), Korea, and Vietnam, on the theory that the Emperor could only enter into the relationship of master to vassal. If you aren't familiar with Chinese history, you can get an idea of the Imperial style from this letter sent in 1839 by Imperial Commissioner 林則徐 Lín Zé Xú to Queen Victoria demanding that she put an end to the opium trade.

In spite of China's nominal claims of sovereignty over Tibet, Tibet was de facto an independent state and did not acknowledge Chinese sovereignty. That is why, for example, China under the Manchus attacked Lhasa in 1720 and again in 1910. If Tibet were part of China, China would not have attacked it. Tibet also fought wars with Jammu in 1841-1842 and with Nepal in 1854-55. Making war is of course one of the defining capacities of a sovereign nation.

The first point at which Tibet was actually ruled by the same government as China was during the Yuan dynasty, when both Tibet and China were under Mongol rule. It was, however, the Mongols who conquered Tibet, not the Chinese. The Mongols took over Tibet before they took over China, and once they were in power administered the two separately. In China they exerted direct control, while in Tibet they ruled via the local rulers. When the Mongol Empire disintegrated, Tibet regained its independence.

In the period leading up to the Chinese invasion, it is clear that as a matter of international law Tibet was an independent state. It had a distinctive population occupying a well-defined territory under the effective control of its own government. The government of Tibet issued coins, currency and passports that were internationally recognized. It entered into diplomatic relations as a sovereign nation with other countries, including Nepal, Mongolia, Great Britain, and Ladakh. Even the Republic of China negotiated with Tibet as a sovereign nation at the Simla Conference in 1913-1914.

The second issue is whether Tibet is entitled to independence, whatever its prior status may have been. Surely the answer is yes. Tibetans have a distinctive language, culture, and sense of identity. As defined in international law, they are a people with a right to self-determination. To this China opposes two claims. First, it claims that the independence of Tibet would violate China's territorial integrity. International law does not recognize claims of territorial integrity by illegitimate governments. Since China does not govern Tibet with the consent of Tibetans and has engaged in massive violations of human rights in Tibet, China cannot legitimately make any claim of territorial integrity. The second is the argument already addressed, that Tibet was a backward country in need of enlightenment.

For a detailed examination of the question of Tibetan self-determination I recommend Tibet's Sovereignty and the Tibetan People's Right to Self-Determination by Andrew G. Dulaney and Dennis M. Cusack of the Tibet Justice Center and Dr. Michael van Walt van Praag of the Unrepresented Nations and Peoples Organization. You can download the entire document as a PDF file or read it online here.

So there you have it. China objects to the language name Dzongkha because of an imaginary association with the leader of the legitimate government of its Tibetan colony. In order to please China, Microsoft refuses to use the generally accepted name for the national language of Bhutan. Now there's a company with principles.

Squabbles over "Scalito"

The original round of reporting on Samuel Alito's nomination to the Supreme Court introduced us to the nickname Scalito, interpreted as either a blend of Scalia and Alito or a diminutivization of Scalia (or both). Since my first post on the subject, debates have raged over the nickname, with the expected polarization along political battle-lines.

The Drudge Report was first out of the gate in equating the Scalito nickname with supposed Italian-American-bashing by Alito's Democratic opponents. Drudge quoted an "outraged Republican strategist" as saying:

If Alito were a liberal there would be no way Democrats and Washington's media elite would use such a ethnically insensitive nickname. Italian-Americans should not have to face these types of derogatory racial slurs in 21st century America.

Later, Drudge provided a named source for the outrage, under the headline, "National Italian American Foundation Demands 'Scalito' Apology." NIAF chairman A. Kenneth Ciongoli issued a press release condemning the use of the nickname by "some senators and the media." Predictably, this charge of a smear was itself called a smear on the left-leaning Daily Kos, with one commenter reporting that Ciongoli's son once clerked for Alito. And so the recriminations continue.

Some conservatives, while not explicitly labeling Scalito a "racial slur," have been quick to challenge its use based on perceived ethnic discrimination. Matthew Continetti of the Weekly Standard writes:

The nickname is misleading. The two men may share a vowel at the end of their last name. But, needless to say, they're different people.

As Eric Bakovic notes over at phonoloblog, the business about Alito and Scalia "sharing a vowel at the end of their last name" sounds a bit odd unless you read further, where Continetti identifies himself as a fellow Italian-American possessor of a vowel-final surname:

I, too, in case you haven't noticed, have a vowel at the end of my name, and so I find myself obliged, as a strange point of ethnic pride, to point out Scalia and Alito's differences.

Continetti's remarks inspired "Nick" at the musement park blog to reply:

I mean, is the whole world going crazy?? Two ultra-conservative, dissent-penning, Italians on the same Supreme Court (potentially), one clerked for the other, both have 3-syllable, rhythmically identical names that include the letters A-L-I, and everyone is just supposed to ignore the name thing?

In another blog entry Nick bemoans that "we are witnessing the criminalization of wordplay" (spoofing a recent statement by Tom DeLay about "the criminalization of conservative politics").

Those objecting to Scalito don't have much to say about reading it as a blend of the two names, fused at the -ali- overlap (beyond a general complaint that it somehow disparages Italian names). But the second interpretation, that Scalito is a diminutive form of Scalia, has become a point of contention, since some Alito supporters find it belittling.

Press accounts have called Scalito "a translation of 'little Scalia'" — but a translation from what? It doesn't work in standard Italian, where diminutivizing suffixes include -ino, -etto, and -ello, but not -ito. (Thanks to Donna Jo Napoli for verifying this; I had thought -ito might have been an old Italian suffix based on seeing etymologies for graffito that suggest it was diminutivized from graffio. In fact graffito owes its form to graffiato, the past participle of graffiare 'to scratch, scribble,' not to an -ito suffix.) [Update: Geoffrey Nunberg further clarifies that graffito is the past participle of graffire, which the Dizionario della Lingua Italiana defines as "incidere leggermente, tracciare incidendo" ("incise/etch lightly, draw while incising"). There are also the variants sgraffire/sgraffito, now largely obsolete.]

Rather than Italian, the source for the -ito diminutivization is clearly Spanish. This is not to say that the "little Scalia" reading is necessarily a "translation" from Spanish either. (Eric Bakovic points out via email that in his native variety of Spanish, Scalia would actually be diminutivized as Scaliecito [skaljesito] — or if we simplify the initial consonant cluster, Escaliecito!) But in American "mock Spanish," as Jane Hill calls it, the -ito suffix is applied indiscriminately. So the blending of Scalia and Alito into Scalito allows for a secondary reading as a "foreign-sounding" diminutive of Scalia, regardless of actual rules of morphology in either Italian or Spanish.

We might need to go back to the original source to find out if the "little Scalia" reading was at play in the creation of Scalito or if it was simply a later interpretation. The first print appearance of the nickname is in a Dec. 7, 1992 article in the National Law Journal by Joseph Slobodzian. Commenting on Stuart Buck's blog The Buck Stops Here, Shannon P. Duffy of the Legal Intelligencer takes credit for coining Scalito and feeding it to Slobodzian.

So the nickname was evidently a journalistic invention to begin with. (Even now, according to the Washington Post, Alito's colleagues "don't know anyone who isn't a journalist who actually calls him 'Scalito.'") It has become a useful weapon, however, for those on the left wanting to tar Alito with the same brush as Scalia, and for those on the right wanting to paint criticism of Alito as discriminatory against Italian Americans.

[Update, 11/3/05: Charles Franklin, a political scientist at the University of Wisconsin, has been tracking the media's use of Scalito on his blog, Political Arithmetik. He found a big dropoff in usage between 10/31 and 11/2. It's very possible that, as in the case of the word refugee during the aftermath of Hurricane Katrina, journalists are backing off from a word that is potentially controversial. Or, perhaps, the novelty of the term is simply wearing off.]

[Update, 11/7/05: More from Franklin here.]

Posted by Benjamin Zimmer at 04:20 PM

Is marriage identical or similar to itself?

Here's another challenge to Antonin Scalia's intention-free theory of legal meaning. On November 8, Texas voters will vote on the 434th amendment to the Texas state constitution, worded as follows:

SECTION 1. Article I, Texas Constitution, is amended by adding Section 32 to read as follows:
Sec. 32. (a) Marriage in this state shall consist only of the union of one man and one woman.
(b) This state or a political subdivision of this state may not create or recognize any legal status identical or similar to marriage.

Some opponents of the amendment, operating as "Save Texas Marriage", have pointed out that if you prohibit the state and its subdivisions from "creating or recognizing any legal status identical or similar to marriage", you appear to prohibit any form of marriage whatsoever, since whatever marriage may be, it surely is "a legal status identical or similar to" itself.

The arguments against this view lean heavily (and plausibly) on the question of legislative intent:

“They’re trying to play on words,” said amendment sponsor and State Rep. Warren Chisum (R-Pampa), according to the Associated Press. “Let me tell you, we had some very good legal scholars help put this language together.”

He said that when legislators discussed the proposed amendment earlier this year, they agreed that traditional and common law marriage would not be affected.

I'm sure that Rep. Chisum is right about the intent of the legislators. And even without any historical testimony about what the drafters had in mind, Chisum's interpretation is the obvious one on any intent-based theory of the meaning of meaning, for anyone with even the most general sense of the cultural context. However, the language of the amendment is another matter. If the Ledge paid those "very good legal scholars", they should ask for a refund.

On the other side, the Save Texas Marriage web site quotes various legal authorities in support of Scalia's viewpoint, with Nathan Hecht ("Conservative Republican Texas Supreme Court Justice") explaining that "when you're construing the Constitution or statute, you're stuck with what's there", and Greg Abbott ("Republican Texas Supreme Court Justice and current Attorney General of Texas") saying that "when interpreting our state Constitution, we rely heavily on its literal text and are to give effect to its plain language".

I suppose that this will be a matter for the Texas Supreme Court, not the Supreme Court of the United States, so we may never find out what Justice Scalia thinks about it. Still, it will be interesting to see if the Texas Supreme Court makes intentionalist arguments in interpreting the amendment.

Here's the full text of the legislative resolution:

H.J.R. No. 6
A JOINT RESOLUTION proposing a constitutional amendment providing that marriage in this state consists only of the union of one man and one woman.
SECTION 1. Article I, Texas Constitution, is amended by adding Section 32 to read as follows:
Sec. 32. (a) Marriage in this state shall consist only of the union of one man and one woman.
(b) This state or a political subdivision of this state may not create or recognize any legal status identical or similar to marriage.
SECTION 2. This state recognizes that through the designation of guardians, the appointment of agents, and the use of private contracts, persons may adequately and properly appoint guardians and arrange rights relating to hospital visitation, property, and the entitlement to proceeds of life insurance policies without the existence of any legal status identical or similar to marriage.
SECTION 3. This proposed constitutional amendment shall be submitted to the voters at an election to be held November 8, 2005. The ballot shall be printed to permit voting for or against the proposition: "The constitutional amendment providing that marriage in this state consists only of the union of one man and one woman and prohibiting this state or a political subdivision of this state from creating or recognizing any legal status identical or similar to marriage."

H.J.R. No. 6 was passed by the Texas House on April 25, 2005, and the Texas Senate on May 21, 2005.

Legal meaning: the fine print

At the end of my post on Antonin Scalia's philosophy of language, I asked "do obvious typos or malaprops ... have the force of law?" In response, a Michigan attorney sent in an example taken from a Michigan statute requiring landlords to provide tenants with certain information. Part of the requirement is an anti-fine-print provision, worded as follows:

"The notice shall include the following statement in 12 point boldface type which is at least 4 points larger than the body of the notice or lease agreement ..."

On the face of it, this requires that leases be printed in 8 point type or smaller. My correspondent's comment:

Tenants can sue landlords for leases that fail to comply with the law. No court would grant tenants relief on the grounds that their leases are in print large enough for them to read. Courts certainly recognize that there can be mistakes in legislative drafting.

The mistake in this case is significantly more subtle than a typographical error or lexical substitution. I'm frankly puzzled about what the drafters had in mind here. Perhaps they meant that the notice must be at least four points larger than the body of the document, and in boldface, and in any case no smaller than 12 point type?

Anyhow, the point is that courts are apparently allowed to reason along lines like "the plain meaning of the words of the statute is not something that its drafters could plausibly have intended; therefore the legal meaning of this passage is not what it literally says". Is this true as a generally-accepted principle of judicial interpretation? If so, it seems to be a partial refutation of Justice Scalia's theory of legal meaning:

What is needed for a symbol to convey meaning is not an intelligent author, but a conventional understanding on the part of the readers or hearers that certain signs or certain sounds represent certain concepts. In the case of legal texts, we do not always know the authors, and when we do the authors are often numerous and may intend to attach various meanings to their composite handiwork. But we know when and where the words were promulgated, and thus we can ordinarily tell without the slightest difficulty what they meant to those who read or heard them.

We can indeed "tell without the slightest difficulty" what the words of the Michigan statute "meant to those who read or heard them" in the time and place "where the words were promulgated" (a few years ago, in the American midwest). The plain meaning is that the body of a lease must be in 8 point type or smaller. Perhaps on Scalia's view the law means exactly that, until and unless the legislature acts to amend it.

Here's a larger context for the quoted statute:

554.603 Security deposit; notice.

Sec. 3.

A landlord shall not require a security deposit unless he notifies the tenant no later than 14 days from the date a tenant assumes possession in a written instrument of the landlord's name and address for receipt of communications under this act, the name and address of the financial institution or surety required by section 4 and the tenant's obligation to provide in writing a forwarding mailing address to the landlord within 4 days after termination of occupancy. The notice shall include the following statement in 12 point boldface type which is at least 4 points larger than the body of the notice or lease agreement: “You must notify your landlord in writing within 4 days after you move of a forwarding address where you can be reached and where you will receive mail; otherwise your landlord shall be relieved of sending you an itemized list of damages and the penalties adherent to that failure.” Failure to provide the information relieves the tenant of his obligation relative to notification of the landlord of his forwarding mailing address.


November 01, 2005

Language Log talks, Paper of Record listens

I think we're getting some solid results from the New York Times. First, Maureen Dowd's crocheted/croqueted mixup was resolved, albeit with no mention of the correction. Now Alessandra Stanley's truthiness/trustiness gaffe has finally been rectified — this time with a full correction notice:

The TV Watch column last Tuesday, about "The Colbert Report" on Comedy Central, misstated the "word of the day" invented for the show's feature "The Word." It was "truthiness," not "trustiness."

Language Log: holding the Gray Lady's feet to the fire. We demand truthiness.

[Update, 11/3/05: Stephen Colbert had some fun with the Times correction on last night's "Colbert Report." Transcript courtesy of Gawker, where the original error was also noted:

Now, before we start, there is something else I need to talk about, this correction in yesterday's New York Times. Let's go full frame with this.
You see, the Times mistakenly reported that in the first episode of this show, "The Colbert Report," THE WORD was "trustiness." It was, in fact, "truthiness."
Trustiness? That's not even a word!
Doesn't surprise me one bit the "New York Times" hasn't heard of truthiness.
I'll tell you one thing, somebody better go to jail for 85 days over this.
You know what, New York Times? Apology not accepted.
So let's go straight to THE WORD, which tonight is something even the New York Times can't possibly get wrong. Cat. C-A-T, cat.
I'll give the guys over at the "Times" a second to write it down.]

International Spam

As you probably know I'm a fan of internationalization, but it can go too far. When I checked my email a few minutes ago I found two spam messages right in a row. One was in Japanese, advertising software. The other was in Persian. I don't know what it said since, regrettably, I can't read Persian. It identified itself in English as coming from the Iranian Sunni Webmasters' League. Maybe this is a trick for getting past the spam filter, but of course it only helps to get past the spam filter if the recipient can understand the message.

Posted by Bill Poser at 06:47 PM

Linguistics Required for British Citizenship

The Daily Telegraph reports that the new test required of applicants for British citizenship requires knowledge of where the different dialects of British English are spoken. The new test emphasizes topics considered of practical importance for living in Britain. Home Minister Tony McNulty is quoted as saying:

This is not a test of someone's ability to be British or a test of their Britishness. It is a test of their preparedness to become citizens, in keeping with the language requirement as well. It is about looking forward, rather than an assessment of their ability to understand history.

It is interesting, and I suppose encouraging, that understanding of regional dialect variation should be considered of practical use, along with the role of the monarchy and the Church of England, but I must say that I find the Minister's attitude toward history incomprehensible. Some understanding of a country's history is important to an understanding of why things are as they are, why people have certain attitudes, what values are important, and why some changes will be strongly resisted.

I am not British and have no stake in British immigration policy, but this test seems to me to be more suitable for applicants for long-term residence in Britain than for citizenship. Given the problems that Britain as well as a number of other countries have with immigrants who do not understand or do not appreciate the political system and values of the country to which they have immigrated, I am surprised that the government is not moving in the direction of requiring of immigrants a greater understanding of and appreciation for the system and its values. Mr. McNulty's view of citizenship seems to me to be rather debased.

Posted by Bill Poser at 02:59 PM

Chomsky Named Top Public Intellectual

According to this news report, a joint internet poll by Prospect Magazine and Foreign Policy has named linguist Noam Chomsky the top public intellectual in the world today. Chomsky received nearly twice as many votes as runner-up Umberto Eco. The complete results can be found in Prospect Magazine.

Polls like this are pretty silly: the sample is self-selected, it is far from clear whether most people have the knowledge to vote intelligently, and the criteria are ill-defined: do we mean "most influential" or "having ideas of greatest long-term importance" or "having the most original ideas" or what? The news report cites University of Alberta historian Douglas Owram as making the following intelligent comment:

Owram said Chomsky wasn't a bad choice for the top spot. "It doesn't diminish my contempt for the whole notion of doing this, but in a sense Chomsky is a serious intellectual. He's certainly a worthy contender for this pseudo-crown."
Posted by Bill Poser at 02:23 PM

Literally: a history

Yet another usage bugaboo decried as the death of English turns out to have a long and venerable history. This time it's literally used not so literally. On Slate, Oxford English Dictionary editor-at-large Jesse Sheidlower takes us through a chronicling of literally applied intensively: from its emergence as a general intensifier for true statements in the late 17th century, to its use as an intensifier for figurative or metaphorical expressions in the late 18th century, on down to latter-day complaints beginning in the early 20th century (and peeveblogging in the early 21st century!). It's an arc that should look familiar to connoisseurs of the Recency Illusion.

I don't have much to add, other than to flesh out some of the early history. Using Chadwyck's Literature Online database, I can take the disputed usage (literally intensifying figurative speech) back to the 1760s. Here are three examples:

George Colman and David Garrick, The Clandestine Marriage (1766), p. 61
I look upon it, Madam,  /  to be one of the luckiest circumstances of my life,  /  that I have this moment the honour of receiving  /  your commands, and the satisfaction of confirming  /  with my tongue, what my eyes perhaps have but too  /  weakly expressed---that I am literally---the humblest  /  of your servants.  /

Frances Brooke, The History of Emily Montague, Vol. IV (1769), pp. 82-3
I am just come from a walk in the wood behind the house, with my mother and Emily; I want you to see it before it loses all its charms; in another fortnight, its present variegated foliage will be literally humbled in the dust.

Ibid., p. 175
He is a fortunate man to be introduced to such a party of fine women at his arrival; it is literally to feed among the lilies.

The speaker in the first example isn't literally the humblest of his correspondent's servants; the speaker of the second doesn't think the foliage will be literally humbled in the dust (as a human would be humbled); and the speaker of the third isn't talking about literally feeding among the lilies. And yet there it is. Would we be as perturbed by this usage if we switched literally to really, even though it would be intensifying figures of speech that aren't "real"? As Sheidlower points out, really is for some reason impervious to the criticism leveled at literally, much as sentence adverbs like clearly escape the animadversions heaped upon hopefully.

A final comment: one of Sheidlower's early examples of a usage maven complaining about intensive literally is in a 1909 book by the great satirist Ambrose Bierce entitled Write it Right: A Little Blacklist of Literary Faults. It's old enough to be in the public domain, and the text can be found on the Internet via Project Gutenberg, inter alia. From a quick look at some of the entries, it seems far more entertaining than The Elements of Style. This shouldn't be surprising for fans of The Devil's Dictionary and other acid-tongued companions in the Bierce oeuvre. So how come Write it Right isn't memorialized with lavish illustrated editions and operatic song cycles?

Posted by Benjamin Zimmer at 09:19 AM

A perilous portmanteau?

It remains to be seen if the new Supreme Court nominee, Judge Samuel Alito, will earn an eponymous verb like Bork, Souter, and Miers. But he's already responsible for a somewhat dubious contribution to the lexicon. In the coverage of Alito's nomination, journalists and pundits repeatedly mention that he has earned the nickname Scalito out of perceived similarities to Justice Antonin Scalia. As a CBS report explains, Scalito is "a nickname of dual purpose: it meshes his name with that of conservative Justice Antonin Scalia and is also a translation of 'little Scalia.'" It's been a good couple of weeks for neologistic blending: first Fitzmas, now Scalito.

The creation of a lexical blend (or portmanteau, as Lewis Carroll famously labeled such coinages as slithy from lithe and slimy) typically combines semantic elements to mimic the phonological fusion. When people's names are blended, it often indicates the inseparability of the two blendees. Washington Post executive editor Ben Bradlee famously bellowed "Woodstein!" when he had difficulty distinguishing the young reporting team of Bob Woodward and Carl Bernstein. The ascension of Bill and Hillary Clinton to the White House saw the popularization of Billary — once used endearingly (as in the 1992 campaign when Hillary was using the line, "Buy one, get one free"), but later made pejorative by opponents who ridiculed the idea of a "co-presidency." More recently, we've had a rash of blends identifying celebrity couples: Bennifer, Brangelina, TomKat. (A documentary accompanying a new Greta Garbo DVD collection reveals a predecessor from the silent movie era: Garbo and John Gilbert, her partner in romance on and off the screen, were blended into Gilbo Garbage.)

But Scalito is a different kind of onomastic blend: an epithet combining elements of two names to suggest a resemblance of one named person to the other. In recent American political history, such blends have been almost uniformly derogatory. Some examples:

  • Kerredy (used by the right to compare John Kerry to Ted Kennedy)
  • McStarrthy (used by the left to compare Kenneth Starr to Joseph McCarthy)
  • Hitlery/Hitlary (used by the right to compare Hillary Clinton to Hitler)
  • Bushitler (used by the left to compare George W. Bush to Hitler; a possible play on bullshitter?)

Even when political figures are blended with the names of fictional characters, the connotation is typically negative: Clarence Thomas was called Tom Ass Clarence by Amiri Baraka as an allusion to Uncle Tom; Ronald Reagan was called Ronbo or Ronzo to evoke unfavorable comparisons with Rambo or his onetime costar Bonzo the chimp. (Thanks to contributors of the alt.usage.english newsgroup for suggesting many of these.)

Given the pejorative nature of such blends, it's not surprising that there has already been a Scalito backlash among Alito's supporters. Time reports that "clerks and associates say the comparison [of Alito to Scalia], often made with the derisive nickname of 'Scalito,' does a disservice to the man." On the blog Blue Mass. Group, David Kravitz offers "One liberal's positive view of Alito," in which he interviews Kate Pringle, a former clerk of Alito who happens to be a progressive Democrat:

If you've heard any news stories about Judge Alito, you've heard that his supposed "nickname" (it remains unclear by whom it was bestowed) is "Scalito," the idea being that he's a "little Scalia."  I asked Pringle if she thought this was fair to Alito.  "No," she said, "I never have."  Pringle noted that Scalia and Alito are of course both of Italian ancestry, are both Catholic, and are both conservative, but she thinks there are more important differences between them including temperament, personal style, and the desire (or lack thereof) to find consensus. (My own view, FWIW, is that this "Scalito" business is simply due to two conservative judges having Italian surnames that happen to sound similar.  It is therefore insulting and juvenile and should be dropped immediately - if two Jewish judges' names were subjected to similar wordplay, the "joke" would be widely condemned as anti-semitic.)

Kravitz and Pringle focus not just on the blending of the two judges' names, but also on how Scalito works as a diminutive form of Scalia with the suffix -ito. The -ito reading certainly seems to add an extra note of condescension, and it recalls the work of linguistic anthropologist Jane Hill on the use of "mock Spanish" as a means of derogation. Stay tuned to see if a consensus emerges among our political tastemakers, either embracing Scalito or dismissing it as "insulting and juvenile."

[Update #1, 11/1/05: From the conservative side of the spectrum, more complaints about Scalito (courtesy The Drudge Report) — "One outraged Republican strategist claimed, 'If Alito were a liberal there would be no way Democrats and Washington's media elite would use such a ethnically insensitive nickname. Italian-Americans should not have to face these types of derogatory racial slurs in 21st century America.'"]

[Update #2: More from Drudge — "National Italian American Foundation Demands 'Scalito' Apology." Press release from NIAF chairman A. Kenneth Ciongoli here. And from the left, a rebuttal on Daily Kos: "Kenneth Ciongoli: Republican Donor Spreads Lying Smear Against Dems."]

[Update #3: Matthew Continetti of the Weekly Standard also takes offense at Scalito: "The nickname is misleading. The two men may share a vowel at the end of their last name. But, needless to say, they're different people." Link via Nick at the musement park blog, who spoofs a recent statement by Tom DeLay to complain that "we are witnessing the criminalization of wordplay."]

[Update #4: The Scalito nickname dates back to a Dec. 7, 1992 article in the National Law Journal, according to the blog The Buck Stops Here. In the comments section of the blog, Shannon P. Duffy (then a reporter for the Legal Intelligencer) takes credit for coining Scalito. (Link via Wikipedia.)]

[Update #5: A question... does Italian borrow the Spanish diminutive suffix -ito? I'm only familiar with the Italian diminutives -ino, -etto, -ello, and -iano. But I suppose there is graffito, diminutivized from graffio (though it's unclear if this is related to graffiato, past participle of graffiare). Perhaps -ito is an older Italian suffix cognate with the Spanish, with -etto as the Modern Italian equivalent?]

[Final update, 11/2/05: see this post for a roundup of the latest developments.]

Posted by Benjamin Zimmer at 01:20 AM