January 31, 2005

The multi-purpose linguist

Yesterday's NYT op-ed column by Maureen Dowd is far more interesting for reasons other than this one, but consider the following passage:

After the prisoner spat in her face, she left the room to ask a Muslim linguist how she could break the prisoner's reliance on God. The linguist suggested she tell the prisoner that she was menstruating, touch him, and then shut off the water in his cell so he couldn't wash.

I've gotten used to the term 'linguist' being used to mean 'interpreter', especially lately; I understand it when 'Arabic linguist' means someone who can translate to and/or from Arabic and 'Iraqi linguist' means more specifically someone who can translate to and/or from Iraqi Arabic. (Note that it's usually assumed and/or understood that English is on the other end of the translation.) Now obviously, a really good interpreter has more than just a dictionary-and-basic-grammar-level understanding of the languages to be translated; they also tend to have a good understanding of the different customs of the people who speak those languages. And so here we have a 'Muslim linguist' -- the idea being that this linguist has an understanding of Islamic customs and can thus be called upon to share this cultural knowledge just as conveniently as they can be called upon to translate (to and from Arabic, presumably, but who knows).

I kinda wish this linguist had said, "What are you asking me for? I'm just an interpreter," and kept their big mouth shut.


Posted by Eric Bakovic at 07:20 PM

The SAT fails a grammar test

Jennifer Medina had an article in yesterday's NYT about how "the new SAT, with all its imponderables, is increasing the agitation" of high school juniors across America.

What used to be a two-part, three-hour ordeal, half math, half verbal, will now require students to spend 45 more minutes completing an extra writing section. The new section will consist of three parts - one an essay, the other two multiple-choice grammar and sentence-completion questions.

Among the sources of anxiety that the article cites are the fact that "scoring an essay is subjective at best", and the students' uncertainty about how colleges will weight the old and the new SATs, which options are required by which colleges, and the relative difficulty of the tests to be given on different dates. I hate to add to the agitation of our nation's young people, but based on the controversial grammar questions of the past, and the sample questions now on the SAT web site, anyone planning to take the new SAT should also be very worried about the type of question that the College Board calls "Identifying Sentence Errors".

I tried the two sample questions in this category. In each test sentence, I could easily see one place where some people would identify an error. However, each of the possible "errors" is doubtful at best, and "No Error" is always one of the options. As a result, my decision about how to answer becomes a judgment about the linguistic ideology of the College Board, not a judgment about English grammar and style.

The instructions tell me that

This question type measures a [sic] your ability to:

  • recognize faults in usage
  • recognize effective sentences that follow the conventions of standard written English

and provide the more specific directions:

The following sentences test your ability to recognize grammar and usage errors. Each sentence contains either a single error or no error at all. No sentence contains more than one error. The error, if there is one, is underlined and lettered. If the sentence contains an error, select the one underlined part that must be changed to make the sentence correct. If the sentence is correct, select choice E. In choosing answers, follow the requirements of standard written English.

OK, fair enough. Now here's one of the sentences:

After (A) hours of futile debate, the committee has decided to postpone (B) further discussion of the resolution (C) until their (D) next meeting. No error (E)

The official answer is this:

The error in this sentence occurs at (D). A pronoun must agree in number (singular or plural) with the noun to which it refers. Here, the plural pronoun "their" incorrectly refers to the singular noun "committee."

This is doubly problematic. In the first place, it raises the issue of whether collective nouns like "committee" are singular or plural, from the point of view of verb agreement as well as pronoun choice. This is a matter on which British and American norms are different -- and the instructions refer us only to "the conventions of standard written English", not to "the conventions of standard written American English" (or should that be "standard American written English", or "American standard written English"?).

In the second place, if we take committee to be singular, there is still the infamous "singular they" question, about which we at Language Log alone have written more often than I care to think about (here, here, here, here, here, among others).

In fact, this kind of constructio ad sensum has a distinguished enough history to have a special name in traditional grammar: synesis:

A construction in which a form, such as a pronoun, differs in number but agrees in meaning with the word governing it, as in If the group becomes too large, we can split them in two.

Often-cited examples from the King James translation of the Bible include:

For the wages of sin is death. [Romans 6:23]
Then Philip went down to the city of Samaria, and preached Christ unto them. [Acts 8:5]

As for the authority of respected members of today's community of English users, the examples on committee-rich sites like the U.S. Congress and the National Academies seem to favor they and their in anaphoric reference to committee:

I thank the committee for their time and look forward to working with them in the future.
And now we are transferring the jurisdiction over securities to the Banking Committee so that they may conduct the business of the securities industry in precisely the same way they have supervised the business of the banking and the savings and loan industries.
The panel agreed with the chair's suggestion to submit the revised chapter of findings, conclusions and recommendations to the NWS Modernization Committee for their review at the February 9-11 meeting.

If the College Board is right about this, then hundreds of thousands of phrases in the Congressional Record and similar places, which seemed fine to their authors and seem fine to me and many other competent analysts as well, are in fact grammatical errors. Could we ask for a recount here?

Here's the other practice question:

The students have discovered (A) that they (B) can address issues more effectively through (C) letter-writing campaigns and not (D) through public demonstrations. No error (E)

Again, I had no trouble seeing where the problem might be. As the official answer explains:

The error in this sentence occurs at (D). When a comparison is introduced by the adverb "more," as in "more effectively," the second part of the comparison must be introduced by the conjunction "than" rather than "and not."

But the trouble is, comparatives don't always need a "second part" introduced by "than". The "second part" may be omitted entirely:

Apartment hunters have more choices these days.
Powell fears more violence as elections loom closer

or the cited change may be contrasted with an alternative in a conjoined phrase:

For example, cattle eat more grass in winter and less in spring; more forbs in spring and less in fall and winter; and more browse in fall and less in spring.
The outlook for precipitation is much less certain, but most projections point to more precipitation in winter and less in summer over the region as a whole.

The contrasting alternative is sometimes expressed with a conjoined negative, as in this phrase from a user's manual:

If your television has a number of video inputs, it is better to go direct and not add extra cabling.

This does not seem in any way ungrammatical to me, and the alternative

If your television has a number of video inputs, it is better to go direct than to add extra cabling.

does not strike me as a stylistic improvement. More exact counterparts can be found in an interview with Ken Knabb about Kenneth Rexroth:

He had this notion that the poem was going to subvert people little by little. That it was more effective to be subtle, and not just use crude propaganda.

and a report from the British House of Lords:

We consider that the safety issue would be dealt with more effectively by JAR-OPS and not by a Directive which would overlap with existing regulations.

I don't believe that these two examples are ungrammatical, nor do I think that they would be improved stylistically by replacing the conjunctive contrast with a than phrase. The SAT example

The students have discovered that they can address issues more effectively through letter-writing campaigns and not through public demonstrations.

is also clearly not ungrammatical. I guess I agree that the College Board's preferred alternative

The students have discovered that they can address issues more effectively through letter-writing campaigns than through public demonstrations.

is a bit better, but it's still a rather awkward sentence. In any case, the answer No Error (E) seems like a plausible answer to this question as well.

Let me be clear:

  • I support and uphold the norms of standard written English in spelling, punctuation, word usage and grammar.
  • I agree that students should learn these norms and should be tested on this knowledge.
  • I believe that well-defined violations of these norms often occur.
  • I recognize that writing can be culpably awkward or unclear, even when it is fully grammatical, and that students should learn to recognize and correct examples of this.

However, I also believe that linguistic norms should be defined by the actions and judgments of respected members of the community, not the invented regulations of isolated self-appointed experts. It's patently unfair to ask students to identify as errors constructions and usages that are widely used by respected writers and viewed as acceptable by expert analysts.

I therefore have two suggestions for the College Board.

First, create a usage panel like the one that Geoff Nunberg chairs for the American Heritage Dictionary. Don't put Sentence Error questions on the SAT -- or among the practice questions on your web site -- without checking them with your usage panel.

Second, eliminate the "No Error" answer from your grammar and usage questions. Rephrase your instructions as something like:

The following sentences test your ability to recognize grammar and usage errors. Each sentence contains one example of a word choice or a grammatical choice that is often regarded as an error by skilled users of standard American English. Select the one underlined part that must be changed to avoid this perception of error.

Then a student who knows, as I do, that "singular they" is deprecated by a few authorities, but is supported by most informed grammarians, and has often been used by great writers over the centuries, will not be forced to second-guess the ideology of the test designers:

"... well, there's not really any error at all in this sentence; but there is an instance of singular they; so perhaps the testers want me to flag it as an error, in which case I should answer (D); or perhaps they are trying to catch the silly people who incorrectly believe that synesis is always an error, in which case I should answer (E); hmm, how sophisticated and well informed do I think that the designers of this test are?..."

A student who can reason along those lines certainly deserves full credit for this question; but as things are set up, it's a coin toss. If No Error (E) were not an available answer, then the student could reason

"...well, there's no error in this sentence, but there is an instance of singular they, and that must be what the in-duh-viduals who designed this test want me to answer, so OK, (D) it is..."

This would still be testing knowledge of linguistic ideology rather than knowledge of English grammar, but at least it doesn't require the student to calibrate the College Board's precise ideological stance in order to answer "correctly".

[Update: if you haven't had enough, there are other posts on this subject here, here and (at paralyzing length) here.]


Posted by Mark Liberman at 03:26 PM

Air quotes in New York

We're having a nasty cold snap in New York. Plus the snow is piled high and dirty, the days are short, and current events are depressing.

But there is one thing that never fails to cheer me up walking these glum streets, and that is signs written by shop owners under the impression that quotation marks convey emphasis. One of my favorites is a cleaners that advertises its "free pick up and delivery", as if there's something abstract or hypothetical about the service. Or, another shop has Why rush? Drop off your laundry on your way to work, "pick it up on your way back home" — as if that's a song title or some kind of wise old saying. Then there's one I pass every day, where a proud candy store tells us that "when it come to nuts, chocolates and candies, we are the best".

The grammar of that last one shows that most of these shops are run by immigrants, who often have limited knowledge of English, and especially the nuances of English as it is written. What's interesting, though, is how very commonly immigrants make this particular mistake. Part of the reason is that it is an understandable mistake — even a predictable one. After all, quotation sets off something someone says, and it's a short step from setting something off to emphasizing it. For someone with a distant relationship to the printed page — at least in English — it's natural to suppose that quotation marks are highlighters, since in a way, they are.

There's nothing unusual here. For example, it is exactly these kinds of small misinterpretations that changed Old English into Modern English. A thousand years ago, I will go meant that you willed to go, that is, you wanted to go, not that you were going to go. But because what you want is often what ends up happening in the future, people gradually started thinking that I will go meant that I shall go. After a while, so many people were hearing it this way that now, I will go did mean I shall go. The immigrants are making the same kind of leap of logic about written English today.

But fans of books like Lynne Truss's Eats, Shoots & Leaves need not fear. Written language is more resistant to change than spoken language. Plus, these immigrants' children learn traditional writing skills in school. Just as they often don't have their parents' accents, they won't be sending out wedding invitations saying "Your presence is requested."

Which means that using quotation marks as what we might call The New Boldface will just hang around as an underground alternative punctuation. New generations of immigrants will pick it up from older ones.

Or at least I hope so. There's a certain elegance to the quotation marks in this new usage; they can spark up a tired old phrase like Free Pick Up and Delivery. And they're sure better than what big marketers have been doing to fine old logos lately, as another way of highlighting. In one logo after another, the letters are now slanted to the right, so that they look like they're running like the Road Runner. I suppose the idea is to make it look like Denny's is a dynamic experience, or that Sunoco will rock your world. But life goes by fast enough. I want my Burger King Whopper to sit still.

Or, if I want to go healthier, then I could try one shop whose exquisite sign used to entice us to Create "a" Salad. I must admit I never quite understood what they were getting at there, but it did brighten many of my days.

Posted by John McWhorter at 01:26 PM

Not everything that passes

A friend on the network engineering side of things at UC Santa Cruz wrote on Saturday to ask me a linguistic question:

Yesterday the Internet2.edu web site got hacked. In place of their home page was a single line:


It's all fixed now, but I was wondering exactly what their message was. Does this mean anything to you?

Well, Language Log does not offer a universal free translation service. We have so many other duties, covering the entire field of the language sciences — monkeys, gibbons, parrots, eggcorns, snowclones... God, the work, the pressure...

But Mark and I (he happened to be visiting here at the Santa Cruz branch office) were sort of intrigued, so we did take a lingering look at this cryptic message. Clearly Romance. Not Latin, not Spanish. It didn't take long to see that it was Brazilian Portuguese, and "H4ck3rsBr Group" is a Brazilian hackers' group (Brazil's suffix is .br). We are not by any means Portuguese-competent, but we developed the hunch that the quoted sentence might mean something like, "Not everything that passes you signifies that you're stationary", i.e., just because someone goes by you, that doesn't mean you're standing still. What we can't do is provide a context or a deeper interpretation. Is this a quotation? A proverb? Does it suggest something else in the context of the Brazilian computer world? Anyone who knows can send a tip to pullum (the site is ucsc, the domain edu). Sources will of course remain protected by the journalist's code of confidentiality: if you're a Brazilian hacker, we won't tell.

[Added later (January 31 and February 1): OK, we have many responses (thanks to all): Brazilians and others alike agree that the meaning is roughly "Not everything that goes slow means that it's stopped." It's as ill-phrased in Portuguese as this translation is in English. A more fluent presentation of the meaning might be "Just because something's going slow doesn't mean it's stopped." Devaga is not spelled in a standard way: the word is devagar, but Brazilians generally don't pronounce the final r at all, and the hackers have left it off the spelling. Ta is a very informal spelling of esta. One guess at what this highly colloquial slogan is trying to say would be that the hackers' group is telling us that just because they're going slow (they're not very active) that doesn't mean they've stopped altogether. My first guess was actually that they might be making an announcement about Brazil: that it may not be a major player in cyberspace yet, but not everything that goes slow is stationary. But that turns out to be nonsense. Francisco Borges has pointed out to me that Brazil is developing a worldwide reputation in cybercrime: note "Brazil Becomes a Cybercrime Lab", "Brazil Leads Hacker Pack", and older stories like this one and this one. The latter story notes that "As a result, Portuguese has now become the lingua franca of the hacking underground." That is something we should have known here at Language Log Plaza!]

Posted by Geoffrey K. Pullum at 01:13 AM

January 30, 2005

Bait and switch

Idiomatic expressions that originate as descriptions of very specific event types, like bucket brigade, are inclined to get extended semantically, so that they can describe situations lacking some of the historically defining details. Recently I came across a dramatic example of semantic extension, buried in the flap about SpongeBob SquarePants, which hit the papers on 1/20/05. The NYT that day (p. A12) reported the following from Paul Batura, assistant to James C. Dobson at Focus on the Family:

"We see the video [a music video in which cartoon characters, among them SpongeBob SquarePants, teach elementary school children about multiculturalism] as an insidious means by which the organization [the We Are Family Foundation] is manipulating and potentially brainwashing kids. It's a classic bait and switch."

First, a few words about bucket brigade (lifted from my 2002 NWAV talk on "Seeds of Variation and Change"), where the historical developments can be seen as incremental moves away from an original event type that is rich in detail.

The original bucket brigade involved chains of people, buckets of water, and putting out fires. Bucket brigade can still describe such events, but the expression now has extended senses denoting emergency situations of all sorts, not necessarily fires; as a result, buckets of water aren't necessarily involved, either. At this point, the semantic extensions go in at least two different directions. The extended sense reported by Webster's New International 3 refers to chains in emergency situations, with humans not necessarily involved: 'any chain (as of persons) acting to meet an emergency'. On the other hand, the Random House Dictionary of the English Language reports an extended sense referring to human action in an emergency, with chains not necessarily involved, as in its cite: "Seeing the two guests of honor bickering, the rest of the group formed a bucket brigade to calm them."

Back to SpongeBob SquarePants. Most of the flap has concerned Dobson's claim that the music video promotes homosexuality. But according to Nile Rodgers, the founder of the foundation, nothing in the video or its accompanying materials refers to sexual orientation, nor does the video mention the "tolerance pledge" (borrowed from the Southern Poverty Law Center) that appears on the foundation's web site; the pledge counsels tolerance for "sexual identity", among a variety of other things.

The Focus on the Family position seems to be that the video is "pro-homosexual" (Dobson's word) because it will lead its young viewers to the website and so to a mention of respect for "sexual identity" (not further explained), a mention that transparently (to Dobson's way of thinking) furthers the homosexual agenda; or perhaps that counseling tolerance in general terms is covert advocacy of homosexuality and therefore reprehensible; or perhaps that the very involvement of SpongeBob SquarePants, who some see as a character of suspect sexuality (I'm not making this up, you know), contaminates the whole video. But these dubious lines of reasoning aren't what I'm interested in here. My interest is in the expression bait and switch as applied to the association between the video and the "homosexual agenda". This is a huge extension of the meaning of the expression.

Your classic bait and switch is, in the words of the American Heritage Dictionary 4, "a sales tactic in which a bargain-priced item is used to attract customers who are then encouraged to purchase a more expensive similar item." Batura's use preserves the component of deception, the assertion that one thing is offered (but not specifically for sale) and another provided (but in addition to, rather than instead of, the first), and the presupposition that the thing provided is in some way unsatisfactory (but morally offensive rather than expensive). You can get from AHD4 to Batura, but it's a long trip, and I haven't found any instances of the steps along the way.

My guess is that Batura settled on "bait and switch" as a vivid alternative to "deception" or "hidden agenda", without thinking through the details.

Addendum: When I first posted on this topic to the American Dialect Society mailing list, on 1/23/05, Larry Horn replied to the list, that same day:

That is curious. In recent political contexts, I've more often come across "bait and switch" in op-ed pieces from the left, or at least from those critical of the current administration and its policies. Economist Paul Krugman in particular seems very fond of this turn, as applied especially (but not only) to the handling of Social Security and tax cuts. Nexis turns up 11 hits of "bait and switch"... from Krugman op-ed columns or, in one case, a piece by him for the Magazine... [Larry supplies a few examples here.] But clearly these are all much more conventional applications of the figure than the one involving Messrs. SpongeBob and Batura.
I wonder if there are enough critics of Bush et al. who follow Krugman in this accusation to have inspired Batura to apply the same term for their own ends, whether or not the circumstances justify it.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:18 PM

If this is Tuesday it must be minimalism

Since Mark is doubtless too diffident to tout his own mots, I can't resist quoting one from the talk he gave at Stanford on Friday, under the provocative title "A series of unfortunate events: the past 150 years of linguistics." One problem with modern linguistics, he said, is that our results have excessively short half-lives. It isn't so much that the results we get turn out not to be true ten years later -- they're not even expressible.

Posted by Geoff Nunberg at 02:02 AM

"Nation split on Bush as uniter or divider"

That was the puckish headline over the cnn.com report of a CNN/USA Today/Gallup poll released recently, which showed that 49 percent of the respondents believe Bush is a "uniter" while another 49 percent said he was a "divider," and 2 percent had no opinion.

Well, okay, I can see Bush supporters quarreling with the "divider" part (as in, it's not his fault -- what can you do with a bunch of leftist soreheads?). But the agentive -er suffix in uniter entails accomplishment, not merely effort -- otherwise I could describe the San Francisco 49ers as winners. And for whatever reason, a "uniter" Bush clearly ain't (I mean, unless people mean by that, "well, hey, he's united me").

You'd figure Bush's supporters could live with that, particularly since the point really isn't subject to partisan disagreement. Why not just bite the bullet and say, as controversial presidents always do, that choosing the right policies takes precedence over choosing the popular ones?

But that's what polarization comes down to these days. Never mind agreeing to disagree; we can't even agree that we disagree.

Posted by Geoff Nunberg at 01:48 AM

Contaminated identities

Bill Poser complains, reasonably enough, about the attention devoted to Prince Harry in a Nazi uniform, especially in a world in which so many vastly more serious things are going wrong:

The critics are assuming that wearing a Nazi uniform is a non-verbal indirect speech act that demonstrates support for the Nazis. Not one comment on this topic that I have seen mentions the basis for this belief. Surely it is false. People routinely wear costumes representing figures of whom they do not approve. When a couple come to a party as Bonnie and Clyde, does anyone think that they approve of bank robbery? When someone dresses as a pirate, is that taken to show approval of piracy? Of course not. Wearing a costume does not indicate approval.

But not all costumes -- or assumed identities -- are equal.

I grant that none of this is rational, none of this makes sense in purely logical terms. But there's a powerful (irrational) cultural basis for the horror here: certain identities are highly contaminating culturally, to the extent that playing with them lays the performers open to attributions of actually having the portrayed identities. [Insert some appropriate reference to Erving Goffman on stigma here.] Nazis are over the line; pirates, bank robbers, even serial killers (like Jack the Ripper) are not.

Homosexuals are also over the line. Portraying, or appearing as, a lesbian or gay man puts the performer at risk of being assumed to be homosexual, while portraying, or appearing as, a vampire or impassive murderer or wife-beater or alcoholic or whatever carries no such career risk. The effect is famous. Being gay is a contaminating identity. (Some actors manage to escape it, but nobody denies its power.)

I didn't make this stuff up. I'm just reporting on it.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:23 AM

January 29, 2005

Flag waving

As usual, I agree with almost everything Arnold Zwicky says, but I think explanatory adequacy is a suggestive example of a quirky linguistics meme for precisely the reason he says it is a bad example: as he puts it, someone who uses the expression might just as well be waving flags that say "CHOMSKY" and "MIT". Precisely! Using the term, I'm suggesting, has more to do with saying who we are than saying what we mean.

I'll buy, in humble pie contradistinction to what I said previously, that explanatory adequacy is a technical term. Quite apart from Chomsky's stated views on the meaning of explanatory, we see at least an ostensive definition in the form of explicit contrasts between theories which are explanatorily adequate, and those which are "merely descriptively adequate" as I recall Chomsky put it. (And the word merely is part of this meme.)

How productive is explanatory adequacy? Is it the case that something has explanatory adequacy if and only if it is an explanation which is adequate? If a linguist uses the phrase adequate explanation, is this then a technical term? What differs, I submit, is not the semantics, but the pragmatics. A linguist who uses the phrase adequate explanation to describe a theory would presumably deny that the theory was merely descriptively adequate. So the meanings of adequate explanation and explanatory adequacy must be closely related for linguists in general. Both phrases could be used with Chomsky's original acquisition-centered perspective in mind, although my guess is that in practice both are used without this slant: certainly the practice from Chomsky on has been to use them without heavy use of actual acquisition data. The main difference between a linguist that uses explanatory adequacy and one that uses adequate explanation is typically that the first one waves flags that the second one does not — at least, not as vigorously.

Logically, there are at least two ways that a term like explanatory adequacy could come to be widespread in the field. First, some users of a term they have seen before could be repeating it rather lazily, a boilerplate they hammer into a paper to show whether they approve of a given theory. It would then be snowclonic, to abuse the technical vocabulary of this blog. Alternatively, they could be repeating it because it really captures a feeling they take to be appropriate. And the question then is, do they have this feeling because they want to assert that some x is P, where x is a theory and P is the set of things that satisfy explanatory adequacy? Or do they have the feeling because they know that decorating their thoughts with a patina of Chomskyana will help establish their credentials and their sub-cultural affiliation? And are those linguists (I could name some) who fail to so decorate their papers failing to do so because the concepts do not refer appropriately, or because at some level they want to express their disaffectedness?

[Update: In my original post, I should have taken into account that hundreds of occurrences of explanatory adequacy are bibliographic citations of Chomsky (2001) Beyond Explanatory Adequacy. This does not affect the argument in a big way, since using Google I can put an upper limit on the number of these citations at less than 15% of the total. But perhaps it is time for me to move beyond explanatory adequacy, and figure out a better way to illustrate my pop-sociolinguistic Tensor-derived point, by finding more linguistic memes that don't fit neatly into the category of technical terms. Even better would be if someone does it for me! Suggestions sent to dib AT stanford DOT edu are welcome. So far the best I came up with is the word counterexemplify, which is used almost exclusively by linguists, although with a small minority of philosophical uses, and gets over 100 Google hits in various morphological forms. By comparison, counterexample gets hundreds of thousands of hits, mostly non-linguistic. So which linguist was it that first turned counterexample into a verb (or added a prefix to exemplify)?]

Posted by David Beaver at 03:24 PM

Adequately explaining explanatory adequacy

David Beaver has just taken a look at semi-technical terminology that is (largely) peculiar to linguistics, focusing on the expression explanatory adequacy and maintaining:

Maybe the compound explanatory adequacy could be regarded as a technical term in linguistics. But I had not thought of it that way, and nor do I want to now. I personally had assumed that the meaning is derived compositionally from the meanings of explanatory and adequacy, and that neither of these were technical terms in linguistics.

Not the best of examples, since this expression has been associated with Noam Chomsky ever since he made it famous in the early 60s, in explicit contrast to descriptive adequacy. He has always treated it as a technical term, though it might just possibly be semantically compositional for him, given his views on what counts as an explanation (which would make explanation and explanatory technical terms). In any case, someone who uses the expression might just as well be waving flags that say "CHOMSKY" and "MIT".

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:08 PM

Stark raven mad

Rachael Briggs' eggcorn has two yolks with different syllabifications.

What can I say? Shit happens. Not always intentionally, I surmise:

Or, she has children that have caused the gray hair and despair from them trying her patience and she is stark raven mad.

Believe me when I say, I am use to crazy people. In fact, I really don't mind if you are stark raven mad.

Zeus. I'm serious. If anything happened to you, the count would be stark raven mad.

I can remember having panicky attacks at the tender age of 5, and they scarred the heck out of me. My worse ones were of severe thunderstorms, especially after getting caught out at the lake during a tornado in Kansas. I still get panicky when we have severe weather no matter where I live, but now I clean the house like a stark raven mad woman. I think I do this to try and keep the panic attack under control.

[PS. "scarred the heck out of me"? If you want to find eggcorns, go look in Kansas during tornado season.]
Posted by David Beaver at 01:38 PM

Star Craving Mad

Rachael Briggs sent in a lovely example of that rare subspecies the resyllabification eggcorn.

Hi. I'm one of Language Log's anonymous gentle readers, and I thought you might be interested in the following:

A friend of mine (one of those self-proclaimed grammar types) claims to have discovered this sentence, which uses the phrase "star craving mad" in place of "stark, raving mad", on some sort of online discussion forum:

"people would ask me whats her name i said Garren they would look at me like i was star craving mad like why did you name her such a boyish name"

Since I'm supposed to be writing papers at the moment, I thought I'd google the phrase "'star craving mad'" and see what came up. Unfortunately, there's a novel by that title by an author named Miller, and most of the hits were ads for or reviews of the novel. "'star craving mad' -book -books -miller" yielded 59 results. Most were either stray references to the book or deliberate puns, but a few of them looked like real eggcorns.

"'This is my social life,' said Workentine [a food bank volunteer]. 'It gives me a reason to get up in the morning. If I just sat at home looking at the four walls, I’d go star-craving mad.'"

"The snoring was enough to drive the sanest person star craving mad."

"I'll be crying and in horrible pain and she'll toss a couple tylenol at me and rush off to appease my two younger sisters. It drives me star craving mad."

The strings "star craving crazy" and "star craving nuts" turned up no hits, so I guess the eggcorn hasn't yet caught on in any big way.

Keep up the blogging! Your work affords me a priceless opportunity to procrastinate enjoyably. Okay, back to writing papers.

Always happy to be of service. And thanks for the fruits of your investigation...

[Update: in addition to David Beaver's observation about the attested alternative eggcorn stark raven mad, Q Pheevr points out that there is another available variation, star craven mad, which is "unattested qua eggcorn", being only found so far as a witting (if not witty) pun.]


Posted by Mark Liberman at 11:01 AM

What is explanatory adequacy?

There's a nice post at Tenser said the Tensor on distinctive lexical features of the linguistic subculture. The point is that the common language of a scientific subculture is not just a simple sum [the standard language of scholarly writing]+[an explicitly or even ostensively defined technical vocabulary]. In addition to the clearly technical vocabulary there is likely a huge grey area of semi-technical or non-technical quirky turns of phrase which, having been infiltrated into our community by a single linguist, pass virus-like from one paper to the next.

Tensor's examples are X motivates Y (with the meaning X provides motivation to accept Y), and standardly assumed. I would guess that it is standardly assumed by linguists that standardly assumed is not at all particular to linguistics, but would come up whenever there was something that was standardly assumed. Apparently not. Tensor points out that googling "standardly assumed" produces about 800 ghits, whereas "standardly assumed" -linguist -linguistic -linguistics -syntactic -phonology -phonological -morphology -morphological -grammar -grammatical -phrasal -clausal -indicative -subjunctive -raising produces only 67.
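For readers who want to replay Tensor's arithmetic, here is a minimal sketch of the method. The two counts (about 800 and 67) are the ones reported above; the function names are made up for illustration, and nothing here actually queries Google.

```python
# Exclusion terms copied from Tensor's filtered query above.
EXCLUDED = [
    "linguist", "linguistic", "linguistics", "syntactic", "phonology",
    "phonological", "morphology", "morphological", "grammar", "grammatical",
    "phrasal", "clausal", "indicative", "subjunctive", "raising",
]

def build_query(phrase, excluded):
    """Quote the phrase and append one -term exclusion per filtered word."""
    return " ".join([f'"{phrase}"'] + [f"-{term}" for term in excluded])

def non_linguistic_share(total_hits, filtered_hits):
    """Fraction of all hits that survive the linguistics filter."""
    return filtered_hits / total_hits

query = build_query("standardly assumed", EXCLUDED)
share = non_linguistic_share(800, 67)  # counts as reported by Tensor

print(query)
print(f"{share:.1%} of the hits survive the filter")
```

On those counts, fewer than one hit in ten survives the filter, which is the sense in which the phrase looks peculiar to linguistics. The filter is crude, of course: a linguistics page that happens to avoid all fifteen words still counts as non-linguistic.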

This is really suggestive: just how large are the differences in frequency profiles of the languages (midiolects?) of different scientific subcultures, and how rapidly do those frequency profiles change? Can we observe a Kuhnian paradigm shift in action by looking at frequency data alone? And if we can observe changes in this way, then to what extent would the changes reflect new understanding, and to what extent a desire to act different?

The first thing I could think of to test Tensor's method is the Chomskyan phrase explanatory adequacy, which I compared with adequate explanation to provide a control on the method, and also on those weird and whacky Google counts we have been worrying about lately. Here are the searches and their Google frequencies:

"explanatory adequacy"
"explanatory adequacy" (non-linguistic)
"adequate explanation"
"adequate explanation" (non-linguistic)

The phrase explanatory adequacy is usually not surrounded by quote marks. For explanatory adequacy is something that we linguists like to present as an obvious shared goal of scientific investigation, and we generally assume that any educated reader will know just what we mean. Perhaps they will. But if that reader is not a linguist, the table above shows that the chances are that the reader will never have seen the component words organized in that way, since at least 85% of uses of the phrase (more if we look at the actual search results) are in the linguistic literature.

Maybe the compound explanatory adequacy could be regarded as a technical term in linguistics. But I had not thought of it that way, and nor do I want to now. I personally had assumed that the meaning is derived compositionally from the meanings of explanatory and adequacy, and that neither of these were technical terms in linguistics. I had thought that anyone who knows what an adequate explanation is also knows what explanatory adequacy is. Yet, at the text level, adequate explanation is distributionally quite unlike explanatory adequacy, and is clearly not peculiar to linguistics. The vast majority of hits for adequate explanation appear to be outside of linguistics.

When we use explanatory adequacy, we use words that an educated person would know, and the semantics is intended to be clear. For that reason, I've said I do not want to call it a technical term. So what is it? Well, what we are really doing when we use it is conjuring up a web of associations. Blah blah blah explanatory adequacy blah blah standardly assumed blah blah, I say, and you might almost think I have an MIT PhD. It's not so much a technical term as a term of art, or artifice, designed to tell you who I am. Or wannabe. A lexical meme, yes, but one I use with an intention to say something about who I am and what enterprise I am engaged in.

What is explanatory adequacy? I regret to have to tell you that explanatory adequacy is now part of my identity.

Posted by David Beaver at 03:24 AM

January 28, 2005

Grammar is bad for kids

"Teaching English grammar in schools is a waste of time because it does not improve writing skills, according to a Government-funded study published yesterday. The findings have led senior academics to urge Ruth Kelly, the Education Secretary, to remove the compulsory teaching of parts of speech and syntax from the national curriculum."

So says an article in The Telegraph of Jan 22 2005 (via Onze Taal). One might, of course, take issue with several aspects of the position reported.
  1. The finding, from a study by Richard Andrews of York University, is not based upon a controlled study. Rather, it is based on historical data tracking evaluations of writing across a period in which the UK law changed as regards curricular requirements. I know of no controlled study of whether teaching grammar helps writing, but email me (dib AT stanford DOT edu) if you know better. Furthermore, any such assessment must depend on an independent characterization of what constitutes good writing.
  2. Teachers of writing in the UK do not, to my knowledge, standardly receive any systematic instruction in linguistics. I am confident that in all my years of education in the UK, I never had an English teacher who knew what linguistics was. So any grammar instruction I received probably would have been detrimental to my writing, had I paid any attention. More generally, one could take the same sort of study as the Telegraph reports on, and conclude that grammar teaching is at present not adequate to have an effect on writing. In that case, there should be more emphasis on grammar, especially in the training of teachers, not less.
  3. Understanding the nature of language, and learning how to think about language for yourself, is independently valuable: it is an important aspect of human culture in its own right, and if used properly is relevant to other areas, such as general critical thinking skills and second language learning. Teaching how language works does not need to be justified solely by its effects on writing skills.
The way grammar has traditionally been taught in the UK is as dull as dishwater. (By a Google count, dishwater and dish water are together 5 times as dull as ditchwater, or ditch water, but I'm not sure which was the earlier idiom.) Perhaps the problem is that much grammar teaching is uninsightful taxonomizing, rote teaching of parts of speech and syntax? Personal opinion: if anything is going to help kids write, it is not a bunch of rules and labels, which will just cramp kids' style and give them premature writer's block. What is needed is a way to help kids think about the structure of language for themselves, a basic scaffolding, and a way to jump effortlessly from one structure to the next.

Posted by David Beaver at 02:18 PM

Defecated to eggcorn fans everywhere

The numbers are too small to award this one eggcorn status. And it's also not very eggcorny since it's not clear that the mistakes are a matter of reanalysis. But what it lacks in frequency and linguistic sophistication, this double malapropism makes up for in appeal to that small but important segment of the Language Log audience that shares my puerile, not to say scatological, sense of humor.

Many of the Google hits are for intentional defecate/dedicate and defecate/defect replacements, but genuine unintentional productions do seem to occur:

Nowadays tourists visiting Hoa Lu could only see two temples some 500m apart from each other, standing in the ground of the ancient palaces: One was defecated to King Dinh Tien Hoang and the other to King Le Dai Hanh.
(Image of temple)

Aerobatic ship defecated to a great supplier of scale sailplanes Endless Mountain Models

(pic of ASH26E model plane here)

But Saddam reminds me of the little boy who cried wolf too often. Recall that once Mustafa (his son-in-law who was head of the secret police) defecated to Jordan and reported on the Nuclear Bomb program that Saddam had up to then successfully hidden from the UN. Hmm, what we didn't know...

[Frank Delaney] remembers the night his father came home "hugely amused" because a local man had asked him why he thought Burgess and Maclean had defecated to the Russians.

[Update: gee, I'm getting confused by our own terminology now. First time I posted this I wrote snowclone when I meant eggcorn. Snowcorn? Presumably, a historically incorrect reanalysis of a standard idiom-like template. Or vice versa.]

Posted by David Beaver at 12:36 PM

A linguistic contribution to American politics

According to an article by Daniel Terdiman in yesterday's NYT:

Joshua Tauberer isn't a stereotypical techie with a degree in computer science or engineering. Yet Mr. Tauberer, a 22-year-old doctoral student in linguistics at the University of Pennsylvania, is the brains behind GovTrack (www.govtrack.us), a site for up-to-the-minute information about Congress.

GovTrack differentiates itself from other sites devoted to Congress both by being free and by being fresh: it sends users e-mail updates anytime there's activity on legislation they want to monitor.

GovTrack lets users track activity of specific legislators. It can also send updates via RSS, or Real Simple Syndication. The site collects information from Thomas (thomas.loc.gov), the Library of Congress's legislation-tracking site, and the official sites of the House of Representatives and the Senate.

The Times article gives links for GovTrack and Thomas, but not for Joshua -- to save you a trip to Google, here he is.

[Link by email from John Lawler]

Posted by Mark Liberman at 11:05 AM

January 27, 2005

Agreement with nearest always bad?

In "Everything is correct" versus "Nothing is relevant", Geoff Pullum takes on the claim that if sentences like the following occur (and they do), then they're part of the language, and it's contradictory for those who style themselves as descriptive linguists to say otherwise (as Geoff did in an earlier posting):

(1) Why do some teachers, parents and religious leaders feel that celebrating their religious observances in home and church are inadequate and deem it necessary to bring those practices into the public schools?

I agree with Geoff that it's very likely that (1) should be dismissed as an example of English. It's almost surely an inadvertent "agreement with the nearest" (AWN) -- instances of which are moderately common, especially in speech. But it turns out that for many speakers, there is a very small island of AWN cases that are indeed in their language and for which the "technically correct" agreement is dreadful.

Geoff's position is essentially that mistakes happen; people say and write things that they didn't intend and will disavow if they're given a chance. This is not even a slightly daring proposal. The rich literature on errors in language depends on our being able to make a distinction between inadvertent errors, on the one hand, and, on the other, differences between the "correctness conditions" ("rules of grammar", in the usual terminology, though this expression can be misleading) on the language of different people -- between error and mere variation. In many cases, the distinction is easy to make, but, as Geoff points out, sometimes the linguists might get it wrong:

One could imagine that there might be people who actually have different correctness conditions, so that the quoted sentence was grammatical for them. There could be people for whom tensed verbs agree with the nearest noun phrase to the left, for example.

There are ways to investigate these things, as Geoff goes on to say. And most of the time, when you investigate AWN examples, you discover that people just lost track of where they were in the production of a sentence and used a recent NP (instead of the appropriate, but more distant, NP) to determine agreement on the verb. This happens so often that we can usually feel pretty sure that examples like the following -- which are ridiculously easy to collect -- arise in similar fashion. Modifiers following the head of a NP are the very devil.

(2) Let's see which one of the two of you are next. (dialogue on a Beverly Hills 90210 episode, heard in rerun 6/29/04)
(3) On handout: "... discuss challenges each approach still faces...". But said: "... the challenges each of them still face..." (speaker in LSA presentation, Oakland, 1/7/05)
(4) ... what the phonetic character of the rises are... (graduate student in conversation, 1/18/05)
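The difference between the two agreement strategies can be put in a toy sketch. This is nobody's actual model of production: it assumes the subject has already been reduced to a list of nouns in surface order, takes the first noun to be the head (a simplification that happens to work for examples like (4)), and uses names invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Noun:
    word: str
    plural: bool

def head_agreement(subject_nouns):
    """Standard rule: the verb takes its number from the head noun
    (assumed here to be the first noun in the subject)."""
    return subject_nouns[0].plural

def nearest_agreement(subject_nouns):
    """AWN: the verb takes its number from the noun closest to it,
    i.e. the last noun of a subject that precedes the verb."""
    return subject_nouns[-1].plural

def be_form(plural):
    """Present-tense form of 'be' for the given number."""
    return "are" if plural else "is"

# Example (4): "what the phonetic character of the rises are"
subject = [Noun("character", False), Noun("rises", True)]

print("head agreement:   ", be_form(head_agreement(subject)))
print("nearest agreement:", be_form(nearest_agreement(subject)))
```

The two functions disagree only when a post-head modifier ends in a noun whose number differs from the head's, which is exactly the configuration in (2)-(4).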

Especially troublesome are sentences where the head refers to something other than an individual, either because it's a mass noun (industry in (5)) or because it's a universal (everything in (6) and (7)), so that plural semantics is in the air, and then a post-head modifier contains a syntactically plural NP:

(5) Mr. Geffen said that DreamWorks had been judged unfairly in an era where the entertainment industry, particularly the music, television and Internet businesses, have gone through tremendous upheavals. ("A Monster Hit but No Happily Ever Afters", by Laura M. Holson and Sharon Waxman, NYT, 5/17/04, p. C1)
(6) Everything from doorknobs to live alligators are for sale. (reporter from Congo, NPR's All Things Considered, 6/9/04)
(7) Normally in my lab, everything that I write, including academic emails, are proofread by someone before they are sent out. (e-mail to AMZ, 3/22/04, in mail checked by staff)

So all would seem (relatively) clear, until I came across item (8), which happens to have the subject, in this case a coordinate subject, after the verb:

(8) Going to his house was what I lived for. There were liquor, music, and a strong desire for my body. (J. L. King, On the Down Low (Broadway Books, 2004), p. 33)

As I pointed out on the American Dialect Society mailing list on 12/28/04, (8) has the "correct" (plural) agreement with expletive there, but it still sounds weird to me; I'd much prefer (8').

(8') Going to his house was what I lived for. There was liquor, music, and a strong desire for my body.

A few respondents stuck with the technically "correct" (8) -- and I am not denying their judgments -- but many agreed with me that (8) was awful, (8') much better (maybe even simply the "correct" version), and (8") straightforwardly acceptable:

(8") Going to his house was what I lived for. There were drinks, music, and a strong desire for my body.

This sure looks like AWN, but oriented towards a following subject rather than a preceding one. And in fact, as I pointed out on 1/10/05 on ADS-L, the estimable Merriam-Webster's Dictionary of English Usage had already figured this stuff out. In the "there is, there are" entry, we find scholarship stretching back nearly a hundred years:

...when a compound subject follows the verb and the first element is singular, we find mixed usage--the verb may either be singular or plural. Jespersen... explains the singular verb as a case of attraction of the verb to the first subject, and illustrates it... from Shakespeare... Perrin & Ebbitt 1972 also suggests that many writers feel the plural verb is awkward before a singular noun, and Bryant 1962 cites studies that show the singular verb is much more common in standard English.

What we have here is a tiny, and very specific, AWN island -- grammaticalized, to be sure -- in the midst of a general English requirement for straightforward subject-verb agreement. Hey, these things happen.
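A rough way to picture the island, as a sketch only: the function below takes the number of each conjunct in the subject that follows there and returns the past-tense forms of be that the quoted usage entry would lead us to expect. The function name is invented for illustration, and the plural-first branch is my own assumption, since the entry as quoted speaks only to the case where the first element is singular.

```python
def there_be_options(conjunct_is_plural):
    """Expected past-tense forms of 'be' after expletive 'there'.

    conjunct_is_plural: one boolean per conjunct of the following
    subject, in surface order (True = plural).
    """
    if len(conjunct_is_plural) == 1:
        # Simple subject: ordinary subject-verb agreement.
        return {"were"} if conjunct_is_plural[0] else {"was"}
    if not conjunct_is_plural[0]:
        # Compound subject, singular first element: mixed usage,
        # as described in the quoted usage entry.
        return {"was", "were"}
    # Compound subject, plural first element: plural only (assumption).
    return {"were"}

# (8)/(8'): liquor, music, and a strong desire
print(there_be_options([False, False, False]))
# (8"): drinks, music, and a strong desire
print(there_be_options([True, False, False]))
```

This captures only which forms turn up, not which one any given speaker prefers; the respondents who found (8) awful are a reminder that the "mixed usage" case is lopsided for many people.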

To add a little spice to the whole thing, I should point out that my guess about (8) is that King originally wrote (8') and it was "fixed" by a copy-editor. As a little bit of evidence in favor of this idea, I note that (as I pointed out on 12/31/04 on ADS-L) in the very same book King has at least one singular agreement where many of us would have the plural:

(9) The need [for a down-low connection] and strong desire to make that connection overrides all common sense. (p. 159)

This looks like the very common vernacular pattern (spread across geographical regions) for existential-there sentences to have singular verbs, no matter what. Or possibly it's AWN (with connection as the relevant determining noun). In any case, my guess is that (9) is just something that escaped the editor's red pencil. Hard to believe that someone who wrote (9) would have gone for (8) rather than (8'). Of course, anyone who wants to do fieldwork with J. L. King can go at it.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:34 PM

Publishers are good; really!

Publishers are nice, honest, friendly, capable, and valuable people, and all of us who work with them to produce published works find that our lives are enriched by their good sense, kindness, enterprise, intelligence, and love. I want that clearly understood.

Indeed, ultimately I would very much like to reach the nirvana point of actually believing it myself. I certainly try very diligently. But people keep sending me links to one specific story of vile, utter evil, which doesn't help. And then there are the many sins of those who mess with our prose after the writing stage -- yes, including those copy editors who [change to whom? --Ed.] publishers employ to change things that never needed to be changed. In one recent article of mine I found that every though had been changed to although (a real baffler, that), and since was regularly changed to because; and then after the final proofs, right there in the printed book as I finally received it, several new garblings were introduced by the printers, a couple of sentences lost their initial capitals, and all the arrows in formulae turned into the figure 6. Yet that was a luxuriously trouble-free experience compared to what happened to a colleague at another institution after he delivered the typescript of an accepted book to his publisher.

The book was delivered in Dec 2002, both in hard copy and electronically. (We do that these days, we authors, so that The Process Of Production Will Be Trouble Free. Ha! Ha! Read on. Unless you're an author, in which case don't, it will tie your stomach in knots.) The author received acknowledgment of both the typescript and the CD-ROM. The graphics for the volume had been professionally drawn at the author's expense, and were delivered separately, both on paper and in CD-ROM form, in March 2003. Again the publisher acknowledged receipt.

Then in summer 2003 the publishers reported that their printer was complaining: the CD-ROM did not have the text, and where was it, please? It eventually emerged that the publishers had deleted their only copy of the text of the book, without realizing that they had done so, and without keeping any records. Luckily the author was able to resupply.

Months passed, and then the publishers suddenly got in touch again, to ask what had happened about the plan to supply professionally-drawn graphics. They had lost that material too, CD-ROM and all. This time the author did not have other copies of the material. But luckily the graphic artist was a personal friend and supplied a new copy without extra charge.

Many months passed with no apparent movement towards getting the book out. Then in the fall of 2003 the publishers once more got in touch out of the blue to announce that proofs would arrive on 20 December. This was a terrible date for the author, the beginning of a heavy period of teaching and administration after a long period of relative leisure, so he asked if they could possibly make it a month earlier. They said no, so he made strenuous efforts to make a space in late December for work on this long and complex book, over 500 large-format, small-type pages. The time was reserved, but the proofs didn't come. December came and went with nothing from the publisher at all. When the author got in touch they said they had changed their schedule and the proofs would arrive in mid January. It was actually early February when they came. And that was when the story started to get bad (yes, up to now was the good bit).

The publisher's editor had been rather snippy about the fact that it was not permissible to make changes in the text, only to correct what were clearly typos. But when the proofs arrived the author discovered to his astonishment that there had been a copy-editing process that he had not at any point been told about. And it had made fairly significant changes to some aspects of the text. In certain cases they were more than just significant. One whole chapter was about the weird and wonderful distortions of normal English orthographic and stylistic conventions devised by kids using Internet Relay Chat. With an idiocy that may seem almost incredible, the many examples of this had been carefully copy-edited to make them conform to the usual conventions of printed academic prose, ruining their whole point.

(I say this may seem incredible, but I have heard very similar stories elsewhere: for example, an anthropological monograph by a UCSC colleague that was loaded with transcripts of spoken testimony by uneducated peasants, on which full academic English correction of all the language in the transcripts had been perpetrated by a copy editor whose work had to be undone in its entirety at the proof stage.)

When the author had originally sent in the typescript, he had also sent notes on various problematic aspects, including the fact that there were many quoted extracts from computer files which needed to be in a fixed-width, typewriter-like font. He had even mentioned this in the original proposal, remembering that with an earlier book the same publishers had ignored this problem (though they did fix it when it was pointed out). However, these proofs again ignored the fixed-width/proportional-type distinction in the typescript, and set the computer material in the ordinary book face, adding many distinctions that are meaningless in terms of ASCII characters, such as a contrast between wide and narrow angle brackets.

Then the author noticed that the graphics in the proofs were in many cases different from and much inferior to the ones he had paid to have produced, and in some cases contained plain errors. When he asked what that was about, he was told that the typesetter needed the graphic files to conform to some special technical specifications that had never been mentioned earlier, and since the ones he had supplied didn't comply, the printer had simply improvised replacements...

And so it went on, and on. I cannot reveal the name of the publisher, still less the long-suffering author, unless supplied with an enjoyable dinner and a full bottle of a good zinfandel.

But just remember, if sometimes we academics seem to be a bit testy about our experiences during the copy-editing and proofing processes for our articles and books, there are reasons.

[Added early morning, January 28: Ah, but now, in the encouraging light of the very next dawn after posting this I see I have email from England, and Kate, our lovely commissioning editor at Cambridge University Press, has received the first copy of the new textbook by Huddleston and me, A Student's Introduction to English Grammar, and copies are being mailed to us today, and because the first adopter actually wants to use it in a course that has already started, copies are being efficiently air-freighted direct to him in San Diego. It's all going like a dream. And I just know that when I see my copy it will not spell my name as "Pullman" on the cover. Because publishers really are good, they really are...]

Posted by Geoffrey K. Pullum at 11:24 PM

Senses of freedom

Orlando Patterson's Op-Ed in yesterday's New York Times shows the limitations of restricting articles to a bare 4000 characters (Patterson's piece was of that length exactly, to the letter, by my Unix count of the text). It's fine as a standard piece of polemic against President Bush on foreign policy, but it contains none of the kind of analysis of the word freedom that one would expect from knowing Patterson's scholarly work. On an NPR call-in program last night I heard Patterson give a hint of this. Freedom can be several things: it can be (1) that which ensures no one has power over you (simply not being a slave); it can be (2) that which enables you to take any action you damn well choose (having personal rights of action); and it can be (3) that which permits your participation in the political affairs of your society (being allowed to vote in fair elections for the government that you prefer). The three can conflict with each other: other people's freedom of type (1) to not be slaves deprives me of the type (2) freedom to keep slaves if I choose to. My type (2) rights to smoke infringe your type (1) rights to be free from secondhand smoke, and your type (3) rights to elect a government that then makes my smoking illegal deprive me of some of my type (2) rights. Your type (2) rights to own a company so huge it can buy elections and control governments may in effect deprive me of my type (3) rights. By floating between these three senses of freedom one can make some very confusing arguments. Which freedom is it that terrorists hate, for example? Since President Bush has been making such copious use of the word freedom of late, but never makes clear which sense he intends, it is generally very unclear what he is actually saying. I was hoping to find a clearer presentation of the linguistic point in the Times article, but I was disappointed. There really isn't a trace of it. On the cutting room floor, perhaps.

Posted by Geoffrey K. Pullum at 03:05 PM

January 26, 2005

"Everything is correct" versus "nothing is relevant"

On January 23 a user identified as Zink made some comments on ceejbot's blog about the Language Log post Nearly all strings of words are ungrammatical. They struck me as really interesting:

There's a funny bit in there where they try to at once claim to be "descriptivist, not prescriptivist" while at the same time decrying the word "are" in

Why do some teachers, parents and religious leaders feel that celebrating their religious observances in home and church are inadequate and deem it necessary to bring those practices into the public schools?

Sorry kids, you can't be an apple and an orange, and if you're a descriptivist, and someone honestly makes a sentence, that's an honest sentence in the language that actually is.

By "they" Zink means the Language Log staff (the post was actually mine; none of my colleagues need take responsibility for the views expressed here). What's so interesting is that it is quite clear Zink cannot see any possibility of a position other than two extremes: on the left, that all honest efforts at uttering sentences are ipso facto correct; and on the right, that rules of grammar have an authority that derives from something independent of what any users of the language actually do.

But there had better be a third position, because these two extreme ones are both utterly insane.

I actually devoted my presentation at the December 2004 Modern Language Association meeting to a detailed attempt at getting the relevant distinctions straight, after thinking them through with a great deal of help from a philosopher of linguistics, Barbara Scholz. The concepts are by no means easy to get a grip on. But let's make a start.

First, I didn't "decry" the form are in the quoted example (from a letter published in the Philadelphia Inquirer). (Decrying is strong disapproval, open condemnation with intent to discredit; check your Webster.) It needs no strong public condemnation; it doesn't offend me. I merely said it was wrongly inflected. And I explained in painstaking detail why it couldn't satisfy the normal principles of English. Now, what are these things I'm calling the normal principles? Where do they come from?

Barbara Scholz and I have taken to using the term correctness conditions for whatever are the actual conditions on your expressions that make them the expressions of your language — and likewise for anyone else's language. If you typically say I ain't got no hammer to explain that you don't have a hammer, then the correctness conditions for your dialect probably include a condition classifying ain't as a negative auxiliary, and a condition specifying that indefinite noun phrases in negated clauses take negative determiners, and a condition specifying that the subject precedes the predicate, and so on. The expressions of your language are the ones that comply with all the correctness conditions that are the relevant ones for you.

Which conditions are the relevant ones for you is an empirical question. Descriptive linguists try to lay out a statement of what the conditions are for particular languages. And it is very important to note that the linguist can go wrong. A linguist can make a mistake in formulating correctness conditions. How would anyone know? Through a back and forth comparison between what the condition statements entail and what patterns are regularly observed in the use of the language by qualified speakers under conditions when they can be taken to be using their language without many errors (e.g., when they are sober, not too tired, not suffering from brain damage, have had a chance to review and edit what they said or wrote, etc.).

Sometimes, though, one can formulate the relevant correctness condition exactly right, and then observe a sentence in the Philadelphia Inquirer that does not comply with it. This is because people do make mistakes in their own language, and some mistakes even get past newspaper copy editors.

But by saying that, I'm not endorsing any right of descriptive linguists to be considered correct in their statements regardless of what people say! There's no contradiction here (though Zink thinks he sees one). One could imagine that there might be people who actually have different correctness conditions, so that the quoted sentence was grammatical for them. There could be people for whom tensed verbs agree with the nearest noun phrase to the left, for example. For such people, this would be grammatical (I mark it with ‘[*]’ to remind you that it's not grammatical in Standard English):

[*]Celebrating religious observances in home and church are inadequate.

In fact they would even find this grammatical (with the meaning "Celebrating birthdays is silly"):

[*]Celebrating birthdays are silly.

If they really did (one could check by interviewing them or recording them for a while), and if the letter in the Inquirer was written by one of them, I'd change my mind. I've made an empirical claim: I think the person who wrote the letter speaks the same language that I do, and would regard all three of the examples given so far as ungrammatical. I think the person just made a slip while writing, failing to keep in mind that they were writing a sentence in which be inadequate had a clause as its subject, and inflecting be as if it had observances as its subject, through a moment of inattention.

Zink thinks that if you're a descriptive linguist and "someone honestly makes a sentence, that's an honest sentence in the language that actually is." But this is not about honesty. It's about whether an occurring utterance matches the correctness conditions (whatever they may be) for the speaker who uttered it. Either speakers or linguists can be wrong. Speakers will sometimes speak or write in a way that exhibits errors (errors that they themselves would agree, if asked later, were just slip-ups); and linguists will sometimes state correctness conditions in a way that incorporates errors in what is claimed about the language (errors that they themselves would agree, if asked later, were just mistaken hypotheses about the language). I claimed that I'm right about the correctness conditions on verb agreement in Standard English, and that the person who wrote the letter I quoted made a slip-up. That's not a contradiction — no one is attempting to be both an apple and an orange.

And none of the foregoing has anything to do with prescriptive claims about grammar, which are a whole different story. Prescriptivists claim that there are certain rules which have authority over us even if they are not respected as correctness conditions in the ordinary usage of anybody. You can tell them, "All writers of English sometimes use pronouns that have genitive noun phrase determiners as antecedents; Shakespeare did; Churchill did; Queen Elizabeth does; you did in your last book, a dozen times" (see here and here for early Language Log posts on this); and they just say, "Well then, I must try even harder, because regardless of what anyone says or writes, the prohibition against genitive antecedents is valid and ought to be respected by all of us." To prescriptivists of this sort, there is just nothing you can say, because they do not acknowledge any circumstances under which they might conceivably find that they are wrong about the language. If they believe infinitives shouldn't be split, it won't matter if you can show that every user of English on the planet has used split infinitives, they'll still say that nonetheless it's just wrong. That's the opposite insanity to "anything that occurs is correct": it says "nothing that occurs is relevant". Both positions are completely nuts. But there is a rather more subtle position in the middle that isn't. That is the interesting and conceptually rather difficult truth that Zink does not perceive.

[You'll see that there's now lots more discussion available courtesy of ceejbot. There you can have the pleasure of seeing me described as "an abyssmal [sic] dunce" (for not believing that which is limited to supplementary relative clauses in Standard English). You'll read that I'm "a liar"; "smugly superior"; "muddled"; and someone who "thinks his judgement counts more than everyone else's". The strange thing about this kind of commentary is that while I stress (above) that it is entirely possible for a linguist to be wrong about what the correctness conditions on a language (even their own language) really are, the people calling me smug, stupid, and mendacious have no doubts whatsoever. They seem utterly convinced of their rectitude, as they angrily attribute to me the exact opposite of what I said. For example, you'll see that Scholz and I are directly accused (by a user called Nick) of holding that "correct" means "what happens". Our actual view is that we firmly and explicitly deny that, though we also resist the opposite lunacy, the position that what happens has no relevance to the determination of what's correct. As Mark pointed out to me when he first referred me to ceejbot, it's not just the existence of ignorant authoritarian prescriptivism in this culture that needs an explanation, it's also the level of anger that accompanies its expression.]

Posted by Geoffrey K. Pullum at 12:49 PM

Now about those facts ...

Early in the President's press conference this morning, G.W.B. was asked about a case of a Jordanian man who was jailed for speaking out against his government. The reporter's question was: will G.W.B. publicly condemn this man's punishment as anti-democratic, even though it was performed by an ally of the United States? Claiming not to know anything about the case, G.W.B. said:

You're asking me to comment on something (that) I do not know the facts.

The "that" is in parentheses because he repeated this same sentence twice, once with the "that" and once without. (Well, more or less the same sentence; he may have said "don't" instead of "do not" one of those times, and the main clause of the sentence may have been worded a tiny bit differently each time.)

Is this a case of hypercorrective avoidance of a sentence-final preposition (in this case, about)? As a faux-bubba speaker (Geoff Nunberg's term), G.W.B. is unlikely to say "... something about which I do not know the facts", but clearly (to me, at least), he finds something wrong with "... something (that) I do not know the facts about"; my interpretation of the situation is that G.W.B. avoided the problem by just not pronouncing the (otherwise obligatory) preposition.

Update: Andy Grover writes:

I saw your post on the language log concerning this statement:

"You're asking me to comment on something (that) I do not know the facts."

I think it's possible that it's not a preposition missing at the end, but that Bush started one sentence but did not finish it, and then "I do not know the facts" is a second attempt by him at expressing his thought.

That certainly is a possibility. Finding myself with some free time at the espresso machine here at Language Log Plaza, I went to www.whitehouse.gov and re-listened to the relevant portion of the press conference. Here are the sound clips.

  1. (link) You're asking me to speak about a case that I don't know the facts.
  2. (link) Again: I don't know the facts, Terry; you're asking me to comment on something I do not know the facts. Perhaps you're accurate in your description of the facts.

Just speaking as a speaker of (what I think is) the same language, the intonation of the first clip is inconsistent with Andy's hypothesis. However, I'm less sure about the second clip now (which is why I gave it more of the surrounding context). It's entirely possible that the relevant bit of the quote should be punctuated thusly: you're asking me to comment on something; I do not know the facts -- in which case, Andy's hypothesis would be right.

Now back to my espresso ... mmm, dee-lish.

[ Comments? ]

Posted by Eric Bakovic at 10:51 AM

CRF bibliography

Hanna Wallach has posted a terrific annotated bibliography (with links to all the papers, of course) on Conditional Random Fields. Her blog is join-the-dots, where you'll find lots of other neat stuff.

Posted by Mark Liberman at 10:44 AM

More arithmetic problems at Google

Jean Véronis has some further observations and speculations about Google counts. He discovers another experimental situation in which there's apparently a large bias that increases with the size of the counts. I saw something similar earlier in the ratio of {X OR X} to {X} counts; Jean finds a systematic and increasing error in comparing counts in English pages to counts in all pages. As Jean points out, in both experiments there's a sort of phase change at counts of around 0.5×10^8.

So to the earlier piece of advice ("be careful about counts much greater than 100K") we can add another one ("pay no attention at all to counts above about 500M"). Or if you care about counts, use Yahoo, which (at least for the experimental situations examined) doesn't show these weird errors.

I don't entirely agree with Jean's characterization of the situation, however. He seems to feel that if "the real index on which the data centers are operating [is] be much smaller, and in such a case an extrapolation would be done", that this would constitute "faking" the counts. I think this is quite unfair, because I don't see how it could be any other way. There's simply no way that Google -- or Yahoo or any other search provider -- could service every query involving more than one search term by doing a full intersection of sets of results that might be counted in the hundreds of millions or even billions. They will often have no choice but to "do a prefix and then extrapolate", as my correspondent from Google put it. The issue is how accurate the extrapolation is. And at the moment, Google's extrapolation clearly sucks.

At least this is true in the experimental circumstances that Jean and I have tested, and for cases where large sets of results are involved. Given these findings, my belief in Google's counts (for queries involving more than one term or other search condition) starts at about 0.9 when sets of about 100K pages are being combined, and falls monotonically to zero as the set size increases to 500M.

I doubt that this matters at all to most of Google's users, who want the relevant pages, not the counts. But I'm sure that Google's (smart, honest and dedicated) researchers and programmers will fix the problem anyhow.


Posted by Mark Liberman at 09:42 AM

A Series of Unfortunate Events

On Friday, I'm giving a talk at Stanford entitled "A series of unfortunate events: the past 150 years of linguistics".

Here's the abstract:

About ten years ago, a publisher's representative told me that introductory linguistics courses in the U.S. enroll 50,000 students per year, while introductory psychology courses enroll about 1,500,000, or 30 times more. The current number of Google hits for "linguistics department" is 60,900, while "psychology department" has 1,010,000, or about 17 times more. The Linguistic Society of America has about 4,000 members, while the American Psychological Association has more than 150,000 members, or about 38 times more. Comparisons between linguistics and fields like history or chemistry give similar results.

It's easy to accept this state of affairs as natural, but in fact it's bizarre, both historically and logically. Furthermore, it's part of a larger and much more serious problem. Those who are resigned to the fate of our academic discipline should still be disturbed that contemporary intellectuals are taught almost no skills for analyzing the form and content of speech and text, or that reading instruction is so widely based on false or nonsensical ideas that a quarter of all students have difficulties serious enough to interfere with the rest of their education.

To break the grip of familiarity, it may help to view the past 150 years of intellectual history as a poker game. We began with a bigger stake than almost anyone else at the table, and have been dealt a series of very strong hands. However, our field is now a marginal player, in danger of being busted out of the game entirely.

In this talk, I'll review our unfortunate past, and discuss the prospects for a brighter future.

Glen Whitman may see this as further evidence of psychological rent-seeking:

I constantly encounter people who think their own career or field of study is the most important one in the world. Educators teach us that education is underappreciated and underfunded. Public health officials diagnose us with insufficient concern for health and inadequate policies to make us take it more seriously. People in the arts sing a tune about the vast significance of music, theater, painting, and sculpture for the human psyche. The linguists at Language Log wax eloquent about the need for more and better linguistic education.

Economists are not immune to the syndrome, but I think they are somewhat more resistant. Of course, economists regularly complain about people – especially journalists – who opine about economic issues with hardly a rudimentary understanding of the subject. (That, to be fair, is often the linguists’ complaint as well: that people who know almost nothing about linguistics so often fancy themselves experts on language.) Still, I rarely find economists talking about how everyone should be forced to obtain, and others be forced to fund, an education in economics. Why not?

Well, uh, could it be that it's because at most universities, in many disciplines, many "others are forced to obtain" (or at least very strongly urged or even constrained to obtain) an education in economics? For example, at my own institution, all 1,500 Wharton undergraduates are required to take the basic economics sequence from the econ department, and requirements in half a dozen other majors are set up so as to encourage students to learn economics. That's all to the good, in my opinion -- but the result is that economists here are more likely to be worried about how to keep enrollments down to manageable levels than to be talking about how to encourage more students to learn their discipline. This is not simply the result of intellectual "market forces" -- it happens because of formal structures of requirements and listed options, and also because of well-established informal patterns of advising.

You may be saying to yourself "duh, of course students in a business school need to learn economics; and how could you hope to understand political science or sociology or philosophy or anthropology without having the concepts and methods of economics in your intellectual toolkit?" And you'd be right. But as a point of comparison, let's consider the role of linguistics and language analysis in the discipline known as "English".

In the domain english.stanford.edu, Google finds seven instances of the word "linguistics" (I just checked because I wanted a bit of exemplification for my talk).

I'm less certain that the concept of linguistic analysis doesn't come up under some other name on the Stanford English department web site, but if it's there, it doesn't jump out at you. The page on the Stanford undergraduate English major cites four tracks: Literature, Creative Writing, Foreign Language, Interdisciplinary. None of these mention linguistics by name, nor could I find any mention of the analysis of language as a topic to be studied. The "Interdisciplinary" major requires

Four courses related to the area of inquiry from such disciplines as anthropology, the arts (including the practice of one of the arts), classics, comparative literature, European or other literature, feminist studies, history, modern thought and literature, political science, and African-American studies. These courses should form a coherent program; must be relevant to the focus of the courses chosen by the student to meet the requirement; and must be approved by the interdisciplinary program director. [emphasis added]

It seems that at Stanford -- and for that matter at Penn -- you can go through an entire undergraduate and graduate program in English without ever learning anything about the analysis of language. Think about it: you can get a doctorate in English without knowing how to analyze or even describe the structure of a sentence, the meaning of a word, the rhythm of a phrase, or the flow of a discourse. I think that this state of affairs is bizarre. If that be rent-seeking, make the most of it.


Posted by Mark Liberman at 08:08 AM

January 25, 2005

* me P and call me *

Tom Anderson sent a pointer to the Google Meme Observatory, noting the similarity to our occasional posts on "snowclones".

While we're on the topic, there's a phrase of this type that I took a look at recently, following a question from a reader, for which it's hard to craft a good web-search query. The general pattern is something like "<change my state> and call me <a name appropriate given the change>".

Some examples:

...roll me up and call me curly...
...blow me down and call me shorty...
...dress me up and call me Sally...
...grease me up and call me slider...
...knock me out and call me ignorant...

Some of the examples are more complicated, for example

(link) Does the denial continue? Slap me down and call me conservative, but from outside the force field of Clinton's charm pentangle, a place I happen to inhabit because of my foreignness, the guy looks a tad over-rated.
(link) Among them is Sgt. John Falstaff, who in a dream sequence sings a coarse song ("Oil me up and call me Tex / I want cold, cold cash and red-hot sex") accompanied by a glitter ball and the clichéd pelvic thrusting indispensable to bawdy references in Shakespeare.

After looking over a large number of examples, I don't think that I can provide a single interpretation that fits all circumstances, even allowing for sincere/ironic pairs like "it's very surprising" and "it's not surprising at all." There do seem to be cases where this acts like a "construction", that is, a multi-word pattern with a non-compositional pairing of form and meaning. But there seem to be other cases where it's just a pattern chosen for its formal symmetry and amiable rhythm, without any added conventional interpretation. However, I may just be out of the loop on this bit of culture; I don't know.

And I'm not sure whether the preposition (up, down or out in the cited examples) is obligatory or not.

Whatever its meaning(s), this pattern is of course completely different from the phrasal template "take two <objects interpreted as medicine> and call me in the morning".

[Update: Margaret Schroeder points out that I could

Try googling {"and call me" -take -morning}; I think you'll find the results interesting. There are enough results, for example, to show plenty of such expressions without either "up" or "down."

I did try patterns like that. What I was missing was an easy way to ask for cases where the word before "and call" is not a preposition. But I gave up too easily, because on the second page returned from Margaret's query, I found "...line my eyes and call me pretty", which suggests that (this aspect of) the pattern is prosodic rather than syntactic.

And Rich Alderson wrote to provide a classical reference, making the same point:

The state need not be adverbial, and seems from my experience to be obligatory. In the 1994 Damon Wayans movie _Blankman_, there is an exchange between Wayans and a prototypical brawler:

Brawler: I'm warning you!
Wayans: Well slap me silly and call me Susan!
<Brawler slaps Wayans into stack of waste receptacles or the like>
Brawler: I warned you, Susan!

The phrase type wasn't particularly new at the time, it seems to me.

Thanks to both. But I still don't really "get" these phrases, somehow. ]

[And Joshua Macy writes in with a link to his blog entry on this phrasal pattern, with several other neat examples, including Foghorn Leghorn's "well roll me in corn-flour and call me dinner". Joshua hypothesizes that an initial well "seems to be an important part of the phrase". Though it's not absolutely essential -- these constructions (or collocations, or whatever they are) have fuzzy edges!

Now I have another question: are there real-life versions of this pattern with some alternative phrasing for "call me" -- say "refer to me as"? Or are there related patterns that don't involve naming at all, like <change my state> and make me <do something appropriate to the new state>? ]

[Yet another update: Peter Erwin writes:

Enjoyed your recent Language Log post on "<change my state> and call me <something>".

As a slight bit of further evidence in favor of the argument that you don't need a preposition -- and partial evidence that this is a fairly old pattern -- I can offer: "Well, strip my gears and call me shiftless."

I first heard this in London in 1987, as part of a general orientation session for American students doing a study-abroad semester in England. The actual context was the (American) speaker warning us that some of our stereotypes about the English were probably rather out of date: "English people no longer say, 'Blimey, guv', just as Americans no longer say, 'Well, strip my gears and call me shiftless'..."

The implication was that the "strip my gears" phrase (which *I'd* certainly never heard before) had once been current -- and had perhaps become part of English stereotypes about what Americans said -- but was now quite old-fashioned.

Actually, there's a nice collection of this and similar sayings here: http://accweb.itr.maryville.edu/schwartz/saws%20surprise.htm


John Cowan writes

I'm just beginning to listen to the PBS program "Do You Speak American?" (hosted by Robert MacNeil, who of course doesn't), and I just heard "Well, butter my butt and call me a biscuit". Thought you'd want to know.

Thanks, John -- one more nail in the coffin of the idea that a preposition at the end of the first clause is part of this pattern. Not that the lid isn't pretty well fastened already. This example also reinforces the notion that a good instance of this pattern should be a four-beat line, with the "and" coming after the second beat. A metrical snowclone? ]


Posted by Mark Liberman at 01:55 PM

January 24, 2005

Questioning reality

I wrote to an acquaintance at Google to ask about the odd estimates of counts in boolean queries ("goolean logic"?), and he explained that the problems are mainly due to faulty extrapolation from a sample (as I had unclearly conjectured in an earlier post):

[T]here are small variations in the number of results due to the fact that index updates are done at different times in different data centers. But there are much larger variations due to the fact that these are all estimates, and we just haven't tried that hard to make the estimates precise. To figure out the number of results in the query [a OR b], we need to intersect two posting lists. But we don't want to pay the price of intersecting all the way to the end, so we do a prefix and then extrapolate. The extrapolation is done with the help of some parameters that were carefully tuned several years ago, but haven't been reliably updated as the index has grown and the web has changed, so sometimes the results can be off.

This also confirms Geoff Nunberg's understanding that "this is old code that is low on Google's priority list". Geoff points out that Google seems to perform boolean operations correctly for search terms whose frequency is small enough. He suggests that 1,000 (the number of hits that Google will actually show you) is a reasonable bound, and that seems like sensible advice. However, it seems that it may be possible to get accurate counts from queries with larger counts. Sometimes.
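To make the "do a prefix and then extrapolate" idea concrete, here is a toy Python sketch. It is emphatically not Google's code, which I haven't seen. The function name `estimate_intersection`, the assumption that a posting list is a sorted list of integer document IDs spread more or less evenly over a known ID space, and the simple linear scaling are all hypothetical choices made for illustration.

```python
def estimate_intersection(a, b, id_space, prefix_limit):
    """Estimate the size of the intersection of two posting lists.

    a, b         -- sorted lists of integer document IDs in [0, id_space)
    id_space     -- total number of possible document IDs (an assumption)
    prefix_limit -- only IDs below this value are actually examined

    Hypothetical illustration only: count the matches in the prefix,
    then scale up by the fraction of the ID space that was scanned.
    """
    i = j = matches = 0
    # Standard merge-style walk over two sorted lists, cut off at the prefix.
    while (i < len(a) and j < len(b)
           and a[i] < prefix_limit and b[j] < prefix_limit):
        if a[i] == b[j]:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    # Linear extrapolation from the scanned prefix to the whole ID space.
    return round(matches * id_space / prefix_limit)


# Made-up data: even-numbered pages contain term A, every third page term B.
pages_a = list(range(0, 1000, 2))
pages_b = list(range(0, 1000, 3))

print(estimate_intersection(pages_a, pages_b, 1000, 100))   # scan 10% of IDs
print(estimate_intersection(pages_a, pages_b, 1000, 1000))  # scan everything
```

With evenly spread data like this the extrapolated figure lands close to the exact one. If the prefix is not representative of the rest of the list, or if the scaling parameters are stale, the estimate can drift a long way, which is at least compatible with what my correspondent describes.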

I did a bit of experimental observation with respect to the curious matter of the shrinking {X OR X} counts, as a way of uncovering some clues about the extrapolation algorithm and its performance as a function of set size. Now, the count of pages in response to a query {X OR X} should logically always be the same as the count of pages that match the query {X}. But Google's answers have a shortfall that increases systematically with frequency, although the error remains relatively small up to frequencies of 50,000 or even 100,000.

Here are some examples. {Nunberg} returns 49,500 hits, while {Nunberg OR Nunberg} returns 49,400 hits, which at an {X OR X} to {X} ratio of 0.998 is close enough.
{Hammurabi} gets 198,000 hits, but {Hammurabi OR Hammurabi} gets only 158,000, for a ratio of 0.798.
{Thomason} gets 606,000, {Thomason OR Thomason} gets 362,000, so at this point we're down to a ratio of 0.597.
{Berlusconi} gets 3,250,000, and {Berlusconi OR Berlusconi} gets 1,810,000, for a ratio of 0.557.
And by the time we get way out there to {see}, at a count of 677,000,000, with {see OR see} at 60,200,000, the ratio of {X OR X} to {X} is a mere 0.089. In other words, the count for {see OR see} is less than 10% of what it should be, if the count for {see} is correct.
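For anyone who wants to redo the arithmetic, here's a small Python sketch that recomputes the {X OR X}/{X} ratios from the counts quoted above. The counts are the ones Google reported to me; the dictionary layout and the helper name `or_ratio` are just illustrative choices.

```python
# (count for {X}, count for {X OR X}), as quoted in the examples above.
counts = {
    "Nunberg":    (49_500, 49_400),
    "Hammurabi":  (198_000, 158_000),
    "Thomason":   (606_000, 362_000),
    "Berlusconi": (3_250_000, 1_810_000),
    "see":        (677_000_000, 60_200_000),
}


def or_ratio(x_count, x_or_x_count):
    """Ratio of the {X OR X} count to the {X} count; logically it should be 1."""
    return x_or_x_count / x_count


for term, (x, x_or_x) in counts.items():
    print(f"{term:12s} {x:>12,d} {x_or_x:>12,d} {or_ratio(x, x_or_x):.3f}")
```

Since the displayed counts are themselves rounded, the third decimal place of each ratio shouldn't be taken too seriously.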

Here's a graph showing the relationship for 60 words that span a range of Google counts from 40 to 677,000,000:

(The solid line is the x=y diagonal, and the dotted line marks x=100,000.)

Exploring a bit above {Nunberg} (which has 49,500 hits and an {X OR X}/{X} ratio of 0.998), there's aspectual at 57,600 hits and a ratio of 0.988; parterres at 67,000 and a ratio of 0.996; fidgets at 77,800 and a ratio of 0.976; Wagnerian at 90,400 and a ratio of 0.988; keypresses at 105,000 and a ratio of 0.908; marquises at 136,000 and a ratio of 0.868; carjacking with 172,000 and a ratio of 0.860; and then Hammurabi (198,000 and 0.798) and on up (or down).

Exploring below {Nunberg}, cubbyhole gets 43,800 and a ratio of 1.0; cupfuls gets 27,600 and a ratio of 1.0; Nixonian gets 24,200 and a ratio of 1.0; Johnsonian gets 14,300 and a ratio of 1.0; maniple gets 11,500 and a ratio of 1.0; embargoing gets 7,690 and a ratio of 1.0; and so on.

This suggests that whatever Google's extrapolation algorithm is, its parameters are tuned to work pretty well up to 50,000 or even 100,000 (where the dotted line is in the plot above). Or perhaps up to that point, it's doing a real intersection of pagerank-sorted lists? Of course, I don't know whether this pattern predicts the behavior of real disjunctions (i.e. those of the form X OR Y, for Y not equal to X). It would be unwise to count on it.

One check would be to look at the equation {X OR Y} = {X -Y} + {X AND Y} + {-X Y}, as the counts of {X} and {Y} rise past 50-100K.

(That is, we're comparing (the estimate of) the number of pages containing either X or Y or both, against the sum of (the estimates of) the number of pages with X but not Y, both X and Y, and Y but not X. If the two totals match, that doesn't prove that they're accurate -- but at least they're giving a consistent picture of web reality.)

{Nunberg} 49,500
{Pullum} 53,400
{Nunberg OR Pullum} 101,000
{Nunberg -Pullum} 48,000
{Nunberg AND Pullum} 484
{-Nunberg Pullum} 51,900

Checking: 101,000 against 48,000 + 484 + 51,900 = 100,384. And the {X OR X}/{X} ratios for Nunberg and Pullum are 0.998 and 0.996, respectively.

Close enough for the round-offs obviously involved -- here the extrapolation errors seem to be tolerable.
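The same check is easy to script. Here's a minimal Python sketch, using the Nunberg/Pullum counts above; the function name `consistency_check` and the choice to report the gap relative to the {X OR Y} count are my own illustrative assumptions, not anything Google provides.

```python
def consistency_check(or_count, x_not_y, x_and_y, y_not_x):
    """Compare the {X OR Y} count with {X -Y} + {X AND Y} + {-X Y}.

    Returns the sum of the three disjoint pieces and the gap between
    that sum and the reported {X OR Y} count, as a fraction of the latter.
    """
    parts_sum = x_not_y + x_and_y + y_not_x
    rel_gap = abs(or_count - parts_sum) / or_count
    return parts_sum, rel_gap


# {Nunberg OR Pullum}, {Nunberg -Pullum}, {Nunberg AND Pullum}, {-Nunberg Pullum}
parts_sum, rel_gap = consistency_check(101_000, 48_000, 484, 51_900)
print(f"sum of parts: {parts_sum:,d}; relative gap: {rel_gap:.2%}")
```

A small gap only shows that the estimates are mutually consistent; as noted above, it doesn't show that any of them is accurate.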

Now let's try a pairing with some larger numbers.

{Knossos} 357,000
{Minoan} 350,000
{Knossos OR Minoan} 577,000
{Knossos -Minoan} 326,000
{Knossos AND Minoan} 38,300
{-Knossos Minoan} 214,000

Checking: 577,000 against 326,000 + 38,300 + 214,000 = 578,300.

Again, within plausible estimation error bounds. But in this case, {Knossos OR Knossos} gives a count of 416,000, for an {X OR X}/{X} of 1.17 -- not a shortfall, but an overrun!

David Beaver is right: The reality of the situation is constantly in question.


Posted by Mark Liberman at 06:28 AM

Google recall (They stole his mind, now he wants it back.)

Google does weird stuff with disjunctive searches, the stuff Mark Liberman just discussed here, and I don't know why. I am impressed by Geoff Nunberg's partial explanation, that only when the results count is low (under 1000, he says) is Google's estimate based on something similar to actual counting. I'm even more impressed by his cunning workaround, of adding garbage search terms to keep the count low, and then extrapolating.

But this still leaves open the question of what on Earth, or elsewhere, Google does to produce lower numbers for disjunctive searches like quantum|mice (= "about 15,700,000") than for searches on the individual disjuncts quantum (= "about 22,600,000") and mice (= "about 22,000,000"). No obvious numeric combination of the results for conjunctions, disjuncts and exclusively tested disjuncts (e.g. quantum -mice) produces the number of hits given for the disjunction. This is a matter of some importance for me, since I'm currently working on a dataset which involves cross-linguistic quantitative assessment of disjunctive Google searches. I can manage without the disjunctions, but it's a lot slower.

Since I'm just a linguist and don't know how Google works, it's really not safe for me to speculate on why Google is as Google does. So fasten your safety belts!

Suppose that Google's approach to a search request you make is like this:
Since complex (boolean or string) searches are harder to process than one word searches, it might be expected that the response rate is lower for complex searches than for single word searches that have a similar actual web frequency. And that should artificially lower your personal Google node's estimate of the result count. This is precisely what Mark, c/o Jean Véronis, showed. Note also that the above pseudo-algorithm predicts that a first search on a term will tend to produce a slower response than later searches, because look-up is used for the later searches. I did a tiny study of this using obviously rare disjunctive searches, and it appeared to hold up. Initial searches clock at around 0.25-0.4 seconds, while later searches are 0.08-0.2 seconds. (These figures are anyway a marvel of engineering which even after years of Google use still blows me away.)

Yahoo search appears to work much better on counting for disjunctive queries (though not string searches, which is what I REALLY care about). Even if it does do something similar to Google, perhaps it does what seems most obvious: splitting the disjunctive search into multiple non-disjunctive searches, making them separately, and then combining the results. I can only conjecture (you didn't unbuckle your safety belt did you...) that Google's page rank algorithm makes recombination of separated disjunctive searches unattractive.

If you have a better explanation of Google's behavior, please send it to dib at-sign stanford DOT edu. Unless of course you know the truth, which would be cheating. In that case send it to Mark and Geoff right away. And if you think Google is confusing, just look at the plot outline for Total Recall:

What is reality when you can't trust your memory. Arnold Schwartzenegger is an Earthbound construction worker who keeps having dreams about Mars. A trip to a false memory transplant service for an imaginary trip to Mars goes terribly wrong and another personality surfaces. When his old self returns, he finds groups of his friends and several strangers seem to have orders to kill him. He finds records his other self left him that tell him to get to Mars to join up with the underground. The reality of the situation is constantly in question. Who is he? Which personality is correct? Which version of reality is true? (John Vogel's summary, originally here.)

Posted by David Beaver at 03:37 AM

Fried Snow

[Due to technical difficulties, I'm posting this on behalf of Lila Gleitman.]

The nonsense about Eskimos having a zillion words for snow just won't go away. This morning on CNN a reporter said:

The Eskimos have more than 100 words for snow.
and mentioned several of them, including a word for "pink snow", a word for "fried snow", and a word for "deep-fried snow". Even if the reporter hasn't yet learned the truth about this matter, and doesn't have a fact-checker, you'd think that an elementary knowledge of physics would suggest the implausibility of "fried snow".

My father, who was a humorous man, often pointed out that in the old Madison Square Garden they often had basketball in the afternoon and ice hockey at night. He said, and fooled me for a few years, that they had these machines so marvelous as to chill the water so fast that it was still warm when it first froze. Just think what he would have gotten past CNN!

Posted by Bill Poser at 12:36 AM

January 23, 2005

when things don't add up

I've been frustrated for a long time by the deficiencies in Google's hit-count estimation algorithms involving Boolean searches that Mark points to, citing Jean Véronis. (Another, possibly related problem arises when you do date-restricted searches in Google Groups, say for every year: when you sum the results you often get a wildly different total from what you get if you simply search on the same terms without date restrictions.)

My understanding is that this is old code that is low on Google's priority list, particularly since Google isn't going to actually show you more than 1000 hits for a given query in any case. There is a work-around to the problem with Boolean searches, though, if you're looking at fairly common terms and are interested only in determining relative frequencies.

This is to do several searches containing restrictors that will keep the hit count down under 1000, which makes it more likely that Google will actually count all the hits.

For example, a search on "criticize" turns up "about 2,360,000" hits, and a search on "criticized" turns up "about 4,000,000." But a search on "criticize OR criticized" turns up "about 1,670,000," which is obviously not consistent with the other results (though you shouldn't expect the disjunctive search to exactly sum the individual searches, since some pages contain both terms).

But now let's add some irrelevant restrictors that reduce the hit counts to under 1000:

criticized OR criticize cleveland squash 891

criticized cleveland squash 637

criticize cleveland squash 331

criticize criticized cleveland squash 78

In this case, the sum of the results for the two individual searches (968) minus the result of the conjunctive search (78) comes out to almost exactly the same total (890) as the disjunctive search. Iterate this a couple of times using other restrictors that limit the totals to a similar range (e.g., "york dictionary" or "multilingual"), and you'll have a fairly reliable way of estimating the relative frequency of the two terms, or of comparing their combined frequency to that of some other term. But of course you still won't know with any certainty how many pages Google has indexed in absoluto that contain the items.
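For readers who want to repeat the check with other restrictors, here is a minimal sketch in Python of the arithmetic involved. The counts are the ones reported for the "cleveland squash" searches; the function name `union_estimate` is just an illustrative label, not anything Google provides.

```python
def union_estimate(count_a, count_b, count_both):
    """Inclusion-exclusion: pages with A or B = A + B - (A and B)."""
    return count_a + count_b - count_both


# Hit counts reported above for searches restricted by "cleveland squash".
criticized = 637
criticize = 331
both_terms = 78
reported_disjunction = 891

predicted = union_estimate(criticized, criticize, both_terms)
print(predicted)                              # 890
print(abs(predicted - reported_disjunction))  # 1

# Relative frequency of the two forms within this restricted sample.
ratio = criticized / criticize
print(f"criticized is about {ratio:.2f} times as frequent as criticize")
```

If the predicted and reported figures stay this close across several different restrictors, the ratio is probably a usable estimate of relative frequency.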

Posted by Geoff Nunberg at 09:00 PM

Prince Harry the Nazi?

Britain's Prince Harry has recently come in for an amazing amount of criticism for wearing a Nazi-era military uniform to a costume party. I can see that some of the people at the party might have found this distasteful, but that isn't the basis for the complaints. The basis for the complaints is that he thereby showed sympathy for Nazism.

Now, I'm the last person on earth to have any sympathy for the Nazis. I'm a left-wing Jew. I know many people, including members of my own family, who were persecuted by the Nazis. My father, my uncles, and my doktorvater all fought against them in the Second World War. But this doesn't make any sense. The critics are assuming that wearing a Nazi uniform is a non-verbal indirect speech act that demonstrates support for the Nazis. Not one comment on this topic that I have seen mentions the basis for this belief. Surely it is false. People routinely wear costumes representing figures of whom they do not approve. When a couple come to a party as Bonnie and Clyde, does anyone think that they approve of bank robbery? When someone dresses as a pirate, is that taken to show approval of piracy? Of course not. Wearing a costume does not indicate approval.

The people going after Prince Harry would do the world a lot more good if they would put their time and energy into any of the real problems in the world today. Prince Harry isn't one of them. A real source of anti-Semitism, among other evils, is Saudi Arabia. According to the US State Department, Saudi Arabia requires all citizens to be Muslims and forbids the public practice of any other religion. Saudi Arabia has a policy of not issuing entry visas to Jews. This policy was publicly stated on the Ministry of Tourism web site until US Congressman Anthony Weiner made a stink about it and called on President Bush to refuse entry to Saudi citizens until the policy was abolished. To take just one other, particularly vicious, example, two years ago the official government newspaper Al-Riyadh repeated the Blood Libel.

There are of course plenty of problems other than anti-Semitism. Darfur, for example. I am astounded and dismayed at how much outcry there has been about Prince Harry's costume and how little there has been about so many other infinitely more important issues. That the outcry is ill-founded makes this all the more perverse.

Posted by Bill Poser at 06:45 PM

What researchers really need from web search

As long as we're talking about research use of Google, warts and all, let me point out something that we really, really need from our internet search engines. I'm not talking about a coherent implementation of boolean search -- I'm assuming that Google will get that one straightened out at some point. And when I say "we", I don't just mean us linguists, computational or otherwise. Anyone who wants to use web search for rational inquiry needs this. That includes anthropologists, psychologists, sociologists, political scientists, rhetoricians, language teachers, marketing researchers, and just plain folks. Can you guess what I mean?

We all need to be able to get accurate approximate counts of things that Google can't search for. For example, we need to be able to count uses of words in particular constructions, or with designated senses, or in particular sorts of discourse contexts, or written by people with particular allegiances or attitudes, or with specified connotations or emotional loading.

"Well," you may be thinking, "duh". As the proverb says, people in hell want ice water. What I'm asking for isn't possible, not because Google doesn't have enough computers, but because there aren't any accurate algorithms for computing any of these things.

All the same, there's something simple that a search engine could do for us that would solve all these problems. More or less.

All we need is for the engine to forget, temporarily, about page rank and other fancy algorithms for sorting query results, and just give us a random sample of the set of pages returned by a query. Then we could use sampling techniques to harness human judgment, in an efficient way, to give us the numbers we need. The problem with the current situation is that in many (most?) cases, higher page-rank pages and lower page-rank pages have different distributions and interactions of the relevant features of linguistic form, content and context. The result is that the results of human evaluation of the (high page rank) pages that Google will let you see can't reliably be extrapolated across the (low page rank) pages that you can't see. Some earlier discussions of these issues on Language Log can be found here, here, here, and here, among other places.
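To make the sampling idea concrete, here is a rough Python sketch of how human judgments on a random sample could be turned into an estimated count. Everything in it is an assumption for illustration: the sample size, the number of pages a judge marks as relevant, and the total page count are invented, and the Wilson score interval is just one standard way to put error bars on a proportion.

```python
import math


def wilson_interval(hits, sample_size, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = hits / sample_size
    denom = 1 + z * z / sample_size
    center = (p + z * z / (2 * sample_size)) / denom
    half = z * math.sqrt(
        p * (1 - p) / sample_size + z * z / (4 * sample_size ** 2)
    ) / denom
    return center - half, center + half


def extrapolate(hits, sample_size, total_pages):
    """Scale the sample proportion up to the full result set."""
    low, high = wilson_interval(hits, sample_size)
    return round(low * total_pages), round(high * total_pages)


# Hypothetical numbers: a judge reads 1,000 randomly sampled pages and
# finds 230 that use the word in the construction of interest.
low, high = wilson_interval(230, 1000)
print(f"proportion between {low:.3f} and {high:.3f}")
print(extrapolate(230, 1000, 2_000_000))
```

The extrapolation is only as good as the randomness of the sample, which is exactly why a page-rank-ordered list of hits won't do.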

From what I understand about how such search engines work, it should be possible to offer a reasonable-sized random sample (say a thousand hits) without prohibitive computational cost. Given how well optimized the current algorithms are for returning results in a useful order, I'm sure the cost in extra computation would still be significant. But maybe one of the search engine companies could offer this as an extra-cost service? Or a public service?

While we're waiting, let me share with you Groucho Marx's response to those who want ice water. "Ice Water? Get some Onions - that'll make your eyes water!"


Posted by Mark Liberman at 05:59 PM

Uh Oh...

I hate to rain on Geoff Nunberg's parade, since I'm also really happy about Google's increase of the word count limitation in queries from 10 to 32. And regular readers know that on the whole, there are few Google users more enthusiastic than me. But there's a cloud on the horizon...

Over the past few days, Jean Véronis has been doing a little experimental science on Google. His extrapolation of the time series of Google index counts demonstrates that the number of pages searched today must not be 8,058,044,651, as Google has been claiming since November, but actually 9,105,590,456 or so. That's the good news.

The bad news: Jean shows that Google's implementation of boolean search has some serious problems.

Here's the evidence (numbers from Jean's post of 1/19/2005 and from my replication at about 3:45 this afternoon):

Searching for: Jean's Count My Count
1 Chirac
2 Chirac OR Sarkozy
3 Chirac OR Chirac
4 Chirac AND Chirac
5 Chirac Chirac

As Jean points out, it's hardly logical (for example) that lines 2 and 3 have smaller counts than line 1, or that lines 1, 4 and 5 have three different counts.

And another puzzle --

Searching for: Jean's Count My Count
Chirac AND Sarkozy
Chirac -Sarkozy
-Chirac Sarkozy

As Jean points out, the total from the second table (2,272,000) should be the same as the {Chirac OR Sarkozy} search in the first table (1,420,000).
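The logic behind these complaints can be spelled out with a toy example in Python. The page IDs below are made up; only the two totals at the end (2,272,000 and 1,420,000) come from the text above. The point is simply that for any real sets of pages, the identities hold exactly.

```python
# Toy "index": made-up page IDs containing each name.
chirac = {1, 2, 3, 4, 5}
sarkozy = {4, 5, 6}

# A OR A, A AND A, and plain A should all be the same set.
assert chirac | chirac == chirac
assert chirac & chirac == chirac

# A disjunction can never match fewer pages than either disjunct.
assert len(chirac | sarkozy) >= len(chirac)

# The three exclusive pieces partition the disjunction.
both = chirac & sarkozy
only_chirac = chirac - sarkozy
only_sarkozy = sarkozy - chirac
assert len(both) + len(only_chirac) + len(only_sarkozy) == len(chirac | sarkozy)

# Totals reported above: the two figures should match, but they differ.
partition_total = 2_272_000
disjunction_count = 1_420_000
print(partition_total / disjunction_count)
```

Any engine that actually counted matching pages would pass all four checks, whatever the sizes of the sets.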

The small differences between my counts and Jean's probably reflect not only the 3-day difference in query times (though this should not produce smaller counts for many of the later queries), but also the fact that Google's results come from a massively parallel set of parallel sets of machines, which have somewhat different indices. (And it may also be true that some of the search algorithms are implemented in essentially stochastic ways, e.g. estimating set intersection and union counts by extrapolating from random (initial) samples.) But the big differences are harder to explain, and more troubling from the point of view of those who would like to use Google as an instrument of research. As Jean says:

Je n'ai pas la moindre idée de l'origine du problème. Bien sûr, je sais que les nombres retournés par Google sont des approximations (d'ailleurs le moteur précise bien environ x résultats), que les valeurs peuvent légèrement varier en fonction des "centres de données" qui traitent la requête et qui peuvent varier d'un moment à l'autre. Ces raisons pourraient expliquer de petites différences, mais pas des différences du simple au double. J'ai cherché sur les différents forums. Personne ne semble avoir la solution (si certains parmi vous l'ont, je serais très curieux de la connaître !).

[translated by myl] [I don't have the slightest idea of the source of the problem. Of course, I know that the numbers returned by Google are approximations (also the engine specifically says 'about x results'), that the numbers can slightly vary as a function of the "data centers" that process the request and can vary from one time to another. These reasons can explain small differences, but not differences of a factor of two. I've asked in different forums. No one seems to have the solution (if some among you have it, I'll be very curious to know!)]

I don't really have any idea either. One possible component of the problem is that (some) boolean queries might be sent to a search unit whose index is way out of date. [Update: see this post for the real story, which is more along the lines of wrongly extrapolating intersection counts from a sample, as I mention above.] But whatever the explanation, the fact is clear: Google's boolean queries are badly broken, at least if you care about the counts.

Jean also shows that the results from Yahoo! for similar tests are more logically plausible, though the total numbers are somewhat smaller. Advantage: Yahoo?


Posted by Mark Liberman at 04:10 PM

hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray hooray!

Linguists who have chafed under Google's 10-word limit on search strings will be happy to learn that the limit has now been raised to 32 words, which offers lots more latitude for working with variants and negative conditions and for phrase location. A search on:

"Many things in the world have not been named; and many things, even if they have been named, have never been described."

turns up 19 hits, all for texts of Susan Sontag's essay "Notes On Camp."

Posted by Geoff Nunberg at 03:50 PM

English in deep trouble?

A user signing as phaln on Slashdot today remarks, apropos of a comment exchange about using the entire web as a corpus (the way we often do here at Language Log Plaza), which led to some comments on the sort of random slangy stuff on the web that might make that a bad idea for grammarians seeking information about English:

It came to me that the English language was in deep trouble when people started saying "rotfl" and "lol" in person.

Now, the user is being humorous, of course. But it is remarkable how often people say this sort of thing. It reaches newspaper columns and magazines as well as everyday conversations about language ("Oh, you're a linguist? What do you think about the way Internet slang is changing the language?"). I've heard a half-hour radio discussion about it on the BBC World Service (in the middle of the night; it was a real yawn, a perfect fix for my insomnia). It seems likely that at least some people really do think English might be altered radically by the intrusion of email abbreviations for phrases like "[I'm] rolling on the floor laughing" or "[I'm] laughing out loud" into regular spoken English.

Don't worry. Nothing radical or even slightly significant will happen. Suppose, say, "rotfl" (pronounced "rotfull") became quite common in speech (which seems unlikely, since if your interlocutor falls down and rolls on the floor laughing it generally needs no comment; but maybe as a metaphor, or on the phone). What would have changed? One interjection (a word grammatically like "ouch") added. Total effect on language: utterly trivial. Not even noise level. Interjections are so unimportant to the fabric of the language that they are almost completely ignored in grammars. There's almost nothing to say. They have no syntactic properties at all — you pop one in when the spirit moves you. And their basic meaning is simply expressive of a transitory mental state ("Ouch!" means something like "That hurt!"). Don't worry about English. It will do fine. Not even floods of email-originated phrases entering the lexicon would change it in any significant way. If phaln were to suggest such a thing seriously I would be LOL.

Posted by Geoffrey K. Pullum at 02:30 PM

Academic gestures in the Southwest (Conference)

Boy, was I wrong. Or more precisely, ignorant. In my post this morning on the mano cornuta (or "Hook 'em Horns") sign, I wrote that "I don't know of any other handshape-based gestures indicating academic allegiance. The culture seems to be missing an opportunity here". By the time I got back from shoveling out my wife's driveway, I had email from Bill Shirley ("4th generation Longhorn, BS Computer Science 1990") informing me that "Many schools in the (now defunct) Southwest Conference have hand signs", and pointing me to an article by Paul Burka on the Texas Monthly site entitled "Football Hand Signals". Burka explains that

Of the nine SWC schools, more have hand signs (seven) than NCAA investigations (six). For that matter, one school, SMU, has more hand signs than football teams.

And apparently there is an eighth sign, unsanctioned because Rice University officials "[suspect] that a middle finger poked outward has a meaning other than 'peck 'em, Owls'".

Burka gives histories for the other signs, for example this explanation of the first one:

Blame it all on an Aggie named Pinky Downs. A 1906 Texas A&M graduate, Downs was a member of the school's board of regents from 1923 to 1933. He was the kind of Aggie who wore a maroon tie every day and who prodded the school into spending an extra $10,000 so that its new swimming pool would be longer than the one at the University of Texas. When the Aggies had a yell practice before the 1930 TCU game, Downs naturally was there. "What are we going to do to those Horned Frogs?" he shouted. His muse did not fail him. "Gig 'em, Aggies!" he improvised, appropriating a term from frog hunting. For emphasis, he made a fist with his thumb extended straight up. The Southwest Conference had its first hand sign.

Like the UT mano cornuta sign, this one has other meanings in other places and times. And as for that gesture that so unnerved the Norwegians, its roots turn out to lie in classical logic:

For a quarter of a century after Pinky Downs's moment of inspiration, the Aggies had a monopoly on official gestures. But by 1955 archrival UT had fallen on hard times, made harder by a corresponding rise in the fortunes of A&M. A UT cheerleader named Harley Clark syllogized: (1) A&M has a hand sign, (2) A&M is winning, (3) UT has no hand sign, therefore (4) UT is losing. (Such reasoning prowess would later lead Clark, as an Austin judge in 1987, to conclude that the state's system of financing public schools was unconstitutional.) At a pep rally before the TCU game, Clark held up his right hand in a peculiar way. The index and little fingers were sticking up, while the thumb held down the two interior digits--the head of a Longhorn, Clark said. The creation proved not to be the immediate answer to UT's football plight, however, as signless TCU won the next day, 47-20.

I appreciate living in a country where the roots of folk ways are so well documented. But now I wonder, what other American universities have hand signs?

[Update: Wes Meltzer writes:

I spotted your post on Language Log about school hand signs. I don't know the rest of the Big Ten terribly well, but I'm a Northwestern student and I can say, safely, that we have one. It's a hand with the fingers curled outward to form a claw, which (apparently) represents the wildcats we're using as a mascot.

But, being a small and athletically unimportant school, you would never meet another NU grad and make a claw sign, like Texas alumni do to each other. But it's a football-game thing; we give the band the claw, and sometimes, if the crowd is fired up and we've won -- not so common -- the football team, too.

Athletically unimportant? NU's football team beat Ohio State and Purdue this year, and has been Big 10 champion or co-champion three times in the past decade. But I guess that if we get a president from Chicagoland, his or her mother probably won't be making the NU claw sign at the inaugural. ]

[Update #2: Justin Busch at Semantic Compositions has posted about a hand sign (with arm motion) performed at USC sporting events. ]


Posted by Mark Liberman at 02:27 PM

American college totems

Bill Poser's observation about the Norwegian interpretation of the UT "Hook 'em Horns" signs is echoed by Lloyd Grove's remarks about its meaning in American Sign Language. The same hand configuration, sometimes [falsely] called mano cornuto, is used in parts of Italy for defense against the evil eye, or in other parts of Europe to insinuate cuckoldry. [Update 1/24/2005: see below.]

But when Bill refers to the University of Texas Longhorns as "a sports team", he's glossing the name incorrectly. The Longhorns can be any of a large number of teams, from football to baseball by way of basketball, tennis, softball, swimming and so on.

Like other such college totems, this nickname can also be used in a non-athletic context; thus the schedule for UT new student orientation points out that

Discussions take place the day of Gone to Texas and are a great way to start your Longhorn academic career in a one-time class (in or outside of your major area) with no tuition or tests!

Adam Smargon has compiled a "very long, but ... not exhaustive list" of such nicknames here. Like the Longhorns, many of these totems are animals (the Michigan Wolverines, the Wisconsin Badgers, the Irvine Anteaters, the College of the Atlantic Black Flies, the Santa Cruz Banana Slugs), but there are quite a few other categories, including religious figures (the Penn Quakers, the Wake Forest Demon Deacons, the Ohio Wesleyan Battling Bishops) and colors (the Harvard Crimson, the Syracuse Orange, the Dartmouth Big Green). The Stanford Cardinal is hermeneutically underdetermined -- it could be an animal or a religious figure or a color, though there is some local folk history that settles the matter for those in the know.

I don't think that schools in many other countries have totemic names (though Canada seems to share this aspect of American culture -- UBC student athletes seem to be Thunderbirds, for example. And I suppose that there may be a connection to things like the Oxford Blue in England).

And I don't know of any other handshape-based gestures indicating academic allegiance. The culture seems to be missing an opportunity here, as the Bush family demonstrated during the inauguration.

[Update: Nassira Nicola emails:

I read your Language Log post this morning, and I'm going to have to disagree with Lloyd Grove's interpretation of the Texas Longhorns salute in ASL. The sign BULLSHIT is a two-handed sign, where the non-dominant hand takes the same handshape as the Longhorns salute; the dominant hand forms a fist under the non-dominant elbow and opens and closes several times. If you look at the sign with an appropriately vulgar frame of mind, it looks like a rather iconic depiction of a bull defecating. Without the non-dominant hand, the sign means something like BULL**** - close enough to give ASL users the giggles, but not actually an obscenity in itself.

Actually, for what it's worth, the Texas Deaf community uses the Texas Longhorns Salute in ASL to mean - the Texas Longhorns.

This whole thing, incidentally, reminds me of the "gang signs" phenomenon; in the LA high school from which I graduated, students were asked to make sure their hands were visible and open in the senior picture to ensure that they weren't flashing "West Side" or "East Side" or whatnot. Not exactly a hand signal indicating academic allegiance, but definitely an indication of allegiance.

Yes, I thought of the "gang signs" business myself, but I don't know anything about it, and didn't have time to look it up, so I'm glad to have the reference -- and to have the mistake about ASL corrected! ]

[Update 1/24/2005: Stefano Taschini writes to point out that I've apparently been led astray by Wikipedia and other internet sites --

Regarding your post on "American College Totems" [1], I'd like to point out that "mano" is feminine in Italian (as in Spanish) and has always been feminine (cf. manus,-us, feminine noun of the fourth declination). The expression "mano cornuto" therefore strikes for its lack of gender agreement.

In Italy, this gesture is simply called "le corna" (the horns) and is apotropaic when the fingers lie horizontally or point down [2], and is an insult when the fingers point up. The graveness of the insult -- an innuendo of cuckoldry and, ultimately, little virility -- depends heavily on the context: kids use it rather innocently to make fun of other people posing for a picture [3], but flipping them to another car driver is a major offense.

[1] http://itre.cis.upenn.edu/~myl/languagelog/archives/001827.html
[2] Giovanni Leone, 1975, to contesters when visiting a hospital as President of the Republic. http://www.scudit.net/mdjella_file/cornaleone.jpg
[3] Silvio Berlusconi, 2002, in a group photo among the EU ministers of foreign affairs. http://www.repubblica.it/online/politica/gesto/gesto/ansa002a60c7cxw200h172c00.jpg

I read Italian and Spanish mainly by triangulating from Latin and French, but I still should have known the gender of mano. And here's a case where the wisdom of internet crowds goes astray -- {"mano cornuto"} gets 787 Google hits, as opposed to 212 for {"mano cornuta"}.

Anyhow, between Nicola's correction and Stefano's, we have now learned that essentially all the linguistic information in the original post was incorrect. Consider this a formal apology. And as always, the Language Log marketing department stands ready to refund your subscription fees in full if you are less than completely satisfied. In cases of egregious error, like this one, we'll refund double your subscription fees.]


Posted by Mark Liberman at 08:58 AM

January 22, 2005

Terry Crowley

David Nash has passed on the sad news that Terry Crowley, a linguist at the University of Waikato known particularly for his work on the languages of Vanuatu, passed away on January 15th at the age of 52. One of the languages on which he did a great deal of work is Bislama, a pidgin which is the national language of Vanuatu, for which he produced a reference grammar and dictionary. He also wrote an interesting Introduction to Historical Linguistics, unusual for its emphasis on data from Oceania. Radio Australia interviewed Nick Thieberger of the University of Melbourne about him.

Posted by Bill Poser at 01:16 PM

<abbr title="It's a joke">

Over at A Roguish Chrestomathy, q_pheevr takes issue with the views of Bill Poser and SC on the ethics and etiquette of onomastic orthography. Q (or should I say q_?) establishes empirically that "it is not true that the English writing system leaves no scope for individual choice in the capitalization of names" (consider Flora MacDonald vs. Sir John A. Macdonald, or the likes of ffoulkes and ffolliott), and implies that SC is being snobbish for not offering to Laveranues Coles the courtesy routinely afforded to the Marquess of Cholmondeley. But the best part is the jokes, especially the one about Malcolm X and Prince Faisal. The html <abbr> tag is used to flag some of these, but you won't be able to see that if you're one of the 92.7% of the browsing public that still uses Internet Explorer. With or without <abbr>, read the whole thing!

Posted by Mark Liberman at 01:05 PM

The storm is real, the word is still fake

The National Weather Service is warning us that

A dangerous winter storm will affect the entire area today into Sunday morning. ... Total snow accumulations for the entire event are forecast to average 10 to 15 inches across the area. ... The snow is expected to fall so heavily that near white-out conditions are likely with rates possibly reaching one to two inches per hour.

The weatherfolk tend to overestimate this sort of thing, since inadequate warnings are more culpable than excessive ones. And the TV weatherpeople have even more reason to cry wolf, since big worries lead to big ratings. All the same, I'm sure we're facing our first big snow of the winter. But up and down the Atlantic coast, local articles and local newscasts are using a word that the National Weather Service doesn't: nor'easter.

The Boston Globe doesn't use that word either. As Jan Freeman pointed out in that paper more than a year ago, it's a phony regionalism:

It's not, after all, a regional pronunciation, as many journalists outside New England now believe. "I grew up on Cape Cod when there still existed a pronounced local accent," wrote George Hand. "The word -- spelled phonetically -- was nawtheastah." Sailors disclaim it, too: They may say sou'wester, but never nor'easter.

Freeman's evaluation accords with my own experience of regional pronunciations growing up in rural eastern Connecticut, as I observed in a post discussing Freeman's article.

Today's storm is (predicted to be) a true northeaster, according to the National Weather Service advisory:

... An area of low pressure was located over the Mississippi River valley early this morning. It is forecast to move to the Ohio Valley later this morning and cross the central Appalachians this afternoon. A secondary low pressure system is then forecast to develop over the Delmarva coastal waters this evening and intensify rapidly.

Why is this a "northeaster"? Last year, I passed along the explanation I was given as a child:

... a northeaster is a winter storm that travels (from southwest to northeast) up the coast, with its center off shore, so that the counter-clockwise circulation of the storm blows in off the ocean full of moisture (from the northeast), dumping the load of moisture as snow and ice when it cools down over the land.

But this is not a nor'easter, as far as I'm concerned in my role as an ordinary human being. That word seems faker to me than the lederhosen at the Biergarten in Walt Disney World. I applaud the Boston Globe's editorial policy against using it. However, as a linguist I have to admit that a nor'easter is what storms like this have become, in the English language at large, whether we like it or not. Quoting Freeman:

The facts ... have not slowed the advance of nor'easter: Even in print, where it's probably less common than in speech, it has practically routed northeaster in the past quarter-century or so. From 1975 to 1980, journalists used the nor'easter spelling only once in five mentions of such storms; in the past year, more than 80 percent of northeasters were spelled nor'easter. It's no more authentic than "nucular" for nuclear or "bicep" for biceps, but it would take a mighty wind, at this point, to blow nor'easter back into oblivion.

In Google News this morning, northeaster has 48 hits to 215 for nor'easter. That's 82% for the fake regionalism.

[Update: Oops -- David Pesetsky shows that I should be more careful. When I said that the National Weather Service didn't use the term nor'easter, I was relying on reading a few notices about this recent storm. But Dave actually searched {noreaster site:noaa.gov}, and found (for example):

950 PM EST FRI JAN 21 2005

No apostrophe, but that's a mere orthographic bagatelle. Thanks! ]


Posted by Mark Liberman at 09:34 AM

Don't take your language for granite

In response to my post on deep-seeded, Fernando Pereira wrote:

My daughter brought up "take it for granite" when I showed her your "deep-seeded" piece. A quick Googling shows that the eggcorn occurs for real, but most top links are uses of the eggcorn as a pun, as in the names of kitchen counter suppliers, B&Bs, and popular geology outings.

There are indeed plenty of puns, like the climbing route named "Taken for Granite" on Stone Mountain, NC, or the slogan of GraniTech Inc. ("Your Marble and Granite Source"): "Take Us For Granite". And there are also some real examples out there:

(link) He started taking me for granite so I moved and kind of wanted to do my own thing.
(link) I put myself out there and let everyone have a piece of me. I’m finally just not going to let people take advantage of me and take me for granite anymore. I’m done trying to be the friend everyone needs when I’m left with nothing.
(link) Sometimes, he will need his space, but don't worry... He'll always make time for you and even when you're not around, you'll be in his thoughts. You will find that he isn't like any other guy that you have met, so please don't take him for granite. When it comes to his money, don't take advantage of that, He will be so unselfish with it, because that is the way he is.
(link) Never in his life would he think to see how many people really appreciated Fran. It showed him how much of a bloody cad he was for taking her for granite.
(link) Fight the deception of forgetting. When we forget what others have done we will take them for granite. When we forget what God has already done we will begin to weaken His presence and He will weaken our position.
(link) I do not remember ever hearing you say you had the faith and confidence in your sons, that they would behave themselves, and be honest and dependable. But as far back as I can remember I felt you expected as much of me, and took it for granite that I would.

This one is not as widespread as "deep-seeded", I think, because the sound is not so commonly the same, and the meaning doesn't fit quite so well.

You might think that the sound correspondence "granted" = "granite" is only approximate, but at least for some English speakers it can be exact.

There are two independent steps. The first and commoner one is for /'VntV/ (i.e. /nt/ when preceded by a stressed vowel and followed by an unstressed one) to weaken to [n]. A common example is the pronunciation of twenty as if it were spelled "twenny", or center as if it were spelled "senner". This sort of thing is often deprecated as sloppy speaking, but in fact most Americans do it all the time. I certainly do. If you can find an American speaker who never reduces /nt/ to [n] in such words, you've either found an extraordinarily fussy speaker, or one of the few Americans speaking a dialect that weakens /t/ in a different way in these contexts, e.g. to something like [ts].

The other step in making granted sound exactly the same as granite is to devoice the final /d/. I don't know much about the (phonological or sociolinguistic) distribution of this phenomenon -- I'll look into it and report back what I find. The pattern has clearly been around for a long time -- the OED explains that

The Sc. form of -ed is -it, with which cf. such early ME. forms as i-nempnet named, i-crunet crowned ...

I've heard some contemporary speakers use -it -- perhaps from this historical source, though final devoicing is a common sound change anyhow -- in words like patted or wanted (where the /nt/ weakening also applies, so that the result is something like [ˈwɔ.nɪt]). For them, granted would be exactly "granite". Of course, exact equivalence of sound isn't required for an eggcorn to be created, but it helps; and granted is exactly granite only for a minority of speakers.
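The two steps can be sketched as rewrite rules over a rough phonemic transcription. This is only an illustration under stated assumptions: the ARPAbet-style symbols, the stress digits (1 = stressed, 0 = unstressed), the transcriptions themselves, and the function names `weaken_nt` and `devoice_final_d` are all hypothetical choices made for the example, not a claim about how any particular speaker's phonology is organized.

```python
# Hypothetical ARPAbet-style transcriptions; a trailing digit marks a vowel
# and its stress level (1 = stressed, 0 = unstressed).
GRANTED = ["G", "R", "AE1", "N", "T", "IH0", "D"]
GRANITE = ["G", "R", "AE1", "N", "IH0", "T"]
TWENTY = ["T", "W", "EH1", "N", "T", "IY0"]


def weaken_nt(phones):
    """Step 1: drop /t/ in the context stressed vowel + /n/ _ unstressed vowel."""
    out = []
    for i, p in enumerate(phones):
        in_context = (
            p == "T"
            and i >= 2
            and i + 1 < len(phones)
            and phones[i - 1] == "N"
            and phones[i - 2].endswith("1")   # preceding vowel is stressed
            and phones[i + 1].endswith("0")   # following vowel is unstressed
        )
        if not in_context:
            out.append(p)
    return out


def devoice_final_d(phones):
    """Step 2: turn a word-final /d/ into [t]."""
    if phones and phones[-1] == "D":
        return phones[:-1] + ["T"]
    return list(phones)


print(weaken_nt(TWENTY))                              # "twenny"
print(devoice_final_d(weaken_nt(GRANTED)) == GRANITE)
```

Under these assumed transcriptions, applying both steps to granted yields the same symbol sequence as granite, while applying only the first step leaves the final consonant distinct.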

As for the dimension of meaning, I suppose that to take someone for granite is to treat them like a worthless inanimate object, a commonplace rock with no human value and no human feelings. This is a plausible substitute for some of the "value too lightly" senses of "take for granted". And perhaps to take something for granite could also mean to consider it the factual bedrock on which an edifice of supposition is built, which could work for the "assume as true" sense:

[American Heritage dictionary]:
1. To consider as true, real, or forthcoming; anticipate correctly.
2. To underestimate the value of: a publisher who took the editors for granted.

[Merriam Webster's Unabridged]:
1 : to assume as true, accurate, real, unquestionable, or to be expected *took it for granted that he would not get into trouble with the licensing authorities* *taken for granted that words have definite meanings— T.S.Eliot*
2 : to pay inadequate attention to or value too lightly (as a possession, right, or privilege) *inclined to take one's liberties for granted if they are never challenged* *began to take her husband for granted until he threatened to leave her*

Still, taking for granite is not quite right here. Why "granite", for example? It's a bit too specific for an idiomatic prototype. We talk about a heart of stone, not a heart of chert; we founder on the rock of human nature, not on the basaltic outcropping of selfishness; we build on the bedrock of solid fact, not on the gneiss dome of accepted reality. And the relevant meaning of granted is still very much alive (unlike the relevant meaning of seated): "I grant that...", "granted that ...", "even if we grant that ..."

So it's not surprising that "take for granite" is spreading much more slowly in the linguistic meme pool than "deep-seeded" is.


Posted by Mark Liberman at 08:17 AM

Satan-worshipping Bushes?

According to CBC news, Norwegians were shocked by a gesture made by President Bush and his family during the inauguration. In Norway this gesture is a salute to Satan. The gesture is a "false friend"; in Texas it is a salute to the University of Texas Longhorns (which is apparently a sports team).

[Update: more on this here and here. ]

Posted by Bill Poser at 01:12 AM

January 21, 2005

The mystery of #12

There's a new "Louisiana Style Cajun Pizzeria" at 2104 Chestnut St. here in Philadelphia, called "Two Red Boots". I had dinner there yesterday with my 9-year-old son, who said that he'd never had better pizza. My jambalaya was pretty good, too. We took home a copy of their take-out menu (slogan: "Philly Pizza Pioneers Since 2004"), which features 14 pizza combinations, of which most seem to be named after major league baseball teams. At least, many of them are: #7 The Astros ("tasso, andouille, ground beef, cheddar & mozzarella"), #4 The Yankees ("pepperoni, marinated chicken, fresh garlic on white pie"), #8 The Cubs, and so on. There are some typos: The Philles instead of The Phillies, The Ranegers instead of The Rangers, The Anahen instead (I think) of The (Anaheim) Angels. There's one that's clearly not a team name: #3 The Mother Earth ("cheese out, calabria sauce, fresh mushroom, red onion, red yellow peppers, spinach, roasted garlic").

And then there's #12. The Manliws.

I guess this might be an implausible attempt at The (Seattle) Mariners. If not for the baseball context, I would certainly not have made that guess, though. The ingredients are no help: "plum tomatoes, fresh spinach, fresh garlic on white pie, mozzarella". And Google's index is, so far, innocent of any pages including the string "manliws". Helpful Google asks if I mean manlius. But I don't. Or at least I don't think I do.

I didn't read the menu until we got home. Next time I'm on the 2100 block of Chestnut St., I'll stop in and ask the proprietor.

[Update: Neil Bardhan suggests that perhaps "Manliws" should be "Marlins". Yes! that makes much more sense than "Mariners". At least, it has the right number of letters, and 'n' is a plausible misreading of hand-written 'r', and 'w' of "n". This gives me a warm feeling for the management of this pizzeria, who must be the only people in the world who are worse proofreaders than me. ]

Posted by Mark Liberman at 05:15 PM


Wednesday's New York Times (I'm a bit behind in my reading) has an article by Nina Bernstein in the New York section called "Problems With Speaking English Multiply In a Decade". The burden of the article is that more and more New Yorkers don't speak or write English, mostly because they arrived in this country as adult immigrants, and the author emphasizes that English language classes are desperately scarce in Queens and Brooklyn, where most of the non-English-speaking immigrants live. By contrast, there are lots of centers elsewhere in the city devoted to teaching English to adult immigrants. Most notably, Manhattan, with a much, much smaller population of recent immigrants, has 80 English-as-a-second-language programs, while Queens has only 14.

What makes these figures especially striking is that a major argument of the English-only folks, who advocate eliminating all other languages from all official government documents (including applications for social services like Medicare), is that rigorous measures must be taken to require immigrants to learn English so that they won't keep refusing to abandon their native languages in favor of English. Bernstein's article highlights a fundamental flaw in this English-only position: if you don't speak English, and if you are surrounded by other people who don't speak English, then you need English classes in order to learn English. And those classes are simply not available to very large numbers of immigrants, in New York and elsewhere in the country.

Posted by Sally Thomason at 11:52 AM


Mark Liberman's post yesterday about deep-seeded reminded me of a tiny quasi-experiment I conducted years ago, after reading a letter to the editor that used meddle instead of its homophone mettle. Various phonological theories have various ways of dealing with alternations between [t] and a tap (which is a name for the middle consonant in middle, seeded, and seated) in word pairs like write, with a [t] at the end, and writing, with a tap in the middle, and between [d] and a tap in word pairs like ride and riding. But what, I wondered, is the basic, underlying consonant in the middle of single-morpheme words like metal, meddle, mettle, and medal? Is it a tap, or a d, or a t, or sometimes a d and sometimes a t? I didn't expect the tap to be basic, because the English tap is usually assumed to be secondary, a phonetic realization in a particular medial context from a basic /d/ or /t/ phoneme. But if it never alternates with anything else, maybe the tap actually is basic in simple words; and if it isn't, is it a realization of a /t/ or a /d/ in such words?

The problem is, since no alternations can be found in English in words that have a tap in the middle in all forms, how can we find out what the basic unit is? I thought that asking for a very slow pronunciation of the word might help, because a tap can only be said fast -- if you slow it down, it's not a tap any more. So in super-slow speech, would speakers still use a tap (in which case that part of the word wouldn't be slowed down), or would they replace the normal-speed tap with a stop [t] or [d]?

But now there's another problem: English speakers are too literate to make this experiment easy to run. In spite of misspellings like deep-seeded for deep-seated and meddle for mettle, speakers are all too likely to have a mental image of how the word is spelled, which will make it hard to discover their actual phonological representation of the word: I expected, but didn't test this hypothesis, that speakers who thought the word was spelled with one or two [t]'s would pronounce it slowly with a [t], and likewise for [d]. So, motivated by mere curiosity and unhampered in those long-ago days by IRB boards with their stern warnings against potential damage to speakers that could be produced by requiring them to (gasp!) pronounce words, I collected a gaggle of illiterate English speakers, namely, half a dozen neighborhood kids under the age of six. I read each of them a short list of relevant words and asked them to say the words very, very slowly. The kids were excellent experimental subjects, and every one of them, independently and without hesitation, pronounced every one of the words with a [t] in the middle.

What does this mean? I haven't the faintest idea. As an experiment, it's laughable -- historical linguists are not noted for expertise as experimental scientists, and even I can think of several confounding factors that could have skewed the responses. But I've always wondered why those kids (who did not hear each other's pronunciations) were so consistent in replacing the tap with [t], which is less phonetically similar to the tap than [d] is: both [d] and the tap are voiced, and the [t] is voiceless.

An afterthought: it occurred to me that Google might shed some light on this, so I googled two phrases with both spellings: on your mettle/on your meddle, and don't meddle/don't mettle. Google has 1140 hits for on your mettle and 0 hits for on your meddle, so there are no replacements of tt by dd. And it has 26,700 hits for don't meddle and 77 hits for don't mettle. The examples of don't mettle clearly mean the same thing as don't meddle, for instance I don't mettle in such things. So there's a small but nontrivial number of replacements of dd by tt. Not much data, though this apparent pattern does fit with what the little kids did in my quasi-experiment. But I still have no idea what, if anything, it means.

Posted by Sally Thomason at 11:47 AM

January 20, 2005

The Economist on internet linguistics

Today the Economist has a story entitled "Corpus colossal", which starts like this:

LINGUISTS must often correct lay people's misconceptions of what they do. Their job is not to be experts in “correct” grammar, ready at any moment to smack your wrist for a split infinitive. What they seek are the underlying rules of how language works in the minds and mouths of its users. In the common shorthand, linguistics is descriptive, not prescriptive. What actually sounds right and wrong to people, what they actually write and say, is the linguist's raw material.

But that raw material is surprisingly elusive. Getting people to speak naturally in a controlled study is hard. Eavesdropping is difficult, time-consuming and invasive of privacy. For these reasons, linguists often rely on a “corpus” of language, a body of recorded speech and writing, nowadays usually computerised. But traditional corpora have their disadvantages too. The British National Corpus contains 100m words, of which 10m are speech and 90m writing. But it represents only British English, and 100m words is not so many when linguists search for rare usages. Other corpora, such as the North American News Text Corpus, are bigger, but contain only formal writing and speech.

Linguists, however, are slowly coming to discover the joys of a free and searchable corpus of maybe 10 trillion words that is available to anyone with an internet connection: the world wide web. The trend, predictably enough, is prevalent on the internet itself. For example, a group of linguists write informally on a weblog called Language Log ...

Read the whole thing! It's a welcome example of an article about linguistics in the popular press that is clear, accurate and interesting. I don't think that my evaluation is influenced by the fact that it cites Language Log, and quotes Philip Resnik and me. Like most scientists and scholars, I'm usually more critical than I should be of articles about topics I know something about, and most critical of all when an article mentions or quotes me.

I particularly like the final point:

The easy availability of the web also serves another purpose: to democratise the way linguists work. Allowing anyone to conduct his own impromptu linguistic research, some linguists hope, will do more to popularise their notion of studying the intricacy and charm of language as it really exists, not as killjoy prescriptivists think it should be.

If I were a pedant, not to say a killjoy prescriptivist, I might suggest the alternative "...allowing anyone to conduct their own linguistic research ... will ... popularize the notion of studying the intricacy and charm of language as it really exists", for reasons that Geoff Pullum explained in a Language Log post from last August. But I'm not, so I won't.

Posted by Mark Liberman at 02:37 PM

Buffaloing buffalo

All right, I know, in the previous post I claimed that there were "a very few peculiar exceptions" to my claim that for any arbitrary string of words, with repetitions or without, nearly all orders you can put them in are ungrammatical; and I didn't say what the exceptions were. I can tell (no need to email me) that you, as a curious Language Log reader, want to know. I will at least partially satisfy your curiosity. Read on.

I will give you just one example, the case of strings consisting entirely of repetitions of the word buffalo. It turns out that all such strings are grammatical. Here are a few such strings, with rough paraphrases so that you can see that they have to be grammatical:

Buffalo.
"Engage in bamboozlement."

Buffalo buffalo.
"American bison are characteristically given to engaging in bamboozlement."

Buffalo buffalo buffalo.
"American bison are characteristically given to bamboozling other members of their species."

Buffalo buffalo buffalo buffalo.
"American bison habitually bamboozled by members of their own species (that is, buffalo whom other buffalo regularly buffalo) characteristically engage in bamboozlement."

Buffalo buffalo buffalo buffalo buffalo.
"American bison habitually bamboozled by members of their own species (that is, buffalo whom other buffalo regularly buffalo) tend to return the compliment by bamboozling in turn yet other members of the species."

Buffalo buffalo buffalo buffalo buffalo buffalo.
"American bison habitually bamboozled by members of their own species that have themselves been bamboozled by others of their ilk (that is, buffalo whom other buffalo who have themselves been buffaloed by buffalo regularly buffalo) tend to engage in bamboozlement."

Buffalo buffalo buffalo buffalo buffalo buffalo buffalo.

O.K., I think I've done enough of these for you to see which way this is going.
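For readers who would like to check the pattern mechanically, here is a minimal sketch of a chart recognizer run over a toy grammar. The grammar is an assumption reverse-engineered from the paraphrases above (a plural noun, a verb, an optional object, and a reduced relative clause of the form "buffalo [whom] buffalo buffalo"); it is not offered as the analysis behind the claim, and the names `BINARY`, `UNARY`, `LEXICON`, and `is_sentence` are invented for the illustration.

```python
# Toy grammar (an assumption for illustration only):
#   S  -> NP VP | VP        (declarative or imperative)
#   VP -> V NP | V
#   NP -> NP RC | N         (RC is a reduced relative clause)
#   RC -> NP V
BINARY = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("NP", "RC"): {"NP"},
    ("NP", "V"): {"RC"},
}
UNARY = {"V": {"VP"}, "VP": {"S"}, "N": {"NP"}}
LEXICON = {"buffalo": {"N", "V"}}


def close(cats):
    """Add every category reachable through the unary rules."""
    cats = set(cats)
    agenda = list(cats)
    while agenda:
        cat = agenda.pop()
        for parent in UNARY.get(cat, ()):
            if parent not in cats:
                cats.add(parent)
                agenda.append(parent)
    return cats


def is_sentence(words):
    """CYK-style recognition: True if the toy grammar derives S over all words."""
    n = len(words)
    if n == 0:
        return False
    chart = {}
    for i, word in enumerate(words):
        chart[i, i + 1] = close(LEXICON.get(word.lower(), ()))
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            cell = set()
            for mid in range(start + 1, end):
                for left in chart[start, mid]:
                    for right in chart[mid, end]:
                        cell |= BINARY.get((left, right), set())
            chart[start, end] = close(cell)
    return "S" in chart[0, n]


for n in range(1, 8):
    print(n, is_sentence(["buffalo"] * n))
```

Under this toy grammar the recognizer accepts every length from one to seven. The reason is visible in the rules: odd-length strings can always be read as a noun phrase, even-length ones as a verb plus object, and the two combine at every length.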

Posted by Geoffrey K. Pullum at 01:03 PM

Nearly all strings of words are ungrammatical

Language Log is written by professional linguists. Even the utterly silly bits (and lord knows I've done a few) are written by professional linguists taking time off from being the serious scholars they normally are. And linguists are often concerned to point out that many of the things ordinary folks imagine to be grammatical errors are in fact perfectly grammatical and acceptable. But linguists are not like indulgent parents lowering the bar so they can congratulate every child for jumping over it. There's definitely such a thing as a syntactic error, even in your native language, even as judged by descriptive linguists. Here is just one example, from a reader's letter that I noticed in a copy of the Philadelphia Inquirer (Friday, December 31, 2004, p. A260) that was lying around near the water cooler in one of the corridors at our headquarters, Language Log Plaza:

Why do some teachers, parents and religious leaders feel that celebrating their religious observances in home and church are inadequate and deem it necessary to bring those practices into the public schools?

"Letters may be edited for clarity, length and accuracy," it says on the letters page. But where are the copy editors when you need them? There are two present-tense verbs here, both inflected for plural agreement. One of the two is wrongly inflected.

The first is the auxiliary verb do. It has the form do because its subject is the plural noun phrase some teachers, parents and religious leaders. That is correct.

The second is a form of be. (Remember, I use bold italics when naming a lexeme.) It's just before the adjective inadequate. It has the form are. But that's a mistake. Perhaps the writer became momentarily confused and thought its subject was the plural noun phrase their religious observances. But it's not. The subject is a gerund-participial clause, celebrating their religious observances in home and church. Clauses count as singular. In fact you can make up a sentence in which either singular or plural agreement is possible with the same string of words as subject:

[1] Moving pianos is dangerous.
[2] Moving pianos are dangerous.

The first has singular agreement (is), so moving pianos has to be read as a clause: it's about the thing you're doing when you move a piano. The second has plural agreement (are), so moving pianos has to be read as a plural noun phrase: it's about loose pianos rolling around. This is possible because moving can function as an attributive modifier of the head noun pianos.

But celebrating their religious observances in home and church is a clause with celebrating as its verb, not a noun phrase with observances as its head. The word celebrating cannot possibly be an attributive modifier with observances, because their, a genitive pronoun, follows it; that cannot be anything but a determiner, and determiners precede attributive adjectives. (Genitive pronouns cannot serve as attributive modifiers: phrases like *the my house or *an our cat are utterly ungrammatical.)

So the subeditors on the Inquirer's letters page dropped the ball here. There's no possible way that sentence is grammatical.

So we linguists don't dismiss as puristic prescriptive nonsense all claims of some attested sentence being ungrammatical. Our aim, at least, is only to dismiss the ones that are puristic prescriptive nonsense.

And I'll tell you something else. It seems clear to me that nearly all strings of English words you can construct are ungrammatical. Try writing down any random sequence of words (a fully grammatical one if you want to bias things against my claim), either with repetitions or without, it doesn't matter. With a very few peculiar exceptions, for any string of words you will find that almost every one of the orders in which those words can be arranged will be ungrammatical — exponentially many more are ungrammatical than are grammatical.
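A small sketch makes the proportions concrete. It assumes a deliberately tiny stand-in for English grammar, a single sentence pattern (determiner, noun, verb, determiner, noun), so the numbers it gives illustrate the shape of the claim rather than measure real English. The word list, the `CATEGORY` table, and the function name `count_orders` are all invented for the example.

```python
from itertools import permutations

# Hypothetical part-of-speech assignments for a five-word sentence.
CATEGORY = {"the": "Det", "a": "Det", "dog": "N", "cat": "N", "chased": "V"}

# The only word-class sequence this toy "grammar" accepts.
PATTERN = ("Det", "N", "V", "Det", "N")


def count_orders(words):
    """Return (grammatical, total) over every ordering of the given words."""
    total = 0
    grammatical = 0
    for order in permutations(words):
        total += 1
        if tuple(CATEGORY[w] for w in order) == PATTERN:
            grammatical += 1
    return grammatical, total


good, total = count_orders("the dog chased a cat".split())
print(f"{good} of {total} orders fit the pattern")
```

With these five words, 4 of the 120 possible orders fit the pattern. A realistic grammar would admit a few more, but the number of orders grows factorially with sentence length, which is why the ungrammatical ones swamp the rest.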

Posted by Geoffrey K. Pullum at 12:48 PM

Deep-seeded ignorance

According to a Jan. 19 Fox News story from Houston about how "[a]n application form to join a parochial schools group that was sent to Texas Islamic schools has created misunderstanding and anger between local Muslims and Christians",

Iesa Galloway, Houston Executive Director of the Council on American-Islamic Relations (search) said the questionnaire was "rooted in deep-seeded ignorance of the religion of Islam and the Muslim people."

For most Americans, "deep-seeded" is pronounced exactly the same way as "deep-seated", due to (what linguists call) flapping and voicing of /t/ in words like seated, as in many other contexts (e.g. in fatter and rabbiting and at all, but not in attack). And in terms of the current ordinary-language meaning of the words involved, "deep-seeded ignorance" makes sense, while "deep-seated ignorance" doesn't. Ignorance can be planted deep and thus have deep metaphorical roots, but deep-seated ignorance would have to be ignorance cut with a lot of room in the crotch, or maybe ignorance sitting in a badly-designed armchair.

Still, Fox News needs better copy editors.

The established phrase is "deep-seated", which is listed in any good dictionary and has 590,000 Google hits, while "deep-seeded" is not listed in any dictionary (at least as far as I've checked), and has only 24,800 Google hits, so that the public vote is 96% for seated, 4% for seeded.
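The percentages are just each spelling's share of the combined hit count. A minimal sketch of the arithmetic, using the two counts quoted above (the function name `share` is made up for the example):

```python
def share(count, other):
    """Fraction of the combined hits that belongs to `count`."""
    return count / (count + other)


seated_hits = 590_000   # "deep-seated"
seeded_hits = 24_800    # "deep-seeded"

print(f"deep-seated: {share(seated_hits, seeded_hits):.0%}")
print(f"deep-seeded: {share(seeded_hits, seated_hits):.0%}")
```

Rounded to whole percentages, this gives 96% and 4%. Raw hit counts are rough estimates at best, so the split is indicative rather than exact.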

We've been accused recently of "let[ting] stodgy prescriptivism out into Language Log". In fact, I'm a linguistic libertarian -- I think you should speak and write as you please, but you should also understand what you're doing, and accept the consequences. In this case, if you write about "deep-seeded ignorance", you'll be using what most educated people will take to be a misconstrual of a long-established phrase.

The fact that roughly 4% of the population has the wrong idea about this phrase is a perfect example of the forces that lead to the formation of eggcorns. (Indeed, deep-seeded was mentioned here last fall as an example of this process.) The substitution sounds the same, and it means something plausible. Both similar sound and sensible meaning are essential -- no one is likely to make the mistake of writing "seated rolls" in place of "seeded rolls", or "deep-chaired ignorance" in place of "deep-seated ignorance".

The OED defines deep-seated as "Having its seat far beneath the surface". This would make sense for the meaning of seat given in the AHD as

6a. The place where something is located or based: The heart is the seat of the emotions.

but that sense of seat is essentially obsolete, except in fixed expressions like "deep-seated".

The idea of seeds being buried is much commoner these days, both literally and metaphorically, than the idea of seats having a similar property, as these Google counts suggest:

"buried seed(s)" 10,400 "seed(s) buried" 6,400
"buried seat(s)" 13 "seat(s) buried" 232

And some of those few buried seats are not real:

When he got the pardon, he looked at it and went back to his seat, buried his face in his hands and cried.

So it's odd to call the use of "deep-seeded" a mistake, since it combines ordinary words of English in a grammatically correct and semantically reasonable fashion. But a mistake it will usually be, at least in the view of most readers, because the existing phrase "deep-seated" gets in the way.

This is what John Dryden was getting at when he asked

Wouldst thou the seeds deep sown of mischief know,
And how the egg-corne doth emplanted grow?

OK, he didn't ask any such thing. But if he had, you'd know the answer.


Posted by Mark Liberman at 10:12 AM

January 19, 2005

More on Browning, Pippa and all

My post this morning on Twat v. Browning left several questions open: how did the writer of the OED's entry on twat know how Browning formed his idea of its meaning? What was Vanity of Vanities (1660) and who wrote it? And how did Pippa Passes, KY, get its name?

By the time I checked my email this afternoon, all these questions had been answered by notes from readers.

Anthony Hope sent a page from Dr. Bowdler's Legacy, which reads in part:

In 1841, Browning published the long dramatic poem Pippa Passes, now best known for the lines “God’s in His heaven/ All’s right with the world.” Toward the end of it, he sets up a kind of Gothic scene, and writes:

Then, owls and bats,
Cowls and twats,
Monks and nuns, in a cloister’s moods,
Adjourn to the oak-stump pantry!

The second of these lines created no stir at all, presumably because the middle class had truly forgotten the word “twat” (just as it had forgotten “quaint,” so that Marvell’s pun on the two meanings in “To His Coy Mistress” has fallen flat for six or eight generations now). A few scholars must have recognized the word, but any who did behaved like loyal subjects when the emperor wore his new clothes, and discreetly said nothing. No editor of Browning has ever expurgated the line, even when Rossetti was diligently cutting mere “womb” out of Whitman. The first response only came forty years later when the editors of the Oxford English Dictionary, collecting examples of usage, like Johnson before them, and interested to find a contemporary use of “twat,” wrote to Browning to ask in what sense he was using it. Browning is said to have written back that he used it to mean a piece of headgear for nuns, comparable to the cowls for monks he put in the same line. The editors are then supposed to have asked if he recalled where he had learned the word. Browning replied that he knew exactly. He had read widely in seventeenth-century literature in his youth, and in a broadside poem called “Vanity of Vanities”, published in 1659, he had found these lines, referring to an ambitious cleric:

They talk’t of his having a Cardinall’s Hat;
They’d send him as soon an Old Nun’s Twat.

If you are sufficiently delicate and sheltered, it is possible to take the last word as meaning something like a wimple, and Browning did. A fugitive and cloistered virtue can get into difficulties that even Milton didn’t think of.

Andrew Gray answered the question about Vanity of Vanities:

It occurred to me that in 1660, an untranslated work in English would likely have been printed in the UK; as such, there was a very good chance a copy was still around, and a reasonable chance it had actually been titled that.

So I checked COPAC (copac.ac.uk), which is a remarkably useful resource - a single search interface for most of the major university (and deposit) libraries in the UK. It produced two records:

Vanity of Vanities: or Sir H. Vanes Picture to the tune of the Jews Corant. [A satirical ballad.] / [by] Vane, Sir Henry . 1660

Vanity of Vanities or Sir Harry Vanes Picture. : To the tune of the Jews Corant . 1660

although on examination the second is a microfilmed copy of the first, which is held at the British Library. On further digging they also hold a 1659 copy; I don't know if there was a difference between the two, or if the OED deliberately chose the more recent edition for the sake of it.

I assume Harry Vane was Sir Henry Vane the Younger, who among other things served as the governor of Massachusetts; his father had been dead for five years, so it likely wasn't him. Interesting chap, by the looks of it...


[Update 1/23/2005: Dan Holbrook wrote to correct this attribution:

I've got access to Early English Books Online at my school, and having read the poem in question, I think it's pretty clear that Sir Henry Vane is the subject (hence "Portrait"), and that he wouldn't have written such disparaging things about himself (of his father and him: "The Devil no're see such two Harry's"). Given the authority many readers (I, for one) attach to Language Log, it would probably be a good idea to edit the old post before this error propagates all over the internet.

Sure enough. Our library has EEBO too, and so here is an image of the (anonymous) broadsheet. It's nice to see that Language Log is considered the Weblog of Record, if only for anonymous 17th-century religious-political song lyrics.]

And Paul Bickart wrote:

From my copy of "American Place Names" by George R. Stewart (Oxford University Press, New York (1970)):

Pippapasses KY Named, ca. 1915, by a schoolteacher from Browning's poem, with the idea of doing good by unconsciously influencing other people.

Doesn't get us a whole lot forrader, does it?

Well, it's a good start. For one thing, it confirms that Pippa Passes was actually named for the poem, rather than (say) a sequence of narrow gaps between mountains named for some Appalachian pioneer. And it tells us that the place was named around 1915, by a schoolteacher.

And the odd demographics of Pippa Passes offer a clue about who the schoolteacher might have been, and what context the naming might have taken place in.

Population (year 2000): 297, Est. population in July 2002: 295
Males: 58 (19.5%), Females: 239 (80.5%)
Median resident age: 20.8 years
White Non-Hispanic (97.3%)

What sort of hamlet in the mountains of Kentucky would have a female population of 239 out of 297, and a median resident age of 20.8 years?

Well, part of the story seems to be that Pippa Passes is the home of Alice Lloyd College. This doesn't entirely explain the demographics, since ALC is said to have 557 students, 45% of whom are male. Maybe the men's dorms are outside of the town limits?

Anyhow, the school's history apparently

traces back to a community center founded by Alice Geddes Lloyd, a newspaper reporter from Massachusetts who moved to Kentucky in 1916 for health reasons. By 1923 the center had evolved into a junior college, and in 1980 it became a senior college.

I think we can identify Ms. Lloyd as the schoolteacher responsible for the name. She seems to have been quite a woman, and very much the sort of person who would have adopted Browning's Pippa as a totem.

And for lagniappe, John Kozak writes:

To Browning's 'twat' I'll add Bulwer-Lytton's: in "The Coming Race", a novel about an underground race of amphibian super-beings, we get, in a passage about one of their leading thinkers:

Among the pithy sayings which, according to tradition, the philosopher bequeathed to posterity in rhythmical form and sententious brevity, this is notably recorded: "Humble yourselves, my descendants; the father of your race was a 'twat' [...]

B-L adds the gloss "(tadpole)" after this. I /think/ what he was referring to is a Suffolk dialect word "twud", but I don't have anything decent to check that against.

Happy new year!

Same to you, John!

[Update 1/20/2005: In reference to twud, Ray Girvan writes:

This fits with the regional rhyme (I've seen it in Arnold Silcock's "Verse and Worse") - "an 'ornet lived in a 'oller tree, an' a narsty spiteful twud were 'ee" - where "twud" is usually interpreted as "toad". ]



Posted by Mark Liberman at 05:57 PM

Our bird aviary

On the outer wall of the building that houses a big pet store out by the freeway near Santa Cruz I noticed this sign:

Take a Look and See Who's In Our Bird Aviary


And yes, my inner pedant — the priggish, prescriptive, Mr Hyde that sometimes tries to emerge from the tolerant and scientific Dr Jekyll that is truly me — did spend a few seconds devising a scathing formulation of the obvious question concerning what other kinds of aviary they thought there might be. But I held my evil side in check: I did not go in and ask them.

The string "bird aviary" gets about 14,600 Google hits, of course. But some of those would not excite the ire of people who hate ignorant redundancy of phraseology ("Omit needless words! Omit needless words!"). For example, "Tropical Bird Aviary" is not redundant: all aviaries are for birds, but not all have tropical birds. And "tropical aviary" might suggest an aviary in tropical latitudes, rather than a place (at any latitude) where tropical birds are housed.

Geraint Jennings tells me he once saw a "bat aviary". That would not please the pedant one bit, of course. It should be called (he would say) a chiropterarium.

Posted by Geoffrey K. Pullum at 12:54 PM

Twat v. Browning

In response to Monday's post about dismortality/epismetology and puppetutes/pompatus, ACW emailed that "[y]our Language Log piece about the 'etymology' of Steve Miller's 'word' reminded me of the famous Twat v. Browning case".

In Robert Browning's poem Pippa Passes, Browning uses the word "twat" under the misimpression that it was an article of nun's clothing:

   Then owls and bats
   Cowls and twats
   Monks and nuns in a cloister's moods
   Adjourn to the oak-stump pantry

I don't have a reference to hand, sorry.  Internet resources on this seem to be scarce, but I think the whole story is in the OED entry for "twat".

Anyway, bemused etymologists eventually tracked down the source of Browning's confusion.  It was a 17th-century satirical poem called "Vanity of Vanities"; the relevant lines are:

   They talk'd of his having a Cardinall's Hat
   They'd send him as soon an Old Nun's Twat

This somehow reminds me of how Miller reinterprets Green's adolescent homespun pornographic "puppetutes".

Well, the OED entry certainly suggests a story, but it's far from whole. In fact, the entry is a curious and interesting document:

low slang.
[Of obscure origin.]

1. (See quot. 1727.)
Erroneously used (after quot. 1660) by Browning Pippa Passes IV. ii. 96 under the impression that it denoted some part of a nun's attire.

1656 R. FLETCHER tr. Martial II. xliv. 104.
1660 Vanity of Vanities 50 They talk't of his having a Cardinalls Hat, They'd send him as soon an Old Nuns Twat.
a1704 T. BROWN Sober Slip in Dark Wks. 1711 IV. 182 A dang'rous Street, Where Stones and Twaits in frosty Winters meet.
1719 D'URFEY Pills III. 307.
1727 BAILEY vol. II, Twat, pudendum muliebre. Twat-scowerer, a Surgeon or Doctor. E. Ward.
1919 E. E. CUMMINGS Let. 18 Aug. (1969) 61 On Tuesday an Uhlan To her twat put his tool in.
1934 H. MILLER Tropic of Cancer 55 A man with something between his legs that could..make her grab that bushy twat of hers with both hands and rub it joyfully.
1959 N. MAILER Advts. for Myself (1961) 101 The clothes off, the guards are driving them into the other room, and smack their hands on skinny flesh and bony flesh, it's bag a tittie and snatch a twot.
1970 G. GREER Female Eunuch 39 No woman wants to find out that she has a twat like a horse-collar.
1973 P. WHITE Eye of Storm iii. 137 This young thing with the swinging hair and partially revealed twat.

2. A term of vulgar abuse. Cf. TWIT n.1 2b and CUNT 2.
[quotations omitted]

3. U.S. dial. The buttocks.
[quotations omitted]

In the first place, sense 1 is not given a gloss, but instead refers us to the 1727 quotation, which gives the gloss in Latin ("pudendum muliebre"). This is oddly circumspect, given that elsewhere in the OED, cunt is defined straightforwardly (if incorrectly?) as "the female external genital organs". Continuing the circumspection, Fletcher's 1656 translation of Martial and D'Urfey's 1719 quotation are given only as citations, without the quotes. This is odd given the Cummings, Miller, Mailer and Greer quotations, which are hardly demure. Is this perhaps a residue of changing editorial policies over time?

But the reference to Browning seems especially unusual to me. A citation is given without a quote, and the citation is to an idiosyncratic usage that is identified as a lexical misunderstanding on the part of the poet. Even curiouser, the editor confidently identifies the specific source of the mistake as misunderstanding of the 1660 quotation from Vanity of Vanities. Is this because Browning fessed up to the error somewhere? Or has the editor done some more indirect literary detective work?

To add to the mystery, the author of Vanity of Vanities is not identified, and the title is not in the OED's bibliography, at least as far as I can tell. And the quote in question ("They talk't of his having a Cardinalls Hat, They'd send him as soon an Old Nuns Twat.") is not found in the otherwise compendious LION database.

Browning's Pippa Passes does exist -- it's an 1841 verse drama about Pippa, a girl "from the silk-mills", best known for Pippa's song from Part I:

222      The year's at the spring
223      And day's at the morn;
224      Morning's at seven;
225      The hill-side's dew-pearled;
226      The lark's on the wing;
227      The snail's on the thorn:
228      God's in his heaven---
229      All's right with the world!

The twat reference is from Pippa's evening song, at the end of the play in Part IV:

102 Day's turn is over, now arrives the night's.
103 Oh lark, be day's apostle
104 To mavis, merle and throstle,
105 Bid them their betters jostle
106 From day and its delights!
107 But at night, brother howlet, over the woods,
108 Toll the world to thy chantry;
109 Sing to the bats' sleek sisterhoods
110 Full complines with gallantry:
111 Then, owls and bats,
112 Cowls and twats,
113 Monks and nuns, in a cloister's moods,
114 Adjourn to the oak-stump pantry!

From the context, it makes sense to interpret twats as referring to nuns' headgear. Certainly the more standard interpretation is at variance with the religious imagery and with Pippa's fresh-faced girlishness. But without the OED's guidance, I would have read it simply as evidence that Browning's 1841 interpretation of Austrian girlishness was less demure than I might have expected.

One last question: how did Pippa Passes, KY come to be named for Browning's play?

[Update: a bit of Googling uncovers the fact that Language Hat mentioned Browning's mistake (?) last year, and gave a link to the missing Fletcher translation of Martial II. xliv. ]

[Update #2: See here for more answers... ]


Posted by Mark Liberman at 07:59 AM

January 18, 2005

The kaleidoscope of power

Done forever with my reading of The Da Vinci Code, I had to find a way of disposing of the offending object. (Even the title contains a linguistic error, Adam Gopnik claims in this week's issue of The New Yorker. Leonardo came from Vinci. Da Vinci is not a name. It's a prepositional phrase, like of Nazareth in Jesus of Nazareth. What would Of Nazareth do?)

But clogged recycling centers are now refusing to accept copies of Brown's book, and libraries are closing their after-hours book drops to avoid having people getting rid of them that way by night. So (I'm a cruel father, but fair) I hit upon the idea of sending the book on to my son Calvin, who I recently learned had not read it. Within a day or two after the package reached him I got an email:

The Da Vinci Code, page 30:

"Five months ago, the kaleidoscope of power had been shaken, and Aringarosa was still reeling from the blow."

What the fuck does that even mean?

Perhaps he meant something like: "The kaleidoscope of power had been shaken and the orange-green pattern of courage had been consumed by the yellow-red jumble of fear"?

Calvin did explore the matter a bit further, looking up kaleidoscope on dictionary.com, and he found a possibly relevant though little-known third definition for the word — after (1) pattern-displaying optical toy with mirrors and lenses and colored glass pieces, and (2) multi-colored pattern such as is produced thereby:

3. A series of changing phases or events: a kaleidoscope of illusions.

But he comments:

Even so, that has got to be one of the worst mixed metaphors ever. It's like mixing oil and Lego. And I'm still reeling from the crunchy salad.

Quite so. After all, the shaking of a kaleidoscope generally has effects involving new randomly selected patterns of colors, but that is not a blow. So what was the blow? The kaleidoscope of power had been shaken according to Brown's metaphor, not hit with a heavy blunt instrument. And why would shaking the kaleidoscope mean Aringarosa had been hit? Perhaps someone hit Aringarosa over the head with the kaleidoscope of power... in order to avoid having to shake it?

One has to admit, Calvin is right: we don't have a clear picture here. But then it's always like that with Dan Brown. As I believe I may have said before, when Dan Brown is doing the describing, you really need pictures.

Posted by Geoffrey K. Pullum at 04:09 PM

The scenic route

In a post on 1/15/2005 to the Risks Digest, under the heading "MapPoint explains Vikings?", Adam Shostack pointed out that

When going from Haugesund, Rogaland, Norway, to Trondheim, Sør-Trøndelag, Norway, be aware that following Microsoft MapPoint's directions, will take you through England, France, Belgium, the Netherlands, Germany, Denmark, Sweden, and finally back into Norway. While this may be culturally sensitive and respectful of historic Viking routing, rooting, or looting, it is somewhat less efficient than other routes, as a quick glance at a map will show.

Start: Haugesund, Rogaland, Norway
End: Trondheim, Sør-Trøndelag, Norway
Total Distance: 1685.9 Miles

Adam was following earlier directions from Nick Brown (or at least Nick's post to the same forum was earlier):

1. Go to http://mappoint.msn.com/DirectionsFind.aspx
2. In the Start section, select "Norway" from the listbox and enter "Haugesund" into the "City" field
3. In the End section, select "Norway" from the listbox and enter "Trondheim" into the "City" field
4. Click on "Get Directions"

MapPoint's point-by-point directions are fully up-to-date, so that the first three steps are

Start: Depart Haugesund, Rogaland, Norway on 47 [Karmsundgata] (West) 0.6 miles
1: At roundabout, take the THIRD exit onto 47 [Djupaskarvegen] 0.9 miles
2: Road name changes to Garpeskjærvegen 0.1 miles
3: *Check timetable* Take Haugesund-Newcastle Upon Tyne (North-East)  

Well, the distances are in miles, but that is not an anachronism; rather (I imagine) it is an automatic response to the IP address from which I made the query. If you ask for the route in the opposite direction, you get a less scenic and interesting answer, as shown in the picture on the right. In this case, the distance is given as 476.1 miles, or 1209.8 miles less.

I'm more sympathetic to such indirect navigation in weblog posts than in real-life journeys. When I've occasionally rented a car with a GPS-based navigation system, such as Hertz NeverLost, I've just entered the destination and followed the instructions without checking a map. Note to self: don't do this again.

[Risks Digest link via email from Fernando Pereira]

[Update: Nathan Vaillette wrote:

long time listener, first time caller, so to speak. Love your work.

Regarding the Haugesund--Trondheim itinerary: try the same search at mappoint, but asking for the "shortest route" rather than the quickest (by clicking the relevant button under "Route type" on the right). The result is less rococo but arguably more absurd.

(By the way, am I missing the language tie-in in that blog entry?)

The "shortest route" is indeed very striking, since it involves taking the ferry from Haugesund to Newcastle Upon Tyne, and then turning right around and taking the ferry back from Newcastle Upon Tyne to Bergen. I guess that's one way to get around the fjords. This also suggests what might be wrong with the route-finding program -- perhaps it doesn't include water travel in its distance or time cost functions...

As for the missing language hook, I originally had two, one being the influence of Scandinavian on northern Middle English, and the other being the misplaced comma in Adam Shostack's original post. But I didn't have time to explain the first one before I had to get going on my morning obligations, and I decided that it would be ungracious to mention the second one. ]
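The conjecture above, that the route-finder leaves water travel out of its cost function, is easy to sketch in a few lines of Python. This is only a toy illustration, not MapPoint's actual method: the four-town network and the `shortest_route` helper are invented for the example, and every distance except the 476.1-mile road figure quoted in the post is a made-up placeholder.

```python
import heapq

# Toy network. Only the 476.1-mile road figure comes from the post;
# every other distance is a hypothetical placeholder.
LEGS = [
    ("Haugesund", "Trondheim", 476.1, "road"),
    ("Haugesund", "Newcastle", 400.0, "ferry"),  # hypothetical
    ("Newcastle", "Bergen", 400.0, "ferry"),     # hypothetical
    ("Bergen", "Trondheim", 390.0, "road"),      # hypothetical
]


def shortest_route(start, goal, count_water):
    """Dijkstra's algorithm; ferry legs cost 0 when count_water is False."""
    graph = {}
    for a, b, miles, kind in LEGS:
        cost = miles if (kind == "road" or count_water) else 0.0
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, place, path = heapq.heappop(queue)
        if place == goal:
            return cost, path
        if place in seen:
            continue
        seen.add(place)
        for nxt, leg_cost in graph.get(place, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + leg_cost, nxt, path + [nxt]))
    return None


# Ferries ignored: the detour via Newcastle and Bergen looks "shortest".
print(shortest_route("Haugesund", "Trondheim", count_water=False))
# Ferries counted: the direct road wins.
print(shortest_route("Haugesund", "Trondheim", count_water=True))
```

With the sea legs priced at zero, the out-and-back ferry trip beats the direct road; once they are counted, it does not. Whether anything like this was going on inside the real program is, of course, a guess.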

[Update 1/19/2005: Thomas Paul writes with more information:

MapPoint seems to have fixed/broken the problem. If you try to create a route between the two cities you now get:

"A route between the locations you entered could not be calculated. One of the locations may not contain necessary connectors such as ferry routes or main roads. Please change one or both of the locations."

You get this no matter which city you make the starting point. My guess is that the problem was not caused by MapPoint not figuring in the time of the ferry (they correctly estimated the roundtrip as being 51 hours) but rather that they simply didn't have a route out of Haugesund, Rogaland. The only route they could find out of town was the ferry, although they did find a route into the city. Notice that the "Grand Tour" through Europe only took 48 hours which is why MapPoint took it as the "quickest" route rather than taking the ferry back to Bergen, Norway which takes 51 hours.

Hmm. MapPoint's own map certainly makes it seem that there are roads from Haugesund to the rest of Norway, but I guess appearances might be deceiving... ]


Posted by Mark Liberman at 06:30 AM

Lower Case Names?

It is news to me that E. E. Cummings did not insist that his name be written e. e. cummings. Nonetheless, this raises an interesting question. Do people have a right to insist that other people use a particular, unconventional capitalization of their name? If I write Bell Hooks in spite of Ms. Hooks' expressed preference for bell hooks, does she have a legitimate grievance? (I'm not going to link to her web site. It looks ghastly. The color is something out of a nightmare. If you insist, you can find the link in Geoff's post.)

I submit that the answer is "no". People are entitled to choose the name that they go by, subject to a few constraints, but how that name is written is not, for the most part, up to its bearer. That's because names are part of a language, and the way a language is written is governed by socially accepted conventions. Just as a name must conform to the phonological system of the language, so the way it is written must conform to the orthographic conventions of the language. If I were to announce that I wanted everyone henceforth to spell my name <Bille>, though people might think me eccentric, I would expect them to comply, but if I were to announce that I wanted everyone to write my name <⊱⍼>, few people would feel any obligation to do so. The reason is that although <Bille> is not the usual way of writing my name, it falls within the conventions of written English. It uses only letters that are part of the English inventory, the correspondence between letters and sounds is canonical, and it is capitalized according to the rule that says that proper nouns are to be capitalized. On the other hand, <⊱⍼>, attractive as it may be, uses characters that are not part of the English inventory. In demanding that people write my name this way, I would be demanding that they extend the writing system of their language in a unique way just to write my name.

Capitalization is part of the social convention for writing English. Like the alphabet, it isn't something that the writing system makes available for manipulation by individual users. Declining to violate the norms of capitalization should be no more offensive to the bearer of the name than declining to write a person's name always at the beginning of the sentence, regardless of its grammatical role. That just isn't the way it is done in English.

Thus, Geoff should feel free to write about Zeiran as he wishes to, even putting Zeiran's name at the beginning of a sentence. Zeiran may not like it, but Geoff need feel no guilt about it. He has no ethical obligation to comply with Zeiran's wishes on this point.

By the way, the idea of writing a name at the beginning of the sentence regardless of its grammatical role isn't entirely a matter of whimsy. Egyptian royal names usually contained the names of gods. As a sign of respect, the god's name was usually written first, even if it actually appeared phonologically later in the name. The photograph below shows the name of Tutankhamen. As you can easily see, the first three characters (roughly the first row) spell the name of the god Amun. The second row reads /twtʕnx/, the first two syllables of his name as conventionally anglicized. (Egyptian did, of course, have vowels. It just didn't usually write them.) The last row is a title, not part of his name. (For those whose Egyptian is rusty, this text reads right-to-left.)

cartouche of Tutankhamen
Posted by Bill Poser at 01:14 AM

January 17, 2005

Capitalization and Mr Cummings

While poking around for evidence about another person who is thought to have spelled his name all in lower case, zeiran r'ei discovered that with regard to one very famous case, the poet E. E. Cummings, the widely believed proposition that he insisted on lower case only is merely a widespread myth. The interesting details can be found here.

Thank goodness. I had been wondering how to begin sentences about Cummings. The principles for printed prose set out in The Chicago Manual of Style (one style guide that is worth taking seriously) say that a sentence should never begin with a lower-case letter or a nonalphabetic character. I think that is a very good principle. But it means I can never mention zeiran at the beginning of a sentence, only after it has already begun.

E. E. Cummings, though, can (I now learn) stand as the subject of a main clause with no preceding adjunct, which makes him much easier to talk about. Not that I have anything much to say about him, except that he is responsible for a reprehensible poem that directly suggests that syntacticians are not sexy. Cummings tended not to title his poems, but this one is generally known by the title "since feeling is first":

[Copyright 1926, 1954, © 1991 by the Trustees for the E. E. Cummings Trust; copyright © 1985 by George James Firmage, from Complete Poems: 1904-1962 by E. E. Cummings, ed. by George James Firmage; used by permission of Liveright Publishing Corporation; this selection may not be reproduced, stored in a retrieval system, or transmitted in any form or by any means without the prior written consent of the publisher.]

since feeling is first
who pays any attention
to the syntax of things
will never wholly kiss you;

wholly to be a fool
while Spring is in the world

my blood approves,
and kisses are a better fate
than wisdom
lady i swear by all flowers.    Don't cry
—the best gesture of my brain is less than
your eyelids' flutter which says

we are for each other:then
laugh,leaning back in my arms
for life's not a paragraph

And death i think is no parenthesis

The gratuitous insult to grammarians everywhere is contemptible. We grammarians are in fact very sensual, sexy, and exciting people. When a grammarian kisses you, you stay kissed.

[By the way: if you want a real example of someone who insists on lower case, Lisa Davidson reminds me that the African American writer bell hooks is one.]

Posted by Geoffrey K. Pullum at 08:30 PM

Mattress mice

Seymour Hersh's New Yorker article entitled "The Coming Wars" (posted 1/17/2005) ends like this:

“Rumsfeld will no longer have to refer anything through the government’s intelligence wringer,” the former official went on. “... Rumsfeld no longer has to tell people what he’s doing so they can ask, ‘Why are you doing this?’ or ‘What are your priorities?’ Now he can keep all of the mattress mice out of it.” [emphasis added]

"Mattress mice" is a phrase I hadn't heard before, so I did a bit of poking around. Google shows 16 hits at the moment, of which only 5 are independent real instances of the phrase -- and one of them is in Russian!

(link) "Mattress mice." Cheney doesn't care about polls or media criticism, confidants say. In private, he makes fun of the anonymous sources who criticize him in the media as "mattress mice" and tells friends that it isn't his job to curry favor with anyone except Bush, even if his public reputation suffers.
(link) Ambassador Martin, who died in 1990, was a strange combination of Pollyanna and paranoid. He often seemed to regard the Washington bureaucracy rather than the Vietnamese communists as his main enemy. In a just-declassified and previously unpublished cable, he ranted that State Department foes were calumniating him in the U.S. press: "The sly, anonymous insertions of the perfumed ice pick into the kidneys in the form of the quotes from my colleagues in the Department are only a peculiar form of acupuncture indigenous to Foggy Bottom against which I was immunized long ago." If the "mattress mice" in Washington were pressing him to prepare an evacuation -- well, he knew the situation better: "I have been right so far, which is unforgivably infuriating to the bureaucracy." [from Time Magazine, Apr. 24, 1995]
(link) The people you are dealing with are "mattress mice"---people buried in an organization who have little power except when they can say "no" to you.
(link) For the Wechsler proposal to have a chance, the President would need to take on the mattress mice and announce a team even before securing the reforms. [From a letter to The National Interest, summer 2002]
(link) Аттрибутированы только позитивные. Вот уж и правда mattress mice.
(Emphasis added throughout...)

I didn't find the phrase "mattress mice" in the OED, the AHD, Encarta, Merriam-Webster online, or Merriam-Webster's Unabridged. However, the four English-language instances that Google finds are enough to make the meaning clear.

The earliest Google-indexed citation is from Time Magazine in 1995. LexisNexis finds a George Will op-ed from Sept. 18, 1986:

George Carver, a senior fellow at Georgetown University's Center for Strategic and International Studies, says that the United States has, in effect, agreed to play by Soviet rules regarding espionage and has placed a mantle of protection over Soviet spies: "The next time the FBI catches a Soviet, the mattress mice in the State Department and the White House will be out wringing their hands and saying, 'Oh, no, we can't arrest him! We don't want another Daniloff thing; there's too much going on.'"

There is also a transcript of testimony by Lawrence Eagleburger (then Deputy Secretary of State) at a hearing of the Foreign Operations Subcommittee of the Senate Appropriations Committee, March 20, 1990, in the course of which Mr. Eagleburger says:

Let me say now [that] I don't know where the reports are coming from that we would be happy with less than the $500 million for Panama and we would like $200 million or whatever it may be put into Eastern Europe. I -- if it is mattress mice in the State Department who are saying that, I emphasize they are mattress mice. The fact of the matter is that the President of the United States, the Secretary of State, Secretary of the Treasury and everybody else, including those of us who are here right now, believe the $500 million is necessary for Panama.

And Eagleburger uses the phrase again in an interview with Brian Williams on April 10, 2001:

WILLIAMS: And you know, of course, that's another argument that some people who long for the tense days of the Cold War, especially in the West Wing of this White House, are somehow looking for a fight.
EAGLEBURGER: Well, I hope that's not true. I don't see it as the case at this point, at least not the senior leadership. I am sure there are some people down a level or two who came in with the administration that are slightly above the level of mattress mice who probably do want a confrontation.

So mattress mice has apparently been a common inside-the-beltway idiom for at least 20 years, and probably longer. It's curious that despite its assonance, relevance and vividness, it hasn't made more headway in the wider world.


Posted by Mark Liberman at 04:10 PM

Dismortality and puppetutes

In the course of seconding the nomination of Dave Barry for a position on the NYT Op-Ed page, I cited his Ask Mister Language Person response to a lexicographical question. Further research suggests that his answer was wrong, though I am certainly not joining the ranks of those readers that Dave says "are sometimes critical of me because just about everything I write about is an irresponsible lie".

Here's the Q&A:

Q. In the song "The Joker," what is the mystery word that Steve Miller sings in the following verse:
"Some people call me the space cowboy
Some people call me the gangster of love
Some people call me Maurice
'cause I speak of the (SOMETHING) of love."
A. According to the Broward County Public Library, the word is "pompatus."
Q. What does "pompatus" mean?
A. Nothing. Steve made it up. That's why some people call him "the space cowboy."

A page on Cecil Adams' site "The Straight Dope", echoed by another one on Chris Harris' Steve Miller fan site, explains the true story. In the beginning was the Medallions' 1954 R&B hit The Letter, which included a line that Steve Miller apparently heard as

Let me whisper sweet words of epismetology and discuss the pompitous of love

Miller used his version of these neologisms in his 1972 song Enter Maurice, whose liner notes give the relevant sentence

My dearest darling,
come closer to Maurice so I can whisper sweet words of epismetology in your ear
and speak to you of the pompitous of love.

And in The Joker, the terms "space cowboy", "gangster of love" and Maurice all refer to earlier songs. No lyrics were published with the 1973 album on which The Joker appeared, but the word was spelled "pompatus" in the sheet music for Steve Miller's The Joker published in the songbook Rock Hits Through the Years, and that's the spelling adopted for the 1996 movie The Pompatus of Love.

But never mind whether it's "pompatus" or "pompitous". According to Vernon Green, who composed and performed The Letter, Miller mis-heard the original line [quoted from Harris' page]:

Vernon Green, the author of The Letter, says, "You have to remember, I was a very lonely guy at the time. I was only fourteen years old, I had just run away from home, and I walked with crutches." The uneducated but imaginative youth was prone to fantasy, so he just made up the lyrics. 'Pismotality' described words of such secrecy that they could only be spoken to the one you loved.

"And it's not pompitous," he emphasizes. "What I said was 'puppetuse', which is a term I coined to mean a secret paper doll fantasy figure."

According to Cecil Adams, "[t]he mystery words, [his assistant J.K. Fabian] ascertained after talking with Green, were 'puppetutes' and 'pizmotality.' (Green wasn't much for writing things down, so the spellings are approximate)."

According to this page, "Frank Zappa and a crew of linguists are still deciphering 'Sweet words of pismotology' and 'the pulpitudes of love' from the Medallions' 'The Letter'". Carrying the enterprise forward, let's put aside "pismotology" for the moment, and take up pompitous/pompatus/pulpitudes/puppetuse/puppetutes.

It's pretty clear, based on Green's paper-doll explanation, that the root morpheme must have been puppet. The evidence so far seems to be equivocal as to whether the second part was -us (or maybe -ous or -is?), by analogy to words like stimulus, or -ude by analogy to words like pulchritude, or -ute by analogy to prostitute. I haven't had a chance to listen carefully to a good-quality dub of the original recording, but I wonder whether Green might have said "puppetutes", just as J.K. Fabian reports, and meant it as a blend of puppets and prostitutes? As a 14-year-old in 1954 Los Angeles, he probably knew the word prostitute only as a fancy term for a sexually attractive and available woman.

There's an interesting discussion of these words in a 2002 article by Greil Marcus from Los Angeles Magazine, "In the secret country: Walter Mosley, doo-wop and '50s L.A". Marcus talks about Walter Mosley's character Easy Rawlins in 1956, when "his own story threatens to dissolve"

Anything can unmask him. And he admits it: "I nodded and bowed. My wife had left me, had taken my child, had gone off with my friend. There was no song on the radio too stupid for my heart."

That stupid song might have been the Jewels' "Hearts of Stone" or the Penguins' "Earth Angel," made in Los Angeles in 1954; Jesse Belvin's "I'm Only a Fool" or Don Julian and the Meadowlarks' "Heaven and Paradise," made in Los Angeles in 1955; Arthur Lee Maye and the Crowns' "Gloria," made in Los Angeles in 1956. But I like to think it was the Medallions' "The Letter," made in Los Angeles in 1954.

It is a profoundly stupid record--and also profoundly strange. There's no instrumentation except for a quietly rumbling piano. A few weak voices go "Oh--uh uh uh--oh" behind the lead singer, Vernon Green. He starts off crooning around one word: "Darling." As the backing singers shift into long "ooos," Green stops singing and starts talking. He speaks in a clipped, almost effete voice, not like a man but like a boy trapped in a fantasy he can't even begin to believe. "Darling--I'm writing this letter--knowing that you may never read it." The listener doesn't believe the person the singer is writing to exists.
As the singer goes on, his voice crumbles with puerile emotion. He seems confused, barely able to remember what he's talking about. He sounds like he's underwater, but he's in love with his own voice. "Darling--what is there words--on this earth--to be unable--to stop loving you," he says, swirling. "Oh! my darling!" Then Vernon Green--16, crippled by polio, who would wander the streets of Watts on his crutches, making up songs, trying to find people to sing with him--offered the words that would make him immortal, or at the least unsolvable. "And to kiss, and love--and then have to wait ... Oh! my darling. Let me whisper, sweet words of dismortality--and discuss the pompatus of love. Put it together, and what do you have? Matrimony!"

Dismortality. Pompatus. Matrimony! He sounds like a complete idiot. At the same time he sounds like someone who knows something you never will.

After describing the cultural history of mid-50s doo-wop, especially the Los Angeles version, Marcus takes up Vernon Green's neologisms again:

WHAT WAS VERNON GREEN REALLY SAYING IN "The Letter"? Nearly 20 years later, in 1972, Steve Miller took the phrase pompatus of love and put it in his "Enter Maurice"; the next year he highlighted the weird phrase in his number one hit "The Joker." Twenty-three years after that, screenwriters Richard Schenkman, Jon Cryer, and Adam Oliensis took the phrase for the title of a movie, and it meant ... Cryer decided he had to find out.

It was more than 40 years since the Medallions first stepped up to a microphone, but in Los Angeles there were still people willing to pay Vernon Green, now in a wheelchair, to sing "The Letter," and he wasn't hard to find. Dismortality--it meant, Green told Cryer, "words of such secrecy they could only be spoken to the one you loved." Pompatus, Green said, was a 15-year-old's word for "a secret paper-doll fantasy figure who would be my everything and bear my children." Dismortality can't be factored, but it signifies effortlessly. It communicates a will to escape the limits of ordinary life, to cheat death. Pompatus is a real word--in the dictionary, if only in a faint line in the Oxford English Dictionary. It's hard to imagine that Vernon Green ever learned it, but not that he found it, found it contained in the ordinary words anyone might speak. Pompatus means to act with pomp and splendor--exactly what, in "The Letter," a teenage Vernon Green tried to do. It was a specter he chased for the rest of his life until, following a show on March 4, 2000, he suffered a stroke, dying nine months later, on Christmas Eve.

Well, Marcus seems to miss the "puppet" morpheme altogether, in favor of a hypothesized connection to pomp and circumstance. It's a plausible idea, but it seems to be contradicted by Green's own testimony.

"Dismortality" makes a lot of sense as a coinage for "words of such secrecy they could only be spoken to the one you loved", and the missing [r] is easy to explain given the intermittent r-lessness of African-American speech. Still, I wonder why everyone else who has listened to the song and spoken with Green seems to have heard a [p] rather than a [d]. Maybe there was forward place assimilation from the [f] in the phrase "of dis..."? Maybe Green himself blended dis- and mis- to get pis-? Or maybe Marcus is on the wrong track, and the word was something like a blend of abysmal and totality, maybe with a bit of episcopal and even epistemology thrown in? Frank Zappa may be dead and gone, but the linguists are still on the case.


Posted by Mark Liberman at 01:44 PM

Parenthetical puns?

The Viennese architectural firm that just won the design competition for the new European Central Bank building in Frankfurt is called "Coop Himmelb(l)au". Their website suggests that the letter 'L' in the name should be subscripted (or at least lowered) as well as parenthesized: COOP HIMMELB(L)AU. Either they are not really serious about this part, or the newspapers are willing to accept parentheses in a name, but not subscripts.

HIMMELB(L)AU is a sort of pun: Himmelblau means "sky blue" in German, while Himmelbau would mean something like "sky building". There's a religious undertone missing in the English translations, since Himmel can mean "heaven" as well as "sky". There is probably also an echo of Hochbau, which can mean "building construction" or "structural engineering", with hoch meaning "high" or "tall" or "up". This substitution of Himmel for Hoch might remind others, as it does me, of Martin Luther's famous hymn Vom Himmel Hoch. And there are probably other elements in the name that would be apparent to someone who knows German well (as I do not).

Perhaps echoing the "sky building" pun, the winning entry includes a spread-out base called a "groundscraper", along with two twisted towers connected by an atrium which "serves as the communication hub with interconnecting platforms and communal areas. This satisfies two important elements of the competition brief, namely that the new premises should 'foster interactive communication' and 'promote teamwork'". Except, perhaps, for those suffering from acrophobia, among whom it seems likely to promote the screaming meemies.

Anyhow, word games are a lot easier if you get to parenthesize letters. I can't think of any other companies that have taken advantage of this possibility in constructing their names, though.

[Update: Tom Ace writes

The underwear brand 2(x)ist has a letter parenthesized (and superscripted to boot). It's not quite the pun that Himmelb(l)au is, but it does have a double (nay, exponential) meaning.

Yes. Perhaps it's only because I know English and mathematics better than I know German, but 2(x)ist seems to me to imply an even more elaborate set of puns than HimmelB(l)au does. However, I guess you could argue that bringing in digits, mathematical parentheses, exponentiation and clothing sizes takes us out of the realm of purely linguistic puns where HimmelB(l)au operates. And in particular, the parentheses in 2(x)ist don't seem to mean "optionally leave this out", which was the trick I had in mind. Or is the equivalence "2ist" = "twist" supposed to be part of the package? ]

[Update 1/18/2005: Thomas Paul sent in a note from a German-speaking friend:

Himmelbau sounds for me more like "castle in the sky" and it doesn't have a religious undertone in this context, at least not for me. Or even more like "dream building". Building you ever dreamt of. Undertones in a language are a complex topic. "Himmel" can mean heaven and sky, but it has also a strong meaning of "great, fantastic", like himmelhochjauchzend (very happy). Might be that my grandma who is much more roman-catolic than I am sees an religious undertone. Am going to ask her, when i see her next time. ]



Posted by Mark Liberman at 09:13 AM

January 16, 2005

Was it Frazier or the copy editor?

Roger Shuy, the distinguished sociolinguist, writes me to say that he knew Ian Frazier slightly at one time, and doubts that he would use a nominative-case pronoun as complement of than, as in the passage I noted in a recent post:

Your recent piece on Ian Frazier's usage causes me to try to defend him, albeit a bit weakly. Ian lived here in Missoula for a few years while he did his research on his book, The Rez — about Indian reservation life (perhaps one of the few natural resources still remaining in this state). His kids went to the same elementary school that my daughter attended and I saw him many times. He even came to her class one time and gave a talk about Russia, which he appeared to know something about. But I digress. What I really wanted to tell you is that he is a very common man. He wears jeans and a baseball cap (even indoors) on all occasions (at least in Montana--who knows what he wears now that he's moved to New York City). He speaks in a natural conversational style — no high falutin' words. My guess is that it was his New Yorker editor who changed his them to they. Ian wouldn't be that uppity.

This is interesting, because it suggests another case where a copy editor should have been jailed or at the very least rebuked for time-wasting garbage and told to put things back the way they were. Was it Ian Frazier who chose the case on that pronoun? Or was it some meddling copy editor in the offices of The New Yorker? I would really love to know. Can anyone put Ian Frazier in touch with me so I can ask him? It is a matter of some interest. Remember, internal evidence (the informal you in the same sentence, and the homely nature of the first-person coming-of-age memoir) argues that the nominative pronoun was totally out of keeping with the style of the surrounding piece.

It would be very interesting to discover whether a copy editor changed the case on that pronoun. You see, copy editors often miss the things they really should be catching to earn their pay — wrong page references, name inconsistencies, misspellings, punctuation slips, that sort of thing. There they could be useful, but they often let us authors down. They're too busy messing with our grammar.

Take a look at the new issue of The New Yorker (January 17, 2005), on page 62, second column, bottom sentence:

It is also odd that other produce bags at the store had a red line across the top but the one with Soto-Fong's prints, did not.

See the error? That's a comma between subject and predicate. There's absolutely no excuse for it. It is a straightforward flat-out error in modern written English. The copy editor should have caught it. They miss things like this, and spend their time changing the syntax of perfectly serviceable Standard English into some fancy-schmancy puristic alternative version against the author's better judgment. I'd love to know whether that's what happened with Frazier's piece. So please put him in touch with me, somebody. You email to pullum; the site is ucsc (the University of California, Santa Cruz), and the domain is .edu. Tell Frazier there is no one I would rather find had emailed me than he.

Posted by Geoffrey K. Pullum at 07:39 PM

January 15, 2005

Don't put up with usage abuse

A correspondent in Arizona (he writes his name as zeiran r'ei, the lower case and apostrophe being apparently mandatory) emails me to say he had not heard about Strunk and White's The Elements of Style until I mentioned it. (I feel awful, of course: his life had been free of that horrid little notebook of nonsense, and now I have drawn it to his attention by ranting about it. I should watch my big mouth.) However, on checking the Amazon.com reviews zeiran found that my harsh views of the book are very much a minority opinion. So he asked me:

Is there an objective final authority here, as far as disputes of grammar or style?

Regardless of authority, how should such disputes be best resolved?

Very good questions. A full reply would be a book about the whole notion of grammatical correctness. (One such book, very enjoyable and easy to read, is Proper English by Ronald Wardhaugh, published by Basil Blackwell in 1999; ISBN: 0631212698; $28.95 in paperback; yesterday for some reason I mistakenly gave this title as "Proper Grammar" despite actually having the book in front of me when I wrote, but I have now transcribed it correctly.) But I can offer a short answer along the lines of my reply to Zeiran.

The first thing to say is that the only possible way to settle a question of grammar or style is to look at relevant evidence. I suppose there really are people who believe the rules of grammar come down from some authority on high, an authority that has no connection with the people who speak and write English; but those people have got to be deranged. How could there possibly be a rule of grammar that had nothing to do with the way the language is used or has been used by the sort of people who are most admired for their skill with it? What motive could there possibly be for following some rule if it had no connection to the actual practice of the sort of people you would like to be counted among, or regarded as similar to, with regard to the use of the language? Face it: a rule of English grammar that doesn't have a basis in the way expert writers deploy the English language (or the way expert speakers speak it when at their best) is a rule that has no basis at all.

The reason the question can even arise at all is partly that Strunk and White fail to make that connection. The Elements of Style offers prejudiced pronouncements on a rather small number of topics, frequently unsupported, and unsupportable, by evidence. It simply isn't true that the constructions they instruct you not to use are not used by good writers. Take just one illustrative example, the advice not to use which to begin a restrictive relative clause (the kind without the commas, as in anything else which you might want). But the truth is that once E.B. White stopped pontificating and went back to writing his (excellent) books, he couldn't even follow this advice himself (nor should he; it's stupid advice). You can find the beginning of his book Stuart Little on the official E.B. White website; and you can see him breaking his own rule in the second paragraph. That isn't the only such example. (For another one out of the dozens I could give, see my post ‘Those who take the adjectives from the table’.)

Where, then, can one get evidence of what decent writers really do, as opposed to what Strunk and White wrongly imagine decent writers do, given that they simply lie about it? The unhelpful answer would be that you read millions of words of fine prose and remember what you've seen. But there is a shortcut you can use to get to that evidence: get hold of a really good usage book. And the best usage book I know of right now is Merriam-Webster's Concise Dictionary of English Usage (ISBN: 0-87779-633-5). This book — I'll call it MWCDEU for short — is utterly wonderful. Detailed, but tight-packed, and great value (exactly 800 pages for $16.95 — roughly 2 cents per page plus the cost of a small regular coffee).

I own no stock in the Merriam-Webster company and get no commissions on sales. If they published a rubbishy book, I'd tell you. And if The Cambridge Grammar of the English Language were better for this purpose, I'd definitely say so; but it isn't — not if you want usage advice as opposed to systematic and detailed grammatical description. The Cambridge Grammar is big and somewhat technical, and doesn't cite literary examples, and it doesn't give advice. The book you need is MWCDEU. Throw your Strunk & White away, and hang the pages on a nail in the guest outhouse for emergency use. Or tear out the pages and use them as liner paper for the bottom of the parrot cage, if you have a parrot (change the paper at least weekly, and wash your hands afterwards). Then get hold of MWCDEU, and keep it away from the parrot (parrots are jealous birds and will tear up things they can see you value).

MWCDEU explains what actually occurs, shows you some of the evidence, tells you what some other usage books say, and then leaves you to make your own reasoned decision. It won't tell you either that you should split infinitives, or that you shouldn't. But it will give you a number of examples of writers who do, and point out that the construction has always occurred in English literature over the last six or seven centuries, and that nearly all careful usage books today agree it is entirely grammatical, and it will then leave you to decide.

In other words it treats you like a grown-up. Strunk and White treat you like the abused 9-year-old daughter of a pair of grumpy dads ("Omit needless words, damn you! And fetch my slippers. And bring his slippers too. Now fix our supper. And don't let us hear you beginning any sentences with however"). Don't put up with the abuse.

Thanks to Barbara Scholz for pointing out some errors in the first version.

Posted by Geoffrey K. Pullum at 01:25 PM

Subtle Distinctions

Claire Bowern at Anggarrgoon mentions an example of a grammatical contrast marked in a very subtle way. In Bardi, the transitivity of a verb is marked by the contrast between [n] and [ŋ] before [k]. A possibly even more extreme example occurs in Carrier:

  goh ʔʌji  "A rabbit is eating."
  goh ʌji  "Something is eating a rabbit."

These sentences differ only in the presence or absence of a glottal stop at the beginning of the verb. Speakers of English find glottal stop very difficult to hear because for them it is automatic. In English, words that would otherwise begin with a vowel have a glottal stop inserted.

This glottal stop is the unspecified object marker. In the first sentence, it fills the object position, leaving the Noun Phrase goh "rabbit" to be interpreted as the subject. In the second sentence, with no glottal stop, the verb has no object marker. It must, therefore, have an overt Noun Phrase as its object. In this case, goh must be interpreted as the object.

You may have noticed that Carrier has [h] at the end of syllables. That isn't just a spelling convention; those are real [h]s. In the Southern dialects of Carrier the word for "okay" combines all of these difficulties. It is: [aʔah]. There's no glottal stop at the beginning, but there is an [h] at the end. To speak Carrier you have to get in touch with your glottis.

Posted by Bill Poser at 12:17 AM

January 14, 2005

The water tower was higher than they

Ian Frazier's personal history in the latest issue of The New Yorker (print edition, January 10, 2005) describes growing up in the town of Hudson, Ohio. This passage (p. 40) struck me as linguistically astonishing:

The town's water tower, built in the early nineteen-hundreds, was its civic reference point, as its several white church steeples were its spiritual ones. The water tower was higher than they, and whenever you were walking in the fields — the town was surrounded by fields — you could scan the horizon for the water tower just above the tree line and know where you were.

Higher than they? Yes, I know, in the most formal styles of Standard English the old lie about predicative NP complements of than being required to be in the nominative case is still honored. But somehow it seems even more ridiculous to confer this morphosyntactic honor on a pair of steeples — usually the sort of stuffy grammar books that require nominative complements of than illustrate with human NPs (He is taller than I). I could hardly believe Ian Frazier was serious. I stared at it for quite a while. Is he extraordinarily old? No; he was born in 1951, so he's only about 53. That's only about half as old as you'd need to be to believe that the nominative was obligatory after than. Could a copy editor have required that nominative case? I'm not sure. But to me, "higher than they" sounds more than just formal; it sounds way too strange to write. Especially when immediately followed by informal features like indefinite you ("whenever you were walking in the fields"). The Cambridge Grammar (p. 460) is a bit delphic about the matter (and happens to use only human-denoting NPs in the examples given), but stresses that the accusative is always clearly grammatical after than, even if the nominative is also permitted in some (not all) contexts. But old prescriptivists' myths about grammar die hard in the heart of America.

Posted by Geoffrey K. Pullum at 11:50 AM

Dave Barry, linguist

Dave Barry is retiring from journalism in order to devote himself full time to linguistics.

Well, that's not quite right. He's been more of a humor columnist than a journalist. And he's being coy about his future plans.

But Bryan Curtis sent up a balloon at Slate about Dave Barry taking over for William Safire at the New York Times. Curtis pretends that this is about socio-political commentary, with Dave in line to become the Times' resident libertarian. But surely what's really on the table is replacing Bill's On Language column with Dave's Ask Mister Language Person.

In academic settings where language is discussed, the Mister Language Person columns are already cited more frequently than any other modern source, with the possible exception of The Simpsons. At the University of Otago in New Zealand, for example, there is a course on "Writing for Psychology" whose entire section on "Mechanics and style of writing" consists of links to eleven Ask Mister Language Person columns, five additional Dave Barry selections on punctuation, and a poem (apparently not by Dave Barry) about the effect of psychotropic drugs on apostrophes.

And as Dave himself put it:

Mister Language Person is the only authority who has been formally recognized by the American Association of English Teachers On Medication. ("Hey!" were their exact words. "It's YOU!")

Let me say right up front that I'm a fan of William Safire's writings on language, so I would hate to see him replaced. But I can think of a long list of other NYT regulars that could become less regular without any protest from me. If the NYT can't pull the trigger, maybe the New Yorker should hire Dave Barry to play James Thurber to Louis Menand's E.B. White. (Seriously, Dave Barry is not a New York (whether Times or -er) person. But consider that Mark Twain spent much of his working life in Hartford, CT.)

I'm going to close with selected quotations from old Mister Language Person columns. But first I want to persuade you not to accept Dave's claim that his main themes are booger jokes and exploding livestock, and not to substitute Bryan Curtis' evaluation that Dave is really a political commentator. In fact, even outside of the Mister Language Person series, his central issues are linguistic ones. Consider his 11/30/2003 column on herring communication, featured in an earlier Language Log entry. He begins with a basic linguistic question, adding an etymological note in passing:

A question that we have all asked ourselves hundreds of times is: How do herring communicate?

I'm pleased to report that we may, at last, be getting closer to an answer, thanks to an important recent discovery by fish scientists. This discovery involves a bodily function that some readers may find distasteful to read about (even though I bet they do it) so before I tell you what it is, here is a:

WARNING TO PEOPLE WHO ARE OFFENDED BY THE PHRASE ''BREAK WIND'' -- The following paragraphs contain the phrase ''break wind.'' So if you don't want to see the phrase ''break wind,'' go read a classier part of the newspaper, such as the bridge column. Although if you think bridge players don't break wind, you are clearly not aware of the origin of the word ''trump.''

OK, now that we've gotten rid of Attorney General Ashcroft, let's get to the amazing recent discovery that has fish scientists in such an uproar, which can be summarized in three words:

Herring break wind.

After some discussion of the scientific background, he asks

The critical question now facing the scientific community is: WHY do herring break wind? Scientists quoted in the article speculate that the herring might be using these sounds -- which they make mainly at night -- to communicate with each other.

This raises another question: What, exactly, would a herring need to communicate? I mean, we're talking about creatures with roughly the same IQ as a Tic-Tac. They are not down there discussing Marcel Proust. My guess is they're probably breaking wind to convey extremely simple messages such as: ''Hey, it's dark!'' ''I know! The same thing happened last night!'' ''Who said that?'' ''Me!'' ''Who are you?'' '' A herring!'' ''Wow, that's amazing! I'm also a herring!'' ''Wow! I'm also a Yankees fan!'' ''Wow, that's amazing! I'm a Yankees. . .'' etc.

and in keeping with recent trends in the MLA, he problematizes the gendered status of these communications:

I asked [Dr. Ben Wilson] if, by any chance, the wind-breaking herring happened to be males. Because if they were, that might explain it: It is a well-known scientific fact that human males deliberately break wind purely for the sense of accomplishment it gives them.

But Dr. Wilson said he was unaware of any correlation between the sex of the herring and the FRT noise. He also noted that it's difficult to tell male and female herring apart. Maybe that's what they're communicating about: ''Hey, you want to mate?'' ''Sure! My name is Bob!'' ''Hey, my name is Bob, too!'' ''UH-oh!'' etc.

OK, enough background. Here's a Mister Language Person sampler:

Q. Please explain how to diagram a sentence.
A. First spread the sentence out on a clean, flat surface, such as an ironing board. Then, using a sharp pencil or X-Acto knife, locate the "predicate," which indicates where the action has taken place and is usually located directly behind the gills. For example, in the sentence: "LaMont never would of bit a forest ranger," the action probably took place in a forest. Thus your diagram would be shaped like a little tree with branches sticking out of it to indicate the locations of the various particles of speech, such as your gerunds, proverbs, adjutants, etc.

Q. Please explain the correct usage of the word "neither."
A. Grammatically, "neither" is used to begin sentences with compound subjects that are closely related and wear at least a size 24, as in: "Neither Esther nor Bernice have passed up many Ding Dongs, if you catch my drift." It may also be used at the end of a carnivorous injunction, as in: "And don't touch them weasels, neither."

Q. When should I say "phenomena," and when should I say "phenomenon?"
A. "Phenomena" is what grammarians refer to as a "subcutaneous invective," which is a word used to describe skin disorders, as in "Bob has a weird phenomena on his neck shaped like Ted Koppel." Whereas "phenomenon" is used to describe a backup singer in the 1957 musical group "Duane Furlong and the Phenomenons."

Q. Please tell me which is correct: ``Bud, you should never of fed them taffies to the dog,'' or ``Bud, you never should of fed them taffies to the dog.''
A. According to Strunk & White, it depends on the context.
Q. The context was a brand-new Barcalounger.
A. Whoa.

Q. What is the purpose of the semicolon?
A. It can be used to either (1) separate two independent clauses, or (2) indicate an insect attack.
(1) ``Well, I'm a clause that certainly doesn't need any help!''; ``Me either!''
(2) ``Be careful not to bump into that ;;;;;;;;;;;;;;;;;;;;;;;;;;; AIEEEEEEE!''

Q. In the song ``The Joker,'' what is the mystery word that Steve Miller sings in the following verse:
"Some people call me the space cowboy
Some people call me the gangster of love
Some people call me Maurice
'cause I speak of the (SOMETHING) of love.''
A. According to the Broward County Public Library, the word is "pompatus.''
Q. What does "pompatus'' mean?
A. Nothing. Steve made it up. That's why some people call him ``the space cowboy.''
Q. How come we say "tuna fish''? I mean, tuna IS a kind of fish, right? We don't say "tomato vegetable'' or "milk dairy product'' or "beef meat,'' do we? And how come we call it "beef''? How come we don't say, "I'll have a piece of cow, rare''? And how come we say "rare''? And how come the waiter always says, "DID you want some dessert,'' instead of, "DO you want some dessert?'' Does he mean, "DID you want some dessert, before you found those hairs in your lasagna?'' And how come everybody says "sher-BERT,'' when the word is "sher- BET''? And how come broadcast news reporters end their reports by saying, "This is Edward M. Stuntgoat, reporting.'' What ELSE would we think he's doing? Hemorrhaging? And how come some people call Steve Miller "Maurice''?
A. Those particular people call EVERYBODY "Maurice.''

Q. Please explain the correct usage of the phrase ``all things being equal.''
A. It is used to make sentences longer.
WRONG: ``Earl and myself prefer the Cheez Whiz.''
RIGHT: ``All things being equal, Earl and myself prefer the Cheez Whiz.''

Q. Is there any difference between ``happen'' and ``transpire''?
A. Grammatically, ``happen'' is a collaborating inductive that should be used in predatory conjunctions such as: ``Me and Norm here would like to buy you two happening mommas a drink.'' Whereas ``transpire'' is a suppository verb that should always be used to indicate that an event of some kind has transpired.
WRONG: ``Lester got one of them electric worm stunners.''
RIGHT: ``What transpired was, Lester got one of them electric worm stunners.''

Q. Please explain the expression: ``This does not bode well.''
A. It means that something is not boding the way it should. It could be boding better.

Q. Like most people, I would like to use the words ''parameters'' and ''behoove'' in the same sentence, but I am not sure how.
A. According to the Oxford English Cambridge Dictionary Of Big Words, the proper usage is: ''Darlene, it frankly does not behoove a woman of your parameters to wear them stretch pants.''

WRITING TIP FOR PROFESSIONALS: To make your writing more appealing to the reader, avoid ``writing negatively.'' Use positive expressions instead.
WRONG: ``Do not use this appliance in the bathtub.''
RIGHT: ``Go ahead and use this appliance in the bathtub.''

TODAY'S BUSINESS WRITING TIP: In writing proposals to prospective clients, be sure to clearly state the benefits they will receive:
WRONG: "I sincerely believe that it is to your advantage to accept this proposal."
RIGHT: "I have photographs of you naked with a squirrel."




Posted by Mark Liberman at 08:52 AM

January 13, 2005

Phonetic gaydar again

In response to an earlier Language Log post, Bert Vaux sent some information about work by a former student of his: Michael Schuler, "That's the gayest thing I've ever heard! The phonetic and phonological content of the perceptions of gay- and straight-sounding speech in English." MA thesis, Harvard University, 2003. I haven't seen a copy of Schuler's thesis, so I've based this post on a summary that Bert sent me.

Twenty male speakers of various sexual orientations made four recordings each: reading a list of words, reading from an economics textbook, acting out dialogue from a Fierstein play, and talking without a script about their freshman dorm. The speakers reported their sexual orientation, perhaps on the Kinsey scale.

Thirty listeners were asked to rate (selected parts of?) these passages on five seven-point scales: plain/dramatic, feminine/masculine, straight/gay, reserved/emotional, affected/ordinary. The speakers were also asked to rate their own voices on the same scales.

For all the forms of speech except for the word lists, the listener ratings on the straight/gay dimension correlated in a statistically significant way with speaker self-reports of sexual orientation. The correlations were better for the acted passages than for the textbook reading, and better still for the free narrative. I assume (though I don't know) that the free narrative portions were chosen so that the content did not provide information about sexual orientation.

Correlation between listener straight/gay rating and speaker self-identification of sexual orientation

  Word list          r=0.265 (p=0.304)
  Textbook reading   r=0.486 (p=0.048)
  Acted dialogue     r=0.545 (p=0.024)
  Free narrative     r=0.690 (p=0.002)
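
For readers who want to see where figures like these come from, here is a minimal Python sketch of how a Pearson correlation r and its two-tailed p-value can be computed. It is not Schuler's analysis. The function names (pearson_r, t_pdf, two_tailed_p) are made up for illustration, and the sample size n has to be supplied by the caller, since the summary above does not say how many speakers entered each correlation. The p-value is obtained by numerically integrating the Student t density rather than by calling a statistics package.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two lists of the same length, at least 3 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(r, n, steps=2000):
    """Two-tailed p-value for a correlation r observed on n paired values.

    Uses t = |r| * sqrt(df / (1 - r^2)) with df = n - 2, then integrates
    the t density from 0 to t with Simpson's rule (steps must be even).
    """
    df = n - 2
    if df < 1:
        raise ValueError("need at least 3 paired values")
    if abs(r) >= 1:
        return 0.0
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and t
    return max(0.0, 1 - 2 * area)
```

Given per-speaker listener ratings and self-reports as two lists, pearson_r yields r, and two_tailed_p(r, n) yields the matching p-value. For a fixed n, a larger r gives a smaller p, which is the pattern in the four rows above.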

Schuler found that for speakers who were perceived as sounding "gay", listener ratings moved towards the "gay" end of the scale as the material moved from wordlists to textbook passages to play-acting to free narrative, while for speakers who were perceived as sounding "straight", listener ratings moved towards the "straight" end of the scale over the corresponding sequence of passages. He interprets this as suggesting that "both gay and straight speech are deviations from a mid point, rather than straight being the baseline".

Schuler also found that speakers' evaluations of their own voices were "inaccurate", meaning (I think) that they did not correlate well with listeners' evaluations.

Finally, (Bert reports that) Schuler found that most of the perceptual effects in his study -- at least in the free narrative -- appeared to be due to pitch range, as opposed to vowel quality, consonant place, or other characteristics that have been cited in other studies. However, I don't know any of the details about how he determined this -- I'll say more about this when I've seen a full copy. Meanwhile, here's his abstract:

The hypothesis underlying this research is that listeners can judge speakers based on phonological and phonetic features of their voice. Building on the work of Gaudio 1994, Crist 1997, and Rogers and Smyth (in press), data was collected to allow analysis of many features and many possible listener judgments; the results given in this paper are largely restricted to pitch features and listener judgments of sexual orientation. My findings verify Gaudio's claim that listeners largely can identify the sexual orientation of the speakers based solely on voice recordings, and that pitch properties in speakers' reading voices do not give strong indications to listeners whereby they can make accurate judgments of sexual orientation. The most important new finding is that gay-sounding speakers sound gayer when speaking freely than when reading and straight-sounding speakers sound straighter when speaking freely than when reading. This is important because it suggests that features that allow listeners to judge sexual orientation are more pronounced when speaking freely than when reading, and that the features are not necessarily absent when reading, just less extreme. This explains why the results of this study show many measures where pitch properties correlate with listener judgments of free response recordings while judgments of read passages show reduced or no correlation. Through this lens earlier literature can be reevaluated to the extent that small correlations based on read passages could reflect larger correlations that would exist in freer speech registers, if those registers had been tested. This presentation of the material uses extensive statistical analysis to draw its conclusions.

Bert sent audio samples of representative gay-sounding and straight-sounding speech from this study, but I don't know whether the speakers gave permission for their recordings to be published, so I'll try to find out before posting them.

The most important lesson that I'd like to draw from this study has nothing to do with perception of gender stereotypes in speech.

Experimental phonetics has become easy!

I don't mean to take anything away from Mike Schuler, who obviously put a lot of work into his project. But just a few years ago, he would have had to splice reel-to-reel audio tapes to make his test materials -- several times, in order to get randomized presentation orders. He would have had to spend many hours with expensive and cranky machinery in order to make phonetic measurements to correlate with listener judgments. He would have had to spend more long hours with a calculator in order to do the statistical analysis.

Today, everything can be done with an ordinary laptop and some free software.

[Update 1/21/2005: I've temporarily linked to a copy of (the body of) Michael Schuler's MA thesis, kindly provided via email by Michael himself, who is traveling in China. He wrote that he will put the thesis on the web himself, along with supporting materials including his recordings, when he gets back from his trip. ]


Posted by Mark Liberman at 02:22 PM

January 12, 2005


What do Garth Brooks' lines

I've got friends in low places,
where the whiskey drowns and the beer chases
my blues away.

and the FAA-mandated warning

Please move from the exit rows if you are unwilling or unable to perform the necessary actions without injury.

have in common? Read Neal Whitman on FLoP coordination (and here, since Neal's old links don't work any more...) to find out.

[By a slip of the brain, the first version of this post attributed the FLoP discussion to Neal's brother Glen, also a blogger but in a different discipline. Sorry! ]

Posted by Mark Liberman at 10:02 PM

January 11, 2005

Overnegation alert

From a comment on Roger Simon's discussion of James Wolcott's attack on pro-war bloggers:

That Upper West Side *ex cathedra* tone of his indicates an utter lack of disinterest in anything resembling a debate or requiring any thought that cannot be contained in two paragraphs of pop cult snark. [emphasis added]

Some overnegations like fail to miss have become very common, but (at the moment) "utter lack of disinterest" has only one Google hit. The string "lack of disinterest" is commoner, though many of the examples seem to be correct uses meaning "lack of objectivity" or "lack of freedom from self-interest". Not all, though:

Each of the school board members knew what school I was writing about and not one of the nine ever visited the school to see for him or herself what was going on. In my opinion, that shows a great lack of disinterest on what is really going on in the schools.

I was once married to an illiterate moron who used phrases like "a complete lack of disinterest".

Even written true to character in LID (which she isn't on GL) I'm not as thrilled with her any more as I used to be. I don't know if it's just a general lack of disinterest in the character because there are other characters in the story that are more interesting to me.... FAR more interesting, or if the psycho Harley fans out there that have totally turned me off the character and BE as an actress in their continuous over-praising of her at the cost of other characters and actors (Lucy, Beth... Hayley Sparks and Amy Carlson) but I don't like her anymore.

i haven't updated 4 ages, due to a sudden lack of disinterest ...


Posted by Mark Liberman at 11:59 PM

The syntax and morphology of ex-parrots

If, like me, you're a fan of Monty Python's Pet Shop Sketch, you should enjoy Nominal Tense in Cross-Linguistic Perspective, by Rachel Nordlinger and Louisa Sadler, which appeared in the December 2004 issue of Language.

Posted by Mark Liberman at 05:54 PM

January 10, 2005


In reference to the rat speech perception story, David Beaver asks several good questions, including:

Are there types of pattern recognizer such that those recognizers can differentiate between certain classes of pattern they are presented with in one order, but not differentiate between those classes of pattern when presented in the reverse order?

This question has a particularly easy answer, which is obvious when you think about it. If we're talking about acoustic patterns in the natural world, then (most) such patterns share local properties that are very different from those of their time-reversed equivalents. As a result of these differences, any system that is used to encoding local acoustic properties of naturally-occurring complex sounds is likely to have trouble with time-reversed sounds. Specifically, many sound onsets have abruptly rising amplitude profiles -- bangs, pops, etc. -- while sound offsets mostly have more gradually falling amplitude profiles. This follows from the response of any resonant system (a room, a struck object, a vocal cavity) to an impulse-like excitation.
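The onset/offset asymmetry is easy to check numerically. Here is a minimal sketch, assuming a toy "struck resonator" (a sinusoid with exponential decay) rather than any real recording; the sample rate, frequency and decay constant are arbitrary illustrative choices.

```python
import math


def damped_tone(n_samples=2000, sample_rate=8000.0, freq_hz=440.0, decay_per_s=30.0):
    """Toy impulse response of a resonator: a sinusoid with exponential decay.

    All parameter values are illustrative assumptions, not measurements.
    """
    return [
        math.exp(-decay_per_s * n / sample_rate)
        * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]


def peak_position(signal):
    """Position of the largest absolute amplitude, as a fraction of the length."""
    peak_index = max(range(len(signal)), key=lambda i: abs(signal[i]))
    return peak_index / (len(signal) - 1)


forward = damped_tone()
backward = forward[::-1]  # the time-reversed version of the same signal

# Forward: the peak comes almost immediately (abrupt onset, gradual decay).
# Backward: the amplitude creeps up and the peak comes at the very end.
print(peak_position(forward), peak_position(backward))
```

In the forward signal the peak sits in the first fraction of a percent of the samples; in the reversed signal it sits in the last fraction of a percent. Any system tuned to "sharp attack, slow decay" statistics would see these as very different local patterns.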

As a result, most natural sounds have very different local amplitude-contour statistics, overall or across frequency bands, from their time-reversed counterparts. You can see that in the following time waveform of the start of a bugle call:

and you can easily hear the difference (in the individual notes) between the original and time-reversed versions. Here's a short drum passage and a time-reversed version of the same file, making the same point even more strikingly. Imagine trying to learn to recognize a particular rhythmic pattern in each case...

Rats, like humans, have a lot of experience with natural sounds. Lab rats may have a fair amount of experience with human voices, but I don't think this is necessary. Any critter sensitive to the statistical properties of the signals that impinge on it -- and that means pretty much any critter at all -- will experience normal and time-reversed natural sounds in very different ways.

The fact that the signals used in (the forward vs. backward part of) this experiment were synthetic doesn't matter, since synthetic speech shares the relevant properties with natural speech.

It's easy to imagine that rats find it harder to learn to distinguish patterns in locally-unnatural acoustic stimuli. And the differences in the experiment were pretty small ones. Here are the actual results from the paper:

The definition of discrimination ratio is

The discrimination ratio was calculated by dividing the mean frequency of lever pressing in the first minute (A) of the 2-min interval after each sentence by the mean responses in A plus mean responses in the second minute of this interval (C). This operation gives values between 1 and 0. Values tending to 1 indicate a higher mean response in A than in C; values tending to 0 indicate a higher mean response in C than in A.

So in the forward condition, the rats trained on Dutch had a discrimination ratio of 0.491 for Dutch test sentences, and 0.407 for Japanese test sentences. This was a statistically significant difference, but it's not exactly an impressive level of overall performance. In the backwards condition, the rats trained on Dutch had a discrimination ratio of 0.401 for Dutch test sentences, and 0.420 for Japanese test sentences.

In other words, in all cases the rats are responding very nearly randomly, but in the forwards condition they responded just a bit (about .08) more often (in the first minute vs. the second minute after hearing the sentence) to same-language test material than to the new-language test material. This marginal effect did not hold in the backwards condition, which might be because the rats were better able to encode the natural patterns, or might be because they were distracted by the unnatural patterns (you might call this the "holy sh*t, what was that?" effect).
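For anyone who wants to check the arithmetic, here is a small sketch of the ratio as the quoted definition describes it, applied to the four figures cited above for the rats trained on Dutch. The function and variable names are mine, not the paper's.

```python
def discrimination_ratio(first_minute, second_minute):
    """A / (A + C): mean lever presses in minute A over the total for A and C."""
    return first_minute / (first_minute + second_minute)


# Ratios quoted above for rats trained on Dutch: (condition, test language).
reported = {
    ("forward", "Dutch"): 0.491,
    ("forward", "Japanese"): 0.407,
    ("backward", "Dutch"): 0.401,
    ("backward", "Japanese"): 0.420,
}


def same_minus_new(condition):
    """Same-language ratio minus new-language ratio, rounded to 3 places."""
    same = reported[(condition, "Dutch")]
    new = reported[(condition, "Japanese")]
    return round(same - new, 3)


print(same_minus_new("forward"), same_minus_new("backward"))
```

A ratio of 0.5 means equal responding in the two minutes, which is why none of these numbers looks like strong performance. The forward difference is the "about .08" mentioned above; the backward difference is small and in the opposite direction.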

Both sorts of explanations might have played a role here.


Posted by Mark Liberman at 08:42 AM

? taR .rM, siht si egaugnal tahW

Various news organizations, like Al Jazeera and CNN, are running with a great new animal language story. You might prefer to get it from the horse's mouth: the full text of the Journal of Experimental Psychology article.

It turns out that Spanish rats can be trained to prefer synthesized pseudo-Japanese over synthesized pseudo-Dutch, or vice versa, more easily than Spanish rats can be trained to prefer backwards synthesized pseudo-Japanese over backwards synthesized pseudo-Dutch, or vice versa. I'm not sure whether the rats actually consider themselves Spanish, as they reside in Barcelona. But they're hardly Catalan, as they come from a strain of Rattus norvegicus which was originally cross-bred here in California. But as usual, I digress. Digression is so much easier than in the old days. How on earth did people manage to digress effectively before Google?

Anyhow, you can guess why the press is excited. A language log favorite. Talking animals. Indeed, the authors of the paper put their result in roughly the talking animal category, albeit in a much more finely nuanced way: animals are surprisingly well attuned to prosodic properties of language as against other physically similar stimuli. The researchers, Juan M. Toro, Josep B. Trobalon, and Nuria Sebastian-Galles, are sensible people, and do not take a Dolittlian inter-species communication or new age conclusion from this. Rather, they think it is evidence that in the development of human language, features already present in the mammalian auditory system were co-opted.

I am intrigued by the study, and I have the impression it was carried out carefully and effectively. But personally, I never had any doubt whatsoever that in the development of human language, features already present in the mammalian auditory system were co-opted. Moreover, I'm skeptical that Toro et al's study shows this. The problem is that Toro et al don't actually know which features of Japanese and Dutch were the ones that mattered, the relevant differences between the two languages that are more easily extracted forwards than backwards. And this leads me to some very general mathematical questions:

  1. Are there types of pattern recognizer such that those recognizers can differentiate between certain classes of pattern they are presented with in one order, but not differentiate between those classes of pattern when presented in the reverse order?
  2. Are there types of formal language recognizer (e.g. with a limited working memory, like just two states, or a limited stack, whatever) that can recognize classes of languages in one direction, but cannot recognize classes of language consisting of the same strings except in reverse?
  3. Are there types of learning algorithm such that these algorithms can learn to recognize certain classes of pattern presented in one direction but not learn to recognize the same classes of pattern presented in the other direction?
  4. Are there types of learning algorithm such that these algorithms can learn to recognize certain formal languages but not learn to recognize the languages which consist of the same set of strings except in reverse?

I'll put money on the answers being 4 * yes with even a quite modest definition of what a "type of pattern recognizer" etc is. In which case, the difference between Japanese and Dutch prosody might be intrinsically more learnable for a large class of abstract learning systems than is the difference between the reverse of these languages. And this class of learning systems might well include every organic learning system that has ever muddied its feet, scales or other protuberances on our wonderful planet, not just mammals, and not just animals with auditory systems. In which case, all the results of Toro et al's study would show is that Dutch and Japanese evolved in such a way that they are potentially recognizable and learnable by some creature, whereas Hctud and Esenapaj did not.
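For question 2 at least, there is a standard textbook illustration, offered here as a sketch and not as anything from the Toro et al. paper. Over the alphabet {0, 1}, take the language "the k-th symbol from the start is 1" and its reversal, "the k-th symbol from the end is 1". The code below builds a small automaton for each and counts the deterministic states reached by subset construction; the function names and the choice k = 4 are mine.

```python
from collections import deque

ALPHABET = "01"


def count_subset_states(delta, start):
    """Count deterministic states reachable by subset construction.

    `delta(state, symbol)` returns the set of successor states of an NFA.
    """
    start_set = frozenset([start])
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in ALPHABET:
            nxt = frozenset(t for s in current for t in delta(s, symbol))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


def kth_from_start(k):
    """Automaton for 'the k-th symbol from the start is 1' (already deterministic)."""
    def delta(state, symbol):
        if state in ("accept", "reject"):
            return {state}          # the verdict is fixed once position k is read
        if state == k - 1:
            return {"accept"} if symbol == "1" else {"reject"}
        return {state + 1}          # just count positions until the k-th symbol
    return delta


def kth_from_end(k):
    """NFA for 'the k-th symbol from the end is 1': guess which 1 is the one."""
    def delta(state, symbol):
        if state == 0:
            return {0, 1} if symbol == "1" else {0}
        if state < k:
            return {state + 1}
        return set()                # state k is accepting and has no successors
    return delta


k = 4
print(count_subset_states(kth_from_start(k), 0))  # k + 2 states
print(count_subset_states(kth_from_end(k), 0))    # 2 ** k states
```

Reading left to right, a recognizer for the first language only has to count to k, while one for the reversed language has to remember the last k symbols. So a device with a small, fixed number of states can handle one direction and not the other once k is large enough.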

Would this surprise us?

Posted by David Beaver at 02:24 AM

January 09, 2005

Horace and Quintilian on Correct Language

In writing about William Deresiewicz' review of David Crystal's The Stories of English, and MacNeil and Cran's Do you Speak American?, I complained that it's misleading to say that "the notion of [linguistic] correctness emerged only in the late 18th century".

This is an easy mistake to make, because it's certainly true that a new mass-culture concern about correct English emerged around that time, along with fully standardized English spelling, the use of standard French to establish a national culture after the French revolution, and so on. But that doesn't mean that linguistic norms and evaluation of usage didn't exist before that time, or that people didn't use legal and ethical metaphors in talking about such things.

For example, consider the passage in Horace's De Arte Poetica, written in the first century B.C., which is often quoted against prescriptivists because of its view that words will change "si volet usus / quem penes arbitrium est et ius et norma loquendi" ("if it be the will of custom, in the power of whose judgment is the law and the standard of language") [English translation here].

Earlier in the same section, Horace writes:

 ... brevis esse laboro,
obscurus fio; sectantem levia nervi
deficiunt animique; professus grandia turget;
serpit humi tutus nimium timidusque procellae:
qui variare cupit rem prodigialiter unam,
delphinum silvis adpingit, fluctibus aprum:
in vitium ducit culpae fuga, si caret arte.

I labor to be concise, I become obscure: nerves and spirit fail him, that aims at the easy: one, that pretends to be sublime, proves bombastical: he who is too cautious and fearful of the storm, crawls along the ground: he who wants to vary his subject in a marvelous manner, paints the dolphin in the woods, the boar in the sea. The avoiding of an error leads to a fault, if it lack skill.

Let's focus on the last line: "in vitium ducit culpae fuga..." (word-for-word = "to fault leads of error avoidance").

Lewis and Short's dictionary glosses vitium as "a fault, defect, blemish, imperfection, vice [and in particular] a moral fault, failing, error, offence, crime, vice"; and culpa is glossed as "crime, fault, blame, failure, defect (as a state worthy of punishment; on the contr. delictum, peccatum, etc., as punishable acts; diff. from scelus, which implies an intentional injury of others; but culpa includes in it an error in judgment)." Particular senses of culpa include unchastity, remissness, neglect and social faux pas. The dictionary also indicates that vitium is widely enough used to mean "a fault of language" that this is given as a special sense.

So when Horace says that "in vitium ducit culpae fuga" ("flight from crime leads to vice"), he's describing alternative forms of bad writing using the two most loaded nouns for general moral and legal transgression that Latin had to offer.

Horace was writing about style and word choice, not grammar. But the prescriptive "rules" about Correct English are also often about style ("omit needless words", "write with nouns and verbs, not with adjectives and adverbs") or word choice ("don't use hopefully to mean 'it is to be hoped'"). Indeed, Deresiewicz leads off his review by complaining about a student who writes "The following methodology was utilized" rather than "The following method was used" or "This is what I did."

When Quintilian (1st century AD) offered advice about how to raise (male) children to speak correctly, he used the same sort of legal and moral language:

Above all see that the child's nurse speaks correctly. ... No doubt the most important point is that they should be of good character: but they should speak correctly as well. It is the nurse that the child first hears, and her words that he will first attempt to imitate. And we are by nature most tenacious of childish impressions, just as the flavour first absorbed by vessels when new persists, and the colour imparted by dyes to the primitive whiteness of wool is indelible. Further it is the worst impressions that are most durable. For, while what is good readily deteriorates, you will never turn vice into virtue. Do not therefore allow the boy to become accustomed even in infancy to a style of speech which he will subsequently have to unlearn.

... if it should prove impossible to secure the ideal nurse, the ideal companions, or the ideal paedagogus, I would insist that there should be one person at any rate attached to the boy who has some knowledge of speaking and who will, if any incorrect expression should be used by nurse or paedagogus in the presence of the child under their charge, at once correct the error and prevent its becoming a habit.

The Latin original uses words and phrases like vitiosus sermo (for "faulty, [or bad, or corrupt] speech"), recte loquantur ("speak correctly"), vitiose ("in a faulty [or bad, or corrupt] way"), corrigat ("correct [the error]"):

Ante omnia ne sit vitiosus sermo nutricibus ... Et morum quidem in his haud dubie prior ratio est, recte tamen etiam loquantur. Has primum audiet puer, harum verba effingere imitando conabitur, et natura tenacissimi sumus eorum quae rudibus animis percepimus: ut sapor quo nova inbuas durat, nec lanarum colores quibus simplex ille candor mutatus est elui possunt. Et haec ipsa magis pertinaciter haerent quae deteriora sunt. Nam bona facile mutantur in peius: quando in bonum verteris vitia? Non adsuescat ergo, ne dum infans quidem est, sermoni qui dediscendus sit.

... Si tamen non continget quales maxime velim nutrices pueros paedagogos habere, at unus certe sit adsiduus loquendi non imperitus, qui, si qua erunt ab iis praesenti alumno dicta vitiose, corrigat protinus nec insidere illi sinat, dum tamen intellegatur id quod prius dixi bonum esse, hoc remedium.

Quintilian's advice about nurses is nonsense. Upper-class American children today don't grow up to speak with the accent of their nannies, and I doubt that Roman children did either. And there's pretty good evidence that explicit correction of "incorrect expressions" has little effect, at least on young children. So Quintilian's advice is no better than that of his modern counterparts, but he's talking about usage using the same language of good and bad, right and wrong, correct and incorrect. I doubt that all people in all cultures think and talk that way, but it's not something that was invented at the end of the 18th century.


Posted by Mark Liberman at 01:57 PM

Homo Hemingwayensis

Over the past couple of years, there's been renewed controversy about the role of recursion in human language. Steven Pinker and Ray Jackendoff, in an article entitled "The Faculty of Language: What’s Special about it?", put it this way:

In a recent article in Science, Marc Hauser, Noam Chomsky, and Tecumseh Fitch (2002) offer a hypothesis about what is special about language, with reflections on its evolutionary genesis. ... HCF differentiate (as we do) between aspects of language that are special to language (the “Narrow Language Faculty” or FLN) and the faculty of language in its entirety, including parts that are shared with other psychological abilities (the “Broad Language Faculty” or FLB). The abstract of HCF makes the extraordinary proposal that the narrow language faculty “only includes recursion and is the only uniquely human component of the faculty of language.” (Recursion refers to a procedure that calls itself, or to a constituent that contains a constituent of the same kind.) ... The authors suggest that “ most, if not all, of FLB is based on mechanisms shared with nonhuman animals…In contrast, we suggest that FLN – the computational mechanism of recursion – is recently evolved and unique to our species” (p. 1573). Similarly ( p. 1573), “We propose in this hypothesis that FLN comprises only the core computational mechanisms of recursion as they appear in narrow syntax and the mappings to the interfaces”.

There are lots of interesting issues here. To what extent does human language rely on mind or brain systems that are unique to language as opposed to shared with other human activities? Are language-specific mechanisms qualitatively different (like the difference between auditory and visual spatial localization), or just slightly specialized instances of more general systems (like the difference between touch perception on the arm and on the leg)? Are human linguistic systems qualitatively different from analogues in other animals (like elephants' trunks) or just quantitatively different (say, just the consequence of bigger brains)? Or have qualitative differences emerged from quantitative ones? Pinker and Jackendoff offer a range of arguments in opposition to Hauser, Chomsky and Fitch, and you can go read their paper and see what you think.

Now, though, I'd like to come at this question from a different and less serious direction.

A piece by James Thurber entitled 'A Visit from Saint Nicholas (In the Ernest Hemingway Manner)' appeared in the 12/24/1927 edition of the New Yorker. It starts like this:

It was the night before Christmas. The house was very quiet. No creatures were stirring in the house. There weren’t even any mice stirring. The stockings had been hung carefully by the chimney. The children hoped that Saint Nicholas would come and fill them.

The children were in their beds. Their beds were in the room next to ours. Mamma and I were in our beds. Mamma wore a kerchief. I had my cap on. I could hear the children moving. We didn’t move. We wanted the children to think we were asleep.

One of the defining characteristics of this style is that the writer doesn't tell you much about how the pieces of the story go together, at least not explicitly. The usual sorts of discourse relationships exist among the phrases, but very little of this structure is encoded by phrasal embedding within sentences. Thus

We didn’t move. We wanted the children to think we were asleep.

rather than

We didn’t move, because we wanted the children to think we were asleep.

and

The stockings had been hung carefully by the chimney. The children hoped that Saint Nicholas would come and fill them.

rather than

The stockings, which the children hoped that Saint Nicholas would come and fill, had been hung carefully by the chimney.

As a result, there's very little syntactic recursion. In the cited passage, the only examples are a couple of nouns post-modified by prepositional phrases ("the night before Christmas", "the room next to ours"), which you could analyze as involving noun phrases inside of noun phrases, and a couple of sentential complements of propositional attitude verbs ("the children hoped that Saint Nicholas would come", "we wanted the children to think we were asleep"). If we take such structures to be limited in principle to one level -- as they generally are in practice in such writing -- then no recursion is really involved. And these simple structures can easily be replaced by even simpler ones, even less suggestive of recursion. Thus the possible NP → NP PP patterns like "the night before Christmas" and "the room next to ours" can be replaced by [N N] patterns like "Christmas eve" and "the next room". The sentential complements could be replaced by pronouns: "Saint Nicholas was going to come and fill them. So the children believed, anyhow", and "We pretended to be asleep. We wanted the children to believe that."
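To make "a constituent that contains a constituent of the same kind" concrete, here is a minimal sketch that counts how deeply one category is nested inside itself in a bracketed tree. The tuple encoding and the particular bracketings are my own illustrative assumptions, not an analysis anyone is committed to.

```python
def max_label_chain(tree, label):
    """Largest number of `label` nodes on any root-to-leaf path of `tree`.

    A tree is a tuple (label, child, child, ...); a bare string is a word.
    """
    if isinstance(tree, str):
        return 0
    node_label, *children = tree
    here = 1 if node_label == label else 0
    return here + max((max_label_chain(c, label) for c in children), default=0)


# NP -> NP PP analysis of "the night before Christmas" (an assumed bracketing).
embedded = ("NP", ("NP", "the", "night"), ("PP", "before", ("NP", "Christmas")))

# Flat [N N] alternative, "Christmas eve" (also an assumed bracketing).
flat = ("NP", ("N", "Christmas"), ("N", "eve"))

print(max_label_chain(embedded, "NP"))  # NP inside NP: one level of embedding
print(max_label_chain(flat, "NP"))      # no NP inside the NP
```

A chain of 2 means one level of self-embedding and a chain of 1 means none. A style, or a species, whose trees never got past some small fixed chain length could be described without any recursive rule at all.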

Now suppose that next week, explorers in the Himalayan rhododendron forest find a new hominid species, H. hemingwayensis. These secretive creatures have been able to hide away from humans for millennia, partly because they are so well adapted to their dense, mountainous habitat, and partly because they are quite smart. Their cleverness includes the ability to amuse themselves while hiding by engaging in vocal displays, known anthropomorphically as "discourses", which they use to form and maintain social bonds and to compete for social prestige. These displays have a complex structure, in most respects just like human language. There is a small inventory of meaningless phoneme-like elements -- different for each troupe -- out of which a large vocabulary of more than 10,000 well-individuated "words" is formed. These "words" are combined into "clauses", just as in human languages, and are "inflected" according to their sentential function. Most surprisingly, the "words" seem to have "meanings" just like human words do, and the "sentences" and "discourses" are put together in familiar ways that we find it easy to believe we understand. In fact, when we record and translate some of these "discourses", we find patterns that remind us of many familiar human stories, suggesting that over the millennia, there has been a steady trickle of cultural contact between these creatures and humans.

The only thing is, the communication system of H. hemingwayensis appears to have absolutely no recursion whatsoever. "Nouns" can be modified, but only in the parallel "big bad wolf" sort of way. There is no (syntactic) clausal subordination, and no relative clauses. There are no sentential complements, though pronouns and some noun phrases ("that idea", "his argument") can be used to refer to explicit or evoked propositions. When you translate one of their stories, it comes out just like Thurber's imitation of Hemingway:

It was Christmas eve. The grove was very quiet. No creatures were stirring in the grove. There weren’t even any rodents stirring. The baskets had been hung carefully from the upper branch. Saint Nicholas would come fill them. The children had that hope, anyhow.

And so on.

How would we react to this discovery? Would we say that H. hemingwayensis has language without recursion, or no language at all? I think the answer is obvious. In fact, we can ask exactly this question about the historical reaction to the particular modernist style that Hemingway exemplified. In effect, the reaction was just "gee, these people are writing in a curious and interesting way." As far as I know, no one ever said anything like "these people are not using the English language", much less "these people are not human".

Let me say that I don't think my H. hemingwayensis scenario is very plausible. It's hard for me to believe that any creature could develop anything much like human language without at least some limited form of recursive compositionality. By my hypothesis, H. hemingwayensis does have compositional syntactic and semantic structures up to the clausal level, and analogous sorts of structures at the discourse level. Given that much, I'd be surprised not to see some use of syntactic embedding that goes at least a step or two in the direction of recursion (though whether this is implemented by fully recursive mechanisms is another matter).

But it seems preposterous to claim that syntactic recursion "is the only uniquely human component of the faculty of language". As we've observed in other contexts, sometimes it takes a really smart person to have an idea like that.

[Some other Language Log posts that are relevant to this topic:

Hi Lo Hi Lo, it's off to formal language theory we go
Cotton-top tamarins: on the road to phonology as well as syntax?
Humans context-free, monkeys finite-state? Apparently not.
Parataxis in Pirahã ]


Posted by Mark Liberman at 09:58 AM

January 08, 2005

ADS Word of the Year

The American Dialect Society "Word of the Year" has been announced. By category, the winners are:

Word of the Year: red state, blue state, purple state, n., together, a representation of the American political

Most Useful: phish, v., to acquire passwords or other private information (of an individual, an account, a web site, etc.) via a digital ruse. Noun form: phishing.

Most Creative: pajamahadeen, n., bloggers who challenge and fact-check traditional media. 36/52

Most Unnecessary: carb-friendly, adj., low in carbohydrates.

Most Outrageous: santorum, n., the frothy residue of [redacted].

Most Euphemistic: badly sourced, adj., false.

Most Likely to Succeed: red, blue, and purple states, n., the American political map.

Least Likely to Succeed: FLOHPA, n., collectively, the states of Florida, Ohio, and Pennsylvania, said to have been important in the 2004 American presidential election. From the postal codes for the three states: FL, OH, and PA.


Posted by Mark Liberman at 03:00 PM

Our indoor or outdoor pool

Like Mark, I'm at the LSA annual meeting at the Marriott in downtown Oakland. And in my room there is a binder detailing guest services, and on one page of that I read the following sentence. Study it closely.

During your stay, you can enjoy a relaxing swim in our indoor or outdoor pool or take advantage of our state-of-the-art equipment including StairMasters®, treadmills and LifeCycles®.

What does that mean to you? How many pools are there? The wording is utterly baffling. If there is one pool, and it is indoors, they could just say "in our indoor pool". For an outdoor pool, they could say "in our outdoor pool". Even if they didn't know (if the sheet had been printed from standard Marriott boilerplate before the hotel was designed and built), they could just have said "in our pool". But what could possibly justify saying "in our indoor or outdoor pool"? One has to assume that there are two, one indoor and one outdoor. Consider someone who said, "I will pick you up in my Ford or Chevrolet pickup truck." One would have to assume the man had two trucks.

But there weren't two pools. Barbara and I spent a frustrating and unhappy quarter of an hour roaming the 4th floor in swimming apparel, hunting for the alleged indoor pool. Outside it was grey, windy, and raining — California is being battered by winter storms right now. When we gave up the hunt and returned to our room to learn by phone that there was no indoor pool, we felt distinctly cheated. It's sometimes a rather subtle business to interpret badly or puzzlingly written prose (as you may recall from this recent legal story). But after long and embittered reflection, I see no possible honest construal of "in our indoor or outdoor pool" as uttered by a hotel that has just one forlorn pool outdoors on a rainy, windy rooftop.

Posted by Geoffrey K. Pullum at 01:15 PM

Deresiewicz on Crystal and MacNeil and Cran

Tomorrow's NYT has a review by William Deresiewicz of David Crystal's The Stories of English, and MacNeil and Cran's Do you Speak American?

Here's how it starts:

I came across the following sentence in a term paper recently. The student was about to describe how she had arrived at her conclusions. This is what she wrote: ''The following methodology was utilized.'' I see this kind of thing all the time. Not ''the following method was used''; not ever ''this is what I did.'' Like nearly all the students I've taught, this young woman has learned to believe that the English language does not have room for her. That it is a secret code known only to the initiated. That the language she speaks is uneducated, inferior and incorrect. Hence the corseted tone, the vocabulary that strains at sophistication, the way she absents herself from her own writing. This is a student who has been taught to worship the volcano god of Correct English.

In fact, there is no such thing as Correct English, and there never has been. That's why David Crystal, one of the language's leading scholars, titles his new history THE STORIES OF ENGLISH (Overlook, $35), plural. As Crystal shows, the notion of correctness emerged only in the late 18th century, the work of a few self-appointed authorities like the grammarian Lindley Murray and the pronunciation pundit ''Elocution Walker.'' Murray, Walker and their ilk believed the language had gotten out of control -- too many new words, too many regional accents, too many different ways of saying things -- and needed to be stabilized. Behind this linguistic anxiety lay an anxiety about status.

Strong stuff.

It seems to me that D's "young woman", with her "corseted" tone, is really worshipping the god of Fancy English, who is a different god from the god of Correct English -- and also different from the god of Simple English, to whom altars are often erected. And it's misleading to say that "the notion of correctness emerged only in the late 18th century". The notion of correct Latin and correct Greek had been at the center of European education for centuries at that point -- and for that matter, the notion of correct Sanskrit or correct Sumerian had emerged in other places much earlier.

Still, Lindley Murray and his ilk did start something, and what Deresiewicz has to say about it is worth reading.


Posted by Mark Liberman at 12:07 PM

Noi lai and contrepets

One of the talks I plan to hear at the LSA today is by Marcy Macken and Hanh Nguyen, on the topic "Phonological Constituents". According to a copy of the handout that I've seen, the part that looks like the most fun is about Vietnamese "nói lái" and their role in the work of the 18th-century poet Ho Xuan Huong.

You can think of nói lái as subversive communication by means of implied speech errors. For example, in the period after the fall of Saigon to the communists in 1975, residents would say of the obligatory picture of Ho Chi Minh that they would like to "lộng kiếng" = "frame (it in) glass", by which they meant that they would like to "liệng cống" = "throw (it in the) sewer" [from this site].

A series of posts on an archived newsgroup about nói lái in recent Vietnamese culture, starting with this one, include this assertion:

As of my own experiences, its uses proliterated in various places in Saigon during the period of 1985-89 (I left in 1989) as a street slang.

About 5-6 out of 10 words were "la'i" and the whole conversation can be conducted in this manner. Most of the times, it was spoken extremely fast to confuse others whose the conversation was not meant to. ;-)

The spirit of this kind of use in slang or disguised speech is similar to that of Cockney rhyming slang and recent French verlan, but the earlier historical practice seems most like the French contrepets. These are exemplified by phrases like "que votre Verbe soit en joie", which literally means "may your Word be in joy", but which expresses a less spiritual message if the indicated sounds (not letters!) are swapped: "que votre verge soit en bois" = "may your staff be of wood". (This reminds me again of the hypothesis that "some of the differences between American and French intellectual life can be explained by the fact that we Americans have the opportunity to get this sort of thing out of our systems in high school..., while the French, with their more formal and rigid educational system, do not".)

As John Balaban says of Ho Xuan Huong,

the greater part of her poems--each a marvel in the sonnet-like lu-shih style--are double entendres: each has hidden within it another poem with sexual meaning. In these poems we may be presented with a view of three cliffs, or a limestone grotto, or scenes of weaving or swinging, or objects such as a fan, some fruit, or even a river snail--but concealed within almost all of her perfect lu-shih is a sexual design that reveals itself by pun and imagistic double-take.

Macken and Nguyen go over this lu-shih:

Kiếp Tu Hành

Cái kiếp tu hành nặng đá đeo
Vị gì một chút tẻo tèo teo
Thuyền từ cũng muốn về Tây Trúc
Trái gió cho nên phải lộn lèo

Life of a Monk

The life of a monk is as heavy as carrying stone
Who cares about the little things
The boat of religion would want to go to Buddha's home
But the adverse wind came, and the halyard was entangled

Among the other nói lái here is the implicit transformation lộn lèo "entangled halyard" → lẹo lồn "copulating vagina", in which the tone sequence of the two syllables remains the same, but the rimes are exchanged. In the traditional Vietnamese terminology, the tone sequence (I think) is nặng + huyền (in both the original and the transformed version), and in the phonetic spelling that Macken and Nguyen use, the non-tonal part is [lon lɛw] → [lɛw lon].
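For readers who like to see such things mechanically, here is a minimal sketch of the rime exchange. It assumes each syllable is given as a toneless phonetic string plus a separate tone label, and it treats everything before the first vowel symbol as the onset. The function names and the small vowel inventory are my own illustrative assumptions, not Macken and Nguyen's notation, and the tone labels are the tentative ones given above.

```python
# Hypothetical helper names; the vowel set is a small illustrative inventory,
# not a full description of Vietnamese phonology.
VOWELS = set("aeiouɛɔəɨ")

def split_syllable(syllable):
    """Split a toneless syllable into (onset, rime) at its first vowel symbol."""
    for i, ch in enumerate(syllable):
        if ch in VOWELS:
            return syllable[:i], syllable[i:]
    return syllable, ""  # no vowel found: treat the whole string as onset

def noi_lai(first, second):
    """Exchange the rimes of two (segments, tone) pairs, leaving the tones in place."""
    (seg1, tone1), (seg2, tone2) = first, second
    onset1, rime1 = split_syllable(seg1)
    onset2, rime2 = split_syllable(seg2)
    return (onset1 + rime2, tone1), (onset2 + rime1, tone2)

# The example from the poem: [lon lɛw] with tones nặng + huyền.
result = noi_lai(("lon", "nặng"), ("lɛw", "huyền"))
print(result)
```

Because the tones are carried separately from the segments, the tone sequence is the same before and after the swap, which is the property the paragraph above describes.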

Some French literary contrepets are intentional, just as the nói lái in Ho Xuan Huong's poems were, but others are probably accidental: "Il ôta lentement sa casquette et après avoir lissé les mèches luisantes, il la remit" [NABOKOV, La Vénitienne, cited here].

As secondary jokes, the French refer to la contrepèterie belge, where the output of the transformation is the same as the input (e.g. "Il fait chaud et beau"), and the contrepèterie britannique, which are the riddles I recall from junior high school that pose questions like "what is the difference between a woman in a church and a woman in a bath?"

[More on the (original Chinese version of the) lu-shih form can be found here.

John Balaban's translation of the cited poem is

The Lustful Monk

A life in religion weighs heavier than stone.
Everything can rest on just one little thing.

My boat of compassion would have sailed to Paradise
if only bad winds hadn't turned me around.]



Posted by Mark Liberman at 08:24 AM

January 07, 2005

Homeric objects of desire

The Simpsons has apparently taken over from Shakespeare and the Bible as our culture's greatest source of idioms, catch phrases and sundry other textual allusions. It's especially rich in those meta-clichés that Glen Whitman dubbed snowclones. One that we've discussed previously is "I, for one, welcome our new __ overlords".

According to an apparently authoritative list, Homer Simpson uses the meta-cliché "mmm... ___" 97 times in 14 seasons, including 83 different values for the appreciated object. When I composed yesterday's post about the exam-eating dog, I started it with the title "K-9 Grading Corps", but it soon became obvious that it had to be called "Mmm, exams". A few seconds with google (unfortunately after I titled the post) taught me that the conventional spelling sets off mmm from the object of desire with three dots rather than a comma.

There's the usual spread of beliefs about how many m's to use. For the disjunction of "chocolate|beer|donuts" (the commonest authentic values), the counts are

Spelling Count

A histogram of the names of the authentically homeric objects of desire (according to D'oheth's transcription) is given below:

4 chocolate
4 beer
3 donuts
2 forbidden donut
2 sprinkles
2 sacrilicious
2 snouts
2 free goo
2 invisible cola
1 organised crime
1 fish
1 Marge
1 fifty-dollar pretzel
1 loganberry
1 slanty
1 re-circulated air
1 purple
1 turbulent
1 beer nuts
1 money
1 ham
1 barbecue
1 me
1 sugar walls
1 double glaze
1 trophy
1 farfetched
1 hog fat
1 grapefruit
1 something
1 Gummi Beers
1 delicious
1 nuts
1 crumbled-up cookie things
1 candy
1 double chocolate (gasp) New flavor, triple chocolate!
1 fattening
1 ovulicious
1 pistol whip
1 memo
1 pie pants
1 business deal
1 hamburger
1 free wig
1 mediciney
1 caramel
1 maca-ma-damia nuts
1 pie
1 unexplained bacon
1 horse doovers
1 strained peas
1 incapacitating
1 urinal fresh
1 open-faced club sandwich
1 shrimp
1 bad eggs
1 spaghetti
1 donuts
1 salty
1 various eggs
1 Danish
1 split peas with ham
1 elephant fresh
1 pancakes
1 chicken
1 steamed gentile
1 64 slices of American cheese
1 foot long chili dog
1 hug
1 hippo
1 McNuggets
1 potato chips
1 unprocessed fishsticks
1 convenient
1 pointy
1 bowling fresh
1 cupcakes
1 marshmallows
1 apple
1 the Land of Chocolate
1 feed
1 burger
1 Soylent Green

There are many common non-authentic values: "mmm... coffee" with 865 hits, "mmm... scotch" with 224, "mmm... licorice" with 124, "mmm... bourbon" with 107.

And even "mmm... linguistics" with 23. A new slogan: "Linguistics: almost 1/4 as popular as bourbon". As if, alas.


Posted by Mark Liberman at 08:12 AM

January 06, 2005

Mmm, exams!

I've never actually seen the proverbial newspaper headline Man Bites Dog, and I've never seen anyone actually use the proverbial unconvincing excuse "My dog ate my homework." But now I've seen photographic proof of a (professor's) dog eating a (student's) exam:

So it could happen.

This is a dog with many talents.

[Update: Abnu at wordlab sent a genuine Man Bites Dog link. Now I just need a credible "dog ate my homework" excuse story.]

Posted by Mark Liberman at 10:40 AM

Dinner at the L.S. Cabal

TstT and others have organized a linguistiblogosphere dinner Friday, 1/7/2005, at the linguists' secret annual cabal in Oakland.

In truth, the LSA meeting is so far from secret that its program is on the web for all to read. This is in contrast to the program for the recent MLA meeting in Philadelphia, which you can only access on the web if you have a membership number and password. The LSA hasn't quite gotten to the stage of putting abstracts as well as titles on line, but I attribute that to lack of focus on communication via the web, not to any principled opposition to open access.

The dinner is open to all, but let TstT know if you plan to come (by adding a comment to his post), so that he can keep the restaurant informed.

[Update: Mike Albaugh wrote

As a Language-log fan, I was struck by the juxtaposition of two articles. One was about the "Language Cabal" dinner in "Oakland", another about the amount of context brought to bear in disambiguation. Not actually being part of the "Language Cabal", I was curious which of the two Oaklands I am familair with (let alone the numerous Oaklands there must be) was the site of the secret meeting. Visiting the site did not provide an immediate answer, until I noticed that the area code given for the two "overflow" Hotels was (510), which means this would be the Oakland dismissed by Getrude Stein, not the one famous for its Cathedral of Learning. Not sure exactly how the Semantic Web is going to handle this. Hubert Dreyfus keeps bubbling to the surface of my consciousness.

I thought about adding "Oakland CA 94607", with a link to the convention center, but then I thought "naw, the link to the meeting program should be enough."

Ironically, it seems that everyone misinterprets Gertrude Stein's famous remark: "The trouble with Oakland is that when you get there, there isn't any there there". This comes from a lecture she delivered on a visit to Mills College in 1934, when she was 60 years old, and she apparently meant that by then, the Oakland of her youth in the 1880's had completely disappeared. Most people take her phrase to be a clever way of putting down boring cities with no real civic life, whereas she apparently meant it as a clever way of saying that you can't go home again. That's the thing about modernism -- even when you think you know what it means, it means something else entirely.

Even Oaklanders have apparently decided to go along with the convention in this case. According to the Gertrude Stein bio page cited earlier:

...a developer in Oakland who has been going through the downtown restoring and renovating older buildings keeps Stein's memory alive through his triumphant banners. As each building is completed, he flies a green flag that states emphatically "There!"]


Posted by Mark Liberman at 09:16 AM

Military Dialects

You don't have to go from one country to another, or even from one region to another, to encounter the sort of potentially troublesome differences in interpretation that Mark mentioned in his discussion of an article in the Economist. Long ago I heard the following explanation of how the order to "secure a building" will be interpreted by the various branches of the U.S. armed forces:

The Marines
will form a landing party and assault it.

The Army
will occupy the building with a troop of infantry.

The Navy
will send a yeoman to ensure that the lights are turned out.

The Air Force
will obtain a three year lease with option to purchase.

Posted by Bill Poser at 02:43 AM

And now to revive Cornish?

I love The Economist, as you know. But I have to admit that almost every story they do on language is goofy. (Perhaps everyone goes goofy when they talk about language, we often remark as we chat around the water cooler in our new office building, Language Log Plaza.) The December 18 issue reports very credulously on a putative revival of the Cornish language — the Celtic language once spoken in the extreme south-west promontory of Great Britain. I don't doubt that there are enthusiasts who study Cornish from available descriptions and surviving texts, but these are hobbyists, not speakers. I don't want to upset them, but I have to tell you, there is not going to be any revival of Cornish that turns it into a living language again. To suggest otherwise would be to utterly trivialize the issue of language endangerment.

The reputed last native speaker of Cornish, Dolly Pentreath, died in 1777 with no one left to speak the language to. British editions of The Guinness Book of Records used to say that in the paragraph about languages in Britain with most and least numbers of speakers. But by about 1970 someone had persuaded Guinness Superlatives to drop the reference to Dolly's death completely and instead to mention gratuitously that "A movement exists to revive the use of Cornish." Suspicious: where did Dolly go?

By 1983 someone had discovered evidence of a man called John Davey who supposedly spoke Cornish fluently and lived until 1891, and the Guinness Book mentioned that fact. (They don't mention, though, that a 1922 document quoted here says that "Had not the whole history of the language prepared us for such neglect, it would have seemed far less credible that as recently as 1891, the year of his death, John Davey's Cornish, like that of Dolly Pentreath or William Bodener a century earlier, was allowed to perish unrecorded, than that at so late a date a man still lived who could recite some traditional Cornish. Less astonishing, but even more sad, is that not one word of all his store is known to his descendants today, although it is well remembered that he possessed it." It seems clear that no one really has any data on whether he spoke it fluently or not. From what I know, I would suspect he did not.)

Now it is reported in The Economist that yet another Cornishman, named Henry Jenner, could speak Cornish fluently around the same time, and so could his wife, and in 1896 they started "reviving" the language.

And The Economist goes on to talk breathlessly of lessons in Australia (enrollment: 15), proficiency exams, weekly news broadcasts, and (I swear to God) a Britain-only Christmas Day episode of The Simpsons in which Lisa shouts a liberation slogan ("Rydhsys rag Kernow lemmyn!") in the Cornish language. The article admits, though, that the modern descendant of this movement is "plagued by squabbles, particularly among the academics specializing in Cornish" (what? linguists squabble?), there being "four rival versions of the written language, each with differing degrees of authenticity, ease of use, and linguistic consistency." In other words, even the written form of the language is not clearly preserved in a definite form, as opposed to hinted at in various fragments in different orthographies.

Let me remind you what is necessary for a language to be living: there must be little kids who speak the language with each other because it is their only language or else their favorite. Little kids who would speak it even if they were told not to. It is not enough that a community of grownups (squabbling or not) has learned it from books and reads to each other each Tuesday night in someone's living room.

Cornish is dead, sadly. Stone dead. No one alive has ever heard a conversation between two native speakers in it, let alone lived with them for long enough to acquire native competence as a child. Talk of "the beginnings of official recognition from both the European Union (EU) and Whitehall" (the U.K. government), and a county council planning "to use Cornish ‘where practicable’," is simply nuts — a philological hobby out of control, and national governments gone mad.

Don't get me wrong: I don't want Cornish to be dead; I love languages, and I traveled on my own initiative to the Isle of Man when I was an undergraduate in 1968 so that I could go and see the last native speaker of Manx, Ned Maddrell, and record him telling a story. I would have gone to see Dolly Pentreath too. But I was 200 years too late. Everybody is.

Re-read what my colleague Jim McCloskey said (here) about Irish: it will be dead in thirty years, and nothing can save it now, despite thousands and thousands of enthusiastic second language speakers who will keep a different form of it alive for a generation or two. That's a language with several thousand native speakers alive now who you can go and speak to any time you like. And still Irish is dying, and its death apparently cannot be stopped. With Cornish the chance of doing anything was pretty well over before the 19th century began.

Always remember this, as we head into the sad time of massive language extinctions that is coming. Ask around the village and find the age of the youngest people using a language every day for all their normal conversational interaction. If the answer is a number larger than 5, the language is probably dying. If the answer is a number larger than 10, it is very probably doomed. If the answer is a number larger than 20, you can kiss it goodbye right now: no amount of nostalgic appreciation of it will make it last even one more generation as a going concern. That's the way languages are. And it's the way Cornish once was. It would not be sensible for the EU to encourage the idea of adding it to the already frightening list of languages they have to arrange translation into and out of.
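The rule of thumb above is simple enough to write down as a function. This is only a sketch of the three thresholds as stated; the function name and the wording of the fallback case (for an answer of 5 or under, about which the rule says nothing) are my own assumptions.

```python
def vitality_verdict(youngest_daily_speaker_age):
    """Map the age of the youngest everyday speaker to the rule-of-thumb verdict."""
    if youngest_daily_speaker_age > 20:
        return "kiss it goodbye"
    if youngest_daily_speaker_age > 10:
        return "very probably doomed"
    if youngest_daily_speaker_age > 5:
        return "probably dying"
    # The rule is silent here; this label is a placeholder, not a clean bill of health.
    return "no verdict from this rule"

for age in (4, 8, 15, 60):
    print(age, vitality_verdict(age))
```

The checks run from the largest threshold down, so each answer gets the most severe verdict that applies to it.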

Posted by Geoffrey K. Pullum at 01:48 AM

January 05, 2005

Receiving what condition their condition is in

Francis Heaney responds to Geoff Pullum's story of recorded alarms at PHL with an account of "the idiosyncratic language used whenever a fire alarm is set off in the Conde Nast building":

"May I have your attention, please. May I have your attention. This is your fire safety command station. We have received a condition on the 16th floor."

A condition is generally not a good thing to have or to receive:

So if you already know you have a condition that's linked to infertility, even one miscarriage is cause for concern and testing.

If a physician wants to report a patient who has a condition making driving unsafe, is there any legal protection for the physician making the report?

M2 Has a condition for which the degree of risk to public health or safety is not sufficient to exclude admission, but which risk should be considered in relation to other personal and social criteria.

One out of 14 have a condition that substantially limits one or more basic physical activities.

ERZ017012E, Program 'programName' received an unexpected condition conditionNumber.
Explanation: A program received a condition that is not a CICS condition.

Skill Evaluation. If your performance in a skill area does not meet the standard, you will receive a "condition" in that area.

Every time we will receive a condition of life, a circumstance that's painful or humiliating, and receive it from God's hand as coming from Him, we have an opportunity to learn of Him.

As we find, there have been outside influences which have produced a real nervous shock to the system in such a nature that the reflexes from the cerebrospinal centers, and the cerebrospinal center itself, have received a condition which prevents their coordination.

Given the several apparently neutral meanings of condition as "a mode or state of being", "state of health ... readiness or physical fitness", "existing circumstances", etc., you'd expect that a condition on the 16th floor might sometimes be a very good thing, say a prize for floor of the month, or leftover food and drink from the Directors' meeting. But not likely. As Francis points out, the best you can realistically hope for is that the announced condition turns out to be "unwarranted". I wonder if the "disease or physical ailment" meaning has infected the other uses of the word, or if other negative connotations (perhaps from the language of negotiation) have tainted the "state of health" and other meanings?


Posted by Mark Liberman at 06:23 AM

Clarity and respect

Back in September, I somehow missed this column in the Economist, presenting some amusing entries from two allegedly genuine phrasebooks. One decodes Britspeak for Dutch businesspersons, and another translates French phrases for British diplomats (specifically for officials attending the meetings of the European Union's Council of Ministers). The column's point is that the EU's translation woes only scratch the surface of the problem, since "cultural differences mean that a literal understanding of what someone says is often a world away from real understanding". Americans, who don't care a lot about the EU's translation woes, may still find the explanations helpful.

From the first phrasebook (with some omissions filled in):

British phrase | Apparent meaning | Correct translation
"Up to a point" | "Partially" | "Not in the slightest."
"I hear what you say" | "I accept your point of view" | "I disagree and I do not want to discuss it any further."
"With the greatest respect" | "I respect you" | "I think you are wrong, or a fool."
"By the way/incidentally" | "This is not very important" | "The primary purpose of our discussion is ..."
"I'll bear it in mind" | "I'll take care of it" | "I'll do nothing about it."
"Correct me if I'm wrong" | "I may be wrong, please let me know" | "I'm right, don't contradict me."

From the second:

French phrase | Literal translation | Idiomatic translation
"je serai clair" | "I will be clear" | "I will be rude"
"Il faut la visibilité Européenne" | "We need European visibility" | "The EU must indulge in some pointless, annoying and, with luck, damaging international grand-standing."
"Il faut trouver une solution pragmatique" | "We must find a pragmatic solution" | "Warning: I am about to propose a highly complex, theoretical, legalistic and unworkable way forward."

Unfortunately, no bibliographic information is given. The diplomat's lexicon may be an unpublished private joke, but the British phrasebook for Dutch businesspersons should be findable, if it's really real.

I'm reminded of the well-known phrasebook of science writing, available in various versions, with translation pairs such as

"It can be shown" | Somebody said they did this, but I can't duplicate their results. I can't even find the reference, or else I would have cited that instead.
"It is not unreasonable to assume" | If you believe this, you'll believe anything.
"It is believed that ..." | I think that ...
"It is generally accepted that ..." | A guy in a bar once agreed with me.
"It is widespread knowledge that ..." | Two guys in a bar once agreed with me.
"It is universally accepted that ..." | The bartender agreed too.
"Typical results are shown" | The best results are shown, or the only results are shown.
"of great theoretical and practical importance" | interesting to me
"It is to be hoped that this paper will stimulate further work in the field" | This paper is not very good, but neither are any of the others on this miserable subject.
"Thanks are due to X for assistance with the experiments and to Y for valuable discussion" | X did the work, and Y explained it to me.

There are many other such lists, especially purporting to explain men to women or women to men, though I can't find any good ones at the moment.

[Economist column via Jurieland]


Posted by Mark Liberman at 04:52 AM

January 04, 2005

Hip to be square

Cameron Majidi sends in a possible eggcorn sighting, from (of all places) Philosophers' Carnival VII: the Holiday Edition.

A carnival, in this sense, is a regular round-up of recent posts from a topically-defined region of the blogosphere. Like a traveling show, a blog carnival usually moves from (blog) place to (blog) place. Thus the Philosophers' Carnival page tells us that Philosophers' Carnival #1 was hosted by Richard Chappell at Philosophy, et cetera, and Carnival #2 by Brandon Watson at Siris, and so on to the seventh and latest, hosted by Chris at Mixing Memory. For a small sampler of other blog carnivals, check out the Carnival of the Canucks, the Carnival of the Capitalists, the Carnival of the Cats -- and that's just part of the letter C!

Anyhow, Cameron draws our attention to this item in P.C. VII, which references a post at The Leiter Report entitled "Drebenized":

Burton Dreben is known for believing

Philosophy is garbage. But the history of garbage is scholarship.


Nonsense is nonsense, but the history of nonsense is scholarship.

In response, John Rawls writes:

The crucial questions in understanding Burt's view are: What is philosophical understanding? What is it the understanding of? How does understanding differ from having a theory? I wonder how I can give answers to these questions in my work in moral and political philosophy, whose aims Burt encourages and supports. Sometimes Burt indicates that my normative moral and political inquiries do not belong to philosophy proper. Yet this raises the question. Why not? And what counts as philosophy?

Brian Leiter (who is responsible, indirectly, for the square quotes around the word "analytic," by the way) wonders what philosophers think about this. Read the whole post, and let him know in a comment. [emphasis added]

As Cameron points out, the phrase "square quotes" seems to be a malapropism in which square is substituted for the similar-sounding word scare.

Sometimes such word substitutions are a sort of word-level typographical error, where the speaker or writer knows perfectly well what the right word is, but produces the wrong one by a slip of the brain. Alternatively, someone may make the substitution confidently and reliably, having learned the wrong word in the first place. This is usually the result of substituting a familiar word that makes sense in context for an unfamiliar or unexpected one, either as an individual creative act or due to being led astray by someone else. In junior high school, I read about (what I thought must be) "maniac depression", and I recall being puzzled that the second 'a' kept being omitted, over and over again, in printed references to this condition. Or the substitution may be a deliberate piece of word play. Since that's the most charitable interpretation, I'll adopt it here, and assume that Chris at Mixing Memory meant to suggest that analytic philosophy is square while continental philosophy is hip. Or perhaps, more precisely, that analytic philosophy is "square" while continental philosophy is "hip". Or even that "analytic" philosophy is "square" while ...

The net has several hundred other examples of this substitution. Some are probably jokes; others are pretty clearly brainos, as in the comment on this post which uses "scare quotes" early on and then "square quotes" in the same meaning a few lines later; and with others, we can't tell.

Cameron's email ended with the observation that some East Asian orthographies really do use "square quotes", 「as in this example」, making the scare quotes → square quotes substitution all the easier for those who may be familiar with such symbols. By coincidence, this same fact once led me (and no doubt some others) into a different word substitution error in linguistic terminology. A traditional term for Japanese pitch accent, dating at least to the 1950s, was "accent kernel". Although I don't know the history, I assume that this term was intended to evoke the metaphor of the accent as a sort of central core, located within a particular syllable, whose larger envelope of acoustic effects was distributed more widely in the surrounding utterance. In any case, I first encountered this term when I was a graduate student, in a lecture given by a Japanese linguist, who also used symbols similar to "square quotes" as a convenient typographical representation for the points in a word or phrase where a pitch rise or fall is aligned, something like this:

= /syakai+seido/ "social system", with the accent kernels presented as square corners added with a pen to a typed handout, or written on the board in chalk. Combined with the speaker's difficulty with the English r/l distinction, this typography led me to adopt -- and for a while to use -- the term "accent corner".

Frankly, I've always felt that "accent corner" is a much better term than "accent kernel". And as Geoff Pullum has pointed out, you can't look up everything, though in this case I did learn the terminological truth, after a few months, when I read a 1950s-era paper that discussed Japanese prosody with reference to "accent kernels". Whether the proprietor of Philosophers' Carnival VII intended "square quotes" as a hip joke, or just produced it as a slip of the fingers because square is about 15 times commoner than scare, I'm glad to have it added to my vocabulary.


Posted by Mark Liberman at 11:53 AM

January 03, 2005

What can you Bret Easton Ellis to that?

Donna Malayeri sent by email one of the most blatant violations of Elmore Leonard's Third Rule of Writing that I've ever seen:

With the discussion on the Language Log about using certain nouns as verbs, I thought you might find this example interesting. From Glamorama by Bret Easton Ellis: "'As if', I Alicia-Silverstone-in-Clueless back at him." The sentence was hard to parse at first, but once I did, I laughed out loud. :)

I've seen longer and more distracting violations of the closely-related Fourth Rule, but Ellis certainly deserves some recognition for this effort.

Posted by Mark Liberman at 03:51 PM

January 02, 2005

Near? Not even close

From William Safire's 1/2/2005 On Language column:

''You have ruled out tax cuts,'' a reporter said to the president, ''and no cuts in benefits for the retired and the near retired.'' Then came the semantic zinger: ''What, in your mind, is 'near retired'?''

Bush half-answered that with a reference to ''our seniors,'' but let me deal with the dropping of the adverbial -ly and the overuse of near as a combining form. It became controversial with near miss, a nonsensical version of near thing; some of us patiently but uselessly pointed out that the writer meant ''near hit.'' Near miss has since entrenched itself as an idiom. (Idioms is idioms, and I could care less.) The abovementioned Vlad the Impaler refers to Russian speakers in the nations that broke away from the Soviet Union as the near abroad. And now we have Bush's near retired, presumably but not decidedly people approaching their 60's. Two paragraphs back, today's column was near finished. The compound nouns are chasing the adverbs out of the language.

Safire may well be right to flag an uptick in near -- even if it's based on only three phrases encountered over several years -- but his analysis is, well... Since I'm fond of defending Safire from my fellow linguists, I'll put it like this: Safire manages to fit four major analytic errors into 129 words, while simultaneously dissing the President of Russia and describing a press conference Q & A. You have to admit that the man has talent.

First, the phrase near miss has nothing to do with "dropping of the adverbial -ly". It involves a sense of near glossed by the OED as "Close to a goal, target, or object, or a perceived model", and supported with citations from 1530 onward. Since miss is a noun in this phrase, it should be modified by an adjective, not an adverb. I doubt that any competent speaker of English has ever been tempted to use a phrase like "whew, that was a nearly miss!". The case of the near abroad is slightly less clear, since abroad is not normally used as a noun, but the structure is clearly that of a noun phrase, and so near is probably an adjective here as well.

Second, the English word near, without -ly, has in any case been serving as an adverb since the time of Beowulf. The OED describes some of the options like this:

In purely adverbial use. Freq. with noun or noun phrase as complement (in dative in Old English). (When used with complement near can be analysed as a preposition.)

Geoff Pullum and CGEL would put this differently: like most English prepositions, near can be used either transitively (with a noun-phrase complement) or intransitively (without a complement).

Some of the old adverbial/prepositional uses of near are obsolete:

1533 J. HEYWOOD Mery Play 653 Stand styll, drab, I say, and come no nere.
a1616 SHAKESPEARE Macb. (1623) II. iii. 139 The neere in blood, the neerer bloody.
1485 MALORY Morte Darthur III. xiii. f. 58v, Her arme was sore brysed and nere she swouned for payne.

but many are now merely informal

1673 J. RAY Observ. Journey Low-countries 8 At near an hundred foot depth they met with a Bed or Floor of Sand.
1719 D. DEFOE Life Robinson Crusoe 269 It cost us near a Fortnight's Time.
1770 S. FOOTE Lame Lover III. 69 The knight is..very near drunk.
1836 T. C. HALIBURTON Clockmaker 1st Ser. xii. 99 It's near about the prettiest sight I know of.
1951 S. H. BELL December Bride II. xvi. 169 That woman near killed me! I was stooned for days after it!
1995 J. BANVILLE Athena 113 One of them gave the security guard a belt of a hammer and damn near killed him.
1990 G. G. LIDDY Monkey Handlers xv. 241 ‘That got that little pigster triangular folding bayonet?’ asked Saul. ‘Right... Ain't so little. Near nine inches long.’

or perfectly normal in all registers

1955 E. BOWEN World of Love ii. 46 She had gone on to make much of the rescued dress..finally hanging it near her window.
1941 J. AGEE & W. EVANS Let us now praise Famous Men (1988) 26 The land..was speckled near and far with nearly identical two-room shacks.

Finally, Safire's little tag-line about how "[t]he compound nouns are chasing the adverbs out of the language" sums things up nicely by missing three targets at once: there are no compound nouns in sight; at least 2/3 of his examples have nothing to do with adverbs; and to the extent that there's a change (in the uses of near or in the overall frequency of adverbs in -ly), it's in the opposite direction.

"Compound nouns" are things like ski lift, hair oil, or channel swimmer. In contrast, "near miss" is a simple phrase made up of an adjective near modifying a noun miss; "near retired" is apparently an adverb (or "intransitive preposition") near modifying an adjective retired (though in the phrase "the retired and the near retired", the adjective is used like a noun meaning "retired ones", as in "only the brave deserve the fair", or "the unspeakable in pursuit of the inedible", so maybe this is just adjective+noun again), and "near abroad" is probably also a simple adjective+noun construction.

The OED's entries on near suggest that its adverbial/prepositional uses have been fading over the past millennium or so, not gaining -- not that this is really relevant to most of Safire's examples anyway.

As for the frequency of adverbials in -ly, I don't know of any study of recent historical changes in their frequency, so here's a small start. The Atlantic Magazine's web page features links to articles from the magazine's past. The oldest one cited today is William Dean Howells' 1869 review of Mark Twain's Innocents Abroad. Howells' piece is 1,784 words long, and contains 19 -ly adverbs, or one every 94 words. In the current (Dec. 2004) edition of the same magazine, Lorrie Moore has a review of Alice Munro's Runaway. Moore's review is 1,914 words long, including 40 -ly adverbs, for a rate of one every 48 words. In other words, a 2004 book review in The Atlantic uses -ly adverbs at nearly twice the rate of an 1869 book review of similar length. I didn't count the compound nouns, but whatever they're doing, it's not chasing out the adverbs.
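For anyone who wants to check the arithmetic, here is a minimal Python sketch. The word and adverb totals are the ones given above; the function names and the crude regular-expression counter with its tiny exclusion list are illustrative assumptions, not the method behind those counts. A pattern like this will miscount in both directions: it catches non-adverbs like "ugly" and misses adverbs without -ly.

```python
import re

# Illustrative exclusion list (an assumption): common words ending in -ly
# that are not adverbs. A real count would need part-of-speech judgments.
NOT_ADVERBS = {"family", "reply", "supply", "fly", "july", "italy"}

def count_ly_candidates(text):
    """Crudely count words ending in -ly, skipping a few known non-adverbs."""
    words = re.findall(r"\b[a-z]+ly\b", text.lower())
    return sum(1 for w in words if w not in NOT_ADVERBS)

def words_per_adverb(total_words, ly_adverbs):
    """Average number of words per -ly adverb."""
    return total_words / ly_adverbs

howells_rate = words_per_adverb(1784, 19)  # 1869 review
moore_rate = words_per_adverb(1914, 40)    # 2004 review

print(round(howells_rate))                  # 94
print(round(moore_rate))                    # 48
print(round(howells_rate / moore_rate, 2))  # 1.96
```

The last figure is the basis for "nearly twice the rate": the 1869 review goes about 1.96 times as long between -ly adverbs as the 2004 one.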

As Bob Dylan put it

You're very well read, it's well known.
But something is happening here and you don't know what it is,
do you, Mr. Jones?

Once again, I blame the linguists for failing to educate the public -- and the pundits -- in the basic techniques of grammatical analysis.

[Update: several people have written to point out that the phrase "near retired" has clearly reminded Safire of a prescriptivist bugaboo along the lines that "near miss should mean 'nearly but not quite a miss'", so that the "correct" term (for an attempt that barely fails to hit) ought to be "near hit." Safire half-agrees with this, but then excuses the common usage on the grounds that "idioms is idioms", rather than on the more straightforward basis that near has an acceptable meaning that would allow "near miss" to mean what it normally does. I recognized this issue but passed over it in silence. Safire might well be right that the normal interpretation of "near miss" is compositionally unexpected, I'm not sure -- either way, this question doesn't help us understand why President Bush might have said "the near retired" rather than "the nearly retired". I guess Safire's point might have been that "near retired" could in principle mean either "barely retired" or "almost retired", but in that case, he confused things from the start by bringing up "the dropping of the adverbial -ly", which is not relevant to any account of "near miss".

Some others have reminded me that near is etymologically a comparative form of nigh -- though the history is complicated, with the OED's discussion beginning "Old English nēah NIGH a. and adv. had a comparative adjective nēarra NAR a., and a comparative adverb nēar (also nēor, nȳr) NEAR adv. ..." This helps explain why (earlier forms of) near have been adverbs since before English was English, but it's not really relevant to Safire's paragraph about "the near retired". ]


Posted by Mark Liberman at 12:31 PM

Please evacuate the area: a speech synthesis story

While I stood in line for a slice of pizza at Sbarros opposite Gate 24 on concourse B of the Philadelphia airport, waiting to board my flight back from that great city yesterday, a loud alarm went off at Gate 23. A constant, strident, teeth-jarring, metallic scream tone started up (eeeeeeeeeee!). Overlaid on it was a sequence of rising alarm tones ("Whooeee! Whooeee!!") very much like the great call of a female gibbon. Then, added to these two earsplitting sounds, came a harsh, synthesized (or could it have been prerecorded?) male voice saying, with the familiar failure to destress grammatical items such as articles like the and auxiliary verbs like have and the official insistence on technical words like activate rather than set off: "Thee fire alarm in this area hazz been activated! Please move away and evacuate thee area!" (People were not eager to miss boarding their 100% full flight, so they stood up slowly and uncertainly, as if very reluctantly participating in a standing ovation for someone who had never really been all that popular. The Sbarros staff stubbornly kept on handing out slices with pepperoni and mushroom: no sense in losing business just because the concourse is about to be engulfed in flames.)

After five minutes of this aural purgatory, humans intervened, but the result was just that a live female voice on the public address system was added over the top: "The fire alarm is a false alarm. Someone opened a security door. Please do not evacuate the area." The male voice (not being human) refused to be put off: "Thee fire alarm in this area hazz been activated! Please move away and evacuate thee area!" So the woman began to contradict the voice more emphatically, over the scream tone (eeeeeeeeeee!) and the gibbon call ("Whooeee! Whooeee!!") and the male voice ("hazz been activated!"): she insisted, "Repeat, the fire alarm is a false alarm! Please do not evacuate the area!"

It was a quarter of an hour in absurdist, synthetic audio hell. The people like Douglas "Hitchhiker's Guide" Adams who have suggested that our technology (particularly the technology incorporating mechanical speech) is annoying if not malign (remember the doors that said "Glad to be of service!" every time they opened?) are not wrong. They have barely scratched the surface.

Posted by Geoffrey K. Pullum at 12:57 AM

January 01, 2005

John Ford movie to be dubbed... in Irish

Jim McCloskey sent me this example of what he means about the new wave of enthusiasm for Irish as a second language that he recently wrote about here. It's from the Irish Times, Wednesday, December 29th, 2004, which refers to modern Irish as ‘Gaelic’.


Fans of `The Quiet Man,' filmed in Ireland in 1951, plan to dub the John Ford classic into Irish.

The Quiet Man Movie Club, which has 200 members worldwide, has lined up Irish actors to speak the lines of John Wayne and Maureen O'Hara.

Foras na Gaeilge and Udaras na Gaeltachta are ready to fund the project in conjunction with TG4 and an independent production company, Telegael. However, permission must be sought from Paramount Pictures, which owns the rights to the Oscar-winning movie. The classic has already been translated into 12 languages including French, German, Russian and Japanese.

A spokesman for the Quiet Man Movie Club, Mr Des McHale, said: "If John Wayne can speak in German and Japanese, then why not Irish also?

"The film was mostly shot in Gaeltacht areas in Cos Mayo and Galway and it still has a huge following in the west of Ireland. It was the first Technicolor film to be shown in Ireland, the first that showcased the country abroad. It attracted thousands of tourists to the country.

"It's only fitting that the dialogue should be dubbed by the voices of native Gaelic speakers to help it win a new audience and add it to our rich Gaelic folklore."

Mr McHale, a mathematics professor at University College Cork, said the club had e-mailed executives in Paramount to see if they would give the green light.

The author of 50 books, Mr McHale recently published Picture The Quiet Man — An Illustrated Celebration.

The book contains rare photographs from the set of the film while it was shot here in the summer of 1951. It was launched in The Quiet Man snug in The Rathmines Inn in Dublin.

Among the attendance was the family of a Meath garda, Richard Farrelly, who wrote the film's theme song, Isle of Inisfree, while travelling on a bus from his home in Kells to Dublin.

The Quiet Man tells the story of retired prize-fighter Sean Thornton who returns to his Irish roots and falls in love with the fiery Mary Kate Danaher.

Posted by Geoffrey K. Pullum at 01:35 PM

Banned in Sault Ste. Marie

The Lake Superior State University 2005 List of Banished Words is out, and it includes not only body wash ("also known as 'soap'") and flip flop ("belong[s] at the beach, not in a political dialogue"), but also blog ("something that would be stuck in my toilet"). This will be about as effective as efforts to banish words usually are, but it's LSSU's yearly day in the spotlight. You can consult lists back to 1976 to survey the residue of crankiness past.

Posted by Mark Liberman at 11:20 AM

×åñòèòà Íîâà Ãîäèíà

According to the New Year Wishes Around the World web site, "Happy New Year" in Bulgarian is:

×åñòèòà Íîâà Ãîäèíà
Bulgarian is not one of my better languages, but that didn't seem quite right. If you copy this into a greeting card and send it to a Bulgarian friend, as the site suggests, I'm not sure what reaction you'll get. That last word looks a lot like an attempt at transcribing Howard Dean's notorious howl. You might lose a friend, or start a war.

A little investigation shows that the problem is that the web page tells the browser that it is encoded in ISO-8859-1, which is used for a lot of Western European languages, but the Bulgarian is actually in Microsoft's CP1251 encoding. What you're supposed to see is this:

Честита Нова Година
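The repair can be sketched in a few lines of Python. This assumes, as described above, that the bytes were really CP1251 but were displayed as ISO-8859-1; the function name and its parameters are illustrative. The trick is to turn the garbled characters back into the original bytes and then decode those bytes with the right table.

```python
def undo_mojibake(garbled, assumed="iso-8859-1", actual="cp1251"):
    """Recover text whose bytes were decoded with the wrong encoding."""
    # Step 1: re-encode with the encoding the page wrongly claimed,
    # which gives back the original bytes unchanged.
    raw_bytes = garbled.encode(assumed)
    # Step 2: decode those bytes with the encoding they were written in.
    return raw_bytes.decode(actual)

fixed = undo_mojibake("×åñòèòà Íîâà Ãîäèíà")
print(fixed)  # Честита Нова Година
```

This works here only because ISO-8859-1 assigns a character to every byte value, so nothing is lost in the wrong decoding; with other mistaken encodings the round trip may not be reversible.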

I'm afraid that isn't the only problem. The Hungarian isn't right either:

Boldog £j vet k¡v nok!
It isn't in ISO-8859-1 as the page claims to be, nor in CP1251 like the Bulgarian. It's in the IBM 852 encoding. It should look like this:
Boldog új évet kívánok!
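Here is a small Python sketch of how the Hungarian goes wrong, starting from the intended greeting "Boldog új évet kívánok!" and assuming the bytes are IBM 852 (Python's "cp852" codec) read as ISO-8859-1. The variable names are illustrative. It also suggests why two letters seem to vanish entirely: the code page 852 bytes for é and á land on a control character and a no-break space in ISO-8859-1, neither of which shows up as a visible glyph.

```python
intended = "Boldog új évet kívánok!"

# Bytes as the page actually stored them (IBM code page 852).
raw_bytes = intended.encode("cp852")

# What a browser shows when told the bytes are ISO-8859-1.
misread = raw_bytes.decode("iso-8859-1")

# é -> U+0082 (an invisible control character), á -> U+00A0 (no-break space).
# Dropping the first and showing the second as a plain space gives the
# text as it appears on screen.
as_displayed = misread.replace("\u0082", "").replace("\u00a0", " ")
print(as_displayed)

# Reversing the mistake recovers the greeting.
recovered = misread.encode("iso-8859-1").decode("cp852")
print(recovered)
```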
A good New Year's resolution would be to use Unicode. It's a Good Thing™.

Posted by Bill Poser at 02:15 AM