Language Log: May 2004 Archives

May 31, 2004

Construal in Houston

According to the Houston Chronicle, "an 18-wheeler carrying 30,000 pounds of eggs overturned" today, "[sending] an avalanche of eggs sailing over the side of the overpass, crushing a state Department of Transportation truck at a construction site below". No one was seriously hurt, but the clean-up was apparently a messy, smelly business. The supervisor, Gary Babb, explained that "we were able to save a few cases of eggs, in case you need any." But when his co-workers brought lunch to the clean-up crew, said Babb, "They brought us scrambled eggs, you believe that? Sick sense of humor, these people."

We've now reached the essential "linguistic hook" that you've no doubt been waiting for. What's the structure of "Sick sense of humor, these people"?

It's related somehow to "These people have a sick sense of humor" -- but it's not quite right to take just any sentence of the form "These PluralNouns have a SingularNounPhrase" and transform it to "SingularNounPhrase, these PluralNouns". Or is it?

I queried Google with the pattern "these * have a", and tried transforming the first half a dozen examples with a suitable structure. Mostly not too bad, the results. Sounds kind of like one of those parodies of Bush 41 that used to be popular:

These cars have a lot of problems. ?Lot of problems, these cars.
These places have a presence of their own. ?Presence of their own, these places.
These eggs have a wonderful tale to tell. ?Wonderful tale to tell, these eggs.
These zills have a beautiful tone. ?Beautiful tone, these zills.
These comics have a lot of sex in them. ?Lot of sex in them, these comics.
These scenarios have a common theme. ?Common theme, these scenarios.
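
For anyone who wants to try the transformation on more search results, here is a minimal sketch of the mechanical part. The function name `postpose_subject` and the regular expression are my own illustrative assumptions; they only handle definite subjects with "these", "those" or "the", and they make no judgment about how acceptable the output sounds.

```python
import re
from typing import Optional

# Hypothetical pattern: "<These|Those|The> <PluralNoun> have a <NounPhrase>."
PATTERN = re.compile(
    r"^(?P<det>These|Those|The)\s+(?P<noun>.+?)\s+have\s+an?\s+(?P<np>.+?)[.!]?$"
)

def postpose_subject(sentence: str) -> Optional[str]:
    """Turn 'These Xs have a Y.' into 'Y, these Xs.'; return None if no match."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    np = match.group("np")
    # Lowercase the determiner, since the subject no longer starts the sentence.
    subject = f"{match.group('det').lower()} {match.group('noun')}"
    return f"{np[0].upper()}{np[1:]}, {subject}."

if __name__ == "__main__":
    for example in [
        "These cars have a lot of problems.",
        "The Amish have a distinctive culture.",
        "Stroke survivors have a high risk of dementia.",
    ]:
        print(example, "->", postpose_subject(example))
```

The sketch deliberately returns None for subjects without one of the three determiners, so those cases have to be transformed by hand.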

You definitely need some sort of modifier or quantifier, though:

These films have a plot. *Plot, these films.
These women have a dream. *Dream, these women.

And it's apparently not great for the complement of have to get too long:

These teams have a concentrated focus on the Xbox Full Spectrum Warrior gaming front!
???Concentrated focus on the Xbox Full Spectrum Warrior gaming front, these teams!

Definite subjects with determiners other than these are similar in quality:

The Amish have a distinctive culture. ?Distinctive culture, the Amish.
Those Germans have a word for everything. ?Word for everything, those Germans.

but plural indefinite subjects seem worse:

Stroke survivors have a high risk of dementia. ??High risk of dementia, stroke survivors.
Africa's economic problems have a medical solution. *Medical solution, Africa's economic problems.

though by now all the examples are starting to sound like the slurred dialogue of stereotypical drunks. And anyhow, surely there are some more general principles at work here...

[Update: Haj Ross asks

Q: is this mebbe the same rule that does

Your cousin has been working here a long time. -> (*has) been working here a long time, your cousin.

I'm not sure about this. Even without the postposed subject, you get things like the punchline of the famous story about "Silent Cal" Coolidge returning to his home town in Vermont after the end of his presidency; going to the local store, selecting a few items, and bringing them to the cash register, without saying a word; the storekeeper ringing up the purchases, also in silence, taking payment and making change, and then closing the encounter by saying

"Been away."

to which President Coolidge responded

"Ayah."

and left the store.

Of course, you could also just comment "Sick sense of humor." without the postposed subject, if you were a stereotypically laconic New Englander instead of a stereotypically garrulous Texan.]

Posted by Mark Liberman at 11:40 PM

Word puzzle! Word puzzle!

The abbreviation for preposition is PREP; and if you replace the first and last letters by the preceding letter of the alphabet you get the name of a kind of cookie: OREO!

Can you think of another common grammatical term which yields the name of a common snack food when you replace the first and last letters of its common abbreviation by the preceding letter of the alphabet, boys and girls?

No, of course you frigging can't. There isn't one. Listen, I'm going to tell you something I've never told anyone else before. I hate those stupid word puzzles that they have Will Shortz doing on National Public Radio every Sunday morning with a random listener over a bad phone line and Liane Hansen gets all nervous and giggly and sympathetic and tries to help out the listener if he turns out to be the kind of moron who is unable to achieve the marvellously useful and interesting feat of thinking up a name of a farm animal that begins with the same letter as a farm implement or something.

I suppose some people would imagine a grammarian is the sort of pointy-headed dweeb who would simply love to wake up on a Sunday morning to hear someone answer a series of questions about names of cities that sound like Latin names for ecclesiastical garments or two-word phrases for types of criminal activity where each word begins with the letter the other one ends with and then be told that since they got 3 out of 10 they will get an NPR lapel pin and a paperback college dictionary. Well those people would be dead wrong. I loathe word puzzles and when Liane Hansen introduces Will Shortz my arm twitches even if I'm asleep and my hand zaps over to the RADIO OFF button so fast that it makes a swooshing noise as it burns through the air. I couldn't give a monkey's fart about word puzzles. I couldn't...

The expressive power of human language is barely adequate to convey the profound level of apathy word puzzles provoke in me. I despise them. Actually Language Log is a bit too public a place for me to share the full visceral force of my reaction; ask me about them privately some time and I'll tell you how I really feel.

Posted by Geoffrey K. Pullum at 09:56 PM

More Perfect

My grade 8 English teacher insisted on something perhaps even stupider than the obligatory omission of omissible that (that) Geoff Pullum discusses. She was of the view that one cannot combine more with an adjective like perfect that describes an absolute state. Her reasoning was that if it is correct to describe something as perfect, then the absolute has already been reached and no greater degree of perfection can be attained. This assumes a particular semantics for words like perfect, one that is plausible at first glance, but easily falsified. For instance, we can say that something is absolutely perfect, which wouldn't make sense if perfect already described the absolute state. I stand to be corrected by the semanticists, but it seems likely that when we say that something is perfect we mean that its degree of perfection falls within a distance d of absolute perfection, where the value of d is contextually determined. This allows us on the one hand to cut d down to 0 by specifying that something is absolutely perfect and on the other hand to talk about things approaching more and less closely to perfection.

I don't know if the demigods of usage have pronounced on this issue or not since I pay no attention to them, so I don't know if she got this silly idea from them, but she didn't cite authority as the basis for her views; she cited her incorrect semantic theory. Evidently she was not familiar with the Constitution of the United States, whose Preamble reads:

We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.

This is not only grammatical; it is eloquent. I can only endorse Linda Monk's proposal that the Preamble would be a far more suitable recitation for schoolchildren than the Pledge of Allegiance. Unlike the Pledge of Allegiance, it is not offensive to atheists, cannot be considered idolatrous, and expounds the values on which the United States was founded.

Posted by Bill Poser at 05:46 PM

Omit stupid grammar teaching

I talked recently with an undergraduate who told me something about her grammar instruction in the Los Angeles public schools. And in addition to the usual nonsense about not ending sentences with prepositions and never using "contractions" and things of that sort, she told me a new one. She was told that sentences like the one you are now reading are ungrammatical.

The alleged fault I'm alluding to here does not have to do with the fact that the main clause is passive, though I have often encountered absurd over-applications of the notion that passives must be avoided, so that would probably have been considered a second strike against it. No, the red sentence above has another feature that is supposed to be a grammatical sin. Sit awhile and try to figure out what, before you read on.

What my undergraduate student's high school English teacher insisted on was that you should look at any sentence containing the subordinator that and see whether omitting it would leave the sentence still grammatical. If so, then you must omit it, this teacher said. She would grade you down if you ever used that where grammar did not absolutely require it.

Think about that. The teacher is saying that these famous lines by Joyce Kilmer are ungrammatical:

I think that I shall never see
A poem lovely as a tree

She is saying the same about the first sentence of Wuthering Heights. And so on and so on. This is worse than bad English teaching. This is raving, blithering nonsense.

But I think I know where it comes from. I think it originates in an elevation of a stupid mantra to the status of a holy edict. The mantra is "Omit needless words," stated on page 23 of Strunk and White's poisonous little collection of bad grammatical advice, The Elements of Style, and elaborated on by E. B. White in the reminiscences of his introduction. It could be interpreted in a sensible way as a piece of advice for those editing their own writing: make sure you're not being too wordy (e.g., why say on a daily basis if you're trying to keep to a length limit and the phrase every day is shorter). But the teacher must have decided that the Strunkian imperative had to be obeyed literally and without question at all times, and that punishment must be meted out to those who do not obey. Fascist grammar.

If I have one ambition for my professional life, it is to do something to drive back the dark forces of grammatical fascism of this kind, to help get English language teaching back into a state where the things that are taught about the grammar of the language are broadly the things that are true, rather than ridiculous invented nonsense like that all words are forbidden except where they are required.

Posted by Geoffrey K. Pullum at 02:44 PM

Feint of heart

Nicole at A Capital Idea links to my post on participial relative clauses with whom, calling it "interesting". She adds a cautionary coda:

Warning: not for the feint of heart.

Since Nicole identifies herself as "A newspaper copy editor [who] talks shop and invites you to do the same", I'm going with the hypothesis that the eggcorn was ironic.

Google reports 3,910 hits for "feint of heart", so it's a 3,910 whG pattern, weighing in at 912 whG/gp. By comparison, "faint of heart" comes in at 151,000 whG, or 35,238 whG/gp.

Score: correct spelling 39, wrong spelling (or irony) 1.

Proportionally speaking, that is.
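
The arithmetic behind that score can be checked with a few lines. This is a minimal sketch that reads whG/gp as hits per billion indexed pages, which is my assumption about the unit; the function names are made up for illustration, and the counts are simply the ones quoted above.

```python
def implied_index_gigapages(hits: int, hits_per_gigapage: int) -> float:
    """Index size (in billions of pages) implied by a raw count and its whG/gp rate."""
    return hits / hits_per_gigapage

def spelling_ratio(common_hits: int, rare_hits: int) -> float:
    """How many times more frequent the common spelling is than the rare one."""
    return common_hits / rare_hits

if __name__ == "__main__":
    feint_hits, feint_rate = 3_910, 912
    faint_hits, faint_rate = 151_000, 35_238

    # Both pairs of figures should imply roughly the same index size.
    print(implied_index_gigapages(feint_hits, feint_rate))
    print(implied_index_gigapages(faint_hits, faint_rate))

    # The ratio between the two spellings, rounded to the nearest whole number.
    print(round(spelling_ratio(faint_hits, feint_hits)))
```

Both pairs of figures imply an index of a little under 4.3 billion pages, and the ratio rounds to the 39-to-1 score.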

Posted by Mark Liberman at 02:10 PM

In Memoriam

We don't usually post things here without a language hook, so in honor of Memorial Day I'll just put up a link to a post that I wrote last fall for Veterans Day -- though the language hook was vestigial at best -- and another to a (more linguistic) post about military modal logic.

Posted by Mark Liberman at 01:45 PM

Hey folks, "passive voice" != "vague about agency"

We've seen several examples of people who think that "passive" means "without an explicit agent". Here's another example, from Phil Dennison's weblog.

Dennison quotes the lede of an AP story:

Prosecutors dropped their case Friday against a security guard in the 2000 death of a man put in a choke hold during a shoplifting investigation — a case that took on racial overtones.

and complains that "[i]t just 'took them on,' out of the ether or the phlogiston, I guess. Just like that. Nobody’s fault, really". Dennison points out that you have to read to the end of the AP story to learn how the overtones arose, namely because of protests led by Al Sharpton. The linguistic criticism is fair enough. It's a political question whether raising the racial issue was to Sharpton's credit or due to his "fault", but either way, his agency deserves to be placed higher in the story.

However, Dennison starts his post by writing "Here is a great example of how to mislead readers by using the passive voice", and ends "Don’t use the passive voice in news stories, kids. Especially in news stories about people doing things to other people. It’s really, really dishonest."

Ironically, there's only one instance of the passive voice in the offending sentence, and it's not the one that Dennison complains about. He's annoyed about "a case that took on racial overtones", which involves an ordinary active use of the verb take, in the past tense. Removing the relative clause and making the subject definite for clarity, we get:

The case took on racial overtones.

A passive version -- at best marginally possible for me -- would be

?Racial overtones were taken on by the case.

The actual passive in the AP's lede is in the phrase

a man put in a choke hold during a shoplifting investigation

Again removing the (implicit) relative clause ("a man (who was) put in a...") and making the subject definite, we get

The man was put in a choke hold during a shoplifting investigation.

An active version would be

[Someone] put the man in a choke hold during a shoplifting investigation.

Ironically echoing Dennison's ironic complaint, we could say "he just was 'put in a choke hold' out of the ether? Just like that. Nobody's fault, really."

I don't know anything about the facts of this case, and I'm not trying to take sides for or against either Sharpton or Dennison. The AP story's lede chooses to be vague about two questions of agency -- who choked the alleged shoplifter to death? and who raised the issue of race in connection with the case? But the AP writer achieves this vagueness by using the passive voice in only one of the two cases -- and it's not the one that Phil Dennison complained about.

Posted by Mark Liberman at 01:19 PM

Red vs. Blue reloaded

Kevin Drum at Washington Monthly posts a map of red vs. blue counties, and asks his readers "Can you guess what this map represents? Click the graphic for the answer if you give up. Hint: it's got nothing to do with politics." More discussion is here, here, here, here, and so on. A larger version of the map is here (created by Matthew T. Campbell at East Central University in Oklahoma, based on data from a survey by Alan McConchie located at this site). A larger (and more scientifically interesting) set of similar maps, presenting data gathered by Bert Vaux and others, can be found here.

[Kevin Drum link via Erika at KDT]

[Prior postings by Kerim Friedman (May 28) and Irish Eagle (May 25).]

Posted by Mark Liberman at 11:47 AM

Another overnegation opportunity: yet vs. yet to

Glen Whitman of agoraphilia emailed to ask

You ended your post on the poetry of Rumsfeld with the following: "No one, as far as we know, has yet set to music the press releases of the Plain English Campaign." Isn't that one of those "double negations" you and your co-bloggers have discussed in some recent posts?

After all, I for one have not set those press releases to music, which means you can't (on a literal interpretation) say that *no one* has yet to do so.

I make plenty of mistakes -- although the rumor that Geoff Pullum gets teaching relief from UCSC in return for editing my posts is not true -- but I'm innocent in this case. The cited sentence wasn't a case of overnegation, because there's a difference between "(not) have yet V+en" and "have yet to V", and I used the first of these rather than the second.

Here's (the relevant bit of) the AH Dictionary's entry for yet:

1. At this time; for the present: isn't ready yet. 2. Up to a specified time; thus far: The end had not yet come.

We can just substitute "at this time" or "for the present" into the cited sentence, to clarify the meaning at the expense of complicating the form. Maybe "up to the present time" would be even clearer:

"No one, as far as we know, has up to the present time set to music the press releases of the Plain English Campaign."

That's just what I meant, and it has just the right number of negatives in it.

GW was thinking of a different usage of yet, which the OED gives as sense 2.c.:

2.c. Followed by an infinitive referring to the future, and thus implying incompleteness (e.g. yet to be done, implying ‘not hitherto done’; I have yet to learn, implying ‘I have not hitherto learnt’). Cf. also 5.

This yet is not a polarity item, but it does imply a negative:

(a) Kim has yet to arrive ⇔ (b) Kim has not yet arrived

My sentence was of type (b), but GW interpreted it as being of type (a) with an extra negative. In this case, I wasn't guilty -- but because this misinterpretation is only one little "to" away, at least with a verb like set whose past participle is the same as its bare stem, I probably should have chosen another wording.

Certainly plenty of others have made the mistake that GW attributed to me. There are 3,820 Google hits for "no one has yet to" -- that's 3,820 whG, or 891 whG/gp -- and all of those that I checked are overnegations (except for one or two that I couldn't understand):

(link) While no one has yet to describe England as the anti-Christ they have come close.
(link) No one has yet to compare these findings to possible early symptoms in men.
(link) ..no one has yet to beat my $12,000 pc
(link) The property... has been advertised for sale for nearly a year and a half, and no one has yet to purchase it
(link) No one has yet to figure out why ... they got it in their heads to film a "real lemming migration" ...
(link) No one has yet to sign on to star in the film.
etc., etc.

So we can add "no one has yet to" to the case of "fail to miss", as an example of a phrase that is almost always used to mean the opposite of its compositional meaning.

I guess that "construction grammar" implies that this is possible and even normal, but it still seems like a mistake to me.

A couple of other "yet" notes in passing.... We ought to be able to unify the AHD's two senses of this yet with a somewhat more abstract definition: "up to an implicitly specified time", where the time can be past (sense 2) or present (sense 1). The OED does this with its sense 2.a.:

2. a. (a) Implying continuance from a previous time up to and at the present (or some stated) time: Now as until now (or then as until then): = STILL adv.

This yet has become a "polarity item" (though other senses have not): the AHD's examples are negative for a reason. We no longer say "It's ready yet" meaning "it's still ready" -- we only say "it's not ready yet" or "is it ready yet"? There are plenty of other interesting semantic issues associated with the word yet, not least the question of how far to go in unifying its protean spread of structures and senses. Not all of the examples below are currently colloquial, but it's just as important to explain what we don't (or no longer) say, as what we do:

He may yet change his mind.
The Sekhti came yet, and yet again.
My sandals were worse yet.
Averse alike to Flatter, or Offend,/Not free from Faults, nor yet too vain to mend.
The tracks include..‘To Know Him is to Love Him’ (with David Bowie on saxophone, yet!).
A yet-warm corpse, and yet unburiable.
The swampy patches of yet unreclaimed forest.
This is the queerest thing yet!
Are we there yet?
Even yet not quite finished.
O merchants, tarry yet a day Here in Bokhara.
But there were..extensions of this practice as yet but little noticed.
As yet the Duke professed himself a member of the Anglican Church.
He was one of the numerous party of yet walkers in the world.
In the yet non-existence of language.
The splendid yet useless imagery.
Though his belief be true, yet the very truth he holds, becomes his heresie.
Surely I could always be that way again, and yet/ I've grown accustomed to her looks...

(quotes mostly but not all from the OED's citations)

Posted by Mark Liberman at 10:26 AM

A Prison Riot Over Strict Transitivity

Well, O.K., not a full-scale riot, but for a few minutes there I wondered if I was about to be in the middle of one. This happened years ago, but I was reminded of it by the recent Language Log posts on Strict Transitivity. I was teaching Introduction to Linguistics at a local maximum-security state prison in Pittsburgh, for the University of Pittsburgh's earn-a-degree-in-prison program, and my co-teacher and I had arrived at the topic of transitivity.

We asked the students whether some verbs had to be transitive. Yeah, a couple of them said, some verbs are only transitive. Like what? Well, there's find, that's always transitive. No, said another student, you can use find intransitively. "NO you can't!", said the others. "YES YOU CAN" (he was shouting by this time), you can say "I looked all over the house for it, but I didn't find there, and finally I found in the yard." "NO YOU CAN'T!" (a growing number of the other twenty or so men in the class also began shouting) "THAT'S STUPID!" The holdout leapt to his feet and started waving his arms around: "YOU CAN TOO! YOU CAN FIND HERE, FIND THERE, IT'S INTRANSITIVE, IT'S FINE!" "OH NO IT ISN'T!" But at least the others stayed in their seats, so in the end Sasha and I did not have the opportunity to find out if the instructions we'd been given during the required orientation for working in the prison could actually be carried out ("In case of a threatening disturbance, show no fear, walk calmly to the wall and summon help by pressing the red button there"). (Yes, there was an actual red panic button attached to the wall of each classroom in the prison school.)

This incident left me with two main reactions. First, while it was in progress I envisioned the newspaper headlines -- "Volunteer Teachers Injured in Prison Riot over Transitivity" -- and thought that maybe that would finally convince the general public of the importance of training in linguistics. And second, I realized that I will never, never, never see a class of ordinary undergraduates getting so excited about a bit of language structure. It's not that I yearn for classroom riots, but I sure wouldn't mind transplanting some of the intellectual enthusiasm of my inmate students to my regular classrooms. (I've tried telling my classes that I wish they were more like prison inmates, but this doesn't seem to have the desired effect.)

Posted by Sally Thomason at 10:16 AM

May 30, 2004

Just as good for hate

It was the words "in Arabic" that truly shocked me. The source: Jeffrey Goldberg's long article "Among the Settlers" in the May 31 New Yorker (related slide show with audio here). The scene: outside Hadassah House, home to several families of Jewish settlers in Hebron, across the street from the Córdoba School for (Palestinian Arab) girls.

A group of yeshiva students appeared, walking in the direction of the Tomb of the Patriarchs. ... They had the wispy beards of young men who have never shaved.

Two Arab girls, their heads covered by scarves, books clutched to their chests, left the Córdoba School, and were walking toward the yeshiva boys.

"Cunts!" one of the boys yelled, in Arabic.

"Do you let your brothers fuck you?" another one yelled.

Raw, hostile, poisonous, overt, sexually-charged, ethnic hatred. And these young Hebrew-speaking men, isolated by soldiers from virtually all contact with Arabs, had taken the trouble to learn enough Arabic to be able to howl their filth in their victims' native language.

So many people talk about the need to speak a common language so that we can all get along — the dream of Esperanto. Never think that sharing a language is either a necessary or a sufficient condition for being able to empathize with other human beings or to treat them with humanity. The key difference between human language and, say, the hyperspecialized dance "language" of honey bees is that a human language can be used for propositional communications of any sort. They are infinitely adaptable. They are just as good for expressing hate as for anything else.

Posted by Geoffrey K. Pullum at 07:05 PM

Dog Latin of the day

Tom Friedman starts his NYT column today with this: "The American public has been treated to such a festival of mea, wea and hea culpas on Iraq lately it could be forgiven for feeling utterly lost."

In these latter days, that's about as good as we're going to get. For a glimpse of Dog-Latin as it once was, see Stevens' definition of a kitchen, from the entry for Dog-Latin in E. Cobham Brewer's Dictionary of Phrase and Fable:

As the law classically expresses it, a kitchen is "camera necessaria pro usus cookare; cum saucepannis, stewpannis, scullero, dressero, coalholo, stovis, smoak-jacko; pro roastandum, boilandum, fryandum, et plum-pudding-mixandum ..." A Law Report (Daniel v. Dishclout).

[Update: John Kozak emails:

The UK satirical magazine "Private Eye" has a running Dog Latin feature called "That Honorary Degree Citation In Full". Not online, sadly (as far as I know), but here's the current issue's offering:

SALUTAMUS BEII GEII TRES FRATRI CANTORES IN VOCE FALSETTO NOMINE BARRIUS, ROBINUSQUE ET MAURICIUS EHEU NUNC MORTUUS (QUONDAM UXOR 'LULU', DIVA CELEBRATISSIMA GLASWEGIENSIS CUM CARMINE POPULARE 'CLAMATE!') TRANSFORMAVERUNT MUNDUM MUSICAE DISOTECHNIS CUM JOHANNES TRAVOLTUS HOMO IN TUNICO BLANCO IN ARTEFACTO CINEMATICO 'FEBRUM NOCTIS SATURNALIS'.

John adds that

Now I come to think of it, more contemporary Dog Latin can be found in the spells in the Harry Potter books, of course (the Latin translation of HP&TPS leaves the spells "untranslated": they should be in mangled Greek, it seems to me).

]

Posted by Mark Liberman at 04:18 PM

Plummy

Americans' accents may be "flat", but at least they're not "plummy".

According to a 1999 BBC News article, Radio 4 is said to have dumped an announcer for excessive plumminess:

Outspoken journalist Boris Johnson claims he has been the victim of discrimination because his accent is too "plummy". ...

The Daily Telegraph columnist and newly-appointed editor of The Spectator magazine believes he has been the victim of what he calls "vocal correctness".

The article goes on to relay the suggestions of Gregory de Polnay, head of voice at the London Academy of Music and Dramatic Art, about how Johnson could "reinvent" his accent:

He said the journalist's voice was that of someone "used to commanding, used to being heard".

Standardising Johnson's vowel sounds would be the first hurdle.

"All those rather clipped vowel sounds that go with that accent we could iron out," said Polnay.

It's interesting that the BBC can write without irony about "standardising" someone's Received Pronunciation vowels. And to talk about "ironing out" vowels makes it sound like they are not "flat" enough -- is there a transatlantic flatness continuum here, with Americans having too much of it, and upper-class Brits not enough? De Polnay adds that

Johnson's "nasality" would also have to be addressed, ensuring he does not push the sound of his voice down through the nose.

"Somewhere along the line somebody has said that [nasal] sound can appear to be more authoritative," he said.

Perhaps it's pushing sounds "down through the nose" (from where?) that makes them "plummy"? But the OED is quite specific about what plummy means, and the nose is not mentioned:

1. Consisting of, abounding in, or like plums.
2. fig. a. Of the nature of a ‘plum’; rich, good, desirable.
2.b. Of the voice, then of sound gen.: thick-sounding, rich, ‘fruity’; indistinct; with bass predominating.

However, the citations for sense 2.b., which go back to 1881, seem to refer to personal or stylistic characteristics, or even the sound of certain kinds of amplifying circuits, rather than to social class. Class aside, it's clear that sometimes plumminess is a Good Thing and sometimes (despite sense 2.a.) not:

1881 Punch 23 July 25/2 The same aged lover was bidding, with rather a ‘plummy’ voice, the More-than-Middle-Aged Heroine ‘good bye for ever’.
1947 Jrnl. Inst. Electrical Engin. XCIV. IIIA 446/1 Such distortions can be tolerated..without serious loss of articulation, though the speech will usually sound rather ‘plummy’ and unnatural.
1951 K. HARRIS Innocents from Abroad 199 The rich, plummy voice of [actor] Edward Arnold.
1955 Times 3 May 14/4 A disc which sounds plummy and muffled in tone.
1965 G. MCINNES Road to Gundagai xi. 197 His voice..was wonderfully plummy and Edwardian.
1970 Daily Tel. 1 Sept. 9/5 All India Radio modelled..on the BBC, even down to the plummy accents of its announcers.
1975 City Press 1 May 16/5 Her duchess on the make is a finely pointed performance, the plummy vowels contrasting splendidly with consonants periodically marred by the lack of false teeth.
1977 Early Mus. Oct. 549/3 The plummy..tone [of Flemish virginals] is evidently more popular than the musically versatile but astringent Italian virginal.
1978 Gramophone Feb. 1439/1 His tone is mellow, but again, as in the Waltzes..the sound sometimes seems a bit plummy and close.

In contrast, the Boris Johnson story emphasizes class associations, and so do most of the comments in a forum devoted to Brian Sewell's candidacy for being "more annoying than Mick Hucknall". The nomination features the plumminess of his accent as a key source of annoyance:

First, his unfounded acid criticism of just about everything: "Oh, of course one cannot take Mozart seriously, since he didn't have an overblown plummy accent like one's own."

Second, his overblown plummy accent. Like Jeremy Spake, I suspect that this is a deliberately exaggerated affectation which makes up for his lack of other noteworthy features.

The other forum commenters echo the class associations:

...vain, self obsessed snobby little bastard...

posh talking bastard. thinks he is better than anyone else.

Sewell acts as if he's little lord Fontleroy.

Just because he went to Public school and practiced Received Pronunciation behind the bike sheds ... doesn't make his opinion any more valid than any other yobbo.

I've got Brian Sewell down as a massive wind-up merchant.
The plummy voice CAN'T be real [...] & his comment on the subject of common people, other cultures, non-arty types etc. are just inflammatory for their own sake... As for posh? Nah, not with a name like Brian...

For those who (like me) have never heard of Brian Sewell, here's an (allegedly typical alleged) transcript, showing the content as opposed to the accent:

'Brian Sewell': "So, how does one get to Gateshead?"
Gatesheed Coonsil: "Well, you can take the train out of King's Cross..."
'BS': "The train?! and travel with.... *the masses*....?!?!"
GC: "I think you'll find that's what everyone else does..."

'Brian Sewell': "I've heard that Gateshead is merely a small village, an insignificant backwater inhabited by uneducated, illiterate neanderthals who still live in caves? Is that true?"
Coonsil: "Naah, I think you'll find that's Sunderland."

You can check out the accent as well as the content in more detail on this Brian Sewell satire (?) site, where you'll find that his voice is not at all "plummy" in the OED's sense of "thick-sounding ... with bass predominating". Yet people today seem to accept "plummy" as a description of his way of talking, suggesting that it's the social class rather than the sound quality that has become primary.

[Note for Americans who (like me) don't follow British politics very closely: Boris Johnson has not suffered too much after losing his Radio 4 gig: Simon Hoggart wrote a few days ago in the Guardian that "in the fullness of time [Boris Johnson] will probably become prime minister". So apparently plumminess is not terminally out of fashion in the U.K. -- assuming, as I do, that Johnson did not take Gregory de Polnay's 1999 advice. I'm not sure whether Hoggart is serious, however, since most of his article deals with aspects of British culture that are opaque to me, such as what it means that Johnson "produced a used envelope and tossed it onto the table of the house", or why Mickey Fabb "will be polishing [Boris'] bat with linseed oil".]

[Update: Margaret Marks emailed a comment on Brian Sewell:

I absolutely agree that his voice is not what's usually called plummy.
Sewell was the boyfriend of Ant(h)ony Blunt, the Keeper of the Queen's Pictures who turned out to be Philby's third (or fourth?) man, i.e. a spy for the Soviet Union. I think he was about 80 when the news came out. His house was surrounded by journalists, who were dealt with by Sewell, unknown at that time, who obviously loved the limelight and had a variety of longwinded ways of saying 'No comment'. He does or did camp it up a lot, though. Deliberately exaggerated, as you say. A very upper-class accent.

A couple of relevant links are here and here.]

Posted by Mark Liberman at 01:03 PM

Strict transitivity

It sounds like Geoff's recent posts on transitive verbs have inspired a flood of activity in which people are seeking real linguistic examples related to a syntactic phenomenon, and that's all to the good. However, a quick perusal of the discussion suggests that people are taking an overly simplistic approach to the phenomenon in question.

There is a whole body of literature, dating back at least to Fillmore's work on definite and indefinite null complements, demonstrating that it is dangerous to think of the presence -- and particularly the absence -- of direct objects as a single phenomenon. For a start, "habitual" or "characteristic" uses are well known to permit even the most classically "rigidly transitive" verbs to appear without an object ("Pussycats eat, but tigers devour"); in addition, a verb's degree of flexibility with respect to omitting objects is influenced by aspectual considerations and the strength with which it selects the semantic category of its argument. Discourse factors can play a role as well.

All this is to say that "transitivity", in the sense of a dictionary's v.i. versus v.t., or in the sense of strict subcategorization frames, is not all it's cracked up to be, and it hasn't been for quite some time, at least to lexical semanticists who work at the interface with syntax. I haven't done a careful analysis, but the counterexamples being sent to Geoff seem like they fall into categories we already knew about.

I would say more, but theoretically this is a holiday weekend and I'd rather spend my time playing than discussing.

Posted by Philip Resnik at 12:42 PM

Flat Yanks, sharp Brits?

On May 14, when Bob Mondello reviewed Troy on NPR, he said:

"As with most sword-and-sandal epics, go indoors and everything's suddenly about statuary, and torches, and an international cast that's trying to reach common ground on accents; here the kings hail from Scotland and Ireland, and the followers from London's West End and Australia. Happily this makes Pitt's Achilles sound like the outsider he's supposed to be, even when he remembers to round his vowels..."

On the same date, when Carrie Rickey reviewed Troy in the Philadelphia Inquirer, she wrote

"Just because Pitt is a hair actor, tossing highlighted tresses for emphasis, doesn't mean he's a bad actor... But when Pitt opens his mouth, the voice that emerges is prairie-flat, lacking the thunder-on-the-palisades sweep and resonance of O'Toole and Bana. When Pitt speaks, you don't think Troy; you think, as a friend says, Troy Donahue."

There seems to be some sort of dialect=shape metaphor in the background here: American voices are flat, British (and Scottish and Australian) voices are not.

Although I suggested in an earlier post that Mondello might have been referring to Pitt's artificial r-lessness, that's probably not it. There's a whole complex of inter-related semi-synesthesias going on here, several different metaphorical extensions of "flatness" -- to sounds, to articulations, and especially to social evaluation.

An IMDB review of the 1955 movie Seven Cities of Gold complains about Richard Egan as "Jose Mendoza" that

Egan is about as Spanish as William Bendix!! His flat American accent and obviously non-Latin coloring create a sensory paradox when he is onscreen.

A bit of Googling will turn up hundreds of other examples of this sort of thing. But what is flat about American accents, exactly?

A flat voice might be one that is emotionless or uninflected, and American speech is stereotypically uninflected by comparison to British speech. It's easy to find lists of (empirically unsupported) national stereotypes that depict Americans (especially men) as using little intonational modulation, for example this one:

These are some of the more commonly-held ideas about different cultures: German men speak fairly slowly, with a deep voice; everything sounds serious! Scandinavians appear more quiet and modest: soft, gentle tones but with clearly varied intonation. Italians, Spanish and Greeks always seem to be excited about something. Americans “chew” their words and frequently have very deep voices with “flat” intonation.

However, often it's the vowels or consonants rather than the intonations that are perceived as "flat", as in Mondello's review, or this Wired article about call centers in Bangalore:

"You try to place the accent. Iowa maybe? No, the "a" sound is too flat. California? Maybe it's a crowded call center in some business park in Kansas City.

But Betty is actually calling from Bangalore, and her real name is Savita Balasubramanyam. ... And her perfect American accent is the result of rigorous training and an employer-encouraged addiction to Ally McBeal."

Then again, this interview with Tom Friedman about his visit to call centers in Bangalore features a clip of an "accent neutralization class" in which the instructor uses the phrase "flat the 'tuh' sound" to mean "flap and voice prevocalic non-pre-stress /t/":

INSTRUCTOR: All right, class. I want you to take out your books and I'm going to give you a passage. Remember, the first day I told you that the Americans flat the "tuh" sound. You know, it sounds like an almost "duh" sound, not keep it crisp and clear like the British. So I would not say "Betsy bought a bit of better butter" or "insert a quarter in a meter." But they would say "insert a quarder in the meder," or "Beddy bought a bit of bedder budder."

So I'm just going to read it out for you once, and then we'll read it together. All right? "Thirty little turtles in a bottle of bottled water. A bottle of bottled water held 30 little turtles. It didn't matter that each turtle had a round metal ladle in order to get a little bit of noodles." All right, who's going to read first?

Friedman explains that the same instructor

...also does British accents, American accents. That was actually for a Canadian call center. They were actually working on a sort of flat North American Canadian accent.

where he is not talking about T's but about some overall impression of flatness.

A discussion of accents in Hebrew suggests that American R's are "very flat":

"Don't worry about how you sound" was my mother's advice. She had heard a political talk in Hebrew on the radio, by someone whose masculine sounding voice spoke with a heavy American accent, including very flat "r"s. After the speech, the announcer said: "You have just heard Golda Meir speak." My mother suggested: "Listen to how she speaks. She's prime minister of Israel and her Hebrew sounds so American."

Flat vowels, flat T's, flat R's, flat accents. What are these people talking about?

As usual, reading the OED helps us to trace the metaphor back to its sources, which turn out to be multiple. Among the OED's senses for flat (the adjective) are:

4.b. Engraving. Wanting in sharpness...
4.c. Of paint, lacquer, or varnish: lustreless, dull.
4.d. Photogr. Wanting in contrast.

7. Wanting in points of attraction and interest; prosaic, dull, uninteresting, lifeless, monotonous, insipid. Sometimes with allusion to sense 10. a. of composition, discourse, a joke, etc. Also of a person with reference to his composition, conversation, etc.

9.a. Wanting in energy and spirit; lifeless, dull.

10. Of drink, etc.: That has lost its flavour or sharpness; dead, insipid, stale.

11.a. Of sound, a resonant instrument, a voice: Not clear and sharp; dead, dull. Also in Combs., as flat-sounding, -vowelled.
11.b. Music. Of a note or singer: Relatively low in pitch; below the regular or true pitch.

There are several different metaphors here: "flatness is lack of variation"; "flatness is lack of some (desired) feature"; "flatness is lack of attractiveness"; "flatness is lack of resonance"; "flatness is pitch lowered relative to a reference value". All of these can be applied to sound.

There is a tradition (now obsolete) in phonetics of using "flat" (usually as opposed to "sharp") to describe certain differences in sound quality. For example, the OED cites

1874 R. MORRIS Hist. Eng. Gram. §54 B and d, &c. are said to be soft or flat, while p and t, &c. are called hard or sharp consonants.

This is the sense of "flat" as "voiced" (i.e. accompanied by vocal-cord vibration) that the call-center instructor in Bangalore employed, carrying on the tradition of 19th-century British phonetics, which was developed largely in order to help teach "commercial millionaires" and colonial subjects to speak "properly."

The OED also references two other similarly-abandoned traditional uses of "flat" in phonetics. One, due to Henry Sweet, refers to vowels made with a level or "flat" tongue shape, not raised or lowered in either the back or front:

1934 H. C. WYLD in S.P.E. Tract XXXIX. 609 The tongue may be so used that neither back nor front predominates, but the whole tongue, which lies evenly in the mouth, is raised or lowered. Vowels so formed are called ‘mixed’ by Sweet, but I owe to him also the term ‘flat’ which I prefer as more descriptive. The vowel [ʌ] in bird is low-flat.

The other one is due to Jakobson, Fant and Halle. This one refers to sound rather than tongue shape, invoking the contrast with "sharp" again, but in terms of physical measurements of frequency rather than subjective impressions of sound quality:

1952 R. JAKOBSON et al. Prelim. Speech Analysis 31 Flat vs. Plain...Flattening manifests itself by a downward shift of a set of formants.

However, I think it's clear that the references to "flat American accents" don't describe anything at all about the intrinsic quality of the sounds, but are pure social evaluation: "wanting in points of attraction and interest; prosaic, dull, uninteresting, lifeless, monotonous, insipid". The American middle classes have always had self-esteem problems.

Posted by Mark Liberman at 11:27 AM

May 29, 2004

The poetry of Donald Rumsfeld and other fresh American art songs

As we know,
There are known knowns.
There are things we know we know.
We also know
There are known unknowns.
That is to say
We know there are some things
We do not know.
But there are also unknown unknowns,
The ones we don't know
We don't know.

This and six other songs with lyrics by Rumsfeld are available on CD here. No one, as far as we know, has yet set to music the press releases of the Plain English Campaign. [via kaleboel].

Posted by Mark Liberman at 10:38 PM

Natural language and artificial intelligence

I hesitate to say a good word for modern software (it might encourage the further production of the sort of bug-ridden word processors I use so much), but I think Craig Silverstein (Google's technology director), in the interview that Mark Liberman recently cited, might actually be underestimating recent successes at simulating aspects of common sense. Try typing "Simmons" and "Beautyrest" into the Google search box (notice, no use of the word mattress) and watch the mattress ads spring up in the margin of your display of search results. If that isn't effective computer simulation of common sense I don't know what would be: the name Simmons is common enough (one might imagine it thinking), but there is a Simmons firm that has trademarked the name Beautyrest, and it's the name of a line of mattress products, so...

The classic dreams of GOFAI (Good Old-Fashioned Artificial Intelligence) may have been quixotic. But the work on impossible projects about replicating common sense (like inferring in context that someone who goes into a restaurant probably intends to eat a meal there, and will be paying for it, and so forth) used programmers who stayed in the business and ended up working on more sensible things that turned out to be considerably more successful. Google's subject-linked advertisements really do quite often turn out to be relevant, even if you do get silly mistakes based on superficial word similarities sometimes. This is not GOFAI, but it is reasonably called AI, and it is not completely brain-dead. I'm fully aware that what's really going on may be just a matter of mattress-sellers including words like Simmons and Beautyrest on the lists of words that they pay to have their ads associated with, which renders it almost stupidly simple; but my point is not that wondrously clever techniques of reasoning are really being used, but rather that it takes so little of this sort of elementary rigging and accessing of files and lists to make it appear that the system is intelligently responsive to one's interests, and the folks at Google are doing it quite well.

And it's not just Google. Yesterday I checked the details of a book on Amazon.com and I noticed that not only did it have some suggestions for me about the usual kinds of books I buy, it also asked me if I would like the book I had just looked up to be delivered to me the next day, and told me that if I placed my order within one hour and ten minutes that could be done. While I stared at this, the screen did a sudden update and "one hour and ten minutes" changed to "one hour and nine minutes". Now, the software accomplishment involved shouldn't be compared to proving Fermat's last theorem, but it looks intelligent. It's a closer approach to helpful, relevant, and timely information than I've had in most phone calls to retailers.

What is notable about these advances in advertising and retailing software, though, is that the symbolic computational linguistics of the 1980s has contributed nothing to them. None of the small wonders of convenience and user-friendliness found on the very best of the commercial websites involves anything you could reasonably call natural language processing.

(I know, I know, AskJeeves.com boasts of natural language processing, and says its product is "able to understand the context of what you are asking" and can offer "answers and search suggestions in the same human terms in which we all communicate". Puh-lease! can you say pa-thet-ic? Ask Jeeves "Show me some cars that are not Japanese." The results are all about Japanese cars. The NLP claims about AskJeeves appear to be a load of nonsense.)

Perhaps Silverstein is right to talk in terms of it being centuries before you can talk to a computer at the library reference desk and find it as intelligent as the human being who currently staffs it. But I'm not prepared to concede that yet. I think in due course we have to get back to real natural language front ends: it is possible to pull together (1) literal sentence understanding based on grammatical analysis, and (2) modern computational techniques of spotting likely relevance. But no one is trying to do it at the moment.

The people working on (2) are doing brilliantly. (Notice, the spotting of likely spelling mistakes in Google and advanced word processors is an aspect of (2), and deserves some real respect.) But work on (1) was all but abandoned in industry some ten years or more ago. It didn't have to be. Nothing was discovered that made people decide syntax or literal meaning were impossible to grapple with algorithmically. I am not ready to believe that centuries will have to elapse before I can send email to a shopping robot "Get me price information on Simmons Beautyrest mattresses from at least five stores in Northern California" and get some useful data back in response. Sure, there are some syntactically irresolvable ambiguities in there (is it mattresses from five stores, or information from five stores?). But Google's ad-relevance technology could pretty much figure out what I want even without analyzing the grammar. If we brought even just a little grammar together with some educated guesswork and smart search technology, we could have something really remarkable. Not intelligence and empathy — we're going to be talking about something vastly less intelligent than a cockroach here, and an equivalent level of empathy, i.e., none — but it could still be something very effective, something that you could easily mistake for rapid and effective assistance from an intelligent entity that had understood the gist of what you said.

Posted by Geoffrey K. Pullum at 04:41 PM

Brad Pitt in Troy: r-lessly round or r-fully flat?

Erika at Kittenishly Doomy Thoughts asks

What the hell was up with Brad Pitt's accent in Troy? It seemed to be entirely devoid of postvocalic /r/, but he didn't have any.. other.. features of r-less dialects. It sounded like some sort of affectation, but I had no idea what he was hoping to convey with it.

Could this have been what Bob Mondello was talking about on NPR when he said that "Pitt's Achilles [sounds] like the outsider he's supposed to be, even when he remembers to round his vowels"?

A lot of people are talking about how Brad Pitt talks, but so far, Erika is the only one (among those I've read or heard!) who's said anything specific and coherent about it. Here's a small fraction of the Brad Pitt accent discussion from (here and there in) FT forums:

Kithy: I'm totally going to see this. I don't care who's in it or how lame Brad's accent is...
Elle Driver: I am so gonna be there all those pretty men in skirts, I was laughing at Brad's accent during the trailer though...
Sharmila: What is UP with Brad Pitt's accent? It's not just me who starts cracking up during the trailer, right?
M One: And yes, even in the short previews, Brad's accent slippage is pretty bad. Skirts outweight accents for me though.
Talamasca: Oh good, I thought I was the only one who thought Brad's accent was lame in the commericals/trailer.
Satine O'Hara: It looks like an amazing film to me- besides Brad Pitt's accent which can be described as shaky at best...
Jcpdiesel21: I am very frightened of Brad Pitt's accent.
radguurl: ...Brad Pitt's terrible, terrible, terrible accent.
Knesaa: I rather liked Brad Pitt's Achilles also, shitty accent or no.
Cassandra423: Good thing he had those fabulous muscles to distract me from his horrible accent.
Sally Albright: From the trailer, I was concerned about his accent, but in the movie it wasn't that bad. Or maybe I become accustomed to it.
Elle Driver: I liked Brad Pitt as Achellies, but his accent was really, really bad
Binky: The accent didn't even bug all that much. I liked him. Not exactly a shining performance, but he wasn't the stand-out suckage of the film.

If I have a chance later on, I'll see if I can grab some Pitt accent audio from the online trailers -- so far all I can remember hearing, in a couple of TV ads, is the musical background for fleets of ships and flashes of battles.

[Update: Here are .wav files for Pitt's contributions to the scene in which Agamemnon confronts Achilles over possession of Briseis. Pitt is indeed consistently r-less in this scene, just as Erika says. So when Mondello said "round his vowels", did he mean "leave out syllable-final r's"?

(link) Perhaps the kings were too far behind to see.

(link) The soldiers won the battle.

(link) Be careful, king of kings.

(link) First you need the victory.

(link) You want gold? Take it.

(link) It's my gift. To honor your courage -- take what you wish.

(link) No argument with you, brothers, but if you don't release her, you'll never see home again.

(link) Decide!

There are some other interesting things about Pitt's speech in this scene, which I'll have to pick up later, as my youngest son and I are off to the zoo at the moment.]

Posted by Mark Liberman at 10:09 AM

May 28, 2004

Squelch squelched, predecease whacked, induce terminated

Proposed strictly transitive verbs have half-lives as short as synthesized high-end transuranic elements. Lance Nathan at MIT squelches Adam Albright's suggestion squelch with a sentence off the web (http://thalo.net/freedom.html): "I have been a participant in other online forums, grew dissatisfied with the rampant tyranny of political correctness and impulse to squelch, and therefore acted to whack together my own modest forum site." So much for squelch. He also kills predecease: "Where the husband predeceases, neither widow nor children can claim a right in any part of the heirship moveables" (Erskine, Inst. Law Scot., 1765, via OED). And M. Crawford writes with this example of induce with implicit object: "I'm really hoping I go before they have to induce because I've heard so many bad things about induction and how much more painful it is." (That's about inducing labor to hasten childbirth, of course.) So, though one could quibble, that looks like three of Adam Albright's potential strictly transitive verbs gone. And more counterexamples are coming in over the transom all the time.

Posted by Geoffrey K. Pullum at 06:13 PM

Garfield learns about directives

The traditional statement about imperative clauses is that they express commands: they are for bossing people around, telling people what to do. This is not so. Imperatives express a much wider range of meanings, referred to in The Cambridge Grammar as directives. In today's Garfield cartoon strip (May 28, 2004) the difference comes out clearly.

"Nobody tells me what to do!", Garfield is thinking.

Then his owner/servant Jon appears with a heaped platter and says, "Have something to eat."

And the gluttonous cat says to himself, "Well, this is a bit awkward."

Not at all, Garfield. Tuck in. Jon's utterance is (syntactically) an imperative, but it is not (semantically) a command. It's just a directive. Directives can be polite invitations ("Come in"), suggestions ("Have a seat"), or even just good wishes ("Have a nice day!").

Posted by Geoffrey K. Pullum at 05:16 PM

What is linguistics, and why do they embarrass your international customers?

In the May 28 Ecommerce Times, Naseem Javed has a "Viewpoint" article entitled "Six Questions to Spur Web Success". Question number 5 is:

5. What is linguistics, and why do they embarrass your international customers?

A site name in one country can mean something entirely different when it circumnavigates the globe. How do you tackle such language issues? The answer is to acquire skills and a deeper understanding of global communications. Even if you're a regional player, your sites are still visible and exposed to the entire world. Cyber branding is an extremely global phenomenon.

Note that "A site name in one country can mean something entirely different when it circumnavigates the globe" is, I think, an Escher sentence of a new type, though I can't quite get it to sit still long enough to be sure.

There are some other great lines in this article:

One must determine the desired size, personality and length of the name, plus the choice of alpha characters, as each emits its own unique signals.

Customers must allow a name brand to settle in their minds before they give you cash.

When trying to process millions of silly and randomly structured names, the mind becomes overly tired.

The future can be pretty clear if it is planned today.

[Thanks to David Donnell for emailing the article reference].

[Update: Margaret Marks emails a link to a critical discussion (at wordlab) of an earlier Javed column: "... we feel compelled, in the interest of our profession, to debunk this bunk point-by-point...". Read the whole thing.]

Posted by Mark Liberman at 04:22 PM

More conjectures and refutations on strict transitivity

My recent suggestions for verbs so rigidly transitive that they always have an overt direct object (where it is permitted) were have and keep. There may be some others; but not the ones some people have been sending me.

Andy Durdin was immediately reminded (as I should have been) of the words from the old Church of England marriage service from the Book of Common Prayer:

"I _______ take thee _______ to my wedded wife, to have and to hold from this day forward, for better for worse, for richer for poorer, in sickness and in health, to love and to cherish, till death us do part, according to God's holy ordinance; and thereto I plight thee my troth."

However, I think this is one of the cases where the construction involved requires a missing object. There are lots of these constructions, and in every case it is fine for the object not to be there; but it is also fine for the object of a preposition like at not to be there:

I want a wife I can love ____ for the rest of my life
I want a copy of this that I can look at ____ whenever I want.
That diamond would really be something to have ____ if you wanted to attract thieves.
This would be a useful goal to aim at ____ .
Peace of mind is a wonderful thing to have ____.
That Rembrandt is a wonderful thing to gaze at ____.

What these examples show is that in some kinds of sentence you are required to have a missing noun phrase. We're talking about whether or not you can leave it out in a context where it would normally be permitted.

That doesn't mean I was right about have. I wasn't, as email correspondents and bloggers have been jumping all over me to show. Keith Ivey sent me by email a Googled sentence, The world we live in is a world of those who have and those who don't, which seems fine. And Jonathan Mayhew offers an attested example, from Billie Holiday's "God bless the child": Mama may have, papa may have / But God bless the child that's got his own. Douglas Davidson points out that the Bible also says For he that hath, to him shall be given: and he that hath not, from him shall be taken even that which he hath, and he is quite right, it doth. Language Hat emailed me to point out that there is a book by Ernest Hemingway entitled To Have and Have Not. All in all, that pretty much does it for have.

Sasha Albertini suggested give and take might be obligatorily transitive, but I don't think so: The trouble with you is that you know how to take but you have no idea how to give.

Still, there may be some verbs that are. Adam Albright proposes a list of verbs that are about as solid as I can imagine coming up with. He doubts that an implicit object would ever sound good with any of these verbs:

attain, attribute, cause, comport, delineate, depict, eclipse, impute, induce, portray, predecease, resemble, squelch, subsume, supercede, utter

The nice thing about this little puzzle is that it is highly empirical: you can always find one of your conjectures is blown away completely by a single counterexample. I once thought abandon was a solid case, and then I leafed through a copy of Atlantic Monthly at an airport bookstall one day in the 1990s when Newt Gingrich was threatening his Contract With America, and I read that the conservatives in the Congress were implacably devoted to the destruction of bloated Federal programs of expenditure: "Where necessary, we must not merely revise, we must abandon." I put the magazine back on the rack and got on the plane knowing that I couldn't cite abandon as strictly transitive any more. And as Adam says:

I also thought "await" might be one, but then I discovered this quote from T.H. White's "Once and Future King":

"There is nothing," said the monarch, "except the power which you pretend to seek: power to grind and power to digest, power to seek and power to find, power to await and power to claim, all power and pitilessness springing from the nape of the neck."

This is a lovely example of a genuine context in which plenty of transitive verbs could be coerced into occurring without their usual objects. And a similar case could readily be imagined that would remove several of the verbs on Adam's list above:

If you want to be an artist, it is not enough just to splosh paint around; before your abstractions can mean anything you must first study representational art — you must learn how to delineate, to portray, to impute, to depict.

I think that is convincingly grammatical (not that it's a genuine citation: sometimes a syntactician cannot Google, but must invent). As yet I am not able to see it as likely that objectless occurrences will be found for attain, attribute, cause, comport, eclipse, induce, keep, predecease, resemble, squelch, subsume, supercede, or utter. But who knows? It may merely be that I lack the necessary imagination and haven't yet spotted cases that are out there somewhere in textland waiting to be Googled up.

Posted by Geoffrey K. Pullum at 01:40 PM

The great vow shift

For those of you who are planning to be in Philadelphia within the next few weeks, here's a short review (sent by Anna Papafragou to a local mailing list) of a play with a number of linguistic connections.

The Adrienne theater is now showing a play called 'Speech acts' written by
Claire Gleitman (Lila and Henry's daughter). This is a play about the
successful career and not-so-successful marriage of a female linguist. It
is a rare opportunity to see a play where the characters agonize over
giving LSA papers, academia, prescriptive grammar and the Great Vow
Shift (after all, the play is also about a troubled marriage). The play is
very funny - I saw it last night at the premiere and really enjoyed it.

For those of you who would like to see it, it is on Mondays,
Tuesdays, Wednesdays, Thursdays and Friday nights, through June 18:

Adrienne Theatre
2030 Sansom St.
Philadelphia
(ground floor, called "The Playroom").

Considering that 20-30 people a day visit our site to learn about wedding vowels, I'm surprised to learn that "great vow shift" is not in Google's index at the moment. Yes, I recognize that people who know what the Great Vowel Shift is also know how to spell it, but the same people tend to be fond of puns, and this is a good one.

Posted by Mark Liberman at 12:38 PM

WhG/gp and other problems of quantification

This morning I checked the web for uptake on Geoff Pullum's proposal that the measure of (document) frequency on the web should be denominated whG/gp, i.e. "web hits on Google per gigapage". I didn't find evidence of mass buy-in, though it's far too early to tell. But once I got past all the hits associated with the Werner-Heisenberg-Gymnasium Göppingen, whose domain name is "www.whg-gp.de", I did find this post by Russ Barnes at apostropher.

Russ is also worried that the British may be catching up to the U.S. in wingnuttery. I would have guessed that our U.K. cousins were way ahead, but I've never seen a quantitative study.
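For anyone who does want to try a quantitative study, the whG/gp unit mentioned above is just arithmetic: the number of hits divided by the size of the index in billions of pages. Here is a minimal Python sketch of that calculation. The function name and the numbers in the example are invented for illustration; they are not real Google counts.

```python
def whg_per_gp(hits: int, index_pages: int) -> float:
    """Web hits on Google per gigapage.

    hits        -- number of pages reported as matching a query
    index_pages -- total number of pages in the index (an assumed input;
                   the real figure has to come from the search engine)
    """
    if index_pages <= 0:
        raise ValueError("index_pages must be positive")
    gigapages = index_pages / 1e9  # one gigapage = one billion pages
    return hits / gigapages


# Hypothetical figures, for illustration only.
print(whg_per_gp(hits=8_000, index_pages=4_000_000_000))  # 2000.0
```

Normalizing this way means that counts taken when the index was a different size can at least be compared on the same scale.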

Posted by Mark Liberman at 11:35 AM

An ancient and fatherly show

There's been a lot of feedback on the "partitive participial relatives" that I discussed a couple of days ago. These are examples like "At present, personal injury cases are heard by many different Judges, some of whom having no experience in this field." Steve at Language Hat wrote about the topic, and his (literate and erudite) commenters commented on it, and Neque Volvere Trochum at entangledbank provided more analytic depth (as did N.V.T.'s literate and erudite commenters), and I got a bunch of email from our (l. and e.) readers. Andrew Durdin wrote in with some examples from classic texts (given below). Leaving aesthetic judgments to the side, I'll summarize the results by saying that most people seem to agree with me that the construction is ungrammatical, but some agree with Haj Ross (and the writers of the googled examples) in thinking that it's OK.

However, I wrote "seem to agree with me" because there are several different constructions being discussed, and it's possible to accept some of them while rejecting others. That's the way my own reactions fall out, for example. In my original post, I mentioned one such distinction in passing, and passed over the other in silence -- the post was already too long -- but the result was that some people misunderstood what I meant, so I'll try to clarify it here.

As before, those who are not interested in English syntax will want to turn their attention to some of our other topics, say Arabic domain names or voles as game animals.

The key question is whether a supplementary clause containing "whom" is tensed or not. If the answer is "no", then we have an example of the structure that I originally discussed. If the answer is "yes", then there are a couple of different sub-cases, to which people's reactions may also differ.

In all the cases under discussion, we have "Q of whom V-ing ..." at the start of a non-restrictive (or "supplementary") relative clause, where Q is something like some, both, most, neither, many, a few, etc., V-ing is the present participle of a verb (usually having or being, though that is mainly because the result is an easy pattern to search for), and then ... matches the rest of the clause.
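For readers who want to hunt for such examples in their own text collections, here is a minimal sketch of the "Q of whom V-ing" search pattern as a regular expression. The quantifier list is only illustrative (it adds "all" and a few numerals to the ones named above), the function name find_candidates is made up for this example, and the pattern deliberately covers only having and being. It finds candidates; it cannot tell structures (a), (b) and (c) apart.

```python
import re

# Illustrative quantifier list: an assumption, not an exhaustive inventory.
QUANTIFIERS = ["some", "both", "most", "neither", "many", "a few",
               "all", "one", "two", "three"]

# Q + "of whom" + a present participle (only "having" or "being" here).
PATTERN = re.compile(
    r"\b(?P<q>" + "|".join(re.escape(q) for q in QUANTIFIERS) + r")"
    r"\s+of\s+whom\s+(?P<v>having|being)\b",
    re.IGNORECASE,
)

def find_candidates(text):
    """Return (quantifier, participle) pairs for each candidate match."""
    return [(m.group("q").lower(), m.group("v").lower())
            for m in PATTERN.finditer(text)]

if __name__ == "__main__":
    sample = "He was one of four brothers, three of whom having died or departed."
    print(find_candidates(sample))
```

Each hit still has to be read by a human to decide whether the clause containing whom has a tensed verb.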

In some cases, the rest of the supplementary clause is nothing but the remainder of the verb phrase headed by the participle V-ing. That's the case with the first three examples that I gave (modulo working out the ... in the second one, where I didn't quote the original sentence in full), and it's also the case with this simple example that I cited later:

"He was one of four brothers, three of whom having died or departed."

The point is that the relative clause -- the clause containing the whom -- has no tensed (or "finite") verb.

Then there is a second structure, in which the supplementary clause is itself a complex consisting of two clauses. The first one is a participial clause, which stands as an initial supplement to a second, finite-verb clause. I gave two examples of this construction, including:

"The next evening we spent with the Consul and his two pretty daughters, neither of whom being able to speak a word of English, the conversation was carried on in French."

Finally, there is a third structure, where V-ing heads a participial relative clause serving as a supplementary modifier of the partitive Q of whom phrase, which in turn is the subject of a tensed verb following the participial relative. I didn't provide any examples of this structure, since I thought it was obvious that it isn't relevant, but of course nothing in this area is obvious. So here's a classical example, courtesy of Andrew Durdin:

(link) "[U]nder which every one did sit in his order according to his dignity, to keep him from the heat of the sun; divers of whom being of good age and gravity, did make an ancient and fatherly show." (Francis Pretty, Sir Francis Drake's Famous Voyage Round the World, originally published in 1580)

You can see the difference more clearly if we take the three supplements by themselves, replacing "whom" with "them", fixing Pretty's punctuation for clarity, and replacing his "divers" with "some" for the same reason:

(a) Three of them having died or departed.
(b) Neither of them being able to speak a word of English, the conversation was carried on in French.
(c) Some of them, being of good age and gravity, did make an ancient and fatherly show.

Example (a) is not a complete sentence, since it lacks a tensed verb, but (b) has the tensed main verb "was carried (on)", and (c) has the tensed main verb "did make".

Using square brackets for clause boundaries, and putting the participial clauses in red, we can schematize these three structures as:

(a) [ (Q of them) V-ing ... ]
(b) [ [ (Q of them) V-ing ... ] [ Subj VerbPhrase ] ]
(c) [ (Q of them) [ V-ing ...] VerbPhrase]

Now, we can take each of these structures, replace (the word) them with whom, and embed the whole thing in a main clause in which some noun phrase is to be non-restrictively modified by the structure we've created.

In the case of structure (c), the result is a completely normal if somewhat complex supplementary relative clause. Put the participial clause in parentheses, and it won't even be too hard to read:

[link] ... I am talking about 54 million people in the U.S., some of whom (being wealthy) can afford a better life ... (re-punctuated for clarity)

In the case of structure (b), the result is a non-restrictive relative clause in which the relative pronoun whom is buried inside a recursively-embedded participial supplement. This is a dangerously complex structure, but I find it perfectly grammatical, once I figure out what the writer had in mind. Such patterns seem to have been fairly common in earlier centuries, when grammatical complexity was not only tolerated but even encouraged. Of the two examples that I originally cited, I find the first to be perfectly clear, though a bit archaic-seeming, while the second is very hard to construe. Here it is again, with a comma and some parentheses added in a feeble attempt to make the writer's intent easier to see:

"In this report I have dealt more in particulars, for the reason there are no reports from brigade commanders, ( [ all three of whom having been captured ], I reserve to myself the privilege of making such corrections as would appear right and proper when I subsequently have the opportunity to examine their reports. )"

Here is the promised list of examples from Project Gutenberg e-texts, emailed by Andrew Durdin -- with my classification of each into the categories (a), (b) and (c) as defined above. Note that only category (a) is really an example of the structure that I originally intended to comment on.

"under which every one did sit in his order according to his dignity, to keep him from the heat of the sun; divers of whom being of good age and gravity, did make an ancient and fatherly show." (Francis Pretty, Sir Francis Drake's Famous Voyage Round the World, 1910; http://www.gutenberg.net/etext01/fdvrw10.txt)
{type (c)}

"Sir Henry Sidney had three children, one of whom being Sir Philip Sidney, the type of a most gallant knight and perfect gentleman." (GORDON HOME, WHAT TO SEE IN ENGLAND, 1908; http://www.gutenberg.net/1/1/6/4/11642/11642-8.txt)
{type (a)}

"The duke left Filippo and Giovanmaria Angelo, the latter of whom being slain by the people of Milan, the state fell to Filippo" (NICCOLO MACHIAVELLI (anonymous translation), HISTORY OF FLORENCE, 1901; http://www.gutenberg.net/etext01/hflit10.txt)
{type (b)}

"the marriage register contains an entry of the names of Thomas Tilsey and Ursula Russel, the first of whom being "deofe and also dombe," it was agreed by the bishop, mayor, and other gentlemen of the town, that certain signs and actions of the bridegroom should be admitted instead of the usual words enjoined by the Protestant marriage ceremony" (The Mirror of Literature, Vol. 20, Issue 572, October 20, 1832; http://www.gutenberg.net/1/1/8/6/11863/11863-h/11863-h.htm)
{type (b)}

There is another dimension of variation -- is the embedded non-restrictive relative clause placed at the beginning of the main clause, in the middle of the main clause (following the modified noun) or at the end of the main clause (perhaps immediately after the modified noun, or perhaps following some other stuff)?

For some people, the biggest surprise is that these Q-of-whom V-ing clauses can sometimes occur sentence-initially, preceding the modified noun(s), as in the first example that I cited:

"Both of whom being influenced by Ellington, Rowles and Brown choose one Ellington tune for each of the two albums that comprise this two-CD set..."

Posted by Mark Liberman at 10:25 AM

Censorship?

You would think that movie directors would know what censorship is, but it seems that they don't. There's a new DVD player out, the RCA DRC232N, that is causing a fuss. This DVD player contains software called ClearPlay that cuts out words and scenes that would be offensive to the viewer. The user configures the DVD player according to his or her preferences in a number of categories: violence, nudity, blasphemy, and so forth. The ClearPlay company reviews films and produces an electronic annotation indicating the location of words and scenes that would offend a viewer with certain preferences. If the user of the DVD player wants to cut out certain kinds of material, he installs the annotation. The DVD player then cuts out whatever bits, according to the annotation, would not conform to the preferences set. Some people are using this system to control what their children watch. Others are using it for their own viewing, to avoid what they would find upsetting.
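The mechanism described above can be sketched in a few lines. This is a hypothetical illustration only: the Tag record, the category names and the segments_to_play function are invented here, and nothing is implied about ClearPlay's actual annotation format. The sketch assumes an annotation is a list of time spans, each labelled with a category, and that the player skips every span whose category the user has chosen to block.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tag:
    """One annotated span of the film (hypothetical format)."""
    start: float   # seconds from the start of the film
    end: float     # seconds from the start of the film
    category: str  # e.g. "violence", "nudity", "blasphemy"

def segments_to_play(duration, tags, blocked):
    """Return the (start, end) spans left after skipping blocked tags."""
    skips = sorted((t.start, t.end) for t in tags if t.category in blocked)
    play = []
    cursor = 0.0
    for start, end in skips:
        if start > cursor:
            play.append((cursor, start))  # play up to the next skipped span
        cursor = max(cursor, end)         # overlapping skips are merged
    if cursor < duration:
        play.append((cursor, duration))   # play whatever remains
    return play

if __name__ == "__main__":
    tags = [Tag(10, 20, "violence"), Tag(15, 30, "nudity"), Tag(50, 55, "blasphemy")]
    print(segments_to_play(60, tags, {"violence", "nudity"}))
    print(segments_to_play(60, tags, set()))
```

Note that with an empty set of blocked categories the whole film plays, and the disc itself is never modified in either case.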

I would think that the movie industry would be pleased. Such a gadget provides people who are easily offended by current movies, or who are concerned with what their children watch, with an alternative to complaining about the movie industry and demanding censorship. It will probably increase sales, since people who might previously have avoided a film may now see it. In fact, the industry is outraged and is engaged in litigation over this. There is much talk in the media and on the web of "censorship". The Directors Guild of America says:

ClearPlay software edits movies to conform to ClearPlay's vision of a movie instead of letting audiences see, and judge for themselves, what writers wrote, what actors said and what directors envisioned.

There's some muddled thinking going on here. This isn't censorship. The movie industry still puts out exactly what it wants to, and audiences still see what they want to see. Everybody is free to use another DVD player, or to use this one with a configuration that suppresses nothing, or to use it without the annotation. All this software does is allow the user to choose to skip selected portions of the movie. Contrary to the DGA's claim, ClearPlay doesn't edit movies to conform to its vision - it simply tags them so that viewers can make their own decisions. Providing audiences with a choice is not censorship.

There is also an issue here of artistic integrity, though it is a bit of a stretch to characterize some movies as "art". The DGA seems to think that the director has the right to have his work viewed exactly as he intended it. There's some validity to this when a work is unique, which is why in some European countries there are now laws that prevent the alteration or wanton destruction of works of art, but here there's no question of the original being lost. The director's only right is to present his work to the viewer - he has no right to control what the viewer does with it. When you read a book, you read it as you wish. You can skip whatever you like, you can read it backwards, you can skip around, or look up selected bits in the index. You have no obligation to read the book as the author intended you to.

The DGA should be ashamed of itself for crying wolf about censorship. Allowing people to skip bits of movies that they find offensive isn't censorship. Censorship is a very serious matter, and in much of the world is severe, as documented by organizations like Human Rights Watch. It isn't absent in the United States, as the National Coalition Against Censorship and American Civil Liberties Union will attest. It isn't a word that should be taken in vain.

Posted by Bill Poser at 12:06 AM

May 27, 2004

Illustrating obligatory transitivity

It is generally assumed by syntacticians that some verbs are obligatorily transitive. An example of one that isn't is eat. It can be used either with a direct object (I've already eaten lunch) or without (I've already eaten). I don't mean in constructions where the object is required to be eliminated, like passives (The food wasn't all eaten) or relative clauses (the things that they eat); I mean that in construction types that allow the object to be present, if the verb is eat the object can just be left implicit. Some verbs are standardly said to be much more rigid, insisting on an overt direct object noun phrase. A syntactician might well exemplify with verbs like, say, discard, or abandon, because it would be easy to assent to the notion that sentences such as these deserve their asterisk annotations for ungrammaticality:

*The company eventually decided to discard.
*I hope your brother doesn't just abandon.

But the syntactician would be wrong.

Look at this sentence, which I came across in this article about moves to improve the standard of computer programming in the future:

They require the courage to discard and abandon, to select simplicity and transparency as design goals rather than complexity and obscure sophistication.

Perfectly grammatical. So there goes the idea that discard and abandon illustrate obligatory transitivity. It is in fact extremely hard to come up with a list of even ten transitive verbs associated with a really hard requirement that the direct object be explicit rather than implicit. Sometimes I almost begin to think there aren't really any. I believe I can still be confident about have and keep. But finding eight more wouldn't be easy.

Posted by Geoffrey K. Pullum at 09:24 PM

AI and Kernighan's Law

Craig Silverstein, technology director at Google, was recently interviewed by Stephanie Olsen at CNET. The discussion got around to Artificial Intelligence, or at least Artificial Reference Librarians:

You have portrayed the ideal search engine as one resembling the intelligence of the Starship Enterprise or a world populated with intelligent search pets. Can you talk a little bit about those ideas?
Well, the third idea is having the computer be as smart as a reference librarian. That's interesting, because reference librarians, of course, use computers, use Google to help them search, but they put some element of intelligence into it that the computer cannot do by itself.

So, part of the goal is to make computers smart enough so that when you interact with them, they can do something with that information to help you actually get better results. That is certainly something Google thinks about to improve quality.

When do you think that kind of artificially intelligent search will happen?
I think that understanding language is kind of the last frontier in artificial intelligence, and then talking to a computer will be just like talking to a reference librarian, because they will both be equally knowledgeable about the world and about you.

The big difference, and this is where the search pets come in, is that the reference librarian will understand emotions and other nonfactual information that even a fully intelligent computer may have trouble with.

In terms of timing, I typically say about 200 to 300 years. I think it is probably closer to the 300th year end of it. But if it ends up being closer to the 200th year, I would not be around in any case, and I will not be able to have anyone gainsay me.

Good thinking.
Going back further, even 30 years, the people who were working on artificial intelligence in the '60s thought all these problems would be solved by today--and we are basically not very much closer in terms of those overall high AI goals of understanding language.

So basically, Silverstein agrees with Marvin Minsky -- and just about everybody else -- that AI is brain dead. But further, he's making a quantitative estimate that it will take between 200 and 300 years to create an artificial reference librarian.

Silverstein is only 31, and he's not a scientist, so we can't apply Arthur C. Clarke's dictum that "When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong."

So we need another proverb, along the lines of "When a bright young technology hotshot predicts that something will be developed next year, (s)he's almost certainly ____. When (s)he predicts that something will not be developed for two or three centuries, (s)he's very probably ____." Fill in the blanks for yourself. There's more than one right answer, I'm sure.

In the absence of a compelling "young technologist" substitute for the "elderly scientist" role in Clarke's Law, we might instead turn to Kernighan's Law (at least I think it was Brian Kernighan from whom I first heard this): "When a programmer says that a piece of code will take X amount of time, double the estimate and raise it to the next higher unit." In other words, "half an hour" means "one day"; "one day" means "two weeks"; "two weeks" means "four months"; and so on.

According to Kernighan's Law, 200-300 years translates to 400-600 decades, i.e. four to six millennia. On that view, the time period between AI and the development of the computer would be about the same as the time period between the development of the computer and the invention of writing. That would have a nice symmetry, but I doubt that it's true, if only because cultural prediction on a time scale of millennia (or even centuries, at this point) seems absurd.
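The arithmetic is easy to check mechanically. Here is a minimal sketch of the rule as stated ("double the estimate and raise it to the next higher unit"). The full ladder of units is an assumption on my part, chosen to agree with the steps quoted above (hours to days, days to weeks, weeks to months, years to decades), and the function name kernighan is made up for the example.

```python
# Assumed ladder of units; only some of these steps are attested above.
UNITS = ["hours", "days", "weeks", "months", "years",
         "decades", "centuries", "millennia"]

def kernighan(amount, unit):
    """Double the estimate and raise it to the next higher unit."""
    i = UNITS.index(unit)
    if i + 1 >= len(UNITS):
        raise ValueError("no unit higher than " + unit)
    return amount * 2, UNITS[i + 1]

if __name__ == "__main__":
    for amount, unit in [(0.5, "hours"), (1, "days"), (2, "weeks"),
                         (200, "years"), (300, "years")]:
        new_amount, new_unit = kernighan(amount, unit)
        print(amount, unit, "->", new_amount, new_unit)
    # 400 to 600 decades, at 10 years each, expressed in millennia:
    print(400 * 10 / 1000, "to", 600 * 10 / 1000, "millennia")
```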

[Silverstein interview via Blogos.]

Posted by Mark Liberman at 08:03 PM

Latin on Jeopardy

Just now on Jeopardy one of the categories in Double Jeopardy was Latin. The questions required knowledge of some words, but also of an irregular verb form and some grammatical terminology. Contestant Anne got all five right! Maybe people aren't quite as ignorant about language as we thought. Or maybe it was just a fluke.

Posted by Bill Poser at 07:13 PM

"if you only speak Arabic, why would you be interested in the Internet?"

According to Jonathan Wright's Reuters wire story, Khaled Fattal at the Multilingual Internet Names Consortium (MINC) has the goal of "enabling Arabs unfamiliar with the Latin script to use the Internet in Arabic alone", and suggests that "given enough funding, say $6 million, his organisation could produce tangible results within nine months". Note that the issue is only the use of Arabic in domain names and other aspects of URLs -- web pages can now be displayed correctly in Arabic script in all the major browsers, and there are a large and growing number of Arabic-language web sites. I'm puzzled that the article doesn't mention the fact that ICANN announced a plan almost a year ago, to permit web addresses in any Unicode-supported language -- which certainly includes Arabic.

ICANN's announced plan was based on a reversible encoding from Unicode into ASCII (cleverly if undiplomatically called "punycode"), rather than on modifying the web's infrastructure to permit Unicode directly in domain names. It's clear that the proposed solution has some problems, since the domain-name proposal doesn't seem to have gone much of anywhere in the past year -- though Mozilla has supported "punycode" since rel. 1.4 -- but it would be nice to know what the problems are. The Reuters article hints that the difficulties might be more political than technical -- "[t]he Arab Internet community has ... wasted several years in disagreement over which characters are essential and how to map them into computer code". But maybe there are technical issues with "punycode" too. In any case, the reporter is pretty thoroughly clueless -- or has been made to seem so by his editor(s) -- since (for example) the word "Unicode" doesn't occur anywhere in the article.
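To make "reversible encoding from Unicode into ASCII" concrete, here is a minimal sketch using the "punycode" codec that ships with Python's standard library. The helper names are made up for the example, and the sample labels (German "bücher" and the Arabic word "مثال") are mine, not taken from the ICANN plan. The "xn--" prefix is the marker that the internationalized domain name standard puts in front of an encoded label. The sketch skips the normalization step that real domain-name processing applies before encoding.

```python
ACE_PREFIX = "xn--"  # marks a label as punycode-encoded

def to_ascii_label(label):
    """Encode one domain-name label into plain ASCII."""
    if label.isascii():
        return label  # nothing to encode
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def from_ascii_label(ascii_label):
    """Reverse to_ascii_label, recovering the original Unicode label."""
    if not ascii_label.startswith(ACE_PREFIX):
        return ascii_label
    return ascii_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

if __name__ == "__main__":
    for label in ["bücher", "مثال", "example"]:
        encoded = to_ascii_label(label)
        print(label, "->", encoded, "->", from_ascii_label(encoded))
```

The point of the design is that nothing in the existing name-resolution machinery has to change: only the software at the edges needs to convert labels back and forth.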

What really puzzles me, though, is the quote from Paul Verhoef, identified as "a vice president at the International Corporation for Internet Names and Numbers (ICANN)". Verhoef is represented as saying that "What Khaled says is true, because if you only speak Arabic, why would you be interested in the Internet?"

Uh, because you can read thousands of newspapers and magazines, and millions of discussion groups and information and advocacy sites in Arabic? Like, the same reason anyone else would be interested in the internet?

When you read something this dumb in a news story, it's time for the old attributional abduction tango...

...we can't tell: was the journalist or news release writer misled by the source? did the journalist misremember, misunderstand or invent something independently? was the piece subverted by an editor, accidentally in the course of hasty re-writing, or on purpose due to conceptual confusion or some independent agenda?

I'll file this one tentatively under the general heading of "Reuters anti-globalization prejudice", along with the infamous Korean tongue-cutting story. After all, Paul Verhoef, as a former "Advisor to the Director-General" of the European Commission, and head of "a team with responsibility for international policies in telecommunications, Internet, e-commerce, and Information Society" for the "DG Information Society in Brussels", can't possibly be that badly informed and illogical, right?

I'm sorry that this seems to be "beat up on Reuters week" here at the Language Log. I really don't have any anti-Reuters animus. I just read the news via Google's news aggregator, and the Reuters wire offering is often among the top few stories in a given cluster, and beyond that, I just calls 'em as I sees 'em.

Posted by Mark Liberman at 06:05 PM

When "small" means "large"

Reuters headline: "Small Changes Separate Man from Ape, Study Shows."

Headline of Science Update piece about the study in question: "Chimp chromosome creates puzzles: First sequence is unexpectedly different from human equivalent."

Science Update explains in more detail that

83% of the 231 genes compared had differences that affected the amino acid sequence of the protein they encoded. And 20% showed "significant structural changes". ... The researchers also carried out some experiments to look at when and how strongly the genes are switched on. 20% of the genes showed significant differences in their pattern of activity.

Note that this is true despite the fact that only 1.44% of the individual base pairs are different. If this seems puzzling to you, then you need to open up a calculator window and determine the value of .9856^N (the probability that all base pairs are the same for a gene with N base pairs, assuming independence) and .0144*N (the average number of base pairs that are different for a gene with N base pairs, assuming ditto), for values of N corresponding to the number of base pairs in a gene (say 1,000 to 100,000).
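If you'd rather not use a calculator, the two quantities can be tabulated with a short script. This is just a sketch of the calculation described above, under the same independence assumption; the function names are made up, and the three values of N are sample points from the suggested range.

```python
DIFF_RATE = 0.0144  # fraction of individual base pairs that differ

def p_identical(n, diff_rate=DIFF_RATE):
    """Probability that all n base pairs match, assuming independence."""
    return (1 - diff_rate) ** n

def expected_differences(n, diff_rate=DIFF_RATE):
    """Average number of differing base pairs in a gene of n base pairs."""
    return diff_rate * n

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        # For very large n the probability underflows to 0.0 in floating point.
        print(n, p_identical(n), expected_differences(n))
```

Even at the low end of the range, a gene is expected to differ at more than a dozen positions, and the chance that it is completely identical is well under one in a million.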

Perhaps this is the line of reasoning that the Reuters headline writer had trouble with -- or maybe (s)he never got past the first three words of Maggie Fox's story ("Tiny genetic changes..."). Other headline writers did better, though: the Korea Times has "Big Genetic Gap between Chimpanzee, Human Being"; Xinhua has "Genetic study shows chimps are less human"; and so on.

The full description of the Nature article is:
"DNA sequence and comparative analysis of chimpanzee chromosome 22", by The International Chimpanzee Chromosome 22 Consortium, Nature 429, 382 - 388 (27 May 2004).

Reuters correspondent Maggie Fox actually got the story right -- her first sentence continues "(Tiny genetic changes) add up to huge differences when human DNA is compared to that of chimpanzees" -- it's the headline writer who did her in.

[Thanks to Keith Ivey for correcting my math.]

Posted by Mark Liberman at 12:40 PM

"simple navy blue silk" or "wrinkled pigeon-colored number" ?

According to this NYT story, Reuters and Robert Fisk are not the only ones whose preconceptions get in between their sensory inputs and their published descriptions.

Lara St. John is a classical violinist described as a "striking six-foot blonde". For the past few years, she's been trying to get past her first album picture, which showed her naked holding her instrument across her chest. So for a recital in Toronto last February, she "chose her best gown, a simple navy blue silk", explaining that "because the recital was so serious I didn't want trouble with the visuals". But John Terauds, apparently disappointed at the contrast with the album cover, wrote in the Toronto Star that

An almost matronly St. John shambled out onto the Jane Mallett Theatre stage in a wrinkled pigeon-colored number that had to be one of the ugliest frocks to see stage lights this season.

Terauds did like her playing, at least.

Journalists, like junior high school students, want to be able to tag everyone according to some simple and evocative mnemonic category. So-and-so is the class clown, so-and-so is a snooty brainiac, so-and-so is always daydreaming. Lara St. John is tagged as selling sex appeal, George W. Bush is tagged as linguistically inept.

This makes life easier for journalists with deadlines and without new ideas, I guess. But precisely because everyone's perceptions are necessarily influenced by expectations -- sometimes even determined by them -- you'd hope that journalists would take special care with the facts when they see or hear what they expect to.

This is good advice for scientists, too. Dick Hamming once warned me (as I gather he warned everyone) to "beware of finding what you're looking for." In John Terauds' case, I guess the advice applies in reverse; but then he had to come up with a lede for his review from somewhere. So maybe the advice should go to journalists' subjects and journalists' readers: recognize that journalists will misperceive or invent facts that correlate with their stereotypes and preconceptions, and adjust your actions and beliefs accordingly.

Posted by Mark Liberman at 09:47 AM

May 26, 2004

Unwanted linguistic material: the spam problem

Those who have noted that Language Log doesn't have an open comments section have sometimes wondered why. Those who are aware that robots can be programmed to scour the blogosphere for open comments sections and spit spam into them may have some sense of why Language Log has been cautious so far. Here is just one small story about what's out there itching to get at you: recently posted statistics reveal that the percentage of material mailed to Debian Linux mailing lists that passes all ID checks and content X-raying and security screenings and is duly made available to the subscribers on one of the lists is just 3.5%. That's not a typo: three point five percent. Only 35 out of every thousand items sent to the list are genuine postings by human beings that pertain to Debian Linux. The rest, the spam, is caught by various filters, which some human has to constantly tune and maintain. Such is the flood of mass-mailed garbage travelling around the net looking for a way to get to your screen. The percentage of all email that is from spammers is said to be as high as 80% as of April 2004, and rising. These may be underestimates. Even running spamassassin in fairly aggressive mode does not prevent my address, which I do not advertise, from getting at least one Nigerian scam letter per day that skips past the filtering (plus a dozen pieces of other junk that get caught — a very light spam load by some people's standards). This is Geoff Pullum, looking forward to not hearing too much from anyone at ssshhhh!@censored.sorry.xyz.

Posted by Geoffrey K. Pullum at 07:07 PM

Bush wrongly accused of correct Arabic

It is ironic indeed that the Reuters characterization of President Bush's first pronunciation of "Abu Ghraib" (discussed by Mark here) as "abugah-rayp" roughly matches Professor Ahmed Ferhadi's model pronunciation: they were accusing the President of pronouncing the name in almost exactly the way that Ferhadi says is correct. But they were wrong about the pronunciation Bush employed.

Like Bill Poser, I am a Linux user, because I need to have a machine with an operating system fit for grownups to do serious work in a reliable and controllable computing environment. Unlike Bill, though, I am still forced to maintain a Windows machine so that I can use WordPerfect (my coauthor Rodney Huddleston in Australia has twelve years of file creation and macro-writing invested in WordPerfect, and we have to share lengthy and complex files because we're writing another book together). I used the Windows machine to go to the file Bill pointed out he was unable to play. Since I had not used Windows Media Player before, it immediately started up a configuration process that made Windows Media Player the default player for every single kind of sound or video file you could imagine, part of their campaign to destroy such firms as Real Audio, whose product I had been using before. I let it have its evil way (I can always change everything back, though doubtless they will have found ways to make that difficult). Then it let me play the file.

Interestingly, I found that in the variety of Arabic spoken by Professor Ferhadi, the pronunciation is clearly [abugrep], but it also sounds very much like [abugurep], because the r is an alveolar tap, not an approximant like the American English r, which means the g and the r are separated enough that there is a hint of a vowel between them. For the first sound in "Ghraib" I was expecting a voiced velar fricative (like the g in Castilian Spanish haga), but what I'm hearing from Ferhadi is [g]. (One should never underestimate how much the dialects of Arabic vary.) And the last sound in "Ghraib" is definitely [p] in Ferhadi's pronunciation, not [b], possibly because he says the name all on its own and it is uncommon to fully voice a phrase-final plosive.

So the Reuters allegation that Bush said "abugah-rayp" would mean that he got it almost exactly right (here I'm ignoring the details of the length and monophthongal quality of the [e] vowel; Bush basically got that right too). They were wrong about Bush's actual production, though: he pronounced it with a final [b], as Mark shows. (I suspect this is an acceptable Arabic pronunciation, but it does not match Professor Ferhadi's model exactly.) Mark's point is not impugned in any way, of course, and I agree with him: I'd fail the Reuters reporter in my phonetics class.

And Bush's grade? Mark says B. I say that people who insist Bush don't talk good should just try giving a broadcast speech to an audience of millions with Arabic place names in the script before they sit in judgment. He is probably doing just about as well as any of us could do. And on some occasions he hits on closer approximations to the right pronunciation than we heard from members of the Senate Committee on the Armed Services not long ago.

Posted by Geoffrey K. Pullum at 06:57 PM

Prisons for People, Prisons for Data

On reading Mark's discussion of President Bush's pronunciation of Abu Ghraib, I read the Slate article by Sam Schechner to which he referred. The Slate article provides a link to an audio file of Professor Ahmed Ferhadi of New York University saying Abu Ghraib in Arabic. I can't play it. The file is in a new multimedia format that Microsoft has created called Advanced Systems Format. Microsoft's Windows Media Player can play such files, but as far as I know, nothing else can. So if you use Microsoft software, you can play this file, but if like me you don't, you can't.

The ASF format is described on Microsoft's web site here. You can download a 98 page specification in Microsoft Word format. The specification proper is preceded by a three page End User License Agreement, in small type. The EULA begins with this:

IMPORTANT--READ CAREFULLY: This Microsoft Agreement ("Agreement") is a legal agreement between you (either an individual or a single entity) and Microsoft Corporation ("Microsoft") for the version of the Microsoft specification identified above which you are about to download ("Specification"). By downloading, copying, or otherwise using the Specification, you agree to be bound by the terms of this Agreement. If you do not agree to the terms of this Agreement, do not download, copy or otherwise use the Specification.

I am not a lawyer, and this is not legal advice, but even I can be confident that this is legal nonsense. Contrary to the statement in the above paragraph, I did not read the EULA prior to downloading the specification. I couldn't have, since it doesn't appear on the web page, only in the spec, which you have to download to read. And they're not entitled to assume that people read things in order. Anyone who looks at the table of contents and skips to the beginning of the substantive matter, or to a section of particular interest, won't even see the EULA. Such "agreements" have no legal force because they are not in fact agreements. The fundamental principle of contract law is that a valid contract is formed only by "a meeting of minds". The parties must both agree to the same thing. If the parties have different things in mind, no contract is formed, nor can one party impose a contract unilaterally, which is what Microsoft is trying to do here.

The invalidity of the EULA doesn't mean that you can do anything that you want with the specification document. Microsoft does have rights under copyright law. These rights are created by statute and do not depend on the existence of a contract between the company and the buyer. You can't, for instance, copy and distribute freely the specification document since that is governed by copyright law. But once they publish it, you're entitled to read it, and except as limited by any patents that may be relevant, you can use the information as you see fit. The particular words used in the document to describe the file format are subject to copyright, but data structures are ideas and are not protected by copyright.

Microsoft also has rights under trademark law. If they wish, they can trademark the name of their format. What that means is that they can prevent someone else from using the same name for a different format, or from falsely claiming to implement the specification.

The basic license is described thus in section 1(a):

(i) reproduce and internally use a reasonable number of copies of the Specification in its entirety as a reference for the sole purpose of implementing ASF in your hardware, application, or utilities (your "Solutions"); (ii) reproduce and internally use your implementations of ASF made pursuant to the terms of this Agreement (your "Implementations") in source code form solely for internal development and testing of your Solutions, and (iii) reproduce and have reproduced in object code form only, your Implementations and distribute, directly and indirectly, your Implementations (only in object code form) solely as part of and for use with your Solutions.

The first clause is reasonable and legal. It grants a license to reproduce the document for certain purposes. As copyright holder, they have the right to control copying beyond fair use. The next two clauses are the interesting ones. They attempt to control the distribution of information about the specification. This attempt continues in Section 2(c):

You may not provide, publish or otherwise distribute the Specification to any third party. Further, you shall use commercially reasonable efforts to ensure that the use or distribution of your Solutions, including your Implementations as incorporated into your Solutions, shall not in any way disclose or reveal the information contained in the Specification.

The net effect of these is that you can use the specification to write your own software for dealing with files in this format, and you can distribute the compiled versions of that software, but you cannot distribute the source code for that software, which would provide information about the specification to a programmer, or otherwise disseminate information about it. Now, this is curious. Since they have published the specification and explicitly allow people to produce software implementing it, they aren't, strictly speaking, trying to force people to use Microsoft products. What they are clearly trying to do is to discriminate against Free and Open Source Software. This is made explicit in section 2(g):

For a variety of reasons, including without limitation, because you do not have the right to sublicense the Necessary Claims, your license rights to the Specification are conditioned upon your not creating or distributing your Implementations in any manner that would cause ASF (whether embodied in your Implementation or otherwise) to become subject to any of the terms of an Excluded License. An "Excluded License" is any license that requires as a condition of use, modification and/or distribution of software subject to the Excluded License, that such software or other software combined and/or distributed with such software be (x) disclosed or distributed in source code form; (y) licensed for the purpose of making derivative works; or (z) redistributable at no charge;

An example of an Excluded License, and no doubt the one they have in mind, is the GNU General Public License, the license under which a great deal of free software is distributed, ranging from major projects such as the GNU Project and the Linux kernel down to my own software. Microsoft is so scared of the free software movement that they are trying to prevent their ASF format from being used in free or open source software.

Although they aren't, strictly speaking, preventing other people from implementing the ASF specification, the prohibition against releasing information about it, including source code, raises the barrier. I haven't studied this specification carefully, but it looks pretty complex. Writing code to implement it would probably be a fair bit of work. That will discourage people who don't have a fairly strong motivation to support this format. If, on the other hand, source code could be distributed, the work need only be done once. For instance, if you are a C programmer, you don't need to study the details of the various common audio file formats and write your own software for reading and writing them because there are freely distributed libraries for doing this available as source code. I've been using Erik de Castro Lopo's libsndfile library.
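To make the "work need only be done once" point concrete, here is a minimal sketch in Python rather than C, using the standard library's wave module in place of libsndfile (a substitution made purely for illustration). The function names make_wav_bytes and describe_wav are hypothetical. Because the WAV layout is openly documented and a freely distributed module already handles it, the caller never touches the file-format details.

```python
import io
import math
import struct
import wave


def make_wav_bytes(freq_hz=440.0, seconds=1.0, rate=8000):
    """Return the bytes of a mono, 16-bit WAV file holding a sine tone."""
    n_frames = int(seconds * rate)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n_frames)
    )
    # Pack each sample as a little-endian signed 16-bit integer.
    frames = b"".join(struct.pack("<h", s) for s in samples)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 2 bytes = 16 bits per sample
        w.setframerate(rate)
        w.writeframes(frames)  # the module writes the header for us
    return buf.getvalue()


def describe_wav(data):
    """Return (channels, sample rate, frame count) read back from WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as r:
        return r.getnchannels(), r.getframerate(), r.getnframes()


if __name__ == "__main__":
    print(describe_wav(make_wav_bytes()))
```

Nothing here requires reading a format specification, which is exactly the saving that a license forbidding source distribution takes away.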

As I said above, in my opinion the restrictions that Microsoft is trying to impose have no legal force, but the use of secret and/or proprietary data formats is an increasingly widespread problem. It helps monopolies like Microsoft, creates economic inefficiency, discourages innovation, and, when these formats are used by governments, creates an improper linkage between government and private companies, often forcing people to use a particular company's products in order to obtain access to government services or information to which they are entitled. The Open Data Format Initiative is an organization created to combat this problem by encouraging companies to open up their data formats and lobbying governments to forbid the use of proprietary data formats in government operations.

Returning to Slate's example of how to pronounce Abu Ghraib, why did they provide the file in ASF format? There isn't any good reason to. There are a number of audio file formats that do the job perfectly well and are universally understood, such as WAV, probably the most common, and AU/SND. For plain sound files such as this, ASF isn't any sort of improvement. The only reason that I can see is that Slate is owned by Microsoft and that this is an effort to lock consumers in to Microsoft products and discourage the use of FLOS software.

Posted by Bill Poser at 02:28 PM

We Have Deer and Elk and Bear and Mice Around Here

A true statement, because I'm currently in the woods of northwestern Montana, but it's not the truth value that interests me here. I'm reading a fascinating (no kidding!) book called Wild Logging: A Guide to Environmentally and Economically Sustainable Forestry, by Bryan Foster, and at one point (p. 23, to be exact) he lists the wildlife found on a 160-acre tree farm in northeastern Oregon, including the following:

Mammal sightings: black bear, bobcat, cougar, white-tailed deer, mule deer, elk, chipmunk, ground squirrel, flying squirrel, snowshoe hare, mice, and vole

What's interesting is the zero-plural-for-game-animals usage in this list, except for those mice -- which aren't game animals, of course, but then, neither are voles and chipmunks. That is, as is typical in discussions of game animals, all the terms are treated as if they were structurally parallel to deer, with plural identical to singular. Surely those animal sightings are multiple, not a single member of each species (well, except maybe for rarely-seen animals like cougars).

My first thought was that the asymmetry between the plural form mice and the singular forms vole, etc., must have basically the same motivation as the parallel asymmetry in certain types of compounds, discussed, I think, by Peter Gordon: mice-eater but rat-eater, with *rats-eater impossible.

But maybe not, because as I recall (and I might be misremembering), Gordon's explanation for the asymmetry in compounds had to do with late vs. early plural formation, depending on whether it was the default regular plural (as in rats) or an irregular plural like mice; and that wouldn't be relevant for the list in the logging book. I also don't think the asymmetry is an idiosyncrasy of this author, because it sounds fine to me, and replacing mice with mouse doesn't. So what governs this pattern in the list of mammal sightings? I'm hoping that a blogger with more insight into English grammar than I have will provide The Answer.

(Notice that the singular in the compound noun mousetrap isn't a problem in this context, presumably because most mousetraps catch just one mouse at a time, whereas a creature that eats mice is likely to want more than one.)

Posted by Sally Thomason at 12:11 PM

A simple apageslication

A logician friend of mine mailed me yesterday with a linguistic question:

What does "apageslication" mean? None of my dictionaries at home has an entry for this word and an internet search did not help either.

He certainly came to the right linguistic detective. I solved this puzzle without reference works in 2.83 seconds. I wonder if you can do the same. You know my methods, Watson; apply them!

The occurrence of "apageslication" he was looking at, I immediately told him, resulted when the writer most unwisely did a global change to alter sequences like "pp 29-32" to "pages 29-32" throughout.

I was basically right, though in the first version of this post I set out a more complex and detailed hypothesis than the simpler and dumber truth. Someone had indeed done a careless global edit that said "replace pp by pages everywhere". The original word was "application".

At the time of writing this revised post (May 26, 8:35pm EST) you can see the evidence here: a page of details on nasty military books about killing people in which "88 pp" has been changed to "88 pages" by someone so careless with the editor that they also changed "an opponent's ability to fight" to "an opagesonent's ability to fight", and "Poisons & Application Devices" to "Poisons & Apageslication Devices", etc.
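The accident is easy to reproduce. The following is a minimal Python sketch, not a reconstruction of whatever editor command was actually used (that is unknown); the function names careless_replace and careful_replace are made up for illustration. The first function does what the careless edit did; the second restricts the change to "pp" standing alone as a word.

```python
import re


def careless_replace(text):
    """Replace every occurrence of 'pp', even inside other words."""
    return text.replace("pp", "pages")


def careful_replace(text):
    """Replace 'pp' only when it stands alone as a word (\\b = word boundary)."""
    return re.sub(r"\bpp\b", "pages", text)


if __name__ == "__main__":
    sample = "Poisons & Application Devices, 88 pp"
    print(careless_replace(sample))
    print(careful_replace(sample))
```

The word-boundary version leaves "Application" and "opponent's" alone while still turning "88 pp" into "88 pages".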

My professional combination of observation, deduction, and specialist linguistic knowledge enabled me to see immediately that something like this must have happened, before I even looked for "apageslication" on the web. Elementary, Watson. Normally customers have to pay for this kind of puzzle-solving service at my minimum rate of $150 for an hour or any part thereof, but my logician friend will merely be paying for our next lunch together.

Posted by Geoffrey K. Pullum at 10:37 AM

Bush -- and Reuters and Fisk -- get Abu Ghraib wrong

When I listened to George W. Bush's speech about Iraq on Monday, I noticed that he handled the pronunciation of Izzadine Saleem and Lakhdar Brahimi just fine, but tripped over Abu Ghraib, which he said three times in three different ways.

At the time, I considered blogging about it. However, I decided not to do so. I felt that these pronunciation issues are a minor point, and shouldn't be emphasized over the content of the speech, and I generally dislike the fuss that's been made over Bush's linguistic problems. I also know from experience that memory for disfluency is often inaccurate, and so I would be very reluctant to write about something like that without finding a recording to check my memory for what was said.

In most news stories, Bush's disfluent rendition of Abu Ghraib was ignored (e.g. in the New York Times story), or mentioned in passing, without any details. That's pretty much as it should have been, in my opinion. The real story is the content, not the pronunciation.

However, the Reuters wire sent out a piece featuring Bush's mispronunciations, which were described as "abugah-rayp", "abu-garon", and "abu-garah". And in an op-ed attack on American policies, Robert Fisk specified Bush's "hesitant pronunciation of Abu Ghraib as 'Abu Grub'".

As I said, I don't think the whole mispronunciation business is very important; but given that people are going to talk about it, I think it's interesting that they don't take the trouble to describe what happened accurately. If we're going to criticize President Bush for not taking the trouble to learn how to say a currently-important word fluently, shouldn't we also take the trouble to observe carefully the facts of what he did say?

Here is the passage in Bush's speech where the three pronunciations of Abu Ghraib occurred:

A new Iraq will also need a humane, well-supervised prison system. Under the dictator, prisons like Abu Ghraib were symbols of death and torture. That same prison became a symbol of disgraceful conduct by a few American troops who dishonored our country and disregarded our values. America will fund the construction of a modern, maximum security prison. When that prison is completed, detainees at Abu Ghraib will be relocated. Then, with the approval of the Iraqi government, we will demolish the Abu Ghraib prison, as a fitting symbol of Iraq's new beginning. (Applause.)

As for the "correct" way to pronounce it, here's what Slate had to say a couple of weeks ago, including a link to a (non-Iraqi?) Arabic rendition. I put "correct" in scare quotes because there is always some uncertainty about how to anglicize (or americanize) the pronunciation of a foreign name that contains sounds without an English counterpart.

There are RealVideo versions from CSPAN and on the White House web site, and a RealAudio version is available from NPR. The section quoted above occurs at about 21.29 in the CSPAN version, and at about 21.12 in the White House version. After considerable trouble with non-standard downloading techniques and wrestling with conversion of proprietary audio formats -- I'll add to Bill's complaints about this stuff the observation that I did all this on a Windows box, and still had to struggle -- I was able to create .rm and .wav files of the paragraph in question -- you can listen to the whole paragraph in RealAudio here, and .wav clips of the crucial segments are linked in below.

In the first rendition, Bush seems to hesitate disfluently at three points, indicated by hyphens in the pseudo-orthographic version below:

Under the dictator, prisons like - abu gar - reb - were symbols of ...

Aside from the hesitations, this pronunciation seems to be a pretty good rendition of what Sam Schechner at Slate magazine recommends. In particular, the vowel of the last syllable was a pretty good IPA [e], not diphthongized like the vowel in English babe but also not laxed and shortened like the vowel in English bed. The president divided "Ghraib" (Arabic for "raven") into two syllables, but this is a plausible anglicization of a word for which the preferred transliteration is actually Ghurayb. Contrary to my memory of the original speech, Bush did not stutter or repeat any syllables, he just hesitated three times -- once before "Abu Ghraib", once in the middle of it, and once at the end. You can listen to a .wav clip here of the words "...prisons like Abu Ghraib were symbols of death and torture".

I don't think that Reuters is accurate in transcribing this as "abugah-rayp". The president's vowel was not diphthongized, as the "ay" would suggest, and the final consonant, though clipped because of the final disfluent hesitation, seems definitely to be a [b], not a [p], as can clearly be seen by the "voice bar" after the closure in the spectrogram (of the "Abu Ghraib" part) below:

First item's grade: Bush gets A for phonetics, C for fluency; Reuters gets a C for transcription.

Bush's second rendition is fluent but puzzling: I'd render it in IPA as [gɑrɔm] or [gɑrɑm]. The final vowel seems to be backed and somewhat rounded, and the final consonant is definitely nasalized. However, Reuters is also wrong to transcribe it as "garon". The final consonant sounds like an [m], as you can hear in this clip, and on the video, you can clearly see the labial closure.

Second item's grade: Bush gets D for phonetics, A for fluency; Reuters gets a C for transcription.

Bush's third rendition is also fluent, and gets the consonants right. The final vowel of "Ghraib" is back but not round -- I'd put the whole thing in IPA as [gərɑb] (or maybe [gurɑb] -- the first, unstressed vowel is rather short, and acquires some rounding from the following [r], so its quality is hard for me to decide on). (You can listen for yourself to a clip of the passage here.) The final-syllable vowel is not what Slate recommends, but I've heard several newscasters using it in "Abu Ghraib" over the past few weeks.

Reuters treats this pronunciation as "garah" -- that seems to be a complete invention, as both the sound and the video seem to me to indicate that there is a final labial consonant. Their only excuse is that a [p] (at the start of "prison") follows -- but the spectrogram below (of the words "...the Abu Ghraib prison") again clearly shows a voice bar for the [b], not the noise pattern expected for an [h]:

Third item's grade: Bush gets B for phonetics, A for fluency. Reuters gets D for transcription.

Summary: Bush gets a grade point average of 3 -- (4+2+1+4+3+4)/6 -- a solid B. Reuters gets a grade point average of 1.67 -- a weak C-.
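For anyone who wants to check the arithmetic, here is a small Python sketch. It assumes the usual four-point scale (A=4, B=3, C=2, D=1, F=0), which is what the sum above implies; the names POINTS, BUSH, REUTERS and gpa are illustrative only.

```python
# Assumed four-point scale, inferred from the sum (4+2+1+4+3+4)/6 above.
POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

# Grades in the order awarded: phonetics and fluency for each of the
# three renditions (Bush), and one transcription grade per rendition (Reuters).
BUSH = ["A", "C", "D", "A", "B", "A"]
REUTERS = ["C", "C", "D"]


def gpa(grades):
    """Return the mean grade points for a list of letter grades."""
    return sum(POINTS[g] for g in grades) / len(grades)


if __name__ == "__main__":
    print(f"Bush:    {gpa(BUSH):.2f}")
    print(f"Reuters: {gpa(REUTERS):.2f}")
```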

We should expect better. As one commentator observed, by now Bush should be able to reel off the pronunciation of Abu Ghraib as confidently and correctly as the pronunciation of the Alamo. But I'm just as disappointed in Reuters. Having decided to devote a whole story to criticizing the pronunciation of a sitting president of the United States, couldn't they take the trouble to sit down with a recording (and the help of a phonetician, given that their reporter obviously had no relevant skills or knowledge) and get the facts right?

As for Robert Fisk, he describes Bush's pronunciation as "Abu Grub", exhibiting the breezy disregard for mere factual detail for which he has become famous. Fisk is wrong on every relevant count, for all three of the president's renditions. In all three renditions, the president divided "Ghraib" into two syllables -- Fisk missed this. The president's final vowels in his three versions of "Ghraib" were [e], [o] and [a] -- Fisk uses orthographic "u", which can't plausibly be a representation of any of these. The president mispronounced the final consonant as [m] in his second version -- Fisk missed it again. Phonetically, Fisk flunks.

[Update: I was able to download the .rm file from CSPAN, and with a bit of help from the RealEditor program, cut out the paragraph in question from Bush's speech. It's available here. I still haven't figured out how to strip out the audio track so as to make spectrograms. I'd also like to be able to get a copy of the version on the White House site, as the audio and video quality seems to be much better, but the site seems to be set up in such a way that no downloading is possible, only streaming.]

[Update 2: I was able to download an audio-only version from NPR, and extract the relevant paragraph, here. After an inordinate amount of (I hope legal) fuss, I was able to convert this to a .wav format, so that I could make spectrograms. I also extracted short clips of the crucial three pronunciations, linked in above.]

Posted by Mark Liberman at 08:21 AM

May 25, 2004

whig-ups?

Geoff Pullum has straightened out the whole Google hit nomenclature thing -- except for one key point. How do you pronounce whG and whG/Gp? (Not to speak of whA and whA/Gp...)

"Double yuu aitch gee" and "double yuu aitch gee per gee pee" are not going to make it.

We can safely leave this to Norma Loquendi and the Law of Least Effort, but my guess is that they'll decide on "whig" and "whig-up" -- or in IPA [h^wɪg] and [ˈh^wɪˌgʌp]. For those like me who don't distinguish [h^w] from [w], it'll just be [wɪg] and [ˈwɪˌgʌp]. Though these will work just fine, and are certainly much more SI-compliant, I have to confess that I don't like them as well as "ghits". I'll also confess that my reason for preferring the pronunciation [gɪts] was the same as Geoff's reason for avoiding it.

Posted by Mark Liberman at 05:39 PM

Webhits on Google per gigapage: A replacement proposal

Most spelling reform proposals are totally unsuccessful, and some are disastrously so. I now want to withdraw mine, making it perhaps the most short-lived and unsuccessful in the history of the universe. Keith Ivey points out to me that under the standard conventions for physicists published here by the Physics Laboratory of the National Institute of Standards and Technology (NIST), unit names like the hertz, joule, pascal, and so on are always lower-case despite being named for persons, though with abbreviations "the symbol or the first letter of the symbol is an upper-case letter when the name of the unit is derived from the name of a person": it's Pa for pascals, Wb for webers, Hz for hertz, and so on. That would suggest "ghit" for web hits using the Google™ search engine, but "Gh" for the abbreviation. (The reason I suggested g-hit as the pronunciation is that "git", with a velar stop [g], is a word in British English meaning stupid or obnoxious person. But it won't matter any more; read on.)

I now realize that I am not content with any of this. It now occurs to me that the best analogy for Google hits as a measurement term is not hertz or joules or pascals, but degrees Celsius. Degrees are the units, Celsius is a specific scale of measurement. Words like Celsius are capitalized because they are names of people. Google should be capitalized because it is a corporation name. But hits, like degrees, are not named for a person or corporation. According to NIST ‘the correct spelling of the name of the unit °C is "degree Celsius" (the unit "degree" begins with a lower case "d" and the modifier "Celsius" begins with an upper-case "C" because it is the name of a person).’ However, I'm not done yet. Read on.

Putting this insight (hits are analogous to degrees, Google is analogous to the Celsius scale) together with Phil Resnik's suggestion that we really want to count web hits relative to a given search engine, and Mark Liberman's scientific refinement pointing out that we really want to measure web hits per billion documents, we get to the following proposal. The basic unit should be the webhit, abbreviation wh. Webhits measured using Google are Google webhits, abbreviation whG; webhits measured using AltaVista are AltaVista webhits, abbreviation whA; and so on. Web hits per billion pages, i.e., per gigapage (Gp -- David Nash points out to me that giga- is conventionally abbreviated G-, not g-), will be a unit obtained by division, and the NIST standard would be to say webhits per gigapage when spelled out in full but wh/Gp when using the abbreviations. Measurements of webhits on Google per gigapage would be expressed in whG/Gp, and so on.
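The division involved is simple enough to sketch in a few lines of Python. This is only an illustration of the proposed unit: the function names, the one-letter engine suffix argument, and the index size in the example are all hypothetical, not figures from any search engine.

```python
def webhits_per_gigapage(hits, index_pages):
    """Return webhits per gigapage (wh/Gp): hits divided by index size in Gp."""
    gigapages = index_pages / 1e9  # 1 Gp = one billion pages
    return hits / gigapages


def format_measurement(hits, index_pages, engine_letter):
    """Format a measurement with the engine suffix, e.g. 'G' gives whG/Gp."""
    value = webhits_per_gigapage(hits, index_pages)
    return f"{value:.2f} wh{engine_letter}/Gp"


if __name__ == "__main__":
    # Hypothetical example: 50 hits on an engine indexing 2 billion pages.
    print(format_measurement(50, 2_000_000_000, "G"))
```

Normalizing by index size is what lets counts from engines of different sizes, or from the same engine at different times, be compared at all.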

So that's my current proposal: measuring currency of (sets of) strings on the web by whG/Gp. Though we probably haven't heard the last of this thread. We may need a national standards committee to make recommendations to a central council standing executive committee which will then report to the membership at large through delegates to a national congress... yawn... I never wanted to become a national standards administrator. I'm a grammarian. Grammar is fun. Grammar is exciting.

Posted by Geoffrey K. Pullum at 04:48 PM

Participial relative clauses

If you're interested in English syntax, you'll interpret this post in one of two ways. You might find that it documents a curious construction that (like me) you've never noticed before, and in fact didn't believe to be possible. Then again, you might find that it documents the fact that an allegedly literate adult (like me) can remain completely ignorant of a commonplace and perfectly ordinary aspect of his native language. (If you're not interested in English syntax, you'll want to return to our discussions of eggcorns, ghits and coffee.)

An email query from Haj Ross led me to discover sentences like these:

"Both of whom being influenced by Ellington, Rowles and Brown choose one Ellington tune for each of the two albums that comprise this two-CD set..."
"Ireland and Denmark, both of whom being heavily reliant on British trade, decided they would go wherever Britain went..."
"At present, personal injury cases are heard by many different Judges, some of whom having no experience in this field."

These are supplementary (non-restrictive) relative clauses with a present participle in place of a finite verb, whose subject is a partitive structure involving a relative pronoun, like "both of whom", "most of whom", "few of whose parents", "part of which".

Frankly, every single one of these examples seems completely ungrammatical to me -- or at least they did at the start of the investigation. Of course I have no problem with participial relative clauses like "the boy sitting in the chair" -- but these normally lack a relative pronoun, and the examples above are more like "*the boy who sitting in the chair looked up"!

In some cases, I can fix such sentences up by deleting the prepositional phrase with the relative pronoun (e.g. deleting "of whom"), or by replacing the relative pronoun with a regular pronoun (e.g. replacing "whom" with "them"), or by replacing the participle with a finite verb form (e.g. replacing "having" with "have"):

At present, personal injury cases are heard by many different judges, some having no experience in this field.
At present, personal injury cases are heard by many different judges, some of them having no experience in this field.
At present, personal injury cases are heard by many different judges, some of whom have no experience in this field.

So my first thought was that the examples in the first set were mistakes of composition -- slips of the keyboard -- caused by a typing substitution, by a partial revision, or just by losing track of the structure under construction.

However, it seems from Haj's note that such sentences are fine for him. I can't assign him responsibility for the examples above, which I found on the web, but he cites invented examples like

These men, neither of whose ID's being valid, were jailed immediately.
These gophers, some of whom having mated already, command top dollar.

which (if I understand his note) he takes to be perfectly OK. And looking on the web, there are lots and lots of examples of this pattern. So either these preposterous imitations of English have been produced by infiltrators from some parallel universe, or this is one of those little corners of the language where idiolects differ.

Here are a few other examples, just to push the point for those whose initial reactions are like mine:

"The partnership between Mfume and Bond, both of whom having held elective offices for many years, pushed the NAACP aggressively back into national politics in the 2000 election."
"...it was fascinating to find the many fields of medicine entered and the many locations chosen by those of us who attended Duke Medical School together just after the war, most of whom having also been in the armed services."
"She seems to know and be known by a great many residents, many of whom I also know, several of whom being some of the finest elder members in my own congregation, Nashville’s Second Presbyterian Church."
"CFA has grown to represent many of the leading names in UK chilled food production, who employ more than 50,000 people around the UK, many of whom being in rural areas."
"He was one of four brothers, three of whom having died or departed."
"Two associate members and two alternates (none of whom having a prior or present employment relationship with the Fund) would be appointed by the Managing Director after appropriate consultation."
The design and manufacture of the prototype is predicted at a total of eight months, three months of which being design.

There are also plenty of examples with mass-noun partitive constructions, like:

If the branch is aboveground at a bank of meters, all of the service line can be a main, part of which being aboveground.

There are also a few examples that can be construed as relativization out of a supplement to the relative clause, which is a mere island violation:

"The next evening we spent with the Consul and his two pretty daughters, neither of whom being able to speak a word of English, the conversation was carried on in French."
"In this report I have dealt more in particulars for the reason there are no reports from brigade commanders, all three of whom having been captured, I reserve to myself the privilege of making such corrections as would appear right and proper when I subsequently have the opportunity to examine their reports."

These are easier for me to take, once I succeed in parsing them. And then there are some where the participial relative clause is restrictive, which seem even worse to me:

The 99ER Pairs is open to any two players neither of whom having more than 100 or more masterpoints as of November 1st of the year in which the event is held.

As in the case of "such the good son", I expect that now that I've learned that this construction exists, I'll gradually learn to accept and perhaps even use it. No, I probably won't use it, except in a sort of jokey semi-quotative manner.

[Note: in case it's not obvious, I should state explicitly that I don't see anything logically or conceptually impossible in these constructions. I've studied languages that use relative pronouns freely in analogous non-finite clauses (e.g. Latin ablative absolute constructions):

quibus	rebus	cognitis	Caesar	apud	milites	contionatur
which	things	having been learned	Caesar	to	the soldiers	gave a speech
				(B.C. 1.7)

qua	(regione)	subacta	licebit	decurrere	in	illud	mare
which	area	subdued	it will be possible	to run down	into	that	sea
				(Q.C. 9.3.13)

I just didn't realize that English was such a language -- for some people.]

I'll risk adding even more verbiage to this already over-long post, by observing again that this case provides another example of the unsolved problem of looking things up in grammar books.

CGEL devotes chapter 12 to "Relative constructions and unbounded dependencies", and chapter 14 to "Non-finite and verbless clauses". In chapter 11 ("Content clauses and reported speech"), it mentions (p. 990) non-finite clauses with question words, like

Whether hunting or being hunted, the red fox is renowned for its cunning.

Chapter 15 is "Coordination and supplementation", and it cites (p. 1359) supplements that are "comparable in function to a relative clause":

The tourists, most of them foreigners, had been hoarded onto a cattle truck.

[By the way, this must be one of the rare errors in CGEL -- "hoarded" seems to be a malapropism for "herded".]

Somewhere in the 350-odd pages of these four chapters, there is probably a discussion of participial relative clauses with subjects like "some of whom", but I couldn't find it. Nor could I figure out how to find it in the index.

Posted by Mark Liberman at 08:51 AM

Ebooks: Neither E, Nor Books

Cory Doctorow has an excellent talk/essay, "Ebooks: Neither E, Nor Books", which I completely missed, until Blogalization posted on the problems of translating the terms "public domain" and "creative commons" into Spanish and other languages whose parent cultures are outside the tradition of English common law.

Posted by Mark Liberman at 06:22 AM

May 24, 2004

Sticking in one's crawl

The American Heritage Dictionary has

IDIOM: stick in (one's) craw To cause one to feel abiding discontent and resentment.

A transcript entitled "Gen. Anthony Zinni, USMC (Ret.) Remarks at CDI Board of Directors Dinner, May 12, 2004" contains the following phrase:

They did not ever want to hear that we had a problem, something sticking our crawl, that we didn’t bring up to them, and we didn’t honestly express if we felt it had to be expressed.

(Note that this is a transcript of a recorded presentation and interchange -- specifically at this point one of Gen. Zinni's responses to questions -- and so this "eggcorn" is much more likely to have been created by the transcriber than by the speaker.)

This is the only Ghit for "sticking our crawl" (making it a .001 KGh expression, with a frequency of 0.23 GPB -- Ghits per Billion documents). There are quite a few eggcorns for this phrase with crawl in place of craw, however, as long as we retain the preposition "in":

(link) And to think that his hand picked protoge was beat out by a 7th rounder must stick in his crawl.
(link) “Win a Date with Tad Hamilton” is going to cause the girls to melt, but it’s really gonna stick in the crawl of the guys.
(link) It may stick in your crawl, but we're coming back yankee boys and girls.
(link) There are two things that realy stick in my crawl... People who aren't tolerant of other people, and that lying bastard Sayer.

There are 21 Ghits for "stick in * crawl", making it a .021 KGh pattern with a frequency of 4.9 GPB.

The pattern "sticks in * crawl" has 47 Ghits, making that a 10.97 GPB pattern.

(link) But what sticks in my crawl is the creative lethargy that takes a twenty year old song from one of the most innovative bands to emerge from the post punk, pre-techno era and just blithely gut it for fodder.
(link) This sticks in my crawl, maybe because, I refuse to hide behind anything or anyone false ...
(link) there's just something about shania and like, celine dion, that really sticks in my crawl i just can't put my finger on it
(link) Sorry for the rant, but it sticks in my crawl.

The pattern "sticking in * crawl" has 24 Ghits, making it a 5.60 GPB deal, and the pattern "stuck in * crawl" has 66, but only 23 of those are relevant -- the others are things like "stuck in the crawl space" and "stuck in the crawl of traffic".

Adding it all up, we have 21+47+24+23 = 115 Ghits, or 26.8 GPB.

(This is an upper bound, since some of the pages may have been returned for more than one of the searches).

By comparison, looking at the original patterns "stick in * craw" etc., we see that these are overall about 127 times as common:

	stick in *	sticks in *	sticking in *	stuck in *	Total Ghits	GPB
craw	2,930	6,110	653	4,860	14,553	3396.1
crawl	21	47	24	23	115	26.8
Ratio	140	130	27	211	127	127

The original idiom has a frequency of about 3,400 GPB, while the "eggcorn" idiom has a frequency of less than 30 GPB.
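For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the totals, the GPB values and the ratios from the counts in the table. The index size is the figure Google displayed at the time; the variable and function names are just illustrative choices.

```python
# Counts copied from the table above.
INDEX_SIZE = 4_285_199_774  # documents Google reported indexing at the time

counts = {
    "craw":  {"stick in *": 2930, "sticks in *": 6110, "sticking in *": 653, "stuck in *": 4860},
    "crawl": {"stick in *": 21, "sticks in *": 47, "sticking in *": 24, "stuck in *": 23},
}

def gpb(ghits, index_size=INDEX_SIZE):
    """Ghits per billion documents indexed."""
    return ghits / index_size * 1e9

# Total Ghits per variant, and the craw/crawl ratio for each search pattern.
totals = {word: sum(row.values()) for word, row in counts.items()}
ratios = {pat: counts["craw"][pat] / counts["crawl"][pat] for pat in counts["craw"]}

for word, total in totals.items():
    print(f"{word:6s} {total:6d} Ghits  {gpb(total):7.1f} GPB")
for pat, r in ratios.items():
    print(f"{pat:14s} craw/crawl ratio = {r:.0f}")
print(f"overall ratio = {totals['craw'] / totals['crawl']:.0f}")
```

Dividing both totals by the same index size leaves the ratio untouched, so the overall ratio comes out the same whether it is taken over Ghits or over GPB.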

Why are 3396.1 and 26.8 GPB better numbers than 14,553 and 115 Ghits?

For the purposes of comparing the eggcorn to the idiom, it doesn't matter -- the ratio is the ratio, and all we need to do is to say that at the moment, the original idiom seems to be a bit more than 100 times more frequent than the eggcorn. However, if we care about the actual frequencies, then we want the normalized counts. It means something to say that as of today, the idiom "stick in one's craw" has a frequency of about 3,400 GPB.

Posted by Mark Liberman at 08:18 PM

Ghits -- and Whits?

I like Geoff's proposal for using Ghits as a measure for Web frequencies, despite some necessary imprecision owing to duplicates, etc. But I think it's premature to settle on Ghits as the only such measure. It works fine for people doing searches in browsers, but for those attempting to automate such searches (e.g., using specializations of the WWW::Search Perl module, such as WWW::Search::Google or WWW::Search::Altavista) Google's Web API usage restriction (1000 queries per day) sometimes makes it difficult to conduct searches in sufficiently large quantities. (In my experience Altavista is more forgiving, at least if you have your program pause for a second or two in between HTTP requests.)
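To make the "pause between requests" idea concrete, here is a rough sketch in Python rather than Perl. It does not use WWW::Search or any real search API: the `search` argument is a hypothetical callable that takes a query string and returns a hit count, and `fake_search` is a stand-in so that the example runs without touching the network. The 1000-query cap mirrors the restriction mentioned above.

```python
import time

def run_queries(queries, search, pause_seconds=1.5, daily_limit=1000):
    """Run queries one at a time, pausing between requests and
    stopping once the daily quota would be exceeded."""
    results = {}
    for i, query in enumerate(queries):
        if i >= daily_limit:
            break                      # quota used up; leave the rest for tomorrow
        if i > 0:
            time.sleep(pause_seconds)  # be polite to the server
        results[query] = search(query)
    return results

def fake_search(query):
    """Hypothetical stand-in for a real search call: returns a made-up count."""
    return len(query)

if __name__ == "__main__":
    print(run_queries(["stick in my craw", "stick in my crawl"], fake_search, pause_seconds=0))
```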

My initial instinct was just to propose AVhits as another measure, but perusing the CPAN site, I find that there are a ton of search modules, including not only search engines like Google, Altavista, and Lycos, but also news sources like the Washington Post and Reuters, specialized article searches like PubMed, job searches, etc.

What to do? Perhaps use Whits (for "Web hits") as the more general term, with usages like "1234 Whits (Altavista)", and then consider Ghits to be an abbreviation for "Whits (Google)"?

A nice advantage of this proposal is that you can advise researchers to keep their Whits...just in case anyone wants to see them. :-)

Posted by Philip Resnik at 08:02 PM

GKP, Gh, GPB

Geoff Pullum (GKP) suggests that we start using new units Gh, KGh, MGh and so on, as a way of formalizing the counting of Google hits, now traditional among web language folk, and dubbed "ghits" earlier this year by Trevor at kaleboel. This is a terrific idea, though I'm afraid that Geoff's suggested pronunciation "gee hits" has little chance of making headway against the competitor [gɪts] (hard g, rhymes with grits).

However, I want to propose a more substantive addition to Geoff's proposal, based on the fact that to get a measure of frequency (which is what we often but not always want), we need to normalize by the size of the set searched. In this case, the set is the documents in Google's index, and Google puts the size of this set right up on its front page. As I write this, the number is 4,285,199,774. Now if we take a modest number of Ghits -- a few hundred to a few thousand -- and divide by 4,285,199,774, we'll get an unpleasantly small number. For example, {Pullum} has a count of 23,800 Gh, or 23.8 KGh, or .0238 MGh, which is a pretty respectable number. However, 23,800 divided by 4,285,199,774 is 5.554e-06, or .000005554 Ghits per document indexed.

We can deal with this the way we deal with other uncomfortably large standard measures like Farads and Lenats, by using prefixes such as micro- and nano-. Thus the frequency of Pullum becomes 5.554 microGh/document, or 5,554 nanoGh/document. In general, I think that the nano-scale measure is the right one to use for term frequencies, since the plausible range of sensible and useful frequency counts then corresponds to a sensible and useful range of natural numbers: one nanoGh/document corresponds today to about 4 or 5 Ghits; ten nanoGh/document corresponds to about 40 or 50 Ghits; a thousand nanoGh/document corresponds to about 4,000-5,000 Ghits; a million nanoGh/document corresponds to about 4 or 5 million Ghits; and so on.

We need a shorter term for this measure than "nanoGh/document" -- so I suggest that the web frequency of terms should be measured in GPB, for "Ghits per billion documents". I'll illustrate the use of this measure in some subsequent posts.
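The conversion is simple enough to sketch in a few lines of Python. The index size is the number quoted above from Google's front page; the function names are my own invention, not part of any proposal.

```python
INDEX_SIZE = 4_285_199_774  # documents in Google's index, as quoted above

def ghits_to_gpb(ghits, index_size=INDEX_SIZE):
    """Normalize a raw hit count to Ghits per billion documents."""
    return ghits * 1e9 / index_size

def gpb_to_ghits(gpb, index_size=INDEX_SIZE):
    """Invert the normalization: how many raw hits a given GPB implies."""
    return gpb * index_size / 1e9

pullum = ghits_to_gpb(23_800)
print(f"{{Pullum}}: {pullum:.0f} GPB")
print(f"1 GPB corresponds to about {gpb_to_ghits(1):.1f} Ghits at this index size")
```

Passing a different `index_size` gives the GPB for a count taken on a different date, which is the whole point of normalizing.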

The value added of a normalized measure is that it will continue to give comparable estimates of how frequent a pattern is, as the number of pages that Google indexes continues to grow. Here's a graph of (self-reported) search engine index size from 12/95 through 6/03, in terms of billions of textual documents indexed:

(KEY: GG=Google, ATW=AllTheWeb, INK=Inktomi, TMA=Teoma, AV=AltaVista).

So a bit less than a year ago, Google indexed 3.3 billion textual pages, vs. 4.3 billion now. That's about a 30% increase. At that rate of increase, the index will double in size in about 2.5 years, and will increase by a factor of 100 in about 17.5 years. In 2021, Google may or may not still be in business, and the web will certainly be organized in very different ways -- average document sizes may be quite different, to mention one trivial matter -- but to the extent that we want to make comparisons of frequencies over time -- even over a couple of years of time -- we'd better do our best to normalize counts somehow. And we linguists aspire to work on a time-scale of centuries, if not millennia.
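Here is a back-of-the-envelope version of that projection in Python. It assumes steady compound growth and treats the interval between the two index sizes as exactly one year, which is a simplification (the text says "a bit less than a year"), so the results should only be expected to land near the rounded figures above.

```python
import math

old_size, new_size = 3.3e9, 4.3e9
yearly_growth = new_size / old_size   # about 1.30, assuming the interval is one year

def years_to_multiply(factor, growth=yearly_growth):
    """Years of compound growth needed for the index to grow by `factor`."""
    return math.log(factor) / math.log(growth)

print(f"yearly growth: {100 * (yearly_growth - 1):.0f}%")
print(f"doubling time: {years_to_multiply(2):.1f} years")
print(f"time to grow 100-fold: {years_to_multiply(100):.1f} years")
```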

Of course, if we are just looking at the ratios of counts -- or frequencies -- for different cases at a given time, it doesn't make any difference whether we use counts or frequencies; the results are exactly the same. In that case, it's clearer and simpler just to use counts -- and there Geoff's Gh, KGh and so on are just the right thing. An excellent case study using such comparisons of counts can be found here at Tenser, said the Tensor. We've posted a number of examples of the same sort of analysis, for example here.

Finally, I should mention that there's another issue about frequency -- document frequency and term frequency are not entirely interchangeable measures, and the cases in which they differ more or less than expected are sometimes especially interesting. For more on this, see e.g. this reference (or wait for another post on the subject). However, GPB remains a pretty decent proxy for a measure of the frequency of bits of text -- much better, and much more accessible, than anything we had just a few years ago.

[Update: Semantic Compositions suggests using capitalization to distinguish between "raw" ghits (e.g. kGh) and validated ghits (e.g. KGH). I guess one could similarly use Gpb and GPB, though I'm skeptical that folks will be able to keep the capitalization straight.]

Posted by Mark Liberman at 07:52 PM

Of the two mild coffees

Mark's entry on "doux fard" ("doux ou fort", which the waitress translated as "mild or strong" though I would have guessed "sweet [= with sugar] or black") reminded me of another nuance related to ordering coffee in French-speaking countries: my advice to American travelers is that it's really worth learning to pronounce the difference between the vowels in du, deux, and doux. In Nantes, in 1992, failing to distinguish the first two adequately, I managed to order myself two coffees instead of some coffee. Those who know me can be sure I did not need the extra caffeine.

On the other hand, even "doux fard" is more transparent than New York City's code for coffee-ordering, where "coffee, black" means coffee with two sugars and "coffee, regular" means it has milk and two sugars. (Yeah, buddy, dat's what we considuh REG-yoo-luh. Yugodda problum widdat?) Interestingly, the page I referenced does not contain "light and very sweet" -- my preferred combination, which, when done properly, has the taste of very hot melted coffee ice cream.

Posted by Philip Resnik at 07:26 PM

KiloGhits and megaGhits: measuring web frequency

First let me say that in this post I want to make a spelling reform proposal. Previous spelling reform proposals for English have had a disastrously unsuccessful history, but I only want to respell one word, and only by a capitalization. It relates to the matter of getting a little more serious about the terminology for measurement units in practical everyday use of the web as a corpus.

The term "ghit" for "Google hit" is slowly beginning to get established, at least here on Language Log. It seems to me unfair to Google™ to use the lower-case "g"; it should be "Ghits". We should honor Google (a company that includes "You can make money without doing evil" as part of its corporate philosophy deserves our respect) the way Heinrich Hertz (1857-1894) is honored in the abbreviation "Hz", the basic unit of measurement for frequency of wave vibrations. The term for the unit should be the Ghit, with capital G, pronounced G hit ("jee hit"), and the abbreviation should be Gh.

A thousand Ghits will be a kiloGhit (under usual US capitalization conventions, KGh -- compare KB for kilobytes, KHz for kiloHertz; outside the US we may expect the spelling kGh). A million Ghits will be a megaGhit (MGh); a billion (10⁹) Ghits, which the word "the" would probably get, would be a gigaGhit (GGh). In due course, as the web gets bigger, we may come to need a term for a trillion Ghits: a teraGhit (TGh), though the web will have to get about 250 times larger before we need that term.
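As a small illustration of how the prefixes would work in practice, here is a hedged Python sketch that renders a raw count in the largest unit that keeps the number at 1 or above. The function name and the sample counts are made up for the example; the US-style capital K is used for the kilo prefix.

```python
def format_ghits(n):
    """Render a raw hit count using Gh, KGh, MGh, GGh or TGh."""
    units = ((10**12, "TGh"), (10**9, "GGh"), (10**6, "MGh"), (10**3, "KGh"))
    for factor, unit in units:
        if n >= factor:
            return f"{n / factor:g} {unit}"   # :g drops trailing zeros
    return f"{n} Gh"

for count in (7, 4_500, 616_000, 12_500_000):
    print(count, "->", format_ghits(count))
```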

A measurement in Ghits will be by definition a count of the number of web pages returned by a search pattern. A pattern gets n Ghits if and only if searching the web using Google yields n distinct web pages that contain tokens of the pattern. I do think the pages should be distinct: it seems to me that duplicate pages should in principle be eliminated if the notion of a Ghit is to mean anything. Since it is perfectly possible for a page on the web to have an identical copy at a different URL (this probably happens quite a bit), it is clearly possible for copies of pages to come up as separate hits in the list when you run a Google search. That means that the number of items on the list returned by the Google search engine will only be a rough approximation to the actual Ghit count for your search string. It also will not be a measure of the number of occurrences of the string on the web: the number of occurrences will be higher than the Gh value because a page will often contain multiple occurrences.
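The three quantities distinguished here (items on the results list, distinct pages, and occurrences of the string) can be told apart with a toy example. The sketch below uses an invented list of results in which one page has an exact copy at a second URL; it collapses only byte-for-byte duplicates, which is an assumption and much cruder than whatever a real search engine does.

```python
# Hypothetical search results as (url, page text); the second is a mirror of the first.
results = [
    ("http://a.example/page", "ghit this and ghit that and ghit again"),
    ("http://b.example/mirror", "ghit this and ghit that and ghit again"),
    ("http://c.example/other", "just one ghit"),
]

def count_measures(results, term):
    """Return (items listed, distinct pages, occurrences on distinct pages)."""
    listed = len(results)
    distinct_texts = {text for _, text in results}   # exact duplicates collapse
    distinct = len(distinct_texts)
    occurrences = sum(text.split().count(term) for text in distinct_texts)
    return listed, distinct, occurrences

listed, distinct, occurrences = count_measures(results, "ghit")
print(f"listed: {listed}, distinct pages (the Gh value): {distinct}, occurrences: {occurrences}")
```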

Notice that a pattern is a set of strings, not a string. The pattern {ghit} gets 636 Gh, most of them spurious (as Mark pointed out here). But the set {ghit, "Google hit"} is also a pattern, and it gets only 7 Gh, the number of pages that contain BOTH "ghit" and "Google hit". Those are all genuine hits for the word "ghit" that we're talking about, the one that I say should be respelled "Ghit". Switching to plurals gives us {ghits, "Google hits"}, which gets 9 Gh.

Adding strings to a pattern set either keeps the Gh the same or decreases it. There may be quite a few people using Google who do not fully understand that. It would be reasonable to think that a search using the pattern {flowers tulips daffodils pansies dahlias roses} might do even better at getting pages about flowers than {flowers} would, but that is not true; it gets far fewer pages, three orders of magnitude different: 4.5 KGh for {flowers tulips daffodils pansies dahlias roses}, 12.5 MGh for {flowers}. That's because an otherwise relevant page missing just one of the words, say "dahlias", will be ruled out under Google's search principles if "dahlias" is included in the search pattern. Remember also that putting a string of words in quotes turns them into a single word-like unit (call it a pseudoword): searching with the 2-word pattern {chocolate cake} will give utterly different results (far higher: 2.16 MGh) than searching with the 1-pseudoword pattern {"chocolate cake"} (0.616 MGh = 616 KGh).
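The behavior is easy to see on a toy corpus. The Python sketch below treats a pattern as a list of strings, counts a page only if it contains every one of them, and treats a multi-word string as a phrase (a pseudoword). The five "pages" are invented, and real Google matching is certainly more elaborate than this; the point is only that adding strings can never raise the count.

```python
# A toy corpus; the page texts are invented for illustration.
pages = {
    "p1": "flowers tulips daffodils pansies dahlias roses",
    "p2": "flowers tulips daffodils pansies roses",
    "p3": "flowers for every occasion",
    "p4": "chocolate cake recipe",
    "p5": "cake with chocolate frosting",
}

def contains(text, string):
    """True if the words of `string` occur consecutively in `text`."""
    words, target = text.split(), string.split()
    return any(words[i:i + len(target)] == target
               for i in range(len(words) - len(target) + 1))

def hits(pattern, pages=pages):
    """Count pages that contain every string in the pattern."""
    return sum(all(contains(text, s) for s in pattern) for text in pages.values())

print(hits(["flowers"]))
print(hits(["flowers", "tulips", "daffodils", "pansies", "dahlias", "roses"]))
print(hits(["chocolate", "cake"]))   # two separate words, in any order
print(hits(["chocolate cake"]))      # one quoted pseudoword
```

Page p2 drops out of the second search because it lacks "dahlias", and page p5 drops out of the last one because its two words are not adjacent.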

Posted by Geoffrey K. Pullum at 02:17 PM

More winetalk imports into coffee lingo

A couple of days ago, I failed to understand the Quebecois accent of the barista at a Montreal café ("honi soit qui joual y pense..."). So when I went back to the same place yesterday for a sandwich and a cup of coffee, I was entirely prepared to cope with the choice "doux ou fort?" -- "mild or strong" -- pronounced so that the last word sounded like standard European French "fard".

Well, I got a different server this time, and after she prepared my sandwich and picked up a coffee mug, she asked me "velouté ou corsé?". Now "velouté" means "velvety", and I know it as a term for creamy soups and smooth-tasting wines; and the only experience that I've had with the French word corsé is in the context of wine terminology, where it means "(made) high in alcohol content" or something of the sort. So this left me in a state of cross-linguistic and cross-cultural doubt.

I figured "velouté ou corsé?" was probably just another name for the mild/strong choice. But then again, the server at the same place used "doux ou fort?" for that choice just a day before, so maybe corsé meant "with a shot of brandy" or "with an extra shot of espresso" or something? Or maybe I had gotten the word entirely wrong again, due to pronunciation variation?

My uncertainty showed on my face, as usual, and so the barista tried to help me out by switching to English. "The coffee, do you want it smooth or coarse?" Now I was really confused. Being a decisive if random kind of guy, I said "corsé, s'il vous plait", as if I knew what that meant. The server didn't add anything -- brandy or otherwise -- to the coffee, and it tasted just the same as the "fort" variety had the previous day -- well-made brewed coffee, fairly strong, with the taste of a dark roast.

Pursuing the linguistic aspects later on, I learned that I had indeed apparently observed another example of the diffusion of winetalk into other areas. At least, that's what I concluded from perusing the Dictionnaire de l’Académie Française, which is now online. (More exactly, the full eighth edition is online, and the first two volumes of the ninth edition, A to mappemonde.) From this source I learned that the verb corser originated in the 16th century, as a derivative of cors, the old form of corps "body", and that it means to augment the alcohol content of a wine, to spice up a sauce, or to add complexity to a story, play or real-life situation.

The past participle corsé is from the 18th century, and the entry notes that the relevant sense of corps "body" is "the consistency of a thickening liquid" -- this is confusing, since higher alcoholic content in wine hardly thickens it, though there is a clear metaphorical sense in which a wine of higher alcoholic content has more "body". The gloss for corsé, translated, is: "having body, vigor. Of wine, high in alcohol. Of coffee, very strong. Of sauce, highly seasoned, spicy." There's no indication that a sauce is considered to be corsé merely by virtue of being thickened.

Here are the full entries:

CORSER v. tr. XVIe siècle. Dérivé de cors, forme ancienne de corps.
1. Renforcer, donner du corps à. Corser un vin, augmenter sa teneur en alcool. Corser une sauce, la relever, l'épicer. 2. Fig. Corser l'intrigue d'une pièce, en multiplier les péripéties. Corser un récit, le rendre plus captivant. Fam. L'affaire, la situation se corse, elle se complique, elle devient plus sérieuse. Péj. Corser une facture, la majorer abusivement.

CORSÉ, -ÉE adj. XVIIIe siècle. Dérivé de corps, au sens de « consistance que prend un liquide qui épaissit ».
1. Qui a du corps, de la vigueur. Un vin corsé, fort en alcool. Un café corsé, très fort. Une sauce corsée, relevée, épicée. 2. Fig. Une intrigue corsée, riche en incidents et péripéties dramatiques. Fam. et péj. Une addition corsée, exagérée, abusivement majorée. Des histoires corsées, osées, scabreuses.

From this entry, it's not clear what the order of the applications to wine, food and coffee was, but I'm guessing that coffee came later in the series, although apparently not very recently.

As for velouté, l’Académie is quite exact about what it means when applied to wine: "Vin velouté, Bon vin qui est d'un beau rouge un peu foncé et qui n'a aucune âcreté" -- "a good wine that has a beautiful red color, a bit dark, and that lacks any acridity". However, the extension to coffee is not mentioned at all. I'm not sure whether this is an oversight, or whether the usage with coffee is recent, but it seems likely that this is a case of diffusion of tasting vocabulary, whatever the timing. Here's the whole entry:

(1)VELOUTÉ, ÉE. adj. Il se dit des Étoffes dont le fond n'est point de velours et qui ont des fleurs, des ramages faits de velours. Satin velouté. Passement velouté. Étoffe veloutée.
Il se dit aussi de Certains papiers qui servent de tenture et dont les dessins, les ornements imitent le velours. Un rouleau de papier velouté.
Il signifie, par extension, Qui est doux au toucher comme du velours, ou Qui a l'apparence du velours; il se dit particulièrement de Certaines fleurs. Les pensées, les œillets d'Inde, les amarantes sont des fleurs veloutées. Peau veloutée. Teint velouté.
Vin velouté, Bon vin qui est d'un beau rouge un peu foncé et qui n'a aucune âcreté.
En termes de Cuisine, Potage velouté, Sorte de potage onctueux.
VELOUTÉ, en termes de Joaillerie, se dit des Pierres qui sont d'une couleur riche, foncée. Un saphir velouté.
VELOUTÉ s'emploie aussi comme nom masculin et signifie Douceur, caractère de ce qui est velouté. Le velouté d'une étoffe, d'une pêche.

English coarse is completely unconnected -- here's the OED's etymology for it:

[First found early in 15th c. No corresp. adj. in Teutonic, Romanic, or Celtic. The general spelling down to the 18th c. was identical with that of the n. COURSE; with that word it is still identical in pronunciation, both in standard English and in the dialects (e.g. Scotch kurs); the spelling coarse appears to have come in about the time when the pronunciation of course changed from (u), to (o). Hence the suggestion of Wedgwood that coarse is really an adj. use of course, with the sense ‘ordinary’, as in the expression of course, ‘of the usual order’. It appears to have been used first in reference to cloth, to distinguish that made or worn in ordinary course from fine cloth or clothes for special occasions or special persons; ‘course cloth’ would thus be ‘cloth of (ordinary) course’. Cf. the history of mean, and such expressions as ‘a very ordinary-looking woman’, a ‘plain person’.
Our first contemporary example of the spelling coarse is in Walton 1653 (where course however also occurs); it became frequent after 1700; course occurs occasionally down to 1800.]

The OED's etymology of English velvet also applies to the French cognate:

ad. med.L. velvetum (-ettum), also vel(l)uetum (-ettum), app. representing a Romanic type *villūtettum, dim. of *villūtum, whence med.L. vel(l)utum (velotum), It. velluto, OF. velut, -ute, Sp. and Pg. velludo, ultimately f. L. vill-us shaggy hair.

and the OED gives a gustatory sense for "velvety", which however focuses on touch metaphors and ignores color:

3.b. Smooth and soft to the taste.

The semantics of coffee is multidimensional. On the production side, there are the intrinsic qualities of the beans and their fermentation, the type and degree of roasting, the method of preparation and the ratio of water to beans, and so on. On the consumption side, there are many flavors, several textures, and these differ in degree as well as kind. But I have the impression that the pragmatic aspects are even more important -- what sort of cultural systems and social settings the speaker or writer wants to evoke...

In French as well as English, there's apparently a growing infusion of terms from wine talk into coffee lingo. This is partly because the oenophiles have plenty of terms to borrow, but I suspect that it's mostly a matter of borrowing prestige by using prestige-associated vocabulary.

[Update: There is a lovely story at Pedantry, evoked by this post, which confirms (in passing) that velouté or corsé are the traditional Montreal terms for two traditionally-available alternative forms (types? degrees?) of brewed coffee.]

Posted by Mark Liberman at 10:28 AM

May 23, 2004

Trademarks as Adjectives?

Geoff Pullum is right to point out the absurdity of trademark attorneys' fetish about always using a trademark as an adjective ("Oreo cookies" rather than "Oreos"). I've run into this notion several times in the course of preparing expert opinions in trademark cases, when attorneys have questioned my descriptions of marks as nouns or noun phrases.

But it isn't quite accurate to say as Geoff does that "The enemy [the attorneys] are laying defenses against is the danger that a trademark might fall into the public domain." This has rather to do with the distinction between terms that apply to the attributes of a certain product or service, which are protectable, and "generic" terms that merely name the class of things of which a particular product or service is an instance, which are not protectable. So you can register the name "Tru-Fit sports shoes," for example, but not "sports shoes" itself.

The line between the two is not always clear, particularly when the mark is descriptive of some unique property of the goods or services it's associated with, which is why companies are careful in trademark applications to describe their brands under descriptions that make the generic term explicit, as in "DISTANCE-COMMANDER brand remote control devices," and the like. (The upper-case letters are used in something analogous to the way they were used by Katz and Fodor, by way of suggesting that an expression is somehow detached from its ordinary English meaning.)

Where the lawyers go wrong is in associating genericness with nominal meanings, and assuming that if you use the mark only as an adjective (or more accurately, as Geoff points out, as an attributive modifier), you will have secured yourself against a competitor's claim that your mark is generic, and hence not protectable.

But of course companies do in fact routinely use their marks as nouns, and indeed, sometimes as verbs. Scott Paper Company ran an ad for Viva paper towels some years ago that showed people using the product to clean various objects and persons against a jingle that ran, "Viva the this, Viva the that, Viva the Chris, Viva the cat…" and so on. Juniper Networks is currently running a television campaign with the slogan "Juniper your net." And "googling" and other verbal forms occur a number of times on Google's own Web site. Those would be rightly described as proper verbs, at least in the sense of "proper" that's relevant to describing nouns like Chevrolet, if not the semanticist's sense of having a unique denotation.

Posted by Geoff Nunberg at 05:41 PM

Split Decision

Arnold has explained lucidly why splitting an infinitive is sometimes obligatory, as in "to more than double." But in the interest of historical accuracy (vulgarly known as "claiming credit"), I should point out that the observation, if not its elucidation, is first recorded in the usage note for "split infinitive" that I wrote for the third edition of the American Heritage Dictionary, which appeared in 1993. It reads, in part:

In We expect our output to more than double in a year, the phrase more than is intrinsic to the sense of the infinitive phrase, though the split infinitive could be avoided by use of another phrase, such as to increase by more than 100 percent.

I'm less than happy with that phrase "intrinsic to the sense of the infinitive phrase," and to tell the truth I can't recall why I used it, though in writing these notes it's always difficult to come up with an explanation that's consistent with the limits of readers' grammatical sophistication and the requirements of brevity. In any case, when we polled the dictionary's usage panel on that example, 87 percent of them found it acceptable, though I suppose you could say that the fact that fully 13 percent demurred is evidence of just how strong a hold these superstitions have on some people.

Posted by Geoff Nunberg at 05:38 PM

Owning ideas

Isaac Waisberg at ibergus points out that

Andrew Galambos argued that ideas were the primary form of property, claimed a property right in his own ideas, and required his students to agree not to repeat them. In Against Intellectual Property (PDF) Stephan Kinsella writes that Galambos "took his own ideas to ridiculous lengths dropping a nickel in a fund box every time he used the word "liberty" as a royalty to the descendants of Thomas Paine, the alleged "inventor" of the word "liberty"; and changing his original name from Joseph Andrew Galambos (Jr., presumably) to Andrew Joseph Galambos, to avoid infringing his identically-named father's rights to the name."

Galambos seems to have been unusual, not to say nuts, even before developing a neurodegenerative disorder at the age of 60 or so. According to Harry Browne's description,

He required every student entering one of his courses to sign a contract agreeing not to divulge any of the course ideas without permission from Galambos — and not even to use the ideas, in business or elsewhere, without permission. In effect, the course tuition bought you the right to become aware of the ideas, but not to use them or even to talk about them to outsiders.

This led to the humorous situation in which a graduate would rave about the course and insist that you take it — but when you asked him for examples of what was good, he would say, "Sorry, I can't tell you."

Even the MPAA and RIAA don't go that far. Of course, they might if they thought the courts would let them.

In fact, Galambos went even further, believing that those whose views were different from his were also guilty of theft of his intellectual property, compounded with mental deficiency:

He spoke frequently of one individual or another who had stolen his ideas. And if it were pointed out that the person was preaching ideas that were the opposite of Andrew's, Galambos would say the person had stolen Andrew's ideas but had gotten them all wrong.

Galambos' perspective -- that ideas are a sort of property whose distribution must be carefully monitored and controlled -- is counterproductive and incoherent, but it's not limited to crazy fringe libertarians. It seems to have been common in antiquity, for example among the Pythagoreans. Their model was the secret lore of religious cults rather than the personal property of small owners, but the result is similar. Imagine if that perspective had governed the development of our culture's intellectual life from Medieval times...

Copyright law is supposed to be about "original works of authorship fixed in any tangible medium of expression", not about ideas. Trademark law is supposed to be about consumer protection. Patent law is supposed to be about inventions, not about ideas in general. However, there's constant pressure from interested parties to extend and broaden these laws, to the point where it's now merely unusual, as opposed to completely preposterous, for someone to claim to own a word. And "business methods" patents, combined with inappropriately trained and credulous patent examiners, open a door that in principle could lead to patenting the broader applications of basic algorithms.

Dibs on modus ponens. And let me point out, following Galambos, that illogical arguments may also be violations, just incompetent ones.

Posted by Mark Liberman at 03:58 PM

The analysis of the creature

Now that the summer is starting, here's an extended quotation on learning in school and out of it, from Walker Percy's 1954 essay The Loss of the Creature (available in his collection of essays The Message in the Bottle: How Queer Man is, How Queer Language Is, and What One Has to Do With the Other).

A young Falkland Islander walking along a beach and spying a dead dogfish and going to work on it with his jackknife has, in a fashion wholly unprovided in modern educational theory, a great advantage over the Scarsdale high-school pupil who finds the dogfish on his laboratory desk. Similarly the citizen of Huxley's Brave New World who stumbles across a volume of Shakespeare in some vine-grown ruins and squats on a potsherd to read it is in a fairer way of getting at a sonnet than the Harvard sophomore taking English Poetry II.

The educator whose business it is to teach students biology or poetry is unaware of a whole ensemble of relations which exist between the student and the dogfish and between the student and the Shakespeare sonnet.

To put it bluntly: A student who has the desire to get at a dogfish or a Shakespeare sonnet may have the greatest difficulty in salvaging the creature itself from the educational package in which it is presented. The great difficulty is that he is not aware that there is a difficulty; surely, he thinks, in such a fine classroom, with such a fine textbook, the sonnet must come across! What's wrong with me?

The sonnet and the dogfish are obscured by two different processes. The sonnet is obscured by the symbolic package which is formulated not by the sonnet itself but by the media through which the sonnet is transmitted, the media which the educators believe for some reason to be transparent. The new textbook, the type, the smell of the page, the classroom, the aluminum windows and the winter sky, the personality of Miss Hawkins--these media which are supposed to transmit the sonnet may only succeed in transmitting themselves. It is only the hardiest and cleverest of students who can salvage the sonnet from this many-tissued package. It is only the rarest student who knows that the sonnet must be salvaged from the package. (The educator is well aware that something is wrong, that there is a fatal gap between the student's learning and the student's life: The student reads the poem, appears to understand it, and gives all the answers. But what does he recall if he should happen to read a Shakespeare sonnet twenty years later? Does he recall the poem or does he recall the smell of the page and the smell of Miss Hawkins?)

One might object, pointing out that Huxley's citizen reading his sonnet in the ruins and the Falkland Islander looking at his dogfish on the beach also receive them in a certain package. Yes, but the difference lies in the fundamental placement of the student in the world, a placement which makes it possible to extract the thing from the package. The pupil at Scarsdale High sees himself placed as a consumer receiving an experience-package; but the Falkland Islander exploring his dogfish is a person exercising the sovereign right of a person in his lordship and mastery of creation. He too could use an instructor and a book and a technique, but he would use them as his subordinates, just as he uses his jackknife. The biology student does not use his scalpel as an instrument, he uses it as a magic wand! Since it is a ``scientific instrument,'' it should do ``scientific things.''

The dogfish is concealed in the same symbolic package as the sonnet. But the dogfish suffers an additional loss. As a consequence of this double deprivation, the Sarah Lawrence student who scores A in zoology is apt to know very little about a dogfish. She is twice removed from the dogfish, once by the symbolic complex by which the dogfish is concealed, once again by the spoliation of the dogfish by theory which renders it invisible. Through no fault of zoology instructors, it is nevertheless a fact that the zoology laboratory at Sarah Lawrence College is one of the few places in the world where it is all but impossible to see a dogfish.

[...]

To illustrate... The student comes to his desk. On it, neatly arranged by his instructor, he finds his laboratory manual, a dissecting board, instruments, and a mimeographed list:
                    Exercise 22

     materials:   1 dissecting board
                  1 scalpel
                  1 forceps
                  1 probe
                  1 bottle india ink and syringe
                  1 specimen of Squalus acanthias
The clue to the situation in which the student finds himself is to be found in the last item: 1 specimen of Squalus acanthias.

The phrase specimen of expresses in the most succinct way imaginable the radical character of the loss of being which has occurred under his very nose. To refer to the dogfish, the unique concrete existent before him, as a ``specimen of Squalus acanthias'' reveals by its grammar the spoliation of the dogfish by the theoretical method. This phrase, specimen of, example of, instance of, indicates the ontological status of the individual creature in the eyes of the theorist. The dogfish itself is seen as a rather shabby expression of an ideal reality, the species Squalus acanthias. The result is the radical devaluation of the individual dogfish...

If we look into the ways in which the student can recover the dogfish (or the sonnet), we will see that they have in common the stratagem of avoiding the educator's direct presentation of the object as a lesson to be learned, and restoring access to sonnet and dogfish as beings to be known, reasserting the sovereignty of knower over known.

In truth, the biography of scientists and poets is usually the story of the discovery of the indirect approach, the circumvention of the educator's presentation--the young man who was sent to the Technikum and on his way fell into the habit of loitering in book stores and reading poetry; or the young man dutifully attending law school who on the way became curious about the comings and goings of ants ...

However it may come about, we notice two traits of the second situation: (1) an openness of the thing before one--instead of being an exercise to be learned according to an approved mode, it is a garden of delights which beckons to one; (2) a sovereignty of the knower--instead of being a consumer of a prepared experience, I am a sovereign wayfarer, a wanderer in the neighborhood of being who stumbles into the garden.

One can think of two sorts of circumstances through which the thing may be restored to the person. (There is always the direct recovery: A student may simply be strong enough, brave enough, clever enough to take the dogfish and the sonnet by storm, to wrest control of it from the educators and the educational package.) First by ordeal: The Bomb falls; when the young man recovers consciousness in the shambles of the biology laboratory, there not ten inches from his nose lies the dogfish. Now all at once he can see it, directly and without let, just as the exile or the prisoner or the sick man sees the sparrow at his window in all its inexhaustibility; just as the commuter who has had a heart attack sees his own hand for the first time. In these cases, the simulacrum of everydayness and of consumption has been destroyed by disaster; in the case of the bomb, literally destroyed. Secondly, by apprenticeship to a great man: One day a great biologist walks into the laboratory; he stops in front of our student's desk; he leans over, picks up the dogfish, and ignoring instruments and procedure, probes with a broken fingernail into the little carcass. ``Now here is a curious business,'' he says, ignoring also the proper jargon of the specialty. ``Look here how this little duct reverses its direction and drops into the pelvis. Now if you would look into a coelacanth, you would see that it--'' And all at once the student can see. The technician and the sophomore who loves his textbook are always offended by the genuine research man because the latter is usually a little vague and always humble before the thing; he doesn't have much use for the equipment or the jargon. Whereas the technician is never vague and never humble before the thing; he holds the thing disposed of by the principle, the formula, the textbook outline; and he thinks a great deal of equipment and jargon.

But since neither of these methods of recovering the dogfish is pedagogically feasible--perhaps the great man even less so than the Bomb--I wish to propose the following educational technique which should prove equally effective for Harvard and Shreveport High School. I propose that English poetry and biology should be taught as usual, but that at irregular intervals, poetry students should find dogfishes on their desks and biology students should find Shakespeare sonnets on their dissecting boards ...

Percy presents a compelling case for John Dewey's ideas about education (though I suppose that he would attribute them to Aquinas). Diane Ravitch in Left Back has presented an equally compelling argument that Dewey's ideas have had a disastrous effect on American education in the past century. I find both arguments persuasive.

I've been thinking about these issues recently, in connection with our society's failure to teach the basic ability to analyze and describe the facts of language and speech. According to my investigations, less than one percent of American college students now take a course in which they might hope to learn to analyze the structure of a sentence or describe the pronunciation of a word. The percentage who acquire these skills in high school (where the basics should really be taught) is not known to me, but it must be lower.

Morning blogs from B to D

The language-related blogosphere continues to expand, at a rate which makes it hard to keep up (especially from a hotel room with erratic internet access, such as the one that I now find myself in). Here are some things I've enjoyed reading this morning, just browsing our blogroll from B to D (I'll start from some other point in the alphabet tomorrow):

The origin and development of the Korean morpheme jjang [via Blinger].

The Electronic Introduction to Old English [via cannylinguist].

Several discussions of the legend that German once almost became the official language of the U.S. [via carob].

Random Engrish slogans [via Chainik].

A self-conscious but communicatively effective Escher sentence, making a point at Close Range.

Ruminations on Oxfam, Heidegger and the "Standard sub-Saharan atrocity discount (SSAD)" at desbladet.

Some commentary by desultor on Patrick O'Brian: a phrasal pattern also found in Boswell and Nabokov; comparative ghits for versions of a proverbial phrase; and echoes of summat in George Eliot and The Office.

A Middle Scots poem on migraine at Digital Medievalist.

How to inflect court martial, and where MacGuffin and maguffin come from, at The Discouraging Word.

Posted by Mark Liberman at 05:18 AM

May 22, 2004

Beets and Bitch

The word in the Stony Creek dialect of Carrier for beets, which are not indigenous to the area, is [ɬits'e], obviously not borrowed from English or French. The other meaning of [ɬits'e] is "bitch", that is, "female dog". [ɬi] is "dog" and [-ts'e] is a suffix meaning "female". It turns out that this isn't a coincidence. Carrier doesn't contrast the vowels [i] (as in beets) and [ɪ] (as in bitch), nor does it allow the palatal affricate [ʧ] represented by the <ch> of "bitch" in syllable final position. From the point of view of Carrier speakers, the English word beets was the same as the word for "female dog", so they named beets "female dog" in Carrier. The Carrier word is based on an English pun!

This isn't the only word for beets in Carrier. The same dialect also has [bʌzkai naxwʌdleh] "its blood flows out of its surface", a description of the way red color seeps out of beets. This appears to be the earlier term, with the newer one perhaps created when the children started to go to school and learned English. In the Stuart/Trembleur Lake dialect there is a disused term that also refers to the blood, [ʔʌzkaiɣih] "blood root", but the current term is [lʌsuʧam dʌnʌlk'ʌn] "red turnip", where [lʌsuʧam] is a loan from French le chou de Siam, literally "Siamese cabbage".

Posted by Bill Poser at 10:41 PM

Trademark grammar

I am very grateful to Keith Ivey for pointing out to me that the International Trademark Association (INTA) is the source of the strange Microsoft prescriptivism about trademarks. The INTA is fuller in its list of prohibitions — and apparently nuttier. Their grammatical prescriptions seem at first sight worse than just insane, because at one point they're self-contradictory. But one can make some sense of it all. Let me explain.

Here are some quotes, with my comments following each.

NEVER use a trademark as a noun. Always use a trademark as an adjective modifying a noun.

EXAMPLES:

LEGO toy blocks
Amstel beer

NEVER modify a trademark to the plural form. Instead, change the generic word from singular to plural.

EXAMPLES:

tic tac candies, NOT tic tacs
OREO cookies, NOT OREOS

Now, to begin with, they cannot possibly mean that you should never use a trademark as a noun. Of course you are not misusing a trademark if you say or write that your kid is crazy about Lego, or that your favorite beer is Amstel. As Barbara Scholz pointed out to me, one only has to look at the practice in advertising campaign slogans:

I coulda had a V8! for the V8 brand of vegetable juice (a V8 is a noun phrase in which a is the indefinite article and V8 is the head noun);
Have you driven a Ford lately? for the Ford motor company (again, a is the indefinite article and Ford is the head noun);
Pardon me, do you have any Grey Poupon? for Grey Poupon mustard (any is a determinative functioning as determiner of the nominal Grey Poupon, which is a proper name with a structure comprising an attributive adjective modifying a proper noun);
This is not your father's Oldsmobile in a campaign to rejuvenate the image of Oldsmobile — an unsuccessful one, since the very last car under the name was produced on April 29, 2004 (your father's is a genitive noun phrase functioning as determiner and Oldsmobile is the head noun);
Don't squeeze the Charmin in a campaign that showed women in supermarkets ecstatically squeezing packages of a super-soft toilet tissue (the is the definite article, Charmin is the head noun);

— how could anybody think it was incorrect to use trademarks as nouns, given that millions of dollars are committed to the details of big advertising campaigns and the company controls every line and every word of the copy used? The answer: even quite educated people today know so little about grammar that often they aren't really sure what's a noun and what's not, what's a tense and what's not, what's a passive and what's not, etc., etc.

Notice also that INTA says a trademark must always be used as an adjective. What they mean actually has nothing to do with adjectives. Adjectives are words like good, big, soft, reddish, etc. They are often used as attributive modifiers of nouns: good reasons, a big company, etc. But other things can be used as attributive modifiers. Proper nouns can: when we talk about London fog, we are using London (a proper noun) as an attributive modifier of the noun fog. That doesn't mean London is an adjective. It isn't. It's the name of a city. Adjectives never name cities. And adjectives are virtually never trademarked. When we use the expression a London Fog raincoat, we use London Fog (a trademark, with the form of a nominal construction, consisting of a proper noun attributive modifier and a common noun) as an attributive modifier of the noun raincoat. What INTA is saying is that it wants you to always use trademarks as attributive modifiers.

But what the INTA people mean is more subtle than they know how to say, so they get it all wrong. The enemy they are laying defenses against is the danger that a trademark might fall into the public domain. For fear of this (and it can happen), they want to forestall the conversion of certain proper noun trademarks into common count nouns. The worry is that the next stage after writing "Tic Tacs" will be writing "tictacs", and soon people will be referring to some other company's little white mints as tictacs, and soon the trademark might become unprotectable and its value be lost. It would just be a two-syllable word in the dictionary, with a small t, meaning little hard white mint candy.

But notice, none of this is relevant to other products, for example, cars: Porsche is surely very happy for you to praise Porsches as much as you like, calling them Porsches. INTA's intent is clear, but what is actually stated about grammar on their website and in their brochure is nothing like what it is trying to say.

NEVER modify a trademark from its possessive form, or make a trademark possessive. Always use it the form it has been registered in.

EXAMPLES:

Jack Daniel's whiskey, NOT Jack Daniels whiskey
Levi's jeans, NOT Levi jeans

Here they mean only half of what they say. Removing the genitive 's from a trademark that has it built in is illicit: some companies register a genitive form like Levi's and some register a plural like Tums, and there's a difference. But since every regular noun has a genitive form, every trademark that has the form of a singular noun has a genitive form too: My Porsche's top speed is 130mph is not rationally regarded as a misuse of a trademark. As I pointed out here, Microsoft publishes instructions saying you should never use its trademarked words in the genitive case, and then violates that precept on its own mission statement web page.

NEVER use a trademark as a verb. Trademarks are products or services, never actions.

EXAMPLES:

You are NOT xeroxing, but photocopying on a Xerox copier.
You are NOT rollerblading, but in-line skating with Rollerblade in-line skates.

Here they mean you should never use a noun trademark as a verb, and again that's because its loss of proper noun status might be the start of its falling into the public domain. (Whether a trademark can ever actually be a verb is an interesting question; the old slogan Motorists wise, Simoniz suggests you are supposed to hear Simoniz as a verb, as if it were spelled simonize; but the situation is murky, and I can't find any clear cases of verb trademarks.)

Incidentally, what INTA actually says in the above quote commits the familiar error of confusing classes of words with types of thing. Trademarks are products or services, never actions, they say; and it is true that a trademark is never an action, but then it is also true that a trademark is never a product or a service. A trademark is a word over which a corporation claims some rights. I carefully use font face to draw the distinction here: Jack Daniel's is a product (a bourbon), and not a trademark; Jack Daniel's is a trademark — you can't serve it on the rocks.

Finally, INTA offers a syntactic test for you to use:

A good test for proper use is to remove the trademark from the sentence and see if the sentence (generic) still makes sense. If it does not then you are potentially using the mark as the descriptive term or as a verb and not as an adverb followed by a noun as you should.

If you would like more information on this subject, we suggest INTA's A Guide to Proper Trademark Use.

Here's where they contradict themselves: they said above that a trademark must always be used as an "adjective" (they meant attributive modifier). Now they say a trademark must be used "as an adverb followed by a noun". It's just a proofreading error. But it makes you wonder just how much the people who wrote and checked this page must know about grammar if the mistake doesn't leap out at them like it did at me.

Their test is in principle a good one for their purposes. Attributive modifiers are nearly always optional, so if you have dutifully used your trademarks as attributive modifiers throughout, you should find that when you leave the trademark words out, things still make sense without any change in the grammatical structure of what is said. Let's try this test on some advertising copy. I went to the Kraft Foods promotions page and tested the sentences on the left below (results of the experiments are on the right):

Find why Philly is too good to be true and how you could win instantly in store!	*Find why is too good to be true and how you could win instantly in store!
You could win 6 Giant® mountain bikes for your family with Jack's Pizza®.	*You could win 6 mountain bikes for your family with.

Next I visited Walmart, where at the page about their gun sale policy I did these experiments:

Customers have depended on Wal-Mart for more than 40 years to supply sporting goods merchandise and equipment, including firearms.	*Customers have depended on for more than 40 years to supply sporting goods merchandise and equipment, including firearms.
Wal-Mart is absolutely committed to provide firearms in the most responsible manner possible.	*Is absolutely committed to provide firearms in the most responsible manner possible.

On to the Toyota company. I noticed they actually had a site called buyatoyota.com, suggesting immediately that they don't object to your calling a Toyota a Toyota; and sure enough, check the results of this experiment in trademark omission:

BuyaToyota.com helps you find and purchase the Toyota you want quickly.

*Helps you find and purchase the you want quickly.

I won't go on with this demonstration that the experiments repeatedly fail; I am shooting fish in a barrel here (with firearms provided in the most responsible manner possible).
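For anyone who wants to run more of these experiments without doing the deletions by hand, here is a minimal sketch of the omission step in Python. The function name omit_trademark is my own invention, not anything INTA supplies, and the code only deletes the mark and tidies the spacing. Judging whether what remains "still makes sense" is left to the reader, since that is the part that needs a grammar.

```python
import re


def omit_trademark(sentence: str, mark: str) -> str:
    """Delete every standalone occurrence of `mark` from `sentence`.

    Hypothetical helper for illustration only: it performs the deletion
    step of the omission test and does not judge grammaticality.
    """
    # Match the mark only when it is not embedded in a longer word,
    # so "Toyota" inside "BuyaToyota.com" is left alone.
    pattern = r"(?<!\w)" + re.escape(mark) + r"(?!\w)"
    stripped = re.sub(pattern, "", sentence)
    # Collapse the doubled spaces left behind by the deletion.
    stripped = re.sub(r"\s+", " ", stripped).strip()
    # Remove any space stranded in front of punctuation.
    stripped = re.sub(r"\s+([.,;:!?])", r"\1", stripped)
    return stripped


if __name__ == "__main__":
    # An attributive use survives the test; a head-noun use does not.
    print(omit_trademark("LEGO toy blocks", "LEGO"))
    print(omit_trademark("Have you driven a Ford lately?", "Ford"))
```

The first call leaves a well-formed phrase because the mark was an optional attributive modifier; the second leaves a determiner with no head noun, which is exactly the failure shown in the experiments above.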

The bottom line: it is raving, wild-eyed lunacy to say that no trademarks are correctly used as nouns or that they always have to be attributive modifiers. No company respects these principles; no company could. Yet the people at INTA aren't raving, wild-eyed lunatics. It's just that like an enormous percentage of the educated population of the USA, they know virtually no grammar at all. The schools aren't teaching it, and the linguistics departments that know about it aren't reaching enough college-going students.

Posted by Geoffrey K. Pullum at 11:42 AM

Rodent grammar

Neque Volvere Trochum at entangledbank points us to Harrap's Rat-English Dictionary, advertised as having "[o]ver 5,000 references, 80,000 translations and hundreds of new expressions", as well as "usage notes to avoid being bitten, and slang signals on a wide variety of subjects".

There's a sample page of entries, such as

eee ee ee [iii:'ii:i] v. to go away; eee ee ee eep! get out of the hammock now, it's my turn.
eeeee eee ee [iiiii:iii:i] n (address to sovereign) sir, sire, your worship; eeep eep ip eeeee ee ee; I appreciate your kindness in peeing on my head, sir.

This page is one of the examples of "Rat Humor" on Anne's Rat Page. Anne has serious pages on Norway Rat Behavior Repertoire and Norway Rat Vocalizations, with spectrograms as well as sound clips. There is also a serious glossary (of human terms for rat behaviors and characteristics), covering interesting things like sidling and bruxing.

Since Anne has such an active interest in the structure of communication, as well as a B.S. in Biological Sciences from Stanford and a Ph.D. in Animal Behavior from U.C. Davis, it's really too bad that she never took a linguistics course. At least, I infer this sad state of affairs from one unfortunate word in this "Sample entry from CD-ROM pronunciation guide" at the bottom of her Harrap's Rat-English Dictionary page:

eeeee eee eeee

SYLLABICATION: eee•eee•ee

PRONUNCIATION: iii'iii-i

PHRASE: That's my pea!

ETYMOLOGY: From high classic Rattus [1.75 million BCE]: eeeee, mine; + ee-e, small round; + ee-ee; give me, 2nd person singular, imperative tense of ee-e-e, to give, v. t.

2nd person singular, imperative tense?

The imperative is a "tense" roughly in the sense that a ferret is a rodent, or a frog is a reptile. That is, it isn't.

The American Heritage dictionary observes that (this kind of) tense is traditionally

1. Any one of the inflected forms in the conjugation of a verb that indicates the time, such as past, present, or future, as well as the continuance or completion of the action or state. 2. A set of tense forms indicating a particular time: the future tense.

though the "continuance or completion" part is often called aspect:

A category of the verb designating primarily the relation of the action to the passage of time, especially in reference to completion, duration, or repetition.

The "imperative", on the other hand, is traditionally viewed as a mood:

A set of verb forms or inflections used to indicate the speaker's attitude toward the factuality or likelihood of the action or condition expressed. In English the indicative mood is used to make factual statements, the subjunctive mood to indicate doubt or unlikelihood, and the imperative mood to express a command.

Now, the gloss for mood given above is almost as problematic as the definition of a noun as referring to a "person, place or thing", whose faults were discussed by Geoff Pullum in a recent posting here. But no real Harrap's editor, I hope, would treat the imperative as a "tense", especially in philological explication of an etymology.

Seriously, it's obvious that the author of these pages is an acute observer and a careful writer, and it really is too bad that she hasn't learned the language of grammatical description, whether traditional or modern. While I very much doubt that grammatical terminology has any specific application to the behavior of rats, the general idea of formal combinatoric analysis of behavioral sequences is genuinely applicable.

My main exposure to grammatical analysis of rodent behavior has been via the stochastic grammar of cephalocaudal grooming in mice, but Anne explains that "in rats, most sequences appear to be loosely organized", so that perhaps this is not such an interesting subject in that species. As an alternative, let me suggest applying grammatical methods to studying the political economy of allogrooming, as Pavel Stopka and David Macdonald did in "The Market Effect in the Wood Mouse" (Ethology, vol. 105 no. 11 p. 969 (1999)):

Although grooming is reciprocal in this species, it is asymmetrical in that males groom females more often than vice versa. This grooming asymmetry was studied using Markov chain analysis for grooming sequences in two captive wood mouse colonies, and transition rates were used to represent motivation in both sexes. Grooming sessions were often initiated by a male's attempt to sniff an immobile female's anogenital region, while the female would immediately react by avoiding or biting the male. In order to entice the female to remain, the male would begin grooming the female's head and shoulder area, surreptitiously and consistently grooming downwards towards the female's anogenital region, until she would again terminate such contact either by avoiding or biting the male. While, therefore, the male's tendency to sniff the female's anogenital region was stronger than his tendency to groom her, the female's tendency to terminate the male's naso-anal contact was much stronger than her tendency to terminate his grooming bouts. If the male did not initiate grooming after the female terminated naso-anal contact, she avoided further contacts and escaped. ... This paper therefore provides a new view of the regulation of grooming: grooming is not simply reciprocal with both participants concerned that the other does not 'cheat' (e.g. tit-for-tat (TFT)-like strategy), rather grooming is a commodity which can be bartered against female reproductive information or matings.
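For readers who haven't met Markov chain analysis of behavioral sequences, here is a minimal sketch in Python of the basic bookkeeping: count how often each behavior is followed by each other behavior, then normalize the counts into transition probabilities. The behavior labels and the two toy sequences are invented by me purely for illustration; they are not data from the Stopka and Macdonald paper, and I make no claim that this is the authors' actual procedure.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities.

    `sequences` is a list of lists of behavior labels. Returns a dict
    mapping each behavior to a dict of {next_behavior: probability}.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each behavior with the one that immediately follows it.
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1

    probs = {}
    for state, followers in counts.items():
        total = sum(followers.values())
        probs[state] = {nxt: n / total for nxt, n in followers.items()}
    return probs


# Toy sequences with hypothetical labels (NOT observations from the paper).
toy_sequences = [
    ["sniff", "avoid", "groom", "groom", "sniff", "avoid"],
    ["sniff", "bite", "groom", "sniff", "avoid"],
]

if __name__ == "__main__":
    for state, followers in transition_probabilities(toy_sequences).items():
        print(state, followers)
```

Comparing the estimated rows for different behaviors (or for the two sexes, given separately labeled sequences) is the kind of thing that lets one talk about one tendency being "stronger" than another.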

[Update: the Harrap's Rat Dictionary page has been fixed! It now (within a few hours of my original post) reads "imperative mood" rather than "imperative tense". Geoff Pullum emailed to point this out to me, and added that "[t]he speed of publication and promulgation and criticism and alteration and improvement in the blogosphere is really breathtaking". Indeed it is -- though unfortunately Anne's site doesn't (yet?) include a weblog.]

Posted by Mark Liberman at 06:13 AM

With a magical gesture, reveal the vanish...

Mark Liberman has pointed out, in response to a comment at Simple Bits, that the use of reveal as a noun is nothing new. If you find that odd, here's an amuse (or perhaps a disturb): it turns out that a great many English verbs used to make perfectly fine nouns, but somehow lost this ability. Some of my personal favorites include announce, arrive, remove, divulge, and annoy.

In Old English, there were numerous ways to make nouns from verbs. Some of these involved special noun-creating endings, like -nes (as in ābrēotnes 'extermination', from ābrēotan 'destroy'), -ung (as in smirung 'anointment', from smierwan 'smear, anoint'), and so on. Some were rather irregular (dǣd 'deed', from dōn 'to do'). But perhaps the most common way to form abstract nouns from verbs was to take the verb root and simply slap on case endings directly, often creating a strong feminine noun (ending in -u) or a strong masculine noun (no ending at all). For example, the verb faran 'to travel, go' had two nouns derived from it: fær and faru, both meaning 'way, going, journey'.

Over time, the -u endings fell off (along with most of the verb endings), leaving many nouns that looked more or less identical to their corresponding verbs. Thus was born a robust process of "zero derivation." As a flood of French verbs entered the language, they acquired noun forms by zero derivation, too. Many of these deverbal nouns (of both English and French origin) have stuck with us, and we don't bat an eye at them (turn, slide, ride, bite, ...). But somewhere along the way, a bunch of deverbal nouns got lost. For example, Shakespeare writes in Hamlet IV.5 81: "Next, your own son gone, and he most violent author of his own just remove", where remove means death. (The OED defines this use of remove as "the act of removing a person by death; murder"). Remove just can't be used as a noun this way anymore.

The OED is a treasure trove of other examples:

adorn: 1596 SPENSER F.Q. III. xii. 20 Without adorne of gold or silver.
disturb: 1597 DANIEL Civ. Wars VI. xlvii, From all Disturbs to be so long kept free.
arrive: 1615 CHAPMAN Odyss. II. 379 His wife should little joy in his arrive.
destroy: 1616 LANE Cont. Sqr.'s T. IX. 476 The sweete boy, wailinge most rufullie his frendes distroie.
relate: 1651 Fuller's Abel Rediv., Beza (1867) II. 218, I am he To whom an infant can no relate be.
pray: 1654 GAYTON Pleas. Notes II. v. 54 Father, we are for fighting, not for pray.
recede: 1658 SIR H. SLINGSBY Diary (1836) 202, I shall now take occasion to make my recede from the world.
announce: 1787 J. NICHOLS in Welsted Wks. p. xxvi, This friendly announce is somewhat premature.
ask: 1781 T. TWINING Let. 8 Dec. in Recreat. & Stud. (1882) 108, I am not so unreasonable as to desire you to..answer all my asks.
think: 1870 MRS. WHITNEY We Girls ii, Ruth did talk..when she came out of one of her thinks.
amaze: 1880 HOWELLS Undisc. Country v. 85 He stared at Ford in even more amaze than anger.

Others include depart, reduce, produce, maintain, retain, detain, deploy, retire, acquit, greet, defend, divulge, startle, entertain, amaze, and vanish.

In addition to these examples, there are also some "frozen" deverbal nouns, which still occur, but only in a particular phrase or idiom: employ (as in "in his employ"), compare ("beyond compare"), fancy ("flights of fancy"), and say ("have one's say"). These, too, seem to go back to more general uses, such as:

say: 1885 LYALL Anc. Arab. Poetry 21 There rises a lord, to say the say, and do the deeds, of the noble.

The real ask, then, is not how "the big reveal" got its make, but rather, why is it such a startle? What caused the vanish of so many perfectly good nouns?

Incidentally, it's possible that all those live reveals on Trading Spaces are a rather different phenomenon. As a parallel, consider the word vanish. Until recently, this verb could also occur as a noun (1872 'MARK TWAIN' Roughing It iii. 33 "He..left for San Francisco at a speed which can only be described as a flash and a vanish."). Nowadays, however, the only people using it as a noun are magicians, who use it quite routinely to describe disappearing acts. For instance (quoted from a page on magic for kids): "With a magical gesture reveal the vanish of the coin, then make it re-appear wherever you want." It seems that the magician's use of vanish is meant to indicate the act, or performance, of causing something to vanish. The reveal seems to have a similar flavor; it sounds like a bit of reality-TV board-room lingo for a packaged act, that has slipped onto the screen.

Then again, recent creations like freshman admits (=admittees), new hires, and the like, might just show that zero derivation to create nouns from verbs in English is not totally dead, it's just having a dwindle.

Posted by Adam Albright at 05:15 AM

May 21, 2004

The secret of "reveal" revealed!

Following up on this post, Michael Albaugh emailed some additional information on reveal as a noun:

... it is commonly used in carpentry or cabinet making, indicating the portion of one piece of trim "revealed" by the displacement of another.

He's absolutely right -- I remember it well from shop class. Not only that, but the American Heritage dictionary tells us that reveal can also be

1. a. The part of the side of a window or door opening that is between the outer surface of a wall and the window or door frame. b. The whole side of such an opening; the jamb. 2. The framework of a motor vehicle window.

However, if the AHD is right, this is NOT a nominalization of the verb reveal. For the verb, the AHD gives the expected etymology

Middle English revelen, from Old French reveler, from Latin revēlāre : re-, re- + vēlāre, to cover (from vēlum, veil).

However, for the noun, the AHD gives the etymology

From Middle English revalen, to lower, from Old French revaler : re-, re- + avaler, to lower (from a val, down : a, to (from Latin ad; see ad–) + val, valley; see vale1).

This is like the two sources for pole discussed in this post. Live and learn.

Posted by Mark Liberman at 06:53 PM

Not for commitment-phobes: your relationship number

I've just noticed that my bank account doesn't have an account number any more. You might have thought that would make it hard for them to keep track of the checks. But it's not that there isn't a number. It's just that it's now called a relationship number. I don't just have an account, apparently, I have a relationship. Part of a plan to make me love them more, I suppose, like my mortgage company's curious decision to send me execrable poetry in the mail. Those with commitment phobia are advised to stay away from Wells Fargo Bank; they aren't content with a casual thing, they want to get serious. What I need to know is whether they are interested in me for myself, or whether they're just after my money.

Posted by Geoffrey K. Pullum at 12:52 PM

Doux fard

I'm in Montreal for NAPhC3. After I arrived yesterday afternoon, I met Charles Reiss and some others, and we went out for lunch. It was the kind of place where you order at a counter, get your food, and then go sit down. After the meal I went back for a cup of coffee, and the young woman behind the counter asked me "doux fard?"

Now, my French was never terrific, and is pretty rusty now, but I know fard as a word meaning "make-up". There didn't seem to be anything special about hers, and I wasn't wearing any, and anyhow the whole topic of the gentleness of make-up was not relevant to our exchange. So I thought to myself, "I wonder if there's another meaning that I'm forgetting -- or could fard be Quebecois barista slang for some coffee-related substance or state, like the American harmless for 'decaf skim latte', or wet for 'without foam'? Since doux can also mean 'sweet', maybe she's asking me whether I want cream and sugar?"

I looked puzzled, so she pointed to the cup in her hand, and repeated "pour le café, doux fard?" I was thinking to myself "maybe it's d'ou not doux? but that doesn't help..." and I muttered something like "je comprends pas", not that it wasn't obvious, so she switched to English: "the coffee, do you want mild or strong?"

"Doux ou fort." Oh.

I guess that I should take advantage of my visit to learn something about the Quebecois vowel space. More on this later.

[Update: this table indicates that shifting the vowel of fort to sound like the vowel of standard European French fard is normal in Quebecois (the column abbreviations are Fes = français européen standard; Fqs = français québécois standard; Fq = français québécois non standard):

The table comes from this page, which gives a broader characterization of "les aspects phonétiques les plus répandus du français québécois" ("the most widespread phonetic aspects of Quebecois French").]

Posted by Mark Liberman at 07:45 AM

May 20, 2004

Seven years waiting for a reply on Ebonics

Seven years is probably long enough to wait for a reply to a letter before concluding that there will never be a reply. April 23 passed this year, like six before it, with still no reply to a letter about African American Vernacular English that I sent on that date in 1997 to the well known African American columnist William Raspberry (Pulitzer Prize nominee and recipient of several honorary doctorates), who writes for the Washington Post. So I think it's time to just post the letter on Language Log for others to see it.

I wrote the letter a few months after the disastrous press reception of the Oakland, California, school board's declaration on the possible educational importance of classroom use of what they most unwisely called `African Language Systems'. At just one point they also mentioned the name `Ebonics', and that was the name the press picked up on as they went into an orgy of ridicule and outright hostility. (If you don't know the story, John Rickford's writings might be the place to start reading.) At the time, my own unwise practice (as in my commentary "Language that dare not speak its name" in Nature 386, 27 March 1997, 321-322) was to call the language in question African American English (AAE), since that was shorter than the familiar linguist's term African American Vernacular English (AAVE). What's unwise about that is that of course millions of African Americans don't speak the language at all; it is a vernacular dialect restricted mostly to uneducated residents of segregated areas. (I have corrected AAE to AAVE in the letter below.)

The occasion for writing the letter was that I had just seen the remarkably unfunny humorous column Raspberry published in the Washington Post on December 26, 1996, right after the Oakland story broke. Like everyone else, he was indirectly mocking the Oakland Unified School District and the idea of making an unprejudiced judgment about the sociolinguistic situation of many of Oakland's black schoolchildren, and directly mocking `Ebonics'.

Raspberry's column was bad, I mean ba-a-a-ad, in the Standard English sense, not the AAVE slang sense. The column was probably produced hastily, perhaps during what may have been a bibulous Christmas Day. I rather hope he is ashamed of it. I won't explain all of the column, but basically it involved an imaginary alter ego of Raspberry himself getting into a cab in Washington DC and having a conversation, full of misunderstandings, in which the cab driver speaks AAVE and Raspberry does not. For example, the cab driver says 'Sup? (for "What's up?") as a greeting and the fictional Mr Raspberry thinks he is being asked if he would like to sup, so he says he has already dined (I did warn you that it was not funny). At the end, when he learns there is money in giving classes on AAVE, the fictional Mr Raspberry suddenly starts speaking it himself (as if all black people really do know it deep down).

My letter about this lame column was relatively friendly, though, because I sought information. I wanted to know something about him. I've shortened the letter a little below, removing some further boring friendlinesses that did not advance the main content (stuff about how it was ironic and perhaps apparently presumptuous for a white linguist born in Britain to be writing to an African American journalist about the grammar of AAVE). But perhaps the reason my letter met with seven years (so far) of stony silence was that nobody likes to be accused of being linguistically clueless, and what I had to say to him was at root, however politely cloaked, that he didn't know a single thing about the language he was mocking. This is extraordinary, because he was born in Okolona, Mississippi, in 1935, and I would have thought that would have put him in a monolingual AAVE community, but as I argue in the letter, it's as if he had never heard the language at all. Here's the letter, placed on Language Log for the record in case it has more interest to you than it apparently did to its distinguished addressee.

Stevenson College
University of California, Santa Cruz
Santa Cruz, CA 95064

April 23, 1997

Mr William Raspberry
The Washington Post
1150 15th Street NW
Washington, DC 20071

Dear Mr Raspberry,

I have only just seen for the first time your column of December 26, 1996, "To Throw in a Lot of 'Bes,' or not? A conversation on Ebonics." There was one thing about it that fascinated me.

I'm a linguist, and one of those who claim that African-American Vernacular English (AAVE) is a consistent language with its own rules. What I noticed was that your piece includes two paragraphs -- just 32 words -- in which the characters speak entirely in AAVE. But there appear to me to be grammatical errors in the AAVE. Not cases of difference from standard English, which is of course what correct AAVE frequently has, but rather cases of difference from AAVE as I know it. Let me run through them.

First, you have the cabbie saying What you be talkin' 'bout, my man?. But the uninflected be of AAVE is normally a habitual aspect marker. Your cabbie does not mean 'what do you habitually talk about?', he means 'what are you talking about right now?'. Surely the normal AAVE for this would be What you talkin' 'bout?, with the zero copula, not the uninflected be.

Second, you have the cabbie saying I don't be offerin' you my grub. Again, this is clearly present progressive -- it is what is happening right there and then he is referring to, not some habitual state of affairs. So the be seems wrong here too. Maybe utterances like I don't be lying do sometimes occur in AAVE for 'I am not lying', but I've never heard the construction or encountered any reference to it in the research literature; whereas I've heard I ain't lyin' hundreds of times (as in the blues song 'I Put a Spell on You', where I ain't lyin' is used as a perfect rhyme for because you['re] mine; he doesn't sing, *I don't be lyin').

And in the same example, the don't seems likewise wrong. AAVE is like Finnish in that it has a separate copular verb of negation meaning 'not be', pronounced ain't, and you need that here. The normal way to say 'I am not offering you my food' in AAVE (if AAVE speakers really do use the word grub for a fish fillet and small fries, a point on which I will trust you) is I ain't offerin' you my grub. (For a more distinctively AAVE utterance you could have had your cabbie say, I ain't offerin' you no grub, with the multiple negation marking that is such a distinctive feature of AAVE, but which doesn't occur in your AAVE dialog.)

The fourth apparent error is also in the cabbie's speech. He says, I be sayin' hello. Once more, 'I habitually say hello' does not fit the context; the cabbie is explaining that his initial utterance, 'sup, was a greeting. It is quite unusual to find I be sayin' with the meaning 'I am saying'. There is an utterance containing they be sayin' quoted from the speaker called Larry in Bill Labov's paper 'The logic of nonstandard English', and it is quite clearly habitual in meaning. There are no occurrences anywhere I have found in which the meaning is progressive. In this example, the copula cannot be omitted, however: *I sayin' hello would be ungrammatical. In Hungarian, the zero copula occurs only in the third person, and in AAVE it is not permitted in the first person singular. So the most likely form we would get for this meaning would be I'm sayin' hello.

And fifth, at the end you have the Raspberry alter ego switching into AAVE for the punchline, as he realizes he could augment his columnist's salary by giving language lessons. Well, he shouldn't give up his day job, because he doesn't appear to know this language. Maybe you be onto somethin' dere, my bruvah, he says. But once more it is the immediate present he is referring to: he doesn't mean 'maybe you are habitually onto something', but rather, 'That's a good idea.' I'm quite sure that the most usual way of saying this would be Maybe you onto somethin' dere (second person, so you do get the zero copula).

There are other errors, too, in the things you have your characters say about AAVE rather than in it. The claim by the cabbie's brother-in-law that you have to "leave off final consonants" is an example. From the cabbie's first word, 'sup, there isn't a single final consonant missing in any of your AAVE dialog. (Words like somethin' are not missing a final consonant; n is the final consonant; standard English has ng instead, a velar nasal instead of an alveolar one, but in both dialects the word ends in a nasal consonant.) Unillustrated in your dialog is a process of reduction that gives AAVE res' for 'rest', respec' for 'respect', han' for 'hand', and so on. But it's quite tight and systematic; the rule is (at least approximately) that a word-final stop consonant is elided if it is preceded by another consonant of the same voicing. In words like belt and dump, all consonants are pronounced (t and p are voiceless but l and m are voiced), and likewise in Fats (s is a consonant of the same voicing as t, but it is not a stop so it is retained). The cabbie's brother-in-law would have us believe that in general or at random the last consonant of an AAVE word is or may be dropped. That's dead wrong; his wife's brother deserves a better mentor.

I grant you, the Oakland School Board's resolution was badly written and at some points really stupid; it deserved much censure for its Afrocentric posing (AAVE is not a West African language in origin) and its clumsy formulations. But deep down, there are linguistic and (more importantly) educational issues on which the board is exactly right. Every time I saw another black columnist come out and join the ridicule chorus, as you did (more amusingly than most), it grieved me. The folks your alter ego accurately calls "the unlettered black masses" suffer so much, and take so much undeserved contempt and abuse. It is just not appropriate to add insult to this injury by showering ridicule, contempt, and abuse on the structurally interesting dialect they happen to speak. I was really sorry that virtually every columnist in the USA chose nonetheless to do just that.

Sincerely,

Geoffrey K. Pullum
Professor of Linguistics
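
The reduction rule stated in the letter (a word-final stop is elided when the consonant before it has the same voicing) is explicit enough to sketch in a few lines of Python. This is only a rough illustration: it works on spelling rather than on sounds, the letter sets are simplifying assumptions, and the function name reduce_final_cluster is made up for the example. It is not a description of AAVE phonology beyond the handful of words the letter mentions.

```python
# Letter-level approximation (an assumption): spelling stands in for sounds.
STOPS = set("ptkbdg")
VOICELESS = set("ptkcfs")   # 'c' is treated as the k sound, as in "respect"
VOICED = set("bdgvzlmnr")


def voicing(letter):
    """Return 'voiceless' or 'voiced' for a consonant letter, else None."""
    if letter in VOICELESS:
        return "voiceless"
    if letter in VOICED:
        return "voiced"
    return None  # vowel, or a letter this sketch does not handle


def reduce_final_cluster(word):
    """Drop a word-final stop if the preceding consonant has the same voicing."""
    lowered = word.lower()
    if len(lowered) < 2:
        return word
    last, prev = lowered[-1], lowered[-2]
    same_voicing = voicing(prev) is not None and voicing(prev) == voicing(last)
    if last in STOPS and same_voicing:
        return word[:-1] + "'"
    return word


if __name__ == "__main__":
    for w in ["rest", "respect", "hand", "belt", "dump", "Fats"]:
        print(w, "->", reduce_final_cluster(w))
```

On the letter's own examples the sketch reduces rest, respect, and hand, and leaves belt, dump, and Fats intact, which is the point: the process is conditioned, not a general dropping of final consonants.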

Posted by Geoffrey K. Pullum at 04:26 PM

Clarifying status in Wolof by fake disfluency

A little while ago, I posted about Judith Irvine's observation that "upwardly-mobile men among the Wolof nobility cultivate inarticulateness as a sign of status", and suggested that this might have some application to the case of powerful men in our culture who sometimes seem to project the attitude that they're too busy or too important or too verbally unskilled to manage to figure out how to pronounce a name like "Taguba." Or rather, I posted about my memory of having heard about this aspect of Wolof sociolinguistics, at some time in the past; and so I hedged what I said, in an attempt to avoid starting a cute story that might turn out not to be accurate.

This morning, Steve Matuszek emailed with a degree of confirmation and some additional information:

I love the Language Log.

I was a computer science major at UMBC, but I took Wolof for two years, because I figured it would be a challenge.

I forwarded the posting on Wolof men to my professor, Omar Ka, who is native Senegalese, and he wrote

This posting is indeed interesting, and mostly accurate. The only caveat to add is that this "fake" lack of fluency occurs only when the Geer interact with the Gewel. It is then a means to clarify each other's status.

So thank you -- I hadn't thought about griots or noun categories in years.

My parents and I also had a great deal of fun with Escher sentences at lunch yesterday.

See the earlier post for background on Geer and Gewel.

Posted by Mark Liberman at 03:26 PM

Dorothy Dunnett cleared of anachronism

Be it hereby noted that Dorothy Dunnett had good historical evidence for the currency in Tudor times of the disease name "the Marthambles". Details follow below.

Back in February, I noted that Dorothy Dunnett used the pseudo-disease name "Marthambles" in her novel The Ringed Castle, published in 1971, some seven years before Patrick O'Brian first used the same word in his novel Desolation Island. Shortly thereafter, I posted a note from Lisa Grossman, who reported tracing the word (with the help of a researcher from the National Library of Medicine) to a pamphlet published in London by a Dr. Tufts to advertise his tonics and medicines. Lisa said that "[t]he pamphlet was not clearly dated, but Tompson placed it circa 1675."

Lisa also pointed out that O'Brian used "the Strong Fives" and "the Moon Pall", two other invented diseases from the same pamphlet, suggesting that Dunnett and O'Brian borrowed from the same source -- perhaps C.J.S. Tompson's "The Quacks of Old London".

I observed at the time that a date of 1675 for the source pamphlet would have made the Marthambles an anachronism both in Dunnett's book (which is set in 1555) and in O'Brian's (which is set in 1811).

Shortly thereafter, Diane MM from Plano, TX, emailed to tell me that

Elspeth Morrison's The Dorothy Dunnett Companion, on page 223, has the following entry:

"Marthambles: Castle, II, 9: Popular collective term for any number of divergent symptoms or diseases noted and 'treated' by a mountebank. If particularly fortunate, the patient might also be relieved of the symptoms of the Rockogrogle. Fictitious diseases still cost good money to cure. (W.S.C Copeman, Doctors and Disease in Tudor Times.)"

Although I've not read Copeman's book, I've ascertained that it was published in 1960, in London by Dawson.

As Diane pointed out, this strongly suggests that the Marthambles (and perhaps the others as well) were not in fact invented by Dr. Tufts in 1675, but were taken by him from a popular culture of medicine that had been around at least since Tudor times. And therefore, Dorothy Dunnett is absolved of any taint of anachronism.

Unfortunately, at about this point, I mislaid Diane's note in the course of a series of email server outages and moves. What with one thing and another, I haven't since untangled some little pockets of mail scattered around on various servers that I used as temporary expedients. Anyhow, Diane recently wrote to express her concern that an unanswered accusation of anachronism has been floating around in netspace all this time, associated with Dunnett's e-identity.

So I apologize to the spirit of Ms. Dunnett -- she died in 2001 -- and to Diane. I've also added a link from the earlier post to this one.

Posted by Mark Liberman at 03:06 PM

May 19, 2004

Knowledge her last reveal

As The Curmudgeonly Clerk did a bit earlier, a commenter over at Simple Bits has projected the Language Log bat signal on the low-hanging clouds of this evening's blogosphere. Geoff Pullum answered T.C.C.'s call, and is now chilling out after parking the 'Logmobile back in the 'Logcave, so I guess it's up to me to roll on this one...

Dan Cederholm at Simple Bits wrote:

Every year I’m amused by a certain catch phrase that sweeps the media. Last year it was “cold snap” — at least here in the Northeast United States.
[...]
Previous to last year I had never heard these two words used in conjunction. And this last winter it disappeared and was never uttered again. It had reached its Tipping Point, and people moved on to other ways of describing how cold the weather was.

This year, it’s “reveal”. Specifically when used as a noun. This word is everywhere, and we can blame reality television for it. Any makeover show — or one with a surprise ending will use this to describe the portion of the program that you just can’t miss.

“The big reveal is coming up… right after the break.”

I’m guessing that “reveal” has almost reached its tipping point.

Down in the 12th comment, Jacob observed:

I’ve also heard “cold snap” used all my life. I grew up in south Texas, for what it’s worth.

The use of “reveal” as a noun is mind-boggling; I don’t watch much TV besides cartoons and Japanese cooking shows, so I’ve never heard it used in that fashion. Perhaps “revelation” carries too much biblical baggage for comfort? Maybe they’ll address this over at the Language Log.

Well, in the first place, I'm with Jacob (and other commenters) who think that "cold snap" is an old standard -- and I grew up in rural eastern Connecticut. I also haven't noticed any recent peak and subsequent decline in usage. The OED says that "cold snap" is originally from the U.S., and gives a citation from T. Smith's "Jrnl." for 1776. Using Altavista's "Advanced Web Search", I get the following counts for "cold snap" within the past five six-month periods (ignoring that 2004 isn't half over yet):

1st half 2002	2nd half 2002	1st half 2003	2nd half 2003	1st half 2004
609	865	1,435	2,361	14,390

The time function of counts may not be reliable in this source -- I think the date restriction is based on file dates in a current snapshot, not samples collected at different times -- but such as it is, this evidence certainly doesn't support the view that "last winter [cold snap] disappeared and was never uttered again". As a spot check, Google's news search shows plenty of recent news items using "cold snap", such as this Chicago Sun-Times "Midwest Fishing Report" dated May 19, 2004, which tells us that

Crappie were hot last week; the weekend cold snap backed them off. Crappie should come up this week when the weather stabilizes.

[By the way, note the nice zero-affix plural form of "crappie", as appropriate for game animals, e.g. elk, deer, salmon, ...]

There are plenty of recent uses in papers from the northeastern U.S., such as the Providence Journal on May 11: "New England's power grid has concluded there was no market abuse involving a high number of shutdowns by generators using natural gas during a January cold snap"; or the Philadelphia Inquirer for May 16: "The limited supply is due to last winter's cold snap that killed off large numbers of woolly adelgid before they could be harvested to feed beetles..."

Here are comparable counts from Altavista for the phrase "big reveal" over the same time period:

1st half 2002	2nd half 2002	1st half 2003	2nd half 2003	1st half 2004
16	27	27	97	781

Comparing scaled counts, with the first half of 2002 set to a value of 1, suggests that "big reveal" is indeed booming this past year to a larger extent than "cold snap":

	1st half 2002	2nd half 2002	1st half 2003	2nd half 2003	1st half 2004
cold snap	1	1.42	2.36	3.88	23.6
big reveal	1	1.69	1.69	6.06	48.8

As a point of comparison for overall growth of the web page population, take the counts and scaled counts for "comparison", which has presumably not been subject to fashion or fad in either direction:

	1st half 2002	2nd half 2002	1st half 2003	2nd half 2003	1st half 2004
counts	192,254	334,006	353,157	665,605	6,830,094
scaled counts	1	1.74	1.78	3.46	35.52

Since the scaled counts are in between "cold snap" and "big reveal", maybe the former is at least slacking off relative to expected growth rates, while the latter is growing faster than expected? So maybe Cederholm has noticed an inflection in the second derivative of the frequency of this expression? Then again, maybe he's just blowing smoke... One way or another, chalk up another instance of Layne's Law.
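
For anyone who wants to redo the arithmetic, here is a minimal Python sketch of the comparison just described. It takes the raw counts from the tables above, scales each series to its first period, and then divides by the scaled "comparison" series as a crude stand-in for overall web growth. The period labels and the function names scale_to_first and relative_to_baseline are invented for the illustration, and treating "comparison" as the baseline is the same assumption made in the text, not an established calibration.

```python
# Raw counts as given in the tables above (half-year periods).
PERIODS = ["2002-H1", "2002-H2", "2003-H1", "2003-H2", "2004-H1"]
COUNTS = {
    "cold snap": [609, 865, 1435, 2361, 14390],
    "big reveal": [16, 27, 27, 97, 781],
    "comparison": [192254, 334006, 353157, 665605, 6830094],
}


def scale_to_first(counts):
    """Divide each count by the first-period count, so the first period is 1."""
    base = counts[0]
    return [c / base for c in counts]


def relative_to_baseline(phrase, baseline="comparison"):
    """Ratio of a phrase's scaled growth to the baseline's scaled growth.

    Values above 1 mean the phrase has grown faster than the baseline
    since the first period; values below 1 mean it has grown more slowly.
    """
    scaled_phrase = scale_to_first(COUNTS[phrase])
    scaled_base = scale_to_first(COUNTS[baseline])
    return [p / b for p, b in zip(scaled_phrase, scaled_base)]


if __name__ == "__main__":
    for phrase in ("cold snap", "big reveal"):
        for period, ratio in zip(PERIODS, relative_to_baseline(phrase)):
            print(f"{phrase:10s} {period}: {ratio:.2f}")
```

In the final period the ratio comes out below 1 for "cold snap" and above 1 for "big reveal", which is all the tentative reading above relies on; the same caveats about the date-restricted counts apply to these numbers.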

Of course, I'm working with a small amount of far-from-ideal evidence here -- more and better data would be needed to say anything believable about relative rates of change. Philip Resnik has been poking around in the Internet Archive, and maybe he can calibrate this better. This particular case is not very important, but a general ability to make reliable statements about changes in word and phrase frequency over time would be a Good Thing.

As for Jacob's conjecture that "reveal" is used as a nominalization in place of "revelation" in order to avoid religious overtones -- that may well be true.

Nouning verbs by "zero derivation" is pretty common: cancel, display, gulp, skim, take, try, wail, etc.

And different nominalizations are sometimes associated with different senses of a verb. Thus expose in one sense is connected to exposure, and in another sense to exposition. So it makes some sense to choose a new nominalization for a new nominal sense.

And believe it or not, "reveal" as a nominalization is pretty old, in fact archaic. The OED gives us:

[f. REVEAL v.]

A revealing, revelation, disclosure.

1629 WADSWORTH Pilgr. iii. 22 He vtterly disclaimed their superstitious reueales.
1646 SIR T. BROWNE Pseud. Ep. 195 In nature the concealment of secret parts is the same in both sexes and the shame of their reveale equall.
1858 BAILEY Age 41 Faith her first law, knowledge her last reveal.

Posted by Mark Liberman at 08:25 PM

Microsoft prescriptivism

There is a set of on-line prescriptive primers on how to write prose that includes Microsoft's trademarks. I've already violated its rules in the opening sentence of this post, unfortunately. Let me start again and do it correctly...

There is a set of on-line prescriptive primers on how to write prose that includes Microsoft® trademarks. You can see one of them here. (I learned about it from the amusing citation of it here; Keith Ivey has pointed out to me that it is based on material from the International Trademarks Association, which I will write about in a later post.) It includes this stern injunction:

Do Not Use Microsoft Trademarks in the Possessive or Plural Form

Microsoft trademarks should never be used in the possessive or plural form, but should be introduced as a proper adjective followed by an appropriate descriptor.

Correct: This presentation was created using PowerPoint® presentation manager

Incorrect: Widget Software Company included some PowerPoints in its presentation

So not only does this evil company want to control all operating systems, browsers, word processors, audio players, spreadsheets, mailers, messaging, presentation, and all other software in the whole damn world, crushing the life out of any rival companies by such illegal means as may be necessary; it wants its registered marks to be, unlike virtually all other nouns in the English language, nouns without plural or genitive case forms -- the very inflections that are definitive for noun status in English.

I for one can't believe that they seriously think it is damaging to their interests if I say I am so impressed by Word's many cool features (using a genitive form of the Microsoft trademark Word®). I think that (once again) we have a case of people who want to say something that involves grammar only they have no idea how to control the terminology so as to say what they mean.

The "incorrect" example cited above has a feature they don't mention at all, yet it is crucial: it extends the meaning of the proper noun (and registered mark) PowerPoint® to a new meaning as a common noun meaning "individual slide in a set of visual presentation aids projected from a computer". That is a totally different issue: it's like Hoover not wanting vacuum cleaners (of any make) to be called hoovers as they are in Britain, or Frigidaire not wanting to hear people talking about buying some other maker's frigidaire, or the Xerox Corporation hating the notion of a xerox that was actually made on a Canon. This is about trademark dilution.

But what they actually say in the quote above is that they don't want any Microsoft trademark to appear in the plural or the genitive. Now, they seem to have missed the point that their trademark Windows® already is morphologically in the plural form, so it can only appear in the plural (though of course it is syntactically singular: Windows® is junk, not *Windows® are junk; this can happen with plural nouns: compare with Cornflakes is my absolute favorite breakfast). And as for the genitive ("possessive") form, it is formed by an inflectional process so productive that it applies to absolutely every new noun added to the language, and they can't possibly be serious about blocking it.

And indeed, they're not. On a hunch, I went to their mission statement page, and as I was expecting, I read this:

Microsoft's mission: To enable people and businesses throughout the world to realize their full potential.

Genitive case on their most important linguistic property, their corporate name itself (and with no ® symbol), on a key Microsoft web page. I thought so! As is so often the case, the prescriptivists don't think their prescriptions have to apply to them, only to the little people like you and me.

Posted by Geoffrey K. Pullum at 06:58 PM

Terror: not even a noun (says Jon Stewart)

The College of William and Mary booked alumnus Jon Stewart as the alumni commencement speaker this year. To me, the transcript of his address looks insulting, sloppy, and chaotic: it begins with an insult to the institution and the ceremony and then starts going downhill. I can imagine many listening parents being fairly disgusted. But who knows? These days Stewart is being spoken of reverently as having completely redefined political satire with his show on Comedy Central. Maybe they were proud just to have been there in his presence. But I digress. The linguistic point, and I do have one, is that at one stage in his rambling and oddly unfunny remarks, apropos of almost nothing but near some confused stuff about war, Stewart said this:

We declared war on terror. We declared war on terror -- it's not even a noun, so, good luck. After we defeat it, I'm sure we'll take on that bastard ennui

The Curmudgeonly Clerk, a legal blog, was puzzled about this, and rightly so. What could Stewart mean by saying that terror is not a noun? I think I know. Let me explain.

The traditional definition of the term "noun" has a fantastically strong hold on the public imagination. In old-fashioned grammar books it is usually the first line of the first section of the first chapter: "A noun," it will say, "is the name of a person, place, or thing." What Jon Stewart has dimly perceived is that terror is not a person, so we can't assassinate it; it is not a place, so we can't bomb it; and it is not a thing, so we can't find where it is and blow it up -- it has no spatial location.

The trouble is, of course, that the old definition is a complete crock. It is almost useless. Not completely useless, mark you: as Rodney Huddleston and I point out in Chapter 1 of The Cambridge Grammar of the English Language, it is useful in identifying which of the word classes in a language is the one that corresponds to the class we call noun in English (or any other language we've analyzed). The words we should call nouns in Japanese are the ones in that class of words (and there will be one) which includes the most basic words for kinds of thing, sorts of place, and types of people. There will be Japanese words for rice, bowl, tree, dog, hill, ocean, man, woman, etc. When you've found the grammatical class of words that includes those, you've found the nouns in Japanese.

But you can't use the old-fashioned definition to classify words within a language. The words that name kinds of thing and sorts of place and types of people will be in the class we're after, but so will other words, some of them having meanings that are pretty far from the central core of words that denote natural kinds in the animal, vegetable, and mineral realms.

Of course terror is a noun in English. There is no doubt about that. But don't expect its meaning to settle the issue. The word denotes a kind of feeling. There could easily be a language in which the only way to talk about that kind of feeling was to use adjectives (I'm terrified) or verbs (I tremble). Notice how the French for I'm hungry is J'ai faim (literally, "I have hunger"): for us, an adjective and an expression of predication, and for them, a noun and an expression of possession. Same concept, different grammar.

The way to tell whether a word is a noun in English is to ask questions like: Does it have a plural form (the terrors of childhood)? Does it have a genitive form (terror's effects)? Does it occur with the articles the and a (the terror)? Can you use it as the main or only word in the subject of a clause (Terror rooted me to the spot), or the object of a preposition (war on terror)? And so on. These are grammatical questions. Syntactic and morphological questions. Not semantic ones.

My conjecture (and of course it is only a conjecture: I don't know what was in his mind) is that Jon Stewart was sufficiently in the grip of the traditional definition that he felt terror couldn't be a noun: nouns denote things substantive enough to be attacked, destroyed, touched, owned. Now, I agree entirely that the Bushian phrase "war on terror" is stupid: terror is no more suitable as a target for a war effort than pity, sorrow, caution, shyness, indecision, or ennui. But all those words are nouns.

Posted by Geoffrey K. Pullum at 06:46 PM

Google and WSD

Word sense disambiguation (WSD) is one of the most venerable topics in natural language processing, going back to the earliest days of computing. Back in 1947, writing to Norbert Wiener about automatic translation, Warren Weaver commented on "the semantic difficulties because of multiple meanings" (see Hutchins's very nice historical discussion of MT systems).

Many of us who work on the topic of word sense disambiguation (and plenty of people who don't) have been frustrated by the fact that over the years WSD algorithms have had relatively little impact in real natural language applications, either because the algorithms don't perform well enough yet, or because problems of word ambiguity are dealt with implicitly rather than explicitly. (See, e.g., the deservedly well known paper by Krovetz and Croft on lexical ambiguity in information retrieval.)

We keep the faith, though, in part because new NLP applications like question answering seem to have a greater need for dividing the world up into semantic categories. A recent discussion of Google's GMail by a pilot user stirs the blood: Steve Bass writes, "So far, many of the ads I've seen have been wildly inaccurate: For example, promoting glass windows when I talk about Windows...".

WSD just has to make a difference, dammit, it just has to make a difference, it just has to...

Posted by Philip Resnik at 01:55 PM

Owning Words

An amusing parody of SCO's ridiculous claim that Linux infringes on its intellectual property has appeared on Groklaw. By amazing coincidence, it is based on the ridiculous idea that people can own words.

Posted by Bill Poser at 11:57 AM

Fitting names for linguists

Bill Poser recently remarked, in a post submitted at 1:39 a.m. on May 18 (make sure you get some sleep, Bill) that although there is an evolutionary psychologist called Fitness and there used to be brain scientists called Brain and Head, "I can't think of any linguists whose names reflect their profession."

Ha! I am amazed. Shocked, in fact. There are quite a few. It's surprising that Bill, who seems to know so much about so many subjects, was not aware of them.

To begin with the most celebrated instance, the most distinguished linguist and phonetician at University College London for many years was (after his knighthood for services to speech sciences) Sir Thomas Tongue. His contributions to the study of errors in apical consonant articulation are world famous. The term slip of the tongue was actually coined in his honor. Sir Thomas used to take pleasure in demonstrating his ability to say Berth both boats beneath, forsooth, in unique New York, unique New York, unique New York lest six swift thrifty Swiss ships stuffed with sifters swiftly shift three times in less than two seconds. Regrettably, while performing this trick after dinner one night he suffered an acute lingual tangle and fell dead before even tasting his dessert. Coffee was delayed for more than eight minutes while he was removed.

There are many other linguists with appropriate names. I think particularly of the fine Montague semanticist Anastasia Lambda; the well-known Italian X-bar theorist Enzo Centric; the Japanese phonologist Yuri Mora; and syntactician Sandy Clause.

All right, I admit it, the above is all a complete load of nonsense and I made it all up, just like I made up a post once about universities named after linguists for Mark (apparently a number of people thought it was for real).

But listen: there is one person who should have become a linguist but didn't, a guy with a name so fantastic I would strangle a kitten on network TV to have his name for myself. He is the BBC's main correspondent in Moscow, and I swear I'm not making him up: His name is Damian Grammaticas. Now is that a gorgeous, perfect name for a linguist or what.

[Formerly this point had "Grammaticus", which does get ghits, and would be even better, with its Latinate connotations; but "Grammaticas" is correct (thanks to Keith Ivey for setting me right), and that's good too.]

Posted by Geoffrey K. Pullum at 10:36 AM

Wordcraft

A book that looks interesting, though I haven't read it: Alex Frankel's "Wordcraft: the Art of Turning Little Words into Big Business".

Nathan Bierma at Nathan's Notebook offers a blizzard of interesting Wordcraft links: his own Chicago Tribune review, on-line academic lectures on naming at Stanford (from a course on The Language of Advertising) and Penn, and the 1997 Wired article out of which Frankel's book was developed. Nathan also provides the last graf of his review, cut by some merciless Tribune editor:

"Of course, this kind of inspection could backfire, as Nunberg observed last year in the New York Times. "As advertisers have known for a long time," he wrote, "no audience is easier to beguile than one that is smugly confident of its own sophistication.""

Wordlab quotes David Kippen's slightly skeptical review, which says that "If a silly name like Accenture strikes you as "forward-thinking," you may be spending too much time hanging around boardrooms and breathing in Magic Marker fumes." Kippen ends with this:

... when the juicy anecdotes start flying, it's hard to begrudge [Frankel] his occasional swigs of BlackBerry Kool-Aid. Did you know that the poet Marianne Moore had a long correspondence with an automobile namer, from which the car names Civic and Diamante may eventually have sprung? Or that the consulting firm that coined Viagra is itself named -- I swear -- Wood Worldwide.

Blogcritics says that the book is a "218-page-long puffed-up magazine article", and that "Chapter 7, on Viagra, is the only one worth reading carefully, and that only because the author explains the intricacies of the FDA's rules on drug naming." This is the most negative review I've seen.

Frankel has a weblog, but as of today, its only post (from May 7) says "This is a test." The book does have a web site with actual content, though, including links to other magazine articles that he's written on the subject.

For licensing information about "snuggle and all derivatives thereof ...; the adjective parsimonious; the preposition of; and the nouns crump, ether, parsley, helicopter, oligarchy, and rhodium", you'll have to check here at Language Log Lexical Industries with Dr. Geoffrey Pullum, Proprietor.

Posted by Mark Liberman at 09:27 AM

May 18, 2004

Language enrollments rocket upward in 1998-2002

Americans traditionally are not enormously interested in studying the languages of other countries, a fact that I take to be well known. Sometimes it almost seems that if a public figure is well acquainted with a foreign language other than Mexican Spanish he needs to conceal it. And linguistic general knowledge is so sorely lacking that Language Loggers find public dimwittedness on linguistic topics to mock nearly every week. But let's not ignore good news. Language Log has not overlooked the encouraging and remarkable facts revealed a few weeks ago by a Modern Language Association report, based on Fall 2002 enrollments in courses as compared to Fall 1998: all languages shot up, especially the less commonly taught ones, and some are up by very substantial factors indeed.

The languages with double-digit percentage enrollment increases (which I have rounded to the nearest integer) were:

Language	Increase
American Sign Language:	432%
Navajo:	164%
Vietnamese:	149%
Dakota/Lakhota:	83%
Arabic:	92%
Biblical Hebrew:	56%
Italian:	30%
Modern Hebrew:	28%
Portuguese:	21%
Japanese:	21%
Chinese:	20%
Korean:	16%
Latin:	14%
Spanish:	14%

It's true that Russian was hardly up at all (half a percent); but every language was up, and the aggregate percentage enrollment increase was 17%.
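For anyone who would rather think in multiplication factors than in percentage increases, the conversion is simple arithmetic: an increase of p percent means enrollments were multiplied by 1 + p/100. The sketch below applies that to a few of the rounded figures from the table; the function name `growth_factor` and the dictionary are just illustrative choices, not anything from the MLA report.

```python
# Rounded percentage increases copied from the table above (Fall 1998 -> Fall 2002).
increases = {
    "American Sign Language": 432,
    "Navajo": 164,
    "Vietnamese": 149,
    "Arabic": 92,
    "Spanish": 14,
}

def growth_factor(percent_increase):
    """Convert a percentage increase into a multiplication factor."""
    return 1 + percent_increase / 100

for language, pct in increases.items():
    print(f"{language}: up {pct}%, i.e. about {growth_factor(pct):.2f} times the 1998 enrollment")
```

So a 432% increase means Fall 2002 enrollments were a bit more than five times the Fall 1998 figure, not a bit more than four times.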

[Source: Foreign language enrollments in United States institutions of higher education, fall 2002 by Elizabeth B. Welles, available as a PDF file from the web site of the Association of Departments of Foreign Languages.]

Posted by Geoffrey K. Pullum at 06:41 PM

Snugglebunny is mine

Bill Poser reports to stunned Language Log readers that (where does he find this stuff?) some people think they can claim family ownership of words. He disapproves. But why? So Peri Fleisher is convinced that she deserves some cheap Google™ stock options on the grounds that before she was born her great-uncle Edward Kasner (who died when she was 4) introduced the number name googol on a suggestion by his 9-year-old nephew Milton Sirotta? Sounds like a solid case to me! Kudos to Peri. Greedy bitch? Sure. But who said there's something wrong with that all of a sudden? What're you, a communist? Poser's just mad because he didn't think of it. I'm not so appalled. In fact I want in. The interesting thing about Fleisher's claim is that she doesn't say she invented any word herself; the word googol has been in common use for decades. Her vague threats to sue are based on a mere feeling of inherent right through family connection and phonetic similarity. Well, my maternal grandmother appears to have coined the word crump (a British food term meaning fried bread), so I should have rights to that. And I see no reason why the relation should be as close as grandmother or great-uncle. I feel sure that ancestors of mine have coined numerous other words. In fact I have instructed my lawyers (Messrs Dewey, Cheatham and Howe of Boston and San José) to prepare papers seeking a temporary injunction stipulating that I and my heirs have ownership of, and retain all rights in, the following words and all words that sound like them: the verb snuggle and all derivatives thereof (e.g. snugglebunny); the adjective parsimonious; the preposition of; and the nouns crump, ether, parsley, helicopter, oligarchy, and rhodium. So hands off.

Posted by Geoffrey K. Pullum at 04:01 PM

Submit a manuscript, go to jail

I know, I'm perverse; but after reading Bill Poser's revelation that coauthorship with a citizen of an embargoed country is a Federal crime, I have suddenly developed a yearning, a positive lust, to collaborate on a little scientific paper of some kind with a citizen of Cuba, Iran, Iraq, Libya, North Korea, Sudan, or Syria, just to be at the center of the wave of ridicule against the U.S. government when it brings the first criminal prosecution for writing about linguistics with a member of an evildoer nation.

San José, California, Thursday — Renowned grammarian Geoffrey K. Pullum of the University of California, Santa Cruz (seen at right, hooded and handcuffed to Federal marshals) was arraigned this morning on charges of coauthoring a paper with an Iranian citizen. Pullum and his Tehran-based collaborator Faroukh Khosmud argue in their paper that previous analyses of relative clauses in Farsi have missed a subtle distinction between integrated and supplementary relatives. The paper was under review for the Squibs and Discussion section of the journal Linguistic Inquiry until the MIT Press realized that they were violating Federal law by even considering it. Charges against two anonymous referees are under consideration. Said burly editor-in-chief Jay Keyser, "Pullum nearly got us in a whole bunch of trouble; but we dropped his squib like a hot rock when we realized he was collaborating with an agent of a hostile power. It's no great loss: the paper didn't look sufficiently minimalist in its orientation anyway." US Attorney General John Ashcroft issued a statement saying, "This typescript of nearly eight pages on a language spoken by terrorists represents a real and present danger to the security of the United States. While the threat posed by theoretical work on the structure of relative clauses may seem modest at present, it is a short step from there to more sensitive areas of Farsi syntax, and ultimately to work of significant material benefit to world terror."

I know, it's wicked to mock the fine people at the Office of Foreign Assets Control, who doubtless suffer enough from the gibes of their friends and neighbours (Hey, Bob: caught anyone smuggling offprints today? Tee-hee-hee). But I just can't resist this vision of being the test case for their interpretation of the law, having the entire paper read out in court as part of my defense — and seeing Ashcroft use his powers under the US Patriot Act to discover who those anonymous Linguistic Inquiry referees were.

Posted by Geoffrey K. Pullum at 01:15 PM

Google/Googol Owned?

Can someone own a word? Some people think they do. According to The Inquirer and The Baltimore Sun, Peri Fleisher, the great-niece of Professor Edward Kasner, who coined the term googol, is considering suing Google on behalf of her son, who holds the copyright in Kasner's book Mathematics and the Imagination, for taking advantage of the word without compensating them. They say that it isn't fair that Google has benefitted from the use of the word without bringing attention to Kasner's work. Curiously, they aren't trying to get Google to publicize Kasner's work; they want money. To be precise, they want insider status when Google goes public. (This is particularly silly since Google is planning to use a Dutch Auction, which as I understand it means that there won't be any insiders.)

In any case, it's not as if Google doesn't acknowledge the source of its name. The Google History page on the company's web site begins:

Google is a play on the word googol, which was coined by Milton Sirotta, nephew of American mathematician Edward Kasner, and was popularized in the book, "Mathematics and the Imagination" by Kasner and James Newman. It refers to the number represented by the numeral 1 followed by 100 zeros. Google's use of the term reflects the company's mission to organize the immense, seemingly infinite amount of information available on the web.

What is really bizarre here is the idea that Kasner's heirs own the word googol. By its very nature, every bit of a language belongs to the commons, and it is perfectly clear that Kasner intended googol to become part of the English language. You can copyright a sufficiently long and original sequence of words, but not an individual word. You can trademark a word, but only for specific uses, and in any case neither Kasner nor his family has ever used googol as a trademark. Legally, I am confident that the family hasn't got a leg to stand on. Morally, they don't either. None of them had anything to do with the introduction of the word, and none of them has been in any way injured or lost any opportunity through Google's use of the term. In his role as a scholar, Kasner introduced a word to the English language and thereby contributed it to the public domain. He knew that this is what he was doing because this is how science works. People make discoveries and come up with new ideas and create language for talking about them. Other people use these ideas and facts and words and build on them. We admit only a very limited form of short term ownership of ideas, in the form of patents, and even this has become increasingly problematic as patents have been extended to software.

Allowing people to own words would make life as we know it impossible. Only certain people, those with the appropriate licenses, would be able to talk about certain things. You wouldn't be able to talk or write about genetics unless you held licenses to use repressor and allele and so forth. You couldn't discuss syntax without licenses for E-language and foot feature and Determiner Phrase, and if you had them, you might find that you couldn't use, say, functional unification and thematic role in the same paper because of the restrictions in the licenses imposed by the proponents of rival theories. The mind boggles at the insanity of this idea.

Posted by Bill Poser at 10:52 AM

Great stuff from Q.

Over the past few days, Q. Pheevr at "A Roguish Chrestomathy" has posted a series of typically entertaining and enlightening items. Especially fine are Q's posts about Ugol's Law, about a nearly-Escherian sentence due to George W. Bush, and about the petition for freedom of gender identification on Livejournal.

Ugol's Law:

"Recently, while traversing the labyrinth of data that is Everything2, I stumbled across something called Ugol's Law. This principle, which is apparently common wisdom among the denizens of alt.sex.bondage , states that whatever one's kink or fetish may be, there is almost certainly someone else out there who shares it. Readers familiar with Optimality Theory may recognize this idea as a variant of Richness of the Base."

A true scholar, Q. tracks Ugol's Law to its textual source, and discovers that Harry Ugol's original statement was also an example of overnegation, literally saying exactly the opposite of what Ugol proverbially meant.

A nearly Escherian Bushism:

Q discusses a quote from George W. Bush: "The American people are just as appalled at what they have seen on TV as Iraqi citizens have".

And in another post, Q. observes that "Bush was decent enough to announce to the press that he had said he was sorry, which is of course the most expedient way of expressing contrition in public without actually expressing contrition in public."

Freedom of gender identification:

Q supports the petition to open up the gender field in Livejournal's author profile, which is limited to "Male", "Female" or "(Unspecified)", and announces that "given the choice, I believe I'd fill mine in as 'Mostly harmless.'"

Another thought: the additional gender names would be even more useful if paired with suitable pronouns. For instance, Q. might choose to suggest we refer to qim and to qer posts using the nominative qe, the accusative qim and the genitive qer. Though of course I don't presume to make this choice on qer behalf -- qe'll be free to fill in the form however qe likes, once Livejournal introduces the option.

Coincidentally, I saw recently on ProMED-mail that there is an epidemic of Q fever in Banja Luka, and learned that Q. is short for "query":

Q (query) fever is caused by _Coxiella burnetii_, the only common rickettsia to be usually transmitted by aerosol rather than through an arthropod vector. Worldwide, this zoonosis is primarily found in cattle, sheep, and goats, but many mammals and birds may also be infected. The diagnosis is usually serological, and the illness is most commonly either a self-limited febrile illness lasting up to 2 weeks or pneumonia with or without hepatitis.

The OED's citations for Q fever clarify the history:

1937 E. H. DERRICK in Med. Jrnl. Australia 21 Aug. 282/1 The suspicion arose and gradually grew into a conviction that we were here dealing with a type of fever which had not been previously described. It became necessary to give it a name, and ‘Q’ fever was chosen to denote it until fuller knowledge should allow a better name.
1964 E. H. DERRICK in Queensland's Health Dec. 11/2 ‘X’ is a recognised term for an unknown quantity. But Australia already had an ‘X disease’, now known as Murray valley encephalitis. However, the rest of the alphabet was open. Query also signified the unknown. ‘Q (for query) fever’ it became. Ibid., Many have wrongly assumed that the ‘Q’ stands for Queensland.

Posted by Mark Liberman at 07:59 AM

Fitting Names

Mark referred to the book From Mating to Mentality: Evaluating Evolutionary Psychology without mentioning the fact that one of the two editors is the appropriately named Julie Fitness. From my father, a neurologist, I learned as a child of the famous neurologists Lord Brain and Sir Henry Head. I can't think of any linguists whose names reflect their profession.

Posted by Bill Poser at 01:39 AM

OFAC Censorship Still in Place

Last month I reported that the Office of Foreign Assets Control of the US Treasury Department had abandoned its position that journals published in the United States could not edit papers submitted by residents of countries with which the United States embargoes trade, currently Cuba, Iran, Iraq, Libya, North Korea, Sudan, and Syria. I spoke too soon.

Writing in the Chronicle of Higher Education (subscription required), Peter J. Givler, Executive Director of the Association of American University Presses, reveals that OFAC has not really changed its mind. The letter that OFAC sent to the Institute of Electrical and Electronics Engineers relented only in the sense that it stated that OFAC considers minor editing permissible.

To be precise, OFAC's position is that the following (quoted by Givler from the letter) are acceptable:

Labeling units of measurements with standard abbreviations.
Correcting grammar and spelling to conform to standard American English.
Changing the size of type or the weight of lines in illustrations so that the diagrams remain legible when reduced in size for publication.
Labeling illustration captions and formatting references to conform to the style manual of the publisher.
Sizing and positioning illustrations to fit on the page appropriately and in proper proximity to references in the text.
Formatting mathematical equations to fit on the page appropriately and to avoid breakage between two lines in a way that is unclear.
Ensuring that the author has supplied a biography and a photo.
Adding page folios with publication titles and page numbers.

No other editing is acceptable, nor is translation. OFAC has also stated that co-authorship by a US national and a resident of one of the embargoed countries is a crime. Notice that this means that in OFAC's opinion it is a felony for a US national to co-author a paper with a dissident in an embargoed country, to edit it, or to translate it! Of course, they aren't very likely to prosecute in that case. By the same token, in their view collaborating with a national of an embargoed country on a paper critical of the United States is a felony, and in this case they would be motivated to prosecute.

As I argued in a previous post, not only is the OFAC position bad policy, it is contrary to legislative intent, unconstitutional, and in violation of Article 19 of the Universal Declaration of Human Rights. OFAC's current position, which is based on the claim that the exemption for "...information or informational materials, including but not limited to, publications, films, posters..." is limited to materials already fully in existence, has no basis either in the law or in the legislative history. The Treasury Department seems to be infested by petty little fascists.

Posted by Bill Poser at 01:06 AM

From Just So Stories to science, in biology and in pragmatics

Marc Moffett at Close Range suggests that Gould and Lewontin's complaint about adaptationism in biology -- that post hoc adaptationist stories are too easy to come up with -- also "infects pragmatic explanations in linguistics and the philosophy of language". He argues that this "is pernicious because the ready availability of pragmatic explanations ... allows one to preserve one's favorite semantic theory 'come what may'."

Moffett further points out that the current "construction-based approach to grammar" makes it harder to restrain the proliferation of such stories, because it weakens Grice's principle of not multiplying meanings beyond necessity, which traditionally provides the core motivation for arguing that some meanings arise from pragmatic reasoning about the conversational context. At least I think that's his argument.

This is all a bit abstract. A concrete example might be the Gricean argument that sarcasm (e.g. "Yummy!" meaning "Disgusting!") arises from pragmatic reasoning in context rather than from a systematic multiplication of word senses. This idea is convincing mostly because of the parsimony argument, and not because the required pragmatic reasoning is obvious or compelling. For example, I had an exchange with David Beaver, Larry Horn, Ellen Prince and others last fall about why "reverse sarcasm" rarely works:

I described someone who comes home from a long hard day to find that the puppy has pooped on the rug, and says "oh, terrific!" (or "wonderful" or "great" or similar positively-evaluated adjective), meaning the opposite; and contrasted this with the same person finding a bouquet of roses, and saying "oh, disgusting!" (or "ugly" or "annoying" or similar negatively-evaluated adjective) to mean the opposite. I argued that the first is normal and the second is weird.

The best story about this seems to be David Beaver's: "you can sarcastically express a departure from a salient hope, not from a salient fear." But the whole discussion was very much an attempt to reason backwards to an explanation of an observed pattern; if we tried to predict the pattern without knowing in advance what it was -- say by running a theorem-prover on a set of conversational axioms -- I doubt that we would have gotten it right.

So to repeat, the conversational-implicature story is convincing here mostly because it's unattractive to systematically add a new, opposite sense to the meaning of every word and phrase that can be used to express a "salient hope".

How does "construction grammar" undermine this argument? Well, I guess that you could take the structure S(entence) to be (optionally) a "construction" whose meaning is created by negating its normal compositional semantics. Or you might say that copular sentences can be constructions with inverted meaning, and leave other sarcastic utterances in the pragmatic wastebasket. Or you could make some other division of explanatory labor. I'm not sure that this is what Moffett has in mind, but it seems that options of this kind do make it less clear when an account in terms of conversational implicature is called for.

Anyhow, I like the analogy between (the weakness of) explanations for conversational implicatures and (the weakness of) adaptationist arguments in biology. As a contribution to the discussion, I'll cite Russell Gray, Megan Heaney and Scott Fairhall's sharp but constructive critique entitled "Evolutionary Psychology and the challenge of adaptive explanation" (a chapter in a brand-new book).

This is the same Russell Gray who's been involved with the recent work on Indo-European dating. He and his co-authors are by no means gentle with the behavioral adaptationists:

... the impoverished view of evolution and psychology adopted by many Evolutionary Psychologists, and the weakness of their empirical science, is frankly rather embarrassing.

and they name names:

...our attack is confined to the specific program of Evolutionary Psychology associated with the “Santa Barbara church of psychology” ... a nativist approach to cognition that views the human mind as a collection of modules designed by natural selection to solve the problems faced by our Pleistocene ancestors. This program was christened in the Adapted Mind book (Barkow et al, 1992), and proselytised to the lay public in Steven Pinker’s (1997) modestly titled book How the Mind Works. Its followers have applied EP doctrines to everything from social reasoning to preferences for green lawns and certain genres of erotic fiction...

They're specific about what they don't like:

Evolutionary Psychologists take current features of human cognition and posit that they are adaptive solutions shaped by natural selection to problems posed by life back in our Environment of Evolutionary Adaptedness (Tooby & Cosmides, 1992). This might be a good explanatory strategy if three criteria were commonly satisfied:
1. all traits were adaptations
2. the traits to be given an adaptive explanation could be easily characterized
3. plausible adaptive explanations were difficult to come by.

... The challenge of adaptive explanations is that all three of these criteria are frequently violated.

and they explain the difficulties in detail.

Even better, however, they not only debunk various instances of unwarranted speculation, they also give some examples of successful science. These include a case where a plausible adaptive hypothesis was disproved by careful research (the "promiscuous primate" theory of menstrual bleeding) and a case where an adaptive hypothesis about a behavioral trait is strongly supported by converging evidence that includes phylogenetic analysis of mtDNA data (development of wing-waving displays in Pelecaniforms from flight intention movements).

By analogy, in the area of pragmatic Just So stories, it would be nice to see a clear explanation of what hurdles such a story should have to get over in order to be accepted, along with at least one example of a plausible pragmatic explanation that was shown by careful empirical investigation to be false, contrasted with another that is strongly supported by converging evidence.

Posted by Mark Liberman at 12:35 AM

May 17, 2004

More timewasting garbage, another copy-editing moron

Mark Pilgrim is nearly done with his (online) Python programming book Dive Into Python, but is currently being subjected to that bane of the author's life, the copy editing phase.

He says:

Dive into Python is almost finished. ... Now the copy editor is wielding her virtual pen and striking through every word I’ve ever written. Incorporating her revisions is simultaneously humbling, enlightening, and mind-numbingly tedious.

Here are the main things I’ve learned so far:

I use have to when I mean need to.

I misplace the word only. Instead of you can only walk through a stream once, the copy editor prefers you can walk through a stream only once.

I use lots when I mean a lot.

I use which when I mean that.

I overuse footnotes to be cute. This is a bad habit I picked up from the interactive fiction version of Hitchhiker’s Guide to the Galaxy and the infamous footnote 12.

I use like when I mean such as.

I use then immediately after a comma, when I mean and then.

I overuse semicolons for no particular reason except that I’ve always liked them.

I use note when I mean notice, and vice-versa.

I use we when I mean you. As we saw in the previous chapter… We’ll work through this example line by line. And so forth. Apparently we won’t be working through this example. You will be working through this example; I will be in the Bahamas drinking my royalty check.

Well, I don't know who is paying that copy editor, but if she were working for me she would be toast, because every single thing about English grammar here is wrong.

There are some style suggestions included: don't overuse footnotes, don't be too liberal with the rather literary device of the semicolon. On things like this, advice from an opinionated reader or a publisher with style guidelines can be helpful. I won't say anything about them. And the last point is also about style, though I think the style advice is dead wrong: inviting the reader into your deliberations and saying as we saw in the previous chapters feels much warmer and more supportive than the alternatives (as I stated in the previous chapters is all pay-attention-to-me, and as you saw in the previous chapters suggests authorial omniscience about the reader's mental state). But the rest (familiar copy-editor changes all) are based on nothing more or less than flatly false claims about what is grammatical in contemporary Standard English. This copy editor should be told not just to lay off, but to go to school and take a serious grammar course. Enough of these 19th-century snippets of grammatical nonsense that waste authors' time all over the English-speaking world. Let me go through the grammar points on which poor Mark is being corrected, one by one:

Have to and need to are essentially synonymous. There is a slight tendency for the first to be used when the compulsion source is external and for the second to be used for internally driven urges, but they can easily be used the other way round, as we see from the naturalness of Excuse me, I have to go to the bathroom and You need to move your car because that side of the street is being swept today.
The word only is frequently positioned so that it attaches to the beginning of a larger constituent than its focus (and thus comes earlier), and that is often not just permissible but better. Ian Fleming's title You Only Live Twice was not copy-edited to You Live Only Twice. Why not? Because he knows how to write, and he didn't let an idiot copy-editor change his writing into mush, that's why.
Lots of garbage and a lot of garbage are both grammatical and mean basically the same thing. Lot here is not used in any literal sense; it's what's called a non-count number-transparent quantificational noun (CGEL ch. 5 sec. 3.3). The main difference is that lots of is more informal in style (especially with count plurals: lots of stupid quibbles is distinctly more informal than a lot of stupid quibbles). But informal does not mean incorrect. It is perfectly appropriate, and becoming standard, to use informal English constructions in computer programming books and lots of other kinds of academic and technical published prose.
There is an old myth that which is not used in integrated relative clauses (e.g. something which I hate) and that has to be used instead (something that I hate). It is completely untrue. The choice between the two is free and open. The people who repeat the old story about which being banned do not respect the prohibition in their own writing (Merriam-Webster's Dictionary of English Usage points out a book by Jacques Barzun which recommends against it on one page and then unthinkingly uses it on the next!). I don't respect it either — re-read that last parenthesis. As a check on just how common it is in excellent writing, I searched electronic copies of a few classic novels to find the line on which they first use which to introduce an integrated relative, to tell us how much of the book you would need to read before you ran into an instance:
- A Christmas Carol (Dickens): 1,921 lines, first occurrence on line 217 = 11% of the way through;
- Alice in Wonderland (Carroll): 1,618 lines, line 143 = 8%;
- Dracula (Stoker): 9,824 lines, line 8 = less than 1%;
- Lord Jim (Conrad): 8,045 lines, line 15 = 1%;
- Moby Dick (Melville): 10,263 lines, line 103 = 1%;
- Wuthering Heights (Bronte): 7,599 lines, line 56 = 0.736%...
Do I need to go on? No. The point is clear. On average, by the time you've read about 3% of a book by an author who knows how to write you will already have encountered an integrated relative clause beginning with which. They are fully grammatical for everyone. The copy editors are enforcing a rule which has no support at all in the literature that defines what counts as good use of the English language. Their which hunts are pointless time-wasting nonsense.
Like has exactly the same meaning as such as in contexts like this one (I could have said in contexts such as this one). There is a difference in formality level: like is more informal. But informal does not mean incorrect. I believe I have said this before. Please pay attention.
Then can introduce a new clause immediately after a comma; an extra and is not needed. Bram Stoker writes: The carriage went at a hard pace straight along, then we made a complete turn and went along another straight road. Do these copy editors think their writing wisdom is greater than that of the author of Dracula? Huh? They are morons, and they are wasting Mark Pilgrim's time with their fiddling.
Note and notice, as verbs, have basically the same meaning. It is hard to imagine a context in which one would need to be corrected to the other, or in which direction.
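For anyone who wants to check the arithmetic behind the which counts listed above, here is a minimal sketch in Python. The line totals and first-occurrence line numbers are copied from that list; the function names are invented for illustration, and the unrounded results need not match the rounded percentages quoted in the list exactly.

```python
# Figures copied from the list above:
# (title, total lines in the e-text, line of first integrated-relative "which").
BOOKS = [
    ("A Christmas Carol", 1921, 217),
    ("Alice in Wonderland", 1618, 143),
    ("Dracula", 9824, 8),
    ("Lord Jim", 8045, 15),
    ("Moby Dick", 10263, 103),
    ("Wuthering Heights", 7599, 56),
]


def percent_through(total_lines, first_line):
    """Return how far into the text (as a percentage) the given line falls."""
    return 100.0 * first_line / total_lines


def mean_percent(books):
    """Return the unweighted mean of the per-book percentages."""
    values = [percent_through(total, first) for _, total, first in books]
    return sum(values) / len(values)


if __name__ == "__main__":
    for title, total, first in BOOKS:
        print(f"{title}: {percent_through(total, first):.2f}%")
    print(f"mean: {mean_percent(BOOKS):.1f}%")
```

The mean here is a plain average of the six percentages, which gives each novel equal weight regardless of its length.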

Have I made myself absolutely clear? Well, just in case, I will say this once more in a box, in a larger typeface designed to catch the attention of dimwitted people or perhaps even copy editors:

The things mentioned above are not debatable, they are facts about English that can easily be checked, and it is about time copy editors were told to stop wasting millions of hours on pointlessly correcting them when they were correct in the first place.

God dammit, I can feel the veins standing out in my neck. I need to step outside for a while and kick something.

Posted by Geoffrey K. Pullum at 02:36 PM

BUSH FALLS OUT

Recently an ailing aunt of mine told me that she "fell out" a while ago. The term genuinely perplexed me -- I wasn't sure just what "fall out" meant. Upon probing, it turned out that she meant that she had unexpectedly lost her balance.

The American Heritage Dictionary does not list this usage of "fall out," instead noting military connotations and the typical meaning of to quarrel. But in the Language Log tradition of commenting on our President's speech patterns, I have recently read of his twice using "fall out" the way my aunt did.

Recently, giving a speech on a hot day, he remarked "I better quit before some of us fall out" (www.nytimes.com/2004/05/12/politics/trail/12TRAIL-HEAT.html). And earlier, during another speech (http://www.whitehouse.gov/news/releases/2004/01/20040122-6.html), he said "There are 34 nations that have joined us in Iraq. That's too long to list. The Senator might fall out on me if I start trying to read them all."

So it would seem that there is a meaning of "fall out" common among Southerners (of which my aunt is also one) that is, to most other Americans and English speakers, somewhat opaque. There is, after all, nothing about any meaning of OUT that would predict that OUT and only OUT would be used to indicate that the falling was involuntary and sudden.

Languages develop things like that over millennia. Words combine with words, or prefixes and suffixes combine with roots, in ways that over time drift away from perfect sense. What are we standing under when we UNDERSTAND? Presumably this word made intuitive sense in English at some earlier stage -- maybe standing under a tree was how early English speakers came to some kind of consensus in some forgotten tradition of communal parleying -- but now we just say it without thinking about it. When a Russian says NAKAZAT' for "to punish," KAZAT' means "to show" and NA means, roughly, "onto." But how is punishing someone showing them onto something?

So, you know that a language is ancient when it has people failing to UNDERSTAND what FALL OUT means. And that is one way that many creole languages show that they are new languages, having emerged only a few centuries ago when subordinated people learned a makeshift variety of a language and then built that back up into a full one. If your language is a real one but also new, then after just a few hundred years there hasn't been time for illogical things like UNDERSTAND and FALL OUT to creep in. In Saramaccan Creole in Surinam, to spend too much money is to "eat" it -- NJAN MONI -- that's an idiom, to be sure, but it "makes sense." There are no UNDERSTANDS, and no fallings out.

In fact, these things can be seen as one of the sure indicators that a language is an old one rather than a recent creation, rather like isotope ratios in rocks and fossils. There are other indicators, such as inflectional prefixes and suffixes, or tones. Most languages have one or the other, but there are a few dozen that lack even them. Even here, though, there are always the UNDERSTANDS to tip us off that these languages trace back into the mists of time.

For example, many Mon-Khmer languages in Southeast Asia (of which Vietnamese and Cambodian are the rock stars) have no inflections and no tones. But they do have their FALL OUTs. In the Chrau language, TA- is a causative prefix, indicating that one made something happen: CHUQ "to wear," TACHUQ "to dress." But then there are cases that don’t quite follow and just have to be accepted as they are. CHEQ means "to put," but TACHEQ means not "to make someone put" but "to slam down."

In the Polynesian language Tokelauan, one uses a circumfix -- that is, a kind of "earphone" consisting of a prefix AND a suffix -- to indicate reciprocity. The circumfix is FE- / -I, and so FEAHOGI means "to kiss each other." But then while ILO means "to perceive," FE-ILO-AKI means "to meet." Sure, meeting involves perceiving one another, but one would not guess that this is what FEILOAKI meant without being told. This meaning emerged gradually over time, and Tokelauans are stuck with it.

So this means that when George Bush uses a humble colloquialism like FALL OUT, he is less "undoing" our language than -- inadvertently of course -- displaying the depth of the English language heritage.

Posted by John McWhorter at 12:26 PM

Thalerization

continues, with citation of classical sources;
new insight from computational models;
fascination, fear among the citizenry.

Posted by Mark Liberman at 10:01 AM

Marketing to grammar victims

Steven Bird's picture of the International Phonetic Alphabet on the packaging of a new Olympus digital camera is heartwarming -- may it be the first of many similar examples. Unfortunately, the µ[mju:] 400 appears to be the European packaging of what is marketed in the U.S. as the Stylus 400. This may be related to the fact that recent British dictionaries for the European market use IPA in their pronunciation fields, while the American versions of the same dictionaries not only provide American pronunciations, but translate them into a non-IPA pseudo-orthographic system concocted for the occasion.

Still -- who'd've thought that the way to popularize linguistic analysis might be... marketing?

Continuing with Steven's idea that IPA=hitech, I fantasize about seeing kəˈʧɪŋ digital headphones: "Now with infinite impulse response!". Trying for some orthographic fetishism, a line of sandals branded /ˈwʊʤəz/. Or for that southern flair, /ˈwʊʤɔl/. We need some fonts in general distribution that make IPA look more elegant, though. When I look at IPA in Unicode through a standard web browser or text processor, the mixture of styles and the poor hinting usually make it look like a ransom note (even without all the problems of diacritics...)

I don't expect that Britney Spears will replace her "Hebrew" tattoo with an IPA inscription, but maybe some future Hollywood guru will promote incantatory mantras in IPA. Or maybe some analytical hiphop innovator will start using IPA on liner notes and in song titles.

Then there's a popular women's clothing store called anthropologie. Maybe linguistique has potential? Or more confidently, linguistics. Selling anaphora perfume, part of the binding theory product line. Or aphasia eye make-up ("when speech just doesn't work") -- maybe that's a bit insensitive, though opium is still in the stores... How about clitics piercing studs?

There are plenty of opportunities for catchy slogans -- fill out the list (and fill in the products) for yourselves:

epistemology: how you know.
split ergativity: it's all about the participants.
...

The list of suitably evocative terms, ready to be recycled into mass-market products, is a long one: anti-passive, bilabial, causative, deixis, diglossia, fluency, illocutionary force, implicature, irrealis, labiality, onomatopoeia, perisylvian, voiceless ...

Not likely. But a nice daydream.

Posted by Mark Liberman at 09:16 AM

Marketing the International Phonetic Alphabet

I was interested to discover recently that the International Phonetic Alphabet (IPA) has escaped from its usual place in the pronunciation field of dictionary entries to show up in the name of a camera: Olympus μ[mjuː] (Olympus Europa). I doubt the IPA is so well known that it clarifies the pronunciation of the Greek letter μ to the general population. Nor can it be a case of orthographic fetishism, such as the gratuitous use of diacritics intended to make a word look chic, e.g. Lancôme. The [mjuː] has nothing more exotic than the colon made of triangular dots. Instead, I suspect the IPA, right down to the use of the square bracket delimiters, helps the name look technical. Alternatively, an out-of-work phonologist landed a job in marketing; what other species of individual routinely uses the IPA and Greek letter variables after all? (Read more about the IPA in Bill Poser's recent piece.)

Posted by Steven Bird at 03:30 AM

Multilingual Menus

Mark's analysis of the language of the menus of Le Bec Fin, The White Dog, and The Village Treat strikes me as right on the mark, but it made me think of a menu that I am having a hard time analyzing.

There is a restaurant in Vancouver that I like a lot called Tropika that serves Malay Chinese food. On the front window is written 星馬 [siŋ mă], which puzzled me for a while. It means "star horse", and that expression didn't ring a bell with me or with the Chinese people I asked about it. Eventually my friend Frances found out from a Malaysian Chinese acquaintance that this is a Malaysian Chinese expression, kind of an acronym, meaning "Singapore/Malaysian". The characters are used just for their sound.

Anyhow, in addition to the delicious food and interesting writing on the window, Tropika has a linguistically interesting menu. It is tri-lingual. Every item is described in English, Chinese, and Japanese. My first question is, why Japanese? English makes sense since it is the dominant language in Vancouver. And Chinese makes sense since it is a Chinese restaurant, and also since Chinese is now the second language in Vancouver. (According to the 1996 census, 13.8% of the people in the Greater Vancouver Regional District listed Chinese as their first language.) But why Japanese? Less than 1% of Vancouver residents speak Japanese. Japanese are a fairly important segment of the tourist trade, but not so important that other restaurants have Japanese language menus, except of course for Japanese restaurants. And why not Malay? I'd like to think that it's because I don't know Malay, but that doesn't seem very likely. I'm guessing that the answer is that although the people who run the restaurant are from Malaysia and speak Malay, and their cuisine is influenced by Malay food, their linguistic and cultural orientation is Chinese.

The other interesting thing about the trilingual menu is that, unlike the menu that Mark cites from Le Bec Fin in which the English and French say the same thing, the information conveyed by the three languages is different. For instance, as I recall, the English text of the entry for satay contains the information that it consists of skewered meat and comes with peanut sauce. The Chinese text lists the choice of meats (lamb, beef, and chicken) and specifies that an order consists of six skewers. And the Japanese text informs us that it is garnished with sliced cucumbers and another vegetable I can't remember. This is all very well for those who can read all three languages, indeed kind of fun, but I can't imagine that all that many of their customers can do so. So, was the information intentionally distributed over the three languages by someone being clever, or is it perhaps the result of having three different people compose the text in the three languages, each with a limited amount of space to use? I don't know, but I'm curious.

Posted by Bill Poser at 01:18 AM

May 16, 2004

Modification as social anxiety

Here's a thought: the impulse to pile up fancy words and extra modifiers, and the admonition to write simply and avoid adjectives, are both expressions of the same social anxieties, seen from slightly different places on the social scale.

As an illustration, consider the language of menus.

At the bottom of the social scale, we have simple places like Wendy's, whose menu includes items like "Single Hamburger on a Bun" and "side salad". When I was a boy in rural Connecticut, we ate out a couple of times a year at the only restaurant in town, the Village Treat, whose menu consisted entirely of simple phrases like "spaghetti and meatballs", "hot dogs and beans", and (my favorite) "grilled cheese sandwich".

Things are very different at the White Dog Café, whose lunch menu lists a hamburger as "Big Juicy Burger of Buck Run Farm’s Grass Fed Beef on our House made Poppy Seed Bun", a grilled cheese sandwich as "Butter Toasted Sandwich of Grilled Amish Cheddar, Sweet Red Onions and Tomatoes on Organic Sourdough Bread", and a side salad as "Mixed Salad of Many Lettuces from the Farm with lemon-olive dressing". The menu items are so elaborately expressed that it takes even hyper-literate academics a long time to process the choices, and foreign visitors often require a translation or at least an exegesis. Underneath the elaborate descriptions, the food is excellent. For many years, the White Dog has been one of the best restaurants in Penn's West Philadelphia neighborhood.

About 20 blocks east, and another notch up the scale in price, status and quality, is Le Bec Fin, generally regarded as one of the best restaurants in North America. Henry Gleitman is fond of saying that the two restaurants in Philadelphia that give the best value for the money are McDonald's and Le Bec Fin.

Le Bec's menu is given in both French and English, but the items in both languages combined are often shorter than analogous descriptions on the White Dog's menu. For example, where the White Dog has "Organic Wild Mushroom Risotto with Truffle Oil and Winter Herb Pesto" (11 words, 69 characters), Le Bec Fin has "Risotto aux champignons sauvages/Wild mushroom risotto" (7 words, 56 characters in both languages). Where the White Dog has "Pan-Seared Wild Caught White Albacore Tuna Loin in Coconut Lemongrass Herb Broth", Le Bec Fin has "Tartare de thon parfume au citron/Tuna tartar scented with citrus". The White Dog has "Anise-Black Pepper Seared Neptune Farms Organic Filet of Beef Carpaccio with Peppery Baby Greens", while Le Bec Fin has "Carre d’agneau servi avec une fricassée d’haricots jus de thym/Rack of lamb, bean fricassee with thyme flavored jus".
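The word and character tallies above are easy to reproduce. Here is a rough sketch in Python, assuming that a "word" is any run of characters separated by whitespace or by the slash between the French and the English; the function names are made up for illustration. Character totals from a simple length count may differ by a character or two from the figures quoted, depending on whether line ends or spacing around the slash are included.

```python
import re

# Menu items quoted in the paragraph above.
WHITE_DOG = "Organic Wild Mushroom Risotto with Truffle Oil and Winter Herb Pesto"
LE_BEC_FIN = "Risotto aux champignons sauvages/Wild mushroom risotto"


def count_words(text):
    """Count words, splitting on whitespace and on the French/English slash."""
    return len([w for w in re.split(r"[\s/]+", text) if w])


def count_chars(text):
    """Count every character, including spaces and the slash."""
    return len(text)


if __name__ == "__main__":
    for name, item in (("White Dog", WHITE_DOG), ("Le Bec Fin", LE_BEC_FIN)):
        print(f"{name}: {count_words(item)} words, {count_chars(item)} characters")
```

Splitting on the slash is a choice made here so that the last French word and the first English word are counted separately; splitting on whitespace alone would count them as a single word.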

Georges Perrier's minions at Le Bec Fin know exactly where each ingredient in each of their dishes comes from, but they don't feel the need to tell us the names of farms and fishing techniques, or even the complete list of ingredients. Instead, they name only the core substances, preparation methods and flavors. This is not out of any concern for secrecy -- there are Le Bec Fin cookbooks -- but because we are meant to take it for granted that we can trust them to use an appropriate number and variety of appropriately selected ingredients. To go into details on the menu, either about what the ingredients are or where they came from, would be infra dig. And to state that something is crunchy or crispy or fresh -- unless this is unexpected -- would be unthinkable, because it would suggest that there might be some doubt about something about which there should be no doubt at all.

A similar dynamic applies in non-menu writing. There's plain work-a-day prose, like the menu at the Village Treat, which just means to tell us something simple in a simple way. Then there's ambitious prose, like the menu at the White Dog, which has much finer and more elaborate ideas to convey, and feels the need to make sure that we understand this, by using finer and more elaborate language. And at the level of Le Bec Fin's menu, there's prose that is secure in its status, or at least means us to understand that it is, and can therefore dispense -- perhaps ostentatiously -- with ostentation.

The "writing experts" who give apparently nonsensical and hypocritical advice about avoiding adjectives and adverbs can be understood as trying to show White Dogs how to aspire to become Bec Fins. Better advice is to step outside the whole frame, if you can manage it.

[Update: Another discussion of the form and function of American menu language can be found in Ann D. Zwicky and Arnold M. Zwicky, "America's National Dish: The Style of Restaurant Menus." American Speech, vol. 55, No. 2 (Summer, 1980) 82-93.]

[Update #2: Steven A. Shaw, aka "Fat Guy" (Director, eGullet.com) comments that:

Many of the best chefs in the world believe that producers, regions, and special products should be celebrated on menus. The expression of this belief can range from Georges Blanc's menu specifying "L'Aile ou la Cuisse de Poulet de Bresse Naturellement Rôtie" rather than just "Poulet" to a draft of a Cafe Gray menu I saw where the dishes are described simply but the bottom of the menu contains a list of producers, to Daniel Boulud's menus naming Tim Starck as his tomato supplier, to Alain Ducasse producing an entire book, Harvesting Excellence, containing photographs, bios, and other details about the American producers with whom he has relationships. There's not an insecure chef in that bunch -- they set the standards; they are the standards -- and I see no basis for complaining about being provided with this information. Certainly to call it "infra dig" is to reveal a lack of familiarity with menu writing at the top levels of cuisine today. There are menus that overdo it, and it's especially ridiculous when menus add meaningless modifiers that attempt to make ingredients sound better than they are ("ahi tuna" "USDA beef"), but real information, in moderation, can be a good thing.

Fair enough.

But my point isn't that writers who are socially and intellectually secure always write in a simple style, or that restaurants only provide lengthy descriptions of dishes or extensive information about food sources if their managers are insecure. I'm just pointing out that elaborate language is often displayed as a symbol in itself, on menus as well as in novels and essays. There are a number of reasons for such displays, and one is as an index of status. And among the motivations for indexing status, one is a concern that it might otherwise be evaluated as too low.

I also recognize that in commerce, status symbols are more likely to be a mirror of the buyers' identity issues than an expression of those of the seller. In this respect, most menus are probably somewhat different than most novels.

Other discussion on eGullet is also perceptive and amusing, including a quote from one of my favorite Monty Python sketches, involving the deathless phrase "the finest baby frogs, dew picked and flown from Iraq, cleansed in finest quality spring water, lightly killed, and then sealed in a succulent Swiss quintuple smooth treble cream milk chocolate envelope and lovingly frosted with glucose."

Now that I think of it, I could have omitted my entire post in favor of simply reprinting the transcript of that sketch, in line with the First Rule of Fiction, "Show it, don't tell it." If this were fiction, that is...]

Posted by Mark Liberman at 10:12 PM

The International Phonetic Alphabet

Those of us here on Language Log who cite exotic languages or talk about phonetics frequently use the International Phonetic Alphabet. Mark referred to the IPA explicitly a few days ago, but usually we use it without explicitly indicating that that is what we are doing. Since non-linguists may find this confusing, and since the IPA ought to be used more widely than it is, I thought I'd talk about it a little.

The IPA is a system for transcribing speech. It attempts to provide a symbol for every distinct speech sound found in some human language. For instance, the sound at the end of ring is represented by the symbol ŋ, while the sound at the beginning of this is represented by ð. Since there are several hundred such sounds, some of the symbols are composite. For instance, aspirated consonants are written with a small superscript h after the symbol for the corresponding unaspirated consonant. For example, [p] stands for the unaspirated sound, as in English spot; its aspirated counterpart, as in pot, is [pʰ]. The IPA is intended only to record those differences in sound that are distinctive in some language. However, diacritics are provided that allow finer detail to be recorded. The IPA is defined by the International Phonetic Association, which revises it from time to time as new speech sounds are discovered. The very first version of the IPA dates to 1886; it had attained essentially its current form by 1949. The changes since then consist almost entirely of the addition of symbols for the more exotic speech sounds.

Here is the official chart of the current version. If you're using it on-line, you may find this resizable version more convenient. The IPA proper is intended for describing normal speech, but there are extensions for transcribing disordered speech, described here [PDF]. There is a nice exposition here of a subset of the IPA, with large versions of the symbols, drawings of the articulatory configuration, and audio examples. The fullest exposition of the IPA is to be found in the Handbook of the International Phonetic Association (also in paperback). Audio files containing the illustrations from the Handbook can be obtained here.

Within the field of linguistics, and in related areas such as speech pathology, the IPA is widely used and has been a great success. So long as everyone uses the IPA, if one person writes something down, everyone else has quite a good idea of what it sounds like, whether or not he or she has ever heard the language in question. If someone writes about a certain sound, there is no question what sound is intended. To be fair, the IPA is not so good at representing things like tone and intonation; we still don't understand prosody well enough for there to be a reliable universal system of transcription for it. But so long as we are dealing with the segmental aspect of speech, the IPA does a very good job.

Although the IPA is intended to be an international standard, even people who use it don't always adhere to it strictly. This is partly a matter of idiosyncrasy and of national traditions, but also partly a matter of the use of typewriters. For instance, the IPA symbols for the voiceless and voiced post-alveolar fricatives, ʃ and ʒ, do not appear on normal typewriters, especially English language typewriters, meaning that anyone writing in or about a language with these sounds, which are quite common, had to leave space for them and write them in by hand. Moreover, ʃ is rather difficult to draw. If you aren't careful, it ends up looking like an ordinary <s>, and if you try to distinguish the two, you are likely to make the ʃ too tall, with the result that it sticks up into the line above or down into the line below. As a result, many linguists have used the symbols š and ž instead. These have the advantage that the base is found on the typewriter, so one doesn't have to leave space for them, and the diacritic ̌ is easy to write in later. A list of the typical North American deviations from the IPA can be found here.

Nowadays, with computer word-processing there is no reason not to use standard IPA symbols. The fonts are readily available. If you use a Unicode editor such as yudit or a Unicode-capable word-processor, you might try the Code 2000 font, available here. This font covers pretty much all of the Basic Multilingual Plane, which includes all of the IPA symbols. A smaller font, covering just the IPA, variants of the Roman alphabet, Greek, and a few others, is the Lucida Sans Unicode font. If you use Microsoft Word, you can download one of the free fonts available from the Summer Institute of Linguistics here. Depending on your software and your preferences you will enter the IPA symbols in different ways. It may be necessary to know the numerical codes, either to enter them directly or to set up your keyboard. Lists of numerical codes can be found here, here, and here.
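If you need the numerical codes for a handful of symbols and have Python at hand, the standard library can report them. The following is a small sketch rather than a substitute for the published lists: the choice of symbols (the ones mentioned in this post) and the helper name code_point are mine, while ord and unicodedata.name are standard Python.

```python
import unicodedata

# IPA symbols mentioned in this post, written as escapes so that the
# example does not depend on the encoding of the file it is saved in.
SYMBOLS = [
    "\u014b",  # eng, the sound at the end of "ring"
    "\u00f0",  # eth, the sound at the beginning of "this"
    "\u0283",  # esh, voiceless post-alveolar fricative
    "\u0292",  # ezh, voiced post-alveolar fricative
    "\u02b0",  # superscript h, marks aspiration
]


def code_point(char):
    """Return the Unicode code point of a single character as 'U+XXXX'."""
    return f"U+{ord(char):04X}"


if __name__ == "__main__":
    for symbol in SYMBOLS:
        print(symbol, code_point(symbol), unicodedata.name(symbol))
```

The hexadecimal value after "U+" is the number that most keyboard-configuration tools and Unicode-aware editors ask for; how you then enter it depends on your software.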

You'd think that with a successful standard system for representing speech sounds in place for over half a century it would be widely used in contexts in which people want to indicate pronunciation, but outside of specialist publications, that isn't so. Take dictionaries. Both monolingual dictionaries and bilingual dictionaries intended for speakers of languages other than English usually indicate the pronunciation of words. Do they use the IPA? Almost never. It seems as if every dictionary has its own system for indicating pronunciation. That makes it painful to move from one dictionary to another, but what is worse, it means that if you don't happen to be familiar with the variety of English, or language other than English, that is used to explain a symbol, you have no way of knowing what it means. If dictionaries used the IPA, it would only be necessary to learn and keep track of one system, and when you used a dictionary, you would either know what a symbol meant or have a straightforward way of finding out.

The situation is similar with language textbooks. I recently had occasion to use John Mason's Tigrinya Grammar (Lawrenceville, NJ: The Red Sea Press, 1996). Tigrinya is a language whose sound system is quite exotic from an English speaker's point of view. It has a series of ejectives, voiced stops that are truly voiced even in word-initial position, phonemic glottal stop, and several fricatives absent from English, including a pair of pharyngeals. The English speaker desirous of learning Tigrinya is going to need some assistance with the sound system. This is not provided by the textbook. It uses the Ethiopic script, in which Tigrinya looks like this: ሰዋስው, throughout, without transliteration. That's very hard on the learner, who needs some practice with an unfamiliar writing system, and some confirmation that he or she is reading it and writing it correctly. What is worse, the explanation of the Tigrinya writing system uses an ad hoc transcription system. Some of the ejectives are simply not indicated as being ejective; others are incorrectly labelled "plosive". Several Ethiopic letters are transcribed by the letter q, alone or in combination. If this means anything to an English speaker, it will probably be taken to indicate [kʷ]. In fact, it represents an ejective velar stop; the combinations represent a glottalized velar fricative and labialized versions of the first two. There is no way that anyone could learn this from the textbook.

As it happens, Mason didn't just invent this transcription with q. This is a traditional usage of Semitic specialists, based on the fact that in some Semitic languages the cognate sound is a voiceless uvular stop, whose IPA symbol is [q]. But this is a purely historical notation, irrelevant to the pronunciation of Tigrinya, and in any case is known only to Semitic specialists. To be fair, the book contains the disclaimer that to learn the pronunciation properly one needs to listen to a speaker. But many people aren't going to have access to a speaker, at least not when they start learning and not on a regular basis, and even if they do, without at least a foundation in phonetics, they will be hard put to know what to make of what they hear.

Now, I don't mean to give Mr. Mason a hard time. I imagine that he did his best. He's probably not a linguist. I suspect that he's a missionary. My point is that this is actually quite typical of language textbooks. Few authors of textbooks, and few publishers, seem to understand that it would be a good idea to provide the reader with an unambiguous description of the sounds of the language, and that there is a simple way to do this: use standard linguistic descriptive terminology and use the IPA.

An example of a textbook that does use the IPA is R. E. Asher and E. Annamalai's Colloquial Tamil (New York: Routledge, 2002). Where it deviates from the IPA it is arguably for a legitimate reason, namely to reflect the native Tamil writing system. Furthermore, even where it deviates from the IPA, it explains its transcription in terms of the IPA. This book, as it happens, comes with audio recordings, from which the actual sounds can be learned, but even when such recordings are provided, a comprehensible text helps to prepare the student to make use of the recordings. (And of course, if the book is bought used or borrowed from a library, there is a good chance that it will come without the recordings.) It seems that Routledge has perhaps taken the idea of using the IPA to heart. Another recent entry in their series of language textbooks is Daisy L. Neijmann's Colloquial Icelandic (2001). This also uses the IPA to explain the sound system of Icelandic.

Although the use of IPA is still distressingly uncommon, these are by no means the first. Just rummaging through books I have on hand, I see that 小沢重男's モンゴル語四週間 (Mongolian in Four Weeks), published in 1986, uses the IPA to explain the Cyrillic writing system for Mongolian. So it isn't as if the IPA is a recent creation or that publishers and textbook authors have been unaware of it.

A topic that has come up repeatedly here is that most people, even highly educated people, know very little about language, and in particular, that they have no descriptive vocabulary for speech sounds. Mark has described several examples of this recently. What is more surprising is that people who do have a particular interest in language, such as the authors and publishers of dictionaries and language textbooks, fail so often to make use of the appropriate tools.

Posted by Bill Poser at 07:54 PM

No, the burly detective intoned, refusing to agree

Ray Girvan has pointed out to me a web site devoted to a guide to writing (primarily science fiction writing) that lists a whole bunch of features of bad novel-writing that a fiction writer should strive not to emulate: the Burly Detective Syndrome; Countersinking; Hand Waving; Card Tricks in the Dark; the Info Dump (including its variants the "As you know, Bob" and the "I've Suffered for my Art and Now It's Your Turn"); and so on. (As Ray notes, Dan Brown engages in all of these practices, frequently.) The site comes to you from the Turkey City Workshop in Austin, Texas. To find it at its rather unguessable URL, click here.

[And for an antidote (thanks to Arnold Zwicky for this), see John Rechy on why three of the most-cited Rules of Fiction -- Show Don't Tell, Write What You Know, and (not given in my previous post) Always Have a Sympathetic Character for the Reader to Relate to, are poisonous nonsense and you should utterly ignore them when you write. The choice of which advice to take is yours.]

Posted by Geoffrey K. Pullum at 04:55 PM

Highfalutin writing taught at Harvard?

Following up on Geoff Pullum's critique, Claire at Anggarrgon suggests that

Dan Brown's prose probably isn't entirely his fault. He was a Harvard undergrad, a classics major. As such, he's been subjected to Expos, which as far as I can see is a machine which turns semi-literate American high-schoolers into semi-literate American college students with a fondness for long words and extraneous adjectives.

This observation might help solve another literary puzzle. I'm still struggling through Matthew Pearl's The Dante Club, which is taking me longer to read than two dozen ordinary thrillers of similar length. My problem is the strange writing style, which I experience as a sort of constant low-level linguistic culture shock.

The difficulty is not so much with long words and extraneous adjectives -- though there are plenty of both -- but rather with words and phrases that seem strange in their context. In my earlier post, I gave an example where the strangeness goes so far that the result is outside the usual bounds of grammatical English ("a hefty bear of an indigoed uniformed man"), but usually the result is just odd and distracting. Here are a few more examples, from the thicket I happen to be thrashing through at the moment:

(p.101) Holmes jumped when he noticed the rifle leaning against the wall. "Longfellow, why in the land is that out here?"

(p. 102) Longfellow did not move. His stone-blue eyes stared ahead into the richly cracked spines of his books. It was not clear whether he'd remained a part of the conversation. This infrequent, remote look, when he sat silently running his hand through the locks of his beard, when his invincible tranquillity turned cool, when his maiden complexion seemed a bit dusky, put all his friends ill at ease.

(p. 103) "... I've done us all a good turn," Holmes said. "This could put us in a dangerous way!"

The phrases that I've highlighted in red all stop the narrative flow for me, while I try to sort out a complex of incompatible lexical and semantic associations.

Was "why in the land" really a late-19th-century variant for "why in the world"?

My understanding of the ... cracked spines of books is illustrated by this picture. However, such cracks are not visible from the back of a shelved cloth- or leather-bound book, and anyhow, it's odd to say that a book's spine is "richly cracked" in this sense. Did Pearl instead mean that the books' spines had the kind of leather surface represented by this picture? Probably.

What kind of blue stones are Longfellow's stone-blue eyes like -- opaque pastel turquoise? clear cornflower-blue sapphire? the dusty gray-blue of indigo pigment, traditionally called "stone blue"? A commonplace expression like "sky blue" is open to similar ambiguities -- there are lots of colors of blue sky -- but because the expression is so common, it goes down easily. A phrase like "pale blue eyes" would also be inoffensive. An appropriate specific reference, to topaz or lapis lazuli, would create a specific image if one is needed, but simple "his blue eyes stared...", or even "he stared..." might have worked as well.

And what does it mean for Longfellow's "maiden complexion" to become "dusky"? Is it darker because he's blushing? That doesn't make any sense. Is it darker because he's hypoxic? That doesn't make sense either. Does Pearl just mean that Longfellow seems to his companions to be withdrawing into shadow? Again, I've just wasted a bunch of neural activation on a set of questions that have nothing to do with the story.

Pearl's "stone-blue eyes", "richly cracked spines", and "maiden complexion" that "seemed a bit dusky" are all phrases that seem to have little narrative function other than to add a highfalutin tone. They do add bits of circumstantial detail, but the images are like glittery found objects glued to the surface of a sculpture. The things they describe are not integral to the story, and the language of the descriptions is forced and somehow out of joint.

In my earlier analysis, I wrote that

The Dante Club's front matter tells us that "Matthew Pearl graduated from Harvard University summa cum laude in English and American literature in 1997, and in 2000 from Yale Law School". I ask you, is it likely that a person with that background would be so insensitive to the norms of the English language?

No, a much more plausible hypothesis is that Pearl graduated from a slightly different Harvard University, in a universe slightly different from our own, and read a body of English and American literature that is also just a bit different.

[...] I'd hate to revert to the much more prosaic theory that Pearl just systematically substituted fancier words for plainer ones, as one of my friends in junior high school used to do ...

Claire seems to be saying that the "Expos" course (short for "Expository Writing"?) is now teaching all Harvard undergraduates the technique that I teased my junior-high friend for using, as he sat with his copy of Roget's, systematically piling up modifiers and replacing common or expected words with rare or odd ones.

Please don't think, by the way, that I object to complex writing or to unusual words or phrases. Some of my favorite authors use archaic, specialized or dialectal language as a way of establishing character or propelling a narrative forward, as I've described here, here, here, here and here. Eamonn Fitzgerald explains how this works in the novels of Patrick O'Brian:

In O'Brian's hands, words change from being discrete items of vocabulary to elements making up a vast and vivid painting alive with nature, machines, horror, humour and humanity. ... This facet of his work was seized upon perceptively by Jason Epstein in The New York Times ... [quoting] a passage from The Far Side of the World in which Aubrey... constructs a device to raise the anchor because the usual mechanism -- the capstan -- has jammed:

"With scarcely a pause Jack called the midshipmen. 'I will show you how we weigh with a voyol,' he said. 'Take notice. You don't often see it done, but it may save you a tide of the first consequence.' They followed him below to the mangerboard, where he observed, 'This is a voyol with a difference.'" Bonden, a fellow officer, brings the heavy sheaved block. " 'Watch now. He makes it fast to the cable -- he reeves the jeer-fall through it -- the jeer-fall is brought to the capstan, with the standing part belayed to the bitts. So you get a direct runner-purchase instead of a dead nip, do you understand?' "
Not quite (especially since mangerboard and jeer-fall do not appear in the 12-volume Oxford English Dictionary or its several supplements), but enough for readers to see for themselves what O'Brian has left to the imagination: Aubrey bent under a hanging lantern in the dappled half light below decks surrounded by his midshipmen in their top hats, showing them with his hands how to raise an anchor when the capstan pawls are broken."

[The Epstein piece, a justly negative review of the movie Master and Commander, can be found here].

This kind of writing is not to everyone's taste, but it works for me in a way that Pearl's writing doesn't.

Posted by Mark Liberman at 12:47 PM

More on journalistic ignorance of linguistic description

Eric Bakovic emailed to point out that Bob Mondello's review of Troy on NPR includes the following characterization of Brad Pitt's speech:

"As with most sword-and-sandal epics, go indoors and everything's suddenly about statuary, and torches, and an international cast that's trying to reach common ground on accents; here the kings hail from Scotland and Ireland, and the followers from London's West End and Australia. Happily this makes Pitt's Achilles sound like the outsider he's supposed to be, even when he remembers to round his vowels..."

I haven't seen Troy, and don't have access to a sound track, so I don't know what Pitt actually does to accommodate his vowels to his transnational surroundings. However, I'll bet that it has nothing to do with what linguists call "rounding", in which the orbicularis oris (and some other facial muscles) are used to constrict and protrude (i.e. "round") the lips, thus lowering the vocal tract resonances known as formants.

Instead, Mondello is using "round" in an evocative way, as Arthur Rimbaud did when he assigned colors to vowels:

A noir, E blanc, I rouge, U vert, O bleu: voyelles,
Je dirai quelque jour vos naissances latentes

But there are some key differences here. I don't mean that Rimbaud is a dead French poet, while Mondello is a live American movie critic. I mean that Rimbaud was working out a systematic metaphorical (or synaesthetic) correspondence between (nouns for) vowels and (adjectives for) colors. Mondello, on the other hand, is just thoughtlessly mis-using one verb, originally a term for shape, that already has a standard meaning in the domain of vowel sounds. 19th-century French phoneticians didn't use rouge and bleu as technical terms for vowel "color" -- nor do they now -- but they did use the term arrondi ("rounded"), and in the vowel charts of the International Phonetic Association, since the first one was published in 1888, vowels have been presented in the traditional rounded and unrounded pairs, with a specific meaning that is not Mondello's.

Being a transgressive kind of guy, Rimbaud would have been happy enough to subvert the IPA's terminology if it had served his ends. But Mondello is not subverting anything, he's just ignorant.

Let's add this to the pile of evidence that American education needs an infusion of basic descriptive linguistics. As I wrote in reference to another case:

Leon Wieseltier is the Literary Editor of The New Republic magazine. He's not just an acute observer of social relations, he's also highly educated and well read. He knows the word sociolinguistic, for instance, and he's not afraid to use it. But as I pointed out at the time, Wieseltier shares a blind spot with most other intellectuals today -- he can't describe the basic facts of language in a coherent way, because he doesn't know what the basic descriptive vocabulary means. We've seen other examples of this same problem recently, when intellectuals were discussing passive verb forms, or the structure of arguments, or hiphop vowel sounds.

This is not their fault -- not the fault of people like Wieseltier and Mondello -- except in the limited sense that public intellectuals have some responsibility to try to learn the elementary science and scholarship of fields that they comment on. Instead, it's the fault of the American educational establishment, which has almost entirely abandoned its responsibility to teach the basic terminology and skills of linguistic description. It's also the fault of the field of linguistics, which has devoted little effective effort to basic education on a large scale.

Posted by Mark Liberman at 08:51 AM

May 15, 2004

A new verb: to thaler

Anoop Sarkar at Special Circumstances answers Geoff Pullum's question about how a part-of-speech tagger would deal with verbless English. The result: pretty good performance, and at least one amusing error:

  a   word of gratitude to Thaler -- otherwise an unimportant screwball
  DT   NN  IN    NN     TO   VB   --     RB    DT     JJ         NN

POS tags are documented at the end of Anoop's post; the ones used here are DT=determiner, NN=common noun, IN=preposition, TO="to", VB=bare verb, RB=adverb, JJ=adjective.

Ignoring punctuation, this sequence is analogous to a string like "a plan of action to arrange carefully a lovely bouquet". As if.
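The tagger's slip is easy to see once the output is written down as token/tag pairs. Here is a minimal Python sketch. It does not run a tagger; it only transcribes the sequence shown above. The function name `suspicious_verbs` and its capitalization heuristic are my own assumptions for illustration, not anything from Anoop's post.

```python
# Token/tag pairs transcribed from the tagger output quoted above.
tokens = ["a", "word", "of", "gratitude", "to", "Thaler", "--",
          "otherwise", "an", "unimportant", "screwball"]
tags = ["DT", "NN", "IN", "NN", "TO", "VB", "--",
        "RB", "DT", "JJ", "NN"]
tagged = list(zip(tokens, tags))


def suspicious_verbs(pairs):
    """Return capitalized, non-initial tokens that were tagged VB.

    Hypothetical heuristic: a capitalized word in mid-sentence is
    more likely to be a proper name than a bare verb.
    """
    return [tok for i, (tok, tag) in enumerate(pairs)
            if tag == "VB" and i > 0 and tok[0].isupper()]


print(suspicious_verbs(tagged))
```

The only token it flags is the one the tagger read as the verb after "to".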

[Update: Michael Leuchtenburg emails:

When I saw the title of your post on Language Log, I at first thought that you were suggesting naming the practice of writing without verbs - or perhaps without any one part of speech - "thalering". It seems a fitting tribute to an author who writes without verbs to turn his name into a verb.

Indeed. Though Thaler already has a distinguished etymological history in English, via the (literal, silver) coinage of St. Joachim's Valley; the OED cites:
1864 CARLYLE Fredk. Gt. XVII. v. IV. 571 'Let my ducat be a Joachimsthal one, then!'.. 'a Joachimsthal-er'; or for brevity, a Thal-er; whence Thaler, and at last Dollar.
]

Posted by Mark Liberman at 07:59 PM

The sixteen first rules of fiction

In an earlier post I confessed that I was "still trying to come up with a convincing account of just what it was about his very first sentence, indeed the very first word, that told me instantly that I was in for a very bad time stylistically" with Dan Brown's The Da Vinci Code. Then today I heard a fiction writer (Eleanor Lipman) talking on an NPR program (The Splendid Table) about how to indicate what characters are like by describing the food they order, and she mentioned the first rule of fiction writing. Suddenly I felt very foolish, because it was very simple, but it said everything that was needed.

The rule was "Show it, don't tell it." That hits it nicely on the head. Look again at the opening line of The Da Vinci Code:

Renowned curator Jacques Saunière staggered through the vaulted archway of the museum's Grand Gallery.

What's so inept about that first noun phrase is that a good fiction writer shouldn't have to tell us that the curator is renowned, and probably shouldn't even have to tell us that this is a museum curator staggering into the Grand Gallery of the Louvre late at night trying to forestall an attempt on his life. It should become clear to us as the action proceeds. In a short newspaper obituary you have to pack in phrases like "renowned curator" and such other details as the age of the deceased, but a competent novelist doesn't do that in an opening action sequence. That's what I should have said, and was struggling to say in the earlier post.

I must say I was surprised, though, when I went to check via Google that this really was the First Rule of Fiction and found, with a search on "first rule of fiction", that in fact there are at least sixteen (16) First Rules of Fiction. In addition to Show, don't tell, which is mentioned on two or three sites, I found these (they are roughly paraphrased; often the rules are only hinted at):

Be readable; grasp the reader's attention.

Don't explain.

Know your characters.

Drop the reader right into the middle of the action.

You can do anything.

Write what you know.

You can't talk about fiction.

Be true to the characters and let the story flow from them.

A relieved sigh ALWAYS brings trouble.

Truth is stranger than fiction, so appeal to the sense of absurd to gain credibility.

Never, ever, let your readers be confused about the precise geographical locations of your minor characters.

The narrator can't die.

Create a believable universe out of nothing.

It is not real life, but it must somehow honestly represent something of real life.

The voice may be yours, but the characters are just characters.

Evidently this fiction-writing business (in which I am not an expert) has more rules than I thought. Perhaps if I limited myself to novels that respected all of the rules it would get me down to such short lists that it would be easy to pick the next novel I should read (I don't know if I dare admit this to you, but I read one novel a year, whether I need it or not; I don't have time for more than that, and occasionally I skip a year — or waste a year the way I wasted 2003 on The Da Vinci Code).

Posted by Geoffrey K. Pullum at 05:20 PM

Escher sentences: prior art

Neque Volvere Trochum at entangledbank -- or is it entangledbank at Neque Volvere Trochum? -- has found some prior art for the coinage "Escher sentence", in a MST-ing of some LotR fan fiction:

Frodo nearly didn't escape, but with one less finger.

Back: As opposed to the last time he nearly didn't escape, back when he had two less fingers.
Restless: Ooh. Could said finger be up your ass, scratching around, trying to find a Coherant [sic] Thought?
Tele: *blinks. reads sentence again. blinks more* No matter how you slice it, that's still not making sense.
Kyuu: This is sort of like one of those Escher paintings, like with the stairways. Or a moebius strip. You keep going round and round trying to make sense of it.

While you're visiting, N.V.T. (or entangledbank?) also offers several interesting observations on other Escher-like sentences, as well as insights into the mysteries of syntax, phonology, exam preparation and nanoporn.

Posted by Mark Liberman at 11:46 AM

All your base are belong to which lexical category?

The first sentence of this BBC story about Intel's profits took me slightly aback:

For the three months to 27 March, the Californian-based company made a profit of $1.7bn, almost double the $915m recorded for the same period in 2003.

Shouldn't that be "California-based"? I thought to myself.

Checking relative frequencies on the web, I got an answer: 416,000 ghits for "California-based" vs. 2,580 for "Californian-based". 99.4% of the web agrees with my judgment -- and also with grammar and logic, it seems to me, since "X-based" should be a compositional compound noun, meaning "based in (or on) X", where X is a noun. Nobody would say "based in Californian." QED. The feeble 0.6% are just confused, I thought smugly. Perhaps they are attracted by the irrelevant analogy of other adjective-noun sequences. So much for the Beeb, how the mighty have fallen, etc.

But wait a minute, said the still small voice of conscience. How about "European-based"? Doesn't that sound just as good as "Europe-based", or maybe even better? Checking the web, I found 42,600 ghits for "Europe-based" vs. 60,700 for "European-based": 41% for the noun, 59% for the adjective. Even-steven from a grammatical point of view (though an adjectival landslide in electoral terms!)

And looking at the next few examples of relevant noun/adjective pairs that occurred to me makes the picture even murkier. "Boston-based" is 80,000 times commoner than "Bostonian-based", but "Canadian-based" is about 34% commoner than "Canada-based", and so on:

	noun	adjective	ratio
Athens/Athenian based	6,460	11	587
Boston/Bostonian based	240,000	3	80,000
California/Californian based	416,000	2,580	161
Canada/Canadian based	70,300	94,400	0.745
China/Chinese based	39,000	7,450	5.25
Egypt/Egyptian based	4,920	4,520	1.09
Europe/European based	42,600	60,700	0.702
France/French based	24,800	29,100	0.852
Germany/German based	44,800	44,300	1.01
Greece/Greek based	3,320	2,970	1.12
Ireland/Irish based	34,400	16,300	2.11
Israel/Israeli based	20,100	6,750	2.98
Japan/Japanese based	43,800	8,940	4.90
Korea/Korean based	14,900	5,680	2.62
Latvia/Latvian based	558	250	2.32
Nigeria/Nigerian based	2,070	853	2.43
Norway/Norwegian based	8,400	3,800	2.21
Paris/Parisian based	91,000	297	306
Pennsylvania/Pennsylvanian based	45,100	38	1,187
Russia/Russian based	10,400	6,600	1.58
Scotland/Scottish based	31,100	28,900	1.08
Tunisia/Tunisian based	336	130	2.59
Turkey/Turkish based	4,840	1,570	3.08
Vienna/Viennese based	21,000	32	656

(Some of these should probably be removed from consideration, at least pending reanalysis, because the "adjective" forms are really nouns much of the time, as in "Greek-based" meaning "based on the Greek language". I don't think this will change the overall picture much. It's possible that a more careful accounting for other sense differences and other details of semantic relationships would clear things up, but I doubt it.)

Adding it all up, it's about 79% for the nouns, 21% for the adjectives. A victory for logical grammar, but hardly a resounding one. There are several pockets of stalwart adjectival resistance (or craven concession to adjectival irrationality?): Europe, France, Canada, at .70, .85, .75 noun/adjective ratios respectively. Germany is on the edge at 1.01.
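For anyone who wants to check the adding-up, here is a small Python sketch that recomputes the overall split from the counts in the table. The counts are copied from the table above; the variable names are just for illustration, and the "share" is a simple pooled proportion of hits, which is my assumption about how the total was reckoned.

```python
# (noun-based hits, adjective-based hits) per place, copied from the table.
counts = {
    "Athens": (6460, 11), "Boston": (240000, 3),
    "California": (416000, 2580), "Canada": (70300, 94400),
    "China": (39000, 7450), "Egypt": (4920, 4520),
    "Europe": (42600, 60700), "France": (24800, 29100),
    "Germany": (44800, 44300), "Greece": (3320, 2970),
    "Ireland": (34400, 16300), "Israel": (20100, 6750),
    "Japan": (43800, 8940), "Korea": (14900, 5680),
    "Latvia": (558, 250), "Nigeria": (2070, 853),
    "Norway": (8400, 3800), "Paris": (91000, 297),
    "Pennsylvania": (45100, 38), "Russia": (10400, 6600),
    "Scotland": (31100, 28900), "Tunisia": (336, 130),
    "Turkey": (4840, 1570), "Vienna": (21000, 32),
}

# Pool all hits and take the noun form's share of the total.
total_noun = sum(noun for noun, _ in counts.values())
total_adj = sum(adj for _, adj in counts.values())
noun_share = total_noun / (total_noun + total_adj)

# Places where the adjective form outnumbers the noun form (ratio < 1).
adjective_leaning = sorted(place for place, (noun, adj) in counts.items()
                           if noun < adj)

print(f"noun share: {noun_share:.1%}")
print("adjective-leaning:", adjective_leaning)
```

Dividing each row's noun count by its adjective count the same way reproduces most of the listed ratios to within rounding; the China and Latvia rows come out nearer 5.23 and 2.23 than the figures shown.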

Seriously, it's clear that different place-names are behaving differently here. What's the principle, if any? Word length? Unigram (word) frequency? Longitude? Affix? Country vs. City? Few of my first few hypotheses are even true, and none of them explain much of the variance.

And what if we picked a different head noun, such as "X-oriented" or "X-bound" or "X-educated"? Would the statistics be similar, or different?

And does all this have anything to do with the compound nouns that don't involve a de-verbal head at all, but are created by adding "-ed" to a modified noun, as in "red haired"? In other words, is the construction (at least sometimes and for some people) [[Canadian base]+ed] ? If so, does that offer any traction in explaining the enormous variation in usage statistics sketched above? I don't see how, but at least it would provide a grammatically and logically plausible analysis for such phrases.

Then again, maybe I'm just being old-fashioned in expecting a coherent compositional account of how regular phrasal patterns acquire their form and meaning, as opposed to the currently-spreading view that "our interpretive capacities take into account holistic informational characteristics of linguistic constructions and don't simply generate meanings by way of 'bottom up' recursion principles."

Posted by Mark Liberman at 07:38 AM

May 14, 2004

Disgrace

Although I rarely disagree with Geoff Pullum, I will venture to do so with respect to his discussion of George W. Bush's statement that "Like you, I have been disgraced from [or "about", or "by"] what I've seen on TV, what took place in the prison."

Geoff argues that "disgraced" must have been a mistaken choice of word -- a malapropism -- because "(1) it makes no sense, since Bush is not personally and directly in disgrace over this (no one says he helped wire up prisoners or unleash dogs on them); (2) it would not be to the point (this isn't about his shame or loss of standing in society); and (3) even if some people thought it were all about him being shamed, Bush makes it a paramount principle of his administration that no one ever admits to shame about America."

Now, Geoff might be right that W really meant to say that he was "disgusted" or "dismayed", had in mind that the events are "a disgrace", and came out with "disgraced." But it ain't necessarily so. I'm perfectly prepared to state, speaking for myself, that I've been disgraced by the Abu Ghraib atrocities, simply by virtue of being a citizen of the country that ran the prison, and not because I feel any direct or indirect responsibility.

And this is not just a quirk of my experience with (the word) "disgrace". The OED's sense 2b for "disgrace" (the verb) is:

2b. To put out of countenance, abash, dismay.

1607 TOPSELL Four-f. Beasts (1658) 160 Casting..burning torches into the face of the elephant; by which the huge beast is not a little disgraced and terrified.

and its sense 7 is:

7. To bring (as an incidental consequence) shame, dishonour, or discredit upon; to be a disgrace or shame to; to reflect dishonour upon.

1752 JOHNSON Rambler No. 196 7 Of his children..some may disgrace him by their follies.

Perhaps swayed by the received idea that W is prone to misspeaking, Geoff has been too quick to take his own perceptions -- which in general are remarkably acute -- as a picture of English usage in general. I agree with Geoff that for the commander-in-chief to say that he has been disgraced does raise the option of a different sense of "disgrace", the one appropriate for someone who is responsible for a disgraceful action -- who is "in disgrace" rather than simply being shamed or dishonoured -- or "disgraced" -- by the actions of others. If I were in W's place, I'd have taken that possibility into consideration, and I would have tried to make it very clear just what I meant. But it's possible that what he meant was pretty much just what he said, or at least what I would have meant by saying the same thing.

To establish that W's sense of "disgraced" -- and mine -- are not just archaic 17th- and 18th-century usages somehow preserved in rural America, I'll cite, as one last piece of evidence, (Philadelphia native) Joan Jett's 1995 song Last to Know:

There was a time I'd given you a piece of my soul
Now everything's changed and it's gettin' so outta control
Cuz I caught you one time then I caught you two times
You'd better walk away cuz now the truth's been told

You disgraced me how can you face me
I was the last to know
For all I tried to be
You still lied to me
I was the last to know

I been rackin' my brain and the truth is plain to see
You got no shame about what you've done to me
I believed you one time there wouldn't be another time
Get on your way cuz nobody rides for free

I thought you'd save me
But you betrayed me
I was the last to know
It was a sneak attack
You stabbed me in the back
I was the last to know

Sometimes I wake up in the middle of the night in a cold sweat
The sheets are wet all these crazy thoughts going
Round and round in my head thinkin' 'bout the things we said
Some things we done somethin' happened somethin' changed
I'm not sure what I'm not sure who to blame

I got somethin' to say an' I hope it's getting through
Cuz I know what I gotta do what I never ever wanted to do
I warned you one time but now you crossed the line
It's too late now cuz I'm done with you

You disgraced me how can you face me
I was the last to know
For all I tried to be
You still lied to me
I was the last to know

Posted by Mark Liberman at 09:20 PM

Envoi for the omission of lexical categories

Continuing the lexical-category-omission thread, Paul Goyette at locussolus has contributed an excellent verb-free post, so cleverly done that you might not even notice.

Meanwhile, over at Livejournal, the Grand High Supreme and Mighty Empress Connie has observed that the folks "going out of their way to write without various parts of speech, such as nouns, verbs, and prepositions" are "out of their friggin' minds, but it's kinda cool, in that 'I'm so scared right now' sort of way."

The imperial comment is an appropriate epitaph for the whole exercise, which I think has now largely run its course, unless someone starts writing articles omitting two categories at once. Two significant categories, of course -- an article lacking interjections and pronouns, for example, would not pack the punch of one without any nouns or verbs.

The ultimate point in this progression would be a completely blank post, in the spirit of John Cage's 4'33". Perhaps someone has already copyrighted a blank document, as Cage has effectively copyrighted silence. If not, the royalties could be tremendous...

And well deserved, too. As Walt Whitman said before his tragic accident in Walt Disney Land:

I swear I see what is better than to tell the best;
It is always to leave the best untold.

Posted by Mark Liberman at 07:49 PM

Obligatorily Split Infinitives

(1)
J... says that he expects the staff size to more than double within two years. (from Judith Lee; gossiping dinner guest, 4/9/04)

At first this looks like a perfectly ordinary example of "splitting an infinitive", with the modifier more than intervening between the infinitive marker to and the verb double. Thoughtful writers on usage -- see the summary in the Merriam-Webster's Dictionary of English Usage entry for "split infinitive" -- have long admitted that such examples are at least sometimes acceptable, even in formal writing, though some permit them only when absolutely necessary, as if they were a vice that's regrettably unavoidable on occasion. But (1) doesn't work like standard examples of split infinitives; it's obligatorily split.

Your usual split infinitives alternate with two other structures, a "preposed" structure with the modifier before the to, and a "postposed" structure with the modifier later in the sentence, usually at the end of the infinitival VP:

(2)
a. Intervening: I expect to soon see the results.
b. Preposed: I expect soon to see the results.
c. Postposed: I expect to see the results soon.

For example (1), however, only the intervening modifier is acceptable:

(3)
a. Intervening: We expect it to more than double.
b. Preposed: *We expect it more than to double.
c. Postposed: *We expect it to double more than.

But we don't have to rely on my personal judgments; we can look at corpora. A Google web search on 4/21/04 yielded about 53,600 hits for "to more than double", most of them of the right sort:

(4)
a. ...BenQ plans to more than double its channel this year, its president says...
b. .... GPRC hopes to more than double student housing by September 2005...
c. ... The broadband market is set to explode, with access expected to more than double to 46 million-plus households in the United States by 2008, up...

Using a crude estimate that 20% of the hits will be repeats or irrelevancies, there are still about 40,000 citations.

Contrast this with a search on "more than to double": about 32 hits, only 20 with repeats removed, and under 10 actually relevant examples, among them:

(5)
a. ... These products have allowed the company's turnover more than to double: from £66.3m in the year to June 30 1998 to £150m in the 2001 -02 financial year.
b. ... It is rubbing salt into the wound more than to double the charges at such short notice. I appeal to the Minister and to his right hon. and hon. ...
c. ... If the cost of living has more than doubled, it would be necessary more than to double the pension in order to keep the recipients on the same poor scale of ...

I conclude that there are some people who allow preposing of more than, but not many of them, and I suspect that these are people who are conscientiously avoiding the horror of split infinitives, according to a "rule" they learned in school. After all, some textbooks and teachers prohibit split infinitives in any circumstances whatsoever, so that it's no surprise that some of their students would opt for the preposed variant, at least in extremely formal contexts like the administrative writing and parliamentary debates illustrated in (5). I claim, however, that almost all speakers and writers of English show the pattern in (3), not the one in (2).
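The comparison behind these counts can be sketched in a few lines of Python: search each snippet for the intervening string and for the preposed string, and tally. This is only an illustration on a handful of snippets adapted from (4) and (5), not a rerun of the web searches; the `classify` function and its labels are hypothetical names of my own.

```python
import re

# A few snippets adapted from examples (4) and (5) above.
snippets = [
    "BenQ plans to more than double its channel this year",
    "GPRC hopes to more than double student housing by September 2005",
    "access expected to more than double to 46 million-plus households",
    "allowed the company's turnover more than to double",
    "it would be necessary more than to double the pension",
]

# Intervening order: "to more than double".
INTERVENING = re.compile(r"\bto more than double\b", re.IGNORECASE)
# Preposed order: "more than to double".
PREPOSED = re.compile(r"\bmore than to double\b", re.IGNORECASE)


def classify(text):
    """Label a snippet by where 'more than' sits relative to 'to'."""
    if INTERVENING.search(text):
        return "intervening"
    if PREPOSED.search(text):
        return "preposed"
    return "other"


tally = {}
for snippet in snippets:
    label = classify(snippet)
    tally[label] = tally.get(label, 0) + 1

print(tally)
```

A real count would still need the hand-filtering described above, since string matching alone cannot separate relevant hits from repeats and irrelevancies.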

Parallel to the modifier more than are the modifiers up to and over. First, examples with finite verbs:

(6)
a. ... In one example, we found this up to doubled the speed of a small function.
b. ... dry soils. Human influences like over-cultivation and soil erosion may have up to doubled the flux of mineral dust.

(7)
a. ... Legal fees Fees depend on the lawyers. Mine charged me 75,000B +15% of the take - we settled for 180,000B, so I just over doubled my money.
b. ... For our regular readers - who have over doubled in numbers in the last year - the new reading room offers a significant improvement in working conditions...

As for infinitival verbs, here there are relatively small numbers of examples with intervening modifiers for to double -- on the order of 50 each for up to (8) and over (9) -- but none at all with preposed modifiers (up to to double and over to double), on the web or in newsgroups.

(8)
a. ... Indeed, fuel cells have the potential to up to double the efficiency of cars and power generators while significantly reducing air pollution.
b. ... We are also (again!) trying to get a bill passed that would allow us, on a county by county basis, to up to double the amount we get from each fee.
c. ... Marketscore is a free download accelerator which purports to up to Double your Internet Speed ...

(9)
a. ... Over the years, two to be exact, AMD has managed to over double the speed of the Athlon core from 650MHz to 1400MHz!
b. ... She had the skills to over double her membership at Stevenage, and prove that one-to-one communication with potential members is key.
c. ... While a mansion may not be an option there is full planning permission to over double the size of this 3-bedroom bungalow, which has a walkway leading ...

Once again, obligatorily split infinitives.

This is no accident; more than, up to, and over are in fact prepositional modifiers (Cambridge Grammar of the English Language, pp. 357, 432f.), and like prepositions in general, must be immediately adjacent to the expression they combine with:

(10)
a. Intervening: *We looked at especially Sandy.
b. Preposed: We looked especially at Sandy.
c. Postposed: We looked at Sandy especially.

Although I haven't done corpus explorations of the rest of the prepositional modifiers as used for degree modification, my judgments are that, so long as the semantics is coherent, they can be so used, as in the following invented examples:

(11) I expect our profits
... to just about double next year.
... to around double next year.
... to between double and triple within a year.
... to from double to triple within a year.

And they are restricted to intervening position, just like the others:

(12) I expect our profits
... *just about to double next year.
... *around to double next year.
... *between to double and triple within a year.
... *from to double to (to) triple within a year.

The bottom line is that nothing requires infinitival to to be immediately adjacent to the head V of the VP it combines with -- intervening modifiers are fine, so long as they belong to this VP -- but prepositions, whether they are heads or modifiers, must be immediately adjacent to the XP they combine with. And so: obligatorily split infinitives.
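The distinction between the two positions can be sketched mechanically. What follows is only a rough illustration, not the search procedure actually used for the counts above: the modifier list is drawn from the examples in this post, and the function names `classify` and `tally` are invented for the sketch. It sorts a text snippet by whether the modifier intervenes between to and the verb ("to over double") or is preposed ("more than to double").

```python
import re
from collections import Counter

# Assumed modifier list, taken from the examples in the post (not exhaustive).
MODIFIERS = ["more than", "up to", "over", "just about", "around"]


def classify(text, verb="double"):
    """Label a snippet 'intervening' (to MOD verb), 'preposed' (MOD to verb),
    or None if neither pattern occurs. Finite forms such as 'doubled' are
    ignored because of the trailing word boundary."""
    mods = "|".join(re.escape(m) for m in MODIFIERS)
    intervening = re.compile(rf"\bto\s+(?:{mods})\s+{verb}\b", re.IGNORECASE)
    preposed = re.compile(rf"\b(?:{mods})\s+to\s+{verb}\b", re.IGNORECASE)
    # Check the intervening pattern first so that "to up to double"
    # is not misread as involving a preposed modifier.
    if intervening.search(text):
        return "intervening"
    if preposed.search(text):
        return "preposed"
    return None


def tally(snippets, verb="double"):
    """Count how many snippets fall into each position."""
    return Counter(classify(s, verb) for s in snippets)


if __name__ == "__main__":
    examples = [
        "AMD has managed to over double the speed of the Athlon core",
        "fuel cells have the potential to up to double the efficiency of cars",
        "it would be necessary more than to double the pension",
        "so I just over doubled my money",
    ]
    for s in examples:
        print(classify(s), "|", s)
```

A string match of this kind cannot tell a degree modifier from an accidental sequence of the same words, so any hits would still have to be checked by hand.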

Posted by Arnold Zwicky at 03:39 PM

Serious just for a verbless moment

If I may get serious for just a moment about this discussion concerning writing without certain specified categories or characters, Matt Weiner at Opiniatrety says we are "philistines" here at Language Log because we think it "a priori ridiculous" (so he says) to engage in experiments like those of the pseudonymous `Michel Thaler', who tried writing a novel, Le Train de Nulle Part, with no verbs. Says Weiner:

Writing under apparently silly constraints can lead to new unthought-of fascinating patterns, as in Perec and other Oulipo writers. People should ignore ridiculous anti-verb polemics, but keep open minds about books they haven't read.

Matt is quite right.

I actually think Italo Calvino's experiment with a novel that has each chapter written in a different style (If On A Winter's Night A Traveller) is fascinating.

It is true that in my 350-word post with no verbs in it, just to have something to say, I chose to lambast Thaler as a posturing fool ("nuts, bonkers, round the bend. Mad as a March hare..."). However, I was completely insincere. I have not even laid eyes on Le Train de Nulle Part. It might be fascinating. For I can tell you this: I myself found it extraordinarily challenging to write without verbs. I couldn't believe how hard it was. Moreover, as I tried it I kept finding myself driven into a much more florid, literary, poetic style than normally comes out of me (not all of what I wrote survived in the posted draft). I have no idea what might emerge if I tried a serious full-length literary work in that mode, but just trying to do 350 words was highly instructive.

So I agree with Matt: when Thaler says in interviews that verbs are "invaders, dictators, and usurpers of our literature ... like a weed in a field of flowers ... You have to get rid of it to allow the flowers to grow and flourish", I suspect he is the one who is not being sincere: such remarks really are absurd. But his novel may not be. I have mocked it sight unseen, but that doesn't mean I have really made up my mind in advance about it. Sometimes I write over-the-top polemics or fantasies just for a giggle.

Posted by Geoffrey K. Pullum at 01:55 PM

Ben Franklin and German

In spite of the concern about being overwhelmed by German-speaking hordes reported by Mark, as a young man in 1732 Benjamin Franklin was the publisher of the first German-language newspaper in North America, the Philadelphische Zeitung. It lasted only briefly, perhaps in part because Franklin did not have German type. He used his usual Caslon Antiqua, which must have seemed as odd to the German readers of the day as English printed in Fraktur type would seem to us.

Posted by Bill Poser at 11:20 AM

Palatine Boors and their Maryland descendents

In connection with the fuss over the anti-multi-culti remarks by Maryland Governor Ehrlich and Comptroller Schaefer, an Op-Ed piece in today's Baltimore Sun quotes Benjamin Franklin complaining about the influx of Ehrlich and Schaefer's ancestors:

"Why should Pennsylvania ... become a Colony of aliens, who will shortly be so numerous as to Germanize us, instead of our Anglifying them, and will never adopt our language or customs, any more than they can acquire our complexion?"

-- Benjamin Franklin, Observations Concerning the Increase of Mankind, Peopling of Countries, etc., 1751

The context of this quotation (assuming the various copies of it on the internet are accurate) is more shockingly chauvinistic, to modern ears, than Ehrlich and Schaefer's unpleasantries or even Samuel Huntington's recent screeds:

And since Detachments of English from Britain sent to America, will have their Places at Home so soon supply'd and increase so largely here; why should the Palatine Boors be suffered to swarm into our Settlements, and by herding together establish their Language and Manners to the Exclusion of ours? Why should Pennsylvania, founded by the English, become a Colony of Aliens, who will shortly be so numerous as to Germanize us instead of our Anglifying them, and will never adopt our Language or Customs, any more than they can acquire our Complexion?

"Swarm"? "Herding"?

In fact, it was not until 1923, some 172 years after Franklin's complaints, that the U.S. Supreme Court determined that "mere knowledge of the German language cannot reasonably be regarded as harmful." And now, 253 years later, the collateral descendents of the Palatine boors in question (i.e. peasant farmers from south-central Germany) have apparently lost track of their history.

Posted by Mark Liberman at 10:46 AM

Have you been disgraced?

"Like you, I have been disgraced from what I've seen on TV, what took place in the prison," President Bush said yesterday during a campaign visit to West Virginia where he was discussing education. Sources using AP generally reported "disgraced from"; some others had "disgraced about"; a few reported "disgraced by". But never mind the preposition. What no news sources have pointed out is that the President's ungainly piece of unscripted commentary about Abu Ghraib was surely uttered by mistake. In groping through his mental lexicon for that past participle he hit the wrong verb.

I, at least, am convinced that disgraced couldn't have been the word he was after. There are at least three reasons: (1) it makes no sense, since Bush is not personally and directly in disgrace over this (no one says he helped wire up prisoners or unleash dogs on them); (2) it would not be to the point (this isn't about his shame or loss of standing in society); and (3) even if some people thought it were all about him being shamed, Bush makes it a paramount principle of his administration that no one ever admits to shame about America (I doubt that any president would allow talk of America or the presidency having been disgraced, certainly not in an election year).

No, Bush fumbled his words once again. Stumbling around between saying "I was disgusted" (or perhaps "dismayed" or "disturbed") and "It was a disgrace" (or "It was disgraceful"), he tripped over the dis- words, picked up the wrong one, and blurted out "I was disgraced." One NPR reporter (Don Gonyea, May 14, 2004) talked about this being the strongest condemnation yet. It wasn't. Perhaps one should say that the President of the United States ought to feel indirectly disgraced by the revelations about what happened in Abu Ghraib on his watch; perhaps we Americans all should. But I am absolutely certain that the President himself did not mean any such thing. Being disgusted (or dismayed or disturbed) is quite different from being disgraced by an event; to be disgraced is to have an event reflect badly on you personally -- it has to bring shame down upon you.

Posted by Geoffrey K. Pullum at 10:39 AM

Language Rights in Maryland?

Today's Washington Post reports that Maryland's Governor Robert L. Ehrlich, Jr., has called the concept of multiculturalism "crap" ("In Md., Multiple Views of Multiculturalism", by Darragh Johnson and Matthew Mosk). This includes the use of languages other than English:

"The goal here is assimilation, the goal here is to strengthen the melting pot that is American, not to separate ourselves out," Ehrlich (R) said in an impromptu call to WTOP Radio yesterday morning...

Ehrlich's comments came last week in defense of state Comptroller William Donald Schaefer (D), who had announced he would no longer eat at McDonald's because of an encounter with a Spanish-speaking cashier. On a WBAL-AM radio show, Ehrlich said: "I reject the idea of multiculturalism...With respect to this culture, English is the language."

Leaders of various ethnic groups responded in the typical way -- typical and obvious, except that the point is apparently too abstruse for the governor and co. to grasp it:

"It's very, very important that you learn the language," said Angelo Solera, a Latino activist in Baltimore. "...But people have to understand that it takes time to learn English."

Duh. Why some politicians fail to grasp this fact, and why they also fail to understand that their goal can be achieved only if they spend some money financing English as a Second Language classes for legal immigrants, is a continuing mystery. (Well, it's a mystery if one assumes that they're sincere and not just posturing for political purposes.) But Ehrlich is unfortunately right in line with the other English Only folks around the country.

The article also emphasizes Ehrlich's own ethnic German background, which he is proud of -- and it points out that "the Baltimore school system was bilingual for many years, with classes taught in both German and English...Before World War I, there was a large contingent of German immigrants in Baltimore and, as a result, an extensive German subculture." Moreover, as a Johns Hopkins political science professor observes, "[I]t took generations for the assimilation the governor talks about to occur."

Posted by Sally Thomason at 07:42 AM

Not bad

According to The Gematriculator, Language Log is 36% evil, 64% good.

Geoff Pullum may be happy to know that "The Gematriculator uses Finnish alphabet, in which Y is a vowel", though I've learned to be cautious in predicting Geoff's moods.

[via Blinger]

Posted by Mark Liberman at 07:14 AM

Intellectuals in Britain and France

In response to recent posts on elite anti-intellectualism and French literature, Dave Long emailed:

A footnote from a mathematics book which would not have been mistitled as "The Joy of x"*:

In Anglo-Saxon countries, the word 'intellectual' ... is, in fact, an insult, even among intellectuals. In British society, clever men waste a great deal of time pretending to be stupid. (In France the situation is reversed.)

* T.W. Körner, The Pleasures of Counting.

Here is a link to the very page (293) via amazon's helpful "search inside this book" feature, which may work for you (or not, I don't know if such links are durable... )

Tom Körner's home page is here.

And I hope Körner meant to imply that on both sides of the channel, clever women are too intelligent to waste their time in any sort of pretense :-).

Posted by Mark Liberman at 05:58 AM

Zut alors!

Cette bêtise française me fait mal à la tête. Bntt l'n crr sns vylls. Ousansespaces. Ou a ooe.

Posted by Bill Poser at 12:39 AM

May 13, 2004

Lacking articles, conjunctions, ...

As available categories dwindle, Matt Weiner has posted without articles.

Posted by Mark Liberman at 10:42 PM

Writing without adjectives

With regard to whether one could write English without ever using an adjective (since we've seen examples of writing with no verbs and with no nouns and with no prepositions), we are of course told by some that writing of quality never uses adjectives at all: "sages of writing" are alleged to agree that adjectives are to be avoided completely by writers of taste and discernment (I discussed this topic here). The sages who do say such things, if there are such sages, are nutballs. They should be drummed out of the grammar service, stripped of their sagehood. It is an absurdity to suggest that writing fails in some way when adjectives are permitted.

On the issue of identifying adjectives, however, I need to offer a word or two of guidance.

Here the analysis of The Cambridge Grammar facilitates things a little. There is a tradition in the grammar scholarship of the period 1500-2000 of equating the notion "adjective" with the function of qualifying the meaning of a noun. Take, for instance, the phrase I just used in the sentence that precedes this one: "grammar scholarship". Since "grammar" qualifies the meaning of "scholarship" there, as the tradition has it, "grammar" must be an adjective. The Cambridge Grammar rejects that tradition. It claims that grammar is a noun. Always a noun. But nouns can be used as modifiers of nouns. That is, they can be used attributively. It doesn't make them adjectives, any more than using an axe to hammer in a nail makes the axe a hammer.

Likewise, words such as "this" are not adjectives. Certainly, in "this book", the word "this" occurs before a noun and contributes something (perhaps a qualification) to the meaning of the noun phrase. But "this" is a determinative, not an adjective, and it functions as a determiner in cases of the sort in question, it doesn't function attributively. Likewise with some: it's a determinative. Pronouns such as my are not adjectives either; they are pronouns, genitively inflected, functioning as determiners.

Notice also that there are (at least in styles of U.S. English that bear the hallmarks of informality) adverbs derived from adjective bases by zero derivation. Examples include both the last two words in They beat him up real good (see Arnold Zwicky's post here). They're both adverbs. And this post isn't trying to avoid adverbs, it's trying to avoid adjectives. Looks like it managed it, too -- provided we take "like" to be a preposition, not an adjective taking a noun phrase complement.

[Thanks to Daniel Currie Hall for helping me to get the errors out of my drafts.]

Posted by Geoffrey K. Pullum at 03:34 PM

Writing prepositionlessly

Language Log fans will no doubt want to be sure, having seen a verbless post and a nounless one, that any other lexical category could likewise be eliminated, and intelligible prose still be written. This post constitutes one such exercise that is not too difficult: it absolutely and completely lacks prepositions. It is not too appallingly difficult to write this way (I confess that early drafts slipped many times, but I fixed them). However, it's a bit more difficult when we adopt the modern conception that The Cambridge Grammar advances. This conception entails that far more words get classified "preposition".

Specifically, there are many words that older traditional views take to be adverbs which, the way The Cambridge Grammar analyzes them (following the great Danish grammarian Otto Jespersen), are prepositions that don't require object noun phrases. (I can't list them; they are prepositions, it would ruin everything.) There are also words that traditionally get called "subordinating conjunctions", and The Cambridge Grammar assigns them the categorization prepositions that require a complement clause. I haven't used any words that either category embraces. I haven't been able to use any full comparative constructions, either, because there are two prepositional items that such constructions frequently demand (I'm not permitted to name them; can you?).

Don't worry, however, that the element "to" that the infinitival construction employs ("to be or not to be"), an item that I have used several times, might be impermissible. Its historical origin may be prepositional, but it has no remaining prepositional properties. It is unique: a special marker infinitivals need. No other English word can replace it; and the converse also holds.

Posted by Geoffrey K. Pullum at 02:48 PM

How extremely rum: elite anti-intellectualism and disfluency

In support of the notion that anti-intellectualism has long been stereotypically associated with the British upper classes, I can offer this famous anecdote:

Edward Gibbon once presented the Duke of Gloucester (brother of King George III) with a copy of the first volume of his Decline and Fall of the Roman Empire.

When the second volume appeared in 1786, Gibbon again arrived to offer a personal copy. The duke's reply? "Another damned, thick, square book! Always scribble, scribble, scribble! Eh, Mr. Gibbon!"

And as a suggestion that acquired disfluency has also had a place in that strange culture, consider this passage from Hilaire Belloc's 1910 Dedicatory Ode:

The Freshman ambles down the High,
In love with everything he sees,
He notes the racing autumn sky,
He sniffs a lively autumn breeze.

"Can this be Oxford? This the place?"
(He cries) "of which my father said
The tutoring was a damned disgrace,
The creed a mummery, stuffed and dead?

"Can it be here that Uncle Paul
Was driven by excessive gloom
To drink and debt, and last of all,
To smoking opium in his room?

"Is it from here the people come,
Who talk so loud, and roll their eyes,
And stammer? How extremely rum!
How curious! What a great surprise!"

I have little evidence that this cultural complex includes mispronouncing foreign words or culpable regularization of ethnonyms, but there may be some connection all the same.

In any case, I don't mean to suggest that there is no such thing as populist anti-intellectualism. But the kind of intellectual capital that we're talking about -- the ability to pronounce foreign words, to deal with the more obscure corners of irregular morphology, or to compose in a highly-ritualized formal style -- is not something that everyone is likely to value equally, if at all. In a modern society, such knowledge logically ought to be valued most by the upwardly mobile middle class, who can use it to get ahead. It ought to be resented by those stuck at the bottom, who can't use it and aren't going to get it anyway; and it ought to be scorned by those perched at the top, who have no need to strive, and who can hire whatever expertise is appropriate. In the absence of special cultural respect for learning, due to other mechanisms entirely, this seems to be just about the way that things work out.

Posted by Mark Liberman at 07:25 AM

To post verblessly is so jejune!

While having earlier always been totally disinclined to attempt to write nounlessly, to denominalize (hmm... denoun, perhaps?) is now starting to seem potentially enjoyable and might turn out to be exhilarating, though admittedly rather elliptical. Go figure!

To write completely nounlessly is surprisingly difficult. Is to write verblessly as tough? Frankly, to blog verblessly is to imitate, to Thalerize, and thus, ultimately, to be ... passé. Without meaning to seem patronizing, or to insult, to deverb is not only nerdy, but also relatively easy, whereas to denominalize is unquestionably much harder. Indeed, to denominalize and yet be clearly comprehended is even trickier.

Furthermore, to scribble verblessly is to fail to stand, fail to walk, fail to run: shortly, to fail. (Getting too excited. Going overboard. Must try to relax. Breathe deeply. Sit crosslegged and meditate awhile.) But ahhh, to write nounlessly is to live anew, not to be tied to thinking concretely, not to be anchored, not to be grounded, but rather to lift off and fly, as if previously to write was just to crawl, penned in, hemmed in, restricted. To write now seems to be no less than to innovate, to create, and, verily to communicate as never before. Indeed, to denominalize, though unexpectedly demanding, could be even more enjoyable than to de-adjectivize, to de-prepositionalize or to de-adverbalize would be, and could entertain more than briefly. Think not? Well think again. Try to write nounlessly and see!

Posted by David Beaver at 03:44 AM

May 12, 2004

A verbless post

A verbless novel? Why?? What reason for the accomplishment by this showy fool in France, Michel Thaler, his effort at an entire novel with no verbs (perhaps not a wise or lucrative publication venture, given the not total incorrectness of my speculations) recently evident amongst the vast efflux of absurd literary pretense in the French language? Well, whatever his reasons, in response, my own contribution: a verbless post (the first on Language Log).

No verbs at all in this book of Thaler's, just nouns, pronouns, adjectives, adverbs, prepositions, subordinators, coordinators, and -- oh! -- interjections. All those among the permissible (and for him, past participles too, though no participial intrusions in this post, such the extreme character of my cruel and unreasonable self-applicable strictures), but never one single solitary verb. And, fantastically, all this a vision of some liberation for authors, not an absurd literary straitjacket for the writer's (albeit willingly) incarceration. Some freedom, this.

Thaler: nuts, bonkers, round the bend. Mad as a March hare. The Liberman conjecture (about survival of high school literary experimentation into adulthood because of a dysfunctional authoritarian French educational system): probably true. My attitude: contempt, really. Except... Unless...

Just possibly, an exercise, for the undergraduates in my course on English grammar this fall quarter. Their mission: an effort at construction of fifty words of coherent prose with never a verb; the point: only those in possession of enough grammatical knowledge for verb identification capable of success. Worth a try? Perhaps. And in that case, a word of gratitude to Thaler (otherwise an unimportant screwball). Always that extra possibility: the idea justifiable not because of its implementation, but in virtue of a complementary or counterposed idea emergent in the mind of someone else -- serendipitous bastard offspring of a deranged cognitive parent. So my gratitude to this pusillanimous poseur, this literary clown. A new idea, my idea, all mine (accessible here on Language Log to just a few thousand close friends).

[One other thought, for computational linguists: What price the performance of context-dependent part-of-speech-tagger algorithms on prose such as this?]

Posted by Geoffrey K. Pullum at 05:06 PM

High plains construction grammar

Marc Moffett at the University of Wyoming has just started a weblog entitled Close Range (philosophy@7200'), devoted to "philosophy, fletchings, and flies". His inaugural post yesterday afternoon was entitled Blogging, with altitude, and dealt with Laramie's historical and geographical suitability for philosophy, which seems to involve a metaphorical interpretation of altitude and an associative connection with outlaws and saloons.

Marc's second post, today at noon, advances a theory about Escher sentences, namely that David Beaver is headed in the right direction about what they mean, or at least, what they would mean if one thought they meant anything at all, which Marc believes they do; but that Geoff Pullum is wrong to claim that such sentences are semantically incoherent, because "our interpretive capacities take into account holistic informational characteristics of linguistic constructions and don't simply generate meanings by way of 'bottom up' recursion principles."

Being a mere phonetician, I have no dog in this fight, but I'm happy to see Marc joining the ranks of language-oriented weblogs. Welcome!

Posted by Mark Liberman at 04:32 PM

The verbless of the earth

In recent news from France, an exciting new theory has emerged about what's wrong with the world and how to fix it. Someone writing under the name of 'Michel Thaler' has published a novel "Le Train de Nulle Part" ("The Nowhere Train") composed entirely without verbs, dedicated "à tous les partisans de la décolonisation de l'écrit et de la mise à mort ... du verbe" ("to all the partisans of decolonization of writing and of putting the verb to death").

Thaler describes verbs as "invaders, dictators, and usurpers of our literature", adding "the verb is like a weed in a field of flowers ... You have to get rid of it to allow the flowers to grow and flourish." He has banned infinitives as well as tensed verbs entirely from his writing, but he does exempt past participles from his linguistic Nuremberg Laws.

Thaler is also quoted (in a verbfully hypocritical passage) as saying

"I am like a car driver who has smashed the windscreen so he cannot see into the future, smashed the rear-view mirror so he cannot see the past, and is travelling in the present."

I've occasionally encountered drivers like that, but none who has also written a novel. Confusingly, Thaler's road rage is applied not to other drivers, nor even, in the novel, to the guilty imperialistic verbs, but instead to the many passengers on an imaginary train, whom he attacks individually, at length, and in vitriolic (though verbless) detail.

Since Thaler's portraits of women make use of sexist stereotypes ("...those women there, probably mothers, bearers of ideas far too voluminous for their brains of modest capacity"), there is apparently some controversy in France about whether the novel is misogynistic. A spokesperson for his publisher has defended him on the grounds that he is "a very charming, courteous man who loves women", and that he "attacks both sexes". Neither defense seems relevant -- many misogynists of my acquaintance have good manners and are quite fond of some parts of women, and the fact that someone (not Thaler, as far as I know) hypothetically also attacks Jews and Asians would not absolve him of racism for attacking Africans in terms of offensive social stereotypes.

Not having read the novel (a state in which I plan to remain), I have two reactions. First, Thaler's attempt to become the Frantz Fanon of the anti-verb international seems only slightly nuttier than the ideas of the many previous theorists who have inveighed against adjectives and adverbs. Not being French, the anti-adjective militants haven't actually practiced what they've preached, but then Thaler himself seems to use a normal number of verbs in discussing (as opposed to composing) his novel. And second, I'm glad to see a French theoretician who blames the world's problems on verbs rather than on Americans. Next: the Protocols of the Elders of Conjunction?

[via Maud Newton and Language Hat]

[Update: Mark S, in a comment on Language Hat's site, points out that Thaler was scooped in 2001 by Miranda Tedholm, a 17-year-old New Jersey high school student. A female New Jersey high school student, in fact, whose brain capacity was enough larger than Thaler's for her to have the idea first, and then to give it up after seven paragraphs and a Scholastic "Art and Writing" award in the category of humor.

Seriously, I wonder whether some of the differences between American and French intellectual life can be explained by the fact that we Americans have the opportunity to get this sort of thing out of our systems in high school and college, while the French, with their more formal and rigid educational system, do not. ]

Posted by Mark Liberman at 06:54 AM

May 11, 2004

Status and fluency

When Geoff Pullum describes U.S. Senators stumbling over General Taguba's name during the Armed Services Committee hearings today, it reminds me of a long-ago colloquium where I heard about how upwardly-mobile men among the Wolof nobility cultivate inarticulateness as a sign of status. They make morphological errors -- for example simplifying the Wolof system of noun-class indicators by moving nouns into the default category, as a child or a beginning adult learner might do -- and they may even develop a speech impediment. If I remember right, men who rise in traditional Wolof society show these changes over the period of their life from youth to middle age, while less successful members of their cohort stay as glib and morphologically correct as ever.

Though I don't have access to a copy at the moment, I believe that this pattern is discussed in Judith Irvine, "Wolof Noun Classification: The Social Setting of Divergent Change". Language in Society, 7: 37-64 (1978).

One way to understand this development is suggested by the description of social stratification among the Senegalese Wolof from the CSAC Ethnographic Atlas:

Wolof society is characterized by a relatively rigid, complex system of social stratification. This system consists of a series of hierarchically ranked strata in which membership is ascribed by patri-filiation. Although these strata are usually called "castes" (and less commonly, "social classes") in the literature, here they will be referred to as status groups. The status groups are organized into three major hierarchical levels. The first of these is an upper or dominant level called geer, which in pre-conquest times was divided into several status groups including the garmi or royal lineages, the dom-i-bur or nobility, and the jaambur or free-born commoners, the majority of whom were small-scale cultivators called baadolo; these distinctions may still be alluded to on special occasions, but essentially the different strata have fused into a single status group which retains the label geer. Second is a lower or artisan level called nyenyoo, consisting of several occupationally-defined status groups. These groups include the metalsmiths (teug), the leatherworkers (wude), the weavers (rab), and the griots (gewel), who are the lineage genealogists, musicians, and general carriers of gossip. The lowest level is composed of the descendants of slaves (jaam), who are still called by that term. The jaam are differentiated into subgroups which are named and ranked according to the status of their former masters.

As I recall, the griots also serve as spokesmen for important members of the high-status group. So one of the symbols of high status is hiring someone to speak on your behalf; and skill in speaking comes to have low status, rather like skill in typing once had, back when it was something that only secretaries and journalists did.

However, I think that something a bit more general may be going on. After all, male members of the British aristocracy are also stereotypically disfluent, at least according to P.G. Wodehouse and Monty Python.

Those of us who are professionally wordy -- griots of our own society -- should ask ourselves whether Lord Emsworth and the current members of the U.S. Senate, like the traditional Wolof upper crust, may be on to something. Perhaps what Lao Tse said is true (Tao Te Ching 45):

The straightest, yet it seems
To deviate, to bend;
The highest skill and yet
It looks like clumsiness.
The utmost eloquence,
It sounds like stammering.

As movement overcomes
The cold, and stillness, heat,
The Wise Man, pure and still,
Will rectify the world.

[Note: the "facts" presented in this post should be taken with a grain of salt, as they are based on my memory from a long time ago in an area that isn't a speciality of mine. And as the CSAC Ethnographic Atlas says:

The Wolof manifest a broad range of cultural variation and also share many cultural features with neighboring peoples such as the Lebu, Serer, and Tukulor. As Gamble (1957: vii) has clearly pointed out: "The variability in Wolof culture means that almost every statement made about them needs to be accompanied by a label as to time and place."

I'd recommend that you check out Irvine's 1978 article, and perhaps other sources, before trusting that my description of Wolof attitudes towards fluency and morphological accuracy is accurate.]

[Update 5/12/2004: Kerim Friedman suggests that the right Judith Irvine reference might be
Irvine, Judith T. "When Talk Isn't Cheap: Language And Political Economy." American Ethnologist 16 (1989): 248-67.
Here's a quote:

Among rural Wolof, skills in discourse management are essential to the role of the griot (bard), whose traditional profession involves special rhetorical and conversational duties such as persuasive speechmaking on a patron's behalf, making entertaining conversation, transmitting messages to the public, and performing the various genres of praise-singing. ... High-ranking political leaders do not engage in these griot-linked forms of discourse themselves; to do so would be incompatible with their "nobility" and qualifications for office. But their ability to recruit and pay a skillful, reputable griot to speak on their behalf is essential, both to hold high position and to gain access to it in the first place.

I didn't find anything in Irvine's 1989 article about acquired disfluency and morphological ineptness among Wolof nobles. I'll continue to surmise that this is discussed in the 1978 article, at least until I have a chance to get to the library and read it.]

Posted by Mark Liberman at 10:06 PM

Costs and Business Models in Scientific Research Publishing

The Wellcome Trust has recently released a study of "Costs and Business Models in Scientific Research Publishing", as a follow-up to its September 2003 study "An Economic Analysis of Scientific Research Publishing", and its January 2003 report "Sharing Data from Large-Scale Biological Research Projects".

The new report's conclusion is that "Open access publishing should be able to deliver high-quality, peer-reviewed research at a cost that is significantly less than the traditional model while bringing with it a number of additional benefits." The basic distinction here is between "the current 'subscriber-pays' model, where publishing services are free to authors and the article is published in a journal available via subscription, and an 'author-pays' model where the author (or their funder or institution) pays for the publishing services but where the final paper is published in an open access journal, available for free via the Internet to all who wish to use it."

A news article from the May 8 British Medical Journal includes this quote from Dr. Mark Walport, director of the Wellcome Trust:

"The results of scientific research must be freely and widely available to help scientists throughout the world make the discoveries we need to improve health. That is why we have supported the principle of open access publishing.

"However, up to now there have been unanswered questions about the economic and practical viability of this system. Our report now shows this is a win-win situation: high quality, peer reviewed research available to everyone free of charge within a sustainable online market—plus savings of as much as 30%."

In a related development, the Public Library of Science has announced an international open-access medical journal, "PLoS Medicine", to start publication next fall. According to this ProMED-mail post:

"Thanks to the Internet and new strategies for financing publication costs, it is now possible to share the results of medical research with anyone, anywhere, who could benefit from it. How could we not do it?" argued Dr. Harold E. Varmus, Nobel laureate, former National Institutes of Health Director, and one of the co-founders of the Public Library of Science.

I have a parochial interest in this problem, through my work on information extraction from biomedical text, where open access would certainly make research and development much easier.

Posted by Mark Liberman at 08:15 PM

Taggaboo, Abboo Grebb, whatever

Listening to the U.S. Senate's Armed Services Committee hearings on the Iraqi prisoner abuse scandal this morning, I heard several manglings of the name of General Taguba. His name is about as difficult to pronounce as "Toledo" or "tuxedo", but at least two Senators said "TAGG-uh-boo" or "TAGG-oo-boo" and the chairman said "Tah-KOO-bah" toward the end. And as for the name of the Abu Ghraib prison complex, I heard all of the following:

Abboo GAH-bee	Abbo GARB
Abboo GREBB	Abbo GRIBE
Abbo Gah-RAB	Abboo Gah-RIBE
Abboo Gu-REBB	Abbah GRAHB

I've said this before, but... It really does seem as if American political figures actually try to avoid being good at pronouncing foreign words. These are men and women who spend their lives speaking in public on important topics. They have highly educated staff members to do research for them. Why on earth couldn't they get a bit better at pronouncing simple place names and names of U.S. generals? My hypothesis: U.S. Senators know that being good at pronouncing Filipino personal names or Arabic place names wouldn't gain them votes, but would more likely lose votes. I just don't see what else could explain how randomly inexpert Senators are on these things compared to (for instance) BBC World Service newsreaders.

Posted by Geoffrey K. Pullum at 06:31 PM

Ritual verbal enthusiasm for food

The winespeak style has recently colonized many other categories of food and drink, for example bourbon ("the palate bears some leather, tobacco, vanilla and hints of caramel" ... "a deep vanilla nose with hints of dark berries and mint"), scotch ("with a powerful smoke, seaweed, iodine, and faint nutty notes to the nose"), beer ("Bready, strawberry sweet start with a silky smooth citrus hop finish" ... "Hints of pine and mustiness to the nose"), coffee ("Fresh aromas of moist herbs and alpine flowers lead to complex, dark notes in the mouth with a silky, subtle finish"), tea ("wonderfully creamy texture ... with notes of ripe fruit and green bamboo"), olive oil ("intensely grassy with an underripe pungency"), cheese ("balanced saltiness and notes of chalk, grapefruit and hay") and chocolate ("Distinct cedar and brandy tones becoming lightly tart with mild nutmeg spiciness at peak").

There's something circular about this semantic field: red wine has "notes of leather and tea", tea has "notes of honey and chocolate", chocolate has "complex tangy red wine and spice flavors ... [l]ight earthiness and tea tones in the ... aftertaste". It would be a violation of foodtalk norms for wine to taste like wine, though in fairness, chocolate seems often to be described as having "chocolate flavor". Though of course it is always modified by a string of adjectives like "direct" or "moderately intense" or "deep".

A minor genre of jokes is created by applying winetalk-style in unexpected areas like horseradish ("With a beguilingly titanium core, the aroma is pugilistic") or markers ("Robust, sweet, distinctly petrochemical with an undertone of benzene" ... "A light, fruity bouquet wrapped in aromas of vinyl and modeling glue"). It would be nice to see a description of politicians and other public figures in this style.

Some of the expressions used in genuine winetalk seem to be self-parody, but are serious. I mentioned in an earlier post that "rubber" and "gasoline" are not only good but even obligatory aspects of rieslings. "Chalk" is apparently also a Good Thing in rieslings, champagnes and chardonnays:

A very floral nose with notes of chalk dust.
You'll detect notes of chalk indicating a great terroir.
Golden apples, honey, plums, hazelnuts and lovely mineral notes of chalk.
...layers of biscuit-y, lemony flavors up front, developing deep complex yellow fruit flavors with notes of chalk
Reticent nose hints at chalk and lime.
Intensely flavored, juicy and very ripe, with Champagne-like notes of chalk, lemon, lime, minerals and toast.

Jokes aside, what these descriptions have in common is that they deal with luxury food products. The evocative language helps sell at a premium price to the public at large; sponsored tasting rituals are good marketing events; and the whole linguistic complex elevates consumption of expensive specialties into a symbol of cognitive capital as well as disposable income. I'm not entirely sure why other food products like beef and sweet corn have been exempt so far, but I suspect that the reasons are mainly economic and social rather than gustatory. There's no economic or social point in ritual enthusiasm about foods that can't be finely differentiated, branded and distributed in a plausibly reproducible way.

Most of this stuff also tastes good, of course, but when the descriptions don't come from sites that are selling the products that they describe, the talk is more balanced and also generally more entertaining: "Mild, sweaty and slightly poopy nose. Then some hints of cherry trickle in. Rather Burgundian so far. But then, when I taste it, what's this? Something here tastes like chocolate Necco wafers." I'm sure that there is a healthy demand for this sort of foodtalk, but the supply side seems to be much stronger, so that most examples are essentially advertisements, like this page of yak cheese tasting notes:

The aroma, at first encounter, has a mild animal scent, a clean spicy note which is reminiscent of both sheep and goat. The cut interior of the cheese has a much milder aroma, and this animal aroma is nearly absent from the flavor of the cheese.

At first taste, the cheese is disarmingly mild, with a clean, delicate milky flavor, which is totally different from sheep, cow, goat or mares' milk cheeses. After about 30 seconds on the palate, the taster becomes aware of a growing complex of herbal notes, with the flavor continuing to develop and building to a crescendo in about 120 seconds. The afternotes are a clean, pleasant, fading collection of milky, herbal and sharp-sweet.

There's something heart-warming about the idea of all that internet technology being used to allow us all to read someone's detailed description of the sensation of holding a piece of yak cheese in his mouth for three minutes or so.

Posted by Mark Liberman at 11:25 AM

The non-expert on editing

There's a weekly column at The Morning News called The Non-Expert: "Experts answer what they know. The Non-Expert answers anything." In the May 7, 2004 Non-Expert, Andrew Womack writes on the topic of English. Specifically, he "illustrates, exhibits, and displays how proper editing makes English all that more the understandable."

Posted by Mark Liberman at 08:22 AM

May 10, 2004

The Internet Pilgrim's guide to g-dropping

About a month ago, when Leon Wieseltier discussed his guest spot as "Stewart Silverman" on The Sopranos with Jeffrey Goldberg, he gave a vivid account of the sociolinguistic analysis that led him to choose and even emphasize a certain pronunciation of a certain word. However, his description of the pronunciation itself was completely mixed up, in an elementary and embarrassing way. I don't mean that it was embarrassing for him -- perhaps it should have been, but I don't know whether he cares one way or the other whether he screws up elementary linguistic terminology in public. I mean that it was embarrassing for the profession of linguistics, which has failed in its educational responsibility.

Here's the crucial passage:

Goldberg: ... Your enunciation of the word "motherfucking" was perfect. I smell Emmy.

Wieseltier: ... I am delighted that you recognize the sociolinguistic analysis that went into the enunciation of my searing expletive. These things are not as easy as they seem. Needless to say, when I first read my lines I discovered parts of myself I never knew existed. As I pondered the character of Stewart Silverman, I began to grasp the inner necessity of the hard "g" in my "motherfucking." Our Italian-American brothers and our African-American brothers might surrender the concluding letter of the exclamation, so as to establish some integrity on the street.

But Stewart Silverman lives in perfect horror of the street. He doesn't even park on the street. ...such a fellow is a long way from authenticity. And so he would land very hard on that "g". He didn't go to BU for nothing. This is a man who is this week boasting to anybody who will listen that he once flew into West Palm on the same plane as Peter Bacanovic. In sum: motherfuckinggg.

Now, "hard g" means the pronunciation of "g" in gum, while "soft g" means the pronunciation of "g" in gem. And neither of the options for pronouncing the end of motherfucking has anything to do with either hard g's or soft g's.

Leon Wieseltier is the Literary Editor of The New Republic magazine. He's not just an acute observer of social relations, he's also highly educated and well read. He knows the word sociolinguistic, for instance, and he's not afraid to use it. But as I pointed out at the time, Wieseltier shares a blind spot with most other intellectuals today -- he can't describe the basic facts of language in a coherent way, because he doesn't know what the basic descriptive vocabulary means. We've seen other examples of this same problem recently, when intellectuals were discussing passive verb forms, or the structure of arguments, or hiphop vowel sounds.

Here at Language Log, we don't believe in blaming the victim. So instead of cursing the darkness, I'll light a small candle by offering a linguist's summary of the whole -ing pronunciation business, often referred to as "g-dropping". As a bonus, this also counts as another installment in our recent series on language and gender, since it turns out that men and women are different in this respect (as well as in other ways, of course). And in honor of Wieseltier's role as Literary Editor, I'll throw in quotations from Galsworthy's Maid in Waiting and D.H. Lawrence's Lady Chatterley's Lover.

(As for the hard g/soft g thing, we'll have to take it up some other time).

What is "g-dropping"? The term comes from the conventional orthography: -ing is written as -in', as in she's openin' the door.

In fact, there is no "g" involved at all, except in the spelling. Final -ng (in English spelling) stands for a velar nasal, which is written in the International Phonetic Alphabet as an "n" with a hook on its right leg: [ŋ], a symbol called "eng." The final -n' in spellings like openin' stands for a coronal nasal, which is written in IPA with an ordinary "n": [n]. In IPA, opening is written as [ˈopənɪŋ], while openin' is written as [ˈopənɪn]. The only difference in pronunciation is whether the final nasal consonant is velar (made with the body of the tongue pressed against the soft palate) or coronal (made with the blade of the tongue pressed against the ridge behind the front teeth).

Thus in "g-dropping" nothing is ever really dropped -- it's just a question of where you put your tongue at the end of the word.

Not all words ending in [ŋ] are candidates for g-dropping. English doesn't have a general alternation between final velar and coronal nasals: boomerang does not become boomeran', and ring does not become rin'. We are only talking about unstressed final -ing at the ends of words. In some dialects, g-dropping applies only to the inflectional suffix -ing (as in present participles such as trying), and not in words such as wedding or morning.
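
As a rough illustration of that restriction, here is a minimal Python sketch that works on spellings, not sounds. It assumes a crude stand-in for "unstressed final -ing": the word must end in the letters ing and have at least one vowel letter earlier in the word. The function name drop_g and the vowel test are hypothetical, invented for this example, and the heuristic ignores real stress as well as the dialects that treat trying differently from wedding.

```python
VOWELS = set("aeiouy")

def drop_g(word):
    """Respell a final -ing as -in' when the word has an earlier vowel letter.

    Assumption: an earlier vowel letter means the -ing is an unstressed
    final syllable. This is a spelling heuristic, not a stress analysis.
    """
    lower = word.lower()
    if not lower.endswith("ing"):
        return word          # boomerang: ends in -ang, not -ing
    stem = lower[:-3]
    if not any(ch in VOWELS for ch in stem):
        return word          # ring, bring: the -ing syllable is the stressed one
    return word[:-1] + "'"   # opening -> openin'

for w in ["opening", "trying", "boomerang", "ring"]:
    print(w, "->", drop_g(w))
```

The point of the sketch is only that the alternation is tied to a particular unstressed ending, so a blanket "replace final ng with n" rule would be wrong.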

Historically, g-dropping is actually a more conservative pattern. The English present participle suffix was originally pronounced with a coronal, not a velar nasal: in early Middle English, this inflection was -inde or -ende. There was a derivational ending -ung for making nouns out of verbs, which produced words like present-day "building." These eventually merged into the modern -ing suffix. In 19th- and early 20th-century England, the g-dropping pattern (which really was the "not g-adding pattern") marked the rural aristocracy as well as the lower classes. Thus this passage from John Galsworthy's 1931 novel Maid in Waiting:

'Where on earth did Aunt Em learn to drop her g's?'
'Father told me once that she was at a school where an undropped "g" was worse than a dropped "h". They were bringin' in a country fashion then, huntin' people, you know.'

The velar pronunciation, a middle-class innovation a couple of hundred years ago, has since become the norm for most educated speakers. Note, by the way, that this is exactly the type of change that many prescriptivist language mavens rail against -- an innovation that systematically blurs a distinction between two formerly separate categories of words. Some g-dropping speakers cleanly maintain the old distinction -- for my wife, who is from Texas, tryin' or readin' are normal, but weddin' or buildin' are completely wrong.

Today, nearly all English speakers drop g's sometimes, but in a given speech community, the proportion varies systematically depending on formality, social class, sex, and other variables as well.

For instance, in a 1969 study done in New York City, Bill Labov found that in casual conversation, g-dropping varied with social class as follows:

	Lower class	Working class	Lower middle class	Upper middle class
Percentage of g-dropping	80%	49%	32%	5%

In other words, as class status "rises," percentage of g-dropping falls.

However, formality also matters: members of a given social stratum drop g's more often in less formal speech. Thus for the lower class members:

	Casual speech	Careful speech	Reading
Percentage of g-dropping	80%	53%	22%

In the 1969 NYC study, this pattern was maintained across the full interaction of social class and degree of formality.

A similar pattern was found in percentage of g-dropping from a study done in Norwich, England:

	Casual speech	Careful speech	Reading
Middle-middle class	28%	3%	0%
Lower middle class	42%	15%	10%
Upper working class	87%	74%	15%
Middle working class	95%	88%	44%
Lower working class	100%	98%	66%

Overall g-dropping rates seem to be somewhat higher in Norwich compared to New York. However, the general pattern of double dependence on social status and formality is maintained.
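
Both claims can be checked mechanically against the figures already given. The Python sketch below is only an illustration: it pairs the New York lower-class row with the Norwich lower-working-class row, and treating those two groups as comparable is my assumption, not something either study asserts. The helper names are likewise made up for the example.

```python
# Percent g-dropping by style (casual, careful, reading), copied from the
# tables above. Pairing these two groups is an assumption for illustration.
nyc_lower_class = [80, 53, 22]
norwich_lower_working = [100, 98, 66]

def falls_with_formality(rates):
    """True if each more formal style has a lower rate than the one before."""
    return all(a > b for a, b in zip(rates, rates[1:]))

def gaps(a, b):
    """Style-by-style difference in percentage points (a minus b)."""
    return [x - y for x, y in zip(a, b)]

print(falls_with_formality(nyc_lower_class))         # True
print(falls_with_formality(norwich_lower_working))   # True
print(gaps(norwich_lower_working, nyc_lower_class))  # [20, 45, 44]
```

On these rows, at least, Norwich is higher in every style, and in both cities the rate drops at each step toward more formal speech.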

Similar studies have been done in many places, for many linguistic variables other than g-dropping, and the pattern is always the same: there is a sort of systematic analogy between social class and formality. There are several competing theories -- all interesting -- about why this is true, but the parallel between class and formality always holds.

Class is not the only social variable that tends to work this way. Another study of g-dropping, this time in Los Angeles, compared males and females of similar socio-economic status. Male speakers (other things equal) tend to use more informal (or lower-class) modes of speech than females do, and this study was no exception. At the same time, for both males and females, the percentage of g-dropping was greater in joking than in arguing:

	Joking	Arguing
Males	46%	24%
Females	28%	21%

The difference between joking and arguing might be because joking creates a more informal speech style, or it might be because there is a dimension of friendliness or intimacy that can also be involved in such things.

Consider the following passage from D.H. Lawrence's 1928 novel Lady Chatterley's Lover. Here Lady Chatterley (Connie) first encounters her husband Clifford's gamekeeper, Mellors.

Lord Clifford	'Thanks, then, for the help, Mellors,' said Clifford casually, as he began to wheel down the passage to the servants' quarters.
Mellors	'Nothing else, Sir?' came the neutral voice, like one in a dream.
Lord Clifford	'Nothing, good morning!'
Mellors	'Good morning, Sir.'
Connie	'Good morning! it was kind of you to push the chair up that hill. I hope it wasn't heavy for you,' said Connie, looking back at the keeper outside the door.
Mellors	His eyes came to hers in an instant, as if wakened up. He was aware of her. 'Oh no, not heavy!' he said quickly.
Mellors	Then his voice dropped again into *the broad sound of the vernacular: 'Good mornin'* to your Ladyship!'

Lawrence tells us explicitly when Mellors is switching into a different part of his linguistic repertoire. Though he has used -ing in his 'Good morning, Sir', to Lord Clifford, he "drops into" the broad sound of the vernacular as he bids good morning to Lady Connie. (I don't know whether morning would really take the -in' ending in the country dialect that Mellors speaks, or if this is a descriptive mistake on Lawrence's part.)

This pattern is continued in the passage below, when Connie and Mellors meet next.

Connie	'I wondered what the hammering was,' she said, feeling weak and breathless, and a little afraid of him, as he looked so straight at her.
Mellors	'Ah'm gettin' th' coops ready for th' young bods,' he said, in broad vernacular.
[...break in text...]
Connie	'I'm just going,' she said.
Mellors	'Was yer waitin' to get in?' he asked, looking at the hut, not at her.
[...break in text...]
Mellors	'I mean as 'appen Ah can find anuther pleece as'll du for rearin' th' pheasants. If yer want ter be 'ere, yo'll non want me messin' abaht a' th' time.'
Connie	She looked at him, getting his meaning through the fog of the dialect. 'Why don't you speak ordinary English?' she said coldly.
Mellors	'Me! Ah thowt it wor ordinary.'

The question is, what is ordinary? Mellors is capable of approximating the language of his lord and lady; but for him, ordinary English is the vernacular.

He always uses "cold, good English" in speaking to Lord Clifford, but he varies his speech to Connie according to how he feels:

(Mellors discussing his job as a game keeper with Connie)
Mellors	'I had to go getting summonses for two poachers I caught, and, oh well, I don't like people.' He spoke *cold, good English*, and there was anger in his voice.
Connie	'Do you hate being a game-keeper?' she asked.
Mellors	'Being a game-keeper, no! So long as I'm left alone. But when I have to go messing around at the police-station, and various other places, and waiting for a lot of fools to attend to me...oh well, I get mad...'

After Connie and Mellors have become lovers, he consistently uses the vernacular in speaking with her, including -in' rather than -ing.

Mellors

'It isna horrid,' he said, 'even if tha thinks it is. An' tha canna ma'e it horrid. Dunna fret thysen about lovin' me.

Here the dimensions of formality and class have become aligned with the dimension of intimacy.

[Note: the above is adapted from my lecture notes for LING001 at Penn. The D.H. Lawrence passages were originally pointed out to me by Gillian Sankoff].

Posted by Mark Liberman at 03:43 PM

Lie or lay? Some disastrously unhelpful guidance

If you will just pop to this PartiallyClips cartoon and read it, and then pop back here and continue, I'll tell you the answer to the dentist's question, and I'll add some additional remarks.

Thank you. The answer is, of course, lying. There are three relevant verbs, one transitive and two intransitive, two regular and one irregular; and they share certain shapes for certain parts of their paradigms. The verbs are lie "deliberately speak falsehoods with intent to deceive" (intransitive; fully regular), lie "be recumbent or prone or in horizontal rather than upright position" (intransitive; irregular), and lay "deposit, set down, or cause to be recumbent or prone or in horizontal rather than upright position" (transitive; fully regular in phonetics, irregular in written form). Here are the paradigms (terminology is from The Cambridge Grammar):

	*lie* "tell untruths" (intransitive)	*lie* "be recumbent" (intransitive)	*lay* "deposit" (transitive)
plain present form	lie	lie	lay
3rd sg present form	lies	lies	lays
preterite form	lied	lay	laid
plain form	lie	lie	lay
gerund-participle	lying	lying	laying
past participle	lied	lain	laid

Here are the promised additional remarks. The general assumption is that the problem here is confusing the two verbs -- simply not knowing one from the other. But that's not quite what's going on. Everyone knows the difference between them, at least in some uses. For a phrase like The island of Madagascar lies several hundred miles off the east coast of southern Africa, no one is tempted to say lays. For a phrase like This hen lays a minimum of seven eggs a week, no one is tempted to say lies. For You are lying in your teeth, you lying bastard no one is tempted to say laying. For I got laid last night no one is tempted to say lain (it's a special idiom, of course, but the point is that the idiom is based on the verb lay, and we are intuitively aware of that). We know how to tell these verbs apart to at least some extent.

Nonetheless, it is true that the intransitive verb meaning "be recumbent" and the transitive verb meaning "deposit" (which is essentially the causative of the first one: it means "cause to lie") are beginning to share some of each other's uses in a way that is not fully accepted as standard yet. In fact the pool of relevant data is beginning to be (from the purist's point of view) highly polluted. Assuming the standard prescriptivist version of how English is and ought to remain (basically as set out in the table above), we have large numbers of "errors" all around us. Here is a moderately random sample of what's out there:

Phrase	Source	Prescriptivist judgment
As I lay dying	William Faulkner title (a.k.a. Sally Dang)	Correct (preterite tense)
As I lie dying	from a Bayne MacGregor poem	Correct (present tense)
Lay, lady, lay	Bob Dylan song	Incorrect
Lay down your weary tune	Bob Dylan song	Correct
Lay down, little doggies	Woody Guthrie song	Incorrect
When I Lay My Burden Down	Mississippi Fred McDowell	Correct
Come and lay down by my side	Kris Kristofferson song "Help me make it through the night"	Incorrect
Lay it soft against my skin	Kris Kristofferson song "Help me make it through the night"	Correct
lie it on the floor	web page about indoor marijuana cultivation	Incorrect
lay it on the floor	web page about yoga	Correct
lay on the floor	web page about spine exercise	Incorrect
lie on the floor	web page about abdominal exercise	Correct

If hardly anyone achieves error-free learning of the standard pattern from this kind of chaotic input, it's not surprising. And if you're as confused as the dentist, it's no wonder. The situation isn't going to get any better, so this merging of two verbs is likely to continue to spread. Sometimes you've got to play it as it lays (incorrect).

Thanks to Rich Alderson for catching some errors in the first version of this post.

Posted by Geoffrey K. Pullum at 01:13 AM

May 09, 2004

Great scientist joke

I spent this morning at a retirement party for Lila and Henry Gleitman. Or Gleitmen, as Paul Rozin suggested in his after-brunch remarks. Or was it John Trueswell who suggested it? There were six after-brunch speakers. Anyhow, for the occasion Lila was wearing a stylish gray T-shirt created some time ago by Merrill Garrett, immortalizing an overheard conversation:

[on the front] H. Gleitman: Most great scientists are not great men.

[on the back] L. Gleitman: Yeah. For instance, I'm not a great man.

Posted by Mark Liberman at 07:07 PM

Sometimes You Just Have to Play It As It Lays

Fixed expressions, from tightly constrained idioms through more open formulas, sometimes require features from non-standard or informal varieties; they just can't be elevated. How's the boy? 'How are you? How are you doing?', as a conventional greeting to a man, has to have a reduced auxiliary. And play it as it lays totally resists the standard verb form lies.

A more complex example, overheard at a Palo Alto restaurant on 2/24/04, from a man on crutches: I was on vacation and sprained my ankle good.

The non-standard adverb good (modifying a preceding VP) here is used ironically; the meaning is something like 'do an ironic-good job of VPing', that is, 'VP to a bad degree'. In this usage, good can't be elevated to the standard well: I sprained my ankle well describes a good performance in ankle-spraining (whatever that would be), not a bad ankle-sprain. This ironic good can be replaced, without change of meaning, by standard badly or non-standard bad, both lacking in irony, but not by its standard variant well.

If you don't want to sound like someone who would ever use non-standard adverbs zero-derived from adjectives, then you'll have to forgo this bit of conventionalized irony and manufacture your own irony from the raw materials available in the language, saying something like I did a good/fine/great job of spraining my ankle. Play it as it lays, or get out of this game.

A semantic note... Conventionally ironic good is used with VPs describing unfortunate events (treated as accomplishments), of many kinds: the engine blew up (real) good, you messed/screwed/fucked that exam up good, he broke his arm good. The events have to be unfortunate, since otherwise you just get the literal reading of non-standard good: the engine purred (real) good, you aced that exam good, his arm healed good. And the events have to be viewed as accomplishments; if the unfortunate occurrence just befalls you, conventionally ironic good, with its implicature of performance, is baffling: a paving stone fell on him good, I managed to contract a brain tumor good, all the pencils broke on him good.

Posted by Arnold Zwicky at 12:40 PM

Approximate inference and global (in)coherence

Commenting on one of our recent posts on "Escher sentences", Fernando Pereira writes:

Over the last ten years, work on graph models of (mostly probabilistic) inference has focused on techniques for approximate inference. Many interpretation and inference tasks, for instance image segmentation, have graph formulations that are computationally intractable. Some approximation methods enforce local constraints of bounded size while relaxing other constraints. Others approximate the original graph with tractable subgraphs (typically trees). Others still relax a discrete assignment problem into a continuous optimization problem that can be solved efficiently. In all cases, the results of inference may not be globally coherent. The rough parallels between these ways of approximately solving inference tasks and "Escher" perceptual phenomena are intriguing.

Posted by Mark Liberman at 07:39 AM

Gender and tags

Yesterday I promised to give some examples of the complexity of findings about language and gender, where published claims sometimes contradict one another, and where the various things that "everybody knows" are not always confirmed by experiment. This happens in every area of rational inquiry, but it's especially common in cases where generalizations are associated with strong feelings. In this case, we're talking about the nature of men and women as biological and social categories, and the way individual men and women interact in both private and public spheres. There aren't many topics that generate stronger feelings than this one.

Strong feelings tend to generate contradictory research for two obvious reasons. First, systematic observation sometimes fails to confirm evocative anecdotes, which may be evocative because they resonate with stereotypes rather than because they genuinely confirm experience. Second, even systematic observation can be misleading, if you don't make the right observational distinctions or don't control for the context in an appropriate way. When the emotional stakes are high, people should in principle be especially careful not to overinterpret or overgeneralize their findings, but in practice, the opposite is often true.

Tag questions are grammatical structures in which a declarative is followed by an attached interrogative clause or 'tag', such as

You were missing last week, weren't you?
Thorpe's away, is he?

In her influential (1975) work Language and Women's Place, Robin Lakoff depicted a typical female speech style, allegedly characterized by the use of features such as hesitations, qualifiers, tag questions, empty adjectives, and other properties, which she asserted to have a common function: to weaken or mitigate the force of an utterance. Thus tag questions "are associated with a desire for confirmation or approval which signals a lack of self-confidence in the speaker."

Lakoff's description of female speech style was based on her remembered impressions rather than on any systematic, quantitative observation. When subsequent researchers went out and counted things, they often found it difficult to confirm her observations. For instance, some studies found that men actually used more tag questions than women did.

Thus Cameron et al. (1988) looked at tag questions in a 45,000 word sample from a British corpus of transcribed conversations, called the "Survey of English Usage" (SEU). There were nine sections of 5,000 words each; three of all-male conversation, three of all-female conversation, and three of mixed-sex conversation. In this corpus, there were 60 tag questions used by men, and only 36 by women. This is a significant sex difference, but in the opposite direction!
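
As a rough check on the word "significant", here is a minimal sketch of an exact two-sided binomial test on the counts 60 and 36. It assumes (and nothing in the summary above confirms this) that men and women contributed about equal amounts of speech to the sample, so that under the null hypothesis each tag is equally likely to come from either sex. The function name is mine, not the study's.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: sum the probabilities of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small tolerance guards against floating-point ties.
    return sum(q for q in pmf if q <= observed * (1 + 1e-9))

male_tags, female_tags = 60, 36      # counts reported for the SEU sample
total = male_tags + female_tags      # 96 tags in all
# ASSUMPTION: equal exposure, so the null proportion of male tags is 0.5.
p_value = binomial_two_sided_p(male_tags, total)
print(f"{male_tags} of {total} tags from men, two-sided p = {p_value:.4f}")
```

Under the equal-exposure assumption the p-value falls below the conventional 0.05 threshold. If the men in the sample simply talked more than the women, the null proportion `p` would have to be adjusted to match.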

When they looked more closely at the function of the tag questions in this corpus, a further sex difference appeared -- which on closer examination seems not primarily to be a sex difference at all.

Holmes (1984) distinguishes two functions of tag questions: modal vs. affective. Modal tags "request information or confirmation of information of which the speaker is uncertain":

But you've been in Reading longer than that, haven't you?

Affective tags "are used not to signal uncertainty on the part of the speaker, but to indicate concern for the addressee":

Open the door for me, could you?
His portraits are quite static by comparison, aren't they?

Affective tags are further subdivided into two kinds: softeners like the first example above, which conventionally mitigate the force of what would otherwise be an impolite demand, and facilitative tags like the second example, which invite the listener to take a conversational turn to comment on the speaker's assertion.

When the tag data in the SEU study are categorized in this way, it turns out that modal tags -- that is, the tags that genuinely express uncertainty -- are much more likely to be used by men, while the affective tags are only somewhat more likely to be used by men:

	Females	Males
Modal tags	9 (25%)	24 (40%)
Affective tags	27 (75%)	36 (60%)
Total tags	36	60
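
The percentages in this table are column percentages: each cell divided by that sex's total. A short sketch that recomputes them from the raw counts, and also works out what share of each tag type came from men, may make the comparison easier to follow. The variable and function names are mine.

```python
# Raw tag counts from the SEU table above.
counts = {
    "Females": {"modal": 9, "affective": 27},
    "Males": {"modal": 24, "affective": 36},
}

def column_percentages(cells):
    """Return each cell as a rounded percentage of its column total."""
    total = sum(cells.values())
    return {kind: round(100 * n / total) for kind, n in cells.items()}

shares = {sex: column_percentages(cells) for sex, cells in counts.items()}
for sex, pct in shares.items():
    print(sex, pct)

# Share of each tag type that was produced by men, pooling both sexes.
male_share = {
    kind: counts["Males"][kind]
    / (counts["Females"][kind] + counts["Males"][kind])
    for kind in ("modal", "affective")
}
print(male_share)
```

Men produced 24 of the 33 modal tags (about 73%) but 36 of the 63 affective tags (about 57%), against a baseline of 60 of all 96 tags (62.5%).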

Suspecting that something besides sex/gender was involved here, the authors of this study turned their attention to another corpus. This database consisted of

nine hours' recorded unscripted talk from three broadcast settings: a medical radio phone-in where the participant roles were ... doctor and caller/client; classroom interaction recorded for ... educational TV, in which the salient roles were those of teacher and pupil; and a general TV discussion programme, in which the roles were ... presenter and audience.

In each case, one of the participants can be identified as "powerful" -- "institutionally responsible for the conduct of the talk", and typically also endowed with greater social power and status in the context of the conversations -- doctor vs. patient, teacher vs. student. The data was sampled so that men and women were equally represented in the "powerful" and "powerless" roles. All tag questions were identified and classified according to Holmes' categories. The results:

	Women		Men
	Powerful	Powerless	Powerful	Powerless
Modal tags	3 (5%)	9 (15%)	10 (18%)	16 (29%)
Affective tags (facilitative)	43 (70%)	0	25 (45%)	0
Affective tags (softeners)	6 (10%)	0	4 (7%)	0
Total tags	61		55

First, in this database -- unlike in the SEU data -- there is no significant overall difference in tag usage between the sexes.

Second, men continue to use modal tags relatively more often, and affective tags relatively less often.

The most striking difference by far, however, is not the sex/gender effect but the power effect: it is only the people who are in charge of the conversations -- the "powerful" speakers -- who use affective tags.
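
The three observations above can be read straight off the table by summing its cells in different ways. The following is only a sketch of that bookkeeping, with names of my own choosing; it does no significance testing.

```python
# Tag counts from the broadcast table above, keyed by (sex, role).
tags = {
    ("women", "powerful"):  {"modal": 3,  "facilitative": 43, "softener": 6},
    ("women", "powerless"): {"modal": 9,  "facilitative": 0,  "softener": 0},
    ("men", "powerful"):    {"modal": 10, "facilitative": 25, "softener": 4},
    ("men", "powerless"):   {"modal": 16, "facilitative": 0,  "softener": 0},
}
AFFECTIVE = ("facilitative", "softener")
ALL = ("modal",) + AFFECTIVE

def count(kinds, sex=None, role=None):
    """Sum the given tag types over every cell matching sex and/or role."""
    return sum(
        cell[kind]
        for (s, r), cell in tags.items()
        if sex in (None, s) and role in (None, r)
        for kind in kinds
    )

sexes, roles = ("women", "men"), ("powerful", "powerless")
totals_by_sex = {s: count(ALL, sex=s) for s in sexes}
affective_by_role = {r: count(AFFECTIVE, role=r) for r in roles}
modal_by_role = {r: count(("modal",), role=r) for r in roles}
affective_share_by_sex = {
    s: count(AFFECTIVE, sex=s) / totals_by_sex[s] for s in sexes
}

print("total tags by sex:", totals_by_sex)
print("affective tags by role:", affective_by_role)
print("modal tags by role:", modal_by_role)
print("affective share by sex:", affective_share_by_sex)
```

All 78 affective tags come from speakers in the powerful role, while modal tags come from both roles (13 from powerful speakers, 25 from powerless ones).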

The results of the tag question study can be interpreted in several different ways. One view would be that Lakoff's general orientation is confirmed, even though she was wrong about the facts: affective tags are used by people who feel that they are in control of a conversation; the greater use of tag questions overall by men in the SEU data means that the men in those conversations felt more powerful. Another interpretation of such data has been that women's higher proportion of affective tags, which are used to manage the flow of conversation, means that women are saddled with a higher proportion of "interactional shitwork."

Yet another interpretation might be that Lakoff was wrong: men are actually more insecure about their opinions (whence men's greater usage of modal tags), and less interested in controlling the conversational actions of others (whence powerful men's lower usage of affective tags).

Overall, the interpretation of gender differences in language use -- and the extent to which such differences are emphasized in the first place -- seems to have a strong political component. Certainly the more abstract interpretations that are sometimes given to observed differences -- for instance, the conclusion that women are more cooperative and men more competitive in conversation -- are highly political. In evaluating such interpretations, it is well to remember how widely they can vary.

An older set of sexist stereotypes about gender differences in communication is expressed in Rudyard Kipling's 1911 poem The female of the species. Kipling depicts the stereotypical man as an equivocator, "whose timid heart is bursting with the things he must not say." Men in conversation are therefore ready to compromise and to discuss all sides of an issue, and tend to be diverted by humor, doubt and pity. A woman, on the other hand, "who faces Death by torture for each life beneath her breast / May not deal in doubt or pity -- must not swerve for fact or jest." For a woman, "her contentions are her children," and anyone who disagrees will be met with "unprovoked and awful charges -- even so the she-bear fights." The conclusion, for Kipling, is that women should be excluded from politics:

So it comes that Man, the coward, when he gathers to confer
With his fellow-braves in council, dare not leave a place for her.

No doubt many Edwardian men (and even some women) felt the same thrill of recognition, in reading Kipling's poem, that many contemporary women (and men) have felt when they first read Robin Lakoff's book. The large number of copies of Kipling's poem on the net suggests that some contemporary men still respond this way to it.

We should have learned since Kipling's time that this rush of feeling, in response to the well-crafted expression of a social stereotype, is not to be trusted. To quote Penny Eckert and Sally McConnell-Ginet again:

Women's language has been said to reflect their (our) conservatism, prestige consciousness, upward mobility, insecurity, deference, nurturance, emotional expressivity, connectedness, sensitivity to others, solidarity. And men's language is heard as evincing their toughness, lack of affect, competitiveness, independence, competence, hierarchy, control [...] When we recombine all these abstractions, we really do not know what we have. Certainly we don't seem to find real women and men as sums of the characteristics attributed to them.

Posted by Mark Liberman at 06:50 AM

May 08, 2004

Never mind the sex, feel the rhetoric

What struck me about Rivka's writing on Respectful of Otters was not that it was like a man (I had no idea, and was conscious that I should make no hasty assumption, so in mentioning Rivka here I simply avoided using anaphoric pronouns altogether), but rather that it uses measured but sometimes extraordinarily effective rhetoric. For example, Roger Krueger of Vietnam Veterans of America Chapter 172 in Cumberland recently commented that the Abu Ghurayb horror "is being completely blown out of proportion" because "When a person is in combat, they have to do whatever they have to do to stay alive." Rivka's response to this ludicrous and disgusting piece of nonsense was the following acid sarcastic remark:

And sometimes, apparently, when a person is not even remotely close to combat, in order to stay alive they have to take unarmed, helpless, locked-up men, strip them naked at gunpoint, pose them as if they're having oral sex with each other, and take pictures. Who are we to judge, who have not seen the hell that is war?

I think that's one of the neatest 59 words of polemical prose I've ever seen. Aggressive, yes; but that last rhetorical question is wonderful. This is the voice of a highly skilled writer. So, is it also a male voice?

To find out, I turned to a machine. I went to the Gender Genie and simply entered the above passage. The female score was 0; the male score, 127. The Gender Genie's algorithm looks only at function words and pays no attention to the alleged facts about content (that "male style is characterized by adversiality — put-downs, strong, often contentious assertions, lengthy and/or frequent postings, self-promotion, and sarcasm" and female style by "supportiveness and attenuation" along with "expressions of appreciation, thanking, and community-building; as well as apologizing, expressing doubt, asking questions, and contributing ideas in the form of suggestions"). And according to the Gender Genie, Rivka is male. It works best with texts of over 500 words, so I gave it over 500 words of Rivka's fine writing, and again it said male.

But then according to the Gender Genie I'm female, and I have tried not to be concerned about that.

Posted by Geoffrey K. Pullum at 09:17 PM

Aw++

Roger Grow, following up on this post, emailed to draw my attention to a great Partially Clips cartoon.

Posted by Mark Liberman at 07:43 PM

Sexing Rivka

Recently, Rivka at Respectful of Otters (who seems to have become our subject of the week) expressed puzzlement, if not annoyance, because "[i]n the last few weeks, two different men have linked approvingly to Respectful of Otters - using male pronouns to refer to me". In exploring the question of whether she "writes like a man", she quotes a description of genderlects from an article by Paolo Rossetti, suggesting that

"The male style is characterized by adversiality - put-downs, strong, often contentious assertions, lengthy and/or frequent postings, self-promotion, and sarcasm"; while the female style, in contrast, is characterized by "supportiveness and attenuation" with expressions of appreciation, thanking, and community-building; as well as apologizing, expressing doubt, asking questions, and contributing ideas in the form of suggestions."

Over the past couple of decades, there has been a lively controversy over the question of whether men and women use language differently, what the differences might be, and how and why the differences (if they exist) arise. A different perspective was expressed a few years ago in an article by Penny Eckert and Sally McConnell-Ginet, the authors of a 2003 book entitled "Language and Gender":

Women's language has been said to reflect their (our) conservatism, prestige consciousness, upward mobility, insecurity, deference, nurturance, emotional expressivity, connectedness, sensitivity to others, solidarity. And men's language is heard as evincing their toughness, lack of affect, competitiveness, independence, competence, hierarchy, control [...] When we recombine all these abstractions, we really do not know what we have. Certainly we don't seem to find real women and men as sums of the characteristics attributed to them.

With respect to the real person Rivka, my experience of guessing her sex from her writing went like this. When I first read her blog, I assumed that it was written by a woman, not because of her name, but because of... well, I don't know what. It just seemed that way to me. So the first time that I posted something that featured her weblog, I used female pronouns. Then it occurred to me that I might be wrong -- after all, I know quite a few men with names ending in "-a", like "Sasha" and "Andrea". So I went back and looked for any clues in what she had written. I read all the Otters archives, and I concluded that really, I couldn't tell. So I changed what I'd written to avoid gendered references to her, and I've maintained the same practice since. I'm glad to know that she's a woman, so that I can reference her stuff in the future without avoiding pronouns! I actually considered emailing her to find out whether she was he or she, and decided that it would be intrusive -- was that a male response, I wonder? I guess it says something about my personality, at least.

I'm not at all an expert on the "genderlect" literature, but I've read some of it and learned about some other work from colloquia and talks at conferences, and I've tried to put it all together so as to be able to teach it in introductory courses. Here's a quick summary, taken from my lecture notes for an intro linguistics course, of published claims about gender differences in speech and language (in contemporary western societies):

Female speech tends to be evaluated as more "correct" or more "prestigious", less slangy, etc. Men are in general more likely than women to use socially-stigmatized forms (like "ain't" or g-dropping in English). On the other hand, women are usually in the lead in changes in pronunciation, typically producing new pronunciations sooner, more often, and in more extreme ways than men. Women's speech has been said to be more polite, more redundant, more formal, more clearly pronounced, and more elaborated or complex, while men's speech is less polite, more elliptical, more informal, less clearly pronounced, and simpler.

In terms of conversational patterns, it has been observed or claimed that women use more verbal "support indicators" (like mm-hmm) than men do; that men interrupt women more than they interrupt other men, and more than women interrupt either men or other women; that women express uncertainty and hesitancy more than men; and that (at least in single-sex interactions) males are more likely to give direct orders than females are.

However, for nearly all of these claimed differences, there are some contradictory findings. I'll give some examples in later posts.

Because of the enormous effects of social and interpersonal context on all the variables involved, and the enormous range of individual differences among people of all sexes, both in general and in their response to differing circumstances, and the strong effect of social stereotypes on experimenters' interpretations as well as on their subjects' behavior, this is an especially difficult kind of thing to study. People and social circumstances are variable and complicated, and it's clear that you need to look at the details in order to predict behavioral tendencies, much less individual behavior.

As for the explanation of the (claimed) differences, you can take your pick of biological "evolutionary psychology" theories (or "just so stories", as some argue), and two classes of culture-based theories, generally known as difference theories and dominance theories.

According to difference theories (sometimes called two-culture theories), men and women inhabit different cultural (and therefore linguistic) worlds. To quote from the preface to Deborah Tannen's 1990 popularization You just don't understand, "boys and girls grow up in what are essentially different cultures, so talk between women and men is cross-cultural communication."

According to dominance theories, men and women inhabit the same cultural and linguistic world, in which power and status are distributed unequally, and are expressed by linguistic as well as other cultural markers. In principle, women and men have access to the same set of linguistic and conversational devices, and use them for the same purposes. Apparent differences in usage reflect differences in status and in goals.

The basic ideas of the two-culture theory go back at least to the early 1980's, beginning with John Gumperz's research on misunderstandings in intercultural communication involving immigrants, and Marjorie Goodwin's studies of conversational interaction among African-American children in Philadelphia. The most influential recent exponent of the theory has been Deborah Tannen.

In Tannen's version, women use language to achieve intimacy, resulting in what she calls "rapport talk." For women, "talk is the glue that holds relationships together," and so conversations are "negotiations for closeness in which people try to seek and give confirmation and support, and to reach consensus." Men, on the other hand, use language to convey information, resulting in what Tannen calls "report talk." Because men maintain relationships through other activities, conversation for them becomes a negotiation for status in which each participant attempts to establish or improve his place in a hierarchical social order.

Is this true? Many people have found Tannen's characterizations true to life, while others have criticized her for promoting social stereotypes. Of course, both views might be correct. More on this later.

Posted by Mark Liberman at 07:13 PM

Two out of three on passives

The writer of the Respectful of Otters blog (named Rivka, I noticed only after drafting this), is a highly educated psychologist whose series of posts on the Abu Ghurayb revelations I have been reading with admiration. The writing is clear, trenchant, sometimes brilliant. But even for the highly educated in this country, grammar instruction is now so cursory and misguided that it is rare for a non-linguist to be able to go through a passage of prose and pick out, say, the passive clauses. I pointed this out before (here) with respect to a published report on media bias against Israel, in which a claim crucially depended on distinguishing passive from active clauses, and the rate of correct identification achieved was 1 out of 3. In this passage Rivka does better than that, but still gets only 2 out of 3 correct:

Look at his use of personal pronouns and the active voice there - "the way I run the prison." "We've had a very high rate with our style of getting them to break." It's complete ownership of, and identification with, the situation in the prison. Compare that to the passive voice with which he fails to take responsibility for anything, in the journal he sent his father: "Prisoners were forced..." "A prisoner...was shot..." "MI has instructed us to..."

In actuality, only the first two examples are passives.

The clause with the verb instruct is active. And so is the subordinate clause, given in full earlier in the post:

MI [Military Intelligence] has also instructed us to place prisoners in an isolation cell with little or no clothes.

The infinitival clause to place prisoners... is in the active voice. A clause is not in the passive voice simply because it denotes an action that was not undertaken volitionally. This screwdriver keeps on bending is not passive, even though it does seem to sort of blame the screwdriver rather than the user. My girlfriend suffered an injury while we were arguing is not passive, even when uttered at the hospital emergency room by a guilty boyfriend concealing his agency in the affair.

It's not the slightest bit unusual for educated people who are excellent writers to be unable to state grammatical generalizations correctly. And as Mark recently wrote here, "It's partly our fault because we've allowed the educational system to turn out PhDs who think and write like this... We've come a long way since grammar, rhetoric and logic were viewed as the trivial foundations for any other sort of education." Sunk a long way, he could have said.

Grammar is hardly taught at all these days. Almost everything most educated Americans believe about English grammar is wrong, and hardly anyone even controls a system of grammatical terminology that makes any sense. It is to at least some extent the fault of my profession. We theoretical linguists do not generally deign to do applied analysis of discourse or propaganda ourselves, or assist in it; and we do so little teaching of basic grammar of relevant kinds to a broad audience that the prevailing conception of grammar in the English-speaking world has hardly changed in a hundred and fifty years. It is perfectly sensible to attempt to discern psychological states of an author (like refusal to accept responsibility) from examining the use of particular kinds of grammatical construction in a text; but it generally gets done by people who do not have a sufficient grasp of grammar to permit the analysis they seek to undertake. You can hardly blame them. It isn't like they're forgetting things that other people know. It just isn't true that everybody with an advanced degree will have had at least one coherent course on English grammar. Things are likely to stay this way until grammar teaching changes, or textual analysis with writers on politics and society is done in collaboration with grammarians.

Posted by Geoffrey K. Pullum at 06:01 PM

Unspoken Interrogatories

Evidence that today's youth are still firmly moored to the mother ship of culture: Peter Nguyen's Biography of Walt Whitman.

Walt would have been amused:

Down-hearted doubters, dull and excluded,
Frivolous, sullen, moping, angry, affected, disheartened, atheistical,
I know every one of you---I know the unspoken interrogatories,
By experience I know them.

[Leaves of Grass, 1860-61]

Assuming that this was really some kid's high school English composition submitted in June of 2000, and that it was scanned and posted by the teacher rather than the student, it was probably illegal, but I'm happy to have been able to read it. I hope that Peter got a book contract or a Hollywood screenwriting job out of the deal. Though he does need to work on his punctuation, and lose the faggotry jokes.

I recall writing an essay in a similar spirit about Yeats when I was 14 or so, for an English teacher who had been giving me C's all year. To my immense surprise, he gave me an A, with the comment that I should tell the truth more often. I've tried to keep that lesson in mind ever since. The results haven't always been so well received, but on balance it works out more often than it doesn't.

[via incoming signals]

[Update: Daniel Ezra Johnson emailed:

I did a small amount of research into this peter nguyen. on the whole I think these essays / teacher comments are a form of fiction, and there's something pretty deep about it, actually. It reminds me of nabokov's "pale fire" almost.
you can see some more examples of the genre, among other things, at:
http://mastafuu.deviantart.com/gallery/
and here's a great thread on a message board devoted to whether the things are "real" or "fake" (people are agreed that they don't belong in that particular forum):
http://www.misetings.com/forums/archive/index.php/t-8404
"If this was real, will he marry me? If not, as is my guess, what the fuck?"
"you are a moron if you dont find these funny. They are drastically less funny if they werent really turned in as papers but even so, they are still worthy of a chuckle."
"COME ON! Its so FAKE it looks like it is made by FAKEMAN from the street of FAKESTREET in the town of FAKETOWN in the land of FAKELAND (FK)"

]

Posted by Mark Liberman at 11:30 AM

Father of the raven

Obligatory language connection: this page on Abu Ghurayb ("the preferred NIMA transliteration" of Abu Ghraib) explains the meaning of the name as "father of the raven", and also cites a dozen alternative transliterations and four places in Iraq with this name. [Update 4/11/2004: A discussion of how to pronounce this name, with a recording, is here.]

With respect to the content of the Abu Ghraib scandal, some of the most well-informed and insightful commentary has been posted over the past week by Rivka at Respectful of Otters. In chronological order from earliest to most recent: An Army of Liberation, Defending The Unspeakable, Crime And Punishment, A Man Stands Up, Lost: One Moral Compass, Bringing In An Expert, The Taguba Report, Part 1, The Taguba Report, Part 2, More Abu Ghraib Analysis, Who's To Blame? Alternative Theories.

Here are two small examples. Point #6 of Rivka's analysis of the Taguba report is:

The command structure of the 800th MP Brigade was totally and horrifically buggered. Just one example: one of the Battalion Commanders was so incredibly incompetent that General Karpinski sent him to Kuwait for two weeks to give him (and presumably everyone else) a break from the strain of command. She put another Battalion Commander in his place temporarily. Except: she didn't write any orders relieving the first guy of command or putting the second guy in place. She didn't notify any of her superiors about the change. She didn't notify any of the soldiers in the battalion that they had a new commander. Taguba, who must have been bleeding from the eardrums at that point: "Temporarily removing one commander and replacing him with another serving Battalion Commander without an order and without notifying superior or subordinate commands is without precedent in my military career."

And with respect to the implications of the famous Stanford Prison Experiment, Rivka writes:

In the 1970s, a social psychologist named Phil Zimbardo converted the basement of a Stanford building into a makeshift prison and randomly assigned psychologically healthy young men to play "prisoners" and "guards." Within days, a sick and abusive "guard" culture had developed, and "prisoners" had become cowed and submissive. Zimbardo actually had to stop the study after six days because the abusive behavior of the "guards" had gotten so far out of control. (I'm not going to discuss his repellent lack of experimental ethics, except to say that no one would be allowed to do this study today.)

What does the Stanford experiment tell us about Abu Ghraib? I don't think it absolves the low-level MPs from moral responsibility, but it should steer us away from explanations which depend on their moral exceptionality. The Stanford experiment tells us that there needn't have been anything psychologically or morally deficient about these MPs at the outset of the war, just as the "guards" in Zimbardo's experiment were psychologically indistinguishable from "prisoners" when the study began.

If anything, the Stanford study damns the leadership of the 800th MP Brigade even further than they've already been damned. We know that, in the absence of continuous training, supervision, and strict controls, people given absolute power over others will tend to become vicious. No one in that chain of command has any business acting surprised that their failures of leadership led to exactly what anyone who's taken Psych 101 at any time since the mid-1970s could have predicted.

I would only add that the same corollaries of the Stanford experiment are being acted out in prisons all over the world every day, though in most cases the authorities encourage the abuses rather than being indifferent or unaware. And by all accounts, the world leaders who are most "shocked" by the current revelations are among the most guilty of condoning or encouraging similar things.

I suspect that in this respect, American prisons and the attitudes of American authorities are among the best in the world, not the worst. However, you might recall that last year Bill Lockyer, California's Attorney General, remarked that he "would love to personally escort" Enron CEO Ken Lay "to an 8-by-10 cell that he could share with a tattooed dude who says, 'Hi, my name is Spike, honey.' "

And if you don't remember, you can read about it here.

Posted by Mark Liberman at 09:46 AM

An Escher Sentence in the wild

From the April 2003 issue of Golf Today comes this genuine non-linguist example of an Escher Sentence:

"With him breathing down my neck, I was still able to focus on what I was doing," Beem said. "More people have analyzed it than I have, but it's a nice notion that Tiger was up near the lead and I outplayed him."

So, what did Beem mean? How did he come to misrepresent that meaning in such an awful way? And why did he produce a type of structure so awful (as Geoff made clear in the post that started this thread) that it's used as a stock example of how semantically awful structures can get?

It's actually clear in broad terms what Beem intended. He meant that others have analyzed his victory more than he has.

OK, so is there any way that Beem's utterance could mean something like that? Well, perhaps his little language organ, bless its heart, was doing its damnedest to represent the intended meaning with a structure like one of the following:

(1) an accidental have: More people have analyzed it than I (/me), i.e. people other than Beem have analyzed it.
(2) a missing verb phrase: More people have analyzed it than I have (counted/had hot dinners/dated).
(3) a missing noun phrase: More people have analyzed it than I have (time to go into/the ability to count/knowledge of).
(4) an event counting reading: More people have analyzed it than (there have been events in which) I have (analyzed it).

You can see how the guy's generator might have gotten Beem into this mess, at least if, like me, you are partial to psycholinguistic just so stories. For instance, it could have started off producing a structure like that in (1), but suddenly realized when it got to than I that it really didn't like Beem saying More... than I <end of clause> at all. More... than I <end of clause> is stilted and low frequency for the modern golfer. Yet in frequency terms all was not lost, for the generator found itself in a relatively dense region of lexico-syntactic space: there are loads of juicy high frequency combos in the area, like than I thought, than I could, and so on. One of these combos, than I have ends in a word that was already primed by the earlier have, and by the high rate of occurrence of parallel structures on both sides of a than.

Now, if it had been me and my language organ then, well.... I can't be sure since Tiger never actually has breathed down my neck, and I've never been interviewed by Golf Today afterward, but I'm guessing in such a situation we'd have been pretty flustered. We'd have flubbed it. Probably we'd have tripped my tongue into a disfluency so gross that the Golf Today journo would have given up completely on quoting me. (Hey, notice how plausible this is: I have NEVER been quoted in Golf Today.)

But we're not talking about me here. We're talking about a world class golfer with killer instincts. His generator stayed calm, unpanicked, steady as a rock. Cut its losses with a quick chip out of the bunker, long putt onto the green, tips the ball in for a barely noticed bogey, strolls straight onto the final sentence, which it takes with an easy looking 3 verb birdie; it tosses the ball into the crowd and declares victory. That's the type of cool you and your language organ gotta have when the Tiger's a breathin' down your neck.

Posted by David Beaver at 06:47 AM

Sweet Pee

Earlier today the etymology of the Japanese word for "diabetes" came up. In Japanese diabetes is 糖尿病 [to:ɲo:bjo:], literally "sugar urine disease". To my interlocutor, this was mysterious. It actually makes a lot of sense. Diabetes is a disease in which glucose in the blood stream is unable to enter the cells that need it. As a result, the glucose is not metabolized and a great deal is excreted in the urine. The urine of diabetics is therefore sweet. At one time, tasting the patient's urine was part of a European doctor's diagnostic toolkit.

The Mandarin term is the same as in Japanese (modulo the difference in pronunciation, of course), but the Cantonese term is 尿淋 [nŷ lʌ̄m] "urine falls in torrents", reflecting the fact that the urine is not only sweet but copious. (Incidentally, there's a nice English-Chinese medical dictionary with over 31,000 terms here, but even though it is based in Hong Kong, it is in Mandarin.)

[Update: I am told by a younger Cantonese speaker that 糖尿病 is now the more common term in Cantonese. I'm guessing that this is a result of Mandarin influence.]

The sweetness of the urine of diabetics is also the explanation for the full name for diabetes in English. There are actually two quite different conditions that go by the name diabetes. The more familiar one is diabetes mellitus, where mellitus is Latin for "honeyed". The other is diabetes insipidus, "tasteless diabetes", which refers to the weak taste of the urine.

Posted by Bill Poser at 12:42 AM

May 07, 2004

'you' as in 'pleasure'

Back in January, Geoff Pullum mentioned in passing Henry Sweet's listing of the four pronunciations of have:

"(1) like the first syllable of havoc in I already have; (2) like the first syllable of Havana in I have often thought so; (3) like the first syllable of avoid in I'd've thought so; (4) just a [v] in I've forgotten."

Another English word of protean pronunciation is you, with several reduced variants that have more-or-less conventional orthographic representations, as in willya, ain'tcha and didja. The other day I happened to notice a more extreme example, in a clip that I've used for years as one of a number of examples illustrating various regional and ethnic accents of American English (if you'd like to listen to the other ones, look for the table of links down in the middle of this set of lecture notes on sociolinguistics).

In this passage, a reduced version of you in the context "says you should not do it" becomes simply a long voiced palatal fricative, like the sound in the center of pleasure drawn out for about a third of a second.

Here's a spectrogram

and here is a link to an audio clip. As you can see and hear, this is not an especially informal or reduced passage -- "should not" is not contracted to "shouldn't", and the final [t] of "it" is released. The accent is that of a middle-aged New York-area Jewish man, but this version of you is pretty widely distributed.

Posted by Mark Liberman at 03:33 PM

Gettin' all autistic on me

After attending Biolink-2004 yesterday in Boston, I took the train back to Philly today. This morning in South Station, as I was waiting to board, I overheard a conversation between two middle-aged male railroad workers. One of them said to the other "He was gettin' all autistic on me." The other one responded "Boy, I know what you mean."

"Huh, " I thought to myself. "And they say guys aren't interested in rapport talk." But as the conversation continued, I realized that I'm out of practice interpreting the r-less Boston accent:

"Yeah, he was squattin' down and takin' shots from all different weird angles, and I says to him, 'cut it out and just take some normal pictures, for the love of Mike.' You know, you pay somebody to take pictures of your daughter's wedding, you don't want a lot of shots lookin' up the best man's trouser leg, right?"

Posted by Mark Liberman at 03:30 PM

Escher sentences

"More people have written about this than I have." It's interesting, as Geoff Pullum observes, that such sentences go down so easy, since they're completely incoherent.

These sentences remind me of the pictures of stairways that spiral up endlessly within a finite space, and the Shepard tones whose pitch seems to go up and up without ever getting any higher. All these stimuli involve familiar and coherent local cues whose global integration is contradictory or impossible. These stimuli also all seem OK in the absence of scrutiny. Casual, unreflective uptake has no real problem with them; you need to pay attention and think about them a bit before you notice that something is going seriously wrong.

Like Escher stairways and Shepard tones, these sentences are telling us something about the nature of perception. Whether we're seeing a scene, hearing a sound or assimilating a sentence, there are automatic processes that happen effortlessly whenever we come across the right kind of stuff, and then there are kinds of analysis that involve more effort and more explicit scrutiny. This is probably not a qualitative distinction between perception and interpretation, but rather a gradation of processes from those that are faster, more automatic and less accessible to consciousness, towards those that are slower, more effortful, more conscious and more optional.

Posted by Mark Liberman at 02:49 PM

Truth and consequences

In his post on grammatical complexity and electability, Geoff Pullum linked to a press release from the Requisite Organization International Institute. As Geoff points out, this is an odd name, congruent with the odd ideas to be found at the other end of the link. I was especially struck by the breezy carelessness with which the Requisite folk shift among propositions and actions, logical inference and causation, truth and effective policy.

Thus a "Serial" argument is said to be one of the form "I think that so and so is true because if we do it, it will lead to X, and that will then lead to Y, and that will then cause Z". This makes it seem like "so and so" is an action, since it's something we can do; and that X, Y and Z are events, since they're things that get caused by doing "so and so"; and that the argument for the position "so and so" depends on considering its consequences. But in the research summary that the press release links to, "Serial Processing" is defined as having the form "Idea A → Idea B → Idea C = Position", which treats A, B and C explicitly as ideas, and puts the "Position" (the result of the processing) at the opposite end of the chain of reasoning. Of course, it's not clear whether "Position" is supposed to be equated with Idea C, or whether the structure is something like "(A→B→C)=Position", in which case maybe the order doesn't matter, since equality is a symmetric relation?

But I'm overinterpreting here. The Requisites' ideas about the structure of arguments are clearly careless and even incoherent -- or at least their ways of writing about their ideas about the structure of arguments are careless and unclear. However, I'm not trying to make them seem foolish, or to dismiss their ideas, which might well have some real value underneath the incompetent presentation. Instead, I want to suggest that the strangeness of this material is the collective fault of philosophers and linguists.

It's partly our fault because we've allowed the educational system to turn out PhDs who think and write like this about the structure of arguments. It's pretty clear that if the Requisites ever took a course in logic or philosophy of language, not much of it stuck. We've come a long way since grammar, rhetoric and logic were viewed as the trivial foundations for any other sort of education.

It's also our fault because we've left a vacuum in public discourse about political arguments. There are plenty of us who are capable of thinking and writing clearly and coherently about the structure of arguments, and some of us who even put this ability into practice, but none of us has looked systematically at the rhetorical structure of political discourse in an insightful way.

(I don't count here the analyses of George Lakoff and others, which deal with political metaphors rather than political arguments).

Posted by Mark Liberman at 07:36 AM

May 06, 2004

Plausible Angloid gibberish

Try slipping into a conversation a remark like More people have written about this than I have. My colleague Jim McCloskey has pointed out to me that this kind of sentence (is it a sentence?) has a peculiar property: at first people seem to think it is grammatical and means something. Given a few moments to think, though, they soon realize that it is just plausible-looking English-style gibberish. It seems to be an intelligible sentence of the language but it is just masquerading. McCloskey has no explanation for this. Neither do I. And more people have tried to find one than we have.

[Read on for the rather astonishing attribution story regarding this type of example.]

The original example of this kind was something like "More people have been to Russia than I have", though other versions mentioning Moscow, Berlin, and Brooklyn have also been cited. The example type has been attributed to Andy Barss, Elliott Moreton, Lance Nathan, Colin Phillips, Chris Potts, Ken Shan, William Snyder, and probably others, but in fact researches by Kai von Fintel have revealed that the original source was probably an actual occurrence in the speech of Herman Schultze. He uttered the sentence in the presence of Mario Montalbetti, who thanks Schultze on page 6 of the Prologue to his 1984 MIT dissertation After Binding: On the Interpretation of Pronouns "for uttering the most amazing */? sentence I've ever heard". Von Fintel also found that a student at the University of Houston has actually used in an op-ed piece the sentence I admit that more people have been to Iraq than I have, so I don't know everything. But he may have been kidding; the piece is apparently intended to be humorous.

By the way, notice that the issue as I see it is not about whether the example type is ungrammatical, or whether it is merely semantically incoherent, or why. The puzzle is about why people initially don't notice there is anything wrong with it at all.

Posted by Geoffrey K. Pullum at 11:59 PM

Grammatical complexity and electability

The latest issue of The New Yorker has a Talk Of The Town piece by Ben McGrath about Kathryn Cason, who analyzes transcripts of interviews with political figures and examines the patterns of grammatical function words (coordinators, subordinators, prepositions, and such). On the basis of such analysis she claims to be able to predict electoral success.

Actually, the web site of her organization, the oddly named Requisite Organization International Institute, reveals (see this press release) that the relevant work was the dissertation research of Dr Alison Brause, who is not mentioned in the New Yorker piece (it says "Cason has discovered" and so on; it appears that should be "Brause has discovered"). Cason, via McGrath, appears to give a rather fuddled account of Brause's research. It appears to be this simple. If you use or you are doing declarative thinking. If you use and you are doing cumulative thinking. If you use if you are doing serial thinking. If you use if and only if you are doing parallel thinking. These indicate four successively more advanced levels of complexity of thinking, and the winner in a presidential race is always the more complex thinker. Bush is mainly cumulative, like Clinton, Mondale, and Nixon before him. But Kerry is serial, which is superior: Cason is 100% confident that he will win in November.

I have to tell you, I find myself skeptical about any simple counting of function words providing an assessment of a candidate's abilities in complex thinking, let alone his likely success in an electoral system where (to put it mildly) complex thinking doesn't always seem to be the foremost consideration. But who knows. It would be a major boost for the prestige and significance of my profession as grammarian if it were true. But there is a question about which way the causal arrow really points. Might candidates' electability be enhanced if they were taught to use more conditional adjuncts? Or is it just that the kind of people who know how to win in politics tend to be the sort who use complex syntactic structures?
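To make concrete what a "simple counting of function words" might look like, here is a toy sketch in Python. It is only an illustration of the four-way scheme as summarized above (or, and, if, if and only if); it is not Brause's or Cason's actual procedure, which is not described here, and the names LEVELS and tally_connectives are invented for the example.

```python
import re
from collections import Counter

# Connective -> label, following the summary in the post above.
# Longest phrase first, so that "if and only if" is not also
# counted as two instances of "if" plus one "and".
LEVELS = [
    ("if and only if", "parallel"),
    ("if", "serial"),
    ("and", "cumulative"),
    ("or", "declarative"),
]

def tally_connectives(text):
    """Count whole-word occurrences of each connective in `text`.

    Hypothetical helper: each match is blanked out once counted,
    so shorter connectives inside a longer one are not recounted.
    """
    counts = Counter()
    remaining = text.lower()
    for phrase, label in LEVELS:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        remaining, n = re.subn(pattern, " ", remaining)
        counts[label] += n
    return counts

if __name__ == "__main__":
    sample = "We win if and only if we try, and if you help or not."
    print(tally_connectives(sample))
```

Note that the tally knows nothing about what the connectives are doing in the sentence: a speaker who says "if" in "I don't know if he'll come" scores the same as one building a conditional argument, which is one reason to doubt that raw counts measure complexity of thought.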

Posted by Geoffrey K. Pullum at 05:50 PM

A tin ear

Well brought-up children are taught how to perform the speech acts of apology and thanking politely and appropriately in the proper social circumstances. But some just don't get it. Some people won't or can't apologize, or have no idea how to (I pointed out one case in this post about Steve Rose). Recently President George W. Bush arrived at a point where the only reasonable thing to do was to say that he was deeply sorry. But he just couldn't or wouldn't. Instead he blurted out lines like "I want to tell the people of the Middle East that the practices that took place in that prison are abhorrent and they don't represent America" (see the transcript) -- a defiant assertion of rectitude and lack of responsibility, the opposite of an apology.

He could have said something a lot more like this: "I deeply regret the shocking and dishonorable acts of cruelty and pointless humiliation that have been permitted to occur under my leadership; I feel deep shame over what has happened and I offer my sincere and unqualified apology." I for one would have respected him more if he had.

It is a vital part of a top official's job, and above all of a chief executive's job, to know how to say the right things to the right people, phrasing them appropriately. But our president simply has a tin ear for how to speak to people. An Egyptian journalist reported in an NPR interview yesterday that at the end of each of his two on-camera interviews with Bush, the president had said the same thing to the interviewer once the cameras were off: "Good job." The subtle insult was noticed. "Good job" is what a journalism instructor might say to an undergraduate student after an interview exercise for which he would be getting a B+, or to a flunky who had done as instructed. Bush could have said: "I want to thank you for the opportunity you've allowed me to speak directly to your audience in the Arab world. I value it greatly. Please convey my greetings and best wishes on behalf of the American people to all the staff of your organization." It would have cost nothing, and earned good will. But as I say, sadly, the president simply has a tin ear for appropriate language.

[Note added later: As of May 6 and 7, American newspapers are reporting that President Bush apologized during an appearance in the Rose Garden with King Abdullah II of Jordan. He did not. He said, describing an earlier private meeting, "I told him I was sorry for the humiliation suffered by Iraqi prisoners..." That is a claim about his earlier linguistic behavior in a private meeting of which we don't have a transcript. I don't know what utterances he used when talking to the King. But "I'm sorry this happened" is not an apology (it's an expression of regret); "I feel sorry for the prisoners" is not (it's an expression of pity); and so on. There are many ways to weasel around with the word sorry and not actually apologize. If this president told a lie about matters of what speech acts had taken place, it would not be the first time, as I pointed out in my very first Language Log post.]

Posted by Geoffrey K. Pullum at 04:56 PM

H34R M3, 0 MU53!

Check out PartiallyClips' contribution to "the holiday formerly known as Web Comics Awareness Day."

By the way, if you want to know what "get wanged" means, read this weblog entry [via Russell Lee-Goldman]:

I was Slashdotted, sort of, and I took that in stride. I got Farked, but it was nothing I could handle [sic]. The goons at the Something Awful Forums found me, and my server started getting a workout. But then I got Wanged, and it all went to hell.

And "PWN3D" is l33tsp34k for "owned". But you knew that.

Posted by Mark Liberman at 07:21 AM

We cannot/must not understate/overstate ... ?

In the June issue of the Atlantic Magazine (not on line yet), Barbara Wallraff proposes a new explanation for the use of phrases like we cannot understate the importance of X when the writer seems to mean we cannot overstate the importance of X. At least, her explanation is new to me. When I first read it, her analysis didn't seem to make sense. But on reflection, I think she may be on to something.

In an earlier post, I related examples like cannot understate the importance of... to the hypothesis that it's hard for people to calculate the meaning of phrases involving negatives in combination with modals, scalar thresholds and so on. This interpretive difficulty explains why some phrases with semantically-backwards interpretations are hard to edit out -- it's hard to calculate what they actually mean, and they include pretty much the right words, and they're syntactically correct. In order to explain why the erring phrases are constructed in the first place, I suggested combining this interpretive difficulty with a sort of lego-block model of sentence construction -- take out an assortment of relevant tree-fragments from the lexicon, and fit them together until it looks OK. Sometimes another factor may be a sort of semantic gap, created by the fact that there is hardly ever any reason to want to express the idea that corresponds to the correct interpretation of the phrase in question.

Wallraff's explanation (The Atlantic, v. 293 no. 5, June 2004, p. 154) is completely different. She suggests that phrases like cannot understate are genuinely ambiguous, not just confusing:

Cannot understate and cannot overstate are like architectural elements in an M.C. Escher drawing: if you like, you can flip-flop them in your mind. The trick is done by cannot, which has two meanings. Think of Parson Weems's tale in which the young George Washington declared, "I can't tell a lie". Of course Washington was physically capable of uttering a false statement; by can't, he meant he chose not to. Can't, or cannot, can mean something very much like must not -- and if it means that, cannot understate the importance of makes sense.

This is a bit unclearly expressed, at best. I couldn't find a dictionary that gives must not as an alternative meaning for cannot, nor do I believe that it has such a meaning in general. When I say "I cannot jump 50 feet", there's no way to construe that as meaning "I must not jump 50 feet."

And I don't buy the analysis that when George Washington mythically confessed his guilt by saying "I cannot tell a lie", he really meant "I choose not to tell a lie." That turns a claim of essential moral character into an expression of situated existentialist choice.

However, a version of Wallraff's analysis can work. First, the George Washington quote points us helpfully towards the modality of moral obligation in place of the modality of logical necessity. And second, it's true of modal logics, of whatever kind, that ~◊A → □~A (if it's not possible that A, then it's necessary that not-A). I believe that this is the connection between can and must -- with interchanging scope of negation -- that Wallraff has in mind.
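For readers who want to see that connection checked rather than asserted, here is a minimal Python sketch. It assumes the standard possible-worlds reading of the modal operators (◊A is true at a world if A holds at some accessible world, □A if A holds at every accessible world), which is my gloss rather than anything Wallraff states, and it brute-forces every small model to confirm that "not possibly A" and "necessarily not-A" always agree. The function names are invented for the illustration.

```python
from itertools import product

def possibly(world, access, holds):
    """◊A at `world`: A holds at some world accessible from it."""
    return any(holds[v] for v in access[world])

def necessarily(world, access, holds):
    """□A at `world`: A holds at every world accessible from it."""
    return all(holds[v] for v in access[world])

def duality_holds(n_worlds=3):
    """Check ~◊A <-> □~A at every world of every model with n_worlds worlds."""
    worlds = range(n_worlds)
    pairs = [(u, v) for u in worlds for v in worlds]
    # Enumerate every accessibility relation over the worlds...
    for bits in product([False, True], repeat=len(pairs)):
        access = {
            u: [v for (x, v), on in zip(pairs, bits) if on and x == u]
            for u in worlds
        }
        # ...and every assignment of true/false to A at each world.
        for valuation in product([False, True], repeat=n_worlds):
            not_a = [not t for t in valuation]
            for w in worlds:
                if (not possibly(w, access, valuation)) != necessarily(w, access, not_a):
                    return False
    return True

if __name__ == "__main__":
    print(duality_holds(3))
```

The check makes no assumption about what kind of accessibility is involved, which is the point of the argument here: the same equivalence holds whether the accessible worlds are the logically possible ones or the morally acceptable ones, and only the interpretation differs.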

How does this help? Well, we can interpret "X cannot underestimate Y" as "it is not possible that X underestimate Y". This in turn is equivalent to "it is necessary that X not underestimate Y".

If this is logical necessity, we're right back where we started. If it's necessarily true that X is not underestimating Y, that means that no estimate X could be making of Y's value could possibly be too low, and that means that Y's value is negligible. This is just a more elaborate explanation of why expressions like "we cannot underestimate..." actually mean exactly the opposite of what people who use them usually think that they mean.

However, if we're talking about deontic necessity -- the logic of what ought to be -- then things are different. When we say that "it's morally imperative not to underestimate Y", we may mean that Y in fact has considerable value, and therefore it would be untruthful and even unfair to assign Y too low a value.

This still doesn't solve the whole puzzle. It applies only to cases involving "cannot" or "impossible", and not to the many other types of apparent overnegation, such as "fail to miss". And it doesn't seem to be the whole story even in the "cannot" or "impossible" cases, because when people say "we cannot understate the importance of X" (the phrase Wallraff is discussing), they seem to mean "we cannot overstate the importance of X" (because it is so great), not just "we're obliged not to understate the importance of X" (because it is not negligible). So I think we still need to appeal to the kind of explanation I (and others) have offered earlier, which depends crucially on the fact that it's psycholinguistically difficult to calculate the meaning of phrases combining negatives, modals and scalar predicates. However, the fact that phrases like "cannot overstate..." have an interpretation that is close to what is meant, rather than being completely the opposite of what is meant, may play a role as well.

[While we're on the subject, Alessandra Stanley wrote in today's NYT:

The challenge of creating weekly scripts that move seamlessly among six clearly defined principal characters cannot be underestimated.

]

[Note also that this discussion has nothing to do with the question of whether can can be used to mean "is permitted to". It's sometimes prescribed that may should refer to permission, and can should only refer to ability. This doesn't correspond to current usage, and in the discussion above, I assumed that can can refer to permission as well as ability, possibility and other forms of modality.]

Posted by Mark Liberman at 12:27 AM

Chinese Philadelphia Food

A few days ago I found a flyer under my door announcing the Grand Opening of a new Chinese restaurant around the corner, which evidently replaced the take-out pizza place that used to be there. For the most part the menu is like that of all the other take-out Chinese places in the area, but when I went over the appetizers I encountered some unfamiliar dishes. I had never before encountered 芝士士的. I suppose this doesn't make a lot of sense if you can't read Chinese, but it doesn't make much sense if you can read Chinese either. The first character means "a kind of magic fungus". The second and third mean "scholar, gentleman". And the fourth means "clear" or "a little". The trick is that this is a purely phonological spelling. If you read it in Cantonese, it comes out [ʧisisitik]: cheese steak, the Philadelphia speciality.

This was already heartening. The people who wrote the menu must be Cantonese speakers, or at least speakers of something other than Mandarin. The final /k/ is a give-away. Mandarin has lost final /p/, /t/, and /k/. We linguists are supposed to love all languages equally just as parents are not supposed to favor one child over another, but we have human feelings too. I grew up believing that Cantonese is proper Chinese, and I still do. Mandarin is the language of school teachers and government officials. It is useful to learn, but you wouldn't use it with your friends. Cantonese is the language of intimacy and real life. It used to be that the great majority of Chinese speakers in North America spoke Cantonese, but there has been a great influx of Mandarin speakers in recent years. Why, last year I went into another takeout place nearby and was dismayed to realize that the two cute young women behind the counter were speaking Mandarin. So it's nice to know that the new restaurant is holding up the side.

But it gets better. It isn't unknown for Chinese restaurants to serve some non-Chinese food, but this place has gone one step further. Not only can you get cheese steak, you can get 芝士士的巻 "cheese steak roll"!

As a linguist and connoisseur of Chinese food it was clearly my duty to check this out, so I ordered a couple. I can't say that it seemed terribly Chinese, but I have to admit, it's good, and it goes nicely with my favorite (闌記 brand) chili and garlic sauce. It consisted of a miniature cheese steak inside a spring roll casing, and amazingly, the casing was fried to perfection, neither soggy nor brittle and dried out. I rarely order spring rolls because very few restaurants cook them properly. I had to sacrifice the first one to my investigation, so here's a photo of the other one.

I have to warn you that these things probably contain enough cholesterol to kill a horse. The mention of such things here on Language Log is not meant to endorse their consumption by amateurs. Remember, we're trained professionals.

[Update 2004/05/11: Quite a few people want to know where to get 芝士士的巻. The place is called Evergreen and is located at 4726 Spruce Street. The telephone number is: 215-476-0371. Cheese steak rolls are only $1.20.]

Posted by Bill Poser at 12:10 AM

May 05, 2004

The Languages of the UN

While we're on the subject of international languages, the choice of languages at the United Nations is interesting. The original official languages were English, Chinese, French, and Russian, not coincidentally the languages of the permanent members of the Security Council. The choice was largely political. English had perhaps the strongest case. Not only was it already widely used as an international language, it was the dominant language of the United States, which had emerged as the greatest military and economic power. Chinese too was the language of a major power, as well as the most widely spoken language. Russian was the language of one of the major powers though not particularly widely spoken outside of the Soviet Union. French was chosen because it was still widely considered the international language of diplomacy. Spanish and Arabic were added in 1973, in both cases because they are the languages of a score of nations.

Although in theory all six languages have equal status, some languages are more equal than others. English, French and Spanish are the working languages of the General Assembly; English and French are the working languages of the Security Council. Public information is often not translated into Spanish, Arabic, and Chinese. This led to a protest in 2001 by the representatives of the Spanish-speaking countries.

Some people think that the UN spends too much money and effort on translation and interpretation and that it should adopt a single official language. Here is the proposal of the Transnational Radical Party and Esperanto International Federation that the United Nations adopt Esperanto as its official language. Others want to add official languages. There is pressure to add Hindi. This site advocates the adoption of Hindi as an official language of the UN. And here is a speech by the Indian Ambassador to the UN.

The irony in all this is that it appears to be purely symbolic. The sort of people likely to end up as diplomats or staff at the United Nations almost all speak English. When the UN surveyed its member nations as to which of the official languages they would prefer to receive correspondence in, 130 opted for English, 36 chose French and 19 Spanish. Not a single country preferred Arabic, Chinese, or Russian.

Posted by Bill Poser at 01:01 AM

May 04, 2004

Editor impresses

Kyrie O'Connor, deputy managing editor for features at the Houston Chronicle, has started publishing her daily memos to her staff as a weblog. In her May 4th MeMo, she remarks that "the idea of a 'linguist joke' is inherently appealing", thus endearing herself to linguists everywhere. After quick notes on "Zebra butt", repairman excuses and Lewis Black, she closes with this "Grammatical cranky rant du jour":

When did it become cooler to be an intransitive verb than a transitive verb? Was it right around the same time men stopped wearing socks with loafers? You know what I mean -- "The shiraz impresses". Meaning it impresses the writer, formerly known as "me". It's an implied "me". It irritates.

I'm impressed.

I stopped wearing loafers, with or without socks, early in my freshman year in college, so I won't even try to comment on the fashion coincidence. But as for the rest of it, I do know exactly what she means.

I'm not sure her analysis is entirely right, though. The construction "the shiraz impresses" seems to be elliptical, with an implied object complement that is not expressed, rather than being a genuinely intransitive verb. O'Connor gives both analyses (intransitive verb and ellipsis of the object), and I think that the second one is probably right and the first one is probably not. Cases similar to this have been called "null complement anaphora" (NCA) in the linguistics literature, but no one (as far as I know) has ever pointed out the snooty tone that such examples sometimes exhibit.

In particular, "X impresses" has definitely become part of winespeak:

(link) While brown is not necessarily the top colour this wine really impresses.
(link) The wine impresses through its full and alluring taste with a very pleasant smell that reminds of the plum, cherry or wild fruits savours, being completed by fine tannins.
(link) This harmonious wine impresses as much for its elegance as for its intense flavors.

In fact, on the first two pages of Google listings for the pattern {wine impresses}, 18 of 19 examples of the verb form impresses are examples of NCA, leaving out the direct object. In contrast, searching for just plain {impresses} has only 10 of 20 with that pattern, and {football impresses} has only 5.

I think that Ms. O'Connor is also slightly wrong about what the implied antecedent for the unexpressed object is. Her suggestion is that it's "the writer, formerly known as 'me'". But part of the reason that NCA fits winespeak so well is that the unexpressed object connects us, through the writer, to some ethereally sensed set of connoisseurs. What better way to communicate that the impressed experiencers are Those Whose Opinions Matter than by leaving it all implicit?

Despite this, the winespeak use of Null Complement Anaphora is somewhat verb-specific. With "persuades", for example, we do have

(link) ... complete, spicy, sensitive, but with racy acid as backbone, this wine persuades with reliable quality.

but that's the only example of null complement anaphora in the first 20 Google hits for {wine persuades}.

The pattern is also tense-linked to some extent. Out of the first 20 hits for the pattern "wine impressed", 5 were examples of NCA, while 12 had an overt object, and 3 were irrelevant structures of other kinds:

(link) The barrel-sample of this wine impressed us as being much like the '84 only more so.
(link) Lighter than Chardonnay, this wine impressed both Eric and Katie.

(link) As well as a great frog motif on top of the cork, this wine impressed with its delicate, sweaty, fresh nose giving way to green apples, freesias and a creamy butteriness with good acidity.
(link) Sadly, neither the dessert nor the wine impressed.

By the way, while we're indulging ourselves, shouldn't Ms. O'Connor's heading really have been "cranky grammatical rant du jour", not "grammatical cranky rant du jour"? Still, her brief paragraph is much more insightful than the many grammatically uninformed cranky rants du jour that I've read over the past few weeks, and so I'm not complaining a bit.

Posted by Mark Liberman at 07:15 PM

Avoiding Eurocentrism

Commenting on Bill Poser's observation that the new EU in principle requires 380 different pair-wise translators for its 20 official languages, Geoff Pullum pointed out that multi-linguality helps to fight the otherwise inexorable combinatorics of communication. Considering the 6,000-odd native languages of the world's population today, Geoff closed with the thought that "we are lucky that such a huge number... can use one of the languages of the great colonizing powers of the past few centuries: English, Spanish, French, Portuguese, Dutch, and Russian".
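The figure of 380 is just the number of ordered pairs of distinct languages, 20 × 19, since translating from A into B is a different job from translating from B into A. A minimal Python sketch of that arithmetic follows. The second function, which counts the directions needed if everything is routed through one shared language, is my own illustration of why a common language tames the combinatorics; it is not a figure given in either post, and both function names are invented for the example.

```python
def directed_pairs(n_languages: int) -> int:
    """Translation directions needed if every language is translated
    directly into every other: n * (n - 1) ordered pairs."""
    return n_languages * (n_languages - 1)

def pivot_directions(n_languages: int) -> int:
    """Directions needed if all translation goes via one shared
    language (an illustrative assumption): into and out of the
    pivot for each of the other n - 1 languages."""
    return 2 * (n_languages - 1)

if __name__ == "__main__":
    print(directed_pairs(20))    # 380, the figure cited above
    print(pivot_directions(20))  # 38
```

The direct count grows with the square of the number of languages, while the shared-language count grows only linearly, so the gap widens quickly as languages are added.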

Geoff is absolutely correct, of course, but I must hasten to forestall unmerited accusations of eurocentrism. It's not only the European powers who have facilitated international communication by spreading their national languages along with settlements and political control -- and we don't need to go back to the days of Darius or Asoka or the four Caliphs to find examples.

Today's New York Times has a story about the on-going spread of Arabic at the expense of the languages of Western Sudan, such as Fur and Daju. Elsewhere in Africa, the past few centuries have seen quite a few large-scale colonization projects besides those managed by Europeans. In West Africa, for example, we can point to the spread of Fulfulde and Hausa by the various Fulani jihads, or the spread of Akan languages via the rise of the Asante kingdom and the migration of the Baule people to Cote d'Ivoire, among other cases.

There are many other parts of the world where non-European colonization over the past few centuries has facilitated communication, either by driving smaller languages out of existence or by creating the preconditions for larger populations to learn the languages of the colonizers. The Turks took over Anatolia just a few years before Columbus voyaged to the Caribbean, and the Moghuls were colonizing southern India and the Deccan at about the same time that the Spanish and Portuguese were taking over Latin America. It's over the past few hundred years that the Vietnamese have spread south out of the Red River Valley near Hanoi, at the expense of Khmer and a wide variety of smaller languages. The same is true for Han Chinese colonization of Taiwan, at the expense of indigenous Austronesian languages and Portuguese. Japanese provision of wider communications opportunities to the Ainu peaked under the Hokkaido Colonization Commission, after the Meiji restoration in the late 19th century. Han Chinese colonization of Tibet, Inner Mongolia and Xinjiang continues to facilitate communication to this very day.

At this point, I can't resist quoting what Douglas Adams had to say about the Babel fish. You'll recall that according to the Hitchhiker's Guide to the Galaxy, the Babel fish

is small, yellow and leech-like, and probably the oddest thing in the Universe. It feeds on brainwave energy received not from its own carrier but from those around it. It absorbs all unconscious mental frequencies from this brainwave energy to nourish itself with. It then excretes into the brain of its carrier a telepathic matrix formed by combining the conscious thought frequencies with nerve signals picked up from the speech centres of the brain which has supplied them. The practical upshot of all this is that if you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language. The speech patterns you actually hear decode the brainwave matrix which has been fed into your mind by your Babel fish.

The Babel fish has played a complex and interesting role in Galactic culture, history and theology, but as Adams explains, the basic result is that the "Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation."

Posted by Mark Liberman at 03:53 PM

Get your boyfriend to move it: a speech perception story

[Public service announcement: Analysis of access logs for this site reveals that an extraordinarily large number of people — not that we know who you are, but we see what search strings led people here — find this page by conducting searches on the topic of finding a boyfriend (try handing Google how to get a boyfriend, for example: this page will quite likely be in the top five results). This public service notice is here to warn such readers that although the following story is worth reading because it is so funny, it will not assist you in any way in finding a boyfriend. We Language Log contributors have considered the boyfriend problem at length, and have decided that we don't even know where to send you for that. We do language stuff. Romance languages, maybe, but not romance.]

Santa Cruz-resident phonetician and speech scientist Caroline Henton told me this true story with much glee. A woman living in a house near the beach in Santa Cruz County recently called the animal rescue service to explain that there was a dead sea lion under her house that was beginning to rot and it was going to have to be removed. The voice on the other end of the line was apparently unmoved by a few hundred pounds of decomposing blubber:

"Don't you have a boyfriend who could move it for you?"

The caller, somewhat dumbfounded both by the sexism and the lack of concern for her plight, explained that she was between boyfriends at the moment.

"Well couldn't your father do it for you?"

The stunned caller said: "Umm, my father?"

"Look, all he's got to do is put it in a cardboard box," the stubbornly reluctant animal rescue operative insisted.

"Er... I don't think a cardboard box any smaller than a full-size refrigerator carton would accommodate it, and even then, I don't see how you could drag it out from under there... I mean, the thing probably weighs three or four hundred pounds."

It was the turn of the animal rescue service for a moment of stunned, uncomprehending silence.

"Three or four hundred pounds?"

"Yes," said the increasingly annoyed caller; "It's a full-grown sea lion, it's enormous."

And then at last, despite all the obstacles natural language and telephony could interpose, communication started to occur:

"Oh, a sea lion! I thought you said a dead feline."

Posted by Geoffrey K. Pullum at 03:16 PM

From black to black

Following up on this post, Julia Moore emailed with another story about cross-dialect misunderstanding:

I just read your Language Log post about puzzling accents. When I came across this line,

"Bill Labov has some impressive demonstrations of Chicago-area speech in which "socks" sounds to outsiders like"sacks", "block" sounds like "black", "steady" sounds like "study", "head" sounds like "had", and so on"

immediately I remembered a time that this happened to me.

I live in Chicago, and one night I was watching a news report about the successful capture of a man who had been robbing deaf people in a particular Chicago neighborhood. The reporter interviewed a Chicago police officer who had a strong Chicago accent as part of the segment. She (the officer) was describing how the police caught the guy and said, "We just went from black to black until we caught the guy."

I was horrified. It appeared to be a brazen admission of racial profiling in detective work caught on videotape. Then I realized that what she had actually said was, "We just went from BLOCK to BLOCK until we caught the guy." Whew!

Posted by Mark Liberman at 02:06 PM

When's the Last Time You Heard an Old Person Say "Dadburn It"?

An old Bugs Bunny cartoon of 1944, THE OLD GREY HARE, depicts Bugs and Elmer Fudd as old men going through their usual antics with canes, gray beards, spectacles and the shakes. But these aren't the only traits indicating their having reached their twilight years. Bugs, as an oldster, talks in a hillbilly accent.

But Bugs Bunny as a young "man" spoke in a Brooklyn/Bronx patois. Why would he have shifted into an alien moonshine dialect as he got older?

This was no random occurrence chez the Looney Tunes crew. One sees this kind of thing again and again in pop culture of that era. Old people are very often depicted as talking like the Beverly Hillbillies even when the people around them use mainstream standard American.

In a 1932 musical film STRIKE ME PINK with Eddie Cantor and Ethel Merman, in one song near the end they are transformed into oldster versions of themselves (never mind why), and suddenly they are cackling along in "Consarn it!" accents that neither of the urban characters they were playing in the film used.

In the old radio hit FIBBER MCGEE AND MOLLY, a cherished character in the late thirties and early forties was "The Old Timer," who would always pop by telling tall tales ushered in by his catchphrase "That ain't the way I heerd it!" The Old Timer sounded like an old-time gold prospector -- but everyone else on the show, which took place in generic Wistful Vista, Illinois, spoke generic Midwestern Whatever.

The 1949 Looney Tune THE WINDBLOWN HARE is a parody of the Little Red Riding Hood tale. Bugs talks like Bugs, the Wolf talks like Bluto in the Popeye cartoons, but Granny talks, once again, like she grew up in the fastnesses of West Virginia. The Wolf is so focused on catching Bugs that he barely has time to acknowledge Granny as per the progression of the story that he has already read in a book. As he hastily shoves her out of the house, Granny exclaims "Land sakes, aintcha gonna eat me?" As the Wolf pushes her out the door she continues "Can't a body get her shawl tied?"

And the Looney Tunes squad had pulled the same thing in an earlier Red Riding Hood parody in 1937, LITTLE RED WALKING HOOD. This time Red talked like Katharine Hepburn, but Granny again had an Ozark accent, as well as a stereotypical fondness for her nip, furtively ordering gin over the phone.

This kind of thing was so common in American pop culture before 1950 that I would venture that I got a sense of the contours of hillbilly dialect (in caricatured form, to be sure) from these depictions of old people in the cartoons and old movies that were still staples on UHF as I grew up. I recall an afternoon in high school in 1980 when, joking with some friends, I passingly slid into such an accent depicting a person in their old age -- you know, "Sonny" and such. One guy joshingly objected "How come when he got old he would start talking in a Southern accent?" It struck me. He was right -- what kind of sense did this make?

But I wasn't alone. A few years later, a girlfriend of mine was given to road-rage moments when stuck behind a slow-driving old person where she would grouse "You can turn now, GRANDPAW!" But why the Li'l Abner appellation? Whence her natural sense that an oldster in 1985 was GRANDPAW rather than GRANDPA? I highly suspect she picked it up from reruns on the tube like I did.

It occurs to me that this way of depicting seniors' speech in the old days may have reflected a demographic reality.

1930's was the first census that revealed more Americans living in cities than in the country. Until then, for Americans, rural life was default. The City was the challenging, debauched setting depicted in tragic novels by Theodore Dreiser. The Country was the real America, such that Sinclair Lewis could write MAIN STREET about Carol Kennicott suffering the boredom of little Gopher Prairie, Minnesota and be feted as capturing "America" itself. If Sherwood Anderson wrote about the underbelly of small town America in WINESBURG, OHIO, this was news.

In 2004, things are quite different. The notion of the city as unhealthy for one's morals is antique. We pity urban residents dealt a bad hand, but hardly suppose that they would be best off relocating to Gopher Prairie -- we assume that The City should be made a better place for them. And every second short story in THE ATLANTIC shows us small-town folk underemployed, understimulated and on the brink of divorce. To us, "America" writ large is a list of the big cities, with the other places "in between," just as Germany is Berlin, Munich, and Cologne, not Regensburg, Wiesbaden and Siegen.

But for Americans in the thirties and forties, America's transition from being a rural nation to an urban one was as recent a phenomenon as the internet is to us. Presumably, it was quite common that old people had grown up in the country, but had moved to cities to raise their kids. Surely speech patterns reflected this. As such, it would have struck an intuitive chord to American audiences for age to be indexed with a backwoods accent, shorthand-style, in pop culture depictions.

Today this would make no sense. We do not spontaneously sense a person past sixty living in Philadelphia, Chicago or San Francisco as talking like Dolly Parton or Jeff Foxworthy. But the demographic tipping point in 1930 helps make sense of a tendency in the entertainment of the era that, otherwise, is intriguingly mystifying. Diane Keaton will be sixty in a couple of years -- and yet I doubt that in SOMETHING'S GOTTA GIVE, PART TWO she will be saying things like "Aintcha gonna eat me?"

Posted by John McWhorter at 01:04 AM

May 03, 2004

Puzzling accent story of the day

Here's another argument for teaching the International Phonetic Alphabet in secondary school. Patrick Belton at OxBlog posts a story (attributed to Josh Cherniss) about a Yale professor "with a strongly Southern accent" who is said to have told the students in a course on Faulkner to focus their exam preparation on "Sarah Dang". This turned out to be the professor's way of saying "As I lay dying." I'm skeptical, unless his "Southern accent" was from the southern Ryukyu Islands.

We've recently featured several cases where otherwise perceptive people stumble when they try to think or write about speech sounds. Sometimes acute observations are wrapped in incoherent nomenclature, as in Leon Wieseltier's discussion of "g-dropping". Sometimes a writer seems to get confused even about how to categorize sounds, not just how to describe them, as when the New Yorker's Jake Halpern mis-describes Nelly's "herre" for here as "hurr", with the same sound as "thurr" for there. In Belton's OxBlog post, phonetic ignorance spoils a joke.

Perception across accents of American English can certainly result in mistakes. Bill Labov has some impressive demonstrations of Chicago-area speech in which "socks" sounds to outsiders like "sacks", "block" sounds like "black", "steady" sounds like "study", "head" sounds like "had", and so on. Various southern-states varieties of English certainly have similar potential to be misunderstood by outsiders. But the case that Patrick Belton describes doesn't make sense to me.

The claim is that the professor said "As I lay dying", but was heard as saying "Sarah Dang". An American southerner might elide the vowel in "as", but he'd still have a voiced [z] for the final consonant in that word; he'd have a monophthong [ɐ] for the pronoun "I", but there's no way he'd turn the [l] of "lay" into an [r]; he might also have a relatively monophthongal vowel [e] in "lay"; but he'd also probably pronounce the "-ing" ending as [ɪn]. So the whole thing would come out in the IPA as something like

[ˌzɐleˈdɐjɪn]

which might plausibly have been heard as a pronunciation of the name "Zaleh Dayen", but hardly "Sarah Dang".

When Jake Halpern's New Yorker article falsely described Nelly pronouncing "herre" as "hurr", I imagine that was a slip of Halpern's memory. Nelly spells "here" in a non-standard way in the title of a famous song; he pronounces "there" (and other words in the same set) in a non-standard way, rhyming with (the general American pronunciation of) burr; the standard spelling of "here" and "there" makes them orthographic rhymes, so to speak; and Halpern just made up the rest, probably without realizing it, though obviously also without caring much one way or the other about phonetic accuracy.

I suspect that the "As I lay dying" to "Sarah Dang" story is similar. I can believe that some Yale students once had trouble understanding one of their professors; Josh Cherniss may well have been one of them; I bet that the rest is some amalgam of the original misperception with various layers of mis-remembering, story-telling and re-telling down the chain to Patrick Belton's post.

One sleep-deprived undergrad's slip of the ear? Maybe. His reconstruction of the experience, telling the story years later? Even more likely. An accurate account of cross-accent misinterpretation, by a whole class, of the sounds that a professor from the American south actually produced? I doubt it.

Of course, an audio clip of an American southerner -- or even a plausible imitation of a southerner -- saying "As I lay dying" in a way that sounds sort of like "Sarah Dang" would help convince me. There are lots of ways of talking that could be described as "southern", and maybe I'm not thinking of the right one.

[Update: the Oxblog post has been updated to change "Sarah" to "Sally", which makes the first part of the misunderstanding much more plausible.

It's still pretty hard to make sense of the "Dang" part. Here's a thought, though: maybe the speaker actually pronounced "dying" in the common, standard way, with a full rising diphthong and a velar nasal -- IPA [dɐjɪŋ] or [dɐʲŋ]. However, the listener(s) expected him to say "dang" (as in "dang nab it") in something like that way -- maybe more like [dʌ^æŋ] -- and so mis-interpreted a perfectly standard rendition of "dying" as a drawled, countrified southern rendition of "dang".

If that's the line of thinking that led to this story, then I'm even more convinced that somebody may have (re)constructed the whole thing long after the event, based on the same kind of half-remembered, half-confused associations that seem to have operated in Jake Halpern's New Yorker story.]

Posted by Mark Liberman at 11:32 AM

God bless the multilinguals

Bill Poser's observation that there are 380 pairs of languages used in the European Community as from May 1st is of course obtained by the formula n² - n (multiply the number of languages, n, by itself, and subtract n to get rid of the cases where the two members of the pair are the same — you don't need French to French translation). Could the EU assume that any Latvian/Maltese interpreter or translator they find will be able to do Maltese/Latvian as well? Maybe (though Bill points out to me that translators usually work only in one direction, and not all simultaneous interpreters can do both directions unless they are fully balanced bilinguals with experience of translating in both directions). If we could assume bidirectionality, the EU would only be looking for (n² - n)/2 = 190 kinds of interpreter and/or translator. That still seems a lot.

In case anyone wondered, for the entire world (imagine a future world government with every distinct language community represented) we can assume the number of languages is roughly the number listed in the Ethnologue produced by the Summer Institute of Linguistics: 6,809. And setting n = 6,809 we get n² - n = 46,355,672 different types of interpreter or translator. Even if we assume bidirectional abilities we get (n² - n)/2 = 23,177,836 types. So if the entire population of Australia — men, women, boys, girls, and babies — were all bidirectional interpreter/translators of different types, that still wouldn't be anywhere near enough to give us one of each type.
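For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The function names are my own invention for illustration, and the only inputs are the two language counts mentioned above (20 for the EU, 6,809 from the Ethnologue).

```python
def directed_pairs(n: int) -> int:
    """Ordered (source, target) pairs of distinct languages: n^2 - n."""
    return n * n - n


def bidirectional_pairs(n: int) -> int:
    """Unordered pairs, assuming one person covers both directions: (n^2 - n) / 2."""
    # n^2 - n = n(n - 1) is always even, so integer division is exact.
    return directed_pairs(n) // 2


# Language counts taken from the text; the labels are just for printing.
for label, n in [("EU", 20), ("Ethnologue", 6809)]:
    print(f"{label}: n = {n:,}, "
          f"one-way = {directed_pairs(n):,}, "
          f"two-way = {bidirectional_pairs(n):,}")
```

Both counts grow roughly with the square of n, which is why going from 20 languages to 6,809 takes us from a few hundred kinds of translator to tens of millions.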

We are so lucky that such an enormous number of the people in the world have taken the trouble to become bilingual or multilingual. And more to the point (since for an Abkhaz speaker to also know Zulu wouldn't help much for most of us) we are lucky that such a huge number of those people can use one of the languages of the great colonizing powers of the past few centuries: English, Spanish, French, Portuguese, Dutch, and Russian. God bless the multilinguals, the people (like John Kerry) who have put in the effort to become fluent in a foreign language so that a lot of other people don't have to.

Posted by Geoffrey K. Pullum at 12:56 AM

May 02, 2004

The Languages of the European Union

The addition of ten new members to the European Union has added nine languages, for a total of twenty. That makes 380 language pairs. For many of them, it is going to be hard to find translators and interpreters. How many people, for example, can translate from Latvian into Maltese? According to this article in today's New York Times, the EU is now looking for translators and interpreters for the newly added languages.

You might think that, at least in writing, the problem would not be so bad because machine translation can be done by means of an intermediate representation independent of the particular languages. In this case, adding a language means adding translation between that language and the intermediate representation, not adding translation between the new language and all of the other languages. Such interlingual machine translation systems have been studied for many years, but the dominant view still favors transfer systems, in which languages are translated pairwise in order to take advantage of detailed knowledge of the correspondence between the two languages. The MT system currently in use by the EU is a transfer system. According to this report [PDF document], that is what it expects to continue to use for the foreseeable future.
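The difference in bookkeeping between the two designs can be sketched in a few lines of Python. This is only a counting exercise under simplifying assumptions of my own: one translation component per direction, two components per language for the interlingual design (into and out of the intermediate representation), and hypothetical function names. It says nothing about how the EU's actual system is organized internally.

```python
def transfer_components(n: int) -> int:
    """Pairwise (transfer) design: one component per ordered language pair."""
    return n * (n - 1)


def interlingua_components(n: int) -> int:
    """Interlingual design: one analyzer and one generator per language."""
    return 2 * n


def cost_of_adding_language(n: int) -> tuple[int, int]:
    """New components needed when language n+1 joins, as (transfer, interlingua)."""
    return (transfer_components(n + 1) - transfer_components(n),
            interlingua_components(n + 1) - interlingua_components(n))


n = 20  # the EU's official languages, per the text
print("transfer:", transfer_components(n))        # one per language pair
print("interlingua:", interlingua_components(n))  # two per language
print("adding a 21st language:", cost_of_adding_language(n))
```

Under these assumptions the pairwise design needs new work for every existing language each time one is added, while the interlingual design needs only the two components for the newcomer.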

In speech, the need for interpreters is reduced by the fact that most EU politicians and staff speak English, French, or German, with English now the dominant language. According to the New York Times article, more than 90% of European high school students now study English. French is studied by only 29 percent in Germany, 27 percent in Italy and 24 percent in Spain. German is still widely studied in Central Europe, but is studied by only 31 percent in France, 8 percent in Italy and 1 percent in Spain.

Posted by Bill Poser at 07:47 PM

Indigoed in Pearlspace

Geoff Pullum's spectacularly negative review of The Da Vinci Code makes me wonder, again, whether someone in the publishing industry might have opened a portal to a parallel universe, where stylistic and linguistic norms have developed a bit differently from the way that they have in ours. I first had this disturbing thought while reading another popular recent thriller, The Dante Club by Matthew Pearl. I enjoyed this book's plot, I think -- but I'm not sure, because once or twice on every page, I got distracted by its language.

There are a couple of examples in the book's first two sentences:

John Kurtz, the chief of the Boston police, breathed in some of his heft for a better fit between the two chambermaids. On one side, the Irish woman who had discovered the body was blubbering and wailing prayers unfamiliar (because they were Catholic) and unintelligible (because she was blubbering) that prickled the hair in Kurtz's ear; on the other side was her soundless and despairing niece.

The use of "breathed in some of his heft" in place of the more commonplace "sucked in his stomach" didn't faze me, but the idea that the woman's prayers "prickled the hair in Kurtz's ear" brought me up short. In our universe, hairs on the back of one's neck are often said to prickle, as a way of describing the response of arrector pili muscles to adrenaline; and old men get hair growing in their ears... but does ear hair prickle? and if so, would blubbering and wailing make the hair in the ear on just one side prickle? I'm not sure, but at this point I had sailed through another couple of paragraphs without registering their content, and had to go back and start over.

For the first few dozen pages, I figured that Pearl was just trying to give his prose a 19th-century tone by using awkward constructions, making up unexpected figures of speech, and substituting rare words for common ones. But then I came across a phrase on p. 45 that suggested a more sinister explanation:

In the lobby of the police station in Court Square, Nicholas Rey looked up from his notepad, squinting at the gaslight after a long engagement with a sheet of paper. A hefty bear of an indigoed uniformed man, swaying a small paper parcel as if it were an infant, waited in front of his desk.

"An indigoed uniformed man"? Now, I'm a linguistic libertarian, but that seems flat-out ungrammatical to me.

In the first place, the past participle indigoed is unexpected, independent of context. Indigoed is unexpected partly because indigo, in our universe, is used as a noun or adjective but not as a verb; and it's also unexpected because the adjective indigo would work fine here, if a fancier word for "blue" is needed.

The American Heritage Dictionary defines indigo the noun as

1a. Any of various shrubs or herbs of the genus Indigofera in the pea family, having odd-pinnate leaves and usually red or purple flowers in axillary racemes. b. A blue dye obtained from these plants or produced synthetically. 2. Any of several related plants, especially those of the genera Amorpha or Baptisia. 3. The hue of that portion of the visible spectrum lying between blue and violet, evoked in the human observer by radiant energy with wavelengths of approximately 420 to 450 nanometers; a dark blue to grayish purple blue.

and goes on to observe that indigo can also be an adjective, with the obvious range of meanings. The AHD doesn't recognize that indigo can be a verb -- and the OED doesn't either -- because (in our universe) it isn't one very often. Of course, in English any noun can be verbed, and so Google's index does have 26 examples of indigoed to go with its 3,350,000 mentions of indigo. These include someone who is worried about whether indigoed henna will oxidize, Neil who is "pretty much Indigoed Out" over the Indigo Girls, and some deeply purple "poetry" where someone's "lovemaking" is described as "[r]eigning unaltered within the passion of deep-indigoed space".

But what we have here is not just a bit of over-empurpled prose -- Pearl's use of indigo in the phrase "indigoed uniformed man" has got a more serious problem. It's not the man who is indigoed (or indigo, or just plain "blue"), it's his uniform. The structure of this phrase has to be something like

       (((indigoed uniform) +ed) man)
            ADJ     N       AFF   N

In our universe, English pretty freely allows phrases of the form MODIFIER NOUN1+"ed" NOUN2, taken to mean "NOUN2 with (a) MODIFIER NOUN1". Here are a few examples googled for the occasion:

Rudolph the Red-Nosed Reindeer
long haired hamster
white skinned potato
big trunked tree
small-brained reptiles
scarlet-cloaked angel
clean uniformed workers
black-shoed bohos

The problem is that the MODIFIER can't be a past participle:

*pressed uniformed policeman
*burned nosed tourists
*polished-shoed dancers

This is not just my opinion, it's also Google's. At least, I can't find any hits at all for obvious guesses like "burned nosed", "sunburned nosed", "tanned legged", etc. And "purpled prose" gets 83 ghits, "purple prosed" gets 137, but "purpled prosed" is unknown to Google.

Now, The Dante Club's front matter tells us that "Matthew Pearl graduated from Harvard University summa cum laude in English and American literature in 1997, and in 2000 from Yale Law School". I ask you, is it likely that a person with that background would be so insensitive to the norms of the English language?

No, a much more plausible hypothesis is that Pearl graduated from a slightly different Harvard University, in a universe slightly different from our own, and read a body of English and American literature that is also just a bit different. In particular, I hypothesize that in Pearl's universe, indigoed became a commonplace word for "blue" back in the 17th century, and by now is just an underived adjective. A parallel example in our universe would be bespoke, which started out as a past participle of bespeak, but has by now been adopted simply as an adjective, and so occurs in phrases like these:

(link) ... Duncan Palmer, general manager of the Sukhothai, is shrugging bespoke-suited shoulders at what would normally be a terrible occupancy rate of 35% ...
(link) Strolling among the bespoke-suited businessmen and their designer-clad wives, the candidate is something of a celebrity.
(link) ... Da Kingdom's Ambassador Extraordinary & Plenipotentiary & Consul General to the United States, a pudgy, bespoke-suited, Habanos-smoking dude named His Royal Highness Prince Bandar bin Sultan...
(link) ... bespoke-coded dynamic worldwide stockists locator... [via David Nash]

As a result, "indigoed uniformed man" is grammatical in Pearlspace, just like "blue-uniformed man" is for us.

[Note: a few examples of V+ed N+ed N can be found elsewhere in our own universe, as in "dyed haired women" or "braided haired men". Perhaps in Pearl's home universe, this pattern has spread further. I'd hate to revert to the much more prosaic theory that Pearl just systematically substituted fancier words for plainer ones, as one of my friends in junior high school used to do, and didn't notice in this case that the resulting sentence was ungrammatical.]

[Update: Keith Ivey emailed:

Does the rule prohibiting "burned-nosed tourists" really concern past participles in general, or only "-ed" past participles?
I see your point about "bespoke" no longer being a past participle, but Google has examples of similar phrases that do use what seem to me to be past participles:
broken-toothed smile/comb/windows
sunken-chested wreck/man/Matthew Broderick
shorn-headed madwoman/model/bassist
frozen-smiled gals/fiends/car salesmen
There's even one occurrence of "burnt-nosed boarders" and one of "Burnt-nosed Werdläur".
But maybe all of those are a little more adjective-like in context than the average past participle. A chest can be very sunken or a nose very burnt. I don't know that a head can be very shorn, though. Still, there seem to be quite a few occurrences of "closed-mouthed" with various nouns, so maybe there's no difference between "-ed" and other past participles after all. At any rate, I think all of these are less adjective-like than "bespoke", but then there are far more past participles that don't seem to work.

There is probably a section in CGEL that clears all this up -- at least with respect to our universe -- but I haven't been able to find it yet. My only real complaint about that estimable book is that no one seems to have been able to figure out a decent indexing scheme for this kind of thing.]

Posted by Mark Liberman at 09:15 AM

May 01, 2004

The Dan Brown code

Approximately three people still haven't read Dan Brown's The Da Vinci Code: Mark Liberman, David Lupher, and reportedly at least one other person (as yet unidentified).*

Regrettably, neither Barbara nor I are able to claim that the third non-reader is one of us. What can I say by way of excuse for this? I found the book was on sale really cheap in CostCo when we were about to leave on a trip to Europe. I bought it for the long, long flights that lay ahead of us, without knowing much about it except that it was supposed to be an intellectual mystery with cryptography and symbology and stuff and the blurbs said it was great. I didn't open it, I just grabbed one off a pallet of about 500 copies. Barbara was between mysteries at the time, so she grabbed it from me and rapidly read it over the next couple of days before we even left for the airport. I asked hopefully what it was like. She scowled and said something about the Hardy Boys. My heart sank; I understood her to mean it was pathetic but possibly of interest to the 11-year-old market. By the time we were on our plane she had made sure that her flight bag contained a new novel by Menking Hannell, and over southern Oregon she told me it was great as usual. Unfortunately I had no better idea of what to do with my time, so I opened The Da Vinci Code.

I am still trying to come up with a fully convincing account of just what it was about his very first sentence, indeed the very first word, that told me instantly that I was in for a very bad time stylistically.

The Da Vinci Code may well be the only novel ever written that begins with the word renowned. Here is the paragraph with which the book opens. The scene (says a dateline under the chapter heading, 'Prologue') is the Louvre, late at night:

Renowned curator Jacques Saunière staggered through the vaulted archway of the museum's Grand Gallery. He lunged for the nearest painting he could see, a Caravaggio. Grabbing the gilded frame, the seventy-six-year-old man heaved the masterpiece toward himself until it tore from the wall and Saunière collapsed backward in a heap beneath the canvas.

I think what enabled the first word to tip me off that I was about to spend a number of hours in the company of one of the worst prose stylists in the history of literature was this. Putting curriculum vitae details into complex modifiers on proper names or definite descriptions is what you do in journalistic stories about deaths; you just don't do it in describing an event in a narrative. So this might be reasonable text for the opening of a newspaper report the next day:

Renowned curator Jacques Saunière died last night in the Louvre at the age of 76.

But Brown packs such details into the first two words of an action sequence — details of not only his protagonist's profession but also his prestige in the field. It doesn't work here. It has the ring of utter ineptitude. The details have no relevance, of course, to what is being narrated (Saunière is fleeing an attacker and pulls down the painting to trigger the alarm system and the security gates). We could have deduced that he would be fairly well known in the museum trade from the fact that he was curating at the Louvre.

The writing goes on in similar vein, committing style and word choice blunders in almost every paragraph (sometimes every line). Look at the phrase "the seventy-six-year-old man". It's a complete let-down: we knew he was a man — the anaphoric pronoun "he" had just been used to refer to him. (This is perhaps where "curator" could have been slipped in for the first time, without "renowned", if the passage were rewritten.) Look at "heaved the masterpiece toward himself until it tore from the wall and Saunière collapsed backward in a heap beneath the canvas." We don't need to know it's a masterpiece (it's a Caravaggio hanging in the Louvre, that should be enough in the way of credentials, for heaven's sake). Surely "toward him" feels better than "toward himself" (though I guess both are grammatical here). Surely "tore from the wall" should be "tore away from the wall". Surely a single man can't fall into a heap (there's only him, that's not a heap). And why repeat the name "Saunière" here instead of the pronoun "he"? Who else is around? (Caravaggio hasn't been mentioned; "a Caravaggio" uses the name as an attributive modifier with conventionally elided head noun "painting". That isn't a mention of the man.)

Well, actually, there is someone else around, but we only learn that three paragraphs down, after "a thundering iron gate" has fallen (by the way, it's the fall that makes a thundering noise: there's no such thing as a thundering gate). "The curator" (his profession is now named a second time in case you missed it) "...crawled out from under the canvas and scanned the cavernous space for someplace to hide" (the colloquial American "someplace" seems very odd here as compared with standard "somewhere"). Then:

A voice spoke, chillingly close. "Do not move."

On his hands and knees, the curator froze, turning his head slowly.

Only fifteen feet away, outside the sealed gate, the mountainous silhouette of his attacker stared through the iron bars. He was broad and tall, with ghost-pale skin and thinning white hair. His irises were pink with dark red pupils.

Just count the infelicities here. A voice doesn't speak — a person speaks; a voice is what a person speaks with. "Chillingly close" would be right in your ear, whereas this voice is fifteen feet away behind the thundering gate. The curator (do we really need to be told his profession a third time?) cannot slowly turn his head if he has frozen; freezing (as a voluntary human action) means temporarily ceasing all muscular movements. And crucially, a silhouette does not stare! A silhouette is a shadow. If Saunière can see the man's pale skin, thinning hair, iris color, and red pupils (all at fifteen feet), the man cannot possibly be in silhouette.

Brown's writing is not just bad; it is staggeringly, clumsily, thoughtlessly, almost ingeniously bad. In some passages scarcely a word or phrase seems to have been carefully selected or compared with alternatives. I slogged through 454 pages of this syntactic swill, and it never gets much better. Why did I keep reading? Because London Heathrow is a long way from San Francisco International, and airline magazines are thin, and two-month-old Hollywood drivel on a small screen hanging two seats in front of my row did not appeal, that's why. And why did I keep the book instead of dropping it into a Heathrow trash bin? Because it seemed to me to be such a fund of lessons in how not to write.

I don't think I'd want to say these things about a first-time novelist, it would seem a cruel blow to a budding career. But Dan Brown is all over the best-seller lists now. In paperback and hardback, and in many languages, he is a phenomenon. He is up there with the Stephen Kings and the John Grishams and nothing I say can conceivably harm him. He is a huge, blockbuster, worldwide success who can go anywhere he wants and need never work again. And he writes like the kind of freshman student who makes you want to give up the whole idea of teaching. Never mind the ridiculous plot and the stupid anagrams and puzzle clues as the book proceeds, this is a terrible, terrible example of the thriller-writer's craft.

Which brings us to the question of the blurbs. "Dan Brown has to be one of the best, smartest, and most accomplished writers in the country," said Nelson DeMille, a bestselling author who has himself hit the #1 spot in the New York Times list. Unbelievable mendacity. And there are four other similar pieces of praise on the back cover. Together those blurbs convinced me to put this piece of garbage on the CostCo cart along with the 72-pack of toilet rolls. Thriller writers must have a code of honor that requires that they all praise each other's new novels, a kind of omerta that enjoins them to silence about the fact that some fellow member of the guild has given evidence of total stylistic cluelessness. A fraternal code of silence. We could call it... the Da Vinci code; or the Dan Brown code.

_____________

*The third non-reader was unknown when this post was first drafted, but it has since been edited, and as of today (May 2, 2004) I can confirm that Bill Poser and Danny Yee are both claiming not to have read The Da Vinci Code. Fair enough. So at least four people have not read it. I just wish one of them was me.

[Update -- Additional Language Log posts about Dan Brown's novels and related topics:

"The sixteen first rules of fiction" (May 15, 2004)
"Dan Brown still moving very briskly about" (November 4, 2004)
"Renowned author Dan Brown staggered through his formulaic opening sentence" (November 7, 2004)
"Oxen, sharks, and insects: we need pictures" (November 8, 2004)
"Thank God for film: Dan Brown without the writing" (December 2, 2004)
"Learning the ropes in the trenches with Dan Brown" (July 14, 2005)
"Don't look at their eyes!" (July 19, 2005)
"A five-letter password for a man obsessed with Susan" (September 10, 2005)
"Some striking similarities" (May 15, 2006)
"Is Mark Steyn guilty of plagiarism?" (May 15, 2006)
"Cutting in line: what would Of Nazareth do?" (May 16, 2006)
"A tale of two copiers" (May 17, 2006)

]

Posted by Geoffrey K. Pullum at 03:43 PM

Speakers vs. hearers

It's an old idea that speech and language are a compromise between the need for a clear message and the desire to save effort and time. For example, word frequency and word length are inversely correlated -- common words tend to be short words -- so that a pronouncing dictionary is a rough sort of Huffman code.
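
To make the Huffman analogy concrete, here is a minimal sketch in Python. It builds a Huffman code over a tiny vocabulary and reports how many bits each word's codeword would take. The word counts are invented for illustration (they are not real corpus figures), and the function name huffman_code_lengths is my own; the only point is that the standard construction hands the shortest codes to the most frequent items, much as common words tend to be short.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: codeword length in bits} for a Huffman code over freqs."""
    tiebreak = count()  # unique integers, so the heap never has to compare dicts
    # Each heap entry is (total frequency, tiebreaker, {symbol: depth so far}).
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol inside them
        # moves one level deeper, i.e. gains one bit of codeword length.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical counts per million words, made up for this example.
toy_counts = {"the": 60000, "of": 30000, "dog": 150, "renowned": 10}
lengths = huffman_code_lengths(toy_counts)

for word, bits in sorted(lengths.items(), key=lambda item: item[1]):
    print(f"{word}: {bits} bit(s)")
```

With these toy counts, "the" gets a 1-bit codeword and "renowned" a 3-bit one. A real lexicon is only roughly like this, of course, since word forms are shaped by many pressures besides frequency.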

The evolution of word pronunciation is a large-scale process, emerging from millions of particular communicational transactions between individual speakers and hearers. The same must be true for all of the other norms of speech and language. But each individual utterance is still a sort of compromise, a complex optimization on many dimensions -- how much to assume and how much to explain? what words to choose and how to combine them? what order to put things in? how fast and loud to talk? how carefully to articulate?

It's often assumed that speakers make these choices in a way that gives a lot of weight to the needs of their listeners. After all, the point is to be understood, right?

However, there's some evidence that this is often false. A recent contribution is "Avoiding Attachment Ambiguities: the role of Constituent Ordering", by Jennifer Arnold, Tom Wasow, Ash Asudeh and Peter Alrenga (to appear in Journal of Memory and Language). They studied the choice of structures and order of constituents in potentially ambiguous English sentences with both a direct and an indirect object, like "John showed the letter to Mary to her mother."

They show that speakers don't make the choices that would make things easier for listeners (and that the speakers themselves prefer when they are put in the listener's role). Instead, speakers act in their own interests, making choices that decrease the cognitive load of sentence planning.

I believe that non-specialists will find the paper easy to follow, and outsiders to psycholinguistics may find it thought-provoking for two different sorts of reasons.

First, there's the issue of how to model language evolution. There's a sort of economic problem here -- given that the utilities of cooperative speakers and listeners are different (as Arnold et al. show), how does optimization of communication continue to exert an influence, at least at a large scale, on the ways that languages develop? I don't mean that it's hard to think of ways to make this work out -- on the contrary, there are lots of choices, and the point is to explore them theoretically and empirically.

Second, there are obvious implications for teaching people how to communicate, in writing as well as in speaking. Arnold et al. point out that

Language production involves generating an utterance from a non-linguistic message. The message is never ambiguous to the speaker; the only way to identify the ambiguity is to consider how someone else would interpret the message in the current context. This would require passing the planned utterance through the comprehension system, while ignoring the known intended meaning. The production system would have to be sensitive to the degree of temporary parsing difficulty associated with an ambiguous prepositional phrase, and use that information to drive decisions about ordering and prosody. It is not clear that the language production system is built to handle this kind of task. Although language production clearly involves monitoring at some level..., the clearest application of these monitors is to the process of identifying and correcting errors. Ambiguities are not errors per se, and may require more sophisticated machinery for identifying them.

The same asymmetry between producers and consumers holds for other problems besides structural ambiguity -- reference resolution is another obvious example. Effective speakers and writers need to learn to overcome this asymmetry, or at least to compensate for it in some way.

Posted by Mark Liberman at 10:50 AM