February 29, 2008

Students May Speak Spanish on School Bus

In response to a letter from the ACLU, the Esmeralda County school district has rescinded the requirement, which I commented on a while back, that students speak only English on school buses.

Posted by Bill Poser at 07:44 PM

Doomed by poor spelling and rampant racism etc.


Stanford Daily columnist Nat Hilliard writes today, in a column entitled "Good riddance" (p. 4), about the government of Pakistan's blocking access within the country to YouTube.com, which Hilliard characterizes as a "horrible, horrible Web site", indeed "our society's worst cultural creation since hampsterdance.com".  Harsh words.  Hilliard goes on to see the site as a threat to our very language:

... if the comments below the videos are any indication, the English language itself is doomed.  The poor spelling and the rampant racism, sexism, anti-semitism and penis references are enough to make any first grade English teacher weep.

Poor English!  Not only is it threatened by creeping bad grammar, unfortunate word choices, and poor spelling and punctuation, but now it's doomed by being used to express distasteful social attitudes.  English teachers are weeping openly.  What's a language to do?

Ah, Hilliard has sounded the alarm too late.
JUST IN:
DEATH OF ENGLISH



(Photo passed on a long time ago by Geoff Nunberg and archived for use on this sad occasion.)

Posted by Arnold Zwicky at 04:33 PM

It's the linguists again

Breaking news from the Onion: "Idiom Shortage Leaves Nation All Sewed Up In Horse Pies", 2/29/2008.

WASHINGTON—A crippling idiom shortage that has left millions of Americans struggling to express themselves spread like tugboat hens throughout the U.S. mainland Tuesday in an unparalleled lingual crisis that now has the entire country six winks short of an icicle.

Amidst all this linguistic chaos, the Onion's reporter seems to have blended her interviews with Adam Albright and Howard Lasnik:

"This is an absolute oyster carnival," said Harvard University linguistics professor Dr. Howard Albright, who noted that the 2008 idiom shortage has been the country's worst. "I don't know any other way to describe it."

Albright said that citizens in the South and West have been hit by the dearth of idioms like babies bite the bedpost, with people in those colorful expression-heavy regions unable to speak about anything related to rain storms, misers, sensations associated with nervousness, difficult or ironic predicaments, surprise at a younger relative's rapid increase in height, or love. In some areas, what few idioms remain are being bartered or sold at exorbitant prices. And, Albright claims, unless something is done before long to dry out the cinnamon jars, residents of Texas may soon cease speaking altogether.

[Hat tip: Jay Ashworth]

Posted by Mark Liberman at 05:33 AM

February 28, 2008

It was the linguists

In response to my recent post on Benjamin Lee Whorf's Linguistic Relativity Hypothesis, Daniel Drucker sent along a link to a fun little story-fragment that David Chess posted yesterday:

Maybe it wouldn't have been so bad if we hadn't been each other's First Contacts. Virgin civilizations, groping each other in the dark.

"Damn it, damn it, damn it," the smaller of the two men moaned, his head down in his arms on the broken table, as the sounds coming in through the half-boarded-up window swelled louder.

"If they wanted to destroy us, why didn't they just send a missile, an asteroid, a fucking army?"

The taller man took another drink from the bottle in his hand, staring without seeing at the window.

"We started it, you know."

"Bastards, bastards."

"We nearly destroyed them."

"Should have."

"It was the linguists," his voice was rough and slow, detached, almost toneless, "that went out in the first starship. We taught the Tanatha suicide."

"Bastards." The sounds outside moved away a bit, grew softer.

"Their language was utterly alien. No reflexive forms, strange verb tenses. Eventually they learned enough of it to try to ask them questions, eventually they asked them what their word was for 'suicide'. They didn't have one."

"Bullshit."

"They didn't. They had no reflexive forms, and 'to be' and 'to kill' were such utterly incompatible concepts that they had been literally unable to imagine killing the person that you are. Until we asked the question, and kept asking it until they understood."

He took another long drink, a deep breath, and shuddered. The man at the table raised his head just long enough to wipe his eyes.

"It nearly destroyed their civilization. They didn't have the millennia of evolved defense mechanisms that we did, the cultural institutions that discourage killing yourself, the structures to deal with it.

"They experimented.

"They died.

"Their cultures crumbled."

"Not fucking far enough they didn't," the smaller man muttered, and lay his head down again with a thud.

"They fell so fast. Our linguists came back on the last starship they sent out, along with what was left of their Tanatha colleagues. Half the crew died on the way, but they got here."

"Bastards."

"And their linguists, the ones that stayed alive, learned our language in return, and one day they knew enough to ask, to ask what was our word for --"

"No, no, no, no, no," the man slumped over the table moaned monotonously, as another explosion bloomed outside and a chorus of voices raised in an ululating scream, full of fear and an incomprehensible ecstasy.

Many SF writers have done many things with many forms of Whorf's "linguistic relativity" idea, but this one seems to be pretty much the ultimate reductio.

Posted by Mark Liberman at 11:50 AM

February 27, 2008

Correcting the King?

So (as Nathan Bierma points out — see Arnold Zwicky's latest post) Martha Brockenbrough corrects The King on his grammar, because "All Shook Up" contains a past participle that exhibits non-standard English verb morphology? Just when you think the soi-disant grammar sourpusses can't get much dopier, there they go again. Well, I hope Elvis got due credit for the beautiful proper grammar of the last word in "Now and Then There's A Fool Such As I" (the nominative case on the pronoun I is one possibility in formal-style Standard English for a pronoun functioning as a predicative complement, you see). Of course, Grumpy Martha will also want to correct there's to there is in that title (prescriptive grammarians unfailingly confuse any element of informal style with a lapse into grammatical incorrectness). Some other redacted Elvis Presley songs that would be on Martha's Elvis playlist:

  • Treat Me Nicely
  • Do Not
  • Love Me Tenderly
  • I Cannot Help Falling in Love
  • Do You Not Think It Is Time?
  • I Do Not Care If The Sun Does Not Shine
  • It Is Not Any Big Thing (But It Is Growing)
  • Is That Not Loving You, Baby?

Hey, this is a fun game. And it takes my mind off thinking about the grammar loonies — the whining pedants who imagine that all informal usage should be made formal, and no infinitives must ever be split, and everybody who uses non-standard American dialects in any context needs to straighten up and fly right. (All right, all right, fly correctly.)


[Added later] Just to be scrupulously fair (the above is mainly just kidding around, of course), here is what Martha Brockenbrough actually wrote about Elvis:

Lest we fall into the trap of insisting that all artists follow the rules, I'll admit that there are plenty of times when rule-breaking makes for great songwriting.

I'm tempted to give Elvis a hall pass, and not just because I don't want corpses in study hall. "All Shook Up," frankly, sounds better than "All Shaken Up." Although you could make a case for "All Mixed Up," because it keeps the same meter as the original song, there's something about the word "shook" that contributes to the feeling of chaos that Elvis is feeling. He's so mixed up he can't get the grammar right, which is perhaps the same thing that happened to those movie parents who "shrunk" their kids.

It's true that she did not actually say that Elvis should be condemned; she was, magnanimously, "tempted" to give him a hall pass. But it looks like she truly does think that "All Mixed Up" would have been a preferable choice on the part of the songwriter! I think this is someone whose appreciation of sex and drugs and rock 'n' roll is limited to at most two out of the three.

By the way (since I'm being nibbled to death by ducks here, getting bombarded with emails from reputable scholars who should be ashamed of themselves for pompously explaining to me that it was Otis Blackwell who wrote "All Shook Up", not Elvis Aron Presley) could I just remind you that I am perfectly well aware that Presley was not a songwriter? Could we just recall that I am a lifelong rock 'n' roll devotee and was a professional rock musician for five years before I found out that being a grammarian is even more fun than that, huh? Of course Martha Brockenbrough doesn't mean Presley, she means the writer of the song. But on that I have to defend her: she's using metonymy. When we say the kettle is boiling, we don't mean it; the water is boiling. But we refer to the container instead, using one to represent the other. That's metonymy. And likewise we attribute to singers the words of the songs they sing. Same thing. Get a clue, you linguistics professors (you know who you are) who have been writing me plonking emails about songwriting credits. I know these things. [Sigh.]

Posted by Geoffrey K. Pullum at 08:20 AM

February 26, 2008

National (omigod) Grammar Day


Nathan Bierma has sounded the warning, in his Chicago Tribune column on language today: next Tuesday, March 4, is National Grammar Day.  Bierma writes that he won't be joining

the witch hunt of the Society for the Promotion of Good Grammar (which goes by the unappetizing acronym of SPOGG), which is sponsoring National Grammar Day as a chance to flag any violation of standard English usage [AMZ: or what it believes to be standard English usage] in any situation.

"If you see a sign with a catastrophic apostrophe, send a kind note to the storekeeper," urges SPOGG at nationalgrammarday.com. "If your local newscaster says 'Between you and I,' set him straight with a friendly e-mail." Such corrections are seldom friendly, welcome or necessary. They are usually self-righteous, irritating and misinformed.

The policewoman behind National Grammar Day and SPOGG is Martha Brockenbrough, who serves as grammar guru for Microsoft's Encarta Web site (encarta.msn.com), where she writes a column called "Grumpy Martha's Guide to Grammar and Usage."


So, yes, it's just about as annoying as it could be.  In her column, Brockenbrough even takes Elvis Presley to task for singing "all shook up" instead of "all shaken up".  I'm not making this up.


The NGD site issues its overheated manifesto:

We owe much to our mother tongue. It is through speech and writing that we understand each other and can attend to our needs and differences. If we don't respect and honor the rules of English, we lose our ability to communicate clearly and well. In short, we invite mayhem, misery, madness, and inevitably even more bad things that start with letters other than M.

I'll pass over the only-too-familiar themes of threat and decline, to comment on three things.

The first is the assumption that non-standard variants are unclear and therefore impede communication.  This proposition is mostly just taken for granted, without any kind of defense -- in what way is "between you and I" less clear than "between you and me"?  in what way is "all shook up" less clear than "all shaken up"?  they're non-standard, certainly, but LESS CLEAR? -- and the occasional explanations of how particular non-standard usages are unclear don't survive scrutiny.  Instead, it's just an article of faith that non-standard variants (and conversational, informal, and innovative variants, and variants restricted to certain geographic regions or social groups) are unclear, vague, sloppy, or lazy; the written, formal, established, generally used standard variants are taken to be intrinsically superior, and everything that deviates from them to be intrinsically debased to some degree.  I have yet to see actual arguments in favor of this idea, and it has always struck me as deeply mean-spirited.  After all, you can point out that some variant is standard (generally used by the educated middle class) and an alternative non-standard without demonizing the non-standard variant.

The second is the very odd view of "communication", in which respecting and honoring "the rules of English" is what permits people to convey meaning to others.  This is a travesty of what happens when people use language.  Instead, writers and speakers work to adjust what they say for their audience, and (most important in this context) readers and listeners work to gauge the intentions of their interlocutors.  It's a complex collaboration, in which all the participants have to deal constantly with linguistic and cultural differences, with a good bit of indeterminacy and a certain number of inevitable misfires, with differences in knowledge, assumptions, and goals, and so on. 

Finally, a point that has come up in informal discussions at Stanford about the regulation of language.  Paul Kiparsky has noted on several occasions that while in some European countries the prescribing of language forms for certain public purposes is the job of official bodies, which normally include language scholars (as well as literary figures), this sort of regulation has been PRIVATIZED in English-speaking countries: it's managed by commercial publishers, newspaper and magazine editors, and a whole industry of free-lance advisers, only a few of whom know much about either the nature of language or the structure and history of English.  Such an arrangement resonates with American free-enterprise ideals and also with the widespread American disdain for "experts" and "intellectuals". 

In any case, one result of this arrangement is that there's essentially no one to speak with any authority for rational reform, no one to accord some sort of official status to variants.  Instead, all sorts of proscriptions live on in the marketplace of ideas -- proscriptions against stranded prepositions, split infinitives, sentences beginning with coordinating conjunctions, "singular they", and many more we've discussed here, endlessly -- even when the "high-end" advice literature generally admits them.  What we get is people on the Microsoft Encarta website shrieking for the public shaming of linguistic miscreants, and a lot of peevish ranting all over the place. 

It's no good maintaining that your stranded prepositions are impeccable because MWDEU and CGEL say they are -- after all, these books were written by LINGUISTS, what do they know about correctness? -- or because some advice manuals say they are; someone will be there to point you to their high school English teacher and eight sites on the web, all saying that preposition at end is just flat wrong.  It's no good declaring that you generally use prescribed variants in certain formal contexts but reserve the right to use other variants elsewhere; someone will be there to tell you that what's right is right, in all contexts and at all times.

Kiparsky's point is one that at first sight seems paradoxical: an official regulatory body, properly constituted, can damp down the ugliness of privatized (and decentralized) prescription, by providing an authority everyone can appeal to, and by making clear the contexts in which its prescriptions are supposed to apply.

I know, that's the sort of thing a Finn would think of.  It's so not American.

Meanwhile, I'm ignoring the nastiness of National Grammar Day, in favor of doing research on varieties of English and how their grammars work.

Posted by Arnold Zwicky at 07:45 PM

February 25, 2008

Poor, arid, and, in appearance, deformed

Were the basic characteristics of Newtonian physics determined by the way that Indo-European languages treat space and time? That was the thesis of an article published in MIT's alumni magazine in April of 1940 (Benjamin Lee Whorf, "Science and Linguistics", Technology Review, 42(6): 229-231, 247-248).

This is surely the most influential article ever to appear in a publication of that type.

I haven't read this work since I was an undergraduate, but an opportunity to read it again came up last weekend. I was staying with Barbara Scholz and Geoff Pullum in Edinburgh, and Barbara is working on some philosophical aspects of the Sapir/Whorf hypothesis. As a result of re-reading this article and talking about it with Barbara, I had a small insight about one aspect of Whorf's idea. I expect that this same observation has been made before, perhaps often, in the vast literature on the topic. But it was new to me, and so I'll share it with you.

Whorf's article starts like this:

Every normal person in the world, past infancy in years, can and does talk. By virtue of that fact, every person — civilized or uncivilized — carries through life certain naive but deeply rooted ideas about talking and its relation to thinking. Because of their firm connection with speech habits that have become unconscious and automatic, these notions tend to be rather intolerant of opposition. They are by no means entirely personal and haphazard; their basis is definitely systematic, so that we are justified in calling them a system of natural logic — a term that seems to me preferable to the term common sense, often used for the same thing.

This idea of "natural logic" as determined by "speech habits" leads Whorf to "a new principle of relativity",

which holds that all observers are not led by the same physical evidence to the same picture of the universe, unless their linguistic backgrounds are similar, or can in some way be calibrated. This rather startling conclusion is not so apparent if we compare only our modern European languages, with perhaps Latin and Greek thrown in for good measure. Among these tongues there is a unanimity of major pattern which at first seems to bear out natural logic. But this unanimity exists only because these tongues are all Indo-European dialects cut to the same basic plan, being historically transmitted from what was long ago one speech community.

This is not just a matter of "Eskimo words for snow" and the like:

What surprises most is to find that various grand generalizations of the Western world, such as time, velocity, and matter, are not essential to the construction of a consistent picture of the universe. The psychic experiences that we class under these headings are, of course, not destroyed; rather, categories derived from other kinds of experiences take over the rulership of the cosmology and seem to function just as well.

His most extensive example contrasts the common-sense physics of the Indo-European speech community with that created by Hopi, which he says "may be called a timeless language". He adds:

Hopi grammar, by means of its forms called aspects and modes, also makes it easy to distinguish among momentary, continued, and repeated occurrences, and to indicate the actual sequence of reported events. Thus the universe can be described without recourse to a concept of dimensional time. How would a physics constructed along these lines work, with no T (time) in its equations? Perfectly, as far as I can see, though of course it would require different ideology and perhaps different mathematics. Of course V (velocity) would have to go too. The Hopi language has no word really equivalent to our 'speed' or 'rapid.' What translates these terms is usually a word meaning intense or very, accompanying any verb of motion. Here is a clue to the nature of our new physics. We may have to introduce a new term I, intensity. Every thing and event will have an I, whether we regard the thing or event as moving or as just enduring or being. Perhaps the I of an electric charge will turn out to be its voltage, or potential. We shall use clocks to measure some intensities, or, rather, some RELATIVE intensities, for the absolute intensity of anything will be meaningless. Our old friend acceleration will still be there but doubtless under a new name. We shall perhaps call it V, meaning not velocity but variation. Perhaps all growths and accumulations will be regarded as V's.

An enormous amount has been written to support or challenge various aspects of this argument. Is Hopi really timeless, in any sense in which English is not? Are the metaphorical connections between time and space really fundamentally different in Hopi and in English? And how much do the differences that exist really matter to "natural logic" and to the development of science? If you're interested in these things, Ekkehart Malotki's Hopi Time is a good account of the Hopi side of the comparison.

The thing that struck me about this passage was the reference to Hopi's lack of words for 'speed' or 'rapid'. Whorf's examples on the English side are unwisely chosen -- until the 16th century and later, English speed meant something like "success, prosperity, power"; and rapid was borrowed in the 17th century from Latin rapidus, an adjective based on the verb rapere "to snatch". So if English had any time-specific words for 'speed' or 'rapid' before Newton's time, they weren't, specifically, 'speed' or 'rapid'.

Malotki takes Whorf to task for the Hopi side of the speed/rapid assertion:

Whorf contended that "the Hopi language has no word really equivalent to our 'speed' or 'rapid.' What translates these terms is usually a word meaning intense or very, accompanying any verb of motion." This statement is true in so far as no nominal lexeme exists in the Hopi language that conveys the value 'speed/velocity.' It is also true that the notion of 'fast' in conjunction with 'running' is frequently captured by the quantifying intensifiers a'ni in the case of a male speaker and hin'ur in the case of a female speaker. Their basic force may be rendered 'a great/a lot.' The semantic range of a'ni and hin'ur extends metaphorically to such values as 'fast/loud/excellent,' etc., depending on the given contextual circumstances.

[...]

It is not true, however, that no word is found that might be considered an equivalent of English 'rapid.' Halayvi is an adjective that translates 'quick/fast,' occasionally also 'active' or 'lively'.

More generally, I wondered about the English words for quantities like velocity, distance and time. It seems likely to me that most of them -- maybe all of them -- originally meant things having nothing to do with their meanings in Newtonian (or for that matter Aristotelian) physics, or were borrowed recently, or both. In many cases, the physics-related senses are originally extended or metaphorical ones, which were developed during the Enlightenment, when intellectuals focused their attention on such abstract concepts, and began to think and to write about them in the vulgar tongues of Europe.

Barbara and I spent a few minutes probing the OED and other dictionaries, and found many examples confirming this hypothesis in the case of English.

According to the OED, speed comes from OE spówan "to prosper, succeed", and the early uses are given the glosses "Abundance; Power, might; Success, prosperity, good fortune; profit, advancement, furtherance". The sense of "Quickness in moving or making progress from one place to another, usually as the result of special exertion; celerity, swiftness; also, power or rate of progress" was originally a figurative extension of the "prosper/power" meaning. The abstract modern sense of "rate of motion" doesn't seem to emerge at all until the 16th century, and doesn't dominate for a long time after that.

Thus in Whorfian terms, it seems that roughly through Shakespeare's time, English speakers had a common-sense physics in which "velocity" was just a particular instance of a more general characteristic that we might describe as "prosperity" or "power".

The word rapid comes from Latin rapidus, an adjectival form of rapere "to seize, carry off", and was borrowed into English in the 17th century. So "rapidly" was originally "snatchingly". Does this reinforce the idea of velocity as prosperity?

It's not just these two words. The earliest citation in the OED for velocity is 1550, borrowed from French vélocité, in turn from Latin velox. And velox did mean "rapid" in Latin, but the AHD tells us that its Indo-European root was weg-, meaning "to be strong, be lively".

The word quick, of course, originally just meant "living". It was used figuratively in extended senses like "lively, witty; busy, full of activity; vivid in color, or loud and clear in voice, or pungent in smell; keenly felt". The extensions to motion began in cases where inanimate things moved almost as if alive, like quicksand or flowing water.

The OE word swift originally meant "to move in a sweeping manner".

Fast originally meant "firmly fixed" or "strong".

The words for other physics concepts have undergone a similar evolution. One that particularly struck me was the history of distance. The OED's etymological note explains:

[a. OF. destance, distance (13th c. in Littré), ad. L. distantia 'standing apart', hence 'separation, opening (between); distance, remoteness; difference, diversity', f. distant-em pr. pple., DISTANT. By a further development, OF. destance had the sense 'discord, quarrel', which was also the earliest in Eng. ... ]

Thus the earliest English sense of distance was "the condition of being at variance; discord, disagreement, dissension; dispute, debate". According to the OED, the meaning "fact or condition of being apart or far off in space; remoteness" does not emerge until the late 16th century.

How about time? Well, the OED's etymological note says:

[OE. tíma = ON. tími, wk. masc., time, fit or proper time, (first, etc.) time, good time, prosperity (Da. time, Sw. timme an hour),:—OTeut. *tî-mon-, app. f. a root tî- to stretch, extend (see TIDE n.) + abstr. suffix -mon, -man ...]

There's that prosperity stuff again. And according to the AHD, the IE root involved means "to divide", which (like stretching) also applies to space, substances and groups (as in demos-derived words like "democracy" and "epidemic", where the divisions are social).

At least in lexicographic terms, the Indo-European languages do not, contrary to what Whorf says, share a linguistic history that predisposes their speakers unconsciously to a particular physics of time, distance, velocity and so on. In particular, the English words for those abstract physical concepts developed rather late, mostly as part of a conscious effort to import or develop explicit physical theories. And the terms used were figurative or metaphorical extensions of much juicier and more concrete words for things like "strength" and "discord" and "being alive".

I fancy that you can see this process at work in Thomas Hobbes' ponderous discussion of the meaning of velocity, in Chap. VIII of Elementa Philosophica (1656), which supplies the OED's earliest citation for the "rate of motion" sense of that word. Here Hobbes seems to be struggling to force the English language to convey something that it was not historically prepared to express easily:

15. Motion, in as much as a certain length may in a certain time be transmitted by it, is called VELOCITY or swiftness; &c. For though swift be very often understood with relation to slower or less swift, as great is in respect of less, yet nevertheless, as magnitude is by philosophers taken absolutely for extension, so also velocity or swiftness may be put absolutely for motion according to length.

(If you're surprised, as I was, by the idea of length being transmitted by motion, try the hypothesis that Hobbes intended "transmitted" to mean something like "traversed". The alternative would yield a very non-Newtonian physics indeed!)

In the opening chapter of the same work, Hobbes offers an excuse in advance for the need to write in this way:

I am not ignorant how hard a thing it is to weed out of men's minds such inveterate opinions as have taken root there, and been confirmed in them by the authority of most eloquent writers; especially seeing true (that is, accurate) Philosophy professedly rejects not only the paint and false colours of language, but even the very ornaments and graces of the same; and the first grounds of all science are not only not beautiful, but poor, arid, and, in appearance, deformed.

This is not, I think, just a rejection of sophistry. Hobbes is telling us that philosophical truth is likely to be linguistically unnatural, at least when first expressed. Far from taking for granted the metaphysics implicit in his native language, he is willing to try to start over, deriving new basic concepts and somehow finding ways to express them.

This was the method of Enlightenment science in general, it seems to me. When it worked, as in Newton's physics, the results were stunning. And paradoxically, it's the residual prestige of this willingness to see all things new, reinforced by Einstein's example, that Whorf evokes in arguing that the "various grand generalizations of the Western world, such as time, velocity, and matter" are merely "the rationalizing techniques elaborated from [the] patterns" of "a few recent dialects of the Indo-European family".

[A list of several dozen other LL posts that mention Whorf can be found here.]

Posted by Mark Liberman at 08:21 AM

Going to grammar hell in a handcart

"Bad grammar is everywhere you look, and I don't think students care about improving their basic skills. One of the senior teachers at our university can't spell or use grammar; nor can the Government; nor can several major retailers. What chance do we have? We're going to grammar hell in a handcart."

The quotation is from an anonymous colleague (first name Ruth) of an anonymous lecturer who published a column in Times Higher Education whining about (British) students today and their slovenliness and illiteracy. (The majority of articles in Times Higher Education seem to be devoted to whining; I had mistakenly thought it was a magazine of news about higher education, but in fact it is a remarkable case of a hobbyist's magazine, focusing on the very popular hobby of grumbling about UK universities, administrators, and students.) Not a single example of any grammar error appears in the article, not even a tiny hint of where the syntax of the "senior teacher", the "several major retailers", and the entire government goes wrong. Language Log can hardly get going on the problem if no examples are given, can it? One wonders just what these alleged grammar errors would turn out to be, in a world where (for example) one finds people complaining bitterly about locutions like between you and I who are not aware that Shakespeare's English has many examples of the same sort (the phrase just cited appears in The Merchant of Venice). It would take some examples to convince me that Ruth the mystery grumbler would be able to tell whether her handcart had arrived in grammar hell or not.

Posted by Geoffrey K. Pullum at 05:19 AM

You say feminine, I say masculine, let's call the whole thing off

Last week, Dalila Ayoun, of the Department of French and Italian here at the University of Arizona, gave a talk in our linguistics colloquium series in which she dropped a bombshell: native French speakers don't know the genders of French nouns!

OK, that's not quite right: it would be more appropriate to say that native French speakers don't agree on the genders of French nouns. They really don't agree. Fifty-six native French speakers, asked to assign the gender of 93 masculine words, uniformly agreed on only 17 of them. Asked to assign the gender of 50 feminine words, they uniformly agreed on only 1 of them. Some of the words had been anecdotally identified as tricky cases, but others were plain old common nouns.

Ayoun didn't set out to test whether native French speakers can accurately identify French nominal gender. Her primary research interest is in second-language learning of French. Like nearly everyone in the field, and with good reason, she had assumed that native speakers behave fairly uniformly with respect to the grammar of their native language.

Second language acquisition studies often have a common structure. The experimenter tests people learning the language on a particular linguistic task. Usually there are different groups of second language learners -- advanced vs. beginning, etc. They all do the task, and the experimenter looks at how many mistakes they make, how long they take to do the task, etc., and draws conclusions about the course of language learning, the efficacy of the teaching technique, or whatever.

The experimenter always also gives the task to native speakers, as a kind of control group, to show that when the language has been fully, correctly acquired, speakers perform at or near ceiling -- close to 100% right.

Just to give a typical example, I have a student who is looking at second-language acquisition of Chinese. She is having her subjects perform a task in which they form a sentence containing a relative clause from two independent sentences. (Input: John saw a man. The man was tall. Correct response: John saw the man who was tall.) In my student's pilot study, she discovered that she might have to reduce the number of sentences in her study, since even fairly advanced second language learners were taking up to an hour and a half to complete the test. In contrast, her native speaker subjects were taking ten minutes to do the same test, with of course 100 per cent accuracy. This kind of disparity between native speakers and second-language speakers is the norm, rather than the exception.

Ayoun was investigating second-language learning of grammatical gender in French -- a major difficulty for learners from non-gender languages like English. She had constructed a couple of tasks: grammaticality judgments of sentences where there was a gender agreement mismatch, and a gender-assignment task, where subjects were given a noun and had to choose among "masculine", "feminine", "both", or "I don't know".

In both tasks, to her great surprise, she found a great deal of disagreement among her native-speaker controls! In these tasks, there is always a normatively 'correct' answer -- French dictionaries and textbooks all agree on what the genders of nouns are, and how gender agreement in sentences should turn out -- in the same way they agree on how to form relative clauses, and how to form passives, and where to put clitic pronouns, and so on. Native speakers would be expected to perform close to ceiling on this grammatical task, as on others. But, surprisingly, they don't.

There's an even more interesting twist in Ayoun's native-speaker results. Her native speakers fell into two groups: 14 adult speakers and 42 teenage speakers. On most grammatical tasks, for all intents and purposes, teenagers' native-language abilities are identical to adults' abilities. But when she broke down the gender-assignment task results by age, she found that teenagers showed considerably more variation than the adults. On the 50 feminine nouns, for example, the 14 adults all agreed on 21 of them, while the 42 teenagers agreed on only one: cible, 'target'. Of the 93 masculine nouns, the adults agreed on 51 of them, while all adults and teenagers agreed on only 17 (of 93!!)

Below I reproduce one of Ayoun's tables illustrating significant differences in the rates at which adults and teenagers agreed on the gender of 10 feminine nouns.

There are many questions one would like to ask about this, of course, and since Ayoun's study was not designed to answer questions about native-speaker variation in gender assignment, answers to most of them will have to await further experimentation. But the result itself seems really remarkable to me. According to Ayoun, the last study in which anyone systematically tested native speakers' deployment of grammatical gender in French was Tucker et al. (1977) -- more than thirty years ago! Work to be done.

For the interested, some of the second-language speakers' results from the study have already appeared in Ayoun (2007). And second language speakers of French, take heart! Make your grammatical gender agreement mistakes with confidence. There's a chance that your native-speaker interlocutor will agree with your version!

Comments?

Ayoun, D. (2007). The acquisition of grammatical gender in L2 French. In D. Ayoun (ed.), French Applied Linguistics, pp. 130-170. Amsterdam and Philadelphia: John Benjamins.

Tucker, G. R., W. E. Lambert, and A. Rigault. 1977. The French speaker's skill with grammatical gender: An example of rule-governed behavior. The Hague: Mouton.

Posted by Heidi Harley at 02:42 AM

February 24, 2008

Three words to win her heart

Today's Dilbert strip reveals the ultimate pickup line for getting the romantic attention of women. It is supplied by Dogbert, who tells our hero, "Find a woman who looks hot, carve her out from the herd, and read this." He hands Dilbert an index card on which he has inscribed the magic three-word sentence. I don't think it would be right to exhibit the sentence here on Language Log. The words are too powerful.

There are, of course, given any finite set of n lexical items, far fewer than n³ three-word sentences. Perhaps you could even mentally riffle through the relevant set for English and guess the sentence. It is of course a simple transitive active clause; the subject is an abstract noun and the object is a personal pronoun. The verb is disyllabic, and of Latin origin. The line seems to work for Dilbert.

Posted by Geoffrey K. Pullum at 09:41 AM

February 23, 2008

In memoriam Gardner Lindzey


On the 4th of February, Gardner Lindzey died, here in Palo Alto, at the age of 87.  Gardner was

a psychologist, editor and former president of the American Psychological Association who helped build a national framework to encourage scholarly exchanges and collaborations in the social sciences  [Jeremy Pearce in his New York Times obituary of February 18th]

As it happens, though he was a social psychologist, Gardner played a significant role in encouraging scholarship in semantics, cognitive science, and related fields; and his larger role in academia -- as someone whose strengths were in surveying, integrating, and synthesizing ideas and facilitating scholarly communication and collaboration -- deserves an encomium on its own.


I'll take the second thing first.  You get really famous in the academic world for novel ideas and discoveries.  But scholarship, satisfying the teaching and service responsibilities of universities, and administering the many sorts of institutions that keep the whole enterprise going require a great many people with various sorts of talents beyond the pursuit of original research in a narrow sense.  We need people who collect and organize large masses of data (though they might be derided as mere fact-gatherers).  We need people who survey scholarship for scholars, and people who communicate scholarly ideas through teaching and textbooks and through writing and speaking for audiences outside the academy (though such people might be derided as mere purveyors of other people's ideas).  And we need knowledgeable people who create and administer scholarly and academic programs and institutions of many sorts (though they might be derided as mere managers).

Gardner's great achievements were as editor of the comprehensive reference work The Handbook of Social Psychology (first edition 1954, solely edited by Gardner, with three later co-edited editions), and in academic administration -- at Texas, especially; on advisory panels addressing social and psychological issues; and, most significantly, as director of the Center for Advanced Study in the Behavioral Sciences (at Stanford) from 1975 through 1989.

That's where I came in.  I was a fellow at CASBS in 1981-82, a year in which there was a "special project" on Meaning and Cognition, whose core members were Jon Barwise, Manfred Bierwisch, Robin Cooper, Hans Kamp, Lauri Karttunen, and Stanley Peters.  There were also colleagues and research assistants who were not fellows but participated regularly in project meetings; in addition to me, these included Edit Doron, Elisabet Engdahl, Rich Larson, John Perry (who had been a fellow in 1980-81), Ivan Sag, and Hans Uszkoreit (this is far from a complete listing).  Semantics was clearly the center of the project (Barwise and Perry's Situations and Attitudes came out of CASBS activities), but the participants ranged over syntax, philosophy, mathematics, and computer science as well, and the project was followed by the founding of the Center for the Study of Language and Information at Stanford (which in the summer of 1984 sponsored research by, among others, Gerald Gazdar, Ewan Klein, Geoff Pullum, Ivan Sag, and me) and then by the creation of the undergraduate interdisciplinary program in Symbolic Systems (roughly, cognitive science) at Stanford.

Only two years before the Meaning and Cognition project there was a special project on Artificial Intelligence and Philosophy, with core members Dan Dennett, John Haugeland, Pat Hayes, John McCarthy, Bob Moore, and Zenon Pylyshyn.  And three years after Meaning and Cognition came a Morphology project, with core members David Dowty, Gerald Gazdar, and Jerry Sadock.  Those were heady years at the Center.

Now, Gardner didn't create any of these projects, but he encouraged and fostered them, and I think he was proud of the work that was done on his watch, especially in the Meaning and Cognition project.

Posted by Arnold Zwicky at 02:18 PM

Will need never happen

Are any fellow readers of The Economist puzzling over the final sentence in the article that ends on page 100 (issue of 23 February 2008)? It says this (about a plan for forecasting of viral outbreaks in Africa):

And then a catastrophe like AIDS will need never happen again.

I think it is just a word processing error, not an anomalous occurrence of a double modal in written Standard English. I think the writer wrote need never happen again (which is fine; need is a modal verb, so the negative adverb never comes after it), and then decided it didn't sound future-oriented enough, and considered saying will never need to happen again (which would be grammatical, with never following the modal verb will and the lexical verb need taking a to-infinitival complement, because it is not a modal verb). Perhaps the writer got as far as putting in the will, but then the phone rang or something, and things were left in that state. From then on it was the responsibility of the editors to notice the slip, but they failed to spot it.

The phrase *will never need happen does have an odd tendency to slip by without being noticed. But it definitely is not grammatical in Standard English. The need that takes no to is a modal, and thus has no plain form, and thus cannot follow another modal. Will never need to happen is an entirely different kettle of fish — it uses the lexical verb need rather than the modal. Will need never to happen is different again: it has a different meaning, because when never is placed after the lexical verb need it can only be understood as belonging to the happen clause. You therefore get the meaning "will necessarily not happen" rather than the meaning "will not necessarily happen". That's what we see in attested examples like "And we will need never to repeat the lunacy of awarding a raise to a bunch of out-of-touch, tax-crazy, office-seekers." The angry person who wrote that (probably struggling to avoid a split infinitive) means it is necessary that we never repeat the lunacy (need + [never repeat]), not that we will never need to repeat it (never + [need to repeat]).

Added a bit later:
Now that I've said what I immediately suspected had happened at The Economist, and I've had a few minutes to examine some Google hits, let me point out why I could be wrong: there are other people who have used the very same construction (will need never + Verb) in their writing on various blogs and other websites:

Maybe your heart will need never melt, in the wake of your having frozen it.

Speak evil of no one and you will need never whisper.

Your man will need never be asked to make an effort again.

Ever tried searching for something for your Land Rover, and couldn't find the answer? Well now with thanks to our website, you will need never worry again.

If those who love you can affect you, you will need never lose heart or suffer depression.

In doing this you will discover your very own way of listening to your Goddess Queen, and you will need never feel powerless, confused, or victimized...

That way, the server will need never send a broadcast to a client attached to a non-primary interface.

In most cases, you will need never deal with the taxing authority again...

Once you become a Lifetime member, you will need never pay a single cent more... [This one appears to be quoting a printed source.]

And P. Orbis Proszynski points out to me that the Routledge book The Philosophy of Utopia by Barbara Goodwin contains the sentence Having fully realised the principles of socialism, the citizens of utopia will need never again concern themselves with serious questions. That one seems particularly convincing.

Although we cannot rule out the possibility that all of these are word-processing errors too, as the number we gather goes up the plausibility of their being accidental word misplacements goes down, and the probability that a new construction is being born goes up.

This would be a new instance of what one might (cautiously) call a double-modal construction in which the modal need is permitted to occur as the head verb in the (bare infinitival) complement of the modal verb will.

Notice, I am not saying that my intuition is a gold standard for what goes on in proper English; but on the other hand I am not saying that if something occurs in a few comments on blogs it is therefore correct like everything else that people say or write. This pair of extremes make up the false dichotomy I have written about elsewhere: the pointless clash of "Everything is correct" versus "nothing is relevant" (Language Log, January 26, 2005). Syntactic investigation is difficult. Sporadic slips occur, we know that; but unrecognized but fully regular new constructions develop as well; and presumably at some points we are in a difficult grey area where a sporadic slip has become more than a little frequent and a new construction is starting to grow as a result. Knowing which of these situations one is in is a matter of very considerable epistemological difficulty. The reason we are so angry-sounding when we talk about complacent, simplistic, know-it-all grammar pontificators here at Language Log is not because we know it all; it's because nobody does, but it would be interesting to find out.

Posted by Geoffrey K. Pullum at 10:43 AM

The semantics of sharing and stealing

An interesting essay on semantics in the Los Angeles Times: it's about whether or not "file sharing" meets the definition of the term "stealing". Have I stolen anything if you let me make a copy of a music file you have on your laptop? I have something new, but you have exactly what you had before. And one could say that whoever created the song or performed it, and however you got it, they have exactly what they had before as well. It's just about the same as if you give me a kiss: you have lost nothing at all, and nor has anyone else (and consider giving someone a kiss because someone else had said "When you see him, give him a kiss from me"). On the other hand, I suppose you could say you still have what you had before if you give me a massage and then I run away without paying you for it, and that seems plainly wrong in moral and legal terms: there is such a thing as stealing a service. So is a song in the music downloads folder on your laptop more like a DVD on a rack in a store, or like a massage, or like a kiss? Language Log has an answer. It will disappoint you. If you want to know what it is, read on.

The answer: Language Log doesn't know, but suspects that despite the reference to semantics in the title of the article ("File sharing or stealing? The semantic debate over whether copyright infringement is theft") the lexical semantics of verbs like steal or massage will not be the place to look. Language Log thinks you should consult an intellectual property lawyer before engaging in... well, the act that is either an innocent sharing of aural pleasure between consenting parties or a flagrant ripoff of a hard-working entertainer and entertainment company, depending on how you look at it.

For a readable essay on sharing versus stealing by a lawyer, University of Michigan Law School professor Jessica Litman (one of the leading copyright-law experts in the U.S.), see this PDF file.

Posted by Geoffrey K. Pullum at 06:50 AM

February 22, 2008

Just in case...

...you haven't seen xkcd from a few days ago:

Duty Calls

The mouseover title text: "What do you want me to do? LEAVE? Then they'll keep being wrong!"

Depending on your perspective, this is either why we don't have comments, or why we need to have comments.

Posted by Mark Liberman at 10:29 AM

February 21, 2008

Happy International Mother Language Day

ᐧᐊᐣᑔᐪ ᙌᐣ "International Mother Language Day" (ᗸᘏᑋ ᗝᙣᐠ ᐧᐅᘢ ᐁᗌᘆ ᙩᑔᒡᗠᐧ ᙌᐣ) ᐧᐅᐯᐣᗣᑋ. ᐣᑔ ᙌᐣ ᐧᐈᐪ ᘅᙢᑕᐧ ᗪᗌᘆ ᙩᑔᒡᗠᐧ.

Today is International Mother Language Day, the annual holiday proclaimed by UNESCO in honor of linguistic diversity.

Posted by Bill Poser at 12:12 PM

A grammatical Cupertino?

On the American Dialect Society mailing list, Ron Butters notes an unusual sentence appearing in today's Orlando Sentinel:


Ron wonders,

Perhaps this is just a typo that both the author and editor missed — or a hypercorrection — or do people really normally use "risen" as the past participle of "raise"?

I suspect this is neither a hypercorrection nor a variant usage. Instead, I see it as a twist on the Cupertino effect, but in this case a grammar checker instead of a spell-checker is to blame.

On a hunch, I typed the original sentence in Microsoft Word 2003, using raised instead of risen, and then ran it through the grammar checker. Sure enough, raised was the one "mistake" that got flagged:


It's a rather odd "incorrection" for MS Word to make. I could understand the grammar checker flagging "has rised," but "has raised" is, by and large, used in a perfectly grammatical fashion — very often in the exact context of the Orlando Sentinel article, specifying how much money a candidate has raised. It looks like Geoff Pullum was right on the money when he wrote, "For the most part, accepting the advice of a computer grammar checker on your prose will make it much worse, sometimes hilariously incoherent."

[Update #1: Further enlightenment from ADS-L, first from Arnold Zwicky:

interestingly, the original *has* a reading, parallel to

Campbell has risen far more in net worth than ...
(with "far more in ..." serving as an extent adverbial) but this wasn't the reading intended above (where "far more in ..." is a direct object). so why should the grammar checker go after "raised"?
perhaps because using forms of RAISE where RISE is called for is a moderately common error, treated (not always well) in lots of advice manuals. the reverse error is much less common.
but more is going on here, since my grammar checker (correctly) catches at least some occurrences of RISE for RAISE, as in:
Campbell has risen more children than his siblings.
Campbell has risen a lot of money.
(and it correctly raises no objection to the versions of these with "raised").
on the other hand, it incorrects "risen" to "raised" in
Campbell has risen a lot further than I have. Campbell has risen a lot more than I have.
so it looks like the program has some scheme for detecting direct objects -- well, NPs following a verb -- but the program isn't very good at distinguishing NPs from adverbials (because that would involve, umm, actually understanding the sentences).

And from Larry Horn:

You observe in the LL post that

It's a rather odd "incorrection" for MS Word to make. I could understand the grammar checker flagging "has rised," but "has raised" is, by and large, used in a perfectly grammatical fashion - very often in the exact context of the Orlando Sentinel article, specifying how much money a candidate has raised.
But in fact if the Sentinel reporter had written and spell-checked not
Campbell has raised far more in campaign contributions than both his opponents combined.
but rather
Campbell has raised more campaign contributions than both his opponents combined.
no incorrection to "risen" would have been suggested. It's not the "has raised" that has raised (*risen) the spell-checker's red flag but the "(has) raised...in...", which it took (incorrectly, but plausibly) to signal the presence of an intransitive "rise/risen" as opposed to a transitive "raise/raised". (On the model of "You've raised my expectations/You've risen in my expectations".) In the case at issue, "raise" is a transitive verb being used absolutely, but you'll have to admit that's a pretty subtle point for the spell-check to be expected to grasp.

Meanwhile, Ron Butters wants to know if we can call this sort of error a typocorrection.]

[Update #2: Ron contacted the reporter directly, and she confirms that this was indeed a grammatical Cupertino. And from the other side of the computational transaction that led to the error comes this from James Lyle of the Microsoft Natural Language Group (a colleague of Thierry Fontenelle):

It is of course too bad that the grammar checker got this one wrong (rats!) but this sentence is a nice example of why grammar checking is hard—it's actually not a very odd incorrection for the GC to make, because of the ambiguous reading of "far more". I'm guessing the GC not unreasonably preferred the parse in which "far more" has an adverbial reading, as in the almost identical "Campbell has risen far more in the polls than his opponents combined...". In sentences with a less ambiguous direct object for "raise", e.g. "Campbell has raised far more money than his opponents..." the GC correctly doesn't flag any error.
This kind of problem is all too common (people still being smarter than computers, as we've blogged about before), so no grammar checker yet can be anywhere near perfect, as you and Geoff Pullum point out. The best we can hope for is to be right a lot more often than we're wrong (which fortunately we are, when you look across all our customers and the uses to which they put MS Word!).
]

Posted by Benjamin Zimmer at 11:52 AM

NYC subway semicolon and NCLB

As Mark has observed, the coverage of the NYC Subway Semicolon Case — a semicolon appropriately deployed in a subway message — produced a baffling quote from Noam Chomsky, after clearly relevant remarks by Louis Menand, Lynne Truss, [our own] Geoff Nunberg (who started the whole business), and Allan M. Siegal:

The linguist Noam Chomsky sniffed, "I suppose Bush would claim it's the effect of No Child Left Behind."

What on earth did Sam Roberts (the writer of the story) ask Chomsky to elicit such a response?

The comment seems totally off-topic, given that neither President Bush nor the NCLB Act were in the context. What was Chomsky trying to say? It sounds like he's saying that Bush would interpret ANYTHING linguistically competent, no matter how irrelevant, as a positive effect of NCLB.

Well, this use of the semicolon by Neil Neches (a 55-year-old writer for the city) could not possibly have the slightest thing to do with the No Child Left Behind Act of 2001 (note the date).  You wonder what the interview (presumably done by phone) was like.  And where the snarky "sniffed" came from.

Posted by Arnold Zwicky at 10:09 AM

Payackarama

According to the Chicago Tribune, Paul Payack is still peddling his nonsense about a rigorous census of the words in English that is approaching one million. He has "a series of mathematical formulas" that he uses, it says (ooh! math!), yet it seems from the story that he personally checks each word (bagonize meaning "agonize while waiting for one's bags at the airport" is in; but nakation meaning "naked vacation" is not, and he makes those calls). Meanwhile, over at Slate, they have invented a widget that makes up new puns on Barack Obama's name, creating a lexical obamarama of vocabarackabulary. I have a very simple and obvious suggestion. Just plug Slate's vocabulabama into Payack's mathematical formularama and let it make up as many new words as Payack needs to hit the million. Who cares about the real size of the English vocabulary? Payack's project is publicizing himself. Slate's project is publicizing itself. But the two can work together. As always in cases of numerical vocabulary assessment for popular consumption, you can just make stuff up. Meaningless silly words for a meaningless silly word census. They'd be perfect together.

Posted by Geoffrey K. Pullum at 03:22 AM

February 20, 2008

Aphasia in the funny papers

Over the past few weeks, Doonesbury has been exploring clinical neurolinguistics. Toggle has come back from Iraq with some serious brain damage, and the classic symptoms of Broca's aphasia. Here are the first two panels for the strip from 2/18/2008:

Toggle's symptoms include expressive frustration:

Sometimes his attempts to communicate work out better. Here's the end of the strip from 2/10/2008:

And here's the strip from 2/19/2008:

 

Posted by Mark Liberman at 01:28 PM

What to blame it on


March approaches, and it's once again time for the Stanford Semantics Festival, known familiarly as SemFest in these parts.  SemFest 9 is on 14 March; a program, with abstracts, will soon be up on the Stanford Linguistics site.  As usual, I'm giving a paper (I'm not actually a semanticist, but I play one annually at SemFest) -- this year, on

What to blame it on: Diathesis alternations, usage advice, "confusion", and pattern extension

incorporating some discussion from Language Log, on the verb blame here and here, on the verb substitute here.

The abstract is below.  (Remember that this is just an abstract, not the whole paper; it's much compressed.)


The linking between syntactic arguments and participant roles is complex: some verbs allow alternative expressions for the same participant roles (give me the book, give the book to me; spray paint on the wall, spray the wall with paint), while other verbs will allow only one of the alternatives, and still others might allow only the other (Levin 1993).

When an alternative to some existing pattern arises, usage critics are quick to criticize it: they are antagonistic towards innovations (or what they perceive to be innovations) in general, but especially to innovations that introduce what they see as just new ways of saying old things.  If we already have the (a) variants, why should we also have the (b) variants?

  (1a) blame SOURCE (for CONSEQUENCE)
  (1b) blame CONSEQUENCE on SOURCE

  (2a) rid LOCATION of SOMETHING
  (2b) rid SOMETHING from LOCATION

  (3a) confuse ORIGINAL with REPLICA
  (3b) confuse REPLICA for ORIGINAL

  (4a) substitute NEW (for OLD)
  (4b) substitute OLD (with/by NEW)

"Why do these things happen?", the usage critics ask.  And the critics answer: because people "confuse" the correct usage with other related usages -- they combine, or blend, different constructions.

For blame, for example, the claim is (Funk & Wagnalls (1915)) that people combine the correct (1a) with the related

  (1c) lay/put/place (the) blame on SOURCE (for CONSEQUENCE)

For substitute, the claim is that people combine the correct (4a) with the related

  (4c) replace OLD (with/by NEW).

Now, there is certainly a sense in which the innovative variants have bits of stuff taken from two (or more) different places in English syntax.  And it's possible that occasionally such an innovation results from true syntactic blending, in which alternative formulations of the same content compete with one another in production, with the result that the actually produced expression has parts of both.  But in general, if the innovation is to be seen as a combination of two things, the combination is at a higher level, the level of patterns -- constructions -- not specific utterances-in-planning.

But I'm inclined to see even this pattern-combination account as gratuitously complex, given that EXTENSION OF PATTERNS to new items that have appropriate semantics is so common, as when donate is extended to the double-NP dative variant.

Why should people do this?  Aren't these just different ways of saying the same thing?  Maybe yes, maybe no, but linguists are here to tell the usage critics that when you have two non-subject arguments for a V, it's really useful to have alternative syntactic argument structures for them: whichever one serves as direct object is focussed on; whichever one comes first is more likely to be discourse-topical; and the different argument structures provide ways to put short before long (avoiding long things first, and, especially, short things last).

The details are different in each case, but in all of them we see speakers actively (though tacitly) re-shaping the materials of their language so as to increase the expressive capacity available to them -- not just balling things up.

Posted by Arnold Zwicky at 08:11 AM

Shaping up the dull-witted pagans

The book review pages of the official journal of the Linguistic Society of America are not always widely read even by active members of the society, so many linguists will not have noticed the stirring polemical writing to be found in the most recent issue of Language (vol. 83, no. 4, pp. 883-6). Interestingly, the target is linguists themselves. People like the honest workers here at Language Log Plaza, in fact. The reviewer is Professor Ronald Butters, and he seems to be fed up with being pushed around by language-loving sentimentalists.

Butters is reviewing Language in the USA (ed. by Edward Finegan and John Rickford, CUP, 2004), and he was irritated right up front by the foreword, by Language Log's own Geoff Nunberg. He refers to him as a "popular writer and public-radio personality" (hey, watch it, buddy! Nunberg has a linguistics PhD and a teaching job at Berkeley!). The complaint about Nunberg's "breezy, patronizing" foreword is that it

lectures the linguistically unsophisticated about 'chronic American blindness to the complexities of our sociolinguistic history and of the contemporary linguistic situation' ... and presses upon the linguistic novice the solemn significance of the enterprise ('Language in the USA will unquestionably be an important resource for policymakers and decision-makers, and it should make us all better citizens'...).

Butters then goes on to grumble about "the muddied, contradictory, and sometimes seemingly arrogant political center of parts of the book with respect to the complex issues surrounding linguistic discrimination, multiculturalism, language death, and the hegemony of English (particularly those varieties spoken and written by the rich and powerful)." Ever want to see some liberal diversity-loving multi-culti-lefty linguists get slapped about a bit, just so they could taste some of their own medicine? This review is for you!

[Update: and now I see that our own Eric Bakovic received his copy of Language some time ago, and discussed the very same passages I discuss below; but having overlooked this (his opening paragraphs were about a different subject), I wrote the following entirely independently. We overlap. Oh well. It happens.]

The tone of too much of the book (which Butters suggests is set by Nunberg) implies that "the linguistic cognoscenti know what is right and wrong when it comes to language issues, and the public is blind and ignorant and selfish, and they'd better shape up." Whoa! That's not how we sound here at Language Log, is it? All right, I guess I have to admit that I have frequently found my hands hovering over the keys, just about to type something like "Shape up, you members of the public!", though I hope in most cases I have restrained myself from lambasting our gentle readers too much. Maybe we do diss the public sometimes. Perhaps we should try to resist when demons whisper in our ears that we should call non-linguists blind and ignorant, try to mute that tendency in ourselves — even if we care so much about some of the issues involved that in due course we always go back to it.

What Butters wants to see, apparently, is less abuse of the ignorant and diversity-intolerant populace and more actual argumentation:

The linguists in this book for the most part take it as unquestionable and in need of no rational argument that (again, in Nunberg's words) 'efforts to preserve Native American languages' are always a simple social good; that 'a drift towards bilingualism' is not at all 'dangerous' in any important way; that 'common sense' notions about language 'usually amount to no more than myths and folklore ... hardly the grounds that you would want to rely on for making policy'.

Linguists have now hammered many generations of American students with our contrary opinions about normal people's linguistic beliefs, without notable success. The most pliant undergraduates may parrot such ideas in response to exam questions because they know their grades depend on pleasing the linguist. For the most part, though, they go right on believing what the general culture and ‘common sense’ leads them to believe. Perhaps the time has come to ask ourselves why this is the case.

Much of the problem is apparent in the rhetorical stances of many of the authors in this volume. They are preaching to the choir in a church full of dull-witted pagans from another, very wicked planet.

I'm not sure whether all of this is fair or not; but I know splendid rhetoric (and a thought-provoking point) when I see it.

And who's next for a slap? Professor Joshua Fishman, a distinguished expert on the sociology of language:

Consider JOSHUA FISHMAN in his chapter 'Multilingualism and non-English mother tongues'. Speaking of the absence of a multilingual tradition in the United States, he writes: 'the ... linguistic resources of the United States have always been so monstrously squandered and destroyed (at worst) or neglected (at best) that ... we have become an overwhelmingly monolingual English-speaking country... During the twentieth century, several world languages were caused or allowed to atrophy in the USA' (117). Novice undergraduates — not to mention Nunberg's 'policymakers and decision-makers' — are going to meet this sort of rhetorical stance with skepticism and confusion. Apart from a few asides about the necessity for Americans to know second languages in the global village, Fishman nowhere explains to his readers WHY the USA would be a better place if the primacy of English were less than it is today, or WHY the apparent gradual death of Yiddish (his example) is such a great national loss, or WHY Australia is better off because 'resettled ... Macedonian and Arabic ... speakers have successfully pursued and attained a ... measure of intergenerational mother tongue transmission' that Fishman finds acceptable (presumably, Australians have more respect for the linguistic traditions of Macedonians and Arabs than they do for their indigenous peoples). Students — and liberal humanities professors, for that matter — know in their hearts that the melting pot has always been the great American tradition, and that it has been viewed almost totally positively by everybody but linguists, and that there are powerful common-sense arguments in its favor. Dismissive scolding has little effect against such deeply ingrained ideologies.

  Linguistics, we tell our students, is a science in which we objectively study the language of people as they use it, with a deep respect for the intelligence and good sense of the users, regardless of the language or dialect that they have learned to express themselves in. Ironically, when it comes to studying the beliefs that people have about language and their conclusions about language policy and language planning, we are all too often lacking in objectivity and respect.

Warming to his theme, Butters addresses a paper on Native American languages and the extreme danger of some of them becoming extinct (a paper by Akira Yamamoto and Ofelia Zepeda that he agrees is useful), and he comments:

What they do not really explain is why this is necessarily anything other than a rather good thing. Shouldn't we WANT to 'integrate' — read 'absorb' — these worthy people into mainstream economic and cultural life? Isn't it just inevitable? Isn't that why I am a member of the educated middle class and not mucking around without indoor plumbing in some Swedish monolingual farm community like my mother's grandparents? Yamamoto and Zepeda's answer (177), to someone who believes in the prevalent American linguistic ideologies, seems both effetely romantic and hideously self-serving: (a) 'When we lose a language, it means a "tremendous loss to the cultural richness and distinctness of the native communities" (Goddard 1996:3)'; and (b) 'the loss of linguistic diversity is a loss to scholarship and science'. Most of the students and other naifs who may be forced to read this book come from families who wear nice clothes and live in nice houses with numerous electronic appliances and good foreign cars in the driveway; most of the rest come from families who are struggling to find the means to live that way. Should people really be forced (or even encouraged) to 'preserve' languages if to do so might stand in their way of achieving middle-class comforts — even if they get some vague additional promise of 'cultural richness' — simply because linguists want to be able to study the living languages?

Butters very definitely does not claim that all of the public's beliefs and attitudes regarding language are correct, and of course neither do I. But I think his strident rhetorical questions should be listened to and reflected upon, not just kicked aside as ill-tempered populist rant of the sort associated with Southern politicians talking about the threat from south of the border. Serious questions about the benefits (and perhaps the losses) of having an assortment of distinct native languages within one national society should be addressed through research that objectively determines and assesses the effects, not through emotional appeals to imagined cultural riches not vouched for by the language users themselves, or self-serving demands that aboriginal tongues be kept alive (by poor people) for (comparatively wealthy) linguists to study.

In short, widespread faith in the ideal of linguistic and cultural assimilation should — especially in a democracy — be treated with respect and considered thoughtfully, not snapped at as if it were ignorant bigotry.

An odd coincidence is that in the week that this issue of Language reached me, the obituary of the week in The Economist (February 9th) was about Marie Smith, the last speaker of the now extinct language Eyak. But far from echoing anything like the tough-minded what-economic-benefit thinking that Butters alludes to, the Economist obituarist's discussion of the Eyak language, though well-written and interesting, is entirely devoted to sentimental musing about its many words for trees and roots and spruce needles and resin and abalone and nets and mixing bowls, and the way the word for "leaf" was the same as the word for "feather", as if that were the crucial thing we needed linguistic diversity for. It even adds, apropos of Marie Smith's dream of a future revival of the Eyak language: "impossible, scoffed the experts: in an age where perhaps half the planet's languages will disappear over the next century, killed by urban migration or the internet or the triumphal march of English, Eyak has no chance."

So here, far from a ringing endorsement of Butters from the leading magazine of liberal capitalism, we have a complete reversal of Butters' picture: Mrs Smith, an ordinary woman from extreme poverty who brought up nine children, dreams of linguistic diversity and the survival of an isolated southern Alaskan tribal language, with its precious word demexch for a dangerously thin spot on the ice; while we linguists figure only as the scoffing experts who condemn the dream out of hand!

So which are we? Annoyingly preachy liberals insisting on the preservation of languages of no economic importance for our own aesthetic pleasure and scholarly attention? Or Anglophone triumphalists whose realist picture ranks Marie Smith's native tongue way below the introduction of indoor plumbing?

The answer is, actually, that we linguists are all sorts of people with all sorts of views. There are linguists working on native-language maintenance programs, and other linguists who think there is no point in that at all. There are linguists dedicating their lives to detailed description of aboriginal languages because they believe the few hundred souls who speak them should have access to translations of the Christian gospels, and atheist linguists who think missionary work is cultural-imperialist arrogance or even downright evil. There are linguists who rant at a supposed ignorant public of dull-witted pagans, and other linguists who, like Butters, call for a bit more reflection on the basis for the sermon. We'll do fine if we continue to read each other's work, and reflect on each other's points of view, and don't all shout at once.

Posted by Geoffrey K. Pullum at 06:36 AM

February 19, 2008

Burgeoning and otherwise

The NYT keeps track of the relative popularity of its online stories, by counting how often readers click on the "email" link; and today's Most Emailed story is Sam Roberts, "Celebrating the Semicolon in a Most Unlikely Location", about an anti-littering poster on the New York City subways that reads, in part, "Please put it in a trash can; that's good news for everyone".

Language Log readers will already know about this poster, because Geoff Nunberg discussed it here a week ago. And the Times story quotes Geoff's post: "Geoffrey Nunberg, a professor of linguistics at the University of California, Berkeley, praised the 'burgeoning of punctuational literacy in unlikely places.'" (In fairness, Geoff's post gave a hat-tip in turn to Sam Roberts.)

The other quotes in Roberts' story come from Louis Menand, Lynne Truss, Allan Siegal -- and one other linguist:

The linguist Noam Chomsky sniffed, "I suppose Bush would claim it's the effect of No Child Left Behind."

But I suspect that the reason for the story's "most emailed" status is neither the public's interest in punctuation nor the eminence of the authorities quoted, but rather the correction added this morning:

Correction: February 19, 2008
An article in some editions on Monday about a New York City Transit employee's deft use of the semicolon in a public service placard was less deft in its punctuation of the title of a book by Lynne Truss, who called the placard a "lovely example" of proper punctuation. The title of the book is "Eats, Shoots & Leaves" — not "Eats Shoots & Leaves." (The subtitle of Ms. Truss’s book is "The Zero Tolerance Approach to Punctuation.")

Posted by Mark Liberman at 01:18 PM

English vowel sounds and internal dialect translation

The 29 bus that stops outside my home in Edinburgh runs in one direction straight to my office at the university, and in the other direction to a place on the Firth of Forth called Silverknowes. How does one pronounce Silverknowes? English spelling, typically, provides no clue. Does the last syllable rhyme with owes and rose, or with cows and rouse? Is it identical with knows, or with now's (as in now's the time)?

I wanted to know, so I asked the bus driver. But what he said was neither of the pronunciations I was expecting. It was in between the two. I thought for a second that what he had given me was no use at all. But then with a bit of quick mental dialect translation I was able to figure it out, and I had the answer. Let me explain.

We'll need to use a few symbols from the International Phonetic Alphabet (IPA), without which I simply can't explain what happened. (Browser note: Firefox and Safari are more likely to be able to display the Unicode IPA symbols below than Internet Explorer is.)

  1. [ə] represents the unrounded mid central vowel sound known as schwa (a term from Hebrew grammar), which you hear in unstressed syllables like the first syllable of banana or material or polite or potato or tonight when casually pronounced. In Southern England dialects like mine, it also occurs in the diphthong in words like know (see below).
  2. [ɐ] represents a lowered variant of schwa, heard in traditional London dialect in words like young, and in the stressed first syllable of words like oven and brother. I am not going to distinguish between this and the open-mid unrounded back vowel represented in the IPA by [ʌ]. Transcriptions of London English that use [ʌ] are talking about the same vowel sound that I mean when I write [ɐ].
  3. [a] is a low unrounded front vowel heard in Spanish pronunciations of [mapa] (not English map, which is pronounced [mæp]). It occurs in many dialects (such as mine) in the diphthongs heard in my and cow.
  4. [u] will be used here, for simplicity, to represent both the half-close back rounded vowel [ʊ] heard in look, wood, could, push (also in the second element of the diphthong in words like cows) and the close rounded central vowel [ʉ] that occurs in various dialects of Scots, English, and Australian English (London speakers now generally pronounce goose as [gʉs], but I shall write [gus]). The differences between these two rounded vowels are important, but not germane to the story below.

So, to find out the correct pronunciation I simply asked the bus driver one day before getting off the bus: "How do you say the name of the place this bus goes to? Is it Silver[nauz], or Silver[nəuz]?"

And what he said was: "Silver[nɐuz]." Different from both of the pronunciations I had given him.

This looked like a real problem. The way he pronounced that crucial last syllable — [nɐuz] — would be just about typical for the pronunciation of knows in contemporary London English. But it would also be typical for a Standard Scottish pronunciation of now's. So what I had heard was crucially ambiguous between the two possibilities I needed to separate.

Everything depended on where the driver was from. And I couldn't just presuppose that the driver was not from London; there are plenty of people living and working in Edinburgh who grew up in southern England and speak some variety of London English; I'm actually one of them, despite my long years away in California.

For a second, I stared at the driver and wondered what to do; and then I suddenly saw that I could solve it. I thanked him and got off the bus.

How was the puzzle solved?

I realized, after that second of bafflement, that he had not said [sɪlvənɐuz]; he had said [sɪlvəɾnɐuz]. That sound [ɾ] before the last syllable — a lightly flapped r-sound like the one in Spanish para (IPA [paɾa]) — gave me all I needed.

You see, there are two great families of dialects in modern English: the rhotic ones and the non-rhotic ones. They are separated by many features in a multi-dimensional similarity space, but probably the most fundamental one governs whether the letter r in spellings is pronounced or is silent after the vowel of a syllable. In the rhotic dialects, mar and par and spar are entirely different words from ma and pa and spa. In the non-rhotic dialects this is not so; in fact when speakers from most parts of England say mar, par, spar they sound exactly the same as when they say ma, pa, spa. And non-rhotic dialect speakers say [sɪlvə] for silver; rhotic speakers have some variety of r-sound on the end.

Crucially for the solution of my puzzle, Scottish dialects (like many in Ireland and the West of England, and most American and Canadian dialects) are rhotic, but London English (like the speech of most of England, and Australasia, and some East Coast American varieties) is non-rhotic.

So that slight r-sound was the vital clue. If he pronounced the second syllable of silver in a rhotic way, he couldn't be a London speaker; he was a Scot. Therefore his [nɐuz] was not (for him) the diphthong of knows; it was his version of the diphthong in now's. Thus the last syllable of Silverknowes rhymes with cows, not with knows, which means that for me it is [nauz], not [nəuz].
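For readers who like to see the inference laid out mechanically, here is a minimal sketch of the reasoning above. The table and the function name are my own illustrative assumptions, covering only the two accents and the one diphthong discussed in this post; it is a toy, not a model of English dialectology.

```python
# Hypothetical, deliberately tiny table based only on the claims in this post:
# whether each accent is rhotic, and which word class the heard
# diphthong [ɐu] corresponds to for a speaker of that accent.
ACCENTS = {
    "London English": {"rhotic": False, "ɐu": "knows"},
    "Standard Scottish English": {"rhotic": True, "ɐu": "now's"},
}


def candidate_rhymes(heard, use_r_cue=True):
    """Return the word classes the heard [ɐu] could stand for.

    If use_r_cue is True, accents whose rhoticity does not match the
    heard form (an r-sound after the schwa of 'silver') are ruled out.
    """
    speaker_is_rhotic = "əɾ" in heard
    rhymes = set()
    for accent in ACCENTS.values():
        if use_r_cue and accent["rhotic"] != speaker_is_rhotic:
            continue  # this accent is incompatible with the r-sound evidence
        rhymes.add(accent["ɐu"])
    return rhymes


# Ignoring the r-sound, the driver's answer is ambiguous between the two.
print(candidate_rhymes("sɪlvəɾnɐuz", use_r_cue=False))
# Taking the r-sound into account leaves a single reading.
print(candidate_rhymes("sɪlvəɾnɐuz"))
```

The point of the sketch is only that the diphthong by itself underdetermines the answer, and that one independent cue about the speaker's accent is enough to resolve it.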

In the future, when saying Silverknowes to Scots I will say [sɪlvənauz], and they, accomplishing the usual amazing feat of (mostly unconscious) internal dialect translation, will hear me as saying [sɪlvəɾnɐuz].

(Of course, what I confirmed is merely that at least one of the pronunciations current around Edinburgh has -knowes rhyming with cows. There may also be people who use the "nose" pronunciation, of course. I'm not claiming that what I got from my lone informant is correct; I'm just telling the story of how I figured out what my lone informant was saying to me.)

You can research the relation between English dialects and check the transcription of a selection of simple words by using the wonderful collection of sound files and transcriptions on the Sound Comparisons site, created (with support from the Arts and Humanities Research Council of the UK) by an Edinburgh team headed by April McMahon and several of her colleagues, particularly Paul Heggarty (who did essentially all of the massive programming job to create the site) and Warren Maguire (who gathered most of the huge amount of dialect data through travelling and interviewing, and did about 30,000 extremely narrow transcriptions — Warren is a fieldworker with the emphasis on the work). I recommend a visit to the Sound Comparisons site for anyone who is interested in English phonetics (if they have a fast Internet link and a good audio system on their computer; use Firefox as your browser if you can, because Internet Explorer is known to be slower and worse, both visually and auditorily, in a number of ways, in its performance on this site; Warren's transcriptions are more careful than the ones I have used above, and whether their symbols all show up correctly will depend on details of which fonts your browser has access to. See John Wells' Unicode page for help.)

[Update: My thanks to April McMahon and John Wells for comments and consultation, and to Michael Davies, Matthew Rankine, and Jesse Tseng for correspondence.]

Posted by Geoffrey K. Pullum at 03:42 AM

February 17, 2008

More on Harper

M.J. Harper's publisher sent me a copy of The Secret History of the English Language. After reading about half of it, I put it away with a sigh. But since Sally Thomason has brought it up -- and revealed that Borders is featuring this little tract prominently on its New Books tables -- I guess I'll trot out a few of the more outrageous passages that I noted before giving up on it.

Harper finds the idea that Latin developed into the modern Romance languages too implausible to believe. But (p. 130)

Fortunately, there's a much more reasonable explanation that meets all the facts: Latin is not a natural language. When written, Latin takes up approximately half the space of written Italian or written French (or written English, German, or any natural European language). Since Latin appears to have come into existence in the first half of the first millennium BC, which was the time when alphabets were first spreading through the Mediterranean basin, it seems a reasonable working hypothesis to assume that Latin was originally a shorthand compiled by Italian speakers for the purposes of written (confidential? commercial?) communication.

So the history, according to Harper, is that English developed into French, which developed into Provençal, which developed into Italian; and then at some point, say around 400 B.C., some Italian merchants invented Latin as a form of shorthand. (Yes, this is really the historical sequence that he suggests.)

But wait, you might be saying to yourself, what about Plautus? Popular plays in shorthand? Fabulae! Mihi quidem hercle non fit verisimile.

Don't worry, though, Harper has a story to tell about that as well:

Does this not conjure up visions of shorthand-typists, left for several generations on a desert island, eventually beginning to converse in Pitman's? It would indeed be a ludicrous proposition, except that we actually possess historical records of a Mediterranean people learning to converse in a hitherto written language and managing to do so without difficulty in a single generation: the Israelis, with Hebrew, in the middle of the 20th century. Both the Ancient Romans and the modern Israelis were able to develop cohesive, aggressive and expansionist new states amidst a sea of hostile neighbours, and it must be assumed that the unique language played some part in this.

I think that Harper misses an obvious generalization here. Perhaps Romulus and Remus were actually Giudeeschi (Italian-speaking Jews), already familiar with the idea of passing a commercial shorthand off as an ancient language. Thus the whole Roman empire was really a Zionist false-flag operation that got out of hand!

You probably won't be surprised to learn that Harper is just as skeptical about biological evolution as about linguistic evolution:

.. one cannot demolish modern creation myths by direct methods of refutation because creation myths -- and academic paradigm theories in general -- are almost invariably, in the jargon, 'not falsifiable'. In essence they always contain some combination of circular argument and untestable assumption that renders them unassailable to normal evidential methods. [...]

Take the exemplar of all modern academic paradigms, the Theory of Evolution. There's no question that the theory is valuable in so far as it has led more or less directly to the creation of the modern Life Sciences, but, true or false, the theory no less certainly contains the seeds of its own infinite survival. Having adopted a properly scientific root-and-branch model of speciation in which ex hypothesi all species must be demonstrably linked to other species, it permits the indefinite opening of new categories whenever a species cannot be demonstrably linked to other species. This has the unavoidable corollary that nothing can ever discovered from now until the end of time that can ever call the model into question.

But in fact, I think, this is not true. Some particular model-instantiations can be shown to be almost certainly false -- that humans are not primates but lagomorphs, for example, or that French arose historically from English, or that Latin was originally a shorthand form of Italian.

The fact that many educated people apparently take this little tome somewhat seriously is an indication of the depth of my profession's failure to provide the public with a basic background in linguistics. And it's not only the management at Borders -- Helen Gordon, who seems to be an intellectual of sorts, wrote a review of the British edition of this book ("Dons divided" New Statesman 9/4/2006) that ends as follows:

Unusual, funny and provocative, Harper wears his learning lightly, but has a serious point to make. While admitting that his own theories about the early Brits "may or may not be acceptable", he warns that historical anomalies are routinely ignored by the academics we rely on to explain our past. Whatever your stance on the Anglo-Saxons (and Harper's suggestions are rather seductive), this fascinating book is a useful investigation into the ways in which history is constructed and the dangers of "unassailable" academic truths.

My only comment is that anyone who thinks that linguists are in the business of constructing "unassailable" truths can't have spent much time in their company.

The reviews of the British edition on the amazon.co.uk site are more numerous than those on the American site -- since the book has been out there for a couple of years -- and generally more interesting, so far.

The review by Greg Kochanski (a physicist turned linguist, and a very smart guy who has done interesting work) gives the lie to Harper's view of academics by signaling considerable open-mindedness:

... while there are forces for conformity in academia, there are also forces for revolution. If an academic discipline slides into slothful conformity, you can be sure it is because real proof is unobtainable, not because people are too blind to see it. If there were clear evidence, some ambitious junior lecturer would grab it, and use it.

So, don't take the book too seriously. It's probably wrong. There's certainly no known way to prove it right. Still, it has an interesting idea or two in there.

I think that Greg is too hopeful about the possibility that Harper might be on to something. After all, most theories turn out to be wrong, eventually, so if you just insist that everything everyone believes is complete nonsense, you will probably turn out to be partly right -- even if your own suggestions are even more preposterous.

The review by H.J. Lomax is less kind:

Whilst reading this stupid, stupid book, it became clear within the first few paragraphs that M. J. Harper must at some time have been dreadfully wronged by academe and borne a grudge ever since. I can only imagine that historians ran over his childhood pet, or that his father abandoned his family to become an etymologist. Whatever its cause, the deep and burning resentment this man feels is palpable. One could almost feel sorry for him if it wasn't for the overwhelming torrents of smug self-satisfaction that cascade from every page.

Let me note in passing that it's far from clear that M.J. Harper is male. The book's "about the author" blurb reads, in its entirety: "MJ Harper lives in London". (The amazon.co.uk site identifies the author as "Michael John Harper", but I can't find anything in the book or on the publisher's web site to indicate that this is true.)

I also wonder whether Lomax might be wrong about the author's motivation. My own hypothesis is that the whole thing was written over a drunken weekend, to win a bar bet:

Harper: It's unbelievable, my friend. No one knows anything anymore. Not anything worth knowing.
Drinking buddy: Oh come now. The general level of education has never been higher.
Harper: Not among the so-called intellectual classes, the idiots that publish and review and buy books. Why, I bet I could write a little tract arguing that French is historically derived from English, and not only get it published, but sell ten times more copies than your last laboriously-researched academic tome.
DB: French derived from English? You're not serious. You might as well argue that Latin was derived from Italian. Everyone knows that's impossible.
Harper: You don't understand -- no one knows anything, not anything that'll stand up to an authoritative poke in an anti-authoritarian voice. Hell, give me a typical modern humanist, and I can make her believe that Latin was invented by Italian speakers as a form of commercial shorthand. Or at least make her accept the idea as an interesting hypothesis.
DB: Latin a shorthand form of Italian? A hundred pounds says no reputable publisher will put it out, unless you frame it as a burlesque.
Harper: Oh, it'll be serious, believe me. You're on for that hundred quid. And how about a side bet on how many copies I sell?

 

[Update -- Sally Thomason writes:

I'm as sure as I can be without meeting Harper that he's a man: Mark Newbrook knows him, and also he's known as Mick Harper to his friends and/or correspondents.

Sally also believes that Harper's book is meant seriously. I'm disappointed -- it works much better as an academic version of Stephen Colbert, in my opinion. Here's hoping that the evidence of Harper's seriousness is just him staying in character. ]

[David Eddyshaw writes:

The mother of all bizarre linguistic books ever to find a publisher must surely be "Hebrew is Greek" by Joseph Yahuda. I came across this years ago in Arthur Probsthain's deeply respectable academic and Orientalist bookshop opposite the British Museum. I picked it up expecting a humorous book or spoof and was astonished to find that it does - or tries to do - exactly what it says on the tin. The readers' comments on Amazon are enough to make you lose all faith in democracy.

Well, on the American amazon site, there are only six comments. Four of them were written by "A Customer" who sounds suspiciously likely to be the author. One was written by "Aristotelis Ellinas", who has never reviewed anything else, and might also be the author, if he's not Gus Portokalos from My Big Fat Greek Wedding:

"Give me a word, any word, and I show you that the root of that word is Greek."
"Kimono, kimono, kimono. Ha! Of course! Kimono is come from the Greek word himona, is mean winter. So, what do you wear in the wintertime to stay warm? A robe. You see: robe, kimono. There you go!"
"The root of the word Miller come from a Greek word, millah, meaning apple, so there you go. And our name, Portokalos, is come from the word meaning orange. So today here, we have, apples and oranges. We all different now, but in the end, we're all fruit."

The remaining comment is by someone from Cyprus -- also his only review on amazon -- who writes only that "This is the almost impossible book to get hold of i have a copy if you are interested".

I'd say that democracy comes off pretty well, actually.]

[By the way, Harper's estimate of the relative compactness of Latin prose seems exaggerated to me. Looking on the web, I found a copy of Caesar's De Bello Gallico in Latin, and a copy of W. A. MacDevitt's rather ponderous English translation. After stripping formatting and other extraneous material from the copies in both languages, the English translation, far from being twice as long, contains about 28% more characters.

In general, responsible translation among modern languages results in an increase in length, because of the translator's attempt to render foreign nuances; and MacDevitt's translation is full of needless words ("at all times" instead of "always", "from that place" instead of "from there", etc.) so I think this overstates the expected degree of compression.]
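The comparison above is easy to repeat on any pair of texts. What follows is a minimal sketch, not the script I actually used: the function name and the whitespace normalization are assumptions for illustration, and the stripping of formatting and extraneous material is assumed to have been done already.

```python
def percent_longer(translation, original):
    """Return how much longer the translation is than the original,
    as a percentage of the original's character count.

    Runs of whitespace are collapsed first, so that line wrapping and
    indentation in the source files do not inflate either count.
    """
    t = " ".join(translation.split())
    o = " ".join(original.split())
    return 100.0 * (len(t) - len(o)) / len(o)


# If Latin really took up half the space, a translation would come out
# about 100% longer than the Latin. A translation of 128 characters for
# every 100 characters of Latin is 28% longer.
print(percent_longer("x" * 128, "x" * 100))

# The padding mentioned above, measured phrase by phrase:
print(round(percent_longer("at all times", "always"), 1))
```

Note that a figure of 28% more characters in the English means the Latin takes up about 100/128, or roughly 78%, of the space of the English, which is a long way from half.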

[Ray Girvan writes:

This looks to be the same Mick Harper who's a prominent member of the Applied Epistemology Library (http://www.applied-epistemology.org), a forum site for the discussion of off-the-wall theories, with a notable focus on Harper's book in its older incarnation, "The History of Britain Revealed: the shocking truth about the English language". The forums are secret in the sense that they demand a confidentiality agreement to sign up.

Given the prominent role of Applied Epistemology™ in Harper's book, it seems plausible that he's in fact the proprietor of that site (though I've never visited it and have no other information about it).

If Mr. Harper can raise one eyebrow at a time, and has a sense of humor, I could see a future for him as the host of a sort of intellectual Colbert Report, proving that Britain was originally colonized from North America, that disease causes germs, that cancer leads to smoking, that the North Pole is actually the South Pole, etc., and generally slaying sacred cows right and left. ]

Posted by Mark Liberman at 03:24 PM

More spellchecker fun


Two pieces of e-mail yesterday on the oddities of spellcheckers.  Can you guess what happened when

Piotr Orbis Proszynski put benefactive through his Outlook spellchecker; and

Bruce Webster wrote e-mail about a cat named Tigger via Thunderbird mail; and

the word in question wasn't in the spellchecker's dictionary?


Apparently, there's not much close to benefactive; the only thing Proszynski's spellchecker could suggest was generative.  My Word for Mac spellchecker does the same.  At least, Proszynski remarked, the spellchecker offered a linguistics term.

I tried a few more grammatical terms, more or less at random, on my Mac, and found that ergative and illative were in its dictionary, but (no surprise) semelfactive was not -- simulative was the only alternative offered -- nor was antipassive, for which the entertaining ant passive was offered.  Other terms elicited a richer set of substitutes:

allative: illative, ablative, elative, ablatives, allusive

inessive: inside, intensive, emissive, indecisive, inactive, insider, insides, immersive

But on to Webster and Tigger.  This time the surprise was what was IN the spellchecker's dictionary.

Tigger: nigger, rigger, digger, bigger, trigger, tiger

My own spellchecker won't go there (but it preserves caps):

Tigger: Tiger, Trigger, Tigers, Digger

Webster wonders, "do any spell checkers have lists of words that are in the dictionary but that they won't suggest?"  I see that my own spellchecker DOES have nigger in its dictionary.  Maybe it blocks the word as a suggested replacement, or maybe its search algorithm just treats it as too far from Tigger.

[Added 2/18/08: five people (so far) have written to verify that many spellcheckers, including various releases of the spellchecker for Microsoft Word, do indeed block taboo and slur vocabulary as replacements.  The alternative would be to remove these words from the spellchecker's dictionary.  But then, as Jeff Erickson points out to me, the program would underline these words in red when they appeared in text, thus drawing attention to them.]
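The blocking behavior is easy to sketch. What follows is a toy, not any vendor's implementation: it uses Python's standard difflib for the similarity search, and the word lists, the no_suggest set, and the function name are all hypothetical. The point is only that a word can be accepted as correctly spelled and still be withheld from the suggestion list.

```python
import difflib

def suggest(word, dictionary, no_suggest=frozenset(), n=5, cutoff=0.6):
    """Return up to n close matches from the dictionary, skipping blocked words."""
    w = word.lower()
    if w in dictionary:
        return []  # a known word is accepted as-is, so nothing gets underlined
    candidates = difflib.get_close_matches(
        w, sorted(dictionary), n=max(1, len(dictionary)), cutoff=cutoff
    )
    # Blocked words stay in the dictionary but are never offered as replacements.
    return [c for c in candidates if c not in no_suggest][:n]

dictionary = {"tiger", "trigger", "digger", "bigger", "rigger"}
no_suggest = {"rigger"}  # hypothetical stand-in for a word the checker will not offer

print(suggest("Tigger", dictionary))              # unfiltered suggestions
print(suggest("Tigger", dictionary, no_suggest))  # same list minus the blocked word
print(suggest("rigger", dictionary, no_suggest))  # in the dictionary, so not flagged
```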

[Added 2/18/08: Bexquisite reports that a Firefox spellchecker suggested as replacements for fuchsia: Auschwitz, obfuscate.  Not at all helpful.]

Posted by Arnold Zwicky at 02:04 PM

The History of English Obscured

I was startled the other day when I wandered into Borders and saw, prominently displayed on the central "new books" table, M.J. Harper's book The Secret History of the English Language (Brooklyn, NY: Melville House Publishing, 2008). I didn't have time to give it a thorough inspection, but it looks like the same book, except for the title, as M.J. Harper's The History of Britain Revealed (London: Nathan Carmody Independent Publishers, 2002). It's possible that I'm the only Language Log reader who has encountered this bizarre book, but the current amazon.com Sales Rank (127,198) of the 2008 publication guarantees that a lot of people out there are getting to know it. That Sales Rank, and the readers' reviews of the earlier version on amazon.com, provide yet another sad piece of evidence that linguists are not succeeding in getting the word out to the general public about the nature of language -- in this case, the nature of language change.

The reason I'm familiar with Harper's 2002 version is that Mark Newbrook and I reviewed it for The Skeptical Intelligencer in 2004. Here's how our review begins:

In this curious little book Harper proposes a radically revisionist view of the history of the modern English language, continuing his record of promoting drastically nonstandard historical theories. Here he argues that Modern English, while related to Old English, is not descended from it (and that Middle English never existed, except as a highly artificial literary variety). Modern English, according to Harper, has been in existence since ancient times, and is in fact the ancestor of most modern western European languages. On page 134 he presents a family tree in which English, at the apex (or root), splits on the one hand into French and thence into Provençal, Catalan, Spanish, Portuguese, Italian, and (in parentheses) Latin, and on the other hand into German, from which Anglo-Saxon springs. In Harper's schema, Latin was thus not the ancestor of the Romance languages, but was instead an invented language. A further upshot of all this is, as he himself emphasises, that the vast majority of etymologies traditionally given for English words are wrong.

You don't believe anyone could write such things? You think we were exaggerating to make Harper look silly? Not so. Here's the blurb you'll find on the amazon.com website for The Secret History:

In a hugely enjoyable read, not to mention gloriously corrosive prose, M.J. Harper slashes and burns through the whole of accepted academic thought about the history of the English language. According to Harper: The English language does not derive from an Anglo-Saxon language. French, Italian, and Spanish did not descend from Latin. Middle English is a wholly imaginary language created by well-meaning but deluded academics. Most of the entries in the Oxford English Dictionary are wrong. And that's just the beginning. Part revisionist history, part treatise on the origins of the English language, and part impassioned argument against academia, The Secret History of the English Language is essential reading for language lovers, history buffs, Anglophiles, and anyone who has ever thought twice about what they've learned in school.

I have to confess that I did not enjoy Harper's "gloriously corrosive prose". I also didn't enjoy the quotes Amazon gives from various newspapers about the book: "Unusual, funny, and provocative...This fascinating book is a useful investigation into the ways in which history is constructed and the dangers of unassailable academic truths" (New Statesman); "Mind-blowing, incredibly entertaining stuff....A well-written and entertaining book" (Daily Mail); and "The best rewriting of history since 1066 and All That" (Fortean Times). That last one reads almost as if they think Harper meant his book as a spoof; 1066 and All That is in fact a hilarious spoof. But I'm pretty sure Harper isn't kidding. I think he's serious. Faithful Language Log readers have seen other examples of credulous journalists, for instance here, so it's not a big surprise that the New Statesman finds the book "useful".

And at least some other readers are also convinced that Harper's onto something good. Amazon doesn't yet have any reader reviews on the 2008 publication, but here's how the three reviewers of the 2002 version assess the book on amazon.com: #1 (5 stars): "...one spicy little zinger of a criticism....It's shocking, incisive, sometimes profane, gratuitously insulting, sarcastic, gleefully perverse -- and ultimately, disconcertingly plausible. In the end, it just makes so damn much sense." #2 (5 stars): "Common teaching has it that the Anglo-Saxons invaded Britain and forced the native population to speak the language we now call English....This goes quite against the evidence elsewhere, in which invading forces failed to have any significant effect on the native languages, other than a few words and phrases borrowed from the conquered peoples. ...He challenges us...never to accept that something is true just because a lot of people with letters after their name claim it to be true. Overall, a highly recommended read for anyone who's interested in the study of languages, or history in general." #3 (3 stars): "It's a needlessly insulting little book, and at least partially incorrect. Still, it's an interesting book because it may be partially right. I hope the right people read it, and manage to overlook the insults and errors."

I was curious about the publishers, so I checked them on the web, but they didn't offer any insight into the publishers' motives (if any, other than making money). Nathan Carmody, according to Google, is "A small publishing company specialising in unorthodox and revisionist works of an academic nature". That description is accompanied by a link to Nathan Carmody's website; following that link gets you only to a page entitled "WWW.INFAMOUSADVENTURES.CO.UK" -- where there's not a book in sight. Melville House Publishing seems to be a respectable and innovative publisher, but I couldn't find any mention of Harper's book on their website, presumably because they haven't updated the site since the book came out.

In any case, we find again that Harper and his ilk make "so damn much sense", while linguists, contaminated by Establishment connections (like, academic training and jobs relevant to that training), are hide-bound types blinded by meaningless tradition, and therefore sure to be wrong. (One might ask why the public is so ready to believe someone who asserts that "the experts" are deluded on a whole raft of issues but who offers not one single piece of evidence in support of his assertions; but one would be foolish to bother asking.) A consolation is that David Crystal, a highly-respected linguist whose numerous books written for the general public are all excellent, has a new book out with an Amazon Sales Rank of 57,081 -- The Fight for English: How Language Pundits Ate, Shot, and Left (Cambridge University Press, 2008). And Steven Pinker's 2007 book The Stuff of Thought: Language as a Window into Human Nature (Viking, 2007) is ranked at #1,078. So we don't have to conclude that we're fighting a losing battle with unreason.

Posted by Sally Thomason at 12:08 PM

60%, 0%, whatever

In yesterday's post on "Behindology", I suggested that the Italian national pastime of dietrologia -- the "technique of the double, triple, quadruple hypothesis" that aims "to detect, behind the apparent causes, true and hidden designs" -- is nothing but Theory of Mind (ToM) gone wild. (Or perhaps it's ToM responding appropriately to Italian socio-political realities, I'm not sure.) In any case, the reason that we've discussed ToM some 30 times in other LL posts should be obvious from the definition that I gave a few years ago ("Mind-reading fatigue", 11/8/2003):

Theory of mind is a term introduced by Premack and Woodruff (1978) to refer to a set of abilities that may be uniquely human: to attribute mental states such as beliefs, knowledge and emotions to self and others; to recognize that the mental states of others may differ from one's own; to use these attributed states to explain and predict behavior; and to predict how such mental states would be affected by hypothetical actions.

Humans may not be the only animals with ToM abilities -- it's surprisingly hard to tell -- but I think everyone agrees that we're a lot better at this than chimps and gorillas. So it's not surprising that there's been a lot of interest, over the past couple of decades, in the idea that ToM abilities might be an evolved cognitive specialization, a sort of "mental module". It's been suggested that this ability is neurologically localized (the paracingulate cortex has been mentioned), and that at least some parts of the autism spectrum might be related to ToM deficits.

As a result, there are many reasons to be interested in "twin studies" that are designed to tease apart the genetic and environmental influences on ToM abilities. And if such studies are set up to distinguish ToM abilities from general verbal abilities, so much the better.

Unfortunately, as we've mentioned a number of times recently, such studies are quite difficult to interpret. And in the course of looking for something else, I recently stumbled over a really surprising example of these problems.

I'm going to leave the detailed analysis for another post. But today, I want to set the stage by quoting the quantitative conclusions of two studies with the same first author, published six years apart, which used the same experimental design (ToM and verbal IQ tests on monozygotic vs. dizygotic twins) and the same statistical method (analysis of variance), but came up with radically different estimates of the genetic contribution to individual differences in ToM abilities.

Claire Hughes & Alexandra Cutting, "Nature, nurture and individual differences in early understanding of mind" (Psychological Science, 10, 429–432, 1999) summarize their results like this:

The model attributes the variance in understanding mind to substantial genetic influence [60%], negligible shared environmental influence [7%], and moderate nonshared environmental influence [33%]. Of particular interest are genetic influences on theory of mind and verbal IQ. Two thirds of the genetic variance in theory of mind was independent from genetic variance in verbal IQ (.63²/(.45² + .63²)). In other words, only about a third of the genes influencing theory of mind also influence verbal IQ.

Smells like an evolved mental module, right?

But now read the statistical summary from Claire Hughes, S.R. Jaffee, F. Happé, A. Taylor, A. Caspi & T. Moffitt, "Origins of Individual Differences in Theory of Mind: From Nature to Nurture", Child Development 76(2): 356-370, 2005.

As Table 1 shows, MZ and DZ correlations for ToM were identical (r=.53), suggesting substantial shared environmental influence but negligible genetic influence on individual differences in ToM. Table 2 summarizes the goodness-of-fit statistics and parameter estimates from the quantitative genetic modeling of these data. The proportion of variance accounted for by the latent genetic and environmental factors can be calculated by squaring each of the parameter estimates. For example, genetic influences accounted for 7% of the variance in children's ToM (i.e., .26 × .26). The strongest influences on individual differences in ToM were shared and nonshared environmental factors, which accounted for 48% and 45% of the variance, respectively.

Because genetic influences on ToM were nonsignificant in the full univariate model, we tested the fit of a more parsimonious model in which these genetic effects were hypothesized to be zero. The fit of the reduced model was not significantly different from the fit of the full model, χ²diff(1) = .81, ns. Thus, genetic factors do not account for significant variation in 60-month-old children's ToM.
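The step from "identical MZ and DZ correlations" to "negligible genetic influence" can be made concrete with Falconer's textbook approximation. This is a back-of-envelope stand-in, not the model fitting reported in the paper (whose fitted estimates are the 7%, 48% and 45% quoted above); the function name is mine, and the formulas assume the usual simplifications of the classical twin design.

```python
def falconer(r_mz: float, r_dz: float) -> tuple[float, float, float]:
    """Rough variance shares from twin correlations (Falconer's approximation)."""
    h2 = 2 * (r_mz - r_dz)   # genetic: MZ twins share twice the segregating genes of DZ twins
    c2 = 2 * r_dz - r_mz     # shared (familial) environment
    e2 = 1 - r_mz            # nonshared environment plus measurement error
    return h2, c2, e2

# Correlations quoted from Table 1 of the 2005 paper: r = .53 for both MZ and DZ.
h2, c2, e2 = falconer(0.53, 0.53)
print(f"genetic {h2:.2f}, shared env {c2:.2f}, nonshared env {e2:.2f}")
```

When the two correlations are equal, the genetic term is zero by construction, and everything the twins have in common is credited to the shared environment.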

And the summary from the same paper of the results of fitting a bivariate model:

In this large sample of 1,116 pairs of 60-month-old twins, 44% of the variation in ToM scores was accounted for by ToM-specific nonshared environmental influences, 20% by ToM-specific shared environmental influences, 21% by common shared environmental influences on ToM and verbal ability, and 15% by common genetic influences.

[Note that "shared" here means "familial"; and "common" means "influencing both verbal ability and ToM to the same degree".]

So quantitative estimates for the proportion of variance due to genetic influences (whether common to ToM and verbal ability or not) range from 0% to 60%, with stops along the way at 7% and 15%. (Another study, which I've spared you, has other intermediate values as well.)
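The percentages in the quotations can be checked directly from the path coefficients, since each variance share is a squared coefficient. The sketch below assumes that the 1999 coefficients are .63 (ToM-specific genetic path) and .45 (genetic path shared with verbal IQ), each squared in the quoted expression; the variable names are mine.

```python
# 1999 paper: genetic paths for ToM (assumed reading: .63 specific, .45 shared with verbal IQ)
a_specific, a_common = 0.63, 0.45
genetic_1999 = a_specific**2 + a_common**2      # total genetic share of ToM variance
specific_share = a_specific**2 / genetic_1999   # fraction of it independent of verbal IQ

# 2005 paper, univariate model: genetic path estimate .26
genetic_2005 = 0.26**2

# 2005 paper, bivariate model: the four quoted components, in percent
bivariate = {"specific nonshared env": 44, "specific shared env": 20,
             "common shared env": 21, "common genetic": 15}

print(f"1999 genetic share:  {genetic_1999:.2f}")
print(f"1999 specific share: {specific_share:.2f}")
print(f"2005 genetic share:  {genetic_2005:.2f}")
print(f"2005 bivariate total: {sum(bivariate.values())}%")
```

On this reading the 1999 coefficients reproduce both the 60% and the "two thirds" in the first quotation, and the four bivariate components account for all of the variance.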

What's going on? The main difference in the two papers is the subject pool: the 1999 paper involved 119 pairs of 4-year-old twins, apparently selected without any special demographic criteria; while the 2005 paper was based on 1,116 pairs of 5-year-old twins, selected to be representative of the SES demographics of the UK as a whole.

Why should this have made such an enormous difference? Stay tuned to find out.

[ Cosma Shalizi writes:

Actually, substantial genetic variance is exactly what I would _not_ expect from an _evolved_ mental module. Imagine a trait under directional selection, say positive (to map on to the ToM case). Since high ToM scorers have higher fitness, and high ToM scores are at least partly genetic, the next generation is disproportionately descended from those with high-ToM-score genes, and their genetic variance in ToM is reduced. Iterate until remaining differences in ToM no longer have substantial fitness effects, and you are left with a very small amount of genetic variance in ToM. I suppose it's possible that the environmental variance could be shrinking as well, so as to keep the heritability constant or even increase, but the magnitude of the genetic variance should be small either way.

Alternately, one could conjecture that ToM is not an evolved but an _evolving_ mental module, _recently_ brought under selection. But this gets us into Julian Jaynes territory...
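The "iterate until" argument can be illustrated with a deliberately oversimplified model: one locus, two alleles, and a constant fitness advantage s for the favored allele. All parameter values and function names here are hypothetical, and a real polygenic trait with mutation, drift and trade-offs (the complications raised in the response that follows) would behave less tidily.

```python
def next_freq(p: float, s: float) -> float:
    """Frequency of the favored allele after one generation of selection."""
    return p * (1 + s) / (1 + p * s)

def genetic_variance(p: float) -> float:
    """Variance contributed by one biallelic locus, proportional to p(1 - p)."""
    return p * (1 - p)

def simulate(p0: float = 0.5, s: float = 0.1, generations: int = 100):
    """Return (frequency, variance) pairs for generation 0 through the last one."""
    history, p = [], p0
    for _ in range(generations + 1):
        history.append((p, genetic_variance(p)))
        p = next_freq(p, s)
    return history

history = simulate()
for gen in (0, 25, 50, 100):
    p, var = history[gen]
    print(f"generation {gen:3d}: p = {p:.4f}, variance = {var:.5f}")
```

Starting from p = 0.5, the favored allele heads toward fixation and the variance at the locus shrinks toward zero, which is the reduction in genetic variance that the argument describes.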

Certainly an ability that's never been under selective pressure would be expected to show quite a bit of genetic variation. So the fact that a DZ/MZ twin study shows "substantial genetic influence" on individual variation in performance is a lousy argument for past adaptation, since a trait that selection has never operated on ought to be quite variable.

And as Cosma notes, genetic influence on a variable ability is also consistent with evolution in progress. It seems plausible that ToM abilities are still under active selection, since there's quite a bit of individual variation, and clearly some effects on survival and reproductive success. And we don't need to get into Julian Jaynes territory to think that ToM evolution is still underway -- this process might have been going on for several hundred thousand (or even several million) years. (Jaynes believed that consciousness originated less than 3,000 years ago, when the invention of writing fragmented the "bicameral mind", the natural human condition in which language had evolved to communicate racial memories from the right hemisphere to the left hemisphere of the cerebral cortex, which our ancestors interpreted as advice in the voices of the gods. Really.)

But finally, for many evolved traits, the relevant fitness landscape is complex, so that the target of adaptation will not be a genetically uniform population. In the case under discussion, perhaps there are trade-offs between ToM abilities and susceptibility to paranoia, or self-esteem problems, or hierarchical position, or whatever, so that the equilibrium situation retains a lot of genetic variation.

So all this means that "substantial genetic variation" in individual abilities of a given sort is consistent with pretty much any evolutionary status -- no selection, selection far from equilibrium, selection at equilibrium, past selection no longer active, and so on.

In this case, we start with an experimental and statistical mystery: how can one study find 60% of ToM variance accounted for by genetic factors, while another study (with essentially the same design, the same test instruments, and the same statistical modeling, but just a different sample) yields an estimate of 7%?

But whether the answer is 60% or 7% or (somehow) both (e.g. because of radical differences in the subject pools), Cosma is right: we need a lot more information before we can come to any conclusions about the evolutionary interpretation. ]

Posted by Mark Liberman at 09:46 AM

February 16, 2008

For you, broccoli rabbi, but NO BIKES


Three unconnected observations from the recent scene:

Item 1: My granddaughter Opal (almost 4) in contention with her friend Henry (4 1/2) when she said she'd draw a valentine for him.

Item 2: A Gordon Biersch menu offering:

"Tawny" sirloin ... atop roasted sliced potatoes and broccoli rabbi ...

Item 3: A notice taped to one of the doors of the Stanford building I teach in this quarter (yes, all caps, bold face, AND underlined):

DO NOT BRING
BIKES INTO THE
BUILDING

IF THIS HAPPENS
AGAIN IT WILL
BE REMOVED,
WHETHER OR
NOT IT IS
LOCKED


Item 1.  Valentines.  The incident was reported on by my daughter on the Armstrong-Zwicky family blog:

[Opal] was drawing valentines for people yesterday, and said she'd draw one for Henry. He objected; he wanted to do it himself, he didn't want her to do it for him. She objected; she didn't want him drawing on her paper. Much howling ensued, mostly on Henry's part, but also on Opal's as she said "I just wanted to make him a Valentine and now he's being mean to me!" Henry never fully accepted the proposition that drawing a Valentine for someone was not usurping their rights but doing them a favor.

Ok, YOU try explaining to someone -- in particular, to a 4 1/2-year-old (Henry's father did his best) -- the two readings of for you in

I'll draw a valentine for you.

Opal intended the benefactive/recipient reading of

I'll draw you a valentine.  [the "dative-moved" variant]
or
I'll draw a valentine that is for you.

but Henry heard instead the substitutive 'instead of' reading of

[You can't draw a valentine, so] I'll draw one for you / in your place.

Now, in many circumstances the choice between the benefactive and substitutive readings of for will be biased by accent:

I'm doing this for YOU.  [probably benefactive]
I'm doing this FOR you.  [almost surely substitutive]

(Of course, in context, the other reading could pop out instead.)

What I hadn't realized until I heard the sad story of Opal, Henry, and the valentine card was that sometimes even accent won't do the trick.  Opal almost surely accented the for --

I'll draw a valentine FOR you.

and without further context, this could go either way.  Opal was in the midst of drawing people valentines (she drew me one), so of course the substitutive reading didn't occur to her.  Henry, on the other hand, is, like little kids in general, accustomed to having people assume that he can't do things (or can't do them very well) and offer to do them for him (an offer he usually staunchly rejects), so the benefactive reading didn't occur to him.  If only she'd used the dative-moved variant.

Item 2.  Rabbi.  The startling broccoli rabbi (for broccoli rabe, a bitter green) is not just a one-shot glitch at Gordon Biersch; I recently got 39 (dupes removed) Google webhits for it (including a number from recipes and restaurant menus).  Granted, this is versus 900 for broccoli rabe, but it's still a significant number for such a remarkable error.

There are several ways broccoli rabbi could have arisen, and it's entirely possible that different mechanisms were at work in different occurrences.

It could be a "completion error", a typo that results when you start writing or typing a word and then drift part-way into another word.  I do this all too often with -ation and -ating words -- starting the verb COOPERATING but ending up with COOPERATION, for instance.  And several people have reported on the American Dialect Society mailing list that their intention to type LINGUISTS frequently leads them into LINGUISTICS, which then has to be truncated.  (This discussion on ADS-L followed my typing "original Broadway case", with CASE instead of CAST, and commenting on it.)  So RAB- ends up being completed by -BI.

Or it could be the result of automatic word completion (essentially the automated version of the typo), assuming that RABE is not in the software's dictionary, or is marked as much less frequent than RABBI.

Similarly, it could be a Cupertino "correction" of RABE to RABBI, assuming that RABE is not in the spellchecker's dictionary.  (We've been writing about the Cupertino effect on Language Log for two years now, beginning with a posting by Ben Zimmer and continuing through about fifteen more, from various hands.)  I would have expected a spellchecker to go for RAPE or RAVE rather than RABBI, though; in fact, the spellchecker on my Word for the Mac suggests: ROBE, RUBE, RAPE, RAVE, RABBI, RAGE, RACE, RABBET.

All three mechanisms would lead us to expect at least a few errors with RABE replaced by RABBIT, a word I assume is even more frequent than RABBI.  But there are no relevant hits for broccoli rabbit (though you can find an interesting-sounding recipe for a rabbit dish with broccoli in it).
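One way to see why RAPE or RAVE looks like a more natural "correction" than RABBI is to count single-letter edits. The sketch below computes plain Levenshtein distance from RABE to the candidates mentioned above. That spellcheckers rank purely by edit distance is an assumption made for illustration; real ones also weigh frequency, keyboard layout and pronunciation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep a match
        prev = curr
    return prev[-1]

candidates = ["ROBE", "RUBE", "RAPE", "RAVE", "RABBI",
              "RAGE", "RACE", "RABBET", "RABBIT"]
for word in sorted(candidates, key=lambda w: levenshtein("RABE", w)):
    print(word, levenshtein("RABE", word))
```

By this measure the four-letter candidates are all one edit away, RABBI and RABBET two, and RABBIT three. Word's list places RABBI ahead of RAGE and RACE, so its ranking evidently uses something more than raw edit distance.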

It could, of course, be an eggcorn, with the unfamiliar word rabe replaced by the more familiar rabbi.  It would be nice to know if any of the people who came up with broccoli rabbi have some sort of story in mind in which rabbis are involved.  But I suspect that it's a "demi-eggcorn", a re-working of an expression by replacing a semantically opaque element by some similar meaningful expression, even if that doesn't contribute to the meaning of the whole.  The classic English folk etymology sparrow grass for asparagus has one clear eggcorn piece, grass for -gus (asparagus spears resemble grass stalks, and the asparagus in flower resembles a fluffy grass; but the many species of asparagus are actually in the lily family), and one demi-eggcorn piece, sparrow for aspara-.  Who knows how sparrows are involved?  But at least sparrow is a recognizable word of English.

In a similar vein, who knows what rabbis have to do with the bitter greens in question?  But at least rabbi is a recognizable word of English.

Though I suspect that rabbi is some ordinary kind of error, Ned Deily has suggested to me that it might spring from a dialect form in German -- maybe rabi [ra:bi] or even rabbi [rabi] -- meaning 'turnip' (broccoli rabe and turnips are both in Brassica rapa subspecies rapa -- the brassicas are something of a taxonomic morass).  After all, there's kohlrabi.

But kohlrabi, the OED tells us, goes back to Italian cavoli rape, which is the plural of cavolo rapa 'cole-rape', with its first element altered through the influence of German Kohl 'cabbage' (yet another brassica) -- an element clearly visible in modern English cole slaw (made from cabbage) and more distantly discernible in cauliflower (another brassica) and kale (of several types -- still more brassicas).  (No, the -col- piece in broccoli is not an occurrence of this element.  Broccoli is a diminutive of brocco 'shoot, stalk'.)

So maybe an Italian (rather than German) dialect was the source.  I've found the following as possibly relevant variants of broccoli rabe:

broccoli raab [a spelling that represents clearly the pronunciation of rabe -- an Italian dialect pronunciation of rapa or rape]

raab
broccoli rape
broccoli di rape
rape
rapini
rappone
cime di rapi
rappi

[In recent discussion on the ADS-L, Alison Murie suggested that the aa in raab might have influenced the bb in rabbi.  A certain number of spelling errors in English arise from people's recollection that "there's a double letter in there somewhere", and aa is infrequent in English spelling, while bb is unremarkable.  Possibly.]

[Further side note: rape [rep] as the name for these greens in English has an understandably unhappy history.  Even rapeseed oil, for the cooking oil made from the seeds of the rape plant, is edgy -- which is why we now have canola oil, made from a variety of rapeseed originally developed in Canada.]

The variants in the list above edge close to rabi or rabbi in Italian (or German) but don't quite make it.  I wouldn't be at all surprised if some Italian or German dialect with rab(b)i for 'turnip' and/or 'broccoli rabe' turned up.  But I find it hard to credit that the menu writers at Gordon Biersch's corporate headquarters -- who distributed this spelling to GB's 17 locations (from Honolulu to Atlanta, Miami, and the DC area) -- were drawing on dialect names for the greens in Italian or German.  I hold to some version of the (demi-)eggcorn account.

I should add that the staff of my local GB have been fascinated by my interest in this oddity on their menu.  Of course they have no idea how it originated.  And, though I'll forward a link to this posting to the relevant staff people at corporate headquarters (with a copy to the locals), I don't expect to be illuminated.  Mostly, when you ask people about such things, you get one of two responses:

(a) Isn't that how you say/spell it?

(b) Oh, I have no idea where that came from.

Reasonably enough.  Ordinary people shouldn't be expected to reflect on the sources of what they say or write; they're too busy unconsciously picking stuff up from what they hear and read, and then saying and writing it.

Item 3.  NO BICYCLES.  This notice is certainly inept, in two ways, both having to do with the design of the wording for its audience.

First way: the shift from plural bikes ("do not bring bikes into the building") to singular it ("it will be removed, whether or not it is locked").  It's just fine to prohibit bikes [generic plural] in the building, and to use a generic singular bike following this plural ("any bike in the building will be removed").  It's also fine to lead with a plural ("do not bring bikes into the building") and follow that with an anaphoric pronoun they ("they will be removed").  It's also fine to lead with a generic singular ("do not bring a bike into the building") and follow that with an anaphoric pronoun it ("it will be removed").  But if you lead with the plural, a singular it is very hard to interpret.  A well-intentioned reader can work out what must have been intended, but it takes real work.

Second way: "if this happens again", with its reference to a history that only particular miscreants might have access to.  A well-intentioned reader will speculate that the building supervisor (the writer of the notice) was writing in medias res and will supply a plausible background story: oh, this must be in response to some specific incident(s) in which someone brought a bike into the building, and it's aimed directly at the offender(s) -- though it will also serve as a warning to everyone else.

It would have been so much clearer to jettison the previous history and to stick to one number in both parts of the notice.  Here are two (of several) possibilities:

  Do not bring bikes into the building.  If you do, they will be removed ...

  Do not bring a bike into the building.  If you do, it will be removed ...

[Added 2/18/08: several readers have noted that the notice can easily be read as a threat to remove the BUILDING, whether or not it is locked.]


Posted by Arnold Zwicky at 01:04 PM

Behindology

The first chapter in Tobias Jones' The Dark Heart of Italy is "Parole, Parole, Parole" ("Words, Words, Words"). And my favorite word in the chapter is dietrologia, which Jones explains as follows:

As I began studying postwar Italian history, it became obvious that surrounding any crime or political event, there are always confusion, suspicion, and "the bacillus of secrecy." So much so that dietrologia has become a sort of national pastime. It means literally "behindology," or the attempt to trump even the most fanciful and contorted conspiracy theory. Dietrologia is the "critical analysis of events in an effort to detect, behind the apparent causes, true and hidden designs." [Carlo Ginzburg, The Judge and the Historian.] La Stampa has called it "the science of imagination, the culture of suspicion, the philosophy of mistrust, the technique of the double, triple, quadruple hypothesis." It's an indispensable sport for a society in which appearance very rarely begets reality.

Later in the book, Jones quotes Adriano Sofri, imprisoned for a political murder he claims not to have committed: "Dietrologia ... exalts an intricate intelligence".

The mode of reasoning in dietrologia is abduction: inference to the best explanation. Abduction is a good thing -- it's a key component of the machinery of science -- so why does abduction here lead to "fanciful and contorted" theories? There seem to be two problems:  a lack of effective coupling to the friction and inertia of fact, and an excessive value placed on indirect and even counter-intuitive explanations.

The sad fact is that science itself is sometimes subject to similar problems. And so is everyday life, even for those of us who are not convinced that the moon landings were faked or that the 9/11 attacks were staged by the CIA and the Masons. There's a well-known paper (Hobbs, Stickel, Martin and Edwards, "Interpretation as Abduction", ACL 26, 1988), which argues that

... the interpretation of a sentence is the least-cost abductive proof of the logical form of the sentence. That is, to interpret a sentence one tries to prove the logical form by using the most salient axioms and other information, exploiting the natural redundancy of discourse to minimize the size of the proof, and allowing the minimal number of consistent and plausible assumptions necessary to make the proof go through. Anaphora are resolved and predications are pragmatically strengthened as a by-product of this process.

Another approach of the same sort is Wilson and Sperber's Relevance Theory:

Relevance theory may be seen as an attempt to work out in detail [the claim] that an essential feature of most human communication, both verbal and non-verbal, is the expression and recognition of intentions [...] [This] inferential model of communication [is] an alternative to the classical code model. According to the code model, a communicator encodes her intended message into a signal, which is decoded by the audience using an identical copy of the code. According to the inferential model, a communicator provides evidence of her intention to convey a certain meaning, which is inferred by the audience on the basis of the evidence provided. An utterance is, of course, a linguistically coded piece of evidence, so that verbal comprehension involves an element of decoding. However, the linguistic meaning recovered by decoding is just one of the inputs to a non-demonstrative inference process which yields an interpretation of the speaker's meaning.

But our inferences about the beliefs, goals and intentions of others are seriously underdetermined by their actions and statements. And those of us who think we're more insightful than average may just be more imaginative. We all know people whose interpretation of communicative interaction is dietrological. Usually the "true and hidden designs" that they infer are negative ones, amounting in extreme cases to paranoia; but sometimes the "fanciful and contorted" theories are positive ones, for example those associated with delusions like erotomania.

The trouble is, sometimes other people's intentions really do merit the evaluation of double, triple, quadruple hypotheses. But as far as I know, the practical question of how to prevent interpretive abduction from turning delusional hasn't been seriously investigated by any of the various research communities that study communicative interaction. Just aiming to "minimize the size of the proof" seems likely to overvalue behindological fantasies. Appropriate weighting by accurate prior probabilities would be one way to stay sane -- but what if such priors aren't available?
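A toy sketch may make the contrast concrete. Everything in it is invented for illustration: the two hypotheses, their assumption counts, and the probabilities are assumptions of mine, not figures from Hobbs et al. or from Relevance Theory. It compares picking the explanation that needs the fewest assumptions with picking the one that scores best once an (assumed) prior is multiplied in.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str
    assumptions: int   # unproven assumptions the explanation needs (made-up count)
    prior: float       # assumed prior probability of the explanation
    likelihood: float  # assumed probability of the evidence given the explanation


def fewest_assumptions(hyps):
    """Pick the explanation with the smallest 'proof', ignoring priors."""
    return min(hyps, key=lambda h: h.assumptions)


def best_weighted(hyps):
    """Pick the explanation with the largest prior * likelihood score."""
    return max(hyps, key=lambda h: h.prior * h.likelihood)


# Hypothetical example: one grand hidden design explains everything with a
# single assumption, while the mundane account needs several unrelated ones.
hyps = [
    Hypothesis("coincidence", assumptions=3, prior=0.90, likelihood=0.2),
    Hypothesis("hidden design", assumptions=1, prior=0.001, likelihood=0.9),
]

print(fewest_assumptions(hyps).name)
print(best_weighted(hyps).name)
```

With these made-up numbers the two selection rules disagree, which is the worry in miniature. The sketch says nothing about where trustworthy priors would come from, and that remains the open question.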

For an amusing fictional object lesson, also Italian, see "The perils of semiotic speculation" (3/14/2006).

[Hat tip to Chris Cieri for The Dark Heart of Italy.]

Posted by Mark Liberman at 11:33 AM

February 15, 2008

Making due

Noah Shachtman, "Air Force: $144 Billion a Year Not Enough", Wired, 2/11/2008:

The Air Force can't make due on $144 billion a year. The service is telling Congress it needs nearly another $19 billion for fiscal year 2009 -- including about $1.7 billion worth of extra fighter jets. 

As Arnold Zwicky notes in the Eggcorn Database's entry,

The idiom "make do" is pretty opaque, and I guess that "due" provides some sense of obligation to the expression.

Today's Google News gives 107 hits for {"make due"} compared to 1,599 for {"make do"}, for a ratio of about 15/1 for the original to the eggcorn. General web search yields 425K vs. 3.24M, for a ratio of 7.6/1. And some people say that copy editors don't due their job!
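For anyone who wants to recheck the arithmetic, here is a minimal sketch that recomputes both ratios from the hit counts quoted above. The variable names are mine, and the web counts are the rounded figures (425K and 3.24M) as given.

```python
# Hit counts as quoted in the post; the web figures are rounded.
news_due, news_do = 107, 1_599
web_due, web_do = 425_000, 3_240_000

# Ratio of the original idiom ("make do") to the eggcorn ("make due").
news_ratio = news_do / news_due
web_ratio = web_do / web_due

print(f"Google News: {news_ratio:.1f} to 1")
print(f"Web search:  {web_ratio:.1f} to 1")
```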

Checking the OED for the origins of "make do", I find that it seems to have originated as make <something> do, with do used in the sense of "suffice", as in "it will do" -- at least, the earliest citation given is

[1847 C. BRONTË Jane Eyre I. viii. 130 ‘Oh, very well!’ returned Miss Temple; ‘we must make it do, Barbara, I suppose.’]

Checking LION, I find another example where the thing that is made to do is also explicit, but at a distance -- from Maria Edgeworth, Patronage (1814):

"My dear," said Mrs. Falconer, looking carelessly at the dress--- "You won't want a very expensive dress for Zara.---"

"Indeed, Ma'am, I shall," cried Georgiana--- "Zara will be nothing, unless she is well dressed."

"Well, my dear, you must manage as well as you can with Lydia Sharpe.---Your last court dress surely she can make do vastly well, with a little alteration to give it a Turkish air."

"Oh! dear me, Ma'am---a little alteration!" cried Lydia, "no alteration upon the face of Heaven's Earth, that I could devise from this till Christmas, would give it a Turkish air.---You don't consider, nor conceive, Ma'am, how skimping these here court trains are now---for say the length might answer, it's length without any manner of breadth you know, Ma'am---Look, Ma'am, a mere strip!---Only two breadths of three quarters bare each---which gives no folds in nature, nor drapery, nor majesty, which, for a Turkish queen, is indispensably requisite, I presume."

Posted by Mark Liberman at 08:27 AM

February 14, 2008

Egg, penis, whatever

With respect to the strange Chinese/English children's blocks discussed a few days ago ("A is for Apple", 2/9/2008), Doug Wilson explains:

The really weird ones are apparently from dictionary look-up errors ... not just taking an unlikely choice from the correct entry, but actually reading a different (but nearby) entry. I don't know why this happened, but I picture a Chinese who is utterly unfamiliar with English using a Chinese-to-English dictionary (either paper or digital). If there are hundreds of other blocks which are correct/reasonable and were not shown, then maybe these are just sporadic errors. Otherwise I imagine a corrupt file, illegible book, or unfamiliar interface.

The key point is that the first Chinese character is looked up correctly, but the wrong word is chosen from the list of words and expressions presumably shown under the head character.

For example, for the block that combines a picture of an egg with the English word Dick, Doug suggests that

"ji1 dan4" 鸡蛋 = "chicken egg" was mistranslated using the entry for "ji1 ba[1]" 鸡巴, translated "penis [colloquial]" in my dictionary but presumably translated "dick" in some other ones.

I agree that something like this must have happened -- but the hypothetical Chinese blocks-designer must be not only innocent of English, but also pretty inept at bilingual dictionary usage. Of course, we've seen plenty of evidence that Brits and Americans -- even highly cultured intellectuals -- can be lexicographically challenged in surprising ways. Still, the errors on these particular blocks are unusually careless.

[Christopher Neufeld writes to suggest that the whole thing was a joke or hoax:

Engrish is always good for a laugh. It can occasionally provide insight into the mechanics of another language and the perils of electronic (or poorly informed) translation, but as it's grown in popularity, it's started getting photoshopped. The English-learning blocks are plausible except for the last one, "Destroy the evidence." It's the only block featuring a sentence, and the characters also read "destroy the evidence", so it's not just a shoddy translation. Hilarious, but probably not real.

But Doug Wilson responds:

But the characters on the block do NOT read "destroy the evidence"! Did Neufeld read the wrong entry in the dictionary the same as the block-labeler did?

I've found the expressions in question on-line -- and adjacent! -- here:

http://baike.baidu.com/view/828651.html

You see "miehuoqi" = "fire extinguisher"; the alternate form right after the brackets is the form on the block.

And the next entry is "destroy the evidence".

Here's the block in question:

And an image of the relevant online dictionary segment:

I confess that the picture on the block looks to me more like an old-fashioned gas pump than a fire extinguisher, but perhaps that sort of uncertainty is fitting, given the rest of the situation. ]

Posted by Mark Liberman at 02:11 PM

Adverbs for my love

A beautiful Valentine's Day to all our readers. For my philosopher partner I managed to find a card which had the words passionately, devotedly, fervently, completely, utterly, and absolutely on the front (with the first person singular pronoun as subject and adore you as the predicate). It seemed ideal. When a grammarian loves you, you should expect adverbs. Lots and lots of adverbs.

Adverbs do have enemies; Stephen King has said (in his On Writing) that "the road to hell is paved with adverbs", and his hostility to them follows in a long tradition. A long tradition of pontificating fools who should shut up and write rather than telling us how (nearly all of them unwittingly use adverbs in the very paragraphs in which they condemn them; on this, see chapter 2 of Ben Yagoda's lovely little book If You Catch An Adjective, Kill It, published in 2006). There were adverbs, daffodils, morning tea, and breakfast in bed this morning. And a kiss, of course. When a grammarian kisses you, you stay kissed.

Posted by Geoffrey K. Pullum at 07:18 AM

February 13, 2008

Grace Notes from Underground

The woods are full of linguistic pejorists poised to pounce on every misplaced apostrophe as a harbinger of the imminent collapse of English. But who is there but Language Log to record the occasional burgeoning of punctuational literacy in unlikely places? Like, for example, the semicolon in a recent ad from the New York City Transit Authority. (Hat-tip to Sam Roberts.)

Posted by Geoff Nunberg at 09:32 PM

Apology in Australia

Language Log has visited the topic of linguistic means for apologizing many times, first in my post on the sorry story of Pete Rose, who found himself utterly incapable of making a true apology despite apparently struggling to do it. The topic was taken up again here and here and here and we have returned to it since (see in particular Geoff Nunberg's delightful discussion of apologies in 2006). In Australia, apology has been particularly slow in coming — over ten years — but it finally happened yesterday. The new Prime Minister, Kevin Rudd, managed what John Howard had found himself utterly unable to do for over a decade.

For as long as I have been aware of what was going on in Australian politics (I first visited in 1996, spending time there every year until The Cambridge Grammar of the English Language was finished, and I have retained a strong interest in the place) there has been an insistent demand that the country's political leaders should apologize to the Aboriginal population for the policies that led to what is known as "the stolen generation". The topic is extremely controversial (those who want to read about the side of it that says no apology was ever necessary should look into the work of Keith Windschuttle), but it is fairly clear that large numbers of Aboriginal Australian children (and especially mixed-race children) were taken away from their Aboriginal families to be raised in institutions during the years between the passing of the Aboriginal Protection Act in 1869 and the official discontinuance of the policy a hundred years later, and especially between 1910 and 1970.

The debate is mostly not about whether children were at least sometimes taken from their families (there are many people alive to whom it actually happened, and a significant sum of financial compensation has been paid to at least one of them); it is about frequency, and about peripheral matters like whether words like "stolen" or "generations" are appropriate, and more substantively, about whether the policy was benign — whether it was mostly aimed at rescuing children to protect them from poverty and abuse in disorderly or alcohol-addicted communities. But there is fairly good evidence that, especially in Western Australia, for at least some people the reason for raising Aboriginal and mixed-race children in the white community was to breed the Aboriginality out of them and kill off the entire race and all of the associated languages.

Aboriginal adults who were removed from their families as children have reported that they were taught to regard their culture of origin as evil and their native language as worthless, something to be ashamed of rather than proud of. The many Australian linguists who have spent their careers puzzling out the details of the rich and intriguing grammatical systems found in Australia's hundreds of languages find this particularly tragic. I have known quite a few such linguists, and they seem without exception to have come to love and admire the people whose languages they worked on. My introduction to the fascinating details and complexity of Australian languages, but also to some of the facts about how Aboriginal people had been treated over the years, came from reading Robert M. W. Dixon's extraordinary book The Dyirbal Language of North Queensland (Cambridge University Press, 1972), and the landmark political statement on its dedication page. For me, it was a surprise to arrive in Australia for the first time almost a quarter-century after that book had been published to find that (in the state of Queensland anyway) prejudice against Aboriginal people was still quite strong. I guess I thought, naively, that we were over all that.

We weren't entirely over it. But things had changed a lot. By about 2000 the majority of the Australian population had come to support the idea that an apology should be made. However, conservative Prime Minister John Howard set his face firmly against it no matter what, and he did in that way hold onto the votes of many of the more bigoted rural white Australians. (In Australia most conservatives are in a party known as the Liberal Party; do not try to interpret Australian politics using an American political dictionary.) Howard succeeded in getting re-elected three times, and served eleven years. But he would never apologize to the stolen generations, no matter how popular that would have been.

Kevin Rudd has now done it very simply and very clearly, to great applause from all over Australia. Wikipedia has an article (tagged "The neutrality of this article is disputed" at the top) in which the evidence is reviewed, and it already includes the full transcript of Mr Rudd's speech.

Posted by Geoffrey K. Pullum at 04:57 AM

February 12, 2008

Religious Courts

The Archbishop of Canterbury's recent mention of the role that Shari'a law might play in Great Britain has aroused considerable controversy, in part because many people did not understand what he said, as Geoff explained. I too would oppose in the strongest terms the general application of Shari'a law. This does not mean, however, that voluntary recourse to Shari'a law ought not to be permitted to those who wish to avail themselves of it. This is really no different from the use of other alternative means of dispute resolution, such as arbitration. In fact, there is precedent for the use of religious courts in the United States for the resolution of non-religious disputes.

My grandfather was a diamond trader, that is, someone who traded uncut gem quality diamonds. Many years ago he took me to the diamond exchange in New York. (The other major exchanges are in Antwerp, Amsterdam, and Tel Aviv.) When you got out of the elevator, you found yourself flanked by armed guards. The actual trading floor was sealed by a heavy steel door, which was opened electronically by a guard who looked through a bulletproof glass window to the side. There were no ID cards or biometrics: if he didn't recognize you, he didn't let you in.

Once inside, the trading floor was something of a letdown. It looked like a school cafeteria, just a big room with a bunch of bland tables and chairs. I don't recall seeing any women. Most of the traders were Jewish, many of them Hasidim, so there were quite a few black coats and hats. The dominant language was Yiddish, though one also heard snatches of Hebrew, Flemish, French, and English.

The traders circulate from table to table, telling each other what they have or what they are looking for. If one trader has something of interest to another, he takes out the briefke (a folded paper that forms a sort of envelope) containing the stones and hands it to the other trader, who spreads it out on the table and inspects them. If he is interested, they negotiate. If a deal is reached, they shake hands and the buyer keeps the briefke. No contract is signed, no receipt made out; no money changes hands.

This system works because everybody knows everybody else. If you cheat, you cheat only once, because the word will get around and no one will deal with you again. Nonetheless, disputes do occasionally arise. It is of course possible to take such disputes to the civil courts, but that is unusual. Most disputes among diamond traders are settled by the Beth Din בית דין, the Jewish court. Since the Beth Din has a reputation for fairness, is familiar with the diamond trade, and is faster than the civil courts, even non-Jews often agree to have it adjudicate their disputes. This has not led to the imposition of Jewish law or the breakdown of the separation of church and state.

Posted by Bill Poser at 09:52 PM

February 12, 1809

Today is the 199th anniversary of the birth of Abraham Lincoln, who freed the slaves, and Charles Darwin, who freed the mind.

Posted by Bill Poser at 01:13 PM

The liturgy of lost causes

On 2/6/2008, I tried to calm the genitive anxiety of reader SS, who unwittingly echoed a 1993 James Kilpatrick column on double genitives like "a friend of Bill's". Now, by coincidence, Mr. Kilpatrick opens his column of 2/10/2008 with the very same subject ("Bobby Fisher's What?"):

Bobby Fischer, the Chicago-born chess master, died in Iceland three weeks ago. The New York Times informed its readers:

"The death was confirmed by Gardar Sverrisson, a close friend of Mr. Fischer's."

How's that again? What's that hanging possessive doing there? A critical reader is bound to inquire, Mr. Fischer's what? Mr. Fischer's brother? His mother? His dog? We're talking grammar today. The topic ranks with economics as a deadly science, but grammar has to be a constant concern of every writer.

Mr. Kilpatrick's topic, now as in 1993, is not really grammar, unless by "grammar" we mean "Why Almost Everyone's Usage is Wrong". He goes on to castigate the NYT, the Washington Post, and Robert Novak for failing to use genitives in gerund-participial clauses (not "no evidence of it occurring in Indiana" but rather "no evidence of ITS occurring in Indiana"; not "without someone going into a meeting" but "without SOMEONE'S going into a meeting"; not "would result from him being the last man standing" but rather "would result from HIS being the last man standing"). And he closes with a medley of other golden oldies, likewise accompanied by citations of high-profile sinners: "smarter than he", not "smarter than HIM"; "everyone shakes his head", not "everyone shakes THEIR HEADS".

The citations are new, but the "mistakes" are old ones, drawn from a familiar set of linguistic used-to-bes and never-weres. (The prescription against double genitives is one of the never-weres -- Shakespeare used them, and so did Jane Austen.) The liturgical core of peevology is the ritual lamentation of lost causes.

We all have our favorite lost causes -- I tend to carry on about journalists' misquotations, for example -- but grammatical hopelessness seems to be especially popular. Email from Michael Chen pointed out to me that Kilpatrick's 2/10/2008 column made it to the top of Yahoo's "most viewed" list for a while, just ahead of "Bride dies during marriage's first dance".

[Update -- Bruce Rusk writes:

I happened to be on Lexis-Nexis when your most recent LL post appeared, and it inspired me to do a quick check of James J. Kilpatrick's own corpus. Turns out he is as consistent as most peevologists: in the last two paragraphs of an August 1998 column, he defends the phrase "friend of mine" as part of a class of "benign or forgivable redundancies."

In the cited column, Mr. Kilpatrick complains about the needless words in phrases like "free gift" and "past experience", and then observes that

Opposed to such hairy redundancies are the benign or forgivable redundancies. A reader in Carmel, Calif., objects to the "on" in "Singles match play resumes on Sunday." I would leave it in. A gentleman in Tyler, Texas, complains that "of mine" is redundant in, "the senator is a friend of mine." A year ago the AP reported on the suicide of White House lawyer Vince Foster: "Starr's review came to the exact same conclusion reached by three prior investigations." What does "exact" add to "same"?

As an editor, I would not object seriously to any of these. It seems to me that even patently unnecessary words sometimes may serve a purpose. They contribute to clarity or to cadence. I retreated four years ago on "nape of the neck," and I may yet condone "sworn affidavit." Not everyone can define a reprise or locate the nape, and not everyone knows that an affidavit must be sworn. The words may be redundant, but they are words that tell.

In this context, K's concern is whether "of mine" is redundant, not whether it's grammatical. And mine is distinguished from my, in modern English at least, as occupying the position of a nominal head, whereas his (or Fischer's) can also be a possessive determiner. This ambiguity might lead K. to permit "his dog is larger than mine" and "he is a friend of mine", while forbidding "my dog is larger than his" and "I am a friend of his". Then again, it's not clear from his writings on the subject whether he's tried to work out a consistent analysis of the grammatical structures involved in such examples.]

Posted by Mark Liberman at 08:32 AM

February 11, 2008

Searching for (un)clarity in the OED

Geoff Pullum recently brought us up to speed on the case of Dr. Rowan Williams, Archbishop of Canterbury, who has been credited by the British media with the idea that the adoption of shari'a law in Britain "seems unavoidable." As Geoff explained, Dr. Williams didn't really say that, but his remarks were hardly "a model of clarity." Now, according to the (UK) Times columnist Alan Hamilton, Dr. Williams has come out with a statement acknowledging his lack of clarity (if not actually apologizing for it):

“Some of what has been heard is a very long way indeed from what was actually said in the Royal Courts of Justice last Thursday. But I must, of course, take responsibility for any unclarity in either that text or in the radio interview, and for any misleading choice of words that has helped to cause distress or misunderstanding among the public at large and especially among my fellow Christians.”

Hamilton chose to follow up the Archbishop's statement by questioning a single word from it: unclarity.

So, Christians: to pride, lust and the rest, add the eighth deadly sin of “unclarity” — a word that is obscure enough not to appear in the Oxford English Dictionary.

I've said it before and I'll say it again: people who don't know how to use a dictionary shouldn't try to appeal to lexicographical authority to advance an argument.

It's true that if you browse through the headwords of the OED, you will not find unclarity. But there are a lot of words in the OED that don't merit their own headwords. Chief among them are words regularly formed by common prefixes like un-, where the meaning is "compositional," i.e., transparently understood by the combination of the prefix plus the root (as in un- + clarity). It's a bit laborious to wade through an entry like un- in the print edition of the OED, but the electronic versions (either on CD-ROM or online) will take you directly to the relevant part of the entry when you type in the prefixed word. Entering unclarity into the online OED search form brings you to this part of the un- entry:

A bit further on are the historical citations for the word:

This entry hasn't been updated since the 1989 second edition, so it has yet to be enriched with citations antedating the 1934 Webster's dictionary entry, now easily found in digitized texts — for instance, in the 1912 book The New Realism. Still, the OED found room for a 1936 cite from the journal Mind and a 1980 cite from Le Carré's Smiley's People, suggesting that this is a word that can be found in a variety of reputable sources, albeit infrequently.

Does the fact that unclarity is indeed in the OED suddenly make it less "obscure"? I'd argue that the word was never obscure to begin with, regardless of its inclusion in the OED, since it is a transparent combination of a common prefix and a common noun. That, however, does not have any real bearing on the Archbishop's sad situation. The knives are out for him now, and his every word is used against him — even words seeking to clarify his lack of clarity.

[Update: Arnold Zwicky notes that there are a huge number of web hits for unclarity, including this gem:

Vagueness is standardly defined as the possession of borderline cases. For example, 'tall' is vague because a man who is 1.8 meters in height is neither clearly tall nor clearly non-tall. No amount of conceptual analysis or empirical investigation can settle whether a 1.8 meter man is tall. Borderline cases are inquiry resistant. Indeed, the inquiry resistance typically recurses. For in addition to the unclarity of the borderline case, there is normally unclarity as to where the unclarity begins. In other words 'borderline case' has borderline cases. This higher order vagueness shows that 'vague' is vague.
— Roy Sorensen, Stanford Encyclopedia of Philosophy, "Vagueness"

Arnold also observes that unclarity provides an instance of the "bracketing paradoxes" in morphology ("unhappier", "transformational grammarian", etc.). Semantically, it's:

[ un + clear ] + ity

but phonologically it's:

un + [ clar + ity ] .

]

Posted by Benjamin Zimmer at 11:09 PM

February 10, 2008

Watch your pronouns and verbs!

The January 2008 issue of the FBI Law Enforcement Bulletin contains an article by Special Agent Vincent Sandoval about the methods his agency uses and advocates to local law enforcement officers as they struggle to determine whether suspects are lying or telling the truth. What he wrote came as no shock to me because I've heard agents and consultants lecture about this in the past and I've read most of the sources from which they get their ideas. But on the remote possibility that Language Log readers are not regular subscribers to this journal, and in an effort to give you helpful guidance if, for some reason, you should ever be interrogated by the police, I thought it might be a public service to pass along some of S.A. Sandoval's tidbits about how that agency can tell that you're lying by the way you use pronouns and verbs.

Your pronouns can indicate deception

Sandoval points out that in sexual assault cases, especially when the suspect alleges that the sexual contact was consensual, investigators should listen carefully for the absence of the pronoun we. To illustrate, he offers one suspect's description of what happened after the sex act:

I put her clothes on her and, um, and she and I walked outside and said our  good-byes. I gave her a hug and told her I had a good time and she talked for a minute and then I left. I walked home.

The author points out that the suspect never used the pronoun we in his description and goes on to advise cops who read this article:

...this would suggest that a healthy relationship did not exist between the two individuals and, thus, increases the likelihood that the sexual contact was less than consensual.

Okay, folks, repeat after me, "we got dressed," "we walked outside," "we said our good-byes," "we hugged," "we talked," and "we walked home." Learn to say we when the law talks with you. Otherwise, you're being deceptive.

Not content with his helpful explanation of the absence of we, Sandoval goes further:

Investigators should pay attention when writers or speakers change a word or phrase used to describe any verbal interaction with  the same person, for example, when an interviewee says we discussed but later switches to he and I talked. 

Sandoval claims that such pronoun shifting reflects the nature of the couple's relationship. Just how this reflects their relationship is not totally clear but, just in case you are ever interviewed by the feds, you probably ought to avoid varying your pronouns.

Your verbs also will give you away

Sandoval also tells us that verb choices are important clues that signal deception, advising:

Such variations could include changing the tense of action verbs, using the passive voice instead of the active voice, and employing 'uncompleted' action verbs.

He describes the principle of past action = past tense and announces:

...individuals may be reliving the events cognitively and thus, resort to using the present tense.

To him, this signals deception. Never mind that we find the use of the historical present tense all around us every day. Just remember not to use it yourself. It tells the police that you're lying.

Turning to the passive voice, Sandoval advises that suspects attempting to conceal or minimize the extent of their involvement say things like:

The pistol was fired by someone.

It was determined that I would drop her off.

Language Log has had a few things to say about the good old passive voice. See this post by Arnold Zwicky, for example. I've examined hundreds of police reports and FBI 302s over the years. And guess what? I find that the police use passives themselves all the time. Hmm.

I regret to inform you that these ideas are not new. Google locates over 100,000 sources about lying and deception, including programs and books by Mark McClish and Don Rabon that say pretty much these same things about pronouns and verbs. They and others (such as Avinoam Sapir) instruct law enforcement officers about these and other aspects of language and deception in the seminars they promote and sell around the country. Your tax money at work.

Posted by Roger Shuy at 04:11 PM

Ontological Promiscuity v. Recursion

On Friday, the colloquium speaker at the Institute for Research in Cognitive Science was Jerry Hobbs, talking about "Deep Lexical Semantics": a project to express the meaning of words and phrases in a way that's integrated with formal theories of commonsense knowledge. You can learn more about his work in this area from the references here, but that's not what this post is really about. Instead, it's about something that seems to be completely unrelated: the controversy over recursion and its role in human language in general, and in certain languages such as Pirahã in particular.

In brief, the debate has gone as follows. Noam Chomsky and others "hypothesize that FLN [the faculty of language in the narrow sense] only includes recursion and is the only uniquely human component of the faculty of language"; but Ray Jackendoff, Steve Pinker and others deny this, arguing that "language is a complex adaptation for communication which evolved piecemeal". Meanwhile, Dan Everett has argued that a contemporary language of Brazil, Pirahã, lacks recursion entirely, a claim that echoes Ken Hale's analysis, several decades earlier, of Australian languages such as Warlpiri; but Andrew Nevins and others have challenged Dan's description.

For more (than you want to know) about these recursion controversies, you could read some old Language Log posts, such as "JP versus FHC+CHF versus PJ versus HCF" (8/25/2005), "Dan Everett and the Pirahã in the New Yorker" (4/9/2007), and "The enveloping Pirahã brouhaha" (6/11/2007).

Recursion, in this context, means "linguistic structures that are embedded inside other structures of the same type". Familiar examples of recursive embedding of sentences include subordinate clauses ("before <sentence>"), sentential complements ("noticed that <sentence>"), and relative clauses ("the book that <sentence>"). In English, these are freely combined -- picking a news story at random from today's NYT, the first sentence has three levels of sentential embedding, and so does the (shorter and simpler) fourth sentence:

[ It could also help the administration make its case that 
  [ some detainees at Guantánamo, where [ 275 men remain, ] would pose a threat if 
    [ they are not held at Guantánamo or elsewhere ] ] ]

For a funnier and more transparently recursive example of clausal recursion, check out the third panel of this recent Cathy strip:

OK, what about Jerry Hobbs' talk? Well, a remark that he made in passing reminded me of an idea that he had more than 20 years ago, described in a paper entitled "Ontological Promiscuity" (ACL 23, pp. 61-69, July 1985). The abstract begins like this:

To facilitate work in discourse interpretation, the logical form of English sentences should be both close to English and syntactically simple. In this paper I propose a logical notation which is first-order and nonintensional, and for which semantic translation can be naively compositional. The key move is to expand what kinds of entities one allows in one's ontology, rather than complicating the logical notation, the logical form of sentences, or the semantic translation process.

The basic idea is to replace recursive embedding (in the semantics) with reference to abstract entities created for the occasion -- events, time, places, propositions and so on. To illustrate how this works, he analyzes a complex journalistic sentence:

(4) The government has repeatedly refused to deny that Prime Minister Margaret Thatcher vetoed the Channel Tunnel at her summit meeting with President Mitterand on 18 May, as New Scientist revealed last week. [New Scientist, 6/3/1982, p. 632]

When this sentence is analyzed following his prescription, the result is a conjunction of 11 simple predications, with the embeddings all replaced by anaphoric references to abstract individuals (via the subscripted variables En):

The representation of just the verb, nominalizations, adverbials and tenses of sentence (4) is as follows:

The upside-down Vs -- "wedges" -- are semantic conjunctions, so translated into heavy English, this means something like "Thing1 is completed, and Thing1 is repeated, and Thing1 is the Government refusing Thing2, and Thing2 is the Government denying Thing3, and Thing3 is Margaret Thatcher vetoing the channel tunnel, and ..."
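For readers who think better in code, here is a minimal sketch of the flattening idea. It is not Jerry's actual notation or algorithm: the tuple encoding, the function names (`flatten`, `to_conjunction`, `render`) and the three-predicate fragment of sentence (4) are all assumptions made up for illustration. Each embedded predication is replaced by a fresh label (E1, E2, ...) that the enclosing predication refers to, which leaves a flat conjunction.

```python
from itertools import count

# Hypothetical encoding: a term is either a string naming an ordinary
# individual, or a tuple (predicate, arg, ...) whose args may be terms.
NESTED = ("refuse", "Govt",
          ("deny", "Govt",
           ("veto", "Thatcher", "ChannelTunnel")))


def flatten(term, fresh, out):
    """Append flat predications for `term` to `out`; return the name that refers to it."""
    if isinstance(term, str):
        return term  # ordinary individuals need no new entity
    pred, *args = term
    var = f"E{next(fresh)}"  # the abstract entity created for the occasion
    slot = len(out)
    out.append(None)  # reserve a place so the outer predication comes first
    out[slot] = (pred, var, *[flatten(a, fresh, out) for a in args])
    return var


def to_conjunction(term):
    """Return (top-level label, list of flat predications)."""
    out = []
    top = flatten(term, count(1), out)
    return top, out


def render(preds):
    """Write the conjunction with '&' standing in for the wedge."""
    return " & ".join(f"{p}({', '.join(args)})" for p, *args in preds)


top, preds = to_conjunction(NESTED)
print(render(preds))
```

Run as written, this prints `refuse(E1, Govt, E2) & deny(E2, Govt, E3) & veto(E3, Thatcher, ChannelTunnel)`: no predication contains another, and the only links between them are the shared labels.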

As Jerry explains,

Sentence (4) shows that virtually anything can be embedded in a higher predication. This is the reason, in the logical notation, for flattening everything into predication about individuals.

This is more or less what I had in mind when I wrote that Dan Everett's no-embedding claim "imposes a lot fewer constraints on what the Pirahã can say than you might think" ("Parataxis in Pirahã", 5/19/2006); or when I suggested a fake homework assignment ("Communicating", 7/29/2007), to

... rewrite paratactically (i.e. by stringing phrases together without embedding, using explicit or implicit anaphora to keep track of the connections) what Jeremy expressed syntactically (here, using complement clauses): "Tell her that Brittany said that Zuma said that Sara said that it's okay with her if that's what D'ijon said."

I forgot a key precedent: back in 1985, Jerry Hobbs suggested that the semantics of all natural-language sentences ought to be mapped into a paratactic form, i.e. a conjunction of simple propositions, tied together with anaphoric references to appropriate abstract entities. Jerry's purpose was to make the logic of discourse more tractable:

The real problem in natural language processing is the interpretation of discourse. Therefore, the other aspects of the total process should be in the service of discourse interpretation. This includes the semantic translation of sentences into a logical form, and indeed the logical notation itself. Discourse interpretation processes, as I see them, are inferential processes that manipulate or perform deductions on logical expressions encoding the information in the text and on other logical expressions encoding the speaker's and hearer's background knowledge.

Towards that end, he argued, the logical notation

... should be syntactically simple. Since discourse processes are to be defined primarily in terms of manipulations performed on expressions in the logical notation, the simpler that notation, the easier it will be to define the discourse operations.

And so his candidate for the logical notation is a conjunction of simple propositions, where everything that appears to motivate more complex structures is handled by anaphoric references to entities in a promiscuous ontology.

Yesterday, when I pointed out to him the connection between this work and the current recursion controversy, Jerry got worried. "But wait a minute -- whose side am I on?" he asked.

I'm not sure. Maybe he's against the language=recursion side, since his proposal suggests (as the ancient Greeks believed) that the choice between syntaxis and parataxis is just a stylistic or rhetorical one, so that a language can give up syntactic recursion without suffering any essential expressive loss. Or maybe he's on the language=recursion side, since his proposal essentially replaces structural recursion with a sort of ontological and anaphoric recursion. Or maybe his claim gives Dan Everett a more precise way to phrase the claim that a culture's ontological puritanism might translate into avoidance of syntactic recursion. Or again, maybe the point is that nominalizations and (implicit or explicit) anaphora are the moral equivalent of recursion, which undermines Dan's assumption of a one-to-one correspondence between ontological attitudes and syntactic resources.

Either way, his proposal clarifies the questions in the recursion wars, at least for me, even if it doesn't make the answers any plainer.

Meanwhile, I'm still working on the Hobbsian translation of "You're one of those people who says you're not one of those people who says you're not one of those people".

[Fernando Pereira writes:

I'm not sure that Jerry's proposal really "clarifies the questions in the recursion wars", for two reasons. Formally, as far as I know there is no proof that Jerry's unfolding method preserves the combinatorial distinctions of recursive formulas, in particular with respect to scope ambiguities. Computationally, what Jerry's proposal reminds me of are ideas like "last-call optimization" in programming languages and "recursive ascent" in parsing (an idea of Rene Leermakers that is less known than it deserves and generalizes left-corner parsing, the Marcus deterministic parser, and others). In both cases, recursive computations that do not really need to be recursive are automatically transformed into iterative recursions by recognizing when elements saved in the recursion stack can be overwritten because they will never be used again. However, such transformations still require a stack data structure for genuine center embedding. Which, in Jerry's proposal, would require an unbounded number of pending anaphoric relations. So, it's not clear that Jerry's proposal makes testable predictions with respect to sentence processing.

Well, I think it's clear that Jerry meant "promiscuous" to embrace an unbounded number of ontological one-night-stands. And even though I haven't seen even a sketch of a proof that there's a well-defined translation between recursion in the automata-theory sense and a formal language with "ontological promiscuity" in its semantics, I reckon that if you get (to keep a record of) as many typed and indexed entities as you care to create, you ought to be able to imitate a stack, and (for example) prune Σ* back to some arbitrary context-free language.

But I had in mind a much narrower point.

In Dan Everett's current account, the sentences that he once analyzed in his dissertation as involving sentential embedding, instead involve things like nominalizations and anaphoric references to evoked situations. That's the Paratactic Way, and it's also pretty much how Jerry argued that the semantics of English ought to be modeled.

This might or might not be a good way to describe NL meaning, whether in English or in Pirahã. But it makes me wonder about some of the arguments on both sides: Everett's argument that the Pirahã avoid clausal embedding because of a cultural commitment to certain ontological restrictions; and Nevins' argument that the Pirahã can't possibly lack clausal embedding, because that would prevent them from expressing a significant part of the range of things that any human language (including, clearly, theirs) can express.

Fernando's response:

These arguments are more evidence, if any was needed, of the deleterious effect of a kind of formalistic thinking that poisoned linguistics with the advent of transformational grammar. Harris was in my reading quite careful not to make claims based on the form of particular representations; in fact, if we think of Jerry's "representation" as just a concise notation for NL sentences, then Jerry's account has much in common with Harris's account of nesting. On the other hand, Chomsky and his followers, maybe influenced by early formal language theory, have gone into assigning more computational/cognitive import to particular representations than they can actually bear, missing the existence of efficient reductions between certain types of representations. Both Everett and Nevins are amusingly similar in their biases even though their conclusions are opposed.

The empirical question for me is how embedding vs flat+anaphora use memory. In embedding, information about incomplete clauses must be kept in short-term memory as nested clauses are processed. In flat representations, referents must stay available for easy retrieval by later clauses in the discourse. Are these memory systems the same or different? My slightly informed guess is that they are different. The first is more likely to be shared with those involved in perceptual-motor nesting in complex tasks. The second is more likely to be shared with those involved in sequencing complex interactions with peers. A further guess is that both of these systems are more refined in us than in other primates, and both share in supporting language, although the relative shares of the load vary between languages.

Mark Eli Kalderon writes:

Nice post.

The trade-off of recursion for ontology may only be plausible modulo assumptions about what constraints there are on natural language semantics. If the semantics is merely a formal representation, then ontological profligacy is no problem. If, however, a competent speaker who understands a sentence is supposed to be implicitly committed to the entities posited by the semantics, then there may be problems. Consider, for example, Davidson's semantic analysis of adverbs in terms of quantification over events. Works reasonably enough. But consider "a rapidly converging function". If semantics is meant to be ontologically committing, then Davidson's analysis, if correct, would commit us to the existence of events when we are not so committed (and indeed where there are none).

A key insight of early analytic philosophy, as I understand things, was that the routine ontological commitments of natural-language semantics are sometimes disastrously misleading. On this view, we must sometimes either find new ways of talking, or else agree to interpret old ways of talking in artificial and perhaps unnatural ways. It's not a surprise to find ourselves in this situation with the semantics of mathematical language like "rapidly converging function".

But the philosophy of language is even further outside my job description than automata theory is, so I may be missing the point. ]

[Update #2 -- John Cowan writes:

It's interesting to note that Flanagan, Sabry, Duba, and Felleisen introduced the same notion to computer science in 1993, under the name of "administrative normal form" (ANF): the transformation maps "f(g(x), h(y))" into "let v0 = g(x) in (let v1 = h(y) in f(v0, v1))", so that all arguments to functions are constants or variables, never other function calls.

]
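The transformation John Cowan describes can be sketched in a few lines. This is an illustrative toy, not the algorithm from the 1993 paper: the tuple encoding of calls and the function name `to_anf` are assumptions invented for the example. Every nested call is given a fresh name (v0, v1, ...), so that each call ends up with only variables or constants as arguments.

```python
from itertools import count


def to_anf(expr):
    """Return a let-expression string in which every call has only atomic arguments.

    Hypothetical encoding: an expression is a string (a variable or constant)
    or a tuple (function_name, arg, ...) whose args are expressions.
    """
    if isinstance(expr, str):
        return expr

    bindings = []  # (name, flat call) pairs, innermost calls first
    fresh = count()

    def call(e):
        fn, *args = e
        return f"{fn}({', '.join(atomize(a) for a in args)})"

    def atomize(e):
        if isinstance(e, str):
            return e  # already atomic
        rendered = call(e)  # names any calls nested inside e first
        name = f"v{next(fresh)}"
        bindings.append((name, rendered))
        return name

    body = call(expr)
    # Wrap the body in let-bindings, last-created name innermost.
    for name, rhs in reversed(bindings):
        inner = f"({body})" if body.startswith("let ") else body
        body = f"let {name} = {rhs} in {inner}"
    return body


print(to_anf(("f", ("g", "x"), ("h", "y"))))
```

For the example in the quotation, this prints `let v0 = g(x) in (let v1 = h(y) in f(v0, v1))`. The let-bound names play the role that the En labels play in the semantic notation: the nesting is gone, and what remains is a sequence of simple statements tied together by names.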

Posted by Mark Liberman at 08:14 AM

February 09, 2008

No Spanish on the School Bus

A language rights case playing out in Nevada provides an excellent illustration of both the complex issues involved and the frequently muddled and ill-informed thinking of participants. In short, Richard Aumaugher, the superintendent of the Esmeralda County schools, has prohibited students from speaking Spanish on the school bus. The affair is described nicely by Dennis Baron, who has somehow even managed to come up with photographs, and in the Pahrump Valley Times. The ACLU's summary is available in English and in Spanish. Likewise, the letter it has written to Mr. Aumaugher is available in English and in Spanish. The English version of the ACLU's letter includes as an appendix a letter sent to parents by Superintendent Aumaugher explaining his decision.

I agree with the ACLU that the ban on Spanish is unconstitutional. It is well established that there is a First Amendment right to speak the language one chooses to speak and that, with some exceptions, students do not lose their First Amendment rights when in school. In this case, none of the educational exceptions apply. No class or other educational activity is conducted on the bus that might require the use of a particular language or that might be disrupted by the students' speech. Since this is far from the first time that such issues have arisen, it is surprising that a school superintendent would be unaware of the law and would act in a flagrantly illegal manner. One would expect him to comply, if not out of deference to the law, then out of the desire to avoid controversy and litigation.

The motivation for the school district's ban on Spanish is interesting. If we take Mr. Aumaugher at his word, he has no dislike for the Spanish language or for Spanish-speakers. His action was, he says, triggered by learning that in Nevada, while 75% of "white" high school students graduate, only 55% of Spanish-speakers do. Surmising that this disparity is due to the inferior English skills of Spanish-speakers, he decided to ban the use of Spanish on school buses in order to force Spanish-speakers to get additional practice in English.

What is striking about this is the lack of any specific relationship between the needs of the kids on the school bus and the prohibition of Spanish. There is no indication that those particular Spanish-speaking kids were not doing well in school, nor is there any indication that their English is less than perfect. The available demographic data for Esmeralda County show a total population of 940, of which 63 were born outside the United States. Of the 69 Spanish speakers, only two are reported as speaking English "not well". None are reported as not speaking English at all. 47 are reported as speaking English "very well", 20 as speaking English "well". In other words, there is no reason to believe that in Esmeralda County Spanish-speakers suffer from any deficit in English. Nor does it appear that Superintendent Aumaugher made any investigation into the reasons for the lower graduation rate of Hispanic students. While language certainly could be a factor, there are many other possibilities. It is, for example, well known that educational attainment is related to family income. Since Hispanic people tend to have lower incomes than Anglos (and the statistics for Esmeralda County show a substantially lower median income for Hispanics), it is quite possible that this is the explanation.

In sum, the Esmeralda County schools took an offensive and unconstitutional action on the basis of the dubious assumption that Spanish-speakers necessarily have poor English skills and that this is the reason for lower graduation rates, as well as the further questionable assumption that Spanish-speakers whose English skills remained inadequate in spite of attendance at an English-speaking high school would be brought up to par by additional practice on the school bus. The English-only movement is certainly driven in part by xenophobia and a desire for cultural and linguistic homogeneity, but this example shows how even in the apparent absence of such factors, policies are all too often formed in cavalier ignorance of the facts.

Posted by Bill Poser at 02:58 PM

A is for Apple...

And E is for Epidiascope. And then there's Navvy, Armet, Wecker, and "Destroy the Evidence". It's not only supermarket signs where a bit of editorial oversight might be advisable -- the foundations of Chinglish are laid early.

A colleague of ours found this set of building blocks to help children learn English. One of the things that is so amazing about teaching English as a foreign language is that it forces you to rethink components of English you take for granted and experience the language in a completely new way. With that in mind, try to remember back...far back...to yourself at a more innocent time.

For the whole hilarious set, see "Chumble Spuzz" at the peer-see blog, by Emily and Joshua. [Hat tip: Victor Mair]

Posted by Mark Liberman at 02:21 PM

Thai Mystery Food

We remark from time to time on the interesting and amusing mistranslations that appear in the English versions of Chinese menus. Today's New York Times reports an example from Thailand encountered by the Olympic triathlon team:

Be Dental Alveoli Quick to Salad Bangkok Hot Paddle Fish

My very limited knowledge of Thai is not up to this. Can any of our readers make sense of this?

Posted by Bill Poser at 01:39 PM

The archbishop, the law, and the press

Under the headline "Rowan Williams says Sharia law unavoidable" the Telegraph newspaper says:

The adoption of some aspects of Islamic Sharia law in Britain "seems unavoidable", the Archbishop of Canterbury has said.

Did he say that? No, he didn't. Language Log has gone to the text of his lecture to see what he really said. But if you think Language Log is going to provide a brave defense of the hapless man, you're in for a disappointment.

Poor Dr Williams. All he wanted to do was give a lecture in a series on Islam in English Law at the Royal Courts of Justice in London, a lecture on theoretical aspects of the interaction of law and theology, with special reference to the role of sharia in Britain. No chance of that. The reaction to his remarks, as he should have predicted, has been a huge nationwide brouhaha, enormously damaging to his church.

The press fell on Dr Williams like a pack of hounds yesterday. Indeed, I may be doing something of an injustice to hounds by saying that: The Sun, probably the trashiest paper in Britain, used the headline "WHAT A BURKA", punning on Arabic burqa "long face-concealing garment worn by some Muslim women" and British English berk "stupid person". But set aside The Sun, which is noted for displaying a level of tastefulness that makes licking your balls in public seem refined. (According to The Sun, "In an explosive outburst Dr Rowan Williams, the country's top Anglican, said there should be one set of rules for Muslims — and another for everyone else." Unbelievable mendacity — unless you know about The Sun's standards, in which case it is believable mendacity.)

What actually happened is somewhat muddied by the fact that the Archbishop did an interview on the radio earlier on the same day when he was due to deliver his lecture in the evening (more on the radio interview below, at the end). But the key part of his long, dense, scholarly lecture (full text here) began about three-quarters of the way through, when the Archbishop provided a partial summing up of his general drift, in case (as seems all too likely) the minds of some of his audience might have drifted:

I have been arguing that a defence of an unqualified secular legal monopoly in terms of the need for a universalist doctrine of human right or dignity is to misunderstand the circumstances in which that doctrine emerged, and that the essential liberating (and religiously informed) vision it represents is not imperilled by a loosening of the monopolistic framework.

Got that clear and sharp in your mind? Probably not. It is a bit suboptimal syntactically. There is a pronoun ("it", before "represents") that is a bit hard to hook up with an antecedent, and there is a noun phrase ("a defence...") used as subject of a predicate with a subjectless infinitival clause as predicative complement ("is to misunderstand..."), which might have been better expressed if the subject were an infinitival clause as well ("to defend... is to misunderstand..."). In a more radical syntactic recasting, he could have used two separate sentences. He's saying something like this: (1) Defending an unqualified secular legal monopoly on universalist human rights grounds betrays a misunderstanding of where human rights doctrines came from. (2) It would actually be no threat to human rights or dignity if the monopoly of secular law were loosened a little.

From there he went on to elaborate a bit more (and "elaborate" really does seem the right verb to use when talking about his baroque prose), remarking that

both jurisdictional stakeholders may need to examine the way they operate; a communal/religious nomos ... has to think through the risks of alienating its people by inflexible or over-restrictive applications of traditional law, and a universalist Enlightenment system has to weigh the possible consequences of ghettoising and effectively disenfranchising a minority, at real cost to overall social cohesion and creativity. ... [B]oth jurisdictional parties may be changed by their encounter over time, and we avoid the sterility of mutually exclusive monopolies.

So the Archbishop is saying that if religious and secular legal authorities interacted and considered their own roles and operations critically they could learn something from each other, which might help us avoid pointless clashes between legal authorities that behave like unyielding bitter rivals.

Finally he concedes that this entails a somewhat troubling overt recognition of the potential for competitive tussling between jurisdictional frameworks:

It is uncomfortably true that this introduces into our thinking about law what some would see as a 'market' element, a competition for loyalty... But if what we want socially is a pattern of relations in which a plurality of divers and overlapping affiliations work for a common good, and in which groups of serious and profound conviction are not systematically faced with the stark alternatives of cultural loyalty or state loyalty, it seems unavoidable.

I have boldfaced the part that arrives at the word "unavoidable", a word that most of the press reports quoted. This is the only place that word appears in the lecture. It is in an adjectival predicative complement to the verb seems (he never said that anything is unavoidable); and it is in a clause preceded by a conditional adjunct ("if what we want..."); and the subject is the anaphoric pronoun "it", which crucially needs an antecedent for its interpretation. He said, for a certain X and Y, "If we want X, then Y seems unavoidable." What is the X, and what is the Y?

Dr Williams' text is intensely complex and difficult on such points; I don't regard it as a model of clarity. But it is crucial to read enough of the context to see that the whole second half of the lecture expounds an idea due to a Jewish legal theorist, Ayelet Shachar, in what he describes as her "highly original and significant monograph", Multicultural Jurisdictions: Cultural Differences and Women's Rights (Cambridge University Press, 2001). The value for Y, the antecedent for the Archbishop's pronoun "it", seems in context to be Shachar's "scheme in which individuals retain the liberty to choose the jurisdiction under which they will seek to resolve certain carefully specified matters, so that 'power-holders are forced to compete for the loyalty of their shared constituents'" (the Archbishop cites page 122 of Shachar's book at the point where he gives this characterization). He is following Shachar in envisaging voluntary recourse to quasi-religious tribunals to resolve delimited matters: he mentions marital matters, regulation of financial transactions, and mediation and conflict resolution.

And his X, in the if clause, seems to be a society where state and church are not pointlessly at loggerheads by virtue of each regarding themselves as an unchallengeable sole source of justice. If we want a society of that sort, he is saying, then it seems to him that following Shachar's suggestion is the way we have to go.

Now, what Shachar is envisaging might not need much change to British law at all. The Beth Din system of tribunals for settling certain matters in the Jewish community is already operating in various parts of the country. And more generally, if two parties want to settle a matter out of court by having a religious or cultural authority figure make the necessary judgment, it is hard to see how the legal system of a free society could deny them that right. In the USA people go to paralegal mediators to work out divorce matters and to various TV programs for minor financial and family disputes. The accommodation necessary to allow that some people want to go to a rabbi or a mullah for decisions on such matters that they will agree to regard as binding is virtually no accommodation at all.

Dr Williams was merely musing about the beneficial effects such developments might have on legal and theological aspects of our culture. He was not saying that a thousand years of British law was going to be swept away and replaced by sharia in Muslim-dominated British cities, with councils of mullahs in the back streets of Leicester and Bradford determining sentences of stoning or flogging or cutting off of hands for criminal offenses that the population had elected to keep out of the hands of the secular legal system.

Well, the result of Dr Williams' learned and well-intended philosophical musings has been worldwide outcry, furious and almost universal condemnation in essentially all newspapers and hundreds of thousands of web pages (go to Google and see), angry denunciations from Muslims as well as Christians, and calls for his resignation.

So am I going to defend him against all this, on the grounds that he has been wronged (which he has been) by sensation-seeking journalists who cannot handle sentences beyond nine words in length? No. Not me. Sorry. I am a realist.

I could not defend Larry Summers either, in 2005. His ill-judged remarks at a closed conference would have been, if uttered by a professor of economics, just interesting speculations about possible explanations for under-representation of women in science. But coming from the President of Harvard they seemed like an inexcusably tactless controversial pronouncement about women's genetic unsuitedness to science, uttered by the chief executive of a university employing many women scientists. He should have known better. Your job in any such administrative position is not to control the facts and the arguments, and be right, but to control the appearances, predict the reactions, manage the effects.

I cannot defend Dr Williams either. He is in effect the chief executive of the Church of England, which is the state-established church of his country. As Archbishop of Canterbury he is not quite the counterpart of the Pope (Queen Elizabeth II is technically the supreme head of the church), but almost. The issue is not whether his remarks were sensible and reasonable (which they were), but about whether he can do the demanding job of holding this figurehead position, and manage the appearances and the politics, without causing his church to fall apart in social and political discord.

The cruel fact is that by provoking this huge row he has shown that he is unsuitable. (For the second time. Remember, he also took a position on homosexuality — one that I mostly agree with — that has caused a worldwide rift in Anglicanism as all the conservative African churches recoiled in horror. Bad move.) I hate to say it, but the people who say he lacks the leadership skills for his job are basically right.

Dr Williams is a gentle, learned, brilliant, scholarly man, and a bit of a public relations doofus. The calls for his resignation are not unjustified. He should be the holder of an endowed professorship in some suitable subject at some research-led university. He should not be a prominent church administrator, and certainly not the Archbishop of Canterbury. Someone duller, less original, less intelligent, and more political should be found for that job.

[Update: It has been pointed out to me that the text of the radio interview, on a BBC program called "The World At One", was a little closer to the newspapers' sensational reports. The interviewer had clearly been reading the text of that evening's speech. Part of the interview went thus:

Interviewer: To begin with, you've given this vision of if as a nation Britain wants to achieve social cohesion, that challenge is how to accommodate those of religious faith in relation to the law; and your words are that the application of Sharia in certain circumstances if we want to achieve this cohesion and take seriously peoples' religion seems unavoidable?

Dr Williams: It seems unavoidable and indeed as a matter of fact certain provisions of sharia are already recognised in our society and under our law...

And so on. However, note again that the "it" is not the coming of sharia law to sweep away centuries of British heritage and replace our laws. Even in the interviewer's mouth, it's "the application of Sharia in certain circumstances if we want to achieve this cohesion and take seriously peoples' religion." That's not what the newspapers reported. They used phrases like "one law for Muslims and another for everybody else"!

The newspapers' behavior (that of The Sun particularly) was abominable. That of Dr Williams was merely a little out of touch with what the prevailing culture was likely to make of things. It's interesting that, according to Wikipedia (February 12, 2008), the Church Times columnist Andrew Brown once remarked that "The trouble with Rowan Williams is that he can never remember that he is Archbishop; the trouble with [his predecessor] George Carey was that he could never forget." It is the former that makes for the most trouble. My tip: if they make you an Archbishop, always remember that you're now an Archbishop.]

Posted by Geoffrey K. Pullum at 12:38 PM

She was seeing at me

Yogi Berra is often quoted as saying "Sometimes you can observe a lot just by watching" (48,500 times on the web, according to Google). He's less often quoted as saying "Sometimes you can see a lot just by looking" (1,020 times on the web, according to Google).

Both versions of the epigram are based on the same two differences. In the first place, observing and seeing imply some kind of psychological uptake that watching and looking don't. Thus you can say "I looked but I didn't see any problems", or "I was watching but I didn't observe any problems"; but you can't turn it around and say (in the same sense) "I saw but I didn't look at any problems", or "I observed but I didn't watch any problems". And in the second place, watching and looking imply some kind of choice or intent in allocating attention that seeing and observing don't. Thus it's normal to say "don't watch" or "don't look", but not "don't observe" or "don't see".

So it doesn't work to turn Yogi's quotes around: "Sometimes you can watch a lot just by observing", or "Sometimes you can look at a lot just by seeing". In the usual order, we can understand his remarks to mean "if you pay attention, you might learn something" -- if we turn them around, that interpretation is lost.

Lynn Johnston's wonderful comic strip For Better or For Worse, on 2/5/2008 and 2/6/2008, reminds us that kids can misunderstand such things in creative ways.


Robin's usage "... seeing at me" creatively combines the purposefulness of "look at" with the (distractingly intrusive) uptake of "see". At least, I think that's what he has in mind.

[Update -- John Cowan reminds us that Sherlock Holmes pairs "see" and "observe" (in A Scandal in Bohemia):

I could not help laughing at the ease with which he explained his process of deduction. "When I hear you give your reasons," I remarked, "the thing always appears to me to be so ridiculously simple that I could easily do it myself, though at each successive instance of your reasoning I am baffled until you explain your process. And yet I believe that my eyes are as good as yours."

"Quite so," he answered, lighting a cigarette, and throwing himself down into an armchair. "You see, but you do not observe. The distinction is clear. For example, you have frequently seen the steps which lead up from the hall to this room."

"Frequently."

"How often?"

"Well, some hundreds of times."

"Then how many are there?"

"How many? I don't know."

"Quite so! You have not observed. And yet you have seen. That is just my point. Now, I know that there are seventeen steps, because I have both seen and observed. [...]"

]

Posted by Mark Liberman at 10:24 AM

February 08, 2008

Universal forgetfulness

Mitt Romney has ended his campaign for the Republican presidential nomination. And with the video segment "Mittpocalypse Now", Josh Marshall has said goodbye to a string of Mitt-based neologisms that included "Mittmentum", "Mitt-nertia", "Mitt-sheviks", "Mitt-ptonite", "Mittstasis" and some more -- I forget all of them.

Did you have any trouble with that last phrase? Some do, most don't. But really, it's interesting and somewhat surprising that people often use I forget all of them to mean "it's not the case that I remember all of them". On the basis of a quick scan of the web, I reckon that the phrase gets that reading (what a semanticist would call "wide scope of negation") about half the time:

Miss Conduct's Blog, 2/7/2008: Another entertaining food-company name, which I similarly encountered walking to work many years ago, is the Puritan Ice Cream Company. That may not turn a native New Englander's head, but it sure sounded odd to this Midwesterner. I started a contest with some of my friends come up with the best Puritan ice cream flavors--I forget all of them, but some of the better entries were Straight & Narrow Rocky Road, Chocolate Mather, Sorbet in the Hands of an Angry God, and of course Preachers & Cream.)

Heralddk.ca: Many articles have pointed at other cities and the success of downtown arenas there — places like Vancouver and Columbus and Denver and ... well I forget all of them, but there's quite a few.

In the rest of the examples, the phrase means, well, "I forget all of them". In this reading, a semanticist might say that the quantifier all takes wider scope than the negation that's implicit in forget, i.e. for all x, I forget x:

Secret Agent Mom: I'm resisting the urge to expand the list, not because I haven't been thinking of lots of fascinating new things about me, but because the minute I get in front of a keyboard, I forget all of them.

"Shadows of the Pine Forest" (The Southern Literary Messenger, 1851, p. 623): "All my life," said she, "is bound up with the old house and grounds -- yonder I played, there I sat down and cried: many mournful things have happened here, but I forget all of them and only remember the pleasant part."

The key difference is whether you forget every single one of the things under discussion (that's the all...not meaning), or you forget some and remember others (the not...all meaning).
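The two readings can be made concrete with a toy model. The Python sketch below is only an illustration: the `remembered` mapping and its values are invented for the example, and treating "forget" as plain "not remember" is a simplifying assumption.

```python
# Toy model of the two readings of "I forget all of them".
# The items and their remembered/forgotten values are made up for illustration.
remembered = {
    "Rocky Road": True,
    "Chocolate Mather": True,
    "some other flavor": False,
}

def not_all_reading(memory):
    """Wide-scope negation: it's not the case that I remember all of them."""
    return not all(memory.values())

def all_not_reading(memory):
    """Wide-scope 'all': for every x, I do not remember x."""
    return all(not r for r in memory.values())

print(not_all_reading(remembered))  # True: at least one item is forgotten
print(all_not_reading(remembered))  # False: some items are still remembered
```

With a memory in which every value is False, both functions return True, which is why the all...not reading entails the not...all reading but not the other way round.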

Why is it surprising for the negation to take wide scope, sometimes, in phrases like "I forget all of them"? Well, if you turn forget into not remember, the rest is easy. "I don't remember all of them" normally and even inevitably means "it's not the case that I remember all of them", and not "I don't remember any of them". (Here I hope that even James Kilpatrick will agree with me.)

But forget isn't really just an alternative form of "don't remember". For example, you can't say "I forget any of them."

And construing "I negVerb all of them" as "it's not the case that I Verb all of them" doesn't work for most other implicitly negative verbs. I can't interpret "I dislike all of them" as "it's not the case that I like all of them" -- and a quick scan of instances on the web suggests that others agree with me. Similarly, "I missed all of them" doesn't mean "it's not the case that I caught all of them", and "failed all the tests" doesn't mean "didn't pass all the tests".

Perhaps the trick with forget is that the expected reading for all x, I forget x is pragmatically odd -- it suggests mentally enumerating things that you claim not to remember well enough to enumerate. And there are some cases that go the other way, where an overt negation binds so tightly to a verb that it winds up taking narrow scope, e.g. "didn't notice all the signs".

There's a large literature on this subject, but I don't know most of it, and forget all the rest. I'm sure that kind readers will remind me.

[Hat tip to Jan Freeman for "I forget all of them".]

Posted by Mark Liberman at 09:17 AM

February 07, 2008

Unicode Humor

For the true geeks among us, there is now Unicode humor. Check out this xkcd cartoon. Let your mouse hover over the cartoon to see the title, which is part of the joke. (If you're using a browser that doesn't show all of long titles, such as Firefox prior to version 3.0, I've put the full title below the fold.)



U+FDD0 is actually Unicode for the eye of the basilisk, though for safety reasons no font actually renders it.
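For readers who want to check the premise of the joke: U+FDD0 is one of the code points that the Unicode Standard designates as noncharacters, so it has no name and no glyph. A minimal Python sketch using only the standard `unicodedata` module; the helper name `is_noncharacter` is invented here and is not part of any library.

```python
import unicodedata

def is_noncharacter(cp):
    """True for the 66 Unicode noncharacters:
    U+FDD0..U+FDEF plus the last two code points of each of the 17 planes."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

ch = chr(0xFDD0)
print(is_noncharacter(ord(ch)))         # True
print(unicodedata.category(ch))         # Cn (unassigned)
print(unicodedata.name(ch, "no name"))  # no name
```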

Posted by Bill Poser at 11:30 PM

Romney can't compete with Senator Moccasin

Today's big political news story was Mitt Romney's announcement that he was suspending his presidential campaign. When a major event like this occurs, everyone's anxious to get the news out quickly, so it's tailormade for... the Cupertino effect! Once again, Google News gives us an early report off the Associated Press wire with an embarrassing spell-checker error, and then leaves it online long after corrected feeds have gone out:

This is a transformation of McCain's name already predicted in this space, in Mark Liberman's Jan. 12 post reporting on what happened when Cody Boisclair ran the names of presidential candidates through the built-in spell-checker on Mac OS X. But McCain appears correctly throughout the rest of the AP article, so there hasn't been a global substitution of the name. And McCain is actually included in recent Microsoft Office speller dictionaries (unlike Huckabee and until very recently Obama), so it's fair to assume this isn't the result of a correct spelling being misrecognized.

Instead, it appears to be a case of a spelling error being miscorrected into a different word on the spell-checker's list of suggestions. My first guess was that the reporter typed Moccain here, since that's just one missing letter away from moccasin. As Thierry Fontenelle of the Microsoft Natural Language Group explained, deletion of one character is a pretty small "edit distance," so moccasin is an obvious suggestion. But Steve Chrisomalis, who was one of two readers (along with Paul Justice) to email me about this, has a better candidate: Maccain, since his version of MS Word gives moccasin as the first choice and McCain as the second.
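The notion of "edit distance" can be sketched in a few lines of Python. This is the textbook Levenshtein distance, offered as an assumption about the general idea only; it is not a description of how Microsoft's speller actually ranks its suggestions, which presumably weighs other factors as well. The comparison is case-insensitive, which is also an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

# Compare the hypothesized typos with the two candidate corrections.
for typo in ("Moccain", "Maccain"):
    for word in ("moccasin", "McCain"):
        print(typo, word, edit_distance(typo.lower(), word.lower()))
```

On this simple measure Moccain is one edit away from both moccasin and McCain, and Maccain is actually closer to McCain than to moccasin, so raw distance alone does not settle which suggestion a given speller will list first.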

As usual, this is not intended to mock the hard-working reporters and editors who fall into spell-checker traps. Do not judge a journalist until you've walked a mile in his McCains moccasins.

[Update: Our good friend Thierry Fontenelle of Microsoft writes in:

I took your text and ran it through Office/Word 2007 (our latest version). As you can see below, McCain appears as the first suggestion of Maccain. But even if it did not, as in your reader's version, apparently, I really wonder why people do not read the suggestions before accepting them... I'm afraid there is very little we (i.e. the people who create these tools) can do about that...

And Bob Hay's got another theory:

You suggest "Maccain" as a possible source of the correction to "moccasin". Confusing "Mac" and "Mc" is certainly possible, as both are common in names. But this seems somewhat unlikely to me, since the name was spelled correctly in the rest of the article. Additionally, capitalization of the second "C" is relevant, "Maccain" (2 errors) versus "MacCain" (1 error). My spell check (Word 2004 for Mac) gives "moccasin" for "Maccain" but properly gives "McCain" for "MacCain".
My theory is that the original mistake was "McCasin". It's an easy keystroke error to make since the "a" and "s" are next to each other on the keyboard. Only one letter away, and my spell check gives "moccasin" as the first suggestion regardless of capitalization.
]

Posted by Benjamin Zimmer at 08:39 PM

Lounsbury on linguistic martyrdom and the transience of slang

My latest column on OUPblog takes a lead-a-horse-to-water approach to two usage points that are among the favorite bugaboos of peevologists. One of them — the less vs. fewer distinction — has already come up on Language Log several times recently. But I don't believe the other one — none used with plural verbs — has been addressed here directly (though Mark Liberman's post on syntactic and notional number and Arnold Zwicky's posts on agreement with the nearest are certainly helpful for thinking about cases of plural concord like "None of them are old").

It's a bit hard for me to fathom why anyone might still be clinging to the notion that none can only be used with singular verbs in the misguided belief that it is a shortened form of not one. Yale professor of language and literature Thomas R. Lounsbury (1838-1915) demolished this argument a century ago in his fine antidote to linguistic degenerationism, The Standard of Usage in English. In my column I quote the following eloquent passage from Lounsbury's book:

There is no harm in a man's limiting his employment of none to the singular in his own individual usage, if he derives any pleasure from this particular form of linguistic martyrdom. But why should he go about seeking to inflict upon others the misery which owes its origin to his own ignorance?

Lounsbury (no relation, as far as I know, to Mayan hieroglyphics expert Floyd Lounsbury, who taught at Yale several decades later) had choice words for contemporaries who railed against the "corruption" of English from some supposed "golden age." He also saw no danger in slang terms, since most of them pass quickly from the language anyway, and the ones that last have proven their utility.

The good professor wasn't perfect in his predictive powers, though. In a New York Times interview about slang from May 31, 1908, he said:

Those who think that 'slang' is getting too much of a permanent place in the language need only look at college slang for reassurance. In colleges slang is especially prevalent... When I first taught at Yale the word 'snab' was used to designate the female sex as a whole. There was a college poem at that time which began with these lines:

The snab fill all the gallery
In beautiful array.
Any one noticing the prevalence of 'snab' in speech at that time might have become seriously alarmed as to the future. Yet who ever hears of the word nowadays?

So far, so good. (Maybe snab is due for a comeback?) But then Lounsbury pushes his luck:

Take 'dude,' too. It has never really won a place and is, I think, dying out.

Dude. Dude! No, seriously, dude:

(Bud Light commercial via Slate, via Away With Words.)

[Update #1: Rich G. points out that this week's Mutts comic strip (1, 2, 3) uses dude prominently, in a direct homage to the modern cinematic classic The Big Lebowski. Lounsbury notwithstanding, the dude does indeed abide.]

[Update #2: Karl Hagen sends along an excellent example of linguistic martyrdom over none, from the British actor Stephen Fry in his position as host of the television show QI:

Karl has blogged about this clip here.]

Posted by Benjamin Zimmer at 09:00 AM

February 06, 2008

Headline of the year?

It's only February 6, but Martyn Cornell believes that he's found the headline of the year.

Although it's an AP story, the headline seems to be uniquely Fox. There's extensive discussion here. A different pun on the same words came up back in June of 2005 (here and here).

[Just in case the link decays, or editorial discretion improbably intervenes:]

Posted by Mark Liberman at 07:21 AM

Genitive anxiety

SS writes from Florida:

My question relates to the locution that takes the form of: "So-and-so is a friend of Bill's." This drives me nuts. I always want to ask, "A friend of Bill's what? His wife? His son? His father?"

Would that be an appropriate issue for someone to write about on the blog? Or am I off base?

Well, I'm writing about it, so I guess it's appropriate. And it's not off base to be puzzled by the logic of this construction, especially when it's embedded in another possessive, as in "A friend of mine's pet bear". But puzzling or not, these double genitives "have been around for centuries and are not hard to find in real life", as Arnold Zwicky wrote in the post I just linked to.

My correspondent certainly knows that double genitives are not hard to find in real life -- that's exactly what's driving him nuts. Just think of his anxiety during Bill Clinton's presidency, when "friend of Bill's" was so common as to deserve its own initialism F.O.B. Indeed, SS's note to me echoes almost verbatim the gripe voiced by James J. Kilpatrick in March of 1993:

How should we punctuate a double possessive? USA Today reported in January that Tyrone Ford "had been a supporter of Marion Barry's during the former mayor's drug trial.'' That same day USA Today identified Laura Anthony as "a college classmate of Hillary Clinton's.'' But USA Today also has identified Labor Secretary Robert Reich as "a close friend of Clinton.''

The New York Times in January said that "journalists defy a friend of Bill at their own risk.'' On another occasion, the Times spoke of a party arranged by a friend of Mr. Brown.''

The editor of Webster's Dictionary of English Usage prefers the apostrophe-s, i.e., "a friend of Mr. Brown's.'' My own vote goes to the Times style: "a friend of Bill.'' Whenever I see "a friend of Bill's,'' I want to ask, Bill's what? A friend of Bill's mother? His brother? A friend of Bill's wife?

My advice to SS is to buy a copy of Merriam-Webster's Concise Dictionary of English Usage, and to turn to it for counsel and consolation during those long nights of the soul when he might otherwise be poring over old James Kilpatrick columns. Referring to the entry in MWCDEU for double genitive, we learn that this

... is an idiomatic construction of long standing in English -- going back before Chaucer's time -- and should be of little interest except to learners of the language, because, as far as we know, it gives native speakers no trouble whatsoever.

But the double genitive was discovered by the 18th-century grammarians and has consequently been the subject of considerable speculation, explanation, and sometimes disapproving comment.

MWCDEU points out an example from the work of Alexander Woollcott where Kilpatrick's prescription won't work: "... that place of Dorothy Thompson's is only sixty miles away".

No native speaker of English would write [this example] as "that place of Dorothy Thompson".

The MWCDEU entry gives some interesting information about this construction's analytic history:

Lowth 1762 ... ran afoul of "a soldier of the king's." He seems to have mulled it over awhile; then he adds that "here are really two possessives: for it means 'one of the soldiers of the king.'" This is a partitive construction in which two ofs are used to explain away the supposedly redundant 's. The partitive explanation has persisted down to [the present], even though the grammarian Otto Jespersen exploded it in 1926 with an example from Tristram Shandy: "This exactness of his." The phrase cannot be turned into "one of the exactnesses of him." An even plainer example is another Shandean phrase, "the long nose of his."

The double genitive is standard English and should not be worried about.

But merely telling someone that they shouldn't worry is not always adequate therapy, as generations of psychologists have learned to their profit. So SS might try to master his anxieties through analytic insight, beginning with what he could learn from the Cambridge Grammar of the English Language. (It's $175 -- but that's less than the cost of two therapist-hours, at standard rates, and its 1,860 pages will provide far more than two hours of solace.) CGEL points out that the construction we're talking about, which it calls the "oblique genitive", is semantically more restricted than the construction in which the genitive precedes the head noun. Thus we have (chapter 5 "Nouns and noun phrases", §16.5.3, "Alternating patterns of complementation"):

Mary's green eyes       those green eyes of Mary's
Mary's book             that book of Mary's
Mary's secretary        that secretary of Mary's
Mary's new house        that new house of Mary's

but

Mary's anger            ?that anger of Mary's
Mary's obituary         ?that obituary of Mary's
the cathedral's spire   *that spire of the cathedral's
the summer's heat       *that heat of the summer's

(The judgements as given in CGEL.)

There's plenty more to be learned about this construction and its relationship to the rest of syntax, semantics and history of English. As Richard Feynman famously said, "No problem is too small or trivial if we really do something about it". But the mode of doing that I recommend is to investigate and to try to understand, not to sink into peevish frustration. It's better for your brain as well as for your digestion.

[John Cowan writes:

"Many a modern philosopher is a student of Kant, but any student of Kant's has been dead for more than a century."

(I don't know who should get the credit for this.)

It seems to me that "student of Kant" is what classicists call an objective genitive (I don't know the current jargon for this): there is an underlying clause "X studies Kant". "Student of Kant's" by contrast is truly possessive, and means "student belonging to Kant".

Apropos of this, English allows both subjective genitives like "Caesar's murders" (Caesar murdered X.PL) and objective ones like "Caesar's murder" (X murdered Caesar), as well as double-barreled ones like "Brutus's murder of Caesar". But in French (or so I am told) subjective genitives appear only in fixed phrases, whereas objective genitives are productive, and the double-barrelled form is "l'assassinat de César par Brutus", where "par" is the regular agent preposition in passive constructions.

]

Posted by Mark Liberman at 05:37 AM

February 05, 2008

Imperfect tense of the subjunctive mood

Oh, dear, I think linguification is back. Eric seems to be right. I hope I did not tempt it out of whatever rhetorical crypt it was lurking in during 2007. Fred Inglis, emeritus professor of cultural studies at the University of Sheffield, reviews George Steiner's My Unwritten Books in this week's Times Higher Education (31 January 2008, 44-45), and describes Steiner's conceit (a survey of seven books that he claims he would have written had he the time) as "a memoir in the imperfect tense of the subjunctive mood."

An imperfect tense is one used in describing a habitual state rather than a completed event (He walked up to the village each evening as opposed to He had walked up to the village immediately); and the subjunctive mood is a form of the verb used in certain contexts that do not involve reporting of facts (the closest approach in English would be that it be written as opposed to that it was written). Professor Inglis does not mean that Steiner avoids the perfect tense (after all, my unwritten books means the books I have not written, which is in the present perfect); and as for the subjunctive, it is rare enough in English to make it likely that it hardly turns up in Steiner's book at all.

The linguistic claim is not true, and is not supposed to be believed. Inglis merely means that the book is a memoir concerning uncompleted works that Steiner hoped might be written. He expresses this in terms suggesting he is talking about verbal inflections for tense and mood, though in fact he is not. But why? Why a false claim about tense and mood to cloak a (possibly true) claim about nonlinguistic reality? Is Professor Inglis showing us that he knows how to brandish technical terms from traditional grammar, in the belief that this will convince us he is clever enough to be writing a review of George Steiner? I do not know. I have never been able to answer why-questions about linguification.

Several people have suggested to me that he is brandishing his knowledge of Latin grammar. Well, it is quite right that in Latin a conditional clause about something not done ("if I had written...") would be in the subjunctive; but not necessarily in an imperfect tense. The Latin translation of "if I had written a book about it" would be in the pluperfect subjunctive. Steiner didn't write in Latin anyway, so you could call this redundant, but even if we pretended we thought Inglis was talking about Latin, his linguification would not come out true.

Posted by Geoffrey K. Pullum at 06:00 AM

Noah Webster

In Penny Arcade for 1/4/2008 ("Noah Webster, You're My Only Hope"), Tycho succumbs to Word Rage Syndrome, triggered by some rationalized spelling in an advertisement (click on the image for a larger version):

I expect that in some cozy coffee-house in heaven, Noah Webster and Benjamin Franklin are having a good laugh together over this. From Noah's Dissertations on the English Language: With Notes, Historical and Critical, to Which is Added, by Way of Appendix, an Essay on a Reformed Mode of Spelling, with Dr. Franklin's Arguments on That Subject (Boston. 1789), pp. 393-98:

The principal alterations, necessary to render our orthography sufficiently regular and easy, are these:

1. The omission of all superfluous or silent letters; as a in bread. Thus bread, head, give, breast, built, meant, realm, friend, would be spelt, bred, hed, giv, brest, bilt, ment, relm, frend. Would this alteration produce any inconvenience, any embarrassment or expense? By no means. On the other hand, it would lessen the trouble of writing, and much more, of learning the language; it would reduce the true pronunciation to a certainty; and while it would assist foreigners and our own children in acquiring the language, it would render the pronunciation uniform, in different parts of the country, and almost prevent the possibility of changes.

2. A substitution of a character that has a certain definite sound, for one that is more vague and indeterminate. Thus by putting ee instead of ea or ie, the words mean, near, speak, grieve, zeal, would become meen, neer, speek, greev, zeel. This alteration could not occasion a moments trouble; at the same time it would prevent a doubt respecting the pronunciation; whereas the ea and ie having different sounds, may give a learner much difficulty. Thus greef should be substituted for grief; kee for key; beleev for believe; laf for laugh; dawter for daughter; plow for plough; tuf for tough; proov for prove; blud for blood; and draft for draught. In this manner ch in Greek derivatives, should be changed into k; for the English ch has a soft sound, as in cherish; but k always a hard sound. Therefore character, chorus, cholic, architecture, should be written karacter, korus, kolic, arkitecture; and were they thus written, no person could mistake their true pronunciation.

3. Thus ch in French derivatives should be changed into sh; machine, chaise, chevalier, should be written masheen, shaze, shevaleer; and pique, tour, oblique, should be written peek, toor, obleek.

4. A trifling alteration in a character, or the addition of a point would distinguish different sounds, without the substitution of a new character. Thus a very small stroke across th would distinguish its two sounds. A point over a vowel, in this manner, a, or ű, or i might answer all the purposes of different letters. And for the dipthong ow, let the two letters be united by a small stroke, or both engraven on the same piece of metal, with the left hand line of the w united to the o.

These, with a few other inconsiderable alterations, would answer every purpose, and render the orthography sufficiently correct and regular.

[Hat tip: Ken Mallott]

Posted by Mark Liberman at 05:14 AM

February 03, 2008

Coal-fire(d)?

Andrew C. Revkin, "A 'Bold' Step to Capture an Elusive Gas Falters", NYT, 2/3/2008, starts with this sentence:

CAPTURING heat-trapping emissions from coal-fire power plants is on nearly every climate expert's menu for a planet whose inhabitants all want a plugged-in lifestyle. [emphasis added]

This surprised me, because I'm used to seeing power plants that burn coal called "coal-fired" -- but "coal-fire" as a modifier occurs four times in this article, so it was clearly a choice, not a typo.

My expectation seems to be in tune with historical as well as current usage. The OED has

coal-fired a., heated or driven by coal

and gives 10 example sentences with coal-fired, including for example

1909 Daily Chron. 17 Apr. 4/7 Baked fifty-five minutes in *coal-fired oven.
1956 Nature 4 Feb. 204/2 The capacity of coal-fired plant must be expected to continue to rise.

In contrast, the OED treats coal-fire as a (head) noun, "A fire made of coal", and all of the 34 examples scattered through the work conform to that structural pattern, e.g.

1656 S. HOLLAND Zara (1719) 41 Though strong with stubborn wire, I melt in thy coal-fire.
1816 J. SMITH Panorama Sci. & Art II. 330 Common oyster shells to be calcined in a good coal-fire.

A Google search yields 1,270,000 hits for "coal-fired" vs. 223,000 for "coal-fire" -- and, since Google search ignores punctuation, many (maybe most) of the "coal-fire" hits are for things like

A mine fire or coal fire is the underground smouldering of a coal mine.
If you do not follow the right procedure the coal fire will go out.
...the smoke and ash comes from a coal fire which may have been burning for over 5,500 years.

And in fact the NYT itself has 1,223 instances of "coal-fired" in its archive since 1981, most recently:

1/31/2008: The deputy secretary of energy, Clay Sell, said that the program would be revamped to split off the costs of building a new coal-fired power plant ...
1/30/2008: But some lawmakers who attended the briefing later insisted that any departure from building the coal-fired, 275-megawatt prototype power plant anywhere other than the central Illinois town of Mattoon would be unacceptable ...
1/28/2008: A lot of people think that an electric car leaves no carbon footprint, but of course that's not the case if you are recharging with electricity from a dirty coal-fired power plant.

In comparison, there are only 50 instances of "coal-fire" in the same post-1981 archive -- and again, most of them are head nouns. The most recent three of these are:

1/13/2008: Hired as a fireman to keep the engine's coal fire going, he made $9.08 for a 16-hour day.
8/26/2007: Miss Huntington's six-month course taught children of the poor how to use matches and light a coal fire.
1/26/2003: They warm themselves by a coal fire in the early morning before strapping on their equipment.

I could only find three examples where "coal fire" means "heated or driven by coal":

1/24/2008: Thousands of people die annually breathing the noxious particles of coal-fire installations.
11/10/2001: ... a 20-foot-by-20-foot house that had an outhouse and a coal-fire stove as amenities.
8/3/1986: The apparatus has a black coal-fire boiler trimmed in silver and brass.

There were another two instances where "coal fire" is used as a modifier, but in a rather different sense:

1/1/1987: Another Hawaiian word, ''aa,'' pronounced ah-ah, describes the chunky lava that looks like coal fire cinders.
3/10/1991: ... King's Chapel and choir, and candlelight, the coal-fire smell, and walking across the Quadrangle in a dressing gown in the rain to take a bath.
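The archive counts above can be put on a single scale. The following back-of-the-envelope Python sketch uses only the numbers reported in this post; it assumes that the search counts are exact and that every "coal-fire" hit not listed as a modifier is a head noun, neither of which is guaranteed.

```python
# Counts reported above for the NYT archive since 1981.
coal_fired_hits = 1223    # "coal-fired"
coal_fire_hits = 50       # "coal-fire" / "coal fire", all uses
coal_fire_as_fired = 3    # modifier meaning "heated or driven by coal"
coal_fire_other_mod = 2   # modifier in a different sense (cinders, smell)

# Assumption: everything that is not a listed modifier use is a head noun.
head_noun_uses = coal_fire_hits - coal_fire_as_fired - coal_fire_other_mod

# Share of the -ed-less form among all "heated or driven by coal" modifiers.
share = coal_fire_as_fired / (coal_fired_hits + coal_fire_as_fired)

print(head_noun_uses)   # 45
print(f"{share:.2%}")   # 0.24%
```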

Don't get me wrong here, I'm not trying to defend the purity of the English language against the incursions of barbarian hordes. I'm just documenting and exploring a little piece of lexical change in embryo.

There are at least two forces at work here, one phonetic and the other syntactico-semantic.

Taking up the form and meaning first, it's perfectly regular to use a noun like coal fire as a modifier, with the usual range of loose, contextually-appropriate restrictions on the head it modifies. (A hyphen may be interpolated or not, ad libitum.) This gives "coal-fire smell" (meaning the smell of a coal fire), or "coal fire cinders" (meaning cinders from a coal fire); but it could also give us "coal-fire boiler"  (meaning a boiler that is heated by a coal fire), or "coal-fire power plant" (meaning a power plant that is heated by a coal fire).

We generally don't take this last step, but only because another phrase was there before us: coal-fired. This is (probably) an example of the common pattern where -ed is added to a modified noun to make another modifier. The template is something like

MODIFIER NOUN1+"ed" NOUN2

taken to mean "NOUN2 with (a) MODIFIER NOUN1", e.g. long-haired girl, poker-faced opponent, slate-roofed villa, etc.

I wrote that this is "probably" the analysis, because there's another pattern that also fits,

NOUN1 VERB+"ed" NOUN2

taken to mean "NOUN2 VERBed with/by NOUN1", as in grass-covered mound, storm-tossed nation, jelly-filled donut.

Either way, coal-fired was there first, anchored by its contrastive partners oil-fired, gas-fired, wood-fired and so on.
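The two templates can be written out as a pair of paraphrase functions, which makes it easy to see that coal-fired fits either one. This is a minimal Python sketch; the function names are invented for illustration, and the glosses are just the informal paraphrases given above, not a serious semantic analysis.

```python
def gloss_noun_ed(modifier, noun1, noun2):
    """MODIFIER NOUN1+"ed" NOUN2  ->  'NOUN2 with (a) MODIFIER NOUN1'."""
    return f"{noun2} with (a) {modifier} {noun1}"

def gloss_verb_ed(noun1, verb_ed, noun2):
    """NOUN1 VERB+"ed" NOUN2  ->  'NOUN2 VERBed with/by NOUN1'."""
    return f"{noun2} {verb_ed} with/by {noun1}"

print(gloss_noun_ed("long", "hair", "girl"))       # girl with (a) long hair
print(gloss_verb_ed("grass", "covered", "mound"))  # mound covered with/by grass

# "coal-fired plant" can be parsed by either template:
print(gloss_noun_ed("coal", "fire", "plant"))      # plant with (a) coal fire
print(gloss_verb_ed("coal", "fired", "plant"))     # plant fired with/by coal
```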

But coal-fire, even if it lost the chance to be the early mover, has always been waiting in the wings. And it's got an ally in the English sound system: t/d deletion.

That's the process that leads us to reduce or omit [t] or [d] at the ends of syllables (See e.g. J.L. Roberts, "Acquisition of Variable Rules: (-t,d) Deletion and (ing) Production in Preschool Children", IRCS, 1994). This is more likely to happen in common phrases and before consonants -- as in "coal fired".

There's a long history in English of the final [t] or [d] of -ed forms being lost in lexicalized phrases:

skim milk                       skimmed milk
popcorn                         popped corn
wax paper                       waxed paper
screen porch                    screened porch
ice cream                       iced cream
ice tea                         iced tea
shave ice (Hawaiian dessert)    shaved ice (?)
cream corn (informal)           creamed corn
whip cream (informal)           whipped cream

(Some -ed-less forms like "skim milk" are quite old, and may have been formed originally as V+N; I'm not sure.) Forms such as "popped corn" and "iced cream" are now archaic at best, with "popcorn", "ice cream" etc. being standard. The case of "iced tea" seems to be transitional -- I usually see it written as "iced tea", but I'm pretty sure that I pronounce it as if it were "ice tea". And for me, "creamed corn" and "whipped cream" are still normal, though I know that many people have lost the final -ed in those words as well.

So is "coal fired" starting down the same path? Time will tell, but if NYT authors and editors are starting to use "coal-fire" instead, we're probably looking at a bear market for that particular [d].

[Update -- fev from Headsup: The Blog writes:

Neat catch on "coal-fire." The lede looks to have been tweaked to "coal-fired," tho the cutline and other references in the body copy are still "coal-fire." I'm wondering if some copy editor didn't infer or overextend a rule, tho I can't guess offhand what it might have been.

FWIW, vastly influential as the Times is at agenda-setting, I don't think it has that much impact on day-to-day style. It still does stuff that looks kind of bizarre in the hinterlands, like its insistence on not reducing relative clauses and the thing about articles with occupational titles before names. (If you wouldn't address somebody by the title, it needs to have an article: "good morning, Senator Lamb" means "Senator Stacy Lamb" is OK, but "good morning, infielder Agnelli" means you have to use "the infielder Lee Agnelli.")

Style's just no end of fun, but "coal-fire" could be a one-time thing. Alas.

Well, on January 24, Roger Cohen wrote ("America Needs France’s Atomic Anne") that "Thousands of people die annually breathing the noxious particles of coal-fire installations", by which he meant coal-fired power plants. So maybe it's a trendlet if not a trend.]

[A reader writes:

I don't think the forms cited for "t/d" deletion are really parallel. Those are noun phrases in which an initial past participle is replaced with a monosyllabic noun homonym of the verb in question, so that the end result is a compound noun. With "coal-fired", it's the last element that loses the "d", and the question is, how is the original form construed and why was this construction changed.

My own sense is that "fired" is a participle and "coal" is part of an adjectivally subordinate compound adjective, so that "coal-fired" means "fired with coal", just as "steam-powered" means "powered with steam". Surely, it's not a "pseudo-participle", meaning an adjective meaning "equipped with NOUN" along the fashion of "jug-eared", which doesn't presuppose the existence of a verb "to ear". That is, "coal-fired" doesn't mean "equipped with [the compound noun] coal fire".

But I think that here is perhaps where the problem arises. Whereas the verb "to power" meaning "to operate a contraption via a certain form of power" does exist, I don't think that one normally "fires" a plant (or other form of energy provider, engine etc.). (One can "fire it up", but you don't simply "fire it", I think.) Hence, "coal-fired" is lexically "stranded", and maybe a bit "odd" feeling. Could it be that some sentiment that "coal-fired" is hard to construe has given rise to the desire to convert the form into a simple compound adjective based on a compound noun like, say, "steam engine factory accidents" meaning "accidents in factories for steam engines"? Similarly, "coal-fire plants" meaning "plants using coal fire"?

Clearly, I didn't explain myself very clearly.

First, I didn't mean to suggest that t-d deletion only applies to -ed endings in cases like popcorn. On the contrary, the commonest place to see it is inside words (e.g. "postpone" or "handmaiden"), or in common phrases like "first Friday" or "best buy"; and it can also happen across larger phrase boundaries, as in "last for a while" or "band together".

So it's true that "coal-fired" is not at all the same kind of form as "screened porch" -- but in both forms, the possible loss of final -ed is promoted and reinforced by t-d deletion in their pronunciation.

Second, I think it's unclear to current speakers of English whether the "fired" in coal-fired is the participial form of the verb fire (as in the analysis of "jelly-filled" or "grass covered"), or an -ed form of the noun fire (as in the analysis of "poker-faced" or "long-haired"). At least, it's unclear to me. Whichever it is, the analysis is not transparent or unproblematic -- and that makes replacement by coal-fire all the more plausible. ]

[David Carlson reports grocery-store aisle signs reading "Can Vegetables". ]

[Andy Hollandbeck writes:

Your post touches ground close to a problem I often have when copy-editing demographic studies -- the use of the words "age" and "aged." I often see both "school-age children" and "school-aged children." I normally try to just go with the author's choice and make sure it's consistent throughout, but many articles have multiple authors, so I have to make a choice one way or the other, and I never feel totally comfortable going one way or the other.

Another similar problem is deciding between age and aged in, for example, "The third cohort consists of 200 adults age(d) 31-40."

]

Posted by Mark Liberman at 12:43 PM

February 02, 2008

Obituary: Eloise Jelinek

Eloise Jelinek, professor emerita of Linguistics at the University of Arizona, died in Tucson, Arizona on December 21st, 2007, after a long illness.

She was born in Dallas, Texas on February 10, 1924. Her life-long passion for language began during her childhood in Texas where she became fluent in Spanish. Her passion for linguistics was nurtured at the University of Michigan where she completed both a BA and MA in Anthropology and Linguistics.

Because of health concerns for her son, Tom, she, and her husband, Arthur Jelinek, moved to Tucson in 1967. In the mid-70's she was able to continue graduate studies in linguistics at the newly formed Linguistics Department at the University of Arizona. She received her doctorate in 1981 under the direction of Adrian Akmajian.

Her dissertation title was "On defining Categories: AUX and Predicate in Egyptian Arabic." Her knowledge of Arabic (and also Hebrew) was acquired while doing fieldwork on Egyptian Arabic during her time at the University of Michigan.

Following her doctorate, she served on the faculty at The University of Arizona from 1981 to 1992. She taught and spoke extensively around the world, from Santa Cruz to Prague. She served on committees and organized workshops for the Linguistic Society of America, for the Society for the Study of Indigenous Languages of the Americas, and for the American Anthropological Association. She received research grants from the National Science Foundation, the Jacobs Fund, the Lindley Foundation, the Wenner-Gren Foundation, the American Philosophical Society, Charles University, and the University of Arizona. Following her official "retirement" in 1992, her activities seemingly only increased, as she organized several workshops, continued to work on national committees, co-edited three volumes of collected papers, and, of course, continued to publish her original research.

Eloise was all that a scientist who studies human language should be: endlessly fascinated by the complexities of language, constantly seeking to formulate the explanatory principles that underlie that complexity. From her dissertation onward, her research represented an optimal marriage between original ideas and original field data, an example to theoretical linguists everywhere. She was especially instrumental in demonstrating the importance of data from endangered and less-studied languages to generative linguistics, among them the Straits Salish languages Samish and Lummi, as well as Navajo, Choctaw, and Yaqui.

Her great insight into the human language faculty was founded on her remarkable ability to grasp the underlying structures of such typologically diverse languages. Her work advanced in fundamental ways the understanding of linguistic variation and its relationship to linguistic universals.

One striking thing about Eloise's research is that her later work was always an expansion and deepening of her earlier work. She originally was part of a research team consisting of Adrian Akmajian, Susan Steele, and Thomas Wasow. The focus of the research of this team was the category AUX and its role in the syntax of the world's languages. The AUX was shown to typically have as constituents subject (and often object) person marking, tense, aspect, and modality. Eloise's Pronominal Argument (PA) Hypothesis (Jelinek 1984) grew directly out of her research on the AUX category. She proposed a major typological distinction among languages, such that some languages obligatorily satisfy their argument positions with pronominals (Pronominal Argument Languages) and other languages satisfy their argument positions with nominal constructions (e.g., nouns) (Nominal Argument Languages). What is important is the set of syntactic consequences that she showed follow once a language is described as a PA language. Her proposal has been the foundation of theoretical treatments of nonconfigurational, 'head marking' languages in the literature since it first appeared.

Throughout her career she was most intrigued by phenomena at the syntax/semantics interface. Quantification in PA languages, the relationship of discourse structure to syntactic structure, and the thetic/categorical predication distinction were central foci of her theoretical work. Especially indicative of her creativity is her analysis of morphological reduplication in Salish as a type of quantification. A conference focussing on her work was held in Utrecht in 2001, and colleagues presented her with a volume of papers in her honor in 2004.

She was also all that a humane scientist should be, personally. She was deeply committed to the communities of speakers who shared their languages with her. She shared Ken Hale's vision of native-speaker linguists describing and analyzing their own languages, and worked extensively to recruit minority students to the linguistics program at Arizona, particularly speakers of endangered languages. She supervised Dr. Mary Willie's doctoral dissertation on Navajo, and also Dr. Fernando Escalante's doctoral dissertation, the first Ph.D. on the grammar of the Yaqui language written by a Yaqui speaker. She also collaborated with the Pascua Yaqui tribe to produce a grammar workbook, and provided training in grammatical analysis to future language teachers in several workshops organized in the late 1990s.

She communicated the excitement of linguistic analysis and the beauty of grammatical structure to all of those with whom she worked. Those of us lucky enough to have known her will always also remember her sense of humor and her infallible kindness. With her death we have lost a great linguist, a steadfast friend, and a wonderful human being.

Heidi Harley and Dick Demers

A 2001 version of her CV can be viewed here.
To view her Arizona Daily Star obituary, and sign her memory/guest book, click here.

Posted by Heidi Harley at 11:20 PM

Be easy

One of the three finalists in the Doritos "Crash the Superbowl" competition is Soul Tap Records' "Be Easy (Koi Naa)", a South Asian hiphop anthem by Nivla and P. Oberoi:

Anna at Sepia Mutiny wrote:

I'm massively tickled by the fact that Nivla peppers rap with Malayalam phrases like I do my posts, though he is not as consumed with the word "kundi". Despite that minor shortcoming, when he's flowin "edi penne...ingota va", I'm goin', "HELL YES!".

Nivla may sprinkle some Malayalam into his English, but P. Oberoi's performance is in pure Punjabi. (If you can transcribe and translate the lyrics, please tell me.)

According to Peter Mucha ("Rutgers' Punjabi singer up for Super Bowl ad", Philadelphia Inquirer, 1/25/2008), Parag Oberoi grew up near Princeton, NJ, recently graduated from Rutgers, and now works for Goldman Sachs. Nivla, whose real name is Alvin Augustine ("Nivla" = "Alvin" backwards), is from New York. For more about him, there's an interview here, and some quotes here in the context of a story about arranged marriages in (U.S.) Desi communities. Soul Tap Records has a Superbowl blog here.

The NY Post featured this as a NY vs. Texas contest, because the other two Doritos finalists are based in Austin and Dallas (Raakhee Mirchandani, "Super Subcontinent: NY Act Goes for Bowl", 1/24/2008):

WHILE there's nothing you can actually do to help Eli and the boys beat the Pats next weekend, you can help New York "Crash the Super Bowl." That's an off-field contest to win airtime for a music video during the game.

For more on the corporate context of the contest, see Betsy McKay, "Super Bowl Is Crunch Time for Doritos' Risky Youth Strategy", WSJ, 2/1/2008. (Note that this article begins with a reference to "Nivla featuring P. Oberoi, a little-known hip-hop group", embodying the care with which journalists and editors at big-time newspapers check their facts.)

[One last thing -- Soul Tap Records' logo is an abstract dancing figure that also looks a bit like a devanagari character:

But unless I'm missing something, it's not actually a character used in writing Hindi, or for that matter the vattezhuthu script used by Malayalam. If you recognize this logo as a version of a glyph used in some actual writing system, please let me know.]

[Nihal Parkar writes:

The logo is just an S superimposed over a T, and it is not similar to any Indian language script. I am Indian, and have an acquaintance with most of the important scripts of the country.

I should have seen the S/T connection, which is obvious if I look at the logo in terms of Latin characters. But there's a clear stylistic reference, at least, to some of the graphical components of devanagari, and so I wondered whether an actual character in some South Asian writing system was there as well.]

Posted by Mark Liberman at 08:02 AM

Pot-pourri

That's French for olla podrida, and it means "hodge-podge", by extension from the original meaning of "dish made from different kinds of meat cooked together in a stew". The NYT happy-face etymology is a fast-speech form of "pot pour rire", i.e. "just-for-fun pot", though some empiricists with no sense of humor (or concern for readers' sensibilities) insist that the base meaning of pot pourri is really "rotten pot".

Anyhow, whether just-for-fun or somewhat-bacterially-decomposed, here goes.

Here's a geographical quiz question. This is a place where certain real-estate transactions are restricted to speakers of the local majority language, and where kindergarten teachers can be fired for speaking a minority language on school premises, even outside of class. Are we talking about ethnic cleansing in the Balkans, or suppression of the Kurds in Turkey, or local anti-immigrant statutes in the U.S.? No, this is the heart of multicultural Europe. OK, then, it's some local right-wing reaction to North African immigrants, right? No, the minority language in question is French, and the location is a suburb of Brussels, where Francophones have been around since their language was called Vulgar Latin. (Delphine Schrank, "Belgians Limp Along, Hobbled by Old Language Barriers", Washington Post, 1/30/2008).

Across the Atlantic, the authorities in Ridley Township are apparently still thinking about whether their obscenity statutes apply to profane roof-top taunts aimed at the FAA via passing airliners, but meanwhile, another transportation-related free speech issue is simmering in Virginia. Or maybe I should say, "free expression issue", since what state law HB1452 says is: "No person shall display upon or equip any motor vehicle with any object or device that depicts, represents or resembles human genitalia". As you might expect, the controversy has been an enormous boost to the TruckNutz industry. (Kerry Dougherty, "World-Famous Truck-Hitch Bill is Well-Meaning - But Still Nuts", Virginian-Pilot, 1/17/2008).

And in other East Coast news, I've been informed that the use of yo as a gender-neutral pronoun was prefigured in the devotional practices of Yoism, which bills itself as "The world's first open-source religion". This is discussed in Yo FAQ II -- but before you click on the link, I have to warn you that Yoism is apparently also "The world's first religion that auto-plays loud Stephen Colbert clips on its web pages", so you might want to turn the sound off on your computer before you click here. Yoist usage rules for the pronoun "yo" are explained (a similar warning applies) here. An apostle of Yoism, writing under the name of Dan Kriegman, has conveyed to me the theory that a Yoist at JHU "spread the meme to some friends studying urban language in the Baltimore school system, and they in turn inadvertently contaminated their student population with it".

[Jason Eisner writes:

"No person shall display upon or equip any motor vehicle with any object or device that depicts, represents or resembles human genitalia." Human? I suppose TruckNutz (like handbrakes) may coincidentally "resemble" human genitalia. But surely they're intended to evoke *dog* genitalia, what with the truckz being on all fourz? And as for what they "depict" or "represent," isn't it mythical *truck* genitalia?

Well, the advertising features squirrels. But the law's use of the word "resembles" covers most (mammalian) bases, I think, though perhaps via unconstitutional vagueness, since it might be taken to prohibit, say, the Edsel's grille or the Jaguar's hood ornament.]

[Update 2/12/2008 -- a francophone reader from Canada writes:

In your February 2 LANGUAGE LOG posting you refer to a suburb of Brussels where "francophones have been around since their language was called Vulgar Latin": this is quite incorrect. While Brussels (and indeed Flanders and everything South and West of the Rhine) was Vulgar Latin-speaking at the time of the fall of the Roman Empire, the region was wholly germanicized in the wake of the Great Invasions: while French was widely known and used as an L2 by Flemish elites throughout the Middle Ages, the vernacular remained (dialectal) Dutch: this is true of the Brussels region as well: it is only during the latter part of the nineteenth century, in the wake of industrialization and French-only schooling, that French became the L1 of a majority of the city's inhabitants. There is thus no continuity whatsoever between the Vulgar Latin once spoken there and the French now spoken there: the latter is a transplanted idiom, indeed one which is much younger than such transplanted languages as American English or Canadian French.

I wonder. There's no question that Brussels is in the middle of a Flemish-speaking area, but the Wikipedia article claims that

Research in the city's archives shows that Dutch was by far the most widely used language in both the population and the local administration until the French occupation (1793–1815), even though French had been the language of the local governors since the Burgundian era.

Obviously this is a highly politicized history -- but it does seem that there have been some French speakers around the area for a long time.]

Posted by Mark Liberman at 07:16 AM

February 01, 2008

Incorrections in the newsroom: Cupertino and beyond

Many of the journalistic "incorrections" we've noted here recently, from the "Muttonhead Quail Movement" to "GOP cell phones," can be blamed on the inattentive use of spell-checkers, otherwise known as the Cupertino effect. Thierry Fontenelle of the Microsoft Natural Language Group gave us some insight into efforts on the programming side to improve the advice given by spell-checkers, such as the addition of new words to the speller dictionary, as MySpace was added to the Office 2007 dictionary. And Mark Liberman was recently told by Fontenelle that Obama has been added by Microsoft, available as a part of an update to Office 2003 or 2007. But as Nitya Venkataraman of ABC News now reports, those who haven't updated Office 2007 will get an unfortunate first suggestion for Obama: Osama. A Microsoft PR representative went into damage-control mode, explaining to Venkataraman that the Office spell-checker is not "specifically targeted towards the word 'Obama' to change it to 'Osama.' Instead, the spell-checker just didn't have 'Obama' in its dictionary, so it tried to provide alternative suggestions based on closest match."
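To see why "closest match" would land on Osama, here is a minimal sketch. It is not Microsoft's algorithm, which isn't described here; it assumes a made-up four-word dictionary (TOY_DICTIONARY) and ranks candidates by plain Levenshtein edit distance, one common way to measure how close two spellings are.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical toy dictionary, purely for illustration; it deliberately
# lacks "Obama", like the un-updated speller described above.
TOY_DICTIONARY = ["Alabama", "Obadiah", "Osama", "drama"]


def suggest(word, dictionary=TOY_DICTIONARY):
    """Return dictionary entries ordered from closest to farthest."""
    return sorted(
        dictionary,
        key=lambda entry: edit_distance(word.lower(), entry.lower()),
    )


print(suggest("Obama"))
```

With this toy list, "Osama" comes first because it is a single substitution away from "Obama"; nothing about the ranking is specific to either name.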

Venkataraman bemoans the fact that harried campaign reporters are "one careless spell-check away" from turning Obama into Osama. But it's not just word-processing spell-checkers that can cause trouble for journalists editing their work under deadline pressure. There are other types of automatic replacement that can lead to incorrection, when a global search-and-replace is conducted to conform to a news outlet's style policies. There's an old story that a newspaper once published an article using the expression "back in the African American," presumably due to a politically correct search-and-replace on the word black. According to the urban legend trackers at alt.folklore.urban, the Fresno Bee did actually run an article with "back in the African-American" in 1990, but it turns out it was more of a practical joke than an automated "correction." Nonetheless, there have been a few recent examples where this sort of stylistic search-and-replace has wreaked havoc on published news reports.

M.C. De Marco brings to my attention one such case, in the Jan. 22 edition of the Boston Metro, a free daily newspaper. As noted on the Random Squeegee blog, an article about the observation of Martin Luther King Day explained that "King's birthday is Jan. 15, but the federal holiday bearing his name is observed on the third yesterday in January."


It looks like what happened here is that the Metro, which relies heavily on wire reports for its news content, was making sure that any story mentioning "Monday" would be changed to "yesterday" for the Tuesday paper. So someone went a little too far with the search-and-replace, altering a "Monday" that was not in fact the day before that Tuesday. Thus the deictically grounded use of "yesterday" was entirely inappropriate, leading to a bit of gibberish. ("For the record, the third yesterday in January is January 2," observes John of Random Squeegee.)
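The failure is easy to reproduce. The sketch below is only a guess at the mechanics: the input sentence is reconstructed from the published version by putting "Monday" back, and the "guarded" rule (skip any "Monday" that directly follows an ordinal word) is a hypothetical heuristic of my own, not anything the Metro is known to use.

```python
import re

# Wire sentence as it presumably read before the replacement (reconstructed).
sentence = ("King's birthday is Jan. 15, but the federal holiday bearing "
            "his name is observed on the third Monday in January.")

# Naive global replace: every "Monday" becomes "yesterday".
naive = sentence.replace("Monday", "yesterday")

# Hypothetical guard: "Monday" after an ordinal names a recurring date,
# not the day before publication, so leave it alone.
PATTERN = re.compile(
    r"\b(?:(first|second|third|fourth|fifth|last)\s+)?Monday\b"
)


def guarded_replace(text: str) -> str:
    """Replace "Monday" with "yesterday" unless an ordinal precedes it."""
    def swap(match: re.Match) -> str:
        # group(1) is set only when an ordinal word was matched.
        return match.group(0) if match.group(1) else "yesterday"
    return PATTERN.sub(swap, text)


print(naive)
print(guarded_replace(sentence))
print(guarded_replace("The mayor spoke Monday."))
```

The naive version reproduces "the third yesterday in January"; the guarded version leaves that sentence untouched while still changing "The mayor spoke Monday." A heuristic like this would catch only one pattern, of course, which is why a human reading the result is still the safer check.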

A more startling search-and-replace gaffe turned up in an October 2006 Reuters article about honey bees. The eagle-eyed folks at Regret The Error got a screenshot before it was corrected:

Reuters style apparently avoids mentions of "the Queen" (of England, that is), instead favoring the full name "Queen Elizabeth." But that rule of thumb is of course very misguided when applied to a sentence like "The queen has 10 times the lifespan of workers and lays up to 2,000 eggs a day." British tabloid media had a field day with this one, but they should know better: any journalist can fall victim to the perils of search-and-replace.

Posted by Benjamin Zimmer at 11:54 PM