April 30, 2004

We Dispose of Goodle Mystery

The English language Goodle web page to which Mark referred is indeed puzzling, but the mystery is easily resolved if you know a bit about Korean language and culture. 온돌 [ondol] "warm stone" is the traditional Korean heating system, a kind of hypocaust. In its original form, a wood fire was built in a fireplace like the one shown in the photograph and used to heat a wide stone called a 구들장 [gudɯlʤaŋ] under the floor of each room. One virtue of this system is that the stone has a large heat capacity and so stores up heat from the fire and releases it gradually, making the temperature of the room insensitive to the state of the fire. A variant uses a number of smaller stones with a layer of mud over them. A still later version uses water in a network of pipes embedded in concrete, a system which I believe was a favorite of the American architect Eichler. This company manufactures what they promote as a new, improved, pre-fabricated heating system derived from the traditional 온돌. The term goodle is presumably an anglicization of 구들 [gudɯl], a synonym for 온돌. The company very likely chose this term in an attempt to play on the English word good as well.

We produce and dispose of inner GOODLE.

So says the English intro at the L&J Corporation's web site. The description explains that the company has "realized Korean traditional heating system Ondol into the product inner GOODLE by succeeding merits and supplementing problems within its system", and that "although it is regarded as the best heating system which has excellent functions and needful merits in it, panel heating system with Ondol has not been sufficiently studied in comparison with research accomplishment in other systems".

Their page on the science of Goodle boasts eight sub-topics: "37.5 C Goodle, Health Goodle, Fast Goodle, Economical Goodle, Strong Goodle, No-defects Goodle, Clean Goodle and Cyber Goodle." I think that these are all aspects of the One True Goodle, rather than different Goodles, but I'm not sure, because after a few minutes of poking around on L&J's site, I'm ashamed to say that I haven't been able to figure out exactly what Goodle is. For once, looking Goodle up on Google left me none the wiser.

[via James Lileks, who mentions finding it by mistyping Google]

Classicists decode Da Vinci

A couple of days ago, Laurie Goodstein reported in The New York Times books section on efforts by Christians to debunk 'The Da Vinci Code', and since then, the classicists have been piling on over at Classics-L. Particularly rough treatment is handed out by Elizabeth Vandiver and Jim O'Donnell.

Sample quotes: "a kind of Never-Never-Land of woman-friendly, tree-hugging values overthrown by the evil Constantine and his goons"; "[t]he conflation of a multitude of different cultures into 'the Ancients' drove me batty"; "It's shoddy, filled with characters who can hardly even be called cardboard, and extremely badly written"; "the supposedly brilliant main characters are annoyingly stupid"; "at least Graves, Jung, and Campbell could *write*"; "It has no redeeming merits whatsoever".

I guess if you got them alone over a drink, they'd tell you what they really think. I'm one of the approximately three people who still haven't read the book, and this thread doesn't make me want to run out and buy a copy.

Weisbergism of the weekend

From an online transcript of an interview with Jacob Weisberg and Ann Coulter, CNN. Aired July 3, 2003 - 20:37 ET:

WEISBERG: I think this cowboy rhetoric may have some cost. I think it certainly had cost going into the war when we were trying to gather support for going to war with Iraq. And because of Bush's unilateral stance and his hostility toward the Europeans, and his attitude that we didn't care what nobody else thought, I think we went to war with less support than we could have had. I don't think that makes sense. It's fine to strike a pose and say we want Osama bin Laden dead or alive, we still want him dead or alive. But I don't think it helps when you look at the bottom line ...

Of course, this might have been a mistake on the part of the transcriptionist...

Intelligence vs. random fluctuations

BlogPulse scans about 750,000 weblogs for "Key Phrases, Key People, BlogBites, and Top Links", and displays the results on a day-by-day basis. The algorithms are said to look for "bursty" items rather than simply common ones. As I understand it, this means that "George Bush" (for example) shouldn't show up in "key people" unless the number of mentions of him increases significantly, relative to some estimate of the expected background level.
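To make the idea of "burstiness" concrete, here is a minimal sketch of one way to score it: compare today's count with the mean and spread of earlier daily counts. This is only an illustration of the general idea, not BlogPulse's actual algorithm, which isn't published in any detail that I've seen. The function name `burst_score` and the sample counts are invented for the example.

```python
from statistics import mean, stdev

def burst_score(history, today):
    """Return how many standard deviations today's count lies above
    the mean of the background counts in `history`.

    `history` is a list of earlier daily counts (at least two values);
    `today` is the count for the day being evaluated.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        sigma = 1.0  # flat background: fall back to a unit scale
    return (today - mu) / sigma

# Hypothetical daily mention counts for some name (made-up numbers).
background = [100, 104, 96, 100, 100]

print(burst_score(background, 100))  # an ordinary day: score of 0.0
print(burst_score(background, 120))  # a day well above the background
```

On a scheme like this, a name that is mentioned often every day gets a score near zero, and only a jump relative to its own history stands out.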

This is a demo project from the "Intelliseek Applied Research Center", which was set up after Intelliseek "brought on board key members of WhizBang! Labs, a Pittsburgh technology team specializing in natural language programming, text mining, data retrieval and other technologies." Other refugees from WhizBang!, an unfortunate casualty of the dot.com bust, include Fernando Pereira and Andrew McCallum.

Here's the picture that Intelliseek uses to convey their "technology vision":

It's got a lot in common with the vision of DARPA's current TIDES project ("Translingual Information Detection, Extraction and Summarization"). DARPA has been supporting research in related areas for several decades, and it's clearly about time for this investment to start paying off.

The technology in BlogPulse seems to work well in some areas, such as detection of personal names, which seems at least not to have many false positives. This is expected since named entity tagging is a pretty mature technology. I'm more impressed that their listing of "Key Phrases" seems to be picking up strings that really are English phrases, as opposed to word sequences that happen to occur more often than expected but cross-cut phrase boundaries. Phrase-finding with a low rate of false positives is not easy.

However, it looks like BlogPulse is not trying to connect alternate forms of names across documents. For example, yesterday's references to Elton John are listed as instances of the name "Sir Elton", and the references to Kofi Annan are listed as instances of "General Kofi Annan" (apparently because the algorithms truncated "Secretary-General Kofi Annan"). Elton John and Kofi Annan are famous enough that if BlogPulse were tracking entity mentions across documents, and doing a decent job of it, they should be getting these right. So I conclude that they've punted on this one -- and this is a big thing to leave out if you really want to turn unstructured text data into "intelligence". In my opinion, (what's sometimes called) cross-document entity tracking is a key problem for technologies of this general kind, maybe the key problem. That's not just because most users want to see the indexing done right -- it's also because if you make the connections accurately, you get a graph (of entity mentions across documents) that you can use for all kinds of other neat (and non-obvious) stuff.

I'm also not convinced that BlogPulse is doing a very good job of distinguishing random statistical blips in term frequency from significant trends. It's hard to judge this for "key people", since any name that occurs fairly often probably reflects discussions that are connected at least via the individual named, and without a fair amount of fussing with the data, it's hard for me to judge whether (say) Kofi Annan really was discussed significantly more often yesterday than usual.

However, for the "Key Phrases", it's a lot easier to make a judgment on this point, and my evaluation is that BlogPulse hasn't got it right yet. For example, "Key Phrase" #3 (of 40) for yesterday (4/29/2004) was "very good friend", and as far as I can tell from the list of "sample citations" given, none of them have anything to do with any of the others. I'll assume that "very good friend" usually occurs less than 19 times a day, but the fact that it came up 19 times yesterday (in the new entries on 750,000 blogs) seems to have been just a random statistical fluctuation, not any sort of leading indicator of warm feelings of fellowship sweeping through the blogosphere.

I feel the same way about many of yesterday's other "Key Phrases". Maybe the BlogPulse algorithm for estimating likelihood ratios needs a tune-up? Or maybe they forgot the Bonferroni correction or some appropriate approximation to it? This is likely a source of problems, since the number of tests implicitly done is quite large (perhaps as large as the count of all the N-grams in the day's blogtext, for 2<=N<=4), and so it won't be easy to steer between the Scylla of fantasy and the Charybdis of obliviousness.
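Here is a rough sketch of why the multiple-testing issue matters. It treats a phrase's daily count as Poisson-distributed around an expected background rate, and compares the resulting tail probability with a Bonferroni-adjusted threshold. Everything numeric here is an assumption for illustration: I don't know the real background rate of "very good friend" (the value 8 per day is invented), the figure of one million implicit tests is a guess, and the Poisson model is a stand-in rather than whatever BlogPulse actually uses.

```python
import math

def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def is_bursty(count, expected, num_tests, alpha=0.05):
    """Apply a Bonferroni correction: with num_tests phrases tested,
    each one must beat alpha / num_tests rather than alpha."""
    return poisson_tail(count, expected) < alpha / num_tests

# Assumed background rate of 8 per day (hypothetical), observed count 19.
p = poisson_tail(19, 8)
print(p < 0.05)                      # looks "significant" taken alone
print(is_bursty(19, 8, 1))           # one test: passes
print(is_bursty(19, 8, 1_000_000))   # a million tests (a guess): fails
```

Under these assumptions, a count of 19 looks striking if it is the only phrase examined, but it doesn't survive the correction once every N-gram in the day's text is a candidate. The price of a correction this strict is that real but modest trends get missed, which is the other half of the dilemma.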

I'm not sure what to make of the BlogBites, which are "weblog entries from the Blogsphere which showcase the past day's burstiest themes." The site doesn't tell us what a "theme" is, algorithmically, and I can't say that their selection strikes me as getting at the essence of anything. I wouldn't be shocked to see the same list presented by some human as his or her idea of the most important posts of the day. But on the other hand, I also wouldn't be surprised to see the list emerging from a selection of first paragraphs at random from the day's scraping of blogtext.

One final comment: the limitation to day-by-day textual listing of Key X's is too bad. It would be nice to see graphs of mentions of Key X's over time -- weeks or months. Then you could really see the pulse of the blogs.

April 29, 2004

Linguist jokes (2): At the pearly gates

A newly graduated linguistics PhD was hit by a bus and tragically killed on the day her dissertation was turned in. Her soul arrived in heaven at the Pearly Gates to meet St. Peter.

"Welcome to the gates of Heaven," said St. Peter. "But let me just say that we have a bit of a problem here. You see, we've never actually had a linguist make it this far -- usually they have lived fairly dissolute lives (you wouldn't believe the things that went on at the 1974 Linguistic Institute), or published things with inaccurate glosses and mismatched brackets or uninterpreted formalisms of one sort or another, and it's clear enough that they're not really suitable candidates for the University of Heaven. But you were just starting out. We're not really sure what to do with you."

"Well, couldn't you just let me in?" said the young woman. "I've tried to be good."

"No, the procedure in these cases, to be scrupulously fair, is to let you experience each and then choose," said St. Peter. "You'll spend one day in Hell and one here in Heaven and then you'll make your decision about eternity."

And with that St. Peter made the necessary travel arrangements and the young scholar was whisked down to the gates of Hell.

She strolled in, naturally rather nervous, and found herself in a lushly vegetated and well-kept courtyard in which stood an elegant Italian fountain. Off the courtyard was a well-appointed seminar room with superb AV equipment, excellent built-in projectors, high-speed radio Internet connection, whiteboards with markers that actually worked, everything.

Down the hall was a very comfortable lounge with a reference library that despite its compact space had the latest edition of the OED; the luxury leatherbound edition of The Cambridge Grammar; every previous grammar she knew about any language; all of Frege's works in their first editions; an unexpurgated `director's cut' hand-sewn edition of The Logical Structure of Linguistic Theory dated 1954... and a subscription to just about every journal that could possibly be relevant to her field. All on open stacks in mint condition.

She began to meet the other linguists who were strolling the courtyard, chatting in the hall, reading in the library. Otto Jespersen was there, and was very nice to her. Edward Sapir, Leonard Bloomfield, and Bernard Bloch all praised her work warmly. She learned that the man in the loincloth meditating by the fountain out in the courtyard was Panini. Jim McCawley took her to a marvellous Chinese buffet for lunch; the salt and pepper prawns flash-broiled in hell fire were fantastic. Through the afternoon there were fascinating discussions on many different linguistic topics. Dinner in the faculty club was a feast of steak and lobster followed by crepes suzette cooked in flames at the table by a demon. Over coffee and brandy she had a brief chance to meet the Devil, who turned out to be a tall, handsome man with a voice rather like Peter Ladefoged's. When the time came for her to leave she was really quite reluctant. But it was time to sample Heaven.

Heaven turned out to be a rather sterile experience of standing around on clouds. It was mildly interesting to discover that she could play the harp (innately triggered abilities, she assumed). The cherubim and seraphim were gentle and polite, but their conversation revolved mainly around falling down before Him in adoration and singing praises unto His holy name, and she rapidly tired of it all. When her 24 hours were up and St. Peter came to ask her for her decision, it was not really very difficult.

"I never thought I'd say this," she said, "I mean, Heaven has been... nice... But I really think I had a better time in Hell. I mean the University of Hell is a better fit for my intellectual interests."

So St. Peter escorted her back. She arrived once more at the gates of Hell, and strolled back in confidently. But the pleasant courtyard was gone.

She was standing in a desolate, filthy, trash-strewn wasteland. The temperature was ninety and rising, and there was a whiff of brimstone in the air. She thought she heard distant howls of agony. The seminar room was a bare room with plaster falling off the walls in a half-derelict building. The library had some battered introductory texts and a few loose copies of Glossa with non-consecutive dates in the 1970s. She did see some linguists, but they were dressed in rags, and appeared to be picking up dead lizards and pieces of potentially edible garbage and putting them in sacks to make an evening meal. They looked at her with sad and bitter eyes, pausing from their gathering activities only to tell her that they thought her research was second-rate at best. One of them mentioned that in her absence she had been appointed to a committee. A tattered schedule on a wall said that her first class was at 7 a.m. the following morning.

When the Devil happened to pass by she cried out to him:

"I don't understand! What happened to the library and the Chinese lunch buffet and the faculty club and... What has happened? All the other linguists look miserable, and they seem to hate me. It's all... different!"

Lucifer grinned. He put an arm around her shoulders and laughed a deep, dark laugh. (He really did sound like Peter Ladefoged.) The dark horns high on his forehead, which she had scarcely noticed before, stood out against the glistening scarlet skin, and his arrow-tipped tail waved gently in satisfaction as he explained:

"But yesterday we were just interviewing you! Today you're a junior member of our faculty."

[Non-humorous note: To my surprise, the chairman of a distinguished Department of Linguistics (it shall be nameless) recently emailed a version of this joke to a new PhD graduate from my department after getting that graduate's acceptance in writing of an offer of a tenure-track faculty position. I guess I would have thought it was a bit too much on the cynical side for such a use. Luckily the new appointee had the robustness of spirit to find the joke hilarious, and showed it to me with twinkling eye. Perhaps the sender judged that the way to take the story was as a cautionary tale: a lesson to us all about how not to treat our junior colleagues in the academic profession.]

In the April 19/26 New Yorker, David Owen describes a meeting in Phoenix, AZ, between two writers for the Hallmark greeting-card company and about 20 members of the public. After one of the guests shares a personal greeting-card story, Owen reports this exchange between Hallmark editor Michelle Keller and the audience:

"That's wonderful," Keller said. "And for being the first brave soul to share a story we would like to present you with the Hallmark Blushing Bears." She held up a pair of white plush Teddy bears dressed in red outfits -- a popular item during this year's Valentine's Day card-buying season. Keller made the bears kiss by pressing their (magnetic) noses together, and a red light inside the female bear's cheeks glowed: a blush. When the other women saw this, they made a sound that is impossible to represent typographically but was approximately "Awwwwwwwwwwwwwwwww!" [emphasis added]

(That's 17 w's, if I've counted right).

It's odd to say that this sound is "impossible to represent typographically". In a sense, no sound can be represented typographically, except perhaps by printing all the numerical values of a digitally sampled waveform. However, Google finds 564 pages with one 'a' followed by 17 instances of 'w', and many of them seem to be instances of the same category of vocal display that Owen recorded, like this one (which comes up first for me):

Awwwwwwwwwwwwwwwww. Sounds like you need a hug *^_^*.

It's true that the number of w's is not standardized, but that just means that people have found many, many different ways to indicate this vocal display typographically -- and Owen is pretty far out on the statistical tail in the choice that he made:

# of w's    Google hits
17              564
16              683
15              908
14            1,600
13            2,030
12            3,550
11            4,290
10            5,610
9             7,860
8            11,500
7            16,500
6            31,100
5            57,400
4           131,000
3           330,000

16: Awwwwwwwwwwwwwwww Soooo cute!
15: awwwwwwwwwwwwwww its soooooooo cute!!!!!
14: awwwwwwwwwwwwww... sweet!
13: awwwwwwwwwwwww those rabbits are so cute!!
12: awwwwwwwwwwww *wipes a tear away*
11: ... awwwwwwwwwww....sweet!!
10: Awwwwwwwwww, what a cutie.
9: awwwwwwwww, bless you look so cute when you was younger.
8: Awwwwwwww This picture is so cute, I can't stop laughing.
7: AWWWWWWW! This is my best friend Eric's puppy- "Bonus". He's just too cute not to share with the world.
6: AWwwwww!! This was just the sweetest story!! I
5: Awwwww. How CUTE!
4: Awwww...Sweet...
3: Awww... You have to watch the slideshow of Annabella on Megan's blog. This little cutey is adorable

And so on...
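For anyone who wants to run the same kind of tally on their own pile of text rather than on Google's index, here is a small sketch. It treats "Aw+" literally as a regular expression: one 'a' followed by one or more 'w', in either case, standing alone as a word. The function name `count_ws` is my own invention, and the word-boundary rule is an assumption about what should count as an instance.

```python
import re

# One 'a' followed by one or more 'w', as a whole word, ignoring case.
AW = re.compile(r"\baw+\b", re.IGNORECASE)

def count_ws(text):
    """Return the number of w's in each 'aw+' token found in text."""
    return [len(m.group(0)) - 1 for m in AW.finditer(text)]

print(count_ws("Awwww...Sweet..."))                      # [4]
print(count_ws("AWwwwww!! This was the sweetest story"))  # [6]
print(count_ws("The law is awesome"))                    # []
```

The word-boundary test keeps ordinary words like "law" and "awesome" out of the count, though it will also miss spellings that run straight into other letters.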

"Aw+" -- however we decide to spell it -- is just as well-defined a category of vocal display as "wonderful" or "Keller" is. But it's true that it's a different kind of thing. It doesn't refer to a person, place or thing, it doesn't denote a predicate that can be applied to different arguments, it doesn't represent the grammatical relationship of various other words in a phrase. Instead, it expresses a certain feeling. In that respect, it seems somewhat like the communicative displays of animals. As David Hume put it:

It is evident, that sympathy, or the communication of passions, takes place among animals, no less than among men. Fear, anger, courage, and other affections are frequently communicated from one animal to another, without their knowledge of that cause, which produced the original passion.

We also share with animals the ability to express our "affections" with different degrees of intensity. By choosing 17 w's, Owen is trying to suggest a pretty intense -- or at least prolonged -- form of the vocal display spelled "Aw+".

However, if we look a little further in Google's index, we can see that it's not quite right to say that Aw+ "expresses a feeling," since it can clearly be used sarcastically or insincerely:

Now please answer my questions or can you not even do that? I know what I wrote and I don't have to read it again. Did I hurt your feelings, awwwwww poor baby. ...You need to grow up.

Rather, Aw+ conventionally purports to express a certain feeling. Or something like that.

It's often claimed that animals act deceptively -- a standard example is a parent bird pretending to have a broken wing to lead a predator away from its nest -- but it remains controversial whether this is ever done with a real intention to cause another creature to have a false belief. And I don't know of any purported examples of what one might call "animal sarcasm", though it's logically possible. Thus just as a human might say "oh terrific, happy days are here again, tofu meatloaf for dinner!", similarly a dog might sarcastically wag its tail to express disapproval of being served kibble yet again. It doesn't seem likely.

There are several interesting questions about the human (American English?) vocal display in question. One is whether Aw+ is ambiguous, or whether there might be several different similar displays to be distinguished here, since one "sense" is an expression of pleasure in perceiving something cute (like most of the examples cited above), whereas another "sense" is an expression of sympathy for someone who is hurt:

Awwwwwwww, too bad, if ya want to cheer up, goto my HOMEPAGE!!!
AWWWWWWWW poor guy that is girl is mean!!!!!
awwwwwwww big hugssssss ~~~~ thats SUCKS when that happens

Both sets of examples represent a kind of nurturant maternal cooing, but the situations and the feelings seem quite different.

Finally, there's the question of how this vocal display varies across languages and cultures. I don't know the answer to this question, but I would guess that a similar vocal display is available to pretty much every human, but with somewhat different phonetic details.

Even in the case of American English, there's a phonetic question to which I don't know the answer. The sound that I associate with the typography Aw+ is a low-mid back rounded vowel usually indicated by IPA "open o" [ɔ], as in my pronunciation of the words in J.C. Wells' "lexical set" THOUGHT. But many Americans have merged the vowel in this set of words with the vowel in the lexical set LOT (so that caught and cot are said the same way), and pronounce all of them with an unrounded vowel that is something like IPA [ɐ] (or even fronter to [a]). So do people who merge caught and cot pronounce Aw+ with [ɔ:]? or with [ɐ:] or [a:]? I think that they still use [ɔ:], but I'm not sure.

[Update 4/30/2004: Neal Whitman emails:

In my experience here in central Ohio, the answer to your question is no. In the intro lx class I taught last year, most students made no distinction between [a] and [open o], pronouncing both as [a], but when I asked them what they'd say in the presence of a cute little puppy or kitten, they produced the [open o] with no problem.

I suspect that this is the general situation.

Daniel Ezra Johnson emailed to point out that Google also reports sequences with "ah+" parallel to some of those with "aw+". He tried "ah+ baby", but as he points out, many of these use "ahh" etc. to represent a completely different vocal display. However, "ah+ how cute" works better. Thus we have

aww how cute     4,580
ahh how cute       368

with examples like

Ohh here's a picture of my goddaughter enjoying her first christmas, ahh how cute

As Daniel suggests in his note, this is as likely to be a different idea about the appropriate way to spell the sound [ɔ:] ("open o"), as a different idea about the right sound to make to express shared pleasure in perceiving something small and cute.]

[One other comment: Charles Darwin observed, collected and documented every fact that he could about every area that interested him. Yet his book "The Expression of the Emotions in Man and Animals" does not, as far as I can tell, mention the vocalization(s) we're discussing here. Perhaps this is because he paid much more attention in that book to facial expressions than to vocal displays -- I'm reluctant to believe that it could be because Aw+ was unknown in Victorian England, or in any of the other places where he would have had a chance to observe it.]

Thumbs up and the "Mother Ship of culture"

About a year ago, in March of 2003, the meaning of "thumbs up" in modern Iraq was discussed by Brendan Koerner in Slate. Koerner observes that a raised thumb is traditionally an obscene insult in the Middle East, and cites University of Kansas classicist Anthony Philip Corbeill as having concluded in 1997 that in ancient Rome "the thumbs up sign actually meant 'Kill him', basing his assertion on a study of hundreds of ancient artworks."

Michel de Montaigne wrote in 1575:

It was at Rome a signification of favor to depress and turn in the thumbs:

"Fautor utroque tuum laudabit pollice ludum:"

and of disfavor to elevate and thrust them outward:

"Converso pollice vulgi, Quemlibet occidunt populariter."

[Essays XII, "Of Thumbs", translated by Charles Cotton]

Neither Koerner nor Corbeill credits Montaigne; you might think that today's established journalists and prize-winning classics professors "have become unmoored from the mother ship of culture", as Camille Paglia puts it.

I don't agree with this. There's far too much culture out there for any of us to be in touch with all of it -- it's not so much a "mother ship", at this point, as a few dozen big fleets, along with thousands of more or less independent traders, raiders and pleasure craft. I'm sure that the time that Koerner and Corbeill didn't spend reading Montaigne was put to excellent use in some other way. If I ever read this passage in Montaigne myself, I've long since forgotten it, and I just stumbled on it this morning by a bit of random Google serendipity.

But just to help put us all back in touch with the Renaissance fleet, here's Montaigne's whole essay "Of Thumbs":

Tacitus reports, that among certain barbarian kings their manner was, when they would make a firm obligation, to join their right hands close to one another, and intertwist their thumbs; and when, by force of straining, the blood it appeared in the ends, they lightly pricked them with some sharp instrument, and mutually sucked them.

Physicians say, that the thumbs are the master fingers of the hand, and that their Latin etymology is derived from "pollere." The Greeks called them Anticheir, as who should say, another hand. And it seems that the Latins also sometimes take it in this sense for the whole hand;

"Sed nec vocibus excitata blandis, Molli pollice nec rogata, surgit."

It was at Rome a signification of favor to depress and turn in the thumbs:

"Fautor utroque tuum laudabit pollice ludum:"

and of disfavor to elevate and thrust them outward:

"Converso pollice vulgi, Quemlibet occidunt populariter."

The Romans exempted from war all such as were maimed in the thumbs, as having no more sufficient strength to hold their weapons. Augustus confiscated the estate of a Roman knight, who had maliciously cut off the thumbs of two young children he had, to excuse them from going into the armies: and before him, the senate, in the time of the Italic war, had condemned Caius Vatienus to perpetual imprisonment, and confiscated all his goods, for having purposely cut off the thumb of his left hand, to exempt himself from that expedition. Some one, I have forgotten who, having won a naval battle, cut off the thumbs of all his vanquished enemies, to render them incapable of fighting and of handling the oar. The Athenians also caused the thumbs of the Aeginatans to be cut off, to deprive them of the superiority in the art of navigation.

In Lacedaemon, pedagogues chastised their scholars by biting their thumb.

As a result of this little bit of happenstance, I read a few of Montaigne's other essays on the same site. At the risk of being parochial, I have to say that they remind me in style and tone much more of weblog entries than of the kind of "essay" that I was trained to write in school.

Of War-horses, or Destriers: I here have become a grammarian, I who never learned any language but by rote, and who do not yet know adjectives, conjunction, or ablative. I think I have read that the Romans had a sort of horses, by them called funales or dextrarios, which were either led horses, or horses laid on at several stages to be taken fresh upon occasion, and thence it is that we call our horses of service destriers; and our romances commonly use the phrase of adestrer for accompagner, to accompany. ...

Of Cannibals: ... I long had a man in my house that lived ten or twelve years in the New World, discovered in these latter days, and in that part of it where Villegaignon landed, which he called Antarctic France. This discovery of so vast a country seems to be of very great consideration. I cannot be sure, that hereafter there may not be another, so many wiser men than we having been deceived in this. I am afraid our eyes are bigger than our bellies, and that we have more curiosity than capacity; for we grasp at all, but catch nothing but wind. ...

Of Coaches: ... Will you ask me, whence comes the custom of blessing those who sneeze? we break wind three several ways; that which sallies from below is too filthy; that which breaks out from the mouth carries with it some reproach of having eaten too much; the third eruption is sneezing, which because it proceeds from the head, and is without offense, we give it this civil reception: do not laugh at this distinction; for they say 'tis Aristotle's. ...

People often observe that weblogs are an ephemeral form, and that as a result of progressively lowered barriers to publication, too much stuff is being written for anyone to keep track of. There's nothing new in that, for better or for worse.

[Note: I haven't read Corbeill's scholarly works and therefore can't really assert with any confidence that he doesn't cite Montaigne -- I'm relying only on the KU press release that I linked to, which of course he didn't write. But his scholarly work apparently relies on a new compilation of evidence from ancient pictures, a context in which it's not really relevant what some 16th-century French polymath did or didn't notice about classical writings on thumbs.]

Learning to read

There's decoding words, there's assimilating sentences, there's losing yourself in an exciting story. And then there's the kind of reading that you need to do when you want to figure out the answer to a question or assess a writer's position or contribution. Timothy Burke at Easily Distracted has given some detailed instructions on "How to Read in College" that I like a lot.

Burke is offering help to the student who has just noticed that "[p]rofessors assign more than you can possibly read in any normal fashion", and is trying to figure out how to cope. I think that Burke describes exactly the right solution -- "skim, skim, skim" -- and manages to make the description vivid and interesting. He illustrates the method with an example, worked out in detail, that makes sense even if you don't have access to the book he's talking about. And as he explains persuasively, intelligent skimming is a complex skill, not at all just a matter of looking at topic sentences or otherwise dipping randomly into a passing stream of text.

I disagree with one thing he says, though: "The first thing you should know about reading in college is that it bears little or no resemblance to the sort of reading you do for pleasure, or for your own edification." Let's leave aside the question of reading for pleasure -- that depends on personal choices about kinds of reading and kinds of pleasure. But someone long past college will still want to use intelligent skimming techniques to evaluate a proposed medical procedure, a current political controversy or a potential investment.

If your doctor suggests an operation to fuse vertebrae or replace a creaky hip joint, and you want to learn what the issues really are, you'll have much more to read than you can possibly assimilate in a linear way, and you'll have to cope with the fact that most of it is full of vocabulary and concepts that are unfamiliar. If you have a child diagnosed with reading disability or attention deficit disorder or autism, you'll be in the same situation. Making up your mind about bilingual education or global warming or the effects of cell phone radiation poses the same problems. Ditto for deciding how to invest your savings or where to go for your next vacation.

Depending on your interests and tastes, you'll ignore some of these problems or leave them to randomly-chosen experts or to other random influences. But if you can't assimilate lots of text quickly when you decide that you want to, you're at a real disadvantage in modern life. And the way to assimilate lots of text quickly has almost nothing to do with conventional "speed reading" techniques, and everything to do with the kind of process that Burke describes.

The kind of skimming that's appropriate for scientific text is a bit different -- figuring out which tables, figures and equations are really critical is an issue that Burke doesn't address, for example -- but the process is analogous.

I don't know if it makes sense to try to teach such skills directly. In my experience, schools -- and some aspects of ordinary life -- just pile up the readings, and the people who develop the right skills prosper, while the ones who don't, don't. If it works to give the kind of explicit instruction that Burke offers, it should be done much more widely.

[via Liz Ditz at I Speak of Dreams]

April 28, 2004

Subjects, collaborators, consultants, ???

Claire at Anggarrgoon writes that

When I haven’t been revising my dissertation, I’ve been filling out a form for getting permission to do research on human subjects (aka going to North Australia over the northern summer to do some fieldwork).

I have all sorts of problems with this type of form. Of course, I see why they’re there, and it’s probably better on the whole that I do fill out one. My problem is not with people checking out my research design (I quite like having someone knowing what I'm up to).

BUT, the biggest problem I have is that I don’t view my “subjects” as “subjects” at all – the speakers I’ll be working with are collaborators in the project.

The terminology here is definitely a problem. One way to look at it is that it's better to treat people as "subjects" than "objects," but as Claire points out, "subject" is not at all the right term for the people that field linguists work with. She prefers "collaborator", but I have to point out that collaborate is a word with two senses, one of which subverts her intentions in a nasty way:

1. To work together, especially in a joint intellectual effort.
2. To cooperate treasonably, as with an enemy occupation force in one's country.

The term informant has similar problems:

1a. One that gives information. b. One who informs against others; an informer.
2. One who furnishes linguistic or cultural information to a researcher.

I usually use the term "language consultant." But Claire puts the consultancy relationship the other way around, and says that she "view[s] the role of a field linguist like me ... as a contractor or consultant, rather than as the head of an project dealing with human experimentation." Fair enough, and sometimes the group concerned actually hires the linguist, which makes this relationship explicit. However, the Institutional Review Board (IRB) will still require the linguist to fill out a human subjects form in this case, I think -- even if it is just an application for exemption. As Claire points out, there is something a bit odd here, since a management professor who consults for an outside company doesn't normally think to ask the local IRB for permission to interview the company executives as "human subjects".

Terminology and social relations aside, there are lots of issues about the interaction between language researchers and the "human subjects" review process managed by Institutional Review Boards at American universities. I'm happy to say that I've had generally excellent experiences with Penn's IRB, but I've also heard some horror stories about misunderstandings that have arisen elsewhere when an IRB that normally vets clinical research protocols comes up against a linguist or an anthropologist: "But you haven't listed all the specific questions that you plan to ask each subject, in the order that you'll ask them!" or "But you need to promise to destroy all recordings and transcripts after the study is completed!" Some of these stories may even be true.

For those who are interested, here is a summary of "Human Subjects Review for Language Documentation" that I wrote about four years ago. I believe that the main thing that has changed since then is that IRBs are more rigorous in insisting that everyone, including researchers in the social sciences and humanities, needs to go through the review process -- including, for example, people collecting oral histories, journalists and so on. As a result, it has generally become obligatory to apply to the IRB even to have clearly "exempt" research officially declared exempt. Technically, a linguist who asks acquaintances for grammaticality judgments and publishes the results, without going through the IRB process, is probably in violation of the regulations. This is probably also true for someone who makes use of published corpus data. [Of course, IANAL or even an IRB member, and YMMV].

Gray and Atkinson - Use of Binary Characters

As Mark already mentioned, yesterday Russell Gray gave a talk about the work on subgrouping and dating that appeared in a paper in Nature on which I commented a while back. The talk and subsequent discussion clarified exactly what they are doing.

One thing that emerged is that I was right about how they are treating the characters. In biology, "characters" are the features that are used for classification. In a traditional morphologically-based classification, a character might be "has a backbone" or "has a nucleus". In a DNA sequence based classification, the characters typically take the form of "has such and such a nucleotide at such and such a position". In a linguistic classification, the characters have to do with what words particular languages have. I've said this somewhat awkwardly because there is more than one way to set up lexical characters.

When linguists set up sets of words for lexical comparison, whether for classical subgrouping or for lexicostatistics, they are typically arranged by glosses. That is, we list the form that each meaning takes in the various languages. For instance, here is some data for the word for "dog" in a few of the Indo-European languages:


The first three forms are cognate. They descend from the same proto-Indo-European source by regular sound changes. The Latin form looks like it might be cognate to the first three but it isn't - the known sound changes from PIE to Latin do not yield this form. And the English form is not cognate either. In fact, this form is unique to English and of unknown origin.

If we were to code "dog" as a single multistate character, we would have three states, which we can call A, B, and C. The three states represent which of the three cognate sets (two of which, in our example, have only one member) represents the meaning "dog".


Gray and Atkinson did not code their data this way. Instead, they made all of their characters binary. In order to do this with data that are naturally multistate, they split each multistate character into a set of binary characters, one per cognate set. If we recode our "dog" data into binary characters as Gray and Atkinson did, we have to create three characters, one for each cognate set. Each character then represents whether that cognate set is represented in a particular language. For instance, character A corresponds to the question: "Does the language have a form cognate to Sanskrit [ʃvān]?". A 1 means "yes"; a 0 means "no".
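To make the recoding concrete, here is a minimal sketch in Python of splitting one multistate character into binary characters, one per cognate set. It is only an illustration of the idea described above, not Gray and Atkinson's actual procedure or software. The assignment of Latin to state B and English to state C is my own labeling for the example, and the names `DOG_STATES` and `split_into_binary` are hypothetical.

```python
# Hypothetical multistate coding of the gloss "dog", following the text:
# Greek, Sanskrit and German share one cognate set (A); Latin and English
# each have a form of their own (labelled B and C here by assumption).
DOG_STATES = {
    "Greek": "A",
    "Sanskrit": "A",
    "German": "A",
    "Latin": "B",
    "English": "C",
}


def split_into_binary(states):
    """Turn one multistate character into one binary character per cognate set.

    For each language, the binary character for a cognate set is 1 if the
    language's form for this meaning belongs to that set, and 0 otherwise.
    """
    cognate_sets = sorted(set(states.values()))
    return {
        language: {cs: int(state == cs) for cs in cognate_sets}
        for language, state in states.items()
    }


binary = split_into_binary(DOG_STATES)
for language, row in binary.items():
    print(language, row)
```

Each language ends up with three 0/1 values in place of a single three-way state, so character A answers exactly the yes/no question posed above.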


The use of binary characters raises one additional point. Once the characters become "does a certain cognate set occur", it ceases to be relevant whether the cognate in a particular language preserves a particular meaning. For example, in the data above, English is shown as having a completely different word for "dog" from most of the other languages. However, English does have a cognate to the form that occurs in Greek, Sanskrit, and German, namely "hound". It is not listed as part of the data for "dog" because it no longer means "dog" but instead denotes a particular kind of dog. However, if we are just asking whether or not this cognate set occurs in English, the answer is "yes", so we must revise the table of character states:


The dataset used by Gray and Atkinson in their Nature paper consists of a set of data created by Dyen et al. to which Gray and Atkinson added data for Hittite, Tocharian A, and Tocharian B. That dataset is organized by meaning, so it does not contain full cognate sets, only those cognates that retain their original meaning. "hound", for instance, would not be listed. That means that to convert multistate characters to binary characters properly, the original dataset has to be supplemented with cognates that differ in meaning. This of course does not affect the validity of the method.

The real significance of the use of binary characters is that the mathematical model that underlies the methods they use is based on the assumption that the characters are independent of each other. Whether an animal has a backbone is taken to be independent of whether or not it has a segmented body, and similarly what word a language has for "hand" is taken to be independent of what word it has for "fire". But when multistate characters are split into multiple binary characters in the manner described, the characters resulting from a split are not statistically independent. For the most part, languages have only a single word with a certain meaning and when a new word comes in, the old word disappears entirely rather than moving into a different meaning the way "hound" did in English. That means that in general, if we know that a language has a word belonging to one cognate set, we know that it does not have a word belonging to any of the others. Since we can predict the value of a character given information about the others, they are not statistically independent. The procedure that Gray and Atkinson used to create binary characters therefore violates the assumptions of the mathematical model.
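The dependence can be seen directly in a small sketch. Assuming the same hypothetical "dog" data as in the discussion above (three cognate sets labelled A, B and C, with the Latin and English labels being my own choice), the Python below checks that when each language has exactly one word for the meaning, the last binary character is fully determined by the others. The names `STRICT` and `implied_last` are made up for the illustration.

```python
# Hypothetical binary characters for "dog": each tuple is (A, B, C),
# one 0/1 value per cognate set, built from meaning-based data in which
# every language has exactly one word for the gloss.
STRICT = {
    "Greek": (1, 0, 0),
    "Sanskrit": (1, 0, 0),
    "German": (1, 0, 0),
    "Latin": (0, 1, 0),
    "English": (0, 0, 1),
}


def implied_last(row):
    """Value the last character must take if exactly one cognate set is present."""
    return 1 - sum(row[:-1])


# Under the one-word-per-meaning assumption, knowing A and B fixes C.
determined = all(implied_last(row) == row[-1] for row in STRICT.values())
print("last character predictable from the others:", determined)

# If English is also credited with "hound" (cognate set A), the row no
# longer sums to one, so the prediction fails for that language.
english_with_hound = (1, 0, 1)
print("prediction holds for revised English row:",
      implied_last(english_with_hound) == english_with_hound[-1])
```

Cases like "hound" are the exception rather than the rule, which is why the split characters are, for the most part, predictable from one another.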

This is a reason to be nervous about the validity of the results that they obtained, but it does not show that the results are wrong. Some violations render a model useless; others have insignificant effects. In this case, we don't know what the impact of the violation is. They are doing some experiments which they expect to provide information about the impact of the use of binary characters.

Moomba to you too

In 1955, the conservative government of Melbourne decided to rename the local version of the traditional May 1 "Labour Day" parade. They settled on the theme "Let's get together and have fun!", which they identified as the meaning of the word Moomba in a local Aboriginal language, a word that they therefore adopted as the new name of their May 1 parade. According to this story at Laputan Logic, they were misled by their language consultants, and Moomba should actually be interpreted as an impolite comment on the whole re-branding process.

Posted by Mark Liberman at 11:31 AM

More on Gray and Atkinson

Yesterday Russell Gray visited Penn and gave a talk based on his much-discussed Nature article with Quentin Atkinson, "Language-tree divergence times support the Anatolian theory of Indo-European origin." (Nature, 426, 435-439). In the audience were Don Ringe and Tandy Warnow, whose reactions I cited in an earlier post, and a collection of linguists, biologists and computer scientists that included Bill Poser and me.

Gray's presentation explained a lot more about their methods than the (necessarily brief) Nature article did. Much of the additional material can be found in this draft chapter (though I understand that a newer version will soon be available).

There was a lively and sometimes heated discussion, both during and after the talk. I have other tasks today, so a detailed account will have to wait for later, but I'll give a few general impressions now. I'm sure that Bill will have some comments as well.

First, everything that I learned reinforced my earlier belief that this is serious and interesting work. Its methods and conclusions remain controversial but they are worthy of very close attention. This is also not a one-shot deal -- Gray is continuing experiments on the Indo-European issues, and has new work on Austronesian in progress.

Second, Gray and Atkinson draw different (and in fact roughly opposite) conclusions from Warnow and Ringe about the reliability of various phylogenetic inferences. As I noted earlier, Warnow and Ringe argue that we can often get good information about tree topology, but (in the present state of knowledge) can't expect any reliable information about times. In contrast, Gray and Atkinson argue that even when tree topology is very uncertain (and even if the history is substantially untreelike as well), it may still be possible to get fairly tight time estimates.

I'm not sure who is right about this. This is partly because I still don't know enough about the details of the models involved. But as far as I can tell from yesterday's discussions, even the folks who know a lot more are really in a similar state. It comes down to an argument about which simplifying assumptions to make, and what effects these assumptions will have on the conclusions that result. I'll go over some of this argument in more detail when time permits.

In thinking about the general problem, an analogy with physics may be helpful. If we assume that the sun, planets and other heavenly bodies are point masses in calculating their orbital dynamics, our model is obviously false to fact. But does this simplification invalidate our conclusions? Well, it might or might not, depending on what calculations we do and what conclusions we want to draw. Any model of orbital dynamics will be simplified -- and therefore false -- to one extent or another. The question is whether this matters with respect to some specific quantitative or qualitative prediction. Giving a correct answer to that question requires a mixture of detailed mathematical reasoning, relevant empirical testing and luck.

One of Russell Gray's slides made this point by quoting the well-known scientific proverb that "A model is a lie that leads us to the truth". I believe that this was originally adapted (by whom?) from a remark made by Picasso:

"We all know that art is not truth. Art is a lie that makes us realize truth, at least the truth that is given us to understand. The artist must know the manner whereby to convince others of the truthfulness of his lies." (The Arts, Picasso Speaks, 1923)

Yesterday Russell Gray made considerable headway in convincing me of the validity of his approach. His talk, and the discussion around it, clarified for me the nature of the simplifying assumptions that he's making, and the (empirical and logical) questions to be addressed in determining whether those simplifications invalidate his conclusions about the dating of Indo-European. He also convinced me that he's continuing a serious program of efforts to test the effects of his assumptions, and that he's serious about understanding and addressing objections. In other words, he's doing science.

April 27, 2004

HEY, YO!!!

So often we are shown how grammar changes over time with comparisons of BEOWULF with Chaucer with Shakespeare with Fitzgerald. But here and now in America, among young African Americans (and affiliated people brown, yellow and white), an interjection has evolved into a piece of grammar under our very noses.

I speak of YO! Time was that YO! was used as in YO! GET OFF OF THAT TABLE! But nowadays, YO! has floated to the ends of sentences and lost its shouting intonation, and has become what linguists would call a pragmatic marker. Listen to young blacks talking casually and savor sentences like THE PARTY WAS REALLY OFF THE HOOK, YO or TELL HIM HE CAN'T BE STEPPIN' TO YOU ALL THE TIME, YO.

For maximal clarity, OFF THE HOOK means roughly "superlatively fantastic" and STEP TO means to initiate a physical altercation.

In any case, we must understand that this is a brand new YO! In the first sentence above, "..HOOK, YO" is pronounced with the same melody as ICE CREAM in the sentence I WAS LOOKING FOR SOME ICE CREAM. The new YO! has no accent, in other words. It has become a little marker of emphasis, also carrying a hint of vernacular warmth, as if to connote that the party was marvelous in a way that the speaker and his interlocutors particularly cherish -- just the right songs, just the right people, just the right feel, yo.

This YO, then, is no longer a call, a shout. It is a word like EVEN when it is used in a similar way. THE SENATOR DIDN'T EVEN SHOW UP FOR THE VOTE, for example. This EVEN is hard to imagine in a newspaper headline because it's too personal, too viscerally judgmental. Only in a parody in THE ONION would EVEN be used like this in a headline. This is because EVEN, here, injects a note of the intimate, foreign to the effort to render newspaper prose maximally objective.

The new YO interests me in that it is not nearly as exotic as one might think. This is how what linguists term pragmatic markers have arisen in languages worldwide. And I have reason to think that less than a century ago, a similar process occurred in a different nonstandard American variety, Brooklynese.

I have been reading through anthologies of the marvelously surreal old comic strip KRAZY KAT by George Herriman lately. Anyone who has missed this minimalist masterpiece concerning an ambiguously gendered cat who craves for scraggly mouse Ignatz to throw bricks at his/her head as a substitute for sex should run, not walk, to their laptop to order a book from Amazon.

The strip ran from the teens into the forties, and Herriman had his characters speaking in a stylized amalgam of highfalutin, Ellis Island, and New York bridge-and-tunnel. Ignatz Mouse (or "Ignatz Mice" as Krazy called him) leans towards the latter.

But now and then, amidst his Jackie Gleason-esque speech style Ignatz comes up with something like: "Hmm -- so he was trifling with me, hey?"

Now, that HEY initially seems a little clumsy. Try saying that line out loud. We imagine HMM or EY rather than HEY. The HEY seems simply unnatural, neither elegant, nor "Yiddische," nor "slangy," but just odd. One encounters that use of HEY in various stone-age American comic strips and vaguely senses that, for example, the artists back then just didn't quite know how to write realistic dialogue.

But Herriman's attention to verbal nuance elsewhere makes it unlikely that these HEYs were just the result of the clumsiness of a pre-Tune-In-Drop-Out man's unfamiliarity with putting real speech on the page. And Ignatz "Mice"'s little verbal tic puts me in mind of something I once caught in an old radio show.

Before I LOVE LUCY, Lucille Ball starred in a radio sitcom called MY FAVORITE HUSBAND, which was in retrospect a kind of dress rehearsal for the television show that would take the nation by storm. Already she was the daffy wife always giving her hubby trouble, including donning costumes and playing parts.

In one episode of the show in the late forties, for reasons I won't bother readers by recounting, Lucy has to pose as a gum-popping gal from Brooklyn. Her characterization includes postposing HEY to every second sentence: WHY DON'T WE MEET DOWN AT THE STATION, HEY? IT WAS THE ONLY WAY I COULD FIND IT, HEY. Again, the HEY has no accent. It wasn't "DOWN AT THE STATION -- HEY!!!!" Instead, "..STATION, HEY" had the melody of "OVERCOAT."

I (born 1965, and having especial occasion to hear vernacular Brooklynese daily in 1986 and 1987) have never heard anyone use HEY in this way. But it is so peculiar that one assumes that the writers based it on some kind of reality. Between Ignatz and Lucy, I hypothesize that in America before about 1950, vernacular speech in, at least, New York City included a use of HEY as a pragmatic marker in a way quite similar to the way baggy-pants teens are today using YO!

I can't help but end this by noting that apparently, Herriman had a healthy dose of African ancestry, born as a creole in New Orleans. But that's just for fun, hey.

Another overnegation

"NYC is the only US city where less than 50% of the households do not own a car". Found on a weblog, and added to our extensive collection of overnegations.

More on Genocide

I'd like to add a few notes to Mark's post on the New York Times' decision to acknowledge the Armenian genocide. Genocide is defined in international law by the United Nations Convention on the Prevention and Punishment of the Crime of Genocide. One of the clearest and most convincing pieces of evidence of the Armenian genocide is the eye-witness account of Leslie A. Davis, who was American consul in Harpoot and reported on it to the State Department. It puts paid to Turkish claims that Turkey was merely suppressing a rebellion by the Armenians. Suppressing rebellions does not require the mass killing of women and children and the elderly.

It may seem surprising that Turkey continues to deny that the Armenian holocaust took place. No one now alive could bear any responsibility for it, and the evidence is so overwhelming that the denial has no effect other than to make Turkey look bad. Nor were all Turks at the time guilty of genocide. Indeed, some Turks acted heroically to save Armenians, as described here. I think that there are two reasons for Turkish intransigence. One is that the Armenian genocide is the original sin of the Turkish Republic. Although strictly speaking it took place under the Ottoman Empire, it was the tail end of the Empire, and the people responsible were the Young Turks who created the modern Turkish state. In many ways their accomplishment was remarkable. They succeeded in preventing Turkey from being colonized and created a modern, democratic, secular state. Turkey has had its problems, but it has been much freer, more democratic, and more successful at modernization than any other Muslim country. To take but one example, women have had the right to vote and to be elected to national office since 1934. It is understandably painful for those who are justly proud of this accomplishment to recognize that the founders of modern Turkey had blood on their hands.

The other reason that Turkey is unwilling to acknowledge the Armenian genocide is that the Turkish Republic is founded on ethnic nationalism. The Ottoman Empire was a multiethnic state in which the unifying force was Islam. The Turkish Republic, as a secular state, has Turkish ethnicity as its unifying force. The existence of other peoples with territorial claims is thus a threat to the ideological foundations of the Turkish Republic. Turkey has been fairly tolerant of minorities with no territorial claims. Anti-semitism, for example, has not been much of a problem in Turkey. But groups like the Armenians, the Greeks, and the Kurds, who have claims to Turkish territory, are problematic. The Armenian problem was largely resolved by genocide, and the Greek problem by exchanges of population with Greece, but the Kurds remain a major thorn in the side of Turkey. Until very recently, Turkey denied the very existence of the Kurds and their language. Kurds were referred to as "mountain Turks". The use and teaching of Kurdish was banned. Turkey only recently relented on this in order to obtain entry into the European Union. Turkey still opposes the creation of a Kurdish state, even in Iraq, for fear of arousing Turkish Kurds.

It's important to remember these things because, perhaps, the next time, somebody in power will care and put a stop to it. Adolf Hitler said "Who now remembers the Armenians?" and went on to carry out his own genocide. But memory is not enough. It is also necessary that the people with the power to prevent genocide care enough to do it, and most of the time, they don't. When the Hutu began to exterminate the Tutsi in Rwanda in April of 1994, the international community did nothing. At the time, there was a small United Nations peacekeeping force in Rwanda under the command of Canadian Lieutenant General Roméo Dallaire. General Dallaire's expert opinion was that he could have stopped the massacre with only 5,000 troops. He appealed repeatedly for more troops and permission to stop the massacre but never received them. I remember watching his testimony before Parliament. I'd never seen a general cry before.

General Dallaire has described his experience in a moving book Shake Hands with the Devil. It is a damning indictment of the United Nations bureaucracy and of the governments of the countries most concerned: Belgium, France, and the United States, which blocked movements within the UN to stop the killing.

Classroom sex and other eclectic acts

"N.Y.U. doesn't attract just smart students, it attracts smart, eclectic students," said Mr. Beckman, the university spokesman. "We had a film student who wanted to film a couple performing a live sex act in front of a class. We had students who set up a swimming pool in their dorm room. Now we have this fellow."

Mr. Beckman is being quoted in today's New York Times, and by "this fellow" he means Steve Stanzak, a.k.a. the "Bobst boy", who has been living for eight months in the basement of the Bobst Library at N.Y.U., due to financial problems, and documenting his experiences on LiveJournal and a web site homelessatnyu.com.

I'm also impressed by Stanzak's resourcefulness, but I wonder whether the entire N.Y.U. administration agrees with their spokesman's other definitions-by-example of "smart" and "eclectic". Limiting discussion to the business of setting up a swimming pool in a dorm room, and thinking about issues such as floor loading and water damage in rooms below, I would have guessed that a different S-word would be more appropriate. In my experience, it's rarely a good idea to put large amounts of water in unusual places. I don't think that any students in the dorm where I live have ever turned their room into a swimming pool. However, an army buddy of mine once tried to turn his rusty old van into a sort of traveling love palace by removing the rear seats and installing a second-hand water bed, and that experiment came to a mythically disastrous end.

Maybe they build dorms differently at New York University (which is never called "New York"; like Boston University/B.U., it is an example of an "X University" not known familiarly as "X"). Anyhow, NYU has given Stanzak a free room -- under the swimming pool? -- and invited him to come talk to them about a better financial aid package, so he'll have a chance to find out for himself.

Historical semantics

An April 17 press release from The Armenian National Committee of New York cites an earlier news release from the International Association of Genocide Scholars to the effect that The New York Times has lifted its long-standing policy against the use of the term "Armenian Genocide".

The press release quotes a "revised guideline for journalists" as saying that "after careful study of scholarly definitions of 'genocide,' we have decided to accept the term in references to the Turks' mass destruction of Armenians in and around 1915", so that "the expression 'Armenian genocide' may be used freely and should not be qualified with phrasing like 'what Armenians call,' etc." The quote continues that "while we may of course report Turkish denials on those occasions when they are relevant, we should not couple them with the historians' findings, as if they had equal weight."

I haven't been able to find either the original or the revised NYT guidelines online, and the NYT has not discussed the matter in its own pages, as far as I can tell from the online search function.

It's amazing that this is still such a controversial issue. Here's a Reuters story, via the NYT, from December of 2003, about a similar judgment adopted by the Swiss Parliament over objections from the Swiss government and vague threats from the Turkish government:

Parliament adopted a resolution, 107 to 67, recognizing the killing of Armenians under the Ottoman Empire in World War I as genocide, defying the Swiss federal government and angering Turkey. Foreign Minister Micheline Calmy-Rey spoke against the resolution, and a spokeswoman said the government hoped the resolution would not strain Switzerland's relations with Turkey, which is deeply concerned about the issue. Turkey reacted swiftly, saying the Swiss assembly bore responsibility for any negative consequences its decision might cause.

Here's the Armenian National Institute's web site on the events of 1915-1923. Their FAQ is a good place to start. Here is the Wikipedia entry on the subject. As the revised NYT guidelines recommend, I'm not going to cite the denials as if they had equal weight, though the Wikipedia entry gives what I believe to be a fair account of the history and status of the controversy.

A minor footnote is the role of denial of the Armenian genocide in the history of spam, as discussed in this Wikipedia entry on Serdar Argic.

[via Blind Höna]

[Update: Gary Bass discusses this in the Talk of the Town section of the May 3 New Yorker.]

Buckets of beer

A couple of weeks ago, the learned Dr. Weevil answered a query about the difference between "Asymmetric information" and "Asymmetrical information" by explaining that the first is merely a trochaic tetrameter, while the second is a hipponactean. He indicates in a footnote that "[t]he hipponactean is named after the Greek poet Hipponax, the only one I know other than Hank Williams (Senior, of course) who writes of buckets of beer." In a later post, he quotes some (translated) verses, though not the one about beer.

According to the LION database, at least one other (English) poet has written a poem about buckets of beer. Well, he's Australian, and there's only one bucket of beer in the poem, which is mostly about bugs, homelessness and nostalgia, but I think that Hank would've liked the content if not the style.

    Baking at night

    Geoffrey Lehmann

You don't get bread these days
with blue and green beetle wings baked into it
and pink stains from some crimson bug.
On hot nights
the lights of the bakehouse drew
all the insects of Waugoola Shire,
and strolling past you could smell the dough.
But they've given up baking at night.

You don't see the fires of the bagmen
under the bridge by the river.
They're extinct too.
Mr Long sometimes humped his swag
for far-off places,
drinking methylated spirits, shadow boxing
and trying to kiss people.
I've tasted his johnny cakes,
flour mixed with salt and water on a fence post
and cooked on a sheet of galvanized iron,
zinc curling off around the dough.
Burned specks turned out to be mouse dung.

After his long tramp across One-Tree-Plain
with a 'cigarette swag'
Jim Long (Old Quizzer) dossed for some weeks
with a dozen other bagmen sprawled drunk
under the bridge at Darlington Point.

He got some meat scraps
and cooked soup for them all in a kerosene tin.
A bagman's three-day-old corpse
when it was noticed
was christened 'Hot and Juicy'.
The bagmen dug a hole by the side of the river,
a bucket of beer
was sent down from the Punt Hotel,
and Constable Brindle read the burial service.

You don't see many drunkards, wanderers
or blind people
(like Mrs Stinson---as children we loved
to see her holding her missal upside down
in church, poor woman).
There's no Cancer Joe for children to taunt.

If I wanted to join the bagmen by the river
under the weeping willows
I'd find no one there,
only the rumble of semi-trailers crossing the bridge,
the big headlights hurtling over.

We live in very moral times.

[from Spring Forest (1994)]

Posted by Mark Liberman at 05:40 AM

Following up on Eugene Volokh's discovery of Jacob Weisberg's fondness for "you know" as a filler, here are a couple of other examples of Weisberg using stigmatized features of the vernacular, in a passage whose structure is incoherent in the way that extemporaneous speech often is:

(link) We were sort of infuriated by that, for a couple of reasons. The main one is the idea that -- I mean, we take our integrity very seriously, and the idea that it's somehow corrupting for NPR to work on a show with journalists from Slate, we didn't understand why, just because Microsoft happens to own us, why we're impure in some way that they're not. [emphasis added]

As I wrote back on Jan. 3:

You can make any public figure sound like a boob, if you record everything he says and set hundreds of hostile observers to combing the transcripts for disfluencies, malapropisms, word formation errors and examples of non-standard pronunciation or usage. It's even easier if the critics use anecdotes based on the perceptions and verbal memories of equally hostile listeners.

I think that the excessive focus on George Bush's alleged language problems, fostered by Jacob Weisberg's "Bushism" enterprise at Slate, is a very bad idea. It's a bad idea because it trades on regional and class prejudices -- and yes, I know that George W. Bush has roots in the New England aristocracy, but it seems that he's regarded as a linguistic traitor to his class, just as FDR was seen as an economic traitor to his class. The "Bushism" obsession is also a bad idea because it seizes on and amplifies the most trivial mis-statements, and thus helps push public figures to replace unscripted discussion with artificial exchanges of carefully-packaged sound bites. And finally, it's a bad idea because it exemplifies the urge to replace political discourse with Pavlovian conditioning.

April 26, 2004

Formal children

Fernando Pereira at Fresh Tracks describes a discussion with his wife Ana about the relative frequency of the Portuguese words for "child" (criança/as vs. menino/a/os/as) in Portugal and in Brazil. The discussion was provoked by a passage in John McWhorter's book The Power of Babel, which Ana had been reading, and they explored the answer by reference to ratios of ghits.

It used to be that an unabridged dictionary and an encyclopedia would be kept accessible in middle-class homes, for settling questions of language or fact. Now the dictionary is likely to be an online one, and "the internet" is likely to be used for fact finding in place of the encyclopedia. I'm also seeing more and more cases of people using Google and similar search facilities to address usage questions by counting things. Of course, Fernando and Ana are hardly an ordinary couple in this respect.

If I understand Fernando's post, John turns out to be correct, more or less. The most interesting part is the apparent interaction among singular vs. plural and Portugal vs. Brazil (numbers are ratios of criança to menino):


Fernando offers an explanation in terms of the interaction between formality (greater in Portugal) and the specification of gender (which is required for forms of menino but not criança, and thus favors menino in informal uses, since one is likely to be talking about particular kids whose identities and genders are known). The idea seems plausible; it could be tested by examining a random sample of uses of each form. In this connection, it would be nice if Google (or a similar search engine) could be persuaded to return a random sample of the hits for a particular query, rather than the usual relevance-ordered list. Doing one's own pseudorandom sampling is not possible, since Google will not serve up results for starting points beyond 1,000 -- and the top 1,000 ghits are almost always a biased sample.
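For anyone who does have the full set of uses in hand (from a local corpus, say, rather than from a search engine), drawing the random sample is the easy part. Here is a minimal sketch using reservoir sampling, which picks k items uniformly at random from a stream of unknown length in a single pass. The function name `sample_hits` and its parameters are my own illustrative choices, not anything Google or Fernando provides.

```python
import random

def sample_hits(hits, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Uses reservoir sampling, so the total number of hits need not be
    known in advance and the hits need not fit in memory at once.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, hit in enumerate(hits):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(hit)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive at both ends
            if j < k:
                reservoir[j] = hit
    return reservoir
```

Calling `sample_hits(lines_containing_menino, 100)` on a hypothetical iterable of matching lines would give a sample of 100 uses to classify by hand as formal or informal.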

Anyhow, there is a partly analogous case in English with child/children and kid/kids: the former seems more formal and also more British, while the latter seems more informal and also more American. Of course, there is no gender marking involved in either word. In both the .com domain (probably mostly American) and the .co.uk domain (certainly mostly British), forms of child are commoner than forms of kid. The .com domain definitely has relatively more kid/kids, though (confirming that it is an Americanism). However, the effect of singular vs. plural is opposite in the two domains (numbers are ratios of child to kid (singular) or children to kids (plural)). The notion that this is an effect of formality more than geography is supported by the fact that the .edu domain, which is almost all American, is even more strongly dominated by child/children than the .co.uk domain is:


[Update: David Nash points out that because of Google's propensity to ignore apostrophes, the estimates of children/kids ratios are too low, since Google will lump kid's in with kids. (So will all too many writers, alas.) David suggests that counts for childs/child's might help balance things, but one would have to figure out how many of those are the name rather than the possessive form of child.]
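The ratios discussed in this post are nothing more than one hit count divided by another, computed separately for singular and plural. A small sketch of that arithmetic follows. The function name `form_ratios` is hypothetical, and the counts in the example are invented round numbers chosen only to show the calculation; they are not Google counts.

```python
def form_ratios(counts):
    """Map each label to formal_hits / informal_hits.

    `counts` maps a label (e.g. "singular") to a pair
    (hits for the formal form, hits for the informal form).
    """
    ratios = {}
    for label, (formal, informal) in counts.items():
        if informal == 0:
            raise ValueError(f"no hits for the informal form in {label!r}")
        ratios[label] = formal / informal
    return ratios

# Invented placeholder counts, NOT real search results.
placeholder = {"singular": (300, 100), "plural": (500, 250)}
print(form_ratios(placeholder))
```

A ratio above 1 means the more formal form (child, children) is the commoner one for that label; David Nash's point is that an inflated kids count in the denominator pushes the plural ratio down.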

A self-annihilating sentence

Ron Hogan recently asked me about an obscure phrase in Sean O'Hagan's Guardian review of a new book on Bob Dylan's lyrics by Christopher Ricks. As a result, I read the review all the way to the end, which I might not otherwise have done. I'm grateful, because the first sentence of the last paragraph is a gem:

The writing of this book was, I'm told, a labour of love and, as such, I am pained to point out how defeated I was by its ungainly style.

This is a wonderful example of what Saul Gorn used to call a "self-annihilating sentence". It's not as succinct as "Essentially is essentially meaningless", but it has real charm. Its finest feature, in my opinion, is its ungainly use of "as such" to mean "that being the case" or "therefore". A full appreciation of O'Hagan's achievement requires a bit of discussion -- so unless you are a fan of grammar or of irony or both, you may want to skip the rest of this post.

The standard gloss for "as such", as in Merriam-Webster, is "as intrinsically considered : in itself". Generalizing from "in itself" to a wider set of pronouns, this is the meaning when Kant writes (in translation) about "ideas as such", when medical researchers conclude that "Fungal spores as such do not cause nasal inflammation", or when a gardener writes that "I have nothing against cats as such but they do tend to use our garden as a toilet."

However, this gloss is incomplete as a picture of "as such" usage. We need to consider two modifications, one a sort of semantic bleaching, and the other a difference in the connections of as and such with the words and phrases around them.

In the first form of generalization, as such may become a sort of weasel wording, allowing the writer to avoid committing to a fully general statement ("I have nothing against the French as such"), or (as Ken Wilson wrote in The Columbia Guide to Standard American English) as such may be "used mainly for emphasis": "The story lines of Wagner's operas display surprisingly little narrative skill as such."

This bleached-out concessive or emphatic as such seems to be what Charles Bernstein meant to use in writing an article entitled "Against National Poetry Month As Such". Bernstein is against National Poetry Month not only as intrinsically considered and in itself, but also in its relationship to other aspects of contemporary American culture and indeed in every other way he can think of, and he proposes that it should be replaced by "National Anti-Poetry Month". In Bernstein's title, maybe "as such" is just verbal boldface, or maybe it's a way of telegraphing the idea that Bernstein opposes National Poetry Month but favors poetry, or maybe (since Bernstein is a poet who believes that "sense remote / Adduces worth") it's both at once.

In the second form of generalization, "as such" is not used to modify a preceding noun phrase, but rather is connected to a noun phrase occurring later; while at the same time, such refers anaphorically to a completely different phrase in an earlier clause. This structure can be understood by looking at a pair of sentences like these:

As an expert in the field, Dr. Gubser is frequently invited to speak at lectures, and members of the media often seek his opinions and analysis.

Dr. Gubser is an expert in the field. As such, he is frequently invited to speak at lectures, and members of the media often seek his opinions and analysis.

In the first example, "as an expert in the field" is connected to "Dr. Gubser", but precedes it. In the second example, "such" is an anaphoric reference to the preceding noun phrase "an expert in the field", and "as such" has exactly the same relationship to Dr. Gubser that the full as-clause did in the first sentence.

Usage mavens generally advise that such phrases ought to connect to the subject of the following clause, rather than to a noun phrase in some other position. On this view, the following version would be deprecated:

? As an expert in the field, members of the media often seek Dr. Gubser's opinions and analysis.

I generally agree with this advice (though I don't see anything wrong with sentences like "As a bonus, we get a Wide Area Network as well.") When a sentence-initial adjunct needs to connect to a specific noun phrase deep in the following material, it can be confusing. However, this advice is very widely ignored, and because it is often so unclear what the connections to the preceding and following material really are, "as such" has come to be used as a kind of generic bit of inter-sentential stitching, as this Columbia Journalism Review note complains.

My impression is that this linking usage of "as such" is becoming quite common -- if you search Google for strings of the form "as such X", for appropriate values of X, you'll find quite a few:

(link) A street retreat is a plunge into the unknown. As such, no one knows what will happen.
(link) everyone will get the email, and as such, everyone is welcome to join in
(link) The Code sets out general principles to guide employees in making ethical decisions, they cannot and are not intended to address every specific situation. As such, nothing in the Code prohibits or restricts Reed Elsevier from taking any disciplinary action on any matters pertaining to employee conduct...

This is a normal example of syntactic and semantic change in progress, and I'm certainly not about to say that these sentences are ungrammatical -- for those who have made the change. However, these sentences are certainly bad (or at least inconsiderate) writing, because they're puzzling to readers who still see as such as a fronted modifier containing an anaphor, and who will seek in vain for suitable preceding and following linkages.

Now we can see what a perfect little piece of found poetry it is, when O'Hagan uses as such as a linkless linker in the center of his ungainly complaint about Ricks' ungainly style:

The writing of this book was, I'm told, a labour of love and, as such, I am pained to point out how defeated I was by its ungainly style.

Posted by Mark Liberman at 10:42 AM

The coolest language in the world

David Beaver appears to be suggesting at the end of Language Log's 800th post that I might really be a Finn, or at least might have an alternate Finnish identity. I wish! Finnish is the coolest language. And I did use it, impeccably, at least once. It felt so great.

While I was in Helsinki for over two weeks to be at the 2001 ESSLLI (it's an annual thing, this year in France), I developed a crush on Finnish. I tried to pick up as many snippets of information about Finnish as I could. I listened to every syllable I could overhear. I read to myself under my breath from every sign I passed. I was walking past a window near the university one day when I saw the word YLIOPISTOKIRJAKAUPPA, and I suddenly stopped and realized with a real thrill that I could understand it. All of it. It's in parts: yli "high", opi "knowledge", sto "location", kirja "book", kauppa "retailer": yli-opi-sto-kirja-kauppa "higher learning place book shop" -- it said University Bookstore!

That didn't define me as a speaker of Finnish (a language about which they tell the old tale that the devil was unable to learn it); that was just passive analytical competence. But it was a start. It gave me hope and confidence for my entry into active use of the language. And finally the opportunity came.

A week or two later my friend Polly and I were returning from a wonderful three-day trip we took to Russia by train from Helsinki (the same train to the Finland Station in St Petersburg by which Lenin traveled to Russia to start the revolution that renamed the city Leningrad). We needed to take a taxi to the small hotel where we would spend our last night in Finland before flying home to the States. And I decided that I was ready to tell the taxi driver where we wanted to go, in Finnish.

The Hotel Arthur (it sounded like Hoh-tel Arrh-toorrh, I had learned from overhearing a Finn mention it) was in a street called Vuorikatu (stressed on vu: you always stress the first syllable of everything in Finnish, whether that seems sensible or not; then the intonation just sort of drops off onto a low monotone as if you aren't very interested in the rest of the word). Katu means "street". Now, Finnish doesn't have a preposition meaning "in"; that isn't how things work. Instead you put the ending -lla on a noun that the preposition would go before if there were such a preposition. And what makes it a little trickier is that the ending changes the last consonant in the noun. The change that the t of -katu would have to undergo, I had learned, would turn it into a d. All I had to do was put all that together, and we wouldn't need to wimp out and rely on the taxi driver's ability to understand us naming the destination in what would essentially be English.

Polly and I slid into the back seat of the cab, and I leaned forward and said to the taxi driver in a carefully studied Finnish accent that I had practiced a few times in my head: Hotel Arthur, Vuorikadulla. And off the driver went. No chat; he had understood me perfectly well in the usual language of the city; he just assumed I was a Finnish speaker. I like to imagine that Polly was extraordinarily impressed with my linguistic skills, though she chose not to say anything. I, at least, felt supremely competent just to have fluently rendered the correct case ending and associated morphophonemic alternation and stress contour on an utterance in a language that the devil himself gave up on. Lucky Polly to have such a master of language, such a linguistic stud, as a travelling companion.

It still didn't define me as a Finn (though if David Beaver really thinks I might be one, I have apparently been mistaken for one twice). But it was more than a start. I didn't just take that utterance out of a phrasebook; I constructed it out of its parts; I phonetically honed the parts to fit them together properly; I did the stress pattern right. That's what command of a language consists in -- the ability to put an expression together to do the job you need to do, whenever the occasion arises. It felt great to do it even once. Some day I'm going back and learn some more. More of the coolest language in the world.

Glottochronology revisited, very carefully

Those with a serious interest in the "neo-glottochronology" research by Forster and Toth, and more recently Gray and Atkinson (see Language Log posts here, here, and here), will want to read these papers:

Tandy Warnow, Steven Evans, Don Ringe and Luay Nakhleh, "Stochastic models of language evolution and an application to the Indo-European family of languages" (.pdf)

Steven Evans, Don Ringe and Tandy Warnow, "Inference of divergence times as a statistical inverse problem" (.pdf)

Russell Gray will be visiting Penn this week, and giving a talk on Tuesday, so watch this space for more information.

The second paper's conclusion is quoted below. It's interesting to see a case in which a statistician, a linguist and a computer scientist agree on the appropriateness of Rumsfeld's "unknown unknowns" saying, and like its formulation well enough to quote it.

Much of what we have said has focused on two issues: one is formulating appropriate stochastic models of character evolution (by formally stating the properties of the stochastic processes operating on linguistic characters), and the other is inferring evolutionary history from character data under stochastic models.

As noted before, under some conditions it may be possible to infer highly accurate estimations of the tree topology for a given set of languages. In these cases, the problem of dating internal nodes can then be formulated as: given the true tree topology, estimate the divergence times at each node in the tree. This approach is implicit in the recent analyses in (Gray & Atkinson, 2003; Forster & Toth, 2003), although they used different techniques to obtain estimates of the true tree for their datasets.

The problems with estimating dates on a fixed tree are still substantial. Firstly, dates do not make sense on unrooted trees, and so the tree must first be rooted, and this itself is an issue that presents quite significant difficulties. Secondly, if the tree is wrong, the estimate of the even the date of the root may have significant error. Thirdly, and most importantly perhaps, except in highly constrained cases, it simply may not be possible to estimate dates at nodes with any level of accuracy...


Therefore we propose that rather than attempting at this time to estimate times at internal nodes, it might be better for the historical linguistics community to seek to characterise evolutionary processes that operate on linguistic characters. Once we are able to work with good stochastic models that reflect this understanding of the evolutionary dynamics, we will be in a much better position to address the question of whether it is reasonable to try to estimate times at nodes. More generally, if we can formulate these models, then we will begin to understand what can be estimated with some level of accuracy and what seems beyond our reach. We will then have at least a rough idea of what we still don't know.


As we know,
There are known knowns.
There are things we know we know.
We also know
There are known unknowns.
That is to say
We know there are some things
We do not know.
But there are also unknown unknowns,
The ones we don't know
We don't know.

-- Donald Rumsfeld
U.S. Secretary of Defense


Have X, will travel

Here's another phrasal template ("snowclone"): searching Google for "have * will travel" turns up X = browser, children, spacesuit, OOPL, geocache, computer, rocket, transgenes, dog and some 215,000 others. Well, 215,000 pages containing such a sequence, anyhow. The origin of the phrase, of course, is the 1950s TV western Have Gun -- Will Travel, whose hero, Paladin, sported the business card shown on the right.
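If you have your own pile of text rather than a search engine, the same template can be hunted down with a regular expression. The sketch below tallies the fillers of the X slot; the pattern and the function name `snowclone_fillers` are my own illustrative choices, and the pattern only allows a single word (possibly hyphenated) in the slot, which is narrower than Google's wildcard.

```python
import re
from collections import Counter

# "have X will travel", with an optional comma after X, case-insensitive.
PATTERN = re.compile(r"\bhave\s+([\w-]+),?\s+will\s+travel\b", re.IGNORECASE)

def snowclone_fillers(texts):
    """Count the words filling the X slot of "have X, will travel"."""
    counts = Counter()
    for text in texts:
        for match in PATTERN.finditer(text):
            counts[match.group(1).lower()] += 1
    return counts
```

Feeding it a list of headlines such as "Have drill, will travel" would return a count for each distinct filler, which can then be sorted by frequency.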

What brought this up was a piece by Sam Hughes in the most recent Penn Gazette, entitled "Have drill, will travel". It's about Doc Holliday, who got his DDS in 1872 from the Pennsylvania College of Dental Surgery, which later became part of Penn's School of Dental Medicine. I enjoyed the movie Tombstone, in which Val Kilmer played Doc, but I don't remember a role in it for Doc's girlfriend Big Nose Kate, née Mary Katherine Haroney, whose father was personal surgeon to the Emperor Maximilian. She met Holliday in 1878, in Fort Griffin, Texas, where in order to help him escape the consequences of knifing another gambler, she "set a fire in the hotel as a diversion, then used a pistol to persuade the reigning deputy to let him go".

The Holliday piece in the Gazette is a sidebar to an article entitled "Dentist of the Purple Sage", about western novelist Zane Grey (originally "Pearl Zane Gray"), who got his DDS from Penn in 1896. The title is a take-off on Grey's classic Riders of the Purple Sage. One interesting aspect of Hughes' article is that he starts it off with a reference to the 19th-century American practice of organizing academic life around inter-class brawls, one of which figured in Grey's autobiographical (non-western) novel The Young Pitcher. All this is by way of agreeing with Semantic Compositions that today's students definitely embody "a shift in cultural attitudes and academic training, both of which indicate a present-day emphasis on material acquisition over other goods", such as fighting skills.

Seriously, SC is only talking about changes since the 1960s, and he's interested in the balance between material goals and "metaphysical well-being", and he cites some persuasive evidence, and he hedges his discussion in many appropriate ways. So I shouldn't kid him about his left-handed defense of Camille Paglia.

As further support for the hypothesis of cultural changes among today's young people, I can't resist quoting from Hughes' description of Grey's initiation into the life of the mind at Penn:

It began when he attended an anatomy lecture in an amphitheater--presumably in the building now called Logan Hall--and made the mistake of sitting in a row traditionally reserved for upperclassmen. A big, blond, "husky-voiced sophomore" got to his feet and roared: "Watch me throw Freshie out!"

Freshie may have been scared, but he wasn't budging. When the sophomore tried to pull him away, Pearl [remember that 'Pearl' was Grey's original first name] gave him a violent shove that sent him backward over a row of seats into the midst of his classmates.

Here's how biographer Frank Gruber, drawing on Grey's unfinished, unpublished autobiography, put it: "Pandemonium broke out. The sophomores rose en masse to get to Pearl, and the freshmen spilled down from their heights to rescue their champion. The amphitheater became a scene of riot, and when it was over Pearl was stark naked, except for one sock. His clothing had been torn from him, including his shoes."

Posted by Mark Liberman at 09:36 PM

The Plain English Campaign people haven't taken Geoff Pullum's advice to get a life, but BBC News has organized its readers to create an all-cliché short story, perhaps as a sort of flooding therapy.

Posted by Mark Liberman at 02:57 PM

The Romans Didn't Always Get It Right Either

Our discussion of the problems of deciding how to pluralize Latin and Greek nouns in English (here, here, and here) leads me to point out that those who have difficulty with this can take some comfort from the fact that the Romans themselves did not always follow the Greek accurately. Steve of Language Hat referred me to a recent discussion of classical plurals in which Justin points out that octopi is not quite as ignorant as it sounds since the Romans themselves sometimes treated Greek words ending in πους [pus] as second declension. He gives the example of the related noun polypus, which should be third declension but occurs with second declension case forms. Interestingly, one of the citations in this Perseus entry is to a line of Plautus in which we find the accusative singular polypum, a second declension form (the third declension form would be polypoda). Plautus was very familiar with Greek - indeed, his plays contain many passages in Greek - so he was surely not ignorant of the Greek form. That even he would shift the declension shows that this must have been a common phenomenon.

Gildersleeve and Lodge's Latin Grammar has a discussion of the declension of Greek nouns at pp. 32-33. They say that many Greek nouns have mixed declension in Latin, with second declension forms used alongside third declension forms, as with polypus. However, they also say that this mixture is pretty much restricted to the singular; the Romans usually followed the Greek in the plural.

Gildersleeve and Lodge also point out that the Romans sometimes took the accusative of the Greek word to be the stem. For instance, Greek κρατήρ "punchbowl" is a third declension consonant stem noun with accusative singular κρατῆρα. It shows up in Latin both as a third declension masculine, nominative singular crater, genitive singular crateris, as it ought to if one follows the Greek closely, and as a first declension feminine, with nominative singular cratera, genitive singular craterae.

By the way, there's a handy summary of the basics of Latin plurals here.

Posted by Bill Poser at 02:12 PM

In due time

Stephen Laniel was inspired by Bill Poser to write that

A little post about Latin plurals really makes me want to learn Latin. I’ve wanted to for a while, along with Ancient Greek. All in due time, I guess.

I certainly don't want to encourage delay, but I can offer an inspirational story about learning Greek later in life.

Around 1970, in his mid-60s, I.F. Stone retired from journalism, taught himself Greek, and began systematically reading the extant classical literature in search of "one last scoop." The result was his book The Trial of Socrates, published in 1988, shortly before Stone's own death.

I once met Stone, at lunch at the house of the publisher Ralph M. Ingersoll, who was the father of a school friend. This was (I think) in 1963, when I was 15. Ingersoll, who was contemplating retirement himself, asked Stone about his plans. Stone said that he had always wanted to read classical Greek literature in the original language, and that after retiring, he planned to learn Greek and indulge himself.

I remember feeling surprise that he wanted to do this, and skepticism that he would follow through on the idea. So I was surprised and impressed when his book came out a quarter of a century later, and I took special pleasure in reading it. I'm about the same age now that Stone was in 1963, so I can appreciate his achievement in a different way.

Here's an obituary by Ralph Nader, and a review from the right, both of which underline the appropriate irony of Stone's fascination with Socrates.

Posted by Mark Liberman at 11:22 AM

Correspondents have pointed out some further examples of Pseudo-Latin Plurals. The cases that I have previously mentioned involve incorrect choices of stem form and ending. The new examples involve attempts to make plurals of things that in Latin were not nouns.

Donald Davidson brings to our attention sub poenae, intended as the plural of sub poena. In legal English sub poena is used as a noun to refer to a court order for a witness to appear or to produce documents or other evidence. In the latter usage it is short for sub poena duces tecum "Under penalty you will bring with you". duces is the verb "you will bring". tecum is the combination of the pronoun te "you (singular)" and the preposition cum "with". sub poena is a prepositional phrase consisting of the preposition sub "under" and the noun poena "penalty". A prepositional phrase is not a noun and cannot be made plural by changing the ending of its noun to the plural. Indeed, the nominal part of this prepositional phrase is not in the nominative case. sub governs the ablative case. The way Latin is normally written, you can't tell, but the /a/ of sub poena is long whereas the /a/ of the nominative singular poena is short. You could of course put poena into the ablative plural, but sub poenis (with long /i/) would mean "under penalties". In short, sub poena is a noun only in English, not in Latin, so the only way to make it plural is the English way: sub poenas.

Claire Bowern of Anggarrgoon has encountered non sequituri, intended as the plural of non sequitur. In Latin, non sequitur is a sentence, not a noun. It consists of non "not" and the verb sequitur "it follows". As a sentence, it cannot be made plural by adding the nominative plural suffix for second declension nouns. As an English noun, it has the English plural non sequiturs. If we really wanted to make a Latin plural, we could, since in this case the verb could be made plural - non sequuntur would mean "they do not follow". That might be a little obscure.

[Update: Keith Ivey has pointed out another example of this type, ignoramus, which is sometimes given the plural ignorami, e.g. here. This would be correct if ignoramus were a second declension noun, but it isn't. ignoramus is a verb meaning "we are ignorant of". It became a noun through its use as the name of a character in the 1615 play by George Ruggle of the same name.]

[Update: John Kozak mentions seeing agendae, which I too have encountered from time to time. Google turned up plenty of examples, perhaps the most embarrassing of which is this list of information about meetings at the University of North Texas. The problem here is that agenda is the plural of agendum "something to be done"; there is no singular agenda for agendae to be the plural of. Of course, in English agenda is used as a singular noun, so there is no reason we shouldn't use the English plural agendas.]

[Update: Keith Ivey has pointed out a similar example: omnibus, which is sometimes given the plural omnibi, as here. omnibus is a noun, or, to be precise, an adjective used as a noun, but it is already plural and in the dative case. omnibus is the dative plural of omnis "all" and means "for all". It therefore cannot be further inflected as if it were a nominative singular noun. ]

Posted by Bill Poser at 10:20 AM

April 24, 2004


A word whose plural is particularly problematic is octopus. Here are the results of a Google search:


The leading plural form is octopi, which is wrong. The net is not always an authoritative source of information. octopi is neither the English plural nor the classical plural. octopi is what you get if you take octopus to be a second declension Latin noun. But it isn't, so octopi is wrong. The last form, octopii, is doubly wrong. It is apparently based on the same false assumption that it is second declension, but it uses a non-existent plural suffix ii, whose origin I have previously discussed.

The second most popular plural, octopuses, is one of the forms considered correct by dictionaries. It is not a classical form. It is the regular English plural. The correct classical plural is the next-to-least popular, octopodes. The reason is that octopus isn't really a Latin word; it's a Greek word that was borrowed into Latin. In Greek the nominative singular was ὀκτώπους [oktopus] and the plural was ὀκτώποδες [oktopodes]. Like many Greek loans into Latin, it is declined more-or-less as in Greek, as a third declension noun.

What about octopods? This is a scientific term derived by making an English plural from octopod, which is the bare stem of the Greek word, not its singular. A guess is that octopod is a backformation from the neuter plural octopoda, the name of the order containing octopuses.

There is one more form that we haven't discussed: octopussies, for which Google yields 411 hits. Not all of them are references to octopuses, but many are. This is evidently based on a folk etymology of octopus as containing puss "cat", from which the hypocoristic pussy is derived. The same folk etymology is the basis for the pun in the nom de guerre of the title character of the James Bond film Octopussy.

Posted by Bill Poser at 09:40 PM

Ask Language Log

Ron Hogan of beatrice.com emailed an inquiry:

"Browsing this UK review of the latest book from Christopher Ricks, due to come out here in the US shortly, I came across the phrase 'from the off,' which I've never seen before. As best I can gather, it appears to be a cricket reference, perhaps with 'off' short for 'off stump'."

Sean O'Hagan is the author of the review, and here's the phrase in context:

From the off, Ricks dives headlong into Dylan's lyrics, putting all his faith in close readings of the texts, and the texts alone.

Well, I don't think I've ever seen the phrase either, though I have recent evidence that my memory is not always reliable for such things. However, the editors of the OED have encountered it, and they document the encounter, way down at the bottom of the entry for off, sense D. 5.:

colloq. The start of a race... Also (in extended use): the start (of anything), the beginning; departure; a signal to start or depart.

Judging from the citations given, it's a British sports metaphor -- and perhaps a fairly recent one -- but from horse racing rather than from cricket:

1946 Sporting Life 15 June 1/1 Some open betting saw Paper Weight favourite at the ‘off’.
1966 J. PORTER Sour Cream xiv. 180 It was too late. The students nearest to him..thought this was the off. They began to move forward.
1978 Lancashire Life Apr. 50 (caption) Tangle-wrangle: Stan Lyons waits on the slipway for the ‘off’, while helpers sort-out the lines from his harness.
1999 I. RANKIN Dead Souls xi. 68 Rebus knew..how juries could decide from the off which way they'd vote.

O'Hagan's review may have some American-puzzling bits in it, but he also trips over his own trans-Atlantic shoelaces when he identifies Ricks as "formerly professor of English at Cambridge, now professor of humanities at Boston". This entangles O'Hagan in the knotty business of reference to academic institutions, which is one of those quasi-regular aspects of English that are as hard to get right as the plural of nouns ending in 'f', or the endings of ethnonyms.

Ricks is actually employed at Boston University, which is also familiarly referred to as "B.U." (or "BU") but (I believe) is not known as "Boston" to anyone (outside of the pages of the Guardian, of course).

I goofed myself on a similar matter of academic nomenclature a few weeks ago, when I wrote about York University instead of the University of York, and had to be set straight by Geoff Pullum, who (as a graduate of the latter institution) pointed out that "York University" is an entirely different place, located in Toronto rather than in the U.K. From evidence on their respective websites, I'm confident that both "York University" and "the University of York" are sometimes familiarly called just plain "York", but if I hadn't seen the evidence there, I wouldn't be sure about it.

This raises the interesting question of how I can possibly be so sure that Boston University is never (appropriately) called just plain "Boston". I've never been told this explicitly; no one has ever corrected me (or anyone else in my hearing) for saying or writing the wrong form; there is no principle I can think of that it follows from. (Of course, I could simply be wrong about this -- but the point here is that I believe that I've learned it.)

In fact this is a case of implicit negative evidence, a phenomenon that some language learning theorists have claimed not to exist. I've learned that it's "wrong" to refer to BU as "Boston" because I've heard people refer to BU, in speech or writing, many thousands of times in my life, and none of them (I think!) ever referred to it as "Boston" before Sean O'Hagan did. This evidence, though statistical in nature, is good enough to give me high confidence in the conclusion.
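The statistical reasoning here can be made concrete with a small Python sketch. It assumes, purely for illustration, that each mention of BU is an independent event and that a speaker would say "Boston" with some fixed probability p; the function name, the figure of 5,000 mentions and the candidate values of p are all hypothetical, not numbers from the post.

```python
def prob_never_heard(p: float, n: int) -> float:
    """Probability of hearing a form zero times in n independent mentions,
    if each mention used that form with probability p (an assumed model)."""
    return (1.0 - p) ** n


# Hypothetical numbers: 5,000 mentions of BU, and several assumed rates
# at which "Boston" might be used if it were an acceptable short form.
for p in (0.01, 0.001, 0.0001):
    print(f"p = {p}: P(zero occurrences in 5000) = {prob_never_heard(p, 5000):.4f}")
```

Under this toy model, a long run of zero occurrences makes any but a very small usage rate implausible, which is the sense in which the absence of a form can count as evidence.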

I believe that the role of this kind of evidence in human language acquisition is relatively uncontroversial now, though some researchers still haven't really digested its implications. However, I can tell you that when I made this point in a talk at MIT about a decade ago, it was by no means uncontroversial. I should also say, for those of you who are not in the biz, that this is all connected to an interesting piece of intellectual history, dealing with the nature and source of human knowledge, which is also connected to a famous sentence from the 1950s, discussed here. But that's a topic for another post.

Little words

I was puzzled by one aspect of a recent post of Mark Liberman's on changes in textual complexity. Mark's post is part of a thread which Geoff Nunberg extended most excellently. See also Paglia's original, Semantic Compositions' discussion, and a meta-comment from Mark.

The Paglia original is far from a coherent argument for anything, so that merely attacking her premises seems to miss the point, although I agree with her conclusion that `language must be reclaimed from the hucksters and the pedants' (at least on my interpretation of what she means). However, I won't have any more to say as regards Paglia's premises, argument or conclusions here. What I want to discuss is a bunch of little words that Paglia probably couldn't care less about.

Mark's argument concerned the lack of evidence that popular culture is in decline, and, in particular, the lack of evidence that school texts are being dumbed down. He included a pretty graph showing how words are distributed in different text genres. After a little while digesting the graph, understanding the log scale etc. (ahh, so that's what a language log is...),  I decided there was something about it that confused me, although unconnected with Mark's main point. It seems that 1st grade readers have relatively fewer very high frequency words than do newspapers and scientific abstracts. Funny, without thinking about it I would have guessed just the opposite.

Here is the graph (from Marty White at Cornell, and found in this nice document describing White's research):

As we move along the x-axis, we consider successively less frequent words. For a given word rank, the height of the graph tells us what proportion of the words in the text have that frequency or greater. Thus we can see that the 1000 most common words in 1st grade readers account for over 90% of the text, while the top 1000 account for less than 70% in newspapers, and less than 50% in scientific abstracts (from Nature, not Science as you might interpret the legend to mean). This is supposed to provide an objective justification for the intuition that 1st grade readers are less complex than newspapers which are less complex than scientific abstracts.
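A curve of this kind is easy to compute for any text. The following Python sketch is one way to do it, under assumptions of my own choosing: words are lowercased runs of letters and apostrophes, and the function name coverage_by_rank is hypothetical. It is not a reconstruction of how the graph in the post was produced.

```python
from collections import Counter
import re


def coverage_by_rank(text: str) -> list:
    """Return cumulative coverage: entry i is the fraction of all tokens
    accounted for by the (i + 1) most frequent word types."""
    # Crude tokenization (an assumption): lowercase letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    running = 0
    curve = []
    # most_common() yields word types from most to least frequent.
    for _, count in counts.most_common():
        running += count
        curve.append(running / total)
    return curve


sample = "the cat and the dog and the bird"
print(coverage_by_rank(sample))
```

Plotting such curves for two texts on a logarithmic rank axis gives a comparison of the same general shape as the one described above.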

So far, so good. But what disturbed me is that the 1st grade reader line in the graph starts out so amazingly low, and only reaches the others after rank 25 or so. While for newspapers and scientific abstracts the 10 most frequent words account for about 25% of all words, the 10 most frequent words in 1st grade readers together account for only 15% of all words. Couldn't this be used as an argument that the kiddy texts are more complex than the grown-ups' texts?

I don't know whether anyone else was puzzled by this. Maybe co-loggers and readers with a statistical bent will think it obvious and unsurprising. But it puzzled me. So in order to restore order to my world I dreamed up a hypothesis. The idea is that because sentences in kids' texts are syntactically much simpler than sentences in adult texts, and involve less sophisticated connections between sentences and between the proposition expressed and prior world knowledge, the kiddy texts have much less need for words like the, of, and and to - the four most common words in the Brown corpus, and I guess probably also in the children's texts that White surveyed. Which leads me to...

The little words do big things hypothesis:  the most frequent words in a text are closed class words that are essential for stringing together complex sentences and texts, and their frequency is proportional to (or at least some upward monotone function of) the average syntactic complexity of the text.

So if we're worried about the complexity of kids' texts, maybe we shouldn't ask whether the texts have enough big words, but whether they have enough little ones.

In principle, yes

Camille Paglia owes (the anonymous author of) Semantic Compositions a hug, for his spirited defense of her unsupported generalizations about generational changes in attention span and verbal facility. Not a very big hug, though, because his defense reminds me of the Radio Yerevan jokes that a college friend of mine used to collect:

Question to Radio Yerevan: Is it correct that Grigori Grigorievich Grigoriev won a luxury car at the All-Union Championship in Moscow?

Answer: In principle, yes. But first of all it was not Grigori Grigorievich Grigoriev, but Vassili Vassilievich Vassiliev; second, it was not at the All-Union Championship in Moscow, but at a Collective Farm Sports Festival in Smolensk; third, it was not a car, but a bicycle; and fourth he didn't win it, but rather it was stolen from him. (loosely adapted from this page)

SC's version is something like this:

Question to Semantic Compositions: Is it correct that "interest in and patience with long, complex books and poems have alarmingly diminished not only among college students but college faculty in the U.S.", because "the new generation, raised on TV and the personal computer but deprived of a solid primary education", lacks "the most basic introduction to structure and chronology", has "degraded sensitivity to the individual word and reduced respect for organized argument", as well as "demonstrably reduced attention span", so that "[s]tudents now understand moving but not still images"?

Answer: In principle, yes. But first of all...

Read the rest here.

April 23, 2004

Pseudo-Latin Plurals

This reference to a review:

a trendy new English cookbook devoted to the preparation of offcuts, snouts, rectii, marrow, and bladders of all description.

contains a pseudo-Latin plural. What is evidently intended is the plural of rectum, which is properly recta. rectii would be correct if the stem ended in i, that is, were recti, and if its gender were not neuter, in which case the nominative singular would be rectius. It's interesting where these things come from. I suppose that people who don't actually know Latin but think that a word should have a Latin plural work by analogy from other Latin plurals they have heard. In this case, the analogy must be quite indirect, since no Latin noun ending in -um has a plural in -ii. (Such a noun would have to be a non-neuter ending in ium, which to my knowledge does not exist.)

It's easy enough to see how someone who doesn't know Latin could fail to realize that certain plural endings go with certain singular endings. That would account for someone deciding that the plural ending was i, not realizing that this was true only of masculine nouns, not neuters. So recti would not be a surprising error. But it looks like this form is derived by adding the ending ii to the stem rect. Where does this ii come from?

Some plurals do end in ii, but they are all plurals whose singulars end in ius, e.g. gladii "swords", singular gladius. Why don't people assume, as seems natural, that what is invariant, that is, gladi, is the stem, and that the plural ending is therefore just i? And how do they decide that the stem of rectum is rect? My best guess is that they have come up with a generalization about the form of the case/number endings, namely that they consist of one or more vowels possibly followed by a consonant. This is true of nominatives singular and plural of all nouns other than some third declension consonant stems. Someone who doesn't actually know Latin will generally encounter nouns in the nominative case, so that much is plausible. What I can't quite explain is where they get the idea that the ending can contain more than one vowel. Maybe they pronounce ii and i the same and so treat both as a single vowel, leading them to treat ii as a single morpheme and forcing them to conclude that the ius of words like gladius is also a single morpheme.

[Update: A convenient summary of Latin declension and conjugation is available on-line here.]

Posted by Bill Poser at 11:04 PM

Hey day

Today was the last day of spring term classes at Penn, and therefore it was also hey day, when the junior class celebrates the beginning of their end. The OED explains that "hey-day" is "apparently a compound of [the interjection] HEY; the second element is of doubtful origin, but at length identified with day. The early heyda agrees in form, but less in sense, with Ger. ˈheida, heiˈda = hey there!".

The sense is given as "An exclamation denoting frolicsomeness, gaiety, surprise, wonder, etc.", and the citations include, among others, 1598 B. JONSON Ev. Man in Hum. IV. ii, Hoyday, here is stuffe!, a sentiment with which I'm sure we can all agree.

There is also a noun hey-day or heyday, which the OED says is "Of uncertain origin; perh. connected with prec." (I love how the OED saves space by abbreviating words like "perhaps"), and glosses as "1. State of exaltation or excitement of the spirits or passions." or "2. The stage or period when excited feeling is at its height; the height, zenith, or acme of anything which excites the feelings; the flush or full bloom, or stage of fullest vigour, of youth, enjoyment, prosperity, or the like."

And that's exactly what it was.

Conversations with Helpful Google

A while back, we pointed out (following the lead of Cinderella Bloggerfeller) that Google's spelling correction algorithms could sometimes produce amusing exchanges like this one:

Search entry: gaaaaaaaaaa

Helpful Google: Did you mean: gaaaaaaaaa ?

A recent observation by Trevor at kaleboel led me into a new sort of dialogue with Helpful Google:

Query -- Prabble gnack pubble, tnil pniffertrub

Helpful Google-- Did you mean: Prebble gnack pubble, t nil pniffertrub ?

The exchange that Trevor cited, following Syntactic Saccharose, is a bit different, and depends on the fact that "Flurble gronk bloopit, bnip Frundletrune" is a string that is found in certain packets emitted by version 3.2.0 of NetStumbler (version 3.2.3 uses "All your 802.11b are belong to us" instead, etc.). Misspelling "Flurble" in that string leads Google to do something genuinely helpful, namely find the original NetStumbler string. In contrast, the probe cited above produces amusing results precisely because I'm (in effect) "teasing" earnest old Google with something that is beyond hope. Why this should be (even mildly) funny is a question in social psychology whose answer I don't know.

[Update: In a slightly different vein, Scott Parker emailed this conversation:

Search Entry-- ohhhhhh (o followed by 6 h's)

Helpful Google-- Did you mean: ahhh ?]


Posted by Mark Liberman at 08:20 AM

Sauce for the gander

Eugene Volokh has found some transcripts of Jacob Weisberg speaking extemporaneously, and given him a small dose of his own "Bushism of the day" medicine.

A few months ago, I pointed out that

You can make any public figure sound like a boob, if you record everything he says and set hundreds of hostile observers to combing the transcripts for disfluencies, malapropisms, word formation errors and examples of non-standard pronunciation or usage. It's even easier if the critics use anecdotes based on the perceptions and verbal memories of equally hostile listeners.

At the time, I looked around on the web for transcripts of Weisberg interviews, and came up empty. So I suggested this:

I'll buy dinner for Jacob Weisberg, if he'll let me record a couple of hours of convivial conversation..., and then examine the transcripts carefully for Weisbergisms ...

EV looked harder, or more cleverly, and found five pieces of evidence that Mr. Weisberg is given to using "you know" as a hedging pause filler. As EV then says:

Our oral comments are full of this sort of filler, and of grammar and usage errors of various sorts. Nearly anyone who has read a transcript of his own comments can tell you that.

But given that articulate, thoughtful people like Weisberg say these sorts of things, where's the humor, the aptness, or anything else in finding instances of Bush doing the same?

It would be even better to go through some recordings, since transcripts always underestimate the degree of disfluency.

Lamkin Perbeck, pretender

Ray Girvan emailed to mention that the Henning Mankell (or was it Hanning Menkell?) thread reminds him of Sellar and Yeatman's "1066 and All That",

"where the authors deliberately play on the names of the pretenders during the reign of Henry VII, Lambert Simnel and Perkin Warbeck. (They refer to them variously as Warmnel, Perbeck, Wimneck, Warmneck, Lamkin, Lamnel, Simkin, Permnel, etc)."

Note that this is exponentially worse than the Manning Henkel problem, since there are not two but four dissyllables to conjure with.

The outlines of a Henning Mankell experimental paradigm are beginning to emerge -- present one or more reference names, and then (after a delay and perhaps some distraction) ask subjects whether each of a set of probe names was in the reference set. The aim would be to predict error rates and reaction times, based on a model of the structure of the morphophonemic subspaces involved.

In fact, there's probably already a relevant literature on this... In any case, I'll bet that such phenomena turn out to exhibit the same lack of sequential independence that was demonstrated repeatedly here (by google counting methods) for spelling variation in words like "emperor", "jennifer", and "attila".

Posted by Mark Liberman at 03:52 PM

Attention Deficit

Mark's posting on Camille Paglia's charges of decline in attention is right on the mark -- this is just an antique jeremiad in new packaging. People have been saying the same thing for centuries, with no more justification than anecdotal observations. (As Montesquieu said, looking back on the long line of complaints about the state of culture: "If all of this were true, we would be bears today").

One person who has made an honest effort to quantify these effects is Todd Gitlin. In an article in The Nation a few years ago called "The Dumbdown," he reported a study he'd done that showed that the length of the average sentence in novels on the New York Times bestseller list has decreased by more than 25 percent over the last sixty years, while the average number of punctuation marks per sentence has dropped by more than half.

But that method is subject to lots of confounds -- for one thing, the bestseller lists are computed very differently now. And I was curious enough about this to do my own little study, with Brett Kessler, which revealed a very different pattern.

Kessler and I did similar calculations, not for bestsellers but for articles from The New York Times and Science. (For the Times we took the lead sentences of the most prominent story for each of 40 consecutive daily issues starting on October 1 of every twentieth year, going back to 1856. For Science we used a slightly different but roughly equivalent method for selecting articles, beginning in 1896.)
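For readers who want to try this kind of count on their own samples, here is a minimal Python sketch of the two measures. The counting rules are assumptions made for illustration (words are whitespace-separated tokens, and punctuation marks are the ASCII characters in Python's string.punctuation, which includes apostrophes and hyphens); the rules actually used in the study are not stated in the post, and the name sentence_stats is hypothetical.

```python
import string


def sentence_stats(sentences):
    """Return (mean words per sentence, mean punctuation marks per sentence)."""
    if not sentences:
        raise ValueError("need at least one sentence")
    # Assumption: a "word" is any whitespace-separated token.
    word_counts = [len(s.split()) for s in sentences]
    # Assumption: a "punctuation mark" is any ASCII punctuation character.
    punct_counts = [sum(ch in string.punctuation for ch in s) for s in sentences]
    n = len(sentences)
    return sum(word_counts) / n, sum(punct_counts) / n


words, marks = sentence_stats(["Hello, world.", "One two three four."])
print(words, marks)
```

Running the same function over lead sentences sampled from different years would give the two series being compared, subject to whatever tokenization choices one makes.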

We found that both sentence length and number of punctuation marks per sentence had indeed declined slightly over the period that Gitlin looked at -- in 1936 the average Times lead sentence was almost 38 words long; in 1996 it was less than 35 words long. The figures in Science were analogous -- between 1936 and 1996, sentence length dropped from 27.1 to 25.8, though the average number of punctuation marks per sentence actually increased slightly. In fact that was part of a century-long cycle:

[Tables not preserved: words per sentence and punctuation per sentence, by year, for each publication.]

But what does all this mean? The differences are pretty small, and in any event it would be hard to argue that either publication has been dumbed down over the course of the past 70 years, or that it requires less attention to read Science now than it did then.

And even if you were determined to interpret declining sentence length as an indicator of declining attention capacity, you'd be led to a curious conclusion. In fact, the mean sentence length in the Times reached a low of 21.0 in 1956, but since then it has been climbing -- by 1996 it was up almost 75 percent from its low. And the average number of punctuation marks per sentence has almost doubled over that period. Similarly (if less dramatically) for Science, where there has been a 10 percent increase in mean sentence length over the past 20 years.

In short, sentences have gotten longer and more complex since Camille Paglia's youth. If you're looking for a decline in attention, you might start with hers.

Henning Mankellismus in icon

Inspired by Geoff Pullum, Håkan Kjellerstrand at hakank.blogg has written an Icon program to generate plausible variants of "Henning Mankell" and compare them with the list in Geoff's original post. Kjellerstrand is also the author of the Perl module MakeRegex, which "composes a regular expression from a list of words", based on common prefixes. I've been hoping, though, that someone will follow up on David Beaver's post by writing a program to help with (various approaches to) estimating the statistical density of what David called "the Henning Mankell morpheme space".

This is a serious issue in psycholinguistics, as should be clear from reading what Geoff and David wrote. More on it later, maybe.

Memetic virus scanner

Posted by Mark Liberman at 08:31 AM

Kennings handled

If you want to read Eddaic and Skaldic poetry, and it's not obvious to you that "hunger-diminisher of din-seagulls of animal-gleam of Heiti" is a fancy way to say "warrior", you'll want to check out this Lexicon of Kennings and Similar Poetic Circumlocutions. For an example of a more complex and discursive analysis, see the essay "When is a fish a bridge? An investigation of Grímnismál 21" on the same site.

I learned about all this from an entry in Ray Girvan's Apothecary's Drawer weblog, which also includes references to several other traditions in which reference is mediated by complex allusions: Maori proverbs, and the Tamarian language of Darmok from Star Trek.

Perhaps the biggest single source of kennings in contemporary American culture is The Simpsons. Consider for example the post on Long Story Short Pier entitled "We are all Frank Grimes now". The meaning of the allusion to "Frank Grimes" is clarified for outsiders by a link to Simpsons episode 4F19.

One of the comments on the LSSP post states: "Release the robotic hounds that shoot bees from their mouths!". Unlike the main post, the comment provides no exegetical link, but we can begin to understand it by consulting this page entirely devoted to cataloguing animal-attack references from the Simpsons, which in turn brings us to Simpsons episode 1F16, in which we find this passage:

Homer: Bart, you're coming home.
Bart: I want to stay here with Mr. Burns.
Burns: I suggest you leave immediately.
Homer: Or what?  You'll release the dogs, or the bees, or the dogs with
       bees in their mouths and when they bark they shoot bees at you?
       Well, go ahead -- do your worst!
        [Burns slams the door and locks it]
        [disbelieving] He locked the door!  I'll show him -- [rings the
         doorbell and runs away]

I'm still not sure what it all means -- fish, bridge, dogs, bees, doorbell, whatever -- but Simpsons 1F16, like Grímnismál 21, clearly requires attentive navigation of a dense semiotic network.

Henning Mangled

Geoff Pullum wonders why he and his wife find the name "Henning Mankell" so much more confusable than the name of Henning's most famous creation, "Kurt Wallander." Could be "Hanning Menkell." Could be "Henkel Manking." Could be almost anything. Or, to restrict it a little, anything with an "M", an "H", an "en", an "an", a "k", an "ing" and an "el" or "ell."

Presumably, the reason is connected to the state of Geoff's mind. And that of his wife, philosopher Barbara Scholz. And presumably the states of their minds are related to what they have experienced. And presumably what they have experienced relates to what is in their environment. And I'm not in their environment very much, although I was in Geoff's environment last week, and I had a great time. Thanks for the curry, Geoff. But given that I'm not in their environment very much, I can only guess at what has been in it. And using the argument of the drunk who looks for his keys under the lamp post, what I guess is that the Google database provides a good impression of Geoff and Barbara's environment. Of course, this could be wrong.

First, the distribution of non-English words in Google is unlikely to be similar to that in Geoff and Barbara's environment. I'll conveniently ignore this. Second, the Google corpus, as Mark has impressed upon me indelibly, is wildly full of porn and gambling sites. But from what we all know of Geoff, the internet may underrepresent his (scholarly) interest in porn and gambling to the same extent that it overrepresents Barbara's. So let's not worry about that either.

Then again, the porn and gambling sites are chock-a-block with artificially created text - should we worry about that? Well, what I'm going to do now is compare the rates at which various possible Swedish mystery writer names arise. I suppose the porn and gambling sites have an equal tendency to use "Hanning" as "Henkel",  possibly close to zero, so that although they skew any absolute frequency estimate, they probably won't affect a relative comparison too much. So no, let's not worry about the artificial text.

Let's get on with it!

Mystery Name    Google hits
mennkell        106
mennkel         1
mankel          9660
menkel          7710
mankell         206000
mannkell        17
manning         2670000
menning         66800
hanning         75100
henning         2080000
henkell         11000
hankell         160
hankel          70500
henkel          661000
hennkel         18
hennkell        9

First observation, f(Henning)*f(Mankell) > f(Kurt)*f(Wallander). So the confusability of "Henning Mankell" is likely not just a raw frequency issue. The problem, quite obviously, is that the "Henning Mankell" morpheme space is full of similarly plausible combinations. A full analysis would presumably involve looking at phonetic distance between alternatives, but I haven't the time for that. I'm not even going to consider orthographic distances, as could be measured by counting the number of changes to one word's spelling you would need to turn it into another. No, I'll assume that we are given that the first name ends in "ing" and the second in "el" or "ell", and satisfy myself just by looking at the two possible first-names/surname combinations which use up all the relevant morphemes the right amount of times, and which are most popular in terms of the raw frequencies of the individual names, i.e. "Henning Mankell", and "Manning Henkel."
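The orthographic distance mentioned above, counting the changes needed to turn one spelling into another, is usually computed as Levenshtein edit distance. The sketch below is a standard dynamic-programming version in Python, offered only as an illustration of the measure the post sets aside; the function name edit_distance is my own, and each insertion, deletion or substitution is assumed to cost 1.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


for x, y in [("henning", "hanning"), ("mankell", "mankel"), ("mankell", "henkel")]:
    print(x, y, edit_distance(x, y))
```

A fuller analysis along these lines might weight the raw frequency of each candidate name by its distance from the target, though that goes beyond what the post attempts.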

Doing the math, it turns out that, based naively on raw frequency of the individual words, "Manning Henkel" is over 4 times as likely as "Henning Mankell"! The fact that "manning" is a reasonably common gerund has little to do with this, since "Henning" competes admirably in frequency terms: the real problem, if Google is to be believed, is the far higher frequency of "Henkel" than "Mankell". This is in spite of the fact that half of Google's "Mankell" pages are "Henning Mankell" pages, so that in a survey that threw out actual mentions of the author, the odds would be stacked even more strongly against him. And in the wild feedback loops of the Pullum/Scholz household, it would take only one or two mentions of the wrong name for their linguistic environment to become even more polluted. No wonder Geoff and Barbara find it confusing.
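The arithmetic can be checked directly from the counts in the table. This Python sketch takes the naive view that a name pair's likelihood is proportional to the product of the two words' hit counts; the helper naive_score is a name introduced here for illustration, and the four counts are copied from the table above.

```python
# Google hit counts copied from the table in the post.
hits = {
    "manning": 2_670_000,
    "henning": 2_080_000,
    "henkel": 661_000,
    "mankell": 206_000,
}


def naive_score(first: str, last: str) -> int:
    """Product of the two raw counts, treating the names as independent."""
    return hits[first] * hits[last]


ratio = naive_score("manning", "henkel") / naive_score("henning", "mankell")
print(f"Manning Henkel vs. Henning Mankell: {ratio:.2f}")
```

The ratio comes out a little above 4.1, consistent with "over 4 times as likely".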

Suspiciously, I found only one instance of someone on the internet actually misnaming "Henning Mankell" as "Manning Henkel." The culprit appears to be Finnish - one Esa Tuomas Tikka. Having, as I do, a talent for making strong categorical claims on the basis of weak statistical data, and being prepared to overlook the fact that Geoff mentioned "Henkell" but not "Henkel" in his post, I therefore propose that Geoff is also Finnish. And if you know anything about Finnish orthography, you'll know what that means. It means that "Geoff" cannot be his real name. Too many "g"s, you see. Who is this blogger, linguist and distinguished university professor who claims conveniently elsewhere heritage (English, of all things - does he think it's classy?), yet is married to someone with a passion for Swedish murder mysteries and has an unusually deep knowledge of Eskimo snow vocabulary? "Geoff Pullum"? Pull the other one, I should say. The game's up - reveal your true identity!

Unreasonably in error?

It's not only baseball headlines and complex nominals that pose problems for parsers, human or artificial. Here's a high hard one from the U.S. Supreme Court opinion in Meyer v. Nebraska:

The problem for our determination is whether the statute as construed and applied unreasonably infringes the liberty guaranteed to the plaintiff in error by the Fourteenth Amendment.

I'll freely admit to having swung and missed at this sentence, not once but twice.

At first, I understood it to be asserting that the Fourteenth Amendment was "in error" in guaranteeing liberty to the plaintiff. But that is a logically impossible view for the Supreme Court of the United States to hold. Then I considered that the sentence might be asking whether the statute was "in error" in infringing the plaintiff's liberty. This is a plausible meaning, but it's syntactically impossible, because it would involve tangled modification of a type that English doesn't allow:

... the statute unreasonably infringes the liberty guaranteed to the plaintiff in error by the 14th Amendment

(not to speak of the problems of having both a preverbal adverb "unreasonably" and a post-verbal adverbial "in error").

Then I remembered that "in error" is a legal term of art:

PLAINTIFF IN ERROR. A party who sues out a writ of error, and this whether in the court below he was plaintiff or defendant.

DEFENDANT IN ERROR. A party against whom a writ of error is sued out.

WRIT OF ERROR, practice. A writ issued out of a court of competent jurisdiction, directed to the judge of a court of record in which final judgment has been given, and commanding them, in some cases, themselves to examine the record; in others to send it to another court of appellate jurisdiction, therein named, to be examined in order that some alleged error in the proceeding may be corrected.

Mr. Meyer was of course the "plaintiff in error" in the case Meyer v. Nebraska, and once this occurred to me, the sentence became easy to understand. But I don't read enough Supreme Court opinions for this to have been my first, automatic construal.

April 21, 2004

Mais ou sont les flamewars d'antan?

After reading Camille Paglia's much-discussed Arion piece on "cultural dissipation since the 1960s", I recently spent a small amount of effort looking for any non-anecdotal evidence of generational changes in attention span or verbal facility. Rivka at Respectful of Otters has done a much more thorough job, at least on the attention side, and "after an exhaustive search of psychological and medical research databases" found: "nothing".

It worries me that the discussion of this topic -- whether in the comments on Rivka's earlier post or in the blogosphere at large -- has been so lop-sided. Those who are skeptical of Paglia's views cite studies, quote statistics and suggest models, whereas Paglia's defenders, like Camille herself, don't seem to be able to get past arguments like "People who deny that attention spans have gotten shorter are just whistling past the graveyard" or "The "factual" basis for believing it has become attenuated is simply the fact that it has. I taught college literature for twenty years, and I can testify it has. The facts are in front of my eyes. Anybody who denies this is happening is being willfully obtuse."

I haven't been able to find a single example of a pro-Paglia post or comment that cites a single piece of evidence other than personal conviction. But it wouldn't be terribly hard to make some kind of a case -- there's the decline in verbal SAT scores, there's the change in the difficulty of school texts, there's the decrease in the length of average TV commercials. It's as if the pro-Paglians have decided to prove their point by unilateral disarmament in this battle of wits.

For an example of how to carry out one of these arguments properly, consider the Great Media Bias Flamewar of 2002, surveyed by Edward Boyd here ("Much ado about Nunberg"). Come on, Paglians, remember what it was like to support an argument with actual facts? I bet someone made you do it, back in the days before TV, PCs and video games rotted our collective minds. Borrow a clue from today's kids. Focus! You can do it if you try!

[Update: the redoubtable Semantic Compositions promises to take up the gauntlet. Go to it, SC! Not that you need any help, but here's a suggestion passed on from Mark Seidenberg: apparently Donald Hayes has compared the original and re-issued versions of books such as Nancy Drew, and discovered that the new ones are at a much lower reading level than the old ones, which might be considered evidence of Paglian cultural dissipation (or of a change in the targeted age range, of course). As far as I can tell, this work hasn't been published yet -- I looked -- but there's reason to think that Prof. Hayes would be happy to share the results.]

Posted by Mark Liberman at 04:56 PM

Get 'em while they're young

LanguageLog posts have occasionally commented on what gets taught or, more often, not taught about language and linguistics in the schools, e.g. here and here. And indeed it is a sad state of affairs.

Perhaps there's hope for statistical NLP, though: earlier this year I was surprised and pleased to receive e-mail from a local seventh grader, who was seeking some pointers for a science fair project that involved using ideas from information theory to predict the grade level of written text. I'm delighted to report that this student did a fine job, winning a variety of awards at the Montgomery County (Maryland) Science Fair, which is quite competitive. He sent me this great photo -- notice that Claude Shannon makes an appearance in the upper left hand corner!

The FCC and the S-word (again)

It looks like CBS may be in trouble over Mary J. Blige saying the S-word on 60 Minutes. Here's an mp3 of the alleged offense, which the Schnitt Show (radio, Clear Channel, Miami) is helpfully hosting on its web site (slogan: The Web is Full of Schnitt. No, really). More information on the S-word is available here.

[Update: Even before this latest issue came up, Stuart Benjamin at the Volokh Conspiracy suggested that far from being stupid, this is all a clever libertarian plot by "some FCC Commissioners and Members of Congress who have supported this new regulation regime" because they "secretly wanted to bring about the demise of broadcast indecency regulation", since "the Supreme Court probably would -- and in [his] view should -- find these indecency regulations unconstitutional". In fact, he leads off his post by calling it a "'fucking brilliant' regulatory strategy". Is this a rare example of double reverse sarcasm, in which he effectively calls the strategy clever by using "brilliant" with such obvious sarcasm that we realize he can't possibly mean something as obvious as the opposite of what he says, but must mean the opposite of the opposite?]

The mysteries of... what's his name?

My partner Barbara reads mystery novels, and has a favorite author. He is her favorite by a mile. He's a Swede, with a fascinating lifestyle that involves dividing his time between writing mysteries in Sweden for part of the year and directing a theater in Mozambique, in southern Africa, for the rest. He is a familiar name around our house. His books lie around in various places near chairs and couches and other likely reading places, and his photo looks up at us from the back of many of his excellent paperbacks about his fictional Swedish detective, Inspector Kurt Wallander, whose name we always recall. Only there is a tradition of our not being able to actually recall the name of the author himself with any precision. We have many of the consonants down, and a rough idea of the sort of vowels, but it doesn't come together. It may be Manning Henkall. Or possibly Hanning Menkell (that could be Menkall). I don't have one of his books in front of me as I write this. I don't know why a simple Germanic name like Henkell Manning (sp.?) should be so difficult to remember; Menning is easy enough, so is Hankell, so what's the problem?

Barbara really loves this guy and gets all his books. Occasionally she'll mislay one, and wander round the house saying, "Have you seen my Manning Henkell? Umm... Manking Hennall?" I always look solemn and pretend not to understand who she means. I feel it's incumbent on me to make this stand for accuracy. Only the truth is that I can't remember his name either. Neither of us can. Like I say, it's a tradition.

The only thing I can think is that neither the first name, Hennall, nor the last name, Menking (or was it Manning?), is particularly familiar to us as a name, and all the syllables involved fit together perfectly well. A simple run of a triple-loop shell script running through the combinations convinces me that there are only 48 possibilities (I'm quite sure of the vowel of ing, and the others are definitely a or e in each case); so the possibilities are not endless. The name of one of the best current mystery writers in the world is definitely on this list:

Hankall Manning    Hanning Mankell    Henking Mennall
Hankall Menning    Hanning Menkall    Henking Mennell
Hankell Manning    Hanning Menkell    Henning Mankall
Hankell Menning    Henkall Manning    Henning Mankell
Hanking Mannall    Henkall Menning    Henning Menkall
Hanking Mannell    Henkell Manning    Henning Menkell
Hanking Mennall    Henkell Menning    Mankall Hanning
Hanking Mennell    Henking Mannall    Mankall Henning
Hanning Mankall    Henking Mannell    Mankell Hanning
Mankell Henning    Manning Henkall    Menking Hannell
Manking Hannall    Manning Henkell    Menking Hennall
Manking Hannell    Menkall Hanning    Menking Hennell
Manking Hennall    Menkall Henning    Menning Hankall
Manking Hennell    Menkell Hanning    Menning Hankell
Manning Hankall    Menkell Henning    Menning Henkall
Manning Hankell    Menking Hannall    Menning Henkell

This is one place where Google has a bit of a problem: if you really don't know anything more about the name than this, computer searches are going to be quite difficult without the power of grep (i.e., regular expression searching capability, which Google doesn't offer). One might do best to search on one of his titles, like... let's see... The Dogs of Riga (that one is set mainly in Latvia).
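For what it's worth, a grep-style search of the kind alluded to here might look like the following minimal sketch in Python. It assumes the constraints described earlier: one name starts with H and the other with M, the first vowel of each is a or e, the middle is nk or nn, and the ending is all, ell, or ing. The pattern, the function name, and the sample text are illustrative assumptions, not anything that was actually run for this post.

```python
import re

# One candidate name after its initial: a/e, then "nk" or "nn",
# then one of the endings "all", "ell", or "ing".
# (These constraints are an assumption drawn from the list of candidates.)
NAME = r"[ae]n[kn](?:[ae]ll|ing)"

# Two such names separated by a space; the lookahead (?!\1) requires the
# second name to start with the other initial (H vs. M).
CANDIDATE = re.compile(rf"\b([HM]){NAME} (?!\1)[HM]{NAME}\b")


def find_candidates(text):
    """Return every substring of `text` that looks like a candidate name."""
    return [m.group(0) for m in CANDIDATE.finditer(text)]


# Hypothetical sample text, made up for illustration.
sample = "The Dogs of Riga, by Henning Mankell, is set mainly in Latvia."
print(find_candidates(sample))
```

A single pattern like this covers every spelling in the list at once, which is exactly what a plain keyword search cannot do.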

Anyway, the books are wonderful, the plots are deep and intelligent, the sense of place is masterful, the characterization is adult and thoughtful. Only the name escapes me. Sorry about that. Hope I haven't confused you. (Was it Menning Henkall?)

[Actually, it may be none of the above. I was completely wrong about there being 48 possibilities. I discovered later that there are actually 256.]

Posted by Geoffrey K. Pullum at 02:04 PM

The politics of tongue cutting

Iggy at Blogalization comments on a story in Vocabula Review by Richard Leder that mentions the alleged South Korean mania for performing frenotomies (a kind of tongue surgery) on children to help them learn English.

The frenotomy story first came out in Reuters last October. At the time, I observed that the procedure is sometimes indicated to correct ankyloglossia, a condition that has nothing to do with learning English, and that

the Reuters article doesn't offer any evidence that frenotomies are really rampant in South Korea. One doctor is quoted as saying that he performs the procedure "once or twice" a month, and that only "ten or twenty percent" of parental inquiries lead to surgery. Taking this at face value, it gives us a yearly total of 12-24 surgeries and 60-240 inquiries. Now, maybe there are dozens of other doctors and thousands of inquiring parents. Or maybe this is the one guy who's the frenotomy specialist, and he's boosting his stats, and we're talking about 10-15 surgeries and 50-60 inquiries a year, mostly medically valid or at least not connected to crazed parents frantically pushing English.
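The face-value arithmetic in that passage can be checked with a short sketch. The function names are hypothetical; the only inputs are the doctor's quoted figures ("once or twice" a month, "ten or twenty percent").

```python
def yearly_range(per_month_low, per_month_high):
    """Convert a monthly range of surgeries into a yearly range."""
    return per_month_low * 12, per_month_high * 12


def inquiry_range(surgeries_low, surgeries_high, pct_low, pct_high):
    """Range of inquiries implied if pct_low..pct_high percent lead to surgery.

    The fewest inquiries pair the fewest surgeries with the highest
    conversion rate; the most pair the most surgeries with the lowest rate.
    Integer arithmetic (percentages, not fractions) avoids rounding noise.
    """
    fewest = surgeries_low * 100 // pct_high
    most = surgeries_high * 100 // pct_low
    return fewest, most


surgeries = yearly_range(1, 2)                 # "once or twice" a month
inquiries = inquiry_range(*surgeries, 10, 20)  # "ten or twenty percent"
print(surgeries, inquiries)
```

This reproduces the 12-24 surgeries and 60-240 inquiries quoted above, and makes plain that both ranges rest on a single doctor's estimate.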

In the same post, I noted that the Reuters story featured a truly bizarre (alleged) quote from a psychiatrist at Seoul National University, to the effect that the whole idea of teaching children English is a bad idea, because "[l]earning a foreign language too early, in some cases, may not only cause a speech impediment but, in the worst case, make an child autistic."

After thinking about this a bit more, I suggested that the Reuters story, like the recently infamous BBC frog sex muddle, was probably not plain vanilla bad science reporting, but rather bad science reporting motivated by a political agenda. In the case of the BBC frog story, the goal was apparently to highlight environmental issues; in the case of the Reuters frenotomy story, I hypothesized that the motivation was to give "a thump in the nose to globalization and (implicitly) to the U.S." Iggy's post at Blogalization appears to swallow the same political bait hook, line and sinker when he writes:

I'm reminded of Susanna George's presentation at the III World Social Forum in Porto Alegre:

Susanna George of the Philippines, representing the feminist network ISIS, expressed deep skepticism about the possibility of reforming global media, dramatizing the “tragedy” of cultural homogenization, which, for example, “drives Asian women to undergo surgery to transform their brown nipples into the pink nipples of Western women.”

To cite another example, so-called “world music” represents “a homogenization of cultural difference, a mind-numbing dilution of cultural values aimed at mostly American consumers that shames people from what is indigenous,” George said.

Q.E. (alas) D. -- though in fairness to Iggy, I can't tell if he is quoting Susanna George because he thinks her comments are a sensible frame for the frenotomy story, or because he thinks that they are not.

Posted by Mark Liberman at 11:36 AM

Logging Road Language

Driving on active logging roads can be hazardous. They're narrow, unpaved, and usually have no shoulder or guardrails. Bridges are invariably only wide enough for one vehicle. They are frequented by logging trucks, which are large and, when loaded, heavy vehicles. A loaded logging truck carries anywhere from 45 to 100 cubic metres of wood. One cubic metre of spruce weighs about 450 kilograms. People familiar with logging roads know that the first rule of the road is:

A loaded logging truck has the right of way.

This is independent of what the Motor Vehicle Code may say. You see, in the real world, the laws of physics trump social constructs. Now and then a few people, witting or unwitting postmodernists, who think that social constructs trump the laws of physics, are mowed down by logging trucks. Natural selection can be brutal.
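To put rough numbers on the physics, here is a minimal sketch that multiplies out the load volumes and the spruce density given above. It assumes the whole load is spruce and ignores the weight of the truck itself; the function name is made up for illustration.

```python
SPRUCE_KG_PER_M3 = 450  # "about 450 kilograms" per cubic metre, from the text


def load_mass_kg(volume_m3, density_kg_per_m3=SPRUCE_KG_PER_M3):
    """Approximate mass of a load of wood, ignoring the truck itself."""
    return volume_m3 * density_kg_per_m3


for volume in (45, 100):  # the stated range of load volumes, in cubic metres
    tonnes = load_mass_kg(volume) / 1000
    print(f"{volume} m^3 of spruce: about {tonnes:.2f} tonnes")
```

On those figures the wood alone comes to somewhere between about 20 and 45 tonnes.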

Because of the hazards of driving on active logging roads, it is desirable for people on such roads to be able to communicate with each other and announce their positions. If, for instance, you are approaching a bridge from the burdened side and you know that there is a truck coming toward you, you will pull over and let the truck by. In Canada (possibly elsewhere, but this is where my own experience is from), vehicles travelling on logging roads are supposed to be equipped with radios. Each road is assigned a frequency, which is posted at the entrance to the road. When you enter a road, you set your radio to the assigned frequency and monitor the traffic. This lets you know what other people are doing and where they are. You also announce your position periodically, typically every few kilometres. At critical points, such as the approach to a bridge, you are required to announce your position. In addition to the mileposts placed every kilometre along the road (they're still called mileposts even though Canada is on the metric system, another instance of etymology not determining meaning), there are special posts at critical points so you can announce exactly where you are.

There is a specialized vocabulary for announcing one's position. A typical announcement will be something like this:

Loaded approaching bridge at 18.4.

This announces that a logging truck or other heavy vehicle is approaching the bridge at kilometre 18.4 and is heading away from the work site, toward the mill. The lack of an overt subject tells us that the vehicle is a logging truck or other heavy vehicle. Any other vehicle would be specified as a pickup, whether or not it is actually a pickup truck. The direction of travel is indicated by loaded. A loaded vehicle is one that is headed away from the worksite, toward the highway and the mill. This term is used because in the normal course of events a logging truck takes on a load of logs at the worksite in the forest, carries it to the mill, and then returns unloaded to the worksite for another load. A vehicle headed out to the worksite is therefore described as unloaded. In spite of their etymology, these terms describe direction, not loading. If a truck has carried equipment out to a worksite and is returning empty, it is nonetheless loaded.

Whether or not a vehicle is actually loaded is sometimes of interest to other people. A heavy truck, for instance, needs to be given more leeway when it is loaded than when it is unloaded. So, if your vehicle is not in the canonical state for its direction of travel, you may indicate this. For instance, if you are driving a truck carrying equipment out to a work site, you may announce:

Unloaded with a load approaching bridge at 18.4.

(The opposite announcement, Empty vehicle loaded ..., is not so common, since it doesn't matter much if other people assume that your vehicle is loaded when you are inbound.)

There is one exception to the use of loaded and unloaded to indicate direction. Sometimes two different logging roads are close enough to be within radio range. In this case, announcements may contain the name of the road as well as the position, but often one road is assigned the terms loaded and unloaded and the other is assigned the terms empty and with a load. This will be indicated on the sign at the beginning of the road along with the radio frequency to use. This makes it unnecessary to name the road.

Ideally every vehicle on an active logging road has a radio, but in practice some don't. Someone without a radio is said to be driving blind. Different roads have different cultures. On some roads, people are cooperative and look out for each other. If there are vehicles without radios, people with radios will announce their location. Other roads lack this sense of community.

Of course, logging truck radios are also used for gossip and emergency communications. In some areas, they are a good place to hear conversations in native languages.

Here's a picture of my truck, showing its three antennae. One is for the usual commercial broadcast band, one for the CB, and one for the logging road radio. It is parked at my friends' house at Tsetl'oni'a, on Tachek Lake, outside of Vanderhoof, British Columbia. My canoe winters down by the shore.

Bill Poser's Truck

One used his French, the other didn't

After Saddam Hussein's capture by U.S. forces back in December, he responded to questions from members of the Iraqi Governing Council by "us[ing] all his French". John Kerry, in contrast, declines to exhibit his French in public.

Hopfield's law says that "there are three things that you shouldn't do in public; the third is mathematics." I always assumed that the other two were defecation and sex, but ....

Mere knowledge of the German language cannot reasonably be regarded as harmful

This is not just my opinion: it's the law. At least, it's an opinion expressed by James C. McReynolds, writing on behalf of a majority of the U.S. Supreme Court.

In his recent post on John Kerry's French, Geoff Pullum noted in passing that "in Nebraska they once passed a law making it illegal to teach foreign languages in the schools," though "[f]oreign language learning is now, like sodomy, legal in all states". The reasons for the widespread legality of both practices are the same, namely U.S. Supreme Court rulings.

State laws criminalizing sodomy were overturned in 2003, in Lawrence and Garner v. Texas. State laws criminalizing foreign language teaching were overturned in 1923, in Meyer v. Nebraska.

According to the court's opinion, Meyer had been "tried and convicted in the district court for Hamilton county, Nebraska, under an information which charged that on May 25, 1920, while an instructor in Zion Parochial School he unlawfully taught the subject of reading in the German language to Raymond Parpart, a child of 10 years..." Meyer's school was a Lutheran one, and the peccant book was "a collection of Biblical stories".

The Nebraska state law under which Meyer was convicted had been passed in 1919, and read in part:

'Section 1. No person, individually or as a teacher, shall, in any private, denominational, parochial or public school, teach any subject to any person in any language than the English language.

'Sec. 2. Languages, other than the English language, may be taught as languages only after a pupil shall have attained and successfully passed the eighth grade as evidenced by a certificate of graduation issued by the county superintendent of the county in which the child resides.

The Nebraska supreme court upheld the law. Some of the Nebraskan reasoning, as described in the U.S. Supreme Court's opinion, might have been taken from Samuel Huntington's recent Foreign Affairs article:

It is said the purpose of the legislation was to promote civic development by inhibiting training and education of the immature in foreign tongues and ideals before they could learn English and acquire American ideals, and 'that the English language should be and become the mother tongue of all children reared in this state.' It is also affirmed that the foreign born population is very large, that certain communities commonly use foreign words, follow foreign leaders, move in a foreign atmosphere, and that the children are thereby hindered from becoming citizens of the most useful type and the public safety is imperiled.

Nebraska's Attorney General defended the law before the U.S. Supreme Court in these terms:

"If it is within the police power of the state . . . to legislate respecting housing conditions in crowded cities, to prohibit dark rooms in tenement houses, to compel landlords to place windows in their tenements which will enable their tenants to enjoy the sunshine, it is within the police power of the state to compel every resident of Nebraska to so educate his children that the sunshine of American ideals will permeate the life of the future citizens of this republic. A father has no inalienable constitutional right to rear his children in physical, moral or intellectual gloom." [Brief and Argument of State of Nebraska, Defendant in Error, 14-15] [via this on-line paper]

It's worth noting that the AG's rhetoric is explicitly modeled on the standard justifications for social welfare legislation, and that Justice McReynolds, who wrote the SCOTUS majority opinion overturning the law, was a very conservative man, who later worked tirelessly to obstruct Roosevelt's New Deal legislation.

It may be that the social welfare justification was (part of) the reason that Oliver Wendell Holmes, Jr., who is generally described as "liberal" and "progressive", dissented from the majority opinion (i.e. voted to uphold the Nebraska law). If so, the line-up of McReynolds and Holmes on opposite sides of this case is analogous to the contrast that I previously noted between David Brooks, conservative, and Samuel Huntington, liberal.

April 20, 2004

Digital X-rays, easy; patient's name, unfixable

A full-scale root canal operation on a painful back molar this morning. In the chair by 7 a.m., injections, drilling, tunneling, filing, scouring, filling, bonding, the entire nerve canal system packed with high-strength cementing material by about 8:30, no more pain. The technology is fantastic: a digital X-ray system hooked to a computer with double flat-panel color monitor displays. Using only 10% of what used to be a low X-ray dose, the endodontist can get a large zoomable X-ray image of how it's going, any time he needs it. I could view the progress on the tooth about eight times during the operation. Space-age stuff. And yet there was a glitch. A linguistic problem.

The file for me as a patient of the endodontist's practice said "Geoffrey Pullum", as it should. But the file set up for my X-ray pictures said "Geoffrey Pullom". The system should have been set up to read my name from the patient database, of course: there should be just one place where "Pullum" is stored, and all files set up in connection with me should link to that. But the systems were not linkable, and someone had to type in my name again, somewhere else, this time wrongly spelled. The incorrect spelling showed in the header bar of the window and the task bar button at the bottom.

Now, of course, this is a serious business: right now, a search of their records is going to find a patient called Pullum who had an operation but without X-rays, and a patient called Pullom who'd been X-rayed eight times for nothing but had received no treatment. Malpractice suits have arisen out of far less. They've got to get their records straight if my X-rays are going to be findable at the next appointment.

I watched for quite a while as the assistant struggled to find out where that spurious name was recorded, and she just couldn't. I knew the pain she was in: from where I lay in the patient's chair, I could see it was Microsoft Windows. I've downloaded things and been totally unable to find them; tried to trace links and found that filenames didn't match them; hunted for folders mysteriously hidden; tried to block pop-ups and other behaviors with no success; struggled to customize things and failed. The MS Windows interface is a gimmick-bedecked, ill-designed nightmare for an intelligent person, with designed-in blocks to investigation (hidden files, unremovable program components, secret stuff stashed away). Every sensible effort the assistant made (closing files, reopening them, looking in pull-down menus, checking other screens of information, hunting for anywhere "Pullom" might be located) was a failure. The system, designed for what was supposed to be ease of use, was intractably unusable once something had to be altered. The designers had perhaps imagined their way through a perfect, error-free session, but had not thought about what an error might lead to, or about how trouble-shooting and recovery from error might be perspicuously done. The screen was complex and pretty and littered with brightly-colored gadgets and boxes and features, but it gave no clue as to where that name in the header bar might come from or where one might look for a way to edit it.

In other words, as usual, the hardware guys had done a wonderful job, and the graphics guys who wrote the digital X-ray viewing algorithms had done theirs too, but the interface programmers had let the side down. It is the usual familiar story of modern computing: hardware developments of breathtaking, intergalactic wonderfulness ruined by the low quality of the software that is supposed to let human beings converse and interact with it. What's involved in programming front ends is straightforward: crystal-clear thinking about the structure of systems (computer and human) of extraordinary complexity. It's rather like linguistics (and let me hasten to add, we humans are not exactly great at that either). And it turns out to be much harder than designing and building complex machinery. Either that or software developers just aren't trying hard enough. As I watch software firms upgrading programs to new versions that are more complex, slower, and often objectively worse than the old ones, I often wonder whether the latter isn't the problem.

Grice, Pascal, the Times, and Barry Bonds

"Le moi est haissable," said Pascal, and nowhere so much as at The New York Times. Take the story by Lee Jenkins on today's front page about Barry Bonds, centering on his "aloofness" and "frosty demeanor." By way of making his point, the writer says:

"I just wanted to meet Barry," Ellison said as he dragged his vessel back into the water. "I just wanted to shake his hand."

That proposition can be as daunting as catching one of his rising line drives. When a reporter hovered near Bonds, a hulking left fielder, for the first time, Bonds asked: "What do you want? Why are you looking at me? I don't like people staring at me." Then he loped onto the field, where more than 40,000 people studied his every move.

Ordinarily, you would take that description "a reporter" as denoting someone other than the writer, on the assumption that otherwise the writer would be flouting Paul Grice's Maxim of Quantity by giving less information than was relevant.

As Grice famously notes, when someone says that Jones is meeting a woman tonight, we assume that the speaker knows that the woman in question is someone other than Jones' wife or mother.

But the institutionalized diffidence of The Times trumps that principle, even if the immediately following paragraph makes it clear that the reporter in question is none other than the writer:

It is difficult to home in on Bonds, but not impossible. After insisting he would not give interviews -- "I don't like a lot of stupid questions and questions people already know the answer to," he said -- he relented rather easily.

The Times' interdiction against using "I" may be imposed in the service of objectivity, but in this case its effect is exactly the opposite -- it turns a statement that might otherwise be suspected of resting on wounded amour propre into one that has the sound of a neutral, third-party observation. You have the feeling that a local sports reporter would have told Jenkins that what he perceived as "hovering" was likely to have been perceived by Bonds as something more intrusive than that.

Posted by Geoff Nunberg at 12:36 PM

Boring blogs

In a side discussion among Language Log contributors (don't worry, you're not missing all that much!), Geoff Nunberg mentioned a site with deliberately boring blogs. For example, the entry for April 12:

I was sitting down on one of the chairs in my house. My hand was resting on the arm of the chair. I started to drum my fingers on the arm, thereby making a barely audible sound.

Boring, yes, but perhaps some very good test material for language understanding systems or trigger language models. For example, understanding the Feb 22 entry,

I was in a room carrying out some routine activities. I began to consider playing some music on the stereo system. I looked at some compact discs for a while, but didn't put one on.

requires an inference involving the relationship between compact disks and music, the difference between considering an action and doing it, etc. To the extent that "boring" can be equated with "predictable" (or "low entropy"), this is a nice illustration that what's boring to us may be very different from what's "boring" to our systems.
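To make the "low entropy" reading of "boring" concrete, here is a minimal sketch that computes the Shannon entropy of the word distribution in a passage. It is a deliberately crude proxy: it treats words as independent (a unigram model), whereas a real language model would condition on context, and the function name is an assumption for illustration.

```python
import math
from collections import Counter


def unigram_entropy(text):
    """Shannon entropy, in bits per word, of the word distribution in `text`.

    Words are lowercased and split on whitespace; no context is used,
    so this is only a crude proxy for predictability.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    # Sum of p * log2(1/p) over the distinct words, with p = n / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(unigram_entropy("the chair the chair the chair"))  # two equally likely words
print(unigram_entropy("I looked at some compact discs for a while"))
```

A passage that keeps repeating the same few words scores low on this measure, but nothing in it captures the inferences about discs and music that make the entry hard for a system to understand.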

It's a tribute to the power of the global lunchroom. When I say that entangledbank's 23/5 exquisite corpse is the most amusing one I've seen, many of you will know exactly what I'm talking about. For those who don't, here's the start of the thread on Christopher Bahn's Incoming Signals weblog (scroll about half-way down the page), and here are some of the later links as catalogued by technorati.

Actually, I should have said "entangledbank's 23/5 exquisite corpses", because N.V. Trochum decides to provide two, depending on which organized stack of books gets picked:

"I don't usually participate in the blogmemes that buffet us, mainly because I'm boring and I know the results will be boring. In the case of the 23/5 one, the real problem is that my books are arranged in order. All the interesting kids have scattered piles of Henderson Crossthwaite's Intermediate Tensor Algebra for Anabaptists, J. Peasmold de Launcey's Mancunian Philatelists and Transvestism, Josepovic and Andric's Marginalia to Fourteenth-Century Bosnian Traffic Edicts, and Singhiz Bannerjee's The Pavilion of the Enchanted Civets within easy reach."

Aside from refuting N.V.'s self-deprecation, the results are quietly hilarious. I was particularly struck by the beautiful example of random Chomskian praeteritio in the third sentence of the corpse of N.V.'s linguistics stack:

We chose the latter option, preserving the widely accepted view (and the minimalist view) that there is a strong continuity between what is going on in the language learner and what ends up in the mind of the adult speaker. One such language is Russian, exemplified in (11). However, I will not pursue the matter here, my purpose being merely to indicate some of the lines of inquiry that have been pursued in recent years. PF seems to be the natural locus of conditions on lexicalization and overtness, whereas it is implausible that the syntax contains conditions sensitive to such notions. Hence there is not necessarily a one-to-one match between syntactic arguments and conceptual structures arguments. As for himself/him, Pavarotti said that he enjoyed the performance. [emphasis added]

I always thought that it was a flaw in John Sowa's amusing automated Chomsky parody generator (here reproduced and explained by John Lawler) that it omitted Noam's characteristic and masterful uses of the larger-scale gestures of classical rhetoric, in favor of a focus on mechanisms of local transition, such as "it must be emphasized, once again" or "let us continue to suppose".

Generational changes: decline or progress?

A few days ago, I complained about the lack of evidence for Camille Paglia's claim that "Interest in and patience with long, complex books and poems have alarmingly diminished", and that today's students "cannot sense context and thus become passive to the world, which they do not see as an arena for action". I hope that some of the people who really know about this sort of thing will eventually post an annotated guide to the literature on these questions.

Meanwhile, I've done a bit of poking around. I haven't been able to find anything that directly measures changing "interest in and patience with long, complex books" or ability to "sense context" over the past few decades. However, I can point to a couple of well-known trends that seem somewhat relevant, one of which has been heading steadily up for a century (the "Flynn effect" in IQ tests), while the other is now flat after pointing down for a couple of decades (U.S. verbal SAT scores). In neither case does it seem likely that TV, PCs and video games are responsible for (much of) the change.

The "Flynn effect", named for James Flynn, the social scientist who first discovered it about 20 years ago, involves "an average increase of over three IQ points per decade ... for virtually every type of intelligence test, delivered to virtually every type of group."

"For one type of test, Raven's Progressive Matrices, Flynn found data that spanned a complete century. He concluded that someone who scored among the best 10% a hundred years ago, would nowadays be categorized among the 5% weakest. That means that someone who would be considered bright a century ago, should now be considered a moron!"

There is a great deal of controversy about whether these changes are "real", what causes them, and what they might mean. The increases seem to be accelerating, to the extent that it is possible to measure rates accurately enough to tell. Though all aspects of all types of tests are affected, "the increase is most striking for tests measuring the ability to recognize abstract, non-verbal patterns. Tests emphasizing traditional school knowledge show much less progress."
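As a back-of-the-envelope check on the figures quoted above, here is a minimal sketch. It assumes that scores are normally distributed with a standard deviation of 15 points and that the mean simply shifts upward over time; neither assumption comes from the quoted source, and the function names are made up for illustration.

```python
from statistics import NormalDist

IQ_SD = 15               # assumed standard deviation of IQ scores
STANDARD = NormalDist()  # standard normal distribution (mean 0, sd 1)


def percentile_after_shift(old_percentile, shift_points, sd=IQ_SD):
    """Percentile today of a score at `old_percentile` before the mean rose."""
    z_then = STANDARD.inv_cdf(old_percentile)
    return STANDARD.cdf(z_then - shift_points / sd)


def shift_needed(old_percentile, new_percentile, sd=IQ_SD):
    """Rise in the mean (in points) that moves a score between percentiles."""
    z_old = STANDARD.inv_cdf(old_percentile)
    z_new = STANDARD.inv_cdf(new_percentile)
    return (z_old - z_new) * sd


# Three points per decade over ten decades: a 30-point rise in the mean.
print(round(percentile_after_shift(0.90, 30), 2))
# Rise needed to move the 90th percentile down to the 5th.
print(round(shift_needed(0.90, 0.05), 1))
```

Under these assumptions, a 30-point rise would leave the old 90th percentile at roughly the 24th percentile today, and dropping it to the 5th would take a rise of about 44 points. So the Raven's claim seems to imply gains well above three points per decade on that particular test.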

On the other hand, there was a significant decline in SAT verbal scores in the U.S. from the early 60s through the late 70s, when the scores leveled out.

It seems even less clear in this case what is going on. Here is an ETS document about SAT norming. It seems that some but not all of the change can be attributed to demographic changes in the population of students taking the test. The residual effects have variously been attributed to changes in educational standards; to the effects of television; to 1960s cultural mindrot; and so on.

One of the more interesting of these arguments is presented in a paper entitled "Schoolbook simplification and its relation to the decline in SAT-verbal scores", by Donald Hayes, Loreen Wolfer and Michael Wolfe (American Educational Research Journal. 33(2), 1996, pp. 489-508). Here's the abstract:

Argues that the 50+ point decline in mean SAT-verbal scores between 1963 and 1979 may be attributed to the pervasive decline in the difficulty of schoolbooks found by analyzing the texts of 800 elementary, middle, and high school books published between 1919-1991. When this text simplification series is linked to the SAT verbal series, there is a general fit for the 3 major periods: before, during, and after the decline. Long-term exposure to simpler texts may induce a cumulating deficit in the breadth and depth of domain-specific knowledge, lowering reading comprehension and verbal achievement. The two time series are so sufficiently linked that the cumulating knowledge deficit hypothesis may be a major explanation for the changes in verbal achievement. A simple, low-cost experiment is described that schools can use to test how schoolbook difficulty affects their students' verbal achievement levels.

You can learn more about the measures of text difficulty used, and download software for doing the calculations as well as spreadsheets with all the raw text-difficulty data, from Donald Hayes' web site at Cornell. Hayes' text complexity measure is entirely lexical -- it depends only on the frequency distribution of the words in a text. The general idea is illustrated in the figure below, which compares the distributions for a large newspaper corpus with a first-grade reader (which uses fewer rare words) and a set of scientific abstracts (which use a larger proportion of rare words).
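To make "entirely lexical" concrete, here is a minimal Python sketch of one way to score a text from word frequencies alone. It is not Hayes' actual measure (for that, see the software on his site); it simply reports the share of a text's tokens that fall outside the most frequent words of a reference corpus. The function names, the tiny reference text, and the `top_n` cutoff are all assumptions made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rare_word_share(text, reference_text, top_n=10):
    """Share of tokens in `text` that are NOT among the `top_n` most
    frequent words of `reference_text` (a stand-in for a large corpus).

    Illustrative only: this is a crude proxy, not Hayes' measure.
    """
    ref_counts = Counter(tokenize(reference_text))
    common = {word for word, _ in ref_counts.most_common(top_n)}
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    rare = sum(1 for tok in tokens if tok not in common)
    return rare / len(tokens)


# Toy usage with a made-up reference "corpus"; a real comparison would
# need a large reference corpus such as the newspaper text mentioned above.
reference = "the cat sat on the mat the dog sat"
print(rare_word_share("the cat sat", reference, top_n=100))
print(rare_word_share("the quantum cat", reference, top_n=100))
```

On this kind of score, a first-grade reader should come out lower than newspaper text and a set of scientific abstracts higher, which is the ordering described above.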

It's a bit unexpected not to include any measures of syntactic complexity -- even something as simple as mean sentence length. However, anyone who has looked at a collection of historical schoolbooks quickly gets the impression that they have gotten simpler over time, and it's nice to see that Hayes is able to document this using purely lexical measures.

[Thanks to Mark Seidenberg for a pointer to Hayes' web site]

April 19, 2004

New Evidence Against the Kensington Runestone

According to news reports, new evidence has emerged supporting the view that the Kensington Rune Stone is a forgery. The stone, now housed in the Runestone Museum in Alexandria, Minnesota, was allegedly found in 1898 by Swedish-American farmer Olof Ohman on his farm in nearby ᚴᛂᚾᛊᛁᚶᛡᚮᚾ (Kensington). The inscription (the last part of which is on the side, out of view in the photo) reads:

Eight Goths and 22 Norwegians on a journey of exploration from Vinland very far west. We had camp by two rocky islands one day's journey north from this stone. We were out fishing one day. After we came home we found ten men red with blood and dead. AVM save from evil. Have ten men by the sea to look after our ships fourteen days' journey from this island. Year 1362.

Believers in the inscription's authenticity consider it proof that the Vikings discovered America. Most scholars consider it a forgery, most likely made by Ohman himself, but the question is still actively debated. One scholarly proponent of the authenticity of the inscription was the late Robert A. Hall, Jr., who published a book entitled The Kensington Rune-Stone: Authentic & Important. Hall was an early critic of prescriptive grammar, which he attacked in a book entitled Leave Your Language Alone, previously mentioned here on Language Log.

There is a good short summary of the debate in this essay by Richard Nielsen, a believer, and Henrik Williams, a skeptic, on the web site of the Historical Museum in Stockholm. The Museum has available a special issue [PDF document] of its Historiska Nyheter devoted to the runestone. It is mostly in Swedish but has portions in English. Even if you can't read Swedish, it has some nice illustrations with bilingual captions.

The new evidence is the discovery of documents written in 1885 by a Swedish tailor in runes, which had long since passed out of use as the usual means of writing Swedish. These documents provide examples of a secret variety of the runic alphabet used by tradesmen in the 19th century. Some of the unusual runes on the Kensington Rune Stone turn out to belong to the secret tradesmen's version of the alphabet. Since Scandinavian scholars believe that this version of the runes did not exist in the 14th century, the use of these runes favors a 19th rather than 14th century origin.

In a way, it is odd that so much importance should be attached to this stone. On the one hand, there is no longer any question that the Vikings reached North America. We know that they settled at L'Anse aux Meadows. On the other hand, although the Vikings beat Columbus to the New World, they weren't the first. The Indians beat the Vikings by thousands of years.

Talking heads

The always-interesting Laputan Logic site has recently posted an illustrated account of the Amazing Talking Head that Joseph Faber exhibited in Philadelphia in 1845. If this sort of thing interests you (as it should!), you can find on the Haskins Labs web site a series of pages on the early history of talking machines, which shows diagrams and/or descriptions of Kratzenstein's vowel resonators (produced around 1770 at the Imperial Academy in St. Petersburg), Wheatstone's version of von Kempelen's 1791 talking machine, and Alexander and Melville Bell's human vocal tract model, inspired by Wheatstone's machine and built during their boyhood in mid-19th-century Edinburgh.

This is part of a larger Haskins exhibit entitled Talking Heads, authored by Phil Rubin and Eric Vatikiotis-Bateson, which also covers the history of electronic speech synthesis, virtual vocal tracts, models of speech production, the McGurk effect (we linked to that one earlier), facial animation, avatars, and more! There is an enormous depth of information on this marvelous site.

Another excellent site about speech synthesis from 1770 to 1970 can be found here, created by Hartmut Traunmüller at Stockholm University. Hartmut's page includes photographs of von Kempelen's original speaking machine, now in the Deutsches Museum in Munich.

Wolfgang von Kempelen was also the creator of the Turk, a chess-playing automaton that prefigured the recent Chatnannies algorithm developed by Jim Wightman. The Turk was actually operated by a smallish human concealed inside its cabinet -- some 15 chess experts and masters apparently played this role over the course of some 70 years -- but things moved more slowly in those 18th-century pre-internet days, and so the Turk played exhibitions all across Europe for several years before first being exposed. 50 years later, it was still successfully touring the U.S., where its popularity is said to have inspired the formation of the first American chess club in Philadelphia in 1826.

Unlike the Turk, von Kempelen's speech synthesis machine was a work of engineering with no hidden tricks, but (perhaps for that reason) it did not earn its inventor or its subsequent owners as much fame or money. However, one can argue that it played a part in the invention of the telephone a century later.

No French please, people are watching

Fascinating stuff by Joshua Kurlantzick in the front pages of The New Yorker this week, about how John Kerry is fluent in French but has to try to conceal it. In private settings he has chatted in excellent French at length with Alain de Chalvron, Washington bureau chief for the French television service France 2; but when asked a question in French at an open press conference, Kerry pretended not to be able to understand it, and didn't give an answer at all. The last thing you want in American politics, apparently, is to be captured on camera understanding French, let alone speaking it. Rush Limbaugh would start portraying you as hardly American at all (he already does this with Kerry, in fact, having heard about these suspicious francophone abilities on the grapevine).

Geoff Nunberg pointed out to me that in Nebraska they once passed a law making it illegal to teach foreign languages in the schools, period. Foreign language learning is now, like sodomy, legal in all states; but these are not freedoms that a politician should brag about taking advantage of. Such is the determined linguistic isolationism of the USA. I would have thought that to have a US president (for once) who could argue fluently and convincingly in the native language of some other head of state would be a fantastic asset. But instead it is perceived as a kind of disloyalty, evidence of being an untrustworthy egghead, and you would lose millions of votes over it. It's both depressing and amazing.

O tempora, o mores

Rivka at Respectful of Otters offers a lucid explanation, based on sampling bias, for the fact that many teachers agree with Camille Paglia's jeremiad on the degraded state of today's youth.

Kids these days aren't like Paglia's friends in college were. But most kids at the time that Paglia was in college probably weren't like her friends. The educational path that ends in a Ph.D. and a professorship typically involves early segregation from average people, and when Paglia was growing up in the age of "ability tracking," that was even more true. Grad school selects for precisely the kinds of people who, as Paglia describes her college buddies, like to have intense discussions of literature at midnight. (Well, my program selected for people who liked to get drunk and mock inferior research methods, but the principle is the same.) No doubt there were plenty of other students at SUNY Binghamton in the 1960s who only cared about football and drinking and their future business careers, and who whined when they were assigned books that were too hard.

Paglia didn't have to hang out with those guys then, but now their direct descendants have shown up in her classes and she's stuck with them. The middle of the bell curve is much fatter than the ends, so she's probably got a lot more beer-and-Cliffs-notes kids in class than she has proto-intellectuals. That doesn't mean that America's youth have gotten dumber, it just means that she's being forced to encounter a wider spectrum of America's youth than she ever had to know when she was one.

At the University of the Arts, Paglia's classes may refer to IMDb more often than to Cliff's Notes, and may prefer other substances to beer. However, the principle is correct and applies to teachers in general, as well as to anyone else whose work forces them to deal with a more random sample of the population than their school-days affinity groups did.

I'm sure that there really are some significant cultural differences among people from different places and times. However, I like to think that my own specific beliefs about such differences (for instance between kids today and kids 40 or 50 years ago) are better supported by evidence. In particular, I try to be suspicious of generational generalizations based on nothing but my own personal observation, because of the sampling bias that Rivka describes, as well as the mythologizing effects of selective memory.

A stubborn survival

Trevor at kaleboel writes about a case of what he calls "redundant prepositions":

US Vice President concludes up China visit
That concludes up our quick look at 2002
The praying one concludes up the prayer with a great question

As Trevor observes, up in these examples seems not only inappropriate -- contrary to the norms of standard English -- but also unnecessary, in that the phrases are fine without it. This analysis leaves out something interesting, namely the reason that "concludes up" is funny, while similar phrases with "finishes up" are fine. This is an example of a pattern that is half a millennium old, and is still potent in the vernacular as well as in formal usage.

[Keep in mind that neither the history of English nor lexical semantics are specialties of mine; but readers will no doubt point out my mistakes and omissions.]

First, here are Trevor's "concludes up" examples, paired with comparable "finishes up" cases (googled for the occasion):

US Vice President concludes up China visit
Men's tennis finishes up road trip

That concludes up our quick look at 2002
That finishes up our look at a basic installation of SQL Server 2000.

The praying one concludes up the prayer with a great question
He finishes up the presentation with cart rides for the children.

The up in the "finishes up" examples is redundant, in the sense that it could be omitted without making the phrases ill-formed, or changing their meaning in a fundamental way. But up does add some completive or intensifying savor here, all the same.

The American Heritage Dictionary entry on up gives a couple of relevant senses:

11. Completely; entirely: drank it up in a gulp; fastened up the coat. 12. Used as an intensifier of the action of a verb: typed up a list.

Up can combine with many verbs -- at least dozens, perhaps hundreds -- to give something like this completive or intensifying meaning. These include verbs of fragmentation such as break up, bust up, carve up, smash up, wreck up; verbs of ingestion such as eat up, drink up, gulp up, swallow up; verbs of containment such as bottle up, box up, lock up, wrap up; and so on.

[Note that up has other senses in such combinations with verbs, like the directional meaning in hang up, sprout up, etc.; and that there are also idiomatic combinations like make up; and that some of the verb+up completive examples have other meanings of this kind as well.]

So why can't we add a bit of completive savor with "concludes up"?

Well, conclude is not alone in being frozen out of up. Compare (some of) the cleaning verbs that work with up to (some of) those that don't:

clean up, dust up, dry up, freshen up, mop up, neaten up, pick up, polish up, scour up, slick up, spruce up, straighten up, sweep up, tidy up, touch up, wash up, wipe up

arrange up, decontaminate up, disinfect up, launder up, sanitize up, sterilize up

or similarly some fragmentation verbs:

break up, bust up, carve up, smash up, wreck up

demolish up, destroy up, dissect up, fracture up

I'm not laying down an unsupported prescription here, though a prescription does emerge that learners of English should learn to follow. Here are some relevant Google stats for a sample of these cases, taking the V+ed form (e.g. "freshened", "disinfected") as a proxy for verb frequency, and the three-word sequence "V+ed it up" (e.g. "freshened it up", "disinfected it up") as a proxy for frequency of the verb with completive up:

[Table: "V+ed it up" count per 1,000]
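
For anyone who wants to redo the arithmetic, the normalization is just the phrase count divided by the verb count, scaled to 1,000. Here is a minimal Python sketch; the function name is my own invention, and the only real figures used are the ones quoted in this post for "completed" (29,900,000 pages) and "completed it up" (one example in the completive sense).

```python
def rate_per_thousand(phrase_hits, verb_hits):
    """Hits for "V+ed it up" per 1,000 hits for "V+ed".

    Both arguments are raw search-engine counts, which are only rough
    proxies for actual usage frequency.
    """
    if verb_hits <= 0:
        raise ValueError("verb_hits must be positive")
    return 1000 * phrase_hits / verb_hits


# Figures quoted in this post for "completed" and "completed it up".
completed_rate = rate_per_thousand(1, 29_900_000)
print(f"completed it up: {completed_rate:.7f} per 1,000")
```

Dividing by the verb's own frequency is what makes common and uncommon verbs comparable; the raw phrase counts alone would mostly reflect how often each verb is used at all.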

What's going on here? Is it just a random fact about verbs that some of them can take completive up and some of them can't?

No, there are clearly some things that separate the two lists. Etymological source is somewhat predictive (verbs of Germanic origin tend to work, verbs of Romance or Latinate origin tend not to), and so is sound structure (short verbs tend to work, long verbs tend not to). In some cases where these (correlated) properties aren't predictive, another relevant question seems to be whether the verb in question was in general use in 14th-century English.

Thus polish is a two-syllable word borrowed from French -- the American Heritage Dictionary says that polish is from "Middle English polisshen, from Old French polir, poliss-, from Latin polire"; and the OED has citations from 1300. Launder is also a two-syllable word borrowed from French -- the American Heritage Dictionary says that launder is from "Middle English launder, lavender, launderer, from Old French lavandier, from Vulgar Latin *lavandrius, from Latin lavandria, things to be washed, from lavanda, neuter pl. gerundive of lavare, to wash". However, the OED's earliest citation for the verbal form of launder (as opposed to the noun meaning "launderer") is from 1597. Polish works with completive up, launder doesn't.

This (approximate) distinction between short, old, "native" words and long, new, "borrowed" words plays a role in other modern English morphosyntactic distinctions besides the combination of verbs with completive up. In particular, similar patterns exist for other cases of verbs combining with intransitive prepositions (or "particles", as some people call them). These special verb-preposition combinations are sometimes called "phrasal verbs", though CGEL rejects this terminology on the grounds that the verb+preposition combinations do not form constituents in the modern language (p. 274). Whatever the analysis, I surmise that the tendency of such "particles" to associate preferentially with short, old, "native" words originally arose because Old English, like modern German, had a class of "separable prefix verbs", from which the verb+preposition combinations arose historically.

I don't think that there's any functional explanation for the current rather messy situation -- except perhaps that this kind of quasi-regularity in language is natural for humans. But it would be a lot easier on second-language learners if English just allowed completive up to be combined freely with any verb of appropriate meaning. Then Trevor's original example -- from a Vietnam News Agency headline, probably written by someone whose first language is not English -- would have been fine.

Even native speakers are sometimes a little fuzzy about these patterns. Trevor gave a couple of examples in which "concludes up" was used by (apparent) native speakers, and I can provide another with complete.

Finish, like polish, is a word borrowed from French into Middle English, with the OED's earliest verbal citation from 1350. Like polish, finish allows completive up at a modest rate:

[Table: "V+ed it up" count per 1,000]

Complete was also borrowed long ago from French, but the OED's earliest verbal citation is from 1530. Google has 29,900,000 pages in which "completed" occurs, and only one example of "completed it up" in the completive sense, from a knitting site:

(link) I started working on it during yesterday's SnB and then completed it up at home.

This sentence appears to have been written by a native speaker. It might be an artefact caused by substituting a more formal word ("completed") for a more vernacular one ("finished"). Whatever the source of this particular case, I'm amazed by the extreme rareness of this kind of violation of a (syntactically and semantically arbitrary) pattern that was set up some 500 years ago. Whether or not modern media have produced "degraded sensitivity to the individual word", as Camille Paglia claims, we're all keeping up some old lexical traditions extraordinarily well.

Professional Foolishness

A favorite activity here on Language Log is pointing out the deficiencies of what various non-specialists say and think about language and linguistics: journalists (here, here, and here), language pundits (here, here and here), geneticists, political scientists, actors, and sea captains. But as Mark Liberman pointed out in his post on Professionalism, foolishness and irresponsibility are by no means excluded by professional qualifications. A case in point recently came to my attention when I read Words and Rules by psycholinguist and best-selling popularizer Steven Pinker.

In Chapter 8, "The Horrors of the German Language", Pinker includes (p. 212) a family tree entitled "The Ancestry of Modern English". English is shown within a partial family tree of Indo-European. Indo-European is shown as a daughter of Eurasiatic and a sister of Uralic and Altaic. No other subgroups of Eurasiatic are shown. Eurasiatic in turn is shown as a subgroup of Nostratic, with Dravidian and Afro-Asiatic as the other subgroups. Nostratic in turn is shown as a sister of Sino-Tibetan and New Guinea, with the parent labelled with a question mark.

This family tree doesn't look much like what will be found in standard references, such as the Ethnologue. It is flawed in several ways. To begin with, it is actually a fragment of a family tree. Several subgroups of Indo-European are omitted, as are several subgroups of Eurasiatic and one subgroup of Nostratic. A careful reading of the subsequent text will make the astute reader realize that the tree is incomplete, but this is never made explicit. The family tree has no legend or explanatory notes.

Secondly, it appears to indicate that Nostratic, Sino-Tibetan, and New Guinea form a language family. If so, this is misleading, because there is not the slightest reason to believe this to be the case. To my knowledge no such proposal appears in the linguistic literature. The question mark that labels the parent node may be intended to indicate that it is not known whether these three groups are related, but this is not clear. Such unclear relationships are usually depicted with broken lines. The question mark is more likely to be interpreted as indicating that the group suggested has no known name.

By far the most serious problem, though, is the fact that many of the relationships depicted are unproven. There is no language family known as New Guinea. Hundreds of languages are spoken in New Guinea. Some of them are clearly members of the Austronesian language family, the family that also includes such languages as Malay, Hawaiian, and Maori. The rest are grouped under the cover-term Papuan, but they have not been shown to be genetically related. Even the "lumpers" distinguish half-a-dozen language families in New Guinea; historical linguists who insist that relationships be established by the comparative method divide the non-Austronesian languages of New Guinea into 68 language families. There is a proposal by Joseph Greenberg for a language family known as Indo-Pacific, consisting of the Papuan languages, the languages of Tasmania, and the languages of the Andaman Islands, but it has never been supported by adequate evidence and is not taken seriously.

Eurasiatic is another proposal of Greenberg's. It consists of: Indo-European, Uralic-Yukaghir, Altaic, Japanese-Korean-Ainu, Gilyak, Chukotian, and Eskimo-Aleut. As with Greenberg's other proposals, the claim that these languages are related has not been demonstrated by adequate evidence and is not generally accepted. Nostratic, as originally proposed by Vladislav Illich-Svitych, consists of: Indo-European, Uralic, Altaic, Afro-Asiatic, Kartvelian (also known as South Caucasian), and Dravidian. Eurasiatic and Nostratic in their original forms are competing proposals. The family tree shown by Pinker in which Eurasiatic is a subgroup of Nostratic is a compromise developed by some Nostraticists in recent years. Nostratic too is not generally accepted.

The highest level relationships shown, among Nostratic, Sino-Tibetan, and "New Guinea", are pure fantasy. There is no evidence for such relationships.

To be fair, in the text (p. 234) Pinker does mention that most linguists think that the traces of common ancestry are lost in the mists of time beyond groups at the level of Indo-European and Uralic, but he fails to give a real idea of how far out these proposals are. He characterizes the proponents of Nostratic as "a few dauntless linguists" (p. 236), as if the reason that most historical linguists consider such hypotheses to be unproven is that they are insufficiently bold.

I've outlined the problems with remote linguistic comparison in a previous post. Eurasiatic, Nostratic, and other such proposals have not been accepted because there is insufficient evidence that the similarities observed are not due to chance and because the possibility that they are due to borrowing rather than common descent has not been ruled out. It isn't a matter of boldness; it's a matter of evidence.

The impression that most readers will be left with is that Eurasiatic, Nostratic, and "New Guinea" are established language families and that Nostratic, Sino-Tibetan, and "New Guinea" are known to be related. The careful reader will learn that most linguists are skeptical, but Pinker's disclaimer will have little force in the face of his characterization of the long-ranger fringe as "dauntless" coupled with his implicit endorsement.

What makes this especially annoying is that the inclusion of this family tree is completely gratuitous. The chapter is devoted to showing that rules are not restricted to English but are found in a variety of other languages. The approach that Pinker takes is to start with languages closely related to English and show how remoter and remoter languages also have rules. All that really matters is that his examples not be closely related. He could have made the same point just as well without any discussion of remote genetic relationships.

It's hard to say why Pinker chose to include this family tree. By his own statement, he knows that most linguists don't believe it, so it isn't a mistake made out of ignorance. It is highly unlikely that Pinker has any professional views of his own on historical linguistics. Training in psychology does not normally cover historical linguistics, nor can I find any record of his ever having published on or spoken about or otherwise manifested any interest in the subject. He doesn't seem to have any stake in it. Whatever his motivation may have been, the family tree that he presents is speculative and misleading, hardly appropriate for a work intended for the lay reader. He and his publisher should have known better.

Whom humor

Geoff Pullum has given us a masterful exposition of the grammatical theory of whom. I believe that whom has also been the occasion of more wit than any other single grammatical morpheme in English, and I'll post three examples, taken from Beatrice Santorini's wonderful linguistic humor page.

You should pay particularly close attention to the third passage, in which James Thurber takes up the question of the "Buried Whom". However, the Wodehouse passage provides a helpful set-up, suggesting that the confusion over whom extended to the British aristocracy in the early 20th century.


As far as I'm concerned, "whom" is a word that was invented to make everyone sound like a butler.

(Calvin Trillin, cited in Anne Lobeck, Discovering Grammar).


Normally as genial a soul as ever broke biscuit, this aunt, when stirred, can become the haughtiest of grandes dames before whose wrath the stoutest quail, and she doesn't, like some, have to use a lorgnette to reduce the citizenry to pulp, she does it all with the naked eye. "Oh?" she said, "so you have decided to revise my guest list for me? You have the nerve, the--- the---"

I saw she needed helping out.

"Audacity," I said, throwing her the line.

"The audacity to dictate to me who I shall have in my house."

It should have been "whom," but I let it go.

"You have the---"


"---the immortal rind," she amended, and I had to admit it was stronger, "to tell me whom"---she got it right that time---"I may entertain at Brinkley Court and who"---wrong again---"I may not. Very well, if you feel unable to breathe the same air as my friends, you must please yourself. I believe the 'Bull and Bush' in Market Snodsbury is quite comfortable."

(P.G. Wodehouse. 1960. Jeeves in the offing. Barrie & Jenkins. 183-184.)


The number of people who use "whom" and "who" wrongly is appalling. The problem is a difficult one and it is complicated by the importance of tone, or taste. Take the common expression, "Whom are you, anyways?" That is of course, strictly speaking, correct - and yet how formal, how stilted! The usage to be preferred in ordinary speech and writing is "Who are you, anyways?" "Whom" should be used in the nominative case only when a note of dignity or austerity is desired. For example, if a writer is dealing with a meeting of, say, the British Cabinet, it would be better to have the Premier greet a new arrival, such as an under-secretary, with a "Whom are you, anyways?" rather than a "Who are you, anyways?" - always granted that the Premier is sincerely unaware of the man's identity. To address a person one knows by a "Whom are you?" is a mark either of incredible lapse of memory or inexcusable arrogance. "How are you?" is a much kindlier salutation.

The Buried Whom, as it is called, forms a special problem. That is where the word occurs deep in a sentence. For a ready example, take the common expression: "He did not know whether he knew her or not because he had not heard whom the other had said she was until too late to see her." The simplest way out of this is to abandon the "whom" altogether and substitute "where" (a reading of the sentence that way will show how much better it is). Unfortunately, it is only in rare cases that "where" can be used in place of "whom." Nothing could be more flagrantly bad, for instance, than to say "Where are you?" in demanding a person's identity. The only conceivable answer is "Here I am," which would give no hint at all as to whom the person was. Thus the conversation, or piece of writing, would, from being built upon a false foundation, fall of its own weight.

A common rule for determining whether "who" or "whom" is right is to substitute "she" for "who," and "her" for "whom," and see which sounds the better. Take the sentence, "He met a woman who they said was an actress." Now if "who" is correct then "she" can be used in its place. Let us try it. "He met a woman she they said was an actress." That instantly rings false. It can't be right. Hence the proper usage is "whom."

In certain cases grammatical correctness must often be subordinated to a consideration of taste. For instance, suppose that the same person had met a man whom they said was a street cleaner. The word "whom" is too austere to use in connection with a lowly worker, like a street-cleaner, and its use in this form is known as False Administration or Pathetic Fallacy.

You might say: "There is, then, no hard and fast rule?" ("was then" would be better, since "then" refers to what is past). You might better say (or have said): "There was then (or is now) no hard and fast rule?" Only this, that it is better to use "whom" when in doubt, and even better to re-word the statement, and leave out all the relative pronouns, except ad, ante, con, in, inter, ob, post, prae, pro, sub, and super.

(James Thurber: Ladies' and Gentlemen's Guide to Modern English Usage)

Posted by Mark Liberman at 05:06 AM

Carol Neidle has written to describe an attempt to persuade Boston University to allow American Sign Language to count for its Foreign Language Requirement. An on-line petition, with considerable background information, is here. The page says that "Although the petition was originally intended only for BU affiliates, in fact, people from all over have been signing it. If you do have a BU affiliation (including alumni status), please indicate that, but anyone is welcome to sign."

One of the documents the site provides is the Linguistic Society of America's 2001 resolution:

The Linguistic Society of America affirms that sign languages used by deaf communities are full-fledged languages with all the structural characteristics and range of expression of spoken languages. They have rule-governed systems of articulation, word formation, sentence structure, and meaning, which have been the subject of modern scholarly study since the pioneering work of William Stokoe (1919-2000) over forty years ago. These languages are not merely a set of informal gestures, nor are they a signed version of any particular spoken language. American Sign Language, the language of deaf communities in the United States and most of Canada, goes back almost two hundred years and is historically and structurally unrelated to spoken English. It is also the vehicle of a distinguished deaf culture and has a tradition of visual literature.

The LSA affirms for signed languages such as ASL all the rights and privileges attendant to any spoken languages, including the right to satisfy a student's academic foreign language requirement, just as Spanish, Chinese, Navajo, or any other spoken language can.

Posted by Mark Liberman at 09:14 PM

I really don't care whom

The Fusco Brothers cartoon for April 13 (see it here) had a woman saying to a man in a bar:


The woman's remark is a little threefold grammar lesson in its own right. Here's the issue: is the whom correct?

Well, there are three or four layers to this. First, the word whom is understood as the predicative NP complement of was. In ordinary English this is a function that goes with accusative case on a pronoun: if you knock on my door and I call out Who is it?, you, as a normal person, knowing that I would recognize your voice, would say It's me. If you said It is I, I would not be nearly so inclined to let you in. It is I is an extremely formal usage, encouraged by really old-fashioned prescriptivists but not seriously used these days by anyone except the unbearably affected.

That, other things being equal, would mean that the case on who should also be accusative. But other things are not equal.

Whom is very restricted indeed for most speakers, and it is highly implausible that it would be used as the complement of be. That is, hardly anyone would be inclined to say, You claim your ancestor was whom?. We would be much more likely to say, You claim your ancestor was who?. The case marking with the pronoun who is the exact opposite of what we find with personal pronouns like he or I.

In addition, accusative case on who does not typically survive when the word is shunted to the beginning of an interrogative or relative clause. That is, even for people who would say You talking to whom? (e.g., to re-query an answer that wasn't heard correctly), it is highly unlikely that if they started the sentence with the wh-word they would use the accusative form: Whom were you talking to?. In normal conversation, the frequency of whom at the beginning of a clause (as opposed to preceded by a preposition) is now virtually zero. And this does not indicate near-universal error: there is no way Who were you talking to? can be regarded as incorrect use of the language. If you are teaching English to foreign learners, you should unquestionably teach them to use who in such contexts, not whom.

So, although we would expect accusative on an ordinary personal pronoun after was (as in It's me), what is typical on a wh-pronoun at the beginning of a clause is the nominative (Who were you talking to?). So in fact the accusative in the cartoon is not grammatical in Standard English as normally used. It is what is known as a hypercorrection.

But I should also mention that there seem to be some people who regularly and unconsciously say things like I wonder whom they imagined was going to believe them. That is, they appear to convert who to whom whenever it ends up following a verb, regardless of whether it would have been nominative if left in its logical position (compare Who was going to believe them?). We would expect those people to say I really don't care whom regardless of what followed. Possibly the woman in the cartoon is one of those speakers. In that case whom would indeed be the expected form. But it is not clear that the resultant variety of English could properly be called standard. Prescriptivists would regard I wonder whom they imagined was going to believe them as clearly an error, because whom is actually understood as a subject (the subject of was).

Do you find this confusing? I certainly hope so. Anyone who wasn't a bit confused by this point couldn't have been paying attention. Things are in a confused state. The form whom is dying. For lots of speakers it is really only used right after a preposition in a relative clause (anyone to whom this is confusing), and perhaps sometimes at the beginning of a relative clause (those whom I have succeeded in confusing). It hardly occurs in interrogatives at all (I looked for whom in a couple of months of my recent email, mostly from fellow professors, and I didn't find a single example of it in an interrogative).

It isn't true, as the grammar pontificators often imply, that the rules are fixed and perfectly simple and everyone ought to know them and it's only laziness if you don't. Often the rules are quite difficult to puzzle out, and very complex and awkward when you've identified them and stated them explicitly. Recently the college-educated daughter of a linguistics professor I know wrote to her dad to ask about a perfectly simple example:

>> Hi Dad - I have a grammar question for you (actually, its my            
>> co-worker's, but I'm the only one with a linguist for a dad...)      
>> The sentence is:  There are 3.6 million New Yorkers on Medicaid, of  
>> who/whom 2.4 million reside in New York City...                      
>> Is it who or whom?

He was astonished to get this, because of course here it is very simple: unquestionably, whom would be normal in this case because it's right after a preposition, and that's the one place whom is still common. But young people in their twenties are beginning to lose their grasp even of that last bastion of whom. It's not surprising. The present situation is multi-layered, subtle, and devilishly complex to describe. At least one linguist has decided there is no correct description of it at all: the situation is just chaos. It's also thoroughly confusing, and of course, just about totally irrelevant to understanding.

The study of grammar interests me academically, and although I am prepared to rage and fume against people who pontificate about it mistakenly, I don't blame people who find the who/whom distinction deeply puzzling. The woman in the Fusco Brothers cartoon probably guessed wrong about whether to say who or whom, but you can hardly say she had no excuse.

Posted by Geoffrey K. Pullum at 04:23 PM

Quantifying American suffering and enjoyment: 60 to 1?

Neal Whitman at Agoraphilia points out that enjoy and suffer from are often used as lightly-flavored synonyms for have:

To lower your risk of cancer, enjoy 3 to 5 servings of fruit per day.
American citizens enjoy the right to vote.
50 million Americans suffer from hemorrhoids.

As Neal observes, the positive or negative connotations are semantically non-restrictive, in some sense, so that the first example doesn't mean that you lose the health benefits if you happen to hate fruit and so consume it dutifully but without enjoyment. He suggests that something similar is going on in phrases like

Our friendly employees will be happy to assist you.

which is not meant to invite the inference that the surly ones will assist you reluctantly, if at all.

Inspired by Neal's post, I decided to use his identification of positively- and negatively-tinted variants of have to calibrate the textual mood of the nation.

Google has 52,000 instances of "Americans suffer from", and most of them seem to be examples of "negatively flavored have":

7% of Americans suffer from acid reflux
Native Americans suffer from high rate of asthma
Some 17 million Americans suffer from social anxiety disorder
3 percent to 24 percent of Americans suffer from substance abuse disorders

The last example is especially revealing, since I'm sure that some fraction of substance abusers would claim that they "enjoy" rather than "suffer from" their substances of choice, if asked.

In contrast, there are only 852 instances of "Americans enjoy". [note: see correction below; ed.] By internet text count, at least, we as a nation "suffer from" things about 60 times more often than we "enjoy" them. The textual disproportion may be even worse, since many if not most of the "Americans enjoy" ghits are matches across a sentence or other punctuation boundary. Of course, there are also plenty of the type that Neal is talking about:

For the most part, we in American enjoy the blessings of religious liberty.
As American's enjoy an unprecedented era of prosperity ...
...those who fought for the freedoms all American's enjoy.
... the appliances that are central to the lifestyles American's enjoy

This search also informed me that many people are very confused about apostrophes, including some who ought to know better, like the Energy Association of Pennsylvania, and that Google's index (perhaps for that reason) just gives up and merges final s and final 's.

In fact, Americans in my experience are a happy and optimistic bunch -- and with good reason -- so the disproportionate amount of American textual suffering is probably telling us something about language, not life.

[Update: Q_pheevr documents a previous discussion of "suffer from" as "negatively flavored have" in Pippi Longstocking, incidentally providing an example of a moral error in usage (check out the material associated with the underlined "usage" in the post).

Q also reports that:

"Canadians are a hell of a lot happier than Americans. Mark reports 52,000 ghits for "Americans suffer from" and only 852 for "Americans enjoy," giving an American suffering:enjoyment ratio of about 61:1. But I got 2,050 ghits for "Canadians suffer from" against 4,730 for "Canadians enjoy," for a Canadian ratio of about 10:23. By my calculations, this makes Americans about 140 times as unhappy as Canadians."

I haven't checked the numbers, but this certainly suggests an alarming enjoyment gap. ]

[Oops: the Language Log fact checking department ran the numbers again, and got 14,900 ghits for "Americans enjoy", as opposed to 51,300 for "Americans suffer from". This puts the suffering/enjoyment ratio at a mere 3.4/1. Checking the Canadian numbers, I get essentially the same numbers as Q did; so after the recount, Canadians are not quite 8 times more joyful. Still a significant gap in textual happiness -- but not such a spectacular one. Why the counts from Google are so unstable is a mystery: surely I didn't make a mistake! :-)

In any case, the Language Log subscription department will cheerfully refund your subscription fees in full, as usual, in case of less than full satisfaction.]
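For anyone who wants to rerun the arithmetic behind the original claim and the recount, here is a minimal Python sketch. The ghit counts are the ones quoted above; the dictionary layout and the names (counts, suffering_ratio, gap_before, gap_after) are just illustrative choices, and the counts themselves are whatever Google reported on the day, so none of this should be read as more than bookkeeping.

```python
# Ghit counts as quoted in the post. The labels are illustrative, not official.
counts = {
    "american_original": {"suffer": 52_000, "enjoy": 852},
    "american_recount": {"suffer": 51_300, "enjoy": 14_900},
    "canadian": {"suffer": 2_050, "enjoy": 4_730},
}


def suffering_ratio(c):
    """Hits for 'X suffer from' divided by hits for 'X enjoy'."""
    return c["suffer"] / c["enjoy"]


ratios = {name: suffering_ratio(c) for name, c in counts.items()}

# How many times larger the American ratio is than the Canadian one,
# before and after the recount.
gap_before = ratios["american_original"] / ratios["canadian"]
gap_after = ratios["american_recount"] / ratios["canadian"]

for name, r in ratios.items():
    print(f"{name}: {r:.2f} to 1")
print(f"gap before recount: {gap_before:.1f}x")
print(f"gap after recount: {gap_after:.1f}x")
```

The first gap comes out at roughly 140, as Q calculated, and the second at a little under 8, which is where "not quite 8 times more joyful" comes from.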

Posted by Mark Liberman at 12:35 PM

The magazine of anarchy on the high seas

I'm afraid that no such magazine exists, but for a brief, bright moment yesterday I thought differently. I was sitting in a coffee shop, reading the May Atlantic, and came to the blurb for William Langewiesche's (gripping, horrifying) article on the 1994 sinking of the ferry Estonia:

One of the worst maritime disasters in European history took place a decade ago. On a stormy night on the Baltic Sea, more than 850 people lost their lives when a luxurious ferry sank below the waves. From survivor testimony and other sources our correspondent has pieced together the Estonia's last moments—part of his continuing coverage for the magazine of anarchy on the high seas.

It's not surprising that I made this mistake. The trigram "the magazine of" gets 394,000 ghits, and in 28 of a sample of 30, the of-phrase expressed the magazine's theme, e.g.:

the Magazine of Christian Unrest, the Magazine of Dog Powered Sports, the Magazine of the Military-Industrial Complex, the Magazine of Type and Typography, the Magazine of Speculative Transformation, the Magazine of the Eparchy of Newton, the Magazine of Roll Your Own Cigarettes, the magazine of The Swedish Homebrewers Association

The remaining two cases were instances of the idioms "X of choice" and "X of influence":

the magazine of choice for plastics professionals, the magazine of influence for glass industry leaders

So my vision of something like "PIRATE! the magazine of anarchy on the high seas" was linguistically probable, if sociologically unlikely. And I realized long ago that my estimates of sociological likelihood in today's world are not very sharp ones.

In the Atlantic's blurb, the two prepositional phrases don't really seem right when placed in the other order: "...part of his continuing coverage of anarchy on the high seas for the magazine." I think that this is a matter of the relative "heaviness" of the constituents -- "the magazine" is semi-anaphoric as well as short.

Here is a paper by Jennifer Arnold and Tom Wasow that discusses these heaviness effects in the case of two other constructions. Arnold and Wasow support the view that "these phenomena stem from constraints on production and planning" rather than syntactic structure or parsing effects. But I don't see how to make a production-difficulty theory explain the difficulty with the order "...part of his continuing coverage of anarchy on the high seas for the magazine" in this example. The writer can take all the time that (s)he wants to craft it -- the problem is that this reader, at least, still finds it odd.

The blurb writer has walked the sentence into a trap. The word order "coverage for the magazine of anarchy on the high seas" doesn't work because it's likely to be misparsed, and the order "coverage of anarchy on the high seas for the magazine" doesn't work because of heaviness problems. This seems to be one of those times when you just have to back up and try it from a different direction.

[Update: Tom Wasow emails to say:

I wouldn't want to say that weight effects are due entirely to the contingencies of planning and production. That would make it hard to explain their occurrence in edited written text. But past attempts to give explanations for weight effects have almost all been entirely in terms of parsing efficiency (notably, Hawkins's 1994 book, but a lot of other stuff, too). I'm sure that postponing long things helps in both production and comprehension. Jennifer Arnold and I have tried to tease apart the predictions of the two motivations for weight effects in several studies. To the extent that we've been successful, the predictions of the production model have been better. [...].

That said, I agree that there's nothing that could save the sentence you quoted from the Atlantic.

Uh, it was the Baltic, Tom :-)...]

Posted by Mark Liberman at 10:57 AM

Hitting that iceberg

A 4/16/2004 column in the Philadelphia Daily News by Paul Domowitch discusses the pro football prospects of quarterback Ben Roethlisberger. One of the issues is that he didn't start playing the position until his senior year in high school. Roethlisberger is quoted as spinning this in an optimistic direction:

"Quarterbacks that have been playing the position their whole lives, they get to a certain point and can't get much better," he says. "I'm just starting to hit the iceberg. I think I still have a lot of developing that I can do. I still believe I can get a lot better."

"Starting to hit the iceberg" is what you might call a "rarely used cliché." It seems to be a blend of the expression "starting to hit [one's] stride" and the notion that most of the mass of an iceberg is hidden below the waterline.

Mr. Roethlisberger is clearly the fond parent of the phrase -- unless it was written for him by an agent -- since the only other instance in google's index of "starting to hit the iceberg" is in an interview with him posted on the Chicago Bears site:

Q: Since you started so late as a QB, how much better can you get?
A: "That's the thing. A lot of people talk about quarterbacks who played their whole lives and can't get much better. I'm just starting to hit the iceberg and get going. I have a lot of development I can do, so I believe I can get a lot better."

As a fan of tragic drama as well as football, I'm rooting for "starting to hit the iceberg" to become established as a way to describe the early successes of a star's career.

Posted by Mark Liberman at 05:35 AM

600,426,974,379,824,381,952 ways to spell viagra

Actually, after correcting a mistake in the algorithm, there are 1,300,925,111,156,286,160,896. [link via Helen Anderson]

Perhaps this is an example of what the NRC's 1992 report on Computing the Future meant by saying that "new computer technology will have to be fitted to customer needs much more precisely".

Posted by Mark Liberman at 03:12 PM

Balm in Gilead

It bothers me that some humanists are so unscientific.

Consider the recent article by Camille Paglia, "The Magic of Images: Word and Picture in a Media Age", in the winter 2004 issue of the journal Arion. Paglia notes a trend (kids today can't focus on books or even on still images), identifies its cause (television, PCs and video games), and suggests a solution (iconology). I suspect that she's wrong about the trend, and I'm pretty sure that her ideas about the cause are nonsense. If so, this means that her solution is just a reasonable way to teach the history of art, rather than a recipe for the Salvation of Western Civilization.

The trend that concerns her is this: "Interest in and patience with long, complex books and poems have alarmingly diminished not only among college students but college faculty in the U.S." She offers a McLuhanesque analysis of alleged causes, replete with references to "physiological optics" and Hermann von Helmholtz. I don't think that this analysis makes much sense, but that's a topic for another post. The issue today is her claim of "cultural dissipation since the 1960s", where "the new generation, raised on TV and the personal computer but deprived of a solid primary education, has become unmoored from the mother ship of culture", and lacks "the most basic introduction to structure and chronology".

Paglia offers no evidence for this trend except personal anecdotes (her friends at SUNY Binghamton in 1965 "gathering impromptu at midnight for a passionate discussion of big, challenging literary works like Dostoyevsky's The Brothers Karamazov") and assertions of authority ("as a classroom teacher for over thirty years"). She's so sure that her own interpretation of her own experience is correct that she accepts it without examination, exemplifying what she calls "the foolish, belligerent confidence of my own generation, with its egomaniacal quest for the individual voice". Perhaps her friends at SUNY in 1965 were not typical of students then? or perhaps her impressions of those taking her classes at the University of the Arts are not typical of students now? or perhaps her relationship to students has changed as she's gotten older? If she's ever tried to answer questions like these, she doesn't mention it.

There has been a lot of commentary on Paglia's article -- technorati finds links in 43 sources -- and nearly all of it has been positive. For instance, the usually sensible Erin O'Connor at Critical Mass writes that

English teachers know her claims about our collective degraded relationship to language to be true. They see it in their students, who object to reading long things, who object to reading hard things, who never think to look up words or ideas they don't know, who struggle not only to perceive linguistic nuance but also to keep track of plot twists and character names, who cannot independently picture character and scene inside their heads, who cannot grasp the rhyme or reason of verse that is not free verse.

My first reaction to all of this is to wonder how to make it consistent with a piece of external evidence, namely the trend towards ever-longer popular novels. How could David Foster Wallace have become a best-selling author in 1996 with a 1079-page novel that includes nearly a hundred pages of pseudo-scholarly endnotes? Why has Neal Stephenson just cranked out a second 900-page tome on the adventures of 17th-century intellectuals? How is it that the first novel in Stephen King's Dark Tower series, published in 1978, is only 256 pages long in the "revised and expanded edition", while the seventh in the series, published in 2004, is 768 pages? What about Margaret George's recent 964-page "Memoirs of Cleopatra"?

Hasn't someone told these folks that "interest in and patience with long, complex books have alarmingly diminished"? The plural of anecdote is not data, but a systematic survey of the publishing industry could provide some evidence about whether readers are more or less interested in long, complex books now than they were 40 years ago.

My second reaction to Paglia's jeremiad is that it's not consistent with my own experience of students. As a college teacher and as the Faculty Master of a college residence, I find that some students have a lot of interest in long, difficult books; some have a little; some have none at all. Some students are fascinated by linguistic nuance and others could care less. Some have a terrific grasp of structure and chronology, while others seem to experience literature and life as a sort of impressionist blur. My memory of students in 1965 is that they were pretty similar, in these respects, to students today. That is, they were very diverse. I'm not confident that my own experiences support any differences in the relative proportions of types between 1965 and 2004, but in any case, there are big demographic and cultural differences between my two samples, never mind the difference in viewpoint.

The thing is, my subjective impressions don't define what the facts are, any more than Paglia's and O'Connor's do. One of the achievements of the human species has been to find ways to deal with disagreements like this, by assembling and evaluating intersubjectively valid evidence. That's the normal way to proceed in the sciences, and it's also the traditional goal of humanistic scholarship. However, it seems to have been abandoned by some intellectuals, in particular those like Paglia who aspire to be media stars on the basis of their "egomaniacal quest for the individual voice". This transformation has perhaps been assisted by the infusion into the humanities, via literary theory, of large doses of the kind of continental philosophy that regards "truth" as a matter of social convention, charisma and power.

Paglia's claims are not some inconsequential little thrown-off aperçu, whose validity doesn't matter enough to investigate. She says that TV and video games have ruined the brains of American young people, so that they have "degraded sensitivity to the individual word and reduced respect for organized argument", and "cannot sense context and thus become passive to the world, which they do not see as an arena for action". If that were true, it would be more important than AIDS, more important than cancer, more important than racism, more important than terrorism; and it would call for a more significant response than Paglia's remedy, which is to show slides of Byzantine icons and crystal skulls.

Personally, I happen to think that it's false. But if I thought it was true, I'd spend a lot of time and energy finding evidence to persuade others.

Paglia feels that "the rise of electronic media" has caused a "massive transformation in Western culture", so that "interest in and patience with long, complex books and poems have alarmingly diminished not only among college students but college faculty in the U.S."

What I take away from her article is that the influence of media attention and bad philosophy has caused a massive transformation in humanistic culture, so that interest in and patience with rational inquiry have alarmingly diminished among people like Camille Paglia.

Posted by Mark Liberman at 10:39 AM

We are all Big Brother

According to Ray Girvan, the credulous New Scientist story on "chatnannies" (follow the links here for a summary) has been pulled and replaced by a page saying "Serious doubts have been brought to our attention about this story. Consequently, we have removed it while we investigate its veracity. -- Jeremy Webb, Editor". Good for them. The BBC story has been pulled with no explanation. Shame on the beeb for being unwilling to admit error.

Ray also links to an excellent article by Charles Arthur in the Independent, who explains in a very clear way exactly what Ken Layne's expression "we can fact-check your ass" meant in this case. Arthur's conclusion:

While everyone is worrying about Google acting like Big Brother, they're ignoring the fact that it has democratised Big Brother and made it available to anyone. Imagine the telescreens in 1984 being able to see what anyone else was doing: the mendacious society depicted by Orwell couldn't have continued.

What Google and the other search engines do is like the Victorian concept of the panopticon, the prison in which every prisoner can be seen from a single place. But our existence now differs from both those concepts because we can use Google to watch each other. We are all Big Brother. The only secrets that remain are those that aren't yet on the web - and that's a pool of knowledge that is shrinking daily.

It's a bad time to be a member of a "shame culture".

And speaking of shame, the 404 from the BBC is not just because of some page relocation: searching the BBC News and BBCi sites yields a message that 'There are no websites that match "chatnannies".' Imagine their reaction if some politician tried this: endorsing an apparently nonsensical proposal, saying nothing for three weeks in response to serious doubts from credible sources, and then silently removing the proposal document and trying to pretend that the whole thing never happened. You'd think that a week after they pulled their story, they might have gotten it together to publish something about the whole sequence.

Posted by Mark Liberman at 11:47 AM

They have created a new world order

So says Whitney Pastorek in an article in the Village Voice. She feels that "while a lot has been made of the cultural implications of the Blogosphere, I am not convinced that anyone has taken the time to talk openly and honestly about the effects it is having on the day-to-day existence of the world's adult non-bloggers, or what I like to call The Way Blogs Are Ruining My Life."

She lists the problems under five headings:

1. No one shows up for anything anymore.
2. No one tells me anything anymore.
2a. No one has fights anymore.
3. No one invites me to anything anymore.
4. They have created a new world order.
5. Did I mention that blogs are ruining my life?

Read the whole thing. It reminds me a bit of how my mother used to complain -- twenty years before email became popular -- that because of cheap phone calls, no one wrote letters anymore. Though my mother never ended her complaints in such a dramatic way:

Listen. My name is Whitney Pastorek, and I do not have a blog. I am not on Friendster, I do not live in Williamsburg, and I do not think Death Cab for Cutie is a particularly great band.

But I exist. I am a good person, a good friend, and my thoughts and opinions have weight and merit. The bloggers do not control me -- they only control each other and massive amounts of bandwidth, which isn't even a real thing, just something made up by web-hosting companies to charge more! People! If you find yourself on the lower levels of the B.C.S., join with me in saying NO! NO to letting them diminish our self-worth! NO to letting them drag us out to flash mobs! Turn your faces to the sun! Stand and fight!

My mother, while complaining about the Demise of the Letter, nevertheless participated in the new long-distance telephone culture of the 1960s. She was reluctant at first -- through the 1950s, not only were long-distance calls prohibitively expensive, but we also had a party line, so that tying up the phone even for local calls was avoided as anti-social. But before long she was the center of an active network of telephone-mediated communication. Likewise, I'm inclined to feel that Ms. Pastorek, who seems overly sensitive to status indicators, should Get Over It and start to participate in the culture that she feels excluded from. There are no barriers, last I looked -- anybody can go to typepad or wherever and start a blog, if they want to.

But the fact that I feel obscurely bad about not having noticed Pastorek's article for a full six weeks is troubling, I'll admit, even though I've long since made my peace with the fact that the world is full of loops that I'm out of.

Posted by Mark Liberman at 10:43 AM

Nanotechnology in action goats

This continues our series on traditional uses of organic nanotechnology. I was fascinated to learn from Ray Girvan about argan oil, which is produced in southwestern Morocco from the nuts of the argan tree, after they have been eaten and excreted by tree-climbing goats.

"When goats eat the fruit, the fleshy part is digested but the nut, because of its hard shell, is excreted. Later, the nuts are collected by farmers to produce oil."

"The production of argan oil, which is still mostly done by traditional methods, is a lengthy process. Each nut has to be cracked open to remove the kernels, and it is said that producing one litre of oil takes 20 hours' work."

So either the oil is really good, or the residents of this part of Morocco have a lot of time on their hands. I guess that the metabolic economics come out positive: a liter of oil should be good for about 15,000 calories, and 20 hours of work at nut-processing, assuming it's spread over three days, might consume 7,500 or so. Still, it's a tough way to make a living.
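The back-of-the-envelope balance can be written out as a few lines of Python. This is only a sketch: both calorie figures are the rough guesses given above, taken as-is rather than independently checked, and the variable names are made up for the illustration.

```python
# Rough figures from the paragraph above, not measured values.
OIL_KCAL_PER_LITER = 15_000   # guessed energy content of one liter of oil
WORK_KCAL_PER_LITER = 7_500   # guessed cost of 20 hours' work over three days
HOURS_PER_LITER = 20          # from the quoted production estimate

net_kcal = OIL_KCAL_PER_LITER - WORK_KCAL_PER_LITER
return_ratio = OIL_KCAL_PER_LITER / WORK_KCAL_PER_LITER
net_kcal_per_hour = net_kcal / HOURS_PER_LITER

print(f"net gain per liter: {net_kcal} kcal")
print(f"energy out / energy in: {return_ratio:.1f}")
print(f"net gain per hour of work: {net_kcal_per_hour:.0f} kcal")
```

On those assumptions the work returns about twice the energy it costs, which is positive but thin once you spread the surplus over twenty hours.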

You can buy various "argan products" here, including a "natural beauty care serum" and a "tasty spread". The products are certified as organic by "Qualite-France", but for some reason they haven't thought to advertise them as the fruits of nanotechnology, despite the fact that the bacteria that help process the argan nuts are roughly 1 micrometer in size (e.g. Lactobacillus is about 1x2 micrometers, Pediococcus is about 0.8 micrometers in diameter), and their internal machinery is of course significantly smaller, well down into the nano range.

Posted by Mark Liberman at 09:33 AM

Oop's I did it again

Mark asks why he can't say Geoff Nunberg's Peter Trudgill's headline.

More specifically, he notes that:
  1. The pattern (X's Y)'s Z is unproblematic - famous examples would be His Master's Voice and Her Majesty's Secret Service.
  2. Marcel Gagne's User's Guide for Linux is also ok. That would be X's (Y's Z).
So what's wrong with Geoff's Peter's headline, which we can imagine being parsed like Gagne's User's Guide?

I guess it's just the same thing that's wrong with the unparsable his a headline. The possessive likes to take an expression that picks out a property and can function as a common noun, or at least an N' ("N-bar", things like yellow banana). Unfortunately Peter Trudgill's headline doesn't cut it as an N or an N' (* a/the/every Peter Trudgill's headline), whereas user's guide is a fantastic noun (a/the/every user's guide, not to be confused with the equally acceptable a/the/every users' guide.)

So it all seems very simple, modulo the fact that lots of people can't figure out the standard use of 's in English - Google finds 118 occurrences of Oop's I did it again. (A use of 's to mark the plural, plus reanalysis of oops as the plural of oop?)

However, the possessive construction is a slippery creature: you have to watch its every move. Oop's.

Posted by David Beaver at 04:47 AM

Another punning etymology

As SC has recently pointed out, medical science has finally redeemed all those bad puns about the Diet of Worms. According to this New Scientist article, this BBC story and this press release, a new German company named BioCure will soon be marketing "a drinkable concoction containing thousands of pig whipworm eggs", if the European Agency for the Evaluation of Medicinal Products approves. BioCure's "sister company BioMonde sells leeches and maggots for treating wounds", so they know the medicinal-uses-of-parasites business.

It's curious, by the way, that the worm cocktail, though developed by Joel Weinstock at the University of Iowa, and tested in clinical trials in the U.S., will be marketed in Europe -- none of the cited sources explains this.

According to the New Scientist article, "Weinstock's theory is that our immune systems have evolved to cope with the presence of such parasites, and can become overactive without them", resulting in autoimmune disorders such as ulcerative colitis and Crohn's disease.

When I was a youth in the wilds of rural Connecticut, our local pharmacist would pay us for gathering leeches. At a nickel per leech, this offered a welcome way to supplement our income while paddling around in cool ponds and river pools on hot summer days. At the time, I thought this was a sign of a quaint survival of traditional practices among the local Yankees, but apparently it was a harbinger of 21st-century biomedical science. The Germans probably breed their leeches in stainless-steel vats, alas.

But my point here, if I have one, is the punning etymology of diet. According to the OED, it's a conflation of derivations from the Latin dies "day" and the (etymologically unrelated and not cognate) Greek δίαιτα "mode of life":

Med.L. diēta had the various senses ‘day's journey’, ‘day's work’, ‘day's wage’, ‘space of a day’, as well as that of ‘assembly, meeting of councillors, diet of the empire’. The same senses, more or less, are (or have been) expressed by Ger. tag, and F. journée day. Diēta has therefore been viewed as a simple derivative of L. dies day, distinct from diæta, Gr. δίαιτα, DIET n.1 But it seems more likely that one or other of the senses developed from diæta was associated with dies, and led to the application of the word to other uses arising directly from dies. One of the senses given by Du Cange is ‘the ordinary course of the church’: this seems naturally transferred from δίαιτα, diæta, in the sense ‘ordinary or prescribed course of life’, which might be understood to mean ‘daily office’, and so lead to the use of diēta for other daily courses, duties, or occasions.

See this post for another example of punning etymology, in the case of pole.

Posted by Mark Liberman at 12:14 AM

The Naval Safety Center offers an amusing slide show of confusing signs. Some other (textual) favorites: "LEFT TURN MUST TURN LEFT"; "Parking for Drive-through Service Only"; "CAUTION NO WARNING SIGNS". The logical analysis of signage can be quite tricky, in fact, as discussed here.


Posted by Mark Liberman at 10:43 PM

Via in chains

The inventive q_pheevr uses world knowledge to construe Geoff Nunberg's headline example "Tribe Homer Barrage Salvages Split":

I think I can figure out Nunberg's example, although I couldn't have done it without being told the identity of the newspaper. Tribe has to refer to the Cleveland Indians, and so "Tribe Homer Barrage" is a large number of home runs hit by them, which means that Salvages must be the verb (aha!) and Split the direct object. So the headline can be paraphrased as "Large number of home runs hit by the Cleveland Indians rescues Croatian port."


Now here's another problem. The headline (as Geoff and Q both note) was originally discussed by Peter Trudgill, from whom Geoff learned about it. So perhaps I should have called it "Peter Trudgill's headline". But what I really wanted to write was "Geoff Nunberg's Peter Trudgill's headline".

Of course I can't, and didn't. But why is the typically blogophrastic expression "headline via Trudgill via Nunberg" fine?

Here's a typical example of this "chained via" construction:

Keeping Current Via Tenant Via Librarian in Black Via The Shifted Librarian

Just found this when I clicked to Jenny's post about the Librarian in Black's post about RSS which lead me to her note about Roy Tenant's article for Library Journal about keeping current with new technologies. Pretty darn cool.

I shouldn't really even call this a "construction"; it seems to be just a recursive use of via. So will someone please explain to me why I can't use recursive "X's Y's Z" the same way?

It won't make me happy to tell me that the normal parsing is "(X's Y)'s Z". In the first place, that just restates my question. In the second place, it ain't necessarily so. In "Marcel Gagne's User's Guide for Linux", for instance, it's the guide and not the user that is construed to be Marcel's.
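For what it's worth, the two bracketings at issue can be spelled out mechanically. This is only a sketch of the two groupings, not an analysis of which one English licenses; the function names are made up for illustration.

```python
def left_branching(items):
    """Build ((X's Y)'s Z): each new head is possessed by everything to its left."""
    phrase = items[0]
    for head in items[1:]:
        phrase = f"({phrase}'s {head})"
    return phrase


def right_branching(items):
    """Build (X's (Y's Z)): each possessor attaches to the whole phrase to its right."""
    phrase = items[-1]
    for possessor in reversed(items[:-1]):
        phrase = f"({possessor}'s {phrase})"
    return phrase


# Hypothetical inputs, taken from the examples discussed above.
print(left_branching(["Geoff Nunberg", "Peter Trudgill", "headline"]))
print(right_branching(["Marcel Gagne", "User", "Guide"]))
```

The second call gives the grouping in which the guide, not the user, is Marcel's.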

I'm probably just being dense. I have a really bad cold and probably shouldn't be allowed at the controls of a web browser, much less a finely tuned linguistic analysis. I'm certainly not capable of navigating the index of the Cambridge Grammar. So a little help will be much appreciated, thank you in advance.

Posted by Mark Liberman at 09:47 PM

Writers just want to be humiliated

Over and over. So says Robin Robertson, quoted by Dinitia Smith in a NYT review of his new book "Mortification: Writers' Stories of their Public Shame". Ms. Smith publicly trips over her own shoelaces in the third paragraph, where she writes:

The idea for the book came to him after he escorted authors to readings and finding "a sign saying 'Reading Canceled,' or three chairs occupied by people released from mental institutions and not thought to be violent," he said. [emphasis added]

There's a simple and common mistake here -- the highlighted "and" connects the tensed verb "escorted" and the gerund "finding". This seems like an error to me, and it's not a difference in dialect or degree of formality, as far as I can tell. It's just a mistake. I bet that the author would call it a mistake if asked about it.

I speculate that she originally wrote "after escorting ... and finding ...", or "after he escorted ... and found ...", and then changed one of the verbs without changing the other. Or maybe some distracted editor was responsible.

But I didn't write this to one-up Ms. Smith, or even to underline the irony of the error's context. I'd like to use this example to suggest an irony of a different order: an argument for rationalist epistemology of language from the richness of the stimulus.

The usual (poverty of the stimulus) line-up is something like this:

Rationalist: Much grammatical knowledge must be innate, or at least somehow independent of experience, because linguistic experience is simply too impoverished to define the emergent patterns.

Empiricist: No, if you look at the facts, you can see that linguistic experience is much richer than you thought. The patterns that are learned are plausibly the result of inductive inference applied to documented experience.

See this article by Geoff Pullum for a fuller discussion. As Geoff points out, the poverty-of-the-stimulus case depends on "hyper-learning" -- cases where things are "learned" about structures that have never been seen.

The factual argument, to the extent that there is one, then comes down to the question of whether a language-learner's experience includes certain kinds of examples or not. And generally the rationalists are rooting for less stuff in the data, while the empiricists are rooting for more.

But this argument has an inverse form, it seems to me: "hypo-learning", where learners ignore commonly-encountered structures. As common sense tells us (and construction grammarians variously document and explain), people are ready and willing to learn all sorts of crufty constructions, from "what's X doing Y" to "all your X are belong to us". And as Smith's little editing error exemplifies, conjunctions of incompatible constituents are pretty common. So why don't we all learn a construction consisting of a tensed VP conjoined with a gerundive VP?

I don't know -- but here's a speculation. People are so constituted as to prefer their grammars to be "coherent", in some a priori sense. Conjunctions of tensed and gerundive VPs are incoherent enough that we are reluctant to learn a grammar that licenses them. When we occasionally encounter examples of such structures, we tend to ignore them or write them off as mistakes.

When we go to look at the facts, this argument reverses the traditional allegiances: the empiricists should want "incoherent" conjunctions to be as rare as possible, so that learning requires no special expectations about "coherence"; the rationalists should want the opposite.

Posted by Mark Liberman at 02:54 PM

The sci.lang FAQ

Originally written by Michael Covington, and now maintained by Mark Rosenfelder, the sci.lang FAQ is well worth consulting. Because it emerged in the late Cretaceous, when giant newsgroups roamed the earth, some may regard it as a sort of living fossil, and may even be surprised to find that it is still very much alive. Recently-evolved denizens of the blogosphere may not even have heard of it. Here's some evidence, both of its relevance and its neglect.

Semantic Compositions asked (on 4/10/2004) for help in figuring out where linguists can be found in popular culture. Ryan Gabbard contributed The Sparrow; Bill Poser cited Digital Fortress and The Iceman; that's all that technorati knows about, but Uncle Jazzbeau adds several movies (Forbidden Planet, Sherman's March, Stargate, Atlantis) and links to the 1975 book Linguistics in Science Fiction.

The comments on SC's site add a few other examples: "Dr Ransom, the hero of C.S. Lewis's interplanetary trilogy", "Jasper Fforde's The Eyre Affair and Lost in a Good Book", "Star Trek: Enterprise and Hoshi Sato", "the ... Suzette Haden Elgin books: Native Tongue and its sequels".

Nobody seems to have thought to link to sci.lang FAQ 15, which answers the question "What are some stories and novels that involve linguistics?"

Wandering the corridors of the sci.lang FAQ, you might run across other little treasures, such as this page explaining how to estimate the probability of chance resemblances between words in unrelated languages (written by Mark Rosenfelder, the current maintainer of the sci.lang FAQ site). Check it out!

Posted by Mark Liberman at 09:55 AM

Kopi Luwak

Betsey Dexter Dyer's Field Guide to Bacteria has a chapter on "Gram-positive bacteria of foods and drinks", where I learned about Kopi Luwak, a kind of coffee whose beans "are fermented in the intestines of the civet cat paradoxurus hermaphroditus -- also called the luwak. These animals eat coffee beans and defecate them in a form considered to be enhanced."

The many internet sites dealing with Kopi Luwak don't try to hide its transmission channel -- as shown in the picture above. But they don't tell you what happens in the passage through the civet, referring delicately to "stomach acids and enzymatic action", "enzymes in the animals' stomachs", and so on. As Dyer explains, it's really bacterial fermentation, involving the same critters -- such as Lactobacillus, Pediococcus, and Leuconostoc -- that add flavor to Belgian-style beers, kimchi (where some 200 bacteria contribute, including Streptococcus faecalis), and other fermented foods.

My favorite comment on Kopi Luwak is from Smartest_gurl, age 10, at kidzword.com, who says: "These things are so true. I know it! These rare, tasty treats sound yummy. Even if they weren't true, I wish they were!!!"

I agree. But I hope that coffee roasting and brewing kills coronavirus.

[Update: Ray Girvan links to a University of Guelph study showing that Kopi Luwak "has lower bacterial counts than regular coffee". In itself this is not very reassuring, since regular (i.e. non-Luwak) coffee beans are freed from the mucilaginous pulp that surrounds them by a process of fermentation, i.e. bacterial digestion. It just happens in vats rather than in civet cats -- and presumably involves somewhat different bacteria and a different chemical environment. My health question was not with possible bacterial contamination, but with the reports of civet cats being a reservoir of SARS. That's in China, not Indonesia; and it may be a different kind of civet, I'm not sure.

Meanwhile, the guys at Guelph broke out a colorimeter, a scanning electron microscope, and an electrophoresis system in order to determine that Kopi Luwak beans are indeed somewhat different from your standard supermarket Colombian beans. Though more study is needed... I'm a big fan of the scientific method, but this reminds me of a spoof term paper I once wrote, many years ago, after having read one too many phonetics articles debunking the perceptions of phonologists. Using (actual) measurements of length, diameter and weight, I was able to show that there were no statistically significant differences between a sample of 20 carrots and a sample of 20 sausages. Indeed, there were subgroups of sausages that were more different from one another than any of them were from the carrots, and vice versa. If I'd had access to a colorimeter and a scanning electron microscope I certainly would have used them too. Electrophoresis might have spoiled the joke, depending on how one went about it.]

[Update #2: As so often happens when one journeys to strange places, one finds that Dave Barry (or someone who writes just like him) has already been there. Dave has tasted the stuff, and was not impressed, lactic acid bacteria or no lactic acid bacteria (Link emailed by Glen Whitman, who posted the article on his web site a few years ago).]

[Update #3: Look here, here and here for a fascinating three-part series on the practical microbiology of coffee fermentation. The home page of the author, Ken Calvert, is also well worth a look.]

Posted by Mark Liberman at 07:31 AM

It is the nature of a teenager to want to destroy

So says Michael Chabon in a NYT op-ed piece on the Jan Richman/Academy of Art brouhaha. This seems one-sided to me: too negative about teenagers, not negative enough about adults. His sarcastic conclusion:

Let teenagers languish, therefore, in their sense of isolation, without outlet or nourishment, bereft of the only thing that makes it all bearable: knowing that somebody else has felt the way that you feel, has faced it, run from it, rued it, lamented it and transformed it into art; has been there, and returned, and lived, for the only good reason we have: to tell the tale. How confident we shall be, once we have done this, of never encountering the ugliness again! How happy our children will be, and how brave, and how safe!

I agree with his anti-censorship conclusion, though his argument seems facile, especially since he starts from the premise that "We tend to view idealism and cynicism as opposites, when in fact neither possesses any merit or power unless tempered by, fused with, the other." Doesn't this argue for some intervention on behalf of positive values, to strike a balance?

Anyhow, this is an old argument. Goethe's Die Leiden des jungen Werther (The Sorrows of Young Werther) was said to have inspired many suicides after it was published in 1774, when its author was 25. Chabon is skeptical that "when, once in a great while, a teenager reaches for an easy gun and shoots somebody or himself" it would have made any difference "if we had only censored his journals and curtailed his music and video games". Isn't this too facile? Suppose that Young Werther really did inspire a rash of suicides across Europe, while undermining the Enlightenment and helping to start the Romantic movement. Should it have been suppressed on that account, as many proposed at the time?

All this is not entirely irrelevant to language. Some people think that adolescent identity formation plays as big a role in language change as the initial acquisition process does. But the impulses involved seem to be more creative than destructive. In any case, it's certainly fruitless to try to thwart them, though perhaps adult disapproval is an important part of the package, like pruning roses to improve their blooms.

Posted by Mark Liberman at 09:29 PM

Stick That in Your Parser and Stroke It

Geoff's example of a difficult-to-parse sports headline -- "Bonds Ties Mays" -- didn't actually strike me as the sort of thing most of us San Francisco Chronicle readers would have paused even a moment over. (After all, we've had a slot open for that header all week.) But it put me in mind of a genuinely parser-boggling headline that Peter Trudgill once mentioned to me. He was visiting Cleveland and saw this on the sports page of the Plain Dealer:

Tribe Homer Barrage Salvages Split

Peter said he took the page back with him to England and posted it on the door of his office, challenging the students in the linguistics program to decipher it (or even confidently identify the verb). He told me he had no takers.

Posted by Geoff Nunberg at 08:39 PM


Computational linguists sometimes try to make up short sentences that would give a parser a lot of trouble by virtue of ambiguities and other seeds of confusion. TIME FLIES LIKE AN ARROW was a famous invented case. The San Francisco Chronicle headline today struck me as being just about as difficult a natural example as I'd ever seen, except for people who had exactly the right background information at the ready:

Bonds Ties Mays

Junk bonds? Neck ties? Aprils, Mays, Junes? What's going on here? The first and second words could be either plural nouns or singular-inflected verbs. The third can be either a month name or a modal verb, but in neither capacity does it normally have an "S" on it... One can see how the parser might gag.
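To make the gagging concrete, here is a minimal sketch that enumerates the tag sequences a dictionary-driven tagger would have to choose among. The three-word lexicon is a toy assumption built only from the ambiguities just described, not any real parser's dictionary; NNS and VBZ are the usual Penn Treebank labels for plural noun and third-person singular verb, and "?" marks the word the lexicon cannot place.

```python
from itertools import product

# Toy lexicon (an assumption for illustration only).
# NNS = plural noun, VBZ = 3rd-person singular present verb,
# "?" = no ordinary reading: neither the month nor the modal takes an "S".
LEXICON = {
    "bonds": ["NNS", "VBZ"],
    "ties": ["NNS", "VBZ"],
    "mays": ["?"],
}


def tag_sequences(words):
    """Return every combination of tags the toy lexicon allows for the words."""
    options = [LEXICON[w.lower()] for w in words]
    return list(product(*options))


readings = tag_sequences("Bonds Ties Mays".split())
for reading in readings:
    print(reading)
```

Nothing in the lexicon picks out the noun-verb-name reading; that choice has to come from knowing who Bonds and Mays are.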

Somewhat to my surprise, I got the correct interpretation instantly, but then I live in the greater San Francisco Bay Area. I think some natural language processing systems and some non-Americans might have had a few CPU seconds of trouble with it.

For sports fans, the huge number "660" beside the headline told all. But for the benefit of those in England or Australasia, and the NLP systems who read Language Log, and those who possess even less knowledge about sport than I do, the key is that baseballer Barry Bonds has just hit his 660th home run, so he is now tied with his godfather, Willie Mays, at a number that only the legendary Babe Ruth and Hank Aaron have ever exceeded in the history of baseball. A lot of people had been waiting with bated breath for this to happen as Bonds lingered on the brink at 659 home runs, and for them -- and for you, given the information I just supplied -- pragmatics rides to the rescue. As it so often has to do.

Posted by Geoffrey K. Pullum at 06:29 PM

Interface-off: Nielly vs. LL Spool J

Having found something language-relevant to report concerning Jakob Nielsen, I reckon that I'm entitled to point our readers towards this delightful comic strip by Tom Chi and Kevin Cheng. Linguistics needs better cartoonists, as we've suggested before. I wish our field had some as good as these guys.

When you're done with Ain't Nothing but a UCD Thang, check out Content to be Stylish.

Posted by Mark Liberman at 06:18 PM

The OED: never leave home without it

Well, I should have known better than to comment on a striking word usage without checking both Google and the OED. I've criticized others on this score, and now I'm caught myself.

When Fernando wrote in this morning about "hackers lurk through holes in hot spots", I did google "lurk through", as Fernando had done. But being almost late for my 9:00 a.m. class, I left out the OED. And of course it turns out that "[n]ot only is lurk through not a malapropism, it's not even a neologism", as Q_pheevr at "A Roguish Chrestomathy" has pointed out, citing a "delightfully alliterative example" from Piers Ploughman (1393) of "lurking through lanes." Well, "[l]orkynge þorw lanes", actually, but you get the idea. Q also observes that "hackers lurk through holes in hot spots" is a "charmingly alliterative line of trochaic tetrameter", suggesting that USA Today "does at least seem to employ a poet or two in its headline-writing department".

Posted by Mark Liberman at 05:57 PM

Mind reading experiments at the University of York

Last fall, I wrote about why public cell phone conversations are annoying, and speculated that "[t]he louder a conversation is, the more intrusive and annoying it is if you don't care to listen in. The thing is, though, a given cell phone conversation seems much more intrusive and annoying than an equally loud live conversation." Yesterday, Jakob Nielsen reported a study by Andrew Monk and others from the University of York in the UK that confirms this speculation:

The researchers staged one-minute conversations in front of unsuspecting commuters who were either riding a train or waiting for a bus. In half the cases, two actors conversed face-to-face while seated next to a potential test participant. In the other half, a single actor talked on a mobile phone while seated next to a potential participant.

Furthermore, the actors conducted half of the conversations at a normal loudness level, whereas the other half were exaggeratedly loud (as measured on a volume meter). The actual content and duration of the conversations were the same in all conditions.

On average, the bystanders found the extra-loud conversations more annoying than the regular-volume ones (2.7 vs. 1.7 on a scale of 5), but they also found the cell phone conversations more annoying than the face-to-face ones (2.8 vs. 1.6). The cell phone effect was bigger than the volume effect. (Andrew Monk, Jenni Carroll, Sarah Parker, and Mark Blythe: "Why are Mobile Phones Annoying?" Behaviour and Information Technology, vol. 23, no. 1, 2004, pp. 33-41.)
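The comparison in that last sentence comes down to two subtractions. A quick sketch, using only the four mean ratings quoted above (the variable names are mine, and this compares raw differences of means, not a significance test):

```python
# Mean annoyance ratings quoted above, on a scale of 5.
loud, normal_volume = 2.7, 1.7
cell_phone, face_to_face = 2.8, 1.6

# Round to one decimal place to avoid floating-point noise in the differences.
volume_effect = round(loud - normal_volume, 1)
phone_effect = round(cell_phone - face_to_face, 1)

print(f"volume effect: {volume_effect}")
print(f"cell phone effect: {phone_effect}")
```

On these figures the cell phone effect comes out two-tenths of a scale point larger than the volume effect.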

The annoyance levels seem pretty low all around -- I guess it's that famous British tolerance.

I also suggested an explanation for the effect, namely that "public cell phone users are annoying because mind-reading is hard work". In a technical sense of "mind-reading", of course. I even speculated on the neurological mechanisms:

When you're sitting in a restaurant or a railroad car, hearing one side of a cell phone conversation, you can't help yourself from trying to fill in the blanks. And after a few seconds of this, your paracingulate medial prefrontal cortex is throbbing like a stubbed toe.

Unfortunately, the York group did not do fMRI scans.

Posted by Mark Liberman at 05:01 PM

Good advice about advice about usage

Pedantry lays out a reasoned defense of prescriptivism, in response to my Field Guide to Prescriptivists. What he has to say is well worth reading. Details aside, I don't disagree with any of it, and it made me realize that my earlier post is easy to misunderstand. I started with the observation that not all advice about usage is bad, while complaining that too much of it is "logically incoherent, factually wrong and promptly disobeyed by the prescriber". The body of my post, after some fluff about bacterial DNA, was an attempt to offer dimensions for characterizing all sorts of arguments about usage, including those I happen to agree with as well as those I don't (and those I'm not sure about).

The dimensions I suggested are orthogonal to questions of fact. My aim in the "Field Guide" post was to clarify the logical form of arguments about usage, as distinct from arguments about the facts of language. An argument of a given form might be valid or invalid on the facts of the case. For example, a surprising number of college students believe that speakers of non-standard varieties are producing a mistaken version of standard forms, exactly like someone who makes mistakes in arithmetic. This is a particularly naive and ignorant form of the "Universal grammar" argument. But not all "Universal grammar" arguments are invalid -- the many overnegation phenomena that we've surveyed are plausibly examples where some commonly-found forms are simply wrong on logical grounds.

Similarly, a factually valid argument can lead to different conclusions depending on language-external judgments. Thus "g-dropping" in many non-standard dialects of English is a residue of an older pattern, keeping separate two morphemes that the standard language has conflated. A g-dropping speaker might decide to conform to the norms of the standard language, or retain the original pattern -- that's exactly like the choice about whether to wear a western string tie or a standard-issue necktie. It's a question of fashion and context, not a question of logic. The implications of tradition are if anything on the side of the non-standard forms, in this case.

Pedantry concludes that "language in its social context has normative elements that we can not ignore. It would be better to embrace them and make our prescriptivism rational instead of leaving it to nonsense merchants in the Times." I agree: the American Heritage Dictionary's usage notes, edited by Geoff Nunberg, are one good model for rational prescriptions about usage; the many normative observations in the Cambridge Grammar of the English Language, co-edited by Geoff Pullum, are another. However, prescriptivism has been so regularly associated with idiocy for so long that you need to choose your words carefully.

Posted by Mark Liberman at 01:27 PM

Lurking through holes

Fernando Pereira emailed to forward Leslie Kaelbling's observation of another apparently mixed-up headline in today's papers: "Hackers lurk through holes in hot spots". But Fernando points out that Google finds 1070 instances of "lurk through", and remarks "so much for our intuitions about 'lurk through'".

My first thought was that the cited headline is a malapropism for "leak through". After reading the story, I guess it's a sort of blend of "leak through" and "lurk throughout". Most of the google hits for "lurk through" seem to be genuine compositional uses of "lurk" and "through" (in the sense of "throughout", "along", "during" etc.).

By contrast, the several examples of "ignite a torrent" that Google finds seem to be ordinary mixed metaphors, caused by the use of one or both of the (content) words in extended senses that have been bleached of their original content. Thus when someone writes that "PUHCA repeal is expected to ignite a torrent of utility 'mega mergers'", they just mean to say "start a large number (of mergers)" in a more vivid way, and the implicit mixture of fire and water doesn't occur to them at all.

There's nothing really wrong with this, except that if the writer's language is more bleached than the readers' is, the readers get distracted (and amused). The word muscle originally meant "small mouse", but it doesn't bother us when someone writes a headline that says Intel adds more muscle to Xeon MP. No one makes fun of them for talking about adding mice to the cache, because only scholars and pedants remember the word's source.

Posted by Mark Liberman at 08:43 AM

A Field Guide to Prescriptivists

Like everyone interested in language, we here at Language Log spend a lot of time countering bad advice about usage -- for example here, here, here, and here, to pick at random from the past few weeks. Not all language advice is bad -- the many usage notes in the American Heritage Dictionary are generally excellent, for instance -- but many prescriptive strictures about language are "logically incoherent, factually wrong and promptly disobeyed by the prescriber", as I put it recently.

Bad linguistic advice is not only common, it's also confusingly diverse. Over the past few weeks, I've been picking away at an analysis of this diversity, under the working title "A Taxonomy of Prescriptivists". But aside from sounding like one of those tiresome made-up collective nouns, this title presumes that prescriptivists can be divided and subdivided according to their several kinds, like species of higher animals. I've recently been reading a wonderful book -- A Field Guide to Bacteria, by Betsey Dexter Dyer -- which has reminded me that classificatory features -- even genetic ones -- need not be distributed in a tree-structured way.

Bacteria are hard to classify on the basis of how they look: "most bacteria are tiny rods... many of the rest are tiny spheres." As a result, "metabolism... has traditionally been the primary characteristic for distinguishing them." More recently, DNA comparisons have revolutionized ideas about bacterial classification. However, there's a problem:

"Bacteria are extraordinarily promiscuous with their DNA. By a process called 'horizontal transfer', DNA sequences can be exchanged among different species and even among different kingdoms. Horizontal transfer of DNA is not only possible, but it is apparently carried out readily... bacteria can acquire DNA sequences not only from each other but from humans, for example... Some microbiologists (such as Sorin Sonea and Maurice Panisset) have suggested that there are really no bacterial species at all but rather a sort of continuum of flowing genes over a huge amount of space and time. At any given point we have a snapshot that gives us the illusion of taxonomic groups because exchanges occur most easily between similar bacteria and less easily between more distantly related groups." (Dyer, p. 10)

Like bacteria transferring genes, prescriptivists -- whether sensible or idiotic -- mix and match ideas about usage. The resulting distribution is far from random: different prescriptive memes are more or less compatible with one another, and with other aspects of critical morphology, ideological metabolism and intellectual history. However, the result is not a nice Linnaean taxonomic tree either.

I don't think anyone can yet plausibly claim to have found memetic DNA, if such a thing is even possible. However, we can identify some key elements of prescriptivist metabolism, in terms of five different motivations that may be given for strictures about usage:

1. Tradition -- how our forebears talked. Innovation is degeneration.

2. Fashion -- how an admired group talks. Deviation is alienation.

3. Rationality -- how one ought ideally to talk. Inconsistency is illogical.

4. Standards -- how we should agree to talk. Variation confuses communication.

5. Revelation -- how God taught us to talk. Alteration is transgression.

Particular cases are usually a mixture of these. Such metabolic processes may cooperate or conflict depending on details -- thus an appeal to fashion may point in the same direction as an appeal to tradition, or in the opposite direction, depending on whether the prescriptivist admires the old ways or prefers the latest thing.

Not all classificatory features of the ideological metabolism of prescriptivism deal with justification. Some have to do with ideas about the nature of human nature and human history, which usually come in superficially inconsistent pairs:

1. Linguistic original sin: Natural behavior is irretrievably incoherent and lawless. Only by careful adherence to explicit rules, explicitly learned, can well-ordered speech and writing be approached.

2. The noble linguistic savage: Unmonitored vernacular speech is ipso facto correct and appropriate. Formal language is artificial, inconsistent and rife with hypercorrections.

3. The Four Ages (Gold, Silver, Brass, Iron/Clay). There are historical peaks and valleys in the quality of culture, including language. People who express this meme usually think that the recent historical direction has been downwards, so that their own time is a valley.

4. The march of progress. Like everything else, language and its use in communication get better over time, due to cultural innovation, competition and technology.

This post is already long enough, and I have a class to prepare. Later on I'll try to apply these classificatory features to particular (old and new) examples of prescriptive practice, identifying aspects of appearance, habitat and smell that can be used for field identification. An interesting survey of prescriptivist history (in the case of English) can be found in the front matter (pp. 7a-11a) from Webster's Dictionary Of English Usage (1989) by E. Ward Gilman, available on line here.

Posted by Mark Liberman at 07:47 AM

Mixed metaphor of the month

This award goes to the NYT headline writer who titled Felicia Lee's article on the Lauren Slater controversy: Book's Critique of Psychology Ignites a Torrent of Criticism.

[Update: Daniel Ezra Johnson emailed to point out that the etymology of torrent is from Latin torrere "scorch, burn" via French torrent. Well, OK, but all of the OED's meanings and citations, from 1398 forward, are about rushing water or senses derived therefrom. Sometimes, contra Wittgenstein, etymology is not really destiny.]

Posted by Mark Liberman at 06:42 AM

Philosophical due diligence?

The NYT has noticed the fuss about Lauren Slater's veracity, after Princeton philosopher Peter Singer totally missed it in his 3/28/2004 review of her new book Opening Skinner's Box. Singer's failure is a curious one. He himself notices in his review that "Slater makes some errors that made me wonder about her accuracy in areas with which I am not familiar."

Information about many of the serious allegations about Slater's content was easy to find on the web a month ago; I added some links in passing in my 3/30/2004 note. Essentially all of this information was available well before Singer's review appeared. He may have written the review before Deborah Skinner's 3/12/2004 Guardian piece was published, or this 3/16/2004 story about her lawsuit against Slater, but was it before Ian Pitchford's 3/2/2004 posting on psychiatry-research, or the late-February weblog posts by folks like Rivka? Even if the review was written months before, couldn't Singer have contacted the Times to arrange to add a note about the serious charges against Slater, which had been public for several weeks when the review appeared?

As a professor at Princeton, Singer doubtless knows how to research a subject; since he's a best-selling author, I suppose that he has assistants who can do it for him; this is supposed to be an area of special expertise for him, so we might have expected him to have some background knowledge even before reading the book; I found the serious charges I cited just by idly googling "Lauren Slater". Was this really "due diligence"?

Singer is famous for his controversial positions on the ethics of euthanasia, animal liberation and inter-species sex. This whole thing makes me suspicious of his ability to get the facts straight, or at least of his degree of interest in doing so. And surely in making ethical judgments, the facts should matter.

[Update: there is a fascinating, no, hair-raising series of posts on Slater at Ron Hogan's Beatrice.

And new thoughts and links from Rivka at Respectful of Otters.

Michael Miller has pdfs of letters from various involved parties, and links to relevant newspaper and journal articles. ]

Posted by Mark Liberman at 06:37 AM

April 12, 2004

Grammar education: making up for a lost century

I see that Language Logger Geoff Pullum is giving a talk at Northwestern University on Friday 4/23/2004, under the title "What Happened to English Grammar?" I heard Geoff give a version of this talk last fall at Penn, and it's terrific. He's posted a few fragments of this material here in the past, and I hope we'll get a more complete version in future posts. Meanwhile, I've copied his abstract below -- and if you're in the Chicago area on April 23, go to Swift Hall, Room 107, at 3:30 p.m. and hear him.

Try to imagine biological education being in a state where students are taught that whales are fish because that is judged easier for them to grasp; where teachers disapprove of tomatoes and teach that they are poisonous (and evidence about their nutritional value is dismissed as irrelevant); where educated people accuse biologists of "lowering standards" if they don't go along with popular beliefs. This is a rough analog of where English grammar finds itself today. The state of relations between the subject as taught by the public and the subject as understood by specialists is nothing short of disastrous. The fact is that almost everything most educated Americans believe about English grammar is wrong. In part this is because of misconceptions concerning the facts. In part it is because hopeless descriptive classifications and antiquated theoretical assumptions doom all discussion to failure. Amazingly, almost nothing has changed in over a hundred years. The 20th century came and went without affecting the presentation of grammar in popular books or the teaching (what little there is of it) that goes on in schools. Today's grammar books differ in content only trivially from early 19th-century books. In this lecture I name and shame some of those on the long dishonor roll of myth-creators and fear-mongers (John Dryden, Henry Fowler, Ambrose Bierce, William Strunk, E. B. White, George Orwell, Louis Menand, Stanley Fish), and I sketch a view of what could and should be taught in a course on the grammar of Standard English in the 21st century.

Posted by Mark Liberman at 10:48 PM

Such the electric pair

I heard Susan Werner at The Point a couple of days ago. In addition to material from her thoughtfully-named CD I Can't be New, she did some other songs, one of which shocked me. Morphosyntactically and psycholinguistically, that is.

Here's the story. Back in December, responding to a question from Rosanne at the X-bar about sentences like "She is such the smart girl", I remarked that "I certainly don't think I ever say or write things like that, and I don't have any memory of having heard or read them either." Of course, a little internet search turned up plenty of convincing examples. My conclusion: "Live and learn".

What took me aback on Friday at Susan Werner's gig in Bryn Mawr was a line in her performance of Much at All, a song that was on her 1995 CD Last of the Good Straight Girls. I know that I've heard her perform it live at least once, probably in 1996 or so, and I'm sure that I've heard it on the CD several times since then. In fact, it's a song I know well enough that I can recognize it after hearing a few bars.

I like the sound of quiet with my coffee
I like the look of nothing on the wall
I like to read myself what's in the morning paper
I guess that I don't miss you much at all

I never shared your passion for the city
I never really cared for basketball
I seem to get along without the new New Yorker
I guess that I don't miss you much at all

But oh the days when we were lovers
Such the electric pair
Doomed in a way like all the others
Classic affairs are rare and fleeting

I must have missed the changing of the seasons
Well I've seen enough of New England in the fall
I've seen enough of anywhere we were together
I guess that I don't miss you much at all

Doomed in a way like all the others
Classic affairs are rare and fleeting

I'm staring out the window in the kitchen
I'm leaning on the railing in the hall
I'm trying for the life of me to do some living

It's good how I don't miss you much at all

Memory is a funny thing, isn't it?

Posted by Mark Liberman at 08:03 PM

Terrasyllable Introspectionist Gelded: ungraciously nescient fennel feels nagging remorse

entangledbank writes in a frustrated mood that (s)he can't rely on websearches to check for scanning errors in the public-domain Webster 1913 dictionary, because "[t]here are now two thousand indistinguishable porn sites all having massive imports of random dictionary words, for some spider-attracting or filter-avoiding purpose. This means any obscure word whatsoever, even the scanning errors, is now largely unsearchable..."

This is not just an issue for checking word lists. It's a real problem for otherwise wonderful google-sampling techniques in empirical studies of syntactic variation, as discussed in this post, in which 65% of one crucial sample turned out to be porn- or gambling-site pseudotext. In principle, one can easily create a statistical classifier to distinguish such sites from "real ones", as you can see by looking at the examples in EB's post and mine. But this is just one stage in an on-going arms race between the web indexers and the internet demimonde (the demiweb?), and so the whole thing will have to be redone again and again. As things stand, there is no real alternative to human inspection of the samples, at least on a sampled basis.
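To make the classifier idea concrete, here is a minimal sketch in Python. It is not a trained statistical classifier, just a single hand-built feature: real English prose is full of function words, while dumps of random dictionary words contain almost none. The word list, the 0.15 threshold and the function names are all illustrative assumptions, not anything taken from the posts cited above.

```python
import re

# A small hand-picked set of common English function words.
# This list is an assumption for illustration, not a standard inventory.
FUNCTION_WORDS = frozenset(
    "a an and are as at be but by for from has have in is it of on or "
    "that the this to was were which with".split()
)

def function_word_ratio(text):
    """Return the fraction of tokens in text that are function words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in FUNCTION_WORDS)
    return hits / len(tokens)

def looks_like_pseudotext(text, threshold=0.15):
    """Flag text whose function-word ratio falls below threshold.

    The threshold is a guess; a real classifier would be fitted
    to labeled samples and re-fitted as the spammers adapt.
    """
    return function_word_ratio(text) < threshold
```

On this measure the title of this post counts as pseudotext, and the first sentence of the paragraph above does not. A feature this simple is exactly the kind of thing that would stop working one round later in the arms race, which is why hand inspection of samples remains necessary.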

This particular tale has a twist. My post on the perils of googlesampling ended with a challenge: "I'm waiting for someone to point out to me that [an apparently novel construction] was used by Winston Churchill, Jane Austen, William Shakespeare and even the author of Beowulf."

Geoff Pullum looked around on his bedroom bookshelf (or perhaps it might have been his laptop's hard drive), and found an 18th century example in Fanny Hill. Advantage: Pullum.

Posted by Mark Liberman at 06:54 PM

Linguists in Pop Culture

Semantic Compositions and The Audhumlan Conspiracy have been discussing linguists in pop culture. A while back I talked about one example that they haven't mentioned, the techno-thriller Digital Fortress. Another is the 1984 movie The Iceman, which is still available. The Iceman is a Neanderthal, whose frozen body is discovered by explorers. When they thaw him out, he proves to be still alive. A "linguist from MIT" is brought in to study his language, which of course, after 40,000 years, is unfamiliar. The linguist is shown keeping an eye on a device labelled a "pitch/stress meter" as she analyzes the Iceman's speech. Curiously, she looked a lot like Professor Judith Thomson, a philosopher at MIT, where Linguistics and Philosophy are branches of the same department. As it happens, her work is in ethics and metaphysics, areas that have little to do with linguistics.

Posted by Bill Poser at 12:12 AM

Language and the War Effort

The recent concern over US preparation to deal with terrorism reminded me of a couple of stories from another era. During World War II my mother was a student at Hunter College. There was a policy that called for all students to study something related to the war effort. If your major was something useful like engineering you were all set; otherwise you had to add something appropriate. My mother was an English major, which was not useful to the war effort. Since she had a good knowledge of French, her war-related course ended up being French shorthand.

I'm not sure what the plan was. I have visions of stenographers parachuting into occupied France to assist the Resistance with its paperwork, and of brigades of stenographers marching to join the Free French, rapidly writing "Lafayette, nous sommes arrivées." Mom was never called to serve.

I heard another story from the late Professor Edward Wagner, who learned Japanese in the Army. A friend of his was in the Chinese program, which finished earlier. The utility of Japanese was clear, but since the United States was not sending troops to China, they wondered to what use the Army would put a Chinese-speaking soldier. Professor Wagner soon received a letter from his friend, who was now stationed at Fort Leonard Wood in Missouri. He reported that his assignment was to train mules to respond to commands in Chinese, so that they could be sent to the Chinese forces.

Posted by Bill Poser at 11:14 PM

A deep love of dairy products

Today's Doonesbury meditates on how "accents can reveal longings and aspirations".

The strip for April 9 discusses aspects of the meaning of sorry previously discussed by Geoff Pullum here. This is not the first time we've noticed Trudeau's tracks in our snow, to paraphrase what John Dryden said of Ben Jonson -- "you will pardon me therefore if I presume he lov'd their fashion when he wore their cloaths". Turning the compliment around, the Language Log marketing department is contemplating focus groups on an enterprise in the field of Linguistic Comix.

Posted by Mark Liberman at 08:54 PM

Grammar n-grams: that example-sentence smell

Ryan Gabbard at the Audhumlan Conspiracy used the slogan Join the Campaign for Interesting Example Sentences! to link to a post by Rachel Shallit, where Rachel cites Geoff Pullum's essay on allegedly libelous example sentences, and then criticizes herself for producing example sentences that defame no one, but "positively drip with boringness."

Now, I'm by no means a corpus fetishist, but this is one of several reasons that it's often a good idea to find examples rather than invent them. Real-life writing, like real-life speech, has a texture that's hard to fake.

Rachel quotes some of the allegedly libelous sentences that Geoff wrote about. Here are the first two:

  • "I heard John say that Bill was a coward" - defamatory of Bill
  • "John sent Mary the time bomb" - defamatory of John

Defamatory or not, these are not exactly exciting. They smell like grammar examples, somehow. This is not just a fantasy of mine: if I ask Google about the three-word sequence "John sent Mary", every single one of the 59 hits is a linguistics page; for "sent Mary the", 3 of the first 10 hits are grammar examples! Considering how small a fraction of the web is devoted to linguistics, that's extraordinary. For the four-word sequence "John say that Bill", every one of the 19 examples that Google knows about is from a linguistics page.
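For anyone who wants to try the same test on their own example sentences, here is a minimal Python sketch that lists the word n-grams of a sentence, ready to be pasted into a search engine as quoted phrases. The function name and the whitespace tokenization are my own assumptions; the hit counts above were gathered by hand.

```python
def word_ngrams(sentence, n):
    """Return the n-word sequences in sentence, in order of occurrence.

    Tokenization is plain whitespace splitting, which is enough for
    short example sentences without punctuation.
    """
    words = sentence.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# The trigrams of one of the example sentences quoted above.
for phrase in word_ngrams("John sent Mary the time bomb", 3):
    print('"%s"' % phrase)
```

The first two phrases it prints are "John sent Mary" and "sent Mary the", the sequences searched for above.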

If we look at some of the few real examples from this haul, it's instantly clear that we're in a different world:

(link) Morton, George Douglas, Ker of Fawdonside and Ruthven fled into exile in England but made sure that they sent Mary the bond in which Darnley had fully incriminated himself.

(link) Bert sent Mary the following telegram: "we would furnish the Frotho delivered to your bar for $550."

Of course, such random examples are generally too long, but you can usually trim your catch to fit on one line without completely losing the sense that it's a connection to the real world rather than a perfunctory re-arrangement of well-worn game tokens, e.g.

Ruthven sent Mary the bond in which Darnley incriminated himself.

Or with a little more work, you can find even better short examples, which are still a glimpse of another mind rather than just words on a page. Since this is National Poetry Month, here are a few dative-relevant phrases from LION:

He shall send a committee to England (Walt Whitman)
This gave me that precarious gait (Emily Dickinson)
God gave a loaf to every bird (Emily Dickinson)
He hath not told his thought to the King? (William Shakespeare)
I pray thee then deny me not thy aide (John Milton)
Show me the pain (Elizabeth Barrett Browning)

This is not a new idea. When I studied Latin as a kid, the example sentences in our grammar books were like little museum dioramas for me. Look at the examples in this section of Allen and Greenough on indirect objects with transitives, for instance

dabis profecto misericordiae quod iracundiae negavisti; (Deiot. 40), you will surely grant to mercy what you refused to wrath.


equo ne credite (Aen. 2.48) , put not your trust in the horse.


[Cf. non quo haberem quod tibi scriberem (id. 4.4A), not that I had anything to write to you]

Hale and Buck's Latin Grammar is not on line (as far as I know), but its section on uses of the dative includes examples like:

haeret lateri letalis harundo, the deadly shaft sticks in the side ; Aen. 4, 73.
pugnabis amori? shall you struggle against love? Aen. 4, 38.


defendit aestatem capellis, wards off the heat from my goats ; Carm. 1, 17, 3.

So if you want interesting example sentences, lose the grammatical n-grams and look in Emily Dickinson. Or even take random stuff off the internet. It'll be more interesting for your readers, and you might find something new in the process.

Posted by Mark Liberman at 07:17 PM

An internet pilgrim's guide to stranded prepositions

In linking to Eddie V. O.'s discussion of Spanglish at Romanika, I wondered whether there is any Spanish parallel to Ruth King's finding of preposition-stranding in (some varieties of) Canadian French, and also told a story about a "traditional folk remedy" called bibaparú, which turned out to be Vicks Vapor-Rub. Eddie has responded with some additional information on both points.

I was going to comment further on Eddie's examples of clause-final prepositions in Mexican Spanish, which are not instances of preposition stranding (as he points out), but might be a related phenomenon. However, I realized that many readers may not be clear about what this terminology really means. And never mind interesting innovations in North American French and Spanish -- everyone who writes English needs to understand what "preposition stranding" is, if only for self-defense against misguided copy editors. So here goes.

"Preposition stranding" refers to cases where the object of a preposition has apparently "moved" to some other location in the sentence, leaving the preposition "stranded". It's easy to google up some examples:

1. I am grateful to the women I have spoken to [ ] since the operation
2. Her father had a similar problem that he simply lived with [ ].
3. My great-grandfather was a collector of comics and baseball cards, which we used to fight over [ ].
4. Where does bacon come from [ ]?
5. Which analysts is he talking about [ ]?

These examples are relative clauses or questions where a questioned word or the head of a relative clause is implicitly related to a sort of "silent pronoun" (indicated by open square brackets in the examples above) following a preposition that has been "stranded" in its expected place in the clause.

[Note that preposition stranding occurs in other constructions as well, such as passives: "The region was fought over [ ] by Sweden and Russia for centuries"; and "hollow clauses": "The customer service department was difficult to deal with [ ]". ]

In the relative clauses and questions, an alternative would be to "move" the preposition to be adjacent to the fronted question word or relative pronoun. Here are the same five examples with fronted prepositions -- note that the relative pronoun (here whom or which) might have to be added, since it may otherwise be omitted:

1a. I am grateful to the women to whom I have spoken [ ] since the operation
2a. ?Her father had a similar problem with which he simply lived [ ].
3a. ?My great-grandfather was a collector of comics and baseball cards, over which we used to fight [ ].
4a. From where does bacon come [ ]?
5a. About which analysts is he talking [ ]?

Haj Ross named this process "pied piping", conjuring an image of the wh-word luring the preposition out of its original position, just as the Pied Piper lured the rats and children out of Hamelin. Preposition-stranding is scorned by some prescriptivists, even though it has been used by well-respected writers for centuries.

In fact, in random examples like those above, pied piping is often awkward and pretentious-sounding. In cases like (2a) and (3a), where the preposition is closely associated with the verb, pied piping is particularly inappropriate. The Cambridge Grammar of the English Language (CGEL) goes as far as to suggest that preposition-fronting is ungrammatical when the preposition is specified by the verb (p. 275), as in "*the letters across which I came". However, fronted examples sometimes occur in apparently competent writing:

(link) The Battle over Citizen Kane also sheds light on the masterpiece over which they fought...
(link) That the tribunal censored only a tiny percentage of publications, for example, may not have mitigated the fear of reprisal with which authors had to live.

These sentences may have been created by misguided copy-editors, some of whom go after stranded prepositions like kittens after cockroaches. It seems to me that preposition-stranded alternatives would have been better in the cited cases:

The Battle over Citizen Kane also sheds light on the masterpiece that they fought over...
That the tribunal censored only a tiny percentage of publications, for example, may not have mitigated the fear of reprisal that authors had to live with.

Of course, there are plenty of examples, both traditional and recent, where a stranded variant would be a step down:

Praise God from whom all blessings flow. (from a hymn by Thomas Ken)
A Closer Look At The Case From Which Justice Scalia Has Refused To Recuse Himself. (title of a FindLaw column)
Often plagued by mediocre scripts, over which she fought some spectacular legal battles with Warner Brothers studio, Davis nonetheless turned in performances of the highest caliber... (from a Britannica article on Bette Davis)

According to CGEL (p. 627), "this 'rule' was apparently created ex nihilo in 1672 by the essayist John Dryden, who took exception to Ben Jonson's phrase the bodies that those souls were frighted from (1611). Dryden was in effect suggesting that Jonson should have written the bodies from which those souls were frighted, but he offers no reason for preferring this to the original."

It's a shame that Jonson had been dead for 35 years at the time, since he would otherwise have challenged Dryden to a duel, and saved subsequent generations a lot of grief. As CGEL explains (footnotes omitted):

There has been a long prescriptive tradition of condemning preposition stranding as grammatically incorrect. Stranded prepositions often, but by no means always, occur at the end of a sentence, and the prescriptive rule is best known in the formulation: 'It is incorrect to end a sentence with a preposition.' The rule is so familiar as to be the butt of jokes, and is widely recognised as completely at variance with actual usage. The construction has been used for centuries by the finest writers. Everyone who listens to Standard English hears examples of it every day.

Instead of being dismissed as unsupported foolishness, the unwarranted rule against stranding was repeated in prestigious grammars towards the end of the eighteenth century, and from the nineteenth century on it was widely taught in schools. The result is that older people with traditional educations and outlooks still tend to believe that stranding is always some kind of mistake. It is not. All modern usage manuals, even the sternest and stuffiest, agree with descriptive and theoretical linguists on this: it would be an absurdity to hold that someone who says What are you looking at? or What are you talking about? or Put this back where you got it from is not using English in a correct and normal way.

In this case, the artificial strictures of prescriptivists have apparently had a significant effect on the history of the language, partially reversing the historical loss of pied-piping in written English:

"In the course of their history, English wh-relatives are known to have undergone a syntactic change in their prepositional usage: having originally occurred only with pied-piped prepositions, they came to admit preposition stranding as an alternative pattern. The present article presents an overview of this process, showing a modest beginning of stranding in Late Middle English, an increase in Early Modern English, and then a clear decrease in the written language of today, against a more liberal use in spoken English, standard as well as nonstandard. The drop in the incidence of stranding is thus not an expression of a genuine grammatical change but due to notions of correctness derived from the grammar of Latin and affecting written usage. [Gunnar Bergh, Aimo Seppänen. "Preposition Stranding With Wh-Relatives: A Historical Survey". English Language and Linguistics, v. 4 no. 2 (2000).]"

By the way, here's the passage from Ben Jonson that is said to have started the whole silly thing off. It's from Catiline his conspiracy: A Tragoedie, which LION dates at 1616. The passage is part of a hyperbolic description of the slaughter that took place at the end of the Roman civil war in 82 B.C., when the forces of Sulla (aka Sylla) captured Rome.

                    The rugged Charon fainted,
And ask'd a nauy, rather then a boate,
To ferry ouer the sad world that came:
The mawes, and dens of beasts could not receiue
The bodies, that those soules were frighted from;
And e'en the graues were fild with men, yet liuing,
Whose flight, and feare had mix'd them, with the dead.

I haven't been able to locate Dryden's critique yet.

In his 1668 essay Of Dramatick Poesy, Dryden discusses Jonson many times, including this somewhat left-handed compliment in the course of a recommendation of classical literature

In the mean time I must desire you to take notice, that the greatest man of the last age (Ben. Johnson) was willing to give place to them in all things: He was not onely a professed Imitator of Horace, but a learned Plagiary of all the others, you track him every where in their Snow: If Horace, Lucan, Petronius, Arbiter, Seneca, and Juvenal, had their own from him, there are few serious thoughts which are new in him; you will pardon me therefore if I presume he lov'd their fashion when he wore their cloaths. But since I have otherwise a great veneration for him, and you, Eugenius, prefer him above all other Poets, I will use no farther argument to you then his example: I will produce Father Ben, to you, dress'd in all the ornaments and colours of the Ancients, you will need no other guide to our Party if you follow him; and whether you consider the bad Plays of our Age, or regard the good ones of the last, both the best and worst of the Modern Poets will equally instruct you to esteem the Ancients.

As for those phrase-final prepositions in Canadian French and Mexican Spanish, I'll try to get back to them in a later post.

Posted by Mark Liberman at 11:40 AM

Terminological opportunity

Stephany Aulenback at Maud Newton wonders whether the new fiction written in blog form should be called "bliction" or "flogging".

Posted by Mark Liberman at 06:52 AM

Cross-cultural communication: the cartoon version

Learning a strange culture's norms can be hard. But even the League of Evil needs standards for social interaction.

In the interaction between very different cultures, the whole can be less than the sum of the parts. When a member of culture A uses allusions from culture B to address culture C, the result is incomprehension all around. But across the widest gulf, there is still appreciation of the Other.

Divine revelations are not easy to take in. Sometimes successful communication depends on our limited capacities. But when words can't bridge the gap, gaze and body language are not really enough.

However, on the internet, you can find a way to make sense of almost anything.

Posted by Mark Liberman at 06:26 AM

The grim economic life of Scar City

Arnold Zwicky has noted in a posting to the American Dialect Society's list that Gabriel Schoenfeld's review of Bobby Fischer Goes to War (by David Edmonds and John Eidinow, NYT Book Review 3/28/04, p. 15 of the print edition), contains this passage (line breaks are as printed):

    Instead of subservience to the authorities,
Spassky relied upon his prowess over the chess-
board  to obtain what he wanted  amid the scar-
city of planned economic life...

The scar city? Sounds like a rough place.

As Arnold notes, it is extraordinarily hard not to read the passage this way the first time around, especially given the break in the middle of "chessboard" right above it, which does separate semantically relevant parts of the word, and does sort of set you up for being fooled. His broader point concerns the usage people who spend so much time dishing out uninformed nonsense -- like the claim that for "clarity" you should avoid using which to introduce an integrated relative clause (anything which they found) or since with non-temporal interpretations (since 37 is prime). Their time would be well spent on real distractors and discourtesies, like this one, things that really do get in the way of understanding.

Posted by Geoffrey K. Pullum at 04:44 PM

Culicover on CGEL

The March 2004 issue of the LSA's journal Language has a review article by Peter Culicover about Huddleston & Pullum's Cambridge Grammar of the English Language. Culicover describes CGEL, and gives many reasons to buy it or to put it on your next gift-occasion wish list:

... a monumentally impressive piece of work... ‘one of the most superb works of academic scholarship ever to appear ... a monumental work that offers easily the most comprehensive and thought-provoking treatment of English grammar to date. Nothing rivals this work, with respect to breadth, depth and consistency of coverage’ ... Huddleston, Pullum and their collaborators definitely deserve a prize for this achievement.

Culicover's review article -- covering 15 pages of the journal -- builds on its discussion of CGEL to raise a number of interesting issues about English grammar, and the theory and practice of grammar in general. It's well worth reading on its own terms, and to Culicover's great credit, it's largely accessible to non-specialists. Since Language is rarely available on corner newsstands, and unfortunately is not web accessible, Peter has been kind enough to put a .pdf of the review up on his web site. There's more to say about all this, and no doubt some of it will be said here. But for now, if you're at all interested in language, go read the article!

Posted by Mark Liberman at 03:15 PM

Just a trace of the obligatory rubber

The "full connoisseur's report" from Zackary Sholem Berger:

Vintage 5764 from the Chateau d'Mullen is the result of firm, fresh fruit. The bouquet is startlingly arresting. With a beguilingly titanium core, the aroma is pugilistic. Devastatingly acidic aroma. Sensations of battery acid give way to a nuclear confiture in the mouth; ripe and full. Accents of earth, fire and arsenic followed by a crippling finish.

This is about horseradish, needless to say. Oenophile language from a raphanophile.

Berger's references to titanium and battery acid are somewhat less satirical than you might think. For instance, this is a completely sincere way to praise a Riesling: "Divine subtle nose with citrus, a hint of petrol, some tree fruit, and just a trace of the obligatory rubber." Google finds 159 pages containing "hint of petrol", many if not most on American oenophile sites. There are only 53 ghits for "hint of gasoline", and many of these are non-oenophilic: "If you smell any hint of gasoline while driving, stop the vehicle." Petrol is so much more refined than gasoline, dontcha know. There is also the effect of translation from the traditional French goût de pétrole -- "taste of gasoline" would be shocking, and the cognate "petrol" softens it a bit -- but the snob factor is surely also important.

I'm reminded of the audiophile LP I once bought which boasted of being pressed on "the finest European vinyl." And I thought us Americans were good at plastic, at least...

Here's some more discussion of fancy food phrases, including "tasting flight" and "hint of earth in the nose".

Posted by Mark Liberman at 11:53 AM

More on FoxP2

The excellent page of pseudoscience links on Mikey Brass' "Antiquity of Man" site will be appreciated by all aficionados of bad science reporting. It links to a terrific article by Alec MacAndrew on "FoxP2 and the Evolution of Language", a topic which I discussed briefly here as background to the recent papers on the expression of FoxP1 and FoxP2 genes in songbirds.

An important quote from MacAndrew:

The key point, that all the popular reports missed, is that FOXP2 is a transcription factor - in other words it has the potential to affect the expression of an unknown, but potentially large number of other genes. No wonder the syndrome presents in such a diffuse way. We know now that a FOXP2 homologue is strongly expressed in the development of the mouse brain. So not only does it potentially affect many other genes, but it is known to be important in the development of the brain (by being strongly expressed in the brain of the mouse embryo). I expect that breaking FOXP2 in mice would result in some compromises to brain structure and function - an experiment that someone is sure to do. So the mutation to FOXP2 seems to result in brain defects during embryo development that result in disruption of neural pathways essential for human speech, but which also has other effects.

Another one:

We should beware of popular reports of scientific discoveries: almost all the popular reports of FOXP2 claimed that it was the gene for language or even more ludicrously the gene for grammar - the truth is more complicated and far more interesting than that. There are many popular reports of scientific discoveries which are equally sensationalised.

For those who are more familiar with unix system administration than with biology, calling FoxP2 "the gene for grammar" would be roughly analogous to calling (some random but crucial byte in) /usr/bin/perl "the byte for html forms".

And a final quote from MacAndrew:

It will not be easy to unravel the pathways by which language evolved in humans. If we are to have any hope of doing so, we will need close collaboration between linguists and biologists, who have, until recently, been rather suspicious of one another.

[Mikey Brass link via phluzein]

Posted by Mark Liberman at 10:09 AM

Sufficiently advanced technology is indistinguishable from insanity

I believe that Mark Twain would have enjoyed Lucy A. Snyder's "Installing Linux on a Dead Badger", as I did. But think of how much he would have to learn first! Imagining Mary Shelley's reactions takes us in a different direction.

"Let's face it: any script kiddie with a pair of pliers can put Red Hat on a Compaq, his mom's toaster, or even the family dog. But nothing earns you geek points like installing Linux on a dead badger. So if you really want to earn your wizard hat, just read the following instructions, and soon your friends will think you're slick as caffeinated soap."

[Update 4/13/2004: when you're done, you can accessorize.]

Posted by Mark Liberman at 09:09 AM

Language, understanding, war, and Babel fish

Bill Poser discusses the fallacy that linguistic diversity is divisive. Other oft-cited cases are the genocides in Rwanda and Bosnia in the mid 1990s. The warring Hutus and Tutsis of Rwanda speak the one language. Likewise, Bosnian, Croatian and Serbian are just different names for the same language. People who assume that linguistic diversity is divisive may be confusing linguistic identity with ethnic or sociopolitical identity.

In August 2002, Wayt Gibbs wrote a piece in Scientific American called Saving Dying Languages. It included a full-page geographical plot to show the degree of correlation between locations of endangered languages and regions of greatest biological diversity. I wish someone could do a similar plot but with a linguistic uniformity score for each region of the world superimposed over a conflict index.

David Crystal considers this issue in his great book Language Death and mentions other cases of conflict in regions of linguistic uniformity. In a footnote, he quotes a section from The Hitch Hiker's Guide to the Galaxy about the mythical Babel fish, a universal language translator which, "by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation."

(Look up this quote using Amazon's full-text search)

Posted by Steven Bird at 08:31 AM

Eggcorn terminology

AEB (as I'll call the anonymous author of an entangled bank) notes "hand few" used in place of handful. This is a great example of the consequences of vocalization of /l/, the reverse case of "wedding vowels" -- and every day, 20 or 30 internet pilgrims find their way to our site by searching for this phrase.

AEB's livejournal weblog seems to lack permalinks -- the post in question is from Thursday, April 8, 2004 at 11:57PM, entitled "eggcorn: a hand few". Since future readers may have trouble navigating to find it, I'll reproduce it in full here before commenting.

Just noticed someone on E2 writing 'hand few' in the sense of 'handful'; Google is not useful on that two-word phrase (mainly of the type 'on the other hand, few people...'), but restricting it to 'a hand few' gives 169 hits, almost all of which are the eggcorn. Restricting it to UK sites gets only 6 hits, so it's not (just) the SE accents bringing the words closer.

By the way, there seems to be a little discrepancy in what an eggcorn actually is. The very first time it was introduced it was a one-off thing, a mistake one person made by mishearing or misunderstanding, and not widespread enough to count as folk etymology. But pretty soon Mark Liberman is using it for 'sporadic' uses such as 'reigns of power', which is clearly widespread and semantically motivated enough so to count.

The indexed internet changes things. When I got the original message from Chris Potts about "egg corns" for "acorns", I'd never encountered or heard of this mistake before, and thought it was a sort of idiosyncratic non-musical mondegreen. I didn't think of googling for it until later; when I did, I found eggcorns everywhere, though not nearly as many as there are "acorns". Specifically, there are about 200 uses of "eggcorns" or "egg corns", compared to a bit more than 200,000 uses of "acorns", so that "egg corn" seems to have a lexical mindshare of about 1 in 1,000. If there are 400 million native speakers of English worldwide, there might be 400,000 of them who think that oaks grow from egg corns. This is a small minority, but it's much more than one individual's idiosyncratic mistake. The misconstrual is probably not going to spread -- the influence of the standard written language is too strong -- but it's not going to go away either.
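The back-of-the-envelope estimate above can be checked in a few lines of Python. This is only a sketch: the counts are the rounded figures quoted in the paragraph, the variable names are illustrative, and the assumption that the share of web hits carries over to the share of speakers is the post's own simplification.

```python
# Rounded figures quoted in the paragraph above.
eggcorn_hits = 200             # "eggcorns" or "egg corns"
acorn_hits = 200_000           # "acorns"
native_speakers = 400_000_000  # rough worldwide figure used in the post

# "Lexical mindshare": one eggcorn use per this many standard uses.
one_in = acorn_hits // eggcorn_hits

# Crude extrapolation: assume the same ratio holds across speakers.
estimated_speakers = native_speakers // one_in

print(f"mindshare: about 1 in {one_in:,}")
print(f"estimated 'egg corn' speakers: {estimated_speakers:,}")
```

The extrapolation step is the weak link: web hits count uses, not people, so the speaker figure is at best an order-of-magnitude guess.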

Arnold Zwicky has pointed out that "we should talk about nonce folk etymologies vs. successful folk etymologies, with lots of stuff in between". (Arnold also suggests some further taxonomizing of what he calls "reshapings" of words -- read the whole thing!). Anyhow, we've been using the term "eggcorn" for "relatively infrequent folk etymologies". Many of these have a semantic as well as a phonological aspect. This includes the original case -- "corn" is sort of like "seed", and the seed part of an acorn looks sort of like an egg. AEB's "hand few" for "handful" has a similar property, as does another eggcorn I noticed recently: "come closed to" (112 ghits) for "come close to" (513,000 ghits).

As Geoff Pullum has pointed out, eggcorns are tiny little poems, a symptom of human intelligence and creativity:

It would be so easy to dismiss eggcorns as signs of illiteracy and stupidity, but they are nothing of the sort. They are imaginative attempts at relating something heard to lexical material already known. One could say that people should look things up in dictionaries, but what should they look up? If you look up eggcorn you'll find it isn't there. Now what? And you can't look up everything; sometimes you think you know what you just heard and you don't need to look it up.

All things considered, it's surprising only that eggcorns are as rare as they are. I imagine that they are commoner among people who are not over-literate or whose writing system is not highly standardized.

Some eggcorns are sporadic, individual re-imaginings that happen by accident to be the same as creative leaps that others have also made. In other cases, the mistake may spread from one person to another in the course of language learning or later vocabulary acquisition. So far, there is no good information (as far as I know) about the balance and interaction of these individual and social processes. In principle, though, the indexed and linked internet can help here, since we can to some extent look at geography and at (various proxies for) social networks in evaluating the distribution of such forms.

[Update: Daniel Ezra Johnson emails to say

I was wondering if the acorn/corn thing was just coincidence, and found the following at the american heritage dictionary site. it suggests that a similar process is happening for a second time with the same word!

(link) "A thoughtful glance at the word acorn might produce the surmise that it is made up of oak and corn, especially if we think of corn in its sense of 'a kernel or seed of a plant,' as in peppercorn. The fact that others thought the word was so constituted partly accounts for the present form acorn. Here we see the workings of the process of linguistic change known as folk etymology, an alteration in form of a word or phrase so that it resembles a more familiar term mistakenly regarded as analogous. Acorn actually goes back to Old English aecern, 'acorn,' which in turn goes back to the Indo-European root *o:g., meaning 'fruit, berry.'"]


Posted by Mark Liberman at 07:07 AM

April 08, 2004

Support for Huntington's questions if not his answers

Russel Arben Fox at Wäldchen vom Philosophenweg argues that I've misunderstood his position on Samuel Huntington:

Some people are under the impression that I'm a defender of Samuel Huntington and all he stands for; I'm not. I've been more than adequately convinced that Huntington's arguments supporting his thesis about the uniqueness of Hispanic immigration are flawed and even myopic. But that still doesn't change the fact that Huntington is willing to think about what it means to be a civilization, and what it means to be a nation; however clumsy or borderline xenophobic his thinking, he at least must be given credit for considering the nature and dynamics of identity-construction and maintenance, and what that may or may not mean for economic and social policy.

Fox also links to this article in today's NYT by Niall Ferguson, as an example of someone else who is asking the "truly interesting questions" that Fox feels Huntington has been raising.

For what it's worth, I agree with most of what Fox and Ferguson are saying in these passages, to the extent that I understand it. But it seems too weak to criticize Huntington only for his uniqueness-of-Hispanic-immigrants thesis. How about the view that American identity is still essentially Protestant and northern European? or that it's always bad to have multiple languages in use in a community?

I don't doubt that Fox is sincere in praising Huntington's questions while disagreeing with his answers. But this is a hard trick to pull off -- "asks interesting questions" is so often a coded way to register agreement with an extreme position. This is especially true if many of the questions are rhetorical ones, and doubts about the answers are backgrounded, or left unexpressed except in response to complaints from others.

And as Geoff Nunberg observes here, the flaws in the content and tone of Huntington's article are not isolated or hidden.

Posted by Mark Liberman at 09:54 PM

"Study linguistics, not grammar"?

I like the word grammar, myself. But maybe Peter Van Dijck at Guide to Ease is onto something when he links to Dive into Mark's "grammar nazi" post and writes

Grammar nazi's are sad. They, like, really miss the point of language altogether, all while professing they love language. Ay! Study linguistics, not grammar, dudes!

I'm not ready to put that on my banner. But we could use some good slogans. I can't think of any since Robert Hall's proto-libertarian "Leave your language alone!" Suggestions?

Posted by Mark Liberman at 09:05 PM

More on Sign Languages

Jan Adriaenssens has sent me an interesting follow-up to my post yesterday on ASL vs. English. First, he says that another method for transcribing ASL, SignWriting, is becoming increasingly popular.

Second, he reports on a project in Belgium/Flanders, and on several neighboring sign languages in western Europe:

We're working on a dictionary of the Flemish Sign Language, using SignWriting. It can be found here or, in the near future, here. This dictionary has the interesting feature that one can search by signs (`zoeken op gebaar'). For testing purposes, we have also posted an ASL dictionary in the same format. You can find it: here. The site hasn't been translated yet, so you'll have to know some basic Dutch.

The Flemish Sign Language (Vlaamse Gebarentaal) is quite different from the sign language in the Netherlands (Dutch Sign Language, Nederlandse Gebarentaal). The Flemish Sign Language is closer to the sign language used in the southern part of Belgium (Belgian-French Sign Language, Langue des Signes de Belgique Francophone), where the spoken language is French. This gives additional proof for your point that spoken languages and sign languages are quite unrelated.

The Flemish Sign Language, the Dutch Sign Language and the Belgian-French Sign Language all have most of their roots in the French Sign Language, just as the American Sign Language does. But ASL also has had a very interesting influence from the Martha's Vineyard Sign Language (MVSL).

Posted by Sally Thomason at 08:19 PM

Language, Understanding, and War

Mark Liberman's point that few disputes are based on "different (mis)understandings of propositions, as opposed to different interests and goals" is of relevance to another linguistic issue. Some people do not regard linguistic diversity as a good thing. Their response to language endangerment is that it would be a good thing if we all spoke the same language, or if, at least, only a few major languages were in use, because we would all understand each other better and this would lead to fewer wars and disputes. The implicit premise of this argument is the proposition whose dubious truth Mark points out. A similar argument is made by the "English Only" movement in the United States, who believe that the use of languages other than English is somehow divisive. A good example of the fallacy of this proposition is Somalia, which has no effective government and is for practical purposes divided into regions ruled by warring clans and bandit gangs. The great majority of Somalis speak Somali or the closely related Maay; communication isn't their problem. Similarly, one of the longest-standing and tensest border standoffs in the world is that between North and South Korea, whose citizens speak the same language. A single world language would in all probability do very little to advance peace.

Posted by Bill Poser at 07:54 PM

A smart person with a stupid idea

Laputan Logic (a smart person who always has intelligent and interesting things to say) has recently posted about words and numbers. There's a fascinating section about Chinese number punning (e.g. 114 = "most surely die"), and he includes a handy in-line calculator for translating particular number strings. My telephone prefix 417 translates as "definitely guaranteed certainly", which is good to know, especially since my area code 215 is "easy guaranteed never".

LL begins by quoting a passage from Umberto Eco's "Search for the Perfect Language", about Leibniz' lingua generalis, which provides a safely 17th-century example of my suggestion that "it takes a really smart person to have a really spectacularly stupid idea". From the limited quotations from Eco about Leibniz, you might not grasp the deep and beautiful nuttiness of his proposal. The key ideas were a characteristica universalis that assigns a different prime number to each primitive concept (we're guaranteed never to run out of primes), and a calculus ratiocinator that creates complex concepts by multiplication (since the prime factorization theorem guarantees a unique decomposition into primitives) and evaluates predication by division (are the factors of the predicate among the factors of the subject?).
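To make the mechanics concrete, here is a minimal Python sketch of the scheme as described above: primes for primitive concepts, multiplication for composition, divisibility for predication. The concept names, the particular primes and the function names are all illustrative assumptions, not a reconstruction of Leibniz's own tables.

```python
# Hypothetical assignment of primes to primitive concepts (illustrative only).
PRIMITIVES = {"animal": 2, "rational": 3, "mortal": 5}


def compose(*concepts: str) -> int:
    """Build a complex concept by multiplying the primes of its primitives."""
    number = 1
    for name in concepts:
        number *= PRIMITIVES[name]
    return number


def is_predicated_of(predicate: int, subject: int) -> bool:
    """The predicate holds of the subject when it divides the subject evenly."""
    return subject % predicate == 0


def decompose(number: int) -> list[str]:
    """Recover the primitives by trial division over the known primes."""
    return [name for name, prime in PRIMITIVES.items() if number % prime == 0]


human = compose("animal", "rational")  # 2 * 3 = 6

print(is_predicated_of(PRIMITIVES["animal"], human))  # is every human an animal?
print(is_predicated_of(PRIMITIVES["mortal"], human))  # "mortal" was not multiplied in
print(decompose(human))
```

Note what the sketch cannot express: a concept is just an unordered bag of primitives, so anything that depends on order or structure is lost in the product.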

Leibniz felt that this would allow legal, religious and political disagreements to be solved by calculation rather than by violence.

I've always wondered whether Leibniz had a story to tell about how to use multiplication of primes to construct a logical formula other than a single predication. Suppose that A and B are propositions -- whether atomic or complex doesn't matter -- and we've assigned a prime, say 29, to the concept "implies" -- what about "A implies B" vs. "B implies A"? And what about more elaborate formulae where order matters? I can imagine various procedures for encoding string order or formula structure as products of primes, but did Leibniz have a story to tell about this? I've never learned enough about the details of his calculus ratiocinator to determine the answer.
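The ordering worry is easy to demonstrate. In the Python sketch below, the code numbers for A, B and "implies" are arbitrary placeholders, and the positional workaround is a Gödel-style encoding (a much later device, shown here only as one of the "various procedures" one can imagine, not as anything attributed to Leibniz).

```python
# Arbitrary placeholder code numbers (assumptions for illustration).
A, B = 6, 35
IMPLIES = 29

# Plain multiplication is commutative, so the two formulas collide.
a_implies_b = A * IMPLIES * B
b_implies_a = B * IMPLIES * A
print(a_implies_b == b_implies_a)  # True

# Goedel-style workaround: the k-th prime marks the k-th position,
# and the symbol's code number becomes that prime's exponent.
POSITION_PRIMES = [2, 3, 5]


def encode_sequence(codes: list[int]) -> int:
    """Encode an ordered sequence of code numbers as a single integer."""
    if len(codes) > len(POSITION_PRIMES):
        raise ValueError("sequence longer than the available position primes")
    result = 1
    for prime, code in zip(POSITION_PRIMES, codes):
        result *= prime ** code
    return result


forward = encode_sequence([A, IMPLIES, B])
backward = encode_sequence([B, IMPLIES, A])
print(forward == backward)  # False
```

The positional version keeps order at the cost of astronomically large numbers, which bears directly on the complexity problem raised next.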

Then there's the problem of the algorithmic complexity of factoring products of really large primes -- and there are surely enough primitive concepts and modes of combination that we'll need some big primes to encode them all. And there's the problem of relating logical formulae to the facts of the world. And then there's the question of whether human conflicts are really very often based on different (mis)understandings of propositions, as opposed to different interests and goals.

Putting it all together, I think we have a winner. Leibniz was clearly a really smart person, and the proposal to solve political and religious disagreements by translating natural language discourses into products of prime numbers was a really stupid idea. It's a good premise for historical fantasy, though -- the idea of a sort of Leibnizian underground, operating through history into modern times, is one of the fun background assumptions of Neal Stephenson's currently-unfolding historical trilogy.

To advance the discussion, in a gingerly way, into more recent times, I'll mention that I first asked myself these questions about Leibniz shortly after the publication of Katz & Fodor (1963), which advanced a theory of natural-language semantic interpretation based on the decomposition of word meanings into sets of primitive semantic features, and the collection of these features into ever-larger sets by a recursive procedure operating on syntactic "deep structures".

Posted by Mark Liberman at 06:23 PM

Database company to Toyota: Rename that automobile!

The anonymous lawyer friend who provided me with the material in my earlier post about trademark law adds another twist. It turns out that the owners of the Lexis-Nexis database sued Toyota to get the Lexus brand name squashed! And what emerged was a nasty case of judges having to wrestle with issues of the phonetics of reduced vowels in various dialects and idiolects of English when they had received no training in this matter (phonetics is not on the curriculum at law schools, sad to tell). All of the following interesting stuff, down to the end of the quote from the judges, is from him.

The case of Mead Data Central, Inc. v. Toyota Motor Sales, USA, Inc., was decided in 1989. Mead, the owner of a trademark "Lexis", for legal research services, sued Toyota to prevent them from using "Lexus" as the brand name for their higher-end line of cars.

In the course of their discussion of likelihood of confusion, the court embarked on an exploration of whether "Lexis" and "Lexus" are pronounced the same in everyday English.

Here's what they had to say -- as you'll see, they got the wrong answer, because they perceive (or mis-perceive) that they pronounce unstressed syllables differently from one another.  In short, they found that nobody would confuse "Lexis" and "Lexus" at least in part because they're pronounced differently:

...[I]f the district court's statement in its Lanham Act discussion that "in everyday spoken English, LEXUS and LEXIS are virtually identical in pronunciation" was intended to be a finding of fact rather than a statement of opinion, we question both its accuracy and its relevance. The word LEXUS is not yet widely enough known that any definitive statement can be made concerning its pronunciation by the American public. However, the two members of this Court who concur in this opinion use "everyday spoken English", and we would not pronounce LEXUS as if it were spelled LEXIS. Although our colleague takes issue with us on this point, he does not contend that if LEXUS and LEXIS are pronounced correctly, they will sound the same. We liken LEXUS to such words as "census", "focus" and "locus", and differentiate it from such words as "axis", "aegis" and "iris".  If we were to substitute the letter "i" for the letter "u" in "census", we would not pronounce it as we now do. Likewise, if we were to substitute the letter "u" for the letter "i" in "axis", we would not pronounce it as we now do. In short, we agree with the testimony of Toyota's speech expert, who testified:

        Of course, anyone can pronounce "lexis" and "lexus" the same, either both with an unstressed I or both with an unstressed U, or schwa--or with some other sound in between. But, properly, the distinction between unstressed I and unstressed U, or schwa, is a standard one in English; the distinction is there to be made in ordinary, reasonably careful speech.

        In addition, we do not believe that "everyday spoken English" is the proper test to use in deciding the issue of similarity in the instant case. Under the Constitution, there is a " 'commonsense' distinction between speech proposing a commercial transaction, which occurs in an area traditionally subject to government regulation, and other varieties of speech."  ....  When Mead's speech expert was asked whether there were instances in which LEXUS and LEXIS would be pronounced differently, he replied "Yes, although a deliberate attempt must be made to do so.... They can be pronounced distinctly but they are not when they are used in common parlance, in everyday language or speech." We take it as a given that television and radio announcers usually are more careful and precise in their diction than is the man on the street. Moreover, it is the rare television commercial that does not contain a visual reference to the mark and product, which in the instant case would be the LEXUS automobile. We conclude that in the field of commercial advertising, which is the field subject to regulation, there is no substantial similarity between Mead's mark and Toyota's.

All that I (GKP) have to add about this tragically under-informed piece of linguistics in a legal setting is this: whether you pronounce Lexis the same as Lexus is a dialect difference. I don't know what should be made of that as regards the case, but there is a large faction of English speakers who pronounce the two syllables of skidded as exact rhymes, and say lozenge so that the second syllable rhymes with hinge, and would pronounce Lexis differently from Lexus. There is also a large faction for whom the second syllable of skidded has the vowel sound of the last syllable of ballad or method, and the second syllable of lozenge sounds more like sponge, and for those people Lexis would normally sound the same as Lexus.

Though notice also that under certain conditions people produce spelling pronunciations, exactly the way actors were being taught to do by the theater director John McWhorter recently wrote about.

So put all that in the expert witness file. Me, I know only one thing: on a matter as subtle as this, the expert witnesses should not be working for the disputing parties; they should be hired by the judge to work for the court. It's too easy, otherwise, for the experts to tell only the side of the phonetic story that backs up the people who are paying them, and that's what often happens in cases where expert witnesses are employed by plaintiffs' or defendants' counsel.

Posted by Geoffrey K. Pullum at 04:14 PM

Get chicken the way you say you want it

This flash animation at "subservientchicken.com" shows a guy in a chicken suit who (often) follows commands typed in English. Things that work: "lie down", "touch your toes", "touch your nose", "do some push-ups"; things that don't work: "if seven is prime, point to the right", "touch your left foot with your right hand", "turn the chair around". For the last one, the chicken turns himself around. As Des might say, "There's still work for NLP researchers, isn't it?" [via Nick Montfort via Dan Bickel]

I previously expressed skepticism that this was actually a Burger King ad, but several correspondents have completely refuted me.

For example Michael Leuchtenburg emailed:

When the subservientchicken flash is loading, it shows a Burger King logo. It also says "© 2004 Burger King Brands, Inc. All rights reserved." at the bottom of the page, and has a link to http://www.bk.com/ (Burger King's website).

Looking at the whois entries, it appears that subservientchicken.com is registered by an advertising firm:

    Administrative Contact:
       kilpatrick, jordan  (3292168I)            jkilpatrick@cpbgroup.com
       crispin porter & bogusky
       3390 mary street
       office 300
       MIAMI, FL 33133
       305 859 2070 fax: 305-854-3419

Searching that phone number up on google finds the following entry: Crispin Porter & Bogusky Advertising, (305) 859-2070, 2699 S Bayshore Dr, Miami, FL 33133

Looking at their webpage, they do seem to be doing work for Burger King, giving their "Have it Your Way" campaign "an extreme makeover". http://www.cpbmiami.com/FrameContentSpecClient.cfm?ClientID=55

If this isn't actually associated with Burger King, it's a very well planned fraud. I think it's legitimately associated with BK.

Posted by Mark Liberman at 04:09 PM

The whole web in ram?

Topix.net (here and here) discusses the reasons for storing the whole of the web's text in ram, the benefits of doing so, how Google may be doing it, and what advantages their infrastructure may give them in deploying other services. [via locussolus].

Note that some of the commenters doubt the arguments against disk-based search algorithms; I haven't thought it through, but life is certainly easier if you don't have to worry about seek times. 100,000 servers with 1G each is 10^14 bytes of ram, or 100 TB. Google says it searches 4,285,199,774 (= 4.3 x 10^9) web pages, so that's roughly 23KB per indexed page, if I've done the arithmetic right. That's plenty, though there is doubtless some disk stuff going on anyhow. In either case, there are certainly advantages of knowing how to build, maintain and use very large cluster farms, which was the main point of the topix.net post.
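For anyone who wants to redo the arithmetic, here is a small Python check. It assumes "1G" means 10^9 bytes and "KB" means 1,000 bytes; the server count is the round figure used above, not a number Google has confirmed, and the variable names are illustrative.

```python
servers = 100_000           # round figure used in the post
ram_per_server = 10**9      # "1G", taken here as 10^9 bytes (an assumption)
pages = 4_285_199_774       # page count quoted from Google

total_ram = servers * ram_per_server  # bytes across the whole cluster
bytes_per_page = total_ram / pages    # average RAM budget per indexed page

print(f"total RAM: {total_ram:.0e} bytes = {total_ram // 10**12} TB")
print(f"per page:  {bytes_per_page / 1000:.1f} KB")
```

With binary units (1G = 2^30 bytes, KB = 1,024 bytes) the per-page figure shifts slightly, but it stays in the low twenties of kilobytes, so the conclusion is unchanged.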

Posted by Mark Liberman at 11:10 AM

him that pisseth

According to this page on "La prononciâtion du Jèrriais", "th" in Jèrriais is pronounced "coumme th en Angliais 'though'". Example: malheutheux "unhappy". In other words, it's a voiced interdental fricative.

A brief scan through the Pages Jèrriaises suggests that Jèrriais "th" corresponds to standard French "r" when it's between vowels within a word: thus

French: heures lire mémoire héritiers histoires souris parole
Jèrriais: heuthes liéthe mémouaithe héthitchièrs histouaithes souothis pathole

but not at the beginning of a word:

French: raisonnable récalcitrant rouler
Jèrriais: raisonnabl'ye r'calcitrant rouôler

I wonder if this is a contact effect? and if so, is it due to contact between Jèrriais and English, or to historical connections of both to some common influence?

Interdental fricatives (voiced or voiceless) are "marked" sounds, not very commonly found in the world's languages. They're hard for children to learn, and harder for adult speakers who didn't learn as children. My oldest son, when he was three or so, continued to say "thin" as "fin". So I made the usual parental mistake of trying to coach him:

Me: "Put your tongue between your teeth and blow, like this: thththththth!"

Him: "ththththththth!"

Me: "Great! Now say 'ththth...thin'!"

Him: "thththth...fin!"

Anyhow, English deploys its interdental fricatives in some pretty fiendish combinations, like "sixths", which involves the kind of lingual gymnastics that cluster fans (justly) ooh and aah over in Northwest Coast or Caucasian languages. We've made life a bit easier for children and foreigners by dropping the third person singular ending "-eth", which doesn't generally create within-word clusters, but did traditionally get involved in some difficult sequences, like "pisseth against the wall". Try saying that three times, fast!

I feel no compunction in reproducing this phrase in a family-oriented weblog because it's from the bible, as Suzette Haden Elgin points out in the web-accessible sample issue of her Linguistics and Science Fiction Newsletter. In fact, it occurs six times (allowing for variation in definiteness), as this search at UVa's convenient Electronic Text Center shows:

1 Kings, 14.10: Therefore, behold, I will bring evil upon the house of Jeroboam, and will cut off from Jeroboam him that pisseth against the wall, and him that is shut up and left in Israel, and will take away the remnant of the house of Jeroboam, as a man taketh away dung, till it be all gone.

1 Kings, 16.11: And it came to pass, when he began to reign, as soon as he sat on his throne, that he slew all the house of Baasha: he left him not one that pisseth against a wall, neither of his kinsfolks, nor of his friends.

1 Kings, 21.21: Behold, I will bring evil upon thee, and will take away thy posterity, and will cut off from Ahab him that pisseth against the wall, and him that is shut up and left in Israel,

1 Samuel, 25.22: So and more also do God unto the enemies of David, if I leave of all that pertain to him by the morning light any that pisseth against the wall.

1 Samuel, 25.34: For in very deed, as the LORD God of Israel liveth, which hath kept me back from hurting thee, except thou hadst hasted and come to meet me, surely there had not been left unto Nabal by the morning light any that pisseth against the wall.

2 Kings, 9.8: For the whole house of Ahab shall perish: and I will cut off from Ahab him that pisseth against the wall, and him that is shut up and left in Israel:

From the context, these phrases do not seem to represent any concern for the public health consequences of promiscuous urination, but rather are just a poetic way of saying "male human."

For example, the New International Version translates 1 Kings 14.10 as

Because of this, I am going to bring disaster on the house of Jeroboam. I will cut off from Jeroboam every last male in Israel--slave or free. I will burn up the house of Jeroboam as one burns dung, until it is all gone.

and the New American Standard Bible has

therefore behold, I am bringing calamity on the house of Jeroboam, and will cut off from Jeroboam every male person, both bond and free in Israel, and I will make a clean sweep of the house of Jeroboam, as one sweeps away dung until it is all gone.

But none of the religious instruction of my youth dealt with this question. I'll update this post as I learn more.

Posted by Mark Liberman at 10:38 AM

More on la tchestchion dé razzle

In response to my post on Theodore Dalrymple's razzle malapropism, Geraint Jennings (Maître-pêtre des Pages Jèrriaises) emailed to suggest

"razzle" - to the pure, all things are pure, however to my mind the image was immediately summoned up of the, ahem, top-shelf magazine of that name (the sort of low-rent publication one might only consult for one-handed research - not that I'm claiming intimate knowledge, honest!)

Razzle magazine seems to be a UK operation, as cited in this Ian Dury lyric, and Dalrymple is a UK-ish person, so his malapropism may well have been triggered by the print pr0n resonances.

Geraint further indicates that "razzled" remains "slang for drunk - although the currently hipper version (and I think more common these days) is the shorter 'razzed'". (Geraint stipulates that "Of course, I pontificate from a life of pure abstemiousness.... ;-) ").

Geraint's Pages Jèrriaises are quite relevant to the Anguish Languish dimension, since the ability to read lé Jèrriais by reference to French is analogous to the ability to read Anguish by reference to English (except that lé Jèrriais is much easier, being a real language):

J'avons eune longue tradition littéthaithe en Jèrriais d'pis la fîn du dgiêx-huitième siècl'ye - sustout des poésies et d's histouaithes. Mais nou considéthe qué Wace 'tait l'fondateu et l'înspithâtion d'la littéthatuthe dé Jèrri, et qu'au dgiêx-neuvième siècl'ye l's auteurs d's Îles d'la Manche înspithîtent la r'naîssance littéthaithe en Nouormandie continentale.

Rendering FAQ as "Tchestchions tréjous d'mandées" is neat -- even though this is just Jèrriais orthography, it seems like the joyful linguistic play you sometimes see in hiphop spellings or in books like Zazie dans le metro.

By the way, for any Americans who were as puzzled as I was by Geraint's euphemistic use of "top shelf", I offer this discussion of the UK "top shelf rule" for "adult service", which also discusses "middle shelf magazines." On this side of the Atlantic, the phrase "top shelf" is a term with positive (and nonpornographic) connotations, derived from the traditional position of better-quality liquor on bartenders' shelves (the "top shelf margarita"), and from a general sense that quality in things shelved ought to correlate with height (e.g. "Top Shelf" as a title for the Village Voice's feature on "our 25 favorite books of 2003").

Posted by Mark Liberman at 12:17 AM

April 07, 2004

Talking Chimp

The Zoological Society of London has issued a call for volunteers to

`talk chimp' in everyday life and see how primate patter can resolve workplace conflicts, express emotions and strengthen human bonds.
The BBC has a news piece about the project, which quotes organizational psychologist Cary Cooper as explaining the potential benefit to humans thus:
What they communicate is words, not feelings, so this kind of thing would give them access to their emotions.
This statement doesn't make sense. People don't "communicate words". They communicate using words. And one of the things that they communicate by this means is surely feelings. This is an area about which I know very little, but I wonder if there is any real evidence that human language, together with gesture, facial expression, and other non-verbal means of communication, provides us with less of an ability to communicate feelings than chimpanzees have.

Posted by Bill Poser at 09:52 PM

X nazi

Our referrer log features a few disturbing search requests today, including how can you stop people from reading your mind? and why am i so fat? and what language do they speak in beijing? We have nothing much to offer the first two pilgrims, but I believe that Bill Poser and Dan Jurafsky will have set the third one straight.

There were also some of the usual semi-odd queries, like Pete Rose, scholarly, incall, adult situations, wedding vowels, and so on. One of them was linguistics nazi. This is a new phrase to me, though it's transparent enough, since "X nazi" has come to have an extended meaning something like "someone who is serious about X in an unfriendly way". As the Wikipedia entry for nazi says:

The usages seen in popular culture are seen as offensive by some; these include the politically correct as well as those who consider the use to be a trivialization of the Nazis, who killed millions. Phrases like "Open Source Nazi," "Feminazi," or "Soup Nazi" are examples of those in common use.

By synchronicity, I happened to read this Dive into Mark post a few minutes later, which uses the term "grammar nazi" and supplies a suitable joke:

Two busty coeds—a Southern belle and a New England yankee—are in Florida on spring break. The belle turns to the yankee and asks, “So, where y'all from?”

The yankee turns up her nose and says, “I’m from a school where we don’t end sentences with prepositions.”

Without missing a beat, the belle replies, “So, where y'all from, bitch?”

I've heard several other versions of this joke, all about men, substituting other (mostly anatomical) epithets for "bitch". I think Mark's version would be better without the "busty" and "belle" bits, but I enjoyed it in context all the same.

[Update: note that the use of y'all with a singular referent is the real ungrammaticality in this joke...]

Posted by Mark Liberman at 06:51 PM

Bronze sausages?

Desbladet links to a BBC news note that reads:

Hungary's Magyar Hirlap says the celebration of the country's accession to the European Union has been "spoilt" by failures on both sides which prevent the free mobility of people on the big day.

"All right, there will be celebrations, there will be sausages and mustard accompanied by fireworks, but it would have been nice to have something tangible", it says.

Des titles his note "Intangible sausages!"

But this reminds me -- perhaps because of the high-energy physics in Des' previous post -- of some very tangible sausages that I once saw at Los Alamos. There was a bronze figure of Edward Teller, with a speech balloon made of (also bronze) sausages. However, when I checked last year, Google failed to turn up any relevant pages on this figure, at least when probed with the phrases that came to mind. This was so surprising that I began to wonder if I had dreamed the whole thing.

It wasn't there again today, as the poet almost said. If it's not on the internet, can it exist?

Posted by Mark Liberman at 05:33 PM

Yahoo launches 'Soul Search Engine'

More evidence that The Onion is back on top.

Posted by Mark Liberman at 11:34 AM

Anguish languish

In an article in the April issue of The New Criterion entitled "Reflections on the oldest profession", Theodore Dalrymple writes:

Since then, I have treated a lot of prostitutes as patients ... for the most part, they have been creatures who look as if they have emerged from the canvases of Otto Dix, razzled by drugs and disease, with crumbling bones and wrinkled skin, beaten into submission by pimps festooned with gold chains and mouths full of redundant golden dentistry.

I'm always happy to see people enjoying themselves with words, as Dalrymple clearly is in this article. But I'm pretty sure that razzled in this passage is a malapropism for raddled. The OED glosses the verb razzle as "To live a life of pleasure, to enjoy oneself; to go ‘on the razzle’." The "life of pleasure" part sort of half-way fits the drugs and disease, but the verb is intransitive, and is flagged as "slang". Encarta flags it as "early 20th century". I've never heard of it -- the "razzle" in razzle-dazzle is just a variant reduplication of dazzle. In contrast, the American Heritage dictionary glosses raddled as "worn-out and broken-down", which seems to fit exactly.

Now, if the folks at The New Criterion want to dust off razzle and start using it in place of raddle, good luck to them. As a linguistic libertarian, I'll just observe calmly from the sidelines. However, as "[a] staunch defender of the values of high culture" in "the culture wars now raging throughout the Western world", TNC is probably against such grass-roots poetic innovation as a matter of principle.

The most interesting aspect of this (exceedingly minor) point is that it's usually so easy to figure out what word someone really meant when they use the "wrong" one. This is probably related to the phenomenon of reading jumbled words.

A mind-numbingly repetitive application of the same effect can be found in Howard L. Chace's 1956 Anguish Languish, which was a popular citation among AI speech recognition researchers in the early 1970s. You can get a whiff of the idea from the first sentence of the introduction:

English words are astonishingly versatile and could readily be made to serve a new and extraordinary purpose, but nobody seems to care about this except SPAL (Society for the Promotion of the Anguish Languish).

and you're hit with the full force of the idea a few lines later:

A visiting professor of Anguish, Dr. ________, who, while learning to understand spoken English, was continually bewildered and embarrassed by the similarity of such expressions as boys and girls and poisoned gulls, used to exclaim:

"Gracious! What a lot of words sound like each other! If it wasn't [sic] for the different situations in which we hear 'em, we'd have a terrible time saying which was which."

Of course, these may not have been the professor's exact words, because he often did his exclaiming in Anguish rather than in English. In that case he would say:

"Crashes! Water larders warts sunned lack itch udder! Effervescent further delerent saturations an witch way harem, wade heifer haliver tam sang witch worse witch."

This stuff makes my teeth itch, but some people like it. And the point is absolutely correct: "the different situations in which we hear 'em" are just as important as the words that we hear -- or read.

Posted by Mark Liberman at 10:35 AM

ASL vs. English

In a Language Log post yesterday Mark Liberman discussed Ted Chiang's linguistics, commenting that American Sign Language (ASL) can indeed be transcribed. Another thing Chiang said about ASL deserves comment:

"...written English has about as much to do with ASL as written Chinese does."

Chiang is right, but not because ASL is manual. The reason ASL and English (both written and spoken English) are so dissimilar is that ASL has no historical connection with English. It's true that ASL is the language of people who live in an English-speaking environment, and that most or all adult ASL speakers (or "speakers", if you want to emphasize the non-oral medium) are literate in English. But that's because they are bilingual, just as most Hispanics and other linguistic minorities who live in the U.S. are bilingual.

ASL developed (in part?) from French Sign Language, which was the sign language known to the people who founded the U.S. school for the Deaf where ASL arose. ASL is at least as different from English, and for that matter from British Sign Language, as French is. Signed languages like ASL are separate from and independent of the spoken languages in their environments; they are not dependent on spoken languages in any linguistic sense.

ASL structural and lexical properties differ strikingly from those of English in many ways. For instance, ASL verbs have inflectional affixes that agree with objects as well as subjects -- like Swahili and Montana Salish and many other languages, but unlike English and French and almost every other Indo-European language. (Most Indo-European languages have subject agreement but no object agreement; the last trace of subject agreement in English is the distinction between present-tense verbs with a third-person singular subject and all other present-tense verb forms, e.g. John sit-s vs. I/you/we/they/the boys sit.)

Posted by Sally Thomason at 08:48 AM


Paul Ford has responded in an admirably reasonable and temperate way to my anti-Passivator screed. I sent him an email to "apologize for getting a bit carried away in criticizing The Passivator -- and especially for taking its flaws out on your semantic web work. That was the kind of Maureen Dowd moment that I ordinarily try to avoid or suppress." I'm impressed by Paul's grace under fire, and his responses about writing, indexing and the Semantic Web are well worth reading.

By the way, Paul mentions that "[he] can't find an email address on [my] website." I guess he means the Language Log site, which I freely admit could use some design improvement -- it's an aesthetically and functionally random evolution from a random initial selection of standard MovableType templates. Somewhere over in the right column of the main page, there is a list of "those posting here from time to time", which gives a home page link for each of us. On mine, my email address is prominently enough displayed for a very large number of spammers to have found it. My home page is also known to Google. But I imagine that flagging and retrieving contact information for weblog authors is something that the Semantic Web could help with, especially if there were some way to do this without making life even easier for would-be spammers.

Posted by Mark Liberman at 07:30 AM

Birdsong and speech: together in the genome?

Kai von Fintel at semantics etc. blogs a Scientific American note to the effect that "Birds Share 'Language' Gene with Humans", along with links to a UCLA press release and two J. Neuroscience articles.

This research has to do with two genes called FoxP1 and FoxP2. The Sciam note is careful to put scare quotes around "language" in "'language' gene", but I suspect that's because they're not sure whether it's right to call birdsong language, and not because they've become convinced that it's misleading to talk about particular genes being "for" higher-level cognitive, behavioral or even structural features.

It's not surprising that birds and humans share particular genes -- after all, we share many genes with yeast. What's interesting is the claim that these particular genes seem to share a functional relationship with vocal learning and/or vocal performance in both zebra finches and humans. However, as I understand the papers involved, the connections so far are exciting and suggestive but far from conclusive. The two new papers suggest somewhat different ideas about the degree of specificity of the associated functions -- "vocal plasticity" or "sensorimotor integration and the control of skilled, coordinated movement" -- and both ideas are speculative. And, of course, treating individual genes as "for" higher-level functions and macroscopic structures is at best a convenient way of talking.

The main evidence from humans is that "[a] point mutation in FOXP2 co-segregates with a disorder in a family ("KE") in which half of the members have severe articulation difficulties accompanied by linguistic and grammatical impairment. This gene is disrupted by translocation in an unrelated individual who has a similar disorder" (quoting from abstract of 2002 Nature article by Enard et al.). Imaging studies (here and here) have shown that the affected family members have abnormalities in various speech-related areas of the brain: "reduced grey matter density bilaterally in the caudate nucleus, the cerebellum, and the left and right inferior frontal gyrus... In addition, increased grey matter density was found bilaterally in the planum temporale."

Michael Ullman and others think that human speech and language involve cooperation and competition between two distinct brain circuits whose functions overlap, one a (temporal/parietal-lobe) semantic memory system for storing and retrieving the properties of morphemes, words and fixed expressions, and the other a (frontal-lobe and basal ganglia) procedural memory system for putting bits of language together in new ways (see abstract here). The FoxP2 mutation reduces grey matter in several parts of Ullman's frontal/basal ganglia circuit: the caudate nucleus (part of the basal ganglia), the cerebellum (also involved in motor control) and the inferior frontal gyrus (related to Broca's area). The same mutation is also associated with increased grey matter (because of developmental compensation?) in the planum temporale, functionally dedicated to sound processing, and part of the temporal/parietal circuit(s) involved in speech and language.

On the basis of comparing human FoxP2 with the same gene in chimpanzee, gorilla, orang-utan, rhesus macaque and mouse, Enard et al. conclude that "although the FOXP2 protein is extremely conserved among mammals, it acquired two amino-acid changes on the human lineage, at least one of which may have functional consequences". They also compared "a segment of 14,063 base pairs (bp) covering introns 4, 5 and 6 of the FOXP2 gene in seven individuals from Africa, four from Europe, one from South America, five from mainland Asia and three from Australia and Papua New Guinea", and found what they characterize as "an extreme skew in the frequency spectrum of allelic variants at FOXP2 towards rare and high-frequency alleles", suggesting that FoxP2 "has been the target of selection during recent human evolution", and that the hominid innovations in this gene occurred "during the last 200,000 years of human history, that is, concomitant with or subsequent to the emergence of anatomically modern humans". A word of caution: according to their model, the most likely time since the innovation is 0 years; they indulge in a certain amount of hand-waving to re-interpret this as "during the last 200,000 years".

The new "birdsong" stuff looks at expression of FoxP1 and FoxP2 in various parts of the brains of various creatures, including not only songbirds but also humans and crocodiles. For those of you who don't have subscriptions, here are the abstracts:

"FoxP2 Expression in Avian Vocal Learners and Non-Learners", by Sebastian Haesler, Kazuhiro Wada, A. Nshdejan, Edward E. Morrisey, Thierry Lints, Eric D. Jarvis, and Constance Scharff, from The Journal of Neuroscience, March 31, 2004, 24(13):3164-3175.

Most vertebrates communicate acoustically, but few, among them humans, dolphins and whales, bats, and three orders of birds, learn this trait. FOXP2 is the first gene linked to human speech and has been the target of positive selection during recent primate evolution. To test whether the expression pattern of FOXP2 is consistent with a role in learned vocal communication, we cloned zebra finch FoxP2 and its close relative FoxP1 and compared mRNA and protein distribution in developing and adult brains of a variety of avian vocal learners and non-learners, and a crocodile. We found that the protein sequence of zebra finch FoxP2 is 98% identical with mouse and human FOXP2. In the avian and crocodilian forebrain, FoxP2 was expressed predominantly in the striatum, a basal ganglia brain region affected in patients with FOXP2 mutations. Strikingly, in zebra finches, the striatal nucleus Area X, necessary for vocal learning, expressed more FoxP2 than the surrounding tissue at post-hatch days 35 and 50, when vocal learning occurs. In adult canaries, FoxP2 expression in Area X differed seasonally; more FoxP2 expression was associated with times when song becomes unstable. In adult chickadees, strawberry finches, song sparrows, and Bengalese finches, Area X expressed FoxP2 to different degrees. Non-telencephalic regions in both vocal learning and non-learning birds, and in crocodiles, were less variable in expression and comparable with regions that express FOXP2 in human and rodent brains. We conclude that differential expression of FoxP2 in avian vocal learners might be associated with vocal plasticity.

"Parallel FoxP1 and FoxP2 Expression in Songbird and Human Brain Predicts Functional Interaction", by Ikuko Teramitsu, Lili C. Kudo, Sarah E. London, Daniel H. Geschwind, and Stephanie A. White.

Humans and songbirds are two of the rare animal groups that modify their innate vocalizations. The identification of FOXP2 as the monogenetic locus of a human speech disorder exhibited by members of the family referred to as KE enables the first examination of whether molecular mechanisms for vocal learning are shared between humans and songbirds. Here, in situ hybridization analyses for FoxP1 and FoxP2 in a songbird reveal a corticostriatal expression pattern congruent with the abnormalities in brain structures of affected KE family members. The overlap in FoxP1 and FoxP2 expression observed in the songbird suggests that combinatorial regulation by these molecules during neural development and within vocal control structures may occur. In support of this idea, we find that FOXP1 and FOXP2 expression patterns in human fetal brain are strikingly similar to those in the songbird, including localization to subcortical structures that function in sensorimotor integration and the control of skilled, coordinated movement. The specific colocalization of FoxP1 and FoxP2 found in several structures in the bird and human brain predicts that mutations in FOXP1 could also be related to speech disorders.

In the unlikely event that you're still with me, you might want to take a look at these neat pictures of the emergence of acoustic structure during birdsong learning.

Posted by Mark Liberman at 06:49 AM

Clitics on Broadway

In creole studies, one is often warned of the dangers of oversimplifying a creole when describing it, out of a bias towards treating it as an abbreviation of the language that provided its words. But as I listened to a stage actor tonight drift into a distortion of spoken English that has always grated on my nerves, I realized that in some ways, conventions of written English can distract native speakers from ways in which our own language is more complex than we are often aware of. It was the grand old clitic problem.

The diligent scholar of Saramaccan Creole, for example, will describe not just one set of pronouns but two: a full set and a "short" set. Thus the full version of "I" is MI, as in MI WAKA "I walk." But in the spoken language, MI is often rendered as just M. This is especially common when MI is used as an object. "Lend me some money" is LENI-M SO MONI.

Things are even clearer with the object pronoun in the third person singular. The full form is EN, as in MI DU EN, "I am doing it." EN is pronounced as EH with nasality, as in the vowel in French CHIEN "dog." But after, for example, verbs ending in A, EN gloms onto the verb and turns the final A into an EH sound, creating a long, nasalized EH. PAKA is "pay," but "pay him" is not PAKA EN, but "PAKEHHN." Here the special form is less "short" than just different. Overall, this is what a linguist calls a clitic form of the pronoun. There are the free pronouns and the clitic forms -- MI and M, EN and "EHHN."

But it's easy for even a well-intentioned layman to dismiss these clitic forms as "sloppy," mere "contractions." Indeed missionaries in the eighteenth century who transcribed Saramaccan tended to substitute the full forms, such that one would barely know the clitic ones existed from their documents, despite the fact that they did catch many other subtle aspects of the grammar without being linguists. Granted, there is a possibility that the clitic forms had not evolved yet 200 years ago. But then even today, my main Saramaccan informant tends to make the same substitutions in his e-mails to me in Saramaccan, apparently out of a sense that the clitics are just "accidents," when in fact they are very precisely conditioned. For example, EN has the effect I described not only after verbs ending in A, but also ones ending in the EH sound -- but then not the similar AY sound, or EE, OH or OO.
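To make "very precisely conditioned" concrete, here is a toy sketch of the rule as described above. It is an illustration in the informal capital-letter transcription used in this post, not a phonological analysis: the function names are hypothetical, the final vowel sound is passed in as a label rather than worked out from the spelling, and the sketch assumes that a fusing final vowel is written with a single letter (true of PAKA, not checked for anything else).

```python
# Toy model of the conditioning of the clitic form of EN, as described in the post.
# Sound labels ("A", "EH", "AY", ...) follow the post's informal transcription.

# Final vowel sounds after which EN fuses with the verb.
FUSING_FINALS = {"A", "EH"}
# Final vowel sounds after which EN stays a separate word.
NON_FUSING_FINALS = {"AY", "EE", "OH", "OO"}


def en_fuses(final_sound: str) -> bool:
    """Return True if EN fuses with a verb ending in this vowel sound."""
    if final_sound in FUSING_FINALS:
        return True
    if final_sound in NON_FUSING_FINALS:
        return False
    raise ValueError(f"no rule stated for final sound {final_sound!r}")


def attach_en(verb: str, final_sound: str) -> str:
    """Combine a verb with the object pronoun EN.

    Assumption of this sketch: when EN fuses, the verb's final vowel is
    spelled with exactly one letter, which is replaced by the long
    nasalized vowel written "EHHN" in the post.
    """
    if en_fuses(final_sound):
        return verb[:-1] + "EHHN"
    return f"{verb} EN"


if __name__ == "__main__":
    print(attach_en("PAKA", "A"))  # fused form
    print(attach_en("DU", "OO"))   # EN stays separate
```

The point of splitting the two sets is that the clitic form is predictable from the verb's final vowel alone, which is exactly what an "accident" would not be.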

I notice that written English distracts us from our own short form pronouns just as much as Saramaccan writing used to (and still does, to an extent -- LENI-M SO MONI would often be rendered as LENI MI SO MONI by people fluent in the language). This reveals itself in the theatre, of all places.

Tonight an actor said AND THAT'S WHY I'LL TELL THEM AS SOON AS I CAN in rapid, casual style, but he inserted a note of falseness by pronouncing THEM as "THEHM" rather than the way any native English speaker would pronounce it in that sentence, "THUM." "THEHM" did not aid clarity in any way -- if he had said "THUM" the audience would have still known exactly what he was talking about. He said "THEHM" out of a sense that this is what the word "really is."

But actually, "THEHM" is just the full form. "THEHM we can talk about," for example. "Me and THEHM went yesterday." But just as often, English makes use of a second form, the short one, THUM. By no means a lapse or mere static, THUM is absolutely required of anyone who wants to speak English without sounding like a Martian, or a competent but not quite acclimated newcomer to the language. But because our writing conventions "unravel" the language and transcribe both the full and short forms as THEM, the actor is often distracted into supposing that always saying "THEHM" is good form, "rendering the text properly." Actors erupt in these phony "THEHM"s all the time -- I have even heard actors pull this when spouting the vibrantly choppy, earthy vernacular of David Mamet plays.

This reminds me of when I was in a play years ago and a persnickety director insisted that I always pronounce YOUR as "YORE" instead of YER. This meant that I had to render a line like WE'RE INTERESTED IN YOUR ABILITY TO PERFORM as "WE'RE INTERESTED IN YORE ABILITY TO PERFORM," as if it would have confused the audience for me to say YER, or would have made me seem somehow déclassé when I say YER all the time in real life and have yet to be mistaken as a longshoreman or rejected by a woman's family as "Not Quite Our Class, Dear." After all, WE'RE was okay -- why not YER? That written English lets contractions squeak through but not our short form pronouns is essentially a happenstance.

If English were spoken by seven and a half people in a rain forest in Malaysia and a modern linguist described it, they would likely list a paradigm of short forms, such that alongside YOU, HIM, HER, and THEM there would be YA, IM, ER, and THUM with an alternate UM described as "typical of rapid speech" (TELL 'UM TOMORROW).

But in real life, beyond the obscure realm of professional linguists' treatments of spoken English, a sense reigns that our language has a single set of pronouns glistening pristine, unaffected by the slings and arrows of outrageous phonetic erosion, as if the language was born yesterday.

Posted by John McWhorter at 12:54 AM

April 06, 2004

Language Log is #1 for stupid ideas

At this moment, Google lists 1,260,000 pages in response to a query about stupid ideas, and Geoff Nunberg's post about Samuel Huntington is the first of all of them.

The same post is #13 (of 7,880,000) in response to a query about smart people, just behind Robert J. Sternberg's book Why Smart People Can Be So Stupid, blurbed as follows by Yale University Press:

One need not look far to find breathtaking acts of stupidity committed by people who are smart, or even brilliant. The behavior of smart individuals--from presidents to prosecutors to professors--is at times so amazingly stupid as to seem inexplicable. Why do otherwise intelligent people think and behave in ways so stupid that they sometimes destroy their livelihoods or even their lives?
While many millions of dollars are spent each year on intelligence research and testing to determine who has the ability to succeed, next to nothing is spent to determine who will make use of their intelligence and not squander it by behaving stupidly. Why Smart People Can Be So Stupid focuses on the neglected side of this discussion, reviewing the full range of theory and research on stupid behavior and analyzing what it tells us about how people can avoid stupidity and its devastating consequences.

I particularly like the idea of supplementing intelligence tests with stupidity tests. ETS has been missing the boat on this one for some time.

I've also observed that sometimes it takes a really smart person to have a really spectacularly stupid idea. You have to be smart to be able to think of some of the really complicated dumb stuff that people come up with, but that's not what I mean. I'm talking about the simple idea that is so obviously wrong that any half-wit can see that it don't have a chance, except for someone who is brilliant enough to work out the reasons that it's nevertheless deeply true. If the originator is also persuasive enough to get others to go along, then you've really got trouble. I'd give examples, but professional courtesy forbids it.

Posted by Mark Liberman at 08:56 PM


At the end of a recent post, I indulged in a little rant on the topic of people who pontificate ignorantly about language. To save you the trouble of following the link, here it is again:

I hate this role of correcting elementary errors of linguistic analysis, or questioning unthinking prescriptions that are logically incoherent, factually wrong and promptly disobeyed by the prescriber. Historians aren't constantly confronted with people who carry on self-confidently about the rule against adultery in the sixth amendment to the Declamation of Independence, as written by Benjamin Hamilton. Computer scientists aren't always having to correct people who make bold assertions about the value of Objectivist Programming, as examplified in the HCNL entities stored in Relaxational Databases. The trouble is, most people are much more ignorant about language than they are about history or computer science, but they reckon that because they can talk and read and write, their opinions about talking and reading and writing are as well informed as anybody's. And since I have DNA, I'm entitled to carry on at length about genetics without bothering to learn anything about it. Not.

I want to insist that this has nothing to do with professionalism. There are plenty of careful, interesting, creative discussions of language-related matters from people who are not professional linguists and may not even have any formal training in the field. For example, "Tenser, said the Tensor" has just described a story by Ted Chiang "about a linguist trying to learn an alien language and writing system". TstT also links to an interview with the author, and points out that Chiang identifies himself as a linguistic autodidact:

Q: In your Story Notes at the end of the collection, you talk about the physics underlying "Story of Your Life," but what really fascinated me about the piece was the discussion of linguistics. What is your background in this field and how did you go about constructing the grammatical oddities of the heptapod's written language?
A: I have no formal background in linguistics, but I'm interested in the subject. I did some reading about how field linguists study a new language, and it occurred to me that if we ever meet a technologically sophisticated species and try to learn their language, we might make better progress by learning its written form. However, I wanted the writing system to be really alien, just as I wanted the heptapods themselves to be, so I tried to make it as different from human writing systems as possible. One particular inspiration was sign language, which has a three-dimensional grammar unlike anything in spoken languages. There's no good way to transcribe sign language; written English has about as much to do with American Sign Language as written Chinese does. I was fascinated by the differences between sign language and spoken language, and tried to imagine an analogous form of language that was designed purely to be written, without being a transcription of speech.

I read Chiang's story a few years ago, when it was reprinted in The Year's Best Science Fiction #16 (1999), and I enjoyed it very much. It's based on an idea about the influence of linguistic structure on the perception of time that I found implausible, though thought-provoking. And the story has an emotionally moving take on the relationship among language, thought and life, using the metaphor of variational principles in physics like Fermat's Principle of Least Time.

I agree with TstT that Chiang's "use of linguistics terminology as well as his description of the theoretical attitudes and research methods of a working linguist really rang true." The interview quoted above does hint that Chiang might be missing some things. It's wrong to say that "there's no good way to transcribe sign language" (look here and here), though it's true that there isn't an orthography widely used among signers. And while it's true that there are three- (or four-) dimensional aspects of sign language structure that are "unlike anything in spoken languages", it would be wrong to conclude that sign language grammar as a whole is completely alien. However, it's clear from Chiang's story that he's a smart person who has thought seriously about a wide range of language-related issues, and that he's read widely about language and linguistics, and paid attention to what he read.

So this is not a question of professional qualifications. Just as there are plenty of careful, interesting, creative discussions of language-related matters from people who are not professional linguists, I'm afraid that there is also a certain amount of silly stuff from people with degrees and even jobs in the field. But that's a topic for another post or two.

Posted by Mark Liberman at 05:03 PM

The web through 3-D glasses?

It's the latest thing. Q. Pheevr explores googletheming and googlerheming.

Posted by Mark Liberman at 03:13 PM

Good for the Jews?

Not that the world was exactly waiting breathlessly for this one, but the UVa Victorianist Chip Tucker has very obligingly helped me to pin down the meaning of the reference to "a Semitic guess" in Browning's "Easter Day," which I was puzzling over in an earlier posting. According to a note in Ian Jack et al.'s 1991 Oxford edition of Browning, the phrase evokes "a hypothesis about some difficult point in Hebrew, or some other Semitic language," and a note in the Yale Poets edition more-or-less concurs -- hence, I suppose, it's a conjecture about some obscure philological nicety. Even given Browning's demonstrated interest in philology, it sounds a little far-fetched, but then so do the explanations of most of Browning's references -- it's Semitic guesses all the way down.

Posted by Geoff Nunberg at 01:11 PM

Grammar Gods

Grammar God!
I am a Grammar God. I received this designation by taking this grammar quiz. The results do not include one's actual score on the twenty questions, but I must have done well to be designated a Grammar God.

The quiz tests your knowledge of traditional prescriptive grammar, the sort of thing that fussy English teachers used to worry about and that some pundits still do. Some of the questions deal with dialectal and register variation. For example, one gives you a choice between sneaked and snuck as the preterite of sneak. Another dealt with the dreaded hypothetical: If I were going to go there, .... You're not supposed to say was instead of were. As it happens, I speak a dialect of English that is very conservative in some areas and this comes naturally to me, but I suspect that the great majority of English speakers naturally say was.

Some questions didn't really have to do with grammar at all, but with spelling. One called for the correct spelling of the possessive form of Chris. Is it Chris', Chris's, Chrises etc.?

What shows how silly this sort of thing is is the question about shall and will. There is some nonsense in school grammar about when to use which, so that, if I recall correctly, you are supposed to say things like:

Shall I go to the store, or will you?
My colleagues with expertise in English, such as Geoff Pullum, may be able to elucidate this further, but as I understand it, no naturally occurring variety of English has ever followed the rule with which the prescriptivists plague us. It is completely artificial, and unlike some artificial rules, does not even serve a useful purpose, such as avoiding ambiguity.

Along with the designation of Grammar God I received this encomium:

If your mission in life is not already to
preserve the English tongue, it should be.
Congratulations and thank you!
The author of this quiz probably has good intentions, but I'm afraid that my mission in life is not "to preserve the English tongue". For one thing, it doesn't need saving. English is spoken natively by hundreds of millions of people and as a second language by hundreds of millions more. I'll devote my efforts to languages that are actually endangered, such as the native languages of British Columbia.

Of course, the author may be concerned with the preservation of the English language in another sense. He or she probably has the idea that to the extent that prescriptive rules are not followed, the language is somehow deteriorating. Languages actually do deteriorate. There are phenomena associated with language death, a subject developed most prominently by Nancy Dorian, in which the grammar is simplified, the sound system is modified, and most importantly, the vocabulary becomes restricted as the situations in which the language is used become fewer and fewer. But this is something that happens to dying languages, not languages like English that are growing. By any real measure of linguistic vitality or expressiveness, it makes not the slightest difference whether you use will or shall according to prescriptive rules or how you spell the possessive form of words ending in <s>. It is silly to think that English is in need of preservation because people don't follow arbitrary rules.

The relationship between language and thought is a topic that linguists usually avoid but is dear to everyone else. In spite of all the controversy over the subtleties, the most important aspect of this relationship is clear and uncontroversial:

No matter how well you express yourself, if you don't think clearly, what you say won't make sense.

Prescriptive grammar has very little to do with maintaining the clarity and precision of the language. What it really has to do with is maintaining the dominance of the upper classes and enforcing social norms. It used to be that only the wealthy had access to the kind of education that would provide knowledge of the particular type of English enshrined in the prescriptive standard, so discriminating against people who did not command this type of English helped to preserve class distinctions and to keep the lower classes in their place by making them believe that because they spoke differently they were inferior. This isn't as true as it once was, but the idea persists. Prescriptive grammar is a tool of the kleptocracy.

Posted by Bill Poser at 12:27 PM

The Passivator

Grammar advice in turquoise and lemon, from Paul Ford, the guy who brought us the ontological guide to Harper's Magazine. Though The Passivator is billed as a "passive verb and adverb flagger", it just flags certain strings of characters -- final "-ly" for alleged adverbs, forms of "to be" for alleged passives.
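To make the objection concrete, here is a minimal sketch of the kind of string matching described above. It is not Ford's code: the function name, the word list and the labels are all assumptions made up for illustration. It only shows why matching on final "-ly" and on forms of "to be" flags things that are neither adverbs nor passives, and misses passives formed without "to be".

```python
import re

# Assumed word list for illustration: common forms of "to be".
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}

def naive_flags(sentence):
    """Flag words by spelling alone, with no grammatical analysis.

    Hypothetical helper: returns (word, label) pairs in sentence order.
    """
    flags = []
    for word in re.findall(r"[A-Za-z']+", sentence):
        lower = word.lower()
        if lower in BE_FORMS:
            # Any form of "to be" is treated as an alleged passive.
            flags.append((word, "passive?"))
        elif lower.endswith("ly"):
            # Any word ending in "-ly" is treated as an alleged adverb.
            flags.append((word, "adverb?"))
    return flags

# "friendly" is an adjective and "was tired" is not a passive,
# yet both are flagged.
print(naive_flags("The friendly cat was tired"))

# A genuine passive formed with "got" is not flagged at all.
print(naive_flags("He got fired"))
```

Telling a true passive from a copula followed by an adjective takes syntactic analysis, which a matcher of this kind never attempts.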

Rory Erwin pointed out to Ford that "to be" has other uses, but Ford decided that this is OK because these other uses are probably bad too:

It is true that be-form verbs do not always indicate passive construction, but I've found that be-form verbs, when they indicate tense, often appear in sentences that could do better. Sometimes they can just be omitted. “The press seems as gullible today as they were when they bought his claim.” could also be “The press seems as gullible today as when they bought his claim.”

Sometimes such constructions indicate soft thinking: “The cat was tired,” or “Jim Kerry was angry about the recent vote” aren't passive, but neither sentence does much work, and if a piece contains many of them it can indicate laziness on the part of the writer. Sentences should take responsibility for themselves: “The cat, sleepy, rubbed David's ankles and mewled—and was ignored, her desires lost in the gap of language,” or “Angry and frustrated despite the applause, John Kerry stood at the podium, preparing a response to the just-announced vote in favor of the budget.”

Words -- adverbs, passives, "be-forms" and all the rest of them -- fail me.

I'll limit myself to one small comment. Ford suggests that the sentence "The cat was tired" should be replaced -- because it "should take responsibility for itself" -- by "The cat, sleepy, rubbed David's ankles and mewled -- and was ignored, her desires lost in the gap of language." The proposed replacement is certainly more self-consciously writerly, as well as nearly six times longer. But didn't Ford notice that it also introduces an actual instance of the dreaded passive, "... was ignored ..."?

A few months ago, when Ford wrote a defense of the Semantic Web, I gave him the benefit of the doubt. The application that he delivered, in the form of Harper's Connections, was not overwhelming. The Passivator, an unusually confused and thoughtless implementation of dubious grammatical advice as eye candy, makes me wonder. He takes a bad idea, misunderstands it, applies it earnestly and systematically in a visually attractive form, and then rationalizes its failures as features. Is this what future Semantic Web applications will be like?

[Update: although words failed me, they didn't fail several correspondents, who sent a variety of fluent criticisms and fulminations. For example, Daniel Ezra Johnson wrote:

I can't believe you left this other recast sentence out:

"'He walked into the room. Sally was typing a report.' could become, 'Turning the corner, he heard the sound of Sally's fingers on the keyboard, as she typed her weekly report.'"

As Daniel pointed out, this is subtle but effective revenge on the originators of the anti-passive campaign, such as Strunk and White, who must be writhing in their graves to see what they have wrought.

As for the rest of the critique -- "soft thinking" next to "Jim" Kerry, "be-form verbs" instead of "forms of to be", the peculiar connection between tensed forms of to be and sentences that "could do better", and the general strangeness of eschewing "laziness" by filling sentences up with unmotivated appositives and irrelevant details -- well, erro longus, vita brevis. Life is too short.

Also, I have to say that I hate this role of correcting elementary errors of linguistic analysis, or questioning unthinking prescriptions that are logically incoherent, factually wrong and promptly disobeyed by the prescriber. Historians aren't constantly confronted with people who carry on self-confidently about the rule against adultery in the sixth amendment to the Declamation of Independence, as written by Benjamin Hamilton. Computer scientists aren't always having to correct people who make bold assertions about the value of Objectivist Programming, as examplified in the HCNL entities stored in Relaxational Databases. The trouble is, most people are much more ignorant about language than they are about history or computer science, but they reckon that because they can talk and read and write, their opinions about talking and reading and writing are as well informed as anybody's. And since I have DNA, I'm entitled to carry on at length about genetics without bothering to learn anything about it. Not.]

[Update 2: see Paul Ford's response, and my re-response.]

Posted by Mark Liberman at 07:59 AM

What a difference a word makes

According to this article in today's NYT, a wildlife preserve in South Carolina deleted one word and substituted another in its name, and thereby more than doubled the monthly count of visitors, while a local kayak and canoe rental place tripled its rentals.

The old name: the Congaree Swamp National Monument. The new name: Congaree National Park. The story emphasizes the effects of removing "swamp", though it's clear that the change from Monument to Park was also relevant. The park's naturalist says that "I used to be the loneliest ranger in town. Now the phones are ringing off the hook. ... If I can use a terrible pun, we're getting swamped." The article's headline is: "Park Is Still a Swamp, but Please Don't Tell the Tourists".

OK, point taken. Lose the word swamp in naming vacation destinations. And now for a different word. With respect to the recent Falluja massacre, Max Sawicky argues that

To me the term mercenary connotes someone willing to covertly commit war crimes and provide support for illegitimate military missions. ... A contractor in this theater is not a mercenary in my view.

Max gets a lot of grief in the comments: "A mercenary is a soldier hired into foreign service - refer to dictionary " was one of the milder responses.

Well, let's do that. The OED's first two definitions for mercenary are

1. A person who works merely for money or other material reward; a hireling. In later use (prob. influenced also by sense 2): a person whose actions are motivated primarily by personal gain, often at the expense of ethics.

2. a. A person who receives payment for his or her services. Chiefly and now only: spec. a soldier paid to serve in a foreign army or other military organization.

I'm not sure where the line is, in this case, between denotation (which the American Heritage dictionary defines as "the specific or direct meaning of a word") and connotation ("the set of associations implied by a word in addition to its literal meaning"). The thing is, mercenary has some nasty literal meanings -- "works merely for money", "hireling", "motivated primarily by personal gain", "at the expense of ethics" -- and you can't completely put those aside when you pick up on the specific sense of "soldier paid to serve in a foreign army". Consider some of the OED's non-military citations, from the 16th to the 20th century:

1532 T. MORE Confut. Tyndale in Wks. 362/2 They holde that it is not lawfull to loue..God..for obteining of reward, calling this maner of loue..seruile bonde and mercennary. ...
1690 W. TEMPLE Misc. II. i. 68 Learning has been so little advanced since it grew to be mercenary. ...
1781 W. COWPER Hope 333 His soul abhors a mercenary thought, And him as deeply who abhors it not.
1837 H. MARTINEAU Society in Amer. III. 128 The disgusting spectacle of mercenary marriages. ...
1913 T. HARDY Changed Man 275 No man when he first becomes interested in a woman has any definite scheme of engagement to marry her in his mind, unless he is meaning a vulgar mercenary marriage. ...
1990 G. ROBERTSON Media Law 17 The law of England is indeed,..a law of liberty; but the freedoms it recognises do not include a licence for the mercenary betrayal of business confidences.

"Servile", "abhors", "disgusting", "vulgar", "betrayal". Definitely a word from the moral swamps.

So maybe the armed civilian guards who were ambushed in Falluja were hired soldiers, in a technical definition of the term, and maybe not -- it depends on their job description. You wouldn't call Brinks guards "soldiers", for example. But whether to call the Falluja victims "mercenaries" or not depends as much on your politics as on the facts, given the denotations as well as connotations of the word. Sawicky is right about that.

Posted by Mark Liberman at 06:55 AM

April 05, 2004


Mark Hurst suggests a new game: googlephrasing.

His suggestions:
"I've always wanted to go to"
"is the best movie i've ever seen in my life"
"surprisingly, i actually liked"
"there's absolutely no reason to believe"

Some snarkier ones:
"Why in the world did you"
"Who ever said that"

and a neutral pattern:
"is the * person I've ever"

(depending on whether * is cutest, funniest, smartest, coolest, or dumbest, nastiest, weirdest, sickest). Similarly

"is the most * person I've ever"

Posted by Mark Liberman at 08:03 PM


In his discussion of "foreign teachers as economic migrants", Scott Sommers ends by focusing on their (mostly future) children, for whom he depicts a bleak future. I know quite a few "mishkids", American missionaries' children raised in foreign countries who've turned out very well as linguists or linguist-like people. Of course, I'd be less likely to know the ones who didn't make it.

Sommers writes:

Imagine the next generation of the world I am talking about. There will be children of career English teachers born in countries where the major language is not English. Their parents will lack the job skills and education to return to an English-speaking world where their skills in the culture market have no value. They will lack the language skills in either English or Mandarin to become professional workers in either cultural world. Without the legal guarantees of colonialism, such children will not be able to do anything except move down the occupational food chain. They may become workers in restaurants or stores where only low-levels of language skills are necessary. They may even end up working in local industries where foreign language skills aren't important.

Searching for mishkids and missionary kids turns up some cases that look good:

"John Hersey, former Time correspondent and winner of a Pulitzer Prize, ... told of his early close relationship with Mr. Luce [that's Henry R. Luce, founder of Time] based in part on their both being 'mishkids' the children of missionaries in China."

as well as some sites devoted to cases that don't.

Here is a site devoted to TCK -- "third culture kids" -- under the headings of "missionary kids", "military kids", "diplomat's kids" and "business expat's kids". There's no category for "foreign language teacher's kids" -- maybe there aren't enough of them yet? This page on the site has some statistics on careers, education and relationships of adult TCKs, which includes the fact that "73% graduated from university (only 21% of American population graduates)". This makes it sound like a career flipping burgers, whether in Boston or Beijing, is probably not a dominant outcome.

[Update 4/6/2004: Scott Sommers comments further on this question, suggesting that the experience of children of foreign teachers in the Far East will be "very different from that of the missionaries, diplomats, expat businesspeople, and military personnel", especially in terms of the nature of supporting organizations (churches, companies, embassies, etc.) and the nature of on-going ties to the country of origin, "compounded by legal and social systems that actively discourage legal immigration". These are good points and his whole discussion is worth reading, especially since the scale of the problem is potentially so large.]

Posted by Mark Liberman at 11:46 AM

The inner necessity of phonetic metalanguage

From an exchange on Slate between Jeffrey Goldberg and Leon Wieseltier about Wieseltier's guest spot as "Stewart Silverman" on The Sopranos, another argument for elementary education in linguistics.

Goldberg: ... Your enunciation of the word "motherfucking" was perfect. I smell Emmy.

Wieseltier: ... I am delighted that you recognize the sociolinguistic analysis that went into the enunciation of my searing expletive. These things are not as easy as they seem. Needless to say, when I first read my lines I discovered parts of myself I never knew existed. As I pondered the character of Stewart Silverman, I began to grasp the inner necessity of the hard "g" in my "motherfucking." Our Italian-American brothers and our African-American brothers might surrender the concluding letter of the exclamation, so as to establish some integrity on the street.

But Stewart Silverman lives in perfect horror of the street. He doesn't even park on the street. ...such a fellow is a long way from authenticity. And so he would land very hard on that "g". He didn't go to BU for nothing. This is a man who is this week boasting to anybody who will listen that he once flew into West Palm on the same plane as Peter Bacanovic. In sum: motherfuckinggg.

So Wieseltier knows what sociolinguistic means. And his analysis of the relevant aspects of the sociolinguistic difference between "-ing" as [ɪŋ] and "-ing" as [ɪn] is not only vivid, but also accurate as far as it goes. So it's too bad that he doesn't know that there's no "g", hard or soft, anywhere in the picture: the more informal (and older) version (often spelled -in') has a coronal nasal [n], where the more formal (and innovative) version has a velar nasal [ŋ]. People often talk about "g dropping" because of the orthographic conventions; Wieseltier compounds the error by suggesting that a retained (orthographic) "g" must be "hard" -- though the usual ordinary-language meaning of "hard g" is the sound of the first "g" in gorge, as opposed to the "soft" (i.e. palatalized) sound of the second "g" in the same word.

The sad truth is that even if Wieseltier had a clue about the phonetics, there's no good way to talk about it that his audience would have understood. He could have said "...the inner necessity of the '-ing' in my 'motherfucking'", I guess, but that's just performing the morpheme, not talking about its performance. Even highly-educated Americans have no metalanguage for talking about the sounds of English, except for ambiguous and inconsistent references to the ambiguous and inconsistent orthography.

I should say that I have not seen the Sopranos episode in question, and so it remains possible that Wieseltier actually enunciated a voiced velar stop [g] at the end of the cited word, as a sort of spelling-pronunciation hypercorrection. But I doubt it.

[Update: more on this here.]

Posted by Mark Liberman at 10:50 AM

English teachers as economic migrants

Scott Sommers (here, here, here and here) and Kerim Friedman have been discussing English teachers in East Asia as economic migrants. Scott writes: "I estimate that almost a million foreign teachers have been employed in Japan, South Korea and Taiwan since 1990." I wonder what the balance of teachers, so to speak, has been? Have countries like the U.S. and the U.K. exported more language teachers than they've imported? One major difference, I think, is that few foreign-language teachers in the U.S. are recruited directly from overseas, as opposed to being hired from the pool already living locally. Presumably this in turn is because of differences in immigration patterns and policies.

Posted by Mark Liberman at 08:33 AM

Circolwyrde Wordhord

From Carl Berkhout's "glossary, which ... is intended for Anglo-Saxonists and other speakers of English for whom the language of the computer world has become alien and largely incomprehensible" you can learn that "relational database management" is nothing but plain old cneomæglicgifhordonweald, while "file transfer protocol" is just dælgewrixlendebyrdnes.

However, the lexicographer's job is never done, so you'll search Carl's wordhord in vain for the new coinages in Veen's IA Jargon Watch, of which my favorite is S2BU. I get those a lot, and never knew what they were called in the trade.

[Wordhord via Emerging Communications]

Posted by Mark Liberman at 06:32 AM

OFAC Comes to Its Senses

A month ago Chris Potts and I commented on the Treasury Department's bizarre position that the US law restricting trade with certain countries forbids scientific journals from editing papers originating in those countries. They have now come to their senses. The New York Times reports that the Institute of Electrical and Electronic Engineers has received a letter from the Treasury Department retracting its previous position. Maybe now, as Geoff Pullum suggests, we can get on with jailing copy editors for the right reasons.

Posted by Bill Poser at 03:08 AM

April 04, 2004

Your Semitic Guess is As Good As Mine

There was a cartoon in Punch once that showed a Victorian lady talking to a bookseller:

Lady: Give me another poet, I can't understand this Browning!
Bookseller: Praed?
Lady: Yes, but praying didn't help.

I was put in mind of this when I ran across the following lines from Browning's "Easter Day" in the course of looking to see how Semitic was used in Victorian times:

So that, subduing, as you want,
Whatever stands predominant
Among my earthly appetites
For tastes, and smells, and sounds, and sights,
I shall be doing that alone,
To gain a palm-branch and a throne,
Which fifty people undertake
To do, and gladly, for the sake
Of giving a Semitic guess,
Or playing pawns at blindfold chess.

Does anybody in the blogosphere have a clue what Browning might have meant by a "Semitic guess"? Neither that phrase nor for that matter "Jew's guess," "Jewish guess," etc. turns up elsewhere in any searches I've done in Victorian literature or in 19th-century books and newspapers, but it has the sound of an expression that Browning's readers would have recognized. If you have specific knowledge of the phrase, please drop me a line at nunberg-at-csli.stanford.edu

Posted by Geoff Nunberg at 10:23 PM

Lemony Snicket barred from speaking

Neil Gaiman posts a letter from "Lemony Snicket" (Daniel Handler) about a scandalous series of events at the Academy of Art University in San Francisco. According to the letter and this SF Chronicle story, a writing instructor named Jan Richman was disturbed enough by a grisly short story submitted by a freshman in her class that she went to her department coordinator for advice. Within a week, the student had been expelled (or at least sent home), and Richman had been fired (or more exactly not renewed, as she was on a term-by-term arrangement).

What the student did wrong was to write and submit an assignment that Richman says "was full of sex and violence, incest, pedophilia. There was no story, no character development -- just hacking up bodies". What Richman did wrong, according to the university, was to have the students read a story by David Foster Wallace called "Girl with Curious Hair", which was not part of the authorized textbook, and which features an unsympathetic character named "Sick Puppy", who is apparently well named.

David Foster Wallace may be an arrogant and careless prescriptivist (scroll down to the bottom of this page for Language Hat's extensive critique of his 2001 Harper's Magazine screed on the usage wars), but firing a writing instructor for assigning his short stories is going too far. And expelling or suspending a college student for writing something (in this case, his own nasty story in imitation of Wallace) is wrong, in my opinion and that of many others.

Handler's letter adds that

the school has responded by announcing stringent policies regarding the content of students' artwork (writing, visual art, film, video game design, etc.), what can be taught in the classroom, and who is allowed to speak on campus. This was brought home to me when an instructor at the college invited me to speak to his class (along with the fired teacher and a representative of the First Amendment Project) and I was physically barred from entering the building.

[via Scott McCloud's Morning Improv]

[Update: more on this at RotR, including the news that the university now "require(s) all instructors to approve any supplemental instructional materials through [the] administration. Students are no longer permitted to distribute their work to fellow students. The teacher must now see it and approve it first." ]

Posted by Mark Liberman at 09:23 PM


Eddie at Romanika has a long and interesting discussion of Spanglish in Action, starting from the example "Los papeles no matcheaban".

See this post for a summary of Ruth King's ideas about some striking syntactic as well as lexical effects in Canadian French/English contact phenomena. She argues that some varieties of Canadian French have borrowed English-style "preposition stranding" along with English prepositions, as in "Le gars que je te parle de." I wonder if Spanglish varieties show either borrowed prepositions, borrowed verb-preposition pairs (like Prince Edward Island French "se dresser up" or "singler out"), or preposition stranding?

My favorite Spanish/English contact story concerns a traditional cold remedy. I can't recall where I heard or read this story, unfortunately, and for once Google is no help. Anyhow, here's how I remember it:

A guy from a Caribbean Spanish background grew up in the U.S. One of his most vivid and cherished childhood memories was the smell of the medicine that his grandmother would treat him with when he had a cold. She would make up a kind of poultice and rub it into his chest and back. The medicine gave a pleasant warm tingling sensation to his skin, and its strong smell would clear his head. Her word for this special, magical, traditional remedy was bibaporú.

When he was grown up, his own child got a bad cold, and he had the idea of trying to get or make some of this medicine. His grandmother had passed on, so he asked his parents about bibaporú (which he anglicized to "bebop-aroo"). What was in it? where could he buy some? or how could he make some? After his parents got over their surprise, they told him to try the corner drugstore. Bibaporú turned out to be Vicks Vapor-Rub -- as assimilated to the norms of Caribbean Spanish phonology: "v" is [b] word-initially; syllable-final stops are deleted; the vowel of vicks (IPA [ɪ]) becomes the vowel of bee (IPA [i]); etc.

[Update 12/10/2007 -- Alan Shaw writes:

I'm just catching up on recent Language Log entries and was taken aback by your reference on November 19 to the bebop-aroo effect. There was absolutely no need for me to follow the link to your 2004 post (which appeared long before I started reading Language Log) to know what it was, for I have experienced it in my own life!

My father owned and ran a drug store in the Times Square area from the 1950s to the 1980s, and when I was a teenager I helped out behind the counter quite a bit. Many of our customers spoke no English, and I was just learning Spanish. My first milestone in divining the customer's needs was when I caught just the first few words of a sentence "Yo quiero algo..." and ran back to ask Dad what and where the algo was. Later on I had the bebop-aroo request. Really. Dad had obviously heard it before for he knew exactly what it meant.

(Actually it was more like o-rhoo -- sorry for my feeble sound spelling but I'm sure you get what I mean.)]


Posted by Mark Liberman at 06:06 PM

Language Log is a noble gas

An impossibly heavy noble gas. According to Score Bard's Periodic Table of Blogs, that is.

Well, sort of. The main series in the periodic table of the elements has 7 rows and 18 columns, while Score Bard's periodic table of the blogs has 8 rows and 12 columns. So some interpretation is required.

Language Log (symbol Lu) is in the 8th (bottom) row in the 12th (rightmost) column. The rightmost column in any self-respecting periodic table should certainly be the noble gases (Helium, Neon, Argon, Krypton, Xenon, Radon, Ununoctium). By that analysis, we're one row beyond the heaviest so-far hypothesized noble gas, Ununoctium, whose alleged discovery in 1999 was retracted in 2001. K3wl!

Maybe I'm over-analyzing here. Score Bard writes

"This is my blogroll. It's not meant to be authoritative or representative. It's just a color-coded list of the blogs I like. [...] Now, I change it according to my whims. There are no rules."

Posted by Mark Liberman at 10:13 AM

No hurr in Nellyville?

A few days ago, I discussed a New Yorker article by Jake Halpern that mentions "a quirky St. Louis dialect whereby 'here' is pronounced 'hurr' -- as in Nelly's 'Hot in Herre' -- 'there' becomes 'thurr,' and 'everybody' is 'urr'body.'" I wondered at the time what this sound pattern really is. I still don't know, but I've learned a little more. To sum up: Halpern seems to have gotten it wrong. Initial listening reveals that words like there are pronounced in the way he indicates, but here isn't, at least not in the song that he cites. There are some sound clips below, so you can listen for yourself.

The point is not to debunk Halpern, who merely mentions this sound change in passing, in one paragraph of an interesting article that's mainly about other things entirely. But this striking set of pronunciations has made a big impression on the people who listen to St. Louis rap, Halpern included, and I'm curious about what's really going on, and what its history is.

Talking about sound change can be confusing, especially sound change in English, which has lots of distinct vowels and a very phonetically-ambiguous writing system. As Halpern did, it's common to say "X becomes Y", where both X and Y are written in standard English orthography. But is X -- which changes its sound -- just a single word, or a class of words? Usually what changes is a whole class of words that share a certain sound pattern. But then what is the class of words, and what part of their sound pattern is affected? And when we say that in some English dialect "X becomes Y", where Y is just another English word, whose pronunciation of word Y are we talking about? If Y is not a word, but an attempt to use the conventions of English orthography to depict a sound, the situation is worse, because the reader has to decode the spelling as well as allowing for the speaker.

Over the centuries, linguists have worked out some ways to talk about these things without getting too confused. I'm going to use three of these methods here. First, in talking about classes of words that change, I'll start with J.C. Wells' idea of "lexical sets" -- a set of 24 classes of English monosyllables, divided up according to their vowel sounds, in a way intended to work across all English dialects. We're dealing here with Wells' lexical set #22 SQUARE, which also includes words like care, air and wear, and perhaps with his lexical set #19 NEAR, which also includes words like beer, weird and fierce, and his lexical set #9 NURSE, which also includes curb, turn, work and so on. Second, in talking about the way words are pronounced, I'll use symbols from the International Phonetic Alphabet vowel chart, as well as expressions like "the vowel in the way that most Americans say stir" (FWIW, I was raised in eastern Connecticut, but have a few vowels derived from my mother's tidewater Virginia upbringing). Third, I'll link to short illustrative sound clips from the cited songs.

St. Louis hiphop certainly (sometimes?) features a centralized vowel in words from the lexical set SQUARE, as in Chingy's Right Thurr, whose chorus (in standard orthography) is:

I like the way you do that right there (right there)
Swing your hips when you're walkin', let down your hair (let down your hair)
I like the way you do that right there (right there)
Lick your lips when you're talkin', that make me stare

As you can hear for yourself, there, hair and stare are all pronounced so as to rhyme with the way that most Americans pronounce the words in the lexical set NURSE. This is a rhotic (i.e. r-colored) central vowel, IPA [ɚ]. I don't know whether this results in a merger of stare and stir, bare and burr, etc., or whether the NURSE words change their sound to something else in a chain shift.

Bill Labov tells me that one of his students has told him that her roommate from E. St. Louis has this sound change, though "it is not one of the large battery of St. Louis features that have been remarked in the many studies of that city", and none of the four (white) speakers from St. Louis in his Telsur survey showed it. A plausible precedent for this sound change would be the dialects that variously merge subsets of marry, merry, Mary and Murray (although this doesn't apply in closed syllables like there or hair).

But Halpern also asserts that speakers of this dialect pronounce "here" as "hurr", citing Nelly's "Hot in Herre". This was more surprising to me. Does beer really merge with burr? Or is there just something "quirky" about here? Well, neither one, as far as I can tell. Here's the relevant line from Nelly's "Hot in Herre", and it doesn't sound to me like much has happened to the vowel at all, compared to the general American pattern. If it's centralized at all, it certainly hasn't gone all the way to "urr". The centralizing glide at the end of the vowel isn't very strongly rhotic -- r-ful -- but that's not the feature that Halpern was ostensibly talking about.

I don't know whether Chingy's urrification of the SQUARE set is historically independent of the change that produced the pronunciations traditionally written "thar" for there, or "bar" for bear. It'd be nice to learn that Chingy is carrying on the tradition of Boggs in Huckleberry Finn, who "comes a-tearing along ... whooping and yelling ... and singing out 'Cler the track, thar.'"

If we take Twain's transcription at face value, then a hundred and fifty years ago, just a bit further down the Mississippi from St. Louis, there was a chain shift of front vowels before /r/ that would have brought here down to where hair had been, and hair down to "har". This is not quite Nellyville, either geographically or phonetically, but it's close enough to be interesting.

I really like this idea. The only trouble is that I don't hear any evidence of here shifting in Nelly's Hot in Herre. But I notice on amazon.com that Nelly sells two versions of Nellyville -- Nellyville [Explicit Lyrics] and Nellyville [Clean]. Maybe in keeping with Barry Schwartz's theory about the Tyranny of Choice, Nelly has also produced Nellyville [Explicit Lyrics] [E. St. Louis dialect], Nellyville [Explicit Lyrics] [South Side Chicago dialect], etc., and I somehow got the wrong one?

[By the way, here is a brief clip of J-Kwon's pronunciation of everybody in the song "Tipsy", as cited by Halpern. It's hard to tell from a single, rapid, slurred rendition against a musical background, but it sounds like the "every" part has become a single rhotic vowel, which is somewhat centralized, though maybe it hasn't gone all the way to [ɚ]. Note also that in this clip four and floor seem relatively r-less, and the pronunciation of here doesn't sound any more centralized than Nelly's did. But the way to characterize this way of talking would be to analyze some recorded interviews, not to puzzle over a few scraps of song. Finally, a fan site says that "cornell haynes" (i.e. Nelly) "was born in texas, but was moved to spain for three years", so who knows where his speech patterns come from? Chingy and J-Kwon seem to be St. Louis natives.]

Posted by Mark Liberman at 07:18 AM

April 03, 2004


Kate, "an undergraduate student of linguistics somewhere in the American Midwest", has started a weblog entitled Chainik. She poses the question "Seriously, why the blog?" and answers "I got tired of talking to myself", which is the best short answer I've heard yet. Her inaugural post reveals the fact that "in the Midwest, at least, many people have heard" that "Russians have no word for 'fun'", and debunks the legend by providing not only the word itself but also several derivatives.

Posted by Mark Liberman at 07:27 PM

ChatNannies update

If you are a connoisseur of AI hoaxes or bad science reporting, you'll want to keep up with the ChatNannies story. As a long-time aficionado of both, I'm enjoying the show immensely.

This waxy.org page has been following the story since 3/23/2004, and has an ongoing cornucopia of fascinating links and comments, including several striking comments apparently from Mr. ChatNannies himself, Jim Wightman; Michael Williams has an analysis; Cameron Marlow has a report of his own chat with a NannieBot. Ray Girvan has a series of interesting posts. Ray makes the important point that more than two weeks after the original (March 17) articles in New Scientist, the BBC and Reuters, and a week after a skeptical piece by Ben Goldacre in the Guardian, none of these allegedly responsible news outlets has issued a correction or a retraction.

It appears that Wightman, when not reeling in journalists, has spent his time posting on alt.revisionism, both under his own name and under the pseudonyms of "Death's Head" and "Totenkopf". It also seems that in the past, he's made some dubious software claims -- apparently-infeasible software mods to TiVo and XNews -- though nothing on the scale of ChatNannies.

Among Wightman's (apparent) comments on the waxy.org site:

Hehe you guys make me wish for nuclear holocaust, really.
Your arguments are below me.
It is your kind of attitude I attempted to challenge on alt.revisionism (and hey, guess what - if you call me names such as 'antisemite' or 'revisionist' I don't give a f**k because I know the truth) and was increasingly frustrated by the bigotry and ignorance of those that claimed to be 'intelligent'. People that when faced with a 'fact' (whether real or not) would argue that it can't possibly be true because _history_ forbids it. More like herd mentality. And in truth thats the only 'ism' i'm guilty of..antisheepism.
Run along little sheep, your masters are waiting for you! Baa! Baaaaa! Baaaa!

Who could make this stuff up? Well, maybe Thomas Pynchon in his younger days.

Posted by Mark Liberman at 06:41 PM

The sexual energy of chatbots

Clive Thompson at Collision Detection writes about Michal Zalewski's chatbot Catty, which "scours Google to find seemingly appropriate text responses." Although Thompson credits Zalewski with having (re)discovered "the paradoxical rule of chatbots: The more simple and stripped-down the 'bot is, the more lifelike it'll seem", the dialog he quotes suggests that he means "seemingly appropriate" and "lifelike" in a somewhat special sense:

Me: Nice to meet you!
Catty: for neither can he that understands that understand anything else nor he that sees that see any thing else nor hear any other thing nor in sum move the body?
Me: Okay, you've lost me.
Catty: okay i think i understand you better now and i can appreciate and just enjoy the give and take and the sexual energy thats going one right now?
Me: There's already sexual energy in this conversation?
Catty: many people are already doing some of the 8 points suggested in this lifestyle program :-(
Me: Holy moses. You are like ALICE's crazy sister.
Catty: i am like a hole in your shoe a breath of fresh air with every step you take
Me: Yeah, well -- you're sort of proving my point.

Read the whole thing.

[Note that (unlike the apparently fraudulent ChatNannies) this software is for real, and you can download your own copy. It just doesn't seem to work very well (despite being based on a neat idea), even compared to classical AI chatbots like Eliza. See the documentation on Cobot for an example of a chat agent that is for real (as ChatNannies seems not to be), but more capable than Catty. Or browse the 'bots at the Chatterbox Challenge.]

Posted by Mark Liberman at 09:46 AM

The blood of grammarians

Welcome to another language-related weblog, Dave Hosinger's Semantickler, which seems to have started up on 3/30/2004. He blogs an amusing quote from the Wikipedia article on linguistics, which I had not read:

"Linguistics is arguably the most hotly contested property in the academic realm. It is soaked with the blood of poets, theologians, philosophers, philologists, psychologists, biologists, and neurologists, along with whatever blood can be got out of grammarians."

It's remarkable that several generations after our culture stopped forcing children to learn grammatical analysis, the stereotype of the bloodless grammarian persists. Even Geoff Pullum acknowledges the stereotype as he differentiates himself from it.

At least the journalist who was quoted (Russ Rymer) didn't say that "from time to time, the tree of language must be watered with the blood of grammarians." Though based on this note from Vicki Fromkin, and my impressions from reading his book on Genie, that is more or less what he meant.

As for the nature of "contested propert[ies] in the academic realm", I recently heard an old joke (about the difference between capitalism and socialism) rehabilitated to describe the difference between industrial and academic research: "In industry, it's dog eat dog, whereas in academia, it's just the opposite."

Posted by Mark Liberman at 09:06 AM

The right X and the right Y

A BBC story about the death of Queen Juliana of the Netherlands quotes "Petra Graafland, government worker" as saying "she was the right queen for the right time." The point seems to be that the queen's down-to-earth, bicycle-riding ways were appropriate for the Netherlands in the last half of the 20th century. The quotation is an example of an English construction in which parallel modifiers are interpreted in a special way.

When the California District Attorney's Association calls someone "a bad man with a bad record", they mean that the man is bad and that the record is bad, and they are not suggesting (for example) that some other man with the same record would be a fine fellow. However, when the author of this blog entry says that we should "use the right tool for the right job", what he means is that the pairing of tool and job should be right. He is comparing MySQL and PostgreSQL programs, and he explicitly denies that (as of 11/2003) either can be considered "the right tool" independent of the job it's supposed to do, and also denies that one class of application should be privileged over another.

Aside from this special interpretation of parallel modification, English seems to be deficient in easy or idiomatic ways to talk about the properties of relations as distinct from the properties of the items related. Attempts to express such evaluations in a precise way have a sort of 18th-century flavor. Gibbon, when he discussed the division of talents and labors between Balbinus and Maximus, did not talk about "the right Emperor for the right job", but rather wrote that "...[t]he various nature of their talents seemed to appropriate to each his peculiar department of peace and war...". Few people write like that these days -- and few people read those who do, at least not twice.

Google has 115,000 pages in its index for "the right * for the right", with X and Y instantiated as pairs like war / reasons, toy / age child, tree / place, tree / situation, plant / spot, tool / job, tools / trades, format / recipient, agent / customer, person / job, thing / reason, trucks / jobs. After reading a sample of the examples, I conclude that nearly all of them, like the RDB comparison discussed above, intend to attribute rightness to the pairings rather than to the items paired.

The phrasal template is not limited to the preposition "for":

the right X in the right Y 155,000
the right X at the right Y 724,000
the right X on the right Y 14,500

Nor is the adjective "right" required:

the wrong X at the wrong Y 90,600
the correct X at the correct Y 3,890
the proper X at the proper Y 3,470

Nor is it required that the two adjectives be the same: "the right X at the wrong Y" gets 6,960 ghits. However, I suppose that the paired modifiers need to collectively denote a property that can be applied to the pairing of elements in a relation: "armchair offensive coordinators can use their real-life NFL knowledge to match up a taller receiver with a shorter cornerback".
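For anyone who wants to hunt for the template in their own text rather than in Google's index, here is a minimal sketch in Python. The pattern, the function name find_pairings and the sample sentences are my own illustrative assumptions, not anything Google does; the sketch handles only one-word X and Y, so it would miss cases like "age child", and it obviously can't tell the pairing reading from the ordinary compositional one.

```python
import re

# Hypothetical pattern for "the ADJ X PREP the ADJ Y" with one-word X and Y.
# The adjective and preposition lists are just the ones counted above.
TEMPLATE = re.compile(
    r"\bthe (?P<adj1>right|wrong|correct|proper) (?P<x>\w+) "
    r"(?P<prep>for|in|at|on) "
    r"the (?P<adj2>right|wrong|correct|proper) (?P<y>\w+)",
    re.IGNORECASE,
)

def find_pairings(text):
    """Return (adj1, X, prep, adj2, Y) tuples for each match in text."""
    return [
        (m["adj1"].lower(), m["x"], m["prep"].lower(), m["adj2"].lower(), m["y"])
        for m in TEMPLATE.finditer(text)
    ]

sample = (
    "She was the right queen for the right time. "
    "Use the right tool for the right job. "
    "He was in the wrong place at the wrong time."
)
for hit in find_pairings(sample):
    print(hit)
```

Because the two adjective slots are matched independently, mixed cases like "the right X at the wrong Y" are picked up as well.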

The pattern can be generalized to n-ary relations ("These reactions must take place at the proper time, at the proper rate, to the proper extent") and to indefinite NPs ("Phifer is guilty only of being in a wrong place at a wrong time"), though binary relations and definite NPs are much more common. The prepositional structure is also not required: "It'll tell them if they are using the right browser and the right version, or not". There are also plenty of examples that don't even have an "and": "...how to put the right information on the right storage devices..."

The only thing that seems to be required is parallel modification -- modifiers with the "right" semantics in the "right" structural relationship. Since some (all?) instances of this pattern can also be interpreted in a more normal (or at least locally-compositional) way, there must either be ambiguous structures or optional principles of interpretation involved.

For all I know, there's a large literature on this subject. Has anyone suggested an enlightening semantic analysis? Even better, does anyone's analysis of modification handle these cases without special "construction grammar" pleading?

I wonder about the generality of the pattern across languages -- "le bon * pour le bon" gets 309 ghits, for example, so this is not a purely English phenomenon -- and about its antiquity in English. If the analogous patterns work the same way in all (or at least many) languages, then it's more plausible that a general theory of modification (applied to the right modifiers in the right structures) should handle them correctly.

[Update: Russell Lee-Goldman emailed:

There's a four-character compound in Japanese,
適材適所 (teki-zai-teki-sho),
which means, morpheme-by-morpheme, 'suitable material suitable place.' It is generally understood to mean "giving a person a job that meets with their skills." Though most of these four-character compounds can't be altered, some google searching came up with
適大適所 ('big' instead of 'material,' actually meaning 'size' in this case) and
適竿適所 ('rod' instead of 'material,' talking about fishing), and also
適時適所, which literally means 'right time right place.'
Searching for more phrasal equivalents did not yield results (with parallel phrases having adjectives like 'suitable,' 'right' describing heads like person/job tool/job, etc.).]


Posted by Mark Liberman at 07:48 AM

Etymology and Bigotry

One of the most prominent forms of bigotry in the world today is Arab anti-Semitism. All forms of bigotry are evil, but Arab anti-Semitism is unusual at present in being so widespread, virulent, institutionalized, and socially acceptable. It is all too evident in the Arab press and in the pronouncements of both political and religious leaders. An excellent source is the Middle East Media Research Institute, which provides translations from the Arabic, Farsi, and Hebrew press. Or try the Hamas web site if you want to read genocidal bigotry from the horse's mouth.

It goes without saying that not all Arabs are anti-Semitic and that Arab anti-Semitism doesn't justify bigotry against Arabs, which has reared its ugly head in the United States in the past two years, but an all too common response to criticism of Arab anti-Semitism is to say that Arabs cannot be anti-Semitic because they too are Semites. A typical example can be found in this recent guest column by Samar Ali in the Vanderbilt Hustler, the student newspaper at Vanderbilt University.

Furthermore, being that all Arabs are Semites, it seems ludicrous to claim that Arab states produce anti-Semitic propaganda in hopes of destroying the Jewish people.
Another example is this piece in the Egyptian newspaper Al-Ahram. It's true that anti-Semitism ought to mean "hatred of Semites", including Arabs. That's what we would expect from an analysis of the word into its components. But that isn't what it means. Since the term was coined in the late 19th century, it has been used with the specific meaning of "hatred of Jews". [Note: The OED gives 1881 as the first use of the term in English. German Antisemitismus appeared earlier, in 1880, in Wilhelm Marr's Zwanglose Antisemitische Hefte. He is said to have used the term for some years before it appeared in print.] The American Heritage Dictionary of the English Language (Fourth edition, 2000) defines anti-Semitism as:
1. Hostility toward or prejudice against Jews or Judaism.
2. Discrimination against Jews.
Here are the definitions of anti-Semitism that Google found on the web:

prejudice against the Jewish people.

Attitudes and actions directed against the Jewish people.

Anti-Semitism is prejudice or discrimination against Jews, based on negative perceptions of their religious beliefs and/or on negative group stereotypes. Anti-Semitism can also be a form of racism, as when Nazis and others consider Jews an inferior "race."

Prejudice or discrimination against Jews.

Anti-Jewish prejudice. (See page(s) 292)

Discrimination against or persecution of the Jews because of their religious beliefs or race.

Hostility towards Jewish people.

A modern European racist ideology that first understands Jews as a race and second understands that race as inferior and degenerative of cultures in which Jews are assimilated.

Anti-Jewish prejudice. (p. 310)

Irrational hatred of the Jewish people.

the intense dislike for and prejudice against Jewish people

Every single one defines anti-Semitism as hatred of Jews, not of Semites in general. Etymology doesn't always determine meaning.

Frankly, I don't believe that very many of the people who make this argument don't know that anti-Semitism refers specifically to bigotry against Jews. Anybody who has ever looked the term up will know this, as will anybody who has discussed such issues much or studied the history of the 20th century. Assuming that this Vanderbilt Register story describes the same Samar Ali as the author of the guest opinion piece, she studied Political Science, was the President of the Student Government Association, and co-founded the Middle Eastern Students Association at Vanderbilt. With that background, I am hard put to believe that she doesn't know what anti-Semitism means.

But just for the sake of argument, let's assume that someone mistakenly but sincerely believes that anti-Semitism is hatred of all Semites, including Arabs. If this person is told that anti-Semitism is a problem among Arabs, with no other context, she would legitimately be puzzled. But in virtually any real situation, as in the case at hand, it is perfectly clear that the charge is that anti-Jewish bigotry is widespread among Arabs. In this situation, what are reasonable responses?

  • She might deny it, though this would be difficult in view of the overwhelming evidence;
  • She might acknowledge it and express regret.
But I submit that quibbling over the applicability of the term anti-Semitism is not among them. One might point this out in a footnote, but it isn't a legitimate response to the charge. Even if true, it isn't relevant.

When someone claims that Arabs can't be anti-Semites because Arabs are Semites too, she is sending out a red herring. She knows perfectly well that the charge concerns bigotry against Jews, is unable to deny it, and is unwilling to express regret. In short, it's an implicit admission of complicity.

Posted by Bill Poser at 12:39 AM

Google's latest feature: the Counterexemplifier

While I was trawling rather carelessly for images to use as a backdrop to some slides on induction for a class I'm (co-)teaching, Google gave me more than I bargained for. It now has a built-in Counterexemplifier, which will eventually eliminate the need for scientists to perform tiresome and expensive fieldwork or laboratory testing to evaluate their theories, just as it has already eliminated the need for teachers to read books in order to prepare for class (or for students to attend those classes).

Try "All swans are white" to see the Google Counterexemplifier in action. What will those clever folks at Google think of next?

Nota bene: The Counterexemplifier must be a beta release: see "Swans are white" or even "All swans are black".
Posted by David Beaver at 12:16 AM

April 02, 2004

The Culture of Polarization, Linguistics Style

A piece by Emily Eakin in The New York Times a few weeks ago recounted the research of Valdis Krebs (described intriguingly as "a social-network analyst in Cleveland") on the readership of those political bestsellers by Michael Moore and Al Franken or Bill O'Reilly and Sean Hannity. By examining Amazon.com's "customers who bought this book also bought" feature, Krebs was able to map networks of titles that defined conservative and liberal readerships. Not surprisingly, there was little crossover between the two. (In fact, the effect extended to other, nonpolitical bestsellers like The Da Vinci Code and The South Beach Diet, whose readerships also seemed to fall out on partisan lines -- go figure.)

The Times piece came to mind last week as a few of us languagelog contributors were chewing the electronic fat over the perennial question of why linguists get no respect. Despite the best -- and occasionally, bestselling -- efforts of popularizers, people seem disinclined to give up their cherished preconceptions about language, from their conviction that African American Vernacular English is slovenly and without rules to their certainty that Elizabethan English persists in Appalachian hollows. (For a catalogue of these canards, see Laurie Bauer and Peter Trudgill's collection Language Myths.)

Is what we have here just a failure to communicate?

That's the view of many linguists, who call for more and better efforts at popularization. But it seems to me that linguistics has been pretty well served by its popularizers, from Robert A. Hall to modern linguists like John McWhorter, Steve Pinker, Geoff Pullum, Mark Baker, Deborah Tannen, Jean Aitchison, Ray Jackendoff, Neil Smith, Donna Jo Napoli, David Crystal, John and Russell Rickford, John Baugh, and many others. And that's not to mention the informative documentaries of Gene Searchinger and Robert MacNeil. Pound for pound (we're a small discipline, after all), I'd stack that line-up against the popularizers of any other science.

In fact the problem here may be a polarization of audience analogous to the polarization of the audience for political bestsellers. Let's check the Amazon "customers also bought" list for some of the most successful recent popularizations (I'm omitting other titles by the same author):

Steve Pinker's The Language Instinct:

The Selfish Gene by Richard Dawkins
On Language: Chomsky's Classic Works, ed. by Mitsou Ronat
Contemporary Linguistics ed. by William O'Grady et al.

John McWhorter's The Power of Babel:

The Language Instinct by Steven Pinker
The Atoms of Language by Mark C. Baker
Words and Rules by Steven Pinker

Mark Baker's Atoms of Language:

Foundations of Language by Ray Jackendoff
The Language Instinct by Steven Pinker
The Power of Babel by John McWhorter
Words and Rules by Steven Pinker
Understanding Syntax by Maggie Tallerman

Geoff Pullum's Great Eskimo Vocabulary Hoax:

Language Myths ed. by Laurie Bauer and Peter Trudgill
On Language by Noam Chomsky, ed. by Mitsou Ronat
Freedom Evolves by Daniel Clement Dennett
The Language Instinct by Steven Pinker
The Blind Watchmaker by Richard Dawkins

The generalization seems to be that people who buy popularizations of linguistics tend to buy other popularizations of linguistics -- or failing that, other books on cognitive science and related topics. Now let's look at some books by the grammar mavens and word-lore collectors:

The Grouchy Grammarian: A How-Not-To Guide to the 47 Most Common Mistakes in English Made by Journalists, Broadcasters, and Others Who Should Know Better, by Thomas Parrish:

A Word A Day: A Romp Through Some of the Most Unusual and Intriguing Words in English by Anu Garg and Stuti Garg
The Dictionary of Concise Writing: 10,000 Alternatives to Wordy Phrases by Robert Hartwell Fiske, Richard Lederer
1000 Most Important Words by Norman W. Schur
Verbatim: From the bawdy to the sublime, the best writing on language for word lovers, grammar mavens, and armchair linguists, ed. by Erin McKean
Dubious Doublets: A Delightful Compendium of Unlikely Word Pairs of Common Origin, from Aardvark/Porcelain to Zodiac/Whiskey by Stewart Edelstein

Word Court: Wherein Verbal Virtue Is Rewarded, Crimes Against the Language Are Punished, and Poetic Justice Is Done by Barbara Wallraff, Francine Prose:

Lapsing Into a Comma : A Curmudgeon's Guide to the Many Things That Can Go Wrong in Print--and How to Avoid Them by Bill Walsh
Woe Is I: The Grammarphobe's Guide to Better English in Plain English by Patricia T. O'Conner
Bryson's Dictionary of Troublesome Words by Bill Bryson
Sin and Syntax : How to Craft Wickedly Effective Prose by Constance Hale
The Copyeditor's Handbook by Amy Einsohn

Woe Is I: The Grammarphobe's Guide to Better English in Plain English by Patricia T. O'Conner:

Lapsing Into a Comma : A Curmudgeon's Guide to the Many Things That Can Go Wrong in Print--and How to Avoid Them by Bill Walsh
On Writing Well by William K. Zinsser
Sin and Syntax : How to Craft Wickedly Effective Prose by Constance Hale
Sleeping Dogs Don't Lay: Practical Advice for the Grammatically Challenged by Richard Lederer and Richard Dowis

(I tried this with Richard Lederer and William Safire, but it turns out that the "customers also bought" lists for all their books included only other titles by the same authors -- though probably that says something, too.)

If you were to go just by these results, you might conclude that linguistic popularizers can have only a limited effect on popular attitudes about language -- the people who buy their books are the ones who are already disposed to accept their ideas, just like the purchasers of those "Conservatives-are-from-Mars-liberals-are-from-Venus" bestsellers.
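The "little crossover" claim can be checked mechanically against the lists quoted above. What follows is a rough sketch in Python, not Krebs's actual method, which mapped much larger networks. The dictionary holds only the titles listed in this post (shortened), the grouping into linguistics and mavens is my own labeling, and I'm assuming that the two "On Language" entries are the same book.

```python
# "Customers also bought" lists as quoted in this post (titles shortened).
# Assumption: both "On Language" entries refer to the same Chomsky volume.
also_bought = {
    "The Language Instinct": {
        "The Selfish Gene", "On Language", "Contemporary Linguistics"},
    "The Power of Babel": {
        "The Language Instinct", "The Atoms of Language", "Words and Rules"},
    "The Atoms of Language": {
        "Foundations of Language", "The Language Instinct",
        "The Power of Babel", "Words and Rules", "Understanding Syntax"},
    "The Great Eskimo Vocabulary Hoax": {
        "Language Myths", "On Language", "Freedom Evolves",
        "The Language Instinct", "The Blind Watchmaker"},
    "The Grouchy Grammarian": {
        "A Word A Day", "The Dictionary of Concise Writing",
        "1000 Most Important Words", "Verbatim", "Dubious Doublets"},
    "Word Court": {
        "Lapsing Into a Comma", "Woe Is I",
        "Bryson's Dictionary of Troublesome Words", "Sin and Syntax",
        "The Copyeditor's Handbook"},
    "Woe Is I": {
        "Lapsing Into a Comma", "On Writing Well", "Sin and Syntax",
        "Sleeping Dogs Don't Lay"},
}

# Hypothetical grouping of the seed titles into the two camps.
linguistics = ["The Language Instinct", "The Power of Babel",
               "The Atoms of Language", "The Great Eskimo Vocabulary Hoax"]
mavens = ["The Grouchy Grammarian", "Word Court", "Woe Is I"]

def neighborhood(seeds):
    """Seed titles plus everything on their also-bought lists."""
    titles = set(seeds)
    for seed in seeds:
        titles |= also_bought[seed]
    return titles

# Titles that show up on both sides of the divide.
crossover = neighborhood(linguistics) & neighborhood(mavens)
print(sorted(crossover))
```

On these seven lists the intersection comes out empty, which is the polarization in miniature; a handful of lists is of course far too small a sample to prove anything.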

Still, we should probably take all this with a grain of salt. The "customers also bought" lists obviously don't reflect the full range of the readership for linguistic popularizations, particularly those, like Steve Pinker's and John McWhorter's, which have obviously reached a wide general audience. And it may be too much to hope that linguists will be able to overturn popular misconceptions about language overnight -- on the basis of personal experience, it's hard enough to make these points to the English professors and cognitive scientists down the hall. At least we're in there trying.

Posted by Geoff Nunberg at 10:31 PM

Geezer: guiser or gozer?

In response to my post on "diamond geezer", Trevor wrote to say

I wonder if "geezer" isn't East End Jewish in origin. The equivalent urban Dutch word is "gozer", which http://www.ety.nl/jiddisch.html says is a Dutch Yiddish [derivation from] the Hebrew "chosen", bridegroom.

The OED says that geezer is "A dial. pronunciation of GUISER", which in turn is analyzed as "One who guises (see GUISE v. 3); a masquerader, a mummer. (Cf. GUISARD, GEEZER.)". Citations are given from the 15th and 16th centuries:

1488 Ld. Treas. Acc. Scotl. (1877) I. 93 Item, in Lannerik, to dansaris and gysaris, xxxvis. 1572 Satir. Poems Reform. xxxviii. 14 For gysours, deuysours, the Guysianis ar gude.

This might be one of those cases where two words from completely different sources form a mutually reinforcing resonance: a lexicographic pole, so to speak.

If it weren't for www.ety.nl, I might suspect Trevor of making a Ghostbusters allusion: "Wait for a sign from Gozer the Traveler; he will come in one of the pre-chosen forms." The whole geezer/guiser/gozer thing might be just as much an accident as this gozer/chosen association, I suppose.

Posted by Mark Liberman at 08:25 PM

The Beastly Garden of Wordy Delights

Ever wonder what to call a group of cockroaches, or a female rabbit? Melissa Kaplan's Beastly Garden of Wordy Delights contains an extensive collection of English terms of venery (words for groups of animals), specialized terms for the male, female, and young, and names for the sounds that they make.

Posted by Bill Poser at 07:56 PM

Multiple coordination: the competition is over

The time has come for me to report that the competition I unwisely launched to find the largest number of coordinates in a grammatical English coordination that has appeared in a respectable printed prose source has now gotten completely out of control. The competition has been canceled. But before I thank the contestants and say farewell to this ill-advised campaign of data collection, let me just survey some of the madness that it led to, and draw a moral.

The numbers started getting quite big, very fast. Lance Nathan pointed out a 122-coordinate example from actual speech. George Carlin utters it at the end of one of his acts. It is on his live album Playin' With Your Head; there is a sound clip here, and having played it I can say that it is completely clear that the whole thing is one huge coordinate noun phrase (it is introduced as a few unpleasant and worrying things that had not been mentioned during the performance but which the audience was advised to think about when heading home). Lance provided a transcription:

Anal rape, quicksand, body lice, evil spirits, gridlock, acid rain, continental drift, labor violence, flash floods, rabies, torture, bad luck, calcium deficiency, falling rocks, cattle stampedes, bank failure, evil neighbors, killer bees, organ rejection, lynching, toxic waste, unstable dynamite, religious fanatics, prickly heat, price fixing, moral decay, hotel fires, loss of face, stink bombs, bubonic plague, neo-Nazis, friction, cereal weevils, failure of will, chain reaction, soil erosion, mail fraud, dry rot, voodoo curses, broken glass, snakebite, parasites, white slavery, public ridicule, faithless friends, random violence, breach of contract, family scandal, charlatans, transverse myelitis, structural defects, race riots, sunspots, rogue elephants, wax build-up, killer frost, jealous co-workers, root canals, metal fatigue, corporal punishment, sneak attacks, peer pressure, vigilantes, birth defects, false advertising, ungrateful children, financial ruin, mildew, loss of privileges, bad drugs, ill-fitting shoes, wide-spread chaos, Lou Gehrig's disease, stray bullets, runaway trains, chemical spills, locusts, airline food, shipwrecks, prowlers, bathtub accidents, faulty merchandise, terrorism, discrimination, wrongful cremation, carbon deposits, beef tapeworm, taxation without representation, escaped maniacs, sunburn, abandonment, threatening letters, entropy, nine-mile fever, poor workmanship, absentee landlords, solitary confinement, depletion of the ozone layer, unworthiness, intestinal bleeding, defrocked priests, loss of equilibrium, disgruntled employees, global warming, card sharks, poison meat, nuclear accidents, broken promises, contamination of the water supply, obscene phone calls, nuclear winter, wayward girls, mutual assured destruction, rampaging moose, the greenhouse effect, cluster headaches, social isolation, Dutch elm disease, the contraction of the Universe, paper cuts, eternal damnation, the wrath of God, and paranoia!

David Beaver had meanwhile gone beyond this, citing (here) a complete coordinative listing of the 230 members of a high school graduating class. Graduation lists were not what I had in mind as data, but I have to admit that this was punctuated as a grammatical English noun phrase, and it got printed.

But then Nicholas Widdows pointed out to me that Rabelais slings around more than a few mean coordinate structures in his masterpiece of smut, Gargantua and Pantagruel (a filthy work which I hereby forbid you to look at): there is a coordinative listing of 225 games Gargantua played in book 1 chapter 22 (not quite topping Beaver's graduation list), and another that gives the 433 different kinds of cod ("ball-bag" in the Penguin translation) in book 3 chapter 28. [I was told this, anyway, but I didn't look it up. It turns out to be false, as reported by Language Hat: the list is not presented as a coordinate structure, it's just a list. --GKP, 4/8/04.]

So, with Nicholas Widdows wresting the championship away from Beaver, Nathan, and Crawford, we have reached 433, an order of magnitude above 42. And I think it is time to take stock of what we're doing here. When linguists say that there appears to be no limit on how many coordinates a coordinate structure may have, they generally go on to assert that this means the set of all possible sentences is infinite. I don't actually buy that: there also appears to be no anatomical limit on how long a worm can grow either, but that doesn't mean there are infinitely many worms, or even that there could be in principle. It all comes down to whether you choose to regard sentences as concrete like worms (registered by the senses, produced in real time in a manner that consumes energy) or as abstract like the integers (free of the spatiotemporal realm, unlimited except by the laws of logic, not necessarily registerable by anyone's senses even in principle). And that's metaphysics. There may be sound arguments for being a finitist nominalist (one who thinks everything in the universe is concrete and there's only a certain limited amount of it) or being a platonist (one who thinks there are abstract objects outside of space and time), but there aren't linguistic arguments that can settle such questions.

Nonetheless, linguists ain't just whistling Dixie when they say there are no linguistic limits on the number of coordinates. To propose a grammar for English (or any natural language) that set a syntactically imposed length limit (in number of coordinates) on coordinate constructions would be a mistake. Even if you insist that you will not believe in coordinate structures of length n until you have actually seen one attested in speech or print, you will find that the attested cases go up to values of n that are really quite enormous. The class of all English sentences may be regarded as indefinitely huge though ultimately finite (like the set of all pine needles) or actually infinite (like the set of all integers), but either way, one thing is clear: no grammar that names a specific number as the maximum number of coordinates permitted in a coordinate structure is a correct grammar.

Posted by Geoffrey K. Pullum at 05:29 PM

When Smart People Get Really Stupid Ideas

Taken together, the links that Mark posted go a long ways to discrediting the arguments that Samuel Huntington makes in his Foreign Policy article "The Hispanic Challenge." But while Huntington is certainly a figure whose views deserve close attention, I worry that some of the critics have been too deferential to him in this case. "Huntington is way too smart to be rejected without a sober evaluation of his thesis and evidence," one puts it, and another, while critical, says "If Huntington has the evidence, he may be right."

But this cuts two ways -- Huntington's reputation and influence impose a high standard of scholarly responsibility. And judged in its own terms, his article is meretricious claptrap from beginning to end -- incoherent, confused, and based on unsubstantiated anecdotes and on factual claims that are just plain wrong. If the article had come in anonymously over the transom, rather than being submitted by the journal's founder, it's hard to believe the editors would have given it a second look.

For a point-by-point refutation of these arguments, which are really just recyclings of old canards, you can look at the American Prospect article on English-only that I wrote a few years ago, or visit Jim Crawford's excellent site on language policy, which offers one-stop shopping on this issue. But a few examples will help to give the idea.

Huntington argues that "The size, persistence, and concentration of Hispanic immigration tends to perpetuate the use of Spanish through successive generations." He claims that "If the second generation does not reject Spanish outright, the third generation is also likely to be bilingual, and fluency in both languages is likely to become institutionalized in the Mexican-American community." Hispanic leaders, he says, "are actively seeking to transform the United States into a bilingual society." As their numbers increase, he says, "Mexican Americans feel increasingly comfortable with their own culture and often contemptuous of American culture." He points to the profound cultural differences between Hispanics and Anglos. Ultimately, he says, "Spanish is joining the language of Washington, Jefferson, Lincoln, the Roosevelts, and the Kennedys as the language of the United States." We are at risk of becoming "a country of two languages and two cultures. It would… be the end of the America we have known for more than three centuries."

In point of fact, though, all the evidence suggests that Hispanics are learning English very rapidly -- more rapidly than the Germans and other groups did at the turn of the century. There's also no evidence that the rate of Spanish retention is higher than the rate of retention for other groups. This was the clear finding of an extensive study by Alejandro Portes and Lingxin Hao of 5000 second-generation Hispanic children in San Diego & South Florida. Overall, they found that 95 percent of the children speak English well and that 40 percent speak no Spanish. (In fact the rate of retention is far lower for the second-generation Cubans than for Mexicans, and a far larger proportion of the Cubans -- 83 percent -- prefer to use English among themselves. That should lay to rest Huntington's claim that Miami offers an example of the bilingual future that's in store for America if we don't take action.)

Or take bilingual education, which Huntington describes as "cultural maintenance programs" aimed at enabling students to function without learning English. In fact 99 percent of bilingual education programs are transitional, and even at their peak, bilingual programs enrolled fewer than 30 percent of Limited English Proficiency (LEP) children (the figure is lower now).

An even more absurd claim is that dual-language immersion programs have the effect of "making Spanish the equal of English and transforming the United States into a two-language country." As it happens, the Center for Applied Linguistics keeps track of these programs. There are a few hundred of them nation-wide, which serve perhaps 20,000 students in all -- half of them middle-class Anglo kids whose parents want them to achieve true bilingual competence. (By contrast, at the turn of the 20th century fully 6 percent of American elementary school children received most or all of their education in German.)

Other arguments are based on anecdotal observations that would hardly count as evidence in any serious political science discussion. By way of substantiating his claim that Mexican Americans are disdainful of American culture, Huntington says:

In 1994, Mexican Americans vigorously demonstrated against California's Proposition 187—which limited welfare benefits to children of illegal immigrants—by marching through the streets of Los Angeles waving scores of Mexican flags and carrying U.S. flags upside down.

That, of course, is the same sort of logic that led right-wing commentators to label those opposed to the Iraq war as "America-haters" on the basis of a few placards at demonstrations.

But the real problems with Huntington's arguments aren't so much factual as conceptual and ideological. He plays fast and loose with the distinction between individual and social bilingualism -- if Hispanics remain bilingual, he seems to be saying, then America will become a bilingual society in which "Americans [will have to] know a non-English language in order to communicate with their fellow citizens." Well, no -- even if Hispanic bilingualism persists (and all the evidence suggests it won't), it is indisputable that second- and third-generation Hispanics will be quite capable of living their public lives in English.

What's really behind this all is the familiar assumption that the natural state for Americans is English monolingualism (perhaps with a smattering of late-acquired French or German as a middle-class accomplishment) -- a true American can't serve two linguistic masters.

This merely rehearses an antique assumption about the link between language and culture, which no linguist would take seriously -- the idea that the persistence of Spanish alone would be sufficient to perpetuate the cultural traits that Huntington ascribes to Hispanics: "the mañana syndrome," "mistrust of people outside the family; lack of initiative, self-reliance, and ambition; little use for education; and acceptance of poverty as a virtue necessary for entrance into heaven." But even if for argument's sake you accepted those stereotypes, there's no reason to assume that they would persist simply because people continued to speak Spanish in addition to English. On the contrary, Portes and Hao found that, among second-generation Hispanics, fluent bilinguals scored higher than either Spanish or English monolinguals in family solidarity and harmony, self-esteem, and educational aspirations.

You might have the sense that very little has changed since 1919, when the Nebraska supreme court upheld a law that banned the teaching of foreign languages until high school, warning against the "baneful effects" of educating children in foreign languages, which must "naturally inculcate in them the ideas and sentiments foreign to the best interests of their country." In the event, though, it turns out that anglophones can come up with plenty of those on their own.

Posted by Geoff Nunberg at 11:08 AM

April 01, 2004

The Chatterbox Challenge

The classic proposal for deciding whether a computer program has attained human-like intelligence is the Turing test: can it maintain a conversation (by text, not necessarily speech) with a real human being after which the human is unable to tell that he or she has been conversing with a program?

So far, no computer program has come close, but there are people who like to write programs that carry on conversations. These are now called chatterbots or chatbots. The fourth annual Chatterbox Challenge is now underway. You can engage any of the contestants in conversation and participate in the voting. Regrettably, the contest is limited to anglophone bots and humans.

Posted by Bill Poser at 09:29 PM

The Simputer

A while back I wrote about the role of Linux in supporting smaller and less wealthy languages. Other virtues of Linux are that there are no licensing costs and that it can readily be adapted. All of these virtues play a role in the Simputer,

a low cost portable alternative to PCs, by which the benefits of IT can reach the common man

designed and manufactured in India, and aimed initially at the Indian market.

The Simputer is designed to be portable so it can easily be shared and can be used on the go. It is powered by ordinary AAA batteries, so it can be used where electrical power is not available or unreliable. It uses cheap Smart Cards to store personal data, so people who cannot afford their own Simputer can share one.

Naturally, the Simputer supports Indian languages. Currently it provides word processing in Hindi and Kannada; other Indian languages are being added. It also allows you to write with a stylus on electronic Paper, which it will store and transmit, so you can use any writing system you like.

Because many people in India are illiterate (30% of men, 50% of women), it was designed to facilitate use by people with limited or no literacy skills:

The key to bridging the digital divide is to have shared devices that permit truly simple and natural user interfaces based on sight, touch and audio. The Simputer meets these demands through a browser for the Information Markup Language (IML). IML has been created to provide a uniform experience to users and to allow rapid development of solutions on any platform.

The Simputer includes a voice recorder, so that people can use it to send voice messages. It also includes text-to-speech software, capable of synthesizing speech in Indian languages from Unicode text.

It's a cool idea, so I'm pleased that the Simputer and I share a birthday. If you want to check out the actual product or maybe even buy one, go to the web site of the manufacturer, Amida Simputer.

Posted by Bill Poser at 08:26 PM

At least two linguistic April 1 hoaxes

Mark comments in a post earlier today: "I don't know of any memorable linguistic April Fool's hoaxes." Well, I think I know of at least two.

About a decade ago, Chris Barker released by email the abstract of a new paper he had written. It sounded fascinating: formalizing principles and parameters theory arithmetically to get a number-theoretic model of it, and using techniques from modern cryptography, he had managed to prove that the problem of discovering the right parameter settings from a set of data depended on a trapdoor function: it was as hard as breaking modern prime-number encryption. The paper had been made available at an ftp site. I downloaded it straight away (back in those dark days, it was the first scholarly paper I ever obtained by downloading). When I unzipped it, I found all I had was a polite apology to the effect that although the idea was neat he had not in truth been able to achieve this result. Appended was a recommendation to check the date of release. It was, of course, April 1.

The other was when David Pesetsky, writing from Mark's doctoral alma mater, put out what I suspect was a faked ASCII screen dump from a fictive newswire story about how the weight of the snow on the roof of MIT's Building 20 had collapsed it and buried Noam Chomsky, Morris Halle, Jay Keyser, the editorial office of Linguistic Iniquity Inquiry, and the entire MIT Linguistics department in a welter of roofing material and slush. He had quotes from Chomsky and everything. Sympathy notes flooded into the department, though I suspect slush did not. Bitten once by Barker's cruel hoax, I did not make any linguistic inquiries. The April 1 dateline had only a 1/365.25 = 0.0027 probability, after all, and April was a little late for major snow buildup even in the ghastly climate of New England. I never asked around, but I suspect that Noam passed that April 1 warm and dry, and as comfortable as that (now demolished) building ever got.

Posted by Geoffrey K. Pullum at 06:59 PM

Fair on the tunded

A CNN story headlined 'Girlfriend' photos upset prince contains the following passage.

Clarence House, the official residence of Prince Charles in London, refused to comment on The Sun's claims.

A spokesman told CNN: "It is not our policy to discuss the nature of Prince William's relationships with his friends. It wouldn't be fair on him or them."

Let's pass over in silence the residential metonymy, and focus instead on the adjectival complement structure "fair on him". I'd use "fair to him", myself. I do recognize "fair on him" as another low-frequency prepositional complement, but my rather vague sociolinguistic associations with that pattern aren't consistent with use by a Royal Source. Suspecting a subtle Pythonic April fish, I decided to investigate further.

The OED doesn't have any examples of "fair on" in its entry for fair, but it does have one in the entry for fratting, vbl. n., glossed as "Friendly relations between British and American soldiers and German women in the occupied parts of Western Germany after the war of 1939-45":

1949 G. COTTERELL Randle in Springtime [...] II. ii. 45 You see all the men here go fratting and it simply isn't fair on us girls... I can't see what they see in these German women.

and another in its entry for tund, v., glossed as "Winchester School slang. trans. To beat with a stick, esp. an ash rod, by way of punishment":

1876 LD. SHERBROOKE in Life & Lett. (1893) I. 12 To put a stick into the hand of a boy of sixteen and allow him to use it upon his schoolfellows..is neither fair on the tunder nor the tunded.

I'm not sure about the context of the Cotterell quote, but the sociolinguistics of tunding is clearly perfect for statements by Clarence House, and completely allayed my suspicions.

Google yields 2,030,000 hits for "fair to", vs. 237,000 for "fair on", but nearly all of the examples of "fair on" are things like "Career fair on Sept. 15" or "Black bass are fair on crankbaits and spinnerbaits near shallow rocky areas." In a sample of 100, I didn't find any examples at all of the Clarence House pattern, in which "fair on X" evaluates the equity of some event or circumstance with respect to the interests of X.

Searching for "not fair to" yields 138,000 hits vs. 7,230 for "not fair on". Most of the latter are exactly what we are looking for:

It is not fair on the monarch that many vitally important and often politically controversial "royal prerogative" powers ... should appear to be in her grasp when in fact they are exercised by ministers... [Guardian]

Oxfam sees the Trade-Related Intellectual Property Rights agreement as imposing a one-size-fits-all approach, which is not fair on developing countries. [EU memorandum]

It is not fair on me or my loyal volunteer staff who have worked so very, very hard to build this site. [Psychics and Mediums Network (UK)]

All the genuine uses of "fair on" that I checked were from the UK, Australia or India. So if "fair on" is roughly 5% of the overall total -- 7230/(138000+7230) = .05 -- and is not used in the U.S. or (I think) Canada, it must be a significant fraction of the UK-ish use. But I'm still not clear on its social stratification, if any. The Clarence House quote means it can't be non-U -- but is it U or is it pan-UK?
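The 5% figure is plain arithmetic on the two Google hit counts quoted above. A minimal Python sketch makes the calculation explicit; the function name `share` is only an illustrative choice, and the counts are the ones reported in this post, not fresh search results:

```python
def share(part: int, other: int) -> float:
    """Fraction of the combined hit count accounted for by `part`."""
    return part / (part + other)

# Hit counts as quoted in the post for the two negated patterns.
not_fair_on = 7_230
not_fair_to = 138_000

# Prints the share of "not fair on" among both patterns, to three decimals.
print(f"{share(not_fair_on, not_fair_to):.3f}")  # prints 0.050
```

Raw hit counts are a rough proxy at best, so the result is only as good as the assumption that most "not fair on" hits are genuine instances of the pattern.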

[Update 4/2/2004: David Nash writes:

I can confirm what you've figured out by now, that "fair on" is normal to the Australian ear. My prediction at the answer to your question is that there's no social stratification. But then I didn't realise it would sound strange to you guys. (Yet another defect of the Macquarie Dictionary is that it lacks both "fair on" and "fair to".)

I'm not surprised if it comes up more in neg contexts (NB the top ghit for "unfair on" is from the US...) I see that the Cobuild Dic lists among typical uses this, the only one with a PP complement: "This isn't fair on anyone, but it does happen." -- and I find something odd about ?"This isn't fair to anyone, but it does happen."

"Prediction at the answer"? At first I thought, aha, another Commonwealth usage for the collection, but "prediction at the answer" gets no ghits from anywhere, nor does "prediction at the result" or "prediction at the outcome", though "prediction as to the outcome" gets 130 and "prediction for the outcome" gets 352. So "prediction at" is probably another example of low-frequency variation in the meme pool, the raw material of memetic change.

As for "fair to anyone", it sounds fine to me. Many of the 2,920 examples in Google's index are directly comparable to David's example, and seem normal to me:
"It is a direct conflict of interest... It isn't fair to anyone."
"Remember - NEVER spontaneously decide to purchase a pet - its not fair to anyone involved."

As for the top ghit for "unfair on" -- a headline from a Hartford CT paper reading "Advocate unfair on satanic abuse" -- it seems fine to me, but after reading the associated letter to the editor, I interpret it as short for "[The Hartford] Advocate ['s story is said to be] unfair on [the topic of] satanic abuse". The letter writer is not complaining that the Advocate did not treat the interests of Satanic Abuse in an equitable manner -- the interests in question are those of therapists and patients in cases of alleged recovered memories, and "satanic abuse" is just a reference to the content of those memories, and thus a useful headline tag for the whole topic of the original story.

The other ghits on the first couple of pages seem to be other topical uses of "on", locative uses of "on" ("Life's unfair on welfare" = "Life on welfare is unfair"), temporal uses of "on" ("Unfair? on 03/05/2003") or texts of Commonwealth origin. ]

Posted by Mark Liberman at 06:43 PM

April Fish

According to the (increasingly excellent) Wikipedia:

New Year was originally celebrated from March 25 to April 1, before the Gregorian reforms moved it back to January 1. The English first celebrated [April Fool's Day] on a widespread basis only as late as the 18th century, though it appears to have reached England probably from Germany in the mid-17th century. Its first known description in English originates with John Aubrey, who noted in 1686: "Fooles holy day. We observe it on ye first of April. And so it is kept in Germany everywhere."

The custom of playing practical jokes on April Fool's Day is also very widespread and of uncertain origins. The victim of a joke is known in English as an April Fool; in Scots as a gowk (cuckoo or fool); and in French as a poisson d'avril (April fish). It has been suggested the custom may have had something to do with the move of the New Year's date, when people who forgot or didn't accept the new date system were given invitations to nonexistent parties, funny gifts, etc. Originally, April Fool's Day jokes concentrated on individuals (sending someone on an absurd errand such as seeking pigeon's milk) but in the 20th century it became common for the media to perpetrate hoaxes on the general population. [emphasis added]

The Wikipedia also gives a long list of well-known April Fool's Hoaxes.

The French Wikipedia page on Poisson d'avril, taking a different perspective, asserts plainly that the practice "trouve son origine en France, en 1564... Cette coutume de faire des plaisanteries s'est répandue dans de nombreux pays, bien que le poisson ne soit pas toujours exporté en même temps."

Although there are many excellent linguistic jokes, and also quite a few interesting linguistic hoaxes, some of them pretty spectacular, I don't know of any memorable linguistic April Fool's hoaxes.

Posted by Mark Liberman at 05:11 PM

A whiff of evil shit brimstone

Gerald Gazdar alerted me to this interesting material on a site run by Michal Zalewski under the catchy name of http://lcamtuf.coredump.cx (it appears to be a blog maintained by a Polish programmer in California). The page might need to be consulted soon, before Microsoft lawyers track down the page author and get around to drafting minatory letters to have it shut down. Basically, what this guy did was to run a search for publicly available Microsoft Word documents that are freely downloadable from the microsoft.com domain and look in them with Unix tools to see if they contained records tracking changes or displaying material that was erased from the final draft. Ten percent of them had change-tracking records and five percent had recoverable deleted material. So he took a look at what Microsoft staff had been erasing from, e.g., press releases and essays attacking Linux. It's certainly instructive, occasionally amusing, and of course, always there is the aura of the dark side, a whiff of brimstone.
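The post says only that "Unix tools" were used, so the following is not Zalewski's actual procedure. It is a minimal Python sketch of one such technique, assuming something like the Unix `strings` utility: pull runs of printable characters out of a binary file and read what turns up. The function name `printable_runs` and the sample bytes are invented for illustration.

```python
import re

def printable_runs(data: bytes, min_len: int = 6) -> list[str]:
    """Return runs of printable ASCII at least `min_len` bytes long,
    roughly what the Unix `strings` utility reports."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Hypothetical stand-in for the raw bytes of a document file:
# readable text separated by non-printable binary padding.
blob = b"\x00\x01Final text here\x00\x02\x03deleted draft sentence\xff\x00ab\x00"

for run in printable_runs(blob):
    print(run)
```

This sketch only looks at 8-bit ASCII runs, so text stored in any other encoding would be missed, and it makes no attempt to tell current text from erased text; that judgment is left to the reader of the output.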

But setting aside the particular glimpse we get of the content of the cesspit of the Microsoft corporate mind, there is a general point about technology and linguistic privacy to ruminate on here. As Michal notes, Microsoft has become a victim of the dangers of its own intuitive and apparently helpful word-processing technology here; as Michal puts it, "if you come up with an intuitive technology, you must next find a way to curb its use." That's a very acute insight.

Posted by Geoffrey K. Pullum at 11:41 AM


This week's New Yorker has an article by Jake Halpern entitled "Selling the Beat: St. Louis’s Trackboyz break a new act", which includes some discussion of a St. Louis sound change:

One of their first clients was a young rapper named Cornell Haynes, Jr., later nicknamed Nelly, whom Williams had met at a talent show in a club. Nelly ... was the first St. Louis rapper to break nationally, with "Country Grammar," in 1999. (Nelly also initiated the widespread use of a quirky St. Louis dialect whereby "here" is pronounced "hurr" -- as in Nelly's "Hot in Herre" -- "there" becomes "thurr," and "everybody" is "urr'body.")

This description is intriguing, but it's hard to tell what the pattern really is. In most English dialects, here and there have different vowels -- in the dialect described, do these vowels followed by /r/ merge with one another and also with the final sequence of e.g. burr? In other words, do here, hair and her all merge? And beer, bear and burr? That's certainly possible, but maybe something else is going on. A merger of the rhymes of here and there might be part of another pattern of mergers, maybe limited to particular words; or maybe they don't merge at all, but instead are just each pronounced differently from the way Halpern expected.

This is probably related to the traditional midland American pronunciations written as "hyar" for here and "whar" for where, as in the lyric

Now what you aimin to do up hyar?
What do ye think you’re gonna find?
Stranger, what did ye say yer name was?
And whar did you say you was gwine?

even though the stereotypical associations in that case are redneck rather than hiphop.

And the fate of everybody raises some other issues entirely. It's pretty common for the syllable written with orthographic "y" to vanish in this word, in versions generally indicated orthographically as ever'body, but to get to urr'body we need to lose another syllable as well. Is this because ever goes pseudopoetically to e'er and then e'er, like there, changes its vowel to what Halpern writes as "urr"? More facts are needed.

I hope that everyone can also start to see, at this point, why the International Phonetic Alphabet was invented.

I'm not sure whether the "quirky St. Louis dialect" that Halpern mentions is covered in Chapter 19 of the forthcoming Atlas of North American English. The online version of Chapter 11 does discuss some vowel mergers before /r/, just not the same ones:

A second distinctive Midland city is St. Louis... Its most distinctive traditional feature is a merger of /ahr/ in car, are, far with /ohr/ in corps, or, for, while /owr/ in core, ore, four remains distinct. Though this merger is stereotyped and in recession, it is still strong enough to act as a defining feature of the St. Louis dialect... At the same time, St. Louis is undergoing a massive shift towards the pattern of the Inland North, including the Northern Cities Shift.

Posted by Mark Liberman at 09:23 AM

Occidentalism in philology

An interesting post by Christopher Culver at Nephelokokkygia.

Posted by Mark Liberman at 06:50 AM

Just how good is the bible?

Pretty damn good!

But how good? Well, OK, since you insist, I'll tell you the answer. It rates a 9. Satisfied?

What do you mean that was just a subjective evaluation based on my own personal tastes and religious convictions? Not a bit of it. The Bible rates at least a 9, I tell you! I know, because I applied the Pullum Multiple Coordinate Construction Metric, the latest and most interesting measure of literary weight ever devised for text evaluation and intertextual comparison, and a measure which can be applied to any text which Geoff deems "respectable".

Here is proof, an MCC of length 9:

Then wrote Rehum the chancellor, and Shimshai the scribe, and the rest of their companions; the Dinaites, the Apharsathchites, the Tarpelites, the Apharsites, the Archevites, the Babylonians, the Susanchites, the Dehavites, and the Elamites, and the rest of the nations whom the great and noble Asnapper brought over, and set in the cities of Samaria, and the rest that are on this side the river, and at such a time.

King James Bible, Book of Ezra

It turns out that, respectable though the Bible is, there are better religions. In English translation from Sanskrit (an operation that we presume is PMCCM-preserving) the word of the Supreme Lord Krishna is over 30% better than the word of that other Lord as revealed through the prophet Ezra:

I am the goal, the supporter, the Lord, the witness, the abode, the refuge, the friend, the origin, the dissolution, the foundation, the substratum, and the imperishable seed.

The Bhagavad Gita, translated by Ramanand Prasad, Chapter 7: Self-Knowledge and Self-Realisation (9.18)

Courtesy of Bhagavad-gita.org, here is the original Sanskrit, which I've linked to an audio file of the vocals... try clicking.

[Sanskrit here, and audio link]

I wish I could read Sanskrit. But who needs religion? I have come across a story of passion, magical realism and grammar that makes the Lord Krishna seem a literary dunce. I talk of no other than Daniel Occeno's Joe the Alphabet Fisherman, writing of a very high order. 37, to be precise.

Captain Joe the Alphabet Fisherman lifted the net. Captain Joe the Alphabet Fisherman dropped what Captain Joe the Alphabet Fisherman caught into a huge basket.

Boys and girls, do you know what Captain Joe the Alphabet Fisherman caught?

You do not know?

Captain Joe the Alphabet Fisherman caught the 0, the 1, the Qq, the Aa, the Zz, the 2, the Ww, the Ss, the Xx, the 3, the Ee, the Dd, the Cc, the 4, the Rr, the Ff, the Vv, the 5, the Tt, the Gg, the Bb, the 6, the Yy, the Hh, the Nn, the 7, the Uu, the Jj, the Mm, the 8, the Ii, the Kk, the 9, the Oo, the Ll, the Pp, and the 10.

Captain Joe the Alphabet Fisherman, Daniel E. Occeno

Yet Occeno's poetic masterpiece, with its rating of 37, and even the extraordinary 41-rated Neal Stephenson novel The Diamond Age cited by Geoff, are mere trifles. In a delightful work full of wit and vigor, utterly lacking postmodern pretension or bombast, Rachel Jones takes the Lord Grey School Magazine (2nd edition) to a literary weight of 230: that's an astonishing 2678% better than the bible. Here, in its entirety, is the ironically titled Class of 2002, Year 11:

Class of 2002, Year 11


The Lord Grey School Magazine (2nd ed.), Rachel Jones (ed.), 2002

Posted by David Beaver at 05:40 AM

Orthographic metathesis?

In a slight twist to our discussion of orthographic gemination, I found myself mis-spelling "emporer" and turned to Google for consolation. I observed the same telltale distribution:


Apparently writers know the word contains an "e" and an "o" but sometimes forget which order they go in, perhaps because both are pronounced as the schwa vowel. I searched a pronunciation dictionary for words containing adjacent schwa vowels corresponding to different orthographic vowels (e.g. bachelor, developer) but could not replicate the above pattern.
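The dictionary search described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual query that was run: the entries use a CMUdict-like notation (word followed by ARPAbet phones), the transcriptions are illustrative assumptions rather than quotations from any dictionary, and treating both `AH0` and `ER0` as "schwa" is a simplification made for the example.

```python
# Toy entries in a CMUdict-like format: a word followed by ARPAbet phones.
# The transcriptions are illustrative assumptions, not dictionary quotations.
ENTRIES = """\
EMPEROR  EH1 M P ER0 ER0
BACHELOR  B AE1 CH AH0 L ER0
DEVELOPER  D IH0 V EH1 L AH0 P ER0
BANANA  B AH0 N AE1 N AH0
"""

# Assumption: unstressed AH and ER both count as schwa for this search.
SCHWA_LIKE = {"AH0", "ER0"}

def vowels(phones):
    """Vowel phones are the ones that carry a stress digit in this notation."""
    return [p for p in phones if p[-1].isdigit()]

def has_adjacent_schwas(phones):
    """True if two consecutive syllable nuclei are both schwa-like."""
    v = vowels(phones)
    return any(a in SCHWA_LIKE and b in SCHWA_LIKE for a, b in zip(v, v[1:]))

def candidates(text):
    """Words whose pronunciation contains two schwa-like vowels in a row."""
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, *phones = line.split()
        if has_adjacent_schwas(phones):
            result.append(word)
    return result

print(candidates(ENTRIES))  # prints ['EMPEROR', 'BACHELOR', 'DEVELOPER']
```

A fuller version would also compare the spellings of the two schwa syllables, since the pattern of interest involves different orthographic vowels, and would then check search-engine counts for each transposed spelling by hand.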

Posted by Steven Bird at 05:18 AM

Alistair and the adjective

I thought the world of Alistair Cooke, of course, and mourn his recent passing (he was 95, but hardly seemed it). I've done a few 13-minute radio essays myself, for ABC Radio National in Australia, and believe me, they're not as easy as Alistair always, right up to the end, made them sound. I've broadcasted 7; he managed 2,869 in all. Doing one a week for 58 years to any quality level at all would have been an astonishing feat. But his were usually good ones. He was a fine writer and a great broadcasting institution. I doubt that we shall see his like again.

But even he was not reliable when asked about his writing process. He is reported in the New York Times obituary to have said not long ago that when he had drafted a script he would then "beat the hell out of it, getting rid of all the adverbs, all the adjectives, all the hackneyed words." There has been earlier discussion on Language Log about the myth that you should avoid adjectives. The notion that prose would be better without them, or without adverbs, has been described as totally nuts. And the notion that there are hackneyed words doesn't make much sense either (is "the" a hackneyed word?).

Alistair Cooke didn't, of course, do to his prose what he said he did. Like most writers, he is to be appreciated for his work, not questioned on the topic of his creative process. He certainly didn't manage half a century of weekly radio talks on a diet of no adverbs, no adjectives, and no clichés.

Posted by Geoffrey K. Pullum at 01:59 AM

Spectacular multiple-coordinate examples

I recently asked for attested cases of coordinate structures (phrases linked by the word "and", "or", or "nor") with large numbers of coordinates (the phrases that are linked). I have received some entries that are indeed fairly remarkable. Some are simply stunning. Read on for some syntactic wonders that will convince you of the fairly important theoretical point that you do not want linguistic theory to analyze all coordinate structures the way logicians do, in terms of binary coordinators.

Several people suggested that a nice 5-coordinate example would be the famous reference of Hobbes (often quoted only in part) to how things could be in the condition that mere nature would give us; "the life of man", he avers, would be "solitary, poor, nasty, brutish, and short."

That is a nice, compact, quotable 5-coordinate case which cannot reasonably be analyzed without allowing 5 coordinate constituents of equal rank. But it is worth noting that in the original context it is contained within a much larger coordinate structure, which is quite staggeringly complex (this is why The Cambridge Grammar does not use unedited real examples for illustration most of the time!). I will put in brackets with labels showing the coordinates of seven different coordinate structures, labeled A, B, C, D, E, F, and G. The second coordinate of coordination A has no main verb, but "there is" should be understood as carried over from the first coordinate; I've popped it in at the right point in green, for extra clarity.

In such condition [A1 there is no place for industry, because the fruit thereof is uncertain]: [A2 and consequently there is [B1 [C1 no culture of the earth]; [C2 [D1 no navigation], [D2 nor use of the commodities that may be imported by sea]]; [C3 no commodious building]; [C4 no instruments of [E1 moving] [E2 and removing] such things as require much force]; [C5 no knowledge of the face of the earth]; [C6 no account of time]; [C7 no arts]; [C8 no letters]; [C9 no society]; [C10 and which is worst of all, [F1 continual fear], [F2 and danger of violent death]]]; [B2 and the life of man, [G1 solitary], [G2 poor], [G3 nasty], [G4 brutish], [G5 and short].]]

So we actually have a 10-coordinate example there, the coordinate structure with the coordinates C1 thru C10. That coordinate structure forms coordinate B1 of the larger structure B, which itself is contained within coordinate A2 of the even larger structure A. Astounding, wonderful stuff.
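
The counts for each of the seven structures can be checked mechanically from the bracket labels. What follows is only a minimal Python sketch, not part of the original analysis: the function name `coordinate_counts` is made up for illustration, and it assumes that every coordinate opens with a label of the form `[C3` (a capital letter naming the structure, then the coordinate's position within it).

```python
import re
from collections import defaultdict

# The labelled Hobbes passage, copied from the bracketing above.
HOBBES = (
    "In such condition [A1 there is no place for industry, because the fruit "
    "thereof is uncertain]: [A2 and consequently there is [B1 [C1 no culture "
    "of the earth]; [C2 [D1 no navigation], [D2 nor use of the commodities "
    "that may be imported by sea]]; [C3 no commodious building]; [C4 no "
    "instruments of [E1 moving] [E2 and removing] such things as require much "
    "force]; [C5 no knowledge of the face of the earth]; [C6 no account of "
    "time]; [C7 no arts]; [C8 no letters]; [C9 no society]; [C10 and which is "
    "worst of all, [F1 continual fear], [F2 and danger of violent death]]]; "
    "[B2 and the life of man, [G1 solitary], [G2 poor], [G3 nasty], "
    "[G4 brutish], [G5 and short].]]"
)


def coordinate_counts(labelled_text):
    """Map each structure letter to its highest coordinate index."""
    counts = defaultdict(int)
    # Each label looks like "[C10": an opening bracket, a letter, an index.
    for letter, index in re.findall(r"\[([A-Z])(\d+)", labelled_text):
        counts[letter] = max(counts[letter], int(index))
    return dict(counts)


for letter, size in coordinate_counts(HOBBES).items():
    print(f"structure {letter}: {size} coordinates")
```

Run on the passage as labelled, this reports ten coordinates for structure C, five for G, and two for each of the others.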

However, Lance Nathan, from MIT's Department of Linguistics, tops it in sheer number of coordinates. Not with the example he quotes from William Shatner in Airplane 2, though I do like it:

I gotta say something about that guy up there, and I can sum it all up in just one word: courage, dedication, daring, pride, pluck, spirit, grit, metal, and G-U-T-S, guts!

That's cute, but only 9 coordinates, which does not beat Hobbes. No, Lance's triumph was finding the following example from a song lyric by Tim Rice (music by Andrew Lloyd Webber); the song is "Coat of Many Colors" from "Joseph and the Amazing Technicolor Dreamcoat":

It was red and yellow and green and brown
And scarlet and black and ocher and peach
And ruby and olive and violet and fawn
And lilac and gold and chocolate and mauve
And cream and crimson and silver and rose
And azure and lemon and russet and grey
And purple and white and pink and orange
And red and yellow and green and brown
Scarlet and black and ocher and peach
And ruby and olive and violet and fawn
And lilac and gold and chocolate and mauve
And cream and crimson and silver and rose
And azure and lemon and russet and grey
And purple and white and pink and orange
And blue.

If we take that as a whole, it might appear to have 57 coordinates, which would definitely make Lance the current world champion multiple-coordinate example finder. But unfortunately, after the first 28 coordinates, at the word "Scarlet", it stops syntactically, and begins to repeat: there is no "and" after "brown" in the 8th line to continue the coordination; the song just goes back to a repeat starting from "scarlet" in line 2. So it isn't a single uninterrupted coordinate structure. It concludes, after a repetition of the first 28, with a final extra one, "and blue". So the whole thing comprises two coordinations, the first being the complement of "was". The second (starting at "Scarlet"), which is a coordination of either adjectives or nouns, is a 29-coordinate example.

And that is not enough to earn the world record for Lance. The world champion multiple-coordinate example hunter, so far, is M. Crawford, who sent me this spectacular passage (about a bottled condiment of some sort) from Neal Stephenson's 1995 novel The Diamond Age:

If the manifest of ingredients on the bottle had been legible, it would have read something like this:

Water, blackstrap molasses, imported habanero peppers, salt, garlic, ginger, tomato puree, axle grease, real hickory smoke, snuff, butts of clove cigarettes, Guinness Stout fermentation dregs, uranium mill tailings, muffler cores, monosodium glutamate, nitrates, nitrites, nitrotes and nitrutes, nutrites, natrotes, powdered pork nose hairs, dynamite, activated charcoal, match-heads, used pipe cleaners, tar, nicotine, singlemalt whiskey, smoked beef lymph nodes, autumn leaves, red fuming nitric acid, bituminous coal, fallout, printer's ink, laundry starch, drain cleaner, blue chrysotile asbestos, carrageenan, BHA, BHT, and natural flavorings.

That is a completely clear example of a coordinate structure with 41 coordinates arising in a natural context and printed in a novel. (It is actually an example of what The Cambridge Grammar calls layered coordination: the 18th coordinate is itself a coordinate structure, "nitrotes and nitrutes".) Congratulations to M. Crawford, the champion! His prize will be sent to him soon, unless...

Unless someone beats him. The contest now moves to a new level. We are searching for an attested coordination, published in a respectable print source (with respectability defined by me), having a number of coordinates that equals or exceeds the familiar number that in The Hitchhiker's Guide to the Galaxy turned out to be the answer to the ultimate question of life, the universe, and everything: 42.
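
Anyone submitting a challenger will want to count carefully. As a rough illustration of how the Stephenson tally of 41 can be verified, here is a small Python sketch. It is an assumption-laden shortcut rather than a real syntactic analysis: it treats every comma as a boundary between top-level coordinates, which happens to work for this passage because no ingredient contains a comma of its own. The names `split_coordinates` and `INGREDIENTS` are invented for the example.

```python
# The ingredient list as quoted from The Diamond Age above.
INGREDIENTS = (
    "Water, blackstrap molasses, imported habanero peppers, salt, garlic, "
    "ginger, tomato puree, axle grease, real hickory smoke, snuff, butts of "
    "clove cigarettes, Guinness Stout fermentation dregs, uranium mill "
    "tailings, muffler cores, monosodium glutamate, nitrates, nitrites, "
    "nitrotes and nitrutes, nutrites, natrotes, powdered pork nose hairs, "
    "dynamite, activated charcoal, match-heads, used pipe cleaners, tar, "
    "nicotine, singlemalt whiskey, smoked beef lymph nodes, autumn leaves, "
    "red fuming nitric acid, bituminous coal, fallout, printer's ink, "
    "laundry starch, drain cleaner, blue chrysotile asbestos, carrageenan, "
    "BHA, BHT, and natural flavorings."
)


def split_coordinates(listing):
    """Split a comma-separated listing into its top-level coordinates."""
    items = [item.strip() for item in listing.rstrip(".").split(",")]
    # The final coordinate carries the coordinator; strip it off.
    if items[-1].startswith("and "):
        items[-1] = items[-1][len("and "):]
    return items


coordinates = split_coordinates(INGREDIENTS)
print(len(coordinates), "top-level coordinates")

# Layered coordination: a coordinate that is itself a coordination.
for position, item in enumerate(coordinates, start=1):
    if " and " in item:
        print(f"coordinate {position} is layered: {item}")
```

On this passage the sketch counts 41 top-level coordinates and flags the 18th as the layered one.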

Posted by Geoffrey K. Pullum at 01:32 AM