Language Log: May 2006 Archives

May 31, 2006

"What up, Nick--?"

Last June in the Howard Beach neighborhood of Queens, New York, 19-year-old Nicholas "Fat Nick" Minucci beat black 23-year-old Glenn Moore with a baseball bat. Details are being hashed out in trial as to exactly what the sequence of events and motivations were: Moore admits he was in the neighborhood to steal cars, and Minucci and some friends claim that Moore tried to rob one of them, sparking the later baseball bat attack. But what's getting all the attention is that Minucci used "the N word" while beating Moore.

Or, perhaps just before: Minucci's version is that when Moore tried to rob him, he said "What up, n-----?" To the prosecutors, this means Minucci committed a hate crime that could get him sent up the river for years. The defense, however, are claiming that n------ is now heard so often that it is merely slang, and is no longer a "bad word."

The idea that Minucci's linguistic impropriety is more evil (and interesting) than the beating of anyone for any reason is one thing -- and a sad one, if you ask me (and for the record I am black). However, the defense's cute feint that n----- no longer carries a sting because it's all over rap albums and black men use it with each other is, well, b------- and they know it.

It's very simple. Long ago (and long before the 1980s, contrary to what many seem to think), black people and especially black men recruited n----- as an in-group term of endearment. It was a way of taking the sting from the slur. Today, the word signifies that said n----- is "one of us," no higher than the rest of us. N----- is a democratizer -- among black people.

Indeed, this means that white people are not allowed to call black people niggers (see, I can say it!). This is not difficult. This kind of thing happens in language. In Japanese, for example, instead of the meaning of a word varying with the situation, the word itself can. To give is AGERU if I give something to people, but it's KURERU if someone gives something to me -- but then if I give something to a high-placed person then I SASHIAGERU, and if a high-placed person gives something to me then they KUDASARU it to me.

Well, in English, when said by a white person to a black one n----- is an insult, while if black people use it among themselves, it's a term of endearment. Just as rank is the grand obsession of the Japanese (or at least used to be), race has been the grand obsession of America since the 1960s. Naturally, ceremonial linguistic rituals will arise on its basis, and internalizing them becomes part of the national identity.

When this goes as far as whites getting fired for even uttering the word in reference to it rather than actually wielding it, we have slid into the realm of senseless taboos of the sort that make remote tribal ones look so foreign to us. But the basic idea that a term is used only among groups corresponds neatly to universals of human social bonding and group definition.

If we all hear it "around" more lately, it's mostly on recordings of black men addressing and referring to one another. The rule nowadays even extends to other groups using n----- among themselves -- I have heard it used by Latino, Filipino and Asian teens. Okay. Among a new generation of hip-hop obsessed white kids, one even hears them using it with each other. But still, they cannot use it with a black person any more than a Japanese person can AGERU something to the Emperor.

Minucci and his defense team would have it that if Minucci said "What up, n-----?" to Moore, then he was just using common slang like fella, or the Russian MUZHIK, which means peasant but is used affectionately to mean FELLOW or GUY rather like n----- is. That is, they want us to believe that n----- no longer carries any racial meaning, just as DUDE, now used among women as well as among men, is becoming gender-neutral.

I suppose that would be nice, but we aren't there yet, no matter how many hip hop albums younger white kids have heard, and no matter how much some of them may think of themselves as, on some level, black. Minucci broke the rules on how n----- is used — which is especially clear if he indeed used it repeatedly while beating Moore. Imagine him punctuating each blow with, say, "Pal! Pal! Pal!" A term of endearment? And even if it was just a single "What up, n-----?", them was fightin' words and Minucci and his defense know it very well.

Posted by John McWhorter at 11:00 PM

Grunt and Grumble: sociolinguistic speculation at Slate

A few months ago, it was Jason Horowitz telling us about the "City Girl Squawk" of younger urban women ("The Affect: sociolinguistic speculation at the NYO", 3/22/2006; "Further thoughts on The Affect", 3/22/2006). Now it's Jon Katz telling us about the "Grunt and Grumble" of older rural men. ("Grunt and Grumble: Why do men in the country talk that way?", Slate 5/29/2006).

Both pieces are interesting examples of the genre of verbal caricature. I don't have time to say much about Katz's characterization of rural men right now, since I've got an appointment in a few minutes on the other side of campus (for a cute Language Log post about older-generation rural speech, see this one). But I'll mention one thing that struck me. Horowitz's caricature was aimed specifically at young women in the American northeast; Katz's caricature is pitched as a characterization of all rural men, although his observations are apparently limited to his neighbors in upstate New York.

Upstate New Yorkers are not exactly core Yankees, but they're not that far out in the fractal penumbra of Yankeehood; and there's an old stereotype of rural Yankees, represented by various anecdotes about Calvin Coolidge -- "silent Cal".

Coolidge was both the most negative and remote of Presidents, and the most accessible. He once explained to Bernard Baruch why he often sat silently through interviews: "Well, Baruch, many times I say only 'yes' or 'no' to people. Even that is too much. It winds them up for twenty minutes more."

[...]

Both his dry Yankee wit and his frugality with words became legendary. His wife, Grace Goodhue Coolidge, recounted that a young woman sitting next to Coolidge at a dinner party confided to him she had bet she could get at least three words of conversation from him. Without looking at her he quietly retorted, "You lose." And in 1928, while vacationing in the Black Hills of South Dakota, he issued the most famous of his laconic statements, "I do not choose to run for President in 1928."

And on his first day of retirement, back in his small hometown in Vermont, it's said that he went down to the local store, made his selections, and checked out via the following exchange:

Store owner: [rings up purchase, displays total] Been away.
Coolidge: [counts out money] Ayuh.

But I don't think you'd hear that, even as a joke, in rural Texas.

Posted by Mark Liberman at 10:09 AM

Congratulations to Joseph Aoun

Boston, MA, 7:00 a.m., Wednesday
Joseph Aoun, the highly successful Dean of the College of Letters, Arts and Sciences at the University of Southern California, will soon become the first Professor of Linguistics to assume the top position (president or chancellor) in a major university in the USA.

He has just been named the next president of Northeastern University here in Boston. Congratulations to both him and Northeastern.

Aoun earned his PhD in the Department of Linguistics and Philosophy at MIT, and has served for six years as a dean. He is noted for an excellent record in fundraising.

Further details in the Boston Globe.

Aoun is the first Professor of Linguistics to become a university president in this country, but not the first holder of a PhD in linguistics. The Swedish-born Nils Hasselmo earned a PhD from the Department of Linguistics at Harvard University in 1961, and later served from 1989 to 1997 as the president of the University of Minnesota. His university posts, however, were as a professor of Scandinavian languages and literatures. And Father Lawrence Biondi, S.J., earned an MA in linguistics (1966) and a PhD in sociolinguistics (under Roger Shuy, in 1975) from Georgetown University, and since 1987 has been the very successful president of St Louis University, a Jesuit university of moderate size (11,000 students) in Missouri -- and the oldest university west of the Mississippi. Father Biondi has been active in fields other than linguistics (notably theology and university administration) since earning his doctorate.

Other high-ranked linguists who have held US university administrative positions have not been permanently appointed at ranks higher than that attained by the late Victoria Fromkin, who had the title Vice Chancellor for Graduate Programs at UCLA in addition to being graduate dean. It should be noted that Sheila Blumstein served for a while as Interim President at Brown University; Susan Steele was vice provost at the University of Connecticut and then provost at Mills College; Samuel Jay Keyser was an associate provost at MIT; a significant number of professors of linguistics have held deanships (probably a dozen or more); and Alfred Bloom, the current president of Swarthmore College, started at Swarthmore as an assistant professor in psychology and linguistics and is thus certainly an honorary linguist (though his degrees are in adjacent fields: a BA from Princeton in Romance languages and literatures and a PhD from Harvard in psychology and social relations).

Posted by Geoffrey K. Pullum at 07:09 AM

GAN: Whodunnit, and how, and why?

[Victor Mair sent in further analysis of a common but spectacular mistranslation, discussed in earlier LL posts: "A less grand Chinglish" 5/30/2006, which dealt with a button labelled "dry fry" in Chinese and "fuck to fry" in English; and "Engrish explained", which discussed a menu item reading "Hot and spicy garlic greens stir-fried with shredded dried tofu" in Chinese, but "Benumbed hot vegetables fries fuck silk" in English, 3/11/2006. Victor's note follows. ]

The translation of GAN as "fuck" is fairly ubiquitous in China. There are complications, of course, since GAN1CHAO3 on the sign I wrote about must mean "dry fry," with GAN1 in the first tone, whereas GAN meaning "fuck" probably derives from GAN4 ("to do") in the 4th tone. This latter word, furthermore, is written with an entirely different character in the traditional script (幹), though GAN1 and GAN4 have both collapsed into the same three-stroke calendrical graph in the simplified script (干). Furthermore, the actual sign from which I took this example has an arrow next to the GAN1CHAO3 / FUCK TO FRY which seems to be pointing to a button that you're supposed to PUSH to start the frying. Still, if naughty people are intentionally producing these risque, nonsense translations, then the double entendre of GAN1/4 ("dry / fuck") must be taken into serious consideration.

These sites show how widespread the mistranslation of GAN1/4 as "fuck" is:

http://www.flickr.com/photos/xiaming/70761148/
http://www.cameraontheroad.com/?p=1010
http://pangea.stanford.edu/~pvermees/chinglish/index.html (select "Fuck the price" in the radio box).
http://www.alwayson-network.com/comments.php?id=P14329_0_6_0_C

Just google {Chinglish fuck} and you'll get a lot of results.

I am trying to make sense of how this phenomenon actually came about. It seems that the twenty or so different meanings of the three-stroke calendrical graph that is used to write GAN1/4 (a total of three distinct graphic forms in the traditional script -- 乾, 幹, 干 -- all reduced to one -- 干 -- in the simplified script) in Chinglish have all collapsed into the single meaning of "fuck". Wherever that graph occurs, Chinglish speakers will translate it as "fuck".

This is an extremely bizarre situation, because:

a. normal Chinese-English dictionaries do not even give this definition

b. the widespread rendition of GAN1/4 as "fuck" in all sorts of situations where other translations are called for occurs on restaurant menus, official notices, and so forth, and it is not likely that the proprietors would intentionally want to insult or embarrass their patrons

Who's telling the menu-makers and sign-painters to write "fuck" for GAN1/4? They probably don't even know English and probably don't know much Chinglish either. How did this get started? (Perhaps somebody was being intentionally mischievous.) And how did it become such a common phenomenon? That's the real mystery. How is this horrible mistranslation continuing to spread and not being caught by the tens of millions of Chinese who do speak good English?

I'm deeply interested in the linguistic mechanics and the sociolinguistics of this baffling phenomenon. It is almost beyond belief that GAN1/4 as "fuck" proliferates when there are so many other good translations available in different contexts. You'd think that at least they'd write "do" everywhere, or that people who do know English would tell the proprietors to hurry up and change the offending word so as to avoid further embarrassment!

[Guest post by Victor Mair.]

[Update -- Brendan O'Kane wrote:

Hi - long-time listener, first-time caller.

I've been living in Beijing and working as a free-lance translator for some time now. It's pretty common for clients to take a text, hand it off to a (cheaper) Chinese translation company, and then pass it on to me to 'edit' - a lower-paid gig - and so I've seen quite a lot of this kind of thing. My guess, with the disclaimer that Prof. Mair has forgotten more than I'll ever know about Chinese, is that someone ran the Chinese 干炒 in the example in your blogpost, and the menu posted on rahoi.com, through a machine translation program, perhaps Jinshan Kuai Yi or something of the sort, with the offending results.

Can someone verify that there is a common Chinese-English MT program that maps GAN to "fuck"? That would explain a lot, if it's true. The puzzle then would become why outraged customers have not forced a modification of the software... ]
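To make the machine-translation hypothesis concrete, here is a minimal sketch in Python of how a character-by-character glosser could go wrong. It is not a reconstruction of any real MT program: the tables, the function names, and the one-character context rule are all invented for illustration. The only facts taken from the discussion above are that the simplified graph 干 stands in for several traditional forms (乾 "dry", 幹 "do", 干) and that 干炒 means "dry fry".

```python
# Hypothetical lookup tables, invented for this sketch.
# The simplified graph 干 covers several traditional forms with different senses.
SENSES_OF_GAN = {
    "乾": "dry",     # GAN1, as in 干炒 "dry fry"
    "幹": "do",      # GAN4, the source of the vulgar colloquial sense
    "干": "shield",  # GAN1, the original three-stroke graph
}

# Glosses for the other characters used in the examples.
OTHER_GLOSSES = {"炒": "fry", "丝": "silk"}

# Characters that, when they follow 干, suggest the "dry" reading (assumed rule).
DRY_CONTEXT = {"炒", "丝"}


def gloss_naive(word, fixed_sense="do"):
    """Gloss each character in isolation, always using one fixed sense for 干."""
    return " ".join(
        fixed_sense if ch == "干" else OTHER_GLOSSES[ch] for ch in word
    )


def gloss_with_context(word):
    """Pick the sense of 干 by looking at the character that follows it."""
    out = []
    for i, ch in enumerate(word):
        if ch == "干":
            following = word[i + 1] if i + 1 < len(word) else ""
            out.append(SENSES_OF_GAN["乾"] if following in DRY_CONTEXT
                       else SENSES_OF_GAN["幹"])
        else:
            out.append(OTHER_GLOSSES[ch])
    return " ".join(out)
```

With these toy tables, `gloss_naive("干炒")` always yields the "do" reading, while `gloss_with_context("干炒")` recovers "dry fry". A glosser that had the vulgar rendering stored as its fixed sense would produce exactly the kind of sign under discussion, which is all the sketch is meant to show.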

Posted by Mark Liberman at 05:31 AM

May 30, 2006

Supreme Freedom of Speech

Internal conflict is not common at Language Log Plaza. But you would think that when one of our writers disagrees with one of the high mucky-muck elders, there could be all hell to pay. And once in a while conflicts actually happen here, right in our hallowed halls. You may recall that last month there were lots of Language Log posts about the Harvard plagiarist, Kaavya Viswanathan. Geoff Pullum thought the young student/budding chick lit author should be hogtied for copying passages word for word from an earlier novel (see here). Bill Poser wasn't as sure about this and came to her defense (well, sort of anyway) (see here). The rest of us hunkered down at our desks, waiting for a donnybrook to take place. It didn't. Life went on as if no conflict had ever happened (we're special that way).

Why are we so calm and dignified at Language Log? Because we Loggers are civilized people who don't mind disagreeing with each other. No sweat. On to the next exciting topic! But it's a good thing that Language Log isn't part of the government. The Supreme Court just nailed a public employee who claimed that he had been denied promotion for challenging the legitimacy of a search warrant (New York Times). A Los Angeles deputy prosecutor complained to his boss that he had found some serious misrepresentations in an affidavit that his office used to get it. Apparently not a good move, since the deputy prosecutor shortly afterward was reassigned and denied promotion, for which he promptly filed a grievance. He lost in federal court but later prevailed in the Court of Appeals, which upheld his claim that his freedom of speech rights had been violated. Off the case then went to the Supreme Court, which ruled against the deputy prosecutor, claiming that when people enter government service, they have to accept certain limitations on their freedom. The deputy prosecutor was said to be acting in his official capacity, not as a private citizen, when he made his internal complaint about the inadequate search warrant. The Supreme Court's 5 to 4 decision split along the lines of ... well, you probably know how it split without my mentioning it. Many feel that, among other things, this decision could cast an uncomfortable pall on whistle blowers in the future.

So the Supremes now tell us that our right to freedom of speech has some pretty strong limits, especially if you happen to be a government employee. Fortunately, those of us at Language Log work in the private sector where we can disagree with each other all we want, even though we seldom do. We have freedom of speech. That's what democracy is all about -- except for government employees. But now that I think about it, aren't Justices Souter, Ginsburg, Stevens and Breyer also public employees, disagreeing with the majority? I wonder if ... naah.

Posted by Roger Shuy at 11:49 PM

So the search engine can understand

An article by Steve Lohr in the New York Times last April 9, about the way newspapers are using duller headlines online to make sure they get the right pickup by news-hunting web crawlers, contains the following quote from the head of product development and technology at BBC News Interactive, Nic Newman:

"The search engine has to get a straightforward, factual headline, so it can understand it," Mr. Newman said.

Now, if I seem a bit over-cautious here, keep in mind that BBC News is the organization that brought you the telepathic parrot and the three-headed frog, and Language Log is a little bit concerned that loonies have infiltrated the fine organization in question. But if Mr Newman's remark here is taken at face value, he would appear to believe that search engines understand things.

I will present no view here on whether machines might or might not be in principle capable of understanding (you'll want to vote yes if you go with Turing, no if you go with Searle; for now, I'm neutral), but my understanding is that search engines today, on this planet, cannot conceivably be described as understanding anything at all. The headline scanners of Google News do scan headlines and the first paragraphs of stories, and they do pick up enough information to classify the stories (the Google News page is put together entirely by machines, which is a really remarkable achievement). But the scanners simply look for words (letter strings) that are of normally low frequency and thus might be clues to the topic at hand. (For example, they conclude nothing at all from finding the, which occurs in nearly all sentences, but they conclude quite a lot from seeing Iran, which in texts on most subjects is rare.) They don't read for content, get the drift of the story, compare the sense of the paragraphs with their background knowledge and common sense, and chat about the issues with their friends. They tabulate letter strings and do statistical computations.
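For readers who want to see what "tabulate letter strings and do statistical computations" can look like, here is a minimal sketch in Python. It is emphatically not Google's algorithm, which is not described here; it is a generic rarity weighting (a smoothed inverse document frequency), and the function names and the toy background texts are made up for the illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase letter strings; nothing is interpreted."""
    return re.findall(r"[a-z]+", text.lower())


def rarity_scores(headline, background_docs):
    """Score each string in the headline by how rare it is in the background."""
    n_docs = len(background_docs)
    doc_freq = Counter()
    for doc in background_docs:
        # Count each string once per document it appears in.
        doc_freq.update(set(tokenize(doc)))
    scores = {}
    for word, count in Counter(tokenize(headline)).items():
        # Smoothed inverse document frequency: a string found in every
        # background document gets log(1) = 0; rarer strings score higher.
        idf = math.log((1 + n_docs) / (1 + doc_freq[word]))
        scores[word] = count * idf
    return scores


# Toy background collection, invented for this sketch.
BACKGROUND = [
    "the cat sat on the mat",
    "the market rose in the morning",
    "the team won the match in the rain",
    "talks with iran stalled in the spring",
]
HEADLINE = "The talks with Iran resume in the capital"
```

Calling `rarity_scores(HEADLINE, BACKGROUND)` gives "the" a score of zero, because it turns up in every background text, and gives "iran" a much higher one. No step in the computation involves knowing what Iran is.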

The very least one has to admit about machine understanding is that there is a big difference between a search engine algorithm and a genuine understander like you or me — and I'm not saying it necessarily reflects well on me. If you switch a Google-style search engine algorithm from working on English to working on Arabic, it will very largely work in the same way, provided only that you make available a large body of Arabic text from which it can draw its frequency information. (I have actually met people working at Google on machine processing of stories in Arabic. They do not know how to read Arabic. They don't need to.) I, on the other hand, will become utterly useless after the switch. I will no longer be able to classify news stories at all (I don't even know the Arabic writing system, so I can't even see whether Iran is in a paragraph or not).

Call the machines cleverer, or call me cleverer, I don't care, but we're not the same kind of animal, and it seems to me that the verb understand is utterly inappropriate as a term for what Google News algorithms do.

[Added later: People from the programming culture have been mailing me to point out that a metaphorical use of the word ("The compiler won't understand that unless you put brackets round it") is commonplace among programmers. And if Steve Lohr is a programmer, the above could well be regarded as unfair. Maybe so. In that case, just ignore the above cautions. But be very aware that metaphor is in play. Google's algorithms are ingenious and they work very well; but they understand things only in a very attenuated metaphorical sense under which you might also say that a combination door lock set to 4357 understands you when you punch in 4357 but not when you punch in 4358.]

Posted by Geoffrey K. Pullum at 05:35 PM

A Less Grand Chinglish

[Guest post by Victor Mair]

(Signs in Photographs Taken by My Student Carley Williams during Her Travels in China)

For each item I give the Chinglish sign, identification of the site where it occurred [in double parentheses ((xxx))], pinyin transcription, literal word-for-word translation, and then an idiomatic English translation; sometimes I omit the latter when the meaning of the word-for-word translation is sufficiently clear.

1. THEREOUT PULL IN GOT ON

((at a Taishan [Mt. Tai] cable car entrance))

   YOU2 CI3  JIN4  ZHAN4   CHENG2 CHE1
   from here enter station board  car

"Enter the station here and board the car."

2. GODCHOSEN TRAVEL SERVICE

((on a banner held by a guide))

   TIAN1  ZUO4 LYU3YOU2 
   Heaven-made Travel

3. The stairs and Pu Jiang Hotel of the carving arm-rest is like the long history

((on a wall next to a staircase in a hotel))

   DIAO1HUA1 FU2SHOU  DE LOU2TI1, YU3 PU3JIANG1   DE LI4SHI3 YI1YANG4 YOU1JIU3 
   carved    handrail 's stairs   and north-river 's history equally  old/long

"This staircase with carved banister has a history as old as that of the Pujiang Hotel."

4. My beauty comes from your painstaking care and attention

((at a scenic vista))

   WO3 DE MEI3LI4 LAI2  ZI4  NI3 DE JING1SHEN2 HE1HU4
   my     beauty  comes from your   spirit     protection

"The beauty of these natural surroundings depends upon your conscientious care."

5. Those who suffer from high blood pressure, mental disease, horrifying of highness and liquor heads are refused.

((notice at the entrance to a ride in an entertainment park))

   HUAN4     YOU3 XIN1ZANG4BING4, GAO1XUE4YA1,         JING1SHEN2BING4, KONG3GAO1ZHENG4 
   afflicted have heart disease,  high blood pressure, mental illness,  vertigo

   JI2 XU4JIU3ZHE3              XIE4JUE2 CHENG2ZUO4
   and those who are inebriated decline  ride

"Those who suffer from heart disease, high blood pressure, mental illness, or vertigo, and those who are drunk are not permitted to ride."

The next sign is in a class of its own. It comes from a photograph taken by another of my students named Jeisun Wen. Jeisun encountered this sign in a restaurant that he went to with his girlfriend. Neither of them could figure out what the sign was instructing them to do. I've shown this sign to scores of people but nobody can understand what it means. Because of my long experience in reading ancient Chinese manuscripts, I was able to decipher this most mystifying Chinglish sign within a couple of minutes.

FUCK TO FRY

(written all in capital letters just that way)

This sign is located at the corner of a panel in the center of which is found a tray labeled "CANDIED FRUIT". The corresponding Chinese text for "CANDIED FRUIT" is MI4JIAN4 LING2SHI2, which does indeed mean "candied / preserved [lit., honey] fruit snacks", so there's no real problem there, except that it's a bit odd to say "candied fruit snacks", since MI4JIAN4 traditionally would have been used by itself to signify a type of snack, and there is no need to specify MI4JIAN4 as LING2SHI2.

Now, on to the solution of the difficult part. The corresponding Chinese text for "FUCK TO FRY" is GAN1CHAO3, lit., "dry fry," which doesn't help us to unravel the "FUCK TO FRY" knot. I believe what happened is that a Chinese person asked an English speaker what to write below the GAN1CHAO3 sign that would more or less equal it. The English speaker must have told them to write PUSH TO FRY, i.e., push a button at the corner of the table to heat up the MI4JIAN4 on the tray. Unfortunately, when the Chinese sign painter did the lettering for PUSH TO FRY, P morphed into F, S morphed into C, and H morphed into K (such things can happen when one's handwriting is not perfectly clear!), and the rest is history, immortalized in this eternally perplexing instruction: FUCK TO FRY.

[Guest post by Victor Mair]

[Comment by myl: I never thought I would be in a position to amend Prof. Mair's Chinese philology, even indirectly! In an earlier Language Log post by Ben Zimmer, "Engrish explained", you'll find a related puzzle, the scanned menu item

taken from a blog post by Jon Rahoi, an American living in China. One commenter accused Rahoi of photoshopping it; but "an anonymous professor of China studies" rescued Rahoi by offering the following explanation, reproduced below:

Take #1313, "Benumbed hot vegetables fries fuck silk." It should read "Hot and spicy garlic greens stir-fried with shredded dried tofu." However, the mangled version above is not as mangled as it seems: it's a literal word-by-word translation, with some cases where the translator chose the wrong one of two meanings of a word.

First two characters: "ma la" meaning hot and spicy, but literally "numbingly spicy" -- it means a kind of Sichuan spice that mixes chilies with Sichuan peppercorn or prickly ash. The latter tends to numb the mouth. "Benumbed hot" is a decent, if ungrammatical, literal translation.

Next two: "jiu cai," the top greens of a fragrant-flowering garlic. There's no good English translation, so "vegetables" is just fine.

Next one: "chao," meaning stir-fried, quite reasonably rendered as "fries" (should be "fried," but that's a distinction English makes and Chinese doesn't).

Finally: "gan si" meaning shredded dried tofu, but literally translated as "dry silk." The problem here is that the word "gan" means both "to dry" and "to do," and the latter meaning has come to mean "to fuck." Unfortunately, the recent proliferation of Colloquial English dictionaries in China means people choose the vulgar translation way too often, on the grounds that it's colloquial. Last summer I was in a spiffy modern supermarket in Taiyuan whose dried-foods aisle was helpfully labeled "Assorted Fuck." The word "si" meaning "silk floss" is used in cooking to refer to anything that's been julienned -- very thin pommes frites are sold as "potato silk," for instance. The fact that it's tofu is just understood (sheets of dried tofu shredded into julienne) -- if it were dried anything else it would say so.

I believe that this explanation applies to "FUCK TO FRY" as well, and is simpler than the letter-substitution theory.

Also see "A grander Chinglish", "Regale in Basilica"; and from the other side, "Semen, green rice and the rate of internet decay" ]

Posted by Mark Liberman at 10:39 AM

Rhythms of the blogosphere

Last year ("Language: the anti-beer?" 4/23/2005) I mentioned something that's obvious to anyone who tries blogpulse -- the blogosphere, like the ocean, has rhythms on several different time scales. Comparing, say, "paper" and "movie", we can see an inverse correlation at two of these scales:

There's a week-vs.-weekend pulse, which we can also see in pairs like "work" vs. "fun":

And in the case of "paper" and "movie" there's also a semester-sized rhythm, with a decrease of school-related concerns relative to leisure during the Christmas break, and an increase at the end of the spring semester -- and then there's the leading edge of the summer holidays.

The same sort of rhythms are apparent in Language Log's visits and page views. Language Log's weekly rhythms traditionally correlate, alas, with "work" as opposed to "fun" -- up during the week, down on the weekends (ignore the fractional-day numbers for today, May 30):

The weekend of May 21 was something of an exception, due to traffic associated with the opening of The Da Vinci Code.

On a larger time scale, we can see the (negative) effect of holidays superimposed on a general positive trend:

Not all words in the blogosphere resonate to the semesterly rhythm: "work" vs. "fun" seems to show the more local effect of grown-up holidays:

Down in the Language Log marketing department, Arnold Zwicky keeps yelling "come on, enough with the lexicostatistics and grammar already, it's after Memorial Day, we need more movie reviews and travel features!"

Temporarily ignoring this sensible prescription, I'll observe that there's an opportunity here for some interesting lexico-temporal hacking. "Latent Semantic Analysis" and similar techniques find useful relations among words based on the eigenstructure of a term-by-document matrix; does adding the dimension of time contribute anything that is not already implicit in the distribution of words across (atemporal) documents? There are some large weblog and webforum databases where this could be explored.

[Update -- Bob Carpenter emailed a pointer to Krisztian Balog and Maarten de Rijke, "Decomposing Bloggers' Moods", 3rd Annual Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics (at WWW 2006). This paper applies ARIMA time-series analysis to "20 million mood-annotated blog posts harvested between June 2005 and March 2006". The authors draw four conclusions:

(i) there is a clear overall decline in the usage of mood annotations; (ii) weather phenomena and holidays have a clear impact on the profile of some moods; (iii) looking at the relative counts, we observe that some moods are stationary, while others decline or climb; and (iv) several moods display changes in their cyclical or seasonal component during the period covered by our data.

This seems both plausible and interesting, but it's not what I was suggesting. My idea was that by modeling word-co-occurrence data relative to significant periodicities, such as weekly or seasonal rhythms, you could learn something about the distributional implications of word content beyond what would emerge only from considering a-temporal co-occurrences.]
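As a very rough illustration of the simplest version of this idea, here is a sketch in Python that builds a day-of-week profile for each word and correlates the profiles, so that two words can be compared by their weekly rhythm and not by the documents they share. It is not Latent Semantic Analysis and not the method of the paper cited above; the function names and the seven toy posts are invented for the example.

```python
import math
from collections import Counter
from datetime import date


def weekday_profile(posts, word):
    """Fraction of each weekday's posts mentioning `word` (Mon=0 .. Sun=6)."""
    totals = Counter()
    hits = Counter()
    for day, text in posts:
        wd = day.weekday()
        totals[wd] += 1
        if word in text.lower().split():
            hits[wd] += 1
    return [hits[wd] / totals[wd] if totals[wd] else 0.0 for wd in range(7)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Toy data, invented for this sketch: one post per day for the week of
# Monday, May 22, 2006; "paper" on weekdays, "movie" on the weekend.
TOY_POSTS = [
    (date(2006, 5, 22 + i), "paper due" if i < 5 else "movie night")
    for i in range(7)
]
```

On the toy data, `pearson(weekday_profile(TOY_POSTS, "paper"), weekday_profile(TOY_POSTS, "movie"))` comes out at essentially -1, the inverse weekly correlation described above. Note that these two words never occur in the same toy post, so a purely atemporal co-occurrence count would have nothing to say about how they are related; the interesting question is whether that holds up on real data.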

Posted by Mark Liberman at 09:38 AM

May 29, 2006

Tolstoy enlisted to sell Viagra

I just received a spam email (and it got through the filters) containing part of Chapter 18 of Tolstoy's Anna Karenina. It was included as random plain text, to fool the spam filters into thinking it was a perfectly ordinary email from a Russian aristocrat. This does worry me a bit: I'm not sure I want to re-train spamassassin to deep-six all messages containing fine writing, richly delineated characters, and depth of emotion. Seems like throwing the baby out with the bathwater.

The main content of the spam, of course, was contained in an image showing an advertisement aimed at getting me to purchase pharmaceuticals that would give me "an unbelievable sex during all the night". There are certainly unbelievable aspects to its picture of sex.

The message asks me a number of rhetorical questions, the first two of which are: "Wanna be the first in her list? Are you dreaming about her friends beating your time?"

We appear to be talking about a chick with a stopwatch who has so many friends-with-benefits that she has to keep a list. And my goal is taken to be to rise to the top of that list.

The message then asks, provocatively but incomprehensibly: "Wanna her making all your dreams come true in the bed?" I think actually I wanna them making all my emails come understandable in the translation.

But let's move on. The next topic has to do with what other people say: "Would you like to hear from the babes, ‘he was the best man in my life’?" Well, note (for this is Language Log) the third person singular. If the babes are saying, "He was the best man in my life" to me, they're talking about someone else (someone who, if they're all saying it, has bedded all of them).

Perhaps the intent is to ask me whether I would like to hear from the babes that one of their number had been talking about me and had said in that context "He was the best man in my life", meaning by "he", me? Well, I invite you to note also (for this is Language Log) the preterite (or simple past) tense. If she (or they) said "was" rather than "is" when talking about me, then apparently I'm history already. (Who has displaced me, in this scenario? All those "friends" of hers beating my time, I guess.) So the answer is, since they ask, no, I definitely do not want to learn that the babes have been saying "He was the best man in my life."

Following this indirect-discourse puzzler, the message moves directly to the heart of the matter, or rather (for its priorities are clear), the phallus of the matter. It turns out that "your hypersexuality doesn't depend on the size of your penis, it depends on its ability to keep its hard-on up to several hours! And that's the way to deliver the best orgasm to her!"

So there we have it. Hours and hours of pounding, that's what those babes want. Relentless, unstoppable, bed-breaking hammering from a guy (size doesn't matter) with a multi-hour chemically-induced erection that beats all the time records set by everyone else in the long lists of timings and orgasm counts that the little sluts all keep in their diaries. This is the picture of mutual sexual pleasuring that is offered by these people, whom we are expected to trust in matters of pharmacosexual advice.

What we have here is a case of someone trying to sell generic Viagra using advertising copy written by sexually inexperienced male illiterates with small peckers. And the copy comes wrapped in Anna Karenina now. I'm so glad Tolstoy didn't live to see this.

Posted by Geoffrey K. Pullum at 10:09 PM

Selling ignorance

We Americans love to learn about how ignorant we are. At least, you'd think we did, given the steady pulse of news stories about how we can't find Afghanistan on the map, enumerate first-amendment freedoms, and so on. There are some other motivations, I guess, including the traditional sport of grumbling about cultural decay, and educators' interest in persuading the populace that we got trouble; but whatever the reason, there seems to be a small industry whose product is press releases suggesting that most of us are about 10 SAT points above grunting and bashing one another with sticks.

The First Amendment to the U.S. Constitution is a particularly easy peg to hang such stories on. It's important, but complicated -- and so it's easy to find what looks like evidence of ignorance, and it's obvious that this matters, and it's trivial to apply the rhetoric of survey spin to make the point. For some LL discussion of an earlier case, see "Freedom of speech: more famous than Bart Simpson?" 3/3/2006. The truth about public knowledge and opinion in this area matters, in my opinion, and it's worthwhile for people to inoculate themselves against the kinds of spin used to exaggerate public ignorance, so I thought I'd post a little tour of a recent blogospheric example.

A couple of days ago, Glenn Greenwald posted a passionate denunciation of what he sees as recent assaults on First Amendment rights (Unclaimed Territory, "People who don't understand how America works", 5/27/2006). He describes the threats to "[imprison] journalists who publish stories containing information which the Bush administration wants to conceal", and the belief that "[if] you are a U.S. citizen, the President can unilaterally order you abducted and imprisoned; does not have to charge you with any crime; can block you from speaking with anyone, including a lawyer; can keep you incarcerated indefinitely (meaning forever); and can deny you the right to any judicial review of your imprisonment or any mechanism for challenging the accuracy of the accusations." He quotes approvingly from Antonin Scalia's opinion (in Hamdi v. Rumsfeld) that "[t]he very core of liberty secured by our Anglo-Saxon system of separated powers has been freedom from indefinite imprisonment at the will of the Executive...", and concludes that "people who never learned that American citizens can't be imprisoned by Executive decree and without a trial, or that American journalists aren't imprisoned for stories they write about the Government's conduct ... plainly do not embrace, or comprehend, even the most basic principles of what America is".

I agree with Greenwald and Scalia about these issues, and (when the rhetorical underbrush is cleared away) I think that most other Americans do too. But one of the commenters on Greenwald's post suggests that the core of the problem is in the American population, not in certain factions of the American intellectual and governing classes, and supports the case with a quotation from one of the ignorance-mongering press releases I'm talking about:

Glenn, these demagogues are just reflecting the beliefs and understanding of their voters. Sadly enough, a recent poll shows that a significant minority of Americans think the press should have moderate to severe restrictions on its freedom:

* Only 14% of Americans – and only 57% of journalists – can name freedom of the press as a right in the First Amendment.
* 43% of Americans believe the press has “too much freedom,” while 3% of journalists agree.
* 22% of Americans believe government should be able to censor newspapers.
* 72% of journalists said the media is doing at least a good job in reporting information accurately; 39% of Americans agreed.
* Only about one-third (36%) of Americans agree the news media tries to report the news without bias, while 61% claim there is bias in news coverage.

The hyperlinked press release doesn't tell us what questions were asked in what order, but it gives a clue:

Only 14% of Americans, and 57% of newspaper and TV journalists, can name “freedom of the press” as a right that is guaranteed by the First Amendment, according to a new University of Connecticut study.

“Freedom of the press is at the core of America’s brand of democracy,” commented Professor Ken Dautrich who directed the study. “It’s quite surprising that so few Americans can name it as part of the First Amendment. Even more disappointing is the fact that those who use free press rights in their work aren’t more knowledgeable about it.”

When asked to identify the specific rights guaranteed by the First Amendment, “freedom of speech” is cited most frequently (58%) by Americans, followed by freedom of religion (16%). The right to peaceably assemble (10%), and the right to petition government for a redress of grievances (1%) are even less identifiable than free press.

If you're hip to the rhetoric of survey spin, you'll guess at this point that the survey asked people to enumerate first-amendment rights by free recall. They probably weren't given a list of possible rights (real and fake) to pick from; and they probably weren't asked to list the rights guaranteed by the constitution as a whole, or by the bill of rights, or whatever.

Think about this for a minute. Do the Ten Commandments prohibit adultery? I bet that most people would say "yes". What is the number of the commandment that prohibits adultery? I bet that most people can't remember. What does the 8th commandment say? I bet that most people can't remember this either (hint: that's not the one about adultery).

Now, if you want to design and report a survey to show that people are ignorant of the decalogue, you'll ask a question like "what does the eighth commandment say?", and you'll report the results by writing something like "Only 14% of Americans, and 57% of preachers, were aware of the commandment that prohibits stealing".

The UConn press release tells us that "A complete copy of the survey results can be found at: http://www.dpp.uconn.edu", but this is apparently no longer true. However, a bit of general googling suggests that there is a close relationship between the cited survey and one carried out by New England Survey Research Associates for the First Amendment Center: "State of the First Amendment 2005". At least, the report credits "Professors David Yalof and Ken Dautrich" with devising the questions and supervising the survey, and the reported numbers are similar though not identical. According to that report, the very first question in the survey was indeed:

As you may know, the First Amendment is part of the U.S. Constitution. Can you name any of the specific rights that are guaranteed by the First Amendment?

The report gives percentages for a few of the answers, broken down over several years of the survey:

	1997	1999	2000	2001	2002	2003	2004	2005
Freedom of the press	11%	2%	12%	14%	14%	16%	15%	16%
Freedom of speech	49%	44%	60%	59%	58%	63%	58%	63%
Freedom of religion	21%	13%	16%	16%	18%	22%	17%	20%
Right to petition	2%	2%	21%	1%	2%	2%	1%	3%
Right of assembly/association	10%	9%	9%	10%	10%	11%	10%	14%
Don't know/refused to answer	N/A	N/A	37%	36%	35%	37%	35%	29%

Just for reference, in case you don't happen to have the first amendment memorized yourself, it reads:

Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the government for a redress of grievances.

(And why not go ahead and memorize it? It's only 45 words long... For that matter, the whole Bill of Rights is just 613 words.) As the survey results indicate, most people associate the first amendment with freedom of speech. I suspect that 63% is a higher percentage than could say what the first commandment requires, in a similar test of free recall.
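The 45-word figure is easy to check for yourself. Here is a minimal sketch, assuming that a "word" is simply a run of letters; the names FIRST_AMENDMENT and count_words are mine, made up for the illustration, and the text is the version quoted above.

```python
import re

# Text of the First Amendment exactly as quoted above.
FIRST_AMENDMENT = (
    "Congress shall make no law respecting an establishment of religion, "
    "or prohibiting the free exercise thereof; or abridging the freedom of "
    "speech, or of the press; or the right of the people peaceably to "
    "assemble, and to petition the government for a redress of grievances."
)


def count_words(text: str) -> int:
    """Count words, treating each run of letters or apostrophes as one word."""
    return len(re.findall(r"[A-Za-z']+", text))


word_count = count_words(FIRST_AMENDMENT)
print(word_count)
```

A different definition of "word" (say, one that splits hyphenated forms differently) could give a slightly different total for other texts, so treat the counting rule as an assumption rather than a standard.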

The poll's questions #3 and #8 are more relevant to the first topic of Glenn's post:

3. Overall, do you think the press in America has too much freedom to do what it wants, too little freedom to do what it wants, or is the amount of freedom the press has about right?

	1997	1999	1999(f)	2000	2001	2002	2003	2004	2005
Too much freedom	38%	53%	42%	51%	46%	42%	46%	42%	39%
Too little freedom	9%	7%	8%	7%	8%	8%	9%	12%	10%
About right	50%	37%	48%	41%	42%	49%	43%	44%	47%
Don't know/refused to answer	3%	2%	3%	2%	3%	2%	1%	3%	4%

8. Overall, do you think Americans have too much, too little or just the right amount of access to information about the federal government’s war on terrorism?

	2002	2003	2004	2005
Too much access	16%	12%	15%	14%
Too little access	40%	48%	50%	52%
Just about the right amount	38%	38%	31%	30%
Don't know/refused to answer	6%	2%	4%	4%

Those were the results from a year ago -- this year's survey has yet to be posted -- but I'd be very surprised if things had changed so as to put the American public further away from wanting to be kept informed by a free press. Except for the aftermath of the Monica Lewinsky scandal, a solid majority of Americans think that there is too little or about the right amount of press freedom. Public opinion about legal sanctions for publishing classified information no doubt depends on the details of the case, but I'd be surprised if a majority of Americans favored allowing a president to categorize the publication of arbitrary information as a crime.
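For anyone who wants to check the arithmetic behind the "too little or about right" claim, here is a rough sketch that adds the two relevant rows of the question 3 table, year by year. The figures are copied from the table above; the variable and function names are just labels I've invented for the illustration.

```python
# Figures copied from the question 3 table above (percent of respondents).
YEARS = ["1997", "1999", "1999(f)", "2000", "2001", "2002", "2003", "2004", "2005"]
TOO_MUCH = [38, 53, 42, 51, 46, 42, 46, 42, 39]
TOO_LITTLE = [9, 7, 8, 7, 8, 8, 9, 12, 10]
ABOUT_RIGHT = [50, 37, 48, 41, 42, 49, 43, 44, 47]


def add_rows(row_a, row_b):
    """Add two table rows column by column."""
    return [a + b for a, b in zip(row_a, row_b)]


# Share saying the press has too little freedom or about the right amount.
combined = dict(zip(YEARS, add_rows(TOO_LITTLE, ABOUT_RIGHT)))

for year, too_much in zip(YEARS, TOO_MUCH):
    print(f"{year}: too little or about right {combined[year]}%, too much {too_much}%")
```

On these numbers the combined share dips below half only in the 1999 and 2000 columns (44% and 48%) and sits at 50% in 2001; in every other column it is between 52% and 59%.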

The results of four additional survey questions were released separately by the American Journalism Review. These results underlined my point even more strongly:

The 2005 edition of the poll, commissioned by the First Amendment Center in collaboration with AJR, found that 69 percent of Americans agree with the statement: "Journalists should be allowed to keep a news source confidential."

This surprised the pundits, who may have been drinking their own kool-aid:

National Journal media columnist William Powers thinks it's "amazing that that many people are behind this principle," saying he would have guessed support would be less than 50 percent.

There was some other good news:

The survey offers one other encouraging finding for the media. Americans endorsed the press' watchdog role, with 74 percent agreeing with the statement: "It is important for our democracy that the news media act as a watchdog on government."

All this despite the public's rather low opinion of the media biz:

... an unnerving 65 percent of those polled agreed with the statement: "The falsifying or making up of stories in the American news media is a widespread problem."

And a mere 33 percent agreed that: "Overall, the news media tries to report the news without bias." That's down 6 percentage points from last year. Among the 64 percent of Americans who disagreed with that statement, 42 percent strongly disagreed.

On all these points, my own opinions line up pretty close to those of the folks who were surveyed. And as a group, we're better informed, more sensible, and more American than the PR spin suggests.

Note that I'm not blaming Greenwald's anonymous commenter. (S)he just swallowed the bait dangled by the press release and its media uptake, as all too many people do. But the real public-opinion situation is a lot better -- and that matters.

[Update -- Zeno points out that the ten commandments are an especially tricky example:

The sixth commandment says, "Thou shalt not commit adultery." The eighth commandment says "Thou shalt not bear false witness." If you think I'm off by one in each case, then it's because you're not using the traditional Catholic numbering of the commandments. (I think the Lutherans use the same numbering.) Catholics have two commandments (9 & 10) about coveting, whereas most Protestants have one omnibus anti-coveting commandment (10). To make up the difference, they split in two the commandment that Catholics consider the first: "I am the Lord thy God, thou shalt have no strange gods before me."
This means, of course, that a Protestant pollster could easily mark me down as someone who doesn't know the Ten Commandments. I'd be a confirming instance of the ignorance of the man in the street. And it would be false witness, too.

So it'll be especially easy to find "ignorant" clergy. Seriously, this is like the issues with counting the "five freedoms" discussed here. Also, it never occurred to me before to wonder: whose numbering gets used on those public decalogue displays that have been a matter of first-amendment contention in recent years? ]

[Update #2 -- Ran Ari-Gur wrote:

I completely agree, and I think a further point is that freedom of the press is really just a kind of freedom of speech, so that it's not unreasonable for someone not to name it if they've already named freedom of speech. (Technically, there may be a line where one ends and the other begins, but if so, we're no longer talking about the basic facts of the Bill of Rights that every American should know.)

On one hand, there's a conceptual thread that ties religion, speech, press, assembly and petition together in the first amendment; and on the other hand, the founders had reasons for enumerating them separately. But in any case, it's certainly true that the development of online media increasingly blurs the boundary between the speech of ordinary citizens and "the press". ]

Posted by Mark Liberman at 12:54 PM

Confusing web language with the web world

Says Adam Cohen in a New York Times article on corporate threats to the democratic ethic of the World Wide Web [print: Sunday May 28, 2006, Week In Review, p. 9]:

"The blogging phenomenon is possible because individuals can create Web sites with the World Wide Web prefix, www, that can be seen by anyone with Internet access."

The remark suggests at least two mistakes. One (relatively minor and perhaps debatable) is linguistic. The other is more technical, and concerns a false belief that is probably fairly widespread.

1. Morphologically, the www in such words as www.languagelog.com is more like a combining form (see chapter 19 of The Cambridge Grammar) than a prefix. While there is great freedom in creating URLs, the most usual practice follows the convention of having a name for a server or a department followed by a name for the site or the company followed by a suffix indicating either a type of domain (commercial, non-profit, educational) or a specific country, and only the last element has to be picked from a predetermined list and has a semantics that the owner does not control. Thus we find email.sjsu.edu for the email server at San Jose State University; ftp.debian.org for the ftp (file transfer protocol) server of the Debian Linux organization; ling.ucsc.edu for the departmental server of the Department of Linguistics at the University of California, Santa Cruz (where I work when I'm not at Language Log Plaza or away on sabbatical); and so on. The www portion of the millions of URLs that have it is really a server name: www.webster.com is the URL for the main web server (or server array) of the Merriam-Webster Corporation in the commercial arena, matching the typical pattern of server + site + domain. To the linguist's eye, the www is more like an initial combining form in a multi-component word (like the geo in geophysical or the psycho of psycholinguistic) than like a sense-modifying prefix (un- in unhelpful or pre- in prenatal).

2. You don't have to have www as the first component of the URL in order to have a blog or any other kind of web site. What you have to have is an Internet-connected machine running http server software. You can see this immediately from any of the livejournal blogs (like clunis.livejournal.com, to cite a random example), or from the fact that ling.ucsc.edu and lib.harvard.edu are web servers, etc. Cohen appears to believe that in order to have a blog, or perhaps in order to have a web site at all, you have to create a site with www as the first element of the URL. You don't. Doing so is purely a very widespread linguistic convention. One more case of confusing your language with your world.
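The server + site + domain pattern can be made concrete with a small sketch. It simply splits a host name on its dots, using the examples cited above; the function name split_hostname and the three-way labelling are my own illustrative assumptions, and the split is deliberately naive (it would mislabel names ending in a two-part country suffix such as co.uk).

```python
def split_hostname(hostname: str) -> dict:
    """Naively split a host name into server, site and domain parts.

    Assumes the last label is the domain and the one before it is the site;
    anything earlier is treated as the server (or department) name.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"expected at least site.domain, got {hostname!r}")
    return {
        "server": ".".join(labels[:-2]),  # empty string if there is no server part
        "site": labels[-2],
        "domain": labels[-1],
    }


# Host names mentioned in the text above.
HOSTS = [
    "www.webster.com",
    "email.sjsu.edu",
    "ftp.debian.org",
    "ling.ucsc.edu",
    "lib.harvard.edu",
    "clunis.livejournal.com",
]

parsed = {host: split_hostname(host) for host in HOSTS}
www_hosts = [host for host in HOSTS if parsed[host]["server"] == "www"]

for host, parts in parsed.items():
    print(host, parts)
print("hosts with www as first component:", www_hosts)
```

Only one of the six names has www in the server position, and nothing in the splitting treats that label specially: it is a name like any other.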

Posted by Geoffrey K. Pullum at 10:46 AM

Tensions between a singular and plural nouns

New York Times cultural critic Edward Rothstein has a provocative column about the Senate vote to declare English the "national language," contrasting the legislation with the European Charter for Regional and Minority Languages. Rothstein's pointed arguments about the divergent American and European approaches to multilingualism deserve serious consideration, but I had a hard time getting past the headline that the Times stuck on the column in the online edition, both in the main Arts section and on the page for the article itself.

It's another case of contracted "headlinese" leading to a curious, if not downright bizarre, grammatical construction.

We are of course expected to read

(1) tensions between [ [a national Ø] and [minority languages] ]

as elliptical for

(2) tensions between [ [a national language] and [minority languages] ]

rather than the ill-formed

(3) *tensions between [ [a national languages] and [minority languages] ].

Presumably the full version (2) was deemed too long for the space allotted for the headline (three short "decks" on the main Arts page and two longer ones on the article page). But the contracted version just doesn't work for me: I stumbled over the headline, expecting to find a singular head noun to agree with the singular article in "a national..." What we have here is yet another flavor of WTF coordination, this time with a singular-plural conflict between the two conjoined NPs.

It would have been so much simpler if the headline-writer had just omitted the singular article a, leaving

(4) tensions between [ [national Ø] and [minority languages] ]

which unproblematically expands to

(5) tensions between [ [national languages] and [minority languages] ].

So what's wrong with that? Well, it's a bit less precise given the context of the article, which considers policies balancing a country's national language (e.g., German) and its minority languages (e.g., Low German, Sater Frisian and Lower Sorbian). So the "tensions" are in a one-to-many relationship (between a single national language and multiple minority languages) rather than a many-to-many relationship. The problem is that there is no concise way to express that one-to-many correspondence without causing an eyebrow-raising mismatch between singular and plural conjuncts. In this case, I would prefer the semantic imprecision of (4) to the grammatical weirdness of (1).

The moral of the story: you can call English "national," but that doesn't make it rational.

[Update: The headline for the column in the print edition (as verified by the Nexis and ProQuest databases) is completely different: "Translated From Spanish (or Lower Sorbian or Breton), With High Emotion." That more creative style of headline-writing doesn't fly in online editions, as the New York Times itself explained in a recent article, "This Boring Headline Is Written for Google."]

Posted by Benjamin Zimmer at 01:45 AM

May 28, 2006

Menu conventions vs. syntax

Maybe, just maybe, there's another way to look at the menu of the EVOO restaurant that Geoffrey Pullum described (see here) as full of noun phrases with many attributive modifiers:

Garlicky Pork Sausage Stuffed Crisp Fried Maryland Soft Shell Crab

On the (possibly weak) assumption that fair-minded restaurant patrons ought to try to read menus from the perspective of the menu writers, I ask the question, "Why did EVOO's menu have a string of words of such complexity?" Were the owners lying in wait for a grammar expert to come in and parse their menu? Or were they merely victims of the long-held conventions of menu presentation?

Let's begin with data. Restaurant menus follow a standard, expected series of meaning slots. I've examined dozens of restaurant menus around the country and I've found that they consistently present their offerings according to this formula:

Slot 1. self-congratulations about the item -- our famous, world's best
Slot 2. method of cooking the item -- fried, roasted, baked, wood-fired
Slot 3. style of cooking the item -- Italian, Cajun, Southern
Slot 4. the food item -- chicken, beef, salmon, pork
Slot 5. serving modification -- sandwich, roll-up

To make this look more scientific than it really is, note the following formula:

+/- slot 1 +/- slot 2 +/- slot 3 + slot 4 +/- slot 5

This says that all slots are optional except slot 4, the food item. Other slots can be used but they don't have to be, depending on the menu writer's discretion and creativity. Even though it's not obligatory for menu items to fill all 5 slots, their order is fixed. You probably won't see many menus offering: "Salmon roasted Cajun our best" or "sandwich French baked our famous."
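To make the formula look even more scientific than it really is, it can be written as a tiny checker. This is only a sketch under my own assumptions: each menu item has already been hand-labelled with the slot numbers of its parts, in the order they appear, and the function name follows_menu_formula is invented for the illustration.

```python
VALID_SLOTS = {1, 2, 3, 4, 5}
REQUIRED_SLOT = 4  # the food item is the only obligatory slot


def follows_menu_formula(slots: list) -> bool:
    """Check a hand-labelled sequence of slot numbers against the formula.

    The sequence passes if every slot number is valid, slot 4 is present,
    and the slots appear in strictly increasing order (which also rules
    out using the same slot twice).
    """
    if any(slot not in VALID_SLOTS for slot in slots):
        return False
    if REQUIRED_SLOT not in slots:
        return False
    return all(a < b for a, b in zip(slots, slots[1:]))


# Hand-labelled examples; the slot assignments are assumptions for illustration.
examples = {
    "our famous wood-fired Cajun salmon sandwich": [1, 2, 3, 4, 5],
    "fried chicken": [2, 4],
    "Salmon roasted Cajun our best": [4, 2, 3, 1],
    "sandwich French baked our famous": [5, 3, 2, 1],
}

for item, slots in examples.items():
    print(f"{item!r}: {follows_menu_formula(slots)}")
```

The first two items pass; the last two fail, one because the order is scrambled and the other because it also lacks a food item.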

The really top-flight (expensive) restaurants don't just list the food item all by itself. That would not be classy and their customers would certainly not be impressed to see "crab" on the menu without the method in which it was cooked. Nor would they want to order a sandwich denuded of what actually was in it. Self-congratulation is to be avoided in all high-class menus. If you have to say how good the item is, it probably means that it isn't all that great anyway.

Although it has considerable oddness in slot 2, method of cooking, the menu item that Geoff analyzed follows the standard menu slot sequence:

Slot 1. self-congratulations -- not used
Slot 2. method of cooking -- Garlicky Pork Sausage Stuffed Crisp Fried
Slot 3. style of cooking -- not used
Slot 4. the food item -- Maryland Soft Shell Crab
Slot 5. serving modification -- not used

By now you're wondering, "How can 'Garlicky Pork Sausage Stuffed Crisp Fried' possibly be a method of cooking?" This is where the menu writers got into trouble. One could argue that the method of cooking is really "crisp-fried," but that by itself apparently didn't sound classy enough to them. If the method of cooking had stopped with "crisp-fried," and if the writers had insisted on putting "garlicky pork sausage stuffed" somewhere on the menu, it might have made sense to shift it to some other slot. But it doesn't really fit slot 3, style of cooking. Nor would the restaurant want its customers to think that lowly and pedestrian "pork sausage" is part of the classy slot 4 food item, since the owners no doubt wanted to highlight Maryland Soft Shell Crab more than anything else.

So the menu inserts "garlicky pork sausage stuffed crisp fried" into the method of cooking slot, leading to the confusing syntax that Geoff described so well. What appears to be wrong with this slot is that it's missing a preposition, a conjunction, and some punctuation. It seems to mean this:

Crisp-fried (and stuffed with pork sausage) Maryland Soft Shell Crab

Slot 2 (method of cooking) Slot 4 (food item)

From the restaurant's perspective the problem with this is that it becomes an overly long introduction to the most important part of the menu, still to come -- Maryland Soft Shell Crab. The menu writers did their best to follow standard menu conventions but they fell considerably short of making syntactic sense. If they were courageous enough, it might have been prudent for them to fly in the face of menu slot conventions, reversing the slot order, and say simply:

Maryland Soft Shell Crab, crisp fried and stuffed with garlicky pork sausage
Slot 4 (food item) Slot 2 (method of cooking)

Maybe we should pity the poor menu writers who have to choose between following the conventions of their field and writing with English syntax.

[Update] Gabriel McCall writes that when menu items are offered verbally by servers, they generally follow the reverse pattern. Interesting.

Posted by Roger Shuy at 08:08 PM

And the plural of MacBook Pro is ...

Greetings from the youth and popular culture desk here at Language Log Plaza. We also happen to be the ones who take questions regarding language use in the computing industry; nobody else here much cares about it and so our new telephone system just redirects those calls to our desk (following a fittingly recursive route, natch).

Since I have thrice outed myself as an Apple product fanatic, Language Log reader Jake Seliger recently contacted me directly to ask how the new line of high-performance Apple notebook computers should be pluralized.

Most Highly Esteemed Professor Bakovic:*

You noted on Language Log that you're fascinated with all things Apple, so I thought you the person to go with a question combining language and Macs.

On Apple websites I've chiefly seen the plural of "MacBook Pro" as "MacBook Pros." Yet I believe MacBooks Pro would be correct as a plural since Pro is just a modifier -- similar to "attorneys general." Is this correct?

Yet when one is referring to the possessive -- concerning the hard drive, for example -- saying, "The MacBook's Pro hard drive died" would, I'm fairly certain, be wrong. So one would say, "The MacBook Pro's hard drive died" instead.

As a result, calling the plural "MacBooks Pro" and calling the possessive "MacBook Pro's" would seem likely to generate confusion.

Is any of this right?

-Jake Seliger

What follows is my reply to Jake's question, suitably edited for Language Log viewing by those of you who may have the same question -- or one very similar to it. (A note to the less fortunate among you who will have to settle for one of the less expensive consumer-level Apple notebooks: they're just called MacBooks, so no linguistic issue for you folks there.)

I think insisting that it should be attorneys general rather than attorney generals (or mothers-in-law rather than mother-in-laws, and other such examples) is a little silly in the first place, and I wouldn't insist on MacBooks Pro over MacBook Pros. There's good linguistic reason for the indecision here: names, titles, and other such labels tend to be analyzed grammatically as compound nouns in English, rather than as phrases. The difference between these two things is in most cases very subtle -- which is part of the problem leading to the issue at hand -- but the following example should help to show that there is one.

Think of the compound noun blackboard, meaning the thing that you write on with chalk. Many blackboards are indeed black in color, but not necessarily; many are green, for instance, but we still have no problem calling them blackboards. On the other hand, take the phrase black board. (The orthographic space between the words only highlights the distinction; in pronunciation, the difference is roughly one of relative stress: BLACKboard vs. black BOARD. Compounds are often, but not always, written without a space.**) A black board cannot be green, and it's not necessarily something you write on with chalk -- it's just a board that happens to be black.

Especially in writing, phrases and compounds often appear to be very similar, as this example illustrates; the differences are mainly things we don't represent orthographically (such as relative stress) and the consequences for meaning: phrases tend to add up to the meanings of their parts, whereas compounds can have specialized meanings of their own that bear less of a relation to their parts. This is what linguists refer to as compositionality: phrase meaning tends to be compositional (transparently composed of the meaning of its parts), whereas compound meaning can be noncompositional.

Another key difference between phrases and compounds is that the parts of a compound can be ordered in ways in which phrases cannot. For example, within a phrase, a modifier overwhelmingly tends to precede the noun that it modifies. (There are particular learned exceptions such as in it came upon a midnight clear, but these are clearly felt by English speakers to be exceptional.) No doubt related, at least in part, to their noncompositionality, compounds can (sometimes) have the order noun + modifier, as in the attorney general and MacBook Pro examples. (Another example like this is bootblack, someone who polishes shoes and boots.)

As you can see, the grammatical rules that are typical of phrases don't necessarily apply in the same way to compounds. The same thing goes for the rule that you're interested in: that the noun part rather than the modifier should receive the plural -s marker. But note that the entire compound in these cases is itself a noun -- they refer to persons (attorney general) or things (MacBook Pro) -- and so the plural -s marker can be thought of as indicating that the entire compound is plural. So, the problem in these particular cases just boils down to the fact that the last part of the compound happens to be the modifier rather than the noun, and so it looks like the modifier is what's being pluralized.

The following diagrams might help. Think of the plural rule as something that says: "give me a singular noun (N), and I'll give you another N with -s attached to it that means the same thing as the original N, except plural rather than singular". Since the compounds you've asked about are Ns that consist of an N and a modifier (M), the plural rule can operate on either of the two Ns, and so you get two possible structures in each case:

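	-s attached to lower N	-s attached to higher N
MacBook Pro	[_N [_N MacBook ]-s [_M Pro ] ]	[_N [_N MacBook ] [_M Pro ] ]-s
attorney general	[_N [_N attorney ]-s [_M general ] ]	[_N [_N attorney ] [_M general ] ]-s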

So, both possibilities are technically correct. (For some English speakers there may be a tendency not to put things like plural markers in between the parts of a compound -- see for example this paper, p. 19ff -- but that's a separate issue.) I think the reason folks tend to insist on things like attorneys general is that they stop to think about the internal structure of the compound as if it were a phrase.
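The two ways of applying the plural rule can also be sketched in a few lines of code. This is a toy illustration, not a serious model of English morphology: it assumes a compound is just a noun followed by a modifier, and that the plural is always a plain -s. The names Compound, pluralize_lower and pluralize_higher are invented for the purpose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """A compound N made up of a lower N followed by a modifier M."""
    noun: str      # the lower N, e.g. "MacBook"
    modifier: str  # the M, e.g. "Pro"

    def surface(self) -> str:
        return f"{self.noun} {self.modifier}"


def add_s(noun_form: str) -> str:
    """The toy plural rule: take a singular N, return it with -s attached."""
    return noun_form + "s"


def pluralize_lower(compound: Compound) -> str:
    """Apply the plural rule to the lower N, inside the compound."""
    return f"{add_s(compound.noun)} {compound.modifier}"


def pluralize_higher(compound: Compound) -> str:
    """Apply the plural rule to the higher N, i.e. the compound as a whole."""
    return add_s(compound.surface())


for compound in (Compound("MacBook", "Pro"), Compound("attorney", "general")):
    print(pluralize_lower(compound), "/", pluralize_higher(compound))
```

The rule itself (add_s) is the same in both cases; the only thing that differs is which N it is handed, which is why the -s ends up in different places.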

The possessive marker -'s follows a different rule than the plural, so there's no reason for that to influence one's judgment of MacBooks Pro vs. MacBook Pros. While the plural marker attaches to a single noun, the possessive marker attaches to an entire noun phrase (NP). This is shown by examples like the following (which people sometimes find awkward and often try to rephrase, especially in writing, since there are many other ways to mark possession in English):

[_NP This guy I know ]'s sister is a fashion designer.

You definitely wouldn't say this guy's I know sister; possession is marked on the entire noun phrase this guy I know, which happens to end with a verb instead of a noun.

In the MacBook Pro's case, the NP consists of just the compound. So, the relevant structure in such a case would be something like the following.

And that's why MacBook's Pro wouldn't work at all.

Jake wrote back to thank me for this reply, and to give me permission to quote all this -- plus he sent this very relevant link. He also writes:

One other thing to note is the abbreviation issue surrounding MacBooks Pro: online, especially at Ars Technica's Macintoshian Achaia, people tend to say "MB" for "MacBook" and "MBP" for "MacBook Pro." As a result, the plural of MacBook Pro becomes MBPs (I usually leave out the apostrophe for plurals). In this case, MBsP obviously wouldn't make any sense because MBP is the entire noun and "Pro" no longer really modifies it.

Final note: it occurs to me that we could just ask the good folks at Apple to consider more carefully what they think the plural of MacBook Pro should be. When confronted with the fact that Walkmen sounds odd to (most) English speakers, and that Walkmans fails on the supposed analogy with the irregular pair man ~ men, Sony apparently decreed that the plural of their insanely popular product should be Walkman® personal stereos. Whaddayasay, Apple -- MacBook Pro notebook computers???


* Actually, Jake's message started with just "Hi," -- one of the reasons I've been told that our youth and popular culture desk exists in the first place. I've felt free to translate the salutation on Jake's behalf.

** This is my argument for why the name of the film The English Patient should be pronounced The ENGLISH Patient, not The English PATIENT -- it's a title, hence a compound. Some folks vehemently disagree; for them, I've got a can of Jolt cola right here.

Posted by Eric Bakovic at 06:26 PM

From the deepest void of neverness

I have an uneasy feeling that just because I offered some modest syntactic reflections on the syntactic complexity of an EVOO menu item ("garlicky pork sausage stuffed crisp fried Maryland soft shell crab") I am now going to be inundated by messages from people who think they have found a noun phrase with an even longer succession of attributive modifiers. These would be people who have, sadly, mistaken me for someone who might give a damn.

Already one of our youth and popular culture correspondents here at Language Log, Eric Bakovic, has supplied me with some examples that he found quoted from a user named Babaquara in an article about online music services (which Bakovic was apparently reading on Language Log company time). He quotes the phrase "dada kraut psych mindblowing conscience expanding sublime acid oriented arcana coelestia weirdness", for example. I will explain my reaction to Eric using the unique trisyllabic word that appears to be widely understood by his generation: whatever.

Certainly, it is possible that the phrase dada kraut psych mindblowing conscience expanding sublime acid oriented arcana coelestia weirdness has roughly nine stacked attributive modifiers; but one cannot really tell, because it all depends on how it is parsed: doubtless "consciousness-expanding" (I add the helpful hyphen) is intended as a syntactic unit, but one doesn't know about "kraut psych" and so on. This is basically the problem one finds with quotes from chimpanzee language: chimps are occasionally reported as having signed things with transcriptions like BANANA BANANA HELP REFRIGERATOR GIMME OPEN BANANA GIMME, and syntactically one does not really know where or whether to begin.

Part of the problem here is that Eric is one of the younger staffers here at Language Log Plaza. They work with headsets on, they have X-men posters on their walls, they talk about whether Lara Croft's breasts in the new Crystal Dynamics video game release are as big as before. The average age in their part of the building is approximately 19. They typically list their hobbies as (i) being wicked cool, (ii) dancing to their iPods in public places, (iii) shopping at American Eagle, and (iv) staying out all night.

One does not see them at EVOO; they dine at places where the menu is a series of brightly colored pictures on glass with lights behind them. Often there is a neon sign in the window saying "BURRITOS AS BIG AS YOUR HEAD".

And their reading material does not fully meet the criteria for being called "language". Another phrase quoted from Babaquara (see it here if you have your parents' permission) puts it well: "ultrahypermegamonstaheavy over the top mammoth freakin mind exploding destroyer psychedelia from the deepest void of neverness." The fact is that the younger Language Log staffers seem well acquainted with the deepest void of neverness. Eric Bakovic has definitely been there. I have seen him pour a can of Jolt cola over David Beaver's head during a disagreement about whether something or other was ultrahypermegamonstaheavy or not.

I simply do not understand half the things they say to each other (if my using the verb "say" is not begging the question there). In normal running text, 37% of the words are nouns; in the cubicles of the Youth and Popular Culture department at Language Log Plaza, 37% of the words are dude.

So I am not necessarily prepared to consider random examples containing huge numbers of attributive modifiers to be within the normal range of non-chimpanzee syntax, if they come from things Eric would read and understand, OK? Is that understood? Let's try to keep some reasonable standards in place here. Call me a fusty old conservative if you like, but I think English is quite lax enough on stacked prenominal modifiers without our seeking data from any mammoth-freakin' mind-exploding dialects in which the word like is used as a punctuation mark. Just don't send me any.

Posted by Geoffrey K. Pullum at 04:20 PM

Da Vinci Q & A

I've finally done my civic duty. I read The Da Vinci Code, and saw the movie. Reading the book was an anti-climax: I have nothing to add to Geoff Pullum's deconstructions (look at the bottom of this post for a list). The cinematic signs and portents were ambiguous: on one hand, the theater was nearly deserted; on the other hand, a sophisticated fourth grader of my acquaintance thought the movie was better than X-Men, though not as good as The Terminal. But I agree with Geoff Pullum that traditional media are generally "Behind the Da Vinci Curve", and as further evidence of the superiority of the new-media coverage, I'd like to draw your attention to a recent post on The Medicine Box ("The Internet Theologian Explains The Da Vinci Code" 5/17/2006).

It begins:

As the responses to my helpful guide on Christianity show, when theological controversies arise, many people wisely turn to an anonymous crank with a web log. Or, as I prefer, to a Big-Time Internet Theologian.

These are good days for us Big-Time Internet Theologians: religious controversies are in the news daily, and many people have probing, searching questions that cannot be answered by relying on traditional, "second wave" sources like books, professors, or subway graffiti. People want answers, and they want them to come with hyperlinks to Wikipedia entries compiled by embittered teenagers.

The first few (questions and) answers:

Q: Who is Dan Brown and what is "The Da Vinci Code"?
A: Dan Brown is the biggest-selling, and therefore best, author of our times, and "The Da Vinci Code" is his masterpiece: a thrilling, shocking journey across thousands of years of history all packed within a pulse-pounding chase across scenic Europe, leading up to the greatest conspiracy of all.

Q: What is the greatest conspiracy of all?
A: The 1954 NIT point-shaving scandal.

Q: What does all this have to do with Jesus? Or, for that matter, Leonardo Da Vinci?
A: The premise of the book is that Jesus was married to Mary Magdalene, and that the two had children, who passed along Jesus' bloodline through generations of French people. Leonardo was the member of a secret brotherhood of painters who protected this secret by painting pictures of men that look like ladies.

This is Language Log, after all, so there is an obligatory linguistic hook:

Q: Why does the dialogue in the book which is supposed to be in French include French words alongside the English translation, like, "Pain is good, monsieur" and "Le capitaine is happy you decided to stay overnight"?
A: That is how the French speak. There is no French language per se, just a few words they throw into English sentences to make themselves seem superior to Americans.

You should read the rest of it for yourself, but I can't help quoting a few more:

Q: The book goes into detail about a group called the Knights Templar. Can you explain what they were?
A: They were the basketball team of Temple University in the 1950s. Philip the French, who was King of Congress at the time, suppressed them because of the NIT point-shaving scandal. In addition to playing basketball, they also guarded the secret of Jesus' French kids by painting pictures of men who look like ladies.

Q: Okay, explain this whole "painting pictures of men who look like ladies" thing. What does it have to do with Leonardo?
A: In 1099, a reggae group called the Priority of Zion was founded to hush up the truth about Jesus' French children. It was felt at the time that if word got out that Jesus had lived in France, it would drive up real estate costs beyond what the knights were willing to pay. So the Priority of Lion was formed to keep the secret. Throughout the centuries, every time someone became prominent in Europe - Botticelli, Sir Isaac Newton, Tintin - they would be enrolled into the Prior of Zionism to help keep the secret.

Q: Doesn't it seem more sensible, if they wanted to keep a secret, not to enroll high profile Europeans?
A: Yes, except that it was hard for many years to avoid famous Europeans. From 1755 to 1914, everyone in Europe was either an author, inventor, or executed king.

Q: So how do the paintings factor in?
A: Leonardo Da Vinci was a member of the Priorities of the Elders of Simon. However, he was terrible at keeping secrets, and felt it necessary to leave little clues in all his paintings about Jesus' Francophone offspring. For example: the Mona Lisa is smiling because Leonardo was feeling smug about knowing where Jesus lived, all the while Raphael was thinking Jesus lived in Jerusalem.

This captures the book's zany dream-logic better than any other reviews that I've seen.

[Note: the identification of "holyoffice" as Terry Mattingly, though based on what I once thought was plausible evidence, is clearly false. Apologies to both Prof. Mattingly and to "holyoffice" for the error, which I've left in place as evidence of my own carelessness.] At this point, though, I need to 'fess up that holyoffice, the author of The Medicine Box blog on Livejournal, is apparently* Terry Mattingly, who also posts on the blog GetReligion ("The press ... just doesn't get religion"). In Real Life, he's director of the Washington Journalism Center at the Council for Christian Colleges and Universities, and author of a weekly column for Scripps Howard. In other words, an old-media infiltrator.

When you're done with as many of those links as you care to follow, you might want to try the glossary of Christian terminology at the end of the post "The Interpretive Dance Theocrats" (Terry Mattingly as holyoffice on The Medicine Box, 5/12/2006), which begins:

Premillenialism
This is the belief among some Christians that, ever since Jan. 1, 2000, it has no longer been possible, in the words of the Prince song, "to party like it's 1999." Postmillenialists are those Christians who believe that it will always be possible to do so, while Amillenialists believe that in this context, "1999" cannot be understood literally, but must be read as an allegorical term roughly meaning "a time at which it is especially appropriate to party."

Rapture
This was a #1 hit in 1980 for Blondie (#5 in the UK), from the otherwise underwhelming "Autoamerican" album. Many Christians now concede that the then-pioneering use of rap in the song sounds a little lame in retrospect. In their best-selling series of books about the song, "Left Behind (Parallel Lines)," Jerry Jenkins and Tim LaHaye defend the rap verse's hip references to Grandmaster Flash and Fab Five Freddy, and maintain that when Jesus returns, all believers will be united in accepting that Blondie's cover of "The Tide Is High" is better than the original.

* I inferred that Terry Mattingly is holyoffice, or perhaps vice versa, from the LiveJournal profile page for holyoffice, which gives The Press doesn't get religion in the "website" slot. Among the folks who post there, Terry Mattingly seemed like the best fit to "holyoffice". If I got that wrong (and two readers have written with scholarly objections to the analysis), I apologize to all concerned. The DVC Q&A is still the only stuff on Dan Brown I've seen that's as funny (and true) as Geoff Pullum's posts. [ Well, I *did* get it wrong, and apologies are certainly in order to both writers.]

Posted by Mark Liberman at 12:59 PM

Monkey words

To balance our occasional complaints about foolish and misleading science reporting, I'd like to commend an article by Nicholas Wade in the NYT ("Nigerian Monkeys Drop Hints on Language Origin", 5/23/2006), on recent research by Kate Arnold and Klaus Zuberbühler ("Language evolution: Semantic combinations in primate calls", Nature 441, 303, 18 May 2006).

Wade's first two paragraphs describe the new research, with an appropriately nuanced claim about its importance:

Researchers taping calls of the putty-nosed monkey in the forests of Nigeria may have come a small step closer to understanding the origins of human language.

The researchers have heard the monkeys string two alarm calls into a combined sound with a different meaning, as if forming a word, Kate Arnold and Klaus Zuberbühler report in the current issue of Nature.

In the third paragraph, Wade sets the stage in an informative and sensible way:

Monkeys are known to have specific alarm calls for different predators. Vervet monkeys have one call for eagles, another for snakes and a third for leopards. But this seems a far cry from language because the vervets do not combine the calls into anything resembling words or sentences.

(This is a reference to the work of Dorothy Cheney and Robert Seyfarth, explained at length in their wonderful book How Monkeys See the World.) Wade then describes the new facts as Arnold and Zuberbühler reported them:

The putty-nosed monkeys have a "pyow" call meaning there are leopards about and a hacklike sound to warn of the crowned eagle. The "pyow" calls attention to a leopard on the ground.

When hearing the "hack" sound, a monkey tends to freeze because movement would betray its position to an eagle.

Dr. Arnold and Dr. Zuberbühler, zoologists at the University of St. Andrews in Scotland, noticed that adult male monkeys in each troupe were combining the "pyow" and "hack" calls.

Playing back a "pyow-hack" call to see how the monkeys interpreted it, the zoologists found it made the troop leave the area.

This lays out what is new in the research: the animals' response to playback of the combined calls. Monitoring response to call playbacks is a technique pioneered by Cheney and Seyfarth with vervet alarm calls, so this is a new application of an old technique. And what does it mean that putty-nosed monkeys use (and respond to) combinations of two different alarm calls? Wade explains:

Researchers studying monkeys and apes have learned that they possess all the basic apparatus needed to make and analyze sounds. But the nonhuman primates did not seem to possess either of the two combinatorial features of language, those of combining discrete sounds into compound words, and of stringing words together under rules of syntax.

Dr. Zuberbühler said that he and Dr. Arnold had not observed anything resembling syntax, but the putty-nosed monkeys, Cercopithecus nictitans, "combined two types of utterances according to a rule and the combination takes on a novel meaning," a procedure perhaps analogous to forming a word from two sounds.

Notice first that Wade correctly reports Zuberbühler's evaluation that this has no apparent connection to the development of syntax, but may have something to do with phonology (which was also David Beaver's more vividly expressed suggestion). Wade (and Zuberbühler) are also appropriately tentative about the connection to phonology.

Logically, there are a number of possible sources for the phonological "duality of patterning" which is robustly observed in all human languages, but exists (at best) in embryonic or allusive forms in non-human animals. This system for making meaningful messages out of intrinsically meaningless parts -- a digital form of message coding -- might arise from assigning meanings to vocal displays that were originally purely formal (like the songs of birds or whales); or it might arise from combining meaningful bits of behavior (like alarm calls) into communicative structures with new (and at least partly) arbitrary associations. I believe that Klaus Zuberbühler's idea is that the putty-nosed monkeys may be taking a first small step along the second path.

Wade ends with a skeptical quote from Marc Hauser:

Marc Hauser, an expert on animal communication at Harvard, said that the observation was very interesting but that stricter criteria should be applied before assuming the combination of alarm calls was similar to the way people combined sounds into words.

"Because there is no evidence that the calls are words or even wordlike, the connection to language is tenuous." he said.

This might be a bit too skeptical, at least as a way to end the piece; but it's an appropriate corrective for the range of spectacular over-interpretations elsewhere in the media. These include headlines like "Monkeys use 'Sentences', Study Suggests" [National Geographic], "African monkey can 'talk in sentences'" [The Independent], "Monkeys Found Using Primitive Linguistic Grammar" [Fox News]; and leads like "Monkeys are able to string together a simple 'sentence', according to research that offers the first evidence that animals may be capable of a key feature of language" [Mark Henderson in News.com.au], or "Researchers said Nigeria's putty-nosed monkey sometimes communicates with a combination of sounds different from others, offering, they say, the first proof that animals may be able to talk" [UPI].

Another good feature of Wade's story: it gives a link (in the margin under the heading "Related") to the original paper in Nature.

Recently, the major journals Science and Nature have been vying with one another in giving prominent display to papers about animal communication research. This is a fascinating topic, in my opinion, and I think that the published research is individually and collectively valuable (even if I sometimes disagree with the interpretation). Animal communication stories evoke deep resonances in our culture, and so these stories generally also make it into news media of all sorts, mostly in bizarrely misconstrued forms.

[ One of the consequences of recent attention from Science and Nature is that the animal communication research featured in the media is not always the most interesting stuff. For example, it's (logically though not journalistically) odd to play up the recent Nature article on putty-nosed monkeys, while ignoring Zuberbühler's (in my opinion even more interesting) 2002 paper on cross-species communication between Diana monkeys and Campbell's monkeys in Côte d'Ivoire ("A syntactic rule in forest monkey communication", Animal Behaviour, Vol. 63 no. 2, Feb. 2002, pp. 293-299).

Here's the abstract from that paper, for those without subscription access:

Syntactic rules allow a speaker to combine signals with existing meanings to create an infinite number of new meanings. Even though combinatory rules have also been found in some animal communication systems, they have never been clearly linked to concurrent changes in meaning. The present field experiment indicates that wild Diana monkeys, Cercopithecus diana, may comprehend the semantic changes caused by a combinatory rule present in the natural communication of another primate, the Campbell's monkey, C. campbelli. Campbell's males give acoustically distinct alarm calls to leopards, Panthera pardus, and crowned-hawk eagles, Stephanoaetus coronatus, and Diana monkeys respond to these calls with their own corresponding alarm calls. However, in less dangerous situations, Campbell's males emit a pair of low, resounding 'boom' calls before their alarm calls. Playbacks of boom-introduced Campbell's alarm calls no longer elicited alarm calls in Diana monkeys, indicating that the booms have affected the semantic specificity of the subsequent alarm calls. When the booms preceded the alarm calls of Diana monkeys, however, they were no longer effective as semantic modifiers, indicating that they are meaningful only in conjunction with Campbell's alarm calls. I discuss the implications of these findings for the evolution of syntactic abilities.

Anyway, given the need to follow Nature's lead in assigning relative importance, I think that Wade's 5/23/2006 NYT story is a model of how to approach such topics in a responsible way. ]

If someone were to treat this research in a longer feature, there are some other kinds of background that would be worth bringing out. As I discussed at greater length in an earlier Language Log post ("Cotton-top tamarins on the road to phonology as well as syntax", 2/9/2004), a number of other animal communication systems are said to

exhibit what Charles Hockett called "duality of patterning": larger patterns made up of well-defined combinations of recurrent, well-defined smaller units.

In that post, I linked to a page (now alas at a different URL, so I've updated the link) offering spectrograms and audio clips of the vocalizations of cotton-top tamarins, including some vocalization types that seem to be combinations of sound classes used independently in other circumstances.

These include repeated sequences called "pulsed vocalizations". In most cases, the "pulsed" form seems just to be an intensified form of the simple call, like the multiple dog barks or whines that David Beaver cites. Thus the "Type F Chirp" is said to be used "During intergroup antiphonal calling of Normal Long Calls. To audible outgroup vocalizations." The corresponding pulsed form, the "Type F Chirp Trill", is "Same as for Type F Chirp. Tilling [sic] indicative of a higer state of arousal than Type F Chirp alone."

But sometimes, there is a suggestion of a different process. Thus the "type D chirp", which is glossed as a "post-food" call used "when an animal actually possesses food or object", has a pulsed form called "hooked chatter", which is used "as infants approach". Perhaps this is because the adults are saying "hey kid, I got something for you here!" And perhaps the "hooked chatter" then acquires a new set of associated meanings, associated with welcoming infants rather than with signaling the possession of food, by the sort of process that Charles Darwin described (in "The Expression of the Emotions in Man and Animals") for the development of meaningful displays (such as snarls) from fragments of gradually-decontextualized behavior.

An apparent typo in another entry raises expectations of greater interest. The "Type E Chirp" is a "general alarm" associated "[t]o sudden visual and acoustic stimuli. To sudden leaping movement by group members if animal startled." In contrast, the "Type A Chirp" is said to be used "During mobbing behavior, to sudden animated stimuli. By some groups to preferred foods. Rarely given to acoustical stimuli."

The "Type E chirp Chatter" is a pulsed form whose usage the cited page describes as "Same as for Type A Chirp only when animal is more highly aroused." That would be neat -- replication shifts the meaning from the expected intensified form of Type E to an intensified form of Type A? That's the kind of unexpected little irregularity that happens in human morphology all the time. Alas, I'm afraid it's just a typo.

There are also some "combination calls", such as the "Type F Chirp + Whistle", said to be used "By individuals less confident than when giving Normal Long Calls. As response to Combination or Normal Long Calls or non-group Type F Chirps. In isolation." The associated sound files suggest that this is the same as a "Normal Long Call" with a "Type F Chirp" substituted for the first of two (or more) rising "whistles". This is interesting and suggestive.

[Full disclosure: Klaus Zuberbühler got his PhD in 1998 from the University of Pennsylvania, where I teach, and I had a very high opinion of him while he was a student, which has been maintained by his subsequent work.]

Posted by Mark Liberman at 08:54 AM

May 27, 2006

Behind the Da Vinci curve

Anthony Lane's movie review of The Da Vinci Code ("Heaven can wait", The New Yorker, May 29, 2006) is wonderful: screamingly funny (IMHO; not everyone agrees). I recommend it if you want further excuses to giggle at (and perhaps not go and see) the ponderously silly movie that has been made out of the much-loved blockbuster of blasphemy. But -- if I may raise this sensitive topic -- what is it with these print journalists who keep picking up months or years later on linguistic points already well known to the world through Language Log? Look at Lane's paragraph on "renowned" (the second passage below) and compare it with what you read here (and here) well over two years before, about the same quote (the first passage below; paragraph breaks removed):

Language Log, May 1, 2004:

I am still trying to come up with a fully convincing account of just what it was about his very first sentence, indeed the very first word, that told me instantly that I was in for a very bad time stylistically. The Da Vinci Code may well be the only novel ever written that begins with the word renowned. [...] "Renowned curator Jacques Saunière staggered through the vaulted archway of the museum's Grand Gallery" [...] I think what enabled the first word to tip me off that I was about to spend a number of hours in the company of one of the worst prose stylists in the history of literature was this. Putting curriculum vitae details into complex modifiers on proper names or definite descriptions is what you do in journalistic stories about deaths; you just don't do it in describing an event in a narrative.

The New Yorker, May 29, 2006:

There has been much debate over Dan Brown's novel ever since it was published, in 2003, but no question has been more contentious than this: if a person begins reading the book at ten o'clock in the morning, at what time will he or she come to the realization that it is unmitigated junk? The answer, in my case, was 10:00.03, shortly after I read the opening sentence: "Renowned curator Jacques Saunière staggered through the vaulted archway of the museum's Grand Gallery." With that one word, "renowned," Brown proves that he hails from the school of elbow-joggers -- nervy, worrisome authors who can't stop shoving us along with jabs of information and opinion that we don't yet require.

Notice, I'm definitely not alleging plagiarism here: despite the tongue-in-cheek remark about the question of how many words you have to read before you form your judgment having been much discussed, I'm quite sure Lane thinks he's bringing up a new point, and he certainly does it in an original way: his point about elbow-jogging, information-jabbing writers seems new and fresh. He's no thief. He copies no sentences (and this cannot be said of everyone, can it?). What's more, he actually does new research in Brownian literary stylistics: he looks through the rest of the book and finds another example of an exactly parallel sort: he says that you could perhaps "dismiss that first stumble as a blip", but later on in the book you will find this:

Prominent New York editor Jonas Faulkman tugged nervously at his goatee.

This is a wonderful new example of clunky use of an anarthrous occupational NP preposed to a proper name, one that somehow I had not spotted in 2004. So Lane doesn't just use other people's examples; he is an active data-gatherer.

And yet... One does get a vague sense that he doesn't know he's into a two-year-old project here. He sounds a tiny bit like an intelligent literary stylistician who has just been awakened from a two-year coma and thus attracts a certain amount of eye-rolling at conferences as he brings up points that he thinks are new but they're not.

Listen up, Anthony: it's 2006, and everyone reads Language Log now. The web programmers at your own magazine read it: when I commented on an utterly insane prescriptivism-induced message from the magazine's web site search engine in 2005, they reprogrammed to get rid of it within a week or so (we get "Sorry, there are no results matching that search" now, instead of the hilarious message that I had mocked, "I'm sorry I couldn't find that for which you were looking"). Your boss reads Language Log, your Aunt Meg reads Language Log. The other day I saw a dog reading Language Log (on the Internet nobody knows you're one; though come to think of it, he may have thought it said Language Dog — I don't know what his motive was in reading our stuff, I only know he was sitting up on a stool in an Internet café paging through something by Mark Liberman that had graphs in it).

Has word not yet reached the New Yorker office film desk, whose very métier crucially involves being fully and deeply in touch with current cultural trends? Are they really working without having Language Log bookmarked? Is that why Lane seems to imagine that he is raising a brand new linguistic theme here, rather than hauling out a well-roasted old chestnut from the cheery fire of Language Log?

Posted by Geoffrey K. Pullum at 02:46 PM

Linguistics goes out to dinner

Barbara and I dined last night with her brothers and their combined families at EVOO, on Beacon Street in Somerville, near where (until the end of June) we live. (The restaurant name, by the way, is an acronym, from the initials of Extra Virgin Olive Oil.) Excellent as always. And, linguistics being everywhere, I had the pleasure of noting that one item on the list of specials for May 26 was described using one of the most complex and varied naturally-occurring nominal premodifier constructions I have seen in quite a while:

Garlicky Pork Sausage Stuffed Crisp Fried Maryland Soft Shell Crab

(I omit here the rest of the phrase, which dealt with the accompaniments of Potato Gnocchi, Pesto, Orange Segments, Shaved Fennel and Roaster Pepper Aioli; you may be interested in them, but right now I am interested only in the above nominal constituent.)

The ten words of this syntactically composed phrase yield a fantastically large number of possible parses (the number can be computed using the formula for Catalan numbers, but I am not going to compute it, because if I simply fail to do so, someone with better computational skills will email me with the number later, and will then get their name mentioned on Language Log). If I parse the whole thing correctly, and I think I do, then

the noun pork is used as an attributive modifier of sausage;
the adjective garlicky is used as an attributive modifier of the nominal pork sausage;
the nominal garlicky pork sausage is incorporated (with instrumental meaning) into the complex adjective headed by stuffed;
garlicky pork sausage stuffed is used as an attributive modifier of the nominal constituent formed by all the subsequent words;
crisp modifies fried to form a complex adjective;
the complex adjective crisp fried is used as an attributive modifier of the nominal constituent formed by all the subsequent words;
the proper noun Maryland is used as an attributive modifier of the nominal constituent formed by all the subsequent words;
the adjective soft is used as an attributive modifier of the noun shell;
the nominal soft shell is used as an attributive modifier in a (lexicalized) nominal constituent whose head is the noun crab.

Thus there are four stacked attributive modifiers of crab. It's not at all unusual to have four modifiers of one noun, of course (my phrase "most complex and varied naturally-occurring nominal premodifier constructions" above has three, counting "nominal premodifier" as one, and I could doubtless have tossed in another one without anyone raising an eyebrow). But the variety and internal complexity of these four caught my syntactician's eye; proceeding from the innermost to the outermost, they are soft shell, Maryland, crisp fried, and garlicky pork sausage stuffed. Three of them are syntactically complex, and each of those has a quite different structure from all of the others. The bracketing of the whole thing is like this:

[ [ [ [ Garlicky ] [ [ Pork ] [ Sausage ] ] ] [ Stuffed ] ] [ [ [ Crisp ] [ Fried ] ] [ [ Maryland ] [ [ [ Soft ] [ Shell ] ] [ Crab ] ] ] ] ]
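Both the count of possible parses and the bracketing itself can be checked mechanically. The sketch below is only an illustration under one stated assumption: that every parse is strictly binary-branching, in which case a string of n words has as many bracketings as the Catalan number with index n - 1. The function names are invented for this example and come from no library. The code also reads the bracketed string into nested lists and confirms that it is a binary structure over the ten words in their original order.

```python
from math import comb

PHRASE = "Garlicky Pork Sausage Stuffed Crisp Fried Maryland Soft Shell Crab"
BRACKETING = (
    "[ [ [ [ Garlicky ] [ [ Pork ] [ Sausage ] ] ] [ Stuffed ] ] "
    "[ [ [ Crisp ] [ Fried ] ] [ [ Maryland ] [ [ [ Soft ] [ Shell ] ] [ Crab ] ] ] ] ]"
)


def catalan(k):
    """Return the k-th Catalan number, (2k choose k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)


def binary_parse_count(n_words):
    """Count strictly binary bracketings of a string of n_words words."""
    return catalan(n_words - 1)


def read_tree(text):
    """Turn a bracketed string into nested lists; a leaf is a bare word."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    stack = [[]]
    for tok in tokens:
        if tok == "[":
            stack.append([])
        elif tok == "]":
            if len(stack) == 1:
                raise ValueError("unbalanced brackets")
            node = stack.pop()
            # A bracket around a single word is just that word.
            if len(node) == 1 and isinstance(node[0], str):
                node = node[0]
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("unbalanced brackets")
    return stack[0][0]


def leaves(tree):
    """List the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree for word in leaves(child)]


def is_binary(tree):
    """True if every non-leaf node has exactly two daughters."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_binary(child) for child in tree)


tree = read_tree(BRACKETING)
print(binary_parse_count(len(PHRASE.split())))
print(leaves(tree) == PHRASE.split(), is_binary(tree))
```

Under the binary-branching assumption, the count for ten words comes out at 4,862. Allowing flat structures with three or more daughters would raise the number, so this is best read as a lower bound on the parses a less constrained grammar would permit.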

I took all that in at a glance, pocketed a copy of the specials menu so I could explain all this later to you, and turned to a very enjoyable dinner. Linguistics is everywhere. It will even go out to a family dinner with you, and enliven the experience. Linguistics is your friend. Linguistics is like family.

Steve Jones points out that a few hyphens would reduce the ambiguity a lot, and that's true; but the menu was printed with none. Putting hyphens in the right places for optimal parsing is your homework exercise for today.

Posted by Geoffrey K. Pullum at 10:50 AM

May 26, 2006

Retirement or Phase II

What is it in our makeup that keeps us going when it seems like we're old enough to quit work and we should be trying to take it easy? Today's article in The Washington Post (here) about the failure of two Indy car racing icons to retire and stay retired seems to have currency for lots of us retired folks. Michael Andretti and Al Unser, Jr. are back at racing again this year, having officially retired a year or so ago. As Unser put it, they just can't stay away:

"Both of us couldn't make the clean break," said Unser, who said he's giddy to be preparing for his 18th Indianapolis 500 ... I thought I could make a clean break away from it ... I'm not doing this for a living. I'm doing this because I love racing, and that's what I want to do."

When I retired from teaching linguistics, I thought I'd spend the rest of my life doing oil painting, exploring the mountains of Montana, and maybe doing some fishing. Now, ten years later, I still haven't touched the huge supply of art equipment my students gave me as a retirement gift. I quickly sold the pickup truck that I planned to drive in the mountains. And it didn't take me long to discover that I don't much like to fish. Meanwhile, it was very hard for me to get linguistics out of my system.

Like Al Unser, Jr., I couldn't make a clean break. I've also noticed this in many of my colleagues who tried to retire. Among others, Dwight Bolinger, Peter Ladefoged, and Eric Hamp easily come to mind. We love linguistics and that's what we want to keep on doing. Like them, I've failed the course, Standard Retirement 101, several times in the past ten years. I don't teach university classes any longer but I continue to consult, review tenure and promotion applications, read new stuff (and old stuff too), give lectures once in a while, evaluate grant applications and book prospectuses, see more of the world, serve on boards, and, mostly, do a lot of writing. There's a better name for "retirement" here. Let's call it "phase II."

For those of us who love what we do and whose minds are still relatively clear, retirement is probably not the time for dropping everything to sun ourselves on the beach or play shuffleboard in Florida. It's the time to reflect, synthesize, and bring together those loose ends that have nagged at us until now. It's the time to write the books and articles that we didn't seem to have time for while we worked every day in the classroom. And it's a freeing time, with no administrative tasks to interfere with our research and writing, no tenure to strive for, no annual course evaluations, no tests to grade, and no more endless university committees to serve on.

Like Unser, I'm giddy (well, happy anyway) about this phase of my life, whether it's called "retirement" or simply "phase II." And I want to report to younger scholars that there's no need to be afraid of reaching this wonderful stage of life.

Posted by Roger Shuy at 01:43 PM

"Not good enough for us, too good for you"

This bumper sticker was posted by James Joyner at Outside the Beltway ("Congressional Double Standard on Warrants", 5/25/2006), with credit for the slogan given to an "anonymous comment on an Orin Kerr post":

The meaning seems intuitively obvious -- Congress is OK with warrantless wiretapping of citizens ("us"), but objects to a warrant-based search of a congressional office ("you"); and the person displaying the bumper sticker thinks this is hypocritical. But it's not so easy to explain how to get there from the words.

I don't mean the referents of "us" and "you", which have to be inferred from the picture of the capitol building and the current associations of "warrant". [Well, maybe this is a problem -- I thought it was obvious that first person ("us") is in the voice of the person displaying the sticker, while "you" is the U.S. Congress; but Melissa Fox (see below) assumed the opposite assignment.] The problem has to do with two quasi-idioms based on good (in "not good enough for" and "too good for").

The second half of the slogan is pretty easy. If we ask Google what sorts of things are "too good for him|them|you", we find a preponderance of punishments: impeachment, US jails, killing, the death penalty, hanging, horse whipping, etc. The idea is that some crimes are so heinous that the legally-prescribed punishments (impeachment, the death penalty) and even extra-judicial sanctions of whatever kind, are not strong enough responses. In the current situation, according to the bumper sticker, mere FBI investigation and search under the terms of warrants granted for probable cause is "too good for" corrupt members of congress. "I say, search 'em all. Now."

But what about the first half of the slogan? In what sense are warrants "not good enough for us"? When we say that "X is not good enough for Y", and X is some sort of object or situation or process or institution, we generally mean that Y wants or needs something more than what X provides. (Y might have a valid reason, or just feel generally superior to things like X...)

100 feet not good enough for you?
Jimmy Carter, a third generation southern Baptist, has come to a painful decision that the ole-time religion is not good enough for him.
... apparently my software's not good enough for them anymore.
If GnomeMeeting is not good enough for you, where is the problem?
If so, the [audio] system is probably not good enough for you.

So the first half of the bumper-sticker slogan ("Warrants: not good enough for us...") suggests that we want something more than what warrants provide. This is confusing, since the main complaint about the warrantless wiretapping was that it is extra-judicial -- if the FISA court had been used, most people would not have objected. So does displaying this bumper sticker (or wearing one of the t-shirts) commit the user to the view that even FISA warrants would have been "not good enough"?

The other possibility, I guess, is that the "warrants are not good enough" complaint was meant to refer not to extrajudicial wiretapping but to the concerns (fairly widespread in the libertarian blogosphere) over "intrusive paramilitary raids" carried out with warrants.

A less generous interpretation of the bumper sticker's sentiment might be: "never mind all this business about warrants, the feds should just search the bad guys (themselves?) and leave the good guys (us) alone". That's a natural human reaction, but not much of a judicial (or logical) principle.

Great bumper sticker, though.

[Update -- Melissa Fox writes:

I was surprised to see that you parsed the slogan from the bumper sticker as referring to the citizenry as 'us' and Congress as 'you'. My reading had been that the capitol dome indicated Congress was the speaker -- so warrants aren't good enough for us (congress) but too good for you (the people); that is, normally a warrant is enough to allow a search, but not for members of Congress, oh no, warrants aren't good enough -- what do they want, a sign from above? Conversely, as you say about the second half (only with opposite referents), the contempt in which Congress seems to hold the American people suggests that we don't even deserve to be protected by due process, to be notified before our homes and offices are searched or our phone conversations recorded, etc., etc. Warrants are too good for us -- which, as it's Congress speaking to us, means they're "too good for you".
That seemed so intuitive to me that I had to read your LL post three times before I got it. :-/

Hmm. I'm not sure whether this is evidence that this is not such a great bumper sticker after all, or that it is even better than I thought :-). If I have time later today, I'll put up an interactive poll so that we can get a sense of how many people construed the slogan in which way...]

[Karen Davis and Fernando Pereira agree with Melissa Fox -- I can see that I've got the back end of the elephant on this one.]

Posted by Mark Liberman at 06:37 AM

May 25, 2006

Not old enough for sex, by half

I was surprised to find that a linguistic point was front and center in Dan Savage's widely published raunchy sex advice column ‘Savage love’ last week.

"I'm a straight guy, 17-and-a-half", wrote an advice-seeking reader whose "Catholic Christian girlfriend" is "still a virgin" and has stubbornly not agreed to have sex even after "more than four months" of dating. Is four months of "stalemate" too long? Should he hold or fold? "Please help", he says.

And Dan Savage (not one of your feelgood, I'm-OK-you're-OK therapists) comes roaring down on him, armed with a crucial linguistic piece of evidence:

...no one who gives his age as "whatever-and-a-half" is mature enough to be having sex himself, much less sitting in judgment over someone else's decision not to have sex.

Dan's quite right, as far as my linguistic intuition goes: there is some vaguely delimited age at which you stop counting your age in steps smaller than one year, and the age at which it seems reasonable to say that a person is truly mature enough for the responsibilities that go along with sexual activity seems (forgive me, sexually active junior high-schoolers) to be broadly located somewhere after that point. I hadn't explicitly noticed that before, but I think Dan has identified a reasonable rule of thumb.

To make it depend on something more robust than my linguistic intuitions, or Dan Savage's, let's look at the Google hit counts for a few relevant phrases. Some will be bad hits (like across sentence boundaries, or followed by "months"), and at 22 there is a bad data point because of a much-quoted historical reference, but the pattern is a descending one from 18,900 hits for two and a half down to zero at twenty-five and a half. It's very clear from a graph on a logarithmic scale:

The raw data follow:

Phrase                        G-hits
aged two and a half           18,900
aged three and a half            631
aged four and a half             468
aged five and a half             134
aged six and a half              134
aged seven and a half            133
aged eight and a half             80
aged nine and a half              81
aged ten and a half               31
aged eleven and a half            27
aged twelve and a half            32
aged thirteen and a half          17
aged fourteen and a half          21
aged fifteen and a half           32
aged sixteen and a half           31
aged seventeen and a half         35
aged eighteen and a half          16
aged nineteen and a half          10
aged twenty and a half             1
aged twenty-one and a half         1
aged twenty-two and a half        >3
aged twenty-three and a half       1
aged twenty-four and a half        1
aged twenty-five and a half        0
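The descending pattern can also be checked straight from the raw counts. The sketch below is purely illustrative: the names are invented, the ">3" entry at twenty-two is left out because it is not an exact count, and the zero at twenty-five gets no logarithm (it simply shows up as an empty bar). Each bar's length is proportional to log10(count + 1), a rough text stand-in for a logarithmic axis.

```python
from math import log10

# Hit counts from the raw data, keyed by the whole-number part of the age.
# 22 is omitted: the data give ">3" there, which is not an exact count.
HITS = {
    2: 18900, 3: 631, 4: 468, 5: 134, 6: 134, 7: 133, 8: 80, 9: 81,
    10: 31, 11: 27, 12: 32, 13: 17, 14: 21, 15: 32, 16: 31, 17: 35,
    18: 16, 19: 10, 20: 1, 21: 1, 23: 1, 24: 1, 25: 0,
}


def log_hits(count):
    """Return log10 of a hit count, or None for zero (no point on a log axis)."""
    return log10(count) if count > 0 else None


def text_bar(count, width_per_decade=10):
    """Return a bar of '#' whose length is proportional to log10(count + 1)."""
    return "#" * round(width_per_decade * log10(count + 1))


for age in sorted(HITS):
    print(f"{age:>2} and a half  {HITS[age]:>6}  {text_bar(HITS[age])}")
```

Read this way, the counts fall by roughly a factor of a thousand between two and a half and nineteen and a half, and then drop to one or zero from twenty onward.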

Endnote: I am expecting to get a certain amount of joshing around the corridors of Language Log Plaza, and perhaps a certain amount of mail from impudent strangers, over what exactly I was doing reading a sex advice column in Boston's raunchiest free weekly, The Dig. The answer is that it is the duty of the Language Log staff to scan the widest possible array of popular media to provide you with the hard, penetrating (oops, 'scuse the metaphor) linguistic analysis you have come to expect. For us, all human linguistic life is worthy of study. There is nowhere that Language Log will not go to find linguistic insights for your interest and reading pleasure: the casinos of the Las Vegas strip; the pages of 18th-century pornography; the sleazier side of the psychological study of transsexuals; graffiti in men's bathrooms... There is nowhere we will not go, nothing we will not read, if it is in the service of linguistic science.

Posted by Geoffrey K. Pullum at 06:50 PM

I have stress! You have stress! Not resolved!

The latest "viral video" to become a global sensation via the Youtube website is a six-minute clip from Hong Kong called "Bus Uncle" (or "Uncle Bus," as Wikipedia currently renders it). It's a cell-phone video shot on a bus, documenting an older passenger chewing out a younger one who dared to tap him on the shoulder for talking too loudly. Here is how a recent AP article describes the encounter:

The film starts out when the protagonist, a middle-aged man, reacts strongly when a young man sitting behind him taps his shoulder to ask him to keep his voice down while talking on the phone.
"I don't know you. You don't know me. Why do you do this?" the infuriated bus rider says, punctuating the sentence by jabbing his right hand downward in the air.
When the young man, who rarely talks back during the lengthy argument, expresses an unwillingness to continue the conversation, the middle-aged man explodes, "This is not resolved! This is not resolved! This is not resolved!" — which has now become a catch phrase in Hong Kong.
He goes on to say, "I face pressure. You face pressure. Why did you provoke me?"

The belligerent man goes on to make some indecent remarks about the younger man's mother, all the while bullying him into shaking his hand to "settle" the dispute. It makes for strangely compelling viewing — indeed, the original video has been viewed about 1.8 million times since it was uploaded on April 29th. And that doesn't include other versions, such as one subtitling the Cantonese dialogue with English and Mandarin (highly recommended), not to mention musical remixes and various parodies. Thanks to Youtube, the "Bus Uncle" catchphrases have become firmly lodged in Hong Kong pop culture. And thanks to the new AP article, English translations of the phrases are now circulating widely. One blogger says she has taken to repeating the translations without even having seen the video. As a public service, I've transcribed and transliterated two of the most popular catchphrases for those who would like to spread the meme in the original Cantonese.

Disclaimer: I don't know a lick of Cantonese, but fortunately these two phrases are lexically and syntactically simple enough that I was able to piece the transliterations together without too much trouble using Adam Sheik's online Cantonese Dictionary. I've provided the original phrases with Jyutping romanization and word-for-word glosses. (See here for how to interpret the tone numbers.)

我有壓力 ! 你有壓力 !
"I have stress! You have stress!"

我      有      壓力           你      有      壓力
ngo5    jau5    ngaat3 lik6    nei5    jau5    ngaat3 lik6
I       have    pressure       you     have    pressure

未解決 !
"Not resolved!"

未         解決
mei6       gaai2 kyut3
not yet    solve

With any luck, these catchphrases will achieve the prominence of other Youtube-disseminated memes, such as, say, "Mr. Pibb and Red Vines equals crazy delicious."

[Update: You never know where those "Bus Uncle" phrases are going to show up next. Here's a lovely if enigmatic image from no-sword:

]

Posted by Benjamin Zimmer at 06:05 PM

Public competence, linguistic and otherwise

In response to my call for "broader common-sense discussions" of language-related issues, Ryan Miller wrote:

I just wanted to note that if your hope for public knowledge of linguistics is that it reach the level of public competency that "automobiles, computers, investments, or court cases" have revealed, then you may well be disappointed with your results. Marginal Revolution had quite the head-shaking over the inability of economics graduate students to pass a multiple-choice test on opportunity cost, and I don't gather that most people have any idea what habeas corpus means or what diesel fuel is or the difference between documents and applications (if the latter seems improbable to you, ask any help-desk technician). So I suspect you will either be easily satisfied or easily disappointed.

I recognize that not all public discussion of opportunity cost, habeas corpus, diesel fuel or software is well informed, but at least some of it is. And I continue to believe that weblogs and other social media can provide a sort of inverse Gresham's Law, which I formulated this way: "[O]pen intellectual communities intrinsically tend to generate a virtuous cycle: if there were an order of magnitude more science writing in blogs, there'd be less than an order of magnitude more crap, and more than an order of magnitude more good stuff".

(I'm sure this idea is not original, but I don't recall where I've seen it before. As an idea about rational inquiry in general, it surely goes back at least to Roger Bacon.)

Based on this virtuous-cycle perspective, I'll be satisfied if the amount of discussion increases, and the rate of growth of bullshit is significantly smaller than the rate of growth of sensible and informed stuff.

Ryan cited the infamous Tuttle Software Correspondence as evidence for his view that the "level of public competency" in the computer area might be disappointing to me. On the contrary, I see this as a pretty favorable case -- a large community of well-informed people have been shaking their collective heads over one individual's stubborn misunderstandings. And as far as I can tell, the issue wasn't picked up by the AP or AFP newswires (much less the NYT or the Guardian) and presented by technologically-ignorant reporters under headlines like "Software company defaces town website". If language-related issues and events generally came out this well, I'd be ecstatic, not just satisfied.

[Update -- Mike Albaugh comments:

In the discussion about "Licensed Linguists", you quoted Ryan Miller in re: how the great unwashed don't seem to grasp the difference between applications and documents. Perhaps their guts sense that the difference is not as clear-cut as the Ryan Millers of the world think it is. Perhaps he was one of the ones confidently assuring his friends that there was no possibility of an email virus, on the eve of "I Love You".
A little knowledge is a dangerous thing. If you get very far into any subject, you find yourself sounding like a lawyer, or Rabbi: "It depends".

Well, I'm sure that Ryan knows about rogue email attachments and word-processor macros, and was just waving a hand at the numerous faulty presuppositions that help-desk folks have to deal with every day; but Mike's point is also a valid one. It's a strength of vigorous civic discourse that assumptions are always getting questioned, and compelling questions have a reasonable chance of reaching a critical mass of people.]

Posted by Mark Liberman at 09:06 AM

May 24, 2006

On Learning Mandarin in America

[Guest post by Victor Mair]

Two daughters of a couple whom Li-ching and I have known for many years both went to the same elite American university (one of the very best in the United States; extremely competitive to get into). The father and mother are mainland Chinese who were born in Taiwan, went to National Taiwan University, and attended graduate school in the United States. Both of the daughters were born in America, but the father and mother have always spoken Mandarin to them and required that they attend weekend schools for Mandarin instruction throughout their primary, junior high, and high school education.

The two girls, who are very diligent, intelligent, and obedient (TING1HUA4), probably recognized about 500 HAN4ZI4 before entering college and could speak at the intermediate level, but they could barely write anything and had difficulty reading even children’s books (admittedly, Chinese children’s books are infamous for not being written in a style appropriate for young readers). When they went to their elite university, they essentially had to start all over again at the beginning.

Both of the daughters went through the third-year level of Mandarin at the university. The elder graduated about five years ago, and her Mandarin now is probably at the same level she had achieved when she was in the fifth grade of elementary school – viz., barely functional for reading and writing, and moderately fluent for speaking and listening. Her younger sister is now in 3rd-year Mandarin at the university and is likely going to end up at about the same level in all four skills (reading, writing, listening, and speaking) as her JIE3JIE.

In fact, what prompted me to write this account at all is the sensational realization by MEI4MEI’s mother that she was writing her Mandarin compositions with a software program that permits one to type in English sentences and let the computer convert them into Mandarin! The mother discovered this situation when she read one of her daughter’s compositions and could scarcely believe how ungrammatical and unnatural it was. Such software is apparently in quite widespread use, not just in America, but even in Hong Kong, Singapore, and elsewhere.

Now, JIE3JIE and MEI4MEI are both highly intelligent and hard-working, but their Mandarin is, for all intents and purposes, useless when it comes to reading and writing, and minimal when it comes to speaking and listening – despite the fact that they have spent more than 15 years studying it in school and university (plus at home). And I’m sure that this tragic situation is by no means peculiar to these bright sisters, but is endemic among many (actually most) students across the country and, indeed, throughout the world. How can this be?

Well, as I knew intuitively when I began studying Mandarin in 1968 and have reiterated on countless occasions since then, it’s because there’s far too much emphasis on HAN4ZI4 from the very beginning. I believe that students should NOT be exposed to HANZI for **at least** the first year of instruction, and preferably not for the first two years. Only then should the characters be gradually introduced. Why? The answer is simple: students need to master the basics of the language (pronunciation, vocabulary, grammar, syntax, idioms, etc.) before they are required to memorize hundreds of HANZI. It is essential to internalize the patterns and structures of the language before spending endless hours vainly striving to master large numbers of HANZI. As a matter of fact, students who initially concentrate exclusively on learning pronunciation, grammar, syntax, etc. and are not burdened by having to memorize HANZI actually learn the HANZI more effectively and quickly later on when they do start to acquire them systematically **in relation to the language as a whole.** This has been proven in the ZHU4YIN1 SHI4ZI4, TI2QIAN2 DU2XIE3 (“recognize characters [through] phonetic annotation, speed up reading and writing”) experiment in the People’s Republic of China. It is also borne out by the experience of students who learn Cantonese, Taiwanese, and other Sinitic languages without having to be saddled with the HANZI. It is remarkable how swiftly such students can attain fluency when exposed to a full course of instruction and exercise.

The negative effects of having to learn HANZI during the first few years of instruction are numerous. In the first place, they consume limitless amounts of time that would better be spent on actual language learning. Secondly, they emphasize monosyllabic morphemes over multisyllabic words, which constitute the overwhelming proportion of the lexicon. Thirdly, the HANZI themselves cannot be efficiently memorized in the absence of a prior mastery of the fundamentals of the language; thus a vicious cycle ensues, with neither language acquisition nor control of the script making any noticeable progress.

JIE3JIE’s and MEI4MEI’s mother is still earnestly seeking a way to improve the reading and writing skills of her daughters in a realistic manner. I have a simple solution: make available a large amount of quality literature with phonetic annotation for each character, preferably showing word division marked by spaces (FEN1CI2 LIAN2XIE3). This is comparable to KANJI texts with FURIGANA in Japanese.

Students should not be blamed for being poor learners when it is their teachers who employ outmoded, impractical methods.

[Guest post by Victor Mair.]

Posted by Mark Liberman at 06:34 PM

Broader common-sense discussions, not narrower "licensed" commentary

There's a lot of confident-sounding nonsense out there about language, in print and in conversation. And it's natural for Roger Shuy, who does a lot of work in forensic linguistics, to describe this in terms of the metaphor of "practicing linguistics without a license". However, I'm afraid that some of the nonsense comes from people with impeccable academic credentials, while some of the sensible stuff comes from people who are smart, rational and curious but don't have any diplomas. And the idea of being "licensed to practice linguistics" suggests that we want people to be passive consumers of expertise, whereas the truth is just the opposite. It's great that so many people want to talk about language. Our suggestion is not that they shut up and leave it to the experts, but rather that they put some effort into learning and thinking as well as into writing and talking.

In the case that Roger was commenting on, my beef with Stewart Lee was not that he's a comedian without any particular credentials in linguistic analysis. My problem was that his ideas are plain nonsense, in ways that anyone who can read his essay can easily understand. For example, Lee says that the "pull back and reveal" type of joke doesn't translate well from English into German, because "the rigours of the German language's far less flexible sentence structures" "prevent using little linguistic tricks to conceal the subject of our sentences until the last possible moment". But in his examples, the "reveal" is not determined by the order of words in a sentence, but rather by the order of clauses in a discourse -- a matter in which German differs from English not at all. (For that matter, the sentential word-order in his example doesn't crucially differ between the two languages, though that isn't relevant to his argument.)

You don't need a course in linguistics, much less an advanced degree or any other sort of credential, to see the problem here. You just need to understand the meaning of the words Lee used -- or to look them up on line if you're not sure -- and to ask yourself a few simple questions about the logic of his argument.

The same thing is true about the other case that I mentioned, the claims of Prof. Jean-Claude Sergeant about the paradoxical rigidity of the English language in comparison to French. Prof. Sergeant is certainly as licensed as they come, in formal terms -- "Director of the Maison Française in Oxford, a research centre funded by the French Ministry of Foreign Affairs ... former head of the British and American Studies Department at the ... Sorbonne Nouvelle", recipient of an honorary degree in 2002 from Oxford University. But what Prof. Sergeant had to say about English in relation to French was just as confidently nonsensical, in my opinion, as what Mr. Lee had to say about English in relation to German. And anyone who can read his essay, and cares to take the time to think through what it means, and to spend a few more minutes looking at the facts, can see this.

Like Roger, I'm very much in favor of better linguistics education in the general curriculum. More linguistic knowledge and more analytic skills and more practice in analysis and argumentation would surely be a good thing. But the pay-off, in my opinion, would be public discussion of language that is as vigorous, rational and well-informed as (say) public discussion of automobiles, computers, investments or court cases. And the way to get there is probably to encourage broader discussion -- and therefore more nonsense -- rather than less.

At least, that's what I argued last year about science blogging in general ("Raising standards -- by lowering them", 3/7/2005). I offered a three-point plan for improving scientific communication, whose first point was:

Encourage everyone to think about science, and to write about it on the web, whether they know anything about it or not. And encourage them to criticize what others write, and to read others' criticisms, and to tell their friends about the best stuff that they find, whether in the popular media, or in the technical literature, or in weblogs. I claim that open intellectual communities intrinsically tend to generate a virtuous cycle: if there were an order of magnitude more science writing in blogs, there'd be less than an order of magnitude more crap, and more than an order of magnitude more good stuff. (The same is probably true for science writing in newspapers, though the network effects are smaller there.) This follows from a scientific version of Moglen's Metaphorical Corollary to Faraday's Law: add more wires, lower the resistance, and more intellectual current is induced.

In my opinion, the same thing goes for language-related writing, whether strictly scientific or simply rational. I know that Roger feels basically the same way that I do. It's just that when you hear some of the crap that people come out with, it's hard to resist the temptation to turn on the siren and write them a ticket.

Posted by Mark Liberman at 05:37 PM

Report from the language log security department

Mark Liberman's post (see here) on the Guardian article in which the writer practices linguistics without a license echoes many other Language Log complaints about this crime, which appears to be running rampant throughout our society these days, not just in the press. For example, one evening at a dinner party I was the only linguist present when a psychiatrist sitting next to me felt it necessary to lecture me on the evils of Vernacular Black English, uttering nonsense in virtually every sentence. More recently, I've found that the legal profession has a distressing lack of knowledge about our field. And I won't even go into the problems that educators have with this. But let me relate an incident that happened to me just yesterday.

In my role as one of the security officers at Language Log Plaza, I was forced to issue a citation to a PhD dissertation writer who, in her preliminary draft, falsely referred to part of her research as linguistics. Okay, it was a psychology dissertation and so maybe I should have cut her some slack. But, as a law-abiding linguist, I felt that I had to enforce the law. So I ticketed her for driving her dissertation on the wrong side of the road. In a conciliatory tone, I explained that I was doing this for her own good. I wouldn't want readers of her final draft to accuse her of practicing linguistics without a license.

Like many psychology dissertations, her research was a rather good content analysis of lots of data that she had carefully gathered. Among other things she used a computer program to count the number of times personal pronouns and other language features occurred. She backed this up with a good statistical analysis. The problem is only that she referred to this as a linguistic analysis. She found instances of indirectness but didn't say anything about how this worked. It was simply a good tallying exercise. Same for conditionals, passives, and instances of politeness. No linguistic analysis of these features--only their presence or absence. She used a research approach that was appropriate enough for what she was trying to accomplish but it simply wasn't linguistics.

Mark is quite correct about the way journalists mangle linguistics in articles that compare one language with another. Past Language Log posts, too numerous to mention here, have also dealt with journalistic ignorance of such things as the alleged language learning of birds and animals, how many words there are in the English language, and the supposed absence of words for X in language Y. Maybe it's our job to give citations to offenders but there is also a whole lot of work for us to do in the education of our sister disciplines. We haven't been very good at this.

Posted by Roger Shuy at 12:38 PM

Thriving on confusion in the Guardian

Last year, I wrote about Jean-Claude Sergeant's view of the English language ("Paradoxes of the imagination"; "Another over-earnest comedy of fact checking"). Sergeant, then "professeur de civilisation britannique" at the University of Paris III, informed us solemnly that

In its present configuration, current English is characterized first by an extreme concern for coherence and for explicitness approaching redundancy. The core constituents of the phrase -- subject, verb, complement -- cannot be as easily separated as in French, and the order in which they occur in the phrase is less susceptible of modification.

Now Stewart Lee ("Lost in Translation", Guardian 5/23/2006) tells us that

The German language provides fully functional clarity. English humour thrives on confusion.

It's hard to keep track of the intricate graph of European ethnic stereotyping, involving relative degrees of rationality, punctuality, diligence, food-preparation skills, and so on, though certain north-to-south and west-to-east trends are obvious. But I'm beginning to get the idea that at least one of these relations of mutual prejudice is symmetrical: speakers of language X always think that language Y is less flexible than X, for any values of X and Y.

Lee's conclusions about the German language are based on what must have been a very curious experience:

In December 2004 I accompanied Richard Thomas, the composer of the popular stage hit Jerry Springer The Opera, to Hanover, where he had gained a commission to develop an opera about a night in a British stand-up comedy club. We wrote the words in English and Richard then collaborated on a translation with a talented German comedy writer called Hermann Bräuer.

Throw in a machine-translation researcher, and you've got the premise of a Richard Powers novel.

Anyhow, it turns out that translating the jokes in this opera libretto was hard. You might think that this is because it's hard to translate songs, and hard to translate jokes, and doubly hard to translate sung jokes. However, Lee concluded that blame should be assigned to "the rigours of the German language's far less flexible sentence structures". Specifically, "German will not always allow you to shunt the key word to the end of the sentence to achieve [the] failsafe laugh" associated with "the endless succession of 'pull back and reveals' that constitute much English language humour".

He gives an interesting, quasi-formal analysis of "pull back and reveal" humor (more fodder for that Richard Powers novel):

At a rough estimate, half of what we find amusing involves using little linguistic tricks to conceal the subject of our sentences until the last possible moment, so that it appears we are talking about something else. For example, it is possible to imagine any number of British stand-ups concluding a bit with something structurally similar to the following, "I was sitting there, minding my own business, naked, smeared with salad dressing and lowing like an ox ... and then I got off the bus." We laugh, hopefully, because the behaviour described would be inappropriate on a bus, but we had assumed it was taking place either in private or perhaps at some kind of sex club, because the word "bus" was withheld from us. Other suitable punchlines for this set-up would be, "And that was just the teachers", "I was 28-years-old" and "That's the last time I attempt to find work as a research chemist in Paraguay."

At the risk of being accused of spoiling a good story with a mutant American version of Teutonic over-rationality, I have to point out that this makes absolutely no effing sense at all. I don't mean that Lee's example needs work (although it does -- if you try telling his joke, in English, with any of the proposed punch lines, you're more likely to get puzzled stares than laughs). Let's imagine substituting a better joke, and go on. The "pull back and reveal" joke structure, as Lee describes and exemplifies it, consists of a sequence of clauses: A, B, C ... and then D. Is there any language on earth in which you can't tell a story that way? My knowledge of German barely reaches the ability to read with the help of a dictionary, but that's enough to make me sure that if any language is so bizarrely crippled, it's not German.

[In the particular case of "I got off the bus" as a punch line, German might prescribe "... dem Bus aus" instead of "... off the bus" -- but can that possibly spoil the associated joke, if any? And what if the joke involved "off" rather than "bus" -- "and then I got OFF the bus" -- then German would allegedly have the advantage, right?]

To this incoherent theory about the alleged role of sentence structure in the cross-linguistic rhetoric of alleged humor, Lee adds some equally incoherent stuff about the effect of German noun compounds:

In English there are many words that have double or even triple meanings, and whole sitcom plot structures have been built on the confusion that arises from deploying these words at choice moments. Once again, German denies us this easy option. There is less room for doubt in German because of the language's infinitely extendable compound words. In English we surround a noun with adjectives to try to clarify it. In German, they merely bolt more words on to an existing word. Thus a federal constitutional court, which in English exists as three weak fragments, becomes Bundesverfassungsgericht, a vast impregnable structure that is difficult to penetrate linguistically, like that Nazi castle in Where Eagles Dare.

Penetrating this fortress of balderdash is left as an exercise for the reader. I'll supply one clue, namely a German joke that depends on a pun wrapped in a noun compound:

Aus welchem stahl macht man Autos in Polen? Diebstahl.

Literal translation: "From what steel do they make cars in Poland? Theft." Linguistic background: stahl means "steel"; stehlen means "to abstract, appropriate, steal", preterite form stahl etc.; dieb means "thief", and the compound diebstahl means "larceny". Cultural background: Poland is apparently a notorious destination for cars stolen in Germany.

Can you translate that joke? No, not really. Is it because {English|German} is {more|less} ambiguous or linguistically penetrable than {German|English}? The answer is left to you.

Finally, Lee offers a third "explanation" for why German is allegedly a bad language for jokes. According to him, in German

... [t]here seemed to be no nuanced, nudge-nudge no-man's land, where English comic sensibilities and German logic could meet on Christmas Day and kick around a few dirty jokes in a cheeky, Carry On-style way. A German theatre director explained that this was because the Germans did not find the human body smutty or funny, due to all attending mixed saunas from an early age.

At this point, I'm beginning to think that "Stewart Lee" is the invention of a team of writers from the Onion. There are no doubt some cultural differences in deploying sexual allusions in humor, but are saunas really a ubiquitous fixture of modern German family life, or has "Lee" carelessly displaced this practice a country or two southwards? And aren't there any metaphors for English-German translation that don't involve WWI or WWII?

I don't have any sort of broad empirical basis for evaluating Lee's conclusion:

The geographical accident of Germany has denied Germans the fun we have with language, and it seemed to me that their sense of humour was built on blunt, seemingly serious statements, which became funny simply because of their context.

But my experiences with German friends, and what little I know of German literature and German humor -- Freud's discussions of humor will do for a start here -- leave me very skeptical of the notion that Germans are "denied the fun we have with language" and that "their sense of humour [is] built on blunt, serious statements".

I guess it's inappropriate to expect coherence in an opinion piece by a comedian, even if its veneer of rationality suggests that its description of the English and German languages is meant to be meaningful. And there's some good stuff in the article. Lee mixes his little spurts of ethnic prejudice and his incoherent linguistic analyses into a slurry of interesting anecdotes and jokes -- leading with a pretty good version of the traditional essentialist joke about the ethnically-German child raised by English parents -- which many will enjoy.

Far be it from me to suggest that the Guardian needs theory checkers. And it's obviously impractical to imagine that people in general, and comedians in particular, will ever give up basing their opinions on unsupported and unexamined national stereotypes. But I can hope that someday, people in a position to write for outlets like the Guardian will have gotten some elementary linguistic analysis skills somewhere along the way.

[Hat tip to reader Ben Hadley.]

[Update -- John Cowan writes:

You write:

> At the risk of being accused of spoiling a good story with a mutant
> American version of Teutonic over-rationality, I have to point out that
> this makes absolutely no effing sense at all. I don't mean that Lee's
> example needs work (although it does -- if you try telling his joke,
> in English, with any of the proposed punch lines, you're more likely
> to get puzzled stares than laughs).

That's because "English humor" doesn't mean "humor expressed in the English language", it means "what those peculiar people in the south-eastern part of Ysl Prydain think is funny." And from the English point of view, we Americans are nothing but a lot of anglophone Germans. We can't even get the local-politics jokes in Monty Python episodes. (Seriously, Edward Hall does rate American culture as only slightly less low-context than German culture, and far more so than English, French, or New World Spanish culture. High-context cultures like the English have no problem with finding jokes like that funny.)

That's this Edward Hall, and John is usually sensible and well informed, but is that joke about getting off the bus actually perceived as funny (as opposed to silly) by UK residents? Even those from the London area? ]

[Update #2 -- Karen Davis writes:

You quote Stewart Lee as saying:
At a rough estimate, half of what we find amusing involves using little linguistic tricks to conceal the subject of our sentences until the last possible moment, so that it appears we are talking about something else. For example, it is possible to imagine any number of British stand-ups concluding a bit with something structurally similar to the following, "I was sitting there, minding my own business, naked, smeared with salad dressing and lowing like an ox ... and then I got off the bus."

I find it difficult to think of "and then I got off the bus" as the subject of the rest of it.
Perhaps he meant "the topic of our jokes"? But then, why say "the subject of our sentences" - and why believe that German can't tell that story, with an entire sentence (then I got off the bus) last? And why oh why doesn't he see that in none of his sample sentences was the subject concealed? "I" is right up at the front!

Indeed. The equivocation "subject of the sentences" vs. "topic of the joke" was one of the first things that I noticed about Lee's article. I wound up leaving it out in favor of other confusions, but maybe that was a mistake. Anyhow, there are a lot of interesting fragments of ideas floating around in what Lee has to say. The problem is that no one seems to have taken any trouble to try to clarify what they mean, how they go together, and whether they're true. That's reasonable practice for a stand-up routine, I guess, but it strikes me as out of place in an essay published in what claims to be the "best daily newspaper on the world wide web". ]

[Update #3 -- Margaret Marks of Transblawg writes:

I picked up the Guardian article on German humour too, because Trevor pointed it out. He didn't think it was as ridiculous as I did! Anyway, thanks for the detail.
That getting off the bus joke is just silly to me (from the London area!)
And Germans have several ways of expressing that:
dann bin ich aus dem Bus ausgestiegen
dann stieg ich aus dem Bus aus
dann verließ ich den Bus (this does have a snappier and more amusing quality)
And they don't have to use a subclause - they can just string a sentence on
Ich stieg dann aus dem Bus aus
Ich bin dann aus dem Bus ausgestiegen

This Stewart Lee appears to exist, but it is amazing people can write such rubbish.
Anyway, the Germans laugh at my jokes. Should I be worried? (Actually, when I first spent a year here, I got on much better in the six months in Berlin than the six in Franconia, from the witticism point of view. Germany is very varied, and Berlin humour is not the same as Bavarian).

]

[See Abiola Lapite's post at Foreign Dispatches, "Nice Theory, Shame about the Facts" (5/25/2006), for a congruent point of view with some excellent examples and clear-headed reasoning about them. I especially liked his closing point:

Usually, when people are looking for foreign languages to construct elaborate Sapir-Whorfian theses of difference around, they're prudent enough to opt for something so exotic and alien that no one is likely to call them on their claims: quite why Mr. Lee chose to use a language so similar to his own for such a purpose - and therefore so easy to check his claims against - is mystifying to me.

Perhaps it's that comedians are not used to being fact-checked. (But then how do we explain science journalists?) ]

Posted by Mark Liberman at 09:18 AM

May 23, 2006

Never mind the prose, plot and gnosticism...

Clay Jones, from the Free Lance-Star in Fredericksburg, Va., reminds us what the real issues are. [Hat tip to Cynthia McLemore.]

Posted by Mark Liberman at 12:16 PM

The most powerful person no one has never heard of

According to a U.S. News article by Chitra Ragavan ("Cheney's Guy", 5/29/06):

[David] Addington, says an admiring former White House official, is "the most powerful person no one has never heard of."

There's one too many negatives in that sentence, or one too few. The article's subhead says it the way the source meant it: "He's barely known outside Washington's corridors of power, but David Addington is the most powerful man you've never heard of."

But overnegation is easy to fail to miss, as we've (shockingly) often observed. [Update -- extra links added 5/18/2007.]

"Negated, or not" (1/21/2004)
"I challenge anyone to refute that this negative is not unnecessary" (1/21/2004)
"Challenge as negation" (1/23/2004)
"Too complex to avoid judgment?" (2/21/2004)
"Who is to be master?" (2/21/2004)
"On not avoiding negatives" (2/21/2004)
"Why are negations so easy to fail to miss?" (2/26/2004)
"Overnegation supererogation" (4/12/2004)
"Another overnegation" (4/27/2004)
"We cannot/must not understate/overstate" (5/26/2004)
"Another overnegation opportunity: yet vs. yet to" (5/31/2004)
"Overstating understatement" (6/22/2004)
"Nothing that cannot impede even by failure" (8/16/2004)
"Rumsfeld overnegates Powell, Powell uses 'fulsome' correctly" (11/16/2004)
"Overnegation alert" (1/11/2005)
"Still unpacked after all these years" (5/17/2005)
"Still unpacked: threat or menace?" (5/17/2005)
"The temptation of overnegation" (5/23/2005)
"Things that are rarely better than they normally are" (10/17/2005)
"Never anything but less than precise" (10/20/2005)
"Negation, over- and under-" (12/21/2005)
"On not emerging unscathed" (3/1/2006)
"Not doubting that the door could not be opened wider" (6/5/2006)
"Unlike no other" (7/27/2006)
"It's hard not to read this and not do a double-take" (8/1/2006)
"Been anything so long it looks like not to me" (8/3/2006)
"Overnegation as obfuscation" (8/9/2006)
"Scalar failure" (3/5/2007)
"Everyone was spared no mercy" (3/26/2007)
"Barely missing a chance to overanalyze" (4/1/2007)
"Total undernegation" (4/17/2007)

[U.S. News quote via Andrew Sullivan -- who was focused on the morality of torture, an incomparably more important issue, and apparently missed the extra negation. ]

Posted by Mark Liberman at 07:49 AM

A new rhetorical technique

Or rather, a new term for an old rhetorical technique. John Holbo, "The Mark Steyn Code", describes premature dejoculation: "stripping off the humor so you can paint it on again". As he explains,

Something like this problem actually arises in academic writing. In order to have something original to say, you pretend so-and-so didn’t see something about his own position which, plausibly, he really did. (Not a strawman argument, because you aren’t exactly attacking. A straight man argument. You need them to say something a bit obtuse, to make a space for your cleverness.)

Among the comments, Holbo hints at a relationship to Harold Bloom's theory of misreading, which is described by the Wikipedia as follows:

"Poetic influence, as I conceive it, is a variety of melancholy or the [Freudian] anxiety-principle." A new poet becomes inspired to write because he has read and admired the poetry of previous poets; but this admiration turns into resentment when the new poet discovers that these poets whom he idolized have already said everything he wishes to say. The poet becomes disappointed because he "cannot be Adam early in the morning. There have been too many Adams, and they have named everything."

In order to evade this psychological obstacle, the new poet must convince himself that previous poets have gone wrong somewhere and failed in their vision, thus leaving open the possibility that he may have something to add to the tradition after all.

Posted by Mark Liberman at 06:48 AM

Dan Brown, evangelist?

Laurie Goodstein, in the May 21 2006 NYT, explains that "It's Not Just a Movie, It's a Revelation (About the Audience)":

Even Tom Hanks, the lead actor, called the plot "scavenger-hunt-type nonsense." But it is doubtful the uproar will disappear.

The reason is that "The Da Vinci Code" is, in the sweep of Christian history, a historical marker — encapsulating in one muddled movie an era in which many Christian believers have assimilated a whole lot of new and unorthodox ideas, as well as half-truths and conspiracy thinking, into their faith, while still seeing it as Christianity. Call it Da Vinci Christianity.

In support of this view, Goodstein quotes an evangelical pollster, George Barna, as saying that

25 percent of those who had read the book said it helped them achieve personal growth or understanding.

More exactly, according to the article that Goodstein seems to have been working from, which is posted on the Barna Group's website as "Da Vinci Code Confirms Rather Than Changes People's Religious Views":

Among the adults who have read the entire book, one out of every four (24%) said the book was either “extremely,” “very,” or “somewhat” helpful in relation to their “personal spiritual growth or understanding.” That translates to about 11 million adults who consider The Da Vinci Code to have been a helpful spiritual document.

To place that figure in context, the Barna study revealed that another recently published popular novel about Jesus Christ – Christ the Lord: Out of Egypt, written by Anne Rice – was deemed to be spiritually helpful by 72% of its readers – three times the proportion who lauded Dan Brown’s book.

On the other hand, I imagine that about 300 times more people have read Dan Brown's book, yielding 100 times or so more spiritual influence.

Goodstein also quotes some of Barna's other findings, for example:

"Few people said that reading the book had actually changed any of their beliefs," he said. "That was only 5 percent. Most people said that it essentially reinforced what they believed coming into the book."

Again, 5% of 44 million is more than two million. "Changed any of their beliefs" is a pretty low standard, but comparing apples and oranges, we can observe that the Campus Crusade for Christ, the "largest evangelical organization in the United States" according to USA Today (cited here), says on its website that "Over the past five years, more than 37,900 students made a decision to become a Christian".
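For anyone who wants to check the back-of-the-envelope arithmetic, here is a minimal Python sketch. It simply takes the 44 million readership figure and the Barna percentages quoted above as given; the variable names are made up for illustration, and the 300-to-1 readership ratio between the two novels is only the guess made earlier, not a polled number.

```python
# Sanity check of the reader arithmetic, using only figures quoted in the post.
READERS = 44_000_000       # adults who read the whole book (figure used above)
HELPFUL_PCT = 24           # Barna: found the book spiritually helpful
CHANGED_PCT = 5            # Barna: changed at least one belief
RICE_HELPFUL_PCT = 72      # Barna: same question for Anne Rice's novel
READERSHIP_RATIO = 300     # ASSUMPTION: a guess at Brown-vs-Rice readership
CRUSADE_DECISIONS = 37_900 # Campus Crusade's stated five-year total


def share_of(total: int, pct: int) -> int:
    """Return pct percent of total, using integer arithmetic."""
    return total * pct // 100


helped = share_of(READERS, HELPFUL_PCT)    # roughly "11 million"
changed = share_of(READERS, CHANGED_PCT)   # "more than two million"

# Relative "spiritual influence": more readers, but a lower helpful rate.
relative_influence = READERSHIP_RATIO * HELPFUL_PCT / RICE_HELPFUL_PCT

# The apples-and-oranges comparison with Campus Crusade's five-year figure.
changed_vs_crusade = changed / CRUSADE_DECISIONS

print(f"helped: {helped:,}")
print(f"changed a belief: {changed:,}")
print(f"relative influence: {relative_influence:.0f}x")
print(f"changed vs. Crusade decisions: {changed_vs_crusade:.0f}x")
```

Under those assumptions the "about 11 million" and "more than two million" figures both check out, and the guess of 300 times the readership does come to 100 times the influence, since 24% is a third of 72%.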

The Barna Group article makes the same point in a different way:

“On the other hand,” [Barna] continued, “any book that alters one or more theological views among two million people is not to be dismissed lightly. That’s more people than will change any of their beliefs as a result of exposure to the teaching offered at all of the nation’s Christian churches combined during a typical week.”

Let me suggest a different point, which is plausible though not supported by any polling: TDVC has made more of an impact on Americans' models of literary style and plot construction than exposure to all the teaching offered in all of the nation's English courses over the course of a decade.

The Barna Group article continues:

The people most likely to have altered their religious views in response to the book’s content were Hispanics (17% of those who read the book), women (three times more likely than male readers to do so), and liberals (twice as likely as conservatives). Upscale adults were also much more likely than downscale individuals to shift their thinking based on the novel.

I'm almost ashamed to say that I still haven't read it. It's starting to feel like an unpleasant but unavoidable civic duty to do so.

Goodstein's "Da Vinci Christianity" thesis -- that in America, "many Christian believers have assimilated a whole lot of new and unorthodox ideas, as well as half-truths and conspiracy thinking, into their faith, while still seeing it as Christianity" -- seems consistent with the theme of Harold Bloom's 1993 work The American Religion, which the Publishers Weekly review on amazon.com describes this way:

Without knowing it, American worshipers have moved away from Christianity and now embrace pre-Christian Gnosticism. ... In his most controversial book to date, the Yale professor defines "the American Religion" as a Gnostic creed stressing knowledge of an inner self that leads to freedom from nature, time, history and other selves. Every American, he writes, assumes that God loves her or him in a personal, intimate way, and this trait is the bedrock of our national religion, a debased Gnosticism often tinged with selfishness. The core of this odd, ponderous book focuses on Pentecostals, Christian Scientists, Jehovah's Witnesses, Seventh-Day Adventists and especially Mormons and Southern Baptists--the two denominations Bloom believes will dominate future American religious life. He argues that mainline Protestants, Jews, Roman Catholics and secularists are also much more Gnostic than they realize. He identifies African-American religion, mystical and emotionally immediate, as a key element in the birth of our home-grown Gnosticism around 1800. Bloom is not likely to win many converts to his viewpoint.

I don't know -- with Laurie Goodstein on board, and Dan Brown at the throttle, that train seems to be picking up speed even without Bloom's name on it.

[Update: Laurie Goodstein is not the only one who apparently wasn't paying attention to Bloom's theories and arguments. Adam Gopnik, in his review "Renaissance Man" (New Yorker, Jan. 17, 2005 -- unfortunately not available on line), writes:

A cultural anthropologist, a hundred years from now, will doubtless find, in the unprecedented success of "The Da Vinci Code" during the time of a supposed religious revival, some clear sign that, in the Elvis mode, what a lot of Americans mean by spirituality is simply an immense openness to occult superstitions of all kinds.

I'm skeptical that this characterization of my fellow citizens is a fair one, but in any case, it's odd that Gopnik didn't make the connection to Bloom's presentation of a similar set of ideas, not a hundred years later, but a dozen years before. I guess that Bloom's interest in American vernacular religion seemed so out of the mainstream, in 1993, that even an omnivorous intellectual like Gopnik simply ignored his book.]

Posted by Mark Liberman at 05:50 AM

May 22, 2006

Euphony and usefulness

On the title page of a 1993 manuscript by John McCarthy and Alan Prince, "Prosodic Morphology I: Constraint Interaction and Satisfaction", was this epigraph:

"Bulgermander. It's more euphonious than Weldmander. Weldmander will never stick." William Weld, Governor of Massachusetts, on a redistricting plan he devised with Senate President William Bulger. Boston Globe, July 11, 1992.

Alas, neither Bulgermander nor Weldmander seems to have stuck, since Google's index is currently ignorant of both coinages. However, I was reminded of Weldmander's alleged euphony problems by something Bernard Lewis said in a recent interview about the neologism Islamdom:

In talking of the Christian world ... we use two terms: Christianity and Christendom. Christianity means a religion, in the strict sense of that word, a system of belief and worship and some clerical or ecclesiastical organization to go with it. If we say Christendom, we mean the entire civilization that grew up under the aegis of that religion, but also contains many elements that are not part of that religion, many elements that are even hostile to that religion. ... In talking of Islam, we use the same word in both senses, and this gives rise to considerable confusion and misunderstanding. There are many things that are described as part of Islam, which are indeed part of Islam, if we take the word as the equivalent of Christendom, but are very much not part of Islam — are even alien or hostile to Islam — if we take the word Islam as the equivalent of Christianity. ...

The late Marshall Hodgson, of the University of Chicago, in discussing this issue, suggested that we use the word Islamdom to describe the civilization. A good idea, but it didn't catch on, probably because it's so difficult to pronounce.

As McCarthy and Prince recognized, this sort of explanation makes a lot of sense. It's plausible that some words, like craptacular and truthiness, have got the right sort of mouth feel to make it. But is it really lack of euphony that dooms almost all made-up words to the fate of glemphy and blang? Anyhow, Islamdom is by no means a total failure, since it gets 19,700 web hits on Google.

What happened to Bulgermander? Well, I guess that it referred to a small, local event whose general category already had the similar coinage gerrymander, so that there was little reason to retain it or generalize it. And Billy Bulger became president of UMass in 1996, and his brother James J. "Whitey" Bulger is still at large. So when "Prosodic Morphology I" was posted to the Rutgers Optimality Archive, as John McCarthy explained (p.c.), "cooler heads prevailed" and the epigraph was omitted.

John added:

By the way, the word "Islamdom" brings to mind my first exposure to the word "Islamism" in my youth. Once a month or so, we were made to recite this prayer:

http://www.catholic.org/prayers/prayer.php?p=26

This prayer is a rich tapestry of intolerance, what with the pairing of idolatry and "Islamism" followed by a couple of sentences on the blood guilt of the Jews. The Protestants and the Orthodox get much milder treatment with their "erroneous opinions".

He's talking about this passage in the "Consecration of the Human Race to the Sacred Heart of Jesus":

You are King of all those who are still involved in the darkness of idolatry or of Islamism; refuse not to draw them all into the light and kingdom of God. Turn Your eyes of mercy toward the children of that race, once Your chosen people. Of old they called down upon themselves the Blood of the Savior; may it now descend upon them a laver of redemption and of life.

At least it's a bit more welcoming than the Saudi-textbook language documented by Nina Shea in yesterday's Washington Post...

The OED dates Islamism to the middle of the 18th century, as a term for the religion we would now call Islam, a word which did not come into use until quite a bit later:

1747 Gentl. Mag. 373 Never since the rise of Islamism [note So the Mahometans call their own religion] has our worship once varied.
1754 Phil. Trans. XLVIII. 755 Before the introduction of Islamism into Arabia.
1855 MILMAN Lat. Chr. IV. i. (1864) II. 169 To subdue to the faith of Islam. Ibid. 213 The potentates summoned by Mohammed himself to receive the doctrine of Islam.

That's clearly the usage in the prayer that John cited. However, the American Heritage dictionary gives Islamism a very different and much more recent gloss:

1. An Islamic revivalist movement, often characterized by moral conservatism, literalism, and the attempt to implement Islamic values in all spheres of life. 2. The religious faith, principles, or cause of Islam.

The Wikipedia article says that Islamism

attained its modern connotation in late 1970s French academia, thence to be loaned into English again, where it has largely displaced "Islamic fundamentalism."

Islamist, meaning "orthodox Muslim" or alternatively "one who is versed in Islamic studies", didn't arrive until the mid-19th and mid-20th centuries respectively:

1855 MILMAN Lat. Chr. XIV. iii. (1864) IX. 108 Caliphs who were, at least no longer, rigid Islamists.
1937 R. H. LOWIE Hist. Ethnol. Theory viii. 97 Westermarck is very widely read, and his original researches in Morocco, though only appraisable by Islamists, bear the earmarks of scholarship.

Again, the AHD offers a more contemporary sense of Islamist as the adjectival form of its noun Islamism, i.e. a certain type of Islamic fundamentalist.

Islamism and Islamist are certainly now successful words, as judged by their 2.4 million and 11.5 million Google hits, respectively. However, since "Islamic fundamentalism" also has 1.77 million Google hits, we can clearly reject the Wikipedia claim that Islamism "has largely displaced 'Islamic fundamentalism'" in English. In fact, on Google News as of yesterday afternoon, "Islamic fundamentalism" got 539 hits, and "Islamism" got only 259. On Yahoo News, "Islamic fundamentalism" got 154 hits, and "Islamism" got 78. I conclude from this that the terminological battle -- if it is one -- is still very much in progress.
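To make the comparison concrete, here is a rough Python sketch that turns the hit counts just quoted into the share of each two-term total that goes to "Islamism". The counts are the ones reported above; the dictionary layout and the function name are invented for illustration, and search-engine hit counts are of course crude estimates to begin with.

```python
# Share of "Islamism" against "Islamic fundamentalism", per source.
# Counts are the ones quoted in the post; the structure is illustrative only.
HITS = {
    "Google web":  {"Islamism": 2_400_000, "Islamic fundamentalism": 1_770_000},
    "Google News": {"Islamism": 259,       "Islamic fundamentalism": 539},
    "Yahoo News":  {"Islamism": 78,        "Islamic fundamentalism": 154},
}


def islamism_share(counts: dict) -> float:
    """Fraction of the two-term total accounted for by 'Islamism'."""
    total = counts["Islamism"] + counts["Islamic fundamentalism"]
    return counts["Islamism"] / total


shares = {source: islamism_share(counts) for source, counts in HITS.items()}

for source, share in shares.items():
    print(f"{source}: Islamism has {share:.1%} of the two-term total")
```

A term that had "largely displaced" its rival would be expected to take most of the total everywhere. On these counts it has a modest majority on the web at large and only about a third in the two news indexes.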

In a May 12, 2005 blog entry "Onward, Christianist soldiers?", Ruth Walker wrote that

Google has rounded up 631 hits for me for "Christianist," along with the query, "Did you mean to search for 'Christiano'?" [...]

I figure 631 hits for ‘Christianist’ is the Internet equivalent of seeing the first sliver of the sun coming up over the mountain in the morning.

Now, a bit more than a year later, {Christianist} gets 70,200 hits (though Helpful Google still asks "Did you mean: christiano"). Some of the increase has been promoted by Andrew Sullivan, who has started using the term frequently on his blog The Daily Dish. He has also adopted the nominal form Christianism, for example in his May 15, 2006 essay "My Problem with Christianism" (free version here for non-subscribers).

{Christianism} gets 592,000 Google hits, many more than Christianist -- but that seems to be because the term has a number of prior realms of use, both positive and negative.

Anyhow, my guess is that Islamdom as a term for "Islamic civilization" has failed, while Islamist as a term for "Islamic fundamentalist" has succeeded, not because of their relative euphony, but because of their relative usefulness.

[Update: Ben Zimmer writes

Marshall Hodgson introduced a number of neologisms in The Venture of Islam. He particularly liked the "-ate" suffix for nouns and attributives, as in "Islamicate", "Persianate", and "agrarianate". Other Hodgsonisms include "citied", "technicalistic", and "shari'ah-minded". Many of these terms continue to be used by Islamicists (as opposed to Islamists!) who begin their studies with Hodgson's three-volume classic.

]

[Update #2: I neglected to note (because I had forgotten) that William Safire wrote about Christianism and Christianist in his column of May 15, 2005 -- and I didn't find it earlier, because Google doesn't index behind the Times Select wall. I happened on it in a post by Tristero at Hullabaloo, which in turn I happened on because in an adjacent post he cited a recent post of mine on the striking similarities between a book review by Mark Steyn and a blog post by Geoff Pullum. ]

Posted by Mark Liberman at 05:41 AM

May 21, 2006

Pirahã channels

According to Daniel L. Everett, "Cultural Constraints on Grammar and Cognition in Pirahã: Another Look at the Design Features of Human Language", Current Anthropology, Volume 46, Number 4, August-October 2005 (preprint for those without a subscription):

The Pirahã people communicate almost as much by singing, whistling, and humming as they do using consonants and vowels.

Dan explains the contextual functions of these "channels" as follows:

Channel	Functions
a. Hum speech	Disguise; Privacy; Intimacy; Talk when mouth is full; Child language acquisition relation
b. Yell speech	Long distance; Rainy days; Most frequent use – between huts & across river
c. Musical speech ('big jaw')	New information; Spiritual communication; Dancing, flirtation. Women produce this in informant sessions more naturally than men. Women's musical speech shows much greater separation of high and low tones, greater volume.
d. Whistle speech (sour or 'pucker mouth' -- same root as 'to kiss' or shape of mouth after eating lemon)	Hunting; Men-only (as in ALL whistle speeches!); One unusual melody used for aggressive play

Lots of other cultures have ways of expressing speech as whistling, drumming, or chanting. What seems to be unusual about the Pirahã is the relatively large role of these other "channels" (as Dan calls them) in everyday life. As Dan suggests, this may be connected to the fact that Pirahã has a small number of consonant-vowel distinctions

The consonants of Pirahã are /p b t k g ' h/ and, in men's speech only, /s/, and the vowels are /i a o/

and a relatively complex system of syllable-weight, stress and tone. Whistling and humming preserve the prosodic distinctions and blur or eliminate the distinctions among different consonants and vowels. Thus it'll be easier to understand what someone is humming (for example) in a language where there's more information in timing, stress and tone, and less information in consonant and vowel distinctions.

But in my opinion, the most interesting aspect of this situation is what Dan calls "the sloppy phoneme effect", which allows for a "tremendous amount of variation among consonants" in ordinary speech. For example, there's apparently free substitution of /'/ (glottal stop), /p/ and /k/, as in the word for "head", which can be any of:

'apapaí kapapaí papapaí 'a'a'aí kakakaí

(and so on) but not

*tapapaí *tatataí *bababaí *gagagaí *gagagaí *aaaí

or other examples where voiced consonants or no consonants are substituted. Some (though not all) of this variation is equivalent to preserving the syllable-weight classes involved in the Pirahã stress rule, and thus sets up equivalence-classes of words in ordinary speech similar to the equivalence-classes of hummed or whistled "channels".
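
To make the size of this equivalence class concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each of the three consonant slots in the word for "head" can be filled independently by glottal stop, /p/ or /k/; the description above says only that the substitution is "apparently free", so the full independence is an assumption, and the names FREE_SET and head_variants are made up for this example.

```python
from itertools import product

# The three consonants said to substitute freely for one another:
# glottal stop (written '), /p/ and /k/.
FREE_SET = ["'", "p", "k"]


def head_variants():
    """Enumerate C-a-C-a-C-aí forms, assuming (hypothetically) that
    every consonant slot varies independently over FREE_SET."""
    return [
        "{}a{}a{}aí".format(c1, c2, c3)
        for c1, c2, c3 in product(FREE_SET, repeat=3)
    ]


variants = head_variants()
print(len(variants))           # 3 * 3 * 3 = 27 candidate forms
print("kapapaí" in variants)   # one of the forms listed as possible
print("tapapaí" in variants)   # starred form: /t/ is outside the set
```

Under that assumption the five forms cited as possible all fall among the 27 candidates, and none of the starred forms does, since those involve /t/, a voiced consonant, or no consonant at all.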

[A side note for the linguists among you -- I don't know why Dan describes things this way, rather than saying that glottal stop, /k/ and /p/ are actually just allophones of a single underlying phoneme. Perhaps the statistical variation is lexically modulated in a way that makes clear that one word is basically /kaka/ while another is basically /papa/, even though both can in principle be any of [kaka] [kapa] [paka] [papa] (adding in the glottal-stop variants as appropriate)? That would be fascinating but very unexpected.]

To make all this less abstract, here are a couple of examples. First, an .mp3 recording of two boys conversing in musical speech ("big jaw"), from the "enhancements" of the cited Current Anthropology paper. I believe that this is roughly a "hey guess what I did today" conversation, but unfortunately I don't have a transcript, a rendition in normal speaking mode, or any further explanation. Nor do I have any examples of "hum speech" to show you.

Here's the example of whistled speech that Dan gives in the cited article, with spoken and whistled version performed by Dan himself (he kindly sent it from a hotel room in Brazil, where he was on his way to another summer of field work). First the basic sentence, kái'ihí'ao 'aagá gáhí "there is a paca" (audio link):

(syllables)	kái	'i	hí	'ao	'aa	gá	gái	hí
(tones)	H-L	L	H	H+L	L	H	H-L	H
(syllable weight)
(stress)	↑				↑		↑
(glosses)	paca			possible	exist-be		there
(translation)	"there is a paca"

(I guess that this is the "hunting" kind of whistle speech, rather than the "aggressive play" kind.)

Here's a picture showing the pitch contour, spectrogram and waveform of the spoken version (note that I've used "?" to mark glottal stop in this case -- sorry for the lack of consistency):

Here's an audio link to Dan's whistled version. The picture below shows a "narrow-band" spectrogram, to make the pitch of the whistling plain, along with the waveform.

(The whistled version is somewhat longer, in this case -- 2.34 seconds versus 1.759 seconds, or about a third longer. This might just be because Dan is a less-practiced whistler than speaker.)

For any and all of the alternative-channel versions of speech across the world's languages, it would be nice to have a collection of analyzed examples that was large enough for some quantitative analysis. Depending on the complexity of the system and the amount of variability in it, that would be several thousand phrases at a minimum, and of course the more the better. This may exist, at least in museum or library archives, for some of the cases (e.g. Vedic chanting and Yoruba drumming), but I don't know of any published (or even accessibly unpublished) examples.

Posted by Mark Liberman at 08:52 AM

May 20, 2006

The dawn of a new era

Roger Shuy touts the new automated phone system at Language Log Plaza:

One day we'll even perfect the way to have recursive menu items redirect you to one of the menus you've already traversed. Now won't that be a great day? We're so proud!

A fabulous day indeed. Because then our phone system will be A HUMAN LANGUAGE. Recursivity ruulz!

Posted by Arnold Zwicky at 06:32 PM

We proudly announce our new telephone system

We here at Language Log Plaza recognize the need for huge operations like ours to keep pace with the times. Being simple folks, we tend to answer our own phones when they ring. But no more nice guy! From now on if you have serious language problems and try to call our number, you will be greeted by our spanking new automated telephone answering service. Here's what you can look forward to:

(telephone ringing at Language Log Plaza)

Voice: Welcome to Language Log. Your call is important to us and we want to be sure that you talk with the right expert here. Please listen carefully to the following menu:

If you wish to speak with a phonologist, punch 1--but be sure to speak very clearly.

If you is trying to reach one of our grammarian which are here right now, punch 2.

If you wish to speak to a semanticist, punch 3, if you know what I mean.

If you wish to speak to a sociolinguist, punch any variety of numbers.

If you wish to speak to a computational linguist, punch 110100110001110000001100101111010.

If you wish to speak to The Director, punch star.

If you feel that you really need a psycholinguist, hang up, dial 911.

If you wish to make a complaint, punch 0 and tell it to the operator.

If you have the wrong number, hang up and redial the same number.

See how easy this is? Give us a call and see how Language Log can be every bit as efficient as the other large companies and government offices.

Note: We apologize for the fact that so far we haven't figured out how to construct an embedded menu, where our first menu transfers you to another menu, which then transfers you to still another menu. Nor have we developed ways to transfer you to a human operator, who will put you on hold for the standard and obligatory 30 minutes. Our engineers are working on these problems, however, and we plan to have these modern, automated systems in operation within a few months. One day we'll even perfect the way to have recursive menu items redirect you to one of the menus you've already traversed. Now won't that be a great day? We're so proud!

Posted by Roger Shuy at 03:17 PM

The Arabs own him

Half-listening to NPR's Weekend Edition, I heard a man talking about current prospects in the horse-racing world, and he said of one horse (I forget which):

He's a very expensive horse. The Arabs own him.

I thought, what, all of them? The entire population of the continuous region of predominantly Arab nations extending from Mesopotamia to Western Sahara? I suppose he meant that the horse was owned by a consortium of super-rich sheikhs from Saudi Arabia or the UAE. [Update: Quite possibly the horse he was talking about was Discreet Cat, which won the $2 million United Arab Emirates Derby last March 25. Discreet Cat is owned by Godolphin, the racing stable of Sheikh Mohammed bin Rashid al Maktoum of Dubai.] The remark sounded very strange to me. If a group of South American investors owned a racehorse, would he say "The Latin Americans own him?" If it was Don Ho and some well-heeled co-investors from some of the Pacific Islands, would he say, "The Polynesians own him?" It seems unlikely. The Arabs seem to have a much more sharp and unified profile than other widely spread transnational ethnic or linguistic groups. When one of them gets in the news, the news is more likely to be attributed to the entire group than it would be if the same thing had been done by a member of some less salient group.

Posted by Geoffrey K. Pullum at 10:16 AM

Attorney General caught in linguistic snare!

Confusion reigned on Friday over the Senate vote on separate amendments to the immigration reform bill declaring English the "national language" on the one hand and the "common and unifying language" on the other. Sen. James Inhofe (R-OK), sponsor of the "national language" amendment, belittled the softer alternative, saying "You can't have it both ways." But White House spokesman Tony Snow said President Bush supports both amendments, agreeing with the two dozen senators (on both sides of the aisle) who voted for the two measures.

Unfortunately, no one filled in Attorney General Alberto Gonzales, who was in Houston meeting with state and local officials about the enforcement of immigration laws. "The president has never supported making English the national language," Gonzales said after the meeting. "I don't see the need to have legislation or a law that says English is going to be the national language." The White House was forced to backpedal from Gonzales' remarks later in the day, explaining that Bush doesn't believe English should be the "official" language, though "national" is OK. White House spokeswoman Dana Perino clarified:

"The attorney general got caught in a linguistic snare. He took 'national' language to mean what we describe as 'official' language. We have no problem in identifying English, our common linguistic currency as a national language; we also view it more expansively as the 'common and unifying language.'"

Everyone clear now? The word from the White House is: "national" good, "common and unifying" also good, "official" bad. Even if the binding force of the "national language" amendment is tantamount to treating English as "official," that word is still apparently off-limits. Too bad no one told the Attorney General about this fine-grained distinction.

Posted by Benjamin Zimmer at 10:07 AM

Better late

I apologize to our UK readers for not posting this notice before the premiere at SOAS on May 17 of The Last Speakers,

[a] documentary film on endangered languages [that] shows the work of David Harrison and Gregory Anderson on the language of the Ös people who live in log-cabin villages in central Siberia, 3,500 km east of Moscow.

David Harrison is based at Swarthmore, and there will surely be some U.S. showings, which I'll try to write about before they happen rather than after. (And maybe there'll be an online version at some point, to reach the many interested people who won't be lucky enough to be in the audiences?)

Posted by Mark Liberman at 10:07 AM

May 19, 2006

1421 Update

A couple of years ago I wrote about the ridiculous linguistic evidence put forward for the claim in the book 1421 that the Chinese fleet of 1421 reached the Americas. Well, things aren't getting any better. Not only have the 1421 people not answered any of the criticism of their argument, but the new stuff on their web site is if anything even worse than the old.

The web site now includes an interactive map of British Columbia. The red dots represent putative pieces of evidence. Click on one and in theory (but only part of the time, in practice) a description of the evidence appears below. Clicking on the northernmost dot, which looks to be around Dease Lake, produces this:

Inuit = Yin uit (people from Yin) (Martin Tai).

My previous post dealt with the problems with this equation, but its location on the map really brings out the utter incompetence of these "researchers". The vicinity of the dot is not anywhere near Inuit territory. No part of British Columbia is Inuit territory, nor any part of the Yukon except for a strip in the far north, right along the Arctic Ocean. The nearest Inuit would be about 1,000km away. On the page devoted to linguistic evidence, however, the Inuit are said to be found in Vancouver, which is about 700km South of Dease Lake, not to mention nearly 2,000km from their actual location. We're supposed to take seriously the geographical claims of people who haven't a clue as to where the Inuit live and can't tell the difference between Vancouver and Dease Lake?

P.S.: There's a web site devoted to debunking 1421. Check out 1421exposed.

Posted by Bill Poser at 08:40 PM

Anti-Elmorese

According to an article by Linda Greenhouse in today's NYT ("Second Hearing on Detroit Drug-Search Case Shows Deep Divisions on Supreme Court"),

Nonetheless, Justice Breyer proceeded to make it clear that he remained unpersuaded by Mr. Baughman's argument that the Michigan Court of Appeals was correct in refusing to exclude from Booker T. Hudson's trial the drugs the police found when they executed a search warrant by bursting into his home without knocking or waiting for him to open the apparently unlocked door.

There can be no question that both Justice Breyer and Ms. Greenhouse are users of a human language, by the recursion-based standards of Hauser, Chomsky and Fitch.

In fact, this example is pretty much the opposite end of the syntactic stick from the flat structures used by the Pirahã and by Elmore Leonard characters. Using the crude metric of clausal depth featured in my study of secular trends in presidential embedding, Ms. Greenhouse's sentence weighs in with a truly impressive (word-wise) average embedding depth of 5.98, and a spectacular peak of 12. Without even one comma or other internal punctuation.


 0 [Nonetheless, 
 0 [Justice Breyer proceeded 
 1   [to make it clear 
 2      [that he remained 
 3         [unpersuaded by Mr. Baughman's argument 
 4            [that the Michigan Court of Appeals was correct 
 5               [in refusing 
 6                  [to exclude from Booker T. Hudson's trial the drugs 
 7                     [the police found 
 8                        [when they executed a search warrant 
 9                           [by bursting into his home 
 10                             [without knocking 
 10                                   or waiting 
 11                                [for him to open the 
 12                                   [apparently unlocked]
 11                                 door.]]]]]]]]]]]

(The quantification of depth has been revised to reflect Geoff Pullum's judgment that I was wrong to let "remained unpersuaded" go by without an increment of embedding:

"Remained" is definitely a complement-taking verb; and "unpersuaded by Mr. Baughman's argument..." is definitely a passive clause. So you have undercounted Greenhouse's astonishing hypotacticity: she hits 12.

)
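
For readers who want to check the arithmetic, here is a minimal sketch in Python of how the word-wise figures can be reproduced from the listing above. It rests on two assumptions that are mine rather than anything stated in the post: a "word" is any whitespace-delimited token, and every word takes the depth number printed at the start of its line. The names SEGMENTS and embedding_stats are made up for the example.

```python
# (depth, text) pairs transcribed from the bracketed listing above.
SEGMENTS = [
    (0, "Nonetheless,"),
    (0, "Justice Breyer proceeded"),
    (1, "to make it clear"),
    (2, "that he remained"),
    (3, "unpersuaded by Mr. Baughman's argument"),
    (4, "that the Michigan Court of Appeals was correct"),
    (5, "in refusing"),
    (6, "to exclude from Booker T. Hudson's trial the drugs"),
    (7, "the police found"),
    (8, "when they executed a search warrant"),
    (9, "by bursting into his home"),
    (10, "without knocking"),
    (10, "or waiting"),
    (11, "for him to open the"),
    (12, "apparently unlocked"),
    (11, "door."),
]


def embedding_stats(segments):
    """Return (word count, word-weighted mean depth, peak depth)."""
    # One depth value per whitespace-delimited word.
    depths = [depth for depth, text in segments for _ in text.split()]
    return len(depths), sum(depths) / len(depths), max(depths)


words, mean_depth, peak = embedding_stats(SEGMENTS)
print(words, round(mean_depth, 2), peak)  # 61 5.98 12
```

On those assumptions the sentence has 61 words whose depths sum to 365, which gives the 5.98 average and the peak of 12 quoted above.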

Of course, this isn't the kind of center embedding that tamarins and starlings have been tested on (short versions of); it's almost all embedding of the type that linguists call "right branching" (because at each iteration, it's the right-hand constituent that is subdivided further). But still.

Gene Buckley, who sent in the link, points out that it's not just the depth, it's the negation:

It was only some top-down knowledge of Breyer and the usual votes on cases like this that permitted me to understand the sentence, at least when I was still on my first cup of coffee. The string of predicates "unpersuaded... correct... refusing... exclude...", three of which have some kind of negative meaning, was just too much. What follows is plenty complex as well!

Yes, don't forget "without knocking"!

I've noticed recently that readers of Language Log sometimes misconstrue my opinions, so I'll be explicit: no criticism of Justice Breyer or Linda Greenhouse is intended. I'm proud to be a member of a species that can think and write like that, when it wants to. Nor, of course, do I mean any disrespect towards people who talk like Elmore Leonard characters, with hardly any embedding at all. I'm one of them, sometimes, and not ashamed of it either. It takes all kinds.

Posted by Mark Liberman at 04:05 PM

Hutchisonian science

As Mark Liberman notes in an update to his post, "Request for action from the AAA," Inside Higher Ed now reports that the Senate Committee on Commerce, Science and Transportation has rebuffed Sen. Kay Bailey Hutchison's proposal to cut (or at least drastically deprioritize) social science funding in the NSF budget. The committee's compromise evidently involves a call for "increased support for physical science research," so presumably other disciplines would continue to be supported under the committee's proposal but not at increased levels. We'll have to wait for the text of the revised bill before we know for sure what specific recommendations the committee has made.

I was particularly struck by one paragraph in the Inside Higher Ed article:

Hutchison reiterated her feeling that Congress should "focus on science and technology" because "we are responding to a crisis in our country." Hutchison added that she is "not against social sciences being part of the NSF budget," but that "I want to make sure we focus on the mission we are after." Hutchison appeared to be using a broad definition of social science when she noted that biology, geology, economics, and archaeology are worthy pursuits, but can often stray from the innovation and competitiveness path.

Attention biologists and geologists! According to Hutchisonian definitions, you are now social scientists!

It's downright alarming that someone with such influence on the funding of the NSF and other research-related agencies doesn't seem to have a clue what counts as "physical" or "natural" science and what counts as "social science." One can only wonder why Hutchison felt the need to lump biology and geology in with disciplines that she regards as somehow peripheral to the "mission" of the NSF, like economics and archaeology. My best guess is that Hutchison's comment was intended as an indirect swipe at researchers doing work on such disfavored topics as evolution and global warming. This would be entirely in keeping with what Chris Mooney has identified as the Republican war on science.

Stay tuned for further senatorial edicts from the once august (now increasingly reactionary) body. I await the senator who announces, "I can't define hard science, but I know it when I see it!"

[Update #1: More coverage from ResearchResearch ("Hutchison amendment tweaked, social scientists relieved"):

Sen. Kay Bailey Hutchison, R-TX, had originally sought an amendment designed to ensure a focus upon science and technology fields, but later reached a compromise with Sen. Frank Lautenberg, D-NJ, who objected on the grounds that such language could stifle social science. The original amendment would have instructed NSF to give priority to research grants and activities that contribute specifically to "physical science, technology, engineering, or mathematics," but Hutchison explained that she agreed when Lautenberg added the words "innovativeness" and "natural science" to her proposal. "It's all good. We are really happy," said Barbara Wanchisen, executive director of the Federation of Behavioral, Psychological and Cognitive Sciences.

Nevertheless, Hutchison stuck by the rationale of her original wording. "The awarding of tax money should be to further our goal of innovation and competitiveness in math and science," she told the committee. But, the senator clarified that she is not against the social sciences being part of NSF's mandate and mission. "I think that biology, economics, geology, geography, archeology, are all worthy of our study and there are some great studies going on in the fields of sociology," she said.

So apparently all disciplines would be eligible for prioritized funding, as long as they contribute to "innovativeness." Innovativeness is, of course, in the eye of the beholder. Among Hutchison's examples of an "inappropriately supported" project is a study of how global and national economic changes are affecting urban women workers in Bangladesh. Such a topic is apparently not "innovative" and therefore unworthy of taxpayer funding from Hutchison's perspective. Will NSF-supported researchers now be subject to an "innovativeness" test to decide who gets to move to the front of the funding line?]

[Update #2: Based on the full quote from Hutchison provided by the ResearchResearch article, it may have been unfair for the Inside Higher Ed writer to say that she "appeared to be using a broad definition of social science" (and in turn, unfair of me to say that she called biology and geology "social science"). We know that she said she is "not against social sciences being part of the NSF budget" and further clarified, "I think that biology, economics, geology, geography, archeology, are all worthy of our study and there are some great studies going on in the fields of sociology." So to give her the benefit of the doubt, she may have been trying to recognize the significance of research conducted in a diverse range of fields, from economics and archaeology to biology and geology, rather than simply labeling them all as "social science." We'll know better what she was driving at once we see the full transcript of her remarks.

Also, Mark Liberman observes that a focus on "innovativeness" is nothing new for the NSF:

Having served on many NSF review panels, I can say that NSF proposals are already evaluated for innovativeness, both in terms of intellectual merit and in terms of broader social implications.
The word "innovation" occurs in 23,100 pages on the NSF web site; "innovative" on 7,980 pages; and "innovativeness" on 77. I expect that NSF will respond by increasing that last number. ]

[Update #3: Here's the agreed-upon wording, according to ScienceNOW:

After highlighting the importance of the "physical and natural sciences, technology, engineering, and mathematics," the amendment explains that "nothing in this section shall be construed to restrict or bias the grant selection process against funding other areas of research deemed by the foundation to be consistent with its mandate, nor to change the core mission of the foundation." ]

Posted by Benjamin Zimmer at 01:55 PM

Homo journalisticus

The story in National Geographic News on the putty-nosed monkeys and their combination pyow-hack calls (acknowledgment to Evan Bradley, who made my day slightly sadder by pointing it out to me) is worse than the one David Beaver cites. It is headed:

Monkeys use "sentences", study suggests

Study suggests nothing of the kind, of course.

In fact the story itself reports that the author of the scientific report "cautions that analogies to human language are not always helpful in understanding the utterances of animals." Quite so. I guess content means nothing to the headline writers where science journalism is concerned.

I have no doubt that for a long, long time we shall continue to see stories recognizing language use in dumb animals and birds sitting alongside stories about it being absent in various kinds of humans (Bushmen, undergraduates, primitive tribes, bureaucrats, urban blacks, Danes, male scions of the Bush family, teenagers, Southerners, university administrators, and other despised groups). Because, while it is completely unclear whether the roots of language are innate, there is overwhelming evidence of an innate drive in Homo journalisticus to write stories about talking or understanding being manifested in chimps, gorillas, orangutans, baboons, monkeys, tamarins, bees, dolphins, whales, parrots, starlings, dogs, bats (yes, bats — see below) and I don't know what will be next but perhaps donkeys. And the subspecies Homo journalisticus subeditorialis clearly has a built-in drive to write wild and goofy headlines for such stories.

I am not exaggerating. You might want to look at Holy Bat Chat, Batgirl! Medic Is Cracking Bat Code, about Barbara French, who "has decoded a basic repertoire of bat calls and deciphered the social context in which they are used," before you accuse me of exaggerating. Check this bit, which Mark Liberman pointed out really needed to be quoted:

French believes the animals are using sounds with syntax. To test the hypothesis French, [her collaborator the neurophysiologist George] Pollak, and one of his graduate students are cataloging all the calls, and analyzing the acoustic structure of each, to study how sounds are manipulated to produce different meanings.

During mating season, for example, males produce a "territorial announcement buzz" to woo females. The same sound, albeit at a different intensity and pace, seems to be used to ward off competing males. "It's the difference between saying something sweetly, and screaming those same words — they could have very different meanings," said French.

That's right: they do an intensified angry buzzing sound to make rival horny male bats buzz off, and it's reported as sounds with syntax. Bzzzzzzz! Leave my woman alone! I mean, hello?

You might also want to look at Monkeys Have Accents, Japanese Study Finds, before you suggest that I am overstating. There it is reported that "primate researchers have discovered that Japanese macaques can acquire different accents based on where they live — just like humans." Just like humans! No, I'm not exaggerating.

Why this drive toward drivel in linguistic science reporting? Is there a survival advantage conferred by some trait manifesting itself in credulity concerning animal communication? Further research is needed.

Posted by Geoffrey K. Pullum at 10:41 AM

What does "official" mean?

The problem with nitpicking over whether a particular piece of legislation makes English "official" is that being "official" has no well-defined meaning. Some countries make a distinction in their legislation. For example, Switzerland has three "official" languages (French, German, and Italian) but four "national" languages (the foregoing plus Romantsch). Swiss legislation specifies various ways in which a language that is merely "national" rather than "official", in practice just Romantsch, has a somewhat second class status. The distinction made in Switzerland, however, is not necessarily carried over in other uses of terms like "official" and "national" elsewhere. In the Northwest Territories, for example, several native languages have "official" status along with English and French, but their status is in fact not the same. There is, for example, no legal right to receive one's education in a native language.

There are uses of "official language" that are apparently outside the scope of the Inhofe amendment. It evidently does not envision denying the status of legal instrument to documents written in languages other than English. But denying US citizens the legal right to receive government services in any language other than English certainly comes close enough to what "official" means in many contexts for it to be quite legitimate to say that the Inhofe amendment makes English the official language of the United States.

Posted by Bill Poser at 03:15 AM

English: official, national, common, unifying, or other?

Has the United States Senate really voted for "official English," as Bill Poser writes? The situation's a bit more complicated than that, as suggested by the AP headline, "Senate sends mixed signals on English." Senators have not actually weighed in on whether English should be made the nation's "official" language, though a House bill along these lines is said to have strong support. Rather, the Senate considered two amendments to the immigration reform act which proposed modifiers to "English" not quite as forceful as that magic word "official."

The first amendment, sponsored by Sen. James Inhofe (R-OK), is intended to "preserve and enhance the role of English as the national language of the United States of America." The amendment passed by an overwhelming vote of 62 to 35, with 10 Democrats joining 52 Republicans in support. This is evidently the first time in history that the Senate has identified English as the "national" language, if not the "official" language. According to various news accounts, Inhofe had originally wanted to use the word "official" but changed it to "national" to draw more support for the amendment. The Chicago Tribune reports that for Official English proponents, the choice of adjective didn't actually matter very much. Tim Schultz, director of government relations of U.S. English Inc., is quoted as saying, "We don't care. We think it's basically the same thing. It's a 'You say potato, I say potahto' kind of thing."

For those on both sides of the debate, what was clearly more important than the cosmetic choice of adjective was the amendment's "teeth": its unprecedented insistence that unless otherwise authorized "no person has a right, entitlement or claim" to obtain government services in a language other than English. Many Democrats were harsh in their assessment: Minority Leader Harry Reid said, "While the intent may not be there, I really believe this amendment is racist. I think it's directed basically at people who speak Spanish."

Immediately after the vote on Inhofe's amendment came another vote, this time for a less binding amendment to the immigration bill sponsored by Sen. Ken Salazar (D-CO). The purpose of Salazar's amendment was "to declare that English is the common and unifying language of the United States, and to preserve and enhance the role of the English language." Inhofe scoffed at the followup amendment, saying, "You can't have it both ways." Apparently, to Inhofe's way of thinking, English can be either "national" or "common and unifying," but not both. From an outsider's perspective this might seem slightly insane, but it makes perfect sense in the context of congressional party politics. The "common and unifying" measure was, to Inhofe, a weak Democratic response to declaring English the "national" language, since "national" is now supposed to be taken as a code word for "official," softened to placate moderates.

The moderates, however, did want to "have it both ways." A total of 23 senators who voted for Inhofe's amendment (roughly split between Republicans and Democrats) crossed over and voted for Salazar's amendment too. This allowed Salazar's amendment to pass by a margin of 58 to 39. So the Senate has now told us that English should be recognized as "national," "common," and "unifying," though less than two dozen senators liked all three of those adjectives.

Here's the breakdown of the vote so American readers can know on which side of the adjectival divide their elected representatives stand:

English is national, not common & unifying (39: 39R, 0D)	English is common & unifying, not national (35: 1R, 33D, 1I)	English is national and common & unifying (23: 13R, 10D)
Alexander (R-TN)	Akaka (D-HI)	Baucus (D-MT)
Allard (R-CO)	Bayh (D-IN)	Brownback (R-KS)
Allen (R-VA)	Biden (D-DE)	Byrd (D-WV)
Bennett (R-UT)	Bingaman (D-NM)	Carper (D-DE)
Bond (R-MO)	Boxer (D-CA)	Chafee (R-RI)
Burns (R-MT)	Cantwell (D-WA)	Coleman (R-MN)
Burr (R-NC)	Clinton (D-NY)	Collins (R-ME)
Chambliss (R-GA)	Dayton (D-MN)	Conrad (D-ND)
Coburn (R-OK)	Dodd (D-CT)	DeWine (R-OH)
Cochran (R-MS)	Domenici (R-NM)	Dorgan (D-ND)
Cornyn (R-TX)	Durbin (D-IL)	Graham (R-SC)
Craig (R-ID)	Feingold (D-WI)	Hagel (R-NE)
Crapo (R-ID)	Feinstein (D-CA)	Johnson (D-SD)
DeMint (R-SC)	Harkin (D-IA)	Lincoln (D-AR)
Dole (R-NC)	Inouye (D-HI)	McCain (R-AZ)
Ensign (R-NV)	Jeffords (I-VT)	Murkowski (R-AK)
Enzi (R-WY)	Kennedy (D-MA)	Nelson (D-FL)
Frist (R-TN)	Kerry (D-MA)	Nelson (D-NE)
Grassley (R-IA)	Kohl (D-WI)	Pryor (D-AR)
Gregg (R-NH)	Landrieu (D-LA)	Snowe (R-ME)
Hatch (R-UT)	Lautenberg (D-NJ)	Specter (R-PA)
Hutchison (R-TX)	Leahy (D-VT)	Voinovich (R-OH)
Inhofe (R-OK)	Levin (D-MI)	Warner (R-VA)
Isakson (R-GA)	Lieberman (D-CT)
Kyl (R-AZ)	Menendez (D-NJ)
Lott (R-MS)	Mikulski (D-MD)
Lugar (R-IN)	Murray (D-WA)
McConnell (R-KY)	Obama (D-IL)
Roberts (R-KS)	Reed (D-RI)
Santorum (R-PA)	Reid (D-NV)
Sessions (R-AL)	Salazar (D-CO)
Shelby (R-AL)	Sarbanes (D-MD)
Smith (R-OR)	Schumer (D-NY)
Stevens (R-AK)	Stabenow (D-MI)
Sununu (R-NH)	Wyden (D-OR)
Talent (R-MO)
Thomas (R-WY)
Thune (R-SD)
Vitter (R-LA)

Not voting (3: 2R, 1D)
Bunning (R-KY)
Martinez (R-FL)
Rockefeller (D-WV)

[Late update: Initial reports of the Senate vote had Mary Landrieu (D-LA) voting for the Inhofe amendment, but the final tally shows her in the "nay" column.]
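
The two vote totals follow directly from the three-way breakdown, and the arithmetic can be checked in a few lines. This is only a sanity check on the counts given in the table headers; the variable names are made up for the illustration.

```python
# Counts taken from the table headers above (final tally).
national_only = 39   # yes on Inhofe's amendment, no on Salazar's
common_only = 35     # no on Inhofe's amendment, yes on Salazar's
both = 23            # yes on both amendments
not_voting = 3

# Each result is a (yeas, nays) pair.
inhofe = (national_only + both, common_only)
salazar = (common_only + both, national_only)
total = national_only + common_only + both + not_voting

print("Inhofe (national):", inhofe)
print("Salazar (common & unifying):", salazar)
print("Senators accounted for:", total)
```

The "both" group is counted once in each amendment's yeas, which is why the two winning margins overlap.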

Posted by Benjamin Zimmer at 01:43 AM

Senate votes for official English

The US Senate achieved a linguistic nadir today in approving 63-34 an amendment by Oklahoma Senator James Inhofe to S. 2611, the Comprehensive Immigration Reform Act of 2006, that makes English the official language of the United States. The main part reads:

The Government of the United States shall preserve and enhance the role of English as the national language of the United States of America. Unless specifically stated in applicable law, no person has a right, entitlement, or claim to have the Government of the United States or any of its officials or representatives act, communicate, perform or provide services, or provide materials in any language other than English. If exceptions are made, that does not create a legal entitlement to additional services in that language or any language other than English. If any forms are issued by the Federal Government in a language other than English (or such forms are completed in a language other than English), the English language version of the form is the sole authority for all legal purposes. [source]

Unless and until passed by the House of Representatives, this is not the law, but as the House is also dominated by Republicans, it may well. The vote was not strictly partisan - 13 Democrats voted for it - but the opposition consisted entirely of Democrats with the exception of New Mexico Senator Pete Domenici. The roll call can be found here.

One justification for this is that eliminating services in languages other than English will save $1 to $2 billion (Inhofe speech). That's at most 0.7% of the cost thus far of the invasion of Iraq. In both cases, the dollar figures don't include the human cost. A second is that it will encourage immigrants to learn English, as if they needed encouragement. The myth that immigrants are unwilling to learn English was debunked so long ago you'd think that people would be embarrassed to mention it. The third argument, believe it or not, is that making English official will have a unifying effect! That's rich. Depriving Spanish-speakers in the Southwest and Puerto Rico and American Indians and Eskimos of services in their own languages is obviously a great way to make them feel wanted.

Posted by Bill Poser at 01:20 AM

Parataxis in Pirahã

Take a look at this quote from Elmore Leonard's LaBrava:

"What're you having, conch? You ever see it they take it out of the shell? You wouldn't eat it."

The speaker is Maurice Zola, "five-five, weighed about one-fifteen and spoke with a soft urban-south accent that had wise-guy overtones, decades of street-corner styles blended and delivered, right or wrong, with casual authority".

"Wise-guy overtones", check; "casual authority", OK; but is this guy speaking a human language? Maybe not, according to Marc Hauser, Noam Chomsky and Tecumseh Fitch. If they're right, we need to wait and see whether Maurice comes up with any genuinely "recursive" syntax. Until then, the humanity of his soft urban-south way of talking is uncertain.

In a 2002 article in Science, Hauser, Chomsky and Fitch (HCF) argued that the "aspects of language that are special to language ... only [include] recursion". Steve Pinker, Ray Jackendoff and many others have disagreed. (A quick historical sketch of the debate, with links to the original publications, is here.)

This debate underlies the recent series of articles about whether monkeys and birds can learn "recursive" patterns. (You can read about the experiments on cotton-top tamarins here and the experiments on starlings here, and you can find a list of more than a dozen other relevant Language Log posts here. A short note from Ray Jackendoff, Geoff Pullum, Barbara Scholz and me about this stuff is here.)

By "recursion", HCF mean "computational mechanisms ... providing the capacity to generate an infinite range of expressions from a finite set of elements". "Recursion" in this sense goes beyond the simple combinations of modifiers and heads ("red" + "cow" → "red cow"), or subjects and verbs ("Joan" + "disagree" → "Joan disagrees"), or any other construction that doesn't involve embedding a complex element repeatedly inside another element of the same type. Non-recursive constructions (like modifier+head) are very useful, and such combinations multiply the set of messages that you can make out of a finite set of elements, but they don't "generate an infinite range of expressions" unless they operate recursively.
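
A toy sketch may help make the finite/infinite contrast concrete. It is an illustration only, not HCF's formalism: the tiny vocabulary and the function names `flat_phrases` and `embed` are assumptions made up for the example.

```python
from itertools import product

# Hypothetical mini-vocabulary, chosen only for illustration.
MODIFIERS = ["red", "big"]
HEADS = ["cow", "airplane"]


def flat_phrases():
    """Modifier + head, applied once: the output set is finite."""
    return [f"{m} {h}" for m, h in product(MODIFIERS, HEADS)]


def embed(depth):
    """Put a clause inside another clause of the same type, `depth` times."""
    if depth == 0:
        return "Joan disagrees"
    return f"I said that {embed(depth - 1)}"


print(flat_phrases())        # every modifier+head combination
for d in range(3):
    print(embed(d))          # a new, longer sentence at each depth
```

With two modifiers and two heads, `flat_phrases` can never return more than four phrases, whereas `embed` yields a distinct sentence for every depth you care to ask for.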

And "recursion" in HCF's sense also excludes simply stringing together expressions in a sequence. Animal signaling, human or otherwise, is limited to a single item only if an unhappy accident immediately silences the signaler. Otherwise communication, like life, is just one thing after another; and if mere sequence is recursion, then bacterial signaling is recursive. To be syntactically "recursive", a message must involve structural embedding that goes beyond concatenation or juxtaposition.

That's why Dan Everett ("Cultural Constraints on Grammar and Cognition in Pirahã: Another Look at the Design Features of Human Language", Current Anthropology, Volume 46, Number 4, August-October 2005 -- free preprint) can plausibly think that

... the evidence suggests that Pirahã lacks embedding altogether.

even though he presents and discusses examples that he translates as "When I finish eating, I want to speak to you"; "If it rains, I will not go"; "I want the shirt that Chico sold"; "The woman wants to see you"; "He knows how to make arrows well"; "I said that Kó'oí intends to leave"; "There are two big red airplanes"; and so on.

Now, Dan is talking about recursive embedding, not simple modification or combining words into simple clauses. But even so, the most interesting thing about this claim, in my opinion, is that it imposes a lot fewer constraints on what the Pirahã can say than you might think.

Here are a couple of examples with his discussion:

ti	gái	-sai	kó'oí	hi	kaháp	-ií
I	say	-nominative	Kó'oí	he	leave	-intention
"I said that Kó'oí intends to leave." (lit. "My saying: Kó'oí intend-leaves.")

The verb "to say" (gái) in Pirahã is always nominalized. It takes no inflection at all. The simplest translation of it is as a possessive noun phrase "my saying," with the following clause interpreted as a type of comment. The "complement clause" is thus a juxtaposed clause interpreted as the content of what was said but not obviously involving embedding. Pirahã has no verb "to think," using instead (as do many other Amazonian languages [see Everett 2004]) the verb "to say" to express intentional contents. Therefore "John thinks that ..." would be expressed in Pirahã as "John's saying that. ..."

A similar construction in Elmorese would be something like "My opinion, he's gonna leave".

English complement clauses of other types are handled similarly in Pirahã, by nominalizing one of the clauses:

kahaí	kai	-sai	hi	ob	-áa'áí
arrow	make	-nominative	he	see	-attractive
"He knows how to make arrows well." (lit. "He sees attractively arrow-making.")

There are two plausible analyses for this construction. The first is that there is embedding, with the clause/verb phrase "arrow make" nominalized and inserted in direct-object position of the "matrix" verb "to see/know well." The second is that this construction is the paratactic conjoining of the noun phrase "arrow-making" and the clause "he sees well."

Dan gives some arguments (clitic agreement and so on) for the "paratactic conjoining" theory -- but what is this "paratactic conjoining" anyhow?

The American Heritage Dictionary says that parataxis (contrasted with syntaxis) is

NOUN: The juxtaposition of clauses or phrases without the use of coordinating or subordinating conjunctions, as It was cold; the snows came.
ETYMOLOGY: Greek, a placing side by side, from paratassein, to arrange side by side : para-, beside; ... + tassein, tag-, to arrange.

The OED's gloss allows "connecting words" in general to be left out, not just conjunctions:

The placing of propositions or clauses one after another, without indicating by connecting words the relation (of coordination or subordination) between them, as in Tell me, how are you?.

But there's a tricky point here: in parataxis, is the (contextually apparent) relation between the phrases really there, but just not overtly expressed? Or is it in some sense not there at all?

Another way to put this is to ask whether we're talking about the sentence structure or discourse structure. Most linguists think that the rhetorical structure of a coherent discourse is not encoded in the same way that phrasal structure is. On this view, a story may have a beginning, a middle and an end, but this is a different kind of structure from the organization of a clause into a subject, a verb and object. From the listener's point of view, you might say that syntactic structure is part of the evidence you use to make sense of a sentence, while rhetorical structure is part of the result you get when you've succeeded in making sense of a discourse.

There are a lot of different ideas about what rhetorical structures are like once you figure them out (see here and here and here for some examples), but everyone seems to agree that they can be recursive. For example, a story can contain another story, or an elaboration can contain another elaboration, as readers of this weblog have reason to know.

However, there's a sort of gray area in between sentential syntaxis and discourse parataxis. In English noun compounds like [[sickle cell] anemia] vs. [rat [bile duct]], we assume that there's really a (syntactic) structure there, even though there are no words or other signs that make the relationship explicit. (Well, there's a stress difference in this case, but never mind that for now.) So couldn't there be a similar implicit relationship between apparently "paratactic" words and phrases, at least in some cases?

The reported speech of Elmore Leonard's characters, especially the lower-class ones, is full of concatenated phrases where this question comes up, because the semantic connection between the phrases is clear in context, but is not made explicit.

Sometimes the implied connection is temporal ("When X, Y"), as in this example from Mr. Majestyk:

"We get here," Larry Mendoza said, "this guy's already got a crew working."

Sometimes it's conditional ("If X, Y"), as in this example from the same book:

"Listen," Renda said, "we get to a phone we're out of the country before morning."

There are also examples of juxtaposed noun phrases whose connection is left implicit, as in these two examples (again from Mr. Majestyk):

"All right, I call some more friends. They get us out of the country, some place no extradition, and wait and see what happens."

"That goddam truck of his, he can go anywhere," Renda said. "He told me, he comes up here hunting."

This is all perfectly grammatical vernacular American English, in my opinion; I talk this way myself, in some kinds of casual conversation.

These examples feel similar to Everett's examples of Pirahã parataxis, sharing the property of implying, via juxtaposition, phrasal relationships that English in other styles encodes via explicit clausal or phrasal embedding. Here's how Pirahã does temporal clauses:

kohoai	-kabáob	-áo	ti	gí	'ahoai	-soog	-abagaí
eat	-finish	-temporal	I	you	speak	-desiderative	-frustrated initiation
"When [I] finish eating, I want to speak to you." (lit. "Eating finishes, I you speak-almost-want")

There is almost always a detectable pause between the temporal clause and the "main clause." Such clauses may look embedded from the English translation, but I see no evidence for such an analysis. Perhaps a better translation would be "I finish eating, I speak to you."

A lot like Elmorese, except that the Pirahã examples are often more explicit about the semantic relationships, as indicated in this case, for example, by the completive and temporal morphemes on eat.

So let's return to the example I started with. Maurice juxtaposes three clauses

[You ever see it] [they take it out of the shell] [you wouldn't eat it].

whose relationship might have been made explicit by adding an "if" and a "when"

If you ever saw [a conch] when they take it out of the shell, you wouldn't eat it.

Once the relationships are made explicit like that, we've arguably got a recursive sentence, since the structure is something like this:

What about the way Maurice said it? Are his three clauses organized in a recursive syntactic structure:

or just a paratactic juxtaposition:

I'm not sure. A lot seems to hinge on the answer, at least in the case of Pirahã. Is "paratactic juxtaposition" like stringing sentences together in a discourse, or is it like combining words and phrases in a sentence? Or are those two alternatives really just versions of the same thing seen from different disciplinary angles? Those are deep questions, to be answered by people who know more about syntax and discourse than I do.
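
The two options can be made concrete by writing each analysis as nested lists and measuring how deeply clauses sit inside other clauses. This is only a sketch: the bracketings are assumed for illustration, and the `depth` helper is invented for the example, so nothing here settles which analysis is right.

```python
def depth(clause):
    """Nesting depth of a clause: 1 for a bare clause, +1 per level of embedding.

    A clause is a list whose items are either text (str) or sub-clauses (list).
    """
    sub_depths = [depth(item) for item in clause if isinstance(item, list)]
    return 1 + max(sub_depths, default=0)


# Assumed embedding analysis: one sentence, with the 'when' clause inside
# the 'if' clause inside the main clause.
embedded_sentence = [
    ["you ever see it", ["they take it out of the shell"]],
    "you wouldn't eat it",
]

# Assumed paratactic analysis: three separate clauses in a row,
# none of them inside another.
paratactic_sequence = (
    ["you ever see it"],
    ["they take it out of the shell"],
    ["you wouldn't eat it"],
)

print(depth(embedded_sentence))
print([depth(c) for c in paratactic_sequence])
```

On the first encoding the clauses stack three deep; on the second each clause stands alone, and whatever connects them has to come from interpretation rather than from the structure itself.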

In any case, I'm confident that HCF are now among those who make a sharp distinction between syntax and discourse structure, and I'm equally sure that "recursion", for them, is a matter of syntax. Chomsky has famously been skeptical for decades about whether discourse coherence is even a problem amenable to rational investigation, as opposed to one of the mysteries that "lie beyond the reach of the form of human inquiry that we call 'science'" ("Problems and Mysteries in the Study of Human Language," Reflections on Language pp. 137-227, 1975). And the recent animal experiments are all about learning "grammatical" constraints on short sequences of uninterpreted sounds, a situation where discourse structure simply doesn't arise. HCF are strictly concerned with recursion in the syntactic structure of sentences, not in the interpreted structure of discourses.

That's why Dan Everett's claim that Pirahã lacks (recursive) syntactic embedding is important. If some human languages (and Pirahã is not the only candidate) lack recursion, then it's hard to see how recursion could be the defining characteristic of human language. I tried to make this point in a humorous way last year ("Homo Hemingwayensis"). So Tecumseh Fitch is heading to Amazonia this summer, according to Elizabeth Davies' May 6, 2006 article in The Independent:

Professor Everett ... will head back to Amazon this summer with a bevy of enthusiastic young PhD students to try to introduce others to the Pirahã and to prove his theories. A mark of how seriously the linguistic world takes his studies is that accompanying him will be W Tecumseh Fitch, one of the three architects of the original theory of universal grammar along with Chomsky and Dr Marc Hauser. The expert is keen to see whether the tribe does indeed refute their long-established theory.

(Given that the theory in question (that human language in the narrow sense is only recursion) was first proposed in 2002, and immediately called into question by Pinker and Jackendoff among others, the modifier "long-established" is a bit of a stretch here. "Universal grammar" is a term with a longer and broader history -- but this post is not yet another critique of linguistic journalism...)

Anyhow, the article doesn't tell us what Fitch is going to be doing. Probably not analyzing the sentence and discourse structures of Pirahã, since that's not the kind of stuff he does. I'd guess that he'll be testing the Pirahã people on the sorts of acoustic novelty-detection tasks that Hauser and Fitch 2004 applied to cotton-top tamarins and Harvard undergraduates. If he can get them to approach the task the way he wants, I'll look forward to learning the results. But I'd also really like to know when parataxis is really covert syntaxis, and what sort of embedded linguistic structures the Pirahã actually use.

[See David Beaver's recent post "And people say we monkey around" for more on the question of concatenated signals from animals.]

[And yes, I'm aware that some of Elmore Leonard's paratactic juxtapositions could alternatively be analyzed in terms of "prosiopesis", Otto Jespersen's term for starting to talk without putting your mouth in gear, e.g. "[with] that goddam truck of his, he can go anywhere", "[in] my opinion, he's gonna leave." But I think there's plenty of evidence that this is not the right story, in general.]

[Update: Dan Everett writes that

This looks great. You have made this all much clearer than I have. So I will happily borrow from you in future discussions.
The New Yorker is going to be doing an in-depth article on this stuff. [...] What you have to say on this will be very helpful.
Actually, what Tecumseh will be doing is checking for recursive reasoning, which I am quite confident that the Piraha have. I see this as independent of their syntax, though, whereas he does not.

I'm encouraged that Dan didn't find any howlers, though I'm still not sure of my grasp of the logic of this question. His description of Fitch's project does suggest that my characterization of Fitch's views is wrong -- if "recursive reasoning" is the same as recursive embedding in syntax, then I suppose that rhetorical structure imposed on paratactic juxtaposition of phrases would count for Fitch as "recursion".]

[In a later note, Dan confirms that the Pirahã do have recursive narrative structures. A simple example would be interpolating a digression into the plot of a story. The same thing of course is also true of even the most paratactic writers. ]

Posted by Mark Liberman at 12:04 AM

May 18, 2006

Request for action from the AAA

[Update: according to Inside Higher Ed

Voicing concern over America's math and science competitiveness, a Senate committee on Thursday unanimously approved legislation that would push physical science research and teaching partnerships involving colleges and government agencies.
[...]
On Wednesday, Hutchison proposed an amendment that would have forced NSF to give funding priority to work that is expected to make contributions in the physical sciences, technology, engineering, or math. By voting time, however, a compromise was reached with Sen. Frank R. Lautenberg (D-N.J.) and, while the language in the bill places special emphasis on the physical sciences, Hutchison's amendment was changed to allow NSF to be flexible with its funding priorities.
[...]
Hutchison's was the lone voice of concern Thursday.
[...]
The bill would also authorize the NSF to give 2,500 additional grants to be used for graduate research fellowships and for the Integrative Graduate Education and Research Traineeship Program, which preps doctoral science and engineering students for interdisciplinary work.

(Thanks to Kai von Fintel for the link.)]

In this case, the "AAA" is the American Anthropological Association, and by several routes today I've gotten copies of an email with the Subject "Urgent Action Required", regarding a "proposed amendment by Senator Kay Bailey Hutchison (R-TX) that would instruct the National Science Foundation (NSF) to direct its resources primarily to the physical sciences." Some further details are available on the AAA website here.

I haven't seen the text of Hutchison's amendment to S. 2802 (the “American Innovation and Competitiveness Act of 2006”) -- for some reason the AAA web page doesn't quote it or link to it. But as described, it would not only defund sociology, anthropology, linguistics and economics, but also mathematics and most of the Directorate for Computer and Information Science and Engineering, whose Digital Libraries initiatives had a significant impact on recent American Innovation and Competitiveness:

In 1996, Google co-founders Sergey Brin and Larry Page were graduate computer science students working on a research project supported by the Stanford Digital Library Technologies Project. Their goal was to make digital libraries work, and their big idea was as follows: in a future world in which vast collections of books are digitized, people would use a "web crawler" to index the books' content and analyze the connections between them, determining any given book's relevance and usefulness by tracking the number and quality of citations from other books.

The crawler they wound up building was called BackRub, and it was this modern twist on traditional citation analysis that inspired Google's PageRank algorithms – the core search technology that makes Google, well, Google.
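
The idea of scoring an item by "the number and quality of citations" can be sketched with a simplified PageRank-style iteration. This is a textbook-style approximation under stated assumptions, not Google's actual algorithm: the function name `citation_rank`, the damping value, and the four-book graph are all made up for the illustration.

```python
def citation_rank(cites, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a citation graph.

    `cites` maps each item to the list of items it cites.
    """
    nodes = set(cites) | {t for targets in cites.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every item starts each round with a small baseline score.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = cites.get(node, [])
            if targets:
                # An item passes its score on to the items it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # An item that cites nothing spreads its score evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy library: A and B both cite C, and C cites D.
books = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = citation_rank(books)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy graph D is cited only once, but by the well-cited C, which is the "quality" half of the idea: a citation counts for more when it comes from an item that is itself highly ranked.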

Without seeing the text of the bill and the amendment, or reading a more extensive analysis of what they say and mean, I'm not sure whether this interpretation of the Hutchison Amendment's effect is valid. In any case, it may be too late to affect this particular committee action one way or another, since the AAA page asserts that "At a meeting of the Senate Commerce Committee TODAY [i.e. Thursday], an authorizing bill – S. 2802 – focusing on American competitiveness will be marked up (i.e. negotiated)", and that phone calls or emails would need to have gone in this morning in order to have an effect on committee members' votes.

On the other hand, this is only a committee vote on "authorizing legislation", so later votes and actions will be at least as important, if not more so. Therefore I suggest that you look into this and act as your political opinions dictate.

[Update: Ben Zimmer writes that

There is no actual amendment (yet) -- Hutchison so far has just been raising questions about what types of projects NSF funds, with vague suggestions that some other agency be in charge of social science funding. So it was a bit misleading for the AAA email to refer to the "Hutchison amendment." The request for action is apparently preemptive, in order to forestall any amendment to the bill that would limit NSF funding to particular disciplines. More here.
Also, there's no indication that Hutchison wants to restrict NSF funding for mathematics (despite the AAA email's reference to "the physical sciences"). In fact, she is quoted by Science as saying: "I want NSF to be our premier agency for basic research in the sciences, mathematics, and engineering. And when we are looking at scarce resources, I think NSF should stay focused on the hard sciences." So it's just those "soft sciences" that would lose out.

Another piece of evidence that one should be careful before crediting mass emails with "Urgent" in the Subject line, even if they come from a semi-reputable source like the AAA. The issue is surely an important one, but it seems to me that the AAA should send out a more accurate picture of what is going on.]

[Update #2: Joshua Tauberer, linguistics grad student, to the rescue:

Hi, Mark. It's a rare moment when my linguistic and non-linguistic activities come together. Details of the bill can be found on my website:
http://www.govtrack.us/congress/bill.xpd?bill=s109-2802
The amendment wasn't linked to or quoted probably because it hasn't been officially proposed as a true amendment, but rather is all happening within committee where public disclosure of things is sadly pretty limited.

Unfortunately this links to the official record on the Thomas website, where the text of the bill is not yet available.]

[Fernando Pereira supplies this link to a "Staff Working Draft" of May 12 on the Senate Commerce Committee web site, which contains the paragraph:

PRIORITY TREATMENT. -- Proposed research activities, and grants funded under the Foundation's Research and Related Activities Account, which can be expected to make contributions in physical and natural sciences, technology, engineering, and mathematics, and other research that underpins these areas, shall be given priority in the selection of awards and in the allocation of Foundation resources.

No amendments are indicated, but this language seems similar to what the Science article suggests that Senator Hutchison is after.]

[Kai von Fintel sent in a link to a press release from the Senate Commerce Committee, indicating that

The U.S. Senate Committee on Commerce, Science, and Transportation today approved S. 2802, the American Innovation and Competitiveness Act, by a vote of 21-0.
[...]
S. 2802 responds to recommendations contained in the Council on Competitiveness' Innovate America Report and the National Academies' Rising Above the Gathering Storm Report. In responding to these reports, the legislation focuses on three primary areas of importance to maintaining and improving United States' innovation in the 21st Century: increasing research investment, increasing science and technology talent, and developing innovation infrastructure.
The bill sets authorization levels for both the National Science Foundation (NSF) and the National Institute of Standards and Technology (NIST). To increase the nation's commitment to basic research, the bill increases authorized funding for NSF from $6.4 billion in Fiscal Year 2007 to $11.4 billion in Fiscal Year 2011. The legislation authorizes NIST from approximately $640 million in Fiscal Year 2007 to $937 million by Fiscal Year 2011, and it establishes a Fiscal Year 2007 level of approximately $110 million for the Hollings Manufacturing Extension Partnership program (MEP), which increases to $130 million in fiscal years 2008 through 2011.
In addition, the bill requires the National Academy of Sciences to conduct a study to identify forms of risk that create barriers to innovation one year after enactment of the bill and every four years thereafter. The study is intended to support research on the long-term value of innovation to the business community and to identify means to mitigate legal or practical risks presently associated with such innovation activities.

The press release says that "A list of amendments that were adopted as part of a manager's package is attached", but the online version doesn't seem to connect to any such list. ]

Posted by Mark Liberman at 07:42 PM

And people say we monkey around

Yeah, yeah, another animal language story. We are sooo excited at Language Log Plaza that we are taking it in turns to bungee jump from Mark's helipad. Geoff P. was kind enough to let me go first, and I'm writing this upside down swaying in the breeze while staring through Ben's window, but he seems kinda busy typing, so even if he could hear me through the inch thick glass, I wouldn't disturb him. Maybe someone will pull me up soon. Anyway, it must be years months weeks since the last time it turned out animals could do so much more than anyone ever suspected.

And can you guess what those smart little critters can do now? They can make not one, but two different sounds. In combination. And the combination means something different from either sound. That's syntax! Everyone is saying so! Of course, it could also be phonology, but everyone isn't saying that. You see, the sounds are so far apart they seem more like words than phonemes. Listen for yourself. Oops, I meant here of course. And Chomsky has argued on many occasions that one of the hallmarks of human syntax is that there are really big gaps, or at least that's how I interpret him. So you can see why these new critters, putty-nosed monkeys no less, are really sending us off the deep end. Gosh, I mean these monkeys almost have compositionality. That would mean that the combined sound had a meaning that was built up of the meanings of the parts.

Based loosely on the work (and I haven't seen the original, so none of my comments apply to it) of Kate Arnold and Klaus Zuberbühler, of the University of St Andrews, as reported in Nature News, in an article subtitled "monkeys string sounds together to create meaning," ehhh, this sentence has a lot of parts to it, a wonder that you can even begin to parse it, and I want to wish you the very best of luck with getting all the way to the end, well, actually, I must confess I'm probably making life unnecessarily tough for you by writing it backwards as well as upside down, you'd never have known, would you, here is a putty-nosed monkey phrasebook you may find useful:

pyow: hey everyone, get away from the lower branches, or some ground beast might get you.
hack: hey everyone, get away from the canopy or an eagle might get you.
pyow ... hack: hey everyone, wherever you are, move.

You're impressed, right? The first time a monkey came up with that innovation the whole pack looked at him like he was crazy. But nowadays it's pretty much accepted. "Pyow hack!" "OK, we're moving, we're moving." (They don't actually say that last part. More of a Gricean inference.)

My dog, see, he's a pretty smart dog. He can make two sounds. He can whimper and he can bark. And sometimes he barks lots of times. And sometimes, if you shut him in the kitchen, he whimpers lots of times. But what he doesn't do is bark and whimper in the same sentence. Except when he wants to play with another dog and you're restraining him and he's excited but disappointed. But that doesn't count, cos it isn't in Nature. There will not be a Nature article "David's dog strings sounds together to create meaning."

Umm, you can pull me up now. Guys? Hello? Is anyone there?

Posted by David Beaver at 05:32 PM

Regale in basilica

A few days ago, Victor Mair sent in a spectacularly mistranslated blurb from a package of mushrooms and seaweed, full of sentences like "It is the masterwork of the curiosity selected by our professional." One phrase in particular was baffling: "It is always the regale in basilica". Victor tracked down the Chinese original, and contributed this analysis:

Well, I went and checked the parallel Chinese text for "It is always the regale in basilica" and this is what I found:

LI4LAI2 LIE4WEI2 HUANG2GONG1 YU4PIN3 HE2 GUO2YAN4 JIA1YAO2

A fairly literal translation of that would be "Throughout history it has been ranked among the royal products [for use in] the imperial palace and the delicacies [to be served at] state banquets."

Now you can get an idea of the brain-crunching that Sinologists have to go through every day.

Posted by Mark Liberman at 07:19 AM

May 17, 2006

Inconceivable!

Even A.O. Scott takes time in his review of the film adaptation of The Da Vinci Code to bash Dan Brown's prose, starting from the very first paragraph:

"The Da Vinci Code," Ron Howard's adaptation of Dan Brown's best-selling primer on how not to write an English sentence, arrives trailing more than its share of theological and historical disputation.

No, he's not about to "borrow" from some of Geoff's posts, but the two obvious digs may be interesting to Language Log readers nonetheless.

(Full disclosure: I haven't read any of Dan Brown's books nor have I seen the movie, and I don't plan to, thanks in no small part to Geoff and A.O. Scott. Consequently, I may have missed other more subtle digs in Scott's review.)

First, there's this curious comment about a pied-piped preposition (emphasis added):

To their credit, the director and his screenwriter [...] have streamlined Mr. Brown's story and refrained from trying to capture his, um, prose style. "Almost inconceivably, the gun into which she was now staring was clutched in the pale hand of an enormous albino with long white hair." Such language – note the exquisite "almost" and the fastidious tucking of the "which" after the preposition – can only live on the page.

Since the comparison is with the following alternative phrasing, isn't it the preposition "into" which is fastidiously tucked before the wh-word "which", rather than the other way around?

the gun which she was now staring into

Of course, this rephrasing makes the "which" unnecessary; it could be replaced by "that" or (better) omitted altogether, so I can almost see why Scott says what he says about the original. But still.

The second dig is very Pullum-esque: Scott takes the "almost inconceivably" bit that he has already shown his distaste for and incorporates it into the first sentence of his outline of the movie plot:

[A]n old man (Jean-Pierre Marielle) is killed after hours in the Louvre, shot in the stomach, almost inconceivably, by a hooded assailant.

Maybe Scott ripped Geoff off after all? Inconceivable!

Posted by Eric Bakovic at 06:54 PM

Big much squib

I signed the e-mail "arnold, who much enjoyed the visit with beth et al. last night" and then realized that this use of much was of interest to me. I've posted here on determiner much (vs. determiner a lot of), and "much enjoyed" has a different much in it, a VP adverbial of degree, but the various uses of much have a lot in common, including alternation with a lot (of) and an affinity for negative and interrogative contexts, so it was notable that what I wrote had much in a positive declarative clause. In short order I racked up a list of puzzling properties of the VP adverbial much, beginning with a contrast in acceptability between preverbal positioning and postverbal positioning:

(1a) ok I much enjoyed these concerts. (preverbal)
(1b) ?? I enjoyed these concerts much. (postverbal)

I'm inclined to asterisk (1b), but I'll settle for deep disapproval for now. In any case:

Observation 1: The VP adverbial much is much less acceptable postverbally than preverbally.

Since my initial interest in determiner much had to do with its alternation with a lot of, I tried the VP adverbial a lot in the two positions, and found it to work essentially opposite to much: absolutely unacceptable preverbally, fine postverbally:

(2a) * I a lot enjoyed these concerts. (preverbal)
(2b) ok I enjoyed these concerts a lot. (postverbal)

Observation 2: The VP adverbial a lot is acceptable postverbally, unacceptable preverbally.

This is not so surprising; it's well-known that different adverbials have different privileges of occurrence in the several positions open to them. Still, it's interesting that much and a lot look like they're parceling out the two positions between them.

On to irrealis contexts, in particular negativity and interrogativity. After my posting on determiner much, John Lawler wrote me to claim that determiner much and many were in fact negative polarity items (NPIs) -- expressions that are restricted to certain contexts in which the factuality of some situation is not assumed or asserted, notably negative and interrogative contexts -- noting that they had been on his list of NPIs since he started keeping it, "around 1971 or so". I disputed Lawler's claim, pointing out that determiner much and many are virtually never unacceptable in positive declarative clauses; instead, sometimes they just seem infelicitously formal, and other times they are impeccable, as in this example (one of several such) supplied to me by Marilyn Martin:

My first full year at the Hawaii Film Office has been filled with much joy and much pain.

(Meanwhile, Amanda Kraus wrote to report that determiner much is very common in hip hop culture, citing "the frequent (too numerous to list) calls of 'much love' versus Led Zeppelin's 'whole lotta love'".)

In any case, my attention had now been drawn to irrealis contexts, so I checked out the negative and interrogative counterparts of the questionable (1b), and found them fine:

(3a) ok I didn't enjoy these concerts much. (postverbal)
(3b) ok Did I enjoy these concerts much? No. (postverbal)

I concluded that there is a small island of NPI-hood in the much world, a place in which the affinity of much (in several of its uses) for negative and interrogative contexts has hardened into a restriction:

Observation 3: Postverbal VP adverbial much is a NPI. (And its preverbal counterpart is not.)

(There are also a couple of idioms with much in them that are NPIs: be much of a, as in "He's not much of a linguist" and "Is he much of a linguist?" but *"He's much of a linguist" 'He's an excellent linguist', and be much to look at, as in "He's not much to look at" and "Is he much to look at?" but *"He's much to look at" 'He's attractive'.)

At this point things got weirder. In my earlier posting I'd pointed out that the alternation between much and a lot of as determiners is complicated by the fact that the modifiers that determiner much can take -- so, that, very, etc. -- are not available for a lot of (and that, correspondingly, quite can modify a lot of but not much), so that when you want to modify these quantity determiners, you'll be forced to choose just one of them, with the result that in many contexts determiner much improves in acceptability just by being modified:

(4a) ? With much shrubbery growing in front of it, the house seems dwarfed.
(4b) ok With that/so much shrubbery growing in front of it, the house seems dwarfed.

All the uses of much are subject to modification in pretty much the same ways, and this includes the VP adverbial much. Preverbally, this much is fine unmodified, as in (1a), so it's no surprise that it continues to be fine when it's modified, but the postverbal version shows the amelioration effect in (4):

(5a) ok I very/so much enjoyed these concerts. (preverbal)
(5b) ok I enjoyed these concerts very/so much. (postverbal)

That is, we CAN get postverbal VP adverbial much in positive declarative clauses. It just has to be modified. Observation 3 has to be refined:

Observation 3 (revised): Unmodified postverbal VP adverbial much is a NPI.

This is a very small island of NPI-hood indeed.

But wait! There's more. So far I've been talking about the VP adverbial of DEGREE much; the meaning of much in the two examples of (5) is roughly 'greatly, to a high degree'. But there's another VP adverbial much, namely a FREQUENCY adverbial with roughly the semantics and syntax of many times. The frequency adverbials much and many times are possible, though a bit edgy, in postverbal position, but (like a lot, and unlike often) absolutely unacceptable preverbally:

(6a) ? I come here much/many times. (postverbal)
(6b) * I much/many times come here. (preverbal)

(7a) ok I come here often. (postverbal)
(7b) ok I often come here. (preverbal)

We are forced to revise Observation 3 once again, to shrink the island still further:

Observation 3 (second revision): Unmodified postverbal degree VP adverbial much is a NPI.

Enough of postverbal much for today. On to preverbal much, as in (1a). If you do a Google web search on <"I much">, as Thomas Grano did for me yesterday, you get an enormous number of hits, nearly three million. Suspiciously many of them are "I much prefer". Googling on <"I much prefer"> shows that about HALF of those original hits have the verb prefer, and that lots of the rest are junk of one sort or another. Grano began to suspect that most verbs don't allow preverbal much, and we were quickly able to concoct near-minimal pairs like these:

(8a) ok I much appreciate your advice. (APPRECIATE)
(8b) * I much believe your claims. (BELIEVE)

(9a) ? I much look forward to her arrival. (LOOK FORWARD)
(9b) * I much expect her to arrive soon. (EXPECT)

Observation 4 (tentative): The default is for verbs to disallow preverbal degree much.

At the moment, Grano and I have no idea about what properties of verbs -- semantic, phonological, whatever -- might improve them as hosts for preverbal much. It is known that there are verb-specific conditions on VP adverbials of degree; Pullum and Huddleston (CGEL, p. 579) survey the situation warily:

There are significant differences among degree adverbs. Some, such as almost, nearly, quite, normally occur only in [preverbal] position. Others, such as thoroughly, enormously, greatly, occur in either [preverbal] or [postverbal] position. With this second set, [postverbal] position is the default, and acceptability in [preverbal] position depends on the verb. Thus He enormously admires them is fine, but we cannot have *The price has enormously gone up.

With much, the situation seems to be:

Observation 5: Some verbs permit preverbal much, and also postverbal much if the much is modified, while others -- the default type, perhaps -- permit preverbal much ONLY IF IT IS MODIFIED, and disallow postverbal much entirely.

For appreciate (in (10)) vs. believe (in (11)):

(10a) ok I much appreciate your advice. (preverbal, unmodified)
(10b) ok I very much appreciate your advice. (preverbal, modified)
(10c) * I appreciate your advice much. (postverbal, unmodified)
(10d) ok I appreciate your advice very much. (postverbal, modified)

(11a) * I much believe your claims. (preverbal, unmodified)
(11b) ok I very much believe your claims. (preverbal, modified)
(11c) * I believe your claims much. (postverbal, unmodified)
(11d) * I believe your claims very much. (postverbal, modified)
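
For readers who like their paradigms executable, here is a rough sketch of Observation 5 as a predicate over positive declarative clauses. The type labels and the function name are made up for illustration, not established terminology, and the judgments table simply transcribes (10) and (11); nothing here covers negative or interrogative contexts, where Observation 3 applies.

```python
# Hypothetical labels for the two verb types in Observation 5.
MUCH_FRIENDLY = "appreciate-type"   # e.g. appreciate
DEFAULT = "believe-type"            # e.g. believe


def degree_much_ok(verb_type, position, modified):
    """Predicted acceptability of degree 'much' in a positive declarative."""
    if verb_type == MUCH_FRIENDLY:
        # Preverbal is fine either way; postverbal needs a modifier.
        return position == "preverbal" or modified
    if verb_type == DEFAULT:
        # Only modified preverbal 'much' survives; postverbal is out.
        return position == "preverbal" and modified
    raise ValueError(f"unknown verb type: {verb_type}")


# Judgments from (10) and (11): True for 'ok', False for '*'.
JUDGMENTS = {
    (MUCH_FRIENDLY, "preverbal", False): True,    # (10a)
    (MUCH_FRIENDLY, "preverbal", True): True,     # (10b)
    (MUCH_FRIENDLY, "postverbal", False): False,  # (10c)
    (MUCH_FRIENDLY, "postverbal", True): True,    # (10d)
    (DEFAULT, "preverbal", False): False,         # (11a)
    (DEFAULT, "preverbal", True): True,           # (11b)
    (DEFAULT, "postverbal", False): False,        # (11c)
    (DEFAULT, "postverbal", True): False,         # (11d)
}

# Cells where the predicate disagrees with the recorded judgment.
mismatches = [cell for cell, ok in JUDGMENTS.items()
              if degree_much_ok(*cell) != ok]
print(mismatches)
```

A third verb type, if one turns up, would show up as a row of judgments that neither branch of the predicate can reproduce.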

Perhaps there are more than these two types. Grano and I are just getting into this stuff, which is vastly more complex than we'd thought at first. And we haven't yet looked at how the classification of verbs with respect to degree adverbial much lines up with their classification with respect to other degree adverbials. And we're sure that there will be some variation here from speaker to speaker.

We also don't know if we're walking on a path that others have traveled on. It usually turns out that Jespersen or Curme has been there, or Bolinger, or McCawley, just to name the most likely suspects.

[And now, an unsolicited letter of thanks, as the end of my year at the Stanford Humanities Center looms. First to Thomas Grano, who (as an Undergraduate Fellow at the SHC) has worked with me all year on my project on the advice literature on English grammar, usage, and style in the 20th century; he's scoured this literature for treatments of particular points, collected data (usually by Google searches) on twelve different topics, and joined me in hours of discussion about interpreting what he and I had found. It's been like having an annex to my mind.

Thanks also to the SHC staff, for selecting him for a fellowship and providing him with practical support of several kinds, including free lunch whenever he wanted it, and to the office of the Vice Provost for Undergraduate Education at Stanford, which funded that fellowship, oversees the undergraduate honors programs (Grano has also just completed an honors thesis, on pronoun case in coordination), funds the Stanford Introductory Seminars (my advice-literature project grew out of sophomore seminars I taught over the years in the SIS program), and is now about to fund an undergraduate intern for me for this summer, to continue my research on the choice of variant expressions, like much vs. a lot (of). In two past summers, the VPUE's office has funded interns for me on other pieces of my usage project (on the reflexive themself and on dangling modifiers), as well as interns for the Stanford ALL Project (on innovative uses of all). The VPUE's office is there to benefit students, but obviously it does a lot for faculty too.

Finally, thanks to the sources of my own funding for this fabulous year: the School of Humanities and Sciences at Stanford, the Department of Linguistics at Stanford, and the Mericos Foundation, through a gift to the SHC's endowment.]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 06:29 PM

Starlings, darlings

Monkeys, dogs, whales, chimps, gorillas, dolphins, parrots... Everybody seems to want to find language in animals. And then they turn round and deny it when it is seen in university students. Just twenty minutes ago at a Radcliffe Institute reception I met a woman who thought that for a 20-year-old American to use an expression like I should have went is evidence that something awful has happened to the linguistic capacities of Harvard undergraduates. What is wrong with everybody? Attributing language to a pet animal with a brain the size of a hazelnut but denying it in a fluent speaker of a slightly non-standard variety of English? Talk about a double standard! Listen, I'll grant you that you have found a language-using animal when you can show me one that is capable of plagiarism. No, no, don't tell me parrots and songbirds do it; they don't. They imitate sound streams. They don't even know it's language. To be a plagiarist you have to know what language is, and know how to use it, and know about authorship, and about concealment. It's a highly sophisticated linguistic activity. Nothing unintelligent about it. Like other language skills, plagiarism is not for the birds. Oh, and by the way, talking of birds, Ray Jackendoff and Mark Liberman and Barbara Scholz and I sent a letter to Nature about the Gentner et al. work on alleged syntactic skills in starlings (sigh; yes, I said starlings, darlings). Nature rejected it by lunchtime the next day (science journals almost always reject criticisms by linguists of non-linguists' papers about linguistic topics), but you can read the main part of the text of our letter on the LINGUIST List: follow this link.

Barbara Scholz pointed out to me that Medical News Today headlined its story about the starlings with a remark about how "The European starling ... may also soon gain a reputation as something of a grammar-marm". You just can't parody stuff as dopey as the headlines that get made up for science stories, can you?

Posted by Geoffrey K. Pullum at 06:19 PM

A tale of two copiers

In 1988, Molly Ivins published an article in Mother Jones magazine called "Magnolias and Moonshine". Seven years later, Florence King responded with an article in The American Enterprise magazine, September/October 1995, under the title "Molly Ivins, Plagiarist".

King accused Ivins of three things. The first thing was "gilding the lily". King had written in her 1975 book Southern Ladies and Gentlemen that the southern woman

... is required to be frigid, passionate, sweet, bitchy, and scatterbrained—all at the same time. Her problems spring from the fact that she succeeds.

Ivins quoted this as follows:

In her definitive work, Southern Ladies and Gentlemen, Florence King observes, “The cult of southern womanhood…requires [a female] to be frigid, passionate, sweet, bitchy, animated, and scatterbrained all at the same time…. A horrifying number of us succeed, which accounts for that popular southern female pastime, having a nervous breakdown.”

"Add a l’il more on there, honey, give the folks they money’s worth", King suggests.

The other two alleged authorial crimes are instances of apparent plagiarism. King writes that

My name is strewn through ["Magnolias and Moonshine"], but never where it counts. She credits me on minor observations, but when the subject is politics—her turf—she plagiarizes me.

King cites two instances, both of plagiarism in the paraphrase mode:

IVINS: “Keep in mind that Southerners are so conservative they voted for Franklin Roosevelt, so isolationist they voted for Richard Nixon, so populist they voted for Barry Goldwater, so aristocratic they voted for George Wallace, and that they see nothing peculiar in any of this.”

KING: “The typical Southerner:
—Brags about what a conservative he is and then votes for Franklin D. Roosevelt.
—Or brags about what an isolationist he is and then votes for Richard Nixon.
—Or brags about what a populist he is and then votes for Barry Goldwater.
—Or brags about what an aristocrat he is and then votes for George Wallace.
—And is able to say with a straight face that he sees nothing peculiar about any of the above.”

IVINS: “The Southern passion for military service first astonished the rest of the country in 1898, when Southerners signed up in droves to avenge the Maine. It was the country’s first war since Appomattox, and for 33 years Yankees had questioned Southern loyalty.”

KING: “In 1898, the phenomenon that surprised Americans nearly as much as the explosion of the battleship Maine was the vast number of Southern men who answered the call to the colors. It was America’s first war since Appomattox, and Southern loyalty had been in question for 33 years.”

King was very angry. She is quoted elsewhere as telling reporters that "if we had the right kind of laws in this country I’d challenge her to duel over this." She opens her TAE article by writing that "Most liberals sneer, grate, whine, scream, and picket, but Molly Ivins chuckles wisely and smiles tiredly so everyone will regard her as a lovable cynic", and sprinkles the piece with zingers like "she delivers laid-back wisdom with the serenity of a down-home Buddha who has discovered that stool softeners really work", and "Watching her go through her paces is like watching Ona Munson, who played Belle Watling in Gone With the Wind, doing an imitation of Spencer Tracy playing Clarence Darrow in Inherit the Wind. That’s a lot of wind."

In my opinion Ivins was clearly guilty as charged, although the longest stretch of literal copying was only five words long ("first war since Appomattox, and"), and none of the other literally-copied sequences are more than three words long. This was plagiarism of the paraphrasing type.
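
Anyone who wants to check the literal-copying count mechanically can do it in a few lines. What follows is only a sketch: the function names are mine, "word" is taken to mean a lowercased token with punctuation stripped, and the two strings are the 1898 sentences quoted above (with straight apostrophes). It looks for the longest run of consecutive words shared by the two passages.

```python
import re


def tokens(text):
    """Lowercased word tokens; punctuation dropped, apostrophes kept."""
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_shared_run(a, b):
    """Longest run of consecutive tokens that occurs in both texts."""
    ta, tb = tokens(a), tokens(b)
    best_len, best_end = 0, 0
    prev = [0] * (len(tb) + 1)
    for i in range(1, len(ta) + 1):
        cur = [0] * (len(tb) + 1)
        for j in range(1, len(tb) + 1):
            if ta[i - 1] == tb[j - 1]:
                # Extend the run that ended at the previous token pair.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return ta[best_end - best_len:best_end]


IVINS = ("It was the country's first war since Appomattox, and for 33 "
         "years Yankees had questioned Southern loyalty.")
KING = ("It was America's first war since Appomattox, and Southern "
        "loyalty had been in question for 33 years.")

print(" ".join(longest_shared_run(IVINS, KING)))
# prints: first war since appomattox and
```

A short longest run is exactly what paraphrase-mode plagiarism looks like: a check of this kind catches verbatim lifting, and says nothing about borrowed structure or ideas.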

Apparently Ivins agreed, because her (apparently immediate) response was to 'fess up and apologize. The December issue of The American Enterprise magazine published an exchange of letters between Ivins and King, under the title "Author, Author!". (Both letters, puzzlingly, seem to have been written before the first TAE article came out. I believe that time worked differently back in the last century, at least in the publishing industry.)

Ivins' letter:

August 16, 1995

Dear Ms. King,

You are quite right. There are three sentences in my article “Magnolias and Moonshine” —one of them a really good political line—that should have been attributed directly to you and are not.

On the third matter you raise in your Author Author! column in The American Enterprise, I have no idea how I managed to attribute to you more than you actually said—perhaps a recollection of something somewhere else in one of your books on the South. But I do not think a mistake of excessive attribution can be considered plagiarism.

I owe you an apology and I hereby tender it. I am deeply ashamed. I regret not giving you credit, and devoutly wish the matter had been brought to my attention earlier so it might have been corrected in subsequent editions and the paperback edition of the book.

I hope this does not sound too defensive to you, but there was no intention on my part to deceive anyone into thinking I had not read the many funny things you have said about the South. I hope my good faith is evidenced by the fact that I did cite you directly six times in the piece and praise one of your books as “definitive” on the peculiarities of Southerners as well.

I was inexcusably sloppy about the three sentences in question, with emphasis on the inexcusably.

Over the years, I have not only quoted many of your wonderful lines about the South in speeches—always, I believe, giving you credit—but also recommended your books to hundreds of people. I realize this does not excuse my lifting lines of yours without credit, but I did want you to know.

As for the rest of your observations about me and my work in your Author Author! column, boy you really are a mean b——, aren’t you?

Sincerely,
Molly Ivins, plagiarist

King's response:

August 24, 1995

Dear Miss Ivins:

Rather than rehash what I call plagiarism and you call careless attribution, I will speak in general terms.

First, the Washington Post, in breaking this story, referred to your “side” and my “side.” How can there be a “side” in this when everyone involved is either a writer or an editor? All of us, by definition, are on the same side—the word side. Every word I write is a piece of my heart, and I presume you feel the same way.

Second, I’m wondering how you managed to recycle me unchanged from the 1988 Mother Jones article into the 1991 book. When I compiled The Florence King Reader, I reread everything I’ve published over the last 20 years. I polished, revised, even rewrote some of the early selections to bring them up to my present standards, and I also prepared a fresh manuscript. This is how you catch mistakes. Anthologies are harder than they look, so please look next time.

Third, your publisher contends that I am seeking publicity by “attempting to hang onto the cape of Molly’s notoriety.” (You may want to take issue with him over his choice of words.) I have no need or wish for “notoriety”; celebrity is bad enough. I already have the only thing I want: the admiration and respect of people who know good writing and love the English language as I do.

Finally, it’s a shame this had to happen because you and I are such a pair of old rips that we probably would have gotten along like gangbusters. Please don’t spoil any more potential friendships.

Sincerely,
Florence King

And now for something completely different.

Recently, Mark Steyn (a witty political commentator, a lot like Molly Ivins) wrote a book review that had some very striking similarities to a 2004 weblog post by Geoff Pullum. Steyn's response to email from Pullum, requesting an apology and a link, was to have an assistant write a note saying that

We cannot see any similarities between Mark's piece and yours other than the quotations themselves, which obviously are the work of Mr Brown, and the grammatical term, which Mark was at pains to credit to you.

It's true that the three quotations are "obviously the work of Mr. Brown", but

Even facts or quotations can be plagiarized through the trick of citing to a quotation from a primary source rather than to the secondary source in which the plagiarist found it in order to conceal reliance on the secondary source. ["Copy Wrong: Plagiarism, Process, Property, and the Law," California Law Review, 1992; quoted in "What is plagiarism?", The Chronicle of Higher Education, 12/17/2004]

However, Steyn's "similarities" are not limited to a selection of quotations from Dan Brown, along with a set of ideas about why those particular quotations are interesting. For an even more striking similarity, the reader should consult the table at the end of this post, comparing Steyn's witticism "Novelist Dan Brown staggered through the formulaic splendour of his opening sentence" to its possible sources: the original quotation from Brown ("Renowned curator Jacques Saunière staggered through the vaulted archway of the museum's Grand Gallery"), the title of Pullum's post ("Renowned author Dan Brown staggered through his formulaic opening sentence"), and Pullum's reprise of the theme in the body of the post ("Renowned linguist Geoffrey Pullum staggered across the savage splendor of the forsaken Santa Cruz campus").

Given this (in my opinion clear) sign of influence, do you believe that Steyn didn't read Pullum before writing his review? How credible do you find it that Steyn came up independently with the idea of focusing on Brown's missing the's, and also with the particular examples and their order, and was subsequently given Pullum's grammatical terminology by one of his assistants

... because Mark asked if there was a technical term for a missing definite article and a Welsh University website, which led us to you, suggested the term had been coined by you.

The assistant pointed to an alternative chain of influence, also not credited in Steyn's review:

Mark's interest in this subject was piqued not by your website, with which he was not familiar, but by an item by Mark's former editor at his London newspaper, The Daily Telegraph, on the missing definite article in the first sentence of The Da Vinci Code. Mark mentioned it on a radio show last year and then noticed a similar start in Angels And Demons and wondered if it was a habit.

In a follow-up note, the same assistant insisted again that the idea came from Steyn or from his "colleagues in London", not from Pullum:

Mark had never heard of your website till last week [i.e. after writing the review - myl] and we will be able to demonstrate in court that nobody in our office clicked on two of your three allegedly plagiarized pieces until we received your e-mail. The points you claim Mark stole from you were made by others, including Mark and Mark's colleagues in London, long before we ever clicked on your website, as we would again prove in court.

The issue is not whether Steyn or his assistants clicked on Pullum's "website", but whether they copied Pullum's ideas and words, directly or indirectly. I guess it's possible that Steyn's "former editor at his London newspaper, The Daily Telegraph" borrowed Pullum's ideas and words in 2005, and Steyn then borrowed them from him -- the Telegraph's online archives only go back one year, so I can't check. But Geoff's posts on Dan Brown have been very widely read and circulated on the internet. As I mentioned before, one of them has for some time been on the first page of Google hits for {Dan Brown}. Language Log has gotten roughly four million page views since 2004, and around five percent of these are Geoff's Dan Brown posts, so that something like 200,000 people have read one or more of them. So it's also possible that someone emailed a copy of Pullum's posts to Steyn or to one of his assistants.

Whatever the detailed chain of transmission, I find it very hard to believe that Steyn wrote his Maclean's review "The Da Vinci Code: bad writing for biblical illiterates" without having read Pullum's post "Renowned author Dan Brown staggered through his formulaic opening sentence" (and perhaps others, such as "The Dan Brown Code"). What do you think?

I should mention that Steyn's assistant ended her communications with what might be perceived as a threat:

It is up to you whether you wish to escalate this any further. [...] But, given the intemperate nature of your e-mails, I think it would be better if you spoke to your lawyer and we will refer him to ours.

It's pretty common for people whose words and ideas are copied without attribution to get a little hot under the collar. In contrast to King's public take-down of Ivins, however, Pullum's private request for an apology and a link didn't mention challenging Steyn to a duel, or comment on the looseness of his bowels, or call him a windbag. And as Geoff made very clear, he doesn't see this as a legal issue but as a moral one, where the appropriate and courageous response would be a forthright apology. To my mind, the question here is whether Mark Steyn has as much grace and courage as Molly Ivins does.

[Update (with apologies for adding to what is already an overlong post): Ben Zimmer did a better job of searching the Daily Telegraph's archives than I did, and found the following. On 2/11/2005, Sam Leith's Daily Telegraph "Notebook" included the following item, reprinted below in its entirety:

The Da Vinci Code is an exemplary demonstration of the truth that, more than any other genre, a thriller need not be well written to work. Plotting and pace are all.

But seldom do books manage to grate from before the first word of the opening sentence. "Renowned curator Jacques Sauniere staggered through the vaulted archway…" It's the dog that didn't bark. The first word - "the" - isn't there. My theory is that a shadowy order of monks has stolen Dan Brown's definite article, and is guarding it at an ancient Templar priory.

According to Nexis, this appeared on p. 22 of the 11 Feb. 2005 edition, well after Pullum's widely-circulated posts on Dan Brown. In particular, Leith's little joke about how Brown's novel "[grates] from before the first word of the opening sentence" is similar to what Pullum wrote 9 months earlier in "The Dan Brown code":

I am still trying to come up with a fully convincing account of just what it was about his very first sentence, indeed the very first word, that told me instantly that I was in for a very bad time stylistically.

The Da Vinci Code may well be the only novel ever written that begins with the word renowned. Here is the paragraph with which the book opens. The scene (says a dateline under the chapter heading, 'Prologue') is the Louvre, late at night:

Renowned curator Jacques Saunière staggered through the vaulted archway of the museum's Grand Gallery. He lunged for the nearest painting he could see, a Caravaggio. Grabbing the gilded frame, the seventy-six-year-old man heaved the masterpiece toward himself until it tore from the wall and Saunière collapsed backward in a heap beneath the canvas.

I think what enabled the first word to tip me off that I was about to spend a number of hours in the company of one of the worst prose stylists in the history of literature was this. Putting curriculum vitae details into complex modifiers on proper names or definite descriptions is what you do in journalistic stories about deaths; you just don't do it in describing an event in a narrative.

Leith might have come up with this idea independently, or he might have gotten it from Pullum and thought it didn't rise to journalistic standards of sharing credit, or he might have gotten it from someone who got it from Pullum. This sort of recycling of jokes has always been common, if not entirely sanctioned -- Oscar Wilde: "I wish I had said that." James Whistler: "You will, Oscar, you will."

Leith's note is a small thing, in any case: the fifth of five brief items in an editor's column of miscellanies. And it's credible that Steyn originally got the idea of focusing his review on Brown's the's from this note (though his Maclean's article doesn't credit Leith either). But wherever Steyn first got the idea from, I find it hard to believe, as I wrote above, that he didn't use material from Pullum's post "Renowned author Dan Brown staggered through his formulaic opening sentence", and perhaps other posts, in writing his Maclean's review. And the apparent scale of copying in that case, at least in my opinion, rises to the level where an acknowledgment (or after the fact, an apology) would be appropriate. ]

Posted by Mark Liberman at 12:02 AM

May 16, 2006

Recycling grammatical terminology

Christopher Hitchens' latest fighting words column for Slate ("Don't Talk to the Mullahs", 5/15/2006) directs a few desultory insults towards his recent virtual debating partner Juan Cole, while describing Mahmoud Ahmadinejad's letter to President Bush:

It then turns to a pedantic discussion of the wrongness of the whole existence of the state of Israel, which might have been designed to make professor Juan Cole (who thinks that Khomeinist anti-Zionism is a derivation from Persian poetry) look like a fool and an ignoramus.

Though these insults are pedestrian, by Hitchens' standards, the column did feature a notable act of lexicographic creativity.

His innovation occurs in the last sentence of the paragraph quoted below:

The man is as mad as a hatter, therefore, and makes up for his impotence and insanity with many ingratiating assurances about Jesus and his honored place in the Quran and many lachrymose remarks about violations of human rights. He declares that his regime's nuclear program is a matter of "scientific R&D," and he ends with a salutation in Arabic which is given without translation in the news-agency versions that have been made available. The salutation reads, "Vasalam Ala Man Ataba'al hoda." This is a customary signoff by devout clerics, in Iran as well as in Arab lands, and can be approximately translated as "peace unto those who follow the true path." It was a favorite of the late Ayatollah Khomeini's. According to some, it was used as a silkily threatening mode of address by the Prophet Mohammed, who employed it when addressing neighboring states that had not yet converted to Islam. In this declension, it could be interpreted to imply war unto those who did not choose to follow the true path. [emphasis added]

As the AHD explains, declension can mean things like

2. A descending slope; a descent. 3. A decline or decrease; deterioration: “States and empires have their periods of declension” (Laurence Sterne). 4. A deviation, as from a standard or practice.

but Hitchens has no descents, declines or deviations in mind. Instead, I believe, he's taking off from what the AHD gives as the first meaning of declension:

1. Linguistics a. In certain languages, the inflection of nouns, pronouns, and adjectives in categories such as case, number, and gender. b. A class of words of one language with the same or a similar system of inflections, such as the first declension in Latin.

However, Hitchens is not talking about a word, but rather about a phrase used as a "customary signoff", and he's not talking about its processes or categories of grammatical inflection. Instead, he obviously means something like "in this interpretation" or "in this construal" or perhaps "in this context of use". This is a plausible extended meaning, since the process of declension suits a noun, pronoun or adjective for different functions in different contexts, just as the Arabic valediction is said to have different implications when addressed to co-religionists and to others. [Whether this interpretation is linguistically or historically correct is beyond my knowledge.]

It would be unwise to claim that no one has ever used the word declension in this way before, but I'm pretty sure that I've never heard or read it.

It's become standard for people to use parse to mean "examine very carefully, especially with respect to possible alternative interpretations". This was originally a metaphorical extension of its basic meaning of "perform grammatical analysis", but now that grammatical analysis has largely vanished from the public curriculum, the metaphor is all that's left in popular discourse. As Hitchens has realized, a long list of other grammatical terms are also now available for re-use.

But there may be a narrow window of opportunity here: this metaphorical recycling can only happen as long as a fair fraction of the population can access, at least as a vague resonance, the literal meaning of grammatical terms like conjugation, inflection, mood, tense, affix, verb, predicate and the like. Unless there's a renaissance of linguistic analysis in primary and secondary education, this won't be true for long.

[Hat tip to Lane Greene]

Posted by Mark Liberman at 05:52 PM

Ambiguously numbered pronoun

My fascination with all things Apple has me occasionally reading the (likewise, occasionally) funny Crazy Apple Rumors Site. Yesterday's post was mildly funny, with a bit of corporate social commentary thrown in for good measure. What I found most interesting, though, was a very nice example of them being deliberately ambiguous between singular and plural. (Emphasis added.)

Crazy Apple Rumors Site has learned that former Apple General Counsel Nancy Heinen was released from the firm after failing to produce a pair of testicles. [...]

"We knew, of course, that Nancy was a woman," Jobs said. "But she long assured us that she had a pair of testicles that she kept in a safety deposit box somewhere." [...]

Late in April, Heinen reportedly stalled for time by saying that she had "loaned out the testicles to a friend who had forgotten to return them and then went on vacation and [she] couldn't get a hold of them."

"Them" apparently meaning either the friend or the testicles. Apple's male board members were apparently not impressed as they are usually quickly able to get a hold of their testicles.

Some related posts:

Are we an it or a they? (Geoff Pullum)
Shakespeare used they with singular antecedents so there (Geoff Pullum)
Singular they with known sex (Geoff Pullum)
Singular they and plural he/she/it (Mark Liberman)
Collective nouns with singular verbs and plural pronouns (Mark Liberman)
The SAT fails a grammar test (Mark Liberman)
They are a prophet (Geoff Pullum)
"All lockers must be emptied of its contents." (Mark Liberman)

Less closely related (but still relevantly interesting):

Unchanging pronouns? (Sally Thomason)

Posted by Eric Bakovic at 04:56 PM

Locative Epithets as Names

In all the foofuraw about the barbarism of referring to Leonardo da Vinci as da Vinci, nobody seems to have noticed that referring to people by their locative epithets alone is quite common. Here are some examples:

(Vincent) van Gogh
(Alexis) de Tocqueville
(James) van Allen
(Johannes Diderik) van der Waals

How many of you even knew the given names of the latter two?

Not only is the use of such names common, but I suspect that it is not necessarily the result of ignorance of the structure and usage of such names in their original language. In fact, one finds the same usage in the original languages. Check out the article on Democracy in America in the French language version of Wikipedia and you'll see that it refers to de Tocqueville, or the article on Van Gogh in the Dutch Wikipedia, which refers to Van Gogh.

Posted by Bill Poser at 01:07 PM

Cutting in line: what would Of Nazareth do?

Mark Steyn's review "The Da Vinci Code: bad writing for Biblical illiterates" starts like this:

It's a good rule in this line of work to respect a hit. But golly, The Da Vinci Code makes it hard. At the start of the book, Dan Brown pledges, "All descriptions of artwork, architecture, documents and secret rituals in this novel are accurate." It's everything else that's hokum, beginning with the title, whose false tinkle testifies to Brown's penchant for weirdly inauthentic historicity. Referring to "Leonardo da Vinci" as "da Vinci" is like listing Lawrence of Arabia in the phone book as "Of Arabia, Mr. L," or those computer-generated letters that write to the Duke of Wellington as "Dear Mr. Duke, you may already have won!"

This paragraph is just about the only part of what Steyn has to say about Dan Brown's book that is not strikingly similar to a Language Log post by Geoff Pullum. That's not to say, however, that it's original.

Actually, Geoff Pullum's 1/18/2005 post "The Kaleidoscope of Power" did say of The Da Vinci Code:

Even the title contains a linguistic error, Adam Gopnik claims in this week's issue of The New Yorker. Leonardo came from Vinci. Da Vinci is not a name. It's a prepositional phrase, like of Nazareth in Jesus of Nazareth. What would Of Nazareth do?

But Geoff credits Adam Gopnik. And this witticism is apparently not original to Gopnik either. A very similar point is made in a post by Emily in the weblog "it comes in pints?", dated almost a year earlier:

Though it still grinds me a little when people refer to Leonardo Da Vinci as Da Vinci, which is like calling William of Orange Of Orange. [March 18, 2004]

And the "of Nazareth" version is used on a web page that claims to have last been modified in October of 2004:

Also, Brown refers to Leonardo da Vinci, as "Da Vinci" , all through his book. Since "Da Vinci" means literally, "of Vinci", that would be the same as us calling Jesus, "of Nazareth", instead of, Jesus, of Nazareth ! [apparently October 26, 2004 or earlier]

The "da Vinci is like of X" meme has been picked up widely: on September 27, 2005, Jay Nordlinger wrote in the National Review:

Okay, a little language, and a little art. I want to say: Et tu, Antonin? While at the Juilliard School the other day, Justice Scalia referred to "da Vinci" — meaning, Leonardo. I'm surprised at him.

The mistake of referring to Leonardo as "da Vinci" is so entrenched, I'm afraid it's uncorrectable. I have had to fight with editors about this: You say "Leonardo," and they want to say "da Vinci," thinking it's his last name — thinking it's the same as saying "Reynolds." They think that, when you say "Leonardo," you're saying the equivalent of "Joshua." Actually, to say "da Vinci" is to say "of Orange," instead of "William."

Nordlinger quotes from Charles Moore's Spectator Diary "not long ago" (i.e. not long before September 2005):

My colleague Christopher Howse has pointed out that you can tell that The Da Vinci Code is rubbish just by its name. Students of art refer to the man in question as 'Leonardo', 'Da Vinci' being simply the identifier of his town of origin. So Dan Brown's title is the equivalent of a book about Jesus being called Of Nazareth. [That is much better than my "of Orange" example.]

(The passage in blue is Nordlinger's quote from Moore; the remark in square brackets is Nordlinger's comment on that quote.)

Depending on when Moore's piece actually ran (I'm not willing to pay 50 pounds for a subscription to The Spectator in order to find out), Moore's colleague Howse may have misled him about the source of the "Of Nazareth" joke by failing to cite Gopnik. But this doesn't matter much, in my opinion -- it seems likely that some form of this insight has been commonplace among intellectuals for a while. It certainly pre-dates Gopnik, and it wouldn't surprise me to find a similar remark -- some re-phrasing of <referring to Leonardo as 'da Vinci' is like referring to X of Y as 'of Y'> -- dating from many years before.

However, it's worth noting that Pullum and Nordlinger both take the trouble to make it clear that the remark is not original to them. Steyn, in contrast, does not. As a result, quite a few bloggers have credited him with brilliance for trotting out his version of this well-worn witticism. For example:

Who looks at the world in quite the way Mark Steyn does? Here he's putting The Da Vinci Code into perspective:

It's everything else that's hokum, beginning with the title, whose false tinkle testifies to Brown's penchant for weirdly inauthentic historicity. Referring to "Leonardo da Vinci" as "da Vinci" is like listing Lawrence of Arabia in the phone book as "Of Arabia, Mr. L," or those computer-generated letters that write to the Duke of Wellington as "Dear Mr. Duke, you may already have won!"

Another one:

Steyn wins in a knockout. It wasn't really even a fair fight:

It's a good rule in this line of work to respect a hit. But golly, The Da Vinci Code makes it hard. At the start of the book, Dan Brown pledges, "All descriptions of artwork, architecture, documents and secret rituals in this novel are accurate." It's everything else that's hokum, beginning with the title, whose false tinkle testifies to Brown's penchant for weirdly inauthentic historicity. Referring to "Leonardo da Vinci" as "da Vinci" is like listing Lawrence of Arabia in the phone book as "Of Arabia, Mr. L," or those computer-generated letters that write to the Duke of Wellington as "Dear Mr. Duke, you may already have won!"

And another:

Quote of the day - Mark Steyn (the Da Vinci Code)

Referring to "Leonardo da Vinci" as "da Vinci" is like listing Lawrence of Arabia in the phone book as "Of Arabia, Mr. L," or those computer-generated letters that write to the Duke of Wellington as "Dear Mr. Duke, you may already have won!"

Or again:

A Steyn-slap to the Da Vinci Code. Nobody does it better. [emphasis added]

But in fact another writer did it at least as well, and in almost exactly the same way, and earlier. This sort of reputational mis-attribution is just what Jonathan Baron was writing about in his post "Plagiarism as probabilistic harm":

Plagiarism exists only as part of a system in which people are rewarded for their work. The reward in these cases is primarily reputation, or a personal record. [...]

When people pass off someone else's work as their own, they butt into the queue. This weakens the system and makes it less trustworthy. The harm from such weakening is probabilistic and not always noticed immediately.

Let me say at this point that I've been a fan of Mark Steyn's writing for some time. He's opinionated, clear, memorable and often very witty. For example, read the opening of his May 7, 2006 column "Moussaoui gets life, the terrorists win":

"America, you lose," said Zacarias Moussaoui as he was led away from the court last week.

Hard to disagree. Not just because he'll be living a long life at taxpayers' expense. He'd have had a good stretch of that even if he'd been "sentenced to death," which in America means you now spend more years sitting on Death Row exhausting your appeals than the average "life" sentence in Europe. America "lost" for a more basic reason: turning a war into a court case and upgrading the enemy to a defendant ensures you pretty much lose however it turns out. And the notion, peddled by some sappy member of the ghastly 9/11 Commission on one of the cable yakfests last week, that jihadists around the world are marveling at the fairness of the U.S. justice system, is preposterous. The leisurely legal process Moussaoui enjoyed lasted longer than America's participation in the Second World War. Around the world, everybody's enjoying a grand old laugh at the U.S. justice system.

Except for Saddam Hussein, who must be regretting he fell into the hands of the Iraqi justice system. Nine out of 12 U.S. jurors agreed that the "emotional abuse" Moussaoui suffered as a child should be a mitigating factor. Saddam could claim the same but his jury isn't operating to the legal principles of the Oprahfonic Code.

Whether or not you agree with Steyn's sentiments, you have to admit that this is brilliant writing. "Upgrading the enemy to a defendant" and "the Oprahfonic Code" are particularly nice touches, in my opinion. And as far as Google knows, these phrases are original to Steyn, as is the observation that Moussaoui's trial lasted longer than America's engagement in WWII. I bet that all the rest of it is original Steyn too, though I haven't checked.

My own experience has been that the students who engage in the more subtle forms of plagiarism are often among the smartest and most verbally adept. Something similar seems to be true of Stephen Ambrose, Doris Kearns Goodwin, Kaavya Viswanathan and the like. Such people are not borrowing in order to make up for their inadequacies, it seems, but in order to help establish or maintain a reputation that they have every reason to believe that they deserve.

[By the way, Steyn could have found a better replacement for his Duke of Wellington example (also a variant of an old joke) by looking in Geoff's August 26, 2004 post "A letter from the Lord Quirk". And if he'd cited it as Pullum cited Gopnik or as Nordlinger cited Moore, he'd have been welcome to it.]

Posted by Mark Liberman at 12:08 AM

May 15, 2006

Strange ifs of the third kind

In a recent New Yorker cartoon by Alex Gregory [April 17, 2006, page 74], a hospital patient, lying strapped down on the operating table and ready to be anaesthetized for an operation, looks up earnestly at the masked surgeon and says:

"You know, doctor, right now I'd really prefer if your sense of humor were a tad less self-deprecating."

The joke about what patients would think of surgeons' traditional operating-room humor is good, and made me smile; but the use of if in the caption really caught my attention. It's not the conditional if, you see — the one that you get in You can get it if you really want. And it's not the interrogative subordinator — the one that has exactly the same function as whether, as in I don't know if I'm coming or going. It's a very interesting if indeed. A third kind of if that almost no grammarians have written about. Let me explain.

The subordinator if that introduces interrogative content clauses is easy to spot: you just replace it with whether and make sure the result is grammatical and has the same meaning. Prefer doesn't take interrogative content clauses: *I'd prefer whether your sense of humor were a tad less self-deprecating is obviously completely ungrammatical. So we can forget that. The cartoon does not have the interrogative subordinator if. What I have to do now is to convince you that we can tell the conditional if from the strange new third kind of if I am claiming English has. And I think I can do that.

My claim is that the patient in the cartoon is using if as a subordinator to introduce a declarative irrealis mood content clause, and that this is one of the grammatical possibilities with the matrix verb prefer.

Here's how to tell that it's not the conditional. Conditional if phrases are adjuncts, and you can always put an adjunct at the front of the clause it belongs to if you want. So we have pairs like this:

(1) a.	You can get it if you really want.
b.	If you really want, you can get it.
(2) a.	I would die of embarrassment if that happened to me.
b.	If that happened to me I would die of embarrassment.

In each case, because the (a) example is grammatical, so is the (b) example. Now look at what happens with the cartoon example (which I shorten by trimming the irrelevant stuff at the beginning):

(3) a.	I'd prefer if your sense of humor were a tad less self-deprecating.
b.	*If your sense of humor were a tad less self-deprecating, I'd prefer.

The version with the if clause at the front is ungrammatical! The situation is exactly comparable to this one:

(4) a.	I'd prefer for your sense of humor to be a tad less self-deprecating.
b.	*For your sense of humor to be a tad less self-deprecating, I'd prefer.

What's going on here is that you can always front an adjunct, but in general, it is much less likely that you will be able to front a complement clause. The sentence (3a) is comparable to the sentence in (4a) in every way, in fact: they mean just about the same, and I'm saying they have just about the same structure.
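The two diagnostics used so far (substituting whether, and fronting the if clause) can be written down as a small decision procedure. What follows is only a rough sketch: the function name and its two boolean inputs are invented for illustration, and the grammaticality judgments have to be supplied by a speaker, since nothing here computes them. It also ignores the ambiguous extraposition cases.

```python
def classify_if(whether_ok: bool, fronting_ok: bool) -> str:
    """Classify an occurrence of 'if' from two speaker judgments.

    whether_ok:  replacing 'if' with 'whether' stays grammatical
                 and keeps the same meaning.
    fronting_ok: the 'if' clause can be moved to the front of the
                 clause it belongs to and the result is grammatical.
    (Both names are hypothetical; the judgments are human inputs.)
    """
    if whether_ok:
        # Interchangeable with 'whether': interrogative subordinator.
        return "interrogative"
    if fronting_ok:
        # Frontable, so it behaves like an adjunct: conditional 'if'.
        return "conditional"
    # Neither test passes: a complement clause that is not interrogative.
    return "third kind"


# Judgments taken from the examples in the text.
examples = {
    "I don't know if I'm coming or going": classify_if(True, False),
    "You can get it if you really want": classify_if(False, True),
    "I'd prefer if your sense of humor were ...": classify_if(False, False),
}

for sentence, kind in examples.items():
    print(f"{kind:14} {sentence}")
```

The order of the checks mirrors the order of the argument: the whether test is applied first, and the fronting test then separates the conditional from whatever is left.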

One difference between clauses introduced by if and clauses introduced by for is that for introduces an infinitival clause (it has the verb in the plain form and a to at the beginning of the verb phrase), but if introduces a finite clause in which the verb is in the special irrealis form if the verb has one.

The inflectional system of English is much less complex than it was a thousand years ago, and today there is only one verb that has an irrealis form that is (in spelling and pronunciation) different from its preterite. That verb is be. And even for be, there are only two contexts in which you can tell the irrealis from the preterite: the first person singular (if I were you) and the third person singular (if he were really smart). In all other person/number combinations, were is the preterite form too, but in these two the preterite would be was, so we can see whether the irrealis is being used. And by a stroke of good luck, that was the verb the cartoonist chose as the main verb of the complement clause! If it hadn't been — say, if he had chosen to have the patient say, "I'd really prefer if you stopped making jokes" — I wouldn't have been able to show you that it really is a special irrealis mood form in there, so there are special syntactic properties of this construction.
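The point about be can be checked mechanically by laying out its preterite paradigm next to the single irrealis form. This is a minimal sketch, assuming the standard present-day paradigm; the table layout and the names in it are made up for illustration.

```python
# Preterite forms of "be", keyed by (person, number).
# Assumed standard paradigm, written out for illustration only.
PRETERITE = {
    ("1st", "sg"): "was",
    ("2nd", "sg"): "were",
    ("3rd", "sg"): "was",
    ("1st", "pl"): "were",
    ("2nd", "pl"): "were",
    ("3rd", "pl"): "were",
}

# The irrealis form of "be" is the same in every person and number.
IRREALIS = "were"


def irrealis_is_visible(person: str, number: str) -> bool:
    """True when irrealis and preterite differ, so the mood can be seen."""
    return PRETERITE[(person, number)] != IRREALIS


visible = [slot for slot in PRETERITE if irrealis_is_visible(*slot)]
print(visible)
```

Only the first and third person singular come out as visible, which is why a caption with your sense of humor (third person singular) as the subject is such a convenient example.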

One other thing. Like other clauses, these if-introduced clauses are often found in what we call extraposition, with a meaningless it occupying the subject or object slot instead of the clause occupying it: we get It would be silly if you gave up now, and I'd appreciate it if you'd take your hand off my leg. But in these cases, since sentences like It would be silly and I'd appreciate it are grammatical in their own right (with a meaningful it that refers to something), it is not so easy to tell that you are looking at content clauses. There is a separate reading of these sentences under which the it is meaningful and the if introduces a conditional adjunct. That is, the sentences are ambiguous. What is so nice about the cartoon example quoted above is that it doesn't have an it object. And conveniently for my purposes, *I'd prefer is ungrammatical on its own. That tells us conclusively that we have a declarative irrealis content clause in our cartoon caption example. Q.E.D.

To summarize, before I dismiss the class: there are at least three items spelled if. Two of those three are described in every grammar book, but the third has virtually never been described anywhere. This latter one, this strange if of the third kind, introduces declarative content clauses in the irrealis mood. And let me make it very clear: there will be a test on this material, and it will be on the final.

Posted by Geoffrey K. Pullum at 11:43 PM

Linguistics fails again

In the May 13, 2006 New York Times, there's an article by Diana Jean Schemo about the controversy at Gallaudet University over the selection of Jane K. Fernandes as president, "Protests Continue at University for Deaf". The article is an interesting account of an unusual situation, as you'd expect from one of the paper's main higher-education reporters. However, it contains one very odd sentence:

Deaf students here said that American Sign Language, which uses gestures to express words and ideas rather than specific letters, was easier for them to understand than other forms of communication that may translate letters and syntax that they have never heard and that are more difficult to grasp.

I think that this may be a reference to the difference between ASL and "finger spelling", in which the letters of written English are spelled out with a series of hand-shapes. However, the article continues:

Erin Moran, who is studying for a master's in counseling and was handing out fliers opposing Dr. Fernandes, criticized her for not banning students from speaking in front of deaf students, instead of using only American Sign Language. When that happens, Ms. Moran said, deaf students feel shut out at an institution that should help strengthen their identity as deaf people with a right to participate fully in the world.

This makes it seem as if Schemo is contrasting sign language with spoken language, implying that speech "uses ... specific letters" "to express words and ideas". Perhaps this is Schemo's confusion, or perhaps it was introduced by an editor working to shorten a longer account of the linguistic issues involved. In either case, it's another example of the common confusion between languages and their writing systems, and another casual journalistic mis-description of speech and language.

Confusions about the nature of orthography and its relationship to language are most evident in discussions of Chinese, but there are plenty of examples within the boundaries of English. For another case involving a smart and well-educated person writing in a major American publication, consider Leon Wieseltier's account of his g-dropping choices in his Sopranos role as "Stewart Silverman". It's clear enough what pronunciation options Wieseltier was getting at, though his description was completely inaccurate; in contrast, it's not at all clear what Schemo meant to tell us about the nature of ASL.

As I wrote about another casually botched linguistic description in the popular press, I blame the linguists. Modern intellectuals are almost entirely bereft of resources for talking about the simplest facts of pronunciation, sentence structure, and meaning. This isn't their fault -- in most cases, no one has ever taught them anything about these topics. My profession has failed in its most basic duty to society.

Posted by Mark Liberman at 06:58 PM

A grander Chinglish

Email from Victor Mair:

From the back of a package full of fancy mushrooms and seaweed products (I boiled a decoction of them for Li-ching last night):

It is made from the exclusive remote mountains or non-polluted maritime space. It is the masterwork of the curiosity selected by our professional. This series use the classy material, fastened on the edible value and health care. It is the pure natural health care. It is according to the pursue of modern to return to the nature and green life. It is always the regale in basilica and the best choice of presenting a gift to friends.

The first clause of the last sentence is particularly precious. I have no idea what it means.

Breathless,

Victor

Posted by Mark Liberman at 04:19 PM

Is Mark Steyn guilty of plagiarism?

I described the facts in an earlier post. It seems clear to me that in Steyn's 550-word discussion of Dan Brown's style, he took the terminology, most of the basic ideas, all of his three examples (in order), a couple of turns of phrase, and his punch line from one of Geoff Pullum's Language Log posts. He credits Pullum by name (though he gives no link or any other sort of source citation) for the term "anarthrous occupational nominal premodifier", but not for the rest of his borrowings. I promised to give my opinion later on, and this is a first installment.

When an undergraduate in one of my classes turns in a paper with a similar amount of uncredited copying, I ask him or her to come see me. We're not talking about something copied wholesale from a published paper or from an internet paper mill -- that would simply get a grade of zero and a referral to the Office of Student Conduct for further action. We're talking about a case where some basic ideas, a series of quotations or examples, and some key turns of phrase are taken from another source without explicit credit.

After laying out the facts, I'd ask for a response, which at first is usually a denial or an excuse. One of the commonest excuses is "But that source is in my bibliography/footnotes!" In that case, I would explain how Stephen Ambrose was accused of plagiarizing Thomas Childers, despite the fact that Ambrose gives Childers "a mention in the bibliography and four footnotes" (according to Fred Barnes' Daily Standard article of January 1, 2002, "Stephen Ambrose, Copycat" which also gives several examples of the copying involved). I'd also show them the coverage of the case where Doris Kearns Goodwin was accused of plagiarizing Lynne McTaggart, starting with what Timothy Noah wrote in a Slate article from 1/22/2002 headlined "Doris Kearns Goodwin, liar: First she plagiarized, then she lied about it":

Did Doris Kearns Goodwin commit plagiarism? "Absolutely not," she tells Boston Globe reporter Thomas C. Palmer Jr. "There were extensive footnotes.'' Chatterbox has had it with brand-name historians who pretend that the rules allow you to steal someone else's sentences (for examples of Goodwin's theft, click here) provided that you supply a footnote. This is not a gray area.

And I'd urge them to read the rest of the Ambrose/Goodwin coverage, in the hopes of persuading them that more is at stake than just their grade in one undergraduate class. These days, I might take this hypothetical student through the sad tales of Kaavya Viswanathan and William H. Swanson as well, to drive home the lesson about the consequences of plagiarism (though those were cases where no reference of any kind was given, even an inadequate one).

Since students often remain convinced that what they did was OK, since they rearranged words a bit or paraphrased some of the material rather than quoting it, I give them a copy of the special report from The Chronicle of Higher Education, posted 12/17/2004, "What is plagiarism?" I might ask them to read these two paragraphs out loud:

Outright copying of someone else's writing is only the most clear-cut form of plagiarism. The Modern Language Association provides a succinct but sweeping catalog of varieties of plagiarism in its MLA Handbook for Writers of Research Papers: "A writer who fails to give appropriate acknowledgment when repeating another's wording or particularly apt term, paraphrasing another's argument, or presenting another's line of thinking is guilty of plagiarism."

The term "plagiarism" applies to "the imitation of structure, research, and organization," notes Laurie Stearns, a copyright lawyer in "Copy Wrong: Plagiarism, Process, Property, and the Law," an essay appearing in the California Law Review in 1992. "Even facts or quotations can be plagiarized," writes Ms. Stearns, "through the trick of citing to a quotation from a primary source rather than to the secondary source in which the plagiarist found it in order to conceal reliance on the secondary source." In the sciences, "accusations of plagiarism may center on the content of discoveries or the interpretation of data rather than on specific phraseology."

I also try to make sure that the hypothetical student understands that plagiarism is not at all the same thing as copyright violation. As the Chronicle article explains:

If Smith copies a chapter from a book by Jones without permission, then the rights of the copyright holder have been violated. But suppose Smith paraphrases the chapter, argument by argument. In that case, Smith will have copied the ideas, but not the expression, of a copyrighted work. If no credit is given, then Jones has every reason to complain about being plagiarized. Still, assuming that Smith has been careful not to borrow any of the language of the original, it will not be an infringement of copyright.

In his essay "Plagiarism, Norms, and the Limits of Theft Law: Some Observations on the Use of Criminal Sanctions in Enforcing Intellectual Property Rights," appearing in the Hastings Law Review in 2002, Stuart P. Green, a professor of law at Louisiana State University at Baton Rouge, writes that copyright law "protects a primarily economic interest that a copyright holder has in her work ... whereas the rule against plagiarism protects a personal, or moral, interest."

I might also try to engage their moral sense with a discussion of why plagiarism is ethically wrong, based on ideas like those that Jonathan Baron lays out in his blog post "Plagiarism as probabilistic harm". I try to explain that I don't think that they're an evil person, but what they did was wrong. In order to try to engage their sense of self-preservation, I underline (as I did at greater length here) that plagiarism in academic and journalistic writing is one of those sins against the social order that our culture often takes seriously, like murder, rather than one of those that it usually excuses, like extramarital sex.

After all of this discussion, what happens next depends on how the student reacts. Usually we have a tense but friendly discussion, at the end of which they agree to do the paper over again. If their first and last reaction were instead to be "I did nothing wrong -- see you in court!", I'd refer the case to my university's Office of Student Conduct and let them sort it out. I'm happy to say that this has never happened to me.

Unfortunately, this is roughly what happened to Geoff Pullum in the case under discussion. As I understand it, the sequence of actions and reactions was something like this. First, a Language Log reader emailed Geoff to tell him that he was mentioned by name in a Steyn piece (no reference given). Geoff googled Steyn and found the Steyn Online web site. He was expecting to see just some passing mention in a piece about something else, but found that Steyn's review of The Da Vinci Code seemed to be developed entirely out of his ideas. Thinking initially that it was merely a piece on the web site, Geoff wrote to Steyn and asked him if he could modify it with links to credit the Language Log pieces that had influenced it. A short time later, after learning that links were now out of the question because the piece was in final form and had already appeared in print in Maclean's (the link from Steyn Online actually pointed to the Maclean's web site), and having had no immediate reply, Geoff wrote again to ask for an acknowledgment and some public attempt to clarify the source of the ideas and examples. At no time did he mention legal action, copyright, or courts, because it was always clear to him (as it is to me) that this is not a matter to which copyright law could possibly be applied.

Steyn's assistant responded (and I paraphrase rather than quoting here) <<Steyn did nothing wrong -- see you in court if you dare to take this further. >>

Mark Steyn is, of course, not a student. So given his attitude, I think it's appropriate to refer his case to the court of public opinion. Make of it what you will.

Posted by Mark Liberman at 11:02 AM

Pulling (to) within: the paper trail

Last week I wrote about the peculiar sports expression "pull (to) within N" meaning 'narrow a differential of points, runs, etc. to exactly N' — and the even more peculiar construction "pull (to) within X-Y," where X-Y represents the score of two teams in a game. I traced the turn of phrase back to the language of boat-racing and horse-racing, which feature competitors shifting positions along a continuous course. When the spatial metaphor was borrowed into team sports like baseball and basketball, where scoring is discontinuous (expressed in whole numbers), the prepositional phrase "within N" came to be understood as 'previously more than N behind, now exactly N behind.' I sketched out a vague chronology for this shift, suggesting that by the mid-20th century it was common for sports commentators to talk about "pulling (to) within N points, runs, etc.," and that the newer version with a game score as the object of "within" was common by the 1970s. I based that chronology on a cursory glance through digitized newspaper databases, finding plenty of examples for the first sense c. 1950 and plenty for the second sense c. 1975.

My imprecision and lack of citational support did not impress Dr. Metablog, who had previously griped about the sportscaster usage of "within." Dr. M wrote:

I can't say that my memory supports his version of history. I've been listening to basketball games on the radio since just after World War II, when the NBA came into existence, and I'm moderately sure that I didn't hear the idiom "within two points" until the 1980s at the earliest. It was an unpleasant innovation in language that stuck painfully in my ear. To the best of my recollection, "within" was an invention of Marv Albert -- one of his extremely limited repertoire of linguistic tics. Others: "from downtown," "served up a facial," " yesss," "a spec-tac-ular move."

Since I was remiss in providing actual citations the first time around, I'll do so below. As it happens, Dr. M has fallen prey to the dastardly Recency Illusion, as these semantic shifts emerged much earlier than I had initially estimated.

First, it's important to recognize that the sporting version of "within" has been percolating for quite a long time independent of its usage in the longer phrase "pull (to) within." For instance, during the 1880 baseball season (back when the National League was the only game in town), the Chicago White Stockings opened up a big lead over the other teams in the standings. At the time, the league championship was awarded to whichever team had won the most games at the end of the season. As the season wound down, the Chicago Tribune kept track of how close the White Stockings were to clinching the championship (a calculation that baseball buffs would later call the "magic number"). On Sep. 10, 1880, the Tribune reported that the White Stockings were "now within four games of the championship goal," meaning that they could lock up the championship with four more wins. Two days later, the Tribune continued the countdown with the headline, "The Chicago team now within two games of the championship." Clearly, "within two games" could not be construed as "less than two games away from," since the article explains that two was the exact number of wins that would guarantee Chicago the National League pennant.

Now what about the full expression with "pull"? By the early years of the 20th century, one variant form of the phrase had already become popular with sportswriters: "pull up to within N points, runs, etc." (This echoed less quantifiable expressions of the time like "pull up to within striking/hailing/speaking distance.") I found that "pull up to within N" was most common in baseball, with cites back to 1891 if not earlier. But by the turn of the century it could also be applied to a variety of other sports, such as bowling, tennis, and the burgeoning pastime of basketball:

Atlanta Constitution, Sep. 5, 1891, p. 6
For five innings the contest was interesting. Then the visitors pulled up to within one of tieing the score, and it became still more so.

Boston Daily Globe, Jan. 12, 1900, p. 4
In the seventh frame the visitors had pulled up to within 37 pins.

Washington Post, Aug. 24, 1900, p. 8 (headline)
Rally fell a run short. Phillies pulled up to within one run of Giants last time at bat.

New York Times, June 9, 1901, p. 20
Miss Jones made a splendid effort when the score stood at 5-3 against her, pulling up to within a single point of the set only to lose it finally after a long struggle by 10-8.

Trenton (N.J.) Times, Oct. 28, 1901, p. 7
During the last ten minutes Millville pulled up to within seven points of Trenton.

The version with "pull up to within" was eventually overtaken by the shorter variants "pull to within" or simply "pull within." Here's a selection of cites from 1924 and 1925 (as above, drawn from ProQuest and Newspaperarchive) showing that these variants were already becoming firmly entrenched by that time:

Danville (Va.) Bee, May 14, 1924, p. 11
McGraw's worried outfit lost their fourth consecutive game to St. Louis yesterday, 8 to 3 and the Cubs pulled to within half a game of second place by defeating Brooklyn, 3 to 1.

Washington Post, June 9, 1924, p. S2
The Cubs, by winning two games from New York, pulled to within one game of the Giants.

Olean (N.Y.) Evening Herald, July 11, 1924, p. 10
Cleveland pulled to within a game of St. Louis by squeezing a 4 to 3 win out of the stubborn Athletics.

Appleton (Wisc.) Post-Crescent, Sep. 26, 1924, p. 17
The Pirates, however, went down with colors flying in the ninth inning when with two out they rallied and pulled to within a run of the champions on Carey's home run drive.

New York Times, Dec. 21, 1924, p. S2
At one time in the first half Franklin and Marshall had pulled to within one point of the Hoboken team, the score standing at 7 to 6.

Washington Post, Feb. 21, 1925, p. S1
With Ryan and McNaney playing in place of Farley and Gitlitz, Bucknell pulled to within 8 points of the Hilltoppers.

Decatur (Ill.) Daily Review, Jan. 18, 1925, p. 9
Taylorville led all the way until the last quarter when the Y staged a rally and pulled within one point of the visitors.

Washington Post, Mar. 23, 1925, p. S1
The bakery team seemed to find itself at the start of the second period and at one time pulled to within six points of its opponents.

Bridgeport (Conn.) Telegram, Aug. 7, 1925, p. 9 (headline)
Senators pull within game of Athletics by twin win over George Sisler's club.

As with the earlier cites, there is no mistaking that "pull (to) within N" meant 'narrow the gap to exactly N.' This is true even when the expression was used to refer to "games behind" in the standings, despite the fact that such calculations can involve half-games. To take the last citation as an example, on the morning of Aug. 7, 1925 the Washington Senators were exactly one game behind the Philadelphia Athletics in the American League standings, not a half a game. (The nifty website Retrosheet verifies this.)

What about the second sense of "pull (to) within," where a game's score is the object of the preposition? It turns out my previous estimate of the 1970s as the time of its emergence was significantly late. (The Recency Illusion spares no prisoners.) I've found attestations all the way back to 1930, with frequency increasing to a high level by about 1950. Here are some cites taken from Newspaperarchive's database of regional papers, providing a range of sports coverage from local beat reporters to nationally syndicated wire stories:

Decatur (Ill.) Herald, Nov. 26, 1930, p. 7
The Banner Blues were slow getting started, and found themselves trailing 6 to 0 when the first quarter ended, but they pulled to within 11-10 by halftime and took the lead in the third quarter.

Nebraska State Journal, Jan. 13, 1937, p. 11
Then Coach Pop Klein put in his reserves and Hebron pulled to within 19-15 at the half.

Charleston (W. Va.) Daily Mail, Aug. 24, 1937, p. 14
Carbide pulled within 3-4 in the sixth when Ware got on base as his third strike got away from the catcher.

Clearfield (Pa.) Progress, Feb. 19, 1941, p. 3
The Bisons made a fourth-quarter surge, pulling to within a count of 25-28, but couldn't quite make the grade.

Kingston (N.Y.) Daily Freeman, Mar. 20, 1945, p. 11
St. John's pulled to within 14-13 at halftime and went ahead at 15 soon after second half started.

Joplin (Mo.) Globe, Feb. 28, 1946, p. 12
Greenfield pulled to within a 27-30 shade starting the last period and kept on the heels of the eventual winners all the way.

Post Standard (Syracuse, N.Y.), Feb. 10, 1947, p. 11
Kentucky had pulled to within 49 to 47.

Traverse City (Mich.) Record-Eagle, Dec. 13, 1947, p. 6
The Vikings, paced by their big center Johns, found the range in the third period and pulled to within a 23-16 count.

The vast majority of the "pull (to) within a score" citations from the 1930s onwards come from coverage of basketball, which was gaining quickly in regional popularity. It's not too surprising that basketball reporters would be the ones to develop this new usage, since the sport involves a great deal more game-score fluidity than relatively low-scoring sports like baseball, hockey and even football.

Finally, I wondered about the application of the "pulling (to) within" idiom to the political calculus of voting. Though I still haven't found much of a parallel to the "pull within a score" expression in the world of politics, citations for "pull (to) within N votes" are easy to spot by the 1940s:

Zanesville (Ohio) Signal, May 10, 1944, p. 1
Edging steadily upward, Atty. Gen. Thomas J. Herbert pulled to within 8,225 votes of James Garfield Stewart today in a stretch finish for the Republican nomination for governor.

Charleston (W. Va.) Gazette, June 25, 1948, p. 1
On the second [ballot] he [sc. Thomas E. Dewey] raided opposition camps, lassoed stray votes from delegation after delegation and pulled to within 33 votes of the glittering goal of 548.

The precise nature of these vote tallies suggests once again that "within" is used to denote an exact differential, though perhaps the 1944 example rounds up to the nearest 25. But if we return to the older variant of "pull up (to) within N," we can find citations in electoral contexts going all the way back to the late 19th century:

Bucks County (Pa.) Gazette, Sep. 7, 1882, p. 2
The sixty-seventh ballot put Evans 3 1/2 ahead of Weand, and on the sixty-eighth Weand pulled up within 1/2, he having 35 1/2, Evans 36, Thropp 29 1/2, Godshalk 15, and Bean 2.

Fitchburg (Mass.) Daily Sentinel, Oct. 27, 1883, p. 3
G. H. Kellogg said that Mr. Thayer was nominated a few years ago and pulled up within 300 of an election.

Frederick (Md.) News, Nov. 7, 1890, p. 3
The result of the canvas, however, so impressed itself upon the public mind that last year Mr. Russell again made the race and pulled up within 6,775 votes of Governor Brackett.

Again, it's possible that "within 300" rounds up the differential to the nearest 100 and "within 6,775" to the nearest 25. The citation from 1882, however, leaves no room for ambiguity: the difference between 36 and 35 1/2 is exactly one half. So it looks like this sense of "within" had already made the jump from sports to politics about 125 years ago. Marv Albert, you've been well and truly exonerated. ("Yesss!")
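For anyone who wants to recheck the arithmetic in these electoral cites, here is a minimal sketch in Python. The figures are the ones quoted above; the variable names and the choice of rounding steps (25 and 100) are just my illustrative assumptions. Note that a total being a multiple of 25 or 100 is only consistent with rounding; it doesn't demonstrate it.

```python
from fractions import Fraction

# Sixty-eighth ballot, Bucks County Gazette, 1882 (figures as quoted above)
weand = Fraction(35) + Fraction(1, 2)
evans = Fraction(36)
gap = evans - weand
print("Weand trails Evans by", gap)

# Vote margins that might have been rounded, paired with the step
# they would have been rounded to (the steps are my guess, not the papers').
possibly_rounded = [(8225, 25), (6775, 25), (300, 100)]
multiples = [margin % step == 0 for margin, step in possibly_rounded]
for (margin, step), is_multiple in zip(possibly_rounded, multiples):
    print(margin, "is a multiple of", step, ":", is_multiple)
```

The half-vote gap comes out exact, which is the point of the 1882 cite: there is no rounding that could turn "within 1/2" into "less than 1/2."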

Posted by Benjamin Zimmer at 05:59 AM

Some striking similarities

In 2004 and 2005, Geoff Pullum wrote a few Language Log posts about Dan Brown's style. I think that they're among the funniest bits of stylistic criticism since Mark Twain took on "Fenimore Cooper's Literary Offenses", and I'm not alone in being impressed. Two of these posts ("The Dan Brown code", May 1, 2004; and "Renowned author Dan Brown staggered through his formulaic opening sentence", November 7, 2004) are still generally among our top ten pages, and "The Dan Brown code" has been the third or fourth Google hit for {Dan Brown} for some time. As a result of those posts, Geoff was invited to contribute to a collection called "Secrets of Angels and Demons", published in December of 2004.

Recently, Mark Steyn contributed a book review to the Canadian weekly magazine Maclean's, "The Da Vinci Code: bad writing for biblical illiterates". The online copy is dated May 10, 2006, and in paper form, the material appears on p. 54 of the May 15, 2006 issue. Steyn's piece is about 1300 words long. The first 550 words or so are about Dan Brown's writing; the rest is about the Gospel of Judas. If you read the first portion of Steyn's review along with the two Language Log posts that I've cited (here and here), I believe that you'll notice some striking similarities.

In this post, I'm going to limit myself to pointing out some of these similarities. I'll explain later what I think they mean. [Some opinions are now available here, here, and here.]

The first thing to observe is that Steyn cites Pullum:

The linguist Geoffrey Pullum -- or linguist Geoffrey Pullum, as novelist Dan Brown would say -- identifies this as the anarthrous occupational nominal premodifier, to which renowned novelist Dan Brown is unusually partial.

The reference is to Brown's habit of starting books with phrases like "renowned curator Jacques Saunière", "physicist Leonardo Vetra", and "geologist Charles Brophy". Roughly 400 of Steyn's 550 words on Dan Brown are focused on this intrusion of journalistic style into Brown's novels. Steyn's wording suggests to me that he is giving Geoff credit for the grammatical terminology, not for the stylistic observations or the selection of examples. The reference to Pullum comes after two paragraphs describing two anarthrous examples of Brown's style (out of the three that Steyn quotes), which are presented as if the stylistic observation were Steyn's original reaction as a reader:

So I didn't like the title and then I began reading the book. In the beginning was the word, and Mr. Brown's very first one seems to have gone missing:

"Renowned curator Jacques Saunière staggered through the vaulted archway of the museum's Grand Gallery."

And after that I found it hard to stagger on myself. Shouldn't it be "The renowned curator"? What happened to the definite article? Did Mr. Brown choose to leave it off in order to affect an urgent investigative journalistic style? No, it's just the way he writes. Here's the first sentence of Angels & Demons:

"Physicist Leonardo Vetra smelled burning flesh, and he knew it was his own."

The key joke in Pullum's two cited posts was the observation that this phrasing (which Pullum calls "an occupational term is used with no determiner as a bare role NP premodifier of a proper name") is characteristic of journalism and never normally used in fiction, and that Dan Brown nevertheless uses it to start several of his novels. In his Language Log post "Renowned author Dan Brown staggered through his formulaic opening sentence", Pullum illustrates this point by discussing, in order, three quotes from Brown:

1. Renowned curator Jacques Saunière staggered through the vaulted archway of the museum's Grand Gallery.
2. Physicist Leonardo Vetra smelled burning flesh, and he knew it was his own.
3. Death, in this forsaken place, could come in countless forms. Geologist Charles Brophy had endured the savage splendor of this terrain for years, and yet nothing could prepare him for a fate as barbarous and unnatural as the one about to befall him.

Steyn's 400-word discussion of Brown's anarthrous style is also structured around the discussion of these three quotes. He elides the last phrase of the last quote, but otherwise he gives the same quotes in the same order, supplying no other examples:

1. "Renowned curator Jacques Saunière staggered through the vaulted archway of the museum's Grand Gallery."
2. "Physicist Leonardo Vetra smelled burning flesh, and he knew it was his own."
3. "Death, in this forsaken place, could come in countless forms. Geologist Charles Brophy had endured the savage splendor of this terrain for years . . ."

Following the last Dan Brown quote, Steyn produces a real zinger of a witticism:

Novelist Dan Brown staggered through the formulaic splendour of his opening sentence.

I'm not the only one who was impressed:

(link) And on a lighter note, I always enjoy Mark Steyn. This is great on the Da Vinci Code: "Novelist Dan Brown staggered through the formulaic splendour of his opening sentence."

Steyn's bon mot was also posted (as the example sentence in a fake Word For The Day post for "anarthrous") on May 11 on Free Republic.

Steyn's witticism is strikingly similar to the language of Pullum's post "Renowned author Dan Brown staggered through his formulaic opening sentence", as displayed in the table below.

First column: the first sentence of The Da Vinci Code, by Dan Brown. Words not reproduced in Pullum's two parodies of it are on a blue background.
Second column: the title of Pullum's post, parodying Brown. It is colored lilac where it echoes Brown, and pink where words are replaced by others in the parody.
Third column: Pullum's jokey echoing of his own title near the end of his piece, mingling Saunière's "staggering" (from The Da Vinci Code) with Brophy's "savage splendor" (from Deception Point). It is colored green where it does not exactly echo the title (column 2) and pink where it does.
Fourth column: Mark Steyn's blend of Pullum's two sentences. Yellow signals words that are original with Steyn (there is just one: he replaces "author" with "novelist"); lilac signals words carried through all versions; pink indicates copying from Pullum that is not carried over from Brown.

	Dan Brown's sentence	Pullum's parody	Pullum's repetition	Steyn's blend
1	Renowned	Renowned	Renowned
2	curator	author	linguist	Novelist
3	Jacques	Dan	Geoff	Dan
4	Saunière	Brown	Pullum	Brown
5	staggered	staggered	staggered	staggered
6	through	through	across	through
7	the	his	the	the
8	vaulted	formulaic	savage	formulaic
9	archway		splendor	splendour
10	of		of	of
11	the		the	his
12	museum's	opening	Santa Cruz	opening
13	Grand Gallery	sentence	campus	sentence
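The word-by-word comparison in the table can also be rechecked mechanically. What follows is only a rough sketch in Python, under a few assumptions of mine: the text of Pullum's repetition is reassembled from the third column of the table, the British spelling "splendour" is treated as the same word as "splendor", and the function and variable names are made up for the illustration.

```python
import re

def words(sentence):
    """Lowercased word list, folding British 'splendour' into 'splendor'."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return ["splendor" if t == "splendour" else t for t in tokens]

# Column 2: the title of Pullum's post
pullum_title = ("Renowned author Dan Brown staggered through "
                "his formulaic opening sentence")
# Column 3: reassembled from the table (an assumption on my part)
pullum_repetition = ("Renowned linguist Geoff Pullum staggered across the "
                     "savage splendor of the Santa Cruz campus")
# Column 4: Steyn's sentence as quoted above
steyn = ("Novelist Dan Brown staggered through the formulaic splendour "
         "of his opening sentence")

# Every word that appears in either of Pullum's two sentences
pullum_vocabulary = set(words(pullum_title)) | set(words(pullum_repetition))

# Words in Steyn's sentence that appear in neither of Pullum's
original_to_steyn = [w for w in words(steyn) if w not in pullum_vocabulary]
print(original_to_steyn)
```

This is a bag-of-words check, so it ignores word order and position; it only confirms how many of Steyn's words are found in neither of Pullum's sentences.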

[Seven of Geoff's Dan Brown posts are reprinted in our recent book, but of course they are still available in the Language Log archives, along with some others not yet reprinted:

"The Dan Brown code" (May 1, 2004)
"The sixteen first rules of fiction" (May 15, 2004)
"Dan Brown still moving very briskly about" (November 4, 2004)
"Renowned author Dan Brown staggered through his formulaic opening sentence" (November 7, 2004)
"Oxen, sharks, and insects: we need pictures" (November 8, 2004)
"Thank God for film: Dan Brown without the writing" (December 2, 2004)
"The kaleidoscope of power" (January 18, 2005)
"Learning the ropes in the trenches with Dan Brown" (July 14, 2005)
"Don't look at their eyes!" (July 19, 2005)
"A five-letter password for a man obsessed with Susan" (September 10, 2005)

For the many writers in need of material to deal with the imminent opening of The Da Vinci Code movie, there's a lot of good stuff in there that hasn't been re-used yet. ]

Posted by Mark Liberman at 12:21 AM

May 14, 2006

Think this

I recently mentioned that I had read a VF article about Steve Jobs. This led me to check out a recent book from my local public library: iCon / Steve Jobs: The Greatest Second Act in the History of Business, by Jeffrey S. Young and William L. Simon. On p. 236, I found this interesting little passage:

The new Apple billboards, spare and stunning, with a simple message of "Think Different," sprang up everywhere, even painted on the sides of buildings, announcing a fresh start for the company. They boosted employee morale. It didn't matter that the phrase was gratingly ungrammatical; maybe that was even part of its charm.

The "gratingly ungrammatical" bit here depends on two assumptions:

Different is supposed to be an adverb that modifies the verb think.
Different is an adjective; the appropriate adverb would be differently.

So, according to Young and Simon, the message on the billboard should have been "Think Differently". Hmm.

First, it's not clear that different is supposed to be an adverb in this case. Apple started this ad campaign after Steve Jobs' return in 1997; he had been ousted from Apple some 12 years earlier. Those 12 years saw Apple hit its lowest point, with little in the way of the kind of successful product innovation it's now well-known for. Jobs was always a maverick-type, wanting to do what nobody else dared to do (and sometimes failing at it, of course). Different in this case could thus easily be interpreted as a kind of object of think, as if in answer to the question: "What is the one word you think of when you think of Apple and Steve Jobs?" The answer could be "Innovative", or "Awesome", or "Different" -- hence, Think Different.

Also, distinctions between many adjective/adverb pairs have been slowly but surely eroding in English. Different/differently is among these pairs; the OED lists different as an adjective or an adverb, in the latter case meaning the same thing as differently and with the caveat "Now only in uneducated use." I think the erosion has gone so far that the "educated/uneducated" distinction made in this OED usage point comes close to simply separating pedants from most other folks; thus, the ad campaign benefitted from the slight double meaning: Apple thinks different(ly), and (therefore) Apple is different.

(In anticipation of the rumors that will no doubt begin flying about: no, this is not the adverb that David Beaver and I came to blows over. But if David has any problem with this post, he knows where he can bring it.)


Posted by Eric Bakovic at 06:05 PM

Sighted in the wild

An Escher sentence sent in by Don Blaheta: "Don would know how much more true that is than I do!"

Apparently it's something about the month of May:

We cannot/must not understate/overstate ... ? (5/6/2004)
Escher sentences (5/7/2004)
An Escher sentence in the wild (5/8/2004)
Approximate inference and global (in)coherence (5/9/2004)
High plains construction grammar (5/12/2004)
Escher sentences: prior art (5/15/2004)
What is linguistics, and why do they embarrass your international customers? (5/28/2004)
Things that are rarely better than they normally are (10/17/2005)
Asking Dr. Language Log (5/12/2006)

[Update: Marilyn Martin writes

In the '40s an issue of the Reader's Digest had a short section on what they called "wolf sentences."
Two that I remember are
I feel a lot more like I do now than I did before.
Please permit those who are going out first.

Google and the Proquest historical New York Times database both appear to be ignorant of the phrase "wolf sentence(s)": can anyone supply more information? ]

Posted by Mark Liberman at 09:23 AM

May 13, 2006

Mitsuwhatzit

Last week I looked at mispronunciations and misspellings of the name Mitsubishi, in particular Mitsubushi (the winner in the misspelling bee) and Mitsibushi and Mitsibishi (the runners-up), arguing that the problem presented by Mitsubishi isn't in nativizing the Japanese name, but in remembering and retrieving the name correctly. Then came the mail (it's always like that here at Language Log Plaza): about actual problems in nativizing words (mostly from Japanese), about Japanese car names and their etymologies, about factors that might have helped boost Mitsubushi to the top of the heap, and about still more manglings of Mitsubishi. Here's the digest.

Problems in nativization (not all of which I understand). Bill Poser, who has the cubicle just down the row from mine at LLP, wrote to wonder why English speakers so frequently mispronounce (and sometimes misspell) harakiri (as hari-kari) and karaoke (as karioki).

Part of this -- the [i] instead of [e] at the end of karaoke -- is easy. Final unaccented [e] is at best marginally acceptable in English, and is normally "fixed" by raising it to [i]; this sometimes shows up in spelling, in the variant karaoki. You see raising not just in karaoke, but also in, for instance, karate, in Hare Krishna, and in some borrowings from Italian, like the salami and zucchini (Italian salame and zucchine) that M. I. Amorelli asked about (from Sardinia) on ADS-L back in April.

The second vowel of karaoke -- which is sometimes spelled karioki, in line with its most common pronunciation in English -- is a bit trickier. This unaccented vowel would be expected to come out as a schwa, giving a sequence of vowels that isn't actually unpronounceable in English (it occurs in supraorbital) but is very rare. So possibly that [i] is just a fix in the direction of a better unaccented vowel before [o].

[Added later 5/13/06: Aaron Dinkin writes to remind me that (unaccented) schwa generally gets raised to [i] before a vowel, as in Judaism and the three-syllable version of Israel, so karioki is just what you'd expect. Words like supraorbital and (six-syllable versions of) extraordinary don't show raising, because they're "level 2" morphological combinations (in Kiparsky's terms).]

Harakiri > hari-kari is more puzzling to me. Something like para-teary seems entirely pronounceable to me; it's just an absurd combination of elements. As it happens, NOAD2 gives a straightforward nativization of harakiri as the first pronunciation, but then admits that the rhyming pronunciation -- the one I hear from everybody except pedants and people who actually know something about Japanese -- is also possible. AHD4 goes a step further, and gives hari-kari as an alternative SPELLING for the word. The correct A A I I spelling outnumbers the rhyming version A I A I about 5 to 1 in raw Google webhits, but we're still talking lots of A I A I spellings.

Google also turns up some A I I I spellings (with the second vowel anticipating the two I's -- and [i]'s -- that follow), about one-third as frequent as the A I A I. But this version might have served as an intermediate step from A A I I to A I A I: first anticipation, in A I I I, then an improvement of this into the satisfying rhyming pattern of A I A I.
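The vowel-pattern shorthand I've been using (A A I I, A I A I, and so on) is nothing more than the vowel letters of a spelling read off in order. A minimal sketch in Python, with a helper name (vowel_pattern) that is purely my own invention, and with the caveat that it looks at spelling only and doesn't treat Y as a vowel:

```python
def vowel_pattern(spelling):
    """Return the vowel letters of a spelling in order, e.g. 'A A I I'."""
    return " ".join(ch.upper() for ch in spelling if ch.lower() in "aeiou")

# The harakiri variants discussed above, then the Mitsubishi ones
variants = ["harakiri", "hari-kari", "hari-kiri",
            "Mitsubishi", "Mitsubushi", "Mitsibushi", "Mitsibishi"]
patterns = {v: vowel_pattern(v) for v in variants}
for spelling, pattern in patterns.items():
    print(f"{spelling:12} {pattern}")
```

Seen this way, the step from A A I I to A I I I to A I A I changes one vowel letter at a time, which is all the "intermediate step" suggestion needs.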

I know, some of you are thinking that the baseball announcer Harry Caray (1914-98), whose name was pronounced just like hari-kari, must somehow be involved here. But no, as a quick trip to OED2 shows. The first OED cites under hara-kiri in fact have the spelling hari-kari, and this is in 1856, 1859, and 1862, surely before Harry Caray's PARENTS were born. (By the way, H.C. was born Harry Christopher Carabina.) We don't get "correct" spellings until 1871. In 1888, we get one of each of these spellings, plus the possibly intermediate version hari-kiri. So whatever is going on here, it's been going on for a very long time.

Japanese car names. Several correspondents have pointed out that Mitsubishi is a meaningful compound in Japanese: mitsu 'three' plus hishi 'diamond' (in its voiced variant bishi). The three diamonds are visible in the company's logo. This has damn-all to do with the pronunciation or spelling of the name, but it's still entertaining. (Even cooler is Bill Poser's observation that karaoke is also a compound: kara 'empty', as in karate, literally 'empty hand', plus oke, which is, wonderfully, a borrowing of English orchestra, somewhat truncated. So orchestra traveled to Japan as oke and then came back inside karaoke.)

One other Japanese car name, Isuzu, gives trouble for English speakers. Here the vowels are fine, but the S Z sequence is problematic. As one correspondent pointed out, you'll hear the reversed Izusu (even in some old Isuzu commercials!), and occasionally the assimilated Izuzu. Here the trick is to explain why Isuzu is troublesome but Suzuki is not. Probably something to do with the voicing of [s] (when spelled with a single S) between an unaccented vowel and an accented one, as in presume.

Facilitating factors. Back to Mitsubishi. Several correspondents have suggested things that might tip the scales in favor of I U U I, over its closest competitors I I U I and I I I I. The most common suggestion is that bushido, literally 'warrior's way' (bushi + do:) and referring to the code of the samurai, favors bushi (U I) in the second half of the name. Of course, both I U U I and I I U I have bushi, but I U U I has the extra advantage of preserving the first half, I U, of the original.

More important, I'm doubtful that the word bushido is widely known among English speakers, even though it did make it into both AHD4 and NOAD2. And even more doubtful that many speakers appreciate that bushi is a significant piece of the word bushido -- though there are Bushido Blade video games, and people who play these games are likely to know bushi meaning 'warrior'. In any case, I think that the most bushi(do) could have done is helped I U U I a little bit.

One correspondent did suggest the word sushi (which is probably the Japanese word most widely known to speakers of English -- outside of brand names, of course) as a factor facilitating I U U I. This has some plausibility.

Finally, one ADS-L poster suggested that I I U I should be favored because of the English proper name Mitzi. It is true that of mitsi, mitsu, mutsi, mutsu, bishi, bishu, bushi, and bushu, only the first is pronounced like a generally recognizable English word (though bushi is not too far from bushy). There was a general feeling on the ADS-L that Mitzi is now a name of too little currency to have much influence in reshaping word forms.

More manglings. One further correspondent reported that a pronunciation with two [š]'s -- Mishybishy, as he represented it -- "was quite prevalent in Charlottesville VA a decade ago" and that he'd never heard that version anywhere else. Here we have an anticipation, in the second syllable, of the [š] in the fourth syllable. Plus the shift to the I I I I pattern.

It turns out that the spelling Mishibishi gets a modest number of relevant Google webhits, and they seem to come from all over the place, including Australia. Mishubishi, preserving the original vowel pattern, gets even more.

No doubt there are more. Well, yes, there are a few occurrences of Misubisi. And Michubichi. But it's time for me to put the Mitsubishi file away. I can barely spell the word myself any more.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:53 PM

The geopolitical significance of sentence-final prepositions

At last, a rational argument about why it's a bad idea to end a sentence with a preposition. This long-awaited explanation turned up during a Daily Show bit about an Army essay contest.

Jon Stewart introduces the subject:

It's obviously no secret that the military has been having a difficult time fighting the Iraqi insurgency, but help is on the way. The U.S. Army has recently sponsored a civilian-only essay contest called "countering insurgency", to solicit ideas from the public on how best to defeat the Iraqi insurgency. According to Army journal Military Review, "nothing less than the future of the civilized world might depend on it." Oh, and uh please double space.

After a bit, he brings on John Hodgman, who reads part of one of the submissions, and complains about misuse of the word "literally".

Jon Stewart:	All right, John. Uh, you're reading these counter-insurgency essays for grammar.
John Hodgman:	No, and also style and usage. I mean, you can't fight a War on Terror if you're ending a sentence with a preposition.
Jon Stewart:	Wh- Why is that?
John Hodgman:	Uh, in the Middle East, that's seen as a sign of weakness.

In addition to critiquing grammar, style, and usage, Hodgman also offers some advice on rhetoric:

Jon Stewart:	As to what would be an example of a good essay?
John Hodgman:	Well, when it comes to writing an expository essay about counter-insurgent tactics, I'm of the old school. First you tell them how you're going to kill them; then you kill them; then you tell them how you just killed them.

The contest is for real, and the winner should be announced soon. But Stewart is not quite right in calling this a "civilian-only" contest. The call says:

Anyone conducting serious research on issues related to counterinsurgency is invited to submit papers for consideration. However, papers written by government employees during the course of their government employment are not eligible.

I understand this to mean that active-duty military personnel could enter the contest, as long as the submitted paper was done on their own time.

Hodgman's argument -- or rather a slightly modified version of it -- is the only valid argument for thinking twice about sentence-final prepositions. However, we need to delete the reference to the Middle East: the people who irrationally see sentence-final prepositions as a "sign of weakness" are mostly Americans, I think, even though this particular superstition originates with a catty remark that John Dryden once made about a line in one of Ben Jonson's plays.

[Tip of the hat to John McChesney-Young, who sent a link to the Daily Show piece and supplied the title.]

[If your browser or OS has trouble with the Daily Show link given above, a YouTube link is here.]

Previous Language Log posts on phrase-final prepositions:

An Internet Pilgrim's Guide to stranded prepositions (4/11/2004)
A Churchill story up with which I will no longer put (12/8/2004)
A misattribution no longer to be put up with (12/12/2004)
Better a spectacular blunder than a hint of unseemliness (4/25/2005)
Ending with a preposition (5/17/2005)
More on Canadian French preposition stranding (5/21/2005)
Who are you writing to? (6/2/2005)
If we look, simply, to the French (6/29/2005)
The French aren't really against (6/30/2005)
Avoidance (7/5/2005)
New Yorker search engine stark staring mad (9/20/2005)
Churchill vs. editorial nonsense (11/27/2005)

[Update: Kathleen Burt emailed

To me, "during the course of their government employment" means "while they are still employed by the government" or to put it another way, a person may not submit a paper while still being a government employee.

I feel that this is the plainest interpretation, and also (a separate consideration) likeliest to be correct. I base this on having been, for most of my working life, a government employee. What we would have said, in order to convey what you understand by that sentence, is "while they are on the clock," although of course, salaried employees don't actually clock in.

Kathleen's interpretation may very well be true, though in other contexts the pattern {"in the course of their * employment"} clearly does mean something like "while they are on the clock" or "as part of their duties". Thus

Under the NHS indemnity for clinical negligence, NHS bodies agree to meet the cost of claims for negligent harm caused by NHS staff in the course of their NHS employment, including their involvement in clinical trials.
Employees who lose or damage personal property in the course of their County employment may process a claim for reimbursement through the Claims Review Board as provided for in the Kern County Administrative Procedures Manual.

Presumably NHS employees who have car wrecks while shopping are on their own, as are Kern County employees who put their iPods through the laundry. In any case, the wording in the contest restriction doesn't distinguish between military and civilian government employees.

Another question, though: is "government" restricted by context to "U.S. Federal government", even though there is no explicit limitation? ]

Posted by Mark Liberman at 06:24 AM

May 12, 2006

Water may or may not run through it

Ten years ago, when I moved from the flatland East to the mountains of Montana, I had to learn some new language. I asked my wife, a native Montanan, what people call those V-shaped indentations in a mountain a few blocks from our home. She patiently explained that they are gulches. I had heard other people refer to these as ravines and gullies, and I needed to know the proper term. "How do you differentiate these?" I stupidly inquired. "Gulches have water running through them," she replied.

That seemed like an okay answer until I realized that I had never seen any water running down these gulches. I also remembered visiting Helena's Last Chance Gulch, where there was absolutely no water in evidence. It's now a sort of toney business center in that town. "Oh, there used to be a creek running down it," was her answer. So now I learned that a gulch is a formation where there is now or used to be water running through it, but I'm not sure how an outsider is supposed to know this stuff. There are lots of things called canyons in Montana too, most of which have creeks running down them, so I asked her what differentiated a gulch from a canyon. "Size," she replied. "Big ones are canyons. Little ones are gulches." My next obvious question was, "How big is big?" No answer. Montanans know these things I guess. Easterners do not. Then there are ravines, which appear to be smaller than canyons but fit the same description: water formed them but it's not necessary for the water to be there now. A gully's formation stems more from a downpour than a creek. What I can't figure out is how I'm supposed to know whether or not water formed these things originally and whether it's still doing it.

Now, in today's local paper, The Missoulian (see here) I see a story about a lawsuit centering on the terms slough and ditch. Here's what happened. Rivers tend to migrate here in Montana, changing their courses every once in a while. When this happens, a slough develops near where the rivers used to flow. A man who owns the property where a slough formed did some channel-changing construction, stocked it with trout, and barred others from fishing in it (a serious problem in this fishing paradise state). It now fits the definition of a ditch. Here's what the newspaper said:

District Judge Ted Mizner, in a long awaited, closely watched decision, has ruled that the slough, which draws most of its water from the Bitterroot River, is "no longer a natural water body." "Perhaps as early as 130 years ago, the Mitchell Slough may well have been considered a natural water body under the Stream Access Law," wrote Mizner..."However, it cannot be seriously disputed that through natural processes the Bitterroot River has migrated to the west and its bed is substantially lower than the bed of the Mitchell Slough."

The judge's decision was a triumph for property owners. So now what used to be a slough has been renamed a ditch. The judge's ruling created a storm of protest from both conservationists and fishermen. They believe the stream access laws allow the public to use that slough. Objections were made that it was the extensive construction done by the property owner that caused the channel to change so much that it's no longer a natural part of the river, redefining it in the process as a ditch. The judge admitted this but still ruled that the water therein is no longer any part of the Bitterroot River.

Although the legal battle here is about whether a man's private construction "improvements" can turn a slough into a ditch, in the process turning a public water access into private property, there seem to be some lexical issues here as well. It's probably okay for me to be confused about the terms we use to describe nature's rearrangements of the landscape, such as gulches, ravines, canyons, and gullies, but I wonder about the right of humans to convert and therefore rename nature-made sloughs into man-made ditches. Maybe, like trademarks, this is another example of law's attempts to control our language.

Posted by Roger Shuy at 01:47 PM

What a difference a comma makes

Pop quiz! In an NYT article entitled "Bolivian Says He Won't Pay Energy Companies", Bolivian President Evo Morales was quoted in one of the following two ways. Guess which one. (Answer below the fold.)

(1) "What we are looking for are partners, not bosses, that exploit our natural resources," Mr. Morales said.

(2) "What we are looking for are partners, not bosses that exploit our natural resources," Mr. Morales said.

Answer: (1), but I'll bet that Mr. Morales actually said (2) (more accurately, I'll bet that he said something in Spanish that means roughly the same thing as (2) as opposed to (1)). In case the distinction in meaning that I'm thinking of here is not clear, here's what I think each of the sentences above means -- and I trust this makes clear why I think Mr. Morales actually said (2), not (1).

(1') What we are looking for are partners that exploit our natural resources, not bosses that exploit our natural resources.

(2') What we are looking for are partners that do not exploit our natural resources, not bosses that exploit our natural resources.

I think that (1') is entailed by (1), while (2') is merely (though quite strongly) implicated by (2).

I'm also willing to bet that the crucial additional comma in (1) was added by a writer or editor, and that it does not reflect what Mr. Morales's interpreter said. The reason I think that is because if the interpreter had said something like (1), I think s/he would have said (1'') instead:

(1'') What we are looking for are partners, not bosses, to exploit our natural resources.

Deceptively small difference between that in (1) and to in (1''), isn't it? But the basic meaning in (1') is conveyed much better by (1'') than by (1), it seems to me.

(Readers may recall that I commented on how weird I find references to President Morales simply as "Bolivian" here; it appears this sort of thing is more common than I'd thought.)

Posted by Eric Bakovic at 01:08 PM

Asking Dr. Language Log

In this morning's mail:

Dear Dr. Log:

A full page ad for the Red Cross in the May 15, 2006 issue of The New Yorker has the following headline

What if harm's way was headed yours?

Why is this so jarring? It doesn't seem to be a straightforward syntactic problem:

What if John's car was blocking yours?

One hypothesis is that "way" has different meanings in "harm's way" -- path that harm is following -- and "your way" -- path towards you, and this mismatch interferes with ellipsis reconstruction (cf. Andy Kehler's thesis). Or is it something simpler?

Referentially Challenged, Philadelphia

A few minutes later, Prof. Challenged wrote back with "something simpler":

The problem is that "Harm's way" is not a sensible subject for "was headed".

He also suggested something more complicated, namely that the Red Cross question might be a kind of Escher sentence.

It's a treat to have correspondents who write in with interesting questions, and then write back with answers. This could become a regular Language Log feature.

[Credit for this Q&A belongs to Fernando Pereira]

[Update: Joe Malin points out that "headed yours" is a bit of old radio operator shorthand, for example:

One memorable Mason wireless dispatch: "Twenty-five torpedo bombers headed yours." The message cost the Japanese Imperial Navy every one of those airplanes, save one. [emphasis added]

So maybe the apparent ellipsis in this case is actually pragmatically-controlled anaphora. An argument against: {"headed yours"} gets only 27 Google hits.]

[Update #2: Paul Kay writes

I think harm's way has to be the object of a preposition, perhaps only one of {in, out of, from}. Also this is one of those PPs that can only be used predicatively:

*The platoon was foolishly relaxing in harm's way.
The platoon was foolishly relaxing, while [they were] in harm's way.
*[I hate it here.] Harm's way is a shitty place/Harm's way sucks.

[Well, maybe "Harm's way sucks" could work as a jeu de mots. It would require a lot of previous context.]

I.e., the problem seems more general than that harm's way isn't a proper subject for headed. I don't think it's a proper subject at all.

Also, it's one of those closely bound PP objects that resist extraction:

??Harm's way, I don't want my son to be put in.
??We reluctantly put the platoon in harm's way that couldn't be avoided.
We reluctantly put the platoon in danger/a dangerous position that couldn't be avoided.

I think Paul is right.

Here's a curious thing. In the large part of English poetry indexed by LION, there are 23 instances of "harm's way", of which 21 are "out of", 1 is "in", and 1 is prepositionless. The lone bare example is the 2nd through 4th lines of Charles Simic's "Ballad" (from Return to a Place Lit by a Glass of Milk, 1974):

A little girl picking flowers in a forest
The migrant's fire of her long hair
Harm's way she comes and also the smile's round about way

(Simic is apparently playing with the fact that we normally say "coming my way" or "coming Paul's way" but not "coming harm's way" -- despite the one web citation "without the Nova Scotia Health Research Foundation, all health research in Nova Scotia will come harm's way." This is the same wordplay as in the ad slogan Fernando cited.)

In quantitative contrast, on the web there are roughly twice as many examples of "in harm's way" as "out of harm's way":

out of 716,000

in 1,500,000

into 225,000

from 48,200
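As a quick check on the "roughly twice" estimate, here is a minimal Python sketch that takes the four hit counts listed above at face value and computes the in/out-of ratio and each preposition's share of the total. The variable names are purely illustrative, and search-engine hit counts are themselves only rough estimates.

```python
# Web hit counts for "<preposition> harm's way", as listed above.
hits = {"out of": 716_000, "in": 1_500_000, "into": 225_000, "from": 48_200}

# How many "in harm's way" hits there are per "out of harm's way" hit.
in_vs_out = hits["in"] / hits["out of"]

# Share of the four-way total taken by each preposition.
total = sum(hits.values())
shares = {prep: count / total for prep, count in hits.items()}

print(f"in / out of = {in_vs_out:.2f}")
for prep, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{prep:>6}: {share:.1%}")
```

On these figures the ratio comes out a little above two, and "in" accounts for more than half of the four-way total.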

Without looking into it, I believe that this represents an idiomatic preference for boldness over protectiveness, perhaps connected to this idiom by the echoes of John Paul Jones' famous remark "I wish to have no connection with any ship that does not sail fast; for I intend to go in harm's way." Jones did not invent the idiom, however -- the OED gives citations for "out of harm's way" from 1661, and "in harm's way" from 1677:

a1661 FULLER Worthies (1840) I. xviii. 61 Some great persons..have been made sheriffs, to keep them out of harm's way.
a1677 T. MANTON Serm. Psalm cxix, civ in Wks. (1872) VIII. 5 To stand nicely upon terms of duty is to run in harm's way.

The web offers a few semi-convincing examples of "harm's way" with other prepositions:

Well, maybe those hurricane shutters can wait until 2007 - after all you slipped by harm's way in 2005, and maybe you'll do the same in 2006, right?
When he had driven Hood beyond harm's way, he returned and made all haste to put his army in readiness for the march to the sea.
Can you lead these 5 other men, and yourself, through harm's way, intentionally, and come out alive on the other side?
...he wanted to create a robotic spy plane that could fly above harm's way at altitudes above 60000 feet.

And there was an episode of the cult TV series Angel named Harm's Way (episode 9, season 5), which excuses various otherwise-odd word sequences on the web.

]

Posted by Mark Liberman at 07:54 AM

Mock Spanish or Mock Mock Spanish?

When the news broke that Cingular Wireless had revoked a cell-phone ringtone featuring Mock Spanish in a poorly conceived joke about border-crossing, I rattled off a post that suspected "racist intent" at work behind the ringtone (which had already drawn the ire of Latino advocacy groups). The post was based on early reports filed before anyone had investigated the origin of the offending ringtone, in which the Southern-sounding voice of an agent for "La Migra" (the Border Patrol) says, "I repeat-o, put the oranges down and step away from the telephone-o. I'm deporting you back home-o." Now we know more about the source of the ringtone, and the details not only undermine some of my initial assumptions but also raise a whole new set of questions to ponder.

The AP has reported that the "Migra" ringtone was the work of Mexican-American comic Paul Saucido. The company that developed the ringtone, Barrio Mobile, has apologized but has said that it was intended as a work of satire. A spokesman for the ringtone's distributor is quoted as saying of Saucido, "His position is that people of Hispanic background need to maintain a sense of humor about the immigration situation."

An interview with Saucido on Gearlog provides some further insight:

The ringtone, "La Migra Alert," is Saucido pretending to be an immigration official with a really bad fake Southern accent, saying, "I'm deporting you back home-o."
The character came from a brainstorming session between him and a few other Latino comedians, Saucido said, citing Dave Chapelle and Carlos Mencia among his comedic influences.
"It was inspired by other comedians who riff on the same stereotype of the immigration officer ... you know how people try to phonetically speak when they talk down to you, like, 'where is the bathroom-o?'" he said.
The ringtone came as part of a package of comic ringtone characters developed by Saucido, including a hovering, novela-obsessed Mexican mom, a Mexican dad, and a "barrio kid" who would say "I can't make it to the phone right now, I'm busy rotating the tires on my low-rider." All of Saucido's ringtones have been removed by Cingular, he says.
"I think because of the times, right now people are a little extra sensitive [about immigration issues,]" he said. "I'm sensitive to this issue! But people obviously leave their senses of humor behind when they get so much fever in them. I thought the Migra character was the last character that would get that kind of reaction."
Saucido says there's "absolutely" room for edgy comedy in the ringtone world.
"I've played it for all my friends and they love them - they're waiting for them to be sold, and they're like, where can we buy them?" he said. "These companies have got to have some backbone to say we bought this content, we believe in it, and we're not going to get rid of it just because the first advocacy group calls racism. Dude, everyone that produced them and worked for them - we're all Mexican!"

Satire's a very tricky thing, and context is key. If Saucido had presented the Mock Spanish of the "Migra" character in the context of a standup routine, his audience would be prepared to hear lines like "I'm deporting you back home-o" as the work of a Latino comic parodying "how people try to phonetically speak when they talk down to you" — Mock Mock Spanish, if you will. But the discursive frame of a cell-phone ringtone is wildly different from a standup act. Saucido's voice was recorded, disembodied from the original context of utterance, and commodified in the form of a downloadable audio file (Cingular ringtones cost $2.49 a pop). And the joke was further decontextualized in news reports about the controversy — especially since the ringtone had already been pulled from the Cingular site, leaving accounts in the print media as the only representations of the original routine. Despite the comedian's amazement at the backlash against the ringtone, it's not too surprising that the radical recontextualization of Saucido's work would lead many observers (including me) to miss the intended satire completely.

Saucido's original bit relied crucially on its own type of recontextualization: it took the condescending use of Mock Spanish by immigration officials and reframed it as satirical social commentary through the mimicry of standup comedy. But the misinterpretation of the joke once it was let loose on the world as a downloadable ringtone only demonstrates the unpredictable effects of recontextualization (and re-recontextualization, and re-re-re...), particularly in a highly charged political atmosphere such as we currently find surrounding the issues of immigration and Spanish language use in the United States. Po-mo types talk about these pragmatic pitfalls in Derridean terms like "iterability" and "citationality" (see, for instance, the work of Judith Butler, who has written extensively on the difficulties of reining in "subversive resignification"). Whatever one's theoretical outlook, the controversy over the "Migra" ringtone would make a fascinating case study in misconstrued speech acts and failed performativity.

Posted by Benjamin Zimmer at 01:11 AM

May 11, 2006

Good story, bad headline

Since we often criticize journalists here on Language Log, I try to praise good reporting on language-related issues when I can find it. And Rafaela von Bredow's May 3 story about Dan Everett and the Pirahã, in Spiegel Online, is very good. She explains the facts, the interpretations and the issues in a clear and readable way. Unfortunately, her work is spoiled by a seriously misleading headline and sub-head -- which I'm sure that she didn't write. As Nicole Stockdale explains, "[h]eadlines are written by copy editors, who battle deadlines to clean up or rewrite the reporter's copy, massage it, then crown it with a spiffy headline". When the editor doesn't understand the story, the result is what you'd expect.

In this case, some anonymous Spiegel editor gave Rafaela von Bredow's story the title "Living without Numbers or Time", and the sub-head "The Pirahã people have no history, no descriptive words and no subordinate clauses. . ." It's true that the Pirahã lack number words, but it's false that they "[live] without Time". It's apparently true that they have no subordinate clauses, but false that they "have no history [and] no descriptive words".

So three of the five cited facts about the language are wrong. That's 40% correct, a failing grade by any reasonable standard. Good stories are often spoiled by bad headlines -- isn't it past time to do something about this dysfunctional aspect of journalistic culture?

[Before going on, I should note that the body of Bredow's article is marred by a couple of unfortunate phrases, like

What the tribesmen didn't realize, however, was that Everett, a linguist, was eavesdropping, and he could already understand enough of the Amazon people's cacophonic singsong to make out the decisive words. [emphasis and dictionary links added]

The idea that the Pirahã communicate via "harsh and unpleasant monotonously rising and falling inflections" is a value judgment added by the reporter or her editor, and it ill behooves a speaker of the much-maligned German language to sling around words like cacophonic. However, there are only a few issues of this sort in the body of the article, and in my opinion they don't spoil its generally clear and insightful presentation of the basic facts and issues.]

For those interested in the aspects of Pirahã treated in the headline and subhead, the stuff about numbers is well covered in the links given here. As for subordinate clauses, Everett does argue that Pirahã lacks them, as has also been claimed for several other human languages. (Here's a sketch of what English might be like if it worked that way.)

With respect to time and descriptive words, I'll quote a few relevant passages from Daniel L. Everett, "Cultural Constraints on Grammar and Cognition in Pirahã: Another Look at the Design Features of Human Language", Current Anthropology, Volume 46, Number 4, August-October 2005 (a preprint is available for those without access to a subscription).

Dan's discussion of time and tense does suggest that something a bit unusual might be going on:

I have argued elsewhere (1993) that Pirahã has no perfect tense and have provided a means for accounting for this fact formally within the neo-Reichenbachian tense model of Hornstein (1990). This is an argument about the semantics of Pirahã tense, not merely the morphosyntax of tense representation. In other words, the claim is that there is no way to get a perfect tense meaning in Pirahã, not merely an absence of a formal marker for it. Pirahã has two tenselike morphemes, -a `remote' and -i `proximate'. These are used for either past or present events and serve primarily to mark whether an event is in the immediate control or experience of the speaker ("proximate") or not ("remote").

In fact, Pirahã has very few words for time. The complete list is as follows: 'ahoapió 'another day' (lit. 'other at fire'), pi'í `now', so'óá 'already' (lit. 'time-wear'), hoa `day' (lit. `fire'), ahoái 'night' (lit. 'be at fire'), piiáiso 'low water' (lit. 'water skinny temporal'), piibigaíso 'high water' (lit. 'water thick temporal'), kahai'aíi 'ogiíso 'full moon' (lit. 'moon big temporal'), hisó 'during the day' (lit. 'in sun'), hisóogiái 'noon' (lit. `in sun big be'), hibigíbagá'áiso 'sunset/sunrise' (lit. 'he touch comes be temporal'), 'ahoakohoaihio 'early morning, before sunrise' (lit. 'at fire inside eat go').

Specifically, Dan thinks that this is one of many linguistic symptoms of a general pattern, in which

Pirahã culture constrains communication to nonabstract subjects which fall within the immediate experience of interlocutors.

There's plenty of room for argument here about whether "nonabstract" is a fair characterization of morphemes that mean things like "remote" vs. "proximate", "other", "temporal" and so on. With respect to talk about tense and time, Dan argues that

[I]n the context of the present exploration of culture-grammar interactions in Pirahã, it is possible to situate the semantics of Pirahã tense more perspicaciously by seeing the absence of precise temporal reference and relative tenses as one further example of the cultural constraint on grammar and living. This would follow because precise temporal reference and relative tenses quantify and make reference to events outside of immediate experience and cannot, as can all Pirahã time words, be binarily classified as "in experience" and "out of experience."

In any case, there's no support for the view that the Pirahã "[live] without time". As one more nail in the coffin of this notion, I'll quote one of the example sentences from Everett 2005:

kohoai	-kabáob	-áo	ti	gí	'ahoai	-soog	-abagaí
eat	-finish	-temporal	I	you	speak	-desiderative	-frustrated_initiation
"When [I] finish eating, I want to speak to you."

By the way, you might think that this example includes a subordinate clause, but Dan says "no":

There is almost always a detectable pause between the temporal clause and the "main clause." Such clauses may look embedded from the English translation, but I see no evidence for such an analysis. Perhaps a better translation would be "I finish eating, I speak to you."

What about the claim that the Pirahã have "no descriptive words"? The only part of the Spiegel article that might have given rise to this preposterous claim is the sentence

Apparently colors aren't very important to the Pirahãs, either -- they don't describe any of them in their language.

Dan Everett does argue that the Pirahã have no basic color terms (though Paul Kay, one of the commentators on the Current Anthropology article, is not convinced). I'm not sure what it would mean for a language to have "no descriptive words", but a couple of additional Pirahã examples should establish that it's not true in this case:

bii	-o³pai²	ai³
blood	-dirty/opaque	be/do
"blood is dirty"

kahaí	kai	-sai	hi	ob	-áa'áí
arrow	make	-nominative	he	see	-attractive
"He knows how to make arrows well." (lit. "He sees attractively arrow-making.")

[Update: Julia Hockenmaier raises a possibility that should have occurred to me -- the headline might have been botched in translation. She points out that the headline and subhead in the German version read:

LINGUISTIK: Leben ohne Zahl und Zeit
Das Volk der Pirahã kennt keine Vergangenheit, keine Farbwörter, keine Nebensätze. Das macht seine Sprache zur merkwürdigsten der Welt - und zum Zankapfel der Linguisten.

and comments:

'Zeit' in German means both tense and time, so I think this is just a translation error. Similarly, 'Vergangenheit' means either 'past' or 'past tense' (especially in this list of language-related terms), but not 'history' (that would be 'Geschichte'). And the original doesn't say 'descriptive terms', but 'color words'.

She also asks

By the way, how is this absence of tense different from, say, Chinese? It really doesn't seem that unusual to me.

As I understand Dan's argument, he's claiming that the Pirahã's time-related morphology is consistent with their general cultural pattern, not that it's unique. ]

Posted by Mark Liberman at 08:20 PM

The hispanicization of American baseball, the status of Puerto Rico, and the achievements of Roberto Clemente

George F. Will (yes, THAT George F. Will) reports, in a review of Clemente by David Maraniss, New York Times Book Review, 5/7/06, p. 13:

Baseball has come a long way since the San Francisco Giants' manager Alvin Dark, in 1964, banned Spanish in the clubhouse. In 1989 and 1990, five of the 26 major-league teams had a starting shortstop from the same Dominican town, San Pedro de Macorís. In 2005, 29 percent of the players on the 30 teams' opening day rosters were born outside the United States -- 70 percent of them from the Dominican Republic, Venezuela or Puerto Rico. Among the nearly 1,200 players on the 40-man rosters this spring, 10 of the 16 most common surnames were Hernández, Gonzalez, Perez, Ramirez, Rodriguez, Cabrera, Guzman, Lopez, Peña and Sanchez.

Four things of note here: the main point, which is the hispanicization of American baseball; the identification of Puerto Rico as being outside the U.S.; picking out the 20.3 percent of players who are from the Dominican Republic, Venezuela, or Puerto Rico by taking 70 percent of 29 percent; and the shift from origin in the Spanish-speaking Americas to possession of a Hispanic surname.
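The 20.3 percent figure is just the product of the two percentages in Will's sentence. A minimal Python sketch of that arithmetic follows; the variable names are illustrative, and exact fractions are used only to avoid floating-point rounding noise.

```python
from fractions import Fraction

# Figures quoted from Will's review.
born_outside_us = Fraction(29, 100)    # share of opening-day players born outside the U.S.
from_three_places = Fraction(70, 100)  # share of THOSE players from the DR, Venezuela, or Puerto Rico

# Share of all opening-day players who come from the three places.
share_of_all = born_outside_us * from_three_places

print(f"{float(share_of_all):.1%}")
```

Taking 70 percent of 29 percent gives 203/1000, that is, 20.3 percent of all the players on the opening-day rosters.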

Here at Language Log Plaza, we've been remarking on American attitudes (often negative) towards the Spanish language, towards its speakers, and towards Latino/Hispanic Americans in general -- most recently, here, here, and here -- so it's nice to see a little report on how our national pastime has come to rely so significantly on Latino players.

Now, the list: the Dominican Republic, Venezuela, Puerto Rico. All characterized as "outside the United States". Puerto Rico is the oddity here. (It's also relevant to Roberto Clemente, who was Puerto Rican. And black.) It has a status that puts it firmly both inside and outside the U.S. Mostly inside in several respects, some of them described on the website of the Puerto Rico Federal Affairs Administration:

Puerto Rico's relationship with the U.S. Federal Government, as defined by the Constitution of 1952, is in many respects, similar to that of any other state. Matters of currency, defense, external relations and interstate commerce are within the jurisdiction of the U.S. Federal Government. The U.S. Constitution as well as most laws passed by Congress are applicable in Puerto Rico. Residents of the island however, do not pay federal income taxes and do not vote for President. [On the other hand, since defense is within the jurisdiction of the U.S. government, Puerto Ricans are subject to the draft.]

And Puerto Ricans are U.S. citizens, but then so are residents of Guam and the U.S. Virgin Islands, both of which are U.S. territories. (This is all so convoluted: residents of American Samoa are U.S. nationals but not U.S. citizens.)

According to that 1952 constitution, Puerto Rico is a semi-autonomous entity, officially named a Commonwealth. (Kentucky, Massachusetts, Pennsylvania, and Virginia are commonwealths, but of course not semi-autonomous entities.) The Commonwealth also is a possession of the United States, though not called a territory. In any case, if Guam and the U.S. Virgin Islands are "outside the United States", which I think would be common usage, then Puerto Rico is even more so.

Still, many of us who live in the 50 states and the District of Columbia tend to think of Puerto Rico as more a part of the U.S. than Guam or the U.S. Virgin Islands -- as not really being "foreign" -- so having it on a list with the Dominican Republic and Venezuela seems a bit odd.

On yet another hand, Puerto Rico shares with the Dominican Republic and Venezuela (as against the United States) the property of having Spanish as an official language. And that's directly relevant to Will's little history of the Spanish language in the U.S. major leagues.

Could Will have avoided "outside the United States"? Well, there's a problem here, which we can see more clearly when we ask why he chose to refer to 20.3 percent of the players so indirectly, as 70 percent of 29 percent of them. Why didn't he just say, "In 2005, of the players on the 30 teams' opening day rosters, just over 20 percent of them came from only three ____ where Spanish is an official language"? But what plural noun fills in the blank? Oh dear. "Countries" or "nations" won't do, because Puerto Rico isn't actually a country or nation; as currently configured, it's not entitled to a seat in the United Nations, any more than Guam is. "Places" and "lands" are too vague. "Governmental entities" or the like would be too technical AND too vague.

There are work-arounds, for instance: "In 2005, the Dominican Republic, Puerto Rico, and Venezuela -- in all of which Spanish is an official language -- together supplied just over 20 percent of the players on the 30 teams' opening day rosters." (Or maybe "1 in 5" rather than "20 percent".) This avoids the "outside the United States" problem and also the 70-percent-of-29-percent problem, and makes the Spanish language point explicit. It doesn't note explicitly that only three places account for so much of the rosters, but then Will's original didn't either.

Finally, the shift to Hispanic surnames, which rather muddies things, since the surnames point neither to place of origin nor to the real matter under discussion, the use of the Spanish language. [Clarification added 5/13/06: The spellings of these surnames above -- with their inconsistency in the use of the acute accent -- are exactly as they appeared in the NYT review.] For a moment, I entertained the idea that Will was slyly trying to insert the idea that if your ancestors were Spanish-speaking foreigners (from Latin America, at any rate), then you're (still) a foreigner too -- in which case that last sentence would be not merely only indirectly relevant to the topic, but also slimy. Then I decided he was only noting that Latinos, for some value of "Latino", are all over baseball these days, something that certainly wasn't the case in Roberto Clemente's time, and that Clemente himself, laudably, had a lot to do with that.

The review is mostly about Clemente, and it's sympathetic to and admiring of the man. It even notes that he was "arrestingly handsome" as well as, in several ways, heroic. Catch the sympathetic resentment in this report:

Clemente, playing in a city with a minuscule Latino population [Pittsburgh], said he felt like a "double nigger." As late as 1971 -- in one game that year, the Pirates became the first team ever to have nine black players in its starting lineup -- some sportswriters still quoted him in phonetic English: "Eef I have my good arm thee ball gets there a leetle quicker."

This about a man who died tragically, while trying to get aid from Puerto Rico to Nicaragua after a severe earthquake there in 1972.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 07:37 PM

No concept of the future, no yuccas either

Juan Forero reports, on the front page of today's New York Times, on a group of Nukak-Makú hunter-gatherers who have emerged from the Colombian jungle to seek refuge in the town of San José del Guaviare. They are described as classic primitives, people who "have lived a Stone Age life" and are innocent of the ways of the modern world:

The Nukak have no concept of money, of property, of the role of government, or even of the existence of a country called Colombia. They ask whether the planes that fly overhead are moving on some sort of invisible road.

Their conceptual poverty extends, in Forero's somewhat confused account, to at least one basic temporal notion:

When asked if the Nukak were concerned about the future, Belisario, the only one in the group who had been to the outside world before and spoke Spanish, seemed perplexed, less by the word than by the concept. "The future," he said, "what's that?"

But much later in the story, we see that they are perfectly capable of planning for this putatively unconceptualizable future:

That is not to say the Nukak do not have plans.

Ma-be explained that the idea is to grow plantains and yucca and take the crops to town. "We can exchange it for money," he said, "and exchange the money for other things."

Now I don't know what word Belisario used to translate Ma-be here -- yucca is attested as an occasional variant of yuca, the name of a starchy tuber better known as cassava -- but American readers unfamiliar with tropical foodstuffs will mostly be puzzled by the idea that the Nukak hope to grow the spiky agave yucca as a crop. "Yuca" would have been a better choice, and "cassava" even better than that.

Back to the future. It's hard to see how Belisario's perplexity was about anything BUT words. Somebody asked him if the Nukak were concerned about "el futuro", and Belisario asked what "el futuro" was. End of story. At this point we can begin to suspect that Belisario's command of Spanish, in particular its vocabulary, is not so great. And we can begin to wonder what Spanish translations he gave to the other Nukak's reports of their plans for the future: did he use future tense forms? In any case, the exchange about the future was about a word, not a concept.

So why did Forero report the exchange as being about a concept? Because, once again, "primitive" peoples are being imagined as deficient in abstract thought. It's a cousin of "the X have N words (for some large N) for Y, but no word for Y in general, so the X are incapable of conceptualizing Y as an abstract notion". You know, those poor Eskimos, stuck in an avalanche of highly specific words for snow.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:56 PM

May 10, 2006

Pulling within

Here's an easy bet. Tune in to an upcoming NBA playoff game — say, tonight's matchup between the New Jersey Nets and the Miami Heat — and wait for one team to fall behind by a significant margin. Let's say, for the sake of argument, the Heat fall 10 points behind the Nets. Then wait for the team that's behind to stage a bit of a rally, for instance if the Heat bridge a 10-point gap and make it a 5-point game. I wager that the announcer will say one of two things:

a) And the Heat pull (to) within five!
b) And the Heat pull (to) within 90 to 85! [Or whatever the score is.]

How did "pulling (to) within" a scoring differential, or even more oddly, "pulling (to) within" a score, become the standard sportscaster-talk to describe a losing team rallying against a winning team? The answer lies in how the spatial metaphors of racing contests have been transformed by American team sports.

We're used to hearing the verb "pull" in various competitive contexts to describe one contestant's movements relative to another, as in "pulling even," "pulling ahead," and so forth. This sense of "pull" goes back to the literal sense of pulling on oars in a boat race, attested since the 1860s. It didn't take long for "pull" to get applied to other types of races, such as horse-racing. (And, like so many other horse-racing terms, "pulling ahead" and similar forms quickly entered the world of politics to refer to the jockeying of candidates in the polls.) With "pull" coming to mean "move in relation to the position of another competitor," the expression could be used to quantify the relative distance between competitors, as in "lengths" for horses on a racetrack (a measurement likely borrowed from boat-lengths in crew racing). An announcer at a horse race could say: "Horse A pulls (to) within three lengths of Horse B," with the implication that Horse A was more than three lengths behind and has now passed into a position within three lengths relative to Horse B.

When the idiom of "pulling within" a distance in a race started to get applied to team sports like baseball, football, and hockey, the metaphorical fit was not exact. If a baseball team is down 6-0 and then a player on the team hits a two-run homer, the gap has been narrowed from six to four. Unlike the continuous movement of boat-racing or horse-racing, however, scoring in baseball and other team sports is "discrete," as mathematicians call it: a team's score jumps from one non-negative integer to the next with no fractions in between. So the spatial metaphor of being "within" a certain relative distance doesn't exactly match the context of a baseball team down by a discrete number of runs. Nevertheless, it became common by the mid-20th century for announcers and reporters to talk about teams "pulling (to) within" a certain number of runs, points, goals, or even games in the standings.

Once this usage of "pulling (to) within" a scoring difference was established, the spatially defined sense of "within" began undergoing a subtle change in sporting contexts. It no longer implied necessarily a movement "inside" a given limit, like a number of lengths on the racetrack. Instead, the preposition "within" could simply suggest a narrowing of the differential between two teams' scores. From there it was a short step for the score itself, rather than the differential, to be used as the object of the preposition "within." By the mid-1970s it became rather common for sports reporters to use this new sense in print, as in "The Penguins pulled within 3-2," or "Houston pulled to within 115-112." And it wasn't long before verbs other than "pull" were used in conjunction with the sense of "within" meaning "to a score of": for instance, the AP reported in Game 1 of the Nets-Heat series that Shaquille O'Neal had a late run "that drew Miami within 92-83."

For now, this emergent usage is restricted to sporting competitions. But I wonder how long it will be before we see a news story reporting that proponents of a failed Senate bill pulled the vote to within 52-48. After all, everyone loves a good horse race.

[Update #1: Paul Kay notes that in today's on-line Time Magazine, under the headline "Why Jeb Bush Won Big," Tim Padgett writes: "And when challenger and political rookie Bill McBride pulled to within three points in polls taken earlier this fall, it looked like they [the Democrats] just might [get revenge]." Indeed, "pull (to) within" has long been applied in political contexts to poll differentials measured in percentage points. (Bruce Rusk sends along examples going back to 1972.) What I have yet to see, however, is a political equivalent for "pull within (a score)." That's what I was trying to get at with my posited example of pulling a Senate vote (to) within 52-48.

Meanwhile, here's a mathematical perspective from Adrian Riskin:

I think that at least mathematically speaking it makes sense to describe one team as being within a whole number of points of another team. Mathematicians at least use "within" to denote being in an interval, and the question then becomes whether or not the interval contains its endpoints. Often it does not, as in the case when X is said to be "within epsilon" of Y. This always has the implicit meaning that the absolute value of X-Y is strictly smaller than epsilon. On the other hand, it's not uncommon in other contexts to see "within" used to mean "less than or equal to the actual differential". To find examples of this I searched for the word "within" in mathematical articles on arxiv.org (a scientific preprint server) and found a number of examples where it's used to denote the differential of whole numbers. ...
So I think at least the first usage of "within" by sportscasters that you discuss is not so strange (assuming that being similar to what mathematicians do can be described as being "not so strange"). The second usage I won't try to defend. On the other hand, I'd like to hear a sportcaster say, when the score is 6-3, that the losing team has pulled within 8 of the winning team. It's still true!

Even if it's mathematically defensible to say that Team X is "within N" of Team Y when the difference in scoring is exactly N, I find I'm not the only one who considers the sporting usage a bit annoying. Vivian de St. Vrain, aka Dr. Metablog, posted peevishly about this use of "within" back in March.]
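Riskin's distinction comes down to whether the interval includes its endpoint. Here is a small sketch of the two readings; the function names are hypothetical, and the scores are the ones used as examples in this post:

```python
def within_strict(x, y, n):
    """The "within epsilon" reading: |x - y| is strictly smaller than n."""
    return abs(x - y) < n


def within_inclusive(x, y, n):
    """The looser reading: |x - y| is less than or equal to n."""
    return abs(x - y) <= n


# The Heat trail the Nets 85 to 90, and the announcer says they "pull within five".
heat, nets = 85, 90
print(within_strict(heat, nets, 5))     # False: the gap is exactly 5
print(within_inclusive(heat, nets, 5))  # True

# Riskin's point: at 6-3, the trailing team is also "within 8".
print(within_inclusive(3, 6, 8))        # True
```

Only the inclusive reading makes the sportscaster's "within five" true at 90-85, which is presumably why the usage annoys people who hear "within" as strict.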

[Update #2: Paul Kay sends along another political example, this time a bit closer to "pull within (a score)":

Both events would help the Democrats pull within parity in one or both houses this fall. (The Left Coaster) ]

[Update #3: These turns of phrase are even older than I had thought. Documentation here.]

Posted by Benjamin Zimmer at 02:56 PM

Colossus

OUP has just (May 4) published "Colossus : The secrets of Bletchley Park's code-breaking computers", edited by Jack Copeland.

According to the blurb on amazon's site,

The American ENIAC is customarily regarded as having been the starting point of electronic computation. This book rewrites the history of computer science, arguing that in reality Colossus--the giant computer built by the British secret service during World War II--predates ENIAC by two years.

Colossus was built during the Second World War at the Government Code and Cypher School at Bletchley Park. Until very recently, much about the Colossus machine was shrouded in secrecy, largely because the code-breaking algorithms that were employed during World War II remained in use by the British security services until a short time ago. In addition, the United States has recently declassified a considerable volume of wartime documents relating to Colossus. Jack Copeland has brought together memoirs of veterans of Bletchley Park--the top-secret headquarters of Britain's secret service--and others who draw on the wealth of declassified information to illuminate the crucial role Colossus played during World War II. Included here are pieces by the former WRENS who actually worked the machine, the scientist who pioneered the use of vacuum tubes in data processing, and leading authorities on code-breaking and computer science.

A review by Alan Cane in today's Financial Times (subscription only) explains that

... the full significance of the Colossi in shortening the war is now becoming clear, with the release a few years ago of a 500-page document under the curious title "General Report on Tunny" that had remained highly classified since the war.

Written by three of the Bletchley Park codebreakers, Jack Good, Donald Michie and Geoffrey Timms, the report describes how the 11 Colossi were designed and used. Not to break the Enigma traffic - that was the preserve of Alan Turing's "bombes" - but to attack the German's most secret cipher which the Allies codenamed Tunny. This cipher carried the highest grade of German intelligence. Breaking Tunny was key to the success of the D-Day landings.

The "General Report on Tunny" can be found here. The FT review goes on to say that

Computing history, therefore, has to be rewritten. The credit for creating the first electronic computer has so far rested with ENIAC, an 18,000 thermionic valve monster built by Presper Eckert and John Mauchley at the University of Pennsylvania.

ENIAC, however, ran its first program at the end of 1945, two years after Colossus successfully attacked the Tunny codes.

After the war the Colossi were largely broken up, the documentation destroyed and any mention of Colossus or the part it played in the Allied victory suppressed under the weight of the Official Secrets Act.

The article concludes:

Are there lessons to be learned from Colossus? Only that the UK still lacks the skills to profit from its ability to innovate: and that until it acquires them, it will have to be satisfied with the tacit knowledge that "We did it first."

Well, ENIAC didn't bring Philadelphia any lasting dominance in the digital hardware business, either.

[More information can be found in the Wikipedia article on the Colossus, including some discussion of the code-breaking methods that Colossus was designed to implement. You'll see from that description

The Colossus computers were used to help decipher teleprinter messages which had been encrypted using the Lorenz SZ40/42 machine. Colossus compared two data streams, counting each match based on a programmable boolean function. The encrypted message was read at high speed from a paper tape. The other stream was generated internally, and was an electronic simulation of the Lorenz machine at various trial settings. If the match count for a setting was above a certain threshold, it would be output on an electric typewriter.

that it's not clear that Colossus should be considered to be a "general purpose" computing machine: perhaps ENIAC's laurels are safe.]
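To make the Wikipedia description concrete, here is a toy sketch of the count-and-threshold idea: compare the tape against an internally generated stream at each trial setting, count the positions where a chosen boolean function holds, and report any setting whose count exceeds a threshold. Everything in it is an illustrative assumption. The "wheel" is a made-up repeating bit pattern rather than a simulation of the Lorenz machine, and the function names are hypothetical.

```python
def count_matches(tape_bits, generated_bits, predicate):
    """Count positions where the boolean function holds for the two streams."""
    return sum(1 for t, g in zip(tape_bits, generated_bits) if predicate(t, g))


def toy_wheel_stream(pattern, start, length):
    """Hypothetical stand-in for the simulated cipher machine:
    a repeating bit pattern read from a given start position."""
    return [pattern[(start + i) % len(pattern)] for i in range(length)]


def scan_settings(tape_bits, pattern, predicate, threshold):
    """Try every start position; keep those whose match count exceeds the threshold."""
    hits = []
    for start in range(len(pattern)):
        generated = toy_wheel_stream(pattern, start, len(tape_bits))
        score = count_matches(tape_bits, generated, predicate)
        if score > threshold:
            hits.append((start, score))
    return hits


pattern = [1, 0, 1, 1, 0, 0, 1]           # made-up 7-bit wheel pattern
tape = toy_wheel_stream(pattern, 3, 21)   # pretend the "message" used start position 3


def agree(t, g):
    """The boolean function being counted: the two bits are equal."""
    return t == g


print(scan_settings(tape, pattern, agree, threshold=20))  # prints [(3, 21)]
```

The sketch does nothing but tally matches against a threshold for one fixed kind of task, which is the sense in which a machine built for this job looks special-purpose.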

[Update: I should have known that I couldn't get away without citing Atanasoff and Zuse.

Linda Seebach wrote:

Eckert-Mauchly don't deserve credit for the first digital computer; John Atanasoff at Iowa State was ahead of them. But he didn't patent it. http://www.cs.iastate.edu/jva/jva-archive.shtml
[section below quoted from cited URL]
The Atanasoff-Berry Computer was the world's first electronic digital computer. It was built by John Vincent Atanasoff and Clifford Berry at Iowa State University during 1937-42. It incorporated several major innovations in computing including the use of binary arithmetic, regenerative memory, parallel processing, and separation of memory and computing functions. On October 19, 1973, US Federal Judge Earl R. Larson signed his decision following a lengthy court trial which declared the ENIAC patent of Mauchly and Eckert invalid and named Atanasoff the inventor of the electronic digital computer -- the Atanasoff-Berry Computer or the ABC.

Clark Mollenhoff in his book, Atanasoff, Forgotten Father of the Computer, details the design and construction of the Atanasoff-Berry Computer with emphasis on the relationships of the individuals. Alice and Arthur Burks in their book, The First Electronic Computer: The Atanasoff Story, describe the design and construction of the ABC and provide a more technical perspective. Numerous articles provide additional information. In recognition of his achievement, Atanasoff was awarded the National Medal of Technology by President George Bush at the White house on November 13, 1990.

Julia Hockenmeier asked:

I was also wondering, how exactly does Zuse's Z3 compare to the Colossus and the ENIAC?

And Mike Albaugh wrote:

Colossus vs. Enigma: sounds like a WWF match. Anyway, neither machine was as "General Purpose" as Konrad Zuse's Z1, built from scrounged metal bits in the living-room of his parents' apartment, in 1936. Of course, within their domains they were undoubtledly faster, as Z1 was purely mechanical. (Z2 was a later version in electro-mechanical form.)

As a docent at the Computer History Museum, I have learned to very carefully lay out the adjectives when describing a "first". There are a huge number of "first" computers, depending on the precise mix of adjectives.

Further details are available from the Wikipedia articles on Atanasoff and Zuse.

Meanwhile, my copy of Copeland's Colossus book has just arrived, which was the point for me here in the first place.]

Posted by Mark Liberman at 11:41 AM

Depreciate and deprecate: stay out of it

Furious altercations down the hall from the water cooler in One Language Log Plaza today. Nunberg was shouting, red-faced: "Doctor Johnson's vocabulary was good enough for him and it's good enough for me!" Several younger staffers were arguing with him: "You gotta move with the times!" Then somebody said, "For heaven's sake, there's only one letter different," and everybody turned and started shouting at once. Should Nunberg have used "depreciate" to mean "to lower in estimation or esteem" (the first meaning in the Webster entry), i.e., "lower the value of by expressing the opposite of appreciation for", hence roughly what "denigrate" or "deprecate" would mean? He did, in this post. And he meant it; it wasn't a typo. And the dictionaries back him up: the word can mean that. Yet a proofreader wrote to Nunberg saying he had a ROTFL moment when he saw it... Well, I don't know. I stayed well away from the whole rowdy scene. I've seen this kind of word-quarrel spiral down into violence, with men dashing glasses of chardonnay in each other's faces. One time I saw young Bakovic and Beaver actually come to blows over an adverb. We care about language here at Language Log. You might want to look up depreciate and deprecate. Their meanings are very close, yet their etymologies are quite different (the first from the Latin pret- root meaning "price", the second from the Latin prec- root meaning "pray"). Did Nunberg pick the right one? Maybe, maybe not. But I'm staying out of it.

Posted by Geoffrey K. Pullum at 09:25 AM

Mock Spanish in the cellular age

In a 1995 paper, the linguistic anthropologist Jane Hill argued that the register of "Mock Spanish" serves as "a site for the indexical reproduction of racism in American English." Though it may be hard to accept Bart Simpson's "Ay, caramba!" or Arnold Schwarzenegger's "Hasta la vista, baby" as racist discourse, even covertly so, there's no denying [*] the racist intent of a cell-phone ringtone recently pulled from the Cingular Wireless website. According to an article in the Brownsville Herald (picked up by the AP wire), the ringtone was called "La Migra," a Spanish term for the U.S. Border Patrol. The Herald describes the ringtone as follows:

In it, a siren is heard, followed by a male voice that says in a southern accent: "Calmate, calmate, this is la migra. Por favor, put the oranges down and step away from the cell phone. I repeat-o, put the oranges down and step away from the telephone-o. I'm deporting you back home-o."

The recording makes extensive use of -o suffixing, a feature Hill observes is one of the hallmarks of Mock Spanish. The most common example of this jocular suffixing is "No problemo," heard along with "Hasta la vista, baby" in the movie Terminator 2. As Hill notes, "No problemo" doesn't derive from Spanish (where the equivalent expression is No hay problema) but rather is simply the English colloquialism "No problem" with -o added. Hill's paper includes many more examples of such suffixation, from routine putdowns like "el cheap-o" to this personal ad in the UC San Diego student newspaper which seems to combine Mock Spanish with Mock Sicilian:

"Don Thomas -- Watcho your backo! You just mighto wake uppo con knee capo obliterato. Arriba!"

Cingular Wireless, to its credit, denounced the "La Migra" ringtone as "blatantly offensive" and pulled it from the site as soon as a reporter from the Brownsville Herald pointed it out. The AP reports that the ringtone was developed by "Barrio Mobile" and was available on the Cingular site beginning in late February or early March (since which time it had only been downloaded eight times). The timing of the discovery is rather inopportune, given how polarized the debates over immigration and Spanish-language usage have become since the eruption of the "Nuestro Himno" controversy (see here, here, here, here, and here). One can only hope that the news of the pulled ringtone might provoke some healthy introspection about ugly stereotypes of Mexican immigrants and the frequent offensiveness of Mock Spanish.

[* Update #1: Paul Postal takes exception with my assertion that "there's no denying the racist intent" of the ringtone:

It is not racist to make fun of Mexican/Spanish accents and cannot be. Spanish is spoken by people of many racial groups and made fun of by many. If an African-American makes fun of Arnold Schwarzennegger's accent, is that racist? Give me a break.

I acknowledge that it was overly glib of me to say "there's no denying..." when there may be many who deny the point. However, I use "racist" here as Jane Hill does in her paper, to imply a "racializing" effect (regardless of whether one consciously imagines Spanish speakers in the U.S. as a separate "race"). Quoting Hill:

I would argue, along with many contemporary theorists of racism such as van Dijk (1993), Essed (1991), and Goldberg (1993), that to find that an action or utterance is "racist", one does not have to demonstrate that the racism is consciously intended. Racism is judged, instead, by its effects: of successful discrimination and exclusion of members of the racialized group from goods and resources enjoyed by members of the racializing group. It is easy to demonstrate that such discrimination and exclusion not only has existed in the past against Mexican Americans and other members of historically Spanish-speaking populations in the United States, but continues today.

As I said, it may be hard to accept that, say, Arnold Schwarzenegger's catchphrases in Terminator 2 have the racializing effect that Hill ascribes to them, but the "La Migra" ringtone makes no bones about its reliance on racist stereotypes of Mexican border-crossers and arguably Mexican-Americans and Latinos more generally. This is an overt type of offensively racializing discourse, I think many would agree, regardless of one's feelings about what Hill identifies as "covert indexes" of racism in other Mock Spanish utterances.]

[Update #2: Turns out the ringtone was a work of satire by a Mexican-American comedian. Details here.]

Posted by Benjamin Zimmer at 01:35 AM

A racy WTF coordination

Joe Gordon spotted a headline that is both off-color (erotically so) and off-kilter (grammatically so) on Drew Curtis' Fark.com, a popular website where users comment on a variety of weird and wacky news articles. In this case the news concerns a Kentucky schoolteacher who was fired after it was revealed that she once appeared in an adult movie. The teacher says she's now a good Christian and is asking her community to forgive her. Here's the wisecracking headline that a Farker supplied for the linked article:

"Teacher who starred in porn movie a decade ago wants forgiveness, it harder, faster, OH GOD YES"

It's our old friend "WTF coordination," now working blue.

The submitter of the headline is consciously playing with the telegraphic conventions of headlinese, in which coordinate structures are often clipped by removing and, leaving only a comma to join two conjuncts (e.g., "Sen. Clinton Says Bush Has Charm, Charisma"). A startling change in register is also used for comic effect, shifting from the matter-of-fact reportage of "Teacher who starred in porn movie a decade ago wants forgiveness" to the soft-core titillation of "...it harder, faster, OH GOD YES." Stripping away the enraptured coda and adjusting for ellipticality, we're left with this coordination of unlikes:

[The] teacher wants {forgiveness} [and] {it harder [and] faster}.

The first complement of the verb want is a straightforward direct object, {forgiveness}. But when the headline shifts gears with the second complement, {it harder [and] faster}, things get a little tricky. One can think of the full version as using an infinitive complement of want with raising:

[The] teacher wants {it to be harder [and] faster}.

In such constructions, the infinitive to be may be optionally deleted, yielding:

[The] teacher wants {it harder [and] faster}.

The result is a kind of "small clause" complement, {it harder [and] faster}, which is quite divergent from the direct object complement {forgiveness}. What makes this coordination even harder to parse is the fact that the second complement starts off with the pronoun it, without any obvious antecedent. (The referent for the anaphor it in "[The] teacher wants it harder [and] faster" is left as an exercise for the reader.) Especially given the telegraphic deletion of the conjunctive and, the unusual second complement only heightens the WTF effect, perhaps requiring an extra reading or two to figure out what's going on. All of this actually serves to increase the comedic value of the bizarre and unexpected register shift between complements. The Fark.com commenters seem suitably appreciative, as the headline elicited such approbation as "*golf clap*", "kudos", "LOL-d", and "Go headline, go, go!" It just goes to show that incongruity as the engine of humor can encompass grammatical incongruity as well.

Posted by Benjamin Zimmer at 01:08 AM

May 09, 2006

A cautionary tale

If you've read John Chadwick's classic The Decipherment of Linear B, you'll recall that this script is often ambiguous as a way of writing Mycenaean Greek. For example, it omits many syllable-final consonants, so that e.g. khalkos "bronze" was spelled ka-ko; and there are some optional symbols sometimes used to "clarify the spelling of a word", including one transliterated as a₂ that can mean ha, initial au, or perhaps nothing at all. With this background, you'll be able to follow Karl Petruso's joke "The infuriatingly imprecise orthography of Linear B: a Cautionary Tale of Crime and Punishment in Bronze Age Greece."

This may not be the only extant joke whose punch line is (partly) in Mycenaean Greek, but it's the only one that I know.

Prof. Petruso's moral, I think, goes a bit too far:

Strive for precision in your writing.
And for God's sake, use an alphabet, not a syllabary.

Moraic or syllabic writing systems can be just as precise as alphabet systems -- or just as imprecise. After all, Prof. Petruso renders his punch line alphabetically as "No!! Not A₂-ke-ra-wo!! A-ke-ra-wo!!" rather than resorting to the sequence of Linear B glyphs that I could render in Unicode if I had the time.

[Via rogueclassicism via John McChesney-Young]

Posted by Mark Liberman at 02:51 PM

Dolphin naming?

The big animal communication story of the day is the idea that dolphins have -- and use -- "names". The background of the story, known since 1991 or so, is that infant bottlenose dolphins develop "signature whistles", and as they grow up, learn to recognize the signatures of their relatives and friends. This general sort of thing is not unique to dolphins. If you saw the hit movie March of the Penguins, you'll recall that "speaker recognition" was pictured as crucial to several aspects of penguin family life, for example allowing returning females to find their mates in the crowd. Many other species have similar abilities, which is not surprising given that identification of familiar individuals is crucial for most types of social organization.

What's new here, apparently, is how identity is coded. In many cases, animal "caller ID" is coded somewhat in the way that human speaker identity is, as a holistic amalgam of the effects of vocal tract size and shape, low-level motor habits, and so on. In this experiment, the "signature whistles" were "synthesized", so that the researchers were able to control which aspects of the signatures were preserved and which were omitted. However, it's a long way from the recognition of synthetic signature-whistles to the conclusion that dolphins have and use "names". It would be exciting to find out that such dolphins do exhibit naming behavior, such as dolphin A calling for dolphin B by using B's "signature", but I haven't found any indication in the media coverage that the PNAS article contains any evidence of such things. (There are plenty of assertions that dolphins do name one another, some attributed to Vincent Janik, the lead author of the forthcoming PNAS article that is causing this media buzz, but it's hard to know what to make of this.)

I wrote that the experiment's use of synthetic whistles is "apparently" what's new because the article, due to appear online in the Proceedings of the National Academy of Sciences this week, and presumably in the journal itself next week, isn't available yet. I'll post a more careful evaluation once the article is out. Meanwhile, experience suggests that it's a bad idea to spend too much time trying to construe what the MSM has to say about it. In particular, I'd be very careful about using words like "name" and "naming" to describe these experiments (even if the researchers themselves are quoted as doing so).

Here's what The Times says about the experiments themselves (under the headline "Dolphins 'know each other's names'"):

In the study some of the Sarasota Bay animals were corralled in a net. The researchers then played synthetic versions of the signature whistles of other dolphins through underwater loudspeakers to see if they would evoke a response in the captive animals. The use of synthetic whistles ruled out the possibility that the animals might simply be recognising the sound of each other’s voices.

They found that dolphins responded strongly to the whistles of their relatives and associates while generally ignoring those of dolphins to whom they had no link.

Janik said: “Bottlenose dolphins are the only animals other than humans to have been shown to transmit identity information independent of the caller’s voice.”

In my opinion, it's hard to know how to evaluate this statement without knowing exactly how the whistles were synthesized. When you talk on a cell phone, for example, your voice is digitized, analyzed, compressed and re-synthesized at the other end for the person you're talking to, who recognizes your voice as well as your words despite this treatment. So another way to express the results of these experiments might be to say that the experimenters' method of analysis and re-synthesis has preserved some of the features that dolphins use to distinguish between familiar and unfamiliar voices. And there's nothing in the description of the experiments to suggest that they addressed the question of whether dolphins use others' signatures vocatively or referentially.

Let me close by saying that I hope my skepticism is unwarranted in this case. It would be exciting to learn that dolphins' signature whistles are (say) like the songs of some birds, which are individualized by learned and practiced sequences of particular types of "syllables", and not like the individual barks of baboons, which (as I understand it) are individualized by their relatively automatic encoding of vocal tract size and shape, vocal-cord vibration patterns, and so on. It would be stupendous to learn that dolphins use such identifying calls in some of the ways that humans use names, e.g. to call for others by name, to refer to absent third parties by name, and so on. All of that, and more, is stated or implied in Michael Hopkin's story on the subject in news@nature.com, under the headline "Dolphins play the name game".

Unfortunately, the smart money says that I'll be disappointed, at least in terms of the evidence presented in the forthcoming PNAS paper. The media reliably overinterpret science stories that push their buttons, and nothing pushes people's buttons like talking animals.

Posted by Mark Liberman at 06:25 AM

May 08, 2006

Says Who?

Taking exception (without actually naming it) to a recent David Brooks column that depreciated the "conspiracy fantasies" of Kevin Phillips' American Theocracy, Paul Krugman wrote in his column today:

A conspiracy theory, says Wikipedia, ''attempts to explain the cause of an event as a secret, and often deceptive, plot by a covert alliance.''

Not a bad definition, but that "says Wikipedia" had me doing a double-take. Time was when the honor of attribution was reserved either for individual authors or for sources that are in a position to speak with an institutional voice, like the Times itself, the Encyclopedia Britannica or the OED (which by the way offers its own rather more precise if wordier definition of conspiracy theory as "the theory that an event or phenomenon occurs as a result of a conspiracy between interested parties; spec. a belief that some covert but influential agency (typically political in motivation and oppressive in intent) is responsible for an unexplained event"). In this case, sticklers might insist on a more prudent attribution like "say some anonymous self-selected contributors to Wikipedia." But if a PhD columnist at the doyenne of the Old Media can invest Wikipedia with the same metonymic authority that used to be reserved for the august institutions of the print world, maybe we should throw in the towel on this one.

Posted by Geoff Nunberg at 11:59 PM

Pirahã links

Since Dan Everett and the Pirahã were featured in the Independent a couple of days ago (and Spiegel as well), I've laid out Language Log's Pirahã links for your convenience.

One, two, many -- or 'small size', 'large size', 'cause to come together'? (8/20/2004)
Life without ~~counting~~ throwing (8/22/2004)
The Straight Ones: Dan Everett on the Pirahã (8/26/2004)
On counting and throwing (8/27/2004)
No abstract concepts for them (9/7/2004)
Pica on the Mundurucú (11/1/2004)
Cultural constraints on grammar (3/10/2005)
Good story, bad headline (5/11/2006)
Parataxis in Pirahã (5/19/2006)
Pirahã channels (5/21/2006)
Fear and loathing on Massachusetts Avenue (11/29/2006)
Dan Everett and the Pirahã in the New Yorker (4/9/2007)
Pirahã color terms (4/13/2007)
Comments on 'The Interpreter' (4/23/2007)
The enveloping Pirahã brouhaha (6/11/2007)

There are lots of interesting links on Dan's web site, but the best single thing to read is probably Daniel L. Everett, "Cultural Constraints on Grammar and Cognition in Pirahã: Another Look at the Design Features of Human Language", Current Anthropology, Volume 46, Number 4, August-October 2005.

Posted by Mark Liberman at 03:33 PM

Why I Will Never Be a Wine Connoisseur

I'm the first to admit that wine terminology is beyond my ken. But I like to think I'm an open-minded person, so I'm quite willing to drink a wine that's described as (for instance) having `superb nose with lifted aroma of confectionary, liquorice, spice and violets', with `blackpepper nuances' that `explode on the back palate supported by fine grained tannins and long plum and spice aftertaste', even though I have not personally tasted any of those things in any wine that has crossed my palate. But the other day I read a wine description that seemed to go too far.

This wine, a 1999 Cabernet from Stags' Leap Faye Vineyard, was listed on the restaurant menu for $150, and the pitch went like this: `Graceful style with complex earthy currant, sage, tobacco and pencil lead flavors, which focus and gain intensity on the finish'. Now, I have nothing against earthiness, currants, or even sage in my wine, especially since I can't taste them anyway, but I'm not sure even a dedicated smoker would want to drink tobacco (I could be wrong here, of course), and who on earth would want to taste pencil lead in a glass of wine? I thought it must be a joke, but the server assured me that it wasn't. So I copied down the text and came home and googled "pencil lead flavors", and got 182 hits! Only 24 were actually listed as `most relevant' (least repetitious); several of these were clearly to this same wine, but others were to other wines, and one was to a cigar. So now I know: apparently wine connoisseurs do want their wine to taste of pencil lead. I don't. Since I can't taste all those other things, probably I also wouldn't taste the pencil lead, but I didn't buy the bottle to find out.

I do wonder what kind of training people undergo to learn to write descriptions of wines. I am having trouble envisioning a process that involves tasting a wine, discovering (by whatever occult means) that it has pencil lead and tobacco flavors, and then admitting that right out loud in the official description, in the belief that those flavors will attract buyers. Presumably buyers of expensive wines also have to be educated so that they can avoid being disgusted by such a description. How does all this happen?

[Some other LL posts on winetalk:

Just a trace of the obligatory rubber (4/9/2004)
Ritual verbal enthusiasm for food (5/11/2004)
More winetalk imports into coffee lingo
Apologia pro risu suo (6/2/2004)
Grand Cru smackdown (6/2/2004)
More on winetalk culture (6/2/2004)
What do winetasting notes communicate? (6/5/2004)

]

Posted by Sally Thomason at 12:29 PM

Deep in the Hookergate weeds

I'm a regular reader of Joshua Micah Marshall's Talking Points Memo, and you should be too if you're at all interested in American politics or in American political language. Last September, I commented on his use of the word "parse". In one of yesterday's items it was the phrase "deep in the weeds" that got my attention:

You've got to be relatively deep in the "Hookergate" weeds to follow this one. But Newsweek has now positively identified the CIA party-attender "Nine Fingers" as Brant Bassett, ex-CIA official and former Goss Hill staffer.

That "deep in the 'Hookergate' weeds" phrase is quintessential TPM, I thought to myself. But I know that I'm no more immune to the linguistic "frequency illusion" than anyone else is, so I thought I'd check. Searching for {weeds} on the TPM site, I found five additional examples:

(May 04, 2006) Okay, this one will go a little deep in the weeds. But bear with me because I think it'll shed a bit of light on the amount of due diligence the Department of Homeland Security did on Shirlington Limo ...
(Feb. 15, 2006) These [sic] gets fairly deep into the weeds of the Whittington case. But one point that garnered a lot of discussion yesterday was just how the birdshot got lodged in or near Whittington's heart.
(July 18, 2005) This point is admittedly very deep in the weeds. But if you're playing the Rove/Plame/Niger sleuth game like many of the rest of us, it's a significant point.
(July 17, 2004) This post will take us admittedly deep into the weeds of the Iraq-Niger saga. But if you can handle the detail, let's proceed. As we've noted several times recently, both the Senate intel committee report and the recent "Butler Report"...
(May 17, 2004) Policy wonks may get into studies about money put into security at America's major seaports or what the president is doing to get the FBI into shape to combat terror. But that's deep in the weeds where few non-policy wonks venture.

Do six uses of a phrase in two years count as "quintessential"? Well, I've observed before that a word or phrase may only need to be repeated a couple of times in order to seem characteristic of a writer or speaker, if the use in context is striking enough. In this case, five of the six TPM uses of "deep in the weeds" are used to introduce a post, as part of a ritualized warning to the reader that the content will involve a level of detail that some may find excessive.
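For anyone who wants to repeat this sort of check on another site, here is a minimal sketch of the counting step. It is not what I actually ran (I used the site's own search); the excerpt list is just the six TPM passages quoted above, trimmed to the relevant sentence, and the names `WEEDS`, `EXCERPTS` and `count_matches` are made up for illustration. The pattern allows "in" or "into", and one optional modifier before "weeds", so that the "Hookergate" variant is caught as well.

```python
import re

# "deep in(to) the [one optional modifier] weeds", case-insensitive.
WEEDS = re.compile(r"deep\s+in(?:to)?\s+the\s+(?:\S+\s+)?weeds", re.IGNORECASE)

# The six TPM passages quoted in this post, trimmed to one sentence each.
EXCERPTS = [
    'You\'ve got to be relatively deep in the "Hookergate" weeds to follow this one.',
    "Okay, this one will go a little deep in the weeds.",
    "These gets fairly deep into the weeds of the Whittington case.",
    "This point is admittedly very deep in the weeds.",
    "This post will take us admittedly deep into the weeds of the Iraq-Niger saga.",
    "But that's deep in the weeds where few non-policy wonks venture.",
]


def count_matches(texts):
    """Count how many of the given texts contain the phrase at least once."""
    return sum(1 for text in texts if WEEDS.search(text))


if __name__ == "__main__":
    print(count_matches(EXCERPTS), "of", len(EXCERPTS), "excerpts match")
```

A count like this says nothing about whether the phrase is being used as a ritual warning at the start of a post; that part still takes reading.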

In comparison, the phrase "deep in the weeds" has never been used on Language Log, on Language Hat, on the Volokh Conspiracy, on Crooked Timber, etc., although these blogs are more often deep in (what some might consider) the intellectual weeds than not.

DITW is of course not unique to TPM. Even the ritual-excuse usage can be found elsewhere on the web, e.g. in Margaret Warner's 10/17/2003 conversation with William Safire on PBS NewsHour:

We're getting deep in the weeds here, but let's go on to the U.N. vote yesterday, Bill.

This suggests an inside-the-beltway thing, and there are plenty of other examples from political language:

Word is spreading among NAPUS Legislative Activists that Congress is already communicating with them on some “deep-in-the-weeds” postal reform issues.
If you get deep in the weeds of how pollution markets function, you come to understand that they can be a remarkably effective tool for reducing emissions...

But plenty of non-political types also use "get (or be) deep in the weeds" meaning "to deal with a topic at a remarkable and perhaps excessive level of detail":

I don't want to get too deep in the mathematical weeds (which look like little, green integral signs), but there's an equation governing gas pressure called Gay-Lussac's Law. To really boil it down, pressure P is equal to a constant k times temperature T: P = k • T.
Without getting too deep in the technical weeds, the technology involved the easy movement of electronic data.
Without getting too deep in the weeds, dogmas are simply values or principles that cannot be proven, but that we accept as true or divinely decreed (and therefore true).
Yeah, you know you're deep in the meta weeds when you can't write a story summary that is coherent and useful and shorter than the story itself.
I didn't want to get too deep in the weeds, and was only trying to give the poster a general idea of why he didn't need a "neutral" on a 240V load.
I don't want to get too deep in the weeds on this histogram thing.
I am also very concerned when accounting firms suggest that they were making materiality determinations at the segment, interim financials level. This is much too deep in the weeds. Materiality assessments need to be made on an enterprise basis at the annual period level.

The metaphor here seems to be that when you wander off the beaten path, you can explore arbitrary amounts of not-very-valuable intellectual foliage ("weeds") without getting closer to your conceptual destination. (Many of the literal uses of "(deep) in the weeds" involve fishing, for reasons that are obvious to any fisherman. However, the fishing context doesn't seem to play a role in the development of the metaphors here, as far as I can see.)

There's another metaphorical use of "in the weeds" -- maybe even commoner -- that has a different meaning, clearly explained by Chris DeLorenzo in a May 2 blog post:

Anyone who's been a waiter at a busy restaurant for any length of time surely knows what I'm talking about. It's a dream where you find yourself "in the weeds" (waitering term meaning so busy and behind that your head spins a la The Exorcist), so deep in the weeds that there doesn't seem any way out, that all is hopeless, and that with each passing moment the weeds get thicker and thicker.

There are plenty of other examples on the web of this waiter-specific "in the weeds", with or without "deep":

Back when I was still cooking, we had a favorite phrase, deeply treasured, to describe a guy (usually a young guy, usually on his first or second night working fryers or garde manger) who was so far down that he could no longer see beyond the next ticket. When he was flustered, when he'd lost the long view and had that white-eyed, glassy stare that signals the onset of total adrenaline burnout, he was "shitting dandelions," as in, "Check out Bob. Motherfucker's so deep in the weeds he's shitting dandelions."
On a recent hectic Saturday night, our food arrived with dispatch but our overburdened server was so deep in the weeds that we endured long waits for everything from chips for our appetizer dips to cream for our coffee.
The waitstaff were so deep in the weeds that they started speaking in tongues.

An online "Waiter's Glossary" also gives the derived form weeded:

1} To have to much to do. 2} To not be able to keep up or accomplish all the tasks set before you. 3} A feeling or act of being behind in one's work. 4} A very bad place. i.e. "I have too many tables, I'm very weeded!"

By 2000, the phrase "in the weeds" was mainstream enough to become the title of a movie starring Molly Ringwald and Eric Bogosian, among others, and described on IMDB as "a mediocre ensemble dramady repartee-fest about the staff of a restaurant and their activities during one night's dinner rush".

You can also find extensions of the restaurant sense to other cases of over-scheduling:

The soup recipe (Boston Globe) called for six cups of chicken stock. My stock supply is terribly low, and it being a Friday (and my schedule being already deep in the weeds at breakfast) there was no time for proper stock.
BAM!... your boss cheerfully dumps several more confusing, time-sensitive projects in your lap and you're deep in the weeds again.

Note that this is a rather different meaning from the one we started with. Josh Marshall goes deep in the weeds because he's enjoying himself, following story leads into a thicket of details. He's in control, even if he worries that his readers may not want to follow all his excursions. A waiter in the weeds, in contrast, is tangled, hindered, and overwhelmed.

Some examples seem to be in between the Good Weeds ("pursuing details") usage and the Bad Weeds ("overwhelmed by demands") usage: a Neutral Weeds use, meaning something like "busy with details":

I was deep in the weeds trying to debug the Skycam control loop. Having no other option, I was about to try some Trial and Error debugging when a teammate suggested that we take a close look at my Drum Circumference Parameter.
We've been working very hard on content for the show, and we're pretty deep in the weeds of it right now.
I'm deep in the weeds of building a really nice Tournament Tracker and Online Fishing Logbook ...
My friends Winnie and Chris are deep in the weeds of wedding planning and Blue Hill at Stone Barns was a contender.
Haven’t posted in a while because I’m deep in the weeds on a project.
I apologize, but there's a good reason for this lack of content: I've been deep in the weeds on a cover story about cute, diseased puppies and the kind people who adopt them from the pound during the holiday season.
I am way too deep in the weeds of my book to offer further extended thoughts on what we can now fairly call the Rove-Plame affair.

Another way to think about this is that there are many different sorts of reasons why someone could wind up "in the weeds", as well as different evaluations (perhaps by different people) of the costs and benefits of being there. These distinctions start out as inevitable parts of the story we can make up for ourselves around the concept "in the weeds" -- but as some particular patterns of use become common, certain sets of story lines congeal into "senses".

[Update: Grant Barrett's Double-Tongued Word Wrester Dictionary has an entry for "weeded", with citations from 1995. One of Grant's citations suggests a swimming metaphor: "As in 'in the weeds' like you're 'swimming' (another useful bit of nomeclature for the same thing) in a lake, and being tangled in seaweed."

Grant himself suggests (by email) that golf is another source for the expression "in the weeds", which makes sense.]

[And Jonathan Lundell writes that "My first recollection of the phrase "in the weeds" is in connection with sports car racing, as term for going off-course".]

[Chris Brew contributed a similar-but-different herbage idiom from across the pond:

Joshua Micah Marshall's weeds phrase somewhat resembles "kicked into the long grass", which is an "inside the M25" phrase for placing a ticklish issue on one side by sending it to some glacially slow committee.
The metaphor must be something to do with soccer, but I'm not totally clear on how that works. Unless the grass is very long, I doubt whether the ball would be as lost as typical usages imply. You'ld just go and find, it not wait ten years and never get back to it.

Chris also sent a link to an article in The Times ("The long grass") whose punning headline seems to use this idiom to subvert the author's meaning, which was to call sincerely for "a truly comprehensive review of cannabis policy". On the other hand, the last paragraph does start by observing that "Mr Clarke has kicked this controversy into the long grass. The review he has commissioned will not even start until almost a month after polling day." ]

[Rich Alderson explains that

Chris Brew's idiom "kick it into the long grass" is golf-derived, essentially describing a form of cheating: A player whose ball lands in the rough so as to be unplayable without adding multiple strokes to the hole can cheat by kicking the ball out-of-bounds into the really long grass and take a one-stroke penalty for a lost ball.
The metaphor here would be that the ticklish issue gets lost, and we play with a different issue altogether (the new ball).

]

Posted by Mark Liberman at 11:00 AM

May 07, 2006

The alcoholic orientalist thief vs. the tenth-rate syntactical train wreck

Christopher Hitchens and Juan Cole have been arguing about how to interpret the anti-Israel rhetoric of the Iranian leadership. As I understand it, Cole believes that the Iranians are calling for an end to the Israeli occupation of the territories captured in the 1967 war, while Hitchens interprets them as calling for the violent destruction of the Israeli state and the expulsion or death of its Jewish residents.

Mostly, though, Cole and Hitchens have been arguing about each other. Cole referred to his critic as "Hitchens the orientalist", asserted that Hitchens has a "very serious and debilitating drinking problem", and called him an "asinine thief". Hitchens has called Cole "tenth rate", "the embodiment of mediocrity", and "in need of a remedial course in English", explaining that "his sentences are made up of syntactical train wrecks".

I don't have anything to add to their argument about how to translate Ahmadinejad and Khomeini -- see below for a list of links to primary and secondary sources on this debate, if you want to learn more. My interest is in some of the insults.

The OED defines orientalist as "An expert in or student of oriental languages, history, culture, etc.". Christopher Hitchens has no expertise in oriental languages, history or culture, as far as I know, nor even any particular interest in those topics, so you might wonder why Juan Cole ran his first response to Hitchens under the title "Hitchens the Hacker; And, Hitchens the Orientalist And, 'We don't Want Your Stinking War!'". And underneath all the insults, the argument is about how to translate and interpret some sentences of Persian, so why is Christopher Hitchens complaining about Juan Cole's English prose style?

The "orientalist" insult is easier to explain. The OED's citations start in 1723:

1723 H. ROWLANDS Mona Antiqua Restaurata 318 We have two learned Orientalists.
1779 JOHNSON Smith in Pref. Biogr. & Crit. IV. 38 The great Orientalist, Dr. Pocock.

As the dates suggest, orientalism began as an enlightenment phenomenon, associated with scholars as diverse as William "Oriental" Jones, a radical Whig who supported the American side in the revolutionary war and learned Sanskrit so as to be able to understand the legal system in Bengal, and Hermann Grassmann, a German secondary-school teacher whose Wörterbuch zum Rig-Veda is still used, and who also made important contributions to mathematics.

Thus "Orientalist" was a rarely-used term of respectful description until 1978, when Edward Said published his massively influential book Orientalism. Said argued that the Palestinians were "the latest victims of a deep-seated prejudice against the Arabs, Islam, and the East more generally — a prejudice so systematic and coherent that it deserved to be described as 'Orientalism,' the intellectual and moral equivalent of anti-Semitism". This description comes from Martin Kramer's highly critical chapter, "Said's Splash". Edward Said put it more categorically: "Since the time of Homer every European, in what he could say about the Orient, was a racist, an imperialist, and almost totally ethnocentric." Said's book was arguably one of the most consequential works of the second half of the 20th century, with an enormous influence on intellectuals from Middle Eastern countries and also on Americans in several academic disciplines.

To recap, then: "orientalist" started out meaning "(European) expert in the Orient"; Said argued in 1978 that all Europeans have always been ethnocentric racists and imperialists, including those experts who pretended to have a purely scholarly interest; and this accusation is apparently so significant to Cole that it has completely washed out the original notion that an "orientalist" is someone who studies the Orient. For him, an orientalist is now simply someone who is prejudiced against the Orient, and specifically against Islam. (A good if mostly irrelevant joke about an orientalist in Egypt is explained here.)

This sort of development, in which a connotation comes to replace a word's original denotation, is common in the history of words. It helps explain many of the cases in which a word's meaning turns into its opposite, as has happened recently with "liberal" and "conservative".

As for the business about Cole needing remedial English for his syntactical train wrecks, that begins with the last paragraph of Hitchens' original May 2 Slate article:

One might have thought that, if the map-wiping charge were to have been inaccurate or unfair, Ahmadinejad would have denied it. But he presumably knew what he had said and had meant to say. In any case, he has an apologist to do what he does not choose to do for himself. But this apologist, who affects such expertise in Persian, cannot decipher the plain meaning of a celebrated statement and is, furthermore, in need of a remedial course in English.

Hitchens bangs the same drum in an interview on May 3 for the Hugh Hewitt Show. Duane Patterson at Radioblogger transcribed the crucial passage as follows:

Hugh Hewitt:

I wonder, what has happened to the left, Christopher Hitchens, in their confusing of sort of personal attacks and slanders with argument?

Christopher Hitchens:

Well, I've always thought that attacks of that kind, wherever they come from, were invariably a sign of weakness. I mean, if Juan Cole wrote a piece attacking me, and all I could think of in reply was to say well, he seems like a dope fiend, or a closet case, or a pederast, I would feel that I wasn't really meeting his argument, I mean, that I hadn't replied to the points he'd made against me. The ad hominem is widely and rightly denounced, because it shows a collapse on the part of the person who uses it. They won't reply to your point, they won't reply to your case. And Cole, who is the embodiment of the mediocre, this would not surprise me in the least. I mean, he writes as if he's drunk, because you have to, the sentences are made up of syntactical train wrecks. But I don't think it's alcohol in his case. I think it's illiteracy, simply.

The wording of Hewitt's question adds to the evidence that he suffers from a serious case of irony deficiency. But the answer made me wonder whether Hitchens might be catching the disorder as well: the "syntactical train wrecks" phrase is in the second of two apparently ungrammatical sentences. So I downloaded the mp3 and did my own transcription, which removes what otherwise would have been a nice instance of Hartman's Law:

HH:	I- I wonder, what has happened to the left, Christopher Hitchens, in their confusing of sort of personal attacks and slanders with argument?
CH:	Well, I've- I've always thought that attacks of that kind, wherever they come from, were all- invariably a sign of weakness. If- if um- if Juan Cole wrote a piece attacking me, and all I could think of in reply was to say well, he seems like a dope fiend, or a- a closet case, or a pederast, um I would feel that I wasn't really meeting his argument, I mean, that I hadn't replied to the points he'd made against me. It's a- the ad hominem is ri- widely and rightly denounced, because it's- it- it shows um a collapse on the part of the person who uses it, they- they won't reply to your point, they won't reply to your case.
HH:	Let's go back to the-
CH:	((with)) Cole, who is the embodiment of the mediocre, this would not surprise me in the least.
HH:	I- I want to go back to the key # point-
CH:	I mean, he writes as if he's drunk, because you have to- the sentences are made up of syntactical train wrecks. But I don't think it's alcohol in his case. I think it's just- it's- it's illiteracy, simply.

Hewitt keeps interrupting to try to make a point of his own (which of course for him is the "key point"), and this covers Hitchens' word "with", which rescues the first sentence "((With)) Cole, who is the embodiment of the mediocre, this would not surprise me in the least". And in the second sentence, it's pretty clear that the clause-initial "you have to" is a false start, which Hitchens cancels before going on. With this edit, the result becomes fully grammatical: "I mean, he writes as if he's drunk, because the sentences are made up of syntactical train wrecks".

Though Hitchens' syntax is rescued by more careful transcription, his logic is not. I follow the idea that ad hominem attacks are a sign that more pertinent arguments are lacking; and I follow the claim that Cole lacks more pertinent arguments because he's mediocre; but it's not clear what the quality of his syntax has to do with anything. I suppose that Hitchens means that Cole's alleged stylistic flaws are evidence for the poor quality of his ideas. I've run across this idea before, for example in Somali political poems (called gabay), where explicit analogies between metrical and logical rigor are common; but this is like arguing that someone must be telling the truth because she's got a lovely face. Some people frame cogent arguments in crappy prose, and others say nothing but put it beautifully.

Hitchens puts forward a related argument earlier in the Hewitt interview:

His English is, by the way, very poor. I can't believe his Persian is excellent, because his English is lousy.

Here the argument seems to be that no one could construe Persian prose correctly without being able to write elegant English; Juan Cole doesn't write elegant English; therefore he must not be able to understand Persian. This strikes me as preposterous, frankly. It may not be an ad hominem argument -- though I suspect that Hitchens is much vainer of his prose than of his face -- but it shares the quality of undermining someone's arguments by attacking associated though logically irrelevant qualities.

It seems to me that you can tell something about Cole and about Hitchens from their choice of insults. Cole apparently shares with many other American Middle East experts an urgent need to demonstrate that he is an exception to Said's claim that "[s]ince the time of Homer every European, in what he could say about the Orient, was a racist, an imperialist, and almost totally ethnocentric". This may motivate him to bend over backwards to interpret Iranian threats in a sympathetic way ("... The phrase is almost metaphysical. He quoted Khomeini that 'the occupation regime over Jerusalem should vanish from the page of time.' It is in fact probably a reference to some phrase in a medieval Persian poem. It is not about tanks. ...") It certainly led him to bring out "orientalist" as one of the first insults he slung at Hitchens.

Hitchens, on the other hand, apparently confuses style and substance. Would he have more respect for Cole's arguments if he thought they were better written?

For what it's worth, I agree with Jeff Weintraub's assessment of the substantive issues at stake:

Cole's recent apologetics for the actions and statements of the Iranian regime have become increasingly strained, misleading, irresponsible, and difficult to take seriously. I am afraid that Hitchens's criticisms of Cole in this piece [the Slate article -- myl] are entirely deserved.

If you want to follow the debate in primary sources, here are the main links:

Christopher Hitchens: "The Cole Report: When it comes to Iran, he distorts, you decide." Slate, May 2, 2006.
Juan Cole: "Hitchens the Hacker; And, Hitchens the Orientalist. And, 'We don't Want Your Stinking War!'" (May 3, 2006)
Juan Cole, "Hitchens not Drunk, Only an Asinine Thief" (Blog post, May 3, 2006)
Christopher Hitchens, Transcript of Radio Interview (May 3, 2006)
Juan Cole, "Cobban on Hitchens" (Blog post, May 4, 2006)
Juan Cole, "Cole/Weisberg conversation on Hitchens" (Blog post, May 5, 2006)

For an evaluation of the content of the argument by a third party who favors Hitchens' side, you could try a series of posts on Jeff Weintraub's weblog:

Jeff Weintraub, "Juan Cole's Iran distortions (Christopher Hitchens)" (May 3, 2006)
Jeff Weintraub, "Michael Young on Hitchen vs Cole" (May 3, 2006)
Jeff Weintraub, "P.S. re Cole, Hitchens and Ahmadinejad" (May 5, 2006)

Some detailed discussion of the contested words by an Iranian who agrees with Hitchens, Tino Sanandaji, is here.

There are many pro-Cole weblog posts, but I haven't been able to find any that say much besides "go get 'em, Juan!" The most serious support for Cole that I've found is a post by Helena Cobban, "Cole, Hitchens, and the threat of a US attack on Iran" (May 3, 2006), but the content of her post is just that she can testify from 35 years of personal acquaintance that Hitchens "has long had a very serious drinking problem", that she has "huge admiration" for Cole's "scholarship and for the personal qualities of caring and commitment that he brings to all his endeavors", and that she "applaud[s] and completely support[s] the firmly antiwar position he has expressed regarding US policies toward Iran". There's not a word about the specific issues being debated, namely what Ahmadinejad and Khomeini said and what they meant by saying it. If you know of any pro-Cole posts or articles that include any contentful analysis of this question, let me know and I'll post the links here.

[Update: Ben Zimmer suggests this:

Bill Scher, "The Importance of Cole v. Hitchens" (The Huffington Post, 5/4/2006)

]

[Here's another pro-Cole post that offers something beyond cheerleading: "BTC News unearths another Ahmadinejad apologist" (BTC News, 5/7/2006).]

Posted by Mark Liberman at 02:06 PM

May 06, 2006

¿Es el español un idioma extranjero?

One of the ideas rearing its ugly head in the present debate about US immigration is the proposition that Spanish is a foreign language introduced by illegal immigrants. Using US government sources, I obtained the numbers of people of Hispanic origin in the Southwestern US, in the seven states that have been part of the United States since they were taken from Mexico 150 years ago. From these I subtracted the estimated numbers of illegal residents, resulting in estimates of the number of legal residents of Hispanic origin. I have tabulated the results below (in thousands):

State	Hispanic	Illegal	Legal Hispanic
Arizona	1,442	283	1,159
California	11,629	2,209	9,420
Colorado	787	144	643
Nevada	453	101	352
New Mexico	800	39	761
Texas	7,197	1,041	6,156
Utah	216	65	151
Total			18,642

There are approximately 19 million legal residents of the Southwest of Hispanic background. If we include legal residents who have not been in the mainland United States so long, we may add the 1,169,000 Cubans in New York and Florida and 800,000 Puerto Ricans in New York, for a total of 20,611,000. Hay por lo menos 20 millones de personas de origen hispánico quienes habitan legalmente los Estados Unidos, la mayoría de las cuales vienen de familias que han habitado los Estados Unidos por muchas generaciones. ¿Cómo se puede decir que el idioma español sea una lengua extranjera en los Estados Unidos?
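The arithmetic behind these totals is easy to re-check. The sketch below re-enters the figures from the table (in thousands) together with the Cuban and Puerto Rican counts quoted above; the variable names are illustrative, and nothing is assumed beyond the numbers already shown.

```python
# Figures in thousands, copied from the table above: (Hispanic, illegal).
southwest = {
    "Arizona": (1442, 283),
    "California": (11629, 2209),
    "Colorado": (787, 144),
    "Nevada": (453, 101),
    "New Mexico": (800, 39),
    "Texas": (7197, 1041),
    "Utah": (216, 65),
}

# Legal Hispanic residents = Hispanic residents minus estimated illegal residents.
legal = {
    state: hispanic - illegal
    for state, (hispanic, illegal) in southwest.items()
}
southwest_total = sum(legal.values())

# Additional groups mentioned in the text, also in thousands.
cubans_ny_fl = 1169
puerto_ricans_ny = 800
grand_total = southwest_total + cubans_ny_fl + puerto_ricans_ny

for state, n in legal.items():
    print(f"{state:<12}{n:>8,}")
print(f"{'Southwest':<12}{southwest_total:>8,}")
print(f"{'Plus NY/FL':<12}{grand_total:>8,}")
```

Run as written, the per-state differences and both totals agree with the figures given in the table and in the paragraph above.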

Sources

Posted by Bill Poser at 08:51 PM

No Word for Thank You

A recurring theme here on Language Log is the claim that such-and-such a language has no word for this-or-that. Such claims are easy to make fun of as they're often wrong and even when true don't have the implications people think they do. There are, however, some serious points to make about these claims. Not only do they rest on false linguistic premises, but they can be quite damaging.

One false premise is the Whorfian one on which we have often commented: not having a word for something does not mean that one lacks the concept. Indeed, there is some very good evidence that people have covert categories, classes of things for which we have concepts but no terms. The evidence for this is nicely presented in Brent Berlin's wonderful book Ethnobiological Classification: Principles of Categorization of Plants and Animals in Traditional Societies. If you haven't read it, you should do so immediately. Not only is it fascinating, but it has lovely illustrations by one of the Tzeltal people whose ethnobiology Berlin has studied extensively. (And if you're the person who has my copy, please return it.) Among other examples, Berlin points out that in Tzeltal there are no words corresponding to "plant" and "animal", but there is a variety of evidence that Tzeltal speakers nonetheless classify living things into categories that correspond to "plant" and "animal". One such piece of evidence is the fact that there are classificatory suffixes that reflect these categories.

Another false idea is that if people have borrowed a word, that implies that they previously lacked a word for the same idea. There is some truth to the converse of this idea - if a new idea is introduced, people are likely to create a term for it, and a common way to do this is to borrow the word from the language of the people from whom the new idea comes - but people borrow words for reasons other than the absence of an equivalent term in their own language. This is easily seen in the history of languages like Japanese which have vast numbers of doublets, one native, the other borrowed, for the same meaning. In the case of Japanese most of the loans come from Chinese. There is typically both a native Japanese word for something and a loan from Chinese. Usually, though not invariably, the loan from Chinese belongs to a more formal, literary register. One of the difficulties of reading Japanese is that if both words are written in Chinese characters it may be impossible to know which word is intended. Often you can figure this out from the morphology, but sometimes you can't. For example, if you encounter 村人 "villager", a compound of 村 "village" and 人 "person", there is no way to be sure whether to read it as murabito, which is the native Japanese word, or sonjin, the loan from Chinese. You can guess, since sonjin is much more formal than murabito, but you can't be sure.

A particularly damaging example of the No word for X fallacy is one that one hears here in Northwestern Canada. Many of the Athabascan languages of Canada have a word for "thank you" that is borrowed from French merci. In Carrier it is [mʌsi]. This fact has suggested to the ignorant that these languages previously had no word for "thank you", from which they draw the further conclusion that their speakers had no concept of gratitude. Such a people, of course, must have been sub-human savages. The conclusion is that it's a good thing that white people came to rescue them from their degraded traditional way of life. This claim is so well known that it figured in an episode of the television program North of 60, which was set in a Slave village in the Northwest Territories.

The fact is that the loan was not motivated by the lack of a native way to say "thank you". In Carrier, there are actually two different verbs for expressing thanks, one for giving thanks for what someone has said, the other for giving thanks for what someone has done. Both verbs are conjugated for both the subject (the one thanked) and the object (the one giving thanks). Here are the paradigms for giving thanks for what someone has done and for giving thanks for what someone has said. (These are in the Stony Creek (saik'ʌz) dialect.)

snaʧailja	I thank you (one person)
snaʧaɬʌja	I thank you (two or more people)
nahnaʧailja	We (two) thank you (one person)
nahnaʧaɬʌja	We (two) thank you (two or more people)
nenaʧailja	We (more than two) thank you (one person)
nenaʧaɬʌja	We (more than two) thank you (two or more people)

snaʧadindlih	I thank you (one person)
snaʧadahdlih	I thank you (two or more people)
nahnaʧadindlih	We (two) thank you (one person)
nahnaʧadahdlih	We (two) thank you (two or more people)
nenaʧadindlih	We (more than two) thank you (one person)
nenaʧadahdlih	We (more than two) thank you (two or more people)

The reason that the subject is the one thanked is that these verbs literally mean something like "you have done me a favour".

The verb for giving thanks for what someone has said is the appropriate verb for saying "No, thank you". Since you are refusing what is being offered to you, you are not giving thanks for receiving something. Rather, you are giving thanks for the offer, which is something that someone has said.

Far from lacking a way of saying "thank you", Carrier had, and has, a more highly articulated, finer-grained way of doing so than English or French. The loan from French is used for relatively casual thanks, and increasingly by semi-speakers and non-speakers, but truly fluent speakers still use the traditional verbs when seriously expressing gratitude.

Posted by Bill Poser at 01:31 PM

Yet another plagiarism case

According to the Chronicle of Higher Education, Scott D. Miller, the President of Wesley College in Delaware, has been accused of plagiarizing two entire documents. One is Wesley College's statement of management philosophy, which is reported to be identical to that of Samford University in Alabama except for the substitution of Wesley for Samford and College for University. President Miller maintains that he inherited the statement from his predecessor and is not responsible for its creation. The other document is an essay entitled "The Liberal Arts: Solving the 'Practicality Gap'," which is reportedly identical to one section of a 1997 essay by Richard H. Hersh, then the president of Hobart and William Smith Colleges. No explanation for this one has yet been offered.

Miller is a repeat offender: in 2000 he was found to have plagiarized a speech on multiculturalism, an incident also reported on by the Chronicle. Apparently, if you're a 19-year-old student, there's a huge hue and cry over less than 1% of a novel, but if you're a university president, copying entire speeches and essays verbatim is okay.

Posted by Bill Poser at 06:17 AM

Cartoon roundup, "Nuestro Himno" edition

Every once in a while a linguistic issue dominates the national discourse: think of the "Ebonics" dispute of 1996, or the debate over California's initiative to curtail bilingual education in 1998. For the last week or two the "Nuestro Himno" controversy has provided such a moment, as a Spanish translation of the national anthem has become a flashpoint in the political conflict over immigration and the role of English in an increasingly multicultural nation. All of this public wrangling has afforded a great learning opportunity for university courses on language and culture — see, for instance, this QuickTime movie that Dennis Baron prepared for his classes at the University of Illinois, with clips of "Nuestro Himno" coverage from CNN, Fox, NBC, CBS, and even "The Colbert Report" in all its resplendent truthiness.

Political cartoonists have supplied another ready source of material, as many of them have seized on the anthem controversy to explore issues relating to linguistic and cultural assimilation, bilingualism, and anxieties over whether English is losing its hegemonic grip. Below are ten recent language-related cartoons, covering a broad political spectrum. (The views expressed are solely those of the cartoonists and do not necessarily reflect the sparkling exchange of ideas at Language Log Plaza.)

Glenn McCoy, 4/27/06:

Chan Lowe, 4/29/06:

Dana Summers, 5/1/06:

Dan Wasserman, 5/1/06:

Tony Auth, 5/2/06:

Wayne Stayskal, 5/2/06:

Lalo Alcaraz, 5/3/06:

Chip Bok, 5/3/06:

Dana Summers, 5/3/06:

Matt Davies, 5/3/06:

Posted by Benjamin Zimmer at 01:17 AM

May 05, 2006

Francesco spells trouble

I was intrigued by reports of the remarkable process used last week in the Italian parliament for choosing a new speaker of the Italian Senate. The NYT reported on the attempt to elect Franco Marini to the position as follows:

Mr. Marini failed to win the necessary absolute majority in the first secret vote on Friday morning by five ballots.

In the second vote, a preliminary count showed he had won by one ballot, but the outcome was contested by the center-right, which complained that Mr. Marini's first name was given as Francesco on two of the hand-written ballots, making them void.

After lengthy deliberations, the acting speaker, Oscar Luigi Scalfaro, annulled the vote and demanded it be taken again. But Mr. Marini once more fell just short, this time by one vote.

Misspellings appeared in all three votes, but in the country that gave the world Machiavelli, few people thought the errors were genuine. Rather they were seen as veiled warnings that support from some senators could not be taken for granted.

I'm not quite sure how Gricean theory would explain the implicature from misspelt ballots to weakness of political support. Presumably the Maxim of Manner comes into it. But the fact that the manner of expression departs from the norm does not in itself tell us what that departure signals, only that something is being signaled. The logic of the implicature seems to be: if I can't be trusted to spell the name of your favorite candidate correctly, I can't be relied on for anything. And this is given an extra level of piquancy by the anonymity of the ballot: everyone now knows that someone cannot be trusted by the new Prime Minister, Romano Prodi, who was pushing the Marini nomination. But we don't know who can't be trusted. And what of the senators who spelt the name correctly? Surely it is generous of someone who cannot be trusted to let you in on the fact? So the true backstabbers presumably reproduce the name of the enemy's candidate perfectly. But what if you were a genuine supporter of the new prime minister, yet knew of others who were treacherous? Might you then spoil your ballot to make it plain to all that at least some senators could not be trusted? Might Franco have written his own name FRANCESCO in order to spell TROUBLE?

Such is government in the land of Machiavelli, whose advice Prodi has no doubt heeded:

There is nothing more difficult to take in hand, more perilous to conduct, or more uncertain in its success, than to take the lead in the introduction of a new order of things.

Posted by David Beaver at 04:36 PM

Mitsubushi?

Passed on to me by Joe Clark, a piece from the 4/22/06 Toronto Star on Mitsubishi Canada, available here under the header "Lancer, Eclipse lead Mitsu return in Canada", in which it's claimed that many Canadians pronounce the brand name "Mit-soo-BOO-shee". To judge from the enormous number of "Mitsubushi" spellings I googled up on 4/24, it's not just Canadians; the spelling's used by English speakers all over the world.

Eventually, I checked out all sixteen ways of distributing the vowel letters I (representing [i] or [I]) and U (representing [u] or [U] or sometimes maybe a reduced vowel) across the four vowel positions in the frame M_TS_B_SH_. Of the 15 possible misspellings, all except two (U I U U and U U U U) are attested, but they come in four clusters. I U U I (Mitsubushi) is on top, along with two close contenders, then two more at about half or a third the frequency of the big guys, then a cluster of low-frequency spellings and one of very-low-frequency spellings (and then, of course, the two total losers). Even more eventually, I found a way to make some sense of these differences in frequency.

First, the Star piece, by Jim Kenzie, which begins:

   NEW YORK CITY - In a world where brand name seems to be so critical to success, Mitsubishi is fighting a multi-layered battle.

   First, in a word-association game, few Canadians recognize the name at all. Second, many who do, pronounce it Mit-soo-BOO-shee, for completely inexplicable reasons.

   Nobody calls it Toy-OO-ta, do they?

   Third, if consumers do know the name Mitsubishi, it is more likely to be for a TV or VCR than a car except for video game freaks, for whom the Mitsubishi rally car, the Lancer Evolution, has its own mythical status (and is, as yet, unavailable in Canada).

As I then reported to the ADS-L, googling on Mitsubushi pulled up a startlingly large number of raw webhits (on the order of a hundred thousand) -- from the U.S., Australia, the U.K., and India, just within the first ten hits. Then eventually a pile from Canada, South Africa, New Zealand, and Ireland. As far as I can tell, mostly with reference to cars, but sometimes with reference to electronic equipment. This is a stunningly common and widespread misspelling.

I then moved on to the other logically possible misspellings, first just using raw webhits, then going back and using the number of hits at the point where Google tells you that the rest of the hits are similar to the ones already displayed. The two counts gave essentially the same clusters and rankings, with only small differences between them. From here on, I'll use the reduced figures. And the results are...

Top dogs:
    1. I U U I (796), with U spread to position 3
    2. I I U I (767), with U moved to position 3, in exchange for I
    3. I I I I (614), with all Is

Moderate frequency:
    4. U U I I (322), with U in position 1
    5. I U I U (178), with U in position 4

Low frequency:
    6. U I I I (61)
    7. I I I U (41)
    8. U U U I (31)
    9. U I U I (29)
    10. I U U U (26)

Very low frequency:
    11. U U I U (6)
    12. I I U U (4)
    13. U I I U (3)

Total losers:
    14. U I U U (0)
    15. U U U U (0)

(One blogger reported, a propos of #9: "Then came Mitsubishi, which for some reason Israelis call Mutsibushi. and in 1990 came Honda, but only American made Hondas (from Ohio), and then the rest." I haven't pursued the claim that Israelis are given to variant #9.)

[Addendum: hint and you shall receive. In no time at all, Aviad Eliam e-mailed me with news on the Israeli-Japanese front: "Unlike the blogger, however, Mutsibushi didn't sound to me like the common error, and my Israeli friends concurred. They suggested IUUI and IIUI, two of the top three contenders in English. I went and did my own quicky google search in Hebrew" -- and found (in raw webhits) English #1 I U U I on top (2130), then #2 I I U I (948), then a drop to #8 U U U I (189) and #4 U U I I (152), another drop to #3 I I I I (37), #6 U I I I (28), and #9 U I U I, i.e. Mutsibushi (17), one each for #10 I U U U and #5 I U I U, and no hits for the rest. Roughly comparable, especially at the top, with the English data. And, though Israelis occasionally do write (in Hebrew, of course) "Mutsibushi", they don't do it at all often.]

At this point, I invite you, the reader, to pause and speculate about what might be going on here. Why these clusters?

While you're thinking, I'll argue that the problem isn't nativization of foreign words. To begin with, hardly any of the writers on the web will actually have heard "Mitsubishi" as pronounced by a speaker of Japanese. What English speakers have to go on is the spelling of the word, the knowledge that it's a Japanese name, and productions of the word by other English speakers. Even just given the first two, the word is perfectly easy to nativize into English, with the accent pattern 2 0 1 0 (where 1 is primary accent, 2 is a weaker accent, and 0 is unaccented), with the T and the S of the English spelling split between the two first syllables, and with "furrin spelling" values for the vowels (a high front (unrounded) vowel for I, a high back (rounded) vowel, or possibly a reduced vowel in position 2, for U).

Instead, my hypothesis is that the word presents a difficulty for memory and recall. It has all Is, except for one syllable, and that syllable is the least salient one in the word (accented syllables are the most prominent -- here, the third syllable has the primary accent, and the first syllable a weaker accent -- and the first and last syllables of a word are, in general, also salient, but the second syllable has no kind of salience). So: how to reconcile the specialness of the U in the second syllable with the lack of salience of this syllable?

One way involves remembering that the second syllable has U in it, but spreading that vowel onto the immediately following, accented, syllable, where it can stand out: I U U I (with the Us in the middle and the Is at the edges, in a nice pattern) instead of I U I I. This is #1, Mitsubushi.

Another way is to move the U into the accented syllable: I I U I, #2. This is a bit less faithful to the original I U I I than #1 is -- it differs from the original in two positions rather than only one -- but from a psychological point of view, it's a very likely error, since it involves recalling (correctly) that there's only one U, which is, however, misplaced by appearing in the most prominent position.

Still another way is just to forget about the U, and use all Is: I I I I, #3.

Put another way: with #1 you remember that U is important to the spelling, but get more than one; with #2 you remember that there's exactly one U, but get it in the wrong place; and with #3 you don't remember the U, because it was off in a corner in the first place.

These three solutions to the U problem are somewhat parallel to the misspellings that result for words that have one doubled and one single consonant letter in adjacent (as in corollary) or parallel (as in [Amy] Gutmann) positions. One solution is to remember that there's doubling and go all the way with it: corrollary, Guttmann. Another solution is to remember that there's exactly one doubled consonant letter, but misplace it: corrolary, Guttman. (For the name of the president of the University of Pennsylvania, I'm especially prone to this error; unfortunately, I've had occasion to cite her work, and I don't always get it right.) Still another solution is to forget the doubling: corolary, Gutman.

So the top dogs make a lot of sense.

The moderate-frequency errors preserve U I in the middle of the word, but use a U in one or the other of the two secondarily salient syllables: the first syllable, in U U I I (which has the Us first, then the Is), #4; or the last syllable, in I U I U (which has a pleasing alternating pattern), #5. It even makes sense that #4 should lead #5, since the first syllable is doubly salient (by virtue of position and also accent), while the final syllable is salient only by position.

The third cluster mostly has patterns with only two of the four model vowel slots preserved, plus one pattern, #9 (U I U I), with only one preserved (though it has a pleasing alternating pattern). The other patterns at the bottom of this cluster -- #8 (U U U I) and #10 (I U U U) -- have three Us in them, which takes them very far from the model.

The fourth cluster has one pattern with two of the four model vowel slots preserved, but three Us, and two patterns with only one model vowel slot preserved.

Finally, you get the pattern that preserves NONE of the original vowels (U I U U), and the pattern that has no Is at all (U U U U). These are desperately far from the model, and, fittingly, they get no (legitimate) webhits.

Overall, we see relatively frequent errors that make sense in terms of what people are likely to remember about words and use in recall, followed by much less frequent errors that mostly look like random stabs, but nevertheless are ranked roughly in the order of their distance from the model, in terms of number of vowel slots preserved and number of Is.
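The informal notion of "distance from the model" can be made explicit. The following is a minimal sketch, not the procedure actually used for the counts above: it enumerates all sixteen fillings of the frame M_TS_B_SH_ and, for each of the fifteen misspellings, counts how many of the four vowel slots match the model I U I I and how many Is it contains. The names MODEL, spell, and slots_preserved are hypothetical.

```python
from itertools import product

MODEL = ("I", "U", "I", "I")  # vowel letters of the correct spelling, Mitsubishi


def spell(vowels):
    """Fill the frame M_TS_B_SH_ with four vowel letters."""
    v1, v2, v3, v4 = vowels
    return f"M{v1}TS{v2}B{v3}SH{v4}".capitalize()


def slots_preserved(vowels):
    """Count the vowel slots that agree with the model I U I I."""
    return sum(a == b for a, b in zip(vowels, MODEL))


# All 16 ways of putting I or U in four slots, minus the correct spelling.
misspellings = [p for p in product("IU", repeat=4) if p != MODEL]

# Closest to the model first: more slots preserved, then more Is.
ranked = sorted(
    misspellings,
    key=lambda p: (-slots_preserved(p), -p.count("I")),
)

for p in ranked:
    print(" ".join(p), spell(p), slots_preserved(p), p.count("I"))
```

Sorting on these two numbers alone puts the four one-letter departures (I I I I, I U U I, U U I I, I U I U) at the top and U I U U at the bottom, but it cannot separate #2 (I I U I) from the low-frequency patterns that also preserve two slots. That gap is where the appeal to salience and to remembering "exactly one U" does its work.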

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 04:18 PM

The Virtues of Spam

Like most people, I hate spam, but I have to say that it is sometimes entertaining. Here are some of the names from which messages that I received today putatively came:

Toboganning T. Nucleuses
Tragedians S. Ostentation
Amnesiac J. Fewer
Possessive I. Stencils
Ibis C. Airways
Diplomacy C. Scotch
Originality H. Wronged

I particularly like the linguistic ones:

Hausa H. Beatifies
Epiglottises C. Rhapsodic

Without the middle initials, these would make good names for linguists' bands.

Posted by Bill Poser at 03:08 PM

Whorf in a bottle

Courtesy of Grant Barrett (who in turn credits fellow lexicographer Erin McKean), here's a naively Whorfian advertisement spotted along Fifth Avenue in New York.

(For more variations on the "No Word for X" trope, see the list of posts linked here.)

[Update: As is so often the case with this trope, the intent is to exoticize a faraway "primitive" land, in this case linking the "Fiji" brand of bottled water with an Edenically pristine image of an island nation unfamiliar both linguistically and culturally with "pollutants." Sadly, as Mark Liberman points out, the reality is quite different, as a Google search on {Fiji pollutant} readily demonstrates. A sampling:

The population of Suva, capital of Fiji, is about 187,000. Three kilometres west of Suva, adjacent to the sea, is a six-hectare landfill site called "Lami Dump". The site was mangrove until 40 years ago. Today, about 60,000 tones of waste from factories, shops and households are brought in annually (figures from Fiji 1 website). The waste matter has been piled up several meters high. The land lease contract with the owner has expired, and it will soon be impossible to add further waste to the site. There are no pollution prevention measures in place. An offensive odour spreads several hundred metres to surrounding homes and businesses. The site hosts flies, cockroaches and rats, which spread endemic diseases like bacillary dysentery. In addition, high concentrations of toxic substances like arsenic, cadmium, mercury and lead, and persistent organic pollutants (POPs) such as organic acids, are leaching into the soil and seawater around the site, resulting in degradation of marine water quality, destruction of habitat, loss of property values and reduced fish marketability, amongst others (UNEP Report number 174, 2000). (link)

The Kinoya wastewater treatment plant in Suva, Fiji, caters for a population of 85,000. Incoming BOD and suspended solids (SS) are approximately 450mg/L and 290mg/L with final effluent at 20-45mg/L and 30-60mg/L respectively. (link)

Fiji is fortunate in having considerable resources, timber, land, marine products, and some minerals. But in recent years their exploitation has not been sustainable, in fact they have been 'mined' for quick economic return without effective environmental and social considerations and regard for the future. This is common in marine products, forests and agriculture. The poor logging and forest management is 'mining' the timber and water catchment's resources, the ginger industry is 'mining' the soil resources, so as the sugar industry on the marginal coastal lands. Over the years, many development assistance agencies have encouraged this shortsighted approach through their focus on narrow economic targets. ...
Fiji's population growth rate is moderate, but the urban and peri-urban growth rate is high, and is clearly outstripping infra-structural planning and development. Thus it is primarily responsible for the important social issues of environmental concern, such as housing, water and sanitation. Direct regulating control of water or air pollution and monitoring are absent. ...
[Sugar] is the second largest industry in Fiji, generating approximately $300 million in revenue, through the production of 470,000 tonnes of sugar. Approximately 700km2 of land is used in growing sugar cane in Fiji. The main environmental hazards of cane fields are the 'inputs' use, fertilizer weedicides and pesticides. There are no regulations to specifically control how they are used on farms and other work places. ...
Saw milling is becoming big in Fiji, especially when there is a big demand for pine chips in Japan and pine timber for local and overseas market. There are about 20 sawmills with timber treatment facilities. 70% of the logs are chipped for sale to Japan and the remaining 30% converted to timber. In terms of solid waste, these sawmills produce about 14,000 tons of wood waste a year, which accumulates in rubbish tips that are continuously smoking from spontaneous fire. Some chemical wastes containing copper, chromium and arsenic are generated by treatment plants. ...
The results show a significant high number of contaminations ... It is of great concern that in all Divisions, some treated supplies were also contaminated. (link) ]

Posted by Benjamin Zimmer at 12:45 PM

Sentences found abandoned after joyride

Nick Montfort of Grand Text Auto has posted an essay consisting of one original question -- at least, I haven't been able to find the same words in the same order elsewhere on the web -- followed by seven sentences each of which was lifted verbatim from a different online source.

For your interpretive convenience, I've reproduced the essay below with the associated links made explicit:

Have you been following this whole Harvard student book plagiarism story?
(link) I’m sorry that this young girl, pushed by the needs of a publishing machine and, no doubt, by her own ambition, should have fallen into this trap so early in her career. [Salman Rushdie]
(link) But the fact is that although she has been manipulated and packaged, what has happened to her has very largely been her own fault. [Geoff Pullum]
(link) The thing is, it's not conventions of character and plot that Viswanathan is accused of copying, it's whole sentences of text. [Mark Liberman]
(link) But published scholarly literature is full of examples of writers using the texts, words and ideas of others to serve their own immediate purposes. [Russell Hunt]
(link) I believe it is also necessary at the outset to demonstrate how plundergraphia is distinct from plagiarism and reference, and shares little more than intention with found poetry. [Jason Christie]
(link) I think it's fair to say that most of us spend hours each day shifting content into different containers. [Kenneth Goldsmith]
(link) There is no exercise of the intellect which is not, in the final analysis, useless. [Jorge Luis Borges]

As we expect from the author of Twisty Little Passages, this collage can be read in several ways. The knotty little rhetorical twists around sentences five, six, and seven make it hard to digest the post in its literal form. I prefer to read it as a sort of implicit edited volume, perhaps the seed of a sequel to The New Media Reader, mediated by successive Google searches rather than the succession of turning pages. But this presumes that Nick chose the sentences and their sources by hand, and I suspect that he's used an algorithm that will automatically compose random Frankenessays given some suggested words and sources. Or does that matter?

In any case, it's a surprise to discover that I've co-authored a blog post with Jorge Luis Borges, not quite 20 years after his death.

[Update: I was mistaken in accusing Nick of having invented the first sentence of his essay out of whole cloth -- it too was in fact taken from a web-accessible source, namely this post by TOS at the blog Chasing the Wanderlust...

Google simply had not indexed this post at the time that I searched.

I've also changed the title of this post to the Subject line of Nick's email to me, which began "Ouch -- I hope I didn't accidentally make that first sentence up."]

Posted by Mark Liberman at 12:08 AM

May 04, 2006

And Putin makes three

No, we're not going to change our name to "Plagiarism Log". But I guess I have to add a brief reference to the third case of high-profile plagiarism that surfaced during the past couple of months. As WikiNews explained on March 29:

Researchers at the Brookings Institution, a non-profit think tank located in Washington D.C., have recently accused Russian President Vladimir Putin of making improper use of almost 16 pages and six figures from a 1982 translation of a 1978 textbook in his economics dissertation. Strategic Planning and Policy, written by University of Pittsburgh professors William R. King and David I. Cleland, was published in 1979.

The article adds the speculation that Putin may be guilty of hiring a ghostwriter and thus innocent of direct copying himself:

Many Russian government officials at the time would pay a ghostwriter for such publications, as gaining a degree could add legitimacy to one's governing policy. In that sense, it is possible Putin bought or paid for the dissertation and did not read over - in essence, passing plagiarised work off as his own, but not himself committing plagiarism. Gaddy states: "It's very clear he never wrote the thing in the first case, this is a clear diploma-mill-type operation. This is a dissertation, paid for, made-to-order."

I suspect that this sort of thing has always been commoner than is generally realized, both in the case of adult "students" with more money than time or patience, and also in the case of much younger students who make informal in-kind deals of one sort or another. These days, the internet is no doubt globalizing the trade in schoolwork through outsourcing of essays and term papers as well as more direct copying.

[Hat tip to Jim Gordon]

[Eric Bakovic reminds me that no discussion of Russian plagiarism would be complete without reference to Tom Lehrer's classic Lobachevsky:

Who made me the genius I am today,
The mathematician that others all quote?
Who's the professor that made me that way,
The greatest that ever got chalk on his coat?

One man deserves the credit,
One man deserves the blame,
and Nicolai Ivanovich Lobachevsky is his name. Oy!
Nicolai Ivanovich Lobache...

I am never forget the day I first meet the great Lobachevsky.
In one word he told me secret of success in mathematics: Plagiarize!

Plagiarize, Let no one else's work evade your eyes,
Remember why the good Lord made your eyes,
So don't shade your eyes,
But plagiarize, plagiarize, plagiarize...
Only be sure always to call it please, "research".

]

Posted by Mark Liberman at 09:48 AM

They looked on one-another & became what they beheld

When I saw the right-hand picture below, I asked myself "why are they running a picture of Dan Rather in a story about George W. Bush?" And then I realized who it actually was.

This is the exception that proves the rule: a Language Log post with no linguistic point. So I'll prolong the moment of irrelevancy by observing that this is funny. And so is this.

This one has a linguistic point after all, though I'm not sure it's strictly accurate.

Posted by Mark Liberman at 06:08 AM

Unwritten rules and uncreated consciences

Some of those who sympathize with Kaavya Viswanathan complain that it takes a lot of sophistication to locate the ethical boundary among the finely-differentiated strata of hypocrisy in this moral canyon. We tell kids that plagiarism is wrong, but we tell them that a lot of things are wrong that they see successful and respected people doing every day. How are they supposed to know that plagiarism is a sin that our culture almost always takes seriously, like murder, rather than one it usually doesn't take seriously, like extramarital sex? In fact, anyone who pays attention knows that in some cases, like political speeches and celebrity memoirs, it's normal and expected to pass the work of others off as your own.

Here's an example that cuts close to the academic bone. A couple of decades ago, X was a graduate student at Y University, a school that regularly appears in U.S. News and World Report's listing of the top 50 American universities, and not at the bottom of the list either. The school's president, Dr. Z, had a nationally syndicated column. It ran under his byline, but X helped pay her way through school by writing it. I don't mean that she edited it, or did research for it, or drafted it. She came up with the ideas, did whatever research was required, and wrote it exactly as it ran. Dr. Z approved it for publication, or at least was given the opportunity to do so, but he never changed anything. (Or so X told me, and I believe her.)

I'm sure that Y University had a policy against plagiarism, like all similar institutions. It no doubt defined plagiarism in the usual way, as "the act of using the ideas or work of another person or persons as if they were one's own, without giving proper credit to the source" or something of the sort. This definition obviously applies to hiring someone to research and write your papers for you, just as much as it applies to copying passages from a book or cutting and pasting from an online source. (And writing-for-hire is hard to detect unless the hireling squeals. I've heard of one case that was uncovered because the hireling plagiarized a term paper from online sources, and when the copying was detected by the usual means, the accused student tried to absolve herself on the grounds that the guilty party was really the person that she had hired. "I hope you throw the book at the lousy cheater", she is apocryphally supposed to have exclaimed.) In any event, if Ms. X had been caught hiring someone to write her graduate-school term papers for her, she would surely have been unceremoniously dropped from the program.

In politics, on the other hand, hiring a ghostwriter is the normal thing -- though using someone else's writing without either a contractual arrangement or an acknowledgment (one or the other) is a big problem, as Joe Biden discovered in 1987. I believe that Dr. Z must have thought of what he did as being similar to political speechwriting for hire, and not the same at all as academic cheating. He certainly took no pains to hide his arrangement with Ms. X. Everyone in his office knew about it, and there was a string of other ghosts before Ms. X and after her, who mostly knew one another. It wouldn't surprise me to learn that there were open auditions for the job. Dr. Z is the only case of a university president hiring a ghostwriter that I happen to have heard about, but I imagine that there are others.

In this same context, it's perhaps also not surprising that William Swanson so shamelessly copied other people's writing in compiling his "unwritten rules". I imagine that Raytheon has routinely paid someone to write Swanson's after-dinner speeches, his messages in the company's annual reports, and so on. It's been suggested that a ghostwriter might well have been hired to come up with the "unwritten rules". As Bill Poser has pointed out, this would be a form of poetic justice.

Posted by Mark Liberman at 05:48 AM

On the Internet Nobody Knows You are a Space Alien Lizard

(Guest post by Paul Postal)

David Beaver (DB) in his recent post about the linguistic abilities or lack thereof of animals cites the assertion by Noam Chomsky (NC) in (1):

(1) [...] if someone could show that other animals had the basic property of human language, it would be of very little interest to the biology of language, but would be a puzzle for general biology.

DB disagrees with this claim and ends up stating:

(2) Where I disagree with him is in the general principle he invokes, which seems to imply that even animals producing and comprehending grammatically correct English would be of no consequence for linguistics. Such a conclusion would be ludicrous.

Now, while I have rarely over the last decades defended any claim of NC's and have quarreled with many, the substance of (1) seems to me at least potentially sound, at least up to the but clause, which I ignore. Moreover, one notes that in concluding as in (2), DB has a bit changed the terms of reference without warning. NC's claim was about the biology of language and DB's is about linguistics. Of course, some, including NC, largely identify these two, but such an identification has never been justified and is, I would argue, nothing but a category mistake, confusing inter alia language and knowledge of language.

Here then is how I see (1) and (2). (1) might be sound for uninteresting reasons. If one finds an animal with the same linguistic competence as humans, one with NC's point of view would be free to ‘account’ for that in the same way he proposes to ‘account’ for human competence. Namely, he could simply say that the innate linguistic organ he has posited for humans, whose nature he has never specified in any biological terms, and which has no known physical properties, *** is present in the relevant animal as well. If that account ultimately succeeds for humans, there is no reason why it would necessarily fail for the animal in question. Under these assumptions, the animal simply provides another subject, no different in linguistic essence from a human, and that would yield a situation which is genuinely linguistically uninteresting, whatever thrill it might provide zoologists. Of course, if NC's innateness views are wrong, then finding a linguistically competent animal might show something, e.g. that the general skill needed to learn a language is available more broadly than in humans.

I don't care much about (1) because I do not believe linguistics, or more importantly, its subject matter, language, have anything much to do with biology. But (2) which mentions linguistics is another matter and I consider it deeply wrong. Two thought experiments can show why the issue of nonhuman linguistic competence is essentially irrelevant to the understanding of natural language.

First, let us briefly try to give a sense of a genuine linguistic issue, call it ISS. The point will be that ISS is such that no discovery about animal linguistic competence could bear on ISS in any way different than human linguistic competence does. Take ISS to be an issue, which exists regardless of the specific theoretical assumptions cited to illustrate it.

Seuren (1985) cited (3):

(3) John and every woman in the village want to get married.

About this, Seuren (1985: 22-23) claimed:

(4) In (10b) every woman cannot take scope over the whole remainder of the sentence; as a consequence it cannot mean that for every woman in the village John and that woman want to get married to each other. Its only possible reading is the one in which John as well as every woman in the village want to get married. This is explained in principle by the theory that quantifying into a co-ordinate structure is ruled out by the Co-ordinate Structure Constraint (Ross, 1967).

So take ISS to be the issue of what principles of language determine that the universal quantifier phrase every woman in the village in (3) has the scope that it does. According to Seuren, these principles included the coordinate structure constraint of Ross (1967) which in Seuren's framework or that of May (1985) would preclude a required quantifier lowering into/quantifier raising out of the coordinate phrase. This claim, even if right, leaves things mysterious however, since it is then not obvious how every woman in the village can have any actual scope at all. For its scope must certainly include material external to the coordinated subject, the predicate complex want to get married.

OK, we are not going to solve ISS today, since our interest is in nonhuman language possibilities and actualities. Turn then to the two thought experiments. One harks back to the bad 1983 science fiction television series V. In this, large evil lizard-like space aliens try to take over the Earth. They don’t look like lizards though because of a fake outer coating which gives them human appearance. Here is the thought experiment. Suppose that one of the regular Language Log posters, say Geoff Pullum, is in fact a creature just like the V series space lizards (possibly not a new idea). One then must accept that nonhumans know and can use English just like real humans (assuming there are any). And that tells us what about ISS? Evidently, nothing. Just how could Geoff Pullum's being a space alien lizard instead of an Earth mammal offer any insight into ISS? Nor would it matter if he were a raccoon, bluebird or triceratops in human form.

The second thought experiment appeals to the known phrase "On the Internet nobody knows you are a dog", which comes from a New Yorker cartoon found on page 61 of the July 5, 1993 issue (available at http://www.jeffsandquist.com/OnTheInternetNobodyKnowsYouAreADog.aspx). The cartoon shows a dog sitting at a computer terminal with another dog in attendance and the phrase is the caption.

So simply assume the cartoon is realistic...suppose that all the messages on some website, say Daily Kos, have in fact been written by canines. That might have political implications, but as far as ISS is concerned, it means nothing. One learns and can learn in principle nothing more about ISS by the discovery that there are English-knowing dogs than by the discovery that there are English-knowing space alien lizards. It just doesn't matter.

There is one case of putative linguistic ability in nonhuman animals not covered by the thought experiments. Suppose one finds an animal who is shown somehow to know some variety of some hitherto unknown natural language. That would be of more interest, exactly as much as finding a hitherto unknown language known by some humans. Maybe this could contribute to linguistics, but if so, it won't be because the language is known by a nonhuman, but because the language itself can teach us something.

However, let's face it, with thousands of known languages available for study, we haven't really been able to understand that much. Why would just finding one more be likely to change things, regardless of whether that language is known by space alien lizards, canines or simply a group of people not previously contacted or known?

For what linguistics lacks is not languages to study but insight into them. And that won't be provided in any clear way by further linguistic creatures, regardless of whether they are people, gorillas or roaches.

Underlying these remarks is the view that language is entirely distinct from biology, just as mathematics, set theory and logic are. If we discover a crocodile with the same knowledge of mathematics as the best human mathematician, it won't inherently help determine whether Goldbach's Conjecture is true, and a set theoretical expert gerbil will not thereby provide any insight into the truth of the Continuum Hypothesis.

***It is an odd organ, to say the least, that has no specifiable physical properties. But worse, NC's assumptions do not permit the hypothetical organ to have physical properties, since he claims that a human language is a state of the innate organ and is discretely infinite. Nothing infinite can be a physical organ or a state of such, on the naturalist assumption, which NC of course makes, that human bodies are physical. Hence the claim that the posited organ has something to do with biology is not serious. Real organs, e.g. livers, are all too finite. Moreover, real organs can't produce an infinite number of, or amount of, anything, e.g. bile. The bottom line is that NC's position that natural language is both to be taken as an organ state and as discretely infinite is simply incoherent.

- Paul Postal, NYU Department of Linguistics

[Postscript April 4, 06

DB (email sent in reaction to the post above):
I should make clear that the reason I think animal results could in principle have anything to do with linguistics is not because they *ought* to have anything to do with linguistics. It's because the lamentable history of our field is choc-a-bloc with people making unwarranted claims about language organs, their distinctive functional ability, their biological uniqueness in humanity etc. This is why I mockingly suggested at the end of my previous post that the new data suggests birds have a language organ but we don't.

It is because ridiculous and un-evidenced claims about biology are strewn about the field like acne on a friend's face that real genuine evidence about what animals can or cannot do *is* relevant to our field. To deny this is to be an idealist, to pretend that linguistics is unsullied by all the weird things that linguists say. But it's nice to be an idealist.

PP (return email):
We are essentially in total agreement about unwarranted biological claims in linguistics....so if all that is involved in citing animal results is debunking the relevant posturing, I am all for it.

My view is though that at a deeper level, biology has to be irrelevant, for the same reason it is irrelevant to e.g. Goldbach's Conjecture.

I doubt if anyone will care about the post....some dogmas are too deeply embedded in widespread thought to be confrontable with fact or argument. But one will see.]

Posted by David Beaver at 01:40 AM

No Room For Polyglots

I'm filling out the Canadian census form. Question 13 asks about one's ability to speak English and French. Question 14 is:

What language(s), other than English or French, can this person speak well enough to conduct a conversation?

You can answer "none", or you can fill in the names of the languages, but only two spaces are provided. How is one supposed to choose? And why the limit?

Posted by Bill Poser at 12:00 AM

May 03, 2006

Corporate Plagiarism

As Mark has pointed out, Raytheon CEO William H. Swanson has been found to have plagiarized most of the book Swanson's Unwritten Rules of Management that made him a management guru. According to the New York Times, of the book's 33 rules, 17 are taken, often word-for-word, from the 1944 book The Unwritten Laws of Engineering by W. J. King, an engineering professor at UCLA. Another four come from a collection of maxims of Defense Secretary Donald H. Rumsfeld published in the Wall Street Journal, and still another appears to have been lifted from humor columnist Dave Barry's book Dave Barry Turns 50. All told, 22 of the 33 were plagiarized. That's two-thirds of the book.

Raytheon director Warren B. Rudman is quoted as saying:

...the board decided, and I think properly, that there is a great difference between an unintentional error, in which you have simple negligence, and an intentional act that breaches sound ethical conduct...Based on the evidence, we decided that this was unintentional and not negligent.

According to the Times:

Raytheon issued a statement on Mr. Swanson's behalf that said the source for his book came from material he had collected over the years and had given to a member of his staff to prepare a presentation that he was to give to Raytheon engineers.

This seems to me to be even more damning. If someone takes notes on reading and later uses that material without attributing it, it is indeed possible that what he has done is unintentional. The notes may have been sketchy or hard to read. This is apparently what happened in several recent cases involving scholars who used research assistants to assemble material for their books. It's unfortunate, but there is arguably no intentional misrepresentation. The explanation given by Raytheon is the opposite of this: Swanson assembled the source material and an assistant used it to prepare a presentation that, apparently, turned into the book. If we don't assume that the presentation became the book, Raytheon's explanation explains nothing.

What is left unsaid but seems to be the only possible inference from the Raytheon statement is that Swanson didn't merely engage in plagiarism: he didn't write the book at all. His assistant wrote it. According to this blog post, the book is only 40 pages long, so going from a presentation to the book was not a huge leap. If he forgot where his rules came from, he may have been merely negligent rather than an intentional plagiarist, but he must have known that he was passing off a book that he didn't really write as his own. Surely this is far worse than anything Kaavya Viswanathan may have done.

Posted by Bill Poser at 11:49 PM

Sprinkled under the radar

Back in March, Joel Wallenberg e-mailed me a stunning antedating of the GoToGo construction -- as in "She's going to San Francisco and talk on firewalls" 'She's going to go to San Francisco and talk on firewalls', with the go of prospective be going to and motional go (in the GoAndVP construction) telescoped into a single word going -- from the 1920s back to 1864. Chris Waigl then supplied examples from early in the 20th century. So it looks like GoToGo has been around for quite some time, but without attracting attention or eliciting comment, until Charles Hockett noted "the recent colloquial pattern I'm going home and eat" in his 1958 textbook (p. 428). It lasted a century or so completely under the radar, and (so far as I can tell) got only this one blip until close to the end of the 20th century, when David Denison (and later Laura Staum and I) began studying it. Nobody even complains about it. How could the construction maintain itself over such a long period of time without being noticed?

Let's start with the Wallenberg e-mail of 3/16/06, in which he quoted from an August 7, 1864 column Mark Twain wrote for the newspaper the San Francisco Morning Call:

This was a touching allusion to his repeated assertions, made at divers and sundry times during the past few years, that he was going off immediately and commit suicide.

And then the Waigl examples, all from quoted (but fictional) conversation:

1904, The Outdoor Girls at Rainbow Lake by Laura Lee Hope: "That's so," admitted Grace. "And Mollie didn't guess right. I beg your pardon, Mollie. It's so warm, and the prickly heat bothers me so that I can hardly think of anything but that I'm going in and get some talcum powder. I've got some of the loveliest scent--the Yamma-yamma flower from Japan."

1907, Two Boys and a Fortune by Matthew White, Jr.: "What, you're not going off and leave Harrington, are you?" asked Atkins.

1909, THE GOLD HUNTERS. A Story of Life and Adventure in the Hudson Bay Wilds by James Oliver Curwood: "Wabi, I'm going back," he cried softly, forging alongside his companion. "I'm going back and follow the other trail. If I don't find anything in a mile or so I'll return on the double-quick and overtake you!"

c1913, The Boy Scout Camera Club. The Confession of a Photograph by G. Harvey Ralphson: "I'm going right down stairs and pack my camera!" Jack Bosworth, of the Black Bear Patrol, declared.

Then there is one in Sherwood Anderson's Winesburg, Ohio (1919) and one in F. Scott Fitzgerald's The Great Gatsby (1925) and quite an assortment (collected by Denison) in movie dialogue. If you look hard enough, you can find attestations all through the 20th century. And if you ask people for judgments (in a careful way, as Staum did) you'll find a significant minority, maybe as many as 20%, who have little or no trouble with GoToGo sentences. (I am in this minority, and was stunned to discover, five years ago, that most other people judge GoToGo sentences to be flatly unacceptable. Most, but not all, of my colleagues in linguistics at Stanford. My own daughter, even! Oh, sharper than a serpent's tooth!)

So you can get unacceptability judgments when you explicitly ask people to judge sentences. And every so often a linguist, or someone else unusually attentive to the details of language, notices that there's something remarkable about GoToGo sentences: they coordinate a finite VP with a non-finite (base-form) VP. Otherwise, GoToGo sentences escape notice. Why?

First, GoToGo sentences are rare, for two reasons. One reason is that most speakers don't produce them at all. The other reason is that, even if you're a GoToGo speaker, there just aren't that many occasions when you want to put together all the parts of a GoToGo clause: a clause making a future assertion with be going to (rather than will), about motion, with the specific motion verb go and with an expressed goal for that motion, and with the motion seen as the first part of a two-part event, the second part of which is also explicitly expressed (by a coordinated VP). You could go for weeks or months without wanting to express such a thought in such a way.

Now, suppose you're NOT a GoToGo speaker. Every so often -- maybe every few months, or even more rarely -- an example will come up. The intentions of the speaker or writer will be clear, and the sentence will be quite close to what you might have produced yourself (with be going to or will plus motional go in the GoAndVP construction). You probably will unconsciously take the example to be a minor inadvertent error and silently "fix" it in processing, the way people do with most speech errors. So you won't notice anything. (There's an alternative response, which I'll take up in a moment.)

From the other side, GoToGo speakers won't notice that you don't use the construction, because they understand the alternatives you do use, and because people don't notice small gaps -- or, sometimes, large gaps -- in other people's productions. (If you're a need/want+Ved speaker -- "The garden needs watered" -- you can go for decades without realizing that lots of other people don't use this construction, ever.)

The result is that the construction can stay under almost everybody's radar, for any amount of time.

But how is it maintained from generation to generation? Maintenance depends on the occasional person's having a response other than tacit correction to an occurrence of GoToGo. It is, of course, possible for people -- young children, in particular -- to take this occurrence as evidence that the language has a construction (with non-parallel coordination) that they hadn't come across before; this way, the stock of GoToGo speakers can be constantly replenished, though it will stay small.

(It's also possible for the construction to be created afresh, by some kind of analogy, as Hockett suggested, or by telescoping, as I suggested. If the 40-year gap between 1864 and 1904 isn't eventually filled by the kind of attestations we see from 1904 on, then we'll have to assume at least two independent innovations. But others could have occurred. There's really no way to tell.)

Since occurrences of the construction are so rare, we can expect that its spread to new speakers will be close to random, with few if any associations with social groups or contexts; normally, there just won't be enough examples for a learner to posit any sort of pattern in who uses the construction and on what occasions. The construction will be (very lightly) sprinkled across the social landscape. As far as I know, this is the case.

In general, very low-frequency constructions can be expected to show social sprinkling. It's not inevitable -- there could be a kind of in-group fashion for a particular construction -- but it's very likely.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 06:32 PM

Busting he(a)ds at the Express-News

We've complained in the past about inane punning headlines in newspapers and magazines, but now one editor is doing something about it. Via Nicole Stockdale's A Capital Idea comes news that Robert Rivard, editor of the San Antonio Express-News, has instituted a ban on puns in his paper's headlines. According to a piece by Express-News public editor Bob Richter, the decision came after Rivard flipped through the April 20 edition and found no less than nine puns in the headlines (or "heds," as they're known to journalists). Rivard then sent a stern email to his news editors, making it clear that the pun ban is no joke. Richter quotes from the email as follows:

"It's a shame to see the good work of so many disparaged because of the immaturity of a few headline writers who seem more focused on peer approval than on producing a quality newspaper for the community. ...
"I am prepared to take disciplinary action against our most senior headline writers and editors if my order is not respected. I do not want to be the editor of a newspaper where we limit the creative use of language ... I want even less to be the editor of a newspaper riddled with puns."

Richter's article clarifies that reporters aren't the ones responsible for the ~~pundemic~~ outbreak of paronomasia in the newsroom:

A reporter whose byline appears on a story does NOT write the headline. Headlines are written by copy editors, who battle deadlines to clean up or rewrite the reporter's copy, massage it, then crown it with a spiffy headline.
Generally, copy editors have small egos. Many are former reporters who prefer the relative solitude of the newsroom late at night to the limelight of chasing news. Their goal is to, anonymously, make a story better.
Copy editors are underappreciated, taken for granted and typically go unnoticed unless they "bust a hed," miss a factual or grammatical mistake, or — worse — insert an error in their editing. Now, puns are a no-no for them as well.

I dislike lazy puns in headlines as much as the next reader, but this sort of blanket prohibition is clearly over the top. As Nicole Stockdale writes, "There are occasional strokes of brilliance where good word play perfectly fits the tone of the story, where it adds nuance that a straight hed wouldn't." The Editoress at Words At Work agrees, and she further points out that not all of the offending headlines are puns in the narrow technical sense. I've arranged the five heds cited by Richter in decreasing order of punniness, understood in the traditional sense of paronomasia (a playful use of words that sound alike but differ in meaning):

"Old well ends well: River Walk threat wiped out"

"Bell's name doesn't have a familiar ring for many voters"

"(Pope) Benedict names a flock of new cardinals"

"Mumps outbreak swells"

"Border violence killing tourism"

"Old well ends well" is clearly a pun (and the most creative of the five headlines), as it plays on the saying "All's well that ends well" via the phonetic similarity of old and all, along with the polysemy of well. The second hed relies only on polysemy, combining the surname Bell with the idiomatic sense of a name "ringing a bell" or "having a ring to it." The third example playfully suggests the avian sense of cardinals by using the collective noun flock. And the last two examples don't rely on puns so much as figurative extensions of verbs to match the subjects they predicate. An outbreak of mumps swells (figuratively) just as the disease itself causes (literal) swelling, and border violence kills tourism (figuratively) just as the violence itself causes (literal) killing.

Though these headlines might not all be puns per se, they do suggest a certain light-heartedness that may well be inappropriate for the topics at hand. The last example strikes a particularly incongruous tone for such a serious matter — so much so that I wonder if the use of the word killing was intended only in its figurative sense, perhaps inserted by an overworked copy editor not mindful of the tactless suggestion of literal killing. Would the headline writer face disciplinary action from Rivard even if the wordplay was inadvertent? I sympathize with the Express-News copy editors who will now have wrathful hed-hunters looking over their shoulders.

(See A Capital Idea for links to further discussion about when punning headlines work and when they don't.)

Posted by Benjamin Zimmer at 05:26 PM

Bush saved by his own bi-ignorance?

The "Nuestro Himno" imbroglio continued today, with the Washington Post reporting on a tidbit that would seem to undercut President Bush's stated opposition to "The Star-Spangled Banner" being sung in a Spanish translation. As first mentioned on the ThinkProgress blog, Kevin Phillips' book American Dynasty contains this passage about Bush's 2000 presidential campaign:

When visiting cities like Chicago, Milwaukee, or Philadelphia, in pivotal states, he would drop in at Hispanic festivals and parties, sometimes joining in singing "The Star-Spangled Banner" in Spanish, sometimes partying with a "Viva Bush" mariachi band flown in from Texas.

(ThinkProgress also initially asserted that Jon Secada sang the anthem in Spanish at Bush's first Inaugural, but it turns out that was based on faulty reporting from the Cox News Service: Secada actually sang "America the Beautiful" in Spanish.)

At a press briefing, Scott McClellan said he couldn't recall whether Bush ever sang a Spanish version of the anthem on the campaign trail, and the Post cites other unnamed sources skeptical of Phillips' account:

White House spokesmen and former campaign operatives said they could not recall whether that happened, though given the level of Bush's Spanish proficiency, they seemed dubious.

So Bush's ineptitude in Spanish might rescue him from a charge of hypocrisy on this issue. All of this recalls Texan pundit Molly Ivins' remark that Dubya is less bilingual than "bi-ignorant."

[Update, 5/4/06: The ineptitude defense continues: Scott McClellan says, "The president can speak Spanish but not that well. He's not that good with his Spanish."]

Posted by Benjamin Zimmer at 03:02 PM

Viswanathan v. Swanson

On April 20, 2006, Carl Durrenberger documented on his blog that roughly half of the 33 "Unwritten Rules of Management" published by William H. Swanson, the CEO of Raytheon, were lifted verbatim and without attribution from W.J. King's 1944 work "The Unwritten Laws of Engineering", published by the American Society of Mechanical Engineers.

On April 23, 2006, David Zhou documented in the Harvard Crimson that "Kaavya Viswanathan’s ‘Opal Mehta’ includes several passages nearly identical to another author’s earlier work".

The search {Viswanathan plagiarism} gets 143,000 hits on Google, and 757 on Google News (is that all? it seems like it should be millions...). In contrast, {Swanson plagiarism} gets 42,300 hits on Google, and a mere 32 on Google News.

As David Leonhardt points out in today's NYT ("Rule No. 35: reread rule on integrity"),

That is too bad, because in the scheme of things his character matters a lot more than hers does.

[Update: Leslie Wayne's update, "Raytheon Punishes Chief Executive for Lifting Text", cites The Boston Herald as documenting that

Mr. Swanson had also lifted the first four of his rules from a 2001 Wall Street Journal article written by Defense Secretary Donald Rumsfeld that was called "Rumsfeld Rules." For instance, the first rule in both Mr. Swanson's book and Mr. Rumsfeld's article is the same -- "Learn to say 'I don't know.' If used when appropriate, it will be used often."
The Herald also found that Mr. Swanson's rule No. 32 is similar to a life lesson about rude treatment of waiters that was contained in the book "Dave Barry Turns 50," written by Mr. Barry, a syndicated humor columnist.

I believe that makes 21 out of 33 rules so far found to be plagiarized.

The punishment? Well, depending on how you look at it, either a million dollars or not much:

The company decided to freeze Mr. Swanson's 2006 salary at its 2005 level and cut Mr. Swanson's 2006 restricted stock award by 20 percent. In 2005, Mr. Swanson received a salary of $1.12 million and a restricted stock award of $2.96 million. A person close to the board, who was granted anonymity because he was disclosing proprietary information, said the reductions amounted to about $1 million in lost compensation.

Since 20% of $2.96 million is only $592K, I guess that means they were planning to raise his base salary by about $408K. Tough times, Bill.
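For anyone who wants to check my back-of-the-envelope inference, here it is spelled out as a minimal sketch. The dollar figures are the ones quoted from the Times; the variable names are mine, and the calculation rests on two assumptions that the article does not confirm: that the uncut 2006 stock award would have equaled the 2005 award, and that the "about $1 million" total is made up only of the stock cut and the forgone raise.

```python
# Dollar figures as quoted from the Times article above.
stock_award_2005 = 2_960_000      # 2005 restricted stock award
stock_cut_percent = 20            # announced cut to the 2006 award
reported_total_loss = 1_000_000   # "about $1 million", per the anonymous source

# Assumption: the uncut 2006 award would have matched the 2005 award.
stock_loss = stock_award_2005 * stock_cut_percent // 100

# Assumption: whatever the stock cut does not explain is the forgone raise.
implied_salary_raise = reported_total_loss - stock_loss

print(f"Stock award reduction: ${stock_loss:,}")
print(f"Implied forgone raise: ${implied_salary_raise:,}")
```

If either assumption is off (say, the 2006 award was slated to be larger than the 2005 one), the implied raise shrinks accordingly.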

Swanson is a graduate of Cal Poly, whose policy on Academic Dishonesty defines plagiarism as "the act of using the ideas or work of another person or persons as if they were one's own, without giving proper credit to the source", and asserts that

The penalty for cheating requires an "F" course grade and further attendance in the course of [sic] is prohibited.

It doesn't say anything about a 20% reduction in stock awards, though. That's harsh.]

Posted by Mark Liberman at 11:32 AM

Imaginary debates and stereotypical roles

I'm impressed by Andrew O'Hagan's achievement, as documented by Arnold Zwicky, in blowing a hyphen up into a catfight. O'Hagan spun the divergent spelling of two dictionary entries ("bling-bling" in the Oxford Dictionary of English vs. "bling bling" in the Chambers Dictionary) into a whole Jerry Springer segment: "Out came the lipstick, out came the knives, as the great lexicographers of today rolled their eyes at one another and balanced their inky fingernails on their slender hips."

Concocted debates are an old journalistic technique, but usually there's at least a desultory attempt to lead the participants into saying things that can be quoted out of context to suggest a substantive disagreement. Back in the Bronze Age, when I was working at Bell Labs, a Radio Personality came to interview some of us for a story about something or other. The only thing I remember about the result is that I wound up participating, on the air, in a vivid debate with my friend and colleague Cecil Coker in which we appeared to disagree fairly sharply on a topic that in fact we mostly agreed in being uncertain about. And curiously, though the RP spent half a day interviewing half a dozen of us at length, the interviews had all been individual.

Thinking back over the experience, we realized that the RP had approached us from opposite sides of the question, and then stitched together bits of our answers. The interview technique was roughly like this:

RP: So, in short, we can say that it's now apparent that EITHER?
Me: Well, the answer isn't clear. To be fair, there's quite a bit of evidence pointing toward OR, such as X, Y and Z. At least some people think that way, though I don't find the arguments very convincing myself; and P and Q do seem to point towards EITHER.

[ . . . ]

RP: As I understand it, a lot of people have concluded that OR.
Cecil: Well, some people think so, but I'm not convinced. EITHER seems much more likely to me, because of P and Q.

and the broadcast "conversation" then consisted of a series of "exchanges" like this:

Me: There's quite a bit of evidence pointing towards OR, such as X, Y and Z.
Cecil: Some people think so, but I'm not convinced. EITHER seems much more likely to me, because of P and Q.

This made for much more interesting radio. Well, maybe marginally more interesting radio -- I suspect that the RP was at his wits' end trying to figure out how to make all that rambling EITHER/OR stuff interesting for his listeners, and figured that embodying our uncertainty as a concocted debate would at least personify the alternatives. The resulting piece would still never have made anyone's "best of" list, I'm sure, though the RP has since gone from strength to strength, and regularly appeareth today towards the low end of the radio dial.

Anyhow, Andrew O'Hagan managed the same sort of trick without actually interviewing anyone, or even finding any quotable differences of propositional content. He did it all with a single little hyphen. I think this must be some kind of record for journalistic inventiveness.

Arnold Zwicky thinks that O'Hagan did this because of the "NYRB tradition of lapidary disparagement". I'm not so sure: many NYRB articles are more like Russell Baker's lovely review of Stephen Miller's "Conversation: A History of a Declining Art", which starts this way:

The conversation was good on the raft that carried Miss Watson's Jim and Huckleberry Finn down the Mississippi. With quiet evenings darkening over the river the talk drifted whimsically, as good conversation should. The earning power of kings was discussed, and the misfortune that required Frenchmen to talk in French. Social problems were explored: Wouldn't the racket of quarreling wives and colicky children in a fully populated harem make a husband's life intolerable?

A fine exercise in philosophical speculation took place when Jim challenged the received opinion about the wisdom of King Solomon. As Mark Twain tells it, Jim not only questioned the very nature of wisdom, a question worthy of Socrates, but also lightened this ponderous exchange with tongue-in-cheek raillery. Solomon's famous proposal to cut a child in two and give half to each of two women who claimed to be its mother was proof that Solomon lacked good sense, Jim said, for "what use is a half a chile?"

These discussions between two socially disreputable Americans—a runaway slave and a seldom-washed boy —may seem at first glance not at all what Stephen Miller has in mind in his meandering and entertaining essay on "the art of conversation." Miller lavishes a great deal of attention on Europeans of the powdered-wig era and this, combined with his frequent references to an "art" of conversation, may leave art-shy readers with the impression that good talk is strictly for the elite. Not so. Huck and Jim—and who could be less elite?—enjoy some of literature's memorable conversation by intuitively following principles laid down by masters of the art.

What a contrast.

Arnold has clearly identified O'Hagan's "organizing figure", namely that "lexicographers are unpleasantly feminine -- shrill and trivial if women, shrieking, prancing queens if men". However, I think that O'Hagan's decision to lead with this figure needs more explanation than "a tradition of lapidary disparagement". I'm reluctant to conclude that O'Hagan is a gratuitously nasty person, though British intellectuals sometimes seem that way to Americans. In my opinion, the blogger A White Bear (at Is There No Sin In It?) gets it right:

Zwicky concludes that O'Hagan was just trying to fit a superfluous stab into his review, since that's the way NYRB articles tend to start, but I think the problem is much older and more entrenched. This problem is made clear in the engraving that accompanies O'Hagan's review. It shows Apollo and the Muses, all young, sexy, and alluringly clad, whipping Dr. Johnson (old, oddly short, pale, fat, humiliatingly nude, wearing a dunce cap and looking unpleasant) around Parnassus.

Throughout modern culture, there are thousands of literary examples of the impotent, effeminate male scholar and the frigid, purse-lipped female scholar. The creative writer, however, whether male or female, is fertile, gregarious, sexually charged, and powerful (even if a total asshole). [. . .] Even Nabokov, who was himself quite a literary and entomological scholar, depicts all of his academic characters (Pnin, Humbert, Kinbote) as impotent, pedophilic, or homosexual, and often crazy, lonely, and unlovable, while his more creative characters (Shade, Quilty, etc.), even when evil, are potent, active, and surrounded by adoring friends and family.

In other words, O'Hagan felt compelled to frame his review in terms of the humanistic version of the stereotypic jock/nerd opposition. Was this because of some sort of secondary-school psychodynamics, in which the football team's equipment manager takes the lead in teasing what Stephen Colbert called "the brainiacs on the nerd patrol"? Was it because O'Hagan felt that what he had to say about lexicography would bore his readers if he didn't find a way to sex up the lead?

I'm not sure. But in the end, I'm not all that interested in finding out. I'd rather talk things over with Huck or Jim, or Arnold and the anonymous White Bear blogger. Our conversations might be virtual, but they've mostly got the characteristics that Russell Baker identifies as "classic conversational etiquette":

Both participants listen attentively to each other; neither tries to promote himself by pleasing the other; both are obviously enjoying an intellectual workout; neither spoils the evening's peaceable air by making a speech or letting disagreement flare into anger; they do not make tedious attempts to be witty.

The blogging format tends to encourage speechifying, I guess; but otherwise, the people that I respect come out pretty well according to this standard of evaluation.

Posted by Mark Liberman at 08:08 AM

May 02, 2006

New and old stuff on animal grammar

Carl Zimmer has a nice article in today's Science Times under the headline "Starlings' Listening Skills May Shed Light on Language Evolution", explaining clearly what the Gentner et al. experiments actually were, and what they might mean. Anyone who is even moderately interested in this topic ought to read his discussion.

For those who want to go into some of the issues at greater length, I'll reproduce here a recent email from Mark Seidenberg, who wrote that:

I am so much reminded of the older ape language research, in which it was possible to claim that chimpanzees could "name objects" if the task was defined in a restrictive enough way.

In reference to an earlier LL post on pattern-learning experiments on human infants and monkeys, which quoted from a letter to Science that Mark co-wrote with Jeff Elman, Mark observed:

I hope your earlier commentary helped clarify for people that there isn't much at stake if one is talking about rules and statistics in broad terms. We tried to make some similar points in the Seidenberg, MacDonald and Saffran commentary on a paper by Pena et al. that attempted to establish limits on "statistical learning". [Seidenberg, M.S., MacDonald, M.C., and Saffran, J.R. (2003). Are there limits to statistical learning? Science 300, 53-54. A subsequent exchange of letters with Gary Marcus and Iris Berent is here.]

Those references underline the important point that recent articles about monkey and starling pattern-learning have roots in earlier studies of human infants. To some extent this excuses the tendency towards overinterpretation -- similar tendencies can be seen in the earlier infant literature. On the other hand, most of the issues now being debated with respect to the performance of starlings and monkeys have already been debated with respect to the performance of human babies; so maybe there's that much less reason for unclarity now.

In any case, whether the research subjects are humans, monkeys or birds, I'd like to see less focus on over-simplified all-or-none hypotheses like "species X has (or doesn't have) recursion", and more focus on understanding what biological pattern processing really is and how it really works, in general and also in the (perhaps different) specific cases of particular species dealing with particular kinds of patterns for particular purposes. This is a harder and less superficially glamorous program, but it's more likely to lead to durable progress, in my opinion.

Mark continues:

I don't agree that rules and statistics are indistinguishable under all definitions of the terms. They are just indistinguishable in the vague way people have used the terms in certain contexts (e.g., the Marcus study to which we responded).
There are people who have tried to assign specific properties to rules, which differentiate them from "statistical" procedures. Certainly Pinker and Marcus come to mind. Rules are thought to be learned by different procedures than "statistics," represented in a different part of the brain, on a different developmental time course, etc. I don't think these claims turn out to be accurate, but they tried to identify unique properties of rules. Other people have also tried to do this; see Smith, Langston & Nisbett (1992, Cognitive Science). I think the attempt is valid but the properties they assign to rules (e.g., not being sensitive to frequency or similarity) don't apply to people's behavior except under a very severe idealization that ignores this information (under "colorless green ideas" reasoning).
The term "statistics" on the other hand, as in "learning statistical patterns" has been completely mangled in the psychology literature. The influence of the original Saffran et al. study has been so great that researchers in child language have equated "statistics of the input" with "transition probabilities" between syllables, which is what they manipulated in the original study. So, to disprove the "statistical learning" hypothesis, you show that a species (humans, whomever) is able to learn patterns for which these particular statistics are uninformative.
The problem is that statistics always have to be over something, and things like syllables are not a priori but rather may themselves arise from statistical learning procedures. Statistics all the way down perhaps. The confusion about this is deep; there have been several exchanges in journals like Science, and others that could have occurred except that I decided to stop responding to every case. But, see recent papers from Jacques Mehler's group on this (e.g., Bonatti et al., Psychological Science, 2005, 451, and Pena et al., Science 2002, 298, 604, to which we did reply). I hope your postings on these issues are widely read; I am assigning them this week in my seminar.
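For readers who haven't met the term, a "transition probability" in this sense is just the conditional probability that one syllable follows another in the input stream. Here's a minimal sketch of the calculation. The two three-syllable "words" and their ordering are toy examples that I made up, not the stimuli from any of the studies mentioned, and the function name is mine as well.

```python
from collections import Counter

def transition_probabilities(syllables):
    """Estimate P(next syllable | current syllable) from a syllable stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # The final syllable has no successor, so it is left out of the denominators.
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Hypothetical toy "language": two made-up words, concatenated without pauses.
words = [["ba", "di", "ku"], ["go", "la", "tu"]]
order = [0, 1, 0, 0, 1, 1, 0, 1]  # arbitrary sequence of word choices
stream = [syllable for i in order for syllable in words[i]]

tp = transition_probabilities(stream)
for pair in [("ba", "di"), ("ku", "go"), ("ku", "ba")]:
    print(pair, tp[pair])
```

Within a word the next syllable is fully predictable, while across a word boundary it is not, and that dip is the cue that this kind of statistic makes available. Note that the calculation takes the syllables as given, which is exactly the point Mark raises: the units being counted over have to come from somewhere.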

For the convenience of readers who still haven't had enough, here are links to our recent starlings coverage:

The race to the bottom in science reporting (4/26/2006)
Starlings (4/27/2006)
Starlings linguists language loggers readers follow commented on the work of studied are damn smart! (4/28/2006)
Separating species with bullets (4/28/2006)
Wild? I was livid! (4/29/2006)
Can you speak in rhinoceros? (4/29/2006)

And links to earlier Language Log posts on related topics:

Language in Humans and Monkeys (01/16/2004)
Hi Lo, Hi Lo, it's off to formal language theory we go (1/17/2004)
Cotton-top tamarins: on the road to phonology as well as syntax? (02/09/2004)
Humans context-free, monkeys finite-state? Apparently not. (8/31/2004)
Homo hemingwayensis (01/09/2005)
JP versus FHC+CHF versus PJ versus HCF (08/25/2005)
Rhyme schemes, texture discrimination and monkey syntax (02/09/2006)
Learnable and unlearnable patterns -- of what? (02/25/2006)

[Carl Zimmer is (Language Log contributor) Ben Zimmer's brother]

Posted by Mark Liberman at 01:27 PM

Hopeless black holes at the New York Times

I sympathize with Kenneth Chang. According to his 5/2/2006 NYT story "Black Holes Collide, and Gravity Quivers":

Einstein's theory of general relativity changed the idea of gravity from a simple force dragging apples from a tree to a puzzle of geometry. Imagine a rubber sheet pulled taut horizontally and then tossing a bowling ball and a tennis ball onto it. The heavier bowling ball sinks deeper, and the tennis ball will move toward the bowling ball not because of a direct attraction between the two, but because the tennis ball rolls into the depression around the bowling ball.

If you follow the hyperlink that the Times helpfully provides for the word depression, you'll learn that

Depression, a mental illness, is marked by feelings of extreme sadness, hopelessness, and inadequacy. Individuals often experience disturbed sleep and weight change. Most people who commit suicide suffer from depression.

Such are the fruits of automatic word-based hyperlinking. If I wrote for a publication that tried to be hip by automatically sprinkling irrelevant hyperlinks here and there in my text, I'd be... well, looking on the bright side, I guess I'd be amused.

My sense of humor might be tested, though, if they started selling those word-based hyperlinks, the way the [Beirut] Daily Star does. When Rami G. Khouri wrote his (serious and interesting) 4/29/2006 opinion piece "The meaning of a simple passport renewal", I doubt that he anticipated the effect on readers of the prominent mouse-over pop-ups on words like "personal" and "affairs":

The single most widespread cause of personal annoyance and political resentment by ordinary citizens throughout the Arab world is this: the sense that your Average Ahmad citizen is not treated fairly or decently by his own government and society, but rather suffers the ignominy of corruption, abuse of power, favoritism, disdain, humiliation, and institutionalized discrimination in the pursuit of the most routine and uncomplicated affairs, like renewing a passport.

"Personal", of course, flashes an on-line personals service, while "affairs" invites the reader to "Meet Local Women Seeking Affairs" via an agency that "specializes in meeting the distinct needs of attached and married adults with unmet needs who wish to meet attached people".

In effect, Mr. Khouri's article has been defaced by spam graffiti, courtesy of his own employer's advertising department.

It's hard for me to believe that an American newspaper would do such a thing. Then again, in light of the recent shareholders' attack on the 47% decline in the New York Times Company's share price, I can't help but note that the Gray Lady is failing to tap a significant source of revenue. The company (like the reader) gets nothing from the irrelevant hyperlink on depression in Chang's article -- I'm sure there are some mail-order pharmaceutical companies who'd pay good money for that link.

More seriously, you could do this sort of thing in a tasteful and effective way. You could start with some simple term-tagging and sense-disambiguation technology, to match the ads to the actual content of the articles, rather than using stupid keyword matching. You could add some statistical social-network technology to connect ads to an individual's likely needs and interests, as Amazon and others have done. And you could imitate Google's sedate, unobtrusive and peripheral placement of the ads, rather than defacing the article text with obnoxious mouseovers, popups and other cyber-graffiti.
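To make the contrast concrete, here's a toy sketch of the difference between pure keyword matching and even the crudest context check. Everything in it is hypothetical: the link table, the cue words and the function names are mine, and requiring a co-occurring cue word is only a stand-in for real sense disambiguation, not a description of what the Times or anyone else actually runs.

```python
import re

# Hypothetical link table: keyword -> (link target, cue words for the intended sense).
LINKS = {
    "depression": ("health/depression", {"sadness", "illness", "suicide", "mood"}),
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def naive_links(text):
    """Link every keyword that occurs in the text, ignoring context."""
    return [word for word in tokenize(text) if word in LINKS]

def context_checked_links(text):
    """Link a keyword only if a cue word for the intended sense co-occurs."""
    words = set(tokenize(text))
    return [key for key, (_, cues) in LINKS.items() if key in words and words & cues]

physics = "The tennis ball rolls into the depression around the bowling ball."
health = "Depression, a mental illness, is marked by feelings of extreme sadness."

print(naive_links(physics), context_checked_links(physics))
print(naive_links(health), context_checked_links(health))
```

The naive version links "depression" in both sentences; the context check links it only in the second. A real system would need much better evidence than a hand-picked cue list, but even this much would have spared the black holes.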

A cynical view of advertising history suggests that the stupidest and most obnoxious outcomes are also the most likely ones. But maybe this time will be different.

Added by GKP: before 11 a.m., the Times had caught that link on "depression" and had removed it. So they're on top of things. Perhaps they read Language Log over at the Gray Lady.

Posted by Mark Liberman at 08:38 AM

At least I hope he's not acting alone

I have somewhat more than a passing interest in Bolivia; this story in today's NYT came with a linguistically interesting headline:

Bolivian Nationalizes the Oil and Gas Sector

While it's true that a single Bolivian -- President Evo Morales -- is ultimately responsible for the nationalization of the oil and gas sector, I don't think I've ever seen a case like this where the president is not referred to by title (e.g., President Morales, Bolivian President, etc.) or metonymically in some way (e.g., Morales Administration, Bolivian Government, Bolivia, etc. -- compare the title of this other NYT article). This could just be a very bad typo, but the effect is underhandedly disparaging of this highly controversial decision (which fulfills an important campaign promise, one of several that have been of concern to the international community), and I do wonder if it was somehow intentional.

Posted by Eric Bakovic at 12:35 AM

May 01, 2006

Oxford English Corpus: infested with eggcorns!

The billion-word Oxford English Corpus continues to make news, though thankfully no longer under the farcical headline, "English Language Hits 1 Billion Words." Now we get this dire headline from the Guardian: "Internet culture spells doom for strait-laced orthographers." The opening paragraphs elaborate the theme of linguistic degenerationism:

If you believe the internet is the fount of all wisdom, giving free rein to bloggers to exercise their vocal cords, think again. Ancient English cliches and expressions are being mangled by the culture of cut and paste and the spread of unchecked writing on the internet.
According to the Oxford English Corpus, a database of a billion words, dozens of traditional phrases are now more commonly misspelled than rendered correctly in written English.

Though the Guardian article doesn't say so explicitly, the common misspellings taken from the Oxford English Corpus are all semantically motivated — in other words, they're eggcorns.

As it happens, every example given by the Guardian has already been entered into the Eggcorn Database, maintained by Chris Waigl. (They also can be found in Paul Brians' comprehensive site, Common Errors in English.) Here's more of the article, hyperlinked to the appropriate Eggcorn Database entries:

"Straight-laced" is used 66% of the time even though it should be written "strait-laced", according to lexicographers working for Oxford Dictionaries, who record the way English is spoken and written by monitoring books, television, radio and newspapers and, increasingly, websites and blogs.
"Just desserts" is used 58% of the time instead of the correct spelling, "just deserts" (desert is a variation of deserve), while 59% of all written examples of the phrase in the Corpus call it a "font of knowledge or wisdom" when it should be "fount".
It has become so widely used that the wrong version is now included in Oxford dictionaries alongside the right one.
Other mistakes fast becoming the received spelling include substituting "free reign" for the correct phrase, "free rein".
The original refers to letting a horse loose, but many use "reign" and assume the expression means to allow a free rule.
Other examples of common mistakes include "slight of hand" instead of "sleight", "phased by" when it should be "fazed by", "butt naked" instead of the correct "buck naked" and "vocal chords" for "vocal cords."

All of these examples are marked in the Eggcorn Database as "nearly mainstream," so it's nice to get corroboration from a corpus more reliable than Google's. Whether you consider the mainstreaming of eggcorns to be a simple matter of language change in progress or something more nefarious depends on your perspective, of course. Catherine Soanes of Oxford Dictionaries takes the lexicographical long view as an antidote to the Recency Illusion:

"We have to accept spelling is not fixed and can change over the years," said [Soanes]. "You only have to look back 100 years, when the word rhyme was spelled rime. But since then we adopted rhyme as the correct spelling because this is more like the Greek word from which it originally came."
She added: "Our Corpus has around 150m words from the web and the way words are written often has to do with familiarity.
"For instance, 35% of people say 'a shoe-in' when actually it should be 'a shoo-in'.
"But the original is an American phrase using a US version of the word shoe in the first place."

(I'm unclear what Ms. Soanes means by the "original" form of shoo-in. The OED derives shoo-in from the verbal phrase to shoo in, and the verb shoo is in turn derived from the interjection shoo! used to drive away animals or intruders — similar forms include German schu, French shou, and Italian scioia. I don't see anything suggesting a derivation from the word shoe, unless she means that the eggcorn variant shoe-in has been present from early on in American usage.)

The Guardian article squarely lays the blame for this rampant eggcornification on "the culture of cut and paste," and particularly on "the spread of unchecked writing" found on the godforsaken Internet. Note, however, that only about 15 percent of the Oxford English Corpus (150 million of the billion words) is gleaned from online usage, so that can't be the whole explanation. As we've seen in previous cases of hell-in-a-handbasket reporting on the degeneration of the English language, the Internet (and especially the discourse of young people on the Internet) is always an easy target for condemnation, since it brings into easy reach a whole range of non-standard usage. Because we're reading so much more text that has not been professionally edited, what may previously have risen only to the level of pet peeve may now appear as a grave threat to the future of the language.

Getting past the doom-and-gloom soothsaying, the Guardian article highlights some other findings from the Oxford English Corpus:

According to the Corpus, another linguistic trend is the American habit of turning two words into one, such as someday, anymore and underway.

I'm not convinced that "turning two words into one" is a particularly "American habit," or at least not as the phenomenon has been illustrated by the Guardian. For example, the earliest example given by the OED for single-word someday is from George Bernard Shaw in 1898, while the single-word form of underway was popularized in nautical usage on both sides of the Atlantic, as far as I know. And the use of single-word anymore has more to do with its development as an adverb meaning 'nowadays.' Even the dialectally distinct "positive anymore" is not particular to American usage, as the OED also marks it "Irish English" (as in the citation from Tom Murphy's 1961 play "A Whistle in The Dark": "We'll squeeze Michael a bit. He'll chip in anymore").

And finally:

The Corpus also records how some words are used almost exclusively to apply to men and others to women.
Only men seem to hijack, crouch, kidnap, rob, grin, shoot, dig, stagger, leap, invent or brandish.
Women, meanwhile, tend to be the only ones to consent, faint, sob, cohabit, undress, clutch, scorn or gossip.

There is no doubt some interesting gender-based variation to be found in the corpus, but as with the other findings the Guardian's framing is a bit dubious. Only men (or women) commit the actions of those verbs? That's a wild overstatement, but a bit of Googling corroborates large gender disparities when comparing the simple collocations "he + V" versus "she + V" for the verbs given. It will be quite intriguing to see the actual results from an analysis of the Oxford corpus, rather than having to rely on the unfortunate simplifications given in the Guardian article and other glib media accounts.
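The kind of comparison I have in mind is easy to sketch. The snippet below counts "he + V" and "she + V" sequences in a text sample; the sample sentences, the small table of past-tense forms and the function name are all invented for illustration, and a serious analysis of the Oxford corpus would of course need proper tokenization, lemmatization and some way of handling subjects other than pronouns.

```python
import re
from collections import Counter

# Hypothetical lookup from inflected forms to the verbs in the Guardian's lists.
VERB_FORMS = {"grinned": "grin", "sobbed": "sob", "fainted": "faint", "staggered": "stagger"}

def pronoun_verb_counts(text):
    """Count how often 'he' or 'she' is immediately followed by a listed verb form."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for prev, word in zip(tokens, tokens[1:]):
        if prev in ("he", "she") and word in VERB_FORMS:
            counts[(prev, VERB_FORMS[word])] += 1
    return counts

# Invented sample text, just to exercise the function.
sample = "He grinned. She sobbed, and he grinned again. She grinned too."
counts = pronoun_verb_counts(sample)

for (pronoun, verb), n in sorted(counts.items()):
    print(pronoun, verb, n)
```

Even in this tiny invented sample, "she grinned" turns up, which is the trouble with "only": what counts like these can show is a disparity, not an exclusive association.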

[Update: The Guardian's technology blog links to the article in an entry titled "Watch your language -- most of you are wrong." Comments are already turning nasty.]

Posted by Benjamin Zimmer at 07:21 PM

The bitchiness of lexicographers and NYRB reviewers

Andrew O'Hagan's review of Henry Hitchings's Defining the World: The Extraordinary Story of Dr. Johnson's Dictionary, in the New York Review of Books for 4/27/06, leads (on p. 12) with a pretty solid slam:

If you keep an eye on them, you might notice that dictionary-makers are marginally bitchier than runway models.

O'Hagan is keeping up a NYRB tradition of lapidary disparagement here. The first sentence introduces the organizing figure of this paragraph, that lexicographers are unpleasantly feminine -- shrill and trivial if women, shrieking, prancing queens if men. The bitchiness of lexicographers is illustrated by what O'Hagan presents as an exchange between two recent dictionaries on the punctuation of a single expression. Meanwhile, O'Hagan's elegantly sneering prose illustrates just how bitchy NYRB reviewers can get.

There is no mention of Samuel Johnson, Henry Hitchings, or Hitchings's excellent book in this paragraph. It's all about (modern) dictionary-makers.

But then things get better. How could they not, once Dr. Johnson enters the room?

Here's the whole first paragraph:

If you keep an eye on them, you might notice that dictionary-makers are marginally bitchier than runway models. A few summers ago, the revised editions of the Chambers Dictionary and the Oxford Dictionary of English were published into an avid marketplace. Out came the lipstick, out came the knives, as the great lexicographers of today rolled their eyes at one another and balanced their inky fingernails on their slender hips. "Bling-bling" is one word separated by a hyphen, said Oxford. Not at all, honey-pie. Two words and no hyphen, said Chambers, summoning the authority of the ancients, or Puff Daddy, seeing as the ancients were unavailable.

A not necessarily complete list of disparagement by imputation of femininity/effeminacy in this paragraph:

"bitchier" ("bitch" and "bitchy" being largely reserved for insulting women and gay men)

"runway models" (who are both female and male, though modeling, like almost everything associated with fashion, is considered to be a "feminine" occupation, and male models are widely believed to be mostly gay; I mean, look at Brad Gooch)

"the lipstick" (no further comment necessary)

"the knives" (debatable, but knives are the weapon of choice for women, guns for men)

"rolled their eyes" (a gesture widely believed to be feminine, and to be a gay mannerism)

"their slender hips" (oh, jesus! this is where I started hissing like a raging queen)

"honey-pie", attributed to one dictionary, applying it to the other

That's a lot to pack into one little paragraph, which as a result configures a difference between the two dictionaries as The Women meets The Boys in the Band. I guess O'Hagan thought he should lead with a really good punch. Wipe out the pussies in the first round. Unfortunately, though I find the paragraph artful, I also find it decidedly unpleasant and also, well, damn bitchy.

Then there's the utter triviality of this difference, which O'Hagan clearly wants his readers to appreciate. These wusses have nothing better to do with their lives than engage in catfights over whether an expression is to be written as two words, one hyphenated word, or one solid word! (There is, of course, a huge literature about the issue and about particular cases. For the most part, though, lexicographers don't invest a lot of passion in such things: practice changes over time, there's a certain amount of variation that just has to be recognized, and clarity and comprehensibility are hardly ever at issue.)

To my mind, what's most disturbing about this passage isn't its rhetoric and distasteful background assumptions, but the way it presents the purported confrontation between the dictionaries. Notice that there are no actual quotations here, and no reference to any publication or to some public event at which the dictionary folks could be observed actually exchanging words with one another. Yet O'Hagan frames things so that the non-specialist reader is invited to suppose that there was such an exchange. A melee (I follow NOAD2's preference for the spelling of this word) at the Dictionary Society of North America meetings, perhaps.

But of course there was no such thing. All that happened was that one dictionary published one thing and the other published something else. The catfight scene is an imaginative construction of O'Hagan's. No doubt he will say that he supposed that readers would recognize his hyperbolic fantasy for what it was -- this is the "just kidding" defense -- but I'm not sure they're equipped to. Even if readers correctly divined his intentions, there's still the imputation that this is the way lexicographers behave, and since few people hang out with dictionary-makers, I suspect that even the readers who understand that the paragraph is not a factual report, jazzed up some, will come away from it supposing that it's an entertainingly exaggerated picture of the way lexicographers really do tend to behave (these days, anyway). That's a nasty underhanded blow.

My guess is that O'Hagan felt he needed to twist a knife into SOMEONE in a NYRB review -- it can be a tough neighborhood -- and since he had nothing really negative to say about the book (which he generally admires), Hitchings (whose writing he praises), or, of course, Dr. Johnson, he looked around for a victim and settled on modern dictionary-makers to show off his writerly chops on. Too bad.

After that first effusion of deprecatory hyperbole, O'Hagan settles into less extravagant expository prose, mostly about Johnson, who tends to crowd everyone else into a corner whenever he's in the room. There is a brief review of Hitchings's book in O'Hagan's piece, but by and large it's an essay about Johnson. (Well, the book review as a hook for an essay has a long history and many excellent practitioners. I'm not complaining about that.)

This essay begins, "Authority and provenance are watchwords for the dictionary-making classes." I wouldn't argue with that, though I think "dictionary-making classes" is over-showy and that "authority" needs more discussion, especially in a passage that shifts from the authority of Puff Daddy (see above) to Johnson's intention to both ennoble and fix the language. And I think "definition in context" belongs on the list. Certainly, that was a prime concern for Johnson and is still for modern lexicographers, especially those working on the vocabulary of special communities or on the historical development of meanings, but generally for anybody who wants to figure out what words mean.

In any case, we're now into O'Hagan's essay on Johnson, and out of the Bitch Zone.

Full disclosure time. I am not a lexicographer, but I hang out with a lot of them (on the American Dialect Society mailing list and at linguistics meetings), and I currently have a gig as a Delegate to the Oxford University Press, consulting on American dictionaries.

[Added 5/3/06: I didn't say much about Hitchings's book above, but for the record let me say that it's marvelous. Hitchings writes clearly and engagingly, and, most notably, he's unobtrusive; the book is about Johnson, not Hitchings. When I first saw the book, I thought, oh hell, not another book about Dr. Johnson and his Dictionary -- but, yes, there's room for one more, especially one so focused on the dictionary project itself.]

[More added 5/3/06: Mark Liberman has now posted about O'Hagan's review, citing a perceptive blog entry on it by "A White Bear", who nails something in it that I failed to draw out, namely its contempt for scholars (as opposed to creative writers). Check it out. Oh yes, Mark also quotes from a delightful, genial review by Russell Baker in the most recent NYRB, just to show that the NYRB doesn't actually require its reviewers to skewer people.]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:52 PM

Walking sticks and the polygraph

Is the polygraph like a walking stick? How can I make such an outrageous comparison? Like most of us, I never had any real use for a walking stick. My father-in-law owned a very nice one carved out of ivory but as far as I know, he never used it. When he died, I inherited it and, not knowing what to do with it, I kept it in the corner of my home-office, unused, until one day I developed bursitis in my shoulder. The walking stick suddenly had some functional use to me when I discovered that it was just heavy enough for me to use it to swing my arm around in order to break up the calcium deposits in my shoulder. Thus, my useless white elephant was transformed into a sort of medical device. The CIA and the FBI seem to have discovered a similar principle with the polygraph (see here).

The polygraph is disallowed in most US courtrooms and has been largely discredited as any kind of useful indicator of truthfulness. It measures emotions but has no known ability to detect lying. But never mind that. These agencies have discovered a totally new use for it as an intimidation device that is now being shared by many of the nation's employers. It's currently being administered in tens of thousands of job interviews as well as police interrogations. The claim is that when people are subjected to the polygraph, they admit to things that they might not otherwise talk about. The FBI's security program is said to fail some 25% of all applicants, based on intimidation growing out of preceding polygraph tests. Like my walking stick, the polygraph has emerged from its cloud of failure into an allegedly functional asset. It's no longer used for its original dubious scientific value and has been renamed "an investigative tool." If the subjects believe in the polygraph's highly questionable ability to ferret out lies, it works the same way as if it actually had the scientific credibility to do so. Therefore, the government can justify keeping all of its machines and operators in business.

Ethical issues are suggested here, and not just for the few prosecutors who still manage to sneak questionable polygraph evidence into their cases. Are "intimidation tools" acceptable in the hiring process? Some say, "no." For example, a former scientist at Sandia National Laboratories resigned because he believed that this use of polygraphs was unethical. Should employers directly or indirectly lie about what the polygraph can and can't do in order to induce applicants to reveal embarrassing things about themselves? Should polygraph operators be allowed to pretend that their machines are suggesting lies and then tell subjects such things as: "I'm getting a strange reading. Tell you what, I'm going to turn the machine off and ask you about what you want to get off your chest."?

Is all fair in love and war? It may be legal to permit police investigators to lie to the suspects they interrogate but is it equally okay to lie to prospective employees? I have to admit that converting my walking stick into a medical device worked pretty well for me. It was functional and posed no ethical issues. However functional the transformation of the polygraph into an investigative tool seems to be, it looks like it stretches the limits of ethical practice. Stay tuned on this one.

Posted by Roger Shuy at 01:18 PM

This time, he should have blinked twice

You'd think Malcolm Gladwell would know better. After all, this is the age of WCFCYA, even if he hasn't yet written a book about it -- but he's out there suggesting that every teen-lit novel uses the same specific sentences that Kaavya Viswanathan borrowed from Megan McCafferty. "My question", he asks, "is whether it is possible to write a teen-lit novel without these sentences." As we'll see -- and as Gladwell might have guessed if he'd thought about it -- the answer is provably "yes".

Specifically, Gladwell argues that charges of plagiarism don't apply to Kaavya Viswanathan because "[t]his is teen-literature. It's genre fiction. These are novels based on novels based on novels, in which every convention of character and plot has been trotted out a thousand times before. ... Calling this plagiarism is the equivalent of crying 'copy' in a crowded Kinkos".

The thing is, it's not conventions of character and plot that Viswanathan is accused of copying, it's whole sentences of text. But Gladwell feels the same way about those -- he writes

It is worth reading, I think, the actual passages that Viswanathan is supposed to have taken from McCafferty. Let's just say this isn't the first twenty lines of Paradise Lost. My question is whether it is possible to write a teen-lit novel without these sentences: ...

Gladwell then goes on to quote one of the less exact borrowings that have been documented (see here for a more exact instance, and here for a longer list of examples):

From page 7 of McCafferty’s first novel: “Bridget is my age and lives across the street. For the first twelve years of my life, these qualifications were all I needed in a best friend. But that was before Bridget’s braces came off and her boyfriend Burke got on, before Hope and I met in our seventh-grade honors classes.”

From page 14 of Viswanathan’s novel: “Priscilla was my age and lived two blocks away. For the first fifteen years of my life, those were the only qualifications I needed in a best friend. We had first bonded over our mutual fascination with the abacus in a playgroup for gifted kids. But that was before freshman year, when Priscilla’s glasses came off, and the first in a long string of boyfriends got on.”

The answer to Gladwell's question, of course, is that it's easy to write a teen-lit novel without those particular sentences, or any sentences very much like them. In fact, this is so easy to do that everyone who has ever published a teen-lit novel has succeeded in doing it. (Except, of course, Megan McCafferty and Kaavya Viswanathan.) And thanks to Amazon's "Search Inside" feature, we can check. For example, even Megan McCafferty can produce a novel without these sentences: Second Helpings, her sequel to Sloppy Firsts, contains neither of the five-word strings "is my age and lives" or "was my age and lived".

If I go to a9 and check the category of books for "is my age and lives" or "was my age and lived", I can only find three real instances, none of which are remotely similar in other ways to the McCafferty/Viswanathan passages:

(Timothy Tyson, Blood Done Sign my Name) The teenaged boys filled the truck bed with Coca-Cola bottles and rocks, then roared through the African-American neighborhoods hurling them at pedestrians, windshields and windows. I heard secondhand tales of these vicious adventures many times from Jeff Daniels, who was my age and lived across the street from me. Neighborhood boys older than us were a regular part of these attacks.

(Dirk Wittenborn, Fierce People) I wasn't crazy about hearing some English guy asking my mother in the first light of day, 'Hey, luv, got any Vaseline about for Mr. Johnson?' But everybody who was my age and lived in a loft in the late seventies knew their mom did it.

(Joe Evans, Daydreams) The hangout actually was the big garage in Bud's back yard. Bud was my age and lived three houses up Franklin Street. The garage had a loft and plenty of windows for light. A small table and some folding chairs were inside. There was a cabinet on the back wall with some comic books, cards, checkers and a couple of table games and some junk inside. Everything needed for a good time.

Google Book Search turns up four more, none from chicklet-lit and none with any other similarities:

(Janet Evanovich, Hot Six) I stopped off at Dillan's basement apartment and explained my needs. Dillan grabbed his toolbox and we trooped upstairs. He was my age and lived in the bowels of the building, like a mole. He was a really cool guy, but he didn't do much, and as far as I know he didn't have a girlfriend . . . so, as you might expect, he drank a lot of beer.

(Barbara Probst Solomon, Reading Room/4) There was also Rodolfo Canales who was my age and lived two houses away on La Calle Diez Palmas, but my mother would not let me play alone with a boy, and my father refused to let me associate the son of the man who killed his mother.

(Anatole Kurdsjuk, The Long Walk home with Miracles Along the Way) My best friend was Anatoly Kaluzin. He was my age and lived two houses away; we shared our most intimate five year old secrets and were always together. On the other side of our house lived Pavlik, he was older than I but was very small for his age, and as usual, children coined nicknames for their friends who were "different". Pavlik's street name was Blokcha, the flea.

(Jean C. Carlson, The Widow and the Wizard) Anyway, when I was 11 years old, my grandmother died at age 66 from diabetic complications. Her life had had its share of hardships but she was a dear sweet Christian woman. Her only son was Oscar. And he was the father of three girls also. His middle daughter, Betty, was my age and lived near us. We went to grade school, high school and nurse's training together.

Although I don't have time to do it, I'd be prepared to make a substantial wager that we'd get similar negative results from searching for other strings in the quotes that Gladwell cites, as well as in the rest of the purloined passages.
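For readers who want to try this sort of check on texts they have in hand, here is a minimal sketch in Python. It is not the Amazon or Google Book Search machinery used above; the function names (words, ngrams, shared_ngrams, contains_string) are made up for illustration, and the tokenization is deliberately crude (lowercase runs of letters and apostrophes). It tests whether a given word string occurs in a text and lists the word sequences that two texts have in common, using the opening sentences of the two passages quoted above.

```python
import re

def words(text):
    """Lowercase the text and split it into crude word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(text, n):
    """Return the set of n-word sequences that occur in the text."""
    toks = words(text)
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def shared_ngrams(a, b, n):
    """Return the n-word sequences that occur in both texts."""
    return ngrams(a, n) & ngrams(b, n)

def contains_string(text, phrase):
    """True if the words of the phrase occur consecutively in the text."""
    # Pad with spaces so that matches start and end on word boundaries.
    return f" {' '.join(words(phrase))} " in f" {' '.join(words(text))} "

# Opening sentences of the two passages quoted earlier in this post.
mccafferty = ("Bridget is my age and lives across the street. For the first "
              "twelve years of my life, these qualifications were all I "
              "needed in a best friend.")
viswanathan = ("Priscilla was my age and lived two blocks away. For the first "
               "fifteen years of my life, those were the only qualifications "
               "I needed in a best friend.")

if __name__ == "__main__":
    print(contains_string(mccafferty, "is my age and lives"))
    print(contains_string(viswanathan, "is my age and lives"))
    for gram in sorted(shared_ngrams(mccafferty, viswanathan, 4)):
        print(gram)
```

A search over a real collection of novels would need the full texts and a more careful tokenizer, but the logic would be the same: an exact string either turns up in a book or it doesn't.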

In fairness to Mr. Gladwell, he doesn't actually assert that all teen-lit novels necessarily contain the sentences copied by Ms. Viswanathan, he merely uses a rhetorical question to imply it. However, the implication is essential to his overall argument, and it's spectacularly and obviously false. I'm sure that he's not as dense as this argument makes him seem, so the real puzzle is why he went down this rhetorical road.

If I were given to speculating about other people's motivations, I'd guess that Gladwell has such disdain for the genre of chicklet-lit that he's unwilling to grant enough creativity to McCafferty for her words to be worth the attribution of authorship in the first place. This is in sharp contrast to his instincts about the value and ownership of his own words, in a famous case where he was the author of a non-fiction piece from which material was borrowed without attribution for use in a Broadway play. In his 2004 New Yorker article about this experience, Gladwell made a striking point about the difference between IPR law and the moral taint of plagiarism:

The arguments that [Larry] Lessig has with the hard-core proponents of intellectual property are almost all arguments about where and when the line should be drawn between the right to copy and the right to protection from copying, not whether a line should be drawn.
But plagiarism is different, and that's what's so strange about it. The ethical rules that govern when it's acceptable for one writer to copy another are even more extreme than the most extreme position of the intellectual-property crowd: when it comes to literature, we have somehow decided that copying is never acceptable.

This is a distinction that deserves more discussion than it's getting. However, the notion that sentence-copying is somehow an inevitable feature of genre writing is preposterous and bizarre.

[OK, Amazon doesn't index every teen-lit novel ever written, so I'm not justified in asserting that Gladwell's implication is universally false. But I can assert that it's false in every one of the large number of cases where it's possible to check.]

[Update: Malcolm Gladwell took it back. For some reason, however, he decided to compound his corpus-linguistic error by asserting that

In the spy thriller I just read, the bad guy is torturing the hero and getting no where. He tells this to the head bad-guy who says—and I'm guessing that everyone who has ever read a thriller will know what's coming next: "Don't worry. He'll talk. They always do."
Does the fact that I've read that exact line in at least five other thrillers spoil the fun? Not really. Did the writer "steal" that line from someone else? Sure. That's what a cliche is: it's what we call plagiarism the sixth or seventh time around.

There is no citation for the "five other thrillers", and neither A9 nor Google Books can find them. Do they exist? My guess is that they don't, and Gladwell has just retrospectively made them up, based on his (probably correct) impression that the general situation is a thriller-plot commonplace. But quite apart from the frequency illusion, there's a difference between concepts and words. Language is much richer than Gladwell seems to understand, and as a result, exact duplication of word sequences of seven words or so (the length of his example) is vanishingly rare, other than by specific quotation, plagiarism, or use of a genuine fixed expression.]

Posted by Mark Liberman at 11:15 AM

See here

To amplify somewhat Mark's explanation of what's going on when Jim Lehrer and Elizabeth Vargas (for example) close their TV news shows with "We'll see you here tomorrow," I think that if David Giacalone had been as worried about the use of "here" in this utterance as he was about the use of "see," he might have divined the answer to his question.

It's well nigh impossible to give a literal interpretation to "here" in the context of a person on TV addressing viewers in this way. Whatever "here" means, it's not "in the studio I'm talking from now." What seems to be going on, consistent with what Mark says, is the pretense of a face-to-face encounter. In real life, if I "see you here tomorrow", that means we will meet face-to-face. Moreover, it's usual to say "(I'll) see you tomorrow" to convey "We'll meet tomorrow," as in "I'll see you tomorrow, so you can tell me the rest then." As Mark says about Hank Williams's sign-off, "He's trying to make the listener feel that he's right here with them." It seems like a pretty harmless fiction: "I'm right there in your living room, so I'm entitled to refer to your living room as 'here'."

Posted by Paul Kay at 05:44 AM

	1997	1999	1999(f)	2000	2001	2002	2003	2004	2005
Too much freedom	38%	53%	42%	51%	46%	42%	46%	42%	39%
Too little freedom	9%	7%	8%	7%	8%	8%	9%	12%	10%
About right	50%	37%	48%	41%	42%	49%	43%	44%	47%
Don't know/refused to answer	3%	2%	3%	2%	3%	2%	1%	3%	4%

我	有	壓力	你	有	壓力
ngo5	jau5	ngaat3 lik6	nei5	jau5	ngaat3 lik6
I	have	pressure	you	have	pressure