Language Log: August 2004 Archives

August 31, 2004

Stupid contentless political blather

I recently received a letter with the address printed in blue on the envelope in a very clever font that looks all irregular like genuine handwriting. It didn't have POSTMASTER: If undeliverable, please process following applicable Postal Regulations on it, so I opened it. It was from a candidate for national office who promised "to wage a spirited campaign that celebrates American values and embraces a positive vision of one America, moving forward." He is, it seems, in favor of good-paying jobs and opposed to waste; he wants affordable health care and a strong national defense. This all seems like good news. I had decided I was going to send money, in fact. But then we had a clean-up of the living room, I lost the letter, and now I can't remember which side that candidate was on. I'm all in favor of celebrating American values and embracing a positive vision of one America, moving forward, toward good jobs, health care, strong defense, and no waste. I don't want to send money to the party whose platform involves not celebrating American values (possibly even celebrating unAmerican values) and embracing a negative vision of several Americas, moving backward toward unemployment, untreated sickness, military weakness, and profligacy. Dammit. If only I could recall which was which...

Posted by Geoffrey K. Pullum at 06:34 PM

Welcome bikers!

Here at Language Log, we're used to visits from a diverse crowd of gardeners, record company executives, knitters and dog lovers, poets, computer scientists, tour guides, foodbloggers, musicians, and nuns. But today is the first time that we've had many referrals from a cycling site. One Dr. Hoo posted this earlier today to the Road Bike Review forums:

Do you dare link to the egg corns? Do you fear the egg corns? Might the egg corns be not-work safe? You will just have to find out, or live the rest of your life wondering, won't you?

I DARE to see the egg corns!

Dr. Hoo is clearly a person with considerable ~~false advertising~~ public relations skills. In the 12 hours since he posted this, 25 of his readers have "DARED to see the egg corns". It goes to show, I guess, that a bit of innuendo can promote almost anything to almost anybody.

So far, the forum's reactions are mixed. Comments include both "Well that was a pointless time waster. I should very much like to make the aquaintance of the author of that little website and stick my boot up his keester", and "That site fried what was left of my brain this morning. Too funny."

At first I thought that Road Bike Review might be a motorcycle site, which would be more interesting, but I was wrong. In any event, we're open to vehicles of all persuasions, so let me offer a hearty welcome to any further two-wheeling visitors. While keeping my back to the wall.

Posted by Mark Liberman at 01:34 PM

Open access again

In a recent post, I linked to a preprint of a paper by Perruchet and Rey that severely criticizes an earlier paper by Fitch and Hauser. The topic is an important one, which interests people from many walks of life: the nature of cognitive differences between humans and non-human primates. I've given you links to both papers, but unless you've got a subscription to Science, you can't read Fitch and Hauser's side of the story, and you'll have to be satisfied with the picture presented in my earlier descriptions or in Perruchet and Rey's summary.

To be entitled to read an article from Science on line, you either need to be a member of the AAAS -- which will cost you $130/year if you're in the U.S., and more for foreigners -- or you need to access it through the web site of a subscribing library. I think that every working scientist ought to be a member of the AAAS, but for people who are not in the biz, and who might be interested in just a few articles a year, $130 is a pretty steep price. And unless you're entitled to use a library that gives you remote access by proxy, you'd have to make a special trip to the library IRL, which not many people will do just to learn a bit more about a topic that's not essential to their job or their health.

You can also pay $10 "in order to have access to one article from Science for the next 24 hours from the computer you are currently using". That's not much of a bargain, in my opinion, given that a year's subscription to the New Yorker Magazine costs $36.95. And the New Yorker pays its writers and editors!

Fernando Pereira recently took up this general problem, in reaction to a letter to the Economist from Frank Spilhaus, Executive Director of the American Geophysical Union. Fernando wrote:

One might naïvely assume that scientific societies would be for wider access to science. But, like the guilds of old, their power is tied to restricting the access to knowledge. They are some of the worst offenders in the scandalous inflation of journal prices, under the pretext that their journal revenues provide important services for their members. In the US, scientific societies are non-profits, under the assumption that they work in the public interest. But by that they seem to understand the narrow interests of their bureaucracies, and maybe the only slightly broader interests of their members, and only accidentally the broad public interest. Is there a more important service for a scientific society than maximizing access to new science? At some point, inquiring legislators might start asking how the public interest is served by societies placing tolls on the distribution of research results paid for by the public. Anti-OA agitators continue to spread FUD about "author pays," deliberately hiding the fact that some of the most successful OA journals are run by volunteers benefiting by the huge economies of Web-based publishing, and need no author fees. It may be the case that translating a traditional high-overhead journal to OA is economically impractical, but most of that overhead is useless anyway, since the scientists who to the real work — authors, reviewers, editors — are volunteers anyway.

There's a lot of pressure, from many sources, on this point. Publishers and scientific societies are feeling the heat, and making some concessions. For example, the rest of you will be able to read Fitch and Hauser's paper on line for free as of 1/16/2005, one year after its original publication. This sort of open-access-after-a-delay has become common, and it's certainly a step forward. But the guild structure imposed by high journal subscription prices no longer has any real economic justification, and its days are apparently numbered. You might want to check your stock portfolio for companies whose revenues now derive substantially from publishing scientific and scholarly journals.

I'm by no means any sort of information-wants-to-be-free absolutist. I recognize that there are real costs associated with creating, maintaining, archiving and indexing journals, and that these costs need to be covered. But most of the real costs have always been met by subsidies from all of the participants except the publishers and the scientific societies -- the companies, government agencies and private institutions that fund the research, and the authors, reviewers and editors who volunteer their time.

In the olden days, there were significant costs associated with typesetting, printing and mailing, and so the publishers had a real role to play. But in field after field, paper journals are becoming like academic caps and gowns, a purely ceremonial relict of an obsolete culture. The difference is that the cap-and-gown providers were never given the role of gatekeepers over matriculation and graduation, able to charge tens of billions of dollars a year for an increasingly inessential role. The learned societies continue to provide real services, especially with respect to the organization of annual meetings. But as Fernando suggests, some of them have come to support very significant bureaucracies, and the cost-effectiveness of their contributions to the advancement of science deserves some scrutiny, especially if part of the cost is exclusion of the public from access to the science.

Posted by Mark Liberman at 12:13 PM

Humans context-free, monkeys finite-state? Apparently not.

Forthcoming in Psychonomic Bulletin and Review is a paper by Pierre Perruchet and Arnaud Rey, entitled "Does the mastery of center-embedded linguistic structures distinguish humans from nonhuman primates?" This is a response to a paper by Tecumseh Fitch and Marc Hauser that appeared earlier this year (Computational Constraints on Syntactic Processing in a Nonhuman Primate, Science, Vol 303, Issue 5656, 377-380, 16 January 2004).

Both papers address the question in Perruchet and Rey's title: "Does the mastery of center-embedded linguistic structures distinguish humans from nonhuman primates?" Fitch and Hauser presented experimental evidence that the answer should be "yes". Perruchet and Rey do a different experiment of the same kind, and find that the answer is "no".

For an idea about why this is interesting, read Fitch & Hauser's abstract:

The capacity to generate a limitless range of meaningful expressions from a finite set of elements differentiates human language from other animal communication systems. Rule systems capable of generating an infinite set of outputs ("grammars") vary in generative power. The weakest possess only local organizational principles, with regularities limited to neighboring units. We used a familiarization/discrimination paradigm to demonstrate that monkeys can spontaneously master such grammars. However, human language entails more sophisticated grammars, incorporating hierarchical structure. Monkeys tested with the same methods, syllables, and sequence lengths were unable to master a grammar at this higher, "phrase structure grammar" level.

Here in contrast is Perruchet and Rey's abstract:

In a recent Science paper, Fitch and Hauser (2004; hereafter, F&H) claimed to have demonstrated that Cotton-top Tamarins fail to learn an artificial language produced by a Phrase Structure Grammar (PSG, Chomsky, 1957) generating center-embedded sentences, while adult humans easily learn such a language. We report an experiment replicating the results of F&H in humans, but also showing that participants learned the language without exploiting in any way the center-embedded structure. When the procedure was modified to make the processing of this structure mandatory, participants no longer showed evidence of learning. We propose a simple interpretation for the difference in performance observed in F&H's task between humans and Tamarins, and argue that, beyond the specific drawbacks inherent to F&H's study, researching the source of the inability of nonhuman primates to master language within a framework built around the Chomsky's hierarchy of grammars is a conceptual dead-end.

I described the Fitch & Hauser results back in January, and I also criticized their paper for seriously overinterpreting the results of the experiments it reported. I suggested that these might be just "[experiments] about memory span and/or sensitivity to statistical deviations [in local word-sequence counts]. No talk about grammars, much less hierarchies of grammatical complexity, is required".

Other critiques, for example this one by Greg Kochanski at Oxford, emphasized the long-established fact that humans have a great deal of difficulty with center-embedded structures. Fitch & Hauser interpreted their results to mean that human subjects could handle center-embedded structures easily, up to four levels of recursion (three levels are cited in the paper, and an additional level in the background material given on the Science website). As Greg observed, it seems much more likely that the human subjects were using some other method, unrelated to CFG parsing and perhaps not involving any overall grammatical analysis at all. There are plenty of candidates for techniques other than CFG parsing that would work in this particular case, one of which is the approach based on bigram statistics that I suggested.

Perruchet & Rey suggest that Fitch & Hauser's human subjects might have been using exactly such a strategy: "A parsimonious interpretation may be that human participants simply discriminated the cases where there was one female-to-male voice transition (AABB or AAABBB) from the cases in which there were two or three consecutive alternations (ABAB or ABABAB)." But Perruchet & Rey don't just suggest that a strategy other than CFG parsing was used -- they do an experiment to prove it.

You can read their paper for the details, as well as for much interesting discussion. Their experimental materials were almost identical to Fitch & Hauser's -- sets of spoken syllables in either a male or a female voice, arranged in patterns that either alternate -- (AB)ⁿ -- or nest -- AⁿBⁿ -- where "A" means "syllable spoken in a female voice" and "B" means "syllable spoken in a male voice", and n was either 2 or 3. Just as in Fitch & Hauser's experiment, the set of syllables used for the female voice was different from the set of syllables used for the male voice (though the sets used were slightly different, in order to accommodate the fact that the subjects were French): {ba di ro tu la mi no vu} for the female voice, and {sa li mo nu ka bi do gu} for the male voice.

The key difference was that in the nested case, the corresponding A's and B's were constrained to be paired in a fixed way, unlike in Fitch & Hauser's experiment, where no such constraint was imposed. Thus if the lists were matched in the order as given above, examples of grammatical center-embedded patterns would be minodobi -- because mi matches bi and no matches do -- and batuvugunusa -- because ba matches sa, tu matches nu and vu matches gu; whereas patterns such as minobido would be ungrammatical, since do is not the proper pairing for mi, and bi is not the grammatically proper pairing for no.

As you can see, this turns a trivial task ("is the string a sequence of high-pitched syllables followed by a sequence of low-pitched syllables?") into a rather difficult one ("are the high- and low- syllables matched according to the constraints of a context-free grammar?"). Unfortunately for Fitch & Hauser, the first task is so easy (for humans) partly because it can be solved using trivial heuristics that have nothing to do with context-free grammars, such as "is there more than one female-to-male transition in the sequence?" -- or simply "is the sequence one of the two (!) sentences in the language?"
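The gap between the two tasks can be made concrete with a minimal Python sketch. It assumes that the two syllable lists are paired position by position in the order given above (as the examples do) and that every syllable is exactly two letters long; the function names are illustrative and come from neither paper.

```python
FEMALE = "ba di ro tu la mi no vu".split()   # "A" syllables (female voice)
MALE = "sa li mo nu ka bi do gu".split()     # "B" syllables (male voice)
PAIR = dict(zip(FEMALE, MALE))               # assumed pairing: ba-sa, di-li, ...

def syllables(s):
    """Split a string into two-letter syllables."""
    return [s[i:i + 2] for i in range(0, len(s), 2)]

def transitions(s):
    """Count female-to-male voice transitions (the trivial heuristic)."""
    syl = syllables(s)
    return sum(1 for a, b in zip(syl, syl[1:]) if a in PAIR and b in MALE)

def has_anbn_shape(s):
    """Easy task: a run of A's followed by an equal-length run of B's."""
    syl = syllables(s)
    n = len(syl) // 2
    return (n > 0 and len(syl) == 2 * n
            and all(x in PAIR for x in syl[:n])
            and all(x in MALE for x in syl[n:]))

def is_center_embedded(s):
    """Hard task: each A must also match its B in inverted order."""
    syl = syllables(s)
    n = len(syl) // 2
    return has_anbn_shape(s) and all(
        PAIR[a] == b for a, b in zip(syl[:n], reversed(syl[n:])))

for item in ["minodobi", "batuvugunusa", "minobido", "misanobi"]:
    print(item, transitions(item), has_anbn_shape(item), is_center_embedded(item))
```

Note that minobido passes both the transition count and the shape check, and fails only the pairing check, which is the one test that needs anything like a stack.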

In P&R's experiment, the human subjects were not sensitive to the CFG-generated structure of the test language. They showed the same effect of "acoustic pattern" (i.e. one high-to-low transition versus multiple high-to-low transitions) as F&H's subjects did (83% correct vs. 85% correct), but on "grammaticality" they scored at chance. Furthermore, the "acoustic pattern" effect was stronger for longer strings than for shorter ones, which is the opposite of the effect predicted if the subjects had really been parsing the strings, as opposed to noting the number of high-to-low transitions.

Fitch & Hauser's grammars were far too easy to permit any general conclusions, because a valid acceptor could use a trivial heuristic, totally unrelated to any interesting general properties of the grammar types in question, and certainly without any relationship to the very broad interpretation that F&H give to the results. On the other hand, Perruchet & Rey's grammar may have been unnecessarily difficult. Each of their subjects was required to notice and learn an arbitrary pairing between two sets of 8 CV syllables. But a CFG of the AⁿBⁿ type only requires that the A's and B's in the string should match in inverted order, as P&R remind us, not that the matches involve some arbitrary mapping of terminal symbols. A much easier grammar to learn would be one in which the pairing (of left and right matching elements) is identity. Then e.g. batuvuvutuba and minonomi would be grammatical, while batuvuvubatu and minomino would not. There are still some problems here with simple non-CFG heuristics, especially if the strings are limited to n=2 and n=3, so human success at this task (if it were to be found!) would still need careful interpretation. I'm not sure what to predict about the outcome of such an experiment.
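As a rough illustration of the identity-pairing variant, here is a small Python sketch. It assumes two-letter syllables, and both function names are hypothetical. The second function is one example of a simple non-CFG heuristic (it only asks whether the two middle syllables are the same), included to show why success on short strings would need careful interpretation.

```python
def syllables(s):
    """Split a string into two-letter syllables."""
    return [s[i:i + 2] for i in range(0, len(s), 2)]

def is_mirror(s):
    """True if the second half repeats the first half in inverted order."""
    syl = syllables(s)
    n = len(syl) // 2
    return n > 0 and len(s) % 4 == 0 and syl[n:] == syl[:n][::-1]

def middle_repeats(s):
    """Local heuristic: are the two middle syllables identical?"""
    syl = syllables(s)
    n = len(syl) // 2
    return n > 0 and len(s) % 4 == 0 and syl[n - 1] == syl[n]

for item in ["batuvuvutuba", "minonomi", "batuvuvubatu", "minomino"]:
    print(item, is_mirror(item), middle_repeats(item))
```

On these four strings the local heuristic agrees with the mirror check except for batuvuvubatu, which it wrongly accepts; whether a test set contains such items determines whether the heuristic can be told apart from real matching.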

In any case, P&R put the ball firmly back in the court of anyone who wants to claim a relationship between the levels of the Chomsky hierarchy and the different propensities of humans and monkeys to notice things about sets of strings of spoken syllables.

Although P&R's experiment dealt only with human learning, and empirically challenged only the human half of the Fitch & Hauser paper, they also offer an interesting speculation about what might be going on with the monkeys:

... humans and monkeys were submitted to quite different tests. Students were asked to discriminate the strings consistent and inconsistent with regard to the sound pattern heard previously, and they presumably tuned their response criterion in order to share their responses roughly equally among "same" and "different". By contrast, Tamarins presumably turned towards the loudspeaker only if the sounds emitted by the loudspeaker were biologically significant. This difference deeply undermines a direct comparison between the performances of humans and Tamarins. But why did Tamarins turn towards the loudspeaker when they heard AAABBB after being familiarized with ABABAB, and not the reverse? Although we are limited to speculations, one hypothesis is the following. As any reader can check from listening to the sounds available on the Science web site, the AAABBB strings sound much more like natural human language than the succession of syllables alternately spelled out by the female and male voices that composed the ABABAB strings. This may explain why Tamarins selectively oriented towards the loudspeakers when they heard AAABBB after having been familiarized with the other structure. The reverse did not occur, possibly because the "novelty" introduced by ABABAB presented no potential interest (e.g., the new sounds could not cue the possible presence of humans).

This speculation seems plausible to me, though there's still an interesting question about what constitutes "sounding much more like natural human language" to Tamarins. As P&R suggest, this idea replaces a rather artificial task ("is this jabber similar to the jabber you heard before, or not?") with a more natural one ("is this jabber likely to indicate that humans are around, or not?"). The second task is one that the cotton-top tamarins would have had a lot of previous experience with, since they have been raised in captivity by human keepers, whose presence is likely to have been associated in the past with strong reinforcers, both positive and negative.

[Note: there is a typo worth correcting in the Perruchet & Rey paper, pointed out to me by Geoff Pullum: in the second line on p. 4, they write (ABⁿ) where they mean (AB)ⁿ.]

Posted by Mark Liberman at 10:35 AM

August 30, 2004

Incommensurability and indeterminacy

Many of the intellectual themes of the 20th century deal with barriers to understanding or failures of communication.

The various forms of the Sapir-Whorf hypothesis say that cultures, languages and individuals habitually frame the world differently, and may even express ideas that are fundamentally incommensurable. All the same, the practice of linguistic and cultural anthropology implied that an insightful observer can stand back and learn to understand the differences, and can even explain them successfully to the rest of us, as Whorf famously did with the Hopi conception of time.

Later in the century, there were some significantly more pessimistic views. Never mind, for now, the post-modern conviction that the whole notion of a world of objective facts is incoherent. I'm thinking of Willard Van Orman Quine, who belonged to a tradition completely antithetical to the post-modernists, but who argued in Word and Object (1960) for the indeterminacy of translation.

A 1995 paper by Nick Bostrom explains Quine's idea like this:

The thesis is that divergent translation manuals can be set up between natural languages such that they all are compatible with empirical facts but nevertheless diverge radically from each other in what sentences they prescribe as translations of sentences in the foreign language. Each manual works individually, but they cannot be used in alternation: the fusion of two of these manuals does not in general constitute a manual that is compatible with all empirical facts. The sentences (or anyway many of them) which the divergent manuals correlate to a foreign expression stand in no form of equivalence to each other, however loose.
[...]
The thesis of indeterminacy of translation is not that it is hard to find out what foreign sentences mean, or that the evidence available to us, finite beings as we are, is always incomplete. It is rather that there isn't anything there to be found: meanings, interlinguistic well-defined meanings, do not exist: there is no fact of the matter as to which meaning a foreign sentence has of the alternatives attributed to it by the rival manuals.

From Quine's writings one gathers that the thesis of indeterminacy of translation is a protest against the uncritical appeal to meanings and analyticity that characterised the logical positivists. Quine speaks of the notion of meaning as a stumbling-block cleared away. The indeterminacy thesis paves the way for Quine's philosophy of science and of mathematics, whose back bone is semantic holism[.]

On the face of it, indeterminacy of translation seems logically incompatible with Sapir-Whorf linguistic relativism. Quine says that there are many translation manuals that are equally valid but radically different -- but if two languages are even partly incommensurable, then there are no complete translation manuals at all. And if you can't say how sentences match up across languages, then how can you say that two languages "predispose [different] choices of interpretation"? However, in a vaguer sense, I think that indeterminacy and incommensurability are intellectual companions of a sort. Both theses reflect the view that cognitive structures reflect the structure of (individual and cultural) experience, and that such experiences can be very, very different.

This vaguer sort of connection may be the only one that we really have. Bostrom's abstract begins:

The state of the art as regards the thesis of indeterminacy of translation is as follows. Very much has been said about it, most of which is based on misunderstandings. No satisfactory formulation of the thesis has been presented. No good argument has been given in favour of the thesis. No good argument has been advanced against it.

I wouldn't go this far with respect to the Sapir-Whorf hypothesis, but I think it's fair to say that like chess or contract bridge, such ideas are better viewed as motivation for interesting interactions than as problems to be settled once and for all.

[As evidence that Nick Bostrom is a smart and clear-thinking person, read his article from Plus Magazine showing that "Cars in the next lane really do go faster." It has no direct connection with the topic of this post, but it introduces an important idea in a particularly clear way.]

[And I'll mention again that Robert Quine of the Voidoids was W. V. O. Quine's nephew, in case you missed it the first time.

Only time can write a song that's really really real
The most a man can do is say the way its playing feels

]

Posted by Mark Liberman at 07:59 AM

August 29, 2004

"There'd be a difference in his voice"

This morning, as I wrote about Nemesysco's claim to "detect 'Brain activity finger prints' using the voice as a 'medium' to the brain", I was reminded of something I read last night about a much older forensic application of voice analysis.

In chapter II of Dashiell Hammett's Red Harvest, the Continental Op is questioning the secretary of Donald Willsson, a newspaper editor who's been murdered. The secretary is described as "a small girl of nineteen or twenty". She describes what her boss did during his last afternoon at work.

"You called up -- if it was you he told to come to his house -- at about two o'clock. After that Mr. Donald dictated some letters, one to a paper mill, one to Senator Keefer about some changes in post office regulations, and -- Oh yes! He went out for about twenty minutes, a little before three. And before he went he wrote out a check."

They locate his check book, find the check stub, and establish that Willsson could easily have gotten to the bank and back in 20 minutes.

"Didn't anything else happen before he wrote out the check? Think. Any messages? Letters? Phone calls?"

"Let's see." She shut her eyes again. "He was dictating some mail and -- Oh, how stupid of me! He did have a phone call. He said: 'Yes, I can be there at ten, but I shall have to hurry away.' Then again he said: 'Very well, at ten.' That was all he said except, 'Yes, yes,' several times."

"Talking to a man or a woman?"

"I didn't know."

"Think. There'd be a difference in his voice."

She thought and said:

"Then it was a woman."

It turns out that she was right, of course. Willsson's phone partner was the memorable Dinah Brand.

Red Harvest was published in 1929. As far as I know, in the intervening 75 years, no one has ever thought to check experimentally what you can tell from one side of a phone conversation about the person on the other side. In particular, the implication of the passage I've quoted is that a listener should (sometimes?) be able to tell the sex of the other person from "a difference in [the] voice". This is a nice Socratic displacement of Holmesian deduction -- the secretary knows how to read the clues, even by reference to a day-old memory, but she has to be reminded of her knowledge.

I'm skeptical that such inferences are generally reliable, in fact. But this is something (else) you could do, when you're forced to listen to someone talking on their cell phone in a public place -- ask yourself whether you have any intuitions about the sex of the person on the other end of the line, just from the "voice" of the side you hear. If you're a bold person, you could check your perceptions by asking for the truth of the matter.

It would be easy enough to test various versions of Hammett's hypothesis in a more conventional sort of experiment.

Overall, it's striking how little we know about most non-lexical aspects of speech perception.

Posted by Mark Liberman at 10:29 PM

Determining whether a lottery ticket will win, 99.999992849% of the time

It's reassuring to see that David Beaver, while not occupied with his duties as a mascot, has been applying the power of logic to foil would-be terrorists. I myself, in my secret life as a young Israeli mathematician, have applied the power of statistical pattern recognition to develop something even more important, namely the "First ever PC voice analysis software that can detect Love!"

Here for the first time, I will reveal my method, a fundamental innovation that can be applied to nearly any problem. I've already revealed that this technique can determine with nearly 100% accuracy whether or not a passenger is planning to blow up an airplane, or whether or not someone who calls you on the phone is secretly in love with you. Believe it or not, my method can also be applied to determine whether a car's engine is about to break down, or whether a lottery ticket is going to win the grand prize or not.

While I can't claim the perfect results achievable using David's logical methods, I'm proud to be able to submit irrefutable proof that in detecting whether or not a Lotto 6/49 ticket will win the jackpot, my method will be correct exactly 99.999992849% of the time. For detecting airplane hijackers, the results are not quite so exact, but a maximum likelihood extrapolation from the past three years of experience suggests a success rate of approximately 99.99999894%. This is much better than the mere 98% success rate claimed for the Nemesysco product in the news article that David cited. The details of my method are given, in full, below.

Seriously, Amir Liberman of Nemesysco is not a secret identity of mine, nor even any relation to me, as far as I know. And I have no idea whatsoever how he and his colleagues achieve whatever results their products are able to achieve, whether in detecting love or terrorism. That's the whole problem with these systems, as far as I'm concerned -- because there is no implementable explanation of what they're doing, it's impossible to evaluate the underlying science. Nemesysco's explanation of its "Layer Voice Analysis" technology on its website is hopelessly vague -- that's fair enough on a website for the general public, but there are no pointers to papers or technical explanations. Their statement that LVA uses "wide range spectrum analysis and micro-changes in the speech waveform itself (not micro tremors!)" is about as helpful as saying that a cancer treatment uses "chemical compounds and natural plant extracts (not laetrile!)". They do say that "[t]he LVA uses a patented and unique technology to detect 'Brain activity finger prints' using the voice as a 'medium' to the brain and analyzes the complete emotional structure of your subject". If the technology is patented, there must be some degree of disclosure, though past experience leaves me without a lot of hope of finding implementable details.

Although I don't know anything beyond guesswork about what the Nemesysco system is doing, I was quite serious about the performance of my detection algorithm. Its secret is a simple one, which I'm happy to share with you: just figure out the commonest outcome, and guess that way every time. Most lottery tickets lose -- this website explains that in the Lotto 6/49 game in particular, the chance of picking the right set of six numbers is 1/13,983,816 ≈ 0.00000007151. Subtracting that quantity from 1 gives you the probability of a correct diagnosis if you guess "this ticket will not win the jackpot" on every occasion.

A similar line of reasoning applies to airplane hijacking by terrorists -- roughly 19 out of 1.8 billion U.S. domestic airline passengers over the past three years have been terrorists planning to hijack the plane they were boarding, so if you guess "this passenger is not a terrorist" every time, you'd have been right with probability 1 - (19/1,800,000,000). The future success rate for this algorithm is probably not enormously different -- there will continue to be about 600 million passengers boarding planes in the U.S. every year, and only a handful of hijackers.
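The arithmetic behind both percentages is easy to check. The following Python sketch uses only the figures quoted above (one winning combination out of all 6-from-49 picks, and 19 hijackers among roughly 1.8 billion passengers); the function name is made up for illustration.

```python
from math import comb

def majority_accuracy(positives, total):
    """Accuracy of always guessing the commoner (negative) outcome."""
    return 1 - positives / total

tickets = comb(49, 6)                              # ways to choose 6 numbers from 49
lotto = majority_accuracy(1, tickets)              # "this ticket will not win"
hijack = majority_accuracy(19, 1_800_000_000)      # "this passenger is not a terrorist"

print(tickets)
print(f"{lotto:.9%}")
print(f"{hijack:.8%}")
```

Run as written, this should reproduce the ticket count of 13,983,816 and the two success rates cited in this post.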

You can now guess how to apply this method to the love analysis test. Here the technique does need to be personalized, and the success rate will vary with the individual, though results approaching 100% correct are likely in many cases...

Needless to say, I'm not recommending that anyone use this method for any purpose at all. The point is that you have to be careful about interpreting percentages that are cited as indications of performance levels. There's a branch of statistics called signal detection theory devoted to analyzing decisions in the presence of uncertainty, and a basic understanding of its concepts -- like ROC curves and d' -- should be part of everyone's basic mathematical education.
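To see why a bare percentage says so little, here is a minimal Python sketch of the basic signal-detection quantities. The hijacking counts are the ones quoted above; the 0.98 and 0.02 rates for the second detector are invented purely for illustration and are not claims about any real product. The sensitivity index is computed in the standard way, as d' = z(hit rate) - z(false-alarm rate).

```python
from statistics import NormalDist

def rates(tp, fn, fp, tn):
    """Return (accuracy, hit rate, false-alarm rate) from confusion counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return accuracy, tp / (tp + fn), fp / (fp + tn)

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(H) - z(F); undefined when a rate is 0 or 1."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# "Always guess no" applied to the hijacking figures quoted above.
always_no = rates(tp=0, fn=19, fp=0, tn=1_800_000_000 - 19)
print(always_no)

# Hypothetical detector with made-up rates, for comparison only.
print(d_prime(0.98, 0.02))
```

The always-no guesser has near-perfect accuracy but a hit rate of zero, so it never detects anything; its d' cannot even be computed, since z is undefined at 0 and `inv_cdf` raises an error there.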

I do need to disagree respectfully with one aspect of David's post. He writes:

The insurance companies have nothing to lose, in the sense that they start off with no sensible way of telling whether claims are fraudulent, or which parts of claims are fraudulent. But they do know that if the claimants think that the insurance companies have a way of telling truth from fiction, then fewer fraudulent claims will be made or sustained. ... The insurance company has little interest, then, in whether Nemesysco's software really works. For what Nemesysco is really selling is a great patter.

What we have here, I believe, is a technologically updated new release of that psychologically sophisticated version of the Pinocchio effect that evil parents have been using on their innocent children for generations, an effect whereby one lie spawns another, and all in the cause of establishing a norm of honesty:

Mom: Have you washed your hands?
Kid: Yes, Mom.
Mom: I can see your nose growing...

Based on interactions over the years with people in different areas of fraud prevention and investigation, including in the insurance industry, I believe that there is some truth in this, but it's by no means the whole story. Investigations are expensive, and it's important to be able to use them efficiently. You need to decide, somehow, which cases to examine at what level of scrutiny. Insurance companies do have methods for flagging suspicious claims, and I'm fairly confident that these methods have some genuine diagnostic value. If voice analysis were actually able to detect attempted fraud, even to a modest degree that would be useless as courtroom evidence, it might still be quite valuable as part of the process for deciding which cases to investigate further. This value doesn't depend on any deterrent effect -- it's gravy if things work out that way -- but it does require that the algorithms have some real ability to discriminate valid from fraudulent claims.

How the currently available commercial voice-analysis systems rate on this question is not known to me. But as I wrote before, I think it's appalling that this industry (i.e. proprietary methods for speech-based lie detection) continues, decade after decade, to market products in a way that would probably result in criminal charges for a pharmaceutical company. After all, it's potentially just as damaging to (fail to) identify people as law enforcement suspects or insurance fraudsters as it is to (fail to) diagnose them with diseases. And it would not be hard to evaluate the science underlying the technology -- whatever it is -- and to apply standard testing methods -- double blind, independent evaluation -- to determine how well the technology works in particular applications.

Posted by Mark Liberman at 10:22 AM

BS conditional semantics and the Pinocchio effect

Wow! According to this World Tribune news bulletin, which cites a Middle East Newsline report, a new Israeli product can "determine with 98 percent accuracy whether a suspicious traveler has intentions to launch an attack during flight." And it's based on a voice analyzer in a chip small enough to fit in the frame of a pair of glasses, a chip which switches a light on for the user, in real time of course, to signal (dis-)honesty. The technology was developed by a mathematician by the name of Liberman (hmm, that name sounds familiar, was he the one who co-discovered contradiction contours?), and is manufactured by a company he founded called Nemesysco.

Where to start with a story like this? How about the 98% claim? I will now demonstrate for you a voice analysis system I designed earlier today which, while still at the beta test stage, appears to surpass the accuracy of the Israeli model by close to two percentage points.

I won't discuss the technology involved in your correspondent's system, save to reveal that the core technology is a famously impenetrable dictum of the logician Iksrat der Flard, who pioneered BS conditional semantics:

"Snow is white" is a lie iff the speaker knows damn well it ain't so.

Having thus established the base case, the remainder of the recursion falls out naturally, and we merely need to add the basic intonation recognition capability for detecting BS at the level of individual propositions. Fortunately, your browser has these capabilities built in, and it was a simple matter to engineer the system, which I now present.

For each of the following three sentences, read the sentence aloud in a normal speaking voice, and then click on the sentence to see the system's honesty analysis:

Isn't technology amazing? Just to check that the system really performs at close to 100%, I'm keeping a running tally of system successes and failures. Hit one of the following two buttons (preferably while reading aloud what it says) to indicate the accuracy of the system's analysis. Needless to say, I don't in any way depend on your honesty in providing this feedback...

So much for 98%. Let's move on to the technology.

For all I know, conventional polygraph technology probably indicates little more than that people who are tense often sweat. So the bar may be low in the lie detection industry. Nemesysco's systems may or may not beat standard polygraphs (although they are certainly less intrusive), and may or may not be useful. Independently of what the technology actually is, the company's approach to presenting its technology suggests their main goal is to pull the wool over the eyes of the public, terrorists and governmental or other customers alike. In the US, Nemesysco systems have previously been marketed with the slogan “The DNA of thought.” Ehh, pardon? Many recent media references to the company's products mention that the analysis performed by the systems requires "8000 algorithms." Gosh, what a lot of algorithms... but heck, from a mathematical perspective I use at least 8000 algorithms every time I flush the toilet. The company's website leaves us none the wiser as to what the technology is really about. It spouts all sorts of nonsense like the following:

The LVA uses a patented and unique technology to detect "Brain activity finger prints" using the voice as a "medium" to the brain and analyzes the complete emotional structure of your subject. Using wide range spectrum analysis and micro-changes in the speech waveform itself (not micro tremors!) we can learn about any anomaly in the brain activity, and furthermore, classify it accordingly. Stress ("fight or flight" paradigm) is only a small part of this emotional structure…

So what's the big deal here? What, for that matter, is the business model? Well, it may be simple. Nemesysco's biggest customers appear to be insurance companies. They buy Nemesysco's business solutions for assessing insurance claims. The insurance companies have nothing to lose, in the sense that they start off with no sensible way of telling whether claims are fraudulent, or which parts of claims are fraudulent. But they do know that if the claimants think that the insurance companies have a way of telling truth from fiction, then fewer fraudulent claims will be made or sustained. Indeed, Nemesysco's website says that they offer training courses for insurance claim processors in how to politely indicate to a claimant that the falsity of a claim has been discovered, in the hopes that the claim will then be dropped. The insurance company has little interest, then, in whether Nemesysco's software really works. For what Nemesysco is really selling is a great patter.

What we have here, I believe, is a technologically updated new release of that psychologically sophisticated version of the Pinocchio effect that evil parents have been using on their innocent children for generations, an effect whereby one lie spawns another, and all in the cause of establishing a norm of honesty:

Mom: Have you washed your hands?
Kid: Yes, Mom.
Mom: I can see your nose growing...

Posted by David Beaver at 04:52 AM

August 28, 2004

Urban legend eggcorns

It's always risky to claim that "no one ever really said that" (or "no one ever really wrote that"), but sometimes cute linguistic substitutions are too cute to be true.

On August 18th, a Press Association story appeared in the Guardian and elsewhere claiming that "[p]atients' lives are being put at risk because letters from hospital doctors are being sent to secretaries in India to be typed and returned to GPs with mistakes". Google News finds more than 20 follow-up stories. But Ray Girvan at Apothecary's Drawer Weblog points to discussion on sci.med.transcription, as well as web-search results, showing that the cited examples "were circulating in the medical transcription field long before Indian outsourcing was an issue": "phlebitis / flea-bite-us ('we don't even own a dog!'), baloney / below knee ('Transcribed by ward clerk: Baloney amputation') and acute angina / a cute angina ('I certainly hope he has a cute angina, because his face is uglier than hell')", and so forth.

In this case, as Ray observes, the motivation for the apparent fabrications is surely "xenophobia and local employment protection". But there are other examples of funny substitutions where there are only a few real eggcorns among the jokes and made-up stories -- perhaps sometimes none. One example is "self-defecating" for "self-deprecating", which plays a character-defining role in the movie "Kissing Jessica Stein".

At first, I thought that this was only an urban legend eggcorn, but of the 359 examples in Google's current index, I found a few apparent keepers:

(link) There is also a self-defecating sense of humor that finds it’s way into just about everything they do.
(link) It appears they always feel it is incumbent upon their critical prowess to offer sensationalize homogeneous opinions as if by proclamation which almost always has more to do with their flighty self-defecating language giving little credit to acting, writing, cinematography and inherent crafts.
(link) Self-defecating humor is essential aspect for a successful comedian to have, and it was refreshing to see that this comedian was a humble man as well.
(link) Another approach is to point out that even if my self-image is correct, I have the option to change it. I am also reminded of Nate's comment that I should be less self-defecating.
(link) I think there’s a fine line between having low self-esteem and – this is what I think our culture encourages us to do – to be not so assertive, to be modest and to a certain extent, self-defecating or more responsible for mistakes or problems in our lives, and we’re more introspective.
(link) Just stop with self-defecating comments, okay? Not only are they total conversation-stoppers, but if people hear them enough, they start to believe them.

Posted by Mark Liberman at 10:49 PM

Transmutation of wood chips at the BBC

This is a new low in BBC science and technology reporting. Actually, it may be a new low in science reporting in any medium above the level of supermarket tabloids. In an unsigned story dated August 24, BBC News tells us that a "new timber power plant" in Northern Ireland will "make high energy wood pellets from surplus sawdust and woodchips". The beauty part is that "[t]he pellets can be burned in industrial and domestic heating boilers without creating carbon dioxide, which causes global warming." As Ben Goldacre observed in the August 26 Guardian, "For lo, they have cracked the secret of alchemy, reworked the very structure of the atom, and converted long-chain molecules containing carbon into pure hydrogen. Why not gold?"

Never again will I complain about the Beeb's reporting on telepathic parrots, translation difficulty contests, artificially intelligent child guardians, and so on. Well, perhaps that's going too far. Whenever in the future I might say something a bit negative about their language-related reporting, I'll add that at least it doesn't involve transmutation.

I'm glad this didn't appear in a major American news outlet. That's not because of patriotic sensibility, but because I believe that the American educational system needs to be strengthened in areas related to language. If the New York Times or CBS News were routinely telling us about things like pellets made from wood chips that can be burned without creating CO2, I'd be hard pressed to argue that language is any worse off than any other area. In fact, though, I think that the extent and quality of instruction with respect to linguistics is now a great deal lower than in areas like physics, chemistry, biology and computer science.

[Link via Ray Girvan at the Apothecary's Drawer Weblog, who offers a charitable interpretation -- what the reporter (or one of the sources) really meant was that "carbon dioxide released when the wood is burnt is matched by that absorbed when the next crop is growing".]

Posted by Mark Liberman at 09:59 PM

Wohnata

Gregg Easterbrook has traditionally referred to the pro football franchise associated with our nation's capital as "the Potomac Drainage Basin Indigenous Persons", because "[b]oth ends of the official name bother me. As to the Washington part, the club practices in Virginia and performs in Maryland. And which of these, precisely, is Washington? But the real objection is, of course, to Redskins." See this post by Geoff Nunberg for some history and commentary on the linguistic and legal aspects of the name.

Now in his Tuesday Morning Quarterback column at nfl.com, Easterbrook suggests a more practical alternative: the Washington Wohnata. "Wohnata" is said to be Lakota Sioux for "they are champions", and comes from a list of suggestions supplied by Laura Redish, editor of the Native Languages of the Americas website.

Some may recall that Easterbrook has had his own problems with ethnically sensitive language.

Posted by Mark Liberman at 01:10 PM

August 27, 2004

Still on the eggcorn beet

Things have been busy here in Eggcornia (or, as some would have it, Eggcornea). Today's offerings: a thought-provoking class exercise from Larry Horn, which leads me to a discussion of "hidden eggcorns" and of classical malapropisms; a soc.motss thread on free reign (and related expressions) that ranges widely over eggcorn issues; another soc.motss exchange on marshall law; a long ADS-L thread on the verbs home and hone; an e-mail message with a catalogue of some of the great eggcorns of all time and a reference to education; and five more sets of (putative) eggcorns from ADS-L and e-mail.

1. Classroom fun from Larry Horn. Five examples of cross-language eggcorns:

We discussed in class the reanalysis of ad hominem (orig., 'to the person', i.e. directed at the person rather than the idea s/he supports) as revealed by ad feminam. Borrowed expressions typically lose transparency (for obvious reasons) and undergo reformation or semantic shift, often along folk-etymological lines. The reanalysis may be indicated by the context (as above), the spelling, or both. For 4 of the following 5 expressions in the contexts provided, give the original version of each borrowing, and explain the nature of the reanalysis. (HINT: the 5 expressions are based on loans from 4 different but closely related languages.)

a) bonified ('genuine, authentic'), as in:

"a bonified psychopath","a real life bonified issue", "a bonified replica", or "a bonified, official, you-better-buy-from-me Girl Scout cookie salesperson"

b) mano a mano, as in:

"Both shows [Prime Time Live and 20/20] had been going mano-a-mano, or rather womano-a-womano, competing for the same stories and interviews. (Surely you recall all those colorful Diane Sawyer-vs.-Barbara Walters tales.)" [Newsday, 1/24/01]

c) power mower, as in:

"Meanwhile, Richard Parker Bowles, brother of Camilla's ex-husband, Andrew, said that from the beginning Camilla approved of Charles marrying Diana while she remained his power mower. [Richmond, VA Times-Dispatch, Jan. 1995]

d) pre-Madonna, as in:

"[boxer Leila] Ali actually feels that [fellow boxer Christy] Martin is showing signs of fear. Ali describes Martin as a real pre-Madonna. According to Ali, Martin hired her own media people..."

[from a review of a San Francisco production of the musical "Chicago"] "Bianca Marroquin, a real pre-Madonna, boasts an almost innocent tawdriness and brings a refreshing gamine quality to Roxie Hart's need for fame."

e) (social) moray, as in:

"Smoldering passion? Bored promiscuity? Murder in an abortive duel? Love confounded by the conventions of social decorum? 'Eugene Onegin' has it all. Peter Ilyich Tchaikovsky's operatic rendering of Alexander Pushkin's tale of aristocratic Russia, which opens Saturday at the Lyric Theatre, makes its case with a rare blend of restraint and the composer's typically over-the-top melodism... Lensky's rage over Onegin's flirtation with Olga (who is supposed to be his girlfriend) might seem a bit exaggerated today - until we stop to think that in Orthodox Russia, flirting was tantamount to fornication. 'Emotions are universal and timeless, but social morays are specific to a time and place. A lingering hand-kiss was the first step toward becoming engaged, whereas today we think nothing of embracing a total stranger.' "

"This is why, in general, men hate to dance: there are myriad social morays that guys need to concentrate on while strutting their stuff, the most important of which is the avoidance of any and all male-to-male body orientation while on the dance floor."

[Re directors Matt Stone and Trey Parker of "South Park: Bigger, Longer & Uncut":] "Whereas the two creators of the television series by the same name have had to restrain themselves on Comedy Central, on the big screen they seem to have looked for every social moray available and farted on it. And it's hilarious."

" 'It's become almost a social moray' to marry someone close to your own age, she says."

Larry notes in e-mail to me:

One of my favorite ones not included here is "can't get untracked" (e.g. a baseball player or team in a slump), which appears to have derived from a reanalysis of "can't get on track" (of a train car). Of course being "untracked" would be a Good Thing, precisely the way being on track is for the train.

2. Hidden eggcorns. Most of the eggcorns we've been collecting show up in spelling. (The really obvious ones show up in pronunciation as well.) The writing system for English provides separate (historical) spellings for a great many homophones -- rain, reign, and rein, for instance, to look ahead just a bit -- and so lets us see reanalyses involving these lexical items.

But there are huge numbers of homophones that are also homographs: pen 'writing implement', pen 'enclosure for animals', and pen 'penitentiary', to choose a textbook example. If someone reanalyzes an expression involving one of these lexical items -- say, by conceptualizing State Pen as involving the 'enclosure for animals' word or by thinking that "The pen is mightier than the sword" is about prisons -- most of the time we won't have any evidence of this. As Larry Horn points out, sometimes context will suggest that a reanalysis has taken place. And sometimes people will actually tell you what word they had in mind. But most of the time there will be no way to tell. There probably are vast numbers of hidden eggcorns out there in English; we just don't detect them.

Things are different in languages with other sorts of writing systems. Mandarin eggcorns will be even easier to detect than English ones. Finnish or Turkish eggcorns will almost all be hidden. An eggcorn hunt in Helsinki or Istanbul will be a tough undertaking indeed, while one in Beijing will be a breeze.

3. Classical malapropisms. Eggcorns are a species of classical malapropism (CM, distinct from the inadvertent "Fay/Cutler malapropism", or FM) originating in reanalysis. In fact, a fair number of the classical malapropisms in my collection from the 70s (reported on in Language Sciences (1979) and in Obler & Menn, Exceptional Language and Linguistics (1982)) were eggcorns; my all-time favorite among them is cholesterol > cholester oil. (No, the -ol of cholesterol has nothing to do, historically, with oil.)

At the time, I saw CMs as arising primarily from two sources: "frozen slips of the ear" (SEs), that is, mishearings that were incorporated into the hearer's mental lexicon; and "frozen tip-of-the-tongue approximations", that is, TOT guesses that were incorporated into the speaker's mental lexicon. In fact, when I looked at the relationships involved in CMs, they were somewhat similar to the relationships involved in SEs and TOT approximations, but strikingly different from the relationships involved in FMs. That is, though in both a CM and a FM someone produces "the wrong word", the mechanisms at work seem to have little in common. If you ask "What happened?", the answer for a CM requires a trip back in the history of the speaker, but for a FM the answer is all about the moment of speech, here and now. (And then, of course, CMs can spread through the population, so that there come to be lots of people whose only error is in not knowing that their speech community is to some degree non-standard.)

I now think that I grossly underestimated the eggcorn portion of the CM inventory. Reanalysis is a third source of CMs, quite possibly the most important source. Evaluating that hypothesis will not be easy, though; it will take a lot more than the collection of fortuitous examples, fun though that may be.

4. Free reign on soc.motss. It started innocently enough on 8/24/04. Robert Coren checked out my last eggcorn posting on LL and inquired:

Interesting. I'm afraid I've forgotten (or never knew) exactly what an eggcorn is, although the examples are suggestive. Would "free reign" (which I was somewhat horrified to see in the introductory wall essay of a show at the Boston Museum of Fine Arts) count?

Before I could step in to say, well, yes, Michael Wharton (on 8/25/04) produced a spirited defense of free reign:

"Reign" accords with my understanding of the usage. What would the alternative spelling be, "free rain", meaning a situation in which precipitation is not charged for? The word spelled "reign" means dominance, control, or is a term for the leather straps used to control a horse. In that way, "free reign" seems perfectly appropriate - to allow the artist to paint, free of the controls (reigns) that might otherwise limit his creativity. I don't think the museum has made any mistake whatsoever.

In response to which Ken Rudolph sympathetically noted, "I think you've just illustrated how easy it is to make an eggcorn", and Mike McKinley, much less sympathetically, accused Wharton of being illiterate (a point I'll get to below). And I observed (in my uppercase-spare style) that this one had made the usage dictionaries:

MWDEU ("rein, reign") reports "in full reign" and "turn over the reigns", and from their own files, "free reign" from Harper's Bazaar in 1981 and People in 1987, plus "take the reigns" from TV Guide in 1986.

as MWDEU puts it, the word "rein" "has been driven into relative obscurity by the automobile" and is now known mostly to horse people. "reign", however, is still in general use, and it conveys the sense of control that these various originally horsey expressions have.

And Lee Rudolph (no relation to Ken) takes rein(s) > reign(s) back a bit further than MWDEU: "The standard edition of Yeats's _Collected Poems_ has Willie writing (of the Roman Empire) that it 'dropped the reigns of peace and war'."

At this point we veer into (inadvertent) typo vs. (advertent) thinko territory. Robert Coren (still on 8/25/04, a busy day in Eggcornia) distinguishes the two examples:

I think (on no authority other than I think makes sense) that these are two different things. "Take the reigns" (or anything that uses the plural, probably) looks to me like a simple misspelling/homophone confusion -- i.e., the horse-control sense is still there -- whereas I suspect "free reign" is in fact a perfect example of what MWDEU says.

and returns to the topic on 8/27/04:

I was thinking about this question again today when "reign" reared its head again, this time in a movie review in the _Boston Globe_, where a director was praised for "keeping an impressive reign on his young actors". I'm sorry, to me this is not the result of "reanalysis", conscious or unconscious; it's plumb carelessness or ignorance or both.

Now, nobody's denying that ignorance of a sort plays a role in eggcorn development. To get the whole thing going, you have to be unaware that (historically, and from the point of view of most English speakers) some expression involves one lexical item rather than another (homophonous with, or very close to, the first) and so requires one spelling rather than another. That allows you to identify lexical items in a way that makes sense to you, and to use the corresponding spelling. After that, you'll be inclined to go with your spelling and to disregard other people's spellings; after all, everybody knows that English spelling is weird and lots of people make mistakes!

The point is that you'll be consistent in your misspelling (and be inclined to defend it). If you were just pulling up homophone spellings on the spot, you'd show variability, possibly also indecision, and you'd be willing to correct the spelling.

As I said in my earlier posting, X > Y might be an inadvertent error (the orthographic counterpart to a Fay/Cutler malapropism) in some cases but an eggcorn (a classical malapropism) in others. On to one such example.

5. Marshall law on soc.motss. In my earlier posting, I cited martial law > marshall law, from the writing of David Fenton on soc.motss. On 8/25/04, Fenton posted to say that his error was inadvertent:

You use my "marshall law" example, to my embarrassment, because at a conscious level, I know perfectly well what's correct, but in quick typing, the brain grabbed a homonym that was incorrect. Some of the homonyms my brain has grabbed in similar situations have been far more ridiculously wrong (though I can't call to mind any particular ones), and I've caught many such during the typing/editing process.

Fair enough. This occurrence of marshall law gets kicked out of the eggcorn corpus. However, it looks like there are some real occurrences, though it will take some work to verify this. Googling on "marshall law" produces a huge number of hits, but most of them are irrelevant; there was a tv program called "Marshall Law" and there are some men named Marshall Law and Thurgood Marshall has law libraries and the like named after him, and so on. There do seem to be some people who think the expression involves the word "marshall", though.

6. hone > home and hypercorrection. On 8/25/04 Herb Stahlke approached the wise folk of ADS-L with a query:

Using "hone" for "home" in expressions like "home in on" has been common for at least two decades. The MWDEU's earliest citation is from George H. W. Bush in 1978, so it must have been around a good bit before that. Today, however, I came across my first instance of "home" for "hone", in the sense of "sharpen". Associated Press reporter Chris Duncan, in a story picked up by the Ball State Daily News, writes about Olympic beach volleyball winners Walsh and May:

"Questions about the pair's Olympic chances arose in June, shortly after May pulled an abdominal muscle. She spent most of the summer rehabbing while Walsh kept homing her game with other partners."

Is this a nonce instance, or are "home" and "hone" trading places?

Not, we eventually decided, trading places. It looks like the hone > home shift is a reversal of home > hone, in contexts in which sharpening one's skills is relevant. So, a typo or a hypercorrection. Larry Horn Googled up a modest number of "homing my skills" examples and concluded that the shift is unlikely to be (always) a typo, so we're inclined to accept my proposal that it's a hypercorrection (note "error" piled on "error"), involving a writer who kept getting flak about hone in on and became suspicious of all occurrences of hone.

Meanwhile, back at the home > hone ranch, Horn came across the BBC's Skillwise Glossary, which attempts to explain the meaning of difficult words on the BBC website, including:

hone in

Verb

To focus on. (phrasal verb)

Example: The detectives honed in on the suspect.

And Doug Wilson said that he'd written to CNN in 2001, when this advice appeared in some kind of "improve your English" piece on their website. Looks like we're way past the eggcorn stage here.

7. Eggcorns and the state of education. Today, from a British ex-pat in South Africa (blogging under the name "Pom du Cap"), comes a list of all-star eggcorns: bated breath > baited breath, toe the line > tow the line, to the manner born > to the manor born, poring over a document > pouring over a document, exorcizing demons > exercising demons. (The pair metal/mettle is also in there, but the history is so convoluted that I scarcely know which spelling is to be taken as standard; for what it's worth, it's test your mettle for me, though the line-up of examples in e-mail to me suggests that it's test your metal for Pom du Cap.)

Along with the list there's a passing reference to "worsening education" in South Africa (and the U.K. and the U.S.), to which we might attribute the frequency of these misspellings. The unspoken assumption is that eggcorns are more frequent now than they used to be. I know of no evidence for this assertion and have plenty of reason to be dubious of all outcries that Standards Are Declining.

Still, we wouldn't want to deny that some eggcorns, like innovations in general, spread and eventually become dominant variants. Language changes.

Is this a Ride to Illiteracy, as many commentators on eggcorns (and other innovations) suggest? Well, if you insist that any failure to master the complete set of oddities of English spelling is evidence of illiteracy, then yes. Me, I'm inclined to think that such small-scale attempts to improve English spelling, to make it just a little bit more rational, are admirable, and I'm not distressed that in the process some information about the history of the language is no longer evident on the page. (I've gotten over my dismay that the eye in window is no longer visible.) In any case, who made spelling the God of Language?

8. Recent entries in the eggcorn steaks. Enough of this ranting; back to data.

8.1. wheel barrow > wheel barrel. Long discussion of this (very common) development on ADS-L, 8/11-12/04, with at least 11 contributors. Much of it was taken up with the relative contributions of (a) the existing, and common, word barrel and (b) the vocalization of [l] that would make barrel and barrow homophonous, or nearly so. In my opinion, the high point of the exchange came when Rachel Henderson posted the following:

By the way....my 15 year-old son just walked in the room, so I asked him how to spell wheelbarrow. He spelled it correctly...and chuckled when I told him that I had to do a double-take on the word (it's been a long time since I had pondered the spelling), and he professed immediately that he thinks the spelling was changed from -barrel to -barrow at some point in recent (100) years because of southern accents...

8.2. weigh anchor > way anchor and other weigh/way examples, career/careen. Bill Findlay e-mailed me from the U.K. yesterday with:

When they left with invitations to visit their village on another island, (they only stay on this one for a couple of months a year to gather coconuts) we started the engine and proceeded to way anchor. http://www.geocities.com/willix/October02.html

This would fit right along with the very common anchors aweigh! > anchors away!. Non-sailors are none too sure of their nautical vocabulary.

Findlay notes that under way > under weigh is acceptable in sailing circles, a usage sanctioned in some dictionaries (AHD4) but described as a "corruption" in others (the 1913 Webster's Revised Unabridged Dictionary (Merriam-Webster), which nevertheless provides the estimable citations below).

An expedition was got under weigh from New York. (Thackeray)

The Athenians . . . hurried on board and with considerable difficulty got under weigh. (Jowett's translation of Thucydides).

For a more recent sighting of under weigh, here's an excerpt from "Getting Under Weigh" by Allen Brill, Open Source Politics 9/02/03:

A Navy veteran like Kerry knows that "getting under weigh" takes place when a ship's anchors are hoisted and she begins to move forward under sail or power. He and his supporters must hope that the Yorktown was only a backdrop for his campaign's official start and not a metaphor for its progress from this point forward -- for the tourist attraction Yorktown will never be "getting under weigh" again. http://www.ospolitics.org/usa/archives/2003/09/02/getting_un.php

In addition, Findlay asked, "is 'career' > 'careen' an eggcorn? (I guess that car careering around the left turn might well have careened)". I replied that MWDEU has a long entry on "careen, career"; some eggcorning was certainly involved historically, but now they seem simply to overlap significantly for most people. Findlay's impression was that in the U.K. "careen is still a specialised boating term, except among kids who say 'like' and 'whatever'." Maybe so. Somebody want to check it out?

8.3. learning by rote > learning by route. Blogger Jonathan Mayhew reported this one on 8/25/04. Well, rote is an uncommon and specialized word, and learning by rote involves following a plan, that is, a (metaphorical) path. Mayhew notes that "When you google these phrases you do come up with cases in which the phrase is actually meant as 'route,' but sometimes it is clear that the writer meant 'learning by rote.' "

8.4. pedagogical > pedilogical. Michael Quinion noted this one on ADS-L yesterday, with 14 Google examples. Larry Horn suggested that this was a reshaping much like the famous nuclear > nucular, with a common terminal element pre-empting a rare one; Doug Wilson followed up with: "Quick search in MW3 shows 446 words ending in '-logical', but only 9 in '-gogical' (most of these new to me)."

8.5. pidgin English > pigeon English, gold standard > goal standard. These from John Broughton in e-mail today. The first is an old standard among eggcorniasts; Broughton notes 2,930 Google hits, which strikes me as tiny (but then I teach about pidgins and creoles sometimes, so I probably overestimate the number of occasions for people to talk about pidgin English).

The second was new to me, but isn't entirely surprising, given the likelihood of final t/d deletion after a consonant and before a word beginning with an obstruent. (I suppose it would be too much to hope for that the shift would go on for one more step, with vocalization of [l]: gold > goal > go.) Here's a cite supplied by Broughton: "'Platinum' [membership level] would become the goal standard for dealer participation in co-op's programs" (http://www.findarticles.com/p/articles/mi_m0VCW/is_22_25/ai_58381394)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:47 PM

On counting and throwing

For all the lively discussion set off by his forthcoming article in Science ("Numerical Cognition Without Words: Evidence from Amazonia", published online 19 August 2004), Peter Gordon deserves the thanks of everyone interested in human language, thought and culture. A bit of the discussion took place here on Language Log -- I suggested an analogy to a different skill, and Dan Everett sent a fascinating note reflecting on the issues from the perspective of his 27 years of working with the Pirahãs.

Earlier today, Peter sent a response, which I'm happy to be able to present below.

Don't believe everything you read in the press. If you read the Daily Telegraph you will learn that I have been married to Dan Everett's wife Keren for the last 20 years! Also, the glib headlines screaming out "Whorf was right!!!!!" are less subtle than the primary source. In the Science article, I first ask the question of whether there are concepts that you cannot entertain as a consequence of the language that you speak. I then allude to Whorfian theory at this point -- sorry Sapir, sorry Boas. This is actually the only place where I mention Whorf in the paper, and I do not finish with some final crescendo that "Whorf was right!" So, I distinguish between "weak determinism" and "strong determinism", which is basically a distinction that derives from John Lucy and that he essentially gets from Brown and Lenneberg. The strong determinism question is 1. whether languages can be incommensurable (i.e., possess concepts that are not intertranslatable), and 2. whether such incommensurability can actually prevent you from entertaining such concepts. B&L suggested that the latter cases would not exist because all languages can express the same range of concepts, only some do it more efficiently. Apparently even Whorf believed this, so perhaps my results are not supportive of Whorf in the end.

Mark Liberman asks basically whether it is language or practice that is at stake here with his imagined example of the non-throwing culture. Well, this question always comes up in some form or another -- usually just "how do you know that it isn't just because they don't engage in counting that leaves them without number concepts?" And here's how I think about this issue. First, as I say in the article, one has to get a handle on what counts as an interesting case. For example, the fact that the Piraha have no concept of quark or molecule is not going to be an interesting case of determinism. How do we draw the line? Well, basically, if someone didn't know what a quark was, we would not question their command of English, just their scientific knowledge. On the other hand, if you ask someone to give you 4 sticks, and they say "Uh what does "four" mean?" then you would have some serious misgivings about their command of English or the intactness of their parietal lobes.

So, let's imagine another Libermanesque culture (invent your own name). It turns out that they make no distinction in their language between definite and indefinite reference. So, we do a bunch of experiments that show that they cannot get their minds around this distinction either. The skeptic then replies: "Well, maybe it has nothing to do with not having words like "the" and "a", but that they just don't engage in making distinctions between definite and indefinite reference, and that is where the causal structure lies, not in the failure of the language to engage in such distinctions." It seems to me that this is a pretty dumb argument because distinctions between definite and indefinite reference are inextricably entwined with language, and so to attempt to separate language and use is pretty meaningless. No one claims that it is just the sounds of the words that give you conceptual distinctions, it is their meanings and how those meanings fit into culturally defined conceptual systems of interconnected knowledge.

Where does number fit into the continuum between definiteness and quarks? I think it is closer to definiteness, because the practice of counting is inextricably entwined with the words (or signs) we use for number. Research in the development of number ability suggests that we are born with the ability to exactly perceive and represent 1 to 3 elements in memory without counting, and that we can approximate larger numbers. This is precisely what you see in the Piraha (sorry I can't generate a tilde on this crappy computer in this trading post in the wilds of Maine where I am right now). The thing that bootstraps you beyond the small-number exact enumeration, into the realm of 4, 5 and to infinity and beyond, is the language of number. There is no way to do this (at least within the natural bounds of human experience) that does not involve some symbolic representation of exact quantities.

If we now take Liberman's example of "throwing", the parallels break down. Sure, you might question someone's knowledge of English if they didn't know the word "throw", but I must confess, that I do not know the technical difference between a "lob" and a "toss" and a "hurl" -- maybe the latter is a bit faster? It's a bit like knowing that Elms are trees, but I would not bet more than 10 cents on my ability to identify one. It seems to me that you could develop a very cognitively complex representation of throwing distinctions by engaging in this act without using language. For example, baseball batters develop the ability to predict how a pitch is going to come by the configuration of the pitcher as the ball leaves his hand. There is no vocabulary for this, but it is something that baseball batters develop. It's also possible that having words for different kinds of throws could (contrary to my own experience) engender some categorical perception for different kinds of throws (move the arm below 20 degrees and it's a "lob", above 20 degrees and it's a "toss"). So, in this case, the language might be crucial. But this is all an empirical question -- it's why we do experiments.

My claim then is that because language is so intimately tied to counting, it basically makes no sense to ask whether it's language or counting that is important in acquiring exact numerical abilities. Personally, I think that Whorf was wrong about many things he said. I also think that the Piraha number case is just an existence proof for incommensurability and, in the absence of further empirical inquiry, should not be generalized beyond this case.

[email from Peter Gordon to Mark Liberman, 8/27/2004, for posting on Language Log]

Posted by Mark Liberman at 01:32 PM

August 26, 2004

A letter from the Lord Quirk

I had a letter from Lord Quirk yesterday. He had read my post on database errors that get me addressed as "Dear Dr Geoff" instead of "Dear Dr Pullum". I discussed some of the syntax of titles more generally, though certainly not exhaustively, and I used him as an example. He's a useful example because he was born Randolph Quirk and then worked his way upward through the ranks, bearing the official designations Mr Quirk, Dr Quirk, Professor Quirk, Vice Chancellor Quirk, Sir Randolph, and finally Lord Quirk of Bloomsbury. His University College London notepaper says, correctly, "The Lord Quirk FBA". (Oh, I forgot to mention one of his honors, he's a Fellow of the British Academy, too.) He wrote to me to share a little part of the treasure trove of maladdressed mail he has picked up since his elevation to the peerage. Much of it is very funny. And one item is truly extraordinary.

The Forum Hotel wrote to him and began the letter "Dear Quirk", which is an odd error (it's like "Dear Pullum"). A letter from England was addressed to "Lord Randle of Quirk" (that one must have been dictated over a bad cell phone connection). Woodstock Furniture sent him a form letter in which I think the botch must have been caused by reserving the first twenty characters for titles like "Professor" and then following with the given name; since they took "Lord Quirk of Bloomsbury" to be his title, and it is 24 characters long, what emerged was "Dear Lord Quirk of Bloomsrandolph Quirk".
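
For the programmers in the audience, here is a minimal Python sketch of how such a botch could arise. Everything in it is an assumption made for illustration: the 20-character title field is just my guess from above, and the function name `salutation`, the record layout, and the whitespace-collapsing step are hypothetical, not anything known about Woodstock Furniture's software. The sketch also makes no attempt to explain the lowercase "r" in "randolph".

```python
TITLE_WIDTH = 20  # assumed fixed width of the title field


def salutation(title: str, name: str) -> str:
    """Build a greeting from a fixed-width title field followed by the name."""
    # Pad short titles with spaces; silently cut long ones at TITLE_WIDTH.
    title_field = title.ljust(TITLE_WIDTH)[:TITLE_WIDTH]
    record = title_field + name
    # Collapse runs of whitespace, which hides the padding after short titles
    # but cannot restore a separator that was never there.
    return "Dear " + " ".join(record.split())


print(salutation("Professor", "Randolph Quirk"))
print(salutation("Lord Quirk of Bloomsbury", "Randolph Quirk"))
```

With a short title like "Professor" the padding disappears and the greeting looks fine. With the 24-character "Lord Quirk of Bloomsbury", the last four characters are cut off and the truncated title runs straight into the given name.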

But my absolute favorite came to him on the notepaper of a government department known as DEFRA. It was addressed by a government minister to a wide range of figures in the two houses of parliament, including influential members of the House of Lords, many of whom were very upset at the Labour government's plans to ban fox-hunting. The minister, the Right Honorable Alun Michael MP, began his attempt to influence Lord Quirk's opinion on this important issue in the following way:

DEFRA
Department for Environment, Food & Rural Affairs
Nobel House
17 Smith Square
London SW1P 3JR

Private Office: 020 7238 5379
Fax: 020 7238 5867
Switchboard: 020 7238 6000
Email: alunmichael@defra.gsi.gov.uk

From the Minister for Rural Affairs
Rt Hon Alun Michael MP

10 April 2002

The Lord Quirk
House of Lords
London
SW1A 0PW

Dear The,

HUNTING WITH DOGS

In a statement to the House of Commons on Thursday 21 March, I set out the way in which I intend to proceed on the contentious issue of hunting with hounds in England and Wales...

All those honors, and you end up as "Dear The"! And in a letter from a person holding ministerial office within the legislative assembly in which you also serve! Priceless.

But of course I would not blog this if the point were merely a giggle. No, there is a serious linguistic angle, so let me just state it briefly. In slogan form: The days of "Last name, First name, Middle initial" must end! Let me explain.

Note first that names, forms of address, titles, addresses, and other such data have syntactic structure which is actually quite complex (for just a small sketch of some of that structure, see The Cambridge Grammar of the English Language, pp. 518-520). And note second that today memory resources (both disk space and RAM) are essentially unlimited and absurdly cheap. Yet what the software engineers give us as database programs is old-fashioned junk of 1960-ish design driven by two incongruously inapplicable false assumptions: first, that all data is to be conceived in terms of character strings of fixed lengths and a small number of simple types (First Name, Last Name, SSN#, etc.), and second, that space is tightly limited and expensive.

We need vastly more sophisticated database software for databases of names and addresses. It must be structure-sensitive, it must allow for all sorts of titles and arbitrarily varying lengths of different components, it must cover a large array of naming systems, prefixes, abbreviations, and appellations. And let me throw this in too: it must allow for a solution to the problem of duplicate entries by permitting recognition of almost-identical entries: surely the Geoffrey K. Pullum and the G. K. Pullum who are both at Stevenson College, University of California, Santa Cruz, could be spotted by some algorithm as likely duplicate entries in a mailing list so I don't get two copies of every mailing of junk.
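
To show that the duplicate-spotting part is not much to ask, here is a rough Python sketch. It is only an illustration under loud assumptions: the function names (`parts`, `compatible`, `likely_same_person`, `likely_duplicates`) are invented for this example, entries are assumed to be simple (name, address) pairs, and the matching rule (same surname, same number of name parts, each given name either identical or matching an initial) assumes Western given-name-first order, which is exactly the kind of simplification a serious system would have to outgrow.

```python
import re


def parts(name: str) -> list[str]:
    """Split a name into lowercase tokens, dropping periods and spaces."""
    return [t.lower() for t in re.split(r"[\s.]+", name) if t]


def compatible(a: str, b: str) -> bool:
    """True if the tokens are equal or one is an initial of the other."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def likely_same_person(name1: str, name2: str) -> bool:
    """Heuristic: same surname, and given names agree up to abbreviation."""
    p1, p2 = parts(name1), parts(name2)
    if not p1 or not p2 or len(p1) != len(p2) or p1[-1] != p2[-1]:
        return False
    return all(compatible(a, b) for a, b in zip(p1[:-1], p2[:-1]))


def likely_duplicates(entries: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return pairs of names that share an address and look like one person."""
    pairs = []
    for i, (n1, a1) in enumerate(entries):
        for n2, a2 in entries[i + 1:]:
            if a1.lower() == a2.lower() and likely_same_person(n1, n2):
                pairs.append((n1, n2))
    return pairs


mailing_list = [
    ("Geoffrey K. Pullum", "Stevenson College, University of California, Santa Cruz"),
    ("G. K. Pullum", "Stevenson College, University of California, Santa Cruz"),
]
print(likely_duplicates(mailing_list))
```

The sketch flags the two Pullum entries as a likely pair and leaves the decision about merging them to a human, which seems the safer design for a heuristic this crude.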

We must have better software. Don't just take my word for it: I am supported in this call by my friend Lord Quirk of Bloomsrandolph Quirk (with whom I am on first-name terms, of course; he calls me "Geoff", I call him "The").

[Revised a bit on 08/27/04.]

Posted by Geoffrey K. Pullum at 08:25 PM

The Straight Ones: Dan Everett on the Pirahã

I was co-editor of the volume in which the first full description of the Pirahã language appeared (Desmond C. Derbyshire and Geoffrey K. Pullum, eds., Handbook of Amazonian Languages, Volume 1, Mouton de Gruyter, Berlin, 1986). Dan Everett's 200-page chapter on Pirahã is a highlight of the volume. (The Wikipedia article on the language is currently an unedited mess from which you can't even figure out the phoneme list, so I won't link to it; don't go there.) Dan is now a distinguished specialist on Amazonian languages and professor of phonetics and phonology at the fine Department of Linguistics at the University of Manchester in England. I know him well and respect him greatly. And I thought he might like to respond to some recent suggestions to the effect that Pirahã is just too strange to be true. So in this long post I include a statement that he supplied at my invitation.

It was Peter Gordon's recent publication in Science (here, if you subscribe) that led to the long overdue recent discussion of Pirahã language and culture in various forums. The focus has mainly been on their innumeracy and its possible linguistic roots. Now, I should point out that I actually believe Dan may be overstating things in saying (as he does in a recent paper) that they are the only human group ever to have been found to have no numeral system; my understanding is that many Australian aboriginal languages, Warlpiri being one example, have no native number vocabulary; the speakers can and do learn to count, and simply borrow the number system of English in order to do so. That alone suggests there is little plausibility to the totally Whorfian spin that The Economist puts on Pirahã innumeracy ("At least in the field of maths, it seems, Whorf was right"). In what I've learned, I see no support for the usual vulgar Whorfian claim that everyone seems so besotted with (that your language determines the thoughts you can have); Mark Liberman makes the counterargument beautifully with his imaginary no-throwing culture, and as you'll see below, Dan agrees with Mark.

But Whorfian spin aside, the Pirahã language itself has many fascinatingly unusual features. We owe most of what we know about it to Dan and Keren Everett. In an email to Mark Liberman (quoted here), the entangledbank author appeared, in a damning-with-faint-praise sort of way, to be really skeptical about Dan Everett's work on Pirahã: "None of it reads as obviously loony, but I have to wonder whether he's some Borgesian fantasist, or some Margaret Mead being stitched up by the locals." These suggestions -- that either Dan (and Keren too?) made up aspects of Pirahã, or the tribe colluded to pull (without having any knowledge of linguistics) an intricate linguistic confidence trick on two skilled linguists, sustained over a quarter of a century and never revealed despite visits by a distinguished phonetician like Peter Ladefoged and a fine psycholinguist like Peter Gordon -- are pretty ludicrous when you think about it. But it's logically possible, I suppose. There have of course been anthropological hoaxes in the past; think of the Tasaday, thought to be an isolated group of stone-age survivals until their language was shown to share 85% of its vocabulary with Cotabato Manobo and its members were discovered to have been manipulated into play-acting by a Filipino official.

So let's have Dan Everett speak for himself. Here's what he emailed to me about whether he was a hoax victim.

I started working with the Pirahãs in 1977, not knowing what I was in for. (By the way, they do not call themselves Pirahã, which is not even a word of their language. They refer to themselves as the Hi'aiti'ihi'. The ' indicates high tone on preceding vowel, no mark indicates low tone. Literally the name has four parts, glossed as 'his+bone+straight+Nominalizer'. A rough English translation is 'the straight one[s]'.) I have lived in all their villages for an aggregate time now approaching 7 years. For the past 27 years I did think I was a 'Borgesian fantasist'. I worried about this when I first published on their unusual stress system in Linguistic Inquiry in 1984, a squib which resulted in letters from well-known phonologists to me, a new PhD, to the effect that I was likely incompetent and telling me what stress meant. Years later, Peter Ladefoged came to work with me on the Pirahã language (and three others). When I met him at the airport in Porto Velho, he immediately said "Hello. By the way, I am very skeptical about your stress claims." Peter Ladefoged did in effect what Peter Gordon did. He heard about some weird claims and went to check them out. Both of them came to agree with me (more or less).

I have long hesitated to write this stuff up because even to me it sounded so weird. I am aware of the criticisms Pinker has, among others, leveled at Whorf for his fanciful sounding glosses, etc. But as I wrote (what people should indeed read if they are interested in this is the article cited by Mark Liberman and available on my website, on Pirahã culture and grammar, not just the number abstract, which I will be taking off the website), I realized that, yes, this is my best understanding of all of this stuff. I sent the article for comments to Steve Sheldon and Arlo Heinrichs, two SIL members (I am no longer a member of SIL) who each lived for many years among the Pirahãs and who speak/spoke the language fairly well. Both of them agreed with the account in that paper.

I disagree, as Mark points out, with Gordon's conclusions. I don't think that this case requires an appeal to Whorf. In fact, as I try to argue in the larger paper, just the opposite seems to be the case (Mark summarizes this all very well, no doubt better than I am doing here).

A well-known MIT linguist with an interest in morphology distributed throughout the grammar asked me why, if I am correct, Pirahã would be the only case known like this. Another linguist from the Northwestern US who has held high office in the LSA asked me if I thought the Pirahãs were genetically different.

To answer the first question, I think that it is hard to hazard analyses that go so strongly against the grain. It took me 27 years to work up the courage to say these things and I am still called a 'Borgesian fantasist' (and have been called much simpler things, like 'stupid'). There just aren't that many linguists with that kind of time on a language so isolated from Western civilization. Therefore, I am not surprised that there are so few claims. I do believe, however, that many analyses of number and grammar in the literature, on similarly primitive societies (in some technological sense of primitive) are likely 'overanalyzed', e.g. that there are likely to be other languages without embedding, where juxtaposition has been taken to be embedding without much thought given to the matter.

To answer the second question, Pirahã women occasionally have children with Brazilian traders passing through, children raised as Pirahãs. These children don't show any difference I can see from other Pirahãs on these cognitive skills or language facts. I don't think genes, retardation, or other such suggestions are useful or appropriate here.

My own view then is that the case of Pirahã illustrates, perhaps as well as any example ever discussed in the literature, the kind of bi-directional causal relationship between language and culture that Boas and Sapir would have expected us to find.

There is a problem for universal grammar in all this, though. That is the non-trivial one of setting the boundary between culture, grammar, and cognition in light of examples like this where previous boundary lines have been shown to be potentially illusory.

I just left the Pirahãs a few days ago. They are oblivious to all of this attention, yet doing well as a people. However, I have heard the very disturbing news that an electric power company is thinking of using their river, the Maici, to generate power in some way. If any outside company enters their reserve (which I helped demarcate, with support from Cultural Survival, 20 years ago), this could be the end of the Pirahã people. So I hope that this attention on them right now can be used to generate some support for their survival. Examples like Pirahã illustrate very clearly the loss inherent to knowledge of our species, if such a language were to cease to exist without having been studied. It also shows, I hope, that some studies take a LONG time, perhaps the length of an entire career.

-- Dan

Daniel L. Everett
Professor of Phonetics and Phonology
Postgraduate Programme Director
Department of Linguistics and English Language
University of Manchester
Manchester M13 9PL UK
Fax: 44 161 275 3187
Phone: 44 161 275 3158
http://ling.man.ac.uk/info/staff/DE/DEHome.html

Here's a map of where the Pirahã live. And for some pictures of Pirahã efforts at drawing and writing (they really do seem to be utter beginners at both), take a look at this drawing of a cat and this drawing of a tapir (with a few numerals added; they seem to be for decoration). Dan has also supplied this picture of two Pirahã women busily engaged in learning how to use pencil and paper.

[Update 4/9/2007 -- links to other Language Log posts on the Pirahã and related topics can be found here.]

Posted by Geoffrey K. Pullum at 01:48 PM

Wretch

Bill Hobbs of HobbsOnLine wrote on Aug. 23

I must admit I'm rooting for the USA's Olympic basketball team - which Donald Sensing notes has made history by losing twice in one Olympiad - to lose again and again, so that they don't win a medal. C'mon, admit it. The thought of Allen Iverson having a gold medal, or even a silver or a bronze, makes you want to wretch, too.

No, I think that Allen Iverson is a skillful and courageous player, who is gradually finding out that Larry Brown is right about what it takes to win in international as well as NBA basketball. I hope that Iverson and the rest of the U.S. team can adjust in time to win an Olympic medal.

And like basketball, English is a team sport. It's a creative bit of lexical shot-selection for Bill Hobbs to act as if retch "to vomit" were a verbal form of wretch "a miserable, unfortunate, or unhappy person; a person regarded as base, mean, or despicable". But individual creativity isn't enough, and so Bill looks like a fool in front of an international audience.

Wretch comes from Old English wrecca "exiles", while retch comes from Old English hræ̅can "to clear the throat, to bring up phlegm". Words from diverse sources that wind up being pronounced the same way often fall together, as has happened with pole. But the whole team of English speakers has to agree about this, otherwise it doesn't work. If you just hoist up an off-balance word without thinking about your teammates, Bill, that makes you a self-involved jerk.

Of course, there are a lot of people who play the game the way that Bill does. Google finds 577 pages with "want to wretch", vs. 1660 for "want to retch". So Bill's move will work on the playground, but not at higher levels of the game.

[Eggcorn alert via email from Linda Seebach]

[And no, I don't really think that non-standard written English brands someone as a selfish jerk. I just felt that it would be polite to address Mr. Hobbs in his own idiom.

More seriously, I was just as unfair to Hobbs as he was to Iverson. Skill, creativity and individual initiative are things to be treasured, in language as in basketball, and so are tradition, cooperation and attention to the fundamentals. Iverson has a well-deserved reputation as a hotdog, though he also plays with passion and intensity despite injuries, while Hobbs just made one little lexical slip, of a kind that I've made many times myself. But I'm rooting for Iverson to win by learning more of what Larry Brown has been trying to teach him, not for him to lose by continuing to ignore it.]

[Update: Adrian Wojnarowski at espn.com has a positive outlook on Iverson's attitude and behavior at the Olympics. ]

Olympics

Posted by Mark Liberman at 07:16 AM

August 25, 2004

Abusive publisher of the month

In a recent post, I discussed an interesting-sounding review that was published in the Winter 2003-2004 issue of Academic Questions, which is the quarterly journal of the National Association of Scholars.

Looking for a copy of the review on line, I discovered this plaintive little note on the NAS website:

Catchword is a British company that contracted with our publisher to display AQ online for a fee to researchers. Access is provided free to NAS members and subscribers of the print version. Catchword is evolving since being purchased by a company called Ingenta, but the procedure for registration, cumbersome though it is, appears not to have changed. It reflects what seems to be the intention of our publisher, Transaction Periodicals Consortium, to impose strict limitations.

NAS members and AQ subscribers must go to the www.catchword.com web site. Hit the "Online Journals" link on the home page of that site. That sends you to a "Research Journals" page that sports a horizontal menu at the top, with a tab for "Register." Hit it and go to a page that asks if you seek "personal registration," which is probably what you want. Selecting "Yes" will bring up a form that asks your name, address and other details and requires you to select a user name and password. Enter your information and submit the form. Doing so should generate a CatchWord Identification Number (cid00000000). Go with that number in an email to our publisher, Transaction Periodicals Consortium, at <journals@transactionpub.com> and request online access to Academic Questions. Then you must wait till our publisher gets back to you with notification that you have clearance to open the online articles. Posting began with the winter 1998-99 AQ (Vol. 12, No. 1). It has taken Transaction as much as three or four weeks to provide that clearance in the past. AQ is up on the CatchWord site as a pdf file. You'll need Adobe Acrobat Reader software to open the various articles.

A person at Transaction who seems to be responsible for clearance is one Lisa Killian, (732) 445-1245 ext 610 <lkillian@transactionpub.com>.

I've quoted this note in full, not because I expect very many of you to follow its instructions, but because I want to ask a question.

Why does an organization like the National Association of Scholars, made up of "professors, graduate students, college administrators and trustees, and independent scholars", all apparently in full possession of their reason, and indeed "committed to rational discourse as the foundation of academic life in a free and democratic society", put up with this crap?

Posted by Mark Liberman at 06:05 AM

A marquee eggcorn

Linda Seebach emailed an interesting substitution, found in Mark Bauerlein's review of Gerald Graff's memoir Clueless in Academe, from the Winter 2003-2004 issue of Academic Questions:

"The press catalogues market these efforts as a record of change in the humanities, an insightful recapitulation leavened with a veteran's ken, and the authors' marquis footing seems to bear them out ."

This is not the most transparent sentence ever written. Can you really leaven a recapitulation with a ken? Be that as it may, Linda suggests that Bauerlein probably meant marquee where marquis is printed, and I think she's right.

Because marquee can mean "A rooflike structure, often bearing a signboard, projecting over an entrance, as to a theater or hotel", and because featured performers are traditionally named on a theater marquee's signboard, marquee has come to mean something like "featured" or "famous". For example, Google has 9,090 hits for "marquee player" and "marquee players". The OED indicates that this meaning is "orig. and chiefly U.S.", and glosses it as

designating a celebrity, star attraction, etc., whose name appears or is worthy to appear in the billing of a film, show, etc., or (allusively) who has achieved great fame and popularity.

Since marquis means "A nobleman ranking below a duke and above an earl or a count", and is often pronounced the same as marquee -- that would be [ˌmɐɹˈki], just to help you towards your recommended minimum daily allowance of IPA -- it's expected that some people would misinterpret "marquee player" as using the word marquis to mean something like "an aristocratic or high-status -- and therefore high-quality -- player". And indeed Google finds 2,312 pages with "marquis player" and "marquis players".

Other common modifier uses include "marquee names" (25,600 ghits vs. 250 for "marquis names"), and "marquee status" (1,420 ghits, vs. 29 for "marquis status").
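
For readers who like their ratios spelled out, here is a small Python sketch that turns the hit counts quoted above into the share of pages using the marquis spelling. The function name `variant_share` and the dictionary layout are just illustrative choices, and raw Google hit counts are rough estimates at best, so the percentages should be read as ballpark figures only.

```python
def variant_share(variant_hits: int, standard_hits: int) -> float:
    """Fraction of all hits that use the variant (here, 'marquis') spelling."""
    return variant_hits / (variant_hits + standard_hits)


# (marquis hits, marquee hits), as quoted in the post
counts = {
    "player(s)": (2312, 9090),
    "names": (250, 25600),
    "status": (29, 1420),
}

for phrase, (marquis, marquee) in counts.items():
    print(f"marquis {phrase}: {variant_share(marquis, marquee):.1%} of hits")
```

On these counts the marquis spelling is far more common with "player(s)" than with "names" or "status", though nothing in the numbers says why.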

"Marquee footing" is slightly odd, though perhaps no odder than those ken-leavened recapitulations. Linda observed in her email that Bauerlein starts out by saying, "One of the sadder spectacles of academe today is that of the eminent, near-retirement humanities professor reflecting on a long career", and continues with references to "... the topmost faculty ...", and so forth, so that it's clear that he's focusing on social status. Thus the hypothesis that he means "marquis footing" to mean "high status" is a plausible one. On this hypothesis, his use of "marquis" is an eggcorn, which is what we've taken to calling a sporadic folk etymology.

As Arnold Zwicky recently reminded us, "'eggcorn' ... is the name of a mental phenomenon; you have to know what someone intended, not just what they did." And in the case of published texts, it's worse than that -- you also have to deal with the problem of attributional abduction. Maybe it wasn't Bauerlein, but rather some copy editor at Academic Questions who made a change that Bauerlein didn't catch in the page proofs.

But in this particular case, there's an extra layer of eggcornic irony. Marquee is originally an English and/or American backformation from marquise, the feminine form of marquis! According to the OED, marquee is

Prob. < French marquise (although app. not attested until 1718 in this sense: see MARQUISE n.), the final / z / prob. being interpreted as -s plural.

The development of the putative source Marquise is described this way:

< French marquise marchioness (c1393 in Middle French), feminine form corresponding to marquis MARQUIS n.¹ (replacing earlier marchise, 13th cent. in Old French); used of various objects and fashions regarded as elegant or pleasing, hence: a kind of pear (1690), a canopy placed over a tent (1718; cf. MARQUEE n.), a type of settee (1770), a canopy in front of a building (1835), a ring with an elongated stone or setting, a diamond cut as a navette (late 19th cent.), a style of woman's hat (1889).

So marquee is the result of misinterpreting the final /z/ of French marquise as if it were the plural of a native English word, used to describe certain kinds of tents and canopies. This reanalysis took place in the late 17th and early 18th centuries, and the spelling marquee was settled by 1800 or a bit earlier, with alternatives like markee and marki found earlier in the process.

The OED was able to find only one example of this usage being spelled marquis, in 1788:

1788 F. GROSE Mil. Antiq. II. Descr. Plates 2 A field-officer's tent or marquis.

and says that this is likely a misspelling or typographical error rather than any evidence of a more general pattern. So there's no chance that Bauerlein (or that copy editor) is reverting to a classical norm.

To recapitulate, we have:

marquise "French noblewoman" comes to mean "various objects regarded as elegant" and specifically "tent or canopy".
Then there's an eggcorn-like backformation: marquise "tent or canopy" is re-interpreted as the plural of marquee "tent or canopy".
This becomes the social norm, and a new word is born.
Then marquee = "(theater) canopy" comes to mean "star attraction worthy of marquee billing".
And finally, another eggcorn happens: marquee "star attraction" is re-interpreted as a use of marquis "French nobleman".

If you're interested in the content of Bauerlein's review, here's a longer discussion by Rose at No Credentials.

[Update: on another analytic dimension, Trevor at kaleboel emails:

Great fun, but wouldn't "maquis" be a more likely source? Some might wish to compare Gerald Graff's not-so-secret secret battle to save literature (and students) from academia to the heroic liberation struggle of the French resistance; others might see a vague resemblance between his beard and the Mediterranean scrub from which the maquis took their name.

I had considered the first of these hypotheses, and failed to find any textual support for it, but Trevor's analytic conjectures usually reward scrutiny. As a bearded academic myself, I'll take his "Mediterranean scrub" crack to be one of the exceptions. ]

Posted by Mark Liberman at 05:57 AM

August 24, 2004

On the eggcorn beet

Here on the eggcorn beat, business is booming. They pour in every day, and then there are some contributed by enthusiastic Language Log readers, like reporter Linda Seebach. Here are the latest three to come by my eyes, plus fourteen more from my archives, and some from the usage manuals. But first, some reminders.

Reminder: Any single example is inscrutable. We can't know for sure what gave rise to it. It could be an on-line, inadvertent glitch in spelling, or a one-shot retrieval of the wrong lexical item. Or it could be a reanalysis of an existing expression. We can guess about what was going on in the mind of the person who produced the example, and we can back up our guesses (when we're lucky) by noting whether the item was corrected, whether the unexpected item occurs repeatedly, whether the person who produced it defends it, and so on. But "eggcorn" -- like "classical malapropism", "syntactic blend", and other labels, some of which I'll soon talk about in this forum -- is the name of a mental phenomenon; you have to know what someone intended, not just what they did.

Another reminder, which follows from the first: An example produced by one person can have a very different status from "the same" example produced by someone else. One person's blunder is another person's beloved eggcorn.

A final reminder, which follows from the other two: If it's an eggcorn for just one person, it's an eggcorn (for them, but not for anyone else). If it's an eggcorn for lots of people, we'll probably call it a "folk etymology", but there's no clear line here. If it's an eggcorn for most people, then it's the new dominant variant, and we all stop pasting labels on it.

On to the inventories. I've tried to eliminate examples that have already been mentioned here. (I have additional citations of at loggerheads > at lagerheads, for example, but this one's already been discussed here.) I might easily have erred here. Really, it's time for someone to keep an index of examples. No, don't look at me; I can barely cope as it is.

And some of the examples might just be inadvertent errors, like the following gem, with unthrone > unthrown (see the first reminder above):

As animal stories go, the mountain lion killed yesterday in Palo Alto could unthrown the black mamba snake from its perch of memorable tales. ["Mountain lion may give snake competition", Palo Alto Daily News, 5/18/04, p. 7]

Ok, now on to the inventories.

1. exercise > exorcize. Richard Schneider, Jr., editor's column ("Sept.-Oct. 2004: Hearts and Minds"), The Gay & Lesbian Review, Sept.-Oct. 2004, p. 4:

Congressman Barney Frank is fond of admitting that the public has repeatedly proven more tolerant, or at least less exorcized, about GLBT progress than he had expected.

2. plum(b) > plump, plumb > plum, plum > plumb. Larry Horn, ADS-L, 8/22/04:

From a recent spam:

My favorite part is the movies area, theres movies listed in here for download that are still in theaters - plump crazy :)

Checking google I find a number of hits on "plump crazy","plump nuts", and so on, as well as "plumcrazy".

Note that the modifier here spelled plum is historically plumb; it's an eggcorn that has pretty much succeeded in making it in the big time, so that at least some dictionaries (AHD4, for instance) list it as an alternative to plumb.

Meanwhile, the "correctly spelled" modifier plumb sometimes intrudes on the territory of the fruit noun plum, used metaphorically, as when Richard Jasper congratulated Michael Palmer (on the newsgroup soc.motss, 6/16/04) on a new job:

Congratulations, MP! A plumb assignment indeed!

3. blackmail > blackmale. E-mail from Language Log fan Sylvus Tarn to me, 8/20/04:

...here's my contribution, though I'm not really certain it qualifies. A (very) brief google search turned up stuff on black male porn, which is not at all how this word is being used.

I've uppercased the eggcorn to emphasize it.

=====

Authored by: Anonymous on Tuesday, August 10 2004 @ 04:44 PM EDT

... IBM's reaction to SCO's attempt to BLACKMALE should have been a lesson to those that plan on using the legal system to stop this industry revolution.

=====

I'm thinking the writer came up with this word because Darl McBride (SCO CEO) is so strongly associated with SCO and is considered (at best) by this community to be a blackguard; he's talked about "all hat and no cattle" and again would be considered by them as a 'black hat' in the stereotypical Western.

Clearly, for some people blackmail has lost its association with written communication ("mail"), so that the second part of the expression is open for reinterpretation. I wouldn't entirely discount the possibility that the (perceived) threatening character of males, and black males in particular, played some role in the reanalysis.

4. Blasts from the past. Here's an assortment of eggcorns from my files, mostly from ADS-L (whose archives can be consulted on its website).

4.1. fine-tooth comb > fine toothcomb. An interpretation reported to me by Gerald Gazdar in the summer of 1987, when I corrected his misapprehension as reflected in a paper he was writing.

4.2. torticollis > tortoise collar. Reported by Mark Mandel on ADS-L, 9/25/00. Torticollis, according to Mandel, is "a medical condition in which the neck is more or less permanently twisted so the head is not facing forward".

4.3. paramour > power mower, candelabra > candle arbor. Discussed by Larry Horn on ADS-L, 7/6/01, following up on an original report by Mark Mandel.

4.4. bald face(d) lie/liar > bold face(d) lie/liar. Discussed by me in two postings to ADS-L, 8/27/02, with some Google hit statistics.

4.5. bald on record > bold on record. Follow-up on ADS-L to 4.4. by Beverly Flanigan, reporting on how some of her students cope with the "bald on record" level of Brown and Levinson's politeness hierarchy.

4.6. Magnum Boerum > Maggie Bowman, chifforobe > S(c)ha(e)f(f)er Robe. Reshapings introducing proper names, reported on by me on ADS-L, 9/21/02. The first (from an article in The American Horticulturist) denotes an old apple variety; the second (observed by me on labels in various antique shops) denotes a piece of furniture.

4.7. say one's piece > say one's peace, peace of mind > piece of mind. The first was noted by me on ADS-L, 5/21/03, from Mike Thomas (and others) on soc.motss, who queried my spelling of "I said my piece". Garner's A Dictionary of Modern American Usage reports this widespread reanalysis, as well as the less common "piece of mind"; MWDEU notes various "confusions" of peace and piece, even going so far as to employ the verb "botch" in this connection. But say one's peace is now so common among younger speakers (who are baffled by the claim that the original noun was piece) that it begins to rival have another thing (for original think) coming as a newly dominant variant.

4.8. teem with > team with. Another one from Mike Thomas on soc.motss, noted by me on ADS-L, 7/21/03.

4.9. bona fide > bonafied. Reported by John McChesney-Young to me in e-mail, 7/14/03, from a message to a homeschooling list that morning:

Well, maybe I've been going on without understanding the situation. I thought they were hassling only bonified truants, rather than homeschoolers who are dotting their "i"s and crossing their "t"s.

4.10. unbeknownst > unbenounced, a deluded state > a diluted state. Both discovered by Larry Horn (ADS-L posting, 8/25/03) in a review of the movie "Nurse Betty". In a posting later that day, Horn reports 475 Google hits on unbenounced.

4.11. per se > per say. Peter McGraw (ADS-L posting, 8/27/03) reports on the following from an e-mail message: "I am not only responsible for CFR but also Governmental Relations as well as we do not have a lobbyist per say."

4.12. windfall > winfall. Herb Stahlke (ADS-L posting, 10/30/03) cites:

Sub-head in today's Indianapolis Star, front page, above the fold:

$95.4 million jackpot is biggest individual winfall in state history.

Google shows 9320 hits for "winfall", many of which are lottery-related. It shows 238,000 hits for "windfall". Of the first 100, only one is lottery related, and that one occurs in The Guardian. Many uses are related to unexpected money from other sources, though.

4.13. martial law > marshall law. Perpetrated by David Fenton, posting to soc.motss on 5/7/04, who protested the treatment of prisoners in Iraq by saying: "This would be unacceptable in the US, even in the aftermath of a war and under marshall law."

4.14. crowning glory > crown and glory, be overdue > be overdo. Beverly Dumas, much taken with the terminology "eggcorn", reports on two of them in ADS-L postings, 8/12/04: "According to a survey, the thinning of a woman's crown and glory is one of the major causes of deflated self-esteem." And "(she'd been vaccinated against it once but was overdo a repeat)".

5. In the usage dictionaries. One place to find eggcorns is in the usage dictionaries and in other inventories of "confusables" (or "confusibles", depending on who you read). Under peak, peek, pique in MWDEU, for example, you'll discover that peak tends to substitute for the other two; I believe that pique > peak (as in "peak one's interest") is especially common, definitely in eggcorn territory.

For the most part, such sources confine themselves to especially common "confusions". But some take a wider view. Here's Bryan Garner, "Making Peace in the Language Wars" (the preface to Garner's Dictionary of American Usage (2003)), responding to Tom McArthur's criticisms of Garner's first edition (A Dictionary of Modern American Usage (1998)) in English Today and offering what he sees as terms of a "truce" between prescriptivists and descriptivists, including a position on eggcorns:

...prescriptivists need to be realistic. They can't expect perfection or permanence, and they must bow to universal usage. But when an expression is in transition -- when only part of the population has adopted a new usage that seems genuinely undesirable -- prescribers should be allowed, within reason, to stigmatize it. There's no reason to tolerate wreckless driving in place of reckless driving. Or wasteband in place of waistband. Or corollary when misused for correlation. Multiply these things by 10,000, and you have an idea of what we're dealing with. There are legitimate objections to the slippage based not just on widespread confusion but also on imprecision of thought, on the spread of linguistic uncertainty, on the etymological disembodiment of words, and on decaying standards generally. (p. xliv)

McArthur poked fun at the wasteband entry, and rightly so, to my mind. If Garner really is going to aim his stigma ray gun at every eggcorn (or classical malapropism) in the land, then his dictionary will indeed bloat up by thousands or tens of thousands of entries, most of them addressing the "confusions" of only small numbers of people (however strongly these people might hold to their hypotheses about the expressions in question) -- people who are incredibly unlikely to consult a dictionary, any dictionary, about whether their language is "correct" on these points.

Garner's larger claims, about undesirable novelty, imprecision of thought, spread of linguistic uncertainty, the etymological disembodiment of words, and decaying standards, are mostly just silly. Users of eggcorns are quite clear about the meanings they intend, they've reshaped expressions in an attempt to make them more transparent and comprehensible, and nearly all of the time these expressions can be understood without any difficulty in spoken language. The novel spellings do take some getting used to, and they do indeed conceal the etymological originals; that's the price you pay for these "acres of diamonds" (as Linda Seebach described them). Even if they annoy you -- and some of them do annoy me -- they're scarcely the precursor to the Last Days of Civilization.

And I've never understood the position that it's the duty of the educated elites (like Garner and me) to try to prevent people from using innovative variants, until there are only a handful of us holdouts left standing, at which point we have to, however regretfully, let people do what they will. Maybe it's a Noble But Doomed Warrior thing. I just don't get it. Meanwhile, nobody says that I have to use any of these variants myself. Mostly I don't. But I can still admire the poetry of everyday language, even if it's not mine.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 07:53 PM

Hot features

On men, front is hot, back is not. But on women, it's back that's hot, while front is not. We're talking about vowels here, mind you, and Charles Darwin may be raising his eyebrows a bit as he discusses this matter with Edward Sapir and Aristotle at the University of Heaven.

Brad Pitt and Laura Dern have an advantage over John Wayne and Britney Spears, according to Amy Perfors, who did a study of the effects of phonetic symbolism on sexual attractiveness. She presented a paper entitled "What's in a Name? The effect of sound symbolism on perception of facial attractiveness" at CogSci 2004 (August 5-7 in Chicago) whose proceedings published the abstract, and a more informal description with some additional information can be found on her website.

What Perfors did was to post photos of 12 male and 12 female friends on hotornot.com, with names photoshopped in. She posted each picture multiple times, with different names. What she found was an interaction between the sex of the picture and the stressed vowel of the name: "Men whose names' stressed vowel is a front vowel were rated statistically MORE attractive than men with names with a stressed back vowel. However, the reverse was true for women: women with names with stressed BACK vowels were statistically more attractive."

She also found a sex-linked effect with initial sonorants vs. obstruents: obstruents were hotter on males, sonorants on females, though only the female effect was significant. All of these effects were small -- about a quarter of a point on a 10-point scale, or less, as you can see from the graph.

It's a cute piece of work, and it's not surprising that it made Nature, New Scientist, Guardian, Boston Herald, as well as Reuters/CNN.

So why do I think that Charles Darwin might be raising his eyebrows? Well, I'm sure that he's worried about the ease with which evolutionary psychologists can gin up stories about the reasons for perceptual preferences and contextual effects on behavior, no matter what these turn out to be. And he's familiar with the many pieces of research showing tendencies for female animals to make their vocal tracts seem smaller to emphasize their femininity, and for male animals to make their vocal tracts seem larger when they're acting masculine. So he knows, as Perfors does, that if the vowel hotness results had come out the other way around, there would have been an equally ready explanation.

I'm sure that's why, on her website, Perfors is very tentative in offering an explanation for the effect:

"Why do you think this happens? I want to stress here that I'm JUST GUESSING... but there has been other work suggesting that cross-linguistically, people think that front vowels are 'smaller' and back vowels are 'larger.' Now, you'd think that that would make guys with front vowels do worse than guys with back vowels (and vice versa), but there are other studies suggesting that women actually aren't most attracted to the super hyper-masculine, macho guys (and men aren't most attracted to the super feminine girls). It seems we like people who are somewhat masculine or feminine, but not too much. The reasoning is that maybe women want guys who can be kind of sensitive and gentle, and good providers - while guys want women who have a bit of spunk. What does this have to do with names? Well, maybe a guy with a front-vowel name seems subconsciously gentler or more sensitive, hence more attractive (and vice-versa for women). Again, this is a complete chain of guesswork; it could be another explanation entirely. More research needs to be done. But it makes a bit of sense, when you think about it."

I suspect that Darwin also might be worried about some other things, in this particular case. The nature of the English lexicon of names makes it impossible that Perfors' list was strictly controlled, phonologically and otherwise. You can't contrast (say) Beet and Boot, or Bit and Butt, or other "names" that differ only in the front-back dimension of their main-stressed vowel. Even if you could, the names would not be equally common (overall or in a particular age range), or equally associated with famous people, or whatever. Instead, the list of names with front vowels surely differed from the list of names with back vowels in many other ways, phonetically and otherwise. Perfors doesn't give the complete list that she used, or the raw results, so it's hard to tell whether there are any other plausible differences. And if she didn't start the study with the hypothesis that front-back was going to make the difference, but instead considered the 20 or so obvious phonological alternatives -- high vs. low vowels, labial vs. non-labial consonants, one syllable vs. two, open syllables versus closed, etc. -- then there's the statistical problem of multiple tests. And what's the distribution of sexual orientations of the hotornot.com "subjects"? These are the kinds of annoying, picky little questions that reviewers (are supposed to) ask for publication in refereed journals. Perfors may well have answers for such questions, and if she publishes in a well-refereed journal, she'll have a chance to bring them out.
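To put a rough number on the multiple-tests worry, here is a minimal sketch in Python. It assumes the "20 or so" phonological alternatives were each tested independently at a conventional 0.05 significance level; neither the threshold nor the independence comes from Perfors' write-up, and the function names are made up for illustration.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one spurious 'significant' result when
    num_tests independent tests are each run at level alpha.
    (Independence and alpha = 0.05 are assumptions, not facts from the study.)"""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test level that keeps the overall error rate at or below alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(familywise_error_rate(k), 3), bonferroni_threshold(k))
```

Under those assumptions, trying 20 contrasts gives roughly a 64% chance that at least one comes out "significant" by luck alone, which is why a reviewer would want to know whether front-back was the hypothesis going in.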

I hope that if that journal is Nature, they somehow find some editors and referees who know more about the subject than the one they tapped to write their news article on Perfors' work (in news@nature.com, "The best in science journalism"). The writer, Michael Hopkin, seems to have good credentials:

"Michael became an online news reporter for Nature in January 2004, after two and a half years as a subeditor for Nature's print edition. Besides contributing to newspapers, magazines and online publications, he has given numerous interviews as a science expert on BBC radio. Michael has a BSc in biology from the University of Nottingham..."

but if he ever took a linguistics course, he wasn't paying attention. He closes the article with these two howlers:

Perfors argues that the discovery that vowel sounds can influence a person's perceived attractiveness is the more interesting finding, because it seems to be a subconscious effect. Experts, including the Swiss linguist Ferdinand de Saussure, have previously argued that vowel sounds are arbitrary building blocks with no intrinsic meaning.

Which brings us to the most pressing question of all: is my own name, Mike, a help or a hindrance when it comes to attractiveness? "Mike is a front vowel sound, so it's a good name," says Perfors. "If you do badly with the ladies you can't blame it on your name."

The first problem here is that there are hundreds if not thousands of published articles on sound symbolism, mimesis and even on whole systematic sound-symbolic subvocabularies called ideophones, so it's preposterous to cite Perfors' research as if it were the first suggestion ever that l'arbitraire du signe is not absolute.

The second problem is worse. Mike is NOT "a front vowel sound". The vowel in the name spelled "Mike", in all English dialects that I know of, has a low back nucleus.

If Hopkin quoted Perfors correctly, and if she was not just being polite, then she might have been confused by the spelling. The symbol "i" in IPA (as in most orthographies) denotes a high front vowel. But in English, as a result of the Great Vowel Shift, the nucleus of long vowels written with orthographic "i" lowered and backed, all the way to the bottom back corner of the vowel quadrilateral. In most contemporary dialects, it's a diphthong with a high front off-glide, so you might take it as mixed on the hot-or-not dimension, but "a front vowel sound" it is definitely not.

If you think about it, this is a sad state of affairs. A journalist who served for "two and a half years as a subeditor for Nature's print edition", who passes for "a science expert on BBC radio", and who was assigned to write a story about sound symbolism for a publication that advertises itself as "the best in science journalism", turns out to be completely ignorant of the most elementary phonetic terminology, as applied to the pronunciation of his own native language. Worse, the term in question was the key independent variable in the experiment under discussion, and he not only didn't know what it meant -- as applied to the kind of words studied -- he didn't bother to find out.

As before, I blame the linguists, for not insisting that anyone who claims to be an educated person needs elementary competence in describing and analyzing the sound, form and meaning of human language.

[link via Erika at Kittenishly Doomy Thoughts]

[Update: as Eric Bakovic points out over at Phonoblog, Perfors demonstrates in her webpage discussion of the experiment (which I cited above) that she knows about the feature composition of the diphthongal pronunciation of long i, so (unless she suffered a moment of distraction in a telephone interview, or something) the error is entirely Hopkin's. ]

Posted by Mark Liberman at 07:20 PM

Further adventures of Grice in Wonderland

If I'm right (in my previous post) then when Lewis Carroll's Duchess says Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise, she gives quite meaningful advice which in my particular case implies I should not imagine myself to be ugly. What is the relationship between this and the Duchess' previous injunction, "Be what you would seem to be", which she appears to say means the same?

Obviously, there is no simple relationship, since "Be what you would seem to be" concerns what one should be rather than what one should imagine about oneself. To give one possible analysis, "Be what you would seem to be" may be understood with "would" interpreted as "want to". I should be what I want to seem to be, i.e. I should make myself handsome. Or else I should fulfill the request indirectly by coming to want to seem to be what I actually am. So if I'm ugly, I should want to seem ugly too. Either way, it's not the same as never imagining myself otherwise than handsome.

The Duchess, of course, never actually says that the two commands are the same: if you read closely, you'll see that she merely implicates that they are the same (to use Grice's terminology, meaning that she suggests it via pragmatic inference). She connects the two commands with "if you'd like it put more simply", but she never states explicitly that the second command is the first but put more simply. Similarly, I can consistently say "Mice are fish, and if you'd like it put more simply, a whale is a fruit." Just in case it flashed through your mind that the Duchess might have been inconsistent. Perish the thought. She may be mad, but, like Carroll's cat, she is consistently mad:

`But I don't want to go among mad people,' Alice remarked.
`Oh, you can't help that,' said the Cat: `we're all mad here. I'm mad. You're mad.'
`How do you know I'm mad?' said Alice.
`You must be,' said the Cat, `or you wouldn't have come here.'
Alice didn't think that proved it at all; however, she went on `And how do you know that you're mad?'
`To begin with,' said the Cat, `a dog's not mad. You grant that?'
`I suppose so,' said Alice.
`Well, then,' the Cat went on, `you see, a dog growls when it's angry, and wags its tail when it's pleased. Now I growl when I'm pleased, and wag my tail when I'm angry. Therefore I'm mad.'

The Duchess is famous for having said: "Everything's got a moral, if only you can find it." However, the Duchess has by now exhausted me. So only you can find it.

Posted by David Beaver at 01:33 PM

On not doing what you fail to understand a fictional duchess told you not to

Carroll's Duchess said: Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise. I'll try to take her advice, as Geoff (later commenting "we certainly know nothing about what it would mean") may have done without knowing it. But it's gonna be tough. Parsers on stun.

You see, I was (or might have been) ugly.
However, what it might appear to others that I was (or might have been), was handsome.
So what it might appear to others that I was (or might have been) was not, was ugly.
Now, as it happens, my eyes are blue.
But if I hadn't been as I was, they would have been brown, since most people in my family have brown eyes.
And presumably, if I hadn't been as I was, what I would have been would have appeared to others to be different.
Specifically, my eyes would have been brown: perhaps it might have appeared to others that I was ugly, perhaps handsome.
Who knows?
But let's not worry about that - we should think about other aspects of how I was or might have been than that respect in which I would have been different otherwise.
Concentrating on just those aspects, i.e. everything but my eyes, what might it appear to others that I was (or might have been) was not?
Eyes apart, I would (or at least might) have been handsome.
Thus, eyes apart, what it might appear to others that I was (or might have been) is ugly.
And, eyes apart, what it might appear to others that I was (or might have been) was not is handsome.
So this is what it might appear to others that I was (or might have been) was not otherwise than what I had been would have appeared to them to be otherwise: handsome.
Then if I accept what the Duchess said, I should never imagine myself to be otherwise than handsome.

It sounds like such good advice, so I'm trying, I'm really trying.

But imagining oneself not to be otherwise than what it might appear to others that what one was or might have been was not (even otherwise than what one had been would have appeared to them to be otherwise) is one of those peculiar actions that is much, much harder not to do once someone has told you not to do it. Provided you understand what they mean, of course. The Duchess obviously realized that her command could only be followed by someone who failed to comprehend it, which would explain why she phrased it in such a helpfully obscure way. Grice in Wonderland.

Posted by David Beaver at 01:28 AM

August 23, 2004

Parsing the Duchess

Senior research scientist Chris Culy at FXPAL ran the Brill part-of-speech tagger on the Duchess's sentence, fed the output of that to the parser written by Michael Collins, and fed the output of that to a perl script that translated it into XML. Those with a strong stomach may look at the XMLized parse output if they care to read on below. But contrary to what I said in the first version of this note posted last night, the fact that a parse was produced does not indicate that the Duchess's sentence is grammatical: Fernando Pereira at Penn informs me that the Collins parser will assign a parse to any string of words. The parser finds the structure that would be most probable for the string, be it ever so unlikely. So that still gives us no clue as to whether the structure below corresponds to a grammatical sentence or not, and we certainly know nothing about what it would mean.

<doc>
<TOP numDtrs="1" headDtr="1" headStr="imagine">
<SG numDtrs="2" headDtr="2" headStr="imagine">
<ADVP numDtrs="1" headDtr="1" headStr="Never">
<RB>Never</RB>
</ADVP>
<VP numDtrs="3" headDtr="1" headStr="imagine">
<VB>imagine</VB>
<NP numDtrs="1" headDtr="1" headStr="yourself" isArg="true">
<NPB numDtrs="1" headDtr="1" headStr="yourself">
<PRP>yourself</PRP>
</NPB>
</NP>
<SG numDtrs="2" headDtr="2" headStr="to">
<RB>not</RB>
<VP numDtrs="2" headDtr="1" headStr="to">
<TO>to</TO>
<VP numDtrs="3" headDtr="1" headStr="be" isArg="true">
<VB>be</VB>
<ADVP numDtrs="1" headDtr="1" headStr="otherwise">
<RB>otherwise</RB>
</ADVP>
<PP numDtrs="2" headDtr="1" headStr="than">
<IN>than</IN>
<SBAR numDtrs="2" headDtr="1" headStr="what" isArg="true">
<WHNP numDtrs="1" headDtr="1" headStr="what">
<WP>what</WP>
</WHNP>
<S numDtrs="2" headDtr="2" headStr="might" isArg="true">
<NP numDtrs="1" headDtr="1" headStr="it" isArg="true">
<NPB numDtrs="1" headDtr="1" headStr="it">
<PRP>it</PRP>
</NPB>
</NP>
<VP numDtrs="2" headDtr="1" headStr="might">
<MD>might</MD>
<VP numDtrs="3" headDtr="1" headStr="appear" isArg="true">
<VB>appear</VB>
<PP numDtrs="2" headDtr="1" headStr="to">
<TO>to</TO>
<NP numDtrs="1" headDtr="1" headStr="others" isArg="true">
<NPB numDtrs="1" headDtr="1" headStr="others">
<NNS>others</NNS>
</NPB>
</NP>
</PP>
<SBAR numDtrs="2" headDtr="1" headStr="that" isArg="true">
<IN>that</IN>
<S numDtrs="2" headDtr="2" headStr="would" isArg="true">
<SBAR numDtrs="2" headDtr="1" headStr="what" isArg="true">
<WHNP numDtrs="1" headDtr="1" headStr="what">
<WP>what</WP>
</WHNP>
<S numDtrs="2" headDtr="2" headStr="were" isArg="true">
<NP numDtrs="1" headDtr="1" headStr="you" isArg="true">
<NPB numDtrs="1" headDtr="1" headStr="you">
<PRP>you</PRP>
</NPB>
</NP>
<VP numDtrs="3" headDtr="1" headStr="were">
<VP numDtrs="1" headDtr="1" headStr="were">
<VBD>were</VBD>
</VP>
<CC>or</CC>
<VP numDtrs="2" headDtr="1" headStr="might">
<MD>might</MD>
<VP numDtrs="2" headDtr="1" headStr="have" isArg="true">
<VB>have</VB>
<VP numDtrs="2" headDtr="1" headStr="been" isArg="true">
<VBN>been</VBN>
<VP numDtrs="2" headDtr="1" headStr="was" isArg="true">
<VBD>was</VBD>
<ADVP numDtrs="2" headDtr="1" headStr="not">
<ADVP numDtrs="2" headDtr="1" headStr="not">
<RB>not</RB>
<ADVP numDtrs="1" headDtr="1" headStr="otherwise">
<RB>otherwise</RB>
</ADVP>
</ADVP>
<PP numDtrs="2" headDtr="1" headStr="than">
<IN>than</IN>
<SBAR numDtrs="2" headDtr="1" headStr="what" isArg="true">
<WHNP numDtrs="1" headDtr="1" headStr="what">
<WP>what</WP>
</WHNP>
<S numDtrs="2" headDtr="2" headStr="had" isArg="true">
<NP numDtrs="1" headDtr="1" headStr="you" isArg="true">
<NPB numDtrs="1" headDtr="1" headStr="you">
<PRP>you</PRP>
</NPB>
</NP>
<VP numDtrs="2" headDtr="1" headStr="had">
<VBD>had</VBD>
<VP numDtrs="1" headDtr="1" headStr="been" isArg="true">
<VBN>been</VBN>
</VP>
</VP>
</S>
</SBAR>
</PP>
</ADVP>
</VP>
</VP>
</VP>
</VP>
</VP>
</S>
</SBAR>
<VP numDtrs="2" headDtr="1" headStr="would">
<MD>would</MD>
<VP numDtrs="2" headDtr="1" headStr="have" isArg="true">
<VB>have</VB>
<VP numDtrs="3" headDtr="1" headStr="appeared" isArg="true">
<VBN>appeared</VBN>
<PP numDtrs="2" headDtr="1" headStr="to">
<TO>to</TO>
<NP numDtrs="1" headDtr="1" headStr="them" isArg="true">
<NPB numDtrs="1" headDtr="1" headStr="them">
<PRP>them</PRP>
</NPB>
</NP>
</PP>
<SG numDtrs="1" headDtr="1" headStr="to">
<VP numDtrs="2" headDtr="1" headStr="to">
<TO>to</TO>
<VP numDtrs="2" headDtr="1" headStr="be" isArg="true">
<VB>be</VB>
<ADVP numDtrs="1" headDtr="1" headStr="otherwise">
<RB>otherwise</RB>
<PUNC>.</PUNC>
</ADVP>
</VP>
</VP>
</SG>
</VP>
</VP>
</VP>
</S>
</SBAR>
</VP>
</VP>
</S>
</SBAR>
</PP>
</VP>
</VP>
</SG>
</VP>
</SG>
</TOP>
</doc>

Posted by Geoffrey K. Pullum at 09:09 PM

Neurotic American English

I forgot to mention to Language Log readers a couple of weeks back (it's been busy) that I had learned from The Wall Street Journal (August 3, 2004, B7) of an advertising campaign launched in Britain by H. J. Heinz Co., featuring a baked bean with an American accent complaining about being undervalued for its nutritional qualities. Many British consumers think of Heinz Baked Beans as a quintessentially British dish, and are not aware that Heinz is based in Pittsburgh. But they love the way the underappreciated kvetching bean has an American accent. Far from expecting a baked bean to be British, test group audiences for the ads caught the psychiatric overtones and thought that the American English spoken by the bean character "was a good way to convey neurosis".

I thought the British were supposed to be our allies? Perhaps if we Americans are neurotic, it's because even our European friends keep stabbing us in the back. Neurotic indeed. Ha! They say we're paranoid, too; but you're not paranoid if people really are out to get you!

Posted by Geoffrey K. Pullum at 01:35 PM

The Duchess on being what you would seem to be

"I quite agree with you," said the Duchess; "and the moral of that is--‘Be what you would seem to be’--or if you'd like it put more simply--‘Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise.’"

Alice's immediate response to this (in Lewis Carroll's Alice in Wonderland, which I was browsing today to get material for a grammar exercise) was: "Pray don't trouble yourself to say it any longer than that."

But Alice did not address this question: is the Duchess's sentence — the one in red above — grammatical? After four or five careful attempts to make a judgment on this, I find I still can't decide. It's only forty words, in my native language, which I've been studying intensively for over three decades, and I still don't know.

Posted by Geoffrey K. Pullum at 01:20 PM

Acres of diamonds

Linda Seebach has a column on eggcorns and mondegreens in the August 21 Rocky Mountain News. A quote I especially like: "That's one of the great things about linguistics. What you need for research is all around you for the taking, like acres of diamonds, and surprisingly little is known about most of it."

Sometimes linguistic gems need more strenuous excavation: years of field work, or design and implementation of ingenious experiments, or long hours with old manuscripts. And diamonds once found need to be cut and polished. But Linda is right: if you have the basic descriptive skills and pay attention, everyday life is a linguistic treasure trove. A linguist experiencing ordinary human talk and writing is like a botanist wandering in the Amazon rain forest. And by linguist I don't mean someone who is paid to exercise one of the linguistic professions, but rather someone who is interested in the phenomena of speech and language, in an informed way, whatever their role in society.

Posted by Mark Liberman at 09:36 AM

Olympic event named by a linguist

A recent newspaper article about the Olympic marathon mentioned that this type of race was first proposed and named by Michel Breal, a linguist who also coined the term semantics.

A fuller account can be found in this 2002 article in Running Journal:

"It was not a Greek, but a Bavarian born in 1832, Michel Breal, who conceptualized the race," said Dr. Dave Martin, marathon historian. "Breal's parents were Jews of French descent, and when Michel was just five, the elder Breal died and the family moved to France. Breal became the head of the French education system, and had a personal interest in mythology and the ancient Olympic Games of Greece."

Breal dreamed of a race based on a run by a Greek warrior who ran the distance of approximately 25 miles (40 km) from Marathon to Athens in search of soldiers to fight off potential Persian conquerors. When the Persians failed, the distance was run again - perhaps by the same Greek warrior - to announce victory news to the king.

"Since the ancient Greek Olympic games were to be reborn in 1896, Breal believed the time was right to add a new event - the long-distance run," said Martin. "Long-distance running was a new concept for the Olympics, but it was accepted, added to the games, and dubbed the marathon." As a symbol of the games' resurrection, the competition began that year in Athens on Easter Monday.

For a bit more information on Breal, here is the Columbia Encyclopedia entry and the 1911 Britannica entry. And this chapter (from F. R. Palmer, Semantics: A New Introduction, Cambridge University Press, 1976) explains in a bit more detail:

The term semantics is a recent addition to the English language. (For a detailed account of its history see Read 1948.) Although there is one occurrence of semantick in the phrase semantick philosophy to mean 'divination' in the seventeenth century, semantics does not occur until it was introduced in a paper read to the American Philological Association in 1894 entitled 'Reflected meanings: a point in semantics'. The French term semantique had been coined from the Greek in the previous year by M. Breal. In both cases the term was not used simply to refer to meaning, but to its development - with what we shall later call 'historical semantics'. In 1900, however, there appeared Breal's book Semantics: studies in the science of meaning; the French original had appeared three years earlier [Essai de semantique (1897) - myl]. This is a superb little book, now sadly neglected but well worth reading. It is one of the earliest books on linguistics as we understand it today, in that, first, it treated semantics as the 'science of meaning', and secondly, that it was not primarily concerned with the historical change of meaning (see 1.4).

Breal's connection to the Olympic movement and to the history of the marathon as an event is just a curiosity. But I'm interested that Breal was "head of the French educational system" a hundred years after von Humboldt reformed the Prussian one.

Posted by Mark Liberman at 09:07 AM

August 22, 2004

Language, thought and marketing

A NYT magazine article by Kathryn Schulz suggests a key causal role for terminology in the evolution of Japanese thinking about depression. Or at least, in the evolution of Japanese spending about depression.

Talking about depression in Japanese has always been a fundamentally different undertaking than talking about it in English. In our language, the word for depression is remarkably versatile. It can describe dips in landscapes, economies or moods. It can refer to a devastating psychiatric condition or a fleeting response to the Cubs losing the pennant. It can be subdivided almost endlessly: major, minor, agitated, anxious, bipolar, unipolar, postpartum, premenstrual.
But in Japanese, the word for depression (utsubyo) traditionally referred only to major or manic depressive disorders and was seldom heard outside psychiatric circles. To talk about feelings, people relied on the word ki or ''vital energy.'' A literal translation of Japanese synonyms for sorrow reads, to Westerners, like the kind of emotional troubles that might befall a kitchen sink: ki ga fusagu, sadness because your ki is blocked; ki ga omoi, sadness because your ki is sluggish; ki ga meiru, sadness because your ki is leaking.

About five years ago, according to Schulz, some advertising genius coined the phrase kokoro no kaze -- "a cold in the soul" -- in order to "explain mild depression to a country that almost never discussed it". And, not coincidentally, to sell Depromel, Paxil, Prozac and the rest. As a result, "depression has gone from bad word to buzzword":

Over the past five years, according to the Japanese Bookstore Association, 177 books about depression have been published, compared with a mere 27 from 1990 to 1995. Earlier this month, the country's most popular online bulletin board, Channel 2, carried 713 conversation threads about depression -- more than music (582) or food (691) and almost as many as romance (716).

This has certainly been a good thing for the pharmaceutical industry. It may be a good thing for the Japanese, whose suicide rate is twice that of the U.S., though the article doesn't indicate whether there's any statistical evidence that the increased sales of anti-depressants are having an impact on this. But Schulz seems to feel that it might be a bad thing for Japanese culture:

For 1,500 years of Japanese history, Buddhism has encouraged the acceptance of sadness and discouraged the pursuit of happiness -- a fundamental distinction between Western and Eastern attitudes. The first of Buddhism's four central precepts is: suffering exists. Because sickness and death are inevitable, resisting them brings more misery, not less. ''Nature shows us that life is sadness, that everything dies or ends,'' Hayao Kawai, a clinical psychologist who is now Japan's commissioner of cultural affairs, said. ''Our mythology repeats that; we do not have stories where anyone lives happily ever after.'' Happiness is nearly always fleeting in Japanese art and literature. That bittersweet aesthetic, known as aware, prizes melancholy as a sign of sensitivity.

This traditional way of thinking about suffering helps to explain why mild depression was never considered a disease. ''Melancholia, sensitivity, fragility -- these are not negative things in a Japanese context,'' Tooru Takahashi, a psychiatrist who worked for Japan's National Institute of Mental Health for 30 years, explained. ''It never occurred to us that we should try to remove them, because it never occurred to us that they were bad.''

I'm less willing to tell the Japanese that they should cultivate aware and leave happiness to us Westerners. Anyhow, it sounds like the Japanese are not removing mild depression at all, but rather beginning to obsess about it just like Americans do. Though perhaps they always did, but just with different terminology?

Posted by Mark Liberman at 03:08 PM

Life without counting throwing

The more I think about it, the more unhappy I am about the whole Pirahã counting discussion.

I'm not unhappy because people are getting Whorf wrong, though it's true that they are, as Kerim Friedman at Keywords explains.

I'm not unhappy because people are leaving Edward Sapir out of it, though they're doing that too. The Sapir-Whorf hypothesis is sort of like GNU-Linux systems. People often leave a crucial piece out of the name, and also out of their thinking. Sapir -- from whom Whorf got the idea in the first place -- balanced his discussion of the ways in which "the language habits of our community predispose certain choices of interpretation" with a claim about the "formal completeness of language". By this he meant that "a language is so constructed that no matter what any speaker of it may desire to communicate ... the language is prepared to do his work".

No, what bothers me about the Pirahã counting discussion is that I'm not convinced that language is relevant at all, in the sense of playing a causal role in the Pirahã's lack of counting ability.

Here's an analogy. Suppose that there's an isolated group -- call them the Nerdahã -- who just aren't interested in throwing things. They don't stone people, they don't have snowball fights, they don't play ball games, they don't skip flat rocks on the surface of ponds, they don't pitch pennies, nothing. There's no religious or moral prohibition against throwing, they just think it's boring and a bit stupid, when they bother to think about it at all, which is rarely.

As a result, Nerdahã kids don't ever practice throwing. Not for speed, not for accuracy, not for fancy effects like curving through the air, or coming down vertically through a hoop placed at a distance, or bouncing off in funny directions after landing. Therefore, when they grow up, they're hopelessly bad at throwing. In the parlance of pre-Title 9 America, they "throw like girls", an expression used not because women can't in principle throw very well indeed, but because girls traditionally didn't practice throwing skills.

Because of their complete lack of interest in throwing, the Nerdahã language is completely lacking in throwing vocabulary. They have no words for pitch, fling, chuck, toss, sidearm, slider, curveball, bouncepass, and so on. They have a verb [ˈpʊʃ] that they can use for propelling something a short distance through the air, as in [ˈpʊʃ.ɪɾˈo.vɚˈɦiɹ] "throw that to me", but the same verb can be used for any sort of ballistic transfer, including sliding over a flat surface, and even for propulsion in which the agent remains in contact with the item moved throughout its motion. In fact, this verb can even be used for trying to shift an object that remains immobile. And they have another verb [ˈsɛnd] that can be used for propelling something a longer distance through the air, but the same verb can also be used for any transfer that is more indirect.

Now an American psycholinguist comes to visit the Nerdahã. Her field linguist guide, who has been working among the Nerdahã for decades, mentions their lack of ball sports and their lack of throwing vocabulary, and the psycholinguist realizes that this is a marvelous opportunity to evaluate not only the Sapir-Whorf hypothesis, but also the idea that throwing is a human instinct, an innate module which may even have played a key causal role in the evolution of the hominid line. So she tests the throwing skills of the Nerdahã.

Well, you can figure out the rest for yourself. At a target distance of one or two feet, the Nerdahã do okay. As the target gets further away, projectiles start flying off in random directions at variable but generally low rates of speed. "Sheesh", says the psycholinguist to herself, "Whorf was right! Language does determine cognitive capacity. These people have only two words for throwing, and as a result, they can't throw for spit!"

There's something wrong with this story, don't you agree?

Posted by Mark Liberman at 12:12 PM

The French and the Piraha

The Piraha aren't the only people with an unusual approach to numbers. In the French of Southern France there are some odd goings on too. The French franc was revalued by a factor of 100 in 1960. In the South (I don't know why, but this seems not to have happened in the North) people continued to use Old Francs for some purposes, most notably for real estate and cars. A house listed for 40,000,000 F is actually selling for 400,000 NF. I haven't been there in a while and don't know to what extent this is still the practice. Anyhow, one day twenty-some years ago I was walking with a friend on his farm in the Dordogne. The previous day I had lent his mother 100 NF. He said to me:

T'as prêté 10,000 balles à maman. [You lent Mother 10,000 francs.]

Je viens de t'en rendre 30. [I just gave you back 30 francs.]

Ça laisse 7,000 balles que nous te devons. [That leaves 7,000 francs that we owe you.]

10,000 OF
-   30 NF
--------
 7,000 OF

The generalization seems to be that Old Francs are used for larger sums, with the boundary somewhere between 30 and 70.
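
For anyone who wants to check the mixed-unit subtraction, here is a minimal Python sketch. It assumes only the 100-to-1 revaluation mentioned above; the function and variable names are invented for illustration.

```python
# Assumption: 1 new franc (NF) = 100 old francs (OF), per the 1960 revaluation.
OLD_PER_NEW = 100

def to_old_francs(amount, unit):
    """Convert an amount given in 'OF' or 'NF' to old francs."""
    if unit == "OF":
        return amount
    if unit == "NF":
        return amount * OLD_PER_NEW
    raise ValueError(f"unknown unit: {unit!r}")

lent = to_old_francs(10_000, "OF")   # "10,000 balles", meant as old francs
repaid = to_old_francs(30, "NF")     # "30", meant as new francs
remaining = lent - repaid            # what is still owed, in old francs
print(remaining)
```

The sum only comes out at 7,000 if the 10,000 is read in old francs and the 30 in new francs, which is the point of the anecdote.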

Posted by Bill Poser at 05:45 AM

August 21, 2004

Scattered brains

A friend of mine wrote me an e-mail today in which she related a story about how absent-minded she's been lately, and signed the message "your scattered brain [friend]". After shuddering a little at the image, I Googled.

All together, the three variants of the adjective I'm familiar with -- "scatterbrain", "scatter brain" and "scatter-brain" -- result in 40,300 ghits. The equivalent variants of my friend's apparent neologism -- "scattered brain", "scatteredbrain" and "scattered-brain" -- all together result in only 731 ghits.

Update: Rich Alderson writes:

The usage with which I'm familiar is "scatter-brained", which Google offers up preferentially without the hyphen, even though there are half again as many occurrences with same:
scatter-brained   32,500
scatterbrained    23,200
So I find the "scattered brain" form fairly odd if not intentionally shifting the above forms.

In my late-afternoon, scatter-brained stupor, I had thought that Googling for "scatter-brain", etc. would result in hits for both "scatter-brain" and "scatter-brained". I was, of course, wrong.

Interestingly, a quick search for the variants of "scattered-brained" results in a whopping 503 ghits -- including things like:

(link) This is my complete downfall! I am the most unorganized scattered brained person when it cmes to my business!
(link) Sunday: Scattered-brained thunderstorms.
(link) It's an amazing tool/gift that God's given to us - writing - salvation for the scattered-brained.
(link) Yes, I succesfully paperchased us through the red tape of two governments and had managed to get us this far; however, I was too scattered brained to properly care for our passports!!

Go figure.

Some of these come from medical/academic or legal sources and have "scattered brain" in completely normal-sounding contexts like:

(link) Scattered Brain Infarct Pattern on Diffusion-Weighted Magnetic Resonance Imaging in Patients with Acute Ischemic Stroke. [Title of an article in the journal Cerebrovascular Diseases.]
(link) MRI images depicted brain activity clustered in the prefrontal cortex after mental training of the elbow flexor muscles, compared to a more random, scattered brain activity exhibited in the brain prior to mental training (Figure 1).
(link) Closed head injuries often times causes scattered brain injuries or damage to other areas of the brain.

Many others are examples that probably wouldn't have piqued my interest as much as my friend's e-mail, but are still interesting:

(link) Random thoughts from a scattered brain.
(link) Random notes from a scattered brain.
(link) RAVINGS OF A SCATTERED BRAIN.
(link) They are not organized in any way - I just add them as they come to my scattered brain.
(link) I have a very scattered brain.
(link) Just some thoughts from the scattered brain that is mine.

A handful of others are examples very much like my friend's:

(link) Had the police not been scattered-brain and thinking that they were invincible maybe things would've had a different ending.
(link) But I'm a scattered brain and have just started (3 days) taking 30mg Adderall to help me focus.

I even found it in a couple of interesting poems:

From Wednesday's Child (link)

Scattered brain
You've been crying in the rain
You've been drowning in your pain
And gonna die
Do the right thing
Ain't no loose
Don't confuse
Wednesday's child

From O Miss Already (link)

I'm just two feet from the highway.
A scattered brain shuffling a deck of cards.
But if you wanted me to, I'd give up my gambling
just to pull the weeds from your backyard.
I aint got a dollar, aint even got a dime,
but a millionaire I am in time.
O Miss Already, I want you
to spend yours with mine.

Posted by Eric Bakovic at 09:26 PM

The outlandish and the unanswerable

Bill Poser's response to my charge that he is wrong about English grammar comes in two parts. One is relevant, but looks outlandish to me — though it could in principle be supported. The other is irrelevant to grammar but unanswerably decisive on the legal point under discussion.

The outlandish bit is his claim that sentences like *Either the husband or the wife has perjured himself (I insist on the asterisk; he omits it) are grammatical.

Here on Language Log we generally don't play the "your dialect / my dialect" game. (Bill opens such a game when he says, "at least as far as my own judgments are concerned".) Instead we ask for evidence. If Bill sends me five attested Standard English sentences of this general sort (where the two different sexes are explicitly referenced and a form of he is used to refer back promiscuously to either one of them), I'll admit that I was wrong. My belief is that he won't be able to do it. This challenge makes it an objective matter what the grammaticality facts are, and so we don't need to inquire into the issue of whether Bill might be deluding himself about his own dialect to defend his own earlier claim from my charge of error.

The other thing he says is the thing he should have contented himself with. It's about the legal issue. The Interpretation Act of 1889 "provides that words importing the masculine gender shall include the females." That's unanswerable. It makes it a matter of law whether the Canadian Supreme Court was right about the Famous Five. The law can say that words covering road vehicles shall include boats if it wants to; it's in the stipulation business. That's what makes the Canadian Supreme Court judgment of 1927 completely wrong. My point was merely that the legal stuff can't alter the facts of English as ordinarily used, and I still think that on that score it is just a mistake that he is ever sex-neutral.

Posted by Geoffrey K. Pullum at 07:51 PM

Dead language, dead factor

Karen Rothkin emailed an interesting mis-hearing in the transcript of Mark Shields and Irving Kristol discussing Kerry's war record on the Jim Lehrer News Hour:

MARK SHIELDS: I'll tell you where it's going to be a factor, and that is John McCain, a man for whom I have enormous respect. John McCain has to make a choice. John McCain has become the dead factor running mate of George W. Bush.

The audio makes it clear that Mark Shields said "de facto running mate", but presumably the transcriptionist did not know the Latin expression de facto, and so was willing to substitute the rather low-probability string "dead factor" as the best available option. It's phonetically pretty reasonable: [ˈdeⁱ ˈfæk.toᵘ] is not far from [ˈdɛd ˈfæk.tɔɹ], especially since the final [ɹ] can be picked up from the beginning of the following words "running mate", and the effects of a final [d] on the preceding vowel are quite similar to those of a high front offglide.

We've recently seen another example of a transcription eggcorn from this same news show: "handfisted" for "hamfisted". I wonder whether the Jim Lehrer news hour gets their transcriptions done in India. Perhaps vocabulary from Sanskrit would be more accurately transcribed than bits of Latin are.

Posted by Mark Liberman at 06:05 PM

Update on the Germanspellingreformoppositionmovement

A little while ago, we had a flurry of posts on the German spelling reform (here, here, here, here), and Chris Waigl at serendipity followed up, and Trevor at kaleboel found satirical and politico-theoretical blues-historical angles [and this earlier post is also relevant, now that I'm correcting links 12/20/2005], and BebaManno mentioned it at Taccuino di traduzione, David Mortensen discussed it at It's Ablaut Time, Language Hat weighed in briefly, and Scott Martens wrote extensively about it at afoe and at Pedantry, and someone at Waffle commented on Scott's posts, and Des von Bladet deepened the cultural context by quoting a comment on the 1906 Swedish spelling reform, adding ominously "Ask me about Norwegish spelling reforms some dull and rainy decade, just ask".

Posted by Mark Liberman at 11:38 AM

Do-it-yourself classics

An outfit called 24 Hour Translations ("a premier German and Latin translation service") offers an online "list of some of the most commonly found Latin abbreviations and phrases".

The list of abbreviations and terms of art is pretty extensive, from ab aeterno to VRI (Victoria Regina et Imperatrix). They've also included a few mottos, proverbial expressions and other often-quoted fragments, like Dominus illuminatio mea and et tu, Brute, but that aspect of the list seems radically incomplete.

But it was something else that caught my eye. They offer "Latin Translations from only $16!" and what they mean by this is that "you can order a translation of 20 words or less between English and Latin (as used by the Romans). The cost of this service is £10 GBP / $16 USD / €16 EUR, and the translation will be ready within 24 hours - we'll even include a simple guide to pronouncing the Latin phrase".

This seems pretty expensive. I guess it's within an order of magnitude of standard commercial translation rates, which tend to run in the range of US$0.10-$0.40 per word, depending on the nature, size and urgency of the job. But if all you wanted to know was what the Romans meant by quoting Hannibal to the effect that "inveniemus viam aut faciemus", you'd be paying $4.00 per word. (Since 24 Hour Translations says that "Due to problems encountered in the relocation of our office to Austria, we have had to suspend normal trading until further notice", I'll tell you for free that Hannibal meant "we will find a way, or we will make one").

One of the nice things about dead languages is that they're finite, so that in principle, translations (of the extant genuine texts) can be done by table look-up. This is more or less true as a practical fact for classical Latin -- if you're puzzled by a phrase, you can go to the Perseus web site (or better, one of its less busy mirrors), and look it up. Unfortunately, Perseus doesn't seem to offer indexing by word sequences, but only boolean combination of words, which makes it a bit harder. But still, if someone has quoted a bit of Latin at you without translation -- say "forsan et haec olim meminisse iuvabit" -- Perseus will locate it for you as 1.198 of the Aeneid, and will offer you the ability to look the words up individually, and puzzle out that it means something like "perhaps one day even these things will be pleasant to recall". Perseus will also offer you John Dryden's translation

An hour will come, with pleasure to relate
Your sorrows past, as benefits of Fate.

thus suggesting that Dryden was an uninspired translator as well as a culpably bad grammarian. Perseus will also inform you that Theodore Williams translated this phrase as

It well may be
some happier hour will find this memory fair.
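
The difference between a boolean combination of words and indexing by word sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-entry corpus, its labels and the function names are made up here, and nothing below reflects how Perseus is actually implemented.

```python
import re

# Hypothetical corpus: the real verse plus an invented scrambled copy of it.
LINES = {
    "verse": "forsan et haec olim meminisse iuvabit",
    "scrambled": "iuvabit olim forsan meminisse haec et",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def boolean_match(query, text):
    """True if every query word occurs somewhere in the text (boolean AND)."""
    return set(tokenize(query)) <= set(tokenize(text))

def phrase_match(query, text):
    """True if the query words occur in the text as one contiguous sequence."""
    q, t = tokenize(query), tokenize(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

query = "olim meminisse iuvabit"
boolean_hits = [name for name, text in LINES.items() if boolean_match(query, text)]
phrase_hits = [name for name, text in LINES.items() if phrase_match(query, text)]
print(boolean_hits)
print(phrase_hits)
```

The boolean search returns both entries, since each contains all three words, while the phrase search returns only the entry where they appear in order. That extra sifting is what makes a boolean-only index "a bit harder" to use.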

Turning from Perseus to Google, you can learn that forsan et haec olim meminisse iuvabit is the motto of the Complexity, Theory and Algorithmics Group at the University of Liverpool, who suggest the translation "Perhaps, one day, even this will seem pleasant to remember", and add that the motto "indirectly links the group with the City and University of Liverpool. The City motto (Deus nobis haec otia fecit, 'God has provided this leisure for us') is also taken from Virgil (Eclogue I, l.6) and is answered in the University motto (Haec otia studia fovent; 'This leisure makes our studies flourish')."

Google will also lead you to the story of Donald MacLeod's snuffbox, and why it had olim haec meminisse iuvabit engraved on the cover.

At this point, you could go out and buy a round or two for your friends with your savings -- $48 so far -- or you could start with another phrase -- say "si volet usus / quem penes arbitrium est et ius et norma loquendi", and try to build your classical capital to the level required for a good dinner and a show.

Posted by Mark Liberman at 11:34 AM

Generic He?

Geoff Pullum argues that the Supreme Court of Canada were correct in considering he to be an exclusively masculine pronoun and that I was therefore wrong to criticize them for their reasoning in the Persons Case. English is not my area of expertise, so I would normally defer to Geoff, but I am a native speaker of English, and at least as far as my own judgments are concerned, he simply has the facts wrong. His examples:

Either the husband or the wife has perjured himself.

and

Was it your father or your mother who broke his leg on a ski trip?

which he takes to be ungrammatical, are grammatical for me.

In any case, this is not merely a matter of grammatical analysis, but of the construction of statutes, where principles other than, and in addition to, those of grammar may be relevant. Of relevance here was the Interpretation Act of 1889, which in section 1, sub-section 2

provides that words importing the masculine gender shall include the females.

as the Law Lords put it in their opinion. This is of course a restatement of section 4 of Lord Brougham's Act of 1850.

Posted by Bill Poser at 02:59 AM

Whose English?

According to a CanWest News Service article by Tom Spears entitled Canajun not good enough, eh?, (Vancouver Sun August 20, 2004, p. A12), the British government is now requiring Canadians, Americans, Australians, and New Zealanders applying for British citizenship to undergo an evaluation of their English proficiency. An Australian woman with two university degrees in English has already failed the test.

Posted by Bill Poser at 02:46 AM

August 20, 2004

One, two, many -- or 'small size', 'large size', 'cause to come together'?

According to this 8/20/2004 Reuters story, Peter Gordon from Columbia University spent a few months in the Amazon with Dan and Keren Everett, testing the counting skills of the Pirahã, whose "words for numbers appear limited to 'one,' 'two' and 'many,' and the word for 'one' sometimes means a small quantity".

The article quotes Gordon as saying that "In all of these matching experiments, participants responded with relatively good accuracy with up to 2 or 3 items, but performance deteriorated considerably beyond that up to 8 to 10 items", despite the fact that "Piraha participants were actually trying very hard to get the answers correct, and they clearly understood the tasks."

Although I haven't asked Peter or Dan what they think, this story (though I suspect not its headline) seems to be a fairly accurate representation of their point of view. If so, it may be because the 400-word article contains about 250 words of direct quotations from Peter. From what I can tell, the article does manage to misspell the name of the tribe and its language, which doesn't matter much, and to state the central linguistic facts incorrectly, which does matter a bit, because it makes the situation seem less interesting than it really is. But in comparison to what one often sees, this is not too bad.

Dan's website links to an "abstract for paper in progress" called "On the absence of number and numerals in Pirahã". In the Documents and Papers section of Peter's website, there's a movie "[showing] a Pirahã participant using fingers and Piraha count words to enumerate quantities". There is also a wikipedia article on Pirahã.

The first small problem with the Reuters article is that the usual spelling for the language is "Pirahã", with a tilde over the final /a/, indicating that the vowel is nasalized. I suppose that Reuters has a policy against using diacritics in English-language articles, so this transduction is expected, and not relevant to the main point in any case.

The other problem with the article is a bit more interesting. As Dan's abstract explains, the relevant Pirahã noun qualifiers are actually /hóì/ (i.e. "hoi" with falling tone) meaning "small size or amount"; hòí (i.e. "hoi" with rising tone) meaning "large size or amount"; and then phrases like /bá à gì sò/ meaning "cause to come together". Dan chooses to call these "qualifiers" rather than "quantifiers" because he feels that they are not really "words naming numbers" at all. For example, someone who asks for "hóì fish" (if you'll forgive the mixture of Pirahã and English here -- see Dan's abstract for the real Pirahã phrase) is really asking for a small amount of fish, which could be one or two fish, or perhaps a small fish. It could not, crucially, be used to ask for one large fish. Someone who asks for "hòí fish" is really asking for "a {big/big pile of/many} fish".

Dan's point is that Pirahã is "lacking number and numerals entirely", and that "the very concept of counting is foreign to the Pirahãs". At the same time, their language does have a distinction between count and mass nouns, so that there is the equivalent of the English difference between "{many/*much} foreigners" and "{*many/much} manioc meal". Again, see Dan's abstract for the details.

This is quite different from -- and more interesting than -- the Reuters statement that "their words for numbers appear limited to 'one,' 'two' and 'many,' and the word for 'one' sometimes means a small quantity".

As for the article's headline -- "Tribe has best excuse for poor math skills" -- I guess the kindest thing to say would be that it attempts to relate the topic to the life experience of the audience.

But compared to some previous language-related stories, this one seems pretty good.

[Update: this article in the Toronto Globe and Mail is much better than the Reuters story, and also (unlike the Reuters story) mentions the new article in Science that is the reason for the item in the first place. The article in the Telegraph features explicitly Whorfian reactions from a variety of researchers, quoted from the news piece by Constance Holden in Science -- but the Telegraph mistakenly says that the article was published in Nature.

If you have a subscription to Science, you can read Holden's discussion (COGNITION: Life Without Numbers in the Amazon. Science 2004; 305 (5687): 1093a, in News of the Week), and the Gordon article itself.

In fairness to Reuters, the "one-two-many" mistake is strongly encouraged by Gordon's Science article, whose abstract reads:

Members of the Pirahã tribe use a "one-two-many" system of counting. I ask whether speakers of this innumerate language can appreciate larger numerosities without the benefit of words to encode them. This addresses the classic Whorfian question about whether language can determine thought. Results of numerical tasks with varying cognitive demands show that numerical cognition is clearly affected by the lack of a counting system in the language. Performance with quantities greater than 3 was remarkably poor, but showed a constant coefficient of variation, which is suggestive of an analog estimation process.

]

[Update #2: The author of the entangledbank weblog emails with more flavorful Pirahã lore:

You don't reference this article in your latest Language Log: http://lings.ln.man.ac.uk/info/staff/DE/cultgram.pdf Everett discusses not just numerosity but other astonishing claims about Piraha language/culture: no embedding, no quantification, no creation legends, no fiction, no deep memory, no colour terms, pronouns borrowed, simplest kinship system, no relativization, no perfect. If even part of this is true it's a huge challenge to conventional wisdom. None of it reads as obviously loony, but I have to wonder whether he's some Borgesian fantasist, or some Margaret Mead being stitched up by the locals, because this is weird beyond most parameters. His glosses are highly suspect, with a deliberate Bloomfieldian strangeness. And the business about nominalized clauses is really shonky: by that standard any non-finite clause (I like to-go, I like ski-ing) is nominalized, depending on how alienly you translate the morpheme.
I have suggested that the one/two versus small/big distinction is not completely unlike English 'a'/'a couple of'. It's culturally conditioned by what artefacts are available and expected: 'a' basically means 'one', but if you're hammering and ask someone for a nail there's no strong pragmatic violation in being given several. Likewise 'a couple of' for many people means 'several', 'two or so'. So we have a precedent for numerals also having a less definite quantificational aura.

I agree that the lack of a number system seems to be the least of the interesting claims here. But I should add that Dan Everett is a skilled and reliable observer, and a number of other excellent linguists have spent time with the Pirahã in his company. So the Borgesian fantasy theory can be dismissed out of hand, I think. And it's hard to see why the Pirahã would have any motivation to play an elaborate linguistic and cultural joke on Dan (and everyone else) over so many years; and even harder to imagine how they could carry it off.

One point worth stressing, though, is that Dan Everett's account of what is going on is opposite to the way that Gordon and others are stating things: they're talking about the influence of language on thought, but Dan's discussion is mainly about the influence of culture on language. Specifically, he argues that

... these apparently disjointed facts about the Pirahã language -- gaps that are very surprising from just about any grammarian's perspective -- ultimately derive from a single cultural constraint in Pirahã, namely, to restrict communication to the immediate experience of the interlocutors, as stated in (1):

(1) PIRAHÃ CULTURAL CONSTRAINT ON GRAMMAR AND LIVING:
a. Grammar and other ways of living are restricted to concrete, immediate experience (where an experience is immediate in Pirahã if it has been seen or recounted as seen by a person alive at the time of telling).
b. Immediacy of experience is expressed by immediacy of information encoding -- one event per utterance.

]

[Update 8/26/2004: a fascinating set of comments by Dan Everett, introduced by Geoff Pullum, is here. ]

[Update 8/27/2004: and comments by Peter Gordon, author of the Science article, are here]

Posted by Mark Liberman at 10:16 AM

August 19, 2004

Eggcorn and malaprop among the flowers

Mark Liberman presents us with a shock-jock-filled bouquet of ache-corns taking off on the expression chock-full or chock-filled. Back in April I came across yet another (very common) variant, chocked full, in which the original chock is interpreted as if it were a reduction of a past participle (as in bake beans for baked beans), which is then "fixed" by having the past participle suffix restored. Along with this eggcorn came a classical malapropism as well.

On 29 April 2004, I posted the following (slightly edited here) to the American Dialect Society mailing list:

==========

While googling on "Zwicky Lederer", to see if my Prescriptivism and Usage website files have gotten into the system, I discovered a review of Spencer & Zwicky, Morphological Theory, that I hadn't seen before. On amazon.com, the only review there, by someone billed as "verafides, a Real, Live Linguist". (Verafides also has a list of favorite books in linguistics, and S&Z gets in there too.)

Well, it's a bouquet of flowers for Andy Spencer and me (and our many contributors). This is immensely gratifying, of course. And, as a bonus, there's a malaprop. From the review, which begins with five stars:

* * * * * What a pointless review this is about to be...

You know why nobody has ever reviewed this book on Amazon? Because shoppers interested in a gigantic collection of academic papers on morphological theory are already AWARE of what it is, and don't need to be told about it. And anyone else will never, in fact, look at this review. So it's entirely a bizarre anachronism -- a review that nobody will read, that has nothing useful to say.

This is, of course, a wonderful compilation of papers on morphology. It's chocked full of data, and tons of careful analysis...

[more praise]

But you probably already know this. If you didn't, you wouldn't be looking at this book -- you'd be off digging up a used copy of "M is for Mush-for-Brains" by Sue Grafton-Higgins Clark. And then you wouldn't have any clue what I'm talking about, and probably too busy being led astray by William Safire or Richard Lederer to bother trying to find out...

I left the last part in so you can see why this came up in a "Zwicky Lederer" search. (In another review, verafides savages Ehrlich & Lederer, The Highly Selective Dictionary For The Extraordinarily Literate.)

My interest was piqued (or, as some say, peaked, or peeked) by the word "anachronism", which certainly isn't the right one for the job verafides used it for. "Anomaly", maybe? (Malaprops tend to set off the tip-of-the-tongue phenomenon, alas.)

Yes, I was dubious about "chocked full" too. Google has about 18,700 web hits on it. "Chock full", with about 207,000 web hits, beats it all hollow, but 18,700 is not a negligible number.

And, no, I don't know who verafides is.

The review was posted on June 10, 2003, so it's not exactly hot news. I've been kind of out of the loop...

==========

That was the story at the end of April. Meanwhile, eggcorns have been rolling in in these parts. Yesterday I caught the following in The Advocate of 31 August 2004, "Insurance insecurity", by Jeremy Quittner, p. 44: "Joe says that he was given an HIV antibody test that came back negative. Since he lacked health insurance, he did not have access to the more costly viral lode test."

It makes some kind of sense; the test checks out the lode, or hoard, of HIV within your body.

And now, a few minutes ago, Language Log reader Max Vasilatos, of San Francisco, arrived to have lunch with me and offered two eggcorns that she'd come across: statue of limitations (13,300 Google hits, many in legal contexts) and taking for granite (only 30 hits, but some of them apparently genuine).

The beet goes on.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 05:42 PM

August 18, 2004

Canada Supreme Court gets the grammar right

It grieves me deeply to defend the benighted and probably sexist dimwits of what must have been Canada's most stupid collection of Supreme Court justices ever, but I'm afraid I have to. The judicial nitwits who ruled that women are not persons were right on a point of grammar, and Bill Poser has it wrong.

People make some cruel jokes about Canada. (I'm thinking of Ambrose Bierce's entry for "Man" in The Devil's Dictionary, which describes Homo sapiens as a species which "infests all habitable parts of the globe, and Canada"; and a joke David Perlmutter told me about how it was originally dreamed that Canada would be a country that married American efficiency with British culture and French cuisine, but through a terrible error it ended up with American culture, French efficiency, and British cuisine.) One might have thought we should just be grateful that Bill Poser had pointed out another thing about poor Canada that we could mock: it had a Supreme Court that (at least in 1927) was too stupid to figure out that when statutes said "if any person shall... then he shall..." they might mean the he to cover women too. They actually ruled that those occurrences of he (the forms he, him, his, and himself) meant that women were being assumed not to be persons, and had to be overruled by the Judicial Committee of the Privy Council in London, those guardians of the rights of women. "You'd think that the Justices of the Supreme Court would have been clever enough to recognize that he was used generically, not specifically in reference to men, wouldn't you?", asks Poser, in a rhetorical question biased toward a positive answer.

Well, no. It's my duty to report that The Cambridge Grammar of the English Language takes the position that he is never generic, i.e., sex-neutral. Chapter 5, by Rodney Huddleston and John Payne (see page 492), talks about "Purportedly sex-neutral he", and on page 494 they give evidence that it just isn't true that this pronoun may be used in a sex-neutral way: if it could, then there would be nothing at all wrong with saying

*Either the husband or the wife has perjured himself.

But that's a grammatical catastrophe, or a silly joke. One couldn't possibly think that was normal usage. Likewise with

*Was it your father or your mother who broke his leg on a ski trip?

That is not how we say things in English. (The commonest way to get around the gender problem here is to use singular they: Was it your father or your mother who broke their leg on a ski trip?; Either the husband or the wife has perjured themself. Shakespeare used it; Jane Austen used it; loads of fine authors use it. Get used to it. And if you have a usage book like Strunk and White that declares singular they to be an error, throw that book away.)

Yes, I grant you the Canadian Supreme Court were a bit myopic on the legal issue (women were voting already, as Poser points out; if they weren't voting as persons, then what were they, some sort of electorally competent barnyard animal?). But by The Cambridge Grammar's well-substantiated account they had it right on the grammar, and Poser has it wrong. Anyone who thinks the word he has a sex-neutral use is kidding themself. When someone says The person chosen as provost will need to know his stuff, they are talking as if the person chosen as the new provost will be a man. If you're not assuming that, don't use he.

Posted by Geoffrey K. Pullum at 09:18 PM

Terence in the Global Lunchroom

Bill Poser's post on the Persons Case echoes, by accident, Chris Waigl's 8/15/2004 post at serendipity on how to translate the Latin word homo in Terence's famous line "Homo sum: humani nil a me alienum puto". That in turn reminds me of Naomi Chana's 10/15/2003 post at Baraita, which takes on the question of how to translate alienum:

Terence said it first, Cicero quoted him, then the late medieval humanist tradition picked it up and ran with it into the Enlightement: "Homo sum; humani nihil a me alienum puto." That second clause is usually translated as "nothing human is foreign to me," but I'm not sure I buy that as a translation of puto alienum. The verb puto, putare is mostly used in a figurative sense: "to judge, suppose, account, suspect, believe, think, imagine." And the form is, of course, the first-person singular present active indicative. If I had to translate Terence, I'd go with "I consider nothing human to be foreign" -- or, more loosely, "I like to think everything's my business." It's a theory, not an established fact.

Still, I like the theory. It underlies the suspicion I inherited -- from my father the history teacher, but ultimately from the Enlightenment -- that one can learn more by hearing about an issue from as many informed perspectives as possible. I certainly don't think that "nothing human is foreign to me" ought to be an unexamined assumption in academic work, but it's not a bad way to build a blogroll, and for anyone who considers it too idealistic there's always "know your enemy," which amounts to the same thing in practice.

As a believer in Extreme Pluralism, I like this perspective (which I read last year, and found again today, via a characteristically quirky link in Desbladet). Naomi's post is mainly about the politics of academia, and the tension between informed commentary and anonymity, which is not the point here. But she also observes that Terence's line, generally quoted in support of non-judgmental disinterest, is originally a defense of meddling and gossip:

In the Terentian original, the assertion "I consider nothing human to be foreign" comes in answer to the question: "Chremus, can you take time from your own affairs to take care of someone else's, things that are completely foreign to you?"*** Or, more loosely: "Why are you poking your nose into someone else's business?" ... And Terence's exchange appears in a play called Heauton Timoroumenos, "The Self-Tormentor."

One of the things that I like about weblogs is that they add a new dimension to the virtual conversation that began when people started keeping a record, in cultural memory and then in shared texts, of their otherwise-ephemeral observations. Without the blogging medium, Bill and Chris and Naomi might have mentioned some of this stuff in a letter or a lunchtime conversation, but I never would have heard about it, and neither would you. And just as I can butt in with my own observation, so can you.

With millions of weblogs out there -- "3,547,057 weblogs watched", says Technorati -- no one is going to be even a passive part of all of these conversational strands at once. However, we each get to pick where to sit in the Global Lunchroom, and if we're bored or annoyed with the conversation at our virtual table, we can always assemble a new one. It's not a substitute for face-to-face interaction, or even for one-to-one exchanges by phone or mail, but it enriches one's life all the same.

Posted by Mark Liberman at 09:43 AM

Are Women Persons?

[Image: Emily Murphy]

I have to admit that I tend to be a bit skeptical of claims that "sexist language", such as the use of he as a generic, has a significant influence on people's thinking, but it is true that people can and will use gender-specific language to their advantage. I was reminded of this by the announcement that on October 17th Canada will issue a new $50 bill, which will bear on the back portraits of the Famous Five along with Thérèse Casgrain. If you're not Canadian, you probably don't know who any of these people are. Thérèse Casgrain (English version) was a human rights activist and politician who died in 1981. The Famous Five were Emily Murphy (shown above left), Louise McKinney, Irene Parlby, Nellie McClung, and Henrietta Muir Edwards. They are called the Famous Five because of their role in the Persons Case.

The Persons Case arose when Emily Murphy was named a magistrate in 1916. Some lawyers challenged her appointment on the grounds that she could not perform the duties of a magistrate because she was not a person. They based their argument on the wording of the British North America Act of 1867, the law that established the Dominion of Canada and served, in effect, as its Constitution. In modified form the BNA Act remains the Constitution of Canada as it was incorporated into the Constitution Act of 1982. The BNA Act used the word persons when referring to more than one person, and the word he when referring to a single individual, which led some to infer that as a matter of law only men were persons. In 1927 the Famous Five sought a declaratory judgment from the Supreme Court of Canada. One might think that the view that women are not persons was already an anachronism in 1927, as (white) women had already obtained the right to vote in 1918, due in considerable part to the efforts of Nellie McClung, but the Supreme Court ruled that indeed women were not persons. You'd think that the Justices of the Supreme Court would have been clever enough to recognize that he was used generically, not specifically in reference to men, wouldn't you? This was not the Supreme Court's finest hour. The Famous Five then appealed to the Judicial Committee of the Privy Council in London, which until 1949 served as the court of last resort for Canada in civil matters. On October 18, 1929, the Law Lords ruled unanimously that women were indeed persons.

Posted by Bill Poser at 03:25 AM

Chock, choke, chuck, check, chalk, jock, shock, chog: An ancient plantation of ache-corns

Tom Rossen emailed:

I'd like to offer an ache-corn (not an eggcorn, because I can't see any reasonable semantic interpretation):

(link) I had to publish the last three parts of Robb Hendrickson's (the CEO of Jellifish, www.jellifish.com) response all at once rather than stringing it out over three days because it is so good and chalk-filled with a rare combination of insight, analysis and candor that I can't sit on it.

Ouch. But as the inventor of Optimistic Programming, Tom, you should take a more hopeful view of the possibilities for fitting words together in new ways. Perhaps it's that chalk is so soft that you can mash pieces together so as to fit more into a full container?

Anyhow, it's not like chock means anything in particular to most people, when they encounter it in the phrase "chock full of" -- it's historically a variant of "choke(d) full", it seems, but with a long history of uncertainty dating back to Middle English. The OED sez:

The phonetic form and spelling and the derivation are alike unsettled, the uncertainty of the latter involving that of the former. In Dictionaries, first in Todd (1818) as choke-full (with mention of chuck-full as a ‘corruption’). Subsequent dictionaries have choke-full as main form, with chock-full as a recognized variant. But the American lexicographers have chock-full as the standard form, with choke-full as a cross-reference; and this appears to agree with literary usage in U.S. Choke-full appears to be rather the more frequent in literary use in England; but chock-full is almost universal in spoken use; chuck-full, in literary use bef. and after 1800, is now only dialectal.
The uncertainty begins with the first appearance of the word as chokke-fulle, cheke-fulle in the alliterative Morte Arthur, the spelling of which is very insecure.

Certainly "chalk filled" and "chalk full" are pretty common:

We bring you a selection from seven seasons chalk-filled with action, aliens, angst, and anomalies.
This place is chalk full of inspiring idea's from people chalk filled with awe inspiring talents.
It's free, and chalk-filled with goodies.
Because lo and behold every weekend seems to be chalk filled with drama.

This link is to her newsletter, chalk full of advice and tips to clear the paper clutter in your office and improve productivity.
UN institution established to promote regional development; chalk full of statistics and regional information.
The 51st Annual Dubuque County Fair is chalk full of great entertainment for the entire family.
Chalk full of schizophrenic arrangements, paranoia, alcoholic swagger, and sexual dementia, it fits perfectly with what the label's been up to.
This free report is chalk full of information that can help you live a healthier and more pain-free lifestyle.

But it doesn't end there. Just change one of the features in one of the word's phonemes to get jock (or one feature in each of two phonemes, for those who still distinguish caught and cot), and the hits keep on coming:

Now, we get to behold this 4-hour epic on a DVD release from 20th Century Fox Home Entertainment in a beautifully clean-up and THX-certified version on three discs. Two of them are reserved for the film itself while the third one is jock-full of bonus materials.
This book is jock-full of little-known secrets about using a Windows NT-based Web software server.
I don't want to spoil this story for the readers so I'll stop right here. I will say that it is a story jock full of adventure and surprises.
Stone paths lead you around this garden jock full of flowering plants.
This bag of treats is jock full of surf-instro, garage, punk, and good old rock and roll to make your party the best in town.
There are over 562 paying markets here for writers. Jock full of information and link to organizations, contests and writing seminars.
In a system jock-full of fishy hardware like the Vaio, this may even be the safer approach.
Each jar is jock-full of native fruit pieces and contains only 2 g of carbohydrates per one tablespoon (1/2 oz) serving!
The forums are jock full of documentation and offer an incredible wealth in additional tips and tricks.

Or change a different feature to get shock, and the beat goes on:

It'll be shock full of slaggin' news, updates, and anything else we fit into an e-mail!
The brown paper grocery bags we'd fill shock full of freshly picked veggies and take to relatives, neighbors, friends, and people in our church were numerous.
It is an adventure to read and shock full of erudition. ... Read carefully, it's shock full of subtle details. I'm sure I missed at least half of them.
ROCK is Glenn Hughes' best album in years, it's shock full of energy and melodic prowess.
This game is shock full of options- from configuring controls and settings, creating a personalized character to selecting ships for specific missions.
Sample the crisp spring rolls, greaseless and shock-full of spicy pork and carrot-studded vegetables, served on a bed of lettuce and cucumbers...
Yup, and now according to DVDFile they're all also supposed to be available in a box-set with an extra disc shock full of extras.
Still, this title featured full-length stories that were shock-full of giant insects, dinosaurs, mutated (?) cats, cavemen and so on.
Yet make no mistake about it - shock full of tales of lost love, broken bonds and heartbreak - The Last Resort is a blues record.
While the series is not shock full of eye candy, it does feature pleasant character designs (both female and male), beautiful backgrounds and fluid animation.

You can even get hits on substitutions that are not even words -- like chog -- though these are rarer:

Yes, I understand that the current Backlash enviroment is chog full of nasty Heel players, but a proper Face deck might make the mark in the correct conditions.

And of course there's chuck, choke and check, as the OED informs us.

Each collection is composed of five CDs check full of high resolution, broadcast quality scenes and clips.
Sign up to get your copy of Rubber Side Down, chuck full of news, interviews, bike tips, contests and more!
It is choke full of information that you can digest!

There's an all-too-obvious semantics for "jock full", and interpretations for "shock full" are not hard to imagine. But it's clear that you don't need to be choked full of meaning, in order to get started in the eggcorn business.

Posted by Mark Liberman at 12:59 AM

August 17, 2004

Journalist uses multiple logical operators correctly

From the San Jose Mercury News' Action Line for 8/16/2004:

Q: I want add my two cents and give thanks to all those police officers who ticket cars parked over sidewalks.

I have three small boys, and let me tell you, it is hard enough to get out on a walk without having to struggle to get the double stroller around poorly parked cars.

A: Far too many people still do not realize that parking over the sidewalk is illegal, dangerous and inconsiderate.

If Koko or N'kisi or Rico had said this, just imagine the excitement. I expect there'd be cover stories in Science, Nature, Newsweek, the Atlantic and People magazine. The amazing critter would probably get a gig on Letterman and be asked to host Saturday Night Live.

But Dennis Rockstroh, toiling away in the Action Line department at the SJMN, produces sentences like this all the time. Does anyone notice how deftly he manages the intricate interaction of far too many and still do not? No. Does he get his picture on any magazine covers? Not a chance. TV interviews? Nary a one.

Then one day, enervated by the constant toil of sorting through questions about city noise ordinances and minimum-stay requirements at state park campgrounds, he expresses frustration at some law about how gift certificates have to be either replaced or redeemed in cash, except that they don't have to be redeemed in cash, and he unwisely substitutes nothing for any in paraphrasing a 34-word legal phrase already crammed with four lexical negations, two modals, one not and a generous sprinkling of quantifiers. Whak! Snik! Ka-choom! Geoff Pullum and Bill Poser are on him like Spiderman on Kraven the Hunter.

It's not easy, being human.

Posted by Mark Liberman at 12:30 PM

Confusing to Whom?

Geoff Pullum has pointed out two putative examples of badly written laws, both, not coincidentally, based on columns by Dennis Rockstroh in the San Jose Mercury News. I wouldn't want to claim that laws are always beautifully written, but I'm afraid that in both these cases the fault lies with the journalist rather than with the lawmakers.

The first example concerns the law governing gift certificates. In his column Rockstroh writes:

For example, it says in section 1749.5 of the California Civil Code that any gift certificate sold after Jan. 1, 1997, is redeemable in cash for its cash value. Then in Section 1749.6 of the same code, the law says, "This section does not require, unless otherwise required by law, the issuer of a gift certificate to redeem a gift certificate for cash." Confused? Me, too.

The problem is that Rockstroh has mis-stated the meaning of Section 1749.5. Here is the actual text of Section 1749.5(b). I've highlighted the crucial bit that Rockstroh missed.

Any gift certificate sold after January 1, 1997, is redeemable in cash for its cash value, or subject to replacement with a new gift certificate at no cost to the purchaser or holder.

His restatement of Section 1749.6 (c)(2)(a) is accurate. It says:

(2) This section does not require, unless otherwise required by law, the issuer of a gift certificate to: (A) Redeem a gift certificate for cash.

I don't think that this is confusing at all. 1749.5(b) says that the gift certificate must be redeemed either with cash or with a new gift certificate. 1749.6(c)(2)(a) says that the redeemer need not give cash. There is no contradiction. The effect is to make the choice that of the issuer rather than that of the recipient of the gift certificate.
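For readers who like to see the logic spelled out, here is a minimal sketch of the argument as a truth table. The function name and the two boolean flags are my own illustrative assumptions, not anything in the Civil Code; the sketch treats 1749.5(b) as a plain disjunction and reads 1749.6(c)(2)(A) as saying only that cash is not obligatory.

```python
from itertools import product

def complies_with_1749_5b(cash: bool, replacement: bool) -> bool:
    """Hypothetical model of 1749.5(b): redeem in cash OR replace the certificate."""
    return cash or replacement

# Enumerate every combination of (cash given, replacement given)
# and keep the ones that satisfy the disjunction.
compliant = [
    (cash, replacement)
    for cash, replacement in product([False, True], repeat=2)
    if complies_with_1749_5b(cash, replacement)
]

# 1749.6(c)(2)(A) says cash is not required. That clashes with 1749.5(b)
# only if every compliant outcome involves cash.
cash_is_required = all(cash for cash, _ in compliant)

print(compliant)
print(cash_is_required)
```

Since one compliant outcome involves a replacement certificate and no cash, the disjunction does not make cash obligatory, and the two provisions can both hold.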

The second example is Rockstroh's statement:

The law says: that nothing "shall be so designed and installed that it cannot, even in cases of failure, impede or prevent emergency use of such exit."

He appears to be quoting Section 3215 (d) of the California Code of Regulations. Here is what it actually says:

Any device or alarm installed to restrict the use of an exit shall be so designed and installed that it cannot, even in cases of failure, impede or prevent emergency use of such exit.

Notice that there is no nothing in the original text, and indeed that in Rockstroh's quotation his nothing lies outside of the quotation marks. The law is perfectly sensible. The problem was created by Rockstroh's insertion of nothing into a clause that originally contained any.
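The effect of swapping nothing for any can be sketched with quantifiers over a toy set of devices. This is only an illustration under my own assumptions: each device is reduced to a single boolean, can_impede, and the two function names are hypothetical labels for the two readings.

```python
def original_rule(devices: list[bool]) -> bool:
    """'Any device ... shall be so designed that it cannot impede':
    every device must be unable to impede the exit."""
    return all(not can_impede for can_impede in devices)

def paraphrased_rule(devices: list[bool]) -> bool:
    """'Nothing shall be so designed that it cannot impede':
    no device may be unable to impede the exit."""
    return not any(not can_impede for can_impede in devices)

# Each entry is one device's can_impede flag.
safe_devices = [False, False]    # none of them can block the exit
unsafe_devices = [True, True]    # all of them can block the exit

print(original_rule(safe_devices), paraphrased_rule(safe_devices))
print(original_rule(unsafe_devices), paraphrased_rule(unsafe_devices))
```

On this reading the paraphrase demands exactly the installations that the regulation forbids, which is why the quoted version sounds absurd while the original does not.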

Posted by Bill Poser at 05:17 AM

August 16, 2004

New intensifiers

In reference to my post on bu?? naked, Dan Chiavelli emailed:

In the realm of the evolution of slang involving the word "butt" as a modifier, I have an amusing anecdote for you. A friend of mine, who originally comes from the Seattle area, used to use the word "butt" as an intensifier, like "that's butt expensive". I understood this intuitively, as did others in our circle of friends.

I found this fascinating and listened to his speech for some kind of clue of this phrase's origin, and I finally heard him say the phrase "butt ugly" with the same intonation as "butt expensive", namely stressing the "butt" (whereas I consider normal intonation to stress both words more or less equally, if not to give primary stress to the first syllable of "ugly").

I assumed this was back-formation (no pun intended) on his part from "butt ugly". What do you think? Ever heard this before?

This one is new to me, but apparently it's out there:

(link) I'm wondering if anyone knows of any sites that sell custom body jewelry, specifically for eyebrow, surface, and deep tissue piercings. I'm going to get more work done and am wondering if my ideas are way out of my price range or not, as custom jewelry tends to get butt-expensive.
(link) Finally, I've found my HG blush! the formula and the color are goof-proof; this stuff is hard to screw up. will definietly buy this again even though it is butt-expensive.
(link) It's kind of depressing because all townhouses in Howard county are butt expensive.

(link) I have seen this person sell many I-grade stones but this one looks nice and is butt-cheap and an interesting cut if anyone is in the market.
(link) I think the touchpad was the only thing keeping the entire lineup of iPaq from looking butt cheap.
(link) These things are everywhere and rightly so since they are butt cheap and cool pretty darn well.

(link) Like so many butt-stupid movies that have come before it, Swimfan is, well, butt- stupid.
(link) The world is awash with butt-stupid ideas.
(link) I know this is a butt stupid question, but I have never had this thought enter my mind.

Dan's hypothesis about its origin makes sense, though there may also be some resonance from kick-butt, which can be used directly as a nominal modifier ("kick-butt hummus", "kick-butt publicity hound", "kick-butt workbench") but also as an intensifier of other modifiers:

(link) I just want to say France is kick butt beautiful!"
(link) I just had a look at your stuff and you are kick butt good for your age!!
(link) Siu long bao is in my view the epitome of the art as well as being kick-butt tasty ...

From a bit of looking around, it seems that plain butt as an idiomatic intensifier is mainly used with qualities that are negative, or at least can be seen as negative. Thus "butt expensive", "butt cheap" (even though that is sometimes good), "butt stupid", but not (as far as I can tell from googling) "butt intelligent" or "beautiful" or "good". By contrast, kick-butt seems mainly to go with positively-evaluated properties.

It looks like "kick-ass" and "ass" are roughly equivalent to "kick-butt" and "butt":

... right now Columbia is overrun with millions of ass-ugly cicadas ...
Have fun, it's kick ass beautiful up there.
DysFunkshun's set was kick-ass good and had the small, intimate crowd dancin' hard all night long.

though here the distribution may be affected by the influence of the pseudo-affix -ass that attaches fairly freely to adjectives: "silly-ass", "long-ass", etc.

Posted by Mark Liberman at 06:29 PM

Nothing that cannot impede even by failure

In the Actionline column in the San Jose Mercury News — the very same one quoted in my previous post — Dennis Rockstroh provides some information about the content of the law governing access to and locking of emergency exits from public spaces:

The law says that nothing "shall be so designed and installed that it cannot, even in cases of failure, impede or prevent emergency use of such exit."

I don't think so! This time I think Dennis has it wrong. We have crazy laws here in California, but not this crazy. To require that nothing be designed in such a way that it cannot (even if it fails) impede use is to require that everything be designed in such a way that it can impede use (including in cases where it fails). That is, the way Dennis has it, under California law you must lock all emergency exits. I don't necessarily think this isn't a case of what we haven't failed to refer to on Language Log as overnegation. Why does my head hurt?

Posted by Geoffrey K. Pullum at 06:26 PM

Redeemable in cash

Section 1749.5 of the California Civil Code states that any gift certificate sold after January 1, 1997, is redeemable in cash for its cash value. Section 1749.6 continues, "This section does not require, unless otherwise required by law, the issuer of a gift certificate to redeem a gift certificate for cash." Dennis Rockstroh, in the San Jose Mercury News Actionline column (Sunday, August 15, 2004, page 3B), asks: "Confused? Me, too." He's right, I think. We should be confused.

Among the things that baffle me are: (i) To which section does "this section" refer? (ii) Could "otherwise" refer to the other section? (iii) Is "require ... the issuer of a gift certificate to redeem a gift certificate" supposed to mean "require ... the issuer of a gift certificate to redeem it", and if so, why didn't they say that grammatically? (Think about it. "I have a Corvette and I'm selling a Corvette" suggests there are two different Corvettes, right?) (iv) Why does my head hurt? (v) Is it fair that you can be sent to jail for breaking a law that no one can understand? (vi) If this section doesn't require redemption in cash unless it is otherwise required by law, does that mean (I hope you follow this) that if redemption in cash is otherwise required by law then this section changes its effect and does require cash redemption? (vii) Why do they allow state lawmakers to write slop like this instead of requiring them to work under the supervision of trained linguists?

Posted by Geoffrey K. Pullum at 06:09 PM

August 15, 2004

Binding time

Hanna Wallach emails an eggcorn:

Just found this one in the LiveJournal of a programming languages researcher:

"Binding my time," instead of "biding my time."

The bide in "biding one's time" is a verb that (according to the American Heritage Dictionary) means (in the transitive form) "To await; wait for". So the standard expression "bide one's time" means "to wait for the (right) time (to do something)".

But outside of this expression, and a few even more frozen fragments like the cutesy name "Bide-a-wee", bide is an obsolete word. As an alternative, bind is a common verb that is phonetically very similar to bide -- the /n/ will generally be realized in fluent speech only as nasalization of the preceding vowel -- and several of its meanings sort of fit. Among the transitive meanings of bind

1. To tie or secure, as with a rope or cord. 2. To fasten or wrap by encircling, as with a belt or ribbon. 3. To bandage: bound up their wounds. 4. To hold or restrain with or as if with bonds. 5. To compel, obligate, or unite: bound by a deep sense of duty; bound by a common interest in sports. 6. Law To place under legal obligation by contract or oath. 7. To make certain or irrevocable: bind the deal with a down payment. 8. To apprentice or indenture: was bound out as a servant. 9. To cause to cohere or stick together in a mass: Bind the dry ingredients with milk and eggs. 10. To enclose and fasten (a book or other printed material) between covers. 11. To furnish with an edge or border for protection, reinforcement, or ornamentation. 12. To constipate. 13. Chemistry To combine with, form a chemical bond with, or be taken up by, as an enzyme with its substrate.

it's plausible to think that when you "bide your time", what you're really doing is "holding or restraining" something. And among the intransitive meanings, you might think that you're "sticking or becoming stuck":

1. To tie up or fasten something. 2. To stick or become stuck: applied a lubricant to keep the moving parts from binding. 3. To be uncomfortably tight or restricting, as clothes. 4. To become compact or solid; cohere. 5. To be compelling or unifying: the ties that bind. 6. Chemistry To combine chemically or form a chemical bond.

As for the "[one's] time" part, it might be construed as an adverbial of temporal extent rather than a direct object, as in Shakespeare's "poor player, That struts and frets his hour upon the stage". Alternatively, you could imagine somehow binding strands of time into a sort of cord, like Clotho spinning out the thread of mortal life.

The "bind one's time" eggcorn is fairly common in some of its variants:

	bide	bind	biding	binding
my	7,900	53	13,700	726
your	7,310	385	1,900	10
her	1,810	57	3,500	1
his	7,650	14	13,800	24
our	3,020	2	2,030	7
their	11,100	15	12,600	13

Thus bind/binding my/your/her/his/our/their time has a total of 1307 whG ("web hits on Google"), or 305 whG/bp ("web hits on Google per billion pages").
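
For anyone who wants to check the arithmetic, here is a minimal sketch that re-adds the table. The counts are copied from the table above; the function names are just illustrative, and the index size is not a figure stated in this post but something back-derived from the two totals (1307 whG and 305 whG/bp), so treat it as an assumption.

```python
# Hit counts copied from the table above: one row per possessive,
# one entry per verb form.
HITS = {
    "my":    {"bide": 7900,  "bind": 53,  "biding": 13700, "binding": 726},
    "your":  {"bide": 7310,  "bind": 385, "biding": 1900,  "binding": 10},
    "her":   {"bide": 1810,  "bind": 57,  "biding": 3500,  "binding": 1},
    "his":   {"bide": 7650,  "bind": 14,  "biding": 13800, "binding": 24},
    "our":   {"bide": 3020,  "bind": 2,   "biding": 2030,  "binding": 7},
    "their": {"bide": 11100, "bind": 15,  "biding": 12600, "binding": 13},
}


def total_hits(hits, forms):
    """Sum the counts for the given verb forms over all possessives."""
    return sum(row[form] for row in hits.values() for form in forms)


# Eggcorn variants: "bind/binding X time".
eggcorn_whg = total_hits(HITS, ("bind", "binding"))

# Standard variants: "bide/biding X time".
standard_whg = total_hits(HITS, ("bide", "biding"))

# Assumption: the index size is inferred from the post's own figures
# (total whG divided by whG/bp), not taken from any outside source.
WHG_PER_BILLION_PAGES = 305
implied_index_billions = eggcorn_whg / WHG_PER_BILLION_PAGES

# Share of all "X time" hits in the table that are the eggcorn form.
eggcorn_share = eggcorn_whg / (eggcorn_whg + standard_whg)

if __name__ == "__main__":
    print(f"eggcorn whG:   {eggcorn_whg}")
    print(f"standard whG:  {standard_whg}")
    print(f"implied index: {implied_index_billions:.2f} billion pages")
    print(f"eggcorn share: {eggcorn_share:.2%}")
```

The eggcorn total comes out at 1307, matching the figure above, and dividing by 305 implies an index of a bit under 4.3 billion pages.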

This eggcorn might have a special appeal for a programming languages person, since it resonates with terms like "delayed binding".

Somewhat surprisingly, the LION database only finds one poem where something binds time. It's a 1990 work by Robin Becker entitled The White Place, which starts

         Bands of gray and rose bind Time in stone.
                        Easy to lose yourself
                             among enormous white tears
                             of spiraling rock.
		  You walked here, swallows left their tracks
                         in the air.

Right now I'm in the Denver airport, concourse A, binding my time while waiting for a plane to San Jose. Since you're reading this, I guess I've been binding your time too.

Posted by Mark Liberman at 06:34 PM

In the Arms of Another Man?

In his resignation statement last week, Gov. James McGreevey admitted that he had had "an adult consensual affair with another man," a phrasing that came up in a lot of stories on the scandal. In the Week in Review section of today's New York Times, Adam Nagourney writes:

But when it came to the most distinguishing aspect of Mr. McGreevey's affair - that it was with another man - the governor seemed neither remorseful nor regretful.

I didn't notice anything odd about the construction before it was pointed out to me in an email from Mr. Gary Apter that another was hard to figure here -- I mean, would that be equivalent to "I had an affair with a man other than myself"? (It's a safe bet that when McGreevey told his wife about his dalliance, he didn't say, "I've been seeing another man," which would have added insult to injury.)

Maybe this slips by because an admission by McGreevey that "I had an adult consensual affair with a man" would seem to locate the transgression in the homosexual connection, rather than the affair itself. Or maybe it's because of the parallel to "another woman." I'll leave it to another linguist to sort this one out.

Posted by Geoff Nunberg at 12:50 PM

Bu?? naked

Robert Johnson emailed to ask

I really enjoy reading your 'eggcorn' entries on the Language Log. I was wondering whether 'butt naked' vs. 'buck naked' is one. I’ve mostly heard African Americans use the former, while I always have used the latter. My wife, however, insists the former is 'right'. What do you make of it?

I didn't know the answer, and it looks like no one else does either. There are two stories out there. One story is that the original is "butt naked" and that "buck naked" is either a euphemism or a mishearing. The other story is the same, with the roles reversed.

The American Heritage Dictionary has an entry for "buck naked" whose etymology field says

buck- (perhaps alteration of butt) + naked.

The AHD's entry for "bare naked" adds more information in a regional note:

The chiefly Northern U.S. expression bare-naked illustrates the linguistic process of redundancy, not always acceptable in Standard English but productive in regional dialect speech. A redundant expression combines two words that mean the same thing, thereby intensifying the effect. The expression buck-naked, used chiefly in the South Atlantic and Gulf states, is not as clear as bare-naked with respect to its origin; buck is possibly an alteration of butt, “buttocks.” If so, bum-naked, heard in various parts of the country, and bare-ass(ed), attested especially in the Northeastern U.S., represent the same idea.

In Common Errors in English Usage, Paul Brians claims that "butt naked" is a mistake:

The standard expression is 'buck naked,' and the contemporary 'butt naked' is an error that will get you laughed at in some circles. However, it might be just as well if the new form were to triumph. Originally a 'buck' was a dandy, a pretentious, overdressed show-off of a man. Condescendingly applied in the U.S. to Native Americans and black slaves, it quickly acquired negative connotations. To the historically aware speaker, 'buck naked' conjures up stereotypical images of naked 'savages' or—worse—slaves laboring naked on plantations. Consider using the alternative expression 'stark naked.'

but I'm not sure he has any evidence for his assertion that "buck" is standard and "butt" is error: perhaps in this case "standard" just means "what they say where I come from."

"The Mavens' Word of the Day" at Random House has a different story, or rather a different set of stories, about the origins of "buck naked". The discussion doesn't mention "butt naked", but implies by omission that "buck naked" was an original development:

Buck naked, slang for 'completely naked' came on the scene in the late 1920's, and the qualified buck-ass naked a bit later. It's one of those terms which is most often accompanied by the irritating phrase "of obscure orig." or "origin unk." Given the preceding array of choices, one might hazard (as only one of my sources did) that the buck in buck naked refers to the color of buckskin, along the lines of "buff," as in "in the buff." But, while we're conjecturing, I might propose another possible etymology. Around the same time that buck naked was making its debut, so was another slang term, bucket, for 'buttocks, rump.' Shorten bucket to buck, and you've got a term for 'ass-naked,' which makes sense in a very, erm, transparent way.

The OED is uncharacteristically silent: neither "buck naked" nor "butt naked" appears, as far as I can tell.

Google finds 46,400 pages for "buck naked" and 210,000 for "butt naked", which is close enough to even to make it clear that both are well established.

FWIW, the AHD story sounds reasonable to me -- and Brians' notion that "buck" is interpreted by some as a reference to "savages" also sounds reasonable. This would mean that switches in both directions have had some euphemistic motivation, as well as the usual phonetic similarity and semantic resonances. There are apparently regional and perhaps ethnic differences in preferences for one expression or the other.

So which is "right" and which is "wrong"? That question can be interpreted to mean "which is the standard expression?" In this case, there's no clear answer. The question can also be interpreted to mean "which is the original expression?" There's no clear answer to that one either, though if the origins are really in the 1920s, someone may find some evidence eventually. The question can alternatively be interpreted to mean "which expression will make people think I've made a mistake, or will offend them for some other reason?" The answer to that one is "both, depending on the audience and the context". So you might as well do what comes naturally.

Posted by Mark Liberman at 10:33 AM

August 14, 2004

"Schopenhauer's Debate Camp"

Responding to my posts of August 6 and August 9 on "rhetorical flypaper", Glen Whitman at Agoraphilia has some astute observations on rhetorical strategies in formal debate. Over the past few years, I've met several undergraduates who participate in such competitions, and it's occurred to me more than once that these events are an untapped resource for people interested in discourse and communicative interaction. In many cases, I believe that the participants would be happy to have the proceedings recorded and published as digital transcripts linked to digital audio (and video). There are several different kinds of debates, but each is a relatively stereotyped form of interaction, within which there is nevertheless a great deal of variation in content as well as in effectiveness. There is also an audience interested in any analytic results. So collecting, publishing and analyzing a "debate corpus" would be a good project for someone.

Posted by Mark Liberman at 01:38 PM

Hand fisted

A few days ago, Linda Seebach emailed an "eggcorn alert", citing an August 11 post by Jane Galt at Asymmetrical Information that uses "handfisted" in place of the expected "hamfisted":

Some protesters I know have offered to "pay" for their trouble by volunteering to work in the park, but that won't fly for several reasons.

1) Most New Yorkers I know couldn't plant a fern in their windowsill without a Time-Life instruction manual and a team of landscapers

2) They neglected to raise the money to replace the damaged greenery, a not-inconsiderable expense

3) The public sector unions aren't going to let a team of handfisted amateurs take their overtime away.

This is a sensible mistake, as eggcorns usually are -- in fact, it makes more sense to attribute clumsy manipulation to a fisted hand than to a "ham hand", whatever exactly that is. Still, the socially sanctioned idiom is "ham fisted", and "hand fisted" is clearly a misunderstanding.

There's no question that this particular substitution is due to "Jane Galt", but many of the other examples on the web are in transcripts of commentators, reporters and other talking heads, where it's up for grabs whether the error is due to the speaker, the transcriptionist, some editor, or a combination of these.

For example, on April 4, 2003, Mark Shields and David Brooks were on PBS discussing the Iraq war, and Brooks said (according to the transcript)

I guess my first reaction was sort of visceral. I was appalled at the way the generals and officers in the Pentagon went leak happy to the New Yorker and to the Washington Post in particular. This goes back pre-9/11 to the transformation that Rumsfeld and people associated with him tried to do to the military, which hurt the army, helped some other parts. And he did it in a hand fisted way, which is his style, and he made a lot of people angry.

The audio indicates that this was probably a transcriptionist's eggcorn for the expected "ham fisted". Granted, it's hard to tell. Brooks says something like [ˈɦæ̃ˌfɪs.ɾɪd], that is, he produces the initial consonant and (nasalized) vowel that "hand" and "ham" share, and then transitions directly into the [f] of "fisted" without more than a pitch period or two of nasal murmur -- if that much. This pronunciation is somewhat easier to get to as an approximation to "ham", but it could also be a contextually reduced version of "hand".

This case is a good example of why one should be careful about making fun of public figures based on reported speech errors. Unfortunately, it's not always so easy to find the original audio to check.

CNN correspondent Matthew Chance, reporting from London on April 9, 2004, said (again about Iraq, and again according to the transcript) that

There's been criticism in the British press as well for the U.S. handling of the occupation. It's been called hand-fisted, even misguided. So, a general sense of dismay in the British press on this one-year anniversary since that toppling of the Saddam Hussein statue.

This could have been a transcriptionist's error -- perhaps even the same one, though I don't know if the Jim Lehrer News Hour uses FDCH as CNN does. Anyhow, because CNN doesn't keep old shows available on line, it seems that I'd have to give FDCH a significant sum for a video tape of the show in question, in order to confirm my suspicion that Chance said "ham fisted" rather than "hand fisted".

On Oct. 5, 2001, the Foreign Policy Association held an interview with R. Scott Appleby, whose transcript quotes him as saying

Oh, I don't think it's a clash of civilizations, Sam Huntington's phrase, in the most extraordinary sense that he means it. But there are elements of truth in that thesis. There are real differences in the way cultures and religiously based civilizations think about issues from gender relations to sexual relations to education to military use to images of God, and these are not insignificant for the way these cultures make political decisions. What's too hand-fisted about that thesis, or about the way it is often interpreted, is that it overlooks the real fact of diversity within these civilizations and different levels of assimilation into a cosmopolitan version of Islam or any tradition.

The audio is no longer available on line -- the link is dead -- so again we can't tell who is responsible. In this case, the FPA editors must be at least a bit complicit, since presumably someone checked the interview transcript.

It strikes me, by the way, that the culture of writing must significantly decrease the development and spread of eggcorns. Eggcorn invention and adoption must be much commoner in pre-literate (or post-literate) cultures. Perhaps there should be a subdiscipline of eggcornology after all, to study such questions.

[By the way, one of the comments on the cited Jane Galt post opines that "it's a bit of British slang rather than American", but I'm skeptical -- the OED has one citation for "hamfisted", seven for "ham-fisted" and one for "ham fisted", but none for "handfisted", "hand-fisted" or "hand fisted".]

Posted by Mark Liberman at 01:14 PM

August 13, 2004

Getting next to

My spouse Karen is reading a mystery novel called Inner City Blues by Paula Woods. Karen noticed the following interesting example:

"Lance had pulled four consecutive twelve-hour shifts at the hospital, and the strain was getting next to him." (p. 119)

Neither of us had ever heard "get next to" in this sense before -- we both use "get to" in this same sense. The characters in both instances are African American; this may be a feature of AAVE that I was not previously aware of.

Googling for {"getting next to"} is not terribly helpful, but {"really getting next to"} returns these two similar examples:

(link) At first this whole blog thing was really getting next to me, but it is ok now.
(link) I crawl onto my own bed to an anxious, impatient Jimmy Max--all that activity across from us is really getting next to him.

(That first example reminds me of Arnold's latest post.)

Googling for {"really get next to"} is even more interesting. Alongside results like these:

(link) The worst part is, I hate that I can't Remember,! I can handle this Staggering around, ! But not Remembering really get's next to Me !
(link) i have had reschedule after reschedule and its beginning to really get next to me.
(link) As I have moved to become an advocate, a promoter of our faith, I have heard and seen certain negative subtleties that really get next to me!
(link) My family some times really get next to my last nerve.

We also found the following:

(link) I love almost all types of music. Anything slow with a lot of emotion in it I can really get next to.
(link) "I think I could really get next to helping young French horn players and other classically oriented players who don't know where to start or feel their instrument has no business in jazz," Clark says.

Apparently, if something gets next to you that means it gets to you, but if you get next to something you get into it. Did my getting next to this get next to you, or did you get next to it too?

Posted by Eric Bakovic at 01:44 PM

Can Geoff Pullum rest on his laurels?

There's a new collection of essays called "The Genius of Language: Fifteen Writers Reflect on Their Mother Tongues", edited by Wendy Lesser, in which Amy Tan discusses "Eskimos and their infinite ways to say `snow,' their ability to see differences in snowflake conflagrations, thanks to the richness of their vocabulary." And at least one reviewer -- Philip Marchand in the Toronto Star -- gently but firmly corrects her:

Tan seems not to realize that this old canard about the Inuit having 32 different words for snow, or whatever the number, is pure myth. Apparently, the Inuit have only a few more words for snow than English speakers do. The Sapir-Whorf thesis has generally been oversold in recent decades, and perhaps it is time to give it a rest — it has had dire effects, such as the desperate attempt of disadvantaged groups to come up with new names for themselves, in the belief that such names will magically alter society's perception of them.

This is perhaps not exactly as Geoff Pullum would have put it, and I'm not quite sure what's going on with the "new names for themselves" business, but the rest of it seems pretty reasonable to me. And this is featured right up at the top of Marchand's review; it's not a by-the-way at the end.

Marchand devotes much of his review to a related issue that is apparently also widely featured in the book (which I haven't read yet): how characteristic differences between cultures manifest themselves in characteristically different ways of using language. He quotes from Ha-yun Jung's essay:

"In Korean, the first-person singular is an elusive voice," she writes. A Korean would be much more likely to say, `It would be nice to have an apple,' rather than `I want an apple.'

"Rarely will you hear a Korean speak — or write — consecutive sentences that start with I-this or I-that," she notes. "`I' seems to crawl behind the curtain at the first given moment."

Alas, Marchand introduces this passage in a problematic way: "Another interesting test case for the Sapir-Whorf theory is mentioned by a writer named Ha-yun Jung". I don't see that poor old Sapir and Whorf really have anything at stake in this matter. Their key idea was that differences among languages -- especially in plurality or gender or definiteness or other sorts of morphosyntactic marking -- should have an effect on what people pay attention to. Recently, Lera Boroditsky has been doing some interesting work supporting this idea in the case of gender marking in European languages. But the Sapir/Whorf hypothesis has nothing to say, as far as I understand it, about the case in which a language has a perfectly serviceable piece of morphology that its speakers mostly choose not to use.

Maybe Marchand meant that the low rate of first-person use in Korean tends to argue against the Sapir-Whorf hypothesis, by suggesting that meaningful differences in culture are often not reflected in (the basic structure of) the resources that each culture's language makes available, but only in how speakers act on their linguistic opportunities? Thus showing that linguistic differences are not a necessary condition for cultural differences in characteristic modes of thought? Though the S-W hypothesis really is that linguistic differences are normally a sufficient condition for cultural differences in characteristic modes of thought?

If this is how Marchand is thinking, then it's only a small problem in logic. The alternative is that Marchand is one of those who think that the Sapir-Whorf hypothesis means something like "there's some sort of relationship between thought and language, which shows up when you look at how members of different cultures tend to think and talk".

That would be too bad, because the other examples that Marchand cites in this connection are interesting ones, once poor old Sapir and Whorf are off the hook:

A curiously parallel example is provided by the Chilean writer Ariel Dorfman. He recalls an incident in school when he accidentally smashed an object with a hammer in carpentry class. Rather than say, `I broke it,' he told his teacher, se rompio — `It broke.' His teacher had a fit. According to Dorfman, he shouted, "Everything in this country is se, it broke, it just happened, why in hell don't you say, I broke it, I screwed up. Say it, say, Yo lo rompi, yo, yo, yo, take responsibility, boy." Sometimes that `I,' that `Yo,' needs to crawl out from behind the curtain.

Dorfman notes that his acquisition of English helped him to counter the "proliferation of passive forms" his Spanish-speaking friends employed to pass the buck. He is also careful to point out, however, that, "no one language condemns you to laziness or efficiency, mendacity or truth. If you dispose of two languages, therefore, you can lie twice as much — but also have a good extra whack at the truth, if you are so inclined."

I think that the kindest thing to say here is that mistakes have been made. Maybe Chileans tend to express things without ascribing agency; and maybe Dorfman became more sensitive to this point as a result of learning English; but surely this is not because English has inadequate resources, relative to Spanish, for evading responsibility. It seems much more likely that Dorfman was wrestling with issues of moral responsibility at the same time that he was learning English, and therefore he associated some genuine grammatical differences -- such as the different role of reflexives, which I sincerely hope that Dorfman is not calling "passives" -- with his ethical concerns. If so, that's mainly a fact about Dorfman versus his friends, not a fact about Spanish versus English.

Here at Language Log, we believe that the evidence about the Sapir-Whorf hypothesis, and about broader issues in the relationship between language and thought, is mixed. At least, that's my personal view. The concepts are tricky ones, and the facts are not favorable to those on either side who want to see everything in black and white. However, there's an analogous hypothesis that I hope we can all strongly support. I'll express it by editing a famous passage from Benjamin Lee Whorf. (His original is in blue, my interpolations in black.)

'We dissect ~~nature~~ speech and language along lines laid down by ~~our native languages~~ our initially very faulty understanding. The categories and types that we isolate from the world of [linguistic] phenomena we do not find there because they stare every observer in the face; on the contrary, the world [of linguistic behavior] is presented in a kaleidoscopic flux of impressions which has to be organised by our minds - and this means largely by ~~the linguistic systems in our minds~~ the concepts, terminology and analytic skills (if any) that we have been taught in courses, or developed by careful thought and experiment. We cut ~~nature~~ speech and writing up, organise ~~it~~ them into concepts, and ascribe significances ~~as we do~~ in a coherent and interesting way, ~~largely because~~ only to the extent that we are parties to an agreement that holds throughout our ~~speech community~~ intellectual tradition, and is codified in the patterns of ~~our language~~ our learned techniques of description and analysis. The agreement is, of course, an implicit and unstated one, but its terms are absolutely obligatory; we cannot talk at all [coherently about speech and language] except by subscribing to the organisation and classification of data which the agreement decrees.'

In other words, the writers in Lesser's anthology, as sensitive as they are to issues of language, thought and culture, could all have benefited from a good introductory linguistics course. This would have given them the concepts, terminology and analytic skills to think and write about their own experience more clearly.

[Note: the other reviewers that I've read so far don't pick up on Tan's Eskimo snow blunder, nor on the other language, thought and culture questions that Marchand features. Matt King in the East Bay Express doesn't mention the issues at all in his review, nor does Charles Matthews writing in the San Jose Mercury News, nor does Brian Dolan writing in the San Francisco Chronicle.]

Posted by Mark Liberman at 09:39 AM

Disgusting Condiments

Mark's discussion of the origins of disgust and of the things that adults find disgusting but small children do not reminds me of the fact that here in Carrier country the word for "mustard" is a compound meaning "children's feces", e.g. /ts'udʌnetsan/ in the Stuart Lake dialect, /s̪kehtsan/ in Stony Creek dialect. Presumably this is based on the colour and texture rather than the taste.

Posted by Bill Poser at 04:46 AM

August 12, 2004

Disgust for accents : pre-adaptation or figure of speech?

Continuing our earlier discussion of whether people sometimes feel real disgust in reaction to the speech of others, Paul Bloom sent me an electronic copy of a book chapter in which he discusses some closely related questions. This is chapter six of his new book Descartes' Baby: How the Science of Child Development Explains What Makes Us Human, and it discusses the development of disgust in human infants and children. Bloom deals with the progression from "babies and toddlers [who] will happily play with, roll around in, and even eat substances that make their parents gag", to adolescents and adults who may say that they are disgusted by such abstract concepts as commercial greed or badly designed software.

I'm focusing here on just one aspect of Bloom's chapter, where he deals with the question of whether the same emotion of disgust is really involved all along this gamut. The background is work by Paul Rozin, April Fallon, Jonathan Haidt and others, arguing that "core-disgust is an emotion that makes people cautious about foods and animal contaminants of foods"; but "disgust has expanded ... to become not just a guardian of the mouth, but also a guardian of the 'temple' of the body, and beyond that, a guardian of human dignity in the social order." (I'm quoting here from a paper entitled "Body, Psyche, and Culture: The Relationship Between Disgust and Morality", by Haidt, Rozin, McCauley and Imada, because it's available on line. It's not one of the sources that Bloom quotes, but the ideas are essentially the same.)

Haidt et al. go on to place this expansion of disgust into an evolutionary framework:

If the heterogenous class of disgust elicitors is linked together by a set of shared schemata, then the elaboration of disgust, from core through socio-moral, may be explained by the mechanism of "preadaptation" (Mayr, 1960). Mayr suggests that the major source of evolutionary "novelties" is the co-opting of an existing system for a new function. We suggest that core disgust be thought of as a very old (though uniquely human) rejection system. Core disgust was "designed" as a food rejection system, as indicated by its link to nausea, its concerns about contamination, and its nasal/oral facial expression. Human societies, however, need to reject many things, including sexual and social "deviants". Core disgust may have been preadapted as a rejection system, easily harnessed to other kinds of rejection. This harnessing, or accretion of new functions, may have happened either in biological evolution or in cultural evolution (Rozin, 1976; Rozin, Haidt & McCauley, 1993). Human societies take advantage of the schemata of core disgust in constructing their moral and social lives, and in socializing their children about what to avoid.

In the cited chapter from Descartes' Baby, Paul Bloom argues that this goes too far. He quotes Paul Rozin explaining (in another paper) that disgust has developed "from a defense of the physical body to a more abstract defense of the soul":

Humans must eat, excrete, and have sex, just like animals. Each culture prescribes the proper way to perform these actions—by, for example, placing most animals off limits as potential foods and most people off limits as potential sexual partners. People who ignore these prescriptions are reviled as disgusting and animal-like. Furthermore, humans are like animals in having fragile body envelopes that, when breached, reveal blood and soft viscera; and human bodies, like animal bodies, die. Envelope violations and death are disgusting because they are uncomfortable reminders of our animal vulnerability. Finally, hygienic rules govern the proper use and maintenance of the human body, and the failure to meet these culturally defined standards places a person below the level of humans. Insofar as humans behave like animals, the distinction between human and animals is blurred, and we see ourselves as lowered, debased, and (perhaps most critically) mortal.

Bloom suggests that disgust is not nearly this "smart", or at least not this abstract:

Rozin’s theory is too conceptual, too cognitive. It misses the physicality, the sensuality, of disgust. It is just not such a smart emotion. Simply being reminded—intellectually—of the fact we are animals is neither necessary or sufficient for disgust. Humans breathe and sleep, after all, “just like animals.” But breathing and sleeping are not disgusting. Looking at a brain scan or an X-ray is a stark and striking reminder of our physical nature, but these are not disgusting activities. Ruminating that I will one day die—just like any other animal—might make me sad, but it does not normally disgust me. In general, being reminded of our animal nature is not, by itself, disgusting.

A more plausible view is that death, bad hygiene, body-envelope violations, and certain sex acts disgust us simply because we perceive them, at a basic sensory level, in much the same way we perceive rotten meat and decaying flesh.

Bloom grants that people often use the language of disgust in "highly abstract and intellectual" ways:

In just a few months, I heard the word "disgusting" used to describe

The president’s tax plan
Someone writing a negative review of a grant proposal because he disliked the applicant
Microsoft
The high cost of prepared spaghetti sauce

But he argues that this is a "metaphor", not a true pre-adaptation:

This all seems to indicate that disgust can be highly abstract and intellectual. But I am skeptical. My hunch is that in these statements “disgust” is a metaphor. Saying that we are disgusted by a tax plan is like saying that we are thirsty for knowledge or lusting after a new car. After all, if you actually observe people’s faces and actions during heated political or academic discourse, you will witness a lot of anger, even hate, but rarely, if ever, the facial or emotive signs of disgust.

One problem is that none of the terminology (emotion, disgust, metaphor, pre-adaptation, etc.) is very precisely defined here. In the case of "emotion" and "disgust", the whole point is really to try to figure out what the boundaries and subdivisions are. And I should think that the "accretion of new functions" whereby "human societies take advantage of the schemata of core disgust in constructing their moral and social lives" might be described as a "metaphor" and simultaneously as a "pre-adaptation".

In evaluating these questions, it would be helpful to have more precise definitions of the terms involved, and it would also be helpful to have experimental evidence about a number of independent indicators. We have the words that people use to describe their feelings; we have their self-reporting about states like nausea; we have their facial expressions. It also seems that "core-disgust" may have some reasonably well-defined neurological correlates: a 2003 meta-analysis by Murphy et al. of available functional imaging studies found disgust-related activity most often concentrated in the insula/operculum and the globus pallidus, in quite a different pattern from the activation for other negative emotions:

["Lateral OFC" is "lateral orbitofrontal cortex", and "RSACC/DMPFC" is "rostral supracallosal anterior cingulate cortex/dorsomedial prefrontal cortex".]

It should be possible to check whether more abstract instances of self-described "disgust" show the fMRI patterns associated with disgust, or anger, or both, or neither. In particular, we could check whether the visceral flashes of negative emotion that many people report feeling in response to disliked accents look like disgust, in fMRI terms as well as in terms of facial expressions and so on.

Sometimes, when people say "disgusting" to mean "something I don't like", I'm sure that the metaphor is as dead as a computer mouse. It's plausible that being annoyed by overpriced sauce is often a completely different emotion from being nauseated by the smell of rotten meat, even if people use the same adjective to describe it. But it's also plausible that moral revulsion at price gouging is sometimes strengthened by resonance with "core-disgust". Looking empirically at facial expressions (as Bloom suggests) and at functional imaging data probably wouldn't settle these questions, but it might help move them to more interesting levels of uncertainty.

Anyhow, in considering the emotional valency of speech sounds, we shouldn't limit ourselves to disgust, or even to the broader set of negative emotions. We're a cheerful and optimistic bunch here at Language Log, and so we'd want to consider the emotional reactions to accent and voice quality in a broader perspective. Reactions can be positive as well as negative. Some have to do with the listener's basic emotional frame for the speaker: sexual attraction or repulsion, dislike and annoyance or warmth and benevolence, respect or disdain, enjoyment or disgust. Others evoke fairly abstract stereotypes of the speaker -- as snooty or stupid or whatever -- that have an emotional loading. A bit of poking around has not turned up much literature that deals with this subject systematically, but I'll keep looking.

Posted by Mark Liberman at 08:45 PM

August 11, 2004

A derangement of blogs

Leon Wieseltier reviews Nicholson Baker's latest novel, Checkpoint, in the 8/8/04 NYT Book Review. On p. 12, Wieseltier says of Baker's protagonist:

Jay is a deeply unhappy man. His wife has left him, he has lost his job as a high-school teacher, he works as a day laborer and has declared personal bankruptcy, he spends his days reading blogs. (About the deranging influence of blogs Baker makes a sterling point.)

Are we feeling deranged yet?

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 07:07 PM

Emergency call for Dr. IPA

Well, it seems that our idea about using phonetic transcription in advertising is seeping osmotically into the Zeitgeist. Or maybe the direction of flow was the other way around? Anyhow, TstT documents some Qwest ads that give pseudo-phonetic renditions of decomplexify and futureproofulate. But, as TstT explains in detail, they got it all wrong!

The results are ugly and unimpressive, as error always ought to be in comparison to truth. Qwest's advertising agency used this

instead of this:

ˌdi.kəmˈplɛk.sɪ.faⁱ

and then for futureproofulate, misused a completely different approach to the representation of pronunciation. You'd think that a company aiming to present a high-technology image would take a few minutes to get this kind of thing right.

Posted by Mark Liberman at 11:31 AM

On Condoleezza Rice Confirming Disclosure of Pakistani's Identity

The exchange below took place Sunday, August 8, in a CNN interview between Wolf Blitzer and Condoleezza Rice:

  BLITZER: Let's talk about some of the
  people who have been picked up, mostly in
  Pakistan, over the last few weeks. In
  mid-July, Muhammad Naeem Noor Khan. There is
  some suggestion that by releasing his
  identity here in the United States, you
  compromised a Pakistani intelligence sting
  operation, because he was effectively being
  used by the Pakistanis to try to find other
  al Qaeda operatives. Is that true?

  RICE: Well, I don't know what might have
  been going on in Pakistan. I will say this,
  that we did not, of course, publicly disclose
  his name. One of them...

  BLITZER: He was disclosed in Washington on
  background.

  RICE: On background. And the problem is...

According to an article in yesterday's Boston Globe, "Blitzer said this exchange meant Rice had confirmed that the administration released Khan's name to a reporter on background". A government spokesman claims, to the contrary, that Rice's repetition was not a confirmation. But his position is, as my dad would say, baloney.

From the Boston Globe article:

  But Sean McCormack, a National Security
  Council spokesman, said yesterday that Rice
  did not say the leak came from American
  officials.  ''She was in the middle of making
  a point and he interrupted her, and she
  reflexively repeated 'on background,' but she
  was not confirming it and went on to complete
  her thought," McCormack said.

This strikes me as patently ridiculous.

The notion of a reflexive repetition is one with which I am not familiar, and one that I couldn't find with some Web searching, unless perhaps McCormack is suggesting that Rice is exhibiting immediate echolalia (think Rain Man).

Absent some ill-defined claim about "reflexive" repetition, it seems reasonable to suppose that Rice's utterance was, in fact, serving an intended discourse function of some kind or another. What might that function be?

It's true that under some circumstances (usually with intonation indicating, say, anger) repetitions can be used to question, negate, or contradict rather than to acknowledge or confirm. Consider:

  1     You're a fool.

  2(a)  I'm a fool??
  (b)  I'm a fool.  I'M a FOOL!!

But the context provides no support at all for this interpretation. We can dismiss 2(a) immediately, assuming the transcriptionist correctly typed a period rather than a question mark at the end of the utterance. What about 2(b) as a model? Might Rice have intended to express a dismissive attitude toward the statement, effected by repeating the phrase "on background" in a scoffing way, with a sort of "yeah, yeah" intonation? The recording of the interview would answer that question definitively, but that intonation seems pretty unlikely to me and so I'll disregard it until someone who's actually seen or heard the interview tells me otherwise.

I've suggested that Rice's utterance is unlikely to be function-free, and also that it's unlikely to have been used to question, negate, or contradict. So what are we left with?

Well, how about Occam's Razor? There is a discourse function which repetitions do serve, and frequently: they are often used to acknowledge a successfully communicated proposition, particularly when selecting between several alternatives. E.g. in

  3(a) I'd like my coffee with skim milk.
   (b) Skim milk.

the repetition of "skim milk" by the hearer confirms that the proposition "the coffee should be served with skim milk" has successfully been communicated, and it emphasizes the contrast with an alternative proposition such as "the coffee should be served without milk" or "the coffee should be served with cream".

With that in mind, let's look at the exchange. Rice says that officials did not "publicly disclose" the name of the informant. In logical terms, Rice has asserted

  (4)  (not (exists x (disclosure(x) and public(x))))
       "There was no public disclosure"

Blitzer is clearly concerned with distinguishing two alternative ways her assertion could be true: she could be saying

  (5)  (not (exists x disclosure(x)))
       "There was no disclosure"

or she could be saying

  (6)  (exists x (disclosure(x) and (not public(x))))
       "There was a disclosure and it was not public".

As it turns out, journalistic parlance provides a term, on background, that describes the latter case: something stated on background is communicated subject to "an agreement between a journalist and an interviewee that the name of the interviewee will not be quoted in any publication, although the substance of the remarks may be reported" (Webster's). So when Blitzer interrupts with "He was disclosed ... on background", he is using terminology familiar to both himself and Rice to make the assertion in (6), "There was a disclosure and it was not public". When Rice replies with "On background", the most parsimonious explanation of her repetition is not some unusual (I would say bizarre) claim about reflexive repetition, but rather a straightforward, everyday use of repetition. She is emphasizing a contrast between (6) and another contextually salient alternative,

  (7)  (exists x (disclosure(x) and public(x)))
       "There was a public disclosure".

In the process of emphasizing this contrast, which applies to part of what is being said (public(x) versus (not public(x))), she is acknowledging the other part, namely disclosure(x).
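
For readers who like to check this sort of thing mechanically, here is a minimal sketch in Python. It is my own toy model, not anything from the interview or the Globe article: a "world" is just a tuple of disclosure events, each marked True if public and False if not, and the function names p4 through p7 are hypothetical labels for propositions (4) through (7). The sketch enumerates every world with up to three events and checks that (4) can only be true in the (5) way or the (6) way, and that (6) always carries with it the existence of a disclosure.

```python
from itertools import product

# A "world" is a tuple of disclosure events (an assumption of this sketch).
# Each event is True if the disclosure was public, False if it was not
# (for example, a disclosure made on background).
def worlds(max_events=3):
    for n in range(max_events + 1):
        yield from product([True, False], repeat=n)

def p4(w):
    """(4) There was no public disclosure."""
    return not any(w)

def p5(w):
    """(5) There was no disclosure."""
    return len(w) == 0

def p6(w):
    """(6) There was a disclosure and it was not public."""
    return any(not event for event in w)

def p7(w):
    """(7) There was a public disclosure."""
    return any(w)

all_worlds = list(worlds())

# Whenever (4) holds, either (5) or (6) holds: the two readings Blitzer
# is trying to tell apart.
four_splits = all(p5(w) or p6(w) for w in all_worlds if p4(w))

# (5) and (6) can never both hold.
five_six_exclusive = not any(p5(w) and p6(w) for w in all_worlds)

# Whenever (6) holds, some disclosure took place, so (5) is false.
six_entails_disclosure = all(not p5(w) for w in all_worlds if p6(w))

# Whenever (4) holds, (7) is false, which is the contrast being emphasized.
four_excludes_seven = not any(p4(w) and p7(w) for w in all_worlds)

print(four_splits, five_six_exclusive, six_entails_disclosure, four_excludes_seven)
```

Limiting the enumeration to three events is arbitrary; it is only meant to make the point that accepting (6) while rejecting (7) still leaves the disclosure itself standing.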

Just for the fun of it, here's the same basic dialogue with the topic changed to coffee as follows:

  coffee     ~ disclosure
  with milk  ~ public
  like       ~ have done
  black      ~ on background
    (i.e. black = coffee and not with milk
           ~
          on background = disclosure and not public)

We get:

  PERSON A:  I don't like coffee with milk.
  PERSON B:  You like your coffee black.
  PERSON A:  Black.  And finding good coffee is hard these days...

Is there any doubt that Person A has confirmed she likes coffee, as long as the listener understands that it's black coffee (not with milk)? Substitute the original terms back in. Is there any doubt that Rice confirmed the disclosure having taken place, as long as the listener understands it was a disclosure "on background" (not public)?

Now (good transitional cue word!), I should confess that I'm not a specialist in discourse and dialogue. I probably have as much grounding in that sub-field as your average computational linguist. (Perhaps even a little more, having once analyzed Abbott and Costello's Who's on First routine from the perspective of centering theory for a term paper. :-}) I'd be happy to hear from any discourse specialists (perhaps someone who can speak to this from the point of view of Rhetorical Structure Theory?) who would like to weigh in, correct my terminology, or provide a good linguistic argument that I'm wrong.

As for political arguments, I'm not sure what the implications are of Rice having confirmed that Khan's identity was disclosed on background. I would infer that this confirmation was viewed by the Bush administration as a mistake, or else the NSC's spokesman wouldn't have been denying it took place, in the face of evidence obvious to linguists and non-linguists alike. FWIW, I think mistakes are best acknowledged and dealt with, not brazened out. Linguists don't need their own technical term for the denial of a proposition known by the speaker to be true -- it's called lying.

Posted by Philip Resnik at 11:25 AM

Linguists and prime numbers

Adding to the ancient and honorable category of "all odd numbers are prime" jokes, David Mortensen posts a link to a version featuring subtypes of linguists. Those not familiar with the field may need to take a linguist to lunch in order to get an explanation...

[Update: Dave Long emailed to point out that according to Google, some odd numbers are more (widely regarded as) prime than others:

proposition (ghits)
----------- -------
3 is prime  (937)
5 is prime  (740)
7 is prime  (709)
9 is prime  (443)
11 is prime (663)

]

Posted by Mark Liberman at 10:43 AM

Koko's Trip to the Dentist

According to an AP item in today's (August 10th) Prince George Citizen (p.2), Koko the gorilla recently had a tooth extracted after she alerted her handlers to her problem by making the American Sign Language sign for pain and pointing to her mouth. Her handlers

quickly constructed a pain chart, offering Koko a scale from one to 10. When Koko started pointing to nine or 10 too often, a dental appointment was made.

Actually, they decided to give her a general exam while they were at it, so in addition to three dentists, they brought in an otolaryngologist, a gastroenterologist, a cardiologist, a gynecologist, two veterinarians, and three anaesthesiologists. Poor Koko didn't realize what she was getting herself into. And I wonder why they waited until she indicated that she was in extreme pain before bringing in the dentist. That doesn't seem very nice. Maybe Koko has a history of exaggerating, or doesn't have a dental plan.

Unlike many animal language stories, this one seems perfectly plausible. There is plenty of evidence that Koko and other non-human primates can learn and use symbols. In many ways, the most interesting thing about this story is what it tells us about what Koko can't do, since her fans have often claimed that she exhibits human-like language. Koko wasn't able to form an ASL sentence along the lines of "my tooth hurts" - she used one ASL word, "pain", and pointed at her mouth. Nor was she able to express the degree of her pain in language. She couldn't say: "it hurts a lot". To find out how bad her pain was, her handlers had to have her point at a chart. How they taught her the meaning of the chart is itself an interesting question, and we should probably be impressed that she was able to learn how to use the chart to express the extent of her pain, if that is indeed what she did. But the linguistic point here is that, although Koko is able to use symbols, her linguistic ability is quite different from that of a human being. She has no grammatical structure and cannot form sentences. She cannot even express, using words alone, simple things like: "my tooth hurts" or "it hurts a lot", which any normal human three-year-old can manage quite nicely.

Posted by Bill Poser at 03:07 AM

An Autobiography About Someone Else?

Today's Vancouver Sun (August 10, 2004, p. C3) contains a review by Tim Page of the Washington Post of the book The King and I: The Uncensored Tale of Luciano Pavarotti's Rise to Fame by His Manager, Friend, and Sometime Adversary by Herbert Breslin in collaboration with New York Times music critic Anne Midgette. What struck me as odd is the description of the book as an autobiography, which I take to mean a book about the life of the author. Breslin was Pavarotti's manager for more than 30 years, and Pavarotti was his most famous client, so an account of Breslin's relationship with Pavarotti would naturally occupy a prominent place in a biography of Breslin, but even so, the title seems inappropriate for an autobiography in that it clearly focusses on Pavarotti rather than Breslin. Indeed, the review quotes Breslin as calling the book:

the story of a very beautiful, simple, lovely guy who turned into a very determined, aggressive and somewhat unhappy superstar

Judging from the review, the book sounds to me more like a biography of Pavarotti from Breslin's personal perspective than an autobiography of Breslin. It seems to me that for a book to be an autobiography it has to focus on the life of the author. It may well be that what makes the author's life interesting to others is his or her association with someone else, but it is still an autobiography if the focus is on the author. If the focus is on someone else, the book is not an autobiography.

Posted by Bill Poser at 02:34 AM

Very good peppers

On the table in our house when we have a sandwich lunch is a jar of hot peppers we purchase at a Persian grocery in Cupertino. They are imported by a firm in Glendale, California, but they are produced in Turkey. The brand name is ZerGüt. I stare at it often. And this is what I am thinking. Is that just a Turkish company name? Or am I staring at the first German-Turkish pun I have ever understood? You see, the German for "very good" is "sehr gut". It sounds like "zair goot". But if I am not mistaken, if a word in Turkish with the root zer had a suffix of the form gut, the u in the suffix would be pronounced ü. (That's like the umlauted vowel in the first syllable of the German town name Tübingen.) So ZerGüt could be a Turkized pronunciation of the German phrase meaning "very good". You'd think a linguist would be able to tell whether such a hypothesis was right, wouldn't you? But I have no idea how to confirm it. I think only confirmation from the company itself would settle the matter.

Posted by Geoffrey K. Pullum at 12:53 AM

August 10, 2004

Single out and double up

Reader ACW reported a nice eggcorn overheard on NPR: "signaled out" for "singled out".

ACW emailed:

The eggcorn hunt must be getting old for you professionals. But when I see or hear an unfamiliar one, I feel compelled to Report It At Once To The Authorities, and for the moment that seems to mean Dr. Liberman. Let me know if you get tired of it. In the meantime, as you have guessed, I have another one.

I heard it on NPR this weekend. A family member of a soldier recently mustered to Iraq, I think it was, felt unfairly "signaled out". Google finds a thousand or so of these, and in the twenty seconds or so that I spent getting a sense of the statistics, I got the impression that the eggcorn was outnumbered by the acorn by a factor of several hundred. Caution: the hits for the unmarked form, "signal out", are dominated by prosaic devices sending out the usual kind of signals.
It's always tempting to guess what the attraction of an eggcorn is. There is always something about it that makes fortuitous sense. It doesn't have to make much: consider the number of collocations like "by and large" that we use with no discernable compositional rationale. But to succeed as an eggcorn, a collocation has to have something going for it, a theory that licenses it and makes it seem reasonable. In the case of "being signaled out", I'm having trouble seeing it. Maybe it's getting support from the slightly-weird fossil expression "signal opportunity"; maybe users have an image of being selected with a pointing finger, said finger being a "signal" of that selection.

Well, actually, when it comes to eggcorns, we're pretty much all amateurs. There's no official subdiscipline of eggcornology, nor any International Journal of Eggcorn Studies. Not even a panel discussion at the LSA.

With respect to combinations like "single/signal out", it would be wrong to expect full compositionality. The combination of verbs with intransitive prepositions is one of the many pseudopods of morphological quasi-regularity that extend into the phrasal domain in English. There are lots of regular patterns, lots of idiosyncratic exceptions, and lots of small to medium-sized subregularities in between. So when you choose someone you've singled him out, but when you choose two people, you haven't doubled them out, or even coupled them out (though you might have coupled them up, depending on how things work out). "Single out" seems to resonate with "point out", and so it's a little surprising that most people don't think you can signal someone out (unless you're an umpire). Of course you can't designate him out either (again with a possible exception for fancy descriptions of umpires). Though you can pick him out.

There are two problems in parallel for the distribution of verbs with intransitive prepositions. The easy part is that some particular verb-preposition combinations get idiomatic meanings. The hard part is that even the most regular and compositional-seeming combinations often don't work. See this post from last spring for a discussion of this issue with VERB+up.

ACW's post brings up some other points that are worth more discussion. For example, he invites us to "consider the number of collocations like 'by and large' that we use with no discernable compositional rationale." So I did so, and I realized that I don't have any idea what that number is. And I wonder if anyone else does either.

Now, it's not easy to count the members of a category like that. When you ask "how many words are there in English?", the answer depends on what you mean by "word", what you mean by "English" -- and even what you mean by "how many" and by "are". Different answers to these questions can change the answers by large factors. Still, you can define the question more precisely, and find some specific answers in the literature, or make some counts based on text databases, on the lemma count of specific dictionaries, or on various combinations of these.

In trying to count the number of noncompositional phrases in English, similar questions arise. For example, what is "noncompositional"? Are we talking about phrases with completely unpredictable semantics, like "red herring"? Or about phrases with somewhat unpredictable semantics, like "chair lift" vs. "face lift" vs. "fork lift"? And what about the many phrases that are often used compositionally but also have a more specialized meaning in combination, like "run out"? Since complex nominals and verb-particle combinations are both quasi-regular, it's pretty hard to draw the line between what's compositional and what isn't, even before getting to more complex idioms or to phrasal terms of art.

Still, you could set an operational compositionality threshold of some sort -- or a range of thresholds -- and ask how many thus-defined noncompositional collocations there are in English. Unfortunately, I don't know any good overall treatments of this question. You could start by pulling out the multi-word items listed in dictionaries. But these lists are radically incomplete, and many of the things on them are semantically compositional in any case. It probably makes more sense to ask the question in more psychological terms -- for how many phrases do typical English speakers store information about form and meaning, independent of their general process for determining the form and meaning of phrases? You might be able to get an answer by sampling techniques.

I'm pretty sure that the following non-specific statement is true: "The number of phrases about whose form and meaning a speaker stores (at least some) information is normally many times larger than the number of words the same speaker knows".

And as for "by and large", by the way, the original semantics is fairly discernable to sailors, if not to the population at large. But even for sailors, "by and large" is one of the unknown number of terms in the phrasal lexicon.

Posted by Mark Liberman at 07:51 PM

Illustrated eggcorns

In the spirit of her insightful analysis of "air ways" (the broadcasting kind), Chris Waigl takes on "peace core", "bare hug" and "pair shaped". And she has pictures. Of the Japanese Navy.

I'm traveling, with very limited internet access time, so I'll leave you in her capable hands without further commentary.

Posted by Mark Liberman at 09:24 AM

August 09, 2004

Rhetorical flypaper ~~plagiarized from~~ inspired by Schopenhauer?

In a post on August 6, I linked to a page where Birger Neilsen quotes "Thirty-eight dishonest tricks which are commonly used in argument, with the methods of overcoming them", from a 1930 work Straight and Crooked Thinking by Robert Thouless. I liked the list, but wondered "why thirty-eight?" Reader Steve Matuszek emailed the true explanation: because Thouless was inspired by Schopenhauer, who compiled a similar list of 38 Kunstgriffe ("stratagems") in 1830.

I was at AAAI the week before last, so I fell way behind reading the Log. I apologize if this has already come up somewhere in the comments.
Thouless's 38 tricks looked familiar to me immediately. I submit that he was either paying homage to, or ripping off, Arthur Schopenhauer's Die Kunst, Recht zu behalten.
I have only this Web version to go by:
http://coolhaus.de/art-of-controversy/
but if it is accurate, this was written around 1830 and translated into English in 1896.
It's a must-read in any case. German is the best Language for Philosophy because all its Nouns are capitalized.

But the classical capitalization rules are under attack by Die Rechtschreibreform! Only in some small details, but still... Of course, the classical capitalization rules of German were imposed around 1900, as I understand it, and so Schopenhauer may have capitalized in a more whimsical -- or philosophical -- fashion, I don't know.

Schopenhauer's list is quite similar in content to the one attributed to Thouless, though the order is different. I don't have a copy of Thouless' book, so I don't know if he credits Schopenhauer or not.

However, my question now regresses to this: why did Schopenhauer divide the taxonomy of rhetorical stratagems into thirty-eight branches?

[Update: on the question of whether Thouless credits Schopenhauer or not, Ray Girvan emailed to say

On the basis of a quick skim of my 1956 copy, only once, to cite Schopenhauer's defence against "This is Beyond Me".

]

Posted by Mark Liberman at 08:06 PM

August 08, 2004

Like a fish needs a bicycle

Near the end of his first post this morning on superfluity, Mark notes:

"Fifth wheel" is a common expression for superfluity, common in frames like "feel like a fifth wheel" (538 ghits), but it's not so commonly used in the frame "ADJ as a ___".

I've often noticed that people use "third wheel" instead of "fifth wheel" to mean "the odd person out in a group of three, the other two of which form a couple". (Please excuse the very unimaginative paraphrase.) It seems to be about twice as common as "fifth wheel", judging by the 1,150 ghits I got for "feel like a third wheel" -- probably because it's that much more common for a single person to be hanging out with a single couple.

I've always found "third wheel" a little strange; the prototypical vehicle with the superfluous wheel is a car, not a bike, so it should be "fifth wheel" regardless of the number of people involved. But for many folks it seems that the number is simply a variable part of the expression. (Careful -- some of the following links contain Christian themes, drug use, sex, and in one case some fondling between Harry Potter characters.)

(link) Then, we head out to the Universal City Walk, where we wander in and out of shops until Kathy, Matt, Andy, and Theresa show up. (It's amazing. In the span of about twenty minutes I go from being a third whell to being a fifth wheel to being a seventh wheel.)
(link) I swear, I am the eternal third wheel, or fifth wheel, or seventh wheel, or in one extraordinary case, the 103rd wheel.
(link) I'm tired of always being the one who is the third wheel or fifth wheel or, in tonight's case, the seventh wheel.
(link) My life in school has made me feel like a wheel. A Third Wheel, a Fifth Wheel, and a Seventh Wheel...I just feel like a wheel.
(link) Could be that I'm getting older, or that I'm hanging out with more and more couples (effectively becoming a third wheel, or fifth wheel, or seventh wheel ...)
(link) Though they tried to include me in their activities I generally declined their invitations not wanting to be a third wheel or a fifth wheel or a seventh wheel.
(link) "What is with the whole triple date thing? I feel like a seventh wheel."
(link) While Howard Dean hangs out with the popular crowd, Dennis Kucinich must be feeling a little like a ninth wheel among the Democratic candidates.
(link) After spending almost 24 non stop hours at LAX, the boys and their 'dates' (extras they hit it off with) went to an all-night dancing club. I felt like an eleventh wheel, sitting alone as the others chatted with their new friends.

Examples like the following seem to indicate that "third wheel" has more or less become the default among most speakers -- not because the vehicular prototype has become a bike, but because the three's-a-crowd situation is simply more common.

(link) So you felt like a third wheel along with Benji, Joel and Chris. (thats not really a third wheel right, thats a 5th wheel i think.)

The "third" can of course be overridden as necessary. Googling for "like a(n) ___ wheel", we get the following monotonically decreasing numbers of results:

like a(n) ___ wheel	third	fifth	seventh	ninth	eleventh	thirteenth
whG	1,620	881	10	2	1	0

Interestingly, even-numbered wheels follow a similar path:

like a(n) ___ wheel	second	fourth	sixth	eighth	tenth	twelfth
whG	15	97	5	2	1	0

The dip on "second" is easy to explain: feeling like the odd person out when there's only two of you is rare or just not worthy of discussion. Or perhaps unicycles are just not very good prototypes. Tricycles appear to be fine, though, as some of those 97 ghits explicitly note:

(link) Chris Collinsworth similarly seems like a fourth wheel on a tricycle.
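The "monotonically decreasing" claim about the two count tables above can be checked mechanically. Here is a minimal Python sketch using the counts exactly as tabulated; the function name is just an illustrative choice.

```python
# Google counts for "like a(n) ___ wheel", copied from the two tables above.
odd = {"third": 1620, "fifth": 881, "seventh": 10,
       "ninth": 2, "eleventh": 1, "thirteenth": 0}
even = {"second": 15, "fourth": 97, "sixth": 5,
        "eighth": 2, "tenth": 1, "twelfth": 0}

def is_strictly_decreasing(counts):
    """True if every count is smaller than the one before it."""
    values = list(counts)
    return all(a > b for a, b in zip(values, values[1:]))

print(is_strictly_decreasing(odd.values()))             # odd-numbered wheels
print(is_strictly_decreasing(even.values()))            # even-numbered wheels
print(is_strictly_decreasing(list(even.values())[1:]))  # even, ignoring "second"
```

The odd series passes; the even series fails only because of the dip on "second", and passes once that first entry is set aside.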

[ Comments? ]

Posted by Eric Bakovic at 03:28 PM

Chocolate teapots and fireguards

In response to my post on idiomatic similes for superfluity and uselessness in German and English, several people emailed to draw my attention to common expressions such as "as useless as a chocolate teapot" or "as a chocolate fireguard".

To me, these expressions were as novel and idiosyncratic as phrases like "as worthless as a Jello pool cue". However, the woods are full of 'em, as my first-year Latin teacher used to say about whatever expression, form or construction I had just failed to construe correctly. "Chocolate teapot" has 2,750 ghits, and "chocolate fireguard" another 2,740. Even subtracting the many references to "Chocolate Fireguard Music Ltd." and similar enterprises, it's clear that these are phrases in common use. I suppose that I've never heard them because neither teapots nor fireguards are in everyday use in most U.S. households, and so expressions involving them are unlikely to spread. Of course, most Americans don't meet up with boars every day either, but expressions like "useless as teats on a boar hog" presumably spread at a time when things were different.

These expressions add to the evidence cited earlier for an interest in functionality among English speakers, perhaps contrasting with an interest among German speakers in counts relative to quotas. This new investigation suggests confirmation for a different stereotype: the recent obsession with definition and harmonization of international standards among European academics, due to the funding priorities of the European Commission. The highest ranked page for "chocolate teapot" is this article: Simon Bradshaw, Amanda Baker, Bridget Bradshaw, John Bray, Gordon Brignal, David Clements & Del Cotter, "An Appraisal of the Utility of a Chocolate Teapot", Plotka (issue 23 volume 6 number 2), May 2001:

THE CHOCOLATE teapot remains popular as a general comparative standard for the failure of an object to perform in accordance with its intended function, rivalled only by its close relative (in terms of composition, if not morphology), the chocolate fireguard. However, whilst numerous items are colloquially labelled as being ‘as useful as a chocolate teapot', there does not appear to be any objective standard for the usefulness, or indeed uselessness, of a chocolate teapot itself. In the absence of any British, European or ANSI Standard, Def Stan or MIL-STD for this important but poorly-specified reference item, it was decided to conduct an independent assessment of exactly how much use one of them was. As well as filling an significant gap in the standards literature, it was felt that this study would add to the body of work published in the Annals of Improbable Research on the scientific evaluation of common metaphors (Sandford, 1995; Paskevich and Shea, 1995; Dubik and Wood, 1995; collected in Abrahams, 1998).

I have to agree with the referee's comments, which begin:

THE AUTHORS have attempted to define an objective standard for the usefulness of a chocolate teapot based on experimental measurements. Although this is a laudable undertaking the authors have only been partially successful in their aim. There are a number of problems with their approach:-

1. For any standard method it is necessary to have data which are statistically robust. The authors describe a single experiment with one chocolate teapot and make no attempt to investigate variability and reliability within a single chocolate teapot grade or the grade to grade variation between chocolate teapots from different suppliers. The behaviour observed may have been specific to the teapot tested. Only a much larger research programme could determine if these results are representative.

There are three other comments and a suggestion for an alternative mode of instrumentation, all quite compelling in my view.

American informality (not to say sloppiness) with respect to these matters is underlined by the fact that there are apparently no standards documents whatsoever based on experimental investigations of the utility of nipples on male mammals. Over to you, NIST.

[ Note: let me try to forestall misunderstanding by saying that most of the content of this post is intended as a joke. I'm not sure whether English speakers and German speakers really differ in their attitudes towards functionality vs. quota fulfillment. I tend to doubt it, but in any case a small handful of stereotype-confirming Google counts is preposterously bad evidence. Not to speak of the fact that both groups are exceedingly diverse. There certainly are cultural differences in things like this, and evidence from language usage could be relevant to studying such differences, but the evidence that I've presented is a joke (in several meanings of that word), not a valid argument of any sort.

I'm quite sure that European academics are more interested in standards than American academics are, and that this is a response to recent funding contingencies caused by the problems of European integration. It may also reflect different attitudes towards the role and value of government planning and regulation, top-down vs. bottom-up activity, and so on. But the teapots vs. nipples thing is again a joke, in this case confirming a belief held for very different sorts of reasons. ]

Posted by Mark Liberman at 10:47 AM

Superfluity and Uselessness

In a post on German spelling (un)reform, I quoted a Deutsche Welle page (an English one) that quoted Adolf Muschg calling the reforms "as unnecessary as gout". Julia Hockenmaier emailed to explain that this was a mistranslation of the common expression "unnötig wie ein Kropf", which means "unnecessary as a goiter". Chris Weigl at Serendipity posted on this as well, suggesting that the original was "überflüssig wie ein Kropf", using a slightly different adjective -- in effect, superfluous rather than unnecessary. Google has 5,800 hits for "wie ein Kropf", of which 2,920 are for überflüssig and 823 are for unnötig. (Other options are notwendig with 178 presumably ironic hits, unnütz with 117, etc.)

This suggests an interesting small contrast between German and English. To start with, I don't think that English has any common fixed expressions that start either "unnecessary as a ..." or "superfluous as a ..." At least I can't think of any, and what Google finds for me is either idiosyncratic or borrowed or both.

(link) superfluous as a frog's croaks [Indian English; Hindi or Urdu?] {1 ghit}
(link) superfluous as a prostate gland [French; quote from Clemenceau about the office of the presidency] {100 ghits}
(link) superfluous as a typewriter {1 page}
(link) superfluous as a bicycle for a fish [German; reference to Irina Dunn's phrase "a woman needs a man like a fish needs a bicycle", in turn paraphrasing some (unspecified) philosopher's remark about man and God] {3 ghits}

(link) unnecessary as a condom machine in a convent {1 ghit}
(link) unnecessary as a well is to a village on the banks of a river [quotation from Bhagavad Gita ] {23 ghits}
(link) unnecessary as a men's bathroom at a Lillith Fair show {1 ghit}
(link) unnecessary as a glass of water on Noah's Ark {1 ghit}

However, there are several English idioms for uselessness -- perhaps this reflects a cultural concern with functionality, as opposed to the Teutonic concern with adherence to quota? :-) The commonest English idioms about uselessness seem to deal with non-functional nipples. Among the variants:

useless as teats/tits on a boar (hog) {1815 ghits}
useless as teats/tits on a bull {635 ghits}

Minority choices include "brass monkey", "bullfrog", "duck", "rainbarrel" and "nun". Note that in all of its variants, this simile is much less common on the web than the variants of "wie ein Kropf": 2,557 for "useless as teats/tits", vs. 3,743 for "überflüssig/unnötig wie ein Kropf" (or 5,800 for "wie ein Kropf").

I couldn't find any documentation on how many pages Google indexes in each language, but a search for "und" yields a count of 478 million, while a search for "and" yields 3.84 billion, suggesting a ratio of about 1 to 8. On that basis, "überflüssig/unnötig wie ein Kropf" would be almost 12 times commoner than "useless as teats/tits": about 3743/.478 = 7,831 whG/bpG ("web hits on Google per billion pages in German"), versus 2557/3.84 = 666 whG/bpE ("web hits on Google per billion pages in English").
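The normalization just described is simple enough to spell out. The following is only a sketch in Python of the arithmetic in the paragraph above: the index sizes are the rough "und"/"and" estimates quoted there, and the dictionary and function names are made up for illustration.

```python
# Rough index sizes in billions of pages, estimated from hits for "und" / "and".
PAGES_BILLION = {"de": 0.478, "en": 3.84}

def hits_per_billion_pages(hits, lang):
    """Normalize a raw Google hit count by the estimated index size for lang."""
    return hits / PAGES_BILLION[lang]

german = hits_per_billion_pages(3743, "de")   # "überflüssig/unnötig wie ein Kropf"
english = hits_per_billion_pages(2557, "en")  # "useless as teats/tits"

# prints: 7831 666 11.8
print(round(german), round(english), round(german / english, 1))
```

The same function applies to any other raw count, as long as the same index-size estimates are used for both languages.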

Aside from any possible cultural differences in level of interest in superfluity and/or uselessness, this overall difference probably reflects the fact that the English simile is at least informal and perhaps offensive. This hypothesis is supported by the normalized counts for the individual adjectives -- independent of context. These seem to confirm that superfluity is more interesting to German speakers, uselessness more interesting to English speakers, while lack of necessity is discussed equally often by both:

	whG	whG/bp(G|E)
überflüssig	426,000	1,200K
unnötig	348,000	1,000K
unnütz	40,200	116K
superfluous	536,000	140K
unnecessary	4,530,000	1,180K
useless	3,700,000	964K

[Of course, to do this comparison seriously would require considering a wider range of words and structures...]

There are other inventive similes for uselessness in English, but none of them are familiar ones:

(link) as useless as a jam sandwidch [sic] to a drowning rabbit.
(link) as useless as a chocolate fireguard
(link) as useless as a one-armed juggler [+ a windshield wiper on a submarine, + a pogo stick in quicksand, etc.]

and here's a whole poem of nonce uselessness, attributed to "Goo, 12, Wales":

You're as useless as a sheet with no bed
You're as useless as a pencil with no lead

You're as useless as a watch with no time
You're as useless as a poem with no rhyme

You're as useless as a book with no words
You're as useless as a a birdbath with no birds

You're as useless as an orchestra with no sound
You're as useless as a football that's not round

You're as useless as a runner with no legs
You're as useless as a clothes line without pegs

Did I mention that you're USELESS?!

"Fifth wheel" is a common expression for superfluity, common in frames like "feel like a fifth wheel" (538 ghits), but it's not so commonly used in the frame "ADJ as a ___".

	tits on a	a fifth wheel
useless as	2,440	18
useful as	828	13
unnecessary as	3	0
necessary as	10	3
superfluous as	3	0

Anyhow, this all leaves it unclear how Adolf Muschg's "überflüssig wie ein Kropf" or "unnötig wie ein Kropf" should have been translated. It might not be possible to do better than the literal "superfluous as a goiter" or "unnecessary as a goiter", even though those expressions are much more vivid and suggestive in English than they are in German.

[Update: Geraint Jennings suggests that an idiomatic translation might be "if it ain't broke, don't fix it." This apothegm seems close to (what I guess was) the intended spirit of Muschg's comment, though it can't be used directly as a predicate applied to "the spelling reforms", thus requiring additional restructuring of the phrase. Geraint was also among those who pointed out the U.K. (and other commonwealth?) idioms relating to (non) heat-resistant objects made of chocolate, e.g. "chocolate teapot", of which more later.]

[Update #2: Chris Weigl at Serendipity clarifies exactly what Muschg said:

I misquoted Adolf Muschg in the last post. His actual words were "[Die Rechtschreibreform] ist unnötig wie ein Kropf." Unnötig (unnecessary), not überflüssig (superfluous). Former German federal president Roman Herzog, however, did call it "überflüssig wie ein Kropf" (as superfluous as a goiter).

]

Posted by Mark Liberman at 08:14 AM

August 07, 2004

It's a big ask

There are just some days when I wonder whether I'm a native English speaker.

On NPR's All Things Considered yesterday, Mara Liasson reported from the Kerry campaign in Missouri. There are a few clips in the report from recent speeches by Kerry; here's the part from one of those clips that made me do a double-take:

I'm asking you to trust our nation, our history, the world, your families, in my hands. And I understand that it's a big ask. And it's a tough judgment you have to make.

After a little Googling (14,100 hits for {"big ask"}), I'm pretty confident this is another case of mere variation and that there are people out there who use this phrase in the way I (and others? a little help here?) would use "big thing to ask".

Update: ~~Two~~ ~~Three~~ ~~Four~~ Five people have written with clarification on "big ask".

Jonathan Lundell writes:

The first Google hit I got for "big ask" (link) is the meaning that I think of first as well. "The ask" is the act of actually asking for money when one is fundraising. It might be at an event with entertainment, food, whatever, but at some point somebody's responsible for the ask.

Maybe Kerry's already beginning to confuse his stump-speaking with his fundraising?

Duncan Mak writes:

I don't think it's that uncommon to use the word 'ask' as a noun, I came across this listing of "Microsoft vocabulary" recently, and "ask" is the first word listed. I don't think it is a Microsoft-only usage, as I've heard it used by other people (in other industries) as well. (link)

The relevant entry from this page is:

ask
noun
A requirement or request that something happen.
Example: The Speedo team has an ask that we add red dancing baboons to our product's splash screen.
Example: What are the Speedo team's asks?

Ray Girvan writes:

I wonder if any others find the term conjures up rather a bizarre image? Part of my family is Scottish, and to me, "ask" as a noun is a newt. (link)

Liz Ditz writes:

It is a turn of phrase from fundraising or philanthropy, which became ubiquitous about 5-10 years ago, meaning a grant request. "We have a big ask out to the GotRocks Foundation, and four or five smaller asks to community foundations." I remember being startled on first hearing it, but can't place the year.

From someone who prefers not to be identified:

01 Apr 1999: [Australian colleague asking me for a favour] said "I know it's a big ask" [this is the first time I heard it]

30 Sep 1999: Alf in "Home and Away" [Australian soap opera]: "it's too big an ask",

19 Jan 2003: RTE News [Irish TV station]: Munster having to beat Gloucester to progress in the European Cup [rugby]: "a huge ask"

My impression is that the use of a verb as a noun makes the speaker feel more dynamic about themselves and the situation than if they said "it's a lot to ask". I wouldn't be surprised if "big ask" took its place alongside similarly-motivated corporate babble like "going forward" or "touch base" but I haven't heard it shouted into mobile phones yet.

Let's hope we never do.

Stephen Ritcey writes:

I work for an electrical utility. I stumbled onto your recent summary of the state of ask-as-noun via a Google search provoked by the following, found in an internal email message:
[We] must have your approval to move forward so please respond to the ask as instructed.
Clearly the phrase has already migrated to business-speak.

Yikes.

Alan Walker writes:

I was researching the phrase "a big ask" for my Australian English word game when I came across your posting about it on the Language Log website.

The phrase is listed as an Australian one, meaning a difficult target, in "A Dictionary of Australian Colloquialisms", G A Wilkes, Fourth Edition 1996, Oxford University Press, Melbourne.

Wilkes gives two Australian quotes illustrating the evolution and usage of the expression:

From 1985: "He had set an ask of $17,990, which was really stretching things."

From 1989: "A premiership, a State of Origin jumper and belting Canterbury at Penrith Park. Three big asks, but if they all happen, won't the phone be ringing off the hook then."

[ Comments? ]

Posted by Eric Bakovic at 09:41 PM

It's ablaut time

David Mortensen has started a new linguistics weblog, It's Ablaut Time ("A Weblog of Popular Philology").

The inaugural post sez:

"It now seems to be a requirement of the field that linguists--like law professors, philosophers, and people with too much spare time--have to have a blog. The title I've chosen is one that a couple of friends of mine once kicked around for a (humorous) magazine. It expresses at least three themes that will come up on this blog: historical linguistics, non-concatenative morphological processes, and stupid puns."

David adds later:

"You may notice that there is nothing popular about this blog, and that it has featured little if any philology (at least in the modern sense of the term). Too bad: we had already settled on the subtitle before I even knew what a blog was. ;) "

As for what this new blog does feature, the content so far includes an interesting post about how to say "deaf" in Hmong and Kachai -- though you'll need to know a bit of terminology and theory to see the direction of David's thoughts -- and another post entitled "Language is Bluffing", which is accessible but deep:

It strikes me that a huge number of insights into linguistic phenomena can be dervied from a few relatively simple propositions. One of these is the observation that language is a code employed only by code-breakers: that none of us knows the language we speak as a fully explicit system. Instead, we bluff our way through, filling in the gaps in our knowledge of the code with an inference here and a leap of logic there. This capacity to extrapolate from the known to the unknown is, in essense, grammar. If these inferences follow naturally enough from the parts of the code everyone around us agrees upon, they are incorporated into it. If they don't follow at all from shared knowledge of the code, we come off looking inarticulate. The interesting thing is that the parts of the code we all agree upon were, at some point in the past, somebody's bluff.

Language dynamics can be modeled as a kind of multi-player game of imperfect information. That's true for developments at several time scales: individual lives, cultural histories and species evolution. This idea is about at the stage of formal language theory in 1950 or so, but some interesting exploration can be found e.g. in a book draft by Partha Niyogi, "The Computational Nature of Language Learning and Evolution". I'll have some more to say about this book later on, time permitting, but for now I'll just slip a link in here.

Posted by Mark Liberman at 07:30 PM

Many new rules a little meaningfully

Although automatic translation has been making great progress, there is still plenty of headroom for improvement. In a recent post, I linked to an article in German from the Stuttgarter Zeitung, sent in by Julia Hockenmaier. I started to add a link to Google's "Language Tools" translation, but after reading the Englished version, I decided to post the link separately.

Google's "Language Tools" MT of the article can be found here. The headline and first paragraph are funny enough to quote in full:

"many new rules a little meaningfully"

Tübingen - the Tuebinger linguist Wolfgang star field counts on an increasing scooping out of the spelling reform. Much at the new set of rules is "of the linguistic point of view from a little meaningful" and in practice will not become generally accepted, did not say a university professor on Saturday. "who comes to few thereby by right, is the teachers. They have simply no authority during the orthography." For the pupils against it a return to the old rules would not be a particularly large problem.

My feelings exactly, did not say a university professor on Saturday. Much at many things is "of the linguistic point of view from a little meaningful," certainly, and there can be no question that "who comes to few thereby by right, is the teachers."

Somebody should put these programs in a time capsule. Once MT really works, a reliable source of innocent amusement will be lost forever.

Posted by Mark Liberman at 01:18 PM

More on spelling unreform

Julia Hockenmaier emailed in response to my post on German spelling unreform. She corrected Deutsche Welle's mistranslation of Adolf Muschg, pointed to some remarks in the Stuttgarter Zeitung by Wolfgang Sternefeld, and added some comments of her own.

Here's her email, which I'll quote in full:

I just saw your post on the Language log about the German spelling reform; I also just found an article (based on a DPA newswire story) in which Wolfgang Sternefeld gives his opinion about it. The article that I found is at http://www.stuttgarter-zeitung.de/stz/page/detail.php/777902 (there might be others).

Sternefeld says that he expects that the reform will essentially become moot, since a lot of the new rules don't make any sense from a linguistic point of view, so that people will not actually follow them in practice. He also says that those who are the least able to deal with the reform are school teachers, since they have "no competence in orthography". But he thinks that the students [who have been taught the new rules exclusively since at least '98] wouldn't find it hard to return to the old rules. [I think I might disagree with him there: I know some teachers who have said they found it very difficult to switch to the new spelling, but I'd imagine younger children would find it hard to get back to the old system, especially since all new children's books now published with the new spelling]

Sternefeld also says [rightly so] that the reform became a political issue early on that took its own course. He also says that hardly anybody listened to the linguists, so that some of them became frustrated and stopped participating in the debate.

His prognosis is that the reform will largely be retracted in practice and that only the reasonable rules are going to survive. He himself only adopted two of the new rules: to use "ss" instead of "ß" after short vowels and the hyphenation of "s-t" [I don't even know what rule that is], and he finds the new rules to spell certain kinds of compounds as separate words very confusing.

I haven't lived in Germany since 1999, and my German spelling and punctuation are now certainly much worse than they were before I wrote and read mostly in English; but I do remember that a lot of the new rules seemed very counterintuitive, and even wrong, when they were first introduced. As far as I can tell, a lot of people now spell whichever way they want. But this recent debate made me look back at the old spelling, and I was surprised to find that I have switched to the new spelling in many cases (just like I now find some British spellings a little unusual, after only 18 months in the US).

The new rules concern mostly difficult cases of capitalization, some compounds or prefixes, punctuation, and the spelling of some loan words to make them look more German. Apart from the spelling of individual words, most of these rules concern aspects of the language that do not exist in English (the capitalization rules), or are not very much reglemented (punctuation, certain compounds), so it probably doesn't make people cease to understand each other if they do not adhere to the new (or old) rules.

Some of the new rules really do not make much sense: certain compound rules create homographs that didn't exist before or create otherwise unintended ambiguities, some new spellings of foreign words make it hard to relate them to where they originated (which itself might not be too bad, but it now looks in some cases as if words with very different origins have similar roots). But other (like the ss rule) certainly make a lot of sense.

Whatever one thinks about government-ordered spelling rules, simply returning to the old rules is not going to improve anything. A linguistically well-informed discussion would certainly be much needed. Good spelling is still very much considered a hallmark of education (which probably explains why people reacted so emotionally to the issue), but probably for all the wrong reasons. Also, in Germany "dyslexic" is still far too often used as an insult (to people who are anything but) without any understanding of the disability, and the people at der Spiegel should know better not to use it in the way they did.

Germany has changed a lot over the last 15 years (in increasingly painful ways for many) and people seem to be keen to hold on to whatever they can. Perhaps this whole debate should be seen in that light. (The Spiegel statement says also that they are "in favor of urgently needed and sensible reforms in our society", which they contrast with the spelling reforms).

Best,

Julia

PS: Muschg said "goiter" (Kropf), not "gout" -- "unnötig wie ein Kropf" just means completely unnecessary.

Posted by Mark Liberman at 12:56 PM

"State-ordered dyslexia"

According to Matt Surman on the AP wire and James McKenzie at Reuters, some major German publishers (Der Spiegel, Axel Springer Verlag, the Sueddeutsche Zeitung) have decided to join the Frankfurter Allgemeine Zeitung in abandoning the spelling reform started in 1998. FAZ jumped off the bandwagon in 2000. The AP quotes a joint statement by Matthias Doepfner from Axel Springer and Stefan Aust from Spiegel: "In responsibility to later generations, we advise others to end the state-ordered dyslexia and to return to classic German usage."

Here's an English-language article from Deutsche Welle, in which "Swiss author Adolf Muschg" is quoted as telling Bild-Zeitung that "[t]he spelling reform ... is as unnecessary as gout". But the president of the Kultusministerkonferenz, Doris Ahnen, "doesn't believe anything will change" in that group's views, saying that "[m]ost of the ministers in the committee are still in favor of the reforms, and are in favor of making them mandatory next August."

Here's the BBC's story, and here's a piece by Luke Harding in the Guardian.

The official web site on the spelling reform is at the Institut für Deutsche Sprache (IDS) in Mannheim. Here's an English-language description from UniLang.

There was some discussion on LinguistList back in 1998, by Martin Haspelmath and Gisbert Fanselow. I look forward to some informed comments on the current situation from others, but there are some interesting themes here: "state-ordered", "classic usage", "dyslexia", "gout", "mandatory".

[via A.L.D.]

[Update: Julia Hockenmaier emailed to point out that Deutsche Welle mistranslated Muschg -- the word he used (Kropf) means "goiter", not "gout", and is part of an expression "unnötig wie ein Kropf" that means "completely unnecessary".]

Posted by Mark Liberman at 09:27 AM

Half

A controversy over scalar predicates at the Thimk Institute, Linguistics Division. The OED hints at an alternative analysis, from the obsolete noun half-ass meaning "mule"...

Posted by Mark Liberman at 07:49 AM

August 06, 2004

Virtual IPA keyboard from E-MELD

A (more) convenient way to enter IPA, called Charwrite©, is available from the E-MELD project here.

It pops up a javascript window, with mouse-sensitive IPA charts that generate Unicode html entities for you, so that you can generate strings like [ˈsɪŋ ˈsæŋ ˈsʌŋ]. Basically this saves you looking up the codes and typing [ˈsɪŋ ˈsæŋ ˈsʌŋ] -- at least that was my previous method.
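For anyone who would rather script the lookup than click or type codes by hand, here is a minimal Python sketch that turns an IPA string into decimal HTML numeric character references. It is not how Charwrite itself works (that is a JavaScript tool whose internals I haven't examined); the function name is hypothetical, and the choice of decimal rather than hexadecimal references is an assumption.

```python
import html

def to_numeric_entities(text):
    """Replace each non-ASCII character with a decimal HTML character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

sample = "[ˈsɪŋ ˈsæŋ ˈsʌŋ]"
encoded = to_numeric_entities(sample)

# prints: [&#712;s&#618;&#331; &#712;s&#230;&#331; &#712;s&#652;&#331;]
print(encoded)

# Decoding the references gives back the original IPA string.
print(html.unescape(encoded) == sample)
```

Python's built-in `sample.encode("ascii", "xmlcharrefreplace")` produces the same references as bytes, so the helper above is mainly there to make the mapping explicit.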

You can also download the code to set up your own web pages with the same facility, if you want to build IPA entry into a web-based application of some kind.

Now if only you could rely on actually seeing the right glyphs, with every browser in every OS with every font set-up with every ... And someday, even the diacritics will work!

Posted by Mark Liberman at 10:00 PM

Documenting endangered languages

There's just been a new program solicitation posted:

"This multi-year funding partnership between the National Science Foundation (NSF) and the National Endowment for the Humanities (NEH) supports projects to develop and advance knowledge concerning endangered human languages. Made urgent by the imminent death of an estimated half of the 6000-7000 currently used human languages, this effort aims also to exploit advances in information technology. Funding will support fieldwork and other activities relevant to recording, documenting, and archiving endangered languages, including the preparation of lexicons, grammars, text samples, and databases. Funding will be available in the form of one- to three-year project grants as well as fellowships for up to twelve months. At least half the available funding will be awarded to projects involving fieldwork.

"The Smithsonian Institution's National Museum of Natural History (NMNH) will participate in the partnership as a research host, a non-funding role."

Posted by Mark Liberman at 09:36 PM

The Navajo Language Academy

I spent the 19th through the 23d of July at the Navajo Language Academy session at the San Juan Campus of the College of Eastern Utah, in Blanding, Utah. Blanding is not connected very directly to the major air hubs, so I flew to Albuquerque and rented a car. By an amazing coincidence, it had an appropriate license plate:

Here's what the route through Northeastern Arizona looks like. This is near Many Farms.

The NLA is devoted to the scientific study and promotion of the Navajo language. It runs courses on theoretical and applied linguistics and holds research workshops.

Some of the linguists involved in the NLA are non-Navajos. One who played an important role was the late Ken Hale, seen on the right at the 2001 session in Rehoboth, New Mexico.

In the foreground at left is Aryeh Faltz, author of The Navajo Verb, and in the middle is Ted Fernald. Navajo is unusual in that there are a number of Navajos with advanced training in linguistics. In the background at left is Ellavina Tsosie Perkins, one of three Navajos with Ph.D.s in linguistics. (The others are Mary Ann Willie and Paul Platero.) Here she is holding forth at this summer's session.

Unlike most other native American languages, Navajo is still in widespread and active use. Most adults speak Navajo; indeed, most of the older generation don't speak English. The Navajo Nation radio station, KTNN ("50,000 watts of Indian power"), broadcasts in both Navajo and English. There are music programs in which the DJ speaks Navajo, news broadcasts in Navajo, and advertisements in Navajo. At the NLA, some courses are conducted in Navajo, and some colloquium talks are given in Navajo. Even so, Navajo is endangered, since most of the current generation of children do not speak the language.

Posted by Bill Poser at 03:55 PM

Presidential Immortality

A few weeks ago on ABC News Peter Jennings commented that Ronald Reagan's death was only a matter of time once he finished his second term as President. I think that what he meant was that because Reagan was so old when he became President we wouldn't expect him necessarily to live very long afterward. However, the way he put it, "once he finished his second term" suggests, at least to me, that there was no risk of his dying so long as he remained President. Of course, this isn't true: the clock doesn't stop for Presidents. As far as I can tell, nobody else has noticed this. I'm not sure if nobody cares or if for most people once has come to mean the same thing as after.

Posted by Bill Poser at 02:21 PM

Indo-Canadian Food

In the Overwaitea supermarket here in Prince George, British Columbia they have changed the signs that tell you what items are to be found in each aisle. One aisle now contains Indo-Canadian Food. It used to be Indian Food. The new sign is not an improvement. The relevant section contains food of the sort typically eaten in India, most or all of it imported from India. For such food the label Indian Food is quite accurate. Indo-Canadian Food is inaccurate. It would be appropriate for food characteristic of Canadians of Indian origin, if such a thing exists.

Why the peculiar new sign? I'm not sure, but a couple of possibilities come to mind. First, Indian is ambiguous, since it can refer both to South Asian Indians and American Indians. In practice, though, this isn't likely to be the motivation for changing the sign. Canadian supermarkets rarely if ever carry foods characteristic of American Indians. I doubt that anyone has ever been disappointed in not finding dried salmon, moosemeat, or bear grease in the Indian Food section. The second, and more likely, possibility is that this is misguided political correctness. Indo-Canadian is, reasonably, preferred to Indian as the designation for Canadians of Indian origin or descent because it makes clear that, whatever their origin, they are nonetheless Canadians, whereas Indian suggests, by omission, that they are not really Canadian. My guess is that the Overwaitea people have mistakenly extended this designation from people to food. It is quite possible that most such food is purchased by Indo-Canadians, but the food is nonetheless Indian, not Indo-Canadian.

Posted by Bill Poser at 02:17 PM

The acid test of foreign language command

One of the pleasures of National Public Radio this week was hearing Rob Gifford's multi-part story about crossing China by road. At one point he stopped by a tiny Christian church where a small congregation was singing hymns and waiting for its itinerant preacher to show up and preach the sermon. Since the preacher hadn't shown up, and Gifford was there, and he is a westerner, and all westerners are Christians, they told him he should preach the sermon. He was embarrassed, and told them he was totally unequipped to do any such thing, but they insisted. So he took a bible in one hand and his tape recorder in the other, stood at the little pulpit, and preached a short sermon in Mandarin Chinese. I'd be interested to know how good it sounded to someone who knows the language (it is archived here, and there's a photo gallery of the little church). My impression of the phonetics was that it was pretty damn good. But as Dr Johnson once remarked, unkindly comparing women preaching with dogs walking on their hind legs, one's surprise is not so much at how well it is done but at the fact that it is done at all. Could you preach an impromptu sermon in a foreign language without preparation? I wish I could. For any religion, in any foreign language.

Thanks to Geoff Nunberg for an attribution correction. I am no good on remembering who said what in periods earlier than about 1970.

Posted by Geoffrey K. Pullum at 02:13 PM

Rhetorical flypaper

Birger Nielsen quotes "Thirty-eight dishonest tricks which are commonly used in argument, with the methods of overcoming them", from a 1930 work Straight and Crooked Thinking by Robert Thouless.

Here's what Thouless says he's up to:

"In most textbooks of logic there is to be found a list of "fallacies", classified in accordance with the logical principles they violate. Such collections are interesting and important, and it is to be hoped that any readers who wish to go more deeply into the principles of logical thought will turn to these works. The present list is, however, something quite different. Its aim is practical and not theoretical. It is intended to be a list which can be conveniently used for detecting dishonest modes of thought which we shall actually meet in arguments and speeches. Sometimes more than one of the tricks mentioned would be classified by the logician under one heading, some he would omit altogether, while others that he would put in are not to be found here. Practical convenience and practical importance are the criteria I have used in this list. If we have a plague of flies in the house we buy fly-papers and not a treatise on the zoological classification of Musca domestica. This implies no sort of disrespect for zoologists; or for the value of their work as a first step in the effective control of flies. The present book bears to the treatises of logicians the relationship of fly-paper to zoological classifications."

A neat idea; and the list of 38 tricks and parries is interesting. I do wonder, "why 38?"

Google doesn't really help here, other than to suggest that it's probably an accident. There are 332,000 pages indexed by the search string "thirty eight", as opposed to 310,000 for "thirty seven" and 306,000 for "thirty nine". Other than the cited page (which is number 1 for "thirty eight"), highly-ranked "thirty eight" pages include references to the 38 witnesses who didn't help Kitty Genovese, a Stargate Atlantis episode called "Thirty-Eight Minutes", and the DNC's claim that there were "38 separate instances of intelligence that the State Department knew was faulty" in Colin Powell's WMD speech to the U.N.

Anyhow, the Thouless book was in print from 1930 through the mid seventies, but it doesn't seem to have done much to diminish the population of verminous insects in the world's rhetoric over the intervening decades. I'm reminded of the joke about the two Americans visiting New York. They have a series of communications failures with foreign tourists, who ask them for directions in a variety of languages, none of which they can understand. One of them says to the other, "you know, maybe we should learn another language." The other responds "What for? Fat lot of good it did those foreigners!"

All the same, it would be nice to see a similar analytic spirit applied again to the practical analysis of rhetorical techniques, both honest and dishonest. There's been a recent revival of interest in formal analysis of rhetorical structures, among cognitive psychologists and computer scientists as well as linguists. However, I haven't seen much connection between these strands of work and researchers in the social sciences -- or practical folk in politics and advertising either.

This sort of analysis is different from the recently-popular discussion of "framing" political debates. It's not that one is right and the other is wrong; they're just about different things. It would be a shame for the only analysis of political rhetoric to be in terms of frames, metaphors and word choices, as interesting as those topics are.

[Nielsen link via boingboing]

Posted by Mark Liberman at 09:13 AM

August 05, 2004

Music of the spheres -- and everything else

I've been working hard all day on non-bloggery -- letters of recommendation, grant proposals, project planning, laundry -- and so I'm going to indulge myself in a small rant about the popular presentation of astronomy. I'll get to a linguistic hook at the end.

Last year, it was "outbursts from a giant black hole" in the Perseus cluster at a "frequency ... equivalent to a B flat, 57 octaves below middle C". Now, according to an article by Dennis Overbye in the August 2 NYT, "another group of astronomers has discovered waves from another massive black hole spreading outward from the center of a galaxy known as M87". This one is "a little more than an octave higher than the Perseus black hole", and "a little rougher and less pure", sort of "like the cannons in the '1812 Overture'".

Give me a break. I'm not going to be as hard on Overbye as Geoff Pullum was, but I can't let this go by without a comment.

The universe is full of pitches, rhythms, periodicities at all scales of time and space. You can express any repetitive process as a musical tone if you want to. For example, you might pull up a sequence from the Tour de France, and find that Lance Armstrong is pedaling at a pitch of D, about 7 octaves below middle C. But except for a narrow range of frequencies of variation in air pressure, this translation into musical tone classes is irrelevant foolishness -- what you should really say is that Lance is pedaling at 140 cycles per minute.
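For anyone who wants to try this sort of translation, here is a minimal Python sketch. It assumes an equal-tempered scale with concert A at 440 Hz; the function name nearest_pitch and the convention of counting octaves down from the octave that starts at middle C are illustrative choices, not anything standard.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MIDDLE_C_HZ = 440.0 / 2 ** (9 / 12)  # nine equal-tempered semitones below concert A


def nearest_pitch(freq_hz):
    """Return (note name, octaves below the middle-C octave) for a frequency in Hz.

    Hypothetical helper: it rounds to the nearest equal-tempered semitone,
    measured as a signed distance from middle C.
    """
    semitones = round(12 * math.log2(freq_hz / MIDDLE_C_HZ))
    octave_offset, note_index = divmod(semitones, 12)
    return NOTE_NAMES[note_index], -octave_offset


cadence_hz = 140 / 60  # 140 pedal cycles per minute, expressed in cycles per second
note, octaves_below = nearest_pitch(cadence_hz)
print(note, octaves_below)  # prints: D 7
```

The same helper accepts any repetition rate at all, which is the point: the note name adds nothing that "140 cycles per minute" did not already say.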

Here's another example.

Because of interest in the southern oscillation, it's possible to find online a table of Monthly Mean Sea Level Pressure, measured at Tahiti, from 1876 to the present date. We can take this time series and look at its spectral content (I used the wonderful, free R statistics system):

If we translate 'cycles per month' into 'cycles per year' we find that the dominant frequency in the long-term variation of (monthly average) barometric pressure measured in Tahiti is -- big surprise! -- exactly one cycle per year. We have a name for this "pitch" -- it's called "the seasons".
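The spectral step can be sketched in a few lines of Python. This is not the R analysis behind the result above, and it does not use the Tahiti table: the series below is synthetic, made-up numbers with an annual cycle deliberately built in, and the periodogram function is an illustrative plain discrete Fourier transform rather than a library call.

```python
import cmath
import math


def periodogram(series):
    """Return (frequency in cycles per sample, power) pairs for a real-valued series.

    Illustrative direct DFT; the mean is removed first so the zero-frequency
    term does not swamp everything else.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    result = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        result.append((k / n, abs(coeff) ** 2 / n))
    return result


# SYNTHETIC monthly "pressure" values (not real measurements): 20 years of
# months, with a strong 12-month cycle and a weaker 40-month cycle.
synthetic = [1012.0
             + 1.5 * math.cos(2 * math.pi * m / 12)
             + 0.3 * math.sin(2 * math.pi * m / 40)
             for m in range(240)]

peak_freq, peak_power = max(periodogram(synthetic), key=lambda pair: pair[1])
cycles_per_year = peak_freq * 12  # convert cycles per month to cycles per year
print(cycles_per_year)
```

With monthly samples, the frequency axis comes out in cycles per month, so multiplying by 12 gives cycles per year; for this synthetic input the strongest peak sits at the annual cycle that was put in.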

Alternatively, we could translate this frequency into cycles per second, and say (truthfully!) that the seasons are a very, very deep tone. Specifically, the air pressure variation associated with the seasons is a pitch near C#, about 33 octaves below middle C. This is the result of a trivial calculation:

Seconds per year = 365.25*24*60*60 = 31,557,600
One cycle per year, in cycles per second = 1/31,557,600 = 3.168809e-08 Hz
(1/31,557,600)*2^33 = 272.1986

So 272.1986 is the frequency in Hz. of the seasons (one cycle per year), shifted up in pitch by 33 octaves.

Middle C is nine semitones down from concert A at 440 Hz. -- that's 440/(2^(9/12)) = 261.6256 on an equally tempered scale. The C# one semitone up is 440/(2^(8/12)) = 277.1826 Hz.

So if I didn't embarrass myself by making a mistake in arithmetic, the seasons are pretty close to C# -- a mere 1.8% flat, less than a quarter tone -- about 33 octaves below middle C.
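The arithmetic above can be checked with a short Python sketch. The variable names are illustrative; the only inputs are the figures already given (a 365.25-day year, concert A at 440 Hz, an equally tempered scale, and a shift of 33 octaves).

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # 31,557,600 seconds
seasons_hz = 1 / SECONDS_PER_YEAR          # one cycle per year, in cycles per second
shifted_hz = seasons_hz * 2 ** 33          # raised by 33 octaves

middle_c_hz = 440 / 2 ** (9 / 12)          # nine semitones below concert A
c_sharp_hz = 440 / 2 ** (8 / 12)           # eight semitones below concert A

# How far the shifted "seasons" tone falls below C#, as a percentage and in cents
percent_flat = (1 - shifted_hz / c_sharp_hz) * 100
cents_flat = 1200 * math.log2(c_sharp_hz / shifted_hz)

print(f"{seasons_hz:.6e} Hz, shifted to {shifted_hz:.4f} Hz")
print(f"middle C {middle_c_hz:.4f} Hz, C# {c_sharp_hz:.4f} Hz")
print(f"{percent_flat:.1f}% flat, {cents_flat:.0f} cents")
```

A quarter tone is 50 cents, so the cents figure is a direct check on the "less than a quarter tone" remark.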

But does all this really tell us something that we didn't already know? On the contrary, it takes something very simple and obvious -- that the seasons run on a yearly cycle -- and expresses it in two fancy, mysterious and completely unhelpful disguises. We first disguised the seasons as a spectral peak at 3.168809e-08 Hz (about 32 nanohertz), and then as a musical tone near C# 33 octaves below middle C.

Why do journalists -- and scientists -- fall into this nonsense for intergalactic gas clouds, but not for weather patterns or bicycle pedaling or political speeches? I suppose it's partly the old meme of the "music of the spheres", and partly the fact that intergalactic gas is outside of everyone's ordinary experience, so it's not so obvious how dumb the calculation is.

In fact the spacing of the Perseus Cluster pressure waves -- B flat, 57 octaves below middle C -- corresponds to a frequency of (233.0819/(2^57)) = 1.617330e-15 cycles per second, or (233.0819/(2^57))*365.25*24*60*60 = 5.103907e-08 cycles per year, or about 19,592,833 years per cycle.

In other words, one wave front passes by about every 19 and a half million years. (Or maybe half that, depending on which octave the "pitch" is really in).
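The same conversion in Python, as a sketch for checking the numbers. The 233.0819 Hz figure for the B flat below middle C is taken from the calculation above; the variable names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60    # 31,557,600 seconds

b_flat_hz = 233.0819                        # B flat below middle C, as used above
perseus_hz = b_flat_hz / 2 ** 57            # lowered by 57 octaves
cycles_per_year = perseus_hz * SECONDS_PER_YEAR
years_per_cycle = 1 / cycles_per_year

# If the "pitch" is really one octave higher, the period is half as long.
years_per_cycle_octave_up = years_per_cycle / 2

print(f"{perseus_hz:.6e} Hz")
print(f"{cycles_per_year:.6e} cycles per year")
print(f"{years_per_cycle:,.0f} years per cycle")
print(f"{years_per_cycle_octave_up:,.0f} years per cycle, one octave up")
```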

Do you feel enlightened, knowing that this is close to a B flat?

Music of the spheres? Music of the con artists, if you ask me. But maybe we linguists should learn something about public relations from these guys. Perhaps I should offer Dennis Overbye a scoop -- oh, say, that new measurements of George Bush's stump speech show a syllabic frequency almost exactly 6 octaves below middle C. Whereas John Kerry's speeches are in the syllabic key of... well, you get the idea.

The fact is, it might be interesting to look at political discourse in terms of simple measurements of speech rate, pitch range and so on. But translating those measurements into musical tone classes would be obfuscation or worse.

[To prevent misunderstanding, let me make it clear that I haven't actually measured the average syllabic rate in the current stump speeches of any national candidates. But I very easily could -- Dennis, are you reading this?]

Posted by Mark Liberman at 08:15 PM

Content clauses are not necessarily complement clauses

Andrea Lafferty, the executive director of the Traditional Values Coalition (a conservative religious organization) was recently quoted here by Brian Leiter saying something that provides an excellent illustration of the rationale for a terminological distinction made in The Cambridge Grammar of the English Language. Ms Lafferty said:

"There's an arrogance in the scientific community that they know better than the average American."

The Cambridge Grammar refers to finite clause constituents like that they know better than the average American as content clauses, taking the term from the great 20th-century Danish grammarian Otto Jespersen. We don't call them ‘that-clauses’, and we don't call them ‘complement clauses’, and there are solid reasons for both decisions. Ms Lafferty's quote provides a good example to illustrate why the second of those decisions is correctly made.

There are two reasons we don't call constituents of this sort ‘that-clauses’. First, that would be a parochial term rather than a universally applicable one: other languages have constituents of what appear to be exactly the same type, but in Spanish they're marked with que and in German they're marked with dass and in Hindi they're marked with ki and so on. Second, in many contexts the word that is omissible, and it would seem perverse to name a constituent after the one word in it that is freely omissible without any change in the construction (omit any other word from that they know better than the average American and you get either an ungrammatical constituent or at best one with a different meaning).

But we also don't call them finite complement clauses, though many linguists would. The reason is that content clauses are often complements, but not always. Notice that there is no way we can say in general that the noun arrogance takes content-clause complements: it just isn't grammatical to say something like *His arrogance that everything will be all right amazed me. (Try replacing arrogance by assumption and note the difference.) You might want to say that Ms Lafferty's remark isn't grammatical either, but it surely comes close, and it's fully intelligible (Brian Leiter quoted it and discussed its content at length; he didn't say it was garbled and he couldn't understand it). So set aside the question of whether it's perfectly grammatical, and just consider how we can talk coherently about its structure, for it certainly has syntactic structure. We can relate it to something found elsewhere if we note that occasionally utterances like this are encountered:

What's up with you, that you're looking so miserable?
You must have been sitting awfully quietly, that he could come in there and not notice you.

What's important about such examples is that the clause after the comma is not subordinate, in the sense of having the function of complement to some noun, verb, adjective, adverb, or preposition that licenses it. Ms Lafferty's remark can be regarded as illustrating the same sort of possibility. Whatever the exact details of the structure, the point is that the constituent that they know better than the average American is a content clause but it's not functioning as a complement clause in Ms Lafferty's sentence (we're leaving the matter of how it does function to be determined by future research). So the property of being a finite complement clause is distinct from the property of being a content clause, despite the fact that nearly all content clauses function as complements.

Posted by Geoffrey K. Pullum at 04:59 PM

Disgust and language: metaphor, reality, both?

In yesterday's post on disgust for voices and accents, I suggested tentatively that this topic had been missed both in the literature on disgust and the literature on sociolinguistics. I also wondered how to distinguish between "conventional expressions of prejudice" and "real emotions of disgust".

Paul Bloom picked up on both of these issues in email:

I saw your very interesting comment on disgust in "Language Log"; two quick thoughts:
1. I'm on the road right now, with no access to books, but I have a strong memory that William Ian Miller, in "The Anatomy of Disgust" includes a discussion of the disgust generated by speech, including by certain accents.
2. My sense, though -- and I've argued this in detail elsewhere -- is that we don't feel actual disgust in response to experiences such as, e.g., Boston accents or Southern accents. We call it "disgust", but this is a metaphoric usage, used to express a certain form of strong disapproval. With only a few exceptions, disgust only really occurs in response to the "core disgust" stimuli listed by Paul Rozin: feces, vomit, rotten meat, etc. Someone might describe a certain accent as disgusting, for instance, but they won't actually show any of the facial and bodily signs of being disgusted. That's the theory, anyway.

I obviously need to do some more reading.

Meanwhile, I'll express an amateur's opinion that Paul is (at least sometimes) wrong about the physiology of disgust for voices and accents. For some years, I've been playing samples of sometimes-stigmatized dialects to classes of undergraduates, and some audio clips definitely produce facial expressions of disgust for some student listeners. When I ask individual students about this in private, some of them confide that particular accents make them feel visceral flashes of negative emotion that they can't control, even though they think that these feelings are inappropriate -- or think that they should say that they think that these feelings are inappropriate.

I'm not sure that the negative emotions in question are always (or even ever) the same as the disgust we feel for feces, vomit etc. Other feelings such as resentment and annoyance are available, and I'm not sure how to distinguish these other than by verbal reports, which I admit are not reliable even when honest. But I think there's a reasonable prima facie case that some degree of real disgust is sometimes involved.

I should also say that it's quite common for feelings of shame to come into play. "I hate my accent" and similar phrases are commonly heard, and even more commonly felt, though usually in the context of complicated identity issues:

(link) i have lived here my whole life and i wanna move away. I hate my accent.. i will spend my entire life trying to rid myself of it. i never ever date southern guys.. my boyfriend now is from cleveland, ohio.
south carolina= rednecks/sluts/football crazed lunatics.
(link) I try not to listen to my voice. The accent makes me cringe.

But I'm out of my depth at this point, in several different disciplines at once.

Posted by Mark Liberman at 11:40 AM

Making the point

On NPR's Morning Edition this morning, White House Correspondent Don Gonyea commented on a speech yesterday by the President in Iowa, which included the claim that Bush_i had "made the point that he_i has made the country safer" (coreference safely inferred).

To me, make the point that x presupposes the truth of x (or at least strongly implicates it; I don't know, it's been a while since I've thought deeply about the fine line between these two). I can think of only one reasonable scenario in which Gonyea did not intend to raise this presupposition/implicature, but do reporters' intentions matter much when it comes to how the public interprets what the media feeds it?

So here's the scenario: being a White House Correspondent, Gonyea lives and breathes the White House and the President. Apart from the Bush Administration itself, he is among those that know the most about what the President says and does publicly. We all know that Bush has been repeating his claim that his policies have made the country safer, but Gonyea -- bless him -- gets to hear the claim every time Bush makes it. Plus, he probably gets updates in the White House press room that summarize for the correspondents the "points" that Bush has made in a recent speech or will make in an upcoming speech.

So, to Gonyea, the embedded clause "that [Bush] has made the country safer" might just be an unanalyzed "point" that he knows Bush is repeating over and over again on the campaign trail. Under this scenario, I can see how Gonyea may not have intended to raise the presupposition/implicature that Bush has actually made the country safer. But even if this is all correct, I think that Gonyea should have been more careful with what he said: the kind of presupposition (or implicature) that shouldn't be raised in an unbiased news report was in fact raised, whether he liked it or not.

[ Comments? ]

Posted by Eric Bakovic at 11:15 AM

If P, so why not Q?

Gary Shapiro has a nice obituary in the New York Sun for Sidney Morgenbesser. Morgenbesser is probably best known to linguists for his riposte to J. L. Austin's claim that two positives never make a negative: "yeah, yeah..." Shapiro's article gives a number of other quotes -- it would be nice to have a complete collection.

A sample from Shapiro...

Morgenbesser on Jewish logic: "If P, so why not Q?"

On scholarly publication: "If your grandmother knew it, don't publish it." And, “Moses published one book. What did he do after that?”

Questioned for jury duty as to whether the police had ever treated him unjustly or unfairly: “Unfairly yes, unjustly no. The police hit me unfairly, but since they hit everyone else unfairly, it was not unjust.”

Gentile ethics: “ought implies can” vs. Jewish ethics: “can implies don’t.”

Q: Why is there something rather than nothing?
Morgenbesser: “Even if there were nothing, you’d still be complaining!”

Student: “I just don’t understand.”
Morgenbesser: “Why should you have the advantage over me?”

Morgenbesser on the three types of umpires: the realist, who says, “I call them the way they are”; the subjectivist, who says, “I call them the way I see them”; and the conventionalist, who declares, “I call them and then they are.”

Some more from the comments at Crooked Timber:

Sidney Morgenbesser walks into a restaurant, has dinner, and then asks the waitress what they have for dessert. She says apple pie and blueberry pie. Sidney Morgenbesser says he’ll have the apple pie. She comes back in a moment and says that they also have cherry pie. So Sidney Morgenbesser says “In that case, I’ll have the blueberry pie.”

Morgenbesser on George Santayana: “There’s a guy who asserted both p and not-p, and then drew out all the consequences…”

From the New York Times obituary:

Morgenbesser a few weeks before his death: "Why is God making me suffer so much? Just because I don't believe in him?"

From normblog:

At a conference on cognitive psychology and philosophy of mind, one scholar was presenting what was at the time a popular line on how 'madness' had no real referent and was merely a product of power-laden 'othering'. Sidney chimes, "You mean to tell me that it's all in my head?"

Maoist student: "Professor Morgenbesser, do you mean to say that you disagree with Chairman Mao when he states that a proposition can be true and false at the same time?"
Morgenbesser: "I do and I don't."

Posted by Mark Liberman at 09:07 AM

Big Grammar is Watching You

I may owe Christie Vilsack an apology. I think. At least, I might have to withdraw the record for Within-U.S. Linguistic Prejudice in Journalism that I awarded her on July 27, and re-assign it to the former record-holder, Michael Lewis. Probably. It's that old devil irony again. Sometimes it makes it hard to figure out what someone really means, and this is one of those times.

The Des Moines Register has reprinted Vilsack's 1994 column, where most of the quotes from her originated. I can't find it on the DMR web site, but it was re-reprinted on July 28 in a discussion area at the Boston Herald, which is the paper that brought the whole thing up in the first place. My speculation that David Guarino's 7/26/2004 article was an outlet for material fed by Republican researchers, timed to embarrass Vilsack just before her speech at the Democratic Convention, is echoed by others in the same discussion area, as well as by this editorial from the Des Moines Register, though no one seems to have any proof.

As I suspected, the context of the whole column does affect the interpretation of the selective quotes. When Vilsack fantasizes about a future society in which Big Grammar is Watching You, it's clear that she's being satirical:

Maybe we could tie getting a driver's license with learning to use English correctly. Or maybe with the advent of the fiber optic network, people could be required to study language via computer five minutes a day the same way we convinced people to take the time to floss and recycle. Or maybe, like universal health care coverage, we could require employers to subsidize language classes for all employees.

Maybe we could tax those who prefer not to be grammatically correct or fine those who choose to speak the dialects of their geographical areas and double the fines for those who use slang or colloquialisms.

Think of the day when everyone in our country will speak English as well as the talking heads who anchor the TV news. Think of the day when American fiction will finally be purged of substandard English, so that reading Mark Twain's Huckleberry Finn will seem as strange as reading Canterbury Tales in Old English [sic].

The thing is, the essence of the satire seems to be that these measures, as desirable as they might be from her point of view, aren't politically or economically feasible ("English majors would outnumber business majors five to one."). She does put in a plug for diversity ("Our language is as flexible and diverse as the millions of people who speak it. I enjoy listening to people who are computer literate speak an English I can't understand. I'd be sad if I never heard another Iowan say, 'Where's it at?'"), though it's not so clear that she'd be sad to lose "substandard" variants from New Jersey and some of the other places she sneers at elsewhere in the piece.

She blames "grammarians" for the standards that nobody measures up to, but also describes adherence to these standards as "[getting] it right": "Even though we require our children to study the English language for 13 years in school, we can't seem to get people to speak English the way grammarians who write textbooks want us to. There has to be some way to make people continue to study the language until they get it right."

She also concludes that "We don't need to mandate English as the official language of this country. People confronted with English as a second language seem more interested in learning to speak it correctly than those of us born here."

Anyhow, I've reprinted her whole column below for your convenience, and you can read it and see what you think. My own reading is that it's a mixture of genuine and unexamined distaste for American regional and class accents, an unthinking acceptance of Grammatical Correctness, a somewhat random mixture of stories about cross-dialect misunderstanding, some playful fantasizing about a dystopian society under the thumb of Big Grammar, and reasonable opposition to English Only laws as unnecessary. My recommendation to would-be politicians still stands:

OK, everyone, make a note: if you want to be a politician in 21st-century America, take a linguistics course and learn how to think and talk about dialect variation in a rational way.
Avoid those embarrassing gaffes! You too can learn to define and promote language standards without treating non-standard speech as lawlessness, stupidity, disease, laziness, duplicity or bad posture!

This column by Christie Vilsack, whose husband, Tom, then served in the Iowa Senate, appeared in the Mount Pleasant News on Aug. 24, 1994, under the headline "Hablas Ingles? Czy Pani Mowi Po Angielsku?" She wrote a weekly column for the paper called Main Street.

The time has come to make English the official language of this country.

I realized the situation had finally gotten out of hand when I couldn't even communicate with a woman running the cash register at a grocery store in New Jersey. "To whom do I make this check?" I asked. I thought she said, "Food Rush," but that didn't sound right. So I asked again.

The third time, embarrassed, I asked her to spell it. "Rush," she enunciated carefully. F-R-E-S-H, she spelled. I'm sorry, I said. I'm from Iowa and I'm having a hard time understanding people from New Jersey. "That's OK," she laughed. "I have a hard time understanding foreigners who speak English." I wasn't quite sure how to take that.

Later, on the boardwalk, I heard mothers calling to their children: "I'll meet yoose here after the movie." The only way I can speak like residents of New Jersey and eastern Pennsylvania is to let my jaw drop an inch and talk with my lips in an O like a fish. I'd rather learn to speak Polish.

Listening to me relate my experiences with the language of New Jersey, our friend Doug from Pittsburgh told about a recent trip to West Virginia where a waitress asked him if he wanted a "side saddle." He looked perplexed and studied the menu to see what that might be. His wife, who speaks British English figured it out. "She's asking you if you want a 'side salad,' " Shelly interpreted.

Her uncle, who speaks with an Oxford accent, was flying recently when his seat mate, an American, asked about his surname. "Robinson," he said proudly, rolling his R's. The woman said, "Excuse me?" and repeated her question. "Robinson," he said again wondering at the look on her face. It turned out that she was asking him what they were "serving" for dinner.

In southern Ohio, my waitress drawled, "what cain I get you, babe?" And I overheard a woman at a nearby table in Keokuk recently say, "She don't pay me no never mind."

Last weekend my college roommate and her children arrived from England. Ilene grew up in Westchester County, New York. She calls the fruit "ahrange" and says the news was "hahrable," except now she does it with an English accent. When her children ask, "Where's the loo?" they mean the bathroom. Her 11-year-old was insulted that we had set up a cot for him. At home that means a crib.

Ilene said when she first moved to England, a hotel clerk innocently asked her, "When do you want me to knock you up in the morning?" All he wanted to do was give her a wake-up call.

English as we know it is too complicated. We have to do something before none of us can understand each other. After all, if we're going to expect immigrants to learn our language we have to set a good example. How can foreign-born citizens learn English if we can't speak it ourselves?

Even though we require our children to study the English language for 13 years in school, we can't seem to get people to speak English the way grammarians who write textbooks want us to. There has to be some way to make people continue to study the language until they get it right.

Maybe we could tie getting a driver's license with learning to use English correctly. Or maybe with the advent of the fiber optic network, people could be required to study language via computer five minutes a day the same way we convinced people to take the time to floss and recycle. Or maybe, like universal health care coverage, we could require employers to subsidize language classes for all employees.

Maybe we could tax those who prefer not to be grammatically correct or fine those who choose to speak the dialects of their geographical areas and double the fines for those who use slang or colloquialisms.

Think of the day when everyone in our country will speak English as well as the talking heads who anchor the TV news. Think of the day when American fiction will finally be purged of substandard English, so that reading Mark Twain's Huckleberry Finn will seem as strange as reading Canterbury Tales in Old English.

Think of the demand for English teachers! English majors would outnumber business majors five to one. Newspapers would devote pages to the language arts and NBC and CBS would fight over the right to broadcast Monday Night Spelling Bees.

Far-fetched? Yes. Ridiculous? Of course. Our language is as flexible and diverse as the millions of people who speak it. I enjoy listening to people who are computer literate speak an English I can't understand. I'd be sad if I never heard another Iowan say, "Where's it at?"

I like hearing historian Shelby Steele, from Mississippi, talk about his "people" the way I talk about "my folks." I like finding in the dictionary that a word I use commonly originated in the Greek language or in Spanish or German.

I am fascinated at the way some African-Americans speak to each other in an English I struggle to understand, then switch to standard English when the situation requires. I'm also excited that American Indian children are using computers to learn the nearly extinct languages of their ancestors. I enjoy teaching foreign children our language to complement the several other languages at which they are already proficient.

I am impressed at the extensive vocabularies of many of the international students in my college English classes who make better use of a dictionary than I do.

My college friend marveled at the fluent English spoken by the electrical engineer from a tiny country near Ethiopia who drove a taxi in Chicago. She feared living in Chicago might corrupt his English.

We don't need to mandate English as the official language of this country. People confronted with English as a second language seem more interested in learning to speak it correctly than those of us born here.

Posted by Mark Liberman at 07:54 AM

August 04, 2004

Disgust for voices and accents

A.L.D. links to two recent articles on disgust, one by Martha Nussbaum in the Chronicle of Higher Education, and another by Paul Bloom in the Guardian. Nussbaum has recently published a book, Hiding from Humanity, in which she "argues that shame and disgust tend to distort public discourse in highly illiberal ways", a point echoed in her Chronicle piece. Much of the seminal work on the psychology of disgust has been done by Paul Rozin.

I'd like to draw attention to what I think is an omission in this literature. As far as I know, none of those writing about disgust have focused on speech as one of its objects. Nor, as far as I know, have sociolinguists examined this question. (I may be wrong about this, and if I am, I'm sure that someone will correct me...)

Sometimes individual vocal characteristics cause reactions of disgust: "disgusting nasal whine", "hoarse, revolting croak", "nauseating reedy voice", and so on. It's interesting to consider what aspects of voice quality can trigger disgust, and what they might have in common with disgust-inducing tastes, smells, sights and ideas. But in this post, I'm interested in what happens when people find a particular regional, class or ethnic accent disgusting.

Disgust is not always involved in linguistic prejudice: accents may trigger stereotypes of sleepiness, ignorance, boringness, etc. that are negative in character but don't (necessarily) involve sensations of disgust. However, in many cases it does seem that something very much like disgust is involved. Henry Higgins asserts disgust explicitly in addressing Eliza Doolittle:

A woman who utters such depressing and disgusting sounds has no right to be anywhere—no right to live. Remember that you are a human being with a soul and the divine gift of articulate speech: that your native language is the language of Shakespear and Milton and The Bible; and dont sit there crooning like a bilious pigeon.

It's easy enough to find explicit references to sociolinguistic disgust on the web (though some of them may just be conventional expressions of prejudice that don't reflect real emotions of disgust):

(link) That disgusting Boston accent mixed with the obnoxious attitude could drive anyone insane!
(link) The people - and the common, disgusting accent of people who're actually from the city - at least those from surrounding areas sound like they have a bit of decency about them and dont sound like such total minks!! [worst things about Aberdeen]
(link) And there are two of them right outside my window, in my cosy little cul-de-sac street, and when I walked past them they did the whole whistling thing and shouted, really loudly, in the most disgusting accent you can possibly imagine:
(link) ... eventually Wahlberg, in his revolting Bronx accent, got a microphone and explained “Here’s da situation – we is gonna start da show again”.
(link) Rainmaker, your right. It sounds like a Jamaican accent................I hate that accent, its so frickin annoying, as is the Aussie accent.
(link) Homicide rates would go through the roof as more people are driven to kill thanks to the annoying accent.
(link) Why anyone would want to listen to that nauseating scouse voice again is beyond me.
(link) When she asked him if he'd like some "wooder" to drink, I had flashbacks to a high school classmate of mine from Jersey who would ask me to "cawl" her. I never did. That accent makes me shudder.

Disgust also seems to be the emotion involved when someone writes about "one of those Southern accents that puts your teeth on edge", and perhaps also when Christie Vilsack observes that "the only way I can speak like residents of New Jersey and eastern Pennsylvania is to let my jaw drop an inch and talk with my lips in an 'O' like a fish".

Following Rozin, Nussbaum insists on the role of cognitive categories in distinguishing disgust from distaste:

Disgust is not simple distaste because, Rozin has found, the very same smell elicits different disgust reactions depending on the subject's conception of the object. Subjects sniff decay odor from two different vials, both of which in reality contain the same substance; they are told that one vial contains feces and the other contains cheese. (The real smells are confusable.) Those who think that they are sniffing cheese usually like the smell; those who think they are sniffing feces find it repellent and unpleasant.

And Paul Bloom underlines the special role of sexual attraction in overcoming disgust:

After Stephen Fry outlines what he sees as the disgusting nature of sexual intimacy - "I would be greatly in the debt of the man who could tell me what would ever be appealing about those damp, dark, foul-smelling and revoltingly tufted areas of the body that constitute the main dishes in the banquet of love" - he notes that sexual arousal can override our civilised reticence: "Once under the influence of the drugs supplied by one's own body, there is no limit to the indignities, indecencies, and bestialities to which the most usually rational and graceful of us will sink."

In that connection, remarks like these may be revealing:

(link) I think the Australian accent is disgusting on girls but pretty sexy on guys.
(link) the boston accent is cool for guys but is horrible for girls.

Posted by Mark Liberman at 08:00 PM

Empathizing with Simon Baron-Cohen's cousin

In an earlier post, I mentioned British neuroscientist Simon Baron-Cohen, recently famous for his theory that autism is a symptom of an "extreme male brain", where male "systematising" takes over at the expense of female "empathising".

This 4/2004 weblog entry says that Simon Baron-Cohen is the brother of Sacha Baron Cohen, who performs as the hiphop journalist Ali G (the hyphens come and go in both Baron-Cohens' names). Ali G's HBO TV show is starting its second season, specializing in comically outrageous interviews with subjects who are not in on the joke. This 2001 article in the London Review of Books says that Simon "is also rumoured to be Ali G's cousin". An article in the August Vanity Fair, which seems to be authoritative, confirms that Sacha and Simon are cousins.

The sociolinguistic theory of accommodation "starts from the premise that speech accommodation takes place when people modify their speech so that it conforms more with the way their conversational partner speaks". This can involve echoing particular words, adopting features of pronunciation, using similar syntactic structures, and so on. Accommodation might be blind adaptation to experience, or it might be a more complex negotiation of identities. Some ideas about accommodation suggest that a form of empathy plays a role, at least in some aspects of the phenomenon. I don't know whether anyone has checked to see whether sufferers from autism exhibit the usual phenomena of sociolinguistic accommodation or not.

In any case, Ali G's act often involves contrasts of identities, speech registers, and sometimes simply word usage. He often gets his victims to accommodate to his choices in ways that make them seem a bit ridiculous, as in this conversation with Sir Rhodes Boyson from his British show:

Ali G:  "Do you believe kids should be caned?"
Rhodes: "I do. I..."
Ali G:   "You do! Wikkid, man. You believe kids should be caned even in school?"
Rhodes: "Even in school."
Ali G:  "Do you not think, Sir Rhodes, if you get caned in school you can't concentrate 
         as well. Because a lot of people out there say that if you're getting caned..."
Rhodes: "Well, I was caned in my time and I've concentrated all my life."
Ali:    "You were caned? Respect, man. Respect."
Rhodes: "It shouldn't be done evil and it shouldn't be done badly."
Ali:    "Aye, You've got to have good stuff."
Rhodes: "You have to have rules in life."
Ali:    "You have to have good cane."
Rhodes: "You have to have a good cane."
Ali:    "Okay, but you're saying the caning is cool."
Rhodes: "The caning is cool, and most boys prefer it to being told off."

His greatest triumph, as far as I've seen, was his success in getting Pat Buchanan to accommodate to the malapropism of BLT for WMD. Here's the transcript, courtesy of the Chris Matthews Show:

ALI G: "Does you think that Saddam ever was able to make these weapons of mass destruction 
or whatever, or as they is called, BLTs?"
Mr. PATRICK BUCHANAN: "The--was Saddam able to make them?"
ALI G: "Could he make BLTs?"
Mr. BUCHANAN: "Yes. At one time, he was using BLTs on the Kurds in the north. If he had 
               anthrax, if he had mustard gas..."
ALI G: "Whatever he put in them."
Mr. BUCHANAN: "No. No, no. If he had mustard gas, no."
ALI G: "Let's say he didn't have mustard and the BLTs just was plain. Would you have been 
        able to go in there then?"
Mr. BUCHANAN: "No."

There's some evidence that Pat Robertson might be another empathetic, linguistically accommodating kind of interview subject, though by now it would be very surprising for a public figure (and his handlers) to be unaware of the Ali G act.

Besides Ali G, Sacha's current alter egos include Borat the Kazakh TV reporter and Bruno the Austrian fashion writer. He apparently moved to the U.S. because Ali G had become too familiar for interview subjects to be fooled, but the U.S. remained a fertile field for this particular kind of con, as his success with Buchanan showed. According to the Vanity Fair article:

On the American version of Da Ali G Show ... he asked Boutros Boutros-Ghali (whom he introduced as "Boutros Boutros Boutros-Ghali"), "Wot is da funniest language?" The former secretary-general of the U.N. was flustered. Ali G pressed on: "It's French, innit?" Boutros-Ghali laughed and momentarily agreed that French was indeed an amusing language.

In dealing with people like Ali G, honesty and consistency seem to be the most effective policies, as in this interview with Tony Benn, where Ali G more or less fails to accomplish his goals. I've never seen his show, and probably wouldn't like it, but that's not the point here.

Posted by Mark Liberman at 04:16 PM

Blogging's got a brand new bag

This morning my local public radio station ran a story about a new (and geographically local) member of the blogosphere: Freewayblogger. Not content to have his message carried only on the information superhighway, this individual (who, for reasons that should be fairly clear, only wants to be identified as the Freewayblogger) has taken his message to another cluster of heavily-traveled roads: the freeways of Southern California. I have yet to read one of his posts on my own regular routes in San Diego, but I may be traveling up to LA on the 5 tomorrow, so I'll be on the lookout.

(I would perhaps comment more on the extension of blog to this non-web-based form of individual expression, but I wouldn't want anyone to mistakenly think that I take the Freewayblogger's political message lightly.)

[ Comments ]

Posted by Eric Bakovic at 03:02 PM

Amateurs and professionals

In response to my post on "Blog cultures, academic and otherwise", Jason Streed of Finches' Wings sent email about an area in which there's a long tradition of amateur commentary, and an equally long tradition of controversy about the relations between amateurs and various sorts of professionals.

Here's Jason's note:

I enjoyed your remarks about nonexpert comments in expert-driven forums. They reminded me of a remarkable review by Richard Elliott Friedman of Harold Bloom and David Rosenberg's "The Book of J." It was originally published in Bible Review; I found it in The Iowa Review, which reprinted it with other articles attacking Bloom's misdeeds in that book.

I couldn't find the text online, so here's a transcription of the first page or so from a copy long dormant in my basement:

"It is a strange fact that we biblical scholars always seem to meet people who are surprised to hear that we really know things about the Bible. They assume that the study of the Bible is a matter of opinions and interpretations, with few verifiable facts one way or another. Even though the archeological revolution is a century old, even though the advances in language, text, artistry, and history are reported in thousands of books of introduction, history, and commentary, people do not just conceive of biblical scholars as having the same kind of expertise that professionals in medicine or law--or even other scholars in the humanities--have.

"And so oddball theories make the front pages of respectable newspapers and magazines. Archeological discoveries are misinterpreted or blown out of proportion. Absurd computer programs are received as legitimate analyses. One view is as good--meaning as unprovable--as another. There is also Exodus Fever, a term used in the field for the phenomenon of persons from a variety of fields who are attracted to explain the events of the exodus and Sinai stories with what they believe to be new insights from their own areas of knowledge: geologists, astronomers, Egyptologists, oceanographers, psychologists, historians of other periods and places. The temptation to explain the splitting of the Red Sea, the plagues, and the fiery mountain is irresistible. Everyone explains the Bible--and not hesitantly, or modestly, but like an expert. They are going to show us what the real experts have been missing.

"This must happen to some extent in most every other field as well. I suppose that medical doctors have to endure being told about amazing cures for diseases that the medical profession has failed to recognize. Probably almost everyone has been told how he or she could do his or her job better--that is, told by someone who has never done that job. But I think there is a qualitative difference when it comes to professional scholars of the Bible. I cannot think of any other area that so many persons from so many other fields try to practice. From Freud to Velikovsky to Isaac Asimov to Mary Douglas to Northrop Frye, and most recently Harold Bloom: when it comes to doing a subject in which one is not trained, the study of the Bible is in the first place (and the study of Freud probably second). . . .

"Do all of these people have the right to their opinions about the Bible? Sure. They have a right to opinions about law and medicine, too; but if you have chest pains I suggest you see a cardiologist, not Harold Bloom; and, as they say, anyone who acts as his own lawyer has a fool for a client."

Friedman, Richard Elliott. "Scholar, Heal Thyself; Or How Everybody Got to be an Expert on the Bible." The Iowa Review 21.3 (1991): 33-47.

From this point on, Friedman puts on a clinic in showing pseudo-experts their place.

I can tell you that my father, an epidemiologist, has to put up with all kinds of crazy ideas from people who've scanned a few headlines and cooked up shocking insights--evil government geniuses who cook up HIV in secret labs, etc.

For my part, I like reading LL and related weblogs for the same reason I like watching master carpenters, electricians, etc--to see people who are good at something I know only a little about. I can't follow everything, but every time I peek over their shoulder, I learn something, and every now and again, I'll ask a question. That's good enough for me.

I do like comparing rational inquiry in an established discipline to the work of a master carpenter or a knowledgeable electrician. This reminds me of something Morris Halle once said to me when I was a graduate student, making the same sort of comparison for a different reason. Morris had just returned from giving a lecture in Paris, and he was still ruminating about the reaction he got in the question period. "Nobody wanted to talk about phonology", he complained. "They asked me what my ideology is. What ideology? Does a shoemaker need ideology to repair a shoe? 'I'm a shoemaker,' I told them. 'You worry about the ideology, I'm busy with the shoes.'"

But there's another perspective on this. The master of a craft is sometimes limited as well as empowered by the craft's body of traditional knowledge and techniques. And one of my favorite things about American culture is the tradition of amateur tinkering, in which any teenager feels authorized to start taking things apart and putting them back together again better -- or at least differently -- without the license of a formal course or apprenticeship. Of course, the laws of nature impose their own harsh and inflexible discipline on these activities. A kid messing around with a motorcycle engine or a computer program gets a kind of feedback that's denied to Harold Bloom messing around with the Bible -- when he's done, the thing either runs or it doesn't.

Posted by Mark Liberman at 08:17 AM

August 03, 2004

Power through damage

In comics and movies, we're used to the idea of humans getting superhuman powers as a result of some sort of damage: a radioactive spider bite, dunking in toxic waste, irradiation by gamma rays or cosmic rays, whatever. There's an even longer history of mythic animals undergoing similar transformations. The idea of gaining powers by contact with feared pollutants makes magical sense but not biological sense, and so I'm not used to seeing it outside of fantasy.

But here's a CNN story about Natasha, a black macaque who began walking on her hind legs after a severe infection that may have caused brain damage. Will we start to see some real-world stories about domestic animals who develop extraordinary communicative abilities as a result of over-exposure to currently feared pollutants of human culture, such as tobacco, fried potatoes or reality TV shows? [link via Pepper of the Earth]

Posted by Mark Liberman at 06:56 PM

Grammar and essay grading

The Wall Street Journal (print edition, Monday, August 2, 2004; page B1) carries an article by June Kronholz about the training of essay graders for the new ACT college entrance exam. The writing component is to be added next February. Essays will be scored on a six-point scale for such subjective elements as voice, style, flow, and deployment of the language. But it emerges that ACT Inc. does not plan to treat grammaticality as decisive. Organization and originality will trump mere syntax. In one training session for scorers, even "two paragraphs that were barely readable through the misspellings, twisted syntax, and bad grammar ... weren't enough to lower a score" in an essay that "offered a reason to support its point of view", Kronholz reports.

My view (and it may seem odd to you to hear this from an avowed grammarian who loves to see the language used accurately) is that this is all just as well. The grasp that even well-educated people have of what it means to have twisted syntax or bad grammar is so tenuous, and the misinformation and downright outrageous nonsense so widespread, that I would rather trust ACT's scorers to evaluate argumentational coherence and rhetorical effectiveness than to judge grammar. As we have noted so many times on Language Log, educated Americans hardly even know what grammar is. Tell scorers to deduct one percent for each grammar error, and they'll soon be penalizing stranded prepositions and banning genitive antecedents and condemning split infinitives and insisting on whom and following Spiderman in calling for grotesqueries like "He's no bigger than we", and all the other familiar old nonsense on which college aptitude certainly does not depend.

Posted by Geoffrey K. Pullum at 03:54 PM

PDF haha

Vardibidian at Tohu Bohu has a good e-laugh, courtesy of Geoff Pullum's suggestion that “Minimum trust in document transfer means only letting people have things in a page description language like PostScript, or better, PDF. It means they can't edit the file they receive, they can only print it, and if it prints at all it prints exactly the way you want it to.”

Quoting now:

"AhahahahahahahaHAHAHAHAHAHAHAHAHAhahahahahahooooooooooooohahahahaha."

Before I saw this, I was going to make a polite observation or two about some of my experiences trying to share .pdf files with publishers. But Vardibidian's less intellectual approach is better, I think, and I'll spare you my tales of woe. I've healed, you see, and am reluctant to re-open old wounds. Also, the statute of limitations on DMCA violations has not expired... I'll just mention that the usual culprit is an unembeddable font or two (or worse, an older font lacking the now-required specific permission to be embedded), and some of the issues involved are discussed here.

Posted by Mark Liberman at 02:29 PM

Blog cultures, academic and otherwise

Phonoblog is developing nicely, with a fascinating exchange between Eric Bakovic and Travis Bradley on Bill Richardson's Spanish. There are lots of language-oriented weblogs -- see our blogroll on the right for a long and doubtless incomplete list -- but the flavor of the mix has been different from that of the philosophical blogosphere, which is older, larger and more academically oriented. The subdiscipline of semantics has a number of philosophy-style weblogs (e.g. semantics etc.), probably because semantics is academically bicultural.

Jordana Lewis had an item in the July 26 Newsweek on philosophy blogs, which I learned about from Brian Leiter's comment. Lewis invites public participation, suggesting that "you don't need a Ph.D. to participate, but don't fake it," which seems like good advice to me. Brian Leiter has a less charitable perspective on pro(fessional)-am(ateur) interactions. I've commented previously on professionalism in language commentary, and also discussed the quality of discussion in weblogs versus more authoritative sources. This is a very troublesome set of issues.

I think it's worth distinguishing at least three questions about would-be participants in serious discussions:

(1) Do they have a genuine interest in the issues, an understanding of the methods of rational investigation, and a willingness to apply them?
(2) Do they know a relevant set of facts and techniques?
(3) Are they familiar with the (recent) intellectual history of some particular academic discipline?

In my opinion, (1) is the price of entry; (2) is almost always necessary to be able to make a contribution (other than bringing up interesting questions); (3) is worthwhile, but is too often overvalued by those who have it, and undervalued by those who don't. Degrees and institutional status are relevant only insofar as they are often reasonable proxies for positive answers to these three questions.

Of course, for participation in non-serious discussions, there's a different set of requirements. As a recent example, I'll point to my favorite bit of convention-blogging: Wolf Blitzer's interview with Fafnir and Giblets.

Anyhow, phonoblog is shaping up as a place where folk who care about such things can get right down in the details of the sound structure of language, or comment on interesting new papers, or even discuss who got what job or which institution is moving in which direction, like those philosobloggers do. I look forward to participating.

Posted by Mark Liberman at 10:05 AM

Referrer log flotsam

The blogger at but she's a girl... likes Brummie or Black Country accents.

Greg Kochanski at the Oxford University Phonetics lab posted a critique of the Fitch and Hauser paper from Science on "Computational Constraints on Syntactic Processing in a Nonhuman Primate", and cited my post on the same subject, and this morning someone clicked on the link.

An unfortunate internet pilgrim somewhere on the east coast of the U.S. asked Google {how to get a boyfriend}, and was sent to Geoff Pullum's post about speech perception errors, sexism and dead sea lions. This happens several times a day, actually. I draw two conclusions: (1) there's still some headroom for research on improving precision in information retrieval; and (2) we should enrich that post with some useful advice, or at least a few words of comfort, as we did with the wedding vowels business.

Someone was reading an old post by Marc Moffett at Close Range on anti-individualism and the Sapir-Whorf hypothesis.

And hits on the Groseclose and Milyo discussion came from readers of pieces by Paul Goyette at locussolus and mallarme at the Greater Nomadic Council.

Posted by Mark Liberman at 07:05 AM

WYSIAANWTG: What You See Is Almost Always Not What They Get

There was just a single day of business involved in the two-week trip to England from which I just returned. For the most part Barbara and I tried to devote the time entirely to relaxation, but I just had to visit Cambridge University Press to check on how things were going with the editing of the book Rodney Huddleston and I had just submitted, A Student's Introduction to English Grammar. It's just as well I did stop by.

Anyone who does not want to read a rant about the state of word processing programs and the stupidity of the human beings using them and a tale of possibly the silliest electronic submission process in the history of computers should simply pass on at this point, and not read the rest. I'm sure Mark or Eric or Arnold or someone will have some nice material about words or pronunciation or grammar that you could read instead. I have only a tale like the one told by Coleridge's ancient mariner, who stoppeth one of three on their way to a wedding feast and grippeth him by the arm and will not stop telling the story until he has dealt with the last dead albatross and the last stony glare in a dead crewmate's eye. So you and you, go ahead, you don't need to hear this. But you, stop. I need to tell my tale, and I've decided that you're it. The moment that his face I see, I know the man that must hear me. Read on.

Cambridge University Press boasts of being "the oldest printing and publishing house in the world": it was founded on a royal charter granted to the University by Henry VIII in 1534 (you needed the permission of regal and religious authorities to publish in those days, it seems; England was rather like modern Iran). However, I note that the time taken to get its first book out was fifty years: the Press "has been operating continuously as a printer and publisher since the first Press book was printed in 1584." A press with such origins is, a priori, the least likely to move rapidly toward modern methods of book production. This is actually unfair to them: in many fields of science and mathematics they are now accepting LaTeX source. But from what happened in the third week of July 2004 one could certainly get the impression that they are not yet ready for modern methods of handling text, and will remain in the 19th century for some time.

Perhaps it was quixotic of Rodney Huddleston and me to hope that we might submit the typescript of A Student's Introduction to English Grammar electronically, direct from our own word processor files. Our previous work, The Cambridge Grammar of the English Language, was (fantastically) printed out as a double-spaced typescript on one side only of about 3,500 sheets of paper, and airmailed from Australia to England in a box the size and weight of a sewing machine cabinet (with sewing machine contained therein). We thought it was rather wasteful of jet fuel to send all that heavy paper in a box. We suggested simply emailing the word processor source files for this one, rather than sending hard copy plus diskette. The Press would still need a printout for the copy editor to write corrections on in the traditional manner, but we're on a tight schedule, and we thought we could gain a week by just flashing them the word processor files and having them make the printout. And CUP said they could handle that.

Now, I'm well aware that printing something out from a word processor document file generally demands that precisely the same software and hardware is in use at each end. Translation from one word processor format to another in a way that preserves what is important about a complicated text like a grammar book is possible in principle but hopeless in practice. Differences in letter width way below the millimeter level pile up and lead to problems with tabbing and tables. Linebreak differences pile up and lead to disastrous page break placements. Special characters disappear. Tables are mangled. Fonts are randomly replaced in utterly lunatic ways. If you've worked extensively with moving files between different word processors you'll know why I'm gripping your arm and warning you about this. If not, you won't listen, but you should.
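For anyone who doubts that sub-millimeter width differences can matter, here is a toy sketch of the pile-up. Everything in it is an assumption made up for illustration: the function name break_lines, the fixed-width "fonts", the width values, and the greedy line-filling rule. It is not how WordPerfect or any real word processor lays out text; it only shows how a tiny per-letter difference can move a line break.

```python
def break_lines(words, char_width, line_width):
    """Greedy line breaking: put as many words on each line as will fit.

    char_width and line_width are in arbitrary, hypothetical units;
    every character (including the space) is assumed to be equally wide.
    """
    lines, current, used = [], [], 0.0
    for word in words:
        word_width = len(word) * char_width
        # Width of the line if this word is added (plus a space if needed).
        needed = word_width if not current else used + char_width + word_width
        if current and needed > line_width:
            lines.append(" ".join(current))   # close the full line
            current, used = [word], word_width
        else:
            current.append(word)
            used = needed
    if current:
        lines.append(" ".join(current))
    return lines


text = ("Differences in letter width way below the millimeter level "
        "pile up and lead to problems with tabbing and tables").split()

# Two imaginary fonts whose letters differ by 0.05 units each.
lines_a = break_lines(text, char_width=2.00, line_width=60.0)
lines_b = break_lines(text, char_width=2.05, line_width=60.0)

for label, lines in (("font A", lines_a), ("font B", lines_b)):
    print(label, len(lines), "lines")
    for line in lines:
        print("   ", line)
```

With these invented numbers, the word "level" no longer fits on the second line in the slightly wider font, every later line shifts, and the paragraph comes out one line longer. Multiply that by a few hundred pages and you have the page-break problem.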

I'm giving you the short version of the story. This is it. Rodney completed the final edit of the new book using WordPerfect 6 for DOS (because he standardized on that in the late 1980s and now has too big of an investment in macros and text files to switch). I was worried that the Press would never be able to find a machine with WordPerfect 6 for DOS on it, so I converted the whole book to WordPerfect 11 for Windows, checked all the pagebreaks, and emailed the file to Cambridge with instructions about how it MUST be printed using WordPerfect for Windows, version 6 or later. Then, about a week later, having travelled to England and relaxed a few days to recover from the jetlag and the final push of writing the book, on July 19 I stopped by in Cambridge on the way up to York and went into CUP's headquarters to take my first look at the typescript that I thought by now they would be copy-editing.

But no copy-editing had started. Our senior commissioning editor had flagged at least one place where a table had been botched in the printout. I rapidly saw that there were unpleasant page-break issues too. Barbara was in on the meeting too, and she glanced at a page and pointed out a tabbing error. Slowly it became clear to me that virtually all the tabbed displays had gone wrong. And suddenly I saw what should have made me jump as if stung by a bee, only sometimes you can't see unexpected things when they're really huge. The entire typescript was in a new font. It looked a lot like Apple-style Helvetica (though oddly the footnotes were in Times Roman, the font we had used). And the bullets had been replaced by decimal points. In fact hardly any of the special characters were right.

The file had been through some kind of conversion process, exactly what they promised would not happen! I pounded on the table. I shouted and hurled medium-sized objects around. Secretaries nervously checked to make sure they had the phone number of security in case things really got ugly. Our commissioning editor apologized profusely and repeatedly, and said she'd look into it. She took us to lunch in the private dining room to try and calm the situation. (This worked well; they had profiteroles on the dessert menu, which did have a calming effect.)

I learned a day or two later, by phone from York, what had happened. A young CUP intern on a short-term contract (very short, I hope) had been unable to find a machine with WordPerfect that was connected to the right printer, and it was raining so he didn't want to have to go across the road to a different building (did the little twit think he would melt?), so, without approval, he just converted from Corel's WordPerfect for Windows to Microsoft's Word for the Macintosh, and printed the result. It was about 550 pages, double-spaced. The font was too big. Nothing usable remained of most of the carefully measured tables and meticulously laid out example displays. There was nothing to be done with it but to throw it away.

And then later, when he had been told he must use WordPerfect, the same assistant botched it again. He opened our Windows files with WordPerfect for the Macintosh, a totally different program, fairly old, no longer marketed or updated, and never much good for anything. Even he was able to see that it was useless and couldn't be the basis for the copy editing. So that copy had to be thrown away too. (Yes, that's around 1100 wasted sheets of paper so far, and counting.)

He then sent the files off to the Printing Services division of the Press, and they were able to grasp the notion that for printing a WordPerfect for Windows file, it's a good idea to start with a machine running Windows, and that having WordPerfect on it would be an excellent additional feature. Even their version, though, was apparently not perfect: certain lines from tables appeared to be missing, and the pagination did not match what Rodney Huddleston had in the version he had printed out. I haven't got the details, because I'm back in California, Huddleston is in Australia, and the Press's third printout is in England. But there is a real problem about neither Rodney nor me being sure we have a version with the same page breaks as the CUP copy: we will never know what the copy editor means in her emailed queries ("On page 138, seven lines up, should which be changed to that?").

So what they eventually decided they had to do was to make a xerocopy of their third printout and airmail it to Australia. Rodney had printed it out originally to check it before doing the electronic mailing, but his copy didn't match the CUP copy with regard to page breaks. Putting Rodney's copy together with CUP's three attempts at printing what they received plus the xerocopy, the number of sheets of A4 paper used up so far is around 3,300. And jet fuel had to be used in the end to send the typescript back from the Press to the senior author. And I still don't have a copy that matches anyone else's (if I printed it out here, it would be on American letter-size paper, and wouldn't match any of the other copies even approximately).

So much for electronic transfer of documents in the modern world. So much for the vaunted paperless office which I remember being told would arrive in the second half of the 1980s.

What are the lessons learned? One is that word processor software is hopeless when judged by any kind of serious standard. Glitzy stupid features are constantly added (clip-art libraries, magnifying tools, different designs for font lists, dialog boxes, menus, status lines), but the basic formats and font handling mechanisms and printer interactions and so on just aren't fit to be used as a basis for electronic transfer of documents. Even keeping pagination control stable is out of the question. In fact the supervisor of the production department at Cambridge University Press told me recently that they are returning to a strict policy of requiring authors to submit hard copy as well as a computer file. Born in the galleon age, CUP is now deciding to stick with the steam age in this regard.

In part I blame myself. I should have stopped Rodney from mailing word processor files, seized control of the submission process, and done everything in a way that involved minimum trust. Minimum trust in document transfer means only letting people have things in a page description language like PostScript or PDF. It means they can't edit the file they receive, they can only print it, and if it prints, it prints exactly the way you want it to look. [Added later: OK, so Varbidian laughs his head off at this. I should have said, it is supposed to mean it prints exactly the way you want it to look. But PDF has its own horrible problems, as Mark hints.] You are [if it works] basically sending them (in the form of a compressed machine-readable description) a picture of each page. I should have made a PDF (WordPerfect 11 does a nice PDF conversion) and sent that, guaranteeing that the font sizes and page breaks would be as originally stipulated, and that anyone with Acrobat Reader and a laser printer could print the thing looking exactly the way we wanted it to look. Editable word processor document formats won't do that. WYSIWYG stands for What You See Is What You Get. It doesn't stand for What You See Is What They Get. What you see is almost always not what they get. I knew that already. I had a very bad feeling about the idea of attempting to submit a book with technical content, tables, diagrams, etc., in a word processor format. I knew in my gut that it wouldn't work but I tried it anyway. It was crazy; like shooting an albatross for no reason. It was my fault, all mine...

Since then, at an uncertain hour
That agony returns:
And till my ghastly tale is told
This heart within me burns.

[Endnote: Now that I've taken the advice of Varbidian and Liberman and read something about font embedding and PDF and the DMCA, I am appalled at the above hint of optimism that PDF might be the answer to typescript submission problems; there seem to be many forces arrayed against the very possibility of portability for electronic documents. Just as spammers are destroying the usefulness of the email medium, font foundries are intent on destroying document portability through absurd abuse of copyright laws and criminal prosecution of freeware font designers... Things are bad out there.]

Posted by Geoffrey K. Pullum at 02:03 AM

August 02, 2004

Science, politics and fair play

I wanted to add a brief comment of my own about the exchange between Language Logger Geoff Nunberg and political scientists Tim Groseclose and Jeff Milyo. I'm posting this as a separate item because I wanted to let Groseclose and Milyo speak for themselves, with a simple frame explaining that I was posting the response that they had sent me.

First, here's a bit about the history. Geoff Nunberg posted his critique of the Groseclose and Milyo article here on July 5. Jeff Milyo emailed me on July 21, explaining that he had read Geoff's piece after hearing about it from a Language Log reader, that he had written to Geoff (Nunberg) asking if it would be possible to post a response, and that Geoff had suggested that he contact me, since I administer the weblog. I responded that I'd be happy to post their response, and this afternoon Jeff (Milyo) sent it to me. It took some massaging -- he sent a Microsoft Word file, and saving this as html resulted in some pretty strange html code -- but I hope that emacs and I have succeeded in coaxing the output into the form that the authors intended.

Second, I'd like to express my own opinions, such as they are. With respect to the statistical methodology, I don't think I'm in a position to judge. When I first read the Groseclose and Milyo article, my reaction to their "back-of-the-envelope" version was similar to Geoff's. I did realize that the most obvious objections to this version don't apply to the "real" technique that they used. On the other hand, I've noticed that the popular-press discussion of their article has focused mainly on the easier-to-understand "back-of-the-envelope" method, and so it does seem fair to me for Geoff to have criticized it. I believe that I do understand their "real" statistical model, and I look forward to some further discussion about what conclusions its application in this case licenses. Geoff didn't engage this question, and it seems fair to me for them to complain about this.

With respect to the tone and style of the criticism, it seems to me that there's a certain clash of expectations here. As Geoff pointed out, the Groseclose and Milyo article has been widely discussed in the popular press, where its conclusions have been often presented in a polemical light. Although Geoff can (and I expect will) speak for himself, I took his LL posting to be presented in the rough-and-tumble style of these political polemics, rather than in the typically more subdued style of an academic review. The G&M paper itself somewhat straddles this divide. It's a piece of academic social science, but I get the impression from reading it that its authors also intended to make a political point.

So I feel their pain -- if I had published a scientific article that was criticized with the rhetorical devices that Geoff applied to their paper (starting with "sand sifted statistically is still sand", and moving on from there), I'd be pretty upset too. On the other hand, if I published a political tract that got similar treatment, I'd think to myself "oh good, I'm making enough impact that someone's taking the trouble to attack me", and I'd respond in the same spirit.

Anyhow, I think that it was only fair to give G&M the chance to respond, in the same space, to Geoff's critique.

Posted by Mark Liberman at 06:20 PM

Groseclose and Milyo respond

On July 5, Geoff Nunberg posted a critique of a recent paper on media bias by Tim Groseclose and Jeff Milyo. Professors Groseclose and Milyo have written a response to Nunberg, and asked us to post it on their behalf. I'm happy to be able to do so.

--Mark Liberman

Geoffrey Nunberg recently posted a critique of our paper, “A Measure of Media Bias” at this site. In his essay, Nunberg shows a gross misunderstanding of our statistical method and the actual assumptions upon which it relies. We have decided to provide this response, not only to correct his many errors, but as a caution to other academics who would use blogs to pose as experts on subjects well outside those for which they have the requisite knowledge or technical expertise.

We would have ignored Nunberg’s rant, as we have other equally inflamed and baseless web-bashings, except that his posting has been taken by some to be a particularly powerful counterpoint to our study. Indeed, had we not been familiar with what we actually wrote in our study, we would have found it quite convincing, too. This is because Nunberg, in referring to our work, states that "If you take the trouble to read the study carefully, it turns out to be based on unsupported, ideology-driven premises and to raise what would be most politely described as severe issues of data quality…" This is not an isolated charge; Nunberg accuses us of unprofessional behavior throughout his essay. In our world, this is very damning; our livelihood and reputations depend crucially on our abilities to conduct scientific research. Such charges should not be made lightly.

We provide our response in three parts. The first is short and addresses only the most obviously false of Nunberg’s claims. The second is a one paragraph summary of our response regarding bias generated by our list of think tanks and advocacy groups. Together, these address Nunberg’s most serious criticisms of our work. The final part is an attempt to provide a more detailed point-by-point response to his complaints.

PART I. A SHORT RESPONSE

Suffice it to say that Nunberg could not have read our study carefully, as his methodological criticisms are directed only at what we repeatedly describe as our "back-of-the-envelope" method and not the procedure upon which we base our conclusions.

The "back-of-the-envelope" estimates are intended as an easy to understand initial set of calculations; this procedure is described in the section of our paper titled "Descriptive Statistics." Indeed, we ourselves critique this "back of the envelope method," in order to highlight the strengths of our preferred statistical procedure. Despite this, Nunberg’s summary of our methods is only a summary of the "back-of-the-envelope" method, which we acknowledge to be simplistic and inferior to our primary method.

Anyone who even skims our paper will find a section entitled, "The Estimation Method," which describes our primary statistical procedure in detail. Nowhere in Nunberg’s critique does he make even the slightest reference to this statistical technique. We are not surprised if Nunberg did not comprehend the material in this section, as it is intended for a somewhat statistically sophisticated audience. However, it is quite inappropriate for Nunberg to act as if this section does not exist.

For this reason, we believe Nunberg has lied when he implies that he has read the study carefully. This is a harsh criticism, but the alternative would be less charitable, as it would mean that Nunberg actually did read the study carefully, but purposely chose to misrepresent our work in order to undermine our credibility. Regardless, by taking on the guise of an informed and careful critic, Nunberg has misled many others who may have trusted him. This is unprofessional conduct, to say the least; other academics who blog should take care not to behave in a like manner.

PART II: ON BIAS

Nunberg finds fault with our list of think tanks and advocacy groups used to rate media outlets. But even if our sample of think tanks is skewed left or right, this will not bias our results. To see this, consider a regression involving height and arm length, as the independent and dependent variables. Suppose instead of a balance of short and tall subjects, the researcher includes twice as many tall subjects as short subjects. This will not change the expected relationship between height and arm length -- that is, the estimated parameter associated with the independent variable. Of course, it will cause predictions about arm length to be more precise for tall people than for short people. However, it does not cause a bias. E.g. it does not cause the researcher, say, systematically to predict arms to be too long (or too short). As we discuss below, no statistics textbook claims that the set of independent variables must have a certain distribution if an estimator is to be unbiased. For the same reason, our method requires nothing of the ideological distribution of the think tanks for the estimates to be unbiased.
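The height and arm-length argument can be checked with a small simulation. The sketch below is illustrative only: the true slope, intercept, noise level, and height ranges are invented values, not anything taken from the paper, and it uses ordinary least squares rather than the estimation method the paper describes. It averages the fitted slope over many simulated samples, once with equal numbers of short and tall subjects and once with roughly twice as many tall subjects as short ones.

```python
import random

TRUE_SLOPE = 0.45      # hypothetical cm of arm length per cm of height
INTERCEPT = -5.0       # hypothetical
NOISE_SD = 3.0         # hypothetical measurement noise, in cm


def ols_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_slope(n_short, n_tall, rng):
    """Draw one sample with the given mix of short and tall subjects."""
    heights = [rng.uniform(150, 165) for _ in range(n_short)]
    heights += [rng.uniform(180, 195) for _ in range(n_tall)]
    arms = [INTERCEPT + TRUE_SLOPE * h + rng.gauss(0, NOISE_SD) for h in heights]
    return ols_slope(heights, arms)


def mean_slope(n_short, n_tall, trials=500, seed=1):
    """Average the fitted slope over many independent samples."""
    rng = random.Random(seed)
    return sum(simulate_slope(n_short, n_tall, rng) for _ in range(trials)) / trials


balanced = mean_slope(50, 50)   # equal numbers of short and tall subjects
skewed = mean_slope(33, 67)     # about twice as many tall subjects
print(balanced, skewed)
```

Under these assumptions both averages should land close to the true slope of 0.45. Oversampling tall subjects changes where the predictions are most precise, not the expected value of the slope estimate.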

PART III. A LONGER RESPONSE

Nunberg makes five general points: 1) Our statistical method for rating think tanks assumes that there is no such thing as a centrist or apolitical think tank and it does not distinguish between, say, a moderately left think tank and a far left think tank; 2) Our method "assumes there can be no such thing as objective or disinterested scholarship"; 3) We "have located the political center somewhere in the middle of the Republican Party." 4) The list of think tanks and policy groups that we choose is an arbitrary mix, and this mix of think tanks causes the media to appear more liberal than they really are. 5) Our data from the Congressional Record "shows some results that would most kindly be described as puzzling" -- most prominent of which are the data that involve the ACLU and the Alexis de Tocqueville Institution.

We show why each point is wrong and in some instances dishonest.

1) Nunberg describes our study as "certainly the most ambitious and analytically complicated" of quantitative studies of media bias. We appreciate the compliment, but we should begin by clarifying the statement. The version of our paper to which Nunberg refers has nine sections, including the introduction. Eight of these sections, in our view, contain no specialized economics or political science jargon, nor do they require any mathematics skill above an eighth-grade level. However, one of these sections, "The Estimation Method," is somewhat analytically complicated. E.g. it describes a maximum-likelihood estimation technique and it notes a set of random variables that follow a Weibull distribution. Such techniques and concepts are somewhat specialized, but most people with a PhD in economics or statistics will know them, and more and more frequently they are becoming part of the toolbox of newly-minted political-science and other social-science PhDs.

Our main conclusions are based strictly upon the method that we describe in that section. However, in another section, entitled “Descriptive Statistics,” we show how a simpler method, which we call the “back of the envelope method” gives nearly identical results. We ourselves discuss the problems with the back-of-the-envelope method. Yet, we decided to include it, because (i) it is accessible to laypersons, and (ii) it helps to provide some intuition for our primary, more complicated, method.

We strongly suspect that (1) Nunberg did not read the more complicated section. Or, if he did, (2) he certainly did not understand it. Here is some evidence.

1a) Nunberg’s essay has four sections. One, entitled “The Study,” appears to describe our statistical method. However, in this section he only describes our “back of the envelope” method. Nowhere in the section, nor in any other section of his critique, does he make even the slightest reference to our primary statistical method.

1b) Nunberg writes “There are ideological implications, too, in Groseclose and Milyo’s decision to split the think tanks into two groups, liberal and conservative. One effect was to polarize the data. No group – and hence, no study – could be counted as centrist or apolitical.” This is true of the back-of-the-envelope method, but it is not true of the primary, more complicated method that we use (which, again, is the method on which we base our main conclusions).

Our method assumes that legislator i’s preference for citing think tank j is

a_j + b_j x_i + e_ij.

The key letter in this equation is the subscript-j associated with b. As we state in the paper, the j stands for the j-th think tank in our sample. It means that we estimate a different b_j for each different think tank. In contrast, if we had done what Nunberg says we did, we would only estimate two b_j’s, e.g., a b_L for liberal think tanks and b_C for conservative think tanks. That we estimate a different b_j for each different think tank means that we allow for a continuum of different ideologies for the think tanks. Indeed that is what we found. E.g. the b_j for the Heritage Foundation is significantly less than the b_j for the American Enterprise Institute, which is significantly less than the b_j for the Brookings Institution, which is significantly less than the b_j for the Urban Institute, and so on. As a consequence, if a media outlet cites a think tank that is cited predominantly by moderates in Congress or one that is cited nearly equally by conservatives and liberals (e.g. the Brookings Institution was one such think tank), then that will cause our method to rate the media outlet as more centrist. Likewise, if a media outlet cites a far-left think tank then this will cause our method to rate the outlet more liberal than if it had cited a centrist or moderately-left think tank.
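To make the role of a separate b_j for each think tank concrete, here is a small sketch. It assumes a multinomial-logit form for the choice probabilities, with P(i cites j) proportional to exp(a_j + b_j x_i); that is one common way to turn a preference of the form above into probabilities, and whether it matches the paper's exact specification is an assumption made here. The three think tanks and all parameter values are hypothetical, and x_i is taken to be a score on a 0 to 100 scale where higher means more liberal.

```python
import math


def citation_probs(x, a, b):
    """Logit-style probabilities that a legislator with score x cites each tank.

    Assumes P(j) is proportional to exp(a[j] + b[j] * x); this functional
    form is an illustrative assumption, not quoted from the paper.
    """
    utilities = {j: a[j] + b[j] * x for j in a}
    top = max(utilities.values())          # subtract the max for numerical stability
    weights = {j: math.exp(u - top) for j, u in utilities.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Hypothetical parameters: a larger b means the tank appeals more to liberals.
a = {"tank_right": 2.0, "tank_center": 0.0, "tank_left": -2.0}
b = {"tank_right": -0.04, "tank_center": 0.0, "tank_left": 0.04}

conservative = citation_probs(10, a, b)
moderate = citation_probs(50, a, b)
liberal = citation_probs(90, a, b)

for label, probs in [("x=10", conservative), ("x=50", moderate), ("x=90", liberal)]:
    print(label, {j: round(p, 3) for j, p in probs.items()})
```

With these made-up values the moderate legislator is equally likely to cite any of the three tanks, while the conservative and liberal legislators lean toward opposite ends. Because each tank has its own b, the tanks fall along a continuum rather than into two bins.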

1c) Nunberg makes the same error when he writes “In fact, even though the ADA rating that G & L’s [sic] method assigned to the Rand Corporation (53.6) was much closer to the mean for all groups than that of the Heritage Foundation (6.17), G & L [sic] ignored that difference in computing the effect of citations of one or the other group on media bias, compounding the polarization effect. That is, a media citation of a moderately left-of-center group (according to G & M’s criteria) balanced a citation of a strongly right-wing group.”

Again, this is true for our back-of-the-envelope method, but it is not true for our primary method. For an explanation, see our previous point. Again, it is the latter method, not the back-of-the-envelope method, on which we base our main conclusions.

(A separate error in Nunberg’s statement is to call the above numbers, 53.6 and 6.17, “ADA ratings.” We never do that, nor should anyone else. Here is one reason (which is the simplest to explain). It is conceivable that a think tank could be more right wing (or left wing) than any member of Congress in our sample. If so, then the average member citing the think tank would necessarily have an ADA score that is higher than the think tank’s true score. In fact, in general, if we defined the ADA score of the think tanks by the average score of the members citing them, then this in general would cause think tanks to appear more centrist than they really are.)
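The shrinkage described in this parenthetical can be illustrated numerically. The sketch below is a toy model, not the paper's method: it assumes eleven hypothetical members with scores from 0 to 100, and it assumes a member's tendency to cite a think tank falls off exponentially with the distance between the member's score and the tank's true position. Both assumptions are invented for illustration.

```python
import math


def average_citer_score(tank_position, member_scores, spread=20.0):
    """Citation-weighted average score of the members citing a think tank.

    Hypothetical citing rule: weight = exp(-|member - tank| / spread).
    """
    weights = [math.exp(-abs(x - tank_position) / spread) for x in member_scores]
    return sum(w * x for w, x in zip(weights, member_scores)) / sum(weights)


members = list(range(0, 101, 10))   # hypothetical scores 0, 10, ..., 100

right_of_everyone = average_citer_score(-10, members)  # tank beyond every member
moderately_right = average_citer_score(20, members)
centrist = average_citer_score(50, members)
moderately_left = average_citer_score(80, members)

print(right_of_everyone, moderately_right, centrist, moderately_left)
```

In this toy setup the average citer score for every non-central tank lies closer to 50 than the tank's true position does, and a tank positioned beyond every member can never receive an average outside the members' own range.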

1d) Another error occurs where Nunberg writes, “Let’s begin with the assumption that underlies Groseclose and Milyo’s assignment of ratings to the various groups they looked at: if a group is cited by a liberal legislator, it’s liberal; if it’s cited by a conservative legislator, it’s conservative.”

We do not assume this, and in fact, it would be ridiculous if we did. Nearly every think tank in our sample is cited at least once by a liberal legislator and at least once by a conservative legislator. Thus, if we literally assumed the above statement, then almost every think tank in our sample would simultaneously be both a conservative and a liberal think tank. It would be very strange for us to make an assumption that is contradicted almost everywhere in our data.

We think that what Nunberg meant to say is that we assume that “if a think tank tends to be cited by liberals, then it is liberal, and if it tends to be cited by conservatives, then it is conservative.” This is a more reasonable statement, and it is true for our back-of-the-envelope method. However, it is not true for our main statistical method.

As mentioned above, our main statistical method estimates a different b_j for each think tank. These estimates indeed describe relative positions of the think tanks. However, we do not assume that our method gives an absolute position. In fact, it cannot give an absolute position. As we note in the paper, it is actually impossible to identify all the b_j’s. All our method can do is identify them up to an additive constant. As a consequence, we must set one of the b_j’s to an arbitrary constant. Substantively, this means that while our method can reveal that the Heritage Foundation is to the right of the Economic Policy Institute, it cannot say, e.g., that the Heritage Foundation is to the right of the political center of the U.S., while the EPI is to the left of the center. Although our results are consistent with this statement, our results are consistent with many other possibilities, including (1) Heritage is far to the right of the political center while EPI is near the political center, or (2) Heritage is near the political center while EPI is far to the left of the political center. Indeed any statement that describes EPI to the left of Heritage would be consistent with our results.
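The additive-constant point can be verified directly under one assumption about the model's form. The sketch below assumes logit-style choice probabilities, with P(i cites j) proportional to exp(a_j + b_j x_i); this form and every parameter value are illustrative assumptions rather than details quoted from the paper. Under that form, adding the same constant c to every b_j multiplies every weight by exp(c x_i), which cancels when the weights are normalized, so the data cannot distinguish the shifted parameters from the originals.

```python
import math


def citation_probs(x, a, b):
    """Probabilities proportional to exp(a[j] + b[j] * x) (assumed logit form)."""
    utilities = {j: a[j] + b[j] * x for j in a}
    top = max(utilities.values())
    weights = {j: math.exp(u - top) for j, u in utilities.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Hypothetical think tanks and parameters.
a = {"tank_a": 0.5, "tank_b": 0.0, "tank_c": -0.3}
b = {"tank_a": -0.02, "tank_b": 0.01, "tank_c": 0.05}

shift = 0.03                                   # arbitrary additive constant
b_shifted = {j: value + shift for j, value in b.items()}

max_difference = 0.0
for x in (5.0, 39.0, 74.1, 95.0):              # arbitrary legislator scores
    original = citation_probs(x, a, b)
    shifted = citation_probs(x, a, b_shifted)
    for j in a:
        max_difference = max(max_difference, abs(original[j] - shifted[j]))

print(max_difference)
```

The largest difference should be zero up to floating-point rounding. Only the differences between the b_j's affect the predicted citation behavior, which is why one of them has to be pinned to an arbitrary value.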

Why is this important? Nunberg says that our method divides think tanks into two dichotomous groups, liberal and conservative, and that we choose as our dividing line the middle of the Republican party. Later, we’ll explain why our paper does not define the political center at the middle of the Republican party. But, for the moment assume that it does. Even if we did make such a strange (and misleading, we would argue) choice, this would not affect our method’s estimates of the media’s ADA scores. The reason is that to estimate ADA scores our method does not make (and cannot make) any sort of assessment about which side of the political center a think tank lies on.

1e) All the evidence above is consistent with the possibility that Nunberg read “The Estimation Method” section but just did not understand it. However, some other evidence suggests he really did not read the section at all. Here are the first two sentences of the section: “The back-of-the-envelope estimates are less than optimal for at least three reasons: (i) they do not give confidence intervals of their estimates; (ii) they do not utilize the extent [italics in original] to which a think tank is liberal or conservative (they only record the dichotomy, whether the think tank is left or right of center); and (iii) they are not embedded in an explicit choice model. We now describe a method that overcomes each of these deficiencies.” If Nunberg had really read these sentences, especially reason (ii), we do not see how he could possibly make the statements that he made in points 1b and 1c above. (Another possibility is that he read all sentences of the section except the first two. But this would be even stranger. Each of the sentences in the section except the first two and last six requires a fair amount of technical expertise. It would be strange for a person to read the difficult parts of the section but skip the easy parts.)

2) Another criticism that Nunberg makes is that “In fact, their method assumes that there can be no such thing as objective or disinterested scholarship.” This is the strangest sentence of all in Nunberg’s critique. We make six points in response. i) Our method does not make this assumption, and nowhere in the paper do we state anything like it. ii) Such a statement is neither necessary nor sufficient to justify our method. iii) As professors at research universities, we consider the primary aspect of our jobs to produce objective and disinterested scholarship. It would be very strange if we wrote a paper that assumes that such scholarship cannot exist at all.

iv) Although we did not state it in the paper, our own view is nearly the exact opposite of this assumption. Namely, by and large, we believe that all studies and quotes by the think tanks in our sample are true and objective. However, it just happens that some, but not necessarily all, of these true and objective studies appeal differently to conservatives than liberals. To see why, imagine that a researcher publishes a study in a very prestigious scientific journal such as the New England Journal of Medicine. Suppose this study gives evidence that a fetus in the early stages of its mother’s pregnancy can feel pain (or cannot feel pain). We are willing to bet that this true and objective study will appeal more to conservatives (liberals) than liberals (conservatives). We are also willing to bet that conservatives (liberals) would tend to cite it more.

This is all that our study assumes—that these studies can appeal differently to different sides of the political spectrum. We do not assume that the authors of the studies necessarily have a political agenda. Not only that, we do not even assume that each study will appeal differently to different sides of the political spectrum. We only assume that it is possible that such studies will appeal differently. That is, our method does not force each b_j to take a different value. It allows for the possibility that the estimate of each b_j could be the same (of course, however, that does not happen with our data).

v) We took great pains to include in our statistical model the possibility that there are factors besides ideology—including possibly a reputation for objective and disinterested scholarship—that can cause a think tank to be cited more frequently by the media and in Congress. These are represented by the a_j’s that we estimate. Our decision to include these parameters came at a considerable cost in terms of computer time and our own effort to estimate the model. Including these parameters approximately doubles the number of parameters that we need to estimate. This, for reasons that we explain in the last two paragraphs on p. 11, actually quadruples the effort and computer resources that we need to calculate the estimates. As we explain, once we run the full model, we expect the statistical program to take approximately eight weeks to run. If instead, we eliminated the a_j’s, the program would only take two weeks. If we really assumed that there is no such thing as disinterested and objective research, why would we choose to estimate a much more complicated model that tries to account for this possibility?

vi) In contrast, the assumption that Nunberg claims that we make seems to apply more to his views than ours, at least in regard to research on the media. His second to last sentence reads, “It seems a pity to waste so much effort on a project that is utterly worthless as an objective study of media bias.” Is he saying “there can be no such thing as an objective and disinterested” study of media bias?

3) Nunberg claims that “In effect, G & C [sic] have located the political center in the middle of the Republican Party, by which standard the majority of American voters would count as left-of-center.” Here is another case where Nunberg seems not to have read a section of the paper. We devote an entire section to defining the political center (the section is entitled “Digression: Defining the ‘Center’”). We conclude the section with the following sentence, “As a consequence, we think it is appropriate to compare the scores of media outlets with the House median, 39.0”

We devote an entire table, Table 2, toward comparing the median and means of the entire Congress to the means of each party. As we note, the Republican mean is 11.2. Meanwhile the Democratic mean is 74.1. By no stretch of the imagination is 39.0 in the middle of the Republican party. In contrast, it is almost exactly equal to the midpoint of the middles (means) of the two parties.
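The comparison in this paragraph is simple arithmetic on the three figures quoted above, worked through here as a short calculation. Only the Republican mean, the Democratic mean, and the House median come from the text; the variable names are just labels for this sketch.

```python
# Figures quoted in the text above.
republican_mean = 11.2
democratic_mean = 74.1
house_median = 39.0

# Midpoint of the two party means, and how far the House median sits from it.
midpoint = (republican_mean + democratic_mean) / 2
gap_to_midpoint = midpoint - house_median

# Distance from the House median to each party's mean.
gap_to_republican_mean = house_median - republican_mean
gap_to_democratic_mean = democratic_mean - house_median

print(round(midpoint, 2), round(gap_to_midpoint, 2))
print(round(gap_to_republican_mean, 1), round(gap_to_democratic_mean, 1))
```

On these figures the midpoint of the party means is about 42.65, so the House median of 39.0 lies roughly 3.65 points below it, about 27.8 points above the Republican mean and about 35.1 points below the Democratic mean.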

We also illustrate this in Figures 2 and 3. Both figures list the median of the House, 39.0 and the averages of the Republican and Democratic parties. As anyone can see, 39.0 is approximately the midpoint between the two parties’ averages.
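The midpoint comparison can be checked with a few lines of arithmetic. The sketch below uses only the three figures quoted above (the party means of 11.2 and 74.1 and the House median of 39.0); the variable names are illustrative and do not come from the paper.

```python
# Figures quoted in the reply: party means and the House median.
republican_mean = 11.2
democratic_mean = 74.1
house_median = 39.0

# Midpoint of the two party means.
midpoint = (republican_mean + democratic_mean) / 2

# Distance from the House median to the midpoint and to each party mean.
gap_to_midpoint = abs(house_median - midpoint)
gap_to_republican_mean = abs(house_median - republican_mean)
gap_to_democratic_mean = abs(house_median - democratic_mean)

print(f"midpoint of party means:  {midpoint:.2f}")
print(f"median minus midpoint:    {house_median - midpoint:+.2f}")
print(f"distance to Republicans:  {gap_to_republican_mean:.1f}")
print(f"distance to Democrats:    {gap_to_democratic_mean:.1f}")
```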

Finally, we also devote an entire table, Table 3, toward showing that 39.0 is indeed a moderate score and not a position in the middle of the Republican party. For instance, it is very near the score of Dave McCurdy (39.8), a Democrat who represented southern and central Oklahoma, a district that consistently and significantly voted for Republican presidential candidates. The 1994 Almanac of American Politics notes that he often breaks with the Democratic Party, and in 1990 he formed a “Mainstream Forum” for moderate House Democrats. Our definition of the political center is also near the score of Tom Campbell (41.5), a Republican who represented two different districts in Silicon Valley. Both districts voted overwhelmingly for Gore in 2000. Campbell was one of a handful of House members (of either party) who voted against Newt Gingrich for speaker in 1997 while voting in favor of impeaching President Clinton. The 1998 Almanac of American Politics calls him “[c]onservative on economic issues, liberal on cultural issues.” It is also near the scores of Sam Nunn (D.-Ga.) and Arlen Specter (R.-Penn.). No one with an even moderate knowledge of American politics can say that these legislators are in the middle of the Republican Party.

4) Nunberg raises a number of issues about the set of think tanks we choose to analyze. We make three points in response: a) Despite what he implies, we did not cherry-pick our list; b) He bolsters this charge by reporting citation data about the Conference of Catholic Bishops and the National Association of Manufacturers. If we add these groups to our list, this in general makes the media appear more liberal, not less. c) Nunberg criticizes our list of think tanks for not being the most prominent possible set and for not being a “genuinely balanced” set of think tanks. Even if these charges are true, we show that they do not necessarily imply a bias to our method. That is, if we had used a more prominent set of think tanks or a more balanced set, it is just as likely that this would cause the media to appear more liberal as more conservative.

4a) First, the cherry-picking charge. When we began our study, Milyo, while searching the internet, found a list of think tanks that seemed to be a good place to start to look for data. This is the list created by Saraf. We have never met Saraf, nor do we know anything about him except what he lists on his web site. Further, when we first downloaded the list, we had not even read any other parts of his web site. In short, we knew nothing about Saraf or how his list was created. We chose the list simply because (i) it listed many think tanks, (ii) it seemed to include all the major ones, and (iii) it seemed to include a healthy balance of far-right, right-leaning moderate, moderate, left-leaning moderate, and far-left think tanks.

(As Nunberg mentions, Saraf won an award from a Republican group; thus, it is possible, and maybe likely, that the list is stacked slightly in favor of right-wing groups. Later, we’ll explain why this will not cause a bias to our media estimates. But in the meantime, consider this: Suppose instead we had chosen a list that was stacked in favor of left-wing groups. We are certain that if we had done that someone, possibly Nunberg himself, would accuse us of intentionally picking a left-wing list in order to make the media look liberal. Here’s how such a critic could explain his or her charge. “Because Groseclose and Milyo’s list has a disproportionate number of left-wing think tanks, this causes media outlets in their sample to appear to cite left-wing groups disproportionately. This, in turn, causes their method to report the media more liberal than it really is.” Later, we’ll explain why this argument is wrong. But for now suppose it is correct. Remember, our list, if anything, seems to be stacked the other way, toward more right-wing groups. This would cause our method to report the media more conservative than they really are.)

This was Spring of 2002 when we first came across the list. Groseclose gave the list to his r.a.’s and asked them to begin data collection. After several months we considered adding more think tanks to the list. However, for two reasons we did not. One is simply the extra effort that it would bring upon us and our research assistants. We have now hired a total of 21 research assistants, and they have spent a total of approximately 5000 hours collecting data over a period of 2 ½ years, and we are still not quite finished. If we were, say, to expand our list to 300 think tanks, then this would cause our data-gathering exercise to take another year and a half, a total of about four years. At some point we have to say “Enough.”

But what about adding, say, 10 or 25 more think tanks? Would that be such a large burden? No, but if we did, our list would no longer be chosen exogenously by another authority. We would be even more susceptible to charges that we cherry-picked our list. Imagine how nefarious someone like Nunberg could make us look, saying, e.g., “Groseclose and Milyo began with a list chosen by another source. But then for some puzzling reason they chose to add several think tanks. Did the first list not give them the results they wanted? One suspects that the media would not look so liberal if they had stuck to their original list.”

Nunberg says that we should have used a set of think tanks “whose prominence was objectively determined.” We’re not sure how he defines “objectively determined,” but if he means “exogenously chosen” in the sense that econometricians and statisticians use the phrase, we agree. That’s exactly why we use a list chosen by someone else.

As a final word on the possibility we cherry-picked the set of think tanks to rig our result, recall that we have hired 21 research assistants for the data-gathering exercise. We carefully chose them so that approximately half were Gore supporters in the 2000 election. If we really did cherry-pick our list or, say, begin with one list and then switch to another, then almost surely one of these research assistants would recognize it. Imagine the damage to our careers if one of them was able to step forward with such a charge. Even if we had the lowest possible regard for honesty in research, wouldn’t self-interest alone motivate us not to cherry-pick a list given how many research assistants are involved in the project?

4b) To bolster the charge that we chose an arbitrary set of think tanks, Nunberg gathers data from two think tanks that we did not include on our list: the National Association of Manufacturers and the Conference of Catholic Bishops. He states that by not including groups such as these, we “exaggerate the media’s liberal tilt.”

Our first response is simply to apply Nunberg’s critique to himself. What is the “objective criterion” that he used to choose these two groups? In the words of his own critique, he “gives no indication of how his list was compiled, or what criteria were used.”

We are certain that some think tanks that we did not include would cause the media outlets to appear more liberal than we report. We are also certain that other think tanks would cause the outlets to appear more conservative than we report. Accordingly, it would be easy for a critic to cherry-pick two think tanks and then offer them as an example to show that the media are really more conservative than we estimate. We would accuse Nunberg of engaging in such an exercise, except the two think tanks that he chooses work in the opposite direction. If we had included them, our results would generally show the media to be more liberal, not less!

To see this, let us focus on our “back of the envelope” method. Although this is not the method on which we base our conclusions, it is the one on which Nunberg bases his conclusions. Thus, if we want to explain Nunberg’s errors it’s better to focus on this method. Further, it happens that these results very closely approximate our primary method, and it is easier to explain the reasoning with this method than our primary method.

Consider Nunberg’s claim, “By excluding conservative groups that are frequently mentioned in the media, the study appears to exaggerate the media’s liberal tilt.” On the surface, this appears to be an obvious and true statement. For instance, as Nunberg suggests (and our sample examination seems to verify), the National Association of Manufacturers is a group that conservative legislators cite more than liberal legislators. Thus, our back-of-the-envelope method would indeed classify it as a “conservative” group. As an example, consider ABC World News Tonight, which for the period we examine, cites NAM 13 times. (Lexis-Nexis actually lists 17, but four of these are repeat entries.) When we add NAM, World News Tonight necessarily increases its proportion of conservative cites. This would seemingly make its ADA score become more conservative. However, when we add NAM to the mix, this also causes Congress to increase its proportion of conservative cites, which makes it appear more conservative as well. Our method only estimates the extent to which a media outlet is liberal or conservative relative to Congress. Consequently, the net effect is not clear.

If World News Tonight is to make its ADA score more conservative, it must cite NAM with relatively greater frequency than does Congress. It does not do this. Namely, when NAM is not in the mix, World News Tonight cites conservative groups 318 times. When we add NAM to the mix, this number becomes 331, an increase of 4.1 percent. Meanwhile, Congress’s conservative cites increase by a much greater degree. Without NAM, Congress cites conservative think tanks 4294 times. When we add NAM, this number becomes 4673, an increase of 8.8 percent -- more than double the increase associated with WNT.
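The comparison of relative increases can be reproduced directly. This is a minimal sketch using the counts stated in the reply (13 NAM cites on World News Tonight against a base of 318, and congressional conservative cites rising from 4294 to 4673); the helper name percent_increase is hypothetical, and the sketch covers only this back-of-the-envelope step, not the ADA score calculation itself.

```python
def percent_increase(before: int, added: int) -> float:
    """Relative growth, in percent, when `added` cites join a base of `before`."""
    return 100.0 * added / before

# Counts taken from the reply.
wnt_before, wnt_nam = 318, 13                   # World News Tonight
congress_before, congress_after = 4294, 4673    # Congress
congress_nam = congress_after - congress_before

wnt_growth = percent_increase(wnt_before, wnt_nam)
congress_growth = percent_increase(congress_before, congress_nam)

print(f"World News Tonight: +{wnt_growth:.1f}%")
print(f"Congress:           +{congress_growth:.1f}%")
print(f"ratio:              {congress_growth / wnt_growth:.2f}")
```

Because the method scores an outlet relative to Congress, the quantity that matters is which of the two growth rates is larger, not whether the outlet's own conservative count rises.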

As a consequence, if we add NAM to our list of think tanks, this causes World News Tonight to appear more liberal, not more conservative. Specifically, when we recalculate its ADA score, it increases by 1.16 points. We did the same calculation with all the other media outlets in our sample except the Drudge Report (it is impossible to do the calculation for it because we do not have an archive of its old reports). These outlets are: (i) CBS Evening News, (ii) Fox News Special Report, (iii) L.A. Times, (iv) NBC Nightly News, (v) New York Times, (vi) USA Today. Their respective ADA scores increased by the following when we add NAM: 0.45, 2.32, 2.04, 0.42, –1.52, and 0.85. The New York Times’ score decreased; hence the negative number.

Nunberg reports data about CNN’s cites of the NAM. In the version of our paper that Nunberg criticizes, we do not examine any show on CNN. However, in a presentation that Groseclose made at the Stanford Workshop on Media and Economic Performance in Spring 2004, he presented results from CNN’s Newsnight with Aaron Brown. For the period that we examine, 11/9/01 to 2/5/04, Newsnight never cited NAM. Consequently, if we include NAM among our set of think tanks, then this would cause Newsnight’s ADA score to increase (i.e., to become more liberal). Specifically, it increases by 2.39 points.

(Here are some more details of our calculations. Nunberg reports that NAM received 617 mentions in Congress during the period we consider. In contrast, we found only 541 mentions. We use the latter number. [The anomaly could be explained by the possibility that Nunberg included the 108th Congress in his calculations; our study did not. Regardless, if one uses Nunberg’s number, this works even more in favor of the point we are making.] Next, we read the first 20 cases that Thomas, the official congressional web site, reports of NAM mentions in the 107th Congress. Six of these would not be counted in our sample as bona fide cites. For instance, one mention is a case where Rep. Thomas Sawyer lauds one of his constituents, who has just retired from the Goodyear Tire and Rubber company. Sawyer notes that his constituent was a member of the National Association of Manufacturers’ Communication Council. Since this is not a case of a member of the NAM being cited as a policy expert, we do not include it in our sample. Three other cases were similar, and in two cases the legislator criticized the group. Thus, an estimate of the total number of citations that our method would count is 379 [= 541 x 14 / 20]. Of the sample of 14 cites that we read and did not exclude, 11 were made by Republicans and 3 by Democrats. For the media mentions, we excluded all editorials, letters to the editor, and cases where Lexis-Nexis lists the same mention twice. Of the remaining mentions, we excluded six cases where NAM was not cited as a policy expert. Two of these were with Nightly News. One mentioned a lawsuit that NAM had filed but did not quote any member of the group. Another mentioned that a member of NAM would appear on a future NBC show. The four other cases occurred with Special Report. E.g., in one NAM was mentioned because it recently placed tenth on Fortune Magazine’s most powerful lobbyists list. Again the story did not cite any member of NAM.)

As in the case of NAM, if we add the Conference of Catholic Bishops to the mix of think tanks, this causes most of the media outlets to appear more liberal, not less. The ADA scores of World News Tonight, Newsnight, and the above six media outlets increase by the following when we add CCB to the mix: 0.10, 0.34, -0.15, 0.21, -0.58, 0.23, -1.09, and 0.21. (The negative numbers indicate that the scores of Evening News, L.A. Times and New York Times would decrease.) If we include both the CCB and NAM, the average score of the eight media outlets increases (i.e., becomes more liberal) by 0.46 points.

(Here are some more details of our calculations. By our calculations CCB received 107 mentions by members of Congress. In contrast, Nunberg reports 130. Again, if one uses Nunberg’s number this works in the direction of making our point even stronger; so let us adopt 107 as the correct figure. We read all 57 of the mentions that occurred in the 106th and 107th Congress. We would include only 24 of these in our data set. That is, slightly more than half were not bona fide cases where a member of the group was being cited as a policy expert. Instead, most were cases like Rep. John LaFalce’s speech on May 22, 2002, when he eulogized Monsignor George Higgins. In the eulogy, LaFalce quoted kind words about Higgins from the president of the CCB. Of these 24 cites, 14 were by Republicans and 10 by Democrats. If CCB had been included in our list of think tanks, we estimate that this would add approximately 45 more congressional cites [= 107 x 24 / 57]. When CCB was mentioned in the media, it was usually in regard to the sexual-abuse scandal by priests. We would not count these as cites, since any quote by the CCB would be to defend their own organization, not a quote where it is treated as an outside expert on policy. To eliminate these cases we searched Lexis-Nexis using the search parameters, “Conference of Catholic Bishops” and not “sex” and not “abuse.” We read the resulting mentions to make sure our method would count them as bona fide cites. The resulting cites for World News Tonight, Newsnight, and the above mentioned media outlets were respectively 3, 0, 2, 1, 0, 7, 7, and 18.)
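Both parenthetical notes estimate bona fide congressional cites the same way: scale the total number of mentions by the share of a hand-read sample that qualified. A minimal sketch of that extrapolation follows; the function name is hypothetical, and rounding to the nearest whole cite is an assumption, since the reply only says "approximately".

```python
def estimated_bona_fide_cites(total_mentions: int, sample_kept: int, sample_read: int) -> int:
    """Scale total mentions by the fraction of the hand-read sample that qualified."""
    return round(total_mentions * sample_kept / sample_read)

# NAM: 541 mentions; 14 of the first 20 read were bona fide cites.
nam_cites = estimated_bona_fide_cites(541, 14, 20)

# CCB: 107 mentions; 24 of the 57 read were bona fide cites.
ccb_cites = estimated_bona_fide_cites(107, 24, 57)

print(nam_cites, ccb_cites)
```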

4c) Nunberg also criticizes our list of think tanks for not being the most prominent possible set and for not being a “genuinely balanced” set of think tanks. However, there is no a priori reason why either criticism would bias our results. Further, Nunberg does not give one.

First, let us address the charge about not selecting the most prominent set of think tanks. Nunberg writes “Start with the list of groups from which G & M drew their initial sample. They describe this simply as a list of ‘the most prominent think tanks,’ …” Then he explains why our set is not the most prominent possible set—that is, there are groups not on our list that are more prominent than some of those on our list. Nunberg concludes this point by stating “On the grounds of sample choice alone, in short, the Groseclose and Milyo study would be disqualified as serious research on ‘the most prominent think tanks.’”

Nunberg implies that we call our list “the 200 most prominent think tanks,” as if there were a way to rank the prominence of all think tanks, and we selected the top 200 from the list. However, we do not claim that. Here’s what we actually write: “The web site, www.wheretodoresearch.com lists 200 of the most prominent think tanks in the U.S.” The key word in the sentence is “of”. That is, we are only claiming, e.g., that of the possibly several hundred think tanks that one can call prominent, our list contains 200 of them. Nunberg is deceptive when he claims that we describe the list as “the most prominent think tanks.”

More important, for our study to give an unbiased estimate of the slant of media outlets, it does not matter whether we have selected the 200 most prominent think tanks. All we need is that the set is chosen exogenously (again, that’s why we let someone else choose our list).

For the same reason if one is running, say, a univariate regression, it does not matter if the researcher’s independent variable never takes the value that occurs most frequently in the population. For instance, suppose the independent variable is height of male subjects and the dependent variable is the subjects’ arm length. Since heights follow a uni-modal distribution, the most prominent values of the independent variables are the ones associated with moderate heights. Suppose the researcher chose a wide mix of short, medium, and tall subjects, but failed to include any subject whose height is 5’10’’, the most common height among American males. No serious statistician would claim that this causes a bias. Similarly, no statistics or econometrics textbook claims that the set of independent variables must have a certain distribution if an estimator is to be unbiased. For the same reason if we omit a few (or many) of the most prominent think tanks from our sample, this will not bias our results.

Relatedly, Nunberg criticizes Saraf for choosing a “jumble” of groups. If by “jumble” Nunberg means “random,” for the purposes of our study, that is a compliment to the set, not a criticism. As we mentioned, what’s most important is that the set be chosen exogenously. As one learns in the most elementary econometrics classes, “random” is a sufficient (but not necessary) condition for “exogenous.” To see this, again, consider the height-arm length example. If a researcher chose his subjects randomly as opposed to those with the most frequently-observed (“prominent”) heights, then this would not affect his findings about the relationship between height and arm length. That is, he or she will find that arm length is approximately half the subject’s height, and this estimate, “half,” would be the same regardless of which of these two samples he or she chooses.

Nunberg notes that Saraf is “a free-lance researcher with a masters degree in history who lists among his achievements that he was named Man of the Year by the Cheshire (Connecticut) Republican Town Committee.” We’re not sure of Nunberg’s purpose in this description, but we suspect it was to criticize the credentials of Saraf. If so, this is a little vicious. But it matters not a whit to our results. That is, suppose Saraf has even lower research credentials. Suppose even that he’s only a trained monkey who picked the set randomly. That does not cause a bias to our results (nor does Nunberg even attempt to explain why it could cause a bias to our results). In fact, if Saraf’s research credentials are indeed low, one could even argue that is even more reason to believe that the set is formed randomly (thus exogenously), and hence, it’s even better for our method.

Another point that Nunberg raises is that many of our groups are not pure think tanks. E.g. some, such as NAACP, the NRA, and the ACLU, are more appropriately described as activist groups. We are guilty of calling all of them “think tanks.” We do this only because it is unwieldy to call them throughout the paper, e.g., “think tanks, activist groups, and other policy groups.” But more important, there’s no a priori reason to exclude groups that are not pure think tanks. Likewise, there’s no a priori reason to exclude pure think tanks and to use only activist groups. For our method, the key is to include groups that are cited both by the media and members of Congress. In fact, just imagine the criticism to which we would expose ourselves if we had used only one type of group. Someone such as Nunberg could say “It is ‘puzzling’ why Groseclose and Milyo included only pure think tanks in their list. This alone would disqualify the study as serious research.” Or, alternatively, if we had done the opposite, such a critic could say “It is ‘puzzling’ why Groseclose and Milyo included only activist groups in their list. … ”

A separate issue is whether the list of think tanks is ideologically balanced. Nunberg is not clear in which direction he thinks Saraf’s set is ideologically imbalanced. We think, if anything, Saraf’s set is slightly skewed toward containing more conservative groups—e.g. it contains none of the “Nader” groups such as Public Citizen, Center for Auto Safety, and Center for Science in the Public Interest. And Nunberg notes that Saraf was awarded Man of the Year by a Republican group. (We do not know why Nunberg mentioned this. It is possible that it was only to denigrate Saraf’s credentials and not to suggest that the list is skewed in the conservative direction.) On the other hand, Nunberg writes “by excluding conservative groups that are frequently mentioned in the media, the study appears to exaggerate the media’s liberal tilt.”

But even if our sample of think tanks is skewed left or right, this will not bias our results. To see this, consider the above regression where the researcher includes twice as many tall subjects as short subjects. As we explained, this will not affect the expected relationship between height and arm length—that is, the estimated parameter associated with the independent variable. That is, it will not cause a bias to the estimates.
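The height and arm-length argument can be illustrated with a small simulation. Everything below is an assumption made for illustration: the true slope of 0.5 follows the reply's "approximately half", while the height distribution (mean 70 inches, standard deviation 3), the noise level, the sample size, and the function names are invented. It fits an ordinary least-squares slope on the full sample, on a sample with no subject near 5'10", and on a sample holding roughly twice as many tall subjects as short ones.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x for a univariate regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def slope_of(sample):
    """Fit the slope on a list of (height, arm_length) pairs."""
    heights, arms = zip(*sample)
    return ols_slope(heights, arms)

random.seed(0)
TRUE_SLOPE = 0.5  # assumed: arm length is about half of height

# Simulated population: heights in inches, arm length = slope * height + noise.
population = []
for _ in range(20000):
    height = random.gauss(70.0, 3.0)
    arm = TRUE_SLOPE * height + random.gauss(0.0, 1.0)
    population.append((height, arm))

# Sample 1: everyone.
full = slope_of(population)

# Sample 2: drop every subject within half an inch of 70 inches (5'10").
no_modal = slope_of([(h, a) for h, a in population if abs(h - 70.0) > 0.5])

# Sample 3: keep all tall subjects but only every second short one.
skewed = slope_of([(h, a) for i, (h, a) in enumerate(population)
                   if h >= 70.0 or i % 2 == 0])

print(f"full: {full:.3f}  no 5'10\": {no_modal:.3f}  skewed: {skewed:.3f}")
```

In each case subjects are selected on height alone, the independent variable, which is the situation the paragraph describes; the sketch says nothing about samples selected in other ways.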

5) Nunberg writes “Then, too, Groseclose and Milyo’s survey of the citations of groups in the Congressional Record shows some results that would most kindly be described as puzzling.” He focuses especially on the results we report for two groups, the ACLU and the Alexis de Tocqueville Institution. Nunberg is dishonest in his presentation of our ACLU results. In his presentation of results surrounding the Alexis de Tocqueville Institution he reveals, once again, that he did not read our paper very well: that organization ranks highly based on the criterion of sentences cited, not total cites (but Nunberg misses this point). Also, with each group, Nunberg makes a suggestion that, if we were to follow it, would make the media outlets in our sample appear more liberal, not more conservative.

Consider the ACLU results. Nunberg writes:

“At another point G & M explain that they disregarded the ACLU in their final analysis because it turned up with an excessively conservative score, owing to Republicans who cited it for its opposition to McCain-Feingold.”

Here’s what we actually wrote:

“The primary reason the ACLU appears so conservative is that it opposed the McCain-Feingold Campaign Finance bill. Consequently, conservatives tended to cite this fact often. Indeed, slightly more than half of the ACLU sentences cited in Congress were due to one person, Mitch McConnell (R.-Kt.), who strongly opposed the McCain-Feingold bill. If we omit ACLU citations that are due to McConnell, then the average score, weighted by sentences, increases to 70.12. Because of this anomaly, in the Appendix we report the results when we repeat all of our analyses but omit the ACLU data. This causes the average score of the media outlets to become approximately one ?? point more liberal.”

At this point, we ask you, the reader, to re-read these two passages. With many of Nunberg’s criticisms, he is simply sloppy or careless, or simply misunderstands some technical details of our method. With this point he is dishonest.

Despite what he writes, our final analysis included the ACLU data. In fact, it turns out that the only analysis that we report in the paper contained the ACLU data. Our passage notes that we did the analysis both ways: with and without the ACLU data. The results with the ACLU data are reported in the main text, and the results without the ACLU data are reported in the Appendix. However, we have not yet written the Appendix (and of course the web site to which Nunberg links to our paper lists no Appendix). Thus, the only results we report in the paper are the ones that do not disregard the ACLU data. The paper is still a rough draft, polished enough to present at academic seminars (that is where the paper is listed—on the web page for a Yale seminar series, where Groseclose presented the paper). Yet it is clearly not in its final form. Indeed, throughout the paper we have written “xx” where we intend to fill in details, and in fact the above passage regarding our results when the ACLU is omitted lists “??” in the sentence. We have done some preliminary analysis that suggests that ADA scores of media outlets will increase by about one point when we omit the ACLU data.

Remember that an increase in an ADA score means the outlet becomes more liberal. Nunberg writes that our final analysis disregarded the ACLU data, and he implies that we should have done the opposite. Of course, if we follow this suggestion (which, it turns out, we did) this makes the media appear more conservative, not more liberal, than if we had disregarded the ACLU data.

Related, Nunberg’s next two sentences after the above sentence are, “Other researchers might wonder whether there might be similar anomalies in the results obtained for other groups, and might even suspect that this result cast some doubt on their overall method. G & M seem untroubled by that possibility.”

How ominous. We are “untroubled by that possibility.” It turns out that out of 200 think tanks in our sample, there seem to be only two anomalous rankings. First is the Rand Corporation, which our method places to the left of center. We have mentioned this finding to four scholars at Rand. None were surprised, and each agreed that the result is due to the fact that most of the conservative scholars at Rand focus primarily on military research, and these studies tend not to be cited very frequently by the media and members of Congress. Part of the reason is that these studies are often classified. The other anomaly was the ACLU. Our method ranked it (just barely) among the most conservative half of the think tanks. As we mention in the paper, the reason is due to one person, Senator Mitch McConnell. After the ACLU announced that it opposed McCain-Feingold, McConnell seemed to mention this at every opportunity he had. In fact, he alone accounted for half of the total congressional citations to the ACLU. No other think tank had such an odd distribution of citations.

In closing, we have devoted considerable time and effort to responding to Nunberg’s irresponsible charges. We do not intend to repeat this exercise for every bit of malicious gossip posted by someone on one of these “blogs.” By exposing Nunberg’s errors and deceptions we hope to encourage other scholar/bloggers to behave in a more professional manner.

August 2nd, 2004

Tim Groseclose
Jeff Milyo

Posted by Mark Liberman at 05:07 PM

August 01, 2004

Verbatim fragments online

Editor Erin McKean and the other folks at Verbatim Magazine are working towards having "the complete run of VERBATIM back issues available online, and searchable, too". Meanwhile, some issues from 1998 to 2001 are up, in various forms. On a quick browse, I enjoyed Michael Adams' two-part series from summer and autumn 1999 on "Slayer Slang", which became the title of his 2003 OUP glossary.

Posted by Mark Liberman at 10:37 PM

Not on planet earth

Ian Urbina has a feature in the 7/31 NYT on New York City trash collector slang. Despite quotes from notables such as Grant Barrett, there's not a great deal of lexicographical juice in the article, once you get past disco rice (maggots) and urban whitefish (used condoms). The high point for me was the progression from nimby ("not in my backyard") to banana ("build absolutely nothing anywhere near anyone") to nope ("not on planet earth").

However, I'm not sure whether the second and third terms are more than in-jokes in the "City Council's sanitation and solid waste committee", whose general counsel is quoted as the source.

[Update 8/2/2004: Barry Popik emails to point out that Word Spy has NOPE with a citation from 1990, entered in Sept. 2002, and BANANA with a citation from 1991, entered in Feb. 1999. Word Spy also has another NIMBY-extension, NOTE ("not over there either"), with a citation from 1994, entered Sept. 2002.

So clearly these other jokey terms for resistance to development have been around for a while, and have nothing special to do with the NY Sanitation department -- though unlike NIMBY, I don't think they've spread very widely in the population at large.]

[Update #2: Linda Seebach emailed to observe that

Our real-estate columnist wrote about "banana" in 1995 (the only hit on "nimby and banana" but banana by itself would give a lot of false hits)

]

[Update #3: Jesse Sheidlower emailed to observe that the OED has citation slips for both BANANA and NOPE from "the earlier 1990s or earlier", as well as a variety of variants on the whitefish="used condom", including Coney Island whitefish 'condom washed up on the beach'. ]

Posted by Mark Liberman at 08:30 PM

all things phonology

As you can probably tell from my many posts to Language Log since I started a little more than a month ago, I'm pretty taken with the whole blogging-about-linguistics thing -- so much so that I decided to start my own blog, specifically geared toward phonology.

I call it phonoloblog. Within the mere week or so since I started slowly spreading the word, a few interesting contributions have already been posted (from phonologists other than me). My first post is a welcome message defining the rules of the blog. Please feel free to visit.

[ Comments? ]

Posted by Eric Bakovic at 06:22 PM

A long sort of grasping grope

I got curious about graunch, because I saw it used in a comment on a commentary on one of our posts. I don't think I've ever come across this word before, which is not surprising since all its forms (graunch, graunches, graunched, graunching) total 1573 whG, or about 367 whG/bp, or roughly one form of graunch in every 3 billion words of web-accessible text. Also, it seems to be a British-commonwealth thing, with associations with New Zealand, the R.A.F., and South Africa.

Graunch seems to have started as an onomatopoeic word for the sound made by a sort of medium-length grinding contact, along with some associative blending of words like grope, ground, grind, grate, crunch, crash, crush, punch, launch, raunch, etc. People make this kind of word up all the time, and the use that started me off (wait, I'll get to it!) may have been this kind of nonce formation. Or maybe not.

The OED has graunch as a dial. and N.Z. verb, glossed

intrans. To make a crunching or grinding sound; trans., to cause to make such a sound; hence, to damage (a mechanism of some kind). Also vbl. n.; graunchy a., difficult, testing.

with citations from 1881:

1881 A. B. EVANS Leics. Words 163 Graunch, var. of ‘crunch’ and ‘scrunch’, to crush or grind with a noise; crash. ‘I'm sure it freezes, for I heard the ice graunching under the wheels of the carriage.’
1954 Dominion (Wellington, N.Z.) 1 July, As far as I know ‘to graunch’ means to damage an engine, instrument, machine, etc., by using wrong tools and/or repair methods, and a ‘graunch-artist’ is ‘a person who does that’. Ibid. 9 July, The first time I heard the word [graunch] was some time in '39 or '40, and it was used by an English airman. To the best of my knowledge it originated in the R.A.F. and was pronounced ‘garraunch’ to reproduce the sound of a plane as it crashed and slid... Later it became a one-syllable word and refers to any metal torn or damaged by force.
1957 Evening Post (Wellington) 17 Apr. 8 (headline) Graunch! -- Bang goes more than the door.
1959 D. BEATY Cone of Silence xxviii. 288 ‘Have you tried this new take-off technique?’ ‘Yes sir..we're in for a hell of a graunching.’
1964 Observer (Colour Suppl.) 11 Oct. 42/1 Many people ‘graunch’ their gears.
1965 N.Z. Listener 27 Aug. 9/1 John Pascoe himself knew that editing an encyclopaedia would be a lengthy project. ‘When I started I knew it was a graunchy job. But then I've always liked long graunchy jobs. It's tied in with my youthful experiences of long distance running... I rather like long, slow patient plodding.’
1968 Dominion Sunday Times (Wellington) 10 Apr. 2/3 They said they could hear the ship ‘graunching’ on the rock.

UrbanDictionary gives three (user-supplied) definitions, all of a verbal form: "To spoil an object by carelessness or ignorance or both"; "to damage something by using excessive force, often with the use of an incongruos [sic] implement"; "To make to fit by the use of excessive force."

This page (among others) cites a South African slang usage:

Graunch: Make out - "during the film, my boyfriend and I graunched in the back row" - during the movie we french kissed, rubbed, etc....

This meaning apparently comes from the kind of forceful grinding contact imagined to result in a graunching sound. It's possible, of course, that the sexual sense preceded the mechanical one -- there are lots of explicitly sexual metaphors in mechanics' jargon, both formal and informal.

Extending the word in another direction, this page cites an application of the "fit by extended forceful wiggling" meaning to software rather than hardware:

In the technical lingo, connecting programs in this way is often called systems integration. But Brian Randell, a computer scientist at the University of Newcastle upon Tyne, suggests that "there is a better word than integration, from old R.A.F. slang: namely, 'to graunch,' which means 'to make to fit by the use of excessive force.' "

But these lexical citations are all verbs. I found a couple of bits of apparently nominal lexicography for graunch: this glossary says that "Graunch is computer slang for a big mistake", and The Dictionary of Antarctic Slang says that Graunch is "The result of hamfistedness".

On the web, nominal uses of graunch forms seem to outnumber verbal ones. Checking 30 of Google's 120 instances of "graunches" and 30 of the 654 instances of "graunch" yielded about 60% noun uses (and about the same proportion for each form separately as well).

Some of the nouns describe sounds:

(link) Check for rumble from the main bearing - spin the platter without the drive belt in place, getting your ear close to the deck to listen for any graunches.
(link) I found that the Cedar de-noising system couldn't cope with some of the worst graunches so I redrew some of the waveforms in ProTools and sat back astonished.
(link) Playing both electric and acoustic guitars (not in the same song of course!), her extraordinary voice (off to the rough edge of deep with some incredible, soul-tearing graunches and heartmelting vibrato a little bit like Macy Gray's) imprinted the rapt audience with some equally extraordinary songs...

while others describe mechanical events associated with such sounds:

(link) Little 'graunches' (such as landing a bit too heavily after a jump) cause a hairline crack to creep along the tubular steel framework of the car. Mega graunches cause 'impact craters' (little holes).
(link) Gear changing takes practice and despite a lot of "graunches and grinds" till we got the hang of shifting, recent inspection of the transmission shows all teeth to be present and correct.
(link) Particular problems of Tauri in general include warped front brake rotors, steering graunches, splotchy paint, water leaks around the trunk, and rust.

still others describe the results of such events:

(link) The big clues that a Range Rover has been exposed to more than its fair share of off-road work include scuffs, scraps and graunches in the floorpan and chassis rails.
(link) 9:54pm, nah, we're ready to roll, the graunch on the lower front is where we collapsed the ramp at the other end of the bridge.

There are some "integration" or force-fitting nouns:

(link) I am curious to see if the haskell integration is a good fit or a graunch (square peg into a round hole).

and some "long tough grind" ones:

(link) John had suggested that the team would take about 3 hours to get to the dig. I was thus mentally prepared for a long slow graunch down Southern Stream Passage.
(link) The long trek to Pen-y-ghent (2,273ft) was a graunch for some and the general feeling was that if the Dales National Park could put it in a wheelbarrow and place it nearer to the other two, they would be performing a great service.

as well as some quite unexpected ones:

(link) During the intervening years I had on-going mild to moderate symptoms - periods of diarrhea, abdominal pain in varying degrees - manageable, and I continued to put them down to stress because they had found nothing wrong in 1992. I just thought of them as "my graunches".
(link) Composite of Primitive Graunch, Ovis aries vignei; introgression since 1970 from Merino Precoce (France), Suffolk (United Kingdom), German Merino (Germany)

The verbal uses are generally similar, spanning a range from noises

(link) Slowly the drive head graunches and whirrs its way across the disk marking and getting data off the bad sectors.
(link) The rubber graunches and squeaks as it slithers along the course, hitting corners and bouncing to the next.
(link) When stopping the brakes may graunch, squeak and in some cases whistle.
(link) Yes, bumps will cause some violent tugging at the wheel, and yes, it graunches horribly while reversing at slow speed, but the upside is a whole new chapter written into the laws of physics.

to events, noisy or otherwise

(link) Dust and dirt get in and the mast graunches against the housing, wearing it out.
(link) Try not to show irritation if your new driver stalls the car, or graunches the gears.
(link) Even long time Alfa owners sometimes graunch a second gear shift.
(link) But just when you think they've lost their way and must be storyboarding The Black Cauldron II right now, the ever-reliable Disney Machine graunches into gear.

and the (negative) results of such events:

(link) I soon discovered that one has to have the rear doors on the semitrad cockpit fixed open or latched shut, otherwise you graunch your knuckles on them as the tiller only just clears them.
(link) The mail software didn't request transparent mode, and thus couldn't be used to graunch someone's terminal.
(link) You get what you pay for in Sockets, a bad 12 point can graunch a bolt head faster than you can say "i graunched all the bolt heads!!"
(link) I graunch my R knee getting out of the car every now and then and not using it is the only quick healing you can do.

This page has an extended discussion of what seems to be an example of the S.A. sexual slang, though it has something to do with the (British) Dr. Who show, and takes place (or at least is posted) in New Zealand:

Katy: Yes, and Nick always got first prize as being the warmest, cuddliest person but you have got to be careful because he graunches.

Nick: Graunches?

Katy: You remember that?

Nick: I remember the word graunch very well.

Katy: It was scripted. We got a script and it said, well it said one thing I can't say here which was very funny, but you're all much too young. It was very funny but I'm not going to do that - that was with the Doctor! But it also said: "Brig enters and graunched Jo."

Nick: Which story was that?

Katy: Don't ask me, I can't remember. I remember the graunch, though.

Nick: Yes, I remember, but why would I graunch you?

Katy: Thanks a lot!

Nick: Not why would I, Nick, graunch Katy, but why would the Brig graunch Miss Grant?

Katy: Well, I think he must have quite liked Miss Grant.

Nick: Yes, graunch, yes, I remember that word. It's kind of close to a grope.

Katy: It's the nearest the Brig ever got!

Nick: I think it's an extension of a grope.

Katy: Yes, it's a long sort of grasping grope. (She is given a cup of Coke) Oh, how sweet!

OK, now we get to the example that I started with. It's a comment by "artela" (who is from Wales) to a post on a LiveJournal site belonging to "murkee" (perhaps Car Talk's pollster Paul?):

Amusing, yes... but the grammar really graunches!

This is all in reference to Eric Bakovic's recent (Language Log) discussion of Jon Stewart's remark on Larry King Live:

Well, just purely for the knowledge of geography. It's just fascinating to learn about these countries. ... I didn't know Kabul was the capital of Afghanistan until we started bombing it. ... If we would haven't gone to war there, I certainly wouldn't have known that.

Murkee apparently didn't realize that Stewart was being ironic, and so he is shocked that Eric would carry on about the grammar rather than being outraged that someone would propose going to war in order to learn geography. Artela's rejoinder, in context, seems to mean that "the grammar really grates", in the sense of setting her teeth on edge.

I haven't seen this use of graunch elsewhere, but it's reasonable enough to take the idiom "X grates (on me)" meaning "X annoys me", and substitute a more expressive term for "grate". When I first read the comment, graunch struck me as a substitute for the verb of another idiom class, e.g. "X bites" -- with graunch maybe a blend of gnaw, growl, crunch and raunch, or just an imitative word for a large growling bite. But I now think that's less likely than the grate-substitution story.

I guess what strikes me about this whole business is how such a rare word manages to maintain a coherent pattern of use among English speakers across the world (if we ignore the sheep :-)). There are all sorts of extensions -- noise/event/result/intent, mechanical contact vs. sexual contact -- but extensions like that are part of the history of every word.

And by the way, when I search for "graunch", Helpful Google asks me:

Did you mean to search for: france

Posted by Mark Liberman at 02:36 PM

Joe versus the semi-conversation

Neal Whitman emailed with a memory evoked by my post about the woman who asked her cell phone over and over about the weather. This reminded him of the Tom Hanks cult film Joe Versus the Volcano.

When we first see Tom Hanks's workplace, his boss is on the phone, and during the whole scene, he's in the background saying various combinations of:

   I'm not arguing that with you!

   I know he can GET the job. But can he DO the job?

   [one more that I've forgotten, I think]

I think we even see him in a later scene, and he's still conducting the same conversation.

I've never seen the movie, but I filled in the details from the script available here. Tom Hanks works in the International Advertising Department at American Panascope Corporation. His boss is Mr. Waturi. The opening scene shows Joe arriving for work:

 
     Behind Dede, at a bigger
	  desk, is MR. WATURI.  He's leaning back in an executive
	  chair, talking on the phone. [...]

						  WATURI
				Yeah, Harry, but can he do the
				job?  I know he can get the
				job, but can he do the job?
				I'm not arguing that with you.
				I'm not arguing that with you.
				I'm not arguing that with you

	  Mr. Waturi waves absently at Joe and goes on talking into
	  the phone.

						  WATURI
				Who told you that?  No.  I
				told you that.  Me.  What?
				Maybe. Maybe.  Maybe.

	  Joe hangs up his coat on the coat rack and goes to the
	  coffee set-up at the rear of the office.

Depressed and hypochondriacal, Joe goes to the doctor and learns that he has a fatal "brain cloud" (don't ask). When he returns to work, Waturi is on the phone again:

						  WATURI
					  (on phone)
				No.  No.  You were wrong.  He
				was wrong.  Who said that?  I
				didn't say that.  If I had
				said that, I would've been
				wrong.  I would've been wrong,
				Harry, isn't that right?

Since Waturi spent some time bullying Joe in the first scene, this can't actually be the same conversation, at least if the film follows the cited script. However, I don't think it's an accident that Neal seized on Waturi's semi-conversations as one of the characteristic torments of the modern damned. The script specifies lots of other unpleasant details at American Panascope -- bad weather, brain-damaged workflow, fluorescent lights, a co-worker who "[eats] pink Hostess snowballs ... in a slow, dismal way, as if they were giant sleeping pills." But it seems to be Waturi on the phone that made the biggest impression, and not only on Neal:

(link) This film has become a Fontaine Family Favorite and we’ve seen it at least ten times. ... Babs and the Babettes still use lines from the movie such as: "It's always something with you, isn't it, Joe?" and "I didn't say that. If I said that, I was wrong." and “You can do the job but can you get the job?”

(link) Joe vs. the Volcano is not completely devoid of charms. ... The movie's best bit ... is at the beginning with Dan Hedaya. Hedaya is a pro who can steal any scene, even when it's nailed down in a good movie. His interaction with the phone, and to a lesser degree with Hanks, is excellent, recalling his great work in Blood Simple.

Given open-plan offices and cell phone plans with unlimited minutes, semi-conversations are an increasingly inescapable social toxin. You can console yourself by thinking of them as a historical step up from chamber pots emptied out windows, body odor and the breath of crowds for whom raw onions were a staple snack. You can make the overheard conversations into poems, or maybe fragments of movie scripts. You can meditate on the associated psycholinguistics and social psychology. Somewhere, someone has probably revalued semi-conversations as a sexual fetish -- no doubt there's already a network of web sites devoted to this specific type of masochism. But the most practical option seems to be to screw your earbuds in tighter and turn up the volume on your personal sound track. When you're not talking on your own cell, that is.

Posted by Mark Liberman at 09:30 AM

What book did John buy and read the magazine?

I've been unearthing coordinations that I find grammatical, even though they violate some hypothesis about coordination, as in my "Do I misspeak?" posting. The latest find -- (1) "Hyatt Rickeys, which will be demolished and the property turned into a residential development" (which I reported on on ADS-L, 7/28/04) -- is an apparent violation of the Coordinate Structure Constraint, with an extraction only from the first conjunct; (2) "What book did John buy and read the magazine?" is a classic example of this type, and pretty much everyone (me included) judges it to be ungrammatical. But (1) seems fine to me, so it looks like another in a list of types of exceptions to the CSC, for instance things like (3) "What did Harry go to the store and buy?" and (4) "How much can you drink and still stay sober?", which are widely judged to be grammatical (and I agree).

Now, at Neal Whitman's suggestion, I've looked at the "coherence and extraction" section (chapter 6) of Andy Kehler's Coherence, Reference, and the Theory of Grammar (2002), where exceptions to the CSC (also the literature on them) are thoroughly surveyed, then reanalyzed in discourse-structural, rather than purely syntactic, terms -- and there I find Kehler asking us to reassess even the asterisk on examples like (2).

As Kehler puts it (p. 133):

It may... be possible to retain a suitably articulated hypothesis that syntax is autonomous and still account for [data like (2)-(4)], as long as the CSC is neither included within nor is a by-product of the system of grammar rules. Because this view forces us to the conclusion that sentences like [(2)] are perfectly grammatical, however, it brings to light a potentially worrisome situation regarding the manner in which theorists rely on their judgments. Previous researchers have certainly considered sentences like [(2)] to be ungrammatical, and it is at least questionable whether this judgment differs qualitatively from other ungrammaticality judgments upon which researchers commonly construct their syntactic theories. This data should force us to reassess whether the intuitions we have about ungrammaticality really represent syntactic wellformedness, and if they do not, what one might use as a basis for determining what sentences are unacceptable for purely syntactic reasons.

Definitely food for thought, I say.

Some further remarks on (1)... Though this particular sort of example hasn't, so far as I know, been discussed in the literature (even by Kehler), it strikes me as falling pretty easily into Kehler's general "coherence" approach, though the details need to be worked out. The idea would be that there's a relationship of association-in-context between the hotel (Hyatt Rickeys) and the property (the property on which the hotel sits, in fact) -- a relationship much like that between the referent of the dependent noun of an N-N compound and the referent of its head noun (cf. "pumpkin bus", with its wide range of interpretations, or for that matter "hotel property") -- as well as a relationship between the demolishing of the hotel and the transformation of the property into housing, namely that the two are presented as parts of a single unfolding event.

Things are not quite so simple, though. Gapping is crucial to the acceptability of (1). Without Gapping in the second conjunct, the coordination is much less acceptable: (1') "Hyatt Rickeys, which will be demolished and the property will be turned into a residential development". I wouldn't go so far as to label (1') as ungrammatical, especially in view of Kehler's cautions above, but it certainly is clunky. In contrast to (1), where Gapping helps things along, compare cases where Gapping helps not at all: (5) "Kim, who ate sushi and Sandy sashimi" is as bad as (5') "Kim, who ate sushi and Sandy ate sashimi". It seems that the special relationship between the referents of the two subject NPs (Hyatt Rickeys and the property, vs. Kim and Sandy) plays an important role here, and in other examples constructed by Clai Rice in a 7/30/04 posting to ADS-L: (6) "the junk cars, which will be destroyed and the tires recycled" (the tires could be the ones belonging to the cars, ones on which they sit in the junkyard, ones that are stored inside the cars, or many other things -- roughly the range of interpretation for "car tires"); (7) "the junk cars, which will be crushed and the birds transferred to the sanctuary" (cf. "car birds"); and (8) "the hotel, which will be demolished and the lake filled in" (cf. "hotel lake").

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 03:08 AM