Language Log: July 2007 Archives

July 31, 2007

well worth waiting for

As promised, the answer to "Karttunen's Konundrum," the puzzle Lauri Karttunen raised about the entailments of negated wait that I posted about last week. The point, if you'll cast your mind back, was that a sentence of the form "X didn't wait to do Y" (e.g. "She didn't wait to open the box") can mean either that X did Y right away or that X didn't do Y at all. As Lauri explains:

Question 1: How does the ambiguity of "not wait to Z" come about?

On the web, we find examples such as:

"My biggest regret is that I didn't wait to get married to have kids," says Gerald, a father of three. "If I had it to do over again, I'd wait until I was married to become a father."

In this case we get two entailments:

=> I didn't get married.
=> I had kids right away.

Similarly, we find:

Trudeau did not wait for Nixon before recognizing China, and Mulroney did not wait to take his cue from Reagan to boycott South Africa.

=> Mulroney did not take his cue from Reagan.
=> Mulroney boycotted South Africa right away.

To put it more generally, the verb wait can take two infinitival complements. If both complements are present we get two entailments of opposite polarity:

not wait (to X) (to Y)

=> not X
=> Y right away

There is an implicit temporal ordering. If X had happened, it would have been before Y. Both complements are optional. When only one complement is present, as in Neil didn't wait to take off his coat, we don't know which of the two it is without the help of the context.

We find the same ambiguity in embedded wait to contexts. For example, did not bother to wait to X entails "not wait to X" (bother to is a ++/-- implicative). In most of the examples of this type on the web, the intended reading of did not bother to wait to X seems to be "not X," as in:

Obviously, our leaders hold the Iraqi war machine in such utter contempt that we didn't bother to wait to use our huge advantage in night-fighting.

But we also find examples that go the other way:

"Pressing my lips hard against his I didn't bother to wait to slide my tongue into his warm inviting mouth."

This is clearly a case of "Y right away" (how come we all agree on that?).

Question 2: Why is it not possible to retain the ambiguity in translation?

Once you realize that the ambiguity is syntactic and depends on the available "subcat" frames for the verb wait, it is not at all surprising that the ambiguity is not preserved in translation. In many Western European languages such as Finnish and Dutch it is not even possible to use the counterpart of wait to express either one of the two meanings. In Finnish, "Neil did not wait to take off his coat" translates as either Neil jätti takkinsa päälle, "Neil left his coat on" or Neil riisui takkinsa tuota pikaa "Neil took off his coat right away." Mona Diab tells me that a translation into Egyptian Arabic (but not Standard Modern Arabic) could preserve the ambiguity. It is the only exception I have heard about so far.

Posted by Geoff Nunberg at 02:14 PM

At last...

The long-awaited killer app for speech synthesis:

Posted by Mark Liberman at 02:09 PM

Maybe not as sorry as all that

To Roger's enumeration of the features of conventionalized apologies we can add those clauses that serve to extenuate ill-considered statements, where the blame is laid on an unfortunate choice of words, often compounded by the uncooperativeness of one's hearers ("I'm sorry if my remarks left the impression that...."). Though it's worth bearing in mind that sometimes we have an interest in conveying that a deliberately chosen utterance was a mistake, if only to reassure our listeners of our benign intentions. All of which I raise entirely as a pretext for posting my favorite cartoon by George Price, the greatest draughtsman ever to have graced The New Yorker's pages. The caption, for the diopterically challenged: "It's a typographical error, but have no mercy on me."

Posted by Geoff Nunberg at 01:09 PM

Lips that touch lambchops, shall never touch mine

A recent New Zealand research report on attitudes about vegetarianism (Annie Potts and Mandala White, "Cruelty-Free Consumption in New Zealand: A National Report on the Perspectives and Experiences of Vegetarians and Other Ethical Consumers", New Zealand Centre for Human-Animal Studies, University of Canterbury) includes a short section on "cruelty-free sex" (p. 98), which the authors summarize this way:

Some women felt squeamish about the idea of having intimate physical contact with a person who ate meat; sexual intimacy with meat-eaters was also opposed on more ideological grounds, viewing their bodies as composed of/from dead animals

This immediately reminded me of an earlier melding of sex and the morality of consumption, Sam. Booth and Geo. T. Evans' 1874 song "The Lips that Touch Liquor, Shall Never Touch Mine":

This aspect of Dr. Potts's report is taken to the next logical step -- coining a word -- in an article by Rebecca Todd in today's Christchurch Press, "Carnivore sex off the menu":

A new phenomenon in New Zealand is taking the idea of you are what you eat to the extreme.

Vegansexuals are people who do not eat any meat or animal products, and who choose not to be sexually intimate with non-vegan partners whose bodies, they say, are made up of dead animals.

The co-director of the New Zealand Centre for Human and Animal Studies at Canterbury University, Annie Potts, said she coined the term after doing research on the lives of "cruelty-free consumers".

In the news story, the research report's mention of "some women [who] felt squeamish" (about indirect contact with eaten animals) has subtly morphed into "many female respondents" who appear to be fighting against the dark side of their sexuality:

Many female respondents described being attracted to people who ate meat, but said they did not want to have sex with meat-eaters because their bodies were made up of animal carcasses.

(Note to self: good example for Ling 001 lecture on de re vs. de dicto reference.)

The research report mentions one man with similar concerns -- he says that non-vegetarian partners would have to brush their teeth carefully. This is not mentioned in the news story, which seems to assume that men have no preferences in this respect, or that their preferences don't matter. In either case, there's a striking psychological echo of the gender choice in the old temperance song.

[Hat tip: Reinhold (Rey) Aman]

[Update -- a reader comments:

"Lips that touch liquor shall never touch mine."
I've always been tickled by this statement's ambiguity: "mine" could refer equally to my lips or my liquor. I try to live by the latter reading.

]

Posted by Mark Liberman at 08:06 AM

Cow dialects keep on keepin on

The ghost of PR gimmicks past?

[Hat tip: Robert Pérez]

Posted by Mark Liberman at 07:25 AM

A dipole charge embiggens the smallest D-brane

Anyhow that's what Riccardo Argurio, Matteo Bertolini, Sebastian Franco and Shamit Kachru think, according to their paper "Gauge/gravity duality and meta-stable dynamical supersymmetry breaking", arXiv:hep-th/0610212v2.

You can read more about it in an interview with Kachru on the Scientific American blog, posted yesterday by JR Minkel under the title "How a fake word from The Simpsons ended up in a perfectly cromulent string theory paper".

[Hat tip: Leslie Katz]

Posted by Mark Liberman at 07:24 AM

July 30, 2007

Spitzer limps through a public apology

In an op-ed piece in Sunday's New York Times we find still one more public apology, this one by New York's governor, Eliot Spitzer. Language Log has been alert to this speech act in its past posts on non-apologies here and here and here and here, as well as in Geoff Nunberg's assessment of what makes an apology work effectively. Now we hear from Spitzer about some recent events that happened in his office.

Below I paraphrase the points of Spitzer's apology in the order he made them, each followed by my own brief comment in parentheses:

1. Nothing illegal happened on my watch.

(Whew! What a relief.)

2. What my staff did was wrong.

(He wasn't directly responsible, although it happened on his watch.)

3. I've already apologized to the Senate majority leader.

(He apologized for what his staff did.)

4. What happened is not what we are about.

(This is not their/our typical behavior.)

5. I warned my staff to avoid this.

(See? It was their fault, not his.)

6. Some forgot my warning and created the appearance of wrong-doing.

(They didn't do this intentionally, though; they just had a memory lapse.)

7. We acted at once and got rid of some fine, distinguished people even though they didn't break the law.

(The royal "we" fired some really good people who didn't do anything illegal.)

8. What happened should not deter our good progress.

(Can't pass up the opportunity for a bit of p.r. here.)

9. Partisan politics should stay out of this.

(A warning in an apology?)

10. We will move forward anyway.

(Not exactly a vow that his staff won't repeat this in the future.)

You can reach your own conclusions about the effectiveness of this as a public apology but it's possible to note a few things that the governor might have done differently. First, nowhere are we told what the offense was (only that it was not illegal) and, despite the fact that he admits it happened on his watch, Spitzer tries to make it clear that he was not personally responsible for whatever this was, making "under my watch" a bit confusing. He says he warned his staff not to do whatever it was they did. They just "forgot" his warning (nothing intentional here) and gave the "appearance" (not specified how they managed to accomplish this) of doing something (still unspecified) that was actually wrong. He fired them for creating this "appearance" (maybe it's our fault for not recognizing this as only an "appearance") of doing something wrong (still unspecified) because "it" (creating such an appearance) is not what his office is all about. Nor will this unspecified, not-illegal event that led to the public appearance of something wrong stop Spitzer and his remaining staff members from continuing to do the long list of good things his office has been doing -- just in case we happened to forget these and need to be reminded. And finally, he gives a warning, which seems odd in an apology: his opponents better not try to convert this into political partisanship.

Applying Geoff's list of reported conditions that an utterance is said to satisfy if the utterance is to count as a true apology, we learn that the apologizer is supposed to regret the act, feel sorry about it, accept responsibility for it, and vow not to repeat it. It would have been helpful for Spitzer first to have mentioned the offensive act that he is now regretting. Maybe everyone in New York (and perhaps even beyond) already knows that his aides are accused of misusing the State Police to try to tarnish his political opponent, but an apology looks pretty lame when it fails to mention the thing for which the apology is being made. We can infer that Spitzer feels sorry about what happened, and he undoubtedly is. He apparently can't vow to not repeat this act because in doing so, he'd have to tell us what it was.

Maybe Spitzer's effort to apologize is better than Gonzales's "mistakes were made," but it falls short in many respects. I like what Geoff said near the end of his post:

In the contemporary theater of contrition, the point of ritualistic apologies isn't to demonstrate that an offender is really, truly sorry, but only that public opinion has the power to exact the expression of self-abnegation (or in Goffman's terms, self-splitting) that's inherent in a formal apology.

Update: Stacy Furrer writes that she tells her kids, "If you can't name what you did, you're not owning it."

Posted by Roger Shuy at 10:17 AM

Thou shalt not report odds ratios

This is the second in a series of posts aimed at improving the rhetoric (and logic) of science journalism. Last time ("Two simple numbers", 7/22/2007), I asked for something positive: stories on "the genetic basis of X" should tell us how frequent the genomic variant is among people with X and among people without X. This time, I've got a related, but negative, request.

No, let's make it a commandment: Thou Shalt Not Report Odds Ratios. In fact, I'd like to suggest that any journalist who reports an odds ratio as if it were a relative risk should be ~~fired~~ sent back to school.

Many of you probably don't know what I'm talking about -- that's why dozens of science journalists disobey this commandment every week. But the basic concepts are simple, and nothing more than simple arithmetic is required to understand them.

Here's a simple, classic example that illustrates the problem. A few years ago, some researchers from Georgetown University published in the New England Journal of Medicine a study that demonstrated systematic race and sex bias in the behavior of America's doctors. Needless to say, this finding was widely reported in the media:

Washington Post:	"Physicians said they would refer blacks and women to heart specialists for cardiac catheterization tests only 60 percent as often as they would prescribe the procedure for white male patients."
L.A. Times:	"[Doctors] refer blacks and women to heart specialists 60% as often as they would white male patients."
N.Y. Times:	"Doctors are only 60% as likely to order cardiac catheterization for women and blacks as for men and whites."

Now let's try a little test of reading comprehension. The study found that the referral rate for white men was 90.6%. What was the referral rate for blacks and women?

If you're like most literate and numerate people, you'll calculate 60% of 90.6%, and come up with .6*.906 = .5436. So, you'll reason, the referral rate for blacks and women was about 54.4 %.

But in fact, what the study found was a referral rate for blacks and women of 84.7%.

What's going on?

It's simple -- the study reported an "odds ratio". The journalists, being as ignorant as most people are about odds and odds ratios, reported these numbers as if they were ratios of rates rather than ratios of odds.

Let's go through the numbers. If 90.6% of white males were referred, then 9.4% were not referred, and so a white male's odds of being referred were 90.6/9.4, or about 9.6 to 1. Since 84.7% of blacks and women were referred, 15.3% were not referred, and so for these folks, the odds of referral were 84.7/15.3 ≅ 5.5 to 1. The ratio of odds was thus about 5.5/9.6, or about 0.6 to 1. Convert to a percentage, and you've got "60% as likely" or "60 per cent as often".

The ratio of odds (rounded to the nearest tenth) was truly 0.6 to 1. But when you report this finding by saying that "doctors refer blacks and women to heart specialists 60% as often as they would white male patients", normal readers will take "60% as often" to describe a ratio of rates -- even though in this case the ratio of rates (the "relative risk") was 84.7/90.6, or (in percentage terms) about 93.5%.
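For readers who would rather check the arithmetic than take it on trust, here is a minimal Python sketch of the two calculations, using only the two referral rates quoted above. The function names (`odds`, `odds_ratio`, `rate_ratio`) are my own labels for illustration and are not taken from the study.

```python
def odds(rate: float) -> float:
    """Convert a rate (a proportion between 0 and 1) into odds."""
    return rate / (1.0 - rate)


def odds_ratio(rate_a: float, rate_b: float) -> float:
    """Ratio of the odds for group A to the odds for group B."""
    return odds(rate_a) / odds(rate_b)


def rate_ratio(rate_a: float, rate_b: float) -> float:
    """Ratio of the raw rates (what most readers expect '60% as often' to mean)."""
    return rate_a / rate_b


# Referral rates as quoted from the study.
white_men = 0.906
blacks_and_women = 0.847

print(f"odds, white men:        {odds(white_men):.1f} to 1")
print(f"odds, blacks and women: {odds(blacks_and_women):.1f} to 1")
print(f"odds ratio:             {odds_ratio(blacks_and_women, white_men):.1f}")
print(f"rate ratio:             {rate_ratio(blacks_and_women, white_men):.1%}")
```

The same two rates yield an odds ratio of about 0.6 and a rate ratio of about 93.5%, which is the whole problem in miniature.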

(There was another set of rhetorical problems with the reporting of this particular study, one that was created by the study's authors. In fact the referral rates for white males, black males and white females were apparently pretty much all the same -- the crucial part of the pattern was that the referral rate for black females was much lower, namely around 78.8%. The 90.6% vs. 84.7% figures were created by comparing the white male data against aggregated data for white females and blacks of both sexes. But I digress.)

My discussion of this case is drawn from an article published subsequently in the same journal: Lisa Schwartz et al., "Misunderstandings about the Effects of Race and Sex on Physicians' Referrals for Cardiac Catheterization", NEJM 341:279-283, July 22, 1999. The problem was well understood by statistically well-informed people long before then, and was explicitly discussed in an earlier study in the British Medical Journal: H. T. O. Davies et al., "When can odds ratios mislead?" BMJ 316:989-991 (1998).

OK, so this is a long-standing and well understood problem, which led to some spectacularly botched (and prominently excoriated) presentations of important results back around 1999. Surely all competent science journalists understand this now, and have mended their ways?

Guess again.

Find any piece of reporting that talks about "raising the risk of X by Y%", or any of the many other ways of putting this same concept into English, and the chances are that you've found a violation of this commandment. Let me give two recent examples, among thousands lurking in the past month's news archive.

According to Steve Connor, "Childhood asthma gene identified by scientists", The Independent, 7/5/2007:

A gene that significantly increases the risk of asthma in children has been discovered by scientists who described it as the strongest link yet in the search to find a genetic basis for the condition.

Inheriting the gene raises the risk of developing asthma by between 60 and 70 per cent - enough for researchers to believe that the discovery may eventually open the way to new treatments for the condition. [emphasis added]

The study in question (I believe -- the article doesn't give any specific reference, as usual for the genre of science journalism) is Miriam F. Moffatt et al., "Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma", Nature 448, 470-473 (26 July 2007). This is another big genome-wide association study -- roughly 300,000 single-nucleotide polymorphisms were scanned in several populations in the UK and in Germany.

In this case, general information about allele frequencies is not provided (and perhaps was not available). However, this information is given in one crucial case:

In the subset of individuals for whom expression data are available, the T nucleotide allele at rs7216389 (the marker most strongly associated with disease in the combined GWA analysis) has a frequency of 62% amongst asthmatics compared to 52% in non-asthmatics (P = 0.005 in this sample).

Now, how can 62% vs. 52% be interpreted as "[raising] the risk of developing asthma by between 60 and 70 per cent"? I mean, 62 is about 19% greater than 52, not 60-70% greater. If you're guessing "it's that old devil the odds ratio", I'm sure you're right.

If you're good at mental arithmetic, you may be worried that even the odds ratio doesn't quite make it to 1.6 or 1.7 in this case: (.62/.38)/(.52/.48) ≅ 1.51.

We can't tell what Steve Connor was really talking about when he wrote that "inheriting the gene raises the risk of developing asthma by 60 to 70 per cent", but one possibility is that he (or his informant) cherry-picked some even better odds ratios, not from the "combined GWA analysis" (where GWA stands for "genome-wide association"), but from one of the data subsets, perhaps this one:

Restricting analyses to cohort members of Caucasian ethnicity, we found that the 398 cases recalling 'asthma ever' at age 42, showed a significant association (odds ratio, 1.21, 95% confidence interval, 1.04-1.40, P = 0.012). Ninety-three individuals were reported to have 'asthma attacks' in the first seven years of life (that is during 1958 to 1965), and these were strongly associated to rs3894194 (odds ratio = 1.68, 95% confidence interval, 1.25-2.26, P = 0.0005). [emphasis added]

With a bit of arithmetic, you can work out for yourself what that 1.68 odds ratio would correspond to in terms of allele frequencies.

Another possibility is that the odds ratio was inferred from the coefficient in a logistic regression, where many factors (in this case the presence or absence of many SNPs) are weighed together in a single statistical model. See below for a bit more discussion; but it remains true that describing an odds ratio inferred by such a model as if it were a risk ratio (i.e. a ratio of rates) is highly misleading.

Now, Steve Connor is not a sports columnist trying his hand at a science piece. (That's a plausible excuse for Denis Campbell's disastrously botched autism/MMR story in the Observer, memorably vivisected by Ben Goldacre in many Bad Science posts and a BMJ article.) Connor is listed as the "Science Editor" of the Independent, and he ought to know better.

And Connor is not the only "Science Editor" for a major publication who has violated this same commandment recently. Mark Henderson, identified as "Science Editor" of the London Times, was the author of "Genetic breakthrough offers MS sufferers hope of new treatment", 7/29/2007:

The first genetic advance in multiple sclerosis research in three decades has opened new approaches to treating the neurological disorder, scientists said yesterday.

Research has identified two genetic variants that each raises a person's risk of developing MS by about 30 per cent, shedding new light on the origins of the autoimmune disease that could ultimately lead to better therapies. [emphasis added]

The scientific publication in question (well, there are several of them, but this one will do) is "Risk Alleles for Multiple Sclerosis Identified by a Genomewide Study", NEJM, July 29 2007. And the "risk-raising" in question is, needless to say, calculated in terms of odds ratios:

A number of allelic variants had a significant association with multiple sclerosis. Of these, two SNPs in intron 1 of the IL2RA gene encoding the alpha chain of the interleukin-2 receptor (also called CD25, located at chromosome 10p15) are notable: rs12722489 (P=2.96x10^-8; odds ratio, 1.25; 95% confidence interval [CI], 1.16 to 1.36) and rs2104286 (P=2.16x10^-7; odds ratio, 1.19; 95% CI, 1.11 to 1.26) (Figure 4).

How do odds ratios of 1.25 and 1.19 translate to "[raising] a person's risk of developing MS by about 30 per cent"? Well, again, I believe there is some cherry-picking of numbers from data subsets going on here (the odds ratios reported are from multiple logistic regression in all cases). But odds-ratio oddities aside, what are the allele frequencies involved? This time the scientific article doesn't tell us, but we can work it out.

In this study, there was a "Screening Phase", consisting of 931 Family trios vs. 2431 Control Subjects, and a "Replication Phase" involving 2322 Case Subjects, 609 Family Trios, and 2987 Control Subjects. The article's Table 2 gives us the following:

SNP rs12722489 has RAF ("risk allele frequency") of .85, with odds ratio of 1.35 in the screening phase, 1.19 in the replication phase, and 1.25 in the combined data. SNP rs2104286 has RAF of .75, with odds ratio 1.26 in the "screening phase", 1.16 in the "replication phase", and 1.19 in the combined data.

An odds ratio of 1.25 for the first genomic variant, given a background rate of 85%, would imply a rate in the MS patients of 87.6% : (.876/.124)/(.85/.15) ≅ 1.25. To get an overall odds ratio of 1.19 for the second variant, given a background rate of 75%, we'd need a rate in the MS patients of 78.1% (you do the math...).
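If you'd rather not do the math by hand, the inversion can be sketched in a few lines of Python. This assumes, as the paragraph above does, that the reported risk allele frequency can be treated as the background rate; the function name `rate_from_odds_ratio` is a hypothetical helper of mine, not anything from the NEJM article.

```python
def rate_from_odds_ratio(odds_ratio: float, baseline_rate: float) -> float:
    """Return the rate whose odds are `odds_ratio` times the baseline odds."""
    baseline_odds = baseline_rate / (1.0 - baseline_rate)
    target_odds = odds_ratio * baseline_odds
    # Convert odds back into a proportion: p = odds / (1 + odds).
    return target_odds / (1.0 + target_odds)


# (SNP, risk allele frequency, combined odds ratio) as quoted from Table 2.
variants = [
    ("rs12722489", 0.85, 1.25),
    ("rs2104286", 0.75, 1.19),
]

for snp, raf, combined_or in variants:
    implied = rate_from_odds_ratio(combined_or, raf)
    print(f"{snp}: background {raf:.1%}, implied rate in patients {implied:.1%}")
```

Run on the combined-data odds ratios, this gives roughly 87.6% and 78.1%, matching the figures in the text.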

Here's what the NEJM authors say about their findings:

These variants are not rare mutations of the type that occur in diseases caused by a defect in a single gene, such as muscular dystrophy or sickle cell anemia. Rather, they are polymorphic variants that also occur in normal populations. However, each is more common in patients with multiple sclerosis than in control subjects, and each has a small effect on the risk of the disease.

More specifically, one of these variants occurs in about 85% of people without MS, and about 87.6% of people with MS; the other occurs in 75% of people without MS, and 78.1% of people with MS. (At least, these are the rates reconstructed by the logistic regression model.)

On this basis, we can calculate that the first variant increases someone's risk of MS by a factor of 87.6/85 ≅ 1.03, or in ordinary language by 3%; for the second variant, it's 78.1/75 ≅ 1.04, i.e. by 4%.

The Science Editor of the London Times describes this situation by telling us that "Research has identified two genetic variants that each raises a person's risk of developing MS by about 30 per cent".

With all respect, I submit that this is someone in acute need of further education in basic statistical reasoning and journalistic responsibility.

[There's another story to be told about why researchers (as opposed to journalists) like to report odds ratios. The basic answer is that they use logistic regression to model rates, so as to assign responsibility fairly among many factors at once, for example among many SNPs distributed in a partially-correlated way in the sample under study. This is an appropriate thing to do, in general -- but odds ratios, whether calculated directly from raw frequencies, or inferred from logistic regression coefficients, are NOT risk ratios, and should never be presented as such.

When the underlying rates are very low, the odds ratio approaches the relative risk asymptotically, but for rates in the range we've been talking about, the odds ratio is a massive overestimate, and to present it as if it were a ratio of risks is massively misleading.

If you believe the logistic regression, and you want to tell the public about relative risk as inferred by the regression model rather than as estimated directly from the raw frequencies, it's straightforward to calculate the (inferred) relative risk from the (inferred) odds ratio and information about overall rates (e.g. the "risk allele frequency"), as I've done above.

But my third cup of coffee is getting cold, so further discussion of logistic regression will have to wait for another morning.]
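For the record, the conversion described in the bracketed note can be written out in two lines. The notation is my own shorthand rather than anything from the papers cited: p_0 is the background rate, p_1 is the rate in the comparison group, OR is the odds ratio and RR is the ratio of rates.

```latex
\begin{align*}
\mathrm{OR} &= \frac{p_1/(1-p_1)}{p_0/(1-p_0)}
             = \mathrm{RR}\cdot\frac{1-p_0}{1-p_1},
  \qquad \mathrm{RR} = \frac{p_1}{p_0} \\[4pt]
\mathrm{RR} &= \frac{\mathrm{OR}}{1 - p_0 + p_0\,\mathrm{OR}}
\end{align*}
```

The first line shows why the two measures agree only when both rates are small: the factor (1 - p_0)/(1 - p_1) is then close to 1. The second line is the first solved for RR. Plugging in OR = 1.25 with p_0 = 0.85 gives 1.25/1.2125, or about 1.03, and OR = 1.19 with p_0 = 0.75 gives 1.19/1.1425, or about 1.04, the same 3% and 4% worked out above.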

Posted by Mark Liberman at 07:38 AM

July 29, 2007

Language and identity

Simon Baron-Cohen thinks that autism is a symptom of an "extreme male brain". In today's NYT, Benjamin Nugent ("Who's a Nerd, Anyway?", 7/29/2007) mentions Mary Bucholtz's hypothesis that nerd is a name for extreme white behavior:

What is a nerd? Mary Bucholtz, a linguist at the University of California, Santa Barbara, has been working on the question for the last 12 years. She has gone to high schools and colleges, mainly in California, and asked students from different crowds to think about the idea of nerdiness and who among their peers should be considered a nerd; students have also "reported" themselves. Nerdiness, she has concluded, is largely a matter of racially tinged behavior. People who are considered nerds tend to act in ways that are, as she puts it, "hyperwhite."

If you want to read about this theory at the source, you could check out "The Whiteness of Nerds: Superstandard English and Racial Markedness", Journal of Linguistic Anthropology, 11(1) 84-100 (2001). The abstract:

Anthropological research has shown that identities that are "not white enough" may be racially marked. Yet marking may also be the result of being "too white." California high school students who embrace one such white identity, nerds, employ a superstandard language variety to reject the youth culture norm of coolness. These practices also ideologically position nerds as hyperwhite by distancing them from the African American underpinnings of European American youth culture.

[Update -- Andrew Brown at Helmintholog cites some contrary evidence from an authoritative source. (Actually, he alludes to it, but I think we can give him a little time to come up with the specific reference...)

According to Language log there is a study out suggesting that Nerds, or dorkenheimers, are identified in the Californian school system because they are “too white” and avoid all the “Black” stylings of speech and dress traditionally¹ affected by white teenagers. But this can’t be true. I remember distinctly when Zonker, who is Californian, was advertising a Nerdcare anti-sun lotion, and it was minty green. I’m away from the family archive CD, so I can’t find the link.

¹ on the “classic” timescale — ie since about 1959

]

Posted by Mark Liberman at 04:47 PM

The ambiguity of modern life

One more cartoon. Hilary Price manages to construct a wonderful ambiguous sentence turning on two facets of modern life, piercings and cellphones:

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 03:01 PM

The culture of Zippy's world

Here and there with Zippy and Gertrude Stein. And Zippy mondegreens Black Sabbath monstrously:

For those of you who haven't committed "Paranoid" to memory, here are the original lyrics for this verse:

Finished with my woman cause she couldn't help me with my mind
People think I'm insane because I am frowning all the time
All day long I think of things but nothing seems to satisfy
Think I'll lose my mind if I don't find something to pacify

[Added 7/31/07: My posting originally had "browning" (from the lyrics website linked to above) rather than "frowning", but several correspondents pointed out that many other lyrics sites had "frowning" and that "frowning" made more sense. Listening to (one) recording of the song -- on the "Reunion" CD (1998), a live performance by the original Black Sabbath members -- made it clear to me where "browning" came from: though it's hard to make out the words (this is, after all, heavy metal music), it sure SOUNDS like "browning". Maybe the initial segment is just a voiced [v] rather than [f], and I'm interpreting that as [b] before [r].]

Zippy's version is considerably more, well, concrete.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:20 PM

if u cn rd this u cn !@

While we're doing cartoons (here and here), here's a pair from Dan Piraro on linguistic themes -- the mysteries of texting, and taboo avoidance characters:

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:53 PM

Illeism and its relatives

Zippy confronts Salvador Dali (at the museum Dali built in Figueres, Spain, his birthplace), and picks up Dali's habit of referring to himself in the third person -- what's sometimes called illeism (Wikipedia site here, with lots of examples):

Back in late May and early June, the American Dialect Society mulled over various oddities in the way names are used in referring to people. We started with what I'll call binomialism -- uses of FN+LN (first name plus last name) in such references -- for third persons, and then quickly moved on to 2nd-person binomialism, and then to 1st-person binomialism and to illeism in general. Here's a tour through that discussion.

[Note added 7/31/07: What follows is JUST a tour through the ADS-L discussion; it's not intended as a full inventory of illeisms and related phenomena. The Wikipedia site has lots and lots of illeism examples, from Julius Caesar on, and that would be the place to check for cases not mentioned below and the place to add your favorites.]

It started (5/26/07) with a posting of mine on 3rd-person binomialism:

I've been sort-of-watching a long Biography episode about Johnny Depp. One of his biographers provides a great many comments about Depp and his work -- always referring to him as "Johnny Depp", never "Johnny" or "Depp". (Almost everyone else uses "Johnny" all the time, though the narrator seems to use FN + LN at the beginning of a new segment of the show and in summary statements.) Very awkward effect, as though the guy was introducing Depp into the discourse again and again.

Larry Horn followed up by noting Hemingway's references to his character Robert Jordan as "Robert Jordan" in For Whom the Bell Tolls, and Jon Lighter added another literary example: Tim O'Brien "constantly refers to the protagonist of Going After Cacciato (1978; rpt. N.Y.: Dell, 1979) as 'Paul Berlin'":

He... handed the glasses to Paul Berlin... Paul Berlin watched through the glasses... Paul Berlin watched through the glasses [a second time]... And the arms kept flapping... Paul Berlin suddenly realized... Paul Berlin could not hear... So Paul Berlin repeated it. (pp. 25-6)

"These are all in the space of about 700 words and appear to be representative. O'Brien is less systematic in The Things They Carried, but frequently uses FN+LN for characters in places where it feels like an affectation."

Lighter then made the transition from 3rd-person to 2nd-person binomialism by citing uses of "Charlie Brown" in Charles Schulz's cartoons, where both sorts of binomialism abound. For the 2nd-person use, note the title of Clark Gesner's musical based on the cartoon: You're a Good Man, Charlie Brown.

Doug Harris pursued the 2nd-person theme:

Alan Chartok, the main man (president, political commentator, overall overseer but not CBW) at WAMC Public Radio in Albany NY does the same thing on most of his half-hour one-on-ones with politicos. It is very awkward-sounding, as if he doesn't know whether to be familiar and call them by their first name or address them more formally...

and I explored it a bit further:

As Doug points out, the second-person case presents a social difficulty, since all of the alternatives (FN, LN, FN+LN, Prefix+LN, etc.) convey something about the relationship between the interviewer and the interviewee -- so it actually makes sense for the interviewer to avoid address forms entirely. And indeed that's what most of them do; check out Terry Gross on Fresh Air, for example.

The exceptions are (a) cases where the interviewer and interviewee are acquaintances or friends, in which case both are likely to use FN; (b) cases in which the interviewer wants to project intimacy with the interviewee (think sports interviewers and Charlie Rose), via FN; (c) cases in which the interviewer wants to express deference, usually via Prefix+LN ("Professor Chomsky").

If the interviewer avoids address forms, then for the sake of the listener, the interviewee can be periodically identified by third-person reference ("I'm talking with FN+LN", "We'll return to this interview with FN+LN in a moment"). On TV, of course, identifying information can be displayed on the screen (although this information is usually on the screen for only a little while, and usually isn't repeated when the interviewee returns after other material intervenes, unless it's been some time since the interviewee's last appearance).

As Jim Stalker pointed out, on the radio, address avoidance can make it hard for listeners to figure out who they're hearing. This is especially troublesome for me, since I collect a fair number of examples from radio interviews; often, I catch the words the first time they go past, but then have to go back and listen to the recording on the program site to figure out who the speaker was.

In the case of third-person reference, avoidance of names (via pronouns) is often not available, but the choices are less socially fraught: LN is merely non-intimate (while Prefix+LN is "polite", sometimes deferential).

Having gone through the 3rd-person and 2nd-person cases, we then moved to the 1st-person case, binomial illeism. I remarked that Bob Dole was famous for referring to himself as "Bob Dole". Ben Zimmer added:

And was mercilessly spoofed by Norm MacDonald on "Saturday Night Live" in 1996 for doing so, e.g.:

Bob Dole: Bob Dole likes peanut butter. Bob Dole's never made a secret of that. (3/16/96, "Real World" sketch)

After all the ridicule, Dole hired a speech coach to force himself to use 1st-person reference. On 10/15/96 USA Today reported:

He has already largely rid his standard campaign speech of the verbal tic that's prompted the most jokes about his style: third-person references to himself as 'Bob Dole.' Friday in Dewey Beach, Del., the Kansas senator referred to himself as 'Bob Dole' only once and used the pronoun 'I' 59 times.

And after the election he came on "SNL" to poke fun at himself (11/16/96):

Norm MacDonald: Aw, come on now, Senator, it's a great impression. Listen to this: [speaking in his Bob Dole voice] "Come November 5th, a lot of people are going to be surprised by Bob Dole, because Bob Dole's gonna win this election!"

Bob Dole: [shaking head] Doesn't sound a thing like me. First of all, I don't run around saying "Bob Dole does this" and "Bob Dole does that." That's not something Bob Dole does. It's not something Bob Dole has ever done, and it's not something Bob Dole will ever do!

From there we went on to LN illeism, most famously

You won't have Nixon to kick around anymore.

Charlie Doyle noted that this is what the Yale Book of Quotations has, and what the major newspapers reported in 1962,

But the "Dick" is frequently inserted into oral quotings and paraphrasings (as well as later writings and reminiscences)--BECAUSE OF Nixon's tendency to refer to himself as "Dick Nixon" (as well as just "Nixon")...

Larry Horn provided a rich background:

... I did this same search several years ago, when I was working on a paper touching on what I called the "Dissociative Third Person", or "Bobdolisms" for short. (The version I gave at the 2002 LSA was called "1,3: Indexicality, reference, and the asymmetries of binding".) Dole's political mentor was, of course, Nixon, so I ended up tracking down and finding on the internet a sound bite of the relevant Nixon speech (from after his loss to Pat Brown). Sure enough, it's Dickless, but like Charlie I had the same sense that we remember it [as "Dick Nixon"] because in general the form of the name appearing in the Bobdolism is the one by which the celebrity is usually known (hence Bob Dole, not Robert).

Most of my examples [see below] came from athletes' using this "third person" for themselves (almost always in the form of proper names, though, not "he", "him", or "his", which makes "illeism" a less than ideal term), following the lead of Bo Jackson, who was the athletes' Nixon/Dole of the third person. But here's one not involving a politician or athlete, just a self-styled celebrity contractor; note the reference to the "Nixonian third person".

[48] Chris Clark, a Manhattan contractor, slipped into the Nixonian third person as he described his rationale for rejecting homeowners without designers: "Chris Clark can't sit down at the kitchen table with Mrs. Jones, who wants white cabinets, a granite counter and Miele dishwasher. The room for dispute is too vast. Do you know how many white Formicas there are?"
(NYT 15 July 1999, F10, "Courting the Contractor")

And here's the actual Nixon quote, direct from the audio.

[49] Just think how much you're gonna be missing. You don't have Nixon to kick around anymore. (Richard Nixon, concession speech after losing California gubernatorial election to Pat Brown, 7 Nov. 1962; usually recalled as "You won't have Dick Nixon to kick around anymore")

Some cases of the athlete's dissociative third person:

[34] What's wrong with [Larry] Johnson? Nothing, he insists. "People know what L.J. can do," he said. "I know what L.J. can do."
(basketball player Larry Johnson on his offensive struggles, NYT 22 Nov. 1996, B11)

[35] Can they [the New Jersey Nets] re-sign Cassell? "I have to see what's right for Sam Cassell," said Cassell, who wants a salary around $5 million. "Money is going to be the key."
(basketball player Sam Cassell on his salary dispute with the Nets, NYT 22 April 1997)

[36] Establishing a balance between being the world's greatest basketball player as well as a purveyor of cologne, footwear, briefs, and motion pictures has been a chore at times. "As you look at my career, those things haven't defined Michael Jordan," he said. "Michael Jordan's basketball skills defined him."
(M.J. on the difficulties of being M.J., NYT 9 Sept. 1997)

[37] "I just want to win. The bottom line is whatever Todd Hundley has to do to help this team win, I'll do."
(Catcher Todd Hundley's travails in learning to be an outfielder, NYT 13 July 1998)

[38]a.   "I gave Pittsburgh every opportunity to sign Neil O'Donnell", O'Donnell said.
(Chicago Sun-Times 1 Mar. 1996, p. 110)

b. O'Donnell, who was benched in the fourth quarter with the Jets leading, admitted: "It's a hard thing. I'm just doing what Neil O'Donnell can do."
(NYT 3 Nov. 1997, on travails of N. Y. Jets quarterback Neil O'Donnell)

[39] "I'm just going to do the things Derek Harper has done for 10 years, and hopefully that will be enough."
(NYT 8 Jan. 1994, p. 32)

[40] "I just want to go to a place where Howard Johnson is going to put up some big numbers."
(Nov. 1993 radio interview with baseball player on signing with Colorado Rockies)

[41] I feel I'm just out there doing the sort of things Lenny Harris can do.
(baseball player Lenny Harris in radio interview on WFAN 29 July 2000)

[42] He said he'd take care of me, and it hasn't happened yet. I want to be there, but I've got to look out for Tim Hardaway and Tim Hardaway's family.
(basketball player Tim Hardaway, complaining of his treatment by coach Pat Riley, NYT 29 Aug. 2000, D2)

and from Bob Dole's own mouth:

[43] [Responding to Ted Koppel's query about whether he intended to stress the character issue in the campaign] "I don't think so," Dole said. "My view is that I'm going to talk about Bob Dole, and I've been doing a little of that."
(ABC "Nightline" show, March 1996)

[44] I am very proud to be from Russell, Kansas, population fifty-five hundred. My dad went to work every day for forty-two years and proud of it, and my mother sold Singer sewing machines...to try to make ends meet. Six of us grew up living in a basement apartment. That was Bob Dole's early life, and I'm proud of it, because we learned a lot about values, about honesty and decency and responsibility and integrity and self-reliance and loving your God, your family, your church, and your community...
(Dole speech in Columbus, Ohio, 3/14/96)

Crucially, the name shows up when the celeb is viewing himself (I have no examples from women) from the outside, so we would never hear Dole saying "That was my early life, and Bob Dole is proud of it", or "Bob Dole is going to talk about me", or pausing in the middle of a speech to murmur "Bob Dole needs to pause a moment to {take a sip of water/visit the rest room}". (Except maybe on the Saturday Night Live parodies of him that were popular during the 1996 presidential campaign.) Finally, here's NYT sports media reporter Richard Sandomir during the '96 campaign on this "affliction":

Some strange, grammatical, mind-body affliction is making some well-known folks in sports and politics refer to themselves in the third person. It is as if they have stepped outside their bodies. Is this detachment? Modesty? Schizophrenia? If this loopy verbal quirk were simple egomania, then Louis XIV [sic] might have said, "L'etat, c'est Lou."

Third Personspeak's greatest sports champion is Bo Jackson, the former football-baseball star. Bo knew Bo intimately, but he had a more distant relationship with "I." Bo quoted Bo so frequently that Bo needed another Bo to speak for Bo. "The key was Bo wants to play baseball," Bo once said. "I want to see what Bo wants to do. Let me state a fact: Bo Jackson can play baseball."
(--Richard Sandomir, N. Y. Times Week in Review 10 Mar. 1996, p. 2)

"Bo" here is a FN illeism. Jon Lighter turned to popular culture for another FN illeism:

Back in the '50s there was a Bugs Bunny cartoon involving Arab anti-rabbit terrorists from the 1001 Nights. Hassan carried a scimitar that he would swing at Bugs while crying, "Hassan CHOP!"

and Ben Zimmer added a FN+LN musical case:

A more influential appearance of the dissociative third-person in '50s pop culture was the song "Bo Diddley" by Bo Diddley (June 1955, Checker):

Bo Diddley bought his babe a diamond ring,
If that diamond ring don't shine,
He gonna take it to a private eye,
If that private eye can't see
He'd better not take the ring from me.

This was followed up by other third-person songs such as "Diddley Daddy," "Hey, Bo Diddley," and "Bo Diddley's A Gunslinger." And to bring things full circle, Bo Diddley appeared with Bo Jackson in Nike's "Bo Knows" commercials of 1989-90 ("Bo, you don't know diddley!"). (link)

But Lighter objected that

the Diddley usage was not so flagrantly illeistic. There's Diddley, then there's the unnamed narrator of the song, then there's "Diddley," a possibly fictitious character in the song.

Meanwhile, Zimmer went back a year in the blues:

... a year before that was "Don't You Know" by Ray Charles (July 1954, Atlantic):

Say, have you heard baby,
Ray Charles is in town.
Let's mess around till the midnight hour,
See what he's puttin down.

Presumably there are other musical examples stretching way back in time.

Two more recent examples: Charlie Doyle recalled that

Some comedy show on TV has a running gag in which a Karl Malone impersonator is featured saying foolish things in Dissociative Athletic BoSpeak.

and Michael Covarrubias identified the show:

I believe that's Jimmy Kimmel playing the recurring character on The Man Show.

And of course there's the often mimicked self-referencing declaration from Seinfeld, "George is getting upset!" George uttered the line (and other similar lines) several times on various episodes. The habit was introduced on the show by the character "Jimmy" (introduced in the locker room after a basketball game) who pushed the style to terrible limits: "Hey, look. Hank's got a new boyfriend. Jimmy's not threatened by Hank's sexuality. Jimmy's happy for Hank." -- "Hands off Jimmy! Don't touch Jimmy!" (episode 105; 16 March 1995).

To sum up: 3rd-person binomialism, 2nd-person binomialism, and proper-name illeism of all three types: binomial ("Bob Dole"), LN ("Dali" and "Nixon"), and FN ("Zippy" and "Bo").
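For readers who like their taxonomies executable, the three proper-name forms can be told apart mechanically. This is only a toy sketch: the function name `name_form` and the idea of passing in the first and last name by hand are my own assumptions for illustration, and it does nothing about deciding whether a given reference is actually 1st-, 2nd-, or 3rd-person.

```python
def name_form(reference, first, last):
    """Classify a name used in reference as FN+LN, LN, or FN.

    `first` is the first name the person is usually known by
    (hence "Bob", not "Robert"). Returns None for any other form.
    """
    if reference == f"{first} {last}":
        return "FN+LN"   # binomial: "Bob Dole"
    if reference == last:
        return "LN"      # "Nixon"
    if reference == first:
        return "FN"      # "Bo"
    return None


print(name_form("Bob Dole", "Bob", "Dole"))
print(name_form("Nixon", "Dick", "Nixon"))
print(name_form("Bo", "Bo", "Jackson"))
```

Prefix+LN forms like "Professor Chomsky" fall through to None here; handling them would need a list of prefixes, which I haven't tried to supply.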

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:17 PM

Communicating

Yesterday's Zits:

I imagine that Eleanore would understand Jeremy's message just fine -- the problem for Hector is that he doesn't know the context, and mistakenly assumes that just because he's Jeremy's best friend, he should understand what Jeremy means rather than just transmitting the message verbatim. Of course, that kind of message is really hard to remember well enough to transmit accurately if you don't understand it.

Homework assignment: rewrite paratactically (i.e. by stringing phrases together without embedding, using explicit or implicit anaphora to keep track of the connections) what Jeremy expressed syntactically (here, using complement clauses): "Tell her that Brittany said that Zuma said that Sara said that it's okay with her if that's what D'ijon said." Extra credit: evaluate your experience in the light of Robin Dunbar's theory about the role of gossip in the evolution of language.

Today's Zits focuses on medium rather than on message:

In Mountain View on Friday, I overheard a conversation among several of the younger academics at the Google Faculty Summit -- 30- to 40-year-olds -- complaining that their students think that email is for old people (a category that this group is not yet used to being part of). Does this reliance on IMing and SMS predict future trouble for the iPhone, whose texting capabilities seem to be poor? Virtual keyboards on small screens seem to be hard for many people to use -- it's not obvious that it'll be harder to adapt to this than to multitap or T9, but maybe it will be. Certainly the fact that the iPhone entry system (systems?) is not adaptive, and that an IM client is not even among the initial set of applications, suggests that its designers were not focusing on this feature the way they would have if they thought it was really important. Maybe they, like Steve Jobs, are on the wrong side of this generation gap?

Posted by Mark Liberman at 08:52 AM

July 28, 2007

From the headline desk at Language Log Plaza

Here at the headline desk at LLP, we don't write headlines, we analyze 'em. The latest headline episode began on Wednesday with Ben Zimmer's puzzlement at the following head on a Reuters story:

Taliban say kill Korean hostage, set new deadline

At least that was the head here at the time. What you get at that site now is

Taliban kill South Korean hostage in Afghanistan

and a somewhat rewritten story. In the interim, Barbara Partee found the rewritten story with yet another head:

Taliban say they killed Korean hostage

These two are entirely ordinary as headlines, but the original is remarkable. We gathered around the water cooler and I delivered a long lecture trying to assimilate the "say kill" head to some other remarkable heads I looked at back in 2005 (which I'll get to in a little while), while Heidi Harley speculated that it was just an editing error. Then Barbara and Ben found some more heads like the first, most of them on Reuters stories, and we were obliged to find some sort of analysis for them.

So, some hits. From Barbara:

Researchers say find key nerve injury protein (link) [Reuters]

US troops say find second site with vials, powder (link)

Ben then searched on Factiva for {"say find"} and found 31 heads, all but one from Reuters, e.g.:

U.S. scientists say find cause of degenerative disease.
Reuters News, 3 July 1991

Peru rescuers say find survivors from plane crash.
Reuters News, 2 October 1996

Scientists say find gene for child cancer syndrome.
Reuters News, 7 May 1997

Congo rebels say find massacre of Tutsis
Reuters News, 18 August 1998

Thai police say find, lose North Korean diplomat.
Reuters News, 9 March 1999

At this point the water cooler crowd proposed that the headlines are of the form:

plural subject of say -- say -- complement of say: finite plural present-tense VP

which is almost an ordinary headline, except that the finite complement of say is missing a subject:

Researchers say find key nerve injury protein 'Researchers say they find key nerve injury protein'

In this analysis, both the main clause, with say, and the complement clause, with find, are in the "headline present", interpreted as present, present perfect, or past, depending on the context. There's nothing unusual about that. But subject omission in a complement clause (though attested) is rare in English, even in registers where subject omission in main clauses is commonplace, as in:

Saw two foreigners the other day taking pictures of a building in Times Square. Don't know what country they were from. [1sg subject] (Clyde Haberman "NYC" column, "Picture-Takers, Noisemakers And Evildoers", NYT 6/11/04, p. A23, beginning in diary form)

Long Beach (AP)... Drago, a 3-year-old Belgian Shepherd, disappeared from Officer Ernie Wolosewicz's back yard Sunday. Turns out he was picked up by animal control. [dummy it subject] (Palo Alto Daily News, 11/6/03, p. 35)

"There is a guy who would like to be on the board [of catering firm Caterair]. He's kind of down on his luck a bit. Needs a job. ... Needs some board positions." [3sg subject, supplied in context] (Ron Suskind, "Without a Doubt", NYT Magazine, 10/17/04, pp. 48-9, about George W. Bush)

The subject-omission proposal is supported by headlines with 3sg present finds in the complement (just in case you thought kill and find in the earlier examples were base-form, rather than present-tense, verbs):

3sg says: U.S. researcher says finds Atlantis off Cyprus (link) [Reuters]

3sg says: Extreme Networks says finds deficiencies in option practices (link) [Reuters]

and by headlines with past-tense found in the complement:

3pl say: Iraqi forces say found more US-made weapons (link)

3pl say: US forces in Iraq say found more Iran-made weapons (link) [Reuters]

3sg says: Statoil says found oil northwest of Shetlands (link) [Reuters]

3sg says: Researcher says found location of the Holy Temple (link)

This particular headline formula seems to be mostly a Reuterism, but it has a robust life on that news service. Subject omission in complement clauses lives!
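The "say find" formula is regular enough to sketch as a pattern match. What follows is a rough illustration only, not the Factiva search Ben ran: the verb list and the function name `match_say_verb` are invented here, and a serious search would want a part-of-speech tagger rather than a hand-picked list of verb forms.

```python
import re

# Hypothetical, hand-picked complement verbs taken from the examples above.
COMPLEMENT_VERBS = ["find", "finds", "found", "kill", "kills", "killed"]

# subject -- say/says -- finite verb with no subject of its own
PATTERN = re.compile(
    r"^(?P<subject>.+?)\s+(?P<say>says|say)\s+(?P<verb>"
    + "|".join(COMPLEMENT_VERBS)
    + r")\b",
    re.IGNORECASE,
)


def match_say_verb(headline):
    """Return (subject, form of SAY, complement verb), or None if no match."""
    m = PATTERN.search(headline)
    if m is None:
        return None
    return m.group("subject"), m.group("say").lower(), m.group("verb").lower()


for head in [
    "Taliban say kill Korean hostage, set new deadline",
    "Taliban say they killed Korean hostage",
    "Statoil says found oil northwest of Shetlands",
]:
    print(head, "->", match_say_verb(head))
```

The ordinary headline with an overt "they" fails to match, since the pattern requires the complement verb to follow SAY directly; that adjacency is the whole point of the formula.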

Now to the odd headlines from 2005, all of them reported by Ron Hardin on the newsgroup sci.lang (and many of them commented on by me on the American Dialect Society mailing list at the time). All but one were from AP wire stories Hardin found in the Washington Post:

Ind. Fire Said May Take Days to Burn Out

Seepage Said Likely Didn't Cause Oil Spill

Hunter S. Thompson Said Spoke of Suicide

Mercury Damage to Babies Said Costs $8.7B

Drugs to Quit Smoking Said Show Promise

Japan Flight Said Hits Turbulence; 4 Hurt

DeLay Said Agreed Not to Extend Dad's Life

These are all of the form:

subject of finite VP -- said -- finite VP [present or past tense]

The other headline came from the Scientific American; it has a reduced predicate, found 'is found':

Starless Galaxy Said Found

The interpretation of such examples is along the lines of:

a source has said: subject -- finite VP

(Notice how different this is from the "say find" headlines, where the initial NP serves as the subject of a form of SAY.)

Your first inclination is probably to try to relate the Hardin headlines to garden-variety headlines like

Risk (Is) Said to Increase with Age

and, yes, the infinitival-passive headline formula exemplified here no doubt played a role in the creation of the Hardin formula, but there's no way to see the Hardin pattern as simply a telegraphic version of the infinitival-passive pattern (which is itself quite close to ordinary English). The Hardin pattern is, in effect, a construction of its own, restricted to headlines (and possibly to just a few headline writers).

I'd argue, in fact, that in general it's a mistake to see "telegraphic" or "truncated" patterns as literally reductions of fuller versions, whatever the history of the "abbreviated" versions. Once the shorter versions are out there, they can pick up new meanings and discourse functions and can undergo syntactic change. But that's a topic for another day.

Also for another day is the use of Taliban in the original example. It's a "zero plural" there, functioning as a plural NP syntactically but (in English) lacking an overt mark of plurality. In addition, Taliban is also used as a mass noun by some people (so that it functions as a singular NP syntactically), and some people have Taliban "doubly categorized", sometimes used as a count plural and sometimes as a mass singular. These wrinkles in English morphosyntax belong in a follow-up to my posting "Plural, mass, collective".

Posted by Arnold Zwicky at 08:55 PM

ILO5

Tomorrow, Dragomir Radev and eight amazingly smart high-school students will be taking off for St. Petersburg (Russia, not Florida), to participate as the American entrants in the 5th International Linguistics Olympiad. There are two teams: the first team is Rachel Elana Zax, Ryan Aleksandrs Musa, Adam Classen Hesterberg, and Jeffrey Christopher Lim; the second team is Rebecca Elise Jacobs, Joshua Stuart Falk, Anna Tchetchetkine, and Michael Zener Riggs Gottlieb.

The students won their spots based on their performance in the 2007 North American Computational Linguistics Olympiad (described in this post from last February). Lori Levin and Tom Payne are the co-chairs of NACLO 2007, and Drago Radev is the U.S. team's coach.

Here is question no. 5 from the individual phase of last year's International Linguistics Olympiad competition, held in Tartu, Estonia:

Below you see sentences in English and their translations into the Ngoni language. In fact each English sentence can be translated into Ngoni in more than one way, but only one variant is given here.

1. Kamau bought a farm for the women.
2. The grandmothers bought a hoe for the grandson.
3. The grandsons bought beer for the guest.
4. The grandmother bought a knife for Kamau.
5. The guest bought a goat for the grandsons.
6. The grandson bought a farm for Zenda.
7. Zenda bought a house for the grandmother.
8. The guests bought a knife for the woman.
9. Mwangi bought a hoe for the guests.
10. The women bought a house for Mwangi.

Kamau aguli vadala mgunda.
Vabuya vaguli mjukulu ligela.
Vajukulu vamguli mgeni ugimbi.
Mbuya guli Kamau chipula.
Mgeni avaguli mene vajukulu.
Mjukulu amguli mgunda Zenda.
Zenda amguli mbuya nyumba.
Vageni vamguli chipula mdala.
Mwangi avaguli vageni ligela.
Vadala guli Mwangi nyumba.

Assignment. Each of the Ngoni sentences 11-16 contains an error. Translate these sentences into English, explain what the error is in each case, and then correct it, giving for each example four correct sentences in Ngoni that describe the same situation.

11. Mdala guli ugimbi Mwangi.
12. Mdala mguli Mwangi ugimbi.
13. Mdala aguli ugimbi Mwangi.
14. Kamau vamguli vabuya mene.
15. Kamau guli mene vabuya.
16. Kamau vaguli vabuya mene.

I'm sure that you all see the solution immediately -:). But if you want to check that you've got it right, the answers are here.

A search of the Google News Archive for {"linguistics olympiad"} comes up empty. If the LSA had a public relations department, we ought to fire them. (Lucky for them, they don't exist.) Maybe we could hire those cheese guys.

Seriously, in a world where airport bookstores are full of language puzzles, and Will Shortz is a one-man conglomerate, how does something like this stay hidden?

[Update: and they won!]

Posted by Mark Liberman at 09:02 AM

July 27, 2007

Exotic-Looking Typefaces

Those who liked the pseudo-Cyrillic of the Vancouver Sun cartoon that I discussed yesterday will also likely find interesting this post by Geek Of All Trades on "Faux Exotic Typefaces", the highlight of which is an example of Devanagari made to look like Urdu.

Addendum 2007-07-28: Another interesting post on this topic is this one by Joel Martinsen on Chinese made to look like Tibetan. Incidentally, I've seen the movie "Mountain Patrol" whose title he mentions and recommend it highly.

Posted by Bill Poser at 07:14 PM

Running time backwards

From the Palo Alto Daily News, 7/24/07, "Nephew of man killed by police to be tried as an adult" by Mark Abramson (p.3):

This was the first officer-involved shooting in San Mateo since Labor Day, when a homeless man wielding a knife was shot. The last such shooting in the city since that incident was almost 24 years ago, Raffaelli said.

In the first sentence we have an ordinary use of temporal since, picking out the time span elapsed between an "anchor time" (last Labor Day), denoted by the object of since, and a later time (the time of the recent San Mateo shooting).

But in the second sentence the time span is between an anchor time (again, last Labor Day) and an EARLIER time (of the shooting 24 years ago). Time seems to be running backwards; before, not since, is the appropriate P (preposition or subordinator) here.

Still, the writer didn't just pull a P out of a hat. Since is wrong (well, non-standard), but it's close.

The writer seems to have generalized since from referring specifically to elapsed time (along "time's arrow") to referring to any span between two times. For him, the object of since picks out one end of the span, in effect a point of view from which the span is measured, and that point can be at either end of the span.
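The difference between the two uses can be put as a pair of conditions on the anchor time and the event time. The sketch below is just that, a sketch: the function names are made up, and apart from Labor Day 2006 (September 4) the dates are placeholders I've chosen for illustration, since the article gives neither the exact date of the recent shooting nor that of the one "almost 24 years ago".

```python
from datetime import date


def standard_since(anchor, event):
    """Standard 'since': the event must be later than the anchor time."""
    return event > anchor


def double_sided_since(anchor, event):
    """The writer's generalized 'since': the anchor can be at either end of the span."""
    return event != anchor


LABOR_DAY_2006 = date(2006, 9, 4)      # the anchor time in both sentences
RECENT_SHOOTING = date(2007, 7, 1)     # placeholder date, assumed for illustration
EARLIER_SHOOTING = date(1983, 9, 1)    # placeholder for "almost 24 years ago"

for label, event in [("recent", RECENT_SHOOTING), ("earlier", EARLIER_SHOOTING)]:
    print(label, standard_since(LABOR_DAY_2006, event),
          double_sided_since(LABOR_DAY_2006, event))
```

On these assumptions the first sentence satisfies both conditions, while the second satisfies only the double-sided one, which is where standard English would switch to before.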

[Added 7/31/07: Several correspondents suggest that the odd use here might be the result of careless revising or editing. Indeed it might. But then again, maybe not, and there's an interesting general question -- see below -- that arises from thinking about the example.]

Standard English has forward-looking since and backward-looking before, but no double-sided temporal P, one covering both directions. In a roughly similar fashion, standard English has forward-looking tomorrow and backward-looking yesterday, but no double-sided temporal adverb, meaning 'one day from today'. Such lexical items aren't unnatural -- I believe that languages have been reported with 'one day from today' adverbs (at the moment I'm away from sources I could check [7/31/07: Priyanka Chauhan has written from New Delhi to tell me about Hindi kal]) -- but we'd expect them to be relatively rare, since they're less informative than the more specific items. Still, a double-sided temporal P would have its uses, allowing speakers to view things from either end of a time span, the way the PADN writer (who used the same anchor time in both sentences) wanted to do with since. No doubt I'll soon hear of languages with double-sided Ps. Or of other English speakers who use since this way (it's not in the OED, but then plenty of innovative non-standard usages aren't in the OED, and shouldn't be).

Posted by Arnold Zwicky at 02:24 PM

Google Faculty Summit

Yesterday and today, I'm in Mountain View for Google's Faculty Summit. It's been fun. I've had a chance to catch up with old friends -- both Googlers like David Talkin and Franz Och, and other academics like Ed Fox, Mary Harper, Bob Futrelle and Richard Sproat. Among the many interesting new people that I've met, I've especially enjoyed conversations with Graeme Bailey from Cornell, about musical search; Lillian Cassel from Villanova, about whether ontologies are discovered or invented; Munindar Singh from NC State, about the pragmatic web; and with one of the Googlers who invented/implemented Google Scholar.

There are a few differences with the last Google Faculty Summit that I attended. This one is bigger -- big enough so that so far I haven't even bumped into some of the attendees that I know, like Christiane Fellbaum, one of the authors of WordNet. And there was no NDA to sign, unlike the rather ferocious one we had to sign on the way into the Googleplex last time; but on the other hand, there are a lot of hip but tough-looking security people around.

I'll mention just three of the technical highlights so far.

First, there was an announcement of two new ways for faculty to use Google's resources remotely for research purposes ("Drink from the firehose with University Research Programs", 7/26/2007). Google Search "is designed to give university faculty and their research teams high-volume programmatic access to Google Search, whose huge repository of data constitutes a valuable resource for understanding the structure and contents of the web". And Google Translate "will allow researchers programmatic access to Google's translation service", including "detailed word alignment information" and/or "a list of the n-best translations with detailed scoring information".

Second, there was an excellent talk by Mehran Sahami, "Text Mining in Information Retrieval: Theory and Practice". (The talk was videotaped and perhaps it will show up on YouTube, as many of the open talks at Google do; I'm not sure.) Its technical content has been published, as Mehran Sahami and Timothy Heilman, "A web-based kernel function for measuring the similarity of short text snippets", Proceedings of the 15th international conference on World Wide Web, 2006.

I don't have time to explain it now (I'll try to get to it later), but trust me, there are some very neat ideas in there. Mehran's presentation was also extremely well crafted, presenting the key issues clearly and accessibly, but without dumbing them down -- that's why I hope that the video ends up being posted on YouTube.

Finally, there were a couple of presentations on Google Code for Educators, which looks like it has some neat stuff in it, not only for use in courses but also for self-education. Want to learn Ajax programming, or how to use Hadoop and GFS? There are what look like accessible tutorials and course materials.

Posted by Mark Liberman at 10:00 AM

July 26, 2007

Pseudo-Cyrillic

This cartoon on the editorial page of today's Vancouver Sun is linguistically interesting. Look at how the word "no" on the sign held by the inukshuk is written.

The sign ичёт makes no sense if one actually reads Russian, in which it would be pronounced /itʃot/, which as far as I know means nothing at all. It is evidently intended to be read as /njet/ "no". The letter и is like an <N>, the ч is similar to a <Y>, the ё is an <E> with a diacritic that means nothing in English but makes the word look Cyrillic, and of course the т closely resembles a <T>. The actual Russian spelling is: нет.
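The letter-by-letter reading can be spelled out as a lookup table. This is only a sketch of the shape-based substitution described above: the table holds just the four pairs mentioned here, and the names `LOOKALIKES` and `read_as_latin` are invented for the illustration.

```python
import unicodedata

# Cyrillic letter -> the Latin letter it is meant to resemble on the sign.
# Only the four pairs discussed above; written as escapes to avoid encoding trouble.
LOOKALIKES = {
    "\u0438": "N",  # и
    "\u0447": "Y",  # ч
    "\u0451": "E",  # ё (the diacritic is decoration for an English reader)
    "\u0442": "T",  # т
}


def read_as_latin(text):
    """Read pseudo-Cyrillic by shape; letters not in the table pass through unchanged."""
    # NFC so that е + combining diaeresis is treated as the single letter ё.
    text = unicodedata.normalize("NFC", text)
    return "".join(LOOKALIKES.get(ch, ch) for ch in text)


print(read_as_latin("\u0438\u0447\u0451\u0442"))  # the sign in the cartoon
```

Read this way the sign comes out as NYET, which is presumably what the cartoonist was after; read as actual Cyrillic, the same four letters give /itʃot/.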

The author presumably assumed that Canadian readers would know the Russian word for "no" but would not be able to read Cyrillic letters. These are probably correct assumptions. That people who otherwise know no Russian can be expected to know the word for "no" is an interesting relic of the Cold War. I don't think that is true of other languages. People who have studied a language a bit are likely to learn and retain the word for "no", but I suspect that Russian and German (because of all those movies about WW II) are the only languages whose word for "no" can be assumed to be known by people who have never studied them. People who know only a few words of Chinese or Arabic or Italian tend to know things like the names of foods and how to say "hello", but not "no".

Addendum 2007-07-28: A couple of readers have raised the possibility that the Russian letters are intended to spell a Russian word in Cyrillic. The hypothesis is that the intended word is учёт /utʃot/, literally "inventory". This word is often used in Russian stores as a shorthand for "closed for inventory". During the Soviet period workers in stores did not care about sales or customer service, so they would take inventory during regular business hours and put up the "(closed for) inventory" sign whenever they felt like it. As a result, it now has the connotation "Fuck off, you're not welcome here".

It isn't trivial to decide what the intention of the cartoonist was. The Russian speakers who have written are divided as to the appropriateness of "inventory" in this context. One says that it is not appropriate since the Arctic is not a store; for the other this apparently doesn't matter. The fact that the "inventory" interpretation requires an error in the first letter argues in favor of the "no" hypothesis but is hardly conclusive. What leads me to continue to favor my original interpretation is that the cartoon is directed to a readership most of whom have no Russian to speak of. They can be assumed to know the word "nyet", but can hardly be expected to know "inventory", much less its Soviet-era connotation. It is not out of the question that the cartoon is even cleverer than it seems and that the cartoonist knows Russian and had both interpretations in mind. The cartoonist, Roy Eric Peterson, is quite well known, but I have no information regarding his knowledge of languages. Since he was born in 1936 he did grow up during a time in which, due to the Cold War, it was more common than it is now for high school and college students to study Russian, but I have no idea if he actually did.

Posted by Bill Poser at 10:38 PM

"Official" Hispanic Interns in Washington

Thirty-four Latino college students are currently serving as congressional summer interns, reports the Washington Post. Among other things, they object to being called Hispanics, preferring Latino instead. They feel that Hispanic is an "oppressive, colonial term that emphasizes the Spanish (European white) part of their identity." So why did this program choose the name, Hispanic? Esther Aguilera, president of the summer program, says that a few years ago the U.S. census used the designation, Hispanic, "making it the official term."

We might ask how terms like Hispanic get to be official. It's usually up to legislatures to try to make such decisions, as they have recently when arguing about a "national" language. Language Log has been on this matter, as the recent posts by Ben Zimmer illustrate (here) and (here). As far as I can tell, however, the U.S. Census Bureau lacks the authority to designate Hispanic as the official way to define these students.

The reporter's interviews with interns point out the conflict these students face in trying to hang onto their Latino identity while also taking on whatever identity is considered American. That's a hard thing to do and it reminded me of the time I spent, back in the mid 1970s, trying to help the San Francisco Public Schools address the issues put before them by the U.S. Supreme Court decision, Lau v. Nichols, 414 U.S. 563 (1974). 1,800 of the 2,856 Chinese-speaking students in that school system did not get supplementary English instruction and, of course, they did poorly in school. They filed a class action suit that wended its way to the Supreme Court, which ruled that the schools should develop a plan to remedy the problem. Not knowing how to create such a plan, the school's administration called on the Center for Applied Linguistics for help.

Realizing that this was a very sensitive social, political, and educational issue, Rudy Troike and I began by flying from DC to San Francisco weekly for multiple, all-day sessions with leaders of the Chinese, Hispanic (yes, that's the term we used then), Filipino, and Japanese communities to try to get their input and guidance about what kind of supplementary instruction in English they would agree to and ultimately accept. There we found many of the conflicts and confusions that are still being reported in the Post article.

Of the four language groups, we met the most resistance to teaching English at all (especially TESOL classes) from the Spanish speakers, who argued strongly for preserving classroom instruction in Spanish. The Chinese were also interested in preserving Chinese language instruction (although it was never clear which Chinese language should be used), but they were more willing to have supplemental English instruction. The Filipino representatives were more interested in preserving Filipino culture than in using Tagalog for instructional purposes and they seemed willing to have TESOL classes. The Japanese community leaders felt little need to have their children taught more English, since they were topping out in the scores anyway. They wanted their children to learn Japanese as their second language. Out of these meetings came a plan that was a mixture of transitional bilingual education, as it is called today, with more intensive English instruction for those who wanted it. Not perfect, but at least a start.

However much these four groups differed about the 1974 Court's ruling for supplementary instruction in English, they all desperately wanted to preserve as much as they could of their cultural heritage. Almost all wanted to keep their language background alive. Based on the newspaper's report of these Latino summer interns, it doesn't look like things have changed much. Some lament that they've lost the language of their ancestors but yearn to keep some remnants of the culture. All seem to struggle with their identity... as we all do in one way or another.

Posted by Roger Shuy at 01:43 PM

Unhinged on phonics

Every once in a while, I read something that makes me wonder whether I've strayed into a parallel universe. This morning, it was a passage in Anna Jane Grossman's NYT article "Is Junie B. Jones Talking Trash?" (7/26/2007):

But more than a few parents have taken issue with Junie B., as she is called. Their disagreement is a pint-size version of the lingering education battle between advocates of phonics, who believe children should be taught proper spelling and grammar from the outset, and those who favor whole language, a literacy method that accepts misspellings and other errors as long as children are engaged in reading and writing.

I was so surprised by this that I checked the wikipedia entry. No, I'm still on a world-line where phonics means "teaching children to connect sounds with letters or groups of letters".

Does the general public really think that the debate over the role of phonics in reading instruction is about whether it's OK for kids to be exposed in print to inappropriately regular past tense verbs ("runned") or non-standard adverbs ("real mad")? Or are Anna Jane Grossman and her editors suffering from cognitive deficits caused by chronic exposure to second-hand rhetorical smoke?

A bit of web search supports the second hypothesis. Some random examples, suggesting that ordinary folk understand the term phonics in something close to its real meaning, and in fact often associate it with creating or understanding non-standard spellings:

I laughed histerically when Christine and Shana gave me this in High School... and I probably laughed nearly as hard when I found it and re-read it just now.... and for no further adue (dunno how thats spelled... but phonics it out if need be lol)

I asked my brother about her grades in spelling and all he said was that they are doing the phonics thing so no one knows how to spell anymore. [...] I'm curious about this phonics crap. When are the kids supposed to learn to spell correctly? Someone who knows the logic behind this program please comment and fill me in.

This is not the first time that the NYT has warped reality in order to take sides in the phonics v. whole language controversy. The last case that we discussed involved a spectacular distortion of historical fact. This time, the method is a distortion of standard and commonly-accepted word meanings.

Posted by Mark Liberman at 10:41 AM

The ecology of peevology

Over on OUPblog I write today about the use of the word carbon to generate new eco-buzzwords like carbon-neutral and carbon footprint. A letter-writer on Salon recently got into a tizzy over this point, saying that the new usage confuses carbon with carbon dioxide. As I explain in the OUPblog column, there are perfectly good reasons for referring to carbon rather than carbon dioxide in expressions like carbon-neutral, but it's nonetheless true that carbon is often used these days to refer elliptically to the emission of carbon dioxide and other greenhouse gases into the atmosphere.

In the comments section on Salon, the initial argument over carbon semantics spilled over into generalized griping about disfavored buzzwords and linguistic imprecision. As should be unsurprising to Language Log readers, the Salon commenters opened up yet another forum for linguistic naming and shaming. As is typical of such forums, the subjects of the complaints were all over the map, from dialectal variants (acrost) to pleonasms (free gift) to punctuation issues (the serial comma) to common eggcorns (hone in on) to Briticisms creeping into American usage (went missing). All in all, it's a good example of what the nonpareil linguablogger Mr. Verb has taken to calling peevology.

Mr. Verb's adoption of peevology was inspired by a recent column by Jan Freeman in the Boston Globe (though she spelled it peeve-ology), which in turn was inspired by a Language Log post I wrote a couple of years ago on peeveblogging. In his latest post on the topic, "Peevology and its semantic field," Mr. V considers whether peevology should properly refer to the collection and public airing of language peeves, or instead to the study of such peevish behavior (as Freeman had originally intended). Interestingly enough, two commenters in the Salon forum raised a parallel question with the word ecology:

Comment 1: People always tal[k]ed about being upset about the "ECOLOGY," when that wasn't what they meant at all. -OLOGY means the "study of," whether it's theology or psychology or whatever. Ecology just means the study of eco systems or the environment.

Comment 2: As a biologist I have the same reaction when someone claims something is bad for "the ecology." Ecology is a field of study, like biology, geology, and all the other -ologies. I doubt they mean that using plastic shopping bags is bad for the study of organisms and the way they interact with their environment. What people are trying to say is that plastic shopping bags are bad for the environment.

Ecology has indeed expanded its semantic range from the scholarly study of the environment to the environment itself. It's also taken on extended meanings to refer to complex systems that mirror the interrelatedness of the natural ecosystem, as in the ecology of language. Such polysemy is endemic to our linguistic ecology, so it's fitting that the neologism peevology should develop its own polysemous behavior right out of the gate.

[John Cowan writes in: "My psychology is such that I get peeved when people complain about the extension of -ology from the field of study to the object of study."]

Posted by Benjamin Zimmer at 09:51 AM

July 25, 2007

The new French: tortured by work?

The Economist (7/21/07, p. 51) reports on responses to Nicolas Sarkozy's call for the French to get down and WORK. "Sweating in Sarkoland: Coping with the irksome notion of hard work" comments:

Reconciling the French with hard work could prove ambitious. The low Latin root of the French word travail is tripalium, an instrument sometimes used for torture.

We are asked to suppose that modern French speakers using travail call to mind a torture instrument of the Inquisition. This is the Etymological Fallacy in full bloom, and in fact we've looked at travail and its etymology here on Language Log, only a couple of weeks ago.

On the other hand, it's not entirely implausible to think that some speakers of French might see an unpleasant penumbra around travail 'work', given the existence of travail 'pain, suffering'. But maybe not. How would we find out?

Note: "Just ask them" is not a good plan of research. If you just ask people to rate travail on a scale from negative to positive, how do you know which word travail they're rating? (They might even be thinking of the count noun travail 'a literary work' or one of the other items travail.) And if you ask them if the painful travail affects their feelings about the merely work-a-day travail, you're inviting positive responses, by juxtaposing the two words. So a cleverer and more indirect approach is called for.

And how would we distinguish attitudes towards words from attitudes towards their referents? After all, speakers of English might -- probably do -- have somewhat negative attitudes towards work that would show up in responses to the word work, even though English has no work 'pain, suffering'.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:18 PM

Banning "loopholes"

We've recently witnessed the legal battle over banning words like "rape," "assault," and "rape kit" in a Nebraska rape trial (here) and (here) and now Slate suggests that "loophole" should join the growing list of banned words, this time not in trials but in the context of news reporting. "Loophole," says writer Jack Shafer, is a loaded, partisan word that implies wrongdoing and scandal where none exists (he also adds the verb, "skirt," to his list of words to be avoided by the press). He finds 45 instances of this questionable, non-objective use of "loophole" that have appeared during the last six months in major US newspapers and he cites a few impressive examples.

In case people haven't noticed, language changes. "Loophole," says Shafer (with support of the OED) dates back to the 16th Century, when it referred to a narrow opening in a wall through which defenders could shoot arrows or throw stones at attackers -- a pretty useful and positive meaning in those olden times. Later, following the meaning-change process of pejoration, "loophole" took on the more sinister senses that we today associate with finding gaps in the law, such as the tax code, that allow one to "skirt" around what is legally restricted. Shafer lays this change in meaning at the feet of "rhetorical con artists."

Word meanings tend to do this and rhetorical con artists are not the only perpetrators. Some words take on more specialized meanings, as when Old English "deor," a word for animals in general, eventually specialized to refer only to those pesky animals that eat the flowers in my garden and ruin my saplings when young bucks rub the velvet off their antlers, shredding the bark. But that's only my personal pejorative sense of "deer" and mercifully it hasn't caught on. Most people regard deer as beautiful, Bambi-like creatures. Some words generalize far beyond their original specific senses ("quarantine" no longer means 40 days). Others ameliorate, turning a negative sense into a positive one over time ("marshall" is no longer a mere horse servant and "pretty" no longer means sly). So there's nothing very surprising about "loophole" joining the meaning-change process by taking on a pejorative sense. We no longer have much need for small holes through which we can shoot arrows at our attackers anyway, so the word is up for figurative grabs.

Interestingly, Bryan Garner, in his Dictionary of Modern Legal Usage, says Shafer may be a bit off in his etymology. Garner says that the word, "loop," in the compound "derives from the medieval Dutch verb, lupen, meaning to lie in wait, watch, or peer." He reports that by the end of the 17th Century, it "took on its figurative sense in reference to an ambiguity, omission, or exception in a statute or other legal document. Today this figurative sense prevails." (538)

Whether derived from a narrow hole in a wall or from the Dutch sense of lying in wait, we find "loophole" in our news almost daily and Shafer may be right that using it can insert a bit of ideology into news reports that may be expected to be objective. Geez, language is complicated, but that's why it's so much fun.

Posted by Roger Shuy at 01:44 PM

I18N invective

In this digital and international age, it's hard out there for a Bowdler. Just think how tough it is to find all the spammers' creative ways of spelling words that they hope will attract an occasional sucker.

My own introduction to the censor's side of this duel of wits came in 1982, back when Disney World's Epcot ("Experimental Prototype Community of Tomorrow") first opened. AT&T sponsored a large exhibit there, and one of the initial installations was a real-time speech synthesizer that I had helped to develop. The idea was that visitors could type text on a keyboard, and hear the synthetic results immediately every time they hit 'enter'. (This was so long ago that the system ran on a PDP-11, or perhaps it was clockwork, I'm not sure...)

Anyhow, the Disney people immediately saw that we'd have to figure out how to thwart kids who type taboo words or phrases. So of course we added a list of words to the pronouncing dictionary, all with the pronunciation "cough" (or sometimes "cough cough"). But this wasn't really enough, of course, since we also had letter-to-sound rules, and with a bit of effort kids could figure out how to get the system to deliver their message. (Of course we tried to forestall this as well, but in the battle between censorship and creativity, censorship usually loses sooner or later.)
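For the programmers in the audience, here is a toy sketch of why the dictionary trick leaks. It is not the Epcot system's code: the placeholder words, the `PRONUNCIATIONS` table, and the `letter_to_sound` stand-in (which merely spells out letters rather than applying real rules) are all assumptions made for illustration.

```python
# Hypothetical stop list; mild placeholder words stand in for the real taboo entries.
TABOO = {"darn", "heck"}

# Toy pronouncing dictionary: every taboo entry gets the pronunciation "cough".
PRONUNCIATIONS = {"hello": "HH AH L OW"}
PRONUNCIATIONS.update({word: "cough" for word in TABOO})


def letter_to_sound(word):
    """Stand-in for real letter-to-sound rules: just spell out the letters."""
    return " ".join(word.upper())


def pronounce(word):
    """Consult the dictionary first; fall back to the rules for unknown words."""
    key = word.lower()
    if key in PRONUNCIATIONS:
        return PRONUNCIATIONS[key]
    return letter_to_sound(key)


print(pronounce("darn"))   # caught by the dictionary entry
print(pronounce("darrn"))  # not in the dictionary, so the rules handle it
```

Any respelling that misses the dictionary goes straight to the fallback rules, which is exactly the gap the kids found.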

I was reminded of this by an email from Mike Albaugh:

Reading your "Expressions of negative Clippy feelings" post I flashed back to two memories.

In one, a friend was bitterly criticized on an online "forum", because he had ported a coin-op game to the PC, and left the "stop list" of words that users should not be able to enter as "names" in the high-score list. Someone had run "strings" or the DOS equivalent on the game, and was _outraged_ that three of the words, adjacent simply because they were alphabetized, formed what the complainer considered a phrase, and an obscene, racist one at that. No amount of explanation on my friend's part would mollify this possible troll.

Perhaps for similar reasons, the VMS "password suggestion" feature "encrypted" its stop-list. Not industrial-strength, but a bit more than ROT-13, IIRC. So of course folks had to figure it out and someone posted the list to comp.os.vms or the like, whereupon various people commented on what languages the forbidden words came from. Many were easy, but one stayed unattributed for a few days, until a post saying "It's Turkish. Don't ask".

I mention this because I can imagine, what with the deep well of love for Microsoft in the world at large, that they might fall prey to similar ill-will for simply having such a list for many languages. I hope they have learned about hashes.

Well, I'm highly confident that there are many people in Microsoft R&D who know about hashes. I'm somewhat less certain that the software engineers involved will have seen this one coming -- but I guess by now they've probably experienced, at least once, pretty much every way that ill-wishers of one kind or another might approach their creations.
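To make Mike's point about hashes concrete, here is a minimal sketch of a stop list stored as digests rather than as readable strings. It is an illustration only, not anything Microsoft or the game's author is known to have done; the function names and the placeholder words are hypothetical, and the hash is plain SHA-256 from Python's standard `hashlib` module.

```python
import hashlib


def digest(word):
    """Return the SHA-256 hex digest of a normalized (stripped, lowercased) word."""
    return hashlib.sha256(word.strip().lower().encode("utf-8")).hexdigest()


# In a shipped product only the digests would be embedded. The plaintext
# placeholder words appear here solely to keep the example self-contained.
STOP_DIGESTS = frozenset(digest(w) for w in ("darn", "heck"))


def is_blocked(name):
    """True if the candidate high-score name matches an entry on the stop list."""
    return digest(name) in STOP_DIGESTS


print(is_blocked("Darn"))   # True
print(is_blocked("hello"))  # False
```

Someone running "strings" on such a binary would see only hex digests, with no alphabetized run of words to take offense at. An unsalted hash of short dictionary words can still be reversed by hashing a word list and comparing, so this is obfuscation against casual inspection rather than real secrecy.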

Another response to the Clippy post: Joseph Kynaston Reeves writes:

Just read your fascinating Language Log post about how Clippy responds to verbal abuse, so (like everyone else who read it, I suspect) I gave it a try. The results are generally as you say, with one notable exception: tell Clippy "Fuck you" in Word and the first template he offers you is "Thank-you for job interview". Genius.

"Mute inglorious Milton" doesn't quite cover this one. "Mute inglorious Dilbert", maybe.

Posted by Mark Liberman at 06:57 AM

No Singhs or Kaurs Need Apply

According to recent CBC news reports, Immigration Canada has a policy of denying permanent resident status to Sikhs whose last name is Singh or Kaur on the grounds that these names are too common. Such people can immigrate to Canada, but only after legally changing their names to something less common.

A bit of background on Sikh naming practices may help to explain this. Sikhs use a set of several hundred given names, all or nearly all meaningful, usually with a religious or moral theme. For example, ਉੱਜਾਲ Ujjal means "bright, clean, holy", ਹਰਪ੍ਰੀਤ Harpreet "God's love", ਗੁਰਪ੍ਰੀਤ Gurpreet "one who loves the guru", ਸਿਮਰਨ Simran "meditation on God's name", ਲੱਖਬੀਰ Lakkhabeer "as brave as 100,000". A family often selects a name for a child by opening the Sikh holy book, the ਗੁਰੂ ਗਰਨਥ ਸਾਹਿਬ /gurū granth sāhib/ to a random page and choosing a name that begins with the first letter of the first word on the page. Sikh given names are not gender-specific. In addition to a given name, a Sikh will usually have a second name, which may be a family name or a caste name.

All Sikhs are expected to aspire to a higher status, that of ਖਲਸਾ /khalsā/ "pure". A Khalsa is a person who is fully dedicated to Sikhism, has shed his or her ego, and truly honours the memory of Guru Gobind Singh through his actions and deeds. The ceremony in which a Sikh becomes a Khalsa is referred to in English somewhat misleadingly as "baptism". Khalsa are therefore referred to as "baptized Sikhs". On becoming a Khalsa, the Sikh undertakes the obligation to wear the physical symbols of this status (unshorn hair, comb, steel bracelet, undershorts, and dagger) and takes the name ਸਿੰਘ "lion", usually romanized as Singh, if a man, or ਕੌਰ /kɔr/ "princess", usually romanized as Kaur, if a woman. (Note that Singh is spelled irregularly: it is written ਸਿੰਘ /siṃgh/ but pronounced /sɪŋ/.) These names reflect the strong egalitarianism of the Sikh religion. They were originally intended to replace the Sikh's original surname, which as often as not was a caste name, thereby making everyone the same caste.

Some Sikhs do replace their original surname with their Khalsa name, but many retain their original surname and add the Khalsa name before it. Thus, a man born ਸੰਦੀਪ ਬਰਾਰ Sandeep Brar may become ਸੰਦੀਪ ਸਿੰਘ Sandeep Singh but more likely will become ਸੰਦੀਪ ਸਿੰਘ ਬਰਾਰ Sandeep Singh Brar. Similarly, a woman born ਹਰਪਰੀਤ ਗਿੱਲ Harpreet Gill may become ਹਰਪਰੀਤ ਕੌਰ Harpreet Kaur or ਹਰਪਰੀਤ ਕੌਰ ਗਿੱਲ Harpreet Kaur Gill.

The result of this is that the surnames Singh and Kaur are especially common. Apparently, Immigration Canada thinks that they are too common. This seems a bit odd since only about 10% of Sikhs are Khalsa, and only a subset of these replace their surname with their Khalsa name.

It is also a little odd that Immigration Canada hasn't encountered similar problems elsewhere. 22% of Koreans have the family name 김 (金) Kim, almost certainly a considerably higher percentage than Singh or Kaur. And what about countries like Indonesia in which many people have no surname at all?

Finally, it is hard to understand what purpose this ban serves. If Immigration Canada is concerned about distinguishing one person from another once he or she is a Canadian national, surely this can be done by means other than the name. For example, every Canadian national is assigned a Social Insurance Number, which, unlike a name, is unique. Or they could use a case number, which I believe they already do. On the other hand, if Immigration Canada is concerned with distinguishing one person from another in records pertaining to life in India, e.g. for purposes of criminal record checks or confirmation of educational credentials, forcing the applicant to change his or her name will not make things easier: the records will still use the original, potentially ambiguous, name. It seems, therefore, that this policy is motivated by a very small number of actually ambiguous names and serves no useful purpose in better identifying applicants.

Addendum 2007-07-25: Some of our readers think that this is a hoax or misunderstanding. It certainly doesn't look like it. Not only have there been two successive CBC reports, but one of them contains a link to a PDF of a letter citing this policy received by an applicant for permanent resident status. The Times of India now has a story on this, with the byline of its own reporter, therefore presumably not merely a rewrite of the CBC story. Moreover, the comments following the posting of one of the CBC articles on Sikhnet contain reports from people indicating that they have encountered this policy.

Further update: according to a new CBC news report, Immigration Canada is now saying that its policy has been misunderstood and that the letter sent to applicants is poorly worded. The actual policy, they say, is to ask applicants who give their surname as Singh or Kaur to supply their third name if they have one. An official statement by the Minister of Citizenship and Immigration may be found on the Ministry website here.

It isn't entirely clear what is going on here. In light of the letter to Jaspal Singh made available by the CBC, it seems that the policy as presented to applicants is indeed that they may not have the surname Singh or Kaur. The letter says:

Please note that your surname must be endorsed on your passport. The names Kaur and Singh do not qualify for the purpose of immigration to Canada.

It may be that the letter does not reflect the intended policy, or it may be that it did and that the "clarification" offered by Immigration Canada reflects backpedaling.

Update 2007-07-30: According to this CBC report, Sikhs are not satisfied with the response of Immigration Canada and are pressing for a change in policy.

Posted by Bill Poser at 01:56 AM

July 24, 2007

Speech events in a kickback case

Readers of Language Log will recognize that linguists deal with language principles and structure on many levels, including language sounds, morphemes, words, sentences, semantics, pragmatics, speech acts, discourse, variation, and change. Here I'll illustrate how the use of another unit of language, the speech event, works in the forensic linguistics context.

The words uttered on a tape recording in a sting operation are not always what they may seem. Take, for example, the case (back in the 80s) of a Texas legislator who was approached by an undercover FBI agent who claimed to be representing a large, well-known insurance company. The agent's plan was to get the legislator to open the bidding on that state's insurance program so that his company could obtain the contract. So far, so good. But the agent made two offers simultaneously, one legitimate (to make a campaign contribution) and one illegal (to split the agent's commission for getting his company in -- a kickback). More to the point here, the structure of the business speech event sheds light on what was wrong with the government's case. A much shortened version of their conversation follows:

Agent: There will be a savings of approximately a million bucks.

Legislator: Anytime you can save the state a buck by God, I'm for it.

(agrees to the idea of saving the state money)

Agent: We want to make a contribution to your campaign.

Legislator: Let's get this thing and try to take care of it first. Then let's think about that.

(tries to separate the business offer from the campaign contribution. Note his use of "this" and "that" here)

Agent: I will, whatever you want to run. $100,000 going in and we can prepare to put a half a million.

(sounds like a campaign contribution, but wait...)

Legislator: Anytime you can show me where you can save the state money well, by God, I think that's what part of my job is, try to save the state.

(back to the business discussion, not the campaign contribution)

Agent: There's $600,000 every year. I'm keeping 600 and 600 whatever you want to do with it to get the business.

(the money is his to do with what he wants, for the insurance company to get the business, a quid pro quo)

Legislator: Our only position is we don't want to do anything that's illegal or anything to get anybody in trouble and you all don't either. And that's as legitimate as it can be because anytime somebody can show me how we can help save the state some money, I'm going to bat for it. But you know it'll be reported.

(refuses anything illegal, goes back to the business proposal, and says he'll report any potential campaign contribution)

Agent: Why do you have to report it?

Legislator: Well, I don't want to get into no damn tax-- (interrupted by agent)

Agent: You can report it later on. Put it away because we're talking about -- (interrupted by legislator)

Legislator: No, no, no.

(refuses putting it away and not reporting it)

This was a business conversation. Many language crimes, such as kickbacks, bribes, or solicitations to murder, take place in a discourse format or genre of a business conversation. Linguists call formats such as this, "speech events." Dell Hymes originally called them "communicative events" but the term, "speech events," has become more common. Quite simply, speech events are structured activities that are governed by rules or norms for the use of speech. Participants in such events use language that reflects the way people belong to or are involved in the social life of that specific community. Examples of speech events discussed in the past include telephone conversations, lectures, prayers, jokes, business meetings, and interviews. Each has a discourse structure that identifies it for what it is.

The speech event of a business meeting has six phases:

1. Introduction: ritualized greetings, small talk, establishment of mutual authenticity

2. Present the problem: need for services or products in exchange for payment; the "why we are here" part.

3. Present a proposal to solve the problem: offer of services or products for payment, or vice versa, plus negotiation.

4. Completion: verbal acceptance or rejection of the offer and negotiation of phase 3, ending with verbal agreement or disagreement; if agreement, usually a handshake or signing of a contract.

5. Extension: if agreement is reached, discussion about future possible deals takes place.

6. Closing: ritualized closing talk, thanks, small talk; if no agreement, the closing is briefer.

As simple as the structure of a business speech event may seem, when criminal proposals are on the table, things may get confusing for the prosecution. And that's where a linguist can be helpful. Undercover agents typically propose an illegal activity to targets to see if they will bite. If the targets accept the offer, such as a kickback or bribe, or show willingness to have somebody killed, they are dead meat. But sometimes things are not that simple, making linguistic analysis useful. In this sting case, the agent confused the proposal phase by offering to give the legislator a contribution for his forthcoming reelection campaign (perfectly legal in that state at that time), while simultaneously offering him an illegal kickback.

The government heard the kickback offer but apparently failed to notice how the legislator separated the two offers, refusing the kickback, but leaving open the possibility of a campaign contribution after they took care of the legitimate business offer to save the state money. Nevertheless, the legislator was indicted and went to trial.

In terms of the structure of a business speech event, it is clear that the conversation discussed here never got through the phase 3 proposal to do business, and it clearly didn't reach the phase 4 completion. The legislator agreed only to the idea that saving the state money would be a good thing, part of phase 2. Nothing illegal about that. In phase 3, the kickback offer was presented, contaminating the tape with whatever illegality existed here. Somehow, the prosecution overlooked the fact that the legislator disagreed with the kickback. It apparently took his agreement to save the state money as evidence of his guilt and it also overlooked his disagreement with the agent's suggestion not to report the possible campaign contribution. In short, the prosecution appeared to be so enamored with the possibility of a kickback that it was blinded to what actually happened in this business speech event.

If it had examined the conversation from the perspective of a business meeting speech event, the government could have saved much time and taxpayer money on this prosecution. The case went to trial and the jury wisely acquitted the legislator. Good intelligence gathering and good intelligence analysis could have prevented the whole thing.

Posted by Roger Shuy at 02:45 PM

Sí se puede

I don't often listen closely to Marketplace, but this story in yesterday's edition caught my ear:

Huddled masses yearning to learn free

With English proficiency becoming an increasingly crucial skill for U.S. immigrants, classes to learn the language for free fill up fast. Jessie Graham reports on the demands facing English-language programs in New York.

It's a very short story, but it highlights the difficulties that immigrants face when they really want and need to learn English. Even when cost is not a factor (the classes discussed in the story are free), there's a massive scheduling problem in both directions: finding a class that's not full, and finding a class that fits into your work schedule (particularly hard if you're working long hours or more than one job). This is the kind of thing that rarely gets this kind of undivided attention in debates about the enforcement of English as the official (or "national", or whatever) language of the United States, so I'm particularly glad to see it get this kind of attention on Marketplace.

I'd certainly like to see a sharp increase in the number/convenience/availability of (free/subsidized) English language classes in a successful immigration reform bill, regardless of the official language question. Some of our elected public officials have other ideas about how to learn English, though.

Last month, California Governor Arnold Schwarzenegger further endeared himself to admirers of his "straightforward" approach when it comes to politically sensitive matters at the annual convention of the National Association of Hispanic Journalists.

[R]esponding to a question about how to help struggling students, [the Governor] said they should "turn off the Spanish television set. It's that simple. You've got to learn English." That remark set off a debate with NAHJ taking the position that the governor made a good point -- poorly. (link; see the full NAHJ response here)

Needless to say, these remarks sparked a flood of responses -- positive, negative, and somewhere in between -- in the opinion sections of newspapers in California (and no doubt elsewhere). The San Diego Union-Tribune had its share, which I'd like to share here with you.

"Solid advice: Governor is right on English-immersion"
(June 19 editorial)
"On English, the governor is one to talk"
(June 20 column by Ruben Navarrette)
"Governor's advice on English immersion"
(June 21 letters to the editor)
"Gov partly right but mostly wrong"
(June 25 column by Maria Elena Salinas)
"Columnist's advice judged mostly wrong"
(July 2 letters to the editor)

In particular, I'd like to draw your attention to this June 28 letter from my UCSD colleagues John Moore (Linguistics) and Ana Celia Zentella (Ethnic Studies): "Learning a second language: When simple solutions and anecdotes collide with the facts". Moore and Zentella begin with this observation:

Invoking simple solutions to complex problems is an easy and effective rhetorical device. No need to do research, check facts, consider complexities -- just assert the solution and, as long as it is close enough to what people already believe, the argument is won.

The letter ends with the following food for thought.

Rarely do politicians think to consult language researchers when dealing with linguistic problems. The governor seems to think that his recollection of his own experience with learning English is enough evidence to know how to deal with complex issues of second-language acquisition and literacy among poor immigrants under very different circumstances. However, we still harbor hope that research and facts might occasionally trump a facile appeal to personal anecdotes, so often invoked in political discourse.

Amen.

Posted by Eric Bakovic at 01:42 PM

PNAS embargo policies considered annoying

Like most high-profile scientific journals, the Proceedings of the National Academy of Sciences of the United States of America sends journalists preview copies of forthcoming articles, which they are instructed to treat as "under embargo" until a designated time.

The idea is to allow the journalists to study the article in advance, get quotes from experts, and prepare a story that can run at the same time that the scientific article actually "appears" (which these days generally means release on the web in advance of the pro forma paper publication).

For some reason, PNAS seems to schedule its "embargoes" to expire several days before the article is actually available to the public. (Perhaps some other journals do this too, but I haven't seen it.) I mentioned an instance of this a couple of months ago in the case of Dediu and Ladd's paper on possible gene/tone connections, and I've experienced it silently quite a few other times.

The most recent case is some interesting-looking work by Jay McClelland and others on an application of machine-learning techniques to induction of vowel categories in motherese. Although this is now hitting the popular press, the paper in question isn't on the PNAS "Early Edition" yet. If previous practice is a guide, it'll appear some time later this week, perhaps as late as Thursday or Friday.

This is manifestly unfair to bloggers. We work faster than journalists do, but we don't have time machines. The way PNAS plays the game, the old media get several days to tell the story their way, before we even see the original paper.

Let me be clear -- I don't in any way oppose the embargo concept. It's a Bad Thing to have journalists "explaining" a putative scientific result, when there's no way for people to get access to details about the research in question.

The worst case is when there's never any paper at all, just a mass journalistic confusion-fest like the "email lowers IQ more than pot" or "20 words make up a third of teenagers' speech" or "cows have regional dialects" foofaraws. Not far behind is the egregious misrepresentation of leaked drafts of politically-sensitive scientific reviews.

But if you delay public access to a paper until several days after the press has had a chance to "explain" it, you're taking a step in that same bad direction. So, PNAS, shape up and fly right!

Posted by Mark Liberman at 06:52 AM

Men are from ...

Ellen Goodman's column for 7/20/2007 ("The mythical chat gap") has a quote from me, but I like Janet Hyde's quote better: "Men are from North Dakota and women are from South Dakota."

[Update -- Josh Millard observes:

So, knowing that there's this old classic, and inspired by today's reference, I hopped over to Google to see what the other side of that gimme of a construction looks like, and, well, there's a heck of a lot of variation there.
Too busy to do the proper digging at the moment, but it's got some entertaining promise.

Indeed. ]

Posted by Mark Liberman at 06:14 AM

July 23, 2007

Just because people visit a whorehouse

From Frank Rich's Sunday July 22, 2007 NYT column; it's behind a pay-me gate, but I've put a copy here.

"Newspapers back home also linked the senator to a defunct New Orleans brothel, a charge Mr. Vitter denies. That brothel's former madam, while insisting he had been a client, was one of his few defenders last week. "Just because people visit a whorehouse doesn't make them a bad person," she helpfully told the Baton Rouge paper, The Advocate."

This was fun to read because it sounds authentic (but I realize we have no way of being sure of that) and has one of my favorite constructions and may or may not have another of my favorites but interesting either way.

One is the "Just because X doesn't mean Y" construction, here in the variant "Just because X doesn't make Y Predicate". After years of foolish disdain under the influence of my prescriptivist upbringing, I realized that this hopelessly "ungrammatical" construction is our most unambiguous way of negating an "if-then" construction, specifically negating the conditional connection itself while remaining uncommitted as to the truth of the consequent. I don't know of any other way to negate a conditional that is both unambiguous and colloquial.

I thought there must have been some work on this construction sometime, and sure enough, around the water cooler at Language Log Plaza Arnold Zwicky told me about a nice article due to appear, treating the syntax, semantics, and pragmatics of the construction with construction grammar:

Bender, Emily M. and Andreas Kathol. To appear. Constructional Effects of Just Because ... Doesn't Mean ... BLS 27. (Also here: www-csli.stanford.edu/~bender/papers/bender_kathol01.ps ) -- it contains references to earlier work on the construction as well.

And the other phenomenon the quoted sentence may or may not illustrate is "singular "they"", discussed quite a lot on Language Log -- here and here, for instance. I see why I felt unsure: There are three expressions connected to one another by anaphora or predication: people, them, a bad person. If you just look at people ... them, then them looks like a normal plural. But then there's make them a bad person. That isn't definitive either, but it favors singular "they". This may be the sort of intermediate case that softens us up and helps singular "they" enter the language without much notice.

Posted by Barbara Partee at 03:14 PM

And now, a compliment machine

The Washington Post tells us about a candycane striped machine that sits in front of a shop on 14th Street in DC, randomly spewing out compliments to pedestrians as they walk by, including:

You help create a brighter future.

People are drawn to your positive energy.

You don't hate the player or the game.

The creator of this fine machine, a local artist, says that he constructed this invention to make people feel good, whether they believe the messages or not. His intentions are admirable, of course, but he may be missing the point that compliments lose a lot of meaning when they're automated that way. Suppose he had developed a thanking machine, for example? Or an apology maker? Or a contraption to make our complaints for us? They would save us a lot of trouble but would seem sadly lacking in sincerity.

Okay, don't tell me; I know. The greeting card industry already creates speech acts for those who are too lazy to produce their own or are possibly too language-impaired to even try. But at least greeting cards, however lame they are, aren't uttered randomly and they're usually sent from one known person to another.

Both compliment machines and greeting cards (maybe I should add fortune cookies) leave a lot to be desired. Maybe I'm just an old curmudgeon, but to me it would seem better if people would do their own complimenting, thanking, and complaining -- all by themselves to specific receivers, and in their own relevant and specific words, however hard this task may seem.

Posted by Roger Shuy at 01:50 PM

Shack!

Norimitsu Onishi, "Bomb by Bomb, Japan Sheds Military Restraints", NYT 7/23/2007:

To take part in its annual exercises with the United States Air Force here last month, Japan practiced dropping 500-pound live bombs on Farallon de Medinilla, a tiny island in the western Pacific's turquoise waters more than 150 miles north of here.

The pilots described dropping a live bomb for the first time — shouting "shack!" to signal a direct hit — and seeing the fireball from aloft.

This use of shack is not in the OED, but Grant Barrett has it covered:

shack n. a direct hit on a target by a bomb or missile.

But Grant's earliest citation is from 1998, and I know from personal experience that it goes back to Vietnam in the 1960s. Here's an interview with Roger Preu from the Stamford Historical Society, about his experiences in WW II, showing that it goes back to 1943, at least as a term for the target in bombing practice:

And if you hit the target, the shack it was called, it was built up where you'd see it.

And though shack probably was a noun to start with, you won't be surprised to learn that it's used as some other part of speech as well. The example from today's NYT suggests that it's often a kind of specialized interjection. Here's an article from a U.S. Air Force Source ("Raptor drops first bomb", 10/21/2005) where it's used as a verb:

After watching the first bombing flight through a live television feed, the colonel said he could tell it was a successful event, but not where the bombs hit.

Hill's Weapons Systems Evaluation Program operators verified the bombs not only hit the targets, they "shacked" them.

"That's a fighter-pilot term for when you hit a target dead center -- a bull's-eye." said Capt. Shawn Anger, 43rd FS air-to-ground weapons chief. "Hit criteria will vary depending on the size of the target and the munitions, but when you put the bomb directly in the center of the target it's a shack."

And here (Mark Bowden, "The Kabul-ki Dance", The Atlantic 11/2002) it's an adverb:

The bomb hit "shack on," or dead center, and the SAM launcher vanished in a satisfactory black splash on the monitor.

[Update -- Benjamin Zimmer located this from the Nov. 1956 issue of Boeing Magazine (misdated as 1934 by Google Books, following its usual unfortunate practice of dating all issues of a serial in terms of the earliest issue):

"It's a shack!" someone yelped. "Shack" is a bombing man's term for bull's-eye, dating from the days when the usual target on a bombing range was indeed a shanty built of boards. Sure enough, it was a perfect hit on a hat-sized target at Springfield, Massachusetts — the only "shack" of the day among the heavy bombers: the all-jet B-52s and the piston-powered B-36s.
...
And at Montreal, third target city, it was a storage tank. But not the whole tank. Merely the geometrical center of a circle formed by a small railing atop the tank. Hit that, dead center, and you've got a "shack."

]

[Update -- Jim Gordon writes:

The usage was first and foremost by bomber crews of the 1940s, because fighters and dive-bombers used flat targets. Ben Zimmer's citation has the original usage and the using population. The WWII Norden bombsight worked best when the target was a structure with some vertical dimension, rather than being a flat surface on the ground. Thus, the bombing ranges used to train B-17, B-24 and B-29 crews had rudimentary shacks built as/at the aiming point for dummy or practice bombs. (I'm not sure whether smaller Air Corps bombers used them, nor do I know which Navy bombers carried them -- the Norden bombsight was invented for the Navy.)
As an Air Force brat, I regularly heard and read the term used by SAC aircrew in the early 1950s, and by the Air Defense Command pilots and navigators who flew "aggressor" missions, simulating the Russian bombers that we all expected at any moment, to test the air defense system of the mid-1950s. The measurement equipment of the 1950s made it unnecessary to drop even a dummy "shape" (bomb), instead using a rudimentary computer in a truck-transportable box or shack. If one looked at SAC publications of the 1950s, intended to propagandize the rest of the Air Force and other services, reports of the annual bombing competitions would include use of "shack."
When I served in the Air Force 1966-1970, the term had shifted away from bomber-crew use as atomic bombs became the weapon of the day and iron bombs were disdained in SAC. With atomic bombs, the targets were much larger "target islands" and precision was relative. The term was largely replaced by references to the "circular error" (a term much used by the end-of-WWII Strategic Bombing Survey of European targets), or by references to the distance of impact from the desired ground zero (DGZ); If they hit the DGZ, they had "zero error" or "zero C.E.P." (sic -- The abbreviation for the pre-drop estimate of the expected "Circular Error, Probable" became the term for the result.) As fighters were equipped with more complex, computing bombsights, late-Vietnam era, the shack term moved into use in the fast-movers' culture.

]

Posted by Mark Liberman at 11:59 AM

"Expressions of negative Clippy feelings"

Michael Kaplan, who works on "internationalization and localization issues" for Microsoft, especially "collation and keyboard issues", recently posted some useful information about Clippy:

I had a friend complain to me the other day (the way that all folks who have friends working at Microsoft tend to do) about Clippy and how to turn him off in Office 2003.

Now I have mentioned before that Clippy is off in the default install and has been for a few versions now.

But I figure if even Charles Simonyi can be confused by it then I suppose anyone can. :-)

So I remembered an old trick someone had mentioned to me and asked my friend "Have you tried being rude to him?"

"What do you mean?" she asked me. "How can you be rude to a talking paper clip?"

"Well," I suggested, "try venting your anger at him. Tell him in a few concise words how you feel about him."

Here's an example of Michael's suggested remedy:

He explains further:

After telling Clippy this, the first item on the list explains how to change the Office Assistant, and the second item explains how to hide or show it.

Now this is obviously not the only way to find the message, but I find three different language issues amusing here:

An amazing number of people use this exact phrase;

There are reportedly many other expressions of negative Clippy feelings that will have the same effect on search in help;

There are disadvantages to a formal education that make this method of finding a solution less obvious.

And this immediately brings up some questions:

I wonder how sophisticated the "unhappy user" detection is here in language. And whether it has been appropriately localized.

Apparently Michael, who works on collation issues in Redmond, is not on the mailing list for the Translingual Cussing Committee. (Perhaps it's run out of Microsoft's new Bishkek Lab...)

Michael's questions raise some problems in machine learning. Some of the engineering issues now are covered under topics like "sentiment detection". More scientifically, we could ask for a discovery procedure, applied to a corpus of texts in an (otherwise unknown) new language, that will find all and only the cuss words. (By which I mean something like "taboo expressions of negative affect" -- though it's not easy to define this across languages and cultures. More links on this are here.)

Imagine if Zellig Harris, a half a century ago, had assigned that problem to Noam Chomsky, rather than the (easier?) task of inducing syntactic structures!

Posted by Mark Liberman at 05:56 AM

July 22, 2007

Wait, wait, don't tell me

In Prague last month, the Association for Computational Linguistics honored Lauri Karttunen with its Lifetime Achievement Award for his contributions to the field. In his presentation of the award, Mark Steedman mentioned Lauri's work on discourse semantics, unification-based parsing, and finite-state-based approaches to morphology and syntax, any one of which would have justified the award by itself. But before he elected to sully his hands with computation, Lauri had a notable career as a linguistic semanticist, where he made a specialty of raising puzzles about topics like presupposition and reference and anaphora that researchers are still scratching their heads over several decades later (try doing a Google search on "Karttunen plugs OR paycheck"). At the end of his ACL acceptance speech, Lauri suggested that some of those semantic issues are now becoming relevant to NLP, with the advent of search engines that actually make use of semantic processing in addition to simple string matching, so as to be able to draw textual inferences. By way of example, he mentioned the classification of complement constructions, and then, faithful to his past practice, he left the audience with a little conundrum to puzzle over, which I hereby pass on to you, gentle readers. Riddle me this one, and no fair peeking at his ACL slides.

The construction didn't wait to is ambiguous. Here are a couple of examples from Google to illustrate the ambiguity.
(21) a. Deena did not wait to talk to anyone. Instead, she ran home.
b. It hurt like hell, but I'm glad she didn't wait to tell me.

(21a) implies Deena did not talk to anyone. But (21b) implies She told me something right away.
Question 1: How does it come about that X didn't wait to do Y means either that X did Y right away or that X didn't do Y at all?

When you look at examples with didn't wait to in their full context, it is nearly always possible to tell which of the two meanings the author has in mind. In (21a), for instance, the negative polarity item anyone and the word instead are telltale indicators. In (21b), the cataphoric pronoun it indicates that a telling event took place. I am sure that it is possible to learn to pick the intended meaning by statistical techniques. But statistics alone will not give you an answer to Question 1, nor will it solve the related problem in Question 2.
Question 2: Why is it not possible to translate expressions such as Neil didn't wait to take off his coat to other languages in a way that preserves the ambiguity the sentence has in English?

In languages such as Dutch, Finnish, French, German, Hungarian, and Japanese among others, it is of course possible to express the two meanings of X did not wait to Y but not in one and the same sentence.
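
To make the remark about telltale indicators concrete, here is a minimal sketch in Python of a cue-based guesser for the two readings. It is only an illustration under stated assumptions: the cue list `NOT_Y_CUES`, the function name `guess_reading`, and the fallback to the "right away" reading are all invented for this sketch and are not taken from the ACL slides. It covers only the two cues named above (a negative polarity item such as anyone, and the word instead); it makes no attempt at the cataphoric it cue, and it says nothing about Question 1 or Question 2.

```python
import re

# Hypothetical cue list, seeded from the two indicators named in the text:
# negative polarity items like "anyone" and the word "instead".
NOT_Y_CUES = re.compile(r"\b(anyone|anybody|anything|instead)\b", re.IGNORECASE)

# Matches "didn't wait to" and "did not wait to".
WAIT_TO = re.compile(r"\b(?:didn't|did not) wait to\b", re.IGNORECASE)


def guess_reading(passage: str) -> str:
    """Return 'not-Y', 'Y-right-away', or 'no-match' for a passage."""
    if not WAIT_TO.search(passage):
        return "no-match"
    # An NPI or "instead" anywhere in the passage suggests Y never happened.
    if NOT_Y_CUES.search(passage):
        return "not-Y"
    # Assumed default when no cue is found; a statistical model would
    # learn this decision from labeled examples instead.
    return "Y-right-away"


if __name__ == "__main__":
    print(guess_reading("Deena did not wait to talk to anyone. Instead, she ran home."))
    print(guess_reading("It hurt like hell, but I'm glad she didn't wait to tell me."))
```

A guesser like this can pick a reading for a given passage, which is exactly why it is no answer to the puzzle: it treats the ambiguity as given rather than explaining where it comes from.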

Posted by Geoff Nunberg at 02:28 PM

Yes, that about covers it...

User Friendly for 7/17/2007:

To clarify the gruntsponge reference, see here.

Posted by Mark Liberman at 01:48 PM

One will get you four more

On Monday Alison Murie asked on the American Dialect Society mailing list about prepositions in clock time expressions -- a quarter to/till/until/before/of ten (to which we can add past/after, though I've been concentrating on the other set) -- and people began expressing an assortment of preferences. Since I've been laboring on factors that favor the choice of one lexical variant over another, and in fact am teaching a course on the topic at the LSA's Linguistic Institute this month (course materials available here), the question looked right up my alley, so I went to see what people have said about these prepositions. Damn little, as it turns out.

But my searches through dictionaries, reference grammars, style manuals, and advice books of all sorts led me to four phenomena that I hadn't thought about before, all involving the preposition of. This happens to me a lot: I look for one thing and stumble on others. (I will, apparently, never lack for things to think about.)

First, the time prepositions. Geography plays a role. The OED notes that temporal of is "N.Amer., Sc., and Irish English (north.)", and that accords pretty well with my experience in England: I quickly learned to replace my Yank of with another preposition; in the U.S., my usual alternative is to, but in England I tended to favor before, because it's semantically transparent (not any sort of idiom) and therefore entirely safe. Within the U.S., it looks like to is especially common in the Northern dialect area, and Joan Houston Hall writes to say that the Dictionary of American Regional English will label till as "widespread except Northeast, Great Lakes". Given its distribution in the British Isles, I'd expect of to appear especially in areas with early Scots-Irish influence (which would include the Midland areas of the U.S. and parts of eastern Canada, but not the rest of that country, and probably would include Australia as well), and not to be used in AAVE; DARE has it as widespread, but especially Northeast and central Midland.

Whatever the geographical and social distribution of the variants, the fact remains that a great many people use two or more of them, and the question that really interests me here is what influences their choices. Since the advice literature on grammar, style, and usage tends to adhere to the principle that there is Only One Right Way, I'd have expected this literature to be directive. But so far I haven't found a prescription anywhere. The Chicago Manual of Style (15th) has at least one example (He left the office at quarter of four, p. 391) with temporal of (showing that CMS is indeed an American style manual), but it illustrates a point other than preposition choice (and, incidentally, also illustrates a choice between a quarter and plain quarter).

There are lots of possible factors, stylistic and structural and maybe even semantic, that might be relevant. Maybe, for instance, some people's preferences are different for ten minutes P T (where P is the preposition and T is the "goal" time), ten P T, (a) quarter P T, and elliptical variants (Let's meet at ten to, Let's meet at a quarter of). Maybe it makes a difference if T is just a number (ten minutes to five), or has o'clock expressed (ten minutes to five o'clock), or is noon/midnight.

I don't at the moment know a thing about these questions, though I do know that just asking people about their (or other people's) preferences or practices is not likely to produce accurate data. Such reports are notoriously unreliable. I don't trust my own reports, in fact. We have to study what people actually do, in what circumstances, and that's not an easy task.
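
For what it's worth, a first pass at "what people actually do" could be as crude as the following Python sketch, which tallies the preposition used in each frame (ten minutes P T, ten P T, (a) quarter P T) over a body of text. Everything in it is an assumption made for illustration: the word lists, the pattern `CLOCK`, and the function name `tally_clock_prepositions` are hypothetical, the pattern ignores digits, o'clock, and the elliptical variants, and it will happily miscount strings like five to ten people. Real data would need hand-checking.

```python
import re
from collections import Counter

# Hypothetical word lists; a serious study would need much fuller coverage.
MINS = r"twenty-five|twenty|ten|five"
HOURS = r"one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve"
PREPS = r"to|till|until|before|of"

# frame = "(a) quarter" or a minute count with optional "minutes";
# prep = the preposition P; goal = the "goal" time T.
CLOCK = re.compile(
    rf"\b(?P<frame>(?:a\s+)?quarter|(?:{MINS})(?:\s+minutes)?)\s+"
    rf"(?P<prep>{PREPS})\s+"
    rf"(?P<goal>{HOURS}|noon|midnight)\b",
    re.IGNORECASE,
)


def tally_clock_prepositions(text: str) -> Counter:
    """Count (frame type, preposition) pairs found in the text."""
    counts = Counter()
    for match in CLOCK.finditer(text):
        frame = match.group("frame").lower()
        if "quarter" in frame:
            frame_type = "quarter"
        elif "minutes" in frame:
            frame_type = "minutes"
        else:
            frame_type = "bare number"
        counts[(frame_type, match.group("prep").lower())] += 1
    return counts


if __name__ == "__main__":
    sample = ("He left the office at quarter of four. "
              "Let's meet at ten to five, or ten minutes till noon.")
    for key, n in tally_clock_prepositions(sample).items():
        print(key, n)
```

Keeping the frame type alongside the preposition is the point of the exercise: it lets you ask whether a writer's choice of P shifts between the quarter frame and the minutes frame, rather than just counting prepositions overall.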

Putting these questions aside, I turn to the advice literature on of, which I explored in the hope that there would be something about temporal of there. Now, I have been a visitor to the entries on of in this literature for many years, but for other purposes. I can tell you that for about a hundred years, usage advisers have been railing at the "intrusive" of in off of, out of, outside of, inside of, and alongside of -- the Wordy Five -- especially off of. (I intend to post on the subject eventually, but for the moment you can look at some course notes here.) And for twenty or thirty years, they've been railing about the "intrusive" of in exceptional degree marking: too/that/so/how big of a dog, etc.; it's spreading fast, especially among the young, so it's become many people's pet pet peeve about English. (I intend to post on this one too.) Recent manuals are pretty much guaranteed to have complaints about these two.

I was, then, not at all surprised at the first sentence in Rob Colter's entry for of in Grammar To Go (3rd ed., 2005:59):

If you accept "It fell off of the table," then you should accept "It fell on of the table," since using of is as meaningless in the first example as in the second.

(though the reasoning by analogy is entertaining, since usagists uniformly reject analogical defenses of non-standard usages). But the second (and last) sentence had something in it that was news to me:

The same can be said for "inside of," "underneath of," and "outside of."

Whoa! Underneath of? No one else seems to have complained about this one. But it's out there, in respectable numbers. A sampling:

The greatest accumulation of this potentially harmful debris is underneath of the vehicle, around the frame, undercarriage and wheel wells, etc. (link)

To delete an existing photo and not replace it with another one, click on the word Delete that is underneath of the photo to permanently delete if from our ... (link)

The battery access is underneath of the color LCD ... (link)

I want the line that is underneath of them to be one continuous line. (link)

It actually magnifies what is underneath of it, it's very cool. (link)

Apparently, these writers are treating underneath as parallel to ahead and instead and the opposite of underneath, on top, all of which require of. Non-standard, but not crazy.

In fact, DARE has of after prepositions other than the five that the manuals complain about: aboard of, above of, around of, aside of, behind of, beside of, on board of, over of, underneath of; these are labeled as chiefly Southern, South Midlands, and Northeast. Having already searched on {"is underneath of"}, I went on to check out under of, over of, and, yes, on of. Modest number of hits for under of, e.g.:

A child who is under of the care of some one else; Most children who are eligible to receive child support must be a dependent. (link)

... shall allow any alcoholic beverages to be sold, given or otherwise supplied upon the licensed premises to any person who is under of 21 years of age, ... (link)

The directorate is under of the Norwegian Ministry of the Environment. The directorate's mission is to "preserve biological diversity and strengthen the ... (link)

There is a lot of benefits to taking this if a person is under of stress whether mental or physical. L-glutamine is a free-form amino acid (protein) that is ... (link)

Some of the hits might be typos, but there are enough to suggest that under of is a live option for some people. Fewer hits for over of, but here are two that are probably not typos:

No wonder Tom feels "an abounding sense of relief and security", as he stands over of his dead body. (link)

But I would never use it over of an iPod for music. And since Apple does such a great job with music and they know video better than the rest, ... (link)

For on of, I got few hits, and most of them were probably typos of one sort or another. But there is a little island of on of, in writings by and about people who practice healing by "the laying on of hands" (or "the laying-on of hands"), as in:

I want a church where they lay on of hands and heal people but that you don't have to give your address because it's that followup that I hate. (link)

There are a fair number of such examples. You can see where this usage probably came from. People in these communities seem to refer to the practice by the action nominal the laying(-)on of hands (rather than the gerundive nominalization laying on hands). The of can then be interpreted as a marker of the object of a complex verb lay on.

Next, I took a look at Rudolf Flesch's 1964 volume The ABC of Style, which I hadn't consulted for some time. The entry for of was a complete surprise: no mention of off of and its brethren, but instead three complaints that were new to me. Flesch (p. 210) begins sternly:

of is a weed that should be pulled out of all sentences where it doesn't belong

and goes on by giving six examples where of is to be extirpated. There is no commentary, explanation, or characterization of the offending constructions; readers are entirely on their own. The examples aren't even grouped into types; I've added identifying letters and brief characterizations.

A: Repeated partitive

Of the 15 millionaires who used this charity provision to avoid paying taxes, eight (of them) made their charitable contributions to their own private foundations.

Of all the objections everybody had to giving me the part, not one (of them) was because I was too pretty.

B: With superlative

The process of being born is one of the most hazardous (of) medical episodes in America today.

C: WH-clause complement of abstract N

He emphasized his belief in the right of self-expression, leaving ambiguous the issue (of) whether spitting, pushing and placard-throwing were covered by his call for the articulation of deep convictions.

The only remarkable thing about Goldwater's explanation (of) how he and Senator Javits might find a way of living with each other is the fact that he made it.

There is no earthly explanation (of) why.

For types A and B, I agree that of COULD be omitted, but deny that it MUST be; in each case, there are two somewhat different constructions, with subtly different uses. For type C, I find the first two of Flesch's "corrections" awkward, though examples without the of are certainly well attested; again, there are two different constructions. I'll start my discussion with type C and work backwards.

Some background... Complements of nouns get two different treatments, depending on their category. NP complements of Ns are marked with of: the issue of the extent of his problem, Goldwater's explanation of their rapprochement. That-clause complements of Ns are unmarked (in general, a that-clause is not eligible to be the object of a preposition): your explanation that you had to leave early to catch a plane. What then of WH-clause complements (in whether, how, why, etc.)?

Such complements might be treated like other finite clauses, in particular like that-clauses; they would then be unmarked, as Flesch recommends. Or since they are eligible to be the object of (certain) prepositions -- as in I know nothing about whether they did that, They said nothing about how they did that -- they might be treated as the equivalent of a NP; they would then be marked with of, as I recommend. Clearly, many people (probably most) allow either treatment, and I would expect there to be a subtle difference in meaning or discourse status for the two treatments, or at least a stylistic difference.

On the numbers, marking with of wins handily over the unmarked variant: somewhere between 2-to-1 and 10-to-1, I would estimate (there's a lot of noise in the data), from searching on {"explanation (of) how"}, {"explanation (of) why"}, and {"issue (of) whether"}. Some examples:

This is a remarkable explanation of how the internet works! (link)

Attached is the explanation how to do it ... (link)

A Lengthy Explanation of Why There's a Picture of Bottles of Water. (link)

"Is there an innocent explanation why my boyfriend feels the need to go to a nightclub with his mates?" (link)

Summary: This FAQ addresses the issue of whether base station transmitter/antennas for mobile phones (cellular phones, PCS phones), and other types of ... (link)

On the issue whether a non-economic highest and best use can be a proper basis for the estimate of market value. (link)

There's clearly more to be said here, but I'll move on to type B. The model for the version with of comes from examples like the Monty Python reference to the intelligent sheep:

that most dangerous of animals

In such superlative examples, the of is not omissible; its object is a full NP (plural or mass), which can contain determiners; the object is interpreted as denoting a type rather than an individual or individuals; and the construction is normally headless, with the semantics of the head supplied by the object NP (roughly, 'animal' in the Monty Python example). In any case, in this construction of + NP is a partitive associated with the superlative (and not available for most other sorts of modifiers):

that/the most dangerous of all/our animals

cf.: *that most dangerous (all/our) animals [without of]

cf.: *that most dangerous of animal [with count singular object of of]

cf.: #that most dangerous of these animals [anomalous if these animals denotes individuals]

cf.: *that/the very dangerous of (all/our) animals [with a non-superlative modifier]

There is an alternative construction with a singular (and overt) head, and without the of; this is just garden-variety premodification. Among the available premodifiers are superlatives like most dangerous (though many other modifiers, like very dangerous, are possible), and the head N can be understood as referring to an individual or a type:

that/the most dangerous animal

cf.: *that/the most dangerous of animal [with of]

cf.: that/the very dangerous animal [with a non-superlative modifier]

There are simply two different constructions here, with slightly different meanings. Flesch's example with of is of the first construction, his of-less "correction" the second. It just happens that the understood head in the of-full version is plural (something like 'medical episodes'), as in

those most hazardous of all medical episodes

so that one construction can be "converted" to the other by removing the of. This is essentially an accident.

You get the feeling that Flesch spent a fair amount of time as an "of-hunter", reading texts for instances of of that could be removed, without really understanding the syntax of the material he was looking at.

An antipathy towards of has a long history in the advice literature on English, going back at least to H. W. Fowler and continuing in recent years to Bryan Garner. The usual complaint is that of is too frequent (frequent words in general are deprecated, as being "over-used") and has too many different uses (words with many uses are deprecated in general, on the grounds that they are potentially ambiguous), so it's virtually meaningless (in general, words perceived as being "vague" are deprecated) and should be avoided whenever possible. (You can find similar complaints about very, it, and, forms of the verb BE, and a number of other items.) Now, it's good advice to avoid piling up occurrences of of and similar words in a short space, but an antipathy to such words in general is just silly; they perform crucial roles in indicating syntactic structure and discourse organization. You can appreciate this point by looking at the twenty most frequent words in the Brown Corpus and asking yourself how you would get along if you had to avoid them whenever possible:

the, of, and, to, a, in, that, is, was, he, for, it, with, as, his, on, be, at, by, I

(A side point: the "words" on this list are picked out orthographically. A number of them -- notably to and that -- clearly represent two or more distinct lexical items, while others listed separately belong together as forms of a single lexical item: is/was/be and he/his.)
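For anyone who wants to try this sort of count on a text of their own, here is a minimal sketch. It is not how the Brown Corpus list was produced; the function name top_words and the sample sentence (borrowed from Flesch's type B example) are just for illustration. Like the list above, it picks out "words" purely orthographically, so the two kinds of to end up in one bin and is/was/be in three.

```python
import re
from collections import Counter

def top_words(text, n=20):
    """Return the n most frequent orthographic words in text, lowercased."""
    # An "orthographic word" here is simply a run of letters (apostrophes allowed),
    # so distinct lexical items that are spelled alike are counted together.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)

# Hypothetical sample text, not a real corpus.
sample = (
    "The process of being born is one of the most hazardous "
    "of medical episodes in America today."
)
print(top_words(sample, 2))  # [('of', 3), ('the', 2)]
```

Even in one sentence, of and the come out on top, which is roughly the point.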

On to type A. Here we have a fronted partitive, of + NP, which is interpreted in combination with a later quantity determiner. Flesch's first example, simplified here,

Of the 15 millionaires who used this provision, eight contributed to their own foundations.

is a variant of

Eight of the 15 millionaires who used this provision contributed to their own foundations.

Why would someone want to repeat the fronted partitive (in a pronominal version)? To make the connection between the fronted partitive and the quantity determiner absolutely clear. The single-partitive version,

Of the 15 millionaires who used this provision, eight contributed to their own foundations.

takes, I think, a bit more interpretive work than the double-partitive version,

Of the 15 millionaires who used this provision, eight of them contributed to their own foundations.

Well, that's just speculation, but it should be possible to test. In any case, I don't find the double-partitive versions unacceptable, or even pointlessly redundant (they're just more emphatic). Flesch clearly did, but then he was a demon for brevity.

In the end, I didn't find much on the temporal prepositions, but I did unearth underneath of and its relatives, plus three constructional choices involving of. One will get you four more.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:05 PM

Two simple numbers

OK, after years of complaining about the darkness of science journalism, I'm lighting a candle. I've found a cure.

Not, I'm afraid, a cure for the whole syndrome of credulousness, carelessness and misreading. I can't even pretend that my remedy will have any effect on the underlying causes, which are ignorance of science and the motive of sensationalism. My medicine will only provide symptomatic relief, and only in a specific class of cases, those where scientists (or their press agents) claim to have found "the genetic basis of X".

But given the new technologies and social policies facilitating genome-wide association studies, there are going to be a lot of these stories over the next few years. My medicine should be easy for journalists to swallow, and easy for the public to understand. And later, we'll add other simple remedies for other common kinds of science stories (like effect sizes in group-difference studies).

Today's prescription is a trivial rule of scientific rhetoric. When there's a claim that some genomic variant is associated with some phenotypic trait -- whether it's breast cancer or homosexuality or conservatism or stuttering -- we need to know four simple numbers. Specifically: (A) the number of "case subjects" in the study (people with the trait in question); (B) the number of "control subjects" in the study; (C) the proportion of the case subjects with the genomic variant in question; and (D) the proportion of the controls with the genomic variant in question.

If four numbers are too many, leave out (A) and (B), as long as they're not really small. But stick with (C) and (D) -- they're the medicine that really does the work here.

Let's do a little experiment of our own, with respect to the reporting of two recent genome-wide association studies: Juliane Winkelmann et al., "Genome-wide association study of restless legs syndrome identifies common variants in three genomic regions", Nature Genetics, published online July 18, 2007; and H. Stefansson et al., "A Genetic Risk Factor for Periodic Limb Movements in Sleep", N Engl J Med, published online July 18, 2007.

The striking thing about these studies is that they went on genetic fishing expeditions in several different populations -- Icelanders, Germans, Americans, French Canadians -- and found (in part) the same thing. By "fishing expeditions" I mean that they used microarray techniques to trawl for a lot of genomic variants -- 500,568 single nucleotide polymorphisms (SNPs) in one case, and 311,388 SNPs in the other. And they zeroed in on (different) SNPs in the same gene, BTBD9, whose etymology I discussed the other day. (They also found some SNPs in different genes that weren't replicated across studies, but never mind that.) When you combine this replication of results with the eminence of the researchers and the reputation of the journals, you can be sure that there's really some substance here.

OK, go off now and read some of the reporting of these results in the popular press. You could read the treatment that ABC News gave it (Denise Dador, "Restless Leg Syndrome Found to Be Genetic"), or the New York Times (Nicholas Wade, "Scientists Find Genetic Link for a Disorder (Next, Respect?)"), or Scientific American (Gene Emery, "Studies find gene linked to night leg movement"). Or (if you're reading this in July or early August of 2007) you can take your pick from what Google News returns from a search string like { "restless legs"}.

On balance, I think the stories that I've read about these studies are mostly pretty good, presenting a complex picture in a clear and fair way. But in none of the stories (at least among those that I've read so far) do we get the numbers that I claim are critical to understanding the real meaning and impact of this work.

So, on the basis of your popular-press readings, please now guess what those crucial numbers (C) and (D) are. What proportion of RLS sufferers (the case subjects) were found to have a variant form of BTBD9? What proportion of the general population (the control subjects) had the same variant?

To make the experiment fairer, you could ask one of your colleagues to read one of the popular-press stories, and then ask them to guess the numbers. Tell them it's part of a "wisdom of crowds" experiment.

I've tried this with a couple of random local acquaintances, and gotten guesses like "50% and 5%" or "75% and 10%" or "30% and 3%". I'd guess that less scientifically-sophisticated people might conclude from the popular-press coverage that people with the disease all have the key allele, while people without the disease don't (though the stories don't say any such thing, that's a common understanding of what "the genetic basis of X" means).

The numbers are in fact reported in the scientific articles that I referenced and linked above. They are:

Study	Population	Allele in SNP	Cases (N)	Cases (proportion)	Controls (N)	Controls (proportion)
Stefansson	Icelanders (combined)	rs923809	429	0.774	16,866	0.656
Stefansson	Americans	rs923809	188	0.766	662	0.681
Winkelmann	Germans (2) + Canadians	rs9296249	401+903+128	0.838	1,644+891+287	0.765

(Winkelmann actually gives the proportions as MAF ("minor allele frequency"), so the case subjects were at 0.162 and the controls at 0.235. I've subtracted these from 1.0 in order to make the numbers more comparable to Stefansson's.)

I doubt that many readers of the popular press accounts of these studies will guess that "genetic basis" means genomic variants that occur at 66% (or 77%) in the asymptomatic general population, vs. 77% (or 84%) in patients with a diagnosis of RLS confirmed by periodic leg movements found in sleep studies with an accelerometer on the ankle.
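To see how modest these differences are, it may help to turn each pair of proportions into a plain difference and an odds ratio. The sketch below does only that arithmetic, using the case and control proportions from the table above. It treats them simply as proportions, ignores sample sizes and confidence intervals, and is not a reconstruction of either paper's statistical analysis; the function names are made up for the occasion.

```python
def odds_ratio(case_prop, control_prop):
    """Odds of carrying the variant among cases, divided by the odds among controls."""
    case_odds = case_prop / (1 - case_prop)
    control_odds = control_prop / (1 - control_prop)
    return case_odds / control_odds

# (study, population, case proportion, control proportion), copied from the table.
rows = [
    ("Stefansson", "Icelanders (combined)", 0.774, 0.656),
    ("Stefansson", "Americans", 0.766, 0.681),
    # Winkelmann reports minor allele frequencies (0.162 and 0.235),
    # so subtract from 1 to match Stefansson's presentation.
    ("Winkelmann", "Germans (2) + Canadians", 1 - 0.162, 1 - 0.235),
]

def summarize(rows):
    """Return (study, population, difference, odds ratio) for each row."""
    return [
        (study, pop, round(c - d, 3), round(odds_ratio(c, d), 2))
        for study, pop, c, d in rows
    ]

for study, pop, diff, ratio in summarize(rows):
    print(f"{study:11s} {pop:24s} difference = {diff:.3f}  odds ratio = {ratio:.2f}")
```

On these figures the case-control gap is between about 7 and 12 percentage points in every row, and every odds ratio comes out below 2.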

(Nor would they guess that about 35% of the RLS diagnoses were not confirmed by the accelerometer tests, or that the genetic scan did not have significant results if the accelerometer tests were not used to trim the set of case subjects. But that's a different set of problems.)

This is not to say that RLS is not real, or that these genomic variants are not somehow connected with it -- or more likely, connected with things that are connected with things that are connected with it. But there's a big difference between numbers like these and what most readers will conclude from phrases like "discovery of the genetic basis of the disorder" (a phrase used in Nicholas Wade's NYT article).

So the next time you see an article in the popular press about "the genetic basis of X" or "the gene for Y" (and surely you will!), look for the case-subject and control-subject percentages. If you don't see them (and probably you won't!), write to the editor and complain.

Posted by Mark Liberman at 10:09 AM

July 21, 2007

Everybody in Philadelphia

From Philip Stevick, Imagining Philadelphia, 1996:

For many years in the forties and fifties, the New Yorker ran an advertisement, its format resembling a full-page New Yorker cartoon. It was always drawn by Richard Decker. Its medium was always the same: a public scene rendered in ink and wash. And the caption never varied. "In Philadelphia," it said in bold italics at the bottom, "nearly everybody reads the Bulletin." So like a New Yorker cartoon these drawings appeared to be that the magazine saw fit to clarify the intent by printing "Advertisement" in parentheses below the caption.

I remember being impressed by several of these advertisements when I was a child of two or three. Here's one that ran when I was two, in the issue of March 18, 1950:

Stevick describes another from the same year:

In the February 4 illustration, the view is of a theater, from the rear, looking up the center aisle toward the stage, on which an actor and an actress are in postures suggesting the possibility of high romantic emotion, except that both of them are reading the Philadelphia Bulletin. In the seats of the theater are the patrons; one sees the backs of their heads. But instead of watching the action on stage, each one is holding the Philadelphia Bulletin. Only a single figure, the familiar man in the black suit, with the long nose and the bald head, leans into the aisle from his seat, attempting to see the action on the stage, not an easy thing to do since his view is blocked by everyone else's newspaper.

As Stevick observes,

...what is most likely to strike a reader of those advertisements at our distance from them is the uncompromisingly derogatory view that they carry of life in Philadelphia. Faced with the unexpected, or the dramatic, or the exciting, or indeed the life-threatening, Philadelphians, the ads seem to say, cannot be roused from their daily papers. It is the evening Bulletin that provides all the engagement with life that Philadelphians can stand. Experience itself is simply not interesting. [...]

Yet there is something oddly compelling about those ads. One's ability, now, to respond to such a quality depends on a willingness to take them as occupying a plane that their creators surely never intended them to occupy. They are, in some quite convincing sense, night scenarios.

At the age of two, I accepted without question that illustrations like these, glimpsed in adult books and magazines, depicted the world as it was. Maybe not in Mansfield, Connecticut; but somewhere.

The Bulletin went out of business in 1981, several years before I ever visited Philadelphia. But today I finally caught a glimpse of Richard Decker's dreamtime city. In Philadelphia, nearly everybody was reading Harry Potter and the Deathly Hallows.

As I sat at an outdoor cafe table, five of the seven occupied tables were occupied by people engrossed in the book. Someone else, standing propped against a lamppost, was about half-way through it. As I walked west on Walnut St., I nearly collided with two pedestrian readers. At the market, I saw several other people reading copies propped on the handles of their grocery carts, as they plucked items absently from the shelves in passing.

[I'm sure that these folk were all absorbed in the story, but somewhere, scholars are patiently compiling notae variorum, to add to the lists compiled for the earlier volumes in the series:

Harry Potter and the Philosopher's Stone / Sorcerer's Stone
Harry Potter and the Chamber of Secrets
Harry Potter and the Prisoner of Azkaban
Harry Potter and the Goblet of Fire
Harry Potter and the Order of the Phoenix
Harry Potter and the Half-Blood Prince

There are some interesting nuggets in this collection. For example, p. 558 of the British edition of Harry Potter and the Half-Blood Prince apparently has

one of the fighters detached themself

where the corresponding sentence in the American edition, on p. 598, reads

one of the fighters detached themselves

Curiouser and curiouser, as Alice said to her feet.

Arnold Zwicky discussed this issue (in general, not in the works of Ms. Rowling) in an earlier Language Log post: "Themself", 3/8/2007.

]

[Hat tip to Daniel Reeves for the differences pages]

Posted by Mark Liberman at 07:19 PM

Breakfast as text: the theory of cerealativity

In response to our earlier posts on the new intellectual currents in breakfast-cereal advertising, David Haan has reminded me of Jeff Reid's graphical essay from In These Times, 3/29/1989, which was reprinted in Harper's Magazine in August of 1989, and also in the essay collection Theory's Empire. We now reprint again, for your (post-)prandicular pleasure, "Breakfast theory: A morning methodology":

Posted by Mark Liberman at 09:10 AM

Homophonic slurs

Noted at Slacktivist: according to a story by Mike Bruno at Entertainment Weekly ("Washington Doing Five Episodes of 'Bionic Woman'", 5/16/2007),

Former Grey's Anatomy star Isaiah Washington will appear on five episodes of the new NBC series Bionic Woman, which debuts this fall. Ben Silverman, Co-Chairman, NBC Entertainment and Universal Media Studios, made the announcement on Monday (July 16) at the network's presentation at the Television Critics Association summer tour. The actor will play a member of the organization that created the bionics that turn Jaime Sommers (Michelle Ryan) into the ''Bionic Woman.'' The character appears to have a hidden agenda as he helps Jaime handle her new abilities. Washington was not asked to return to Grey's Anatomy next season after a turbulent year in which he was accused of calling co-star T.R. Knight a homophonic slur, which he then said publicly back stage at the Golden Globes in January.

That would be, like, calling Knight trilingual?

(Not really; it would be more like speculating that he might become hoary in his later years. Though that's pretty weak.)

According to my internal norms of English usage, you can call someone a name, but you can't call them a slur. I wonder why not?

[Hat tip to Chris Laning]

[Mark Reed wrote in to make one of those points that's obvious in retrospect. You might think that Mike Bruno and/or his editor committed a malapropism, or a typographical error (since 'n' is right next to 'b') -- that's how I interpreted it -- but perhaps this was just a clever reference to the fact that Washington referred to Knight as a faggot (of wood). (No, I don't really believe that, and Mark doesn't, either, but it's a clever idea.) Mark also notes that EW's circumlocution confirms that faggot has joined nigger as a slur that it's taboo to print.

Mark raised another question as well: in the same passage, was "Jaime" for "Jamie" a slip of the fingers, a misspelling, or an error of memory? There seems to be some internal evidence here for careless typing and slipshod proofreading, both faults that I'm prone to myself.]

[But Marc Naimark writes from Paris to set us straight:

The character's name in the original series was Jaime Sommers. Drove me crazy... Jaime isn't normally pronounced like Jamie and is usually a man's name. Ugh. This was particularly annoying since it was coupled with her being from Ojai, California, which is pronounced in the Spanish fashion, o-high. How did the producers get the "jai" in Ojai right, and the Jai in Jaime wrong?!

]

[Though it somewhat spoils the joke, I do need to point out that while homophonous commonly means "having the same sound", homophonic is a different word, used in music to mean "in unison" (as opposed to antiphonic). However, a connection with linguistic homophony exists through the term homophonic substitution (cipher).]

[And Arnold Zwicky writes to point out that homophonic for homophobic happens a lot, sometimes clearly as a slip of the fingers, and sometimes apparently as a malapropism:

An earlier occurrence at http://www.overthrow.com/lsn/news.asp?articleID=6773 (hard to piece out just what was happening back in 2004).

I could swear that i've seen other reports of "homophonic" for "homophobic" or the reverse, but I don't find it on my computer, in the ADS-L archives, on michael quinion's site, or on the ecdb site.

But here are some from the first 40 hits for {homophobic homophonic}:

Because we are all products of our society, most of us are homophobic, regardless of our sexual orientation. Assume that you are homophonic.

I also challenge the word "homophobic" as fear of hobosexuals. I'm not homophobic--I have no fear of your type, only contempt. And now you have homophobia to wave around just like the jews have anti-semenic. So lets get rid of the word homophobia: How about "Homo-Blyiccch" (gag, choke, vomit)? Sure, you can call me homophonic if you like but I know what's right and what's wrong. When all you perverts are in hell it will be a much better place.

Girl #1: What's the word for when there's just a voice and a harmony?
Girl #2: I think that's called "homophobic."
Girl #1: Really, homophobic?
Girl #3: Actually it's "homophonic."
Girl #2: Oh.
Girl #1: Then what does homophobic mean?

Please note - I am not condoning discrimination here - just saying that people need to lighten up. What if I'm a straight guy who likes my yellow polo shirt and dogs - if I have a small dog collection that I walk with am I gay?

And I'm not homophonic, or uptight... kind of ironic that you're calling everyone else uptight.

Police investigating allegations of racist and homophobic offences have ... But the announcer stumbled over one of the words and said "homophonic offense". ...

The most common categories of homophonic incidents witnessed by ..... Successful strategies to increase the inclination to report homophobic abuse should ...

as for who's homophonic, I think that the current turkish government is certainly homophobic because they referred to Belgium's publication of the fact that ...

]

[And last of all, Mickey Blake was taken aback by the reference (in Arnold Zwicky's list of citations) to "'homophobic' as fear of hobosexuals". She asked "Would that be the fear of people whose sexual orientation is towards hygiene-deficient unemployed migrants?"

My own guess is that it's someone who was so stirred up by the topic as to type with the left index finger rather than the right, while preserving as many other phonological features as possible (i.e. typing 'b' for 'm'). But hobophobic, hobophonic, etc., would be good words for a game of Extended Fictionary, in which you have to make up fake etymologies involving allegedly real source words. In this case, that would be Greek ὑβός "humpback" -- or is it ὡβός "mercenary soldier"? ]
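The two keyboard stories in this post (homophonic for homophobic because 'n' sits next to 'b', and hobosexuals for homosexuals with 'b' for 'm') can be checked mechanically. Here is a minimal sketch, assuming a standard QWERTY layout and counting only keys that are immediate neighbors in the same row; the helper names are invented for the purpose.

```python
# The three letter rows of a standard QWERTY keyboard (an assumption about the typist).
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

def same_row_neighbors(a, b):
    """True if keys a and b sit side by side in the same QWERTY row."""
    for row in QWERTY_ROWS:
        if a in row and b in row:
            return abs(row.index(a) - row.index(b)) == 1
    return False

def single_substitution(intended, typed):
    """Return (intended_letter, typed_letter) if the words differ in exactly one position."""
    if len(intended) != len(typed):
        return None
    diffs = [(x, y) for x, y in zip(intended, typed) if x != y]
    return diffs[0] if len(diffs) == 1 else None

for intended, typed in [("homophobic", "homophonic"), ("homosexuals", "hobosexuals")]:
    pair = single_substitution(intended, typed)
    print(intended, "->", typed, pair, "adjacent keys:", same_row_neighbors(*pair))
```

By this crude measure the b/n swap is a neighboring-key slip, while b and m are two keys apart, which fits the wrong-index-finger story better than a simple slide to the next key. Of course it says nothing about which cases are malapropisms rather than typos.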

Posted by Mark Liberman at 08:35 AM

July 20, 2007

Broad Complex-tramtrack-bric-a-brac

According to Nicholas Wade, "Scientists Find Genetic Link for a Disorder (Next, Respect?)", NYT, 7/19/2007:

Many human genes were first described by geneticists who identified counterpart genes in the laboratory fruitfly. Fruitfly researchers consider it a matter of pride to give genes colorful names, but these are often moderated or disguised by medical researchers who feel absurd names will not help attract research funds. BTBD9, the gene in today's two studies, stands for broad complex-tramtrack-bric-a-brac-domain 9.

You can sometimes learn the story behind fruitfly gene names by consulting FlyNome:

Here at FlyNome, you can find the answer to that burning question: How exactly did they come up with the name for that gene? If you'd like to know the story behind a Drosophila gene name, this is the place for you. Simply type in the gene name above, or go to the search page.

For example, FlyNome knows that bruchpilot (BRP), named in Wagh et al. (2006) Neuron: 49:833-844,

is German and means crash pilot. The gene is named after a famous German movie from the nineteenthirties "Quax, der Bruchpilot" about a pilot who always crashes his planes but survives. This name was chosen because the first flies with a severe reduction of the encoded protein due to RNAi knock-down could not sustain stable flight but crashed to the ground when released into open air.

It also knows that ken and barbie (KEN) was so named, by Kuhnlein, R.P. et al. (1998) Mech. Dev. 79(1,2):161--164, because

The external genitalia are absent in some males and females. Thus these flies are named after the famous dolls who also lack these "features."

(Why the scare quotes around features, I wonder...)

And we can learn from FlyNome the story behind fear of intimacy:

In foi mutants, gonad coalescence is affected. In wild type, the germ cells and gonadal mesoderm cells normally coalesce tightly together during stage 14 of embryogenesis. Since these cells fail to undergo the "big group hug" of coalescence in the mutant, it was named fear of intimacy.

But unfortunately, FlyNome doesn't have anything to say about broad complex-tramtrack-bric-a-brac-domain 9, or BTBD9, or even tramtrack or bric-a-brac.

The FlyNome FAQ continues:

If the story for a gene in which you're interested isn't in our database, you can ask someone to add it. Go to the request page where you can send an email request to an author of a paper referencing the gene. Once that person has entered the information in the FlyNome database, you'll be able to find the answer to why the gene is named as it is!

So I went to FlyBase -- but a search for BTBD9 also comes up empty there. FlyBase does know about bric a brac 1 (bab1) -- also bab2 -- and about tramtrack (ttk), and perhaps these are related. FlyNome has no story to tell about these genes, but the FlyBase description of bab1 gives a clue about its probable origin:

It is involved in the biological processes described with 16 unique terms, many of which group under: regulation of development; sex differentiation; primary metabolism; regulation of developmental pigmentation; organ morphogenesis; metamorphosis (sensu Insecta); regulation of metabolism; behavior; organismal physiological process; sex determination; establishment and/or maintenance of chromatin architecture. 24 alleles are reported. The phenotypes of these alleles are annotated with 25 unique terms, many of which group under: adult segment; metatarsus; metathoracic metatarsus; female reproductive system; adult mesothoracic segment; adult prothoracic segment; peripheral nervous system; ovariole; gonad; nervous system.

Next, I tried a search at euGenes. Here a search turns up BTBD9 as a human gene, in a cross-reference to fruitfly gene CG1826. The full name is given as "BTB (POZ) domain containing 9", with other cross-references to mosquito, mouse, worm and rat genes.

A wildcarded euGenes search for BTB turns up 326 hits. The string BTB itself turns up the D. melanogaster gene bumper-to-bumper, FlyBase GBgn0015368. But as we'll soon see, this seems to be a false lead. (Anyhow, FlyNome doesn't yet know the story behind "bumper-to-bumper" either.)

A further euGenes search indicates that BTBD9 is one of a series of (non-fruit-fly) genes: BTBD1 = "BTB (POZ) domain containing 1", BTBD2 = "BTB (POZ) domain containing 2", BTBD3 = "BTB (POZ) domain containing 3", etc.

The "BTB/POZ domain" is a term that defines a pretty common and important class of genes/proteins, apparently first discussed in O. Albagli et al., "The BTB/POZ domain: a new protein-protein interaction motif common to DNA- and actin-binding proteins", Cell Growth Differ 6 (9), 1193-8, 1995, which says:

The BTB (3) (for Broad Complex, tramtrack and bric à brac) or POZ (4) (for poxviruses and zinc finger) is an approximately 120-amino acid conserved and hydrophobic domain present generally at the NH₂-terminal end of numerous proteins including Zinc finger, poxvirus, and actin-binding proteins.

Reference (3) is S. Zollman et al., "The BTB domain, found primarily in Zinc finger proteins, defines an evolutionarily conserved family that includes several developmentally regulated genes in Drosophila", PNAS 91: 10717-10721, 1994. And this confirms the terminological history, though not the story behind it:

The Drosophila bric à brac protein and the transcriptional regulators encoded by tramtrack and Broad-Complex contain a highly conserved domain of ~115 amino acids, which we have called the BTB domain. We have identified six additional Drosophila genes that encode this domain. Five of these genes are developmentally regulated, and one of them appears to be functionally related to bric a brac. The BTB domain defines a gene family with an estimated 40 members in Drosophila. This domain is found primarily at the N terminus of zinc finger proteins and is evolutionarily conserved from Drosophila to mammals.

So the (syntactic) structure of BTBD9 is [[[[Broad Complex][Tramtrack][Bric-a-brac]] Domain] (containing) 9] -- that is, the ninth in a series of genes containing a region coding for the BC/T/Bab amino-acid sequence. Apparently this sequence (or genes coding for it among other things) was independently identified three times, under different names ("Broad Complex", "Tramtrack", "Bric a brac").
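The naming pattern is regular enough to spell out in a few lines. This is only a sketch of the pattern seen in the euGenes entries quoted above (BTBD1, BTBD2, BTBD3, BTBD9), not a general rule of gene nomenclature, and expand_btbd is a made-up helper rather than part of any database's interface.

```python
import re

def expand_btbd(symbol):
    """Expand a symbol of the form BTBD<n> into its full name, or return None."""
    # Assumed pattern: the letters BTBD followed by a series number.
    match = re.fullmatch(r"BTBD(\d+)", symbol)
    if match is None:
        return None
    return f"BTB (POZ) domain containing {match.group(1)}"

print(expand_btbd("BTBD9"))  # BTB (POZ) domain containing 9
print(expand_btbd("BTB"))    # None
```

Note that the bare string BTB is deliberately not expanded, since (as the bumper-to-bumper false lead shows) the same letters can name something else entirely.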

The BTB/POZ domain also comes up in the famous fruitless gene, for which FlyNome does have a story to tell:

Fruitless is responsible for all aspects of courtship in Drosophila. The first fru mutants courted males and females indiscriminately. The gene was to be called "fruity", but the more P.C. "fruitless" was chosen.

Posted by Mark Liberman at 08:11 AM

July 19, 2007

They just don't care

Well, I wasn't going to bother. After beating the drum again and again about the careless nonsense that BBC News passes off as science reporting, I was sick of the topic, and I bet that you are too. But in the wake of the breaking scandals about faked call-in shows ("BBC Suspends Quizzes After Problems Exposed", AP, 7/18/2007) and faked "documentary" video (Andrew Pierce and Emma Henry, "BBC apologises to Queen over footage", The Telegraph, 7/13/2007), I guess that a brief note is in order about the BBC's bizarrely false reporting on one recent story to which I have a personal connection.

Their story was "Men 'no less chatty than women'", BBC News, 7/5/2007, and it started this way:

The common notion that women are the more talkative sex has been undermined by scientists in the US.

Researchers who bugged 400 students to log their chats found little difference in word count between the sexes.

The University of Arizona study, in Science, conflicts with previous US research suggesting women talk almost three times as much as men. [emphasis added]

Now the key point about this work was that there never was any "previous research", in the US or anywhere else, "suggesting that women talk almost three times as much as men". Those numbers were invented (made up, concocted, fantasized, fabricated, ...), apparently by popular writers in the relationship-counseling industry, and then spread all over the world's media by Louann Brizendine's pop-neuroscience best-seller The Female Brain.

This was stated explicitly in the Science article (M.R. Mehl, S. Vazire, N. Ramírez-Esparza, R.B. Slatcher and J.W. Pennebaker, "Are Women Really More Talkative Than Men?", Science, 317(5834) p. 82 July 5, 2007), and in Constance Holden's ScienceNOW piece in the same issue ("Talk About a Gender Stereotype", 5 July 2007), and in most of the rest of the press coverage. The case was made in detail in a Language Log post referenced in the Mehl et al. article ("Sex-linked lexical budgets", 8/6/2006).

Anyone could figure this out, given 30 seconds of web searching and two brain cells to rub together.

And as far as I know, no other news outlet in the world got this point wrong except the BBC -- not even the tabloids.

People often accuse the BBC of agenda-driven falsification of stories. Perhaps that's sometimes true, I don't know. But in the cases of science mis-reporting that I'm familiar with -- and there are many of them -- the problem seems to be that the reporters and editors concerned are arrogant, lazy, and not very smart.

I'm reminded of a Gamble Rogers story about a character named "Still Bill" who's trying to trade his hunting dog. The prospective customer decides to lead the dog out into the yard to take a look at her. She walks head-first into the door jamb; then she backs up and makes it through the door, but stumbles over the sill and tumbles down the back steps, caroms off the shed and fetches up against the fence, upside down. The prospect complains that Bill is trying to trade him a dog that's stone blind.

Bill's response?

"She ain't blind -- she just don't care!"

[After further thought, I feel that I should be careful not to judge the character of people whose circumstances are not known to me, merely by reasoning from the results of their actions. Perhaps the BBC News stories in question are turned out by low-level employees who are given only a few minutes to re-write each press release, and are strictly prohibited from doing any independent investigation, even as much as might be accomplished in a half an hour of web research, or a brief interview with an expert. If so, then all the blame belongs to the managers who have thus condemned their writers to produce drivel.

A review of the evidence suggests to me that the fault is more broadly distributed; but of course I don't really know.]

[Update -- Cosma Shalizi writes:

Reading this

"I should be careful not to judge the character of people whose circumstances are not known to me, merely by reasoning from the results of their actions"

made me think of this:

http://www.schneier.com/blog/archives/2007/07/correspondent_i.html

]

Posted by Mark Liberman at 06:50 AM

July 18, 2007

The sound of the King James Bible?

In other linguistic news from the New York Times, Michiko Kakutani ("No Mercy, Please, They're English", 7/17/2007) has reviewed the publication in book form of A.A. Gill's reflections on the English language and the alleged personality traits of English people. The newspaper-column version of one of these essays originally gave rise to our discussion of the cultural roots of word rage ("Word rage outside the anglosphere?", 11/4/2005).

Her review includes this interesting and curious turn of phrase:

... he delivers a finely observed monologue on English accents from "Received Pronunciation" (the sound of the classic novel and the King James Bible) to the increasingly popular Estuary ("flat, unimaginative, diluted Cockney"), adopted by the young who think there is nothing cool about "sounding like a character from 'Tess of the D'Urbervilles.'" [emphasis added]

I wonder whether describing "received pronunciation" as "the sound of the classic novel and the King James Bible" will really help readers to understand what it is like.

For what RP really is -- or was -- see here. Most of its characteristic features developed long after the creation of the King James Bible in 1611. And what "classic novels" would we be talking about? The Dorset dialect of Tess and other Thomas Hardy characters? Somerset's Tom Jones? Charles Dickens' cockneys? The provincial Tristram Shandy?

[Hat tip: Joshua Jensen.]

Posted by Mark Liberman at 08:52 AM

Briefly noted and quoted

Evan Goldstein, "The Language of Farting", The Chronicle Review, 7/20/2007:

Meet Roland the Farter. A minstrel in the court of Henry II of England, Roland had an annual Christmas Day engagement with the king and his fellow revelers. Roland's act consisted of a dance that culminated with his trademark forte: a synchronized jump, whistle, and fart. Though accounts are sketchy, they indicate that Roland's remarkable trifecta was performed simultaneously (and not surprisingly, only once). Roland was so valued as an entertainer that the king rewarded his impressive feat of dexterity with a plot of land.

The story of Roland the Farter is told by Valerie Allen in On Farting: Laughter and Language in the Middle Ages, published by Palgrave Macmillan. Allen uses flatulence as a prism through which to explore the entertainment mores of medieval society. Roland's popularity calls to mind our own longstanding (if sometimes sheepish) embrace of bathroom humor. Among many other revelations found in the pages of Bob Woodward's State of Denial is that President Bush is fond of cracking fart jokes with Karl Rove. And the flatulent campfire scene in Mel Brooks's 1974 film Blazing Saddles remains a cultural touchstone. The young comedian Sarah Silverman once commented that fart jokes are "the sign language of comedy." What is it about farts that we find so funny?

You'll have to read the rest of Evan's article to find the answer -- and to learn about the historical contingencies of "performance farting" -- though regrettably, there is no discussion of whether flatulence can be recursive. (And some people say that the humanities are no longer relevant!)

[Update -- in other flatulence-related news, take a look at this scan from Monday's New York Daily News (from Regret the Error).

Arnold Martin's observation that "Our 'number two' [the fake dog poop] is still our number one" underlines the fact that flatulence is not the only form of excretion with semiotic functions.

The scan was deemed notable because of the misplaced caption underneath the picture. It's not clear whether this was an honest mistake, or some compositor's attempt at political commentary. That's often a problem with farts, as well, and for that matter with many other events that may or may not have been communicative choices.]

Posted by Mark Liberman at 08:09 AM

A principle that no one can convince me that doesn't exist

According to yesterday's NYT Op-Ed by David Brooks ("Heroes and History", 7/17/2007):

Bush is convinced that history is moving in the direction of democracy, or as he said Friday: "It's more of a theological perspective. I do believe there is an Almighty, and I believe a gift of that Almighty to all is freedom. And I will tell you that is a principle that no one can convince me that doesn't exist."

Andrew Sullivan thinks it shows that "[President] Bush isn't merely not a conservative, but a tragi-comic version of what conservatism has long opposed". But what Dick Oehrle noticed -- in an email sent to me among others -- was that W (if he wasn't misquoted) has violated the Fixed Subject Constraint.

The "Fixed Subject Constraint" is not a new euphemism for a torture technique, but rather a linguistic phenomenon that was first noticed and named by Joan Bresnan in 1972, based on examples like these:

(a) I believe that Bush appointed Lute.
(b) I believe Bush appointed Lute.
(c) Who do you think that Bush appointed?
(d) Who do you think Bush appointed?
(e) ??Who do you think that appointed Lute?
(f) Who do you think appointed Lute?

(If you're confused, we're talking about Douglas Lute, the "war czar", appointed on May 15. Remember him?)

The issue at hand is whether it works, grammatically speaking, to say things like

That is a principle that no one can convince me that __ doesn't exist.

where the underbars mark the canonical location of that principle in the most deeply embedded clause. For most speakers of English, sentences like these are pretty bad; but many Americans don't mind them at all. Nicholas Sobin has reported that native English speakers in central Arkansas find such sentences no worse than the ones without the complementizer that ("On Comp-trace constructions in English", LSA 1983), and similarly for people from (at least some parts of) Iowa and Illinois ("The variable status of Comp-trace phenomena", NLLT, 5 33-60, 1987).

The true geographical distribution is not clear, but David Pesetsky has been known to quip that the FSC does not apply in the Central Time Zone. This would certainly include Texas, so President Bush is off the grammatical hook on the grounds of geography, even if he was quoted accurately (which is always unlikely, in the case of a quotation provided by a journalist).

Short of completely re-framing the sentence, those of us who are not from the central time zone have got a couple of other choices. You could omit the (always optional) complementizer that:

That is a principle that no one can convince me doesn't exist.

This results in a sentence that most people find awkward but grammatical. Or you could add a so-called "resumptive pronoun":

That is a principle that no one can convince me that it doesn't exist.

Standard English generally frowns on resumptive pronouns, though they're fairly common in speech and even in some forms of writing. A couple of weeks ago, Barbara Partee discussed a case where this issue came up ("A cat whose owners thought was lost", 7/6/2007).

(And if Dan Everett is right about the Pirahã, they don't have to worry about the problem at all, because they don't have any embedded clauses. So they'd have to use parataxis, something like:

Let's talk about that principle. It exists. No one can convince me otherwise.

Of course, if he's right about their cultural preference for avoiding discussion of things that are not part of immediate experience, then perhaps the existence or non-existence of principles is not a suitable subject for polite Pirahã conversation, in any syntactic form.)

Since 1972, Bresnan's facts have given rise to a very wide range of different explanations, some rather narrow and specific (the "that-trace filter"), others expansive and fundamental (the "Empty Category Principle"). The phenomenon has been seen by some as a deep and necessary truth about universal grammar, and by others as a superficial and contingent pattern that is not even true for all dialects of English. If you understood all the stories that linguists have told about the facts listed above, you'd know a large fraction of the history of syntactic theory in the last third of a century.

A note on sourcing is in order here. Here's what David Brooks says about where the Bush quotation comes from:

I spent the first four days of last week interviewing senators about Iraq. The mood ranged from despondency to despair. Then on Friday I went to the Roosevelt Room in the White House to hear President Bush answer questions on the same subject. It was like entering a different universe.

Far from being beleaguered, Bush was assertive and good-humored. While some in his administration may be looking for exit strategies, he is unshakably committed to stabilizing Iraq. If Gen. David Petraeus comes back and says he needs more troops and more time, Bush will scrounge up the troops. If General Petraeus says he can get by with fewer, Bush will support that, too.

"Friday" would have been July 13, 2007. There's nothing of the sort on whitehouse.gov for that date, so I assume that the interaction was not recorded or transcribed, at least for the public. Based on extensive experience checking reporters' allegedly verbatim quotations against recordings (in cases where recordings exist), I need to point out that such quotes are almost always spectacularly inaccurate. It's common for more than half of the words to be wrong. (See here and here for some samples.)

And here's a private message to my fellow linguists: wikipedia searches for "fixed subject constraint", "that-trace filter", "comp-trace", "complementizer-trace" and even "resumptive pronoun" come up more or less empty! Resumptive pronouns are mentioned in passing in the article on Irish syntax, but that's it.

[Update -- David Pesetsky writes:

Actually, the effect was first noticed within generative grammar, as far as I know, by David Perlmutter in his dissertation (published shortly thereafter as a book). I don't remember if he gave the constraint a name.
I also seem to remember that something was noted by Jespersen about the effect in MEG. Also that there was a book by John Haiman from a functional perspective that may well have preceded Bresnan's article.
Not sure about Jespersen or the Haiman/Bresnan chronology, but I'm quite certain about Perlmutter.

That would be David M Perlmutter, "Deep and surface structure constraints in syntax", MIT PhD thesis, 1968. There's a pdf behind that link, so I'll find out what he said a bit later, when I have time, if someone doesn't beat me to it.

And the Jespersen reference would probably be to volume 4 of his "A Modern English Grammar on Historical Principles", originally published in 1931, which is also available as a .pdf file.

So if you understood the whole history of this topic back to Jespersen, you'd cover 2/3 of a century of syntactic theory! ]

[David also reminded me that I once sent him a quotation from Junior Wells, introducing a song, along the lines of "This is a song that I think that will live in your hearts forever." ]

[Here's the passage from Perlmutter (1968), on pp 214-215:

As far as I can tell, Dave didn't assign a name to the phenomenon, though he does propose to account for it in terms of a surface structure constraint ("Any sentence other than an Imperative in which there is an S that does not contain a subject in surface structure is ungrammatical") that is evocative of the (later?) name "fixed subject constraint".

I don't have the exact Junior Wells citation yet.]

Posted by Mark Liberman at 07:37 AM

I gay, you gay, he gays

Some readers will already have heard, and others will not, that I recently decided to end my employment at the wonderful Department of Linguistics at the University of California, Santa Cruz, and move to the Linguistics and English Language department at the great University of Edinburgh, in Scotland. This is necessitating an extraordinary quantity of work packing books and getting ready to emigrate, so it has been totally impossible for me to post anything on Language Log for a while now. The moving truck arrives tomorrow (Thursday), and it is still not clear whether we can be ready. (Thanks, Rachel, for coming down to help for a day; you are truly a special Language Log fan. Your generosity brings tears to my throat and gives me a lump in my eye, or possibly I have those idioms the wrong way round. Didn't we have fun tossing out all those files and hauling them to the recycling bins?) So I was not expecting to write anything for Language Log readers today, not even for Rachel, but I simply cannot miss the opportunity to repeat for you what Dave Landfair (thanks, Dave) told me he witnessed on the Logo network's "Coming Out Stories" show: the mayor of West Sacramento, talking to the camera about how coming out of the closet might affect his politics, said: "I want to start thinking of gay as a verb and not just a noun."

Ye gods. This one is a jaw-dropper even in a world where people think faith is a verb and God is a verb and happiness is a verb and terror is something other than a noun and all adjectives are negatively judgmental and all employees should be forbidden to use gerunds...

The linguistic fact, by the way, is that gay is primarily an adjective, though just like the adjective homosexual it has a secondary use as a count noun referring to a person who has the property in question. If the mayor wants to start thinking of gay as a verb, is it transitive ("I gayed him")? Or intransitive ("How often do you gay")? What meaning does he think of it as having? When someone gays, what is it that he is doing? What is gaying? (Oops, I used a gerund.)

[Added later: Steve at Language Hat points out that there is an intransitive verb in Scottish English pronounced the same as gay. It means "go". But hey, I haven't even got time to type this.]

Look, I have to be getting back to my moving chores. But let me just point out that move really is a verb, while in Standard English chore normally isn't, even though a chore is something you do. (It actually is a verb in some rural American dialects, where you can refer to doing chores as choring; but not because chores are something you do; in those dialects it takes verb suffixes like any other verb.) ‘Verb’ does not mean "thing that you do"! Got that? Because thousands of the people around you don't get it at all.

Posted by Geoffrey K. Pullum at 01:04 AM

July 17, 2007

Still in the Catbert seat?

A couple of days ago, in a post on "Weird logic and Bayesian semantics", I wanted a good example to show that sentences with multiple negations and modals, though hard to process, often turn out to mean exactly what the author intended. I settled on the quotation "That should remove all doubt that our policies are designed for any reason other than evil", from the middle panel of a recent Dilbert cartoon.

I felt that Catbert meant that the policies clearly have no purpose other than evil; and on reflection, I felt that what he said meant what he wanted it to mean.

But perhaps my evaluation was wrong.

Take a look at the strip and see what you think:

Yesterday, Robert Corr wrote from Australia to disagree with my interpretation:

I came to the opposite conclusion. I broke the sentence into its two parts, worked out what each part meant, and put them back together. So:

"That should remove all doubt that..." => "It is certain that"
"...our policies are designed for any reason other than evil." => "...our policies are not designed for evil."

So I think Catbert got it wrong.

I appreciate Mr. Corr's reasoning, but I believe that he is confused by "any", which normally needs to be in the scope of a negation (or question). Thus when he split the sentence, he imagined an extra negation governing "any" in the second half -- or interpreted any in the "free choice" sense that would here combine with other than to the same effect -- and thereby put Catbert inappropriately in the wrong.

So I took the matter to a higher court -- Larry Horn, author of A Natural History of Negation -- and got this ruling:

I think she (Catbert looks like a she, anyway) gets to stay in the Catbert seat. Granted, the sentence would be easier to process if the second half were easier to parse:

"That should remove all doubt that anyone left early" = 'Clearly, no one left early'
"That should remove all doubt that I drank any of the scotch" = 'Clearly, I didn't drink any of the scotch'

So I agree with you that the "any" is probably messing things up for Mr. Corr, and that Catbert is indeed claiming--as presumably intended--that it is clear that "our policies are not designed for any reason other than evil", which amounts to an admission that our policies are designed for no reason other than evil.

The tricky thing here is that we get into the semantics of exclusive propositions, which as everyone since Peter of Spain, William of Sherwood, and Ockham recognized is not always a safe place to hang out. In particular, it could be that our policies are designed for no reason other than evil without being designed for evil, although that does violate a conversational implicature at the very least. Just as I can swear "I love no one other than you" even if I don't love you either -- so too perhaps the company's policies are designed for no reason whatsoever.

Thus, pace Dilbert, Catbert may not be quite as refreshingly honest as she may (eventually) appear.

This solomonic opinion appears to support my argument about the sentence, while at the same time it opens the door to Mr. Corr's conclusion about the policies. Indeed, in my limited experience of corporate life, "no reason whatsoever" often fits better as a policy justification than "evil" does. And in fact, Scott Adams appears to agree, based on this strip from a few days earlier:

So perhaps Catbert's aim, after all, was to hide absurdity behind a false veneer of malevolence. If so, I should have chosen a different example; but in the end, this one serves my purpose, since the point was that such sentences are difficult to process, but their interpretation is not consistently backwards, unlike "cannot be underestimated", "difficult to underestimate", and the like.
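For readers who want to see the competing readings laid out symbolically, here is one rough way to do it. This is only a sketch under assumptions introduced here for illustration: the predicate D(r) ("our policies are designed for reason r"), the constant e ("evil"), and the operator Cert ("it is certain that") are hypothetical notation, not anything proposed by Larry Horn or Mr. Corr.

```latex
% P: "our policies are designed for some reason other than evil"
P \;\equiv\; \exists r\,\bigl(D(r) \land r \neq e\bigr)

% Reading 1: the doubt is a suspicion that P holds;
% removing all of it leaves certainty that P is false.
\mathrm{Cert}(\lnot P) \;\equiv\; \mathrm{Cert}\bigl(\forall r\,(D(r) \rightarrow r = e)\bigr)

% Reading 2 (roughly where Mr. Corr ends up): the doubt is an
% uncertainty about whether P holds; removing it leaves certainty that P.
\mathrm{Cert}(P) \;\equiv\; \mathrm{Cert}\bigl(\exists r\,(D(r) \land r \neq e)\bigr)

% Horn's caveat: "no reason other than evil" does not entail "designed for evil",
% because the universal statement is vacuously true when no r satisfies D.
\forall r\,(D(r) \rightarrow r = e) \;\not\models\; D(e)
```

On this rendering, the disagreement is about which proposition the removed doubt attaches to, not about the logic of "other than"; and the last line is the "no reason whatsoever" loophole.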

[Update -- Jeffrey Kallberg writes:

Doesn't the ambiguity in the second clause of Catbert's statement lie in the placement of "other"? If we change "our policies are designed for any reason other than evil" to "our policies are designed for any other reason than evil" then the sense seems clear. Maybe . . .

Does this editorial change remove all doubt that the sentence is subject to any other interpretation than the one I first attributed to it? Perhaps, but I believe that I've now fatigued certain key brain cells to the point that I find it difficult to underestimate the likelihood of any interpretation for such sentences. I'll try again after lunch.]

[Well, I've had lunch, and things aren't getting any clearer. Brian Decker writes:

I have a slightly different take: I think the word doubt is partly to blame here. Typically doubt, at least as a noun, is followed by the condition that is postulated to be true but which the subject is uncertain about (as opposed to the content of the doubt itself, which is the uncertainty in the condition postulated to be true).

I have a doubt that the shirt is red = You say the shirt is red, but I doubt that and think it is not red, not You say the shirt is not red, but I believe it may be in fact red.

Removing the doubt typically thus converts the condition back to certain.

That should remove all doubt that the shirt is red = The shirt is certainly established as red, not You can no longer believe that the shirt is red, because that doubt has been removed.

Introducing any into the object doubted usually doesn't change this, which means the object remains something postulated to be true that the subject is uncertain about.

I have a doubt that the shirt is any color other than green = You say the shirt is not green, but I doubt that and think it is green, not You say the shirt is green, but I believe it may in fact be some other color.

You would expect (and Mr. Corr did), consistent with above, that removing the doubt would convert the condition back to certain. But that's where the issue is: does the object of the doubt suddenly flip and become the uncertainty in the condition postulated to be true, rather than the condition itself?

That should remove all doubt that the shirt is any color other than green = either The shirt is certainly established as some other color or The shirt is certainly established as green.

The former would be more consistent with what happened when we removed all doubt that the shirt was red, but it doesn't sound right. I think that when you remove a doubt whose object is something involving any, it's as if the doubt's object switches to become a description of the uncertainty rather than a description of what somebody postulated to be true that is now doubted.

Ran Ari-Gur writes:

In your post "Still in the Catbert seat?", you say that Mr. Corr was confused by the presence of "any" in the latter part of the sentence, and point out that "any" "normally needs to be in the scope of a negation (or question)."

Firstly, it's odd to me that "doubt that ___" can license a negative polarity item when introduced by "remove all"; for me, "remove all doubt that" is a strict formula, and a positive one. As a result, Catbert's utterance sounds ungrammatical to me.

But secondly, even if we break things up differently, I agree with Mr. Corr's interpretation. If claim X is "[the company's] policies are designed for [some] reason other than evil," then a "doubt that [X]" would be a belief that they might well be designed only for evil (or for no reason at all), and to remove that doubt would be to remove that belief - i.e., to reassure that they are definitely designed for some reason other than evil.

You could get the opposite interpretation if you take a "doubt that [X]" to be a "suspicion that [X]" (i.e., a "belief that perhaps [X]"); but that's not its usual sense in the formula "remove all doubt that", and I doubt it's what Catbert can mean (though it would explain why (s)he felt it could license an NPI).

And Matt Berends writes:

I think you and Robert Corr are both right. It seems that the sentence "that should remove all doubt that our policies are designed for any reason other than evil" is ambiguous between it is certain that our policies are designed for some reason other than evil (Robert Corr's interpretation) and the policies clearly have no purpose other than evil (yours). However, I don't think the ambiguity turns on NPI "any" or the position of "other". Rather, I think the sentence simply means something like "it is not uncertain whether our policies are designed for anything other than evil", or in other words all matters regarding the design of the policies are settled -- settled because the policies *are* designed for only evil, or perhaps because they're used for good, too. I'm probably on the wrong track, but I thought it was worth a shot.

At this point, I'm beginning to feel the need to call in some formal logic, though perhaps it's too late.

In any case, I'm sure that Samuel Beckett is enjoying this immensely, as he looks on from an outdoor table at some celestial bistro. Here we are, a half a dozen intelligent and highly trained lawyers and linguists and whatnot, and we can't get to a consensus on whether a cartoon character's sentence can be construed to express what (s)he apparently meant it to say about doubt, evil and the randomness of organizational policy. ]

[Tom Recht adds a fine scholarly note:

When I first read your Catbert multiple-negation post I agreed with your judgement that the sentence does say what it means, both at first glance and after logical calculation. But the logic of your correspondents' dissenting opinions seems equally robust. I think what's going on is that "doubt" has two meanings: most commonly it means "suspicion that something is not the case", but it can also mean simply "suspicion or apprehension" (i.e., that something IS the case). The second meaning is rare these days, but it's found in Shakespeare:

'Tis a shrewd doubt, though it be but a dream. (Othello)

KING Methinks the power that Edward hath in field
Should not be able to encounter mine.
EXETER The doubt is that he will seduce the rest. (Henry VI)

SERVANT What, think you then the king shall be deposed?
GARDENER Depress'd he is already, and deposed
'Tis doubt he will be (Richard II)

Under this interpretation Catbert's statement works fine: "remove all suspicion that our policies are designed for any reason other than evil" = "our policies are certainly designed for evil". Who ever doubted that Catbert was an Elizabethan?

Not me. As the Clown says in All's Well that Ends Well, in reference to Parolles:

Heere is a purre of Fortunes sir, or of Fortunes Cat, but not a Muscat, that ha's falne into the vncleane fish-pond of ambiguity, and as he sayes is muddied withall.

(OK, he didn't, but he might have.) ]

Posted by Mark Liberman at 07:15 AM

July 16, 2007

The Grammar Vandal strikes in Boston

According to Danielle Dreilinger, "Self-proclaimed 'grammar vandal' goes after public mistakes that grate", Boston Globe, 7/15/2007:

The ads said "run easy," but they made Kate McCulley's teeth clench.

The 22-year-old grammarian stared at Reebok's Marathon-themed posters on her commute from Somerville to Fort Point this spring, on her way to her job as a research assistant at a concierge services company. "RUN EASY BOSTON," the ads announced, inviting locals to . . . do what?

The question began to haunt her.

"Should I run an easy Boston? Should I run, and is Boston a promiscuous city?" she riffed on her travel blog, katesadventures.com. Her conclusion: "Without punctuation, we have nothing." [...]

On May 29, a memorable date for its linguistic personal import, McCulley cracked. The mild-mannered blogger ducked inside (well, next to) a bus shelter on Summer Street by South Station, pulled out her handy sheet of comma stickers, and made one small correction:

"RUN EASY, BOSTON."

She had become the Grammar Vandal.

Bostonians should be grateful, I guess, that no one has been hacked into pieces and that no windows have been broken.

Meanwhile, the Globe's editorial staff apparently could use a grammatical refresher course (by which, of course, I mean to criticize my profession for failing in its educational mission, not the Globe for failing to find a way to recover from our dereliction). The ellipsis in the quote above includes this editorial admonition:

(Grammar note: "Easy" is an adjective, which must never be used to describe a verb, such as "run"; that task calls for the adverb "easily." A sentence addressing someone directly, such as "Run easily," must separate that address from the party being addressed -- in this case, Boston -- with a comma.)

Whoever wrote this -- it's not clear if it was Ms. Dreilinger or one of her editors -- needs to take the general grammatical point up with William Shakespeare, Jack London, Francis Beaumont, Wilfred Owen, and many other worthies, as discussed in an earlier Language Log post: "Amid this vague uncertainty, who walks safe?", 2/23/2007.

For example, Shakespeare had Lysander say:

For ought that ever I could reade,
Could ever heare by tale or historie,
The course of true love never did run smooth
But either it was different in blood.

But according to the Globe's prescription, the course of true love never did run smoothly.

And this is not just some antique Elizabethan quirk. From C. Day Lewis, The Magnetic Mountain:

5 You were my world my breath my seasons
6 Where blood ran easy and springs failed not,
7 Kind was clover to feet exploring
8 A broad earth and all to discover.

[Update -- an anonymous reader writes:

It seems to me that "run easy" and "run easily" aren't interchangeable. "Run easy" is like "Take it easy;" "easy" here seems to mean something like "laid-back." "Run easily" makes me think of someone who is a naturally gifted runner and can run without having to do a lot of training.

True. In cases like "walk safe" or "go free" or "run easy", the second word is a genuine adjective, used appropriately, not a non-standard adverb form, like real in "real nice". You can see an elaborate example of predicative adjectives with run in this quotation from Alexander Pope (from a letter to Walsh dated Oct. 22, 1706):

It is not enough that nothing offends the ear, but a good poet will adapt the very sounds, as well as words, to the things he treats of. So that there is, if one may express it so, a style of sound -- as in describing a gliding stream, the numbers should run easy and flowing; in describing a rough torrent or deluge, sonorous and swelling, and so of the rest.

Note that there are also some adjectives that can double as adverbs even in formal style, as discussed here.]

[Update #2 -- several readers have written to point out that this echoes the organization invented by David Foster Wallace, a self-described snoot, in his novel Infinite Jest: The Militant Grammarians of Massachusetts. Thus on page 987:

]

[Update #3 -- Jonathan Weinberg writes to remind us of the saying "still waters run deep" (not "deeply"), and the Edward Beach novel and Clark Gable movie "Run Silent, Run Deep" (not "Run Silently, Run Deeply").]

Posted by Mark Liberman at 11:05 AM

The prescriptivist's lament

I can only respond, "Hwanon ferigeað ge fætte scyldas?"

[Hat tip: Tim Leonard]

[Update: more here]

Posted by Mark Liberman at 08:35 AM

With a free colorless green idea in every box!

Yesterday I called for a linguistic response to Ruben Bolling's fantasy of Kellogg's "new marketing campaign ... with mascots we are CERTAIN will not appeal to any kids". This morning's mail brings the first answer, from Sam Young on behalf of Post cereals (unsolicited by Kraft Foods, I hasten to add):

This is an inspired pairing, in my opinion. But we still need a storyboard for the TV spot.

And of course there are many other linguists, and many other cereals.

[Update -- This cereal mascot (from last year) is not a linguist, but it shows that the meme has legs...

And there's a TV ad, too!]

Posted by Mark Liberman at 07:31 AM

July 15, 2007

Listen-a-me!

Zippy returns to the world of clichés, wielded by the fast-talking salesman Shelf-Life, who seems to have no clue about what lies behind them. But he slings casual speech like a pro:

We have miss the boat, get on board, all in black and white, staring right at you, be gold, plain as the nose on your face. And the colloquial discourse-management formulas I'm talking here, you know what I'm saying?, listen to me, I know what I'm talking about, see?, believe it! (plus address vocative pal). And two colloquial uses of got: obligative got in "You gotta get on board!", possessive got in "I got inside information here!" And the repetition for emphasis.

Then there's the representation of casual pronunciations: -in' instead of -ing one hundred percent of the time (8 occurrences); reduced ya for you (one time out of three); y'know for you know (two times out of three); gotta for got to; and my favorite, listen-a-me. (Shelf-Life uses Auxiliary Reduction -- I'm, you're, it's -- throughout, but then so does Zippy, and these particular "contractions" are used by everybody in speech, all the time.)

I wonder if anyone has compared Shelf-Life's language with the language of David Mamet's salesman characters in Glengarry Glen Ross (beyond the obvious difference that Zippy has to stay away from serious obscenities, while Mamet's characters revel in them).

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:21 PM

Luann doesn't read Language Log

Or Science, either. We're not talking about Luann Brizendine -- she's switched sides on this one, at least in interviews -- but Luann DeGroot, the cartoon "typical American girl" created by Greg Evans, who hasn't gotten the memo yet...

Actually, it's not Luann DeGroot either, it's Nancy, her mother. Maybe Luann is better informed.

Posted by Mark Liberman at 11:36 AM

Better than lolcats

Intellectual cereal ads:

Tom the Dancing Bug gives a couple of other inspired examples:

Waspish Brit polemicists, intense German actors, enlightenment philosophers -- OK. But what could be more likely to not sell cereal than linguistics? Polish up your cartooning and photoshopping skills, and send me some storyboards!

Posted by Mark Liberman at 09:37 AM

Weird Logic and Bayesian semantics

There are some common English expressions -- "could care less", "still unpacked", "(not) fail to miss", "cannot be underestimated" -- which are almost always used as if their meanings had a negation added or subtracted.

For example, if unpack means "to remove or undo from packing or a container", as Merriam-Webster tells us, then "still unpacked" should mean "still removed from packing", i.e. "not yet packed", or "not re-packed" or something of the sort. But if you search the web for {"still unpacked"}, you'll find that most of the examples clearly mean "still packed".

(Well, actually, the Observer's Paradox bites us on this one -- the first page of Google hits includes two Language Log posts, a Language Hat post, a Linguist List posting by Larry Horn, and a Boston Globe Brainiac column "Don't fail to miss it" by Jan Freeman. But once you get past the commentary, most of the real uses mean "still packed", e.g. "It's absolutely brand new and still unpacked.")

The standard story about this is that it's hard for our poor monkey brains to deal with the logical interaction of multiple negations. And it's especially hard when some of the negatives are implicit in word meanings (like fail or miss), or when modals (like can or possible) and scalar expressions (like less or still or underestimate) are also involved. We could call this the "semantics is hard -- let's go shopping" theory.

No question, this stuff is difficult. Quick, evaluate Catbert's statement in the middle panel of yesterday's Dilbert:

Catbert obviously intends to say that HR's sole motivation is evil, and that everyone should now recognize this -- but is this really what he said?

Yes, I think so -- but it takes a couple of seconds of conscious checking to be sure, at least for me.

But the thing is, not all combinations of negations, modals and scalar expressions are equally likely to be used illogically, in the end. "Cannot be underestimated" is a case in point. Let's use Google Scholar as our corpus, since the material ought in principle to be carefully written and copy-edited. Here are some counts:

	be overestimated	be underestimated
cannot	7,140	12,900
should not	3,390	27,200
must not	389	4,360

The color-coding is based on my quick evaluation of the 20 examples on the first two pages of returns. My judgment is that nearly all of the "cannot be underestimated" examples are logically reversed, e.g. the first five of the "cannot be underestimated" crop:

The value of prevention in relation to these issues cannot be underestimated, yet it is not easy for physicians to learn...
...many patients have undergone an unnecessary operation whose complications cannot be underestimated...
The importance of this cannot be underestimated.
The need for abstinence from alcohol cannot be underestimated, given its documented synergistic effects on hepatotoxicity when combined with ...

(In these examples, I've read the surrounding paragraph or two in order to be sure of the intended meaning.)

By contrast, in the first 20 pages of returns for each of the other five strings, all of the uses seem to be logically correct, e.g. the first five of the "cannot be overestimated" crop (I'll spare you the other four cells, which you can check for yourself):

In short, Merleau-Ponty's importance to and influence among contemporary philosophers cannot be overestimated.
The importance of this search cannot be overestimated, because success could mean prevention of the heart failure syndrome...
The importance of this book to the development of hydrology as a geoscience cannot be overestimated and will be long-lasting ...
The significance of this condition for the individual and for the family constellation cannot be overestimated by any student of human behavior.
The importance of the boron hydride problem for valency theory cannot be overestimated.

(Thus when Barbara Landau wrote "The importance of this effect shouldn't be overestimated", meaning that its importance shouldn't be underestimated, she committed a very unusual error. In contrast, when Lila Gleitman wrote "The importance of this position cannot be underestimated", meaning that its importance cannot be overestimated, she was behaving the way English speakers usually do.)

The pattern of errors here shows that our monkey brains are not just responding randomly to combinations of negation, modality and scalar expressions. The responses are not always logically correct, according to our analysis at least, but they're far from random.

A second hypothesis that doesn't help is that English, deep down, is still a negative concord language. According to this view, our linguistic DNA yearns to amplify every no into a chorus of negation ("ain't never got nothing from nobody nohow"). Again, the "nothing nohow" theory may be true in general, but it doesn't help predict the specific "cannot be underestimated" pattern.

And a third true but unhelpful hypothesis is that some illogical expressions have simply become idioms or additional word senses. In the case of "still unpacked", this hypothesis is enshrined in the OED's entry for unpacked, which balances the first sense "Not made up in, or put into, a pack" with a second, opposite sense "Not taken out of a pack or parcel". This is no doubt correct, but it fails to explain why the same development occurs spontaneously with some other words, e.g. unwrapped, uncorked, unsealed, unveiled -- but not others, e.g. undressed, uncovered, unplugged.

In the case of "cannot be underestimated", the "illogical idiom" idea fails because other semantically-similar phrases show the same apparent illogic. Maybe it's not a surprise that the active-voice versions work the same way as the passive-voice cases do (data also from Google Scholar):

	overestimate	underestimate
cannot	565	1,310
should not	1,080	6,260
must not	185	1,210

But the same thing happens with phrases like "impossible to over/underestimate", "difficult to over/underestimate":

	overestimate	underestimate
impossible to	804	178
difficult to	2,710	451
hard to	1,640	281
easy to	639	1,540
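
As a rough way to read the table above, one can compute the ratio of "overestimate" hits to "underestimate" hits for each frame. This is only a sketch: the counts are copied from the table, the helper name `over_under_ratio` is a hypothetical label, and a raw frequency ratio says nothing by itself about whether any individual use is logically reversed.

```python
# Counts copied from the table above (Google Scholar hits as reported in the post).
counts = {
    "impossible to": {"overestimate": 804, "underestimate": 178},
    "difficult to": {"overestimate": 2710, "underestimate": 451},
    "hard to": {"overestimate": 1640, "underestimate": 281},
    "easy to": {"overestimate": 639, "underestimate": 1540},
}


def over_under_ratio(frame):
    """Hypothetical helper: 'overestimate' hits divided by 'underestimate' hits."""
    row = counts[frame]
    return row["overestimate"] / row["underestimate"]


# Print one ratio per frame, in table order.
for frame in counts:
    print(f"{frame:14s} over/under = {over_under_ratio(frame):.2f}")
```

On these numbers the three "difficulty" frames favor overestimate by a factor of more than four, while "easy to" favors underestimate.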

And likewise if we replace overestimate and underestimate with overrate and underrate, or overstate and understate:

	be overrated	be underrated	be overstated	be understated
cannot	364	184	14,200	3,510
should not	562	860	3,920	1,170
must not	98	196	291	105

OK, so we've disposed of three theories, which may be true but don't predict the facts in this case: "Semantics is Hard", "Nothing Nohow", and "Illogical Idioms". This leaves (at least) two hypotheses still standing, which I'll call "Weird Logic" and "Bayesian Semantics".

The "Weird Logic" theory was discussed and found wanting, perhaps prematurely, in an earlier Language Log post: "We cannot/must not understate/overstate ... ?", 5/6/2004. Its basic idea is that according to what we might call natural or psychological logic, "it is not possible to underestimate the greatness of X" really means that X is really great.

The "Bayesian Semantics" theory was proposed for the expression "fail to miss" in another old Language Log post, "Why are negations so easy to fail to miss?", 2/26/2004. The basic idea is that Nature (or at least the Speech Community) abhors a semantic vacuum:

...you can fail to do something only if you first intended to do it. It's relatively rare for people to intend to miss something, but missing things is generally easy to do, so when you try to miss something, you usually succeed (and you might describe what you did as avoiding rather than missing, anyhow). Therefore, failing to miss things just doesn't come up very often. Perhaps this hole in the semantic paradigm leaves a sort of vacuum that bad fail to miss rushes to fill?

So the claim here would be that the literal meaning of "it is not possible to underestimate X" is something that people are extremely unlikely to intend to convey, and therefore the form is susceptible to adoption by other, more probable meanings.
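
One way to make this idea concrete, purely as an illustration, is a toy Bayes' rule calculation over two candidate meanings of "cannot be underestimated". Nothing below comes from the earlier posts: the two meaning labels, the prior probabilities, and the likelihoods are all invented numbers, chosen only to show how a rarely intended literal meaning can lose out to a common one even if the form fits the literal meaning better.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule over a finite set of candidate meanings."""
    joint = {m: priors[m] * likelihoods[m] for m in priors}
    total = sum(joint.values())
    return {m: p / total for m, p in joint.items()}


# ASSUMPTION: hypothetical prior probability that a writer intends each meaning.
priors = {
    "literal (X is negligible)": 0.01,
    "reversed (X is very important)": 0.99,
}

# ASSUMPTION: hypothetical probability of producing the form
# "cannot be underestimated" given each intended meaning.
likelihoods = {
    "literal (X is negligible)": 0.5,
    "reversed (X is very important)": 0.05,
}

post = posterior(priors, likelihoods)
for meaning, p in post.items():
    print(f"{meaning}: {p:.3f}")
```

With these made-up inputs the "reversed" reading ends up with most of the posterior probability, because the literal meaning is so rarely what anyone wants to say. Different assumed numbers would of course give different results.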

My breakfast hour is over, and this post is already too long, so evaluation of these theories will have to wait for another day.

Posted by Mark Liberman at 09:06 AM

July 14, 2007

Undernegation of the day

Once you start looking for them, malnegations are everywhere. Margaret Marks of Transblawg sent in this specimen, from Ian Jones, "Rise and fall of a comic genius", The Guardian, 7/12/2007:

Plots swung sickeningly from one cliche to another. Jokes arrived out of the blue for no reason. No attempt was made to cling to reality. Now Homer would end up in new employment six or seven times a series. To date, he's held 118 (and counting) jobs, from missionary to garbage commissioner to grease salesman to fortune cookie writer, which wouldn't be such a damning statistic had almost none of them been particularly funny.

The context makes it clear that Mr. Jones meant something like "...had it not been the case that almost none of them were particularly funny", which he might have tried to render as "... hadn't almost none of them been particularly funny", or perhaps "... had not almost none of them been particularly funny", had not almost neither of those been particularly grammatical.

Then again, maybe he did write "... which wouldn't be such a damning statistic had not almost none of them been particularly funny", and some bewildered copy editor red-penciled one of the words more or less at random.

Posted by Mark Liberman at 02:59 PM

Negative is the new positive

In Mr. Toad's world, good has snowcloned to bad, and Zippy responds with an instance of the snowclone Are We X Yet?:

The models for Are We X Yet? include at least the child's querulous "Are we there yet?" and the question "Are we having fun yet?" (used as a simple catchphrase in a Zippy cartoon I blogged about here). "Are We There Yet?" is a 2005 comedy movie; there's a 2007 sequel "Are We Done Yet?" And you can google up "Are we finished / happy / safe / safer / intimidating / dead / winning / aware / automated / overachieved / paranoid / paranoid enough / scared / doomed / dysfunctional yet?" (these just from the first 50 Google webhits I got). Some of these might be intended literally, with no allusion to the formula, but many strike me as allusive.

You can also find occurrences of the formula with now rather than yet -- lots of "Are we having fun now?", and some with elaborations on the X slot, as in "So, here's the thing: are we middle-aged now?" (Ann Burlingham, on the newsgroup soc.motss, 1/21/05), which asks the question about middle age but also plays on the formula.

[Added 7/15/07: I see that Zippy used "Am I empathetic yet?" in a cartoon I posted a while back: a play on "Misery loves company".]

As for The New Y, it made it into the Snowclones Database on 7/1/07. We've blogged about it many times on Language Log since 2003; my last posting on the snowclone was on 1/18/07, but people send me fresh sightings every so often.

That posting had a diagram put together by Randall Szott at Leisure Arts, showing instances of the snowclone collected during 2005. John Emerson noted immediately that the Leisure Arts sample had plenty of "X is the new black", but you could google up a fair number of "black is the new Y" (for Y = pink, green, blue, blonde, white, yellow, gay, Jewish,... -- that is, with black taken to be either a simple color name or else a social group identifier), so that a more complete chart would have arrowheads at both ends of some of its lines.

But of course the diagram wasn't intended to be an account of all the occurrences of The New Y out there. It would be insane to try to inventory them all, since new ones are created every day. The figure is all over the place; it might well be the most common technique these days for drawing a vivid analogy between earlier Y and current X. The figure does require the reader or listener to do some serious interpretive work, by finding the appropriate context for comparing X and Y -- unless, of course, the context is made explicit, as in a couple of the examples below (marked with an X for explicitness).

Vegetarian is the New Prius (1/20/07 column here; thanks to Paul Handford)

For bears, 30 is the new 90. ("Reported Elsewhere" column here; thanks to Bill Poser, 1/22/07)

How wasabi became the new black, and other tales from the color industry. (subhead for "Made in the shade", by Eric Konigsberg, New Yorker 1/22/07, p. 42)

Internet video is the new baby kissing (NPR's Morning Edition 1/23/07, referring to presidential candidates conducting press conferences via the web; thanks to Evan Bradley)

Manorexia is the new pink (head for Darren Franich opinion column, Stanford Daily 1/30/07, p. 4)

Why art is -- and is not -- the new fashion ("Style" column head, NYT Magazine 2/25/07, p. 71)

[X] [Video of Klan leader: Illegal immigrants is bringing us far more members that we did when we were just totally against any ethnic group.]
Comedian Lewis Black: That's right! When it comes to hate, Mexican really is the new Black! (Daily Show 2/28/07; thanks to Karen Davis)

[X] ... insisted Adam Michnik, the Polish writer, "Poland is the new Spain, absolutely." He continued: "Spain was a poor country when it joined the European Union 21 years ago. It no longer is. We will see the same results in Poland." (Roger Cohen, "For Europe, A Moment To Ponder", NYT Week in Review 3/25/07, p. 1)

The finest general study to date of the freshwater-supply crisis in Florida. Drinking water is the new oil. Get used to it. (blurb by Michael Gannon, Univ. of Florida, for Cynthia Barnett's Mirage, NYRB 5/10/07, p. 2)

With Republicans in revolt over the surge and losing patience, and Bushies worried... that "July has become the new September," the president decided to do a p.r. surge... (Maureen Dowd, "History As An Alibi", NYT 7/11/07, p. A23)

And then, for true enthusiasts of The New Y (and American popular culture), a comedy routine by Rev. Mitcz, available on YouTube (hat tip to Eric Baković), with a dozen X Is The New Ys in sequence, all with at least some explanation:

iPod : Walkman
George Bush : Richard Nixon
SUVs : minivans
high class call girls (doing oral) : Fabergé eggs
babies : over-hyped Prada bags
Pat Robertson : Jerry Falwell
Scientologists : Mormons
emo devil's lock haircut : big Heavy Metal hair
emo : goth
straight-edge : out-of-control coke addict
butt implants : boob implants
throat fucking : rough anal

Remember: Rev. Mitcz said it; I didn't.

This closes out my accumulated The New Y files. I posted because of the Zippy cartoon, with its trio of examples of the form "opposite-of-X is the new X" (for positive X), which I took to be notable. Now I'd like to take a vacation from this ubiquitous figure.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:19 PM

Multiplex negatio ferblondiat

Doonesbury for 7/13/2007:

Multiple negatives can indeed be tricky -- especially in combination with modals and scalar predicates -- as I was reminded by an email exchange over the past few days.

It started with a message from Lila Gleitman about a coincidence:

First Anna Papafragou and I wrote a paper, it happened to be a review of Whorfian perspectives, and at one point we are pointing out that this approach, if true and general (which it happens not to be!), would be a radical departure from prior thinking and therefore would be of great importance. So we wrote:

(1) The importance of this position cannot be underestimated.

Of course I shouldn't have just said "so," but rather "but," because what we wrote is the opposite of what we meant to convey, unless you conjecture some Freudian-like slip exposing our underlying Whorf-skepticism. The interesting thing is that neither Anna nor I nor our editors nor most readers noticed the slip, until Barbara Landau who read it some months after it was published, Barbara, of course, was laughing at me and Anna at this point. So imagine my glee (and Barbara's chagrin) when just the other day I came across the following, which was the last line in a recent LANDAU paper. Here, Landau is speaking of an experimental effect that she and colleagues had achieved and she meant to say that these had broad theoretical implications, but she wrote

(2) The importance of this effect shouldn't be overestimated.

Funny coincidence, these two verbal gaffes. But afterwards I was trying to reflect on what makes such expressions tricky to assign the right interpretations (to). Something about how the over/under is interacting with both the negatives and modals, but I found that I could not think how to describe this. Why should "should not" and "cannot" yield the interpretations of under/overestimate that they do here?

I responded that there's a series of Language Log posts on this and closely-related subjects, which collectively offer several (non-exclusive) hypotheses about why over- and under-negation are so easy to fail to miss:

1) People get confused about multiple negatives and/or scalar predicates, etc.
2) The connection between English and modal logic may involve some unexpected ambiguities;
3) Negative concord is alive and well in English (or in UG);
4) Odd things become idioms or at least verbal habits ("could care less"; "fail to miss"; "still unpacked").

A message from Kai von Fintel reminded us of a seminal post on semantics etc., citing the ancient wisdom that "no head injury is too trivial to ignore".

And Larry Horn cited his even more seminal 1991 CLS paper "Duplex negatio affirmat...? The economy of double negation" (The seminality of this paper would be further enhanced by the availability of old CLS papers in digital form, so that people could actually, like, read them -- hey, Chicago Linguistic Society, how about it? In this particular case, I'll see if I can find a copy to scan...):

First, some instances illustrating the principle I suggested calling "Triplex negatio confundit", which I attribute to the familiar difficulty (citing P. Wason, H. Clark, and my negation book) of processing negation. I claim there that "The tendency to use a triple negation to convey a positive is especially prevalent when at least one of the negatives is incorporated into the adverb too or as an inherently negative predicate like surprised, avoid, deny, or doubt, as seen in the examples below. In these cases, we have the effect of a REINFORCING ("illogical") double negation canceling out an ordinary negation, yielding a positive."

There was none too poor or too remote not to feel an interest.

Jane Austen, cited in Jespersen (1917: 78-9)

No detail was too small to overlook.

New Yorker 12/14/81, Words of One Syllable Department

People knew too little about him not to vote against him.

Bill Moyers on why voters in 1984 primaries voted for Gary Hart

Nothing is too small or too mean to be disregarded by our scientific economy.

R. H. Patterson, Economy of Capital (1865), cited in Hodgson (1885: 219)

No one is too poor not to own an automobile.

Review by Vincent Canby (N. Y. Times 1/22/84) of "El Norte", characterizing the naive belief of two young illegal Guatemalan immigrants about riches of America

There was no character created by him into which life and reality were not thrown with such vividness, that the thing written did not seem to his readers the thing actually done. J. Forster, Life of Charles Dickens (1873), cited in Hodgson (1885: 219)

I can't remember when you weren't there,
When I didn't care

For anyone but you...

Opening lines of Kenny Rogers pop song "Through the Years"

I can't say I don't blame him.

Radio disk jockey; meaning in context = 'I don't blame him'

I have but one comfort in thinking of the poor, and that is, that we get somehow adjusted to the condition in which we grow up, and we do not miss the absence of what we have never enjoyed.
Froude, Nemesis of Faith, cited in Hodgson (1885: 218)

It never occurred to me to doubt that your work...would not advance our common object.
Darwin, cited in Jespersen (1917)

One senior White House official said no one ever doubted that Mr. Reagan would allow Mr. Meese's move to the Justice Department to deprive him of a trusted adviser who had served him in his 1980 campaign and later as counselor to the President.

N. Y. Times article "Politics and the Attorney General", 4/21/85

There is no doubt that the commissioner will not give Pete an impartial hearing.

Pete Rose's lawyer Reuven Katz in radio interview, 8/24/89, expressing his (perhaps premature) confidence in Commissioner Bart Giamatti's fairness

I would not be surprised if his doctoral dissertation committee is not composed of members from several departments within a university.

Letter of recommendation for applicant to Yale Graduate School

(cf. Don't be surprised if it doesn't rain.)

We sincerely hope and insist that peaceful means should be used to solve the Taiwan issue...China has never committed to not taking nonpeaceful means to solve the Taiwan issue simply because such a commitment would make peaceful reunification impossible.

--Chen Defu, Chinese Embassy Press Counselor, letter to editor of N. Y. Times, 7/18/89, A20

As Hodgson (1885: 218) muttered gloomily a century ago, "Piled-up negatives prove easy stumbling-blocks." [This is the useful, if occasionally a bit prescriptive, Hodgson, W. B. (1885) Errors in the Use of English. New York: Appleton.] Then I moved on to the effect of four negations, predictably overwhelming the poor language mechanism and resulting in a range of extremely unfortunate examples:

No one denies that a baby with a neural tube defect isn't a catastrophe, but...
Dr. Philip LaMastra, quoted in the New Haven Advocate, 8/19/81

I have never known another reciter of a speech who could avoid weakening the sentences in his mouth by not thinking of the one that was to come.
H. Cockburn, Memorials (1874), cited in Hodgson (1885: 219)

"Bernie produced what Bernie is supposed to produce", Smith said, "but I don't think, either, that you can single out Bernie as not a guy who is not part of the disappointment."

New York Rangers' general manager Neil Smith, declining (I think) to absolve star forward Bernie Nicholls for his play in the team's first-round playoff series loss, N. Y. Times, 4/15/91, C3

Based on such cases, I proposed the generalization "Quadruplex negatio ferblondiat," with the explanatory footnote:

For readers not familiar with the somewhat obscure Late Latin verb employed here, the standard pronunciation of this loanword is farblondzhet.

My paper contained a footnote relating this legal case, courtesy of a possibly interesting legal application of Triplex Negatio Confundit:

[footnote 13]
Bryant ([English in the Law Courts, New York: Columbia U. Press] 1930: 264) cites a case from the Alabama state court in 1912 in which Aletha Allen, an 80-year-old deaf woman, was killed by a train after having been warned not to go onto the track, prompting her estate to sue the Central of Georgia Railway Company. The original verdict was for the defendant, the jury finding the late Ms. Allen guilty of contributory negligence, but a new trial was granted because of an errant Triplex Negatio:

The charge to the jury had been that unless the jurors believed from the evidence that the engineer did not discover the peril of the woman in time to avoid injury [emphasis mine--LH], they must decide in favor of the defendant. The higher court held that unless meant "if not", the use of the double negative having the effect of making the charge predicate the defendant's right to an acquaintance [sic] based upon the fact that its engineer did see the dangerous position of Aletha Allen in time to prevent the injury. The jury overlooked the grammatical inaccuracy, as the court did, and interpreted the charge as a correct proposition of the law. Thus the court ordered that the original verdict be adhered to.

The decision was based on the interpretation of the intent. The court did not intend to use a double negative; the jury, not realizing that a double negative was used, gave their verdict accordingly.

Thus the principle that Triplex Negatio Confundit is enshrined in the halls of justice, at least in Alabama.

In a subsequent email, Larry went into further etymological detail on ferblondiare:

This very useful term is a Late Latin verb borrowed from the Yiddish. As one web site (GantzehMegillah.com) puts it,

There are certain words in Yiddish which have no English equivalent - and more's the pity. One of my favorites is usually pronounced "far-BLUN-jit," and written "farblondzhet." Its majesty is owed to the fact that, in only three syllables, it describes people in situations that have spun totally out of control, well beyond the descriptive limits of chaos, confusion and emergency. Even the classic "SNAFU" and "FUBAR" of World War II and the "Chinese Fire Drill" of earlier vintage failed to embrace the full range of cataclysmic situations embraced by "farblondzhet."

I was curious about the reference to Jane Austen, attributed to a citation by Otto Jespersen: "There was none too poor or too remote not to feel an interest." That sounds like something Jane Austen might have written, at least in terms of its content, but where? Searches on Google and on Literature Online, surprisingly, come up empty.

I guess that "Jespersen 1917" would be one of the editions of J's seven-volume opus "A Modern English Grammar on Historical Principles". [Update: but my guess is wrong -- it's "Negation in English". Still, see below.] Several versions of volume 4 (on Syntax) are available from the Internet Archive. A search turns up the cited sentence, on pp. 455-456, and it turns out that it's not Jane Austen at all!

The crucial sequence is

| Austen P 133 he can have nothing to say to me that anybody need not hear [= that any- body may not hear; that it is necessary that nobody hears]
| NP 1899 there was none too poor or too remote not to feel an interest

Thus Jane was responsible (in Pride and Prejudice) for a sentence in which (Jespersen believed) the interaction of modality and negation went awry ("He can have nothing to say to me that any body need not hear") -- but "NP 1899" is Jespersen's abbreviation for (generic) "Newspaper in 1899", and so Jane is innocent of the "none too poor" sentence.

Posted by Mark Liberman at 11:01 AM

July 13, 2007

The spoils of linguistic piracy

The Linguistic Mystic links to a very silly post by jedimasterwendy from the San Antonio Forum, "Speak English, Your In America Now", which urges us to

Petition to NOT modify our native language to include any foreign language.

This 12-word slogan has the lovely self-refuting property that (according to the OED) all six of its content words are borrowed from other languages: petition from Spanish peticionar, modify from French modifier, native from French natif, language from French langage, include from Latin inclaudere, foreign from French forain.
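
The word count and the tally of content words are easy to check mechanically. The sketch below is only that: the list of function words is a hand-picked assumption, and the source languages are simply the ones named in the paragraph above, not an independent lookup.

```python
slogan = "Petition to NOT modify our native language to include any foreign language."

# Split on whitespace, drop the final period, and lowercase for comparison.
words = [w.strip(".").lower() for w in slogan.split()]

# ASSUMPTION: hand-picked function words for this one sentence.
function_words = {"to", "not", "our", "any"}

# Source languages as listed in the paragraph above.
borrowed_from = {
    "petition": "Spanish",
    "modify": "French",
    "native": "French",
    "language": "French",
    "include": "Latin",
    "foreign": "French",
}

# Distinct content words are whatever remains after removing function words.
content_words = {w for w in words if w not in function_words}

print(len(words), "words in the slogan")
print(len(content_words), "distinct content words")
print(all(w in borrowed_from for w in content_words), "= every content word is listed as a borrowing")
```

Note that "language" occurs twice in the slogan but counts once among the distinct content words.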

At this point it's obligatory to quote, again, James D. Nicoll's 5/15/1990 post to rec.arts.sf-lovers:

The problem with defending the purity of the English language is that English is about as pure as a cribhouse whore. We don't just borrow words; on occasion, English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary.

[Update -- Bill Poser writes:

In this connection I have always found it amusing that the title of the major work of the traditional Japanese linguists, Tokieda's Kokugogaku Genron "Theory of the Study of the National Language", is entirely in Chinese.

This has not been a one-way trade. According to Zhou Chenggang and Jiang Yajun, "Wailaici and English borrowings in Chinese", English Today 20(3) 2004:

Borrowings from Japanese form a special category in Chinese, because most of them were acquired without graphic adaptations [i.e. the spelling was also borrowed -- myl]. Such borrowings amount to almost half of the words in Chinese dictionaries of neologisms (cf. Massini 1997:158). There are two types: coinages by Japanese using Chinese characters and Chinese-cum-Japanese-cum-Chinese borrowings. Massini (1997:154) calls the former original loans and the latter return loans. These may be hard to distinguish...

Both the term wailaici (literally 'word that came from the outside') and its predecessor and synonym wailaiyu ('language from the outside') are themselves wailaicis from Japanese (Massini 1997:153).

Another Japanese "return loan" is zhuyi "doctrine" as in shehui zhuyi "socialism" and ziben zhuyi "capitalism".

Lexical borrowing into Chinese is nothing new. According to Jerry Norman, Chinese, pp. 16-17

China's later cultural hegemony in East Asia has been confused with a kind of cultural and linguistic immunity which exempted Chinese from any but the most trivial of outside influences. Widespread acceptance of such a view has no doubt impeded a serious search for foreign influence in Chinese.

The Sino-Tibetan forebears of the Chinese must have come into contact with peoples speaking different and unrelated languages at a very early date. Unfortunately, we do not know when the earliest Sino-Tibetan groups moved into the Yellow River Valley, nor do we know what sort of population they encountered when they arrived there. We can surmise, however, that the Sino-Tibetan dialect which is ancestral to Chinese was influenced in a number of ways by the language (or languages) which they encountered at this early period. Some of the typological difference found between Chinese and Tibeto-Burman may in fact be due to such early linguistic contacts. The fact that only a relatively few Chinese words have been shown to be Sino-Tibetan may indicate that a considerable proportion of the Chinese lexicon is of foreign origin. Some of the words for which Sino-Tibetan etymologies are lacking may go back to languages which have since become extinct; in other cases, however, words were borrowed from languages whose descendants still live along the present periphery of the Chinese-speaking area. Here I would like to examine several words that I think are ancient loanwords from languages whose descendants are still spoken in China, or in countries adjacent to it.

Among the words that Norman proposes as ancient loans are gǒu "dog" (from Miao-Yao), hǔ "tiger" (from Mon-Khmer), dú "calf" (from Altaic).

]

Posted by Mark Liberman at 04:53 PM

July 12, 2007

Dropping the other shoe: "Republic[an]s and Democrats"

Well, I didn't think that he'd do it, but apparently he did. Lane Greene emailed that Waldo Jaquith and Andrew Sullivan reported that President Bush used the phrase "Republics and Democrats" in his press conference this morning. This is news, of a sort, because of the fuss over W's previous uses of "Democrat" in place of "Democratic", argued to be part of a general Republican strategy to delegitimize, disrespect and annoy the majority party. After his last State of the Union message, W offered his Texas dialect as an excuse, along with the unrelated assertion that he's "not good at pronouncing words anyway". Roger Shuy investigated and found no general tendency for him to drop (the ends of) syllables -- at least not -ic -- but apparently syllables near -ic are also in danger, and apocope can happen even to Republic[an]s.

The press conference in question is this one. The transcript has "Republicans":

And as I mentioned, to talk to Bob Gates about it, as well as the Joint Chiefs about it, as well as consult with members of the Congress, both Republicans and Democrats, as I make a decision about the way forward in Iraq.

But what W says is pretty clearly "Republics":

As far as I know, this is neither a dialect feature nor a calculated slur, but simply a speech error. One hypothesis might be that the prospective repetition of [ən] in "Republicans and" ([ɹiˈpʌb.lə.kən.zən]) led to suppression of the first syllable nucleus.

Jon, commenting at Waldo Jaquith's site, quips that "a Republic is someone who belongs to the Republicanic party".

Then again, maybe it's just turn about as fair play. After all, it's now those Republic[an][ic]s who are becoming a thorn in W's side.

Posted by Mark Liberman at 05:39 PM

Eggcorns on OUPblog

A couple of weeks ago Mark Liberman was kind enough to announce my new blogging venture on OUPblog, the official blog of Oxford University Press. Wearing my new hat as editor for American dictionaries at OUP, I hold forth on matters lexicographical in a weekly column with the catchy title "From A To Zimmer." Some columns will be of particular interest to Language Log readers, and this week certainly fits the bill, as it's all about one of our favorite topics: those semantically motivated reshapings of words and phrases we call eggcorns.

If you're new to the field of eggcornology, I hope the column serves as a useful primer. Now that I'm at OUP, it's exciting to be able to poke through the 1.8 billion-word (and counting) Oxford English Corpus to track the mainstreaming of popular eggcorns. Last year, before I started the Oxford job, I tried to discern the utility of the OEC for eggcorn-tracking in a Language Log post based on press reports about the Corpus, but now I have the inside scoop!

[A word of warning: Chris Waigl's indispensable Eggcorn Database, which I link to in the OUPblog post, is currently down. I hope I haven't flooded it with new traffic.]

[Update, 3 pm EDT: The Eggcorn Database is back up and running.]

Posted by Benjamin Zimmer at 09:07 AM

The grass is always greener on the other side of the predicate

Sarah Churchwell, writing in the Independent ("Why can't British students write like Americans?", 7/12/2007), tells us that

The combined result of Thatcher's decision that teaching English was supererogatory and the dim memories of people for whom parsing was punishment is written mayhem; my students use punctuation marks interchangeably, as ornamentation, and their malapropisms are worthy of a Restoration comedy (as are their proliferating capital letters). This spring I read about a character who wore "promiscuous clothes", and a novel that was "mellow-dramatic". One essay announced: "I will now offer a few examples to undermine my position". Another wrote of a character who has been misled by the promises of the American Dream: "This concept of her being an aspirant is shown through her excessive longing to be Ben Franklin." Their sentences aren't always this funny, but they're often this garbled, because students are guessing at grammar.

I'm an enthusiast for the idea of teaching linguistic analysis in what used to be called "grammar school". But Dr. Churchwell's examples demonstrate faults of vocabulary, not syntax -- except for the sentence about Ben Franklin, which has no obvious linguistic flaws, except perhaps for a stylistically questionable use of "this".

No amount of parsing practice will teach students that undermine doesn't mean "underline", or persuade them to reject the plausible hypothesis that the word melodrama includes the morpheme mellow. The only effective medicine for symptoms like that, I'm afraid, is to do more reading -- though some systematic vocabulary exercises probably wouldn't hurt.

In fact, if Dr. Churchwell hadn't brought up parsing, I'd have classified this as an example of the practice of using "grammar" to mean "language standards and conventions", and ignored it. But the first paragraph of her essay makes it clear that she's serious about this having something to do with syntax -- and even more surprisingly, that she believes that American youth write better because they are still taught to analyze sentences:

Every spring, I have Rex Harrison's voice in my head, singing: "Why can't the English teach their children how to speak? Norwegians learn Norwegian; the Greeks are taught their Greek." For eight years I've been teaching extremely bright, overwhelmingly middle-class university students studying American and English literature, who achieved minimum A-level scores of three Bs. They are intelligent, skilled at passing exams, and most of them don't know what defines a complete sentence. This is not sarcasm: every year I ask my students to name the three parts of a complete sentence. Usually they mumble, "subject, verb, object" or "subject, verb, predicate". I have never had an English student who knew the answer. The Norwegians and the Greeks do. So do the Americans, because they were taught grammar, vocabulary, and spelling. The majority of middle-class Americans who went to a state school, like me, have known the definition of a complete sentence since age seven. (In case anyone is wondering, the answer is: subject, predicate - which essentially means verb - and complete thought.) [emphasis added]

Although I mostly paid attention during first grade, in the Buchanan elementary school in Mansfield Center, Connecticut, I'm afraid that this tripartite division of the sentence is news to me. But I was often tardy, due to stopping off at the creek or the cow pasture or other attractions on the way to school, and I might have missed that lesson.

Of course, experienced linguists know that "complete thoughts" can be hard to find, and even locating the predicate can be a challenge for novices, as Dave Barry famously explained:

Q. Please explain how to diagram a sentence.
A. First spread the sentence out on a clean, flat surface, such as an ironing board. Then, using a sharp pencil or X-Acto knife, locate the "predicate," which indicates where the action has taken place and is usually located directly behind the gills. For example, in the sentence: "LaMont never would of bit a forest ranger," the action probably took place in a forest. Thus your diagram would be shaped like a little tree with branches sticking out of it to indicate the locations of the various particles of speech, such as your gerunds, proverbs, adjutants, etc.

More seriously, I'm sure that I was told more than once in school about the standard old-fashioned definition of a sentence as "a group of words containing a subject and a predicate and expressing a complete thought". (Who first coined this definition, I wonder, and when?) But as the Cliffs Notes summary of grammar explains, "for this definition to be helpful, you must be able to recognize a subject and a predicate and understand what is meant by 'a complete thought.'" That's where practice with the X-Acto knife comes in.

Even more seriously, it's been a long time since most American students learned any useful grammatical analysis in school. For example, though nearly all college freshmen know that they're supposed to avoid using the passive, fewer than 10% of them (at least among those that I've tested) can accurately locate the passive verbs in a passage of text.

On the other hand, inoculation with the variety of grammar that used to be taught in American schools hasn't prevented Dr. Churchwell from writing in a way that sincerely confused Edward Wilford, who wrote to me from England:

I thought you might be interested in this column, run in the Independent on 11 July. The title asks 'Why can't British Students Write Like Americans?' My favourite claim is that all American students know that a sentence consists of three parts: a subject, a predicate, and a 'complete thought'. I've asked many people in the department and no one has given that definition, or confessed to ever having heard it in their schooling. Is this something taught in American schools?

But no doubt a stiff dose of modern linguistics would have immunized her against the compositional choices leading to that semantic miscommunication :-).

[Update -- Simon Overall writes from Australia:

I was interested to see that Dr. Churchwell's page at the University of East Anglia lists her "principle interests"...

This demonstrates that the Hartman/McKean/Skitt Law of Prescriptive Retaliation is in good working order, if only indirectly. Except to make this point, though, I would never tease anyone about a spelling error, since I'm a rather careless speller and a terrible proofreader. I already had to correct "innoculate" in this post, and it wouldn't surprise me to find another mistake or two.]

Posted by Mark Liberman at 08:57 AM

July 11, 2007

Keeping "wrong grammar" off the air

According to Variety Asia, the Thai government has proposed a new television rating system ("Thai TV sector protests against proposed ratings system", 7/11/2007):

The proposed TV system will give a suitability rating to every TV program that goes to air. Ratings will then be used to specify what time of the day the show can be aired. A show designated as "content requiring parental guidance" will only be allowed to screen between 9am and 4pm on weekdays and 8pm to 5am on weekends and public holidays.

Among the reasons for the "requiring parental guidance" flag:

A program will be assigned a "PG" rating if it shows "people speaking with wrong grammar (except for humorous effects)."

The article doesn't say who gets to be the "grammar cop" -- some colonel with time on his hands, I guess, who would presumably delegate the problem to a clerk. There's potential for a comic novel here. Usually it's self-appointed language mavens who get to make up arbitrary prescriptions. Imagine, however, being a young company clerk in Bangkok, endowed with the power to decide (say) that dai "get, be able to" can't be used with compound verbs, or that theung "although" should never be used to start a sentence. And given the general Southeast Asian areal interest in subtle word-play, you could even invent some politically subversive grammatical prescriptions.

[Hat tip: Brett Reynolds]

Posted by Mark Liberman at 11:07 AM

Autocompletion considered embarrassing

Email address autocompletion is an underappreciated peril of modern life. Once, the danger was failing to distinguish between "reply" and "reply all", or failing to notice that even a simple "reply" would go to a mailing list rather than to an individual. There are many entertaining stories, some of them true, about the consequences of this sort of carelessness.

But if you use an email agent that helpfully provides autocompletion suggestions from your address book, and harvests addresses from every passing message, then a whole new set of mistakes become possible.

Years ago, I once sent... well, never mind. More recently, I composed an elaborate explanation of issues in porting some speech-technology software from Linux to OS-X, and sent it to the CEO of a health-care start-up, whose only connection to the software in question was sharing the same first name as the person I meant to correspond with.

The July 10 Achewood offers an example of autocompletion embarrassment that also includes a dig at the lolcats phenomenon. Chris Onstad seems to agree with Geoff Pullum about this, despite being the author of a comic strip that "portrays the lives of a group of anthropomorphic stuffed toys, robots, and pets":

(I feel Ray's pain -- though in my defense, I never actually transferred the slogan from the "cookie" picture, much less sent it to a journalist.)

Anyhow, there's an application for simple-minded AI here: an email agent that could associate sets of "topics" (represented as regions in n-gram space, or something like that) with individuals, evaluate how well a given message fits with different people's profiles, factor in your own past communication patterns, and act accordingly. The program could re-order suggested addresses; red-flag addresses that seem unexpected; or whatever.

True, you could make things worse, by introducing an obtrusive "helper" like Clippy or a damaging meddlesome misfeature like the "December 1 DWIM effect". But you wouldn't do that, would you? And an unobtrusive and lossless intervention might actually be helpful.
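For the curious, here is a minimal sketch of what such an unobtrusive check might look like. It is only an illustration under stated assumptions: the "regions in n-gram space" are approximated by bags of word unigrams and bigrams, fit is measured by cosine similarity, and the function names, the example addresses, and the 0.1 warning threshold are all hypothetical rather than taken from any real mail client.

```python
import math
import re
from collections import Counter


def ngrams(text, n=2):
    """Count word n-grams of sizes 1..n in a message (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    grams = Counter()
    for size in range(1, n + 1):
        for i in range(len(words) - size + 1):
            grams[" ".join(words[i:i + size])] += 1
    return grams


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(history):
    """Build one n-gram profile per address from (address, text) pairs."""
    profiles = {}
    for address, text in history:
        profiles.setdefault(address, Counter()).update(ngrams(text))
    return profiles


def rank_candidates(draft, candidates, profiles, warn_below=0.1):
    """Re-order autocompletion candidates by topical fit with the draft.

    Returns (address, score, flagged) tuples, best fit first. An address
    is flagged when its score falls below the (assumed) threshold; the
    message itself is never altered or blocked.
    """
    draft_vec = ngrams(draft)
    scored = []
    for address in candidates:
        score = cosine(draft_vec, profiles.get(address, Counter()))
        scored.append((address, score, score < warn_below))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Hypothetical mail history: two correspondents who share a first name.
history = [
    ("chris@speechlab.example",
     "The speech software builds on Linux but the port to OS X fails"),
    ("chris@healthco.example",
     "Quarterly funding update for our health care start-up"),
]
profiles = build_profiles(history)
draft = "Notes on porting the speech software from Linux to OS X"
ranking = rank_candidates(draft, list(profiles), profiles)
```

With this toy history, the speech-technology correspondent is ranked first and the health-care address is red-flagged. The intervention stays lossless in the sense above: it only re-orders and flags suggestions.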

Posted by Mark Liberman at 09:31 AM

July 10, 2007

Annals of Exoticism

Robert Neal Baxter sent this quotation from Michel Malherbe, Parlons Maori (p. 20):

De plus, les associations d'idées conduisent à des emplois souvent inattendus: le mot "ihupuku" qui signifie frugal est employé pour désigner la classe économique dans les avions. Tout aussi amusant, le mot "utu", signifiant prix, paiement, avait originellement le sens de vengeance, et s'appliquait au prix à payer pour laver un affront.

In addition, the association of ideas often leads to unexpected usages: the word "ihupuku", which means frugal, is used to designate economy class in airplanes. Equally amusing, the word "utu", meaning price or payment, originally meant "vengeance", and applied to the price to be paid in recompense for an injury. [translation by myl]

Robert comments:

I really do find it difficult to understand why people find such trivial facts as this so quirkily amusing ostensibly based exclusively on their being used in "exotic" languages by "exotic" peoples. Surely it's no more "amusing" than the fact that the French word for "work" ("travail") derives from an ancient torture device, the "tripalium"...

We've posted about linguistic exoticizing before, more famously in John McWhorter's "Mohawk philosophy lessons" (11/18/2003), and (with a different twist) in various posts on "Etymology as argument".

This case is especially striking because the first semantic extension cited for Maori -- "frugal" used for "economy class" -- seems remarkably unremarkable, being more or less identical to the European-language equivalents.

And Robert's reference to the etymology of travail is a good point. The OED says that travail, v. is

... held by Romanic scholars generally to represent a late pop.L. or Com. Rom. *trepāliāre, deriv. of trepālium (582 A.D. in Du Cange), an instrument or engine of torture (prob. f. L. trēs, tria three + pālus stake, being so named from its structure). The etymological sense was thus 'to put to torture, torment', passing at an early stage into those of 'afflict, vex, trouble, harass, weary'. Through the refl. sense 'to trouble, afflict, or weary oneself', came the intrans. 'to toil, work hard, labour'. Thence also (as is generally thought) the verbal ns. OF. travail m. and travaille f., ME. travail, -aile:

This made me idly curious about what a trepalium actually was. Lewis & Short seems to have no entry for that word; in fact, a search of the whole Perseus site for "trepalium" turns up nothing. And Google's index didn't turn up much besides references to a French Death Metal band.

There's a fantasized picture here, sure to appeal to robot-fetishizing masochists, but the fact that the name of the device is misspelled doesn't fill me with confidence that its structure is rendered in a historically accurate way. If you have better information, let me know.

[As for English work, its etymology is not much fun at all, being merely derived from Indo-European *werg- "to do". Though the AHD does tell us that other English words derived from this same root include allergy, surgery, wrought, and orgy.]

[Update -- Robert N. Johnson and Eulàlia de Bobes both pointed out that Lewis & Short has an entry for "trĭpālis, that has, or is propped up by, three stakes or pales: vineae, Varr. ap. Non. 219, 18". But this would have been a structure for holding up grapevines, though I suppose it might have been repurposed for punishment.

Eulàlia also reported on changes over time in the (Spanish) dictionary of the Real Academia de la Lengua:

They first (1739) refer to 'tripalium' and they mention it is a torture instrument;

then, in the following editions, they don't inform about the etymology (1780-1789);

later on, then they change their mind and they say it comes from a gothic word "dreiban" (1884);

in 1914 they stop mentioning the etymology of 'trabajo' or 'trabajar' (they just give the italian and french translation);

in 1950 they don't even give those translations;

and in 1956 they inform again it comes from 'tripalium' but they say it is an instrument to fix the horses;

they keep this version until 1984, but in 1989 they remove the etymology.

Finally in 1992 edition they mention again it comes from 'tripalium', but don't say anything about the meaning of this word in Latin.

She also observes that "'tripalium' and 'tripalus' in Google find quite a few occurrences and explanations".

Marie-Lucie Tarpent informs us that the Petit Robert gives a horse-restraint meaning for (French) travail, but specifying construction in stone:

- meaning 1 ((summarized)): a) painful state, suffering, including that of a woman in childbirth; b) work [a long list]
- meaning 2: ((quote)) Techn. Dispositif servant à immobiliser les grands animaux (chevaux, boeufs) pour pratiquer sur eux certaines opérations. "On ne les ferre ((les chevaux)) que dans un travail des plus solides non en chêne, mais en granit." (Hugo)
A contraption serving to immobilize large animals (horses, oxen) in order to perform various procedures on them. "They shoe them (horses) only within a very strong 'travail', made not of oak, but of granite".

James Russell turned up J. Cary Davis, "'Trabuculu >> Trabajo' the Case for and against", Hispania 60(1), pp. 108-110, 1977. This paper cites some sources using the spelling "tripalium", so I should not have complained about the artist's spelling merely on the basis of what's in the OED.

More to the point, Davis begins

To those of us who once blithely accepted in good faith that model of phonological development first suggested by Diez -- Latin TRABACULU(M) >> Sp. trabajo, Port. trabalho (et al.) -- it came as a rude shock to find most later authorities summarily dismissing this etymon in favor of the Latin TRIPALIUM. The evidence for the latter choice, in various forms and languages, was clearly overwhelming.

but concludes

Do we not have here a confusion or contamination between trebejo-trebelho-trabelho, from whatever source, and trabajo-trabalho from TRIPALIUM (TRABACULUM??), influenced surely by the various derivatives from TRABE itself?

In other words, if I understand his argument, he's suggesting that the Romance words for work were... eggcorns! ]

[Update #2 -- John Cowan writes:

I think, based on a bit more research, that no post-Roman ever bothered to describe a trepalium, and everything we think we know about it is based on context and etymology.

You can get a bit more, and better, information by googling for the classicizing spelling tripalium; however, that form is not anywhere in Perseus either, probably because it postdates their cutoff point.

I was a little bit familiar with the word through a passing mention of it in T. H. White's lesser known novel Mistress Masham's Repose, in which the Professor is trying to figure out the Latin word trifarie (having lost his du Cange somewhere in the piles and piles of books all over his study) and briefly wonders if tripalie is meant.

In case the backstory doesn't instantly spring to mind, here are some of the relevant passages -- on pp. 75-76:

and on p. 248:

I'm afraid you'll need to (re)read the book in order to understand how this fits into the story.

Boy, they sure don't write children's books the way they used to. Or wait, maybe they do. Sort of.]

[Update #3 -- Chris Coon sent in this passage from Jonathan Raban's 1993 NYT review of Lars Eighner's Travels with Lizbeth ("The View from a Literary Dumpster", 10/10/1993):

In the mid-1980's, Mr. Eighner, who had no college degree, was working as an attendant at what he calls "the state lunatic asylum" in Austin, Tex., when he quarreled with his supervisor and lost his job. Passing effortlessly through the wide mesh of the welfare system, he was soon evicted from his rented shack. With his dog, Lizbeth, he camped out on the floors of friends' apartments, and when his welcome ran out he slept in parks and on roadsides, foraging for food in Dumpsters. For three years he zigzagged between Austin and Los Angeles; a fat, fortyish hitchhiker in badly torn jeans with a dog, for whom few of us would have stopped on the hard shoulder. This wasn't Robert Louis Stevenson with his donkey or John Steinbeck with Charley: Mr. Eighner and Lizbeth's "travels" restore the word to its roots in travail and trepalium, the triple-staked torture of the Inquisition.

]

[Update #4 - Steve from Language Hat writes:

Interesting post! I have to say, though, it puzzles me that people (I'm not singling you out, I've seen it all over the place) continue to refer to Lewis & Short as if it were "the dictionary" for Latin. It may have been state-of-the-art when T.H. White published Mistress Masham's Repose in 1946, but it was rendered instantly obsolete when the first fascicle of the Oxford Latin Dictionary appeared in 1968, and ever since the OLD was published as a complete book in 1982 there's been no excuse for referring to L&S except as a fond memory or a historical curiosity. It's as if people approached questions about English by referring to Dr. Johnson rather than M-W or the latest Oxford dictionary. Again, this is not aimed at you (except insofar as one might expect linguists to be more aware of such things) -- it seems as if L&S has been so ingrained in the culture (of those aware of the classics, anyway) that it can never be replaced.
True, Latin does not "change" in the way English does, so my Johnson comparison is hyperbolic, but needless to say they've discovered a lot of material and changed their minds about a lot of meanings and etymologies since 1879, when L&S was first published -- not to mention that the latter is in large part based on a translation of Wilhelm Freund's Wörterbuch der Lateinischen Sprache (1834-45)!
However, in this case the OLD wouldn't have changed anything, since they too have tripalis but not tripalium.

I like the availability of Lewis & Short to everyone for free online via Perseus, and I guess that I did unconsciously accept the idea that classical Latin hasn't changed recently. But I'll take pains to check out OLD in the future.]

Posted by Mark Liberman at 06:32 AM

July 09, 2007

Brizendine turns myth slayer

The San Francisco Chronicle took the appearance of the new paper in Science about women's and men's chattiness as a prompt for a front-page story last Friday (July 6), and of course got some quotes from San Francisco resident and myth spreader Louann Brizendine. Quotes of astonishing disingenuousness, it turns out. Brizendine's newest story is this:

My book is really about hormones, and that one line [about women uttering three times as many words per day as men] has been taken out of context. It's fascinating, anytime you talk about sex differences, it's controversial. But the bottom line is, there are more similarities than differences between men and women.

So first she claims to be just an ordinary working endocrinologist. Then, like a politician caught on tape saying something derogatory about negroes, she plays the "I-was-taken-out-of-context" card. Next, she ruminates in wonderment at the controversiality of the whole topic (could it be the fault of the press, perhaps, pumping all this up?), and then, in a dramatic big-lie U-turn, she endorses the "more-similarities-than-differences" position that properly belongs to her critics. Words almost fail me. And yet they must not, for there is more.

The Chronicle quotes her indirectly as going on in this manner:

The important question now, she said, is how the stereotype started in the first place. Psychologists don't know exactly where the myth came from, but Brizendine speculates it probably took hold in the 1950s or so, when men worked the 9-to-5 jobs and women stayed home with the kids. At the end of the day, men would come home to wives who wanted to talk about the children, the house and finances — basically, what felt like a lot of nagging, said Brizendine.

The empirical basis here is evaporating (did anyone count the words uttered in the 1950s by tired hubbies and home-maker wives, and does this have anything to do with hormones?); and it's Brizendine now who is the hunter of the origins of the myth — a myth that she now implies she has almost nothing to do with! Then what has Mark Liberman been doing lo these many posts? And why doesn't she mention him? (Rhetorical question. Don't bother to answer.)

Really, this is the weirdest turn yet in the long saga of Brizendine dishonesty. It's like finding Paris Hilton representing herself as a California Highway Patrol officer. It's like one of the vampires suddenly starting to talk like Buffy the Vampire Slayer.

Posted by Geoffrey K. Pullum at 03:55 PM

People are the only agents for the job

My current NSF project Expressive content and the semantics of contexts made a headline today. Journalist J. M. Berger has a piece in today's Boston Globe Health and Science section called 'Coming soon, a linguist's guide to obscenities'. I like the article. It mixes an exuberance for pure science with an awareness that applications are important. This balance runs throughout the NSF's mission statement, and it is a theme of my project description, so I am gratified to find my work presented in this way to the general public. The bottom line is this: expressive words of the sort we are studying (swears, honorifics, epithets, and their ilk) have the power to shape public discourse in dramatic ways, so it is in our best interests to get a grip on how they work.

The opening line of the Globe piece is a doozy, though.

Before the science, before the implications for public discourse, law enforcement, linguistic theory, and so forth, we get the price tag: $200,000. It looks big. Not physical-sciences big, but big for a project that will mainly involve searching through corpora and probing speakers' intuitions.

The money is mostly people money. It will be used to fund graduate student researchers. And the reality is that researchers are expensive. They need to be paid, of course, but that's just the start (they make a pittance). The grant also needs to cover their university fees, their health care charges, and a host of other costs. This is, in my view, the reality of hiring people. I'm certainly not complaining. In the case of this project, it reflects the fact that people are the only agents for the job (at least in 2007).

Just think back on recent news stories concerning expressive language. Language Log has covered all of the big ones: Imus, The Redskins, Grey's Anatomy, and so forth and so on. To understand any of them, you need to have a deep understanding of the words involved, the language they are embedded in, the cultural context in which they were uttered, who the speaker was and what he or she intended to say, who was listening and what he or she expected (wanted) to hear, and many other complicated factors. Only highly skilled humans can do this analysis and, as I've said, they don't come especially cheap.

It's an oddity of our times that the costs would probably make more sense to people if the project's centerpiece were a massive super computer or some similarly outsized gadget. But, for this work, we need systems of significantly greater complexity.

Posted by Christopher Potts at 12:32 PM

Snowclonerei ist überall

Emmanuel Maria Dammerer wrote to announce his "Buch von der deutschen Snowclonerey", which is "Versuch einer Definition der Snowclones mit bekannten deutschsprachigen Beispielen" ("A preliminary definition of the snowclone phenomenon, including a list of frequent German examples"). The site's name is itself a snowclone, taking off on the title of Martin Opitz's 1624 "Buch von der deutschen Poeterey".

Note the echo of the archaic spelling "Poeterey" (modern "Poeterei") in Snowclonerey, showing that the analogical processes involved in snowcloning are not limited to simple lexical or morphological substitution.

And Lane Greene wrote to point out that Achewood for 6/19/2007 has a riff on the staleness of the snowclone "In X, Y Z's you":

Posted by Mark Liberman at 12:14 PM

"Nearly/almost no" vs. "not nearly/almost"

In connection with three earlier Language Log posts ("Why is 'nearly no' nearly not?", 6/14/2007; "Nearly no: a gnarly knot", 6/16/2007; "Nearly and almost", 6/24/2007), I recently got a fascinating note from Lucia Pozzan and Susan Schweitzer.

The earlier posts observed that although nearly seems to mean just about the same thing as almost, it is much rarer before negatives:

	no one	everyone
nearly	27.6K	1.29M
almost	1.06M	1.66M
almost/nearly ratio	38.4	1.29

Lucia and Susan started by asking a simple question that the rest of us missed -- what about the interaction of nearly and almost with negation in the opposite order, e.g. "not nearly" and "not almost"? Answer: it goes the opposite way! For example:

	is|was|are|were not __ enough	is|was|are|were __ enough
nearly	253,000	36,200
almost	103	184,000
almost/nearly ratio	0.0004	5.08
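As a quick check on the two tables above, here is a minimal Python sketch that recomputes the almost/nearly ratios from the reported counts. The dictionary layout and the function name `almost_nearly_ratio` are just illustrative choices, and the "K" and "M" figures are expanded to plain integers on the assumption that they are rounded web hit counts.

```python
# Counts copied from the two tables above ("27.6K" -> 27_600, "1.06M" -> 1_060_000, etc.).
counts = {
    "no one": {"nearly": 27_600, "almost": 1_060_000},
    "everyone": {"nearly": 1_290_000, "almost": 1_660_000},
    "is|was|are|were not __ enough": {"nearly": 253_000, "almost": 103},
    "is|was|are|were __ enough": {"nearly": 36_200, "almost": 184_000},
}


def almost_nearly_ratio(cell):
    """Return the almost/nearly ratio for one context."""
    return cell["almost"] / cell["nearly"]


ratios = {context: almost_nearly_ratio(cell) for context, cell in counts.items()}

for context, ratio in ratios.items():
    # Four significant digits is enough to compare against the table rows.
    print(f"{context}: {ratio:.4g}")
```

A ratio well above 1 means almost dominates in that context; a ratio near 0 means nearly dominates, which is the reversal seen under a preceding "not".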

A guest post by Lucia and Susan about this is below.

The previous discussion on nearly and almost has focused on the different behavior of these two elements when followed by universal negative expressions such as "never", "nobody", etc. Nearly feels bad, almost feels perfect. Now, it is interesting to see that there are other negative contexts in which almost and nearly behave differently: the facts get reversed, and when negation precedes these elements, almost is pretty bad and nearly is perfect!

Thus "not nearly" is commoner than "not almost" (2.22M vs. 338K). Moreover, "not almost" seems in many cases ungrammatical; if not ungrammatical, "not almost" has to be in some sort of echo context and seems to have a different interpretation from "not nearly". Compare:

1. Not nearly as good as it used to be
2. *?not almost as good as it used to be

Notice that "not nearly as good" seems to mean something different from "not quite as good as it used to be", namely "way worse than it used to be". Let's look at "NEG almost" and "NEG nearly" in an echo context (with almost in the previous context):

3. A: Given that we are almost done with the paper, we should celebrate.

B: I am not almost done with my paper --

i. I am done!

ii. I'm still working on the results section.

B negates the almost "doneness" of the paper, by either saying that he is done or still at some earlier point than what he considers the "almost done" point.

Now suppose B said:

4. I am not nearly done with my paper

This could only mean that one is far from being done. The same seems to happen with numbers:

5. The candidate got almost 500 votes (slightly less than 500, suppose 470)
6. (in echo context) The candidate didn't get almost 500 votes (any number from 0 to ~470, or more than 500 votes)
7. The candidate didn't get nearly 500 votes (he got way less than 470, say 50).

So: almost and nearly, when in the scope of negation, behave differently. Not almost X is (in the relevant case) the complement of whatever we think almost X is, while not nearly X picks up a number that is on the other side of the scale.
(Something perhaps along these lines, with X = 500:
Nearly X = from X - 1 down to X - (10% of X), that is, from 499 to 450
Not nearly X = from 0 up to 0 + (10% of X), that is, from 0 to 50)
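The rough proposal in the parenthesis above can be written out as a small Python sketch. This is only an illustration of the guest authors' back-of-the-envelope ranges, not a semantic analysis; the function names and the `tolerance_pct` parameter are hypothetical, with 10 percent taken from their example.

```python
def nearly_range(x, tolerance_pct=10):
    """Inclusive (low, high) range for 'nearly x': just short of x."""
    margin = x * tolerance_pct // 100  # integer arithmetic, e.g. 10% of 500 is 50
    return (x - margin, x - 1)


def not_nearly_range(x, tolerance_pct=10):
    """Inclusive (low, high) range for 'not nearly x': the far end of the scale."""
    margin = x * tolerance_pct // 100
    return (0, margin)


print(nearly_range(500))      # (450, 499)
print(not_nearly_range(500))  # (0, 50)
```

On this sketch, "not nearly 500" is not the complement of "nearly 500": everything from 51 to 449 belongs to neither range, which is the asymmetry the authors contrast with echo-context "not almost".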

One possibility would be to develop Jerry Sadock's idea from June 24: "Nearly n connotes that n exceeds (hence is better than) what was expected or hoped for [...]". Suppose that, whenever we use nearly, we suggest that some value that was expected was exceeded. Hence when we say "Nearly 10 dollars", we convey that we were expecting less (suppose 5 dollars), and hence the obtained result exceeded the expectation, by having way more than 5 (suppose 9). Could we then say that when we use "Not nearly 10" we convey that a reasonable expectation nonetheless was 5, but that our amount fell short not only of the goal but even of the reasonable (5 dollars) expectation?

We think the conventional implicature hypothesis works nicely in the positive contexts, where there is an alternation between almost and nearly. The problem is that it is not clear how to build in an expectation in the negative one, where the alternation between almost and nearly is less productive and nearly seems to be able to appear in out-of-the-blue contexts.

One other suggestion comes from Italian. Italian has only one word for translating nearly/almost, quasi. Quasi behaves exactly like almost under negation. It is weird unless an echo context is provided. How, then, does the concept expressed by "NOT nearly" get conveyed? Interestingly, Italian uses an adverb that is semantically related (but opposite) to nearly: lontanamente (which means "farly/by far").

8. Il candidato non ha preso lontanamente 500 voti.

The candidate didn't get farly 500 votes.

But if lontanamente means "farly", then shouldn't "not farly" mean "near"? Is this a case where one would want to claim a failure of semantic compositionality, a "close miss" situation?

Notice that in these cases we cannot say that lontanamente has wide scope over the whole sentence, given that lontanamente has to be in the scope of negation:

9.a *Lontanamente la miglior pizza della città

By far the best pizza in town

9.b Non è lontanamente la miglior pizza della città

It is not nearly the best pizza in town

Interestingly, in the Italian cases, we can have an optional even: "nemmeno" or "neanche".

10. Il candidato non ha preso neanche lontanamente 500 voti.

The candidate didn't get even by far 500 votes.

Now, we would like to suggest that in the English cases, when nearly is in the scope of negation, a covert even is present (or at least, nearly acts as if it were). Suppose in these cases we have an implicit ranking of propositions (in terms of most to least likely to happen or in terms of a value going from high to low) with respect to a goal (500 votes). [We use X > Y to represent the notion that X is better than Y.]

Get 500 votes >
Get nearly 500 (say, 470 votes) >
Get not nearly 500 votes.

In turn, "not nearly 500" implies a ranking like the following

Get 400 votes >
Get 300 >
Get 200 >
Get 100...

And in this case, our covert even would tell us that the proposition with the lowest ranking (let's say 0-100) holds. Now if this is reasonable, something similar could be at play when nearly is in construction with "no one", "nothing", and so on: a covert even would get triggered by the interaction of nearly and negation. But even is incompatible with such quantifiers (*even no one liked it); and hence such sentences would be ruled out.

Posted by Mark Liberman at 08:26 AM

July 08, 2007

What men and women actually talk about

I posted earlier today about sex-linked vocabulary items, as imagined by an anonymous BBC News writer and as measured in a sample of weblog text by Koppel et al. ("What men and women blog about"). It occurred to me that I could easily check the generality of these two sets of items by searching the LDC's collection of transcriptions of telephone conversations. (Thus making an even larger mountain out of the original molehill of an article -- but it's 96 degrees out today, and I'm putting off going out to run errands...)

The corpus that I used includes 14,136 conversations, comprising a total of 26,151,602 words. Out of the 31 items listed in the BBC News article (they claim 46, but quantification is obviously not their strong suit), 10 are either too rare or too British or too topical to occur at all in this corpus ("home birth", "pomegranate", "conventionally attractive", "Jessica Metcalfe", "footless tights", "kitten heels", "agony aunt", "handbagging", "beefeater", "concealer"). Of the remaining 21, 5 are actually used at a higher rate by males in the conversational corpus ("what are you thinking", "Afghanistan", "flexible working", "Ms", "Middleton"). Only "babies", "absolutely beautiful", "pilates", and "heels" seem to be reasonably common words or phrases that are actually useful indicators of a female speaker in this corpus.

Here are the details, with the raw counts and the counts normalized as frequency per million words (note that there are more female than male speakers in this collection, 15,685 to 12,589).

Item	Women count (f/M)	Men count (f/M)
book club	11 (.79)	7 (.6)
accessorize	1 (.07)	0
body image	3 (.21)	1 (.09)
empowering	3 (.21)	2 (.17)
burlesque	2 (.14)	1 (.09)
size zero	1 (.07)	0
pilates	37 (2.65)	9 (.77)
cellulite	2 (.14)	0
absolutely beautiful	27 (1.93)	8 (.68)
breastfeeding	15 (1.07)	2 (.17)
emotional intelligence	0	1 (.09)
heels	37 (2.65)	15 (1.28)
what are you thinking	10 (.72)	12 (1.03)
feminism	3 (.21)	2 (.17)
afghanistan	168 (12.01)	212 (18.13)
airbrushing	1 (.07)	0
flexible working	0	1 (.09)
babies	419 (29.96)	97 (8.3)
superwoman	1 (.07)	0
Ms	9 (.64)	13 (1.11)
Middleton	0	1 (.09)
why	10390 (743)	8266 (707)
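For readers who want to reproduce the normalization in the table above, here is a minimal Python sketch of the frequency-per-million calculation. The per-sex word totals are not stated in this post, so the two constants below are an assumption: rough values back-computed from the "why" row, which is the row with the largest counts. The function name `per_million` is likewise just illustrative.

```python
# ASSUMPTION: per-sex word totals are not given in the post. These are rough
# estimates back-computed from the "why" row (10390 at 743 per million for
# women, 8266 at 707 per million for men), so they are only approximate.
WOMEN_WORDS = 10390 / 743 * 1_000_000
MEN_WORDS = 8266 / 707 * 1_000_000


def per_million(count, total_words):
    """Normalize a raw count to occurrences per million words."""
    return count / total_words * 1_000_000


# A few (women count, men count) pairs copied from the table.
rows = {"pilates": (37, 9), "heels": (37, 15), "afghanistan": (168, 212)}

for item, (women, men) in rows.items():
    print(item,
          round(per_million(women, WOMEN_WORDS), 2),
          round(per_million(men, MEN_WORDS), 2))
```

The point of normalizing is that the raw counts are not directly comparable, since the collection contains more speech from women than from men.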

If we turn instead to the list of content-words from the Koppel et al. study, we get much better predictions. Only one of the items is missing from the conversational corpus ("gb", which was presumably an abbreviation for "gigabyte", specific to the textual mode, as Cory Lubliner has pointed out to me). In general, the frequencies are higher. And there are no reversals -- all the items that were sex-associated in the weblog sample are sex-associated in the same direction in this conversational sample. For comparison, I've taken the top ten items from each end of their list (male-associated and then female-associated):

Item	Women count (f/M)	Men count (f/M)
linux	2 (.14)	10 (.86)
microsoft	80 (5.72)	145 (12.4)
gaming	23 (1.64)	40 (3.42)
server	19 (1.36)	26 (2.22)
software	137 (9.8)	198 (16.93)
programming	86 (6.15)	122 (10.43)
google	38 (2.72)	48 (4.11)
data	84 (6.01)	125 (10.69)
graphics	44 (3.15)	76 (6.5)
india	155 (11.08)	229 (19.59)

cute	668 (47.77)	164 (14.03)
gosh	2242 (160.34)	530 (45.33)
kisses	15 (1.07)	5 (.43)
yummy	21 (1.5)	1 (.09)
mommy	154 (11.01)	20 (1.71)
boyfriend	743 (53.14)	102 (8.72)
skirt	47 (3.36)	7 (0.6)
adorable	57 (4.08)	13 (1.11)
husband	9168 (655.65)	484 (41.4)
hubby	10 (.72)	3 (.26)

I'm sure that if we sorted words by information gain with respect to sex determination, we'd get a different ranking from these conversations than Koppel et al. got from their weblog corpus. But it's encouraging that predictions from the weblog sample are so reliably maintained in the (very different) conversational data.

[Note that my title, "What men and women actually talk about", is mostly tongue-in-cheek. I'm not looking at anything except counts of some of the words that people choose to use. And obviously any individual's word choices would vary widely, depending on the context and the topic.

The material that I searched comes from a variety of sources, and does include some conversations in which people talked with friends and family about whatever they wanted to. But most of the transcribed conversations were with randomly-assigned strangers, where the participants were asked to talk about a randomly-assigned topic (among a set that they had previously agreed they would be willing to discuss). One of the sub-collections had 70 such topics; another had 30. You can read more about some of these collections here, here, here, here, etc.]

Posted by Mark Liberman at 03:05 PM

The Relationship Between Underwear and Literacy

Although the invention of printing with movable type is often taken to be the technology that led to greatly increased literacy, a recent paper argues that a key step took place earlier, namely the development of rag paper. Until then, in Europe books were all written on parchment, which was very expensive though very durable. (The archival copies of British Acts of Parliament are still printed on vellum.) Owning a book was doubly expensive because not only did it require many hours of skilled labor to copy but the material of which the copy was made was expensive. Rag paper provided a suitable material at much lower cost, and its development therefore led to an increase in literacy.

The interesting thing is, where did the rags come from? In mediaeval Europe, most clothing was made of wool, which is ill-suited to making paper. The key development came in the 13th century when more people began to wear linen underclothing. This practice led to a significant increase in the supply of linen rags, from which paper was made.

Posted by Bill Poser at 02:38 PM

Iraqi reversal

The web comic Overcompensating returns to Snowclone World with an occurrence of a chiastic snowclone that originated with a Yakov Smirnoff catchphrase:

The last time we looked at an Overcompensating snowclone, it was a variant of the Eskimo N figure that gave snowclones their name. Now it's a figure that Mark Liberman first posted about here on 1/29/04 in "In Soviet Russia, snowclones overuse you" and that made it into the snowclone database on 5/22/07 under the heading "In Soviet Russia, X Ys you!" (a.k.a. "Russian reversal").

The general form is "In P, X Ys you" or, even more generally, "In P, X Ys Z", where P is a placename, Y a verb, and X and Z the subject and object, respectively, of Y. We would normally expect "Z Ys X", but instead we get the reversed "X Ys Z": "TV watches you" instead of the expected "you watch TV" (in the Smirnoff original), "snowclones overuse you" instead of the expected "you overuse snowclones" (in Mark's title), "church and state separates you" (why singular "separates"?) instead of "you separate church and state" (in the cartoon).

In any case, we have Weedmaster P (who's something of a pothead) deflecting Jeffrey's pun by responding with another type of play with words -- but one that has no visible relevance in the context.

(Hat tip to Hannah Flaherty.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:23 PM

What men and women blog about

Chris Brew writes, under the Subject line "BBC Watch":

This is probably the silliest contribution to the Brizendine discussion yet ("What women talk about", BBC News, 7/6/2007).

The original claim is false, they say. So let's poll readers to find which alternative spurious claims would most appeal to them.

I agree, this is a good example of the tabloidification of BBC science coverage.

If they cared about the answer to their question, instead of about pandering to their readers' prejudices, they might look at the research on the topic. This sort of research has become fairly easy to do, and quite a bit of it has been done.

For example, they might have read M. Koppel, J. Schler, S. Argamon and J. Pennebaker, "Effects of Age and Gender on Blogging" (in AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs), which looked at "all blogs accessible from blogger.com one day in August 2004, downloading each one that included author-provided indication of gender and at least 200 appearances of common English words". This created a corpus of "over 71,000" blogs, from which they created "a subcorpus consisting of an equal number of male and female blogs in each age group, by randomly discarding surplus documents in the larger category". The result was "a total of 37,478 blogs ... comprising 1,405,209 blog entries and 295,526,889 words".

The table below shows male vs. female word frequencies (per 10,000 words) for those "content-based unigrams" (i.e. content words) with the greatest information gain for gender in this sub-corpus, along with standard errors for the frequency estimates.

feature	male	female
linux	0.53±0.04	0.03±0.01
microsoft	0.63±0.05	0.08±0.01
gaming	0.25±0.02	0.04±0.00
server	0.76±0.05	0.13±0.01
software	0.99±0.05	0.17±0.02
gb	0.27±0.02	0.05±0.01
programming	0.36±0.02	0.08±0.01
google	0.90±0.04	0.19±0.02
data	0.62±0.03	0.14±0.01
graphics	0.27±0.02	0.06±0.01
india	0.62±0.04	0.15±0.01
nations	0.25±0.01	0.06±0.01
democracy	0.23±0.01	0.06±0.01
users	0.45±0.02	0.11±0.01
economic	0.26±0.01	0.07±0.01
shopping	0.66±0.02	1.48±0.03
mom	2.07±0.05	4.69±0.08
cried	0.31±0.01	0.72±0.02
freaked	0.08±0.01	0.21±0.01
pink	0.33±0.02	0.85±0.03
cute	0.83±0.03	2.32±0.04
gosh	0.17±0.01	0.47±0.02
kisses	0.08±0.01	0.28±0.01
yummy	0.10±0.01	0.36±0.01
mommy	0.08±0.01	0.31±0.02
boyfriend	0.41±0.02	1.73±0.04
skirt	0.06±0.01	0.26±0.01
adorable	0.05±0.00	0.23±0.01
husband	0.28±0.01	1.38±0.04
hubby	0.01±0.00	0.30±0.02

Their conclusion:

Male bloggers of all ages write more about politics, technology and money than do their female cohorts. Female bloggers discuss their personal lives -- and use more personal writing style -- much more than males do. Furthermore, for bloggers of each gender, a clear pattern of differences in content and style over age is apparent. Regardless of gender, writing style grows increasingly "male" with age: pronouns and assent/negation become scarcer, while prepositions and determiners become more frequent. Blog words are a clear hallmark of youth, while the use of hyperlinks increases with age. Content also evolves with age in ways that could have been anticipated.

This is not the first time we've observed that "young men talk like old women".

The words in that list are certainly consistent with stereotypes, both of women vs. men and of bloggers vs. the general public. So look here, BBC News: you can pander to your readers' prejudices without actually having to make stuff up! Of course, you'd have to talk with some scientists or even read some scientific articles, rather than just rewriting press releases or riffing about your personal issues, but hey, no pain, no gain.

[Note that James Pennebaker, one of the authors of the paper described above, is also one of the authors of the recent Science paper. Another relevant recent paper from his shop is Newman, M.L., Groom, C.J., Handelman, L.D., & Pennebaker, J.W., "Gender differences in language use: An analysis of 14,000 text samples", Discourse Processes (in press). ]

[Update -- Peter Howard writes:

Oh, come on, Mark. It's quite clear from the context that the BBC article you refer to isn't supposed to be science reporting; it's just a bit of fun. It's in the light-hearted *magazine* section of the BBC News website, not in one of their 'serious' columns. Your ongoing criticism of BBC science coverage is usually valuable, but this is just tilting at windmills. (BTW an earlier version of the article contained the phrase 'tongue in cheek', but they seem to have removed that. Perhaps they thought they were stating the bleedin' obvious.)
If you want to have a go at the BBC on this, tackle them on their purportedly serious take on the story ("Men 'no less chatty than women'", 7/5/2007), which contains the sentence:
"The University of Arizona study, in Science, conflicts with previous US research suggesting women talk almost three times as much as men. "
That might give you something to legitimately complain about.

Fair enough. But I'm tired of criticizing them for obvious stuff like that, and I thought that Chris Brew's comment was probably an accurate picture of their editorial thought processes. (For some less temperate, but more entertaining, reactions, see here.) And posting on this gave me an opportunity to cite some of the genuine research on the topic, which unfortunately the BBC News is highly unlikely to do.

And speaking of genuine research, Tim Finin reminded me of H. Liu and Rada Mihalcea, "Of Men, Women, and Computers: Data-Driven Gender Modeling for Improved User Interfaces", ICWSM 2007.]

Posted by Mark Liberman at 12:02 PM

Argumentative Dogs

This morning's Sally Forth:

Sally and Ralph are apparently reading job applications, as indicated in the opening panel:

What caught my eye was Ralph's "If one were to describe their dog as 'argumentative', would they have revealed too much?" (remember that singular they is divinely sanctioned), but then I was caught trying to figure out what this conversation actually means.

Is Ralph asking about a phrase he read in the job application? Or is he thinking about his own life, and how he might describe it? Or both? Is Sally being politely agreeable, or subversively critical? Or both?

Meanwhile, on the argumentative dog beat, there's this morning's Cathy:

And this morning's Dilbert, featuring a typically subtle argument from Dogbert:

Dogbert's argument reworks the traditional sophist's argument for buying lessons in rhetoric ("You should buy my lessons so that you can evaluate my argument that you should buy my lessons"), by developing the infinite regress implicit in recursive consultation.

The argumentative animal in this morning's Fusco Brothers is apparently supposed to be Axel the Wolverine, but I believe that most readers probably see him as a dog (or perhaps a giant rat):

Posted by Mark Liberman at 09:23 AM

The Linguistic Abilities of the Presidential Candidates

In the 2004 election, John Kerry's ability to speak French played a certain role. In anticipation of the 2008 election, I've gathered what information I could about the linguistic abilities of potential candidates. Curiously, it seems that few American Presidents have been able to speak any language other than English. Only one has had a language other than English as his first language: though born in New York State, Martin van Buren (1837-1841) spoke Dutch before he spoke English. Of recent Presidents, Jimmy Carter is reported to speak imperfect but functional Spanish. I don't know if he did when he was President or if he has improved in his retirement. As far as I can tell, no other recent President has been able to speak a language other than English. (President George Bush is often said to speak Spanish, but most of my sources indicate that he speaks it very poorly. However, he apparently understands Spanish well. His brother Jeb, by contrast, is truly fluent.)

Michael Bloomberg (Republican): Speaks no language other than English well but is reported to be studying Spanish. Source
Sam Brownback (Republican): Is reported to have been taking Spanish classes since 2004. I have no information on what level of ability this has led to. Source
Hillary Clinton (Democrat): Apparently monolingual.
Christopher Dodd (Democrat): Speaks Spanish quite passably, though with a definite American accent, as a result of service in the Peace Corps. Source Source Source Source
John Edwards (Democrat): Appears to speak no language other than English.
Jim Gilmore (Republican): Speaks fluent German as a result of service as a US Army counterintelligence officer in West Germany. Source
Newt Gingrich (Republican): Speaks no language other than English well but is reported to be studying Spanish. Source
Rudy Giuliani (Republican): In spite of his identification with the Italian-American community, I can find no indication that he speaks Italian. Apparently a monolingual English speaker. His wife is reported to be learning Spanish.
Mike Gravel (Democrat): Is a fluent native speaker of Canadian French. Source Source
Mike Huckabee (Republican): Speaks no language other than English but reads New Testament Greek. Source
John Kerry (Democrat): Speaks fluent French. Source Source His wife Teresa speaks fluent Spanish, French, Italian and Portuguese. Source Source
Dennis Kucinich (Democrat): Has been studying Spanish. He is apparently not yet very fluent but is able to speak to some extent extemporaneously and carry on a conversation of sorts. Source
John McCain (Republican): He is not reported to speak any language other than English. If he learned Vietnamese during his years in Vietnam, he does not seem to have mentioned it.
Barack Obama (Democrat): Speaks Indonesian and limited Spanish. Source Source
Ron Paul (Republican): Speaks Spanish. Source
Colin Powell (Republican): Speaks Yiddish. Exactly how well is not clear. He describes it as a bissel "a little" but may be being modest. Apparently more than just the odd word but less than what is necessary to carry on a real conversation. Source
Bill Richardson (Democrat): A native speaker of Spanish, he is also reported to speak French well. Source
Mitt Romney (Republican): Is said to speak fluent French resulting from serving a 30-month mission in France. There are reports that he also speaks Swahili, though I cannot determine where he might have learned it and have not confirmed this. Source

By the way, Howard Dean, who is not running this time, also speaks Spanish. His Spanish is reported to be "pretty heavily accented, nerdy, Northeastern Spanish, but, still, pretty good Spanish." Source.

PS: Don't be surprised if the page changes. I'm updating it as additional information comes in.

Posted by Bill Poser at 01:07 AM

July 07, 2007

Incubus

Did you know that William Shatner starred in a 1965 movie whose dialog was entirely in Esperanto? I didn't. You can learn more about "Incubus, the Esperanto Film" in a post at The Proceedings of the Athanasius Kircher Society.

For lagniappe, there's a curse.

[Hat tip: Kerim Friedman]

Posted by Mark Liberman at 05:11 PM

Following too close after truth

Jumpstart for 7/3/2007: Dot tries to teach Joe proper grammar, while Joe teaches Dot about NASCAR:

Though I hate to get between a mother and her son, I have to side with Joe on the grammatical point. The word close is among those English vocables that never quite recovered from a morphologically unfortunate incident a thousand years ago, when Old English mislaid its final short e's, including what was then the adverbial ending. As John Cowan put it, in a comment quoted in an earlier Language Log post ("Amid this vague uncertainty, who walks safe?", 2/23/2007),

Adverbs in adjective form have been around in English since forever, or at least since the fall of final short e, which was the original adverb ending. In OE, we had a contrast between læt ‘slow’ and læte ‘slowly’, but later these came to be pronounced identically. Similar stories stand behind go fast and hit hard and many other adverbs, most of them monosyllabic.

Indeed, the ModE adverb ending -ly was -lice in OE, a compound of -lic (same as lic ‘body, corpse’ > lich, lyke ‘corpse’) and this same original -e.

As a result, the OED doesn't even try to distinguish close the adjective from close the adverb, but gives just one entry for both, which includes the relevant examples:

a1400 Morte Arth. 1196 The clubbe..That in couerte the kynge helde closse to hym seluene.
1601 SHAKES. Jul. C. IV. iii. 164 Now sit we close about this Taper heere.
1611 BIBLE Prov. xviii. 24 A friend that sticketh closer then a brother.
—— Jer. xlii. 16 The famine..shall follow close after you.

And as a modern example of the continuing equivocation about close vs. closely, the OED's entry for dog, v. includes the glosses

1. trans. a. To follow like a dog; to follow pertinaciously or closely

and

2. intr. or absol. To follow close.

Turning to Literature Online, we find that "follow close" has 47 hits in poetry, 38 in drama and 30 in prose, whereas "follow closely" has 6 in poetry, 6 in drama, and 12 in prose.

We can find citations for the "follow close" choice in works by Alexander Pope, John Keats, Mary Shelley, Bret Harte, Walt Whitman, John Wesley, Charles Dickens, and Anthony Trollope, among other notables.

But as Charles F. Briggs put it in 1843 (The Haunted Merchant, Chapter 1):

... Sir Walter Raleigh advises the writer of history even, not to follow too close after Truth, lest he should get a kick from her heels.

So perhaps this is enough overinterpretation of the funny papers for today.

[Hat tip: Timm Ferree]

Posted by Mark Liberman at 07:20 AM

The Supreme Court Fails Semantics

One of the skills expected of judges is the ability to understand specialized legal language; this they are trained to do in law school. It also falls to judges to interpret ordinary language. In this they receive no special training, and, from time to time, fail. The recent US Supreme Court decision in Morse v. Frederick is a case in point.

The case concerns an incident that occurred when students at a high school in Alaska were let out of school to observe the Olympic Torch Relay. A group of students, among them Joseph Frederick, held up a banner with the words "BONG HiTS 4 JESUS". When the principal, Deborah Morse, demanded that they take it down, Frederick refused. As a result he was suspended.

The case raises a number of issues about the free speech rights of high school students that have been the subject of extensive commentary and debate elsewhere. An issue that has not been adequately addressed is what Frederick said that justified punishment. The Supreme Court majority is of the view that the school had the right to punish Frederick for the slogan because it encourages the use of drugs, which it is the policy of the school to discourage. Here is Chief Justice Roberts' majority opinion, in which he was joined by Associate Justices Alito, Kennedy, Scalia, and Thomas, on this point:

The Court agrees with Morse that those who viewed the banner would interpret it as advocating or promoting illegal drug use, in violation of school policy. At least two interpretations of the banner's words - that they constitute an imperative encouraging viewers to smoke marijuana or, alternatively, that they celebrate drug use - demonstrate that the sign promoted such use. This pro-drug interpretation gains further plausibility from the paucity of alternative meanings the banner might bear.

Associate Justice Thomas does not address this issue in his concurring opinion, which is devoted primarily to his view that high school students have no free speech rights and that Tinker should be overturned. Associate Justice Alito in his concurring opinion explicitly limits the school's power to forbidding advocacy of illegal drug use, stating that advocacy of the legalization of the use of marijuana, for example, would fall within the student's First Amendment rights.

Thus, a critical element of the Court's decision is the claim that "BONG HiTS 4 JESUS" is properly interpreted as advocating the illegal use of marijuana. Frederick, on the other hand, asserts "that the words were just nonsense meant to attract television cameras". His explanation is supported by the playful, irregular capitalization of "HiTS" and the use of "4" in lieu of "for", as well as by the inclusion of the PP "for Jesus", which makes no sense here but is a common component of the slogans of the religious right and so is incongruous in combination with "bong hits".

If an utterance is well-formed we arguably don't need to engage in any kind of linguistic analysis to find out what it means: any fluent speaker of the language will know. I say arguably, because even fluent speakers may not realize, unless they think about it or someone else points it out, that more than one interpretation is possible. Not only may they miss structural ambiguities, they may not know some possible meanings of some words, or these meanings may not be salient for them.

In this case, however, we have a legal controversy, between fluent native speakers, as to the meaning of the utterance. Even those who purport to know that it encourages illegal drug use tacitly admit that it does not wear this meaning on its face. The utterance is peculiar and requires a process of inference to interpret. In such a case, we can't simply rely on the claims of fluent speakers but must carry out an analysis of the utterance to determine its possible meanings.

I submit that there are two serious errors in the Court's analysis. The first is that it is unwilling to accept the possibility that the utterance is meaningless. Justice Roberts writes:

The dissent mentions Frederick's "credible and uncontradicted explanation for the message - he just wanted to get on television"... But that is a description of Frederick's motive for displaying the banner; it is not an interpretation of what the banner says.

This begs the point. No "interpretation of what the banner says" could be offered by Frederick insofar as it has no meaning. By dismissing any explanation for what was written on the banner that does not provide an interpretation, the Court assumes that it must mean something. Nowhere in the opinion is any justification offered for this assumption.

That it is perfectly possible for a well-formed utterance to have no semantic interpretation is well known to linguists. The classic example is the sentence:

Colorless green ideas sleep furiously.

put forward by Noam Chomsky in Syntactic Structures. In An Inquiry into Meaning and Truth (1940, Ch. XII) Bertrand Russell used the example:

Quadruplicity drinks procrastination.

That an ill-formed or incomplete utterance might have no semantic interpretation is of course completely uncontroversial.

Thus, from a linguistic point of view, it is perfectly possible that the words on the banner might have meant nothing at all. Frederick's explanation of his motivation for displaying the banner provides a plausible account of his use of words that did not mean anything.

The other error in the Court's analysis is in mistaking one kind of meaning for another. The modern study of meaning is based on the assumption that the meaning of a complex expression can, in general, be computed from the meanings of its components. To make an exhaustive analysis of the possible meanings of "BONG HiTS 4 JESUS", we would determine the possible meanings of each of the individual words, what the possible syntactic structures are, and what overall meanings can be computed from the meanings of the words when inserted into the possible syntactic structures. Doing this would be quite tedious, and very few combinations are at all plausible, so I'll assume that "BONG HiTS" is a compound noun referring to inhalations of marijuana and that "4 JESUS" is a prepositional phrase meaning "for the benefit of a person named Jesus", most likely the famous one, but conceivably someone else. If the slogan is a single syntactic constituent, it must be a Noun Phrase in which "for Jesus" modifies "bong hits".

There are a number of types of meaning. When we talk about the meaning of a noun, we are normally concerned with its denotation, that is, with the set of things to which it refers. The word dog, for example, denotes the set of dogs. Sentences, on the other hand, typically express propositions, such as "Roses are red." and "Beavers build dams.".

The kind of meaning that the Court purports to find is propositional. It claims, in effect, that the interpretation of the banner is something like "It is good to smoke marijuana even though it is illegal." or "Go ahead and smoke marijuana.". However, the banner does not, on any plausible analysis, contain the kind of syntactic structure that serves to express propositions, namely a sentence, not even a sentence part of which is not overt. Nor is this an example of a construction with an implicit verb, such as "Freedom for Tibet", which means something like "Freedom for Tibet would be good" or "We support freedom for Tibet". (The Court does not argue that the banner means "It would be good for Jesus to smoke marijuana.")

All that the Court actually argues is that "BONG HiTS 4 JESUS" contains a reference to drug use.

Gibberish is surely a possible interpretation of the words on the banner, but it is not the only one, and dismissing the banner as meaningless ignores its undeniable reference to illegal drugs. [emphasis mine]

That is probably true, but a reference is not a proposition and does not support any inference as to the speaker's attitude toward smoking marijuana. Even if the banner said "smoking marijuana" we could not say whether it meant "Smoking marijuana is hazardous." or "Smoking marijuana is delightful." or something else.

In sum, from the observation that the banner contains a reference to smoking marijuana, and the false assumption that the banner must express a proposition, the Court has invalidly inferred a particular proposition. The slogan is in fact meaningless in the sense that it expresses no proposition, and Frederick gave a perfectly plausible explanation for the use of a meaningless slogan. The Court was therefore wrong in finding that the banner advocates the use of marijuana.

Posted by Bill Poser at 05:09 AM

July 06, 2007

Female talkativeness: "knowledge protected against induction"?

In a recent post ("The New York Times slyly abets a lie", 7/6/2007), I discussed Dr. Louann Brizendine's amended claim about female talkativeness.

... because of her larger communication center, this girl will grow up to be more talkative than her brother. [original: Men use about seven thousand words per day. Women use about twenty thousand. ] [amended: In most social contexts, she will use many more forms of communication than he will.]

What does "forms of communication" mean? Or, to get all positivistic about it, how could we check whether or not this claim is true?

Some guidance is offered by an interview with Dr. Brizendine, published last year in the NYT Sunday Magazine ("He thought, she thought", NYT Magazine 12/10/2006):

Q: Your book cites a study claiming that women use about 20,000 words a day, while men use about 7,000.

A: The real phraseology of that should have been that a woman has many more communication events a day — gestures, words, raising of your eyebrows.

In a post the next day ("Sex differences in 'Communication events' per day?", 12/11/2006), I tried to evaluate the "communication events" claim by taking a look at John F. Dovidio, Clifford E. Brown, Karen Heltman, Steve L. Ellyson, Caroline F. Keating, "Power displays between women and men in discussions of gender-linked tasks: A multichannel study", Journal of Personality and Social Psychology, 55(4), 580-587, 1988.

My conclusion:

... the guys did more of the talking, as is often the case -- 43% more, this time, which is a bigger difference than one usually sees. What about non-verbal signals? Well, the guys did 80% more gesturing, and produced 623% more chin thrusts. The gals did 28% more smiling, 7% more self-touching, and 46% more laughing.

I even tried to add up all the "communication events", as absurd as that is, and found that "for the males, we get ... 278.62 'communication events'. For the females, we get 203.36 'communication events'".

Dovidio et al. didn't count eyebrow motions, it's true. But there's certainly no support here for the view that women produce about three times more "communication events" on average than men do.

You can read many further details in the cited post.

Dr. Brizendine's end-note for the amended sentence ("In most social contexts, she will use many more forms of communication than he will") is this:

14: ". . . communication than he will.": Tannen 1990.

This cashes out in her bibliography as Deborah Tannen, "You just don't understand: women and men in conversation", 1990. Tannen's central thesis is that "If adults learn their ways of speaking as children growing up in separate social worlds of peers, then male-female conversation is cross-cultural communication. Although each style is valid on its own terms, misunderstandings arise because the styles are different."

This book is the source of my favorite passage of cinematic sociolinguistics, from one of my favorite movies, White Men Can't Jump.

Actually, the linguistics comes in two parts. First, Gloria Clemente (Rosie Perez) explains to Billy Hoyle (Woody Harrelson) about (one aspect of) the difference between rapport talk and report talk:

Gloria:	Honey? My mouth is dry. Honey. I'm thirsty.
Billy:	Umm... [ Water Runs ] There you go, honey.
Gloria:	When I said I was thirsty, it doesn't mean I want a glass of water.
Billy:	It doesn't?
Gloria:	You're missing the whole point of me saying I'm thirsty. If I have a problem, you're not supposed to solve it. Men always make the mistake of thinking they can solve a woman's problem. It makes them feel omnipotent.
Billy:	Omnipotent? Did you have a bad dream?
Gloria:	It's a way of controlling a woman.
Billy:	Bringing them a glass of water?
Gloria:	Yes. I read it in a magazine. See, if I'm thirsty, I don't want a glass of water. I want you to sympathize. I want you to say, "Gloria, I, too, know what it feels like to be thirsty. I, too, have had a dry mouth." I want you to connect with me through sharing and understanding the concept of dry mouthedness.
Billy:	This is all in the same magazine?

Several scenes later in the movie, after a big fight and separation, Billy approaches Gloria and sings her a song:

Billy:

Honey. All right... don't say anything, all right? Just listen for a second.
Ahem.
Ahem.

I will never bring you water
When you're thirsting in our bed
You know I understand dry-mouthedness
And I sympathize instead

There's more, but it's about love rather than language.

[Note: the movie transcription comes from this source, which curiously doesn't provide any speaker-turn divisions or speaker IDs -- apologies if I divided things up wrong.]

I have no idea what magazine Gloria fictionally read, but the relevant passage in You Just Don't Understand is on p. 51, under the heading "I'll Fix It For You", which starts like this:

Women and men are both often frustrated by the other's way of responding to their expression of troubles. And they are further hurt by the other's frustration. If women resent men's tendency to offer solutions to problems, men complain about women's refusal to take action to solve the problems they complain about.

Anyhow, I've read the book, and I don't recall anything in it to support the claim that "In most social contexts, [a woman] will use many more forms of communication than [a man] will". I just paged through my copy (the 13th printing of the 1990 edition, if it matters) without finding anything that could be said to back up the assertion. The trade paperback version is searchable on Amazon, so I looked at all 39 places where the word "communication" occurs in the book, again without finding anything.

In fact, what Tannen 1990 had to say on the subject is this (p. 75):

Who talks more, women or men? According to the stereotypes, women talk too much. Linguist Jennifer Coates notes some proverbs:

A woman's tongue wags like a lamb's tail.
Foxes are all tail and women are all tongue.
The North Sea will sooner be found wanting in water than a woman be at a loss for a word.

[..] Modern stereotypes are not much different from those expressed in the old proverbs. Women are believed to talk too much. Yet study after study finds that it is men who talk more -- at meetings, in mixed-group discussions, and in classrooms where girls or young women sit next to boys or young men.

So where does this leave Brizendine's assertion about women's talkativeness? Well, a recent note from Robin Shannon came with a link to a relevant paper by Gail Jefferson, "A Note on Laughter in 'Male-Female' Interactions", Discourse Studies 6(1) 117-133, 2004. The (start of the) abstract:

Working with interactional data, one sometimes observes that a type of behavior seems to be produced a great deal by one category of persons and not all that much by another category. But when put to the test of a straightforward count, the observation does not hold up: Category X does not after all do this thing significantly more often than Category Y does. It may then be that the apparent skewing of the behavior's distribution across categories is the result of selective observation; noticing with greater frequency those cases which conformed to some biased notion held by the observer of how these categories behave.

And towards the end of the paper, Jefferson writes:

The foregoing may turn out to be an object lesson in the persistence of stereotypes even when confronted by cold, hard, neutral facts. As happens again and again, the facts (in this case the results of counting the assembled instances of 'male-female' laughter) are disputed with anecdotes (here, with a few cases that serve the stereotypes, while those that don't are treated as 'exceptions').

[...]

This begins to look like something akin to Harvey Sacks' observations on 'category-bound activities' with their associated 'knowledge protected against induction' (1992: 295). As Sacks remarks:

It's not the case that exceptions involve any change in what you know about [a] category's members. For all the categories that have . . . a bunch of activities bound to them, exceptions don't matter. It's built in that there are exceptions, and they do not involve you in modifying what you know.

Exactly.

[Full disclosure -- Jefferson argues that in some rather subtle way, perhaps the stereotypes will turn out to be true. The abstract ends:

But there seems to be another possibility. It may be that the observation has located, but only roughly and partially described, a complex of behaviors which the observation can then be seen to reflect, refer to, or constitute a 'gloss' for.

After reading her paper fairly carefully, I'm still not sure that I understand this, but perhaps it's a case of truthiness avant la lettre: those "hard, cold, neutral" behavioral counts, like books, are "all facts, no heart".]

[Update -- a reader asked for a more complete reference to the work by Harvey Sacks that Gail Jefferson quotes. It's given in her bibliography as Sacks, H. (1992) Lectures on Conversation, Vol. I. Cambridge, MA: Blackwell.]

Posted by Mark Liberman at 06:10 PM

The New York Times slyly abets a lie

Well, a fib, anyhow.

According to Donald G. McNeil Jr., "Everybody's Talking", New York Times, 7/5/2007:

Briefly: Who talks more? Man? Woman?

Conventional wisdom: women use 20,000 words a day, men 7,000. Come cocktail hour, hubby played out. Wife frustrated: 13,000 words to go, no takers.

But wisdom comes from populist 2006 book, "The Female Brain." Data shaky. Skeptics abound.

Today, study published Science magazine: 396 subjects wear tiny microphones. Result: whoops. Women emit 16,125 words per day, men 15,669. Statistically, even-steven.

However, authors admit flaw: all 396 were college students — congenitally loquacious, no jobs, no commutes, no need for aphonic mesmerization by Monday Night Football.

Despite the flaw, says lead author, Matthias R. Mehl, University of Arizona psychologist, "Our paper puts to rest the idea that the female brain evolved to be talkative and the male brain evolved to be reticent."

However, fact slyly not mentioned in Science study: after first printing of "Female Brain," author, Louann Brizendine, began worrying that 20,000 vs. 7,000 figure was just invented by marriage counselors and removed it.

To say that the Science article "slyly" failed to mention this removal is pretty strong language: sly is glossed by MW as "clever in concealing one's aims or ends" ... "lacking in straightforwardness and candor", with synonyms furtive and dissembling.

It's true that Dr. Brizendine removed the specific 20,000-vs.-7,000 numbers after being challenged on them ("Word counts", 11/28/2006). However, she in no way retracted the claim that women are more talkative.

Let's be painfully specific.

A passage on p. 14 first said (emphasis added to highlight the changed sentence):

Until eight weeks old, every fetal brain looks female -- female is nature's default gender setting. If you were to watch a female and a male brain developing via time-lapse photography, you would see their circuit diagrams being laid down according to the blueprint drafted by both genes and sex hormones. A huge testosterone surge beginning in the eighth week will turn this unisex brain male by killing off some cells in the communication centers and growing more cells in the sex and aggression centers. If the testosterone surge doesn't happen, the female brain continues to grow unperturbed. The fetal girl's brain cells sprout more connections in the communications centers and areas that process emotion. How does this fetal fork in the road affect us? For one thing, because of her larger communication center, this girl will grow up to be more talkative than her brother. Men use about seven thousand words per day. Women use about twenty thousand. For another, it defines our innate biological destiny, coloring the lens through which each of us views and engages the world.

And in later printings it reads:

Until eight weeks old, every fetal brain looks female -- female is nature's default gender setting. If you were to watch a female and a male brain developing via time-lapse photography, you would see their circuit diagrams being laid down according to the blueprint drafted by both genes and sex hormones. A huge testosterone surge beginning in the eighth week will turn this unisex brain male by killing off some cells in the communication centers and growing more cells in the sex and aggression centers. If the testosterone surge doesn't happen, the female brain continues to grow unperturbed. The fetal girl's brain cells sprout more connections in the communications centers and areas that process emotion. How does this fetal fork in the road affect us? For one thing, because of her larger communication center, this girl will grow up to be more talkative than her brother. In most social contexts, she will use many more forms of communication than he will. For another, it defines our innate biological destiny, coloring the lens through which each of us views and engages the world.

And current printings of the book still say (p. 36):

[W]omen, on average, talk and listen a lot more than men. The numbers vary, but on average girls speak two to three times more words per day than boys.

The end-notes don't offer any source for that estimate. We could, however, appeal to a 2004 meta-analysis (a study of studies, so to speak), C. Leaper and T.E. Smith, "A meta-analytic review of gender variations in children's language use: talkativeness, affiliative speech, and assertive speech", Developmental Psychology 40(6) 993-1027, 2004, which concludes that

On average, girls were slightly more talkative and used more affiliative speech than did boys, whereas boys used more assertive speech than did girls. However, the average effect sizes were either negligible (talkativeness, d=0.11; assertive speech, d=0.11) or small (affiliative speech, d=0.26).

Leaper and Smith surveyed 61 studies for the talkativeness part of their meta-analysis. An "effect size" of 0.11 means that the average difference between girls and boys was about one tenth of a standard deviation -- for more on what this means, see my post "Gabby Guys: the effect size", 9/23/2006.
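To get a rough feel for how small d=0.11 is, here is a minimal sketch that converts an effect size into the chance that a randomly chosen girl out-talks a randomly chosen boy. It assumes, purely for illustration, that talkativeness in both groups is normally distributed with the same standard deviation; that assumption and the function names are mine, not Leaper and Smith's.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_superiority(d):
    """Chance that a random member of the higher-mean group exceeds a
    random member of the other group, ASSUMING two normal distributions
    with equal standard deviations whose means are d SDs apart."""
    # The difference of two such draws is normal with mean d and SD sqrt(2).
    return normal_cdf(d / math.sqrt(2.0))

d_talkativeness = 0.11  # effect size reported by Leaper and Smith (2004)
print(f"{prob_superiority(d_talkativeness):.3f}")
```

Under those assumptions the answer comes out at roughly 53%, against the 50% you would get if there were no difference at all.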

So I'd say that it's not Matthias Mehl and his co-authors who are being sly.

And it's not only the female-talkativeness claim that still stands in the new version of this passage. The whole word-count business was just one of many shiny factoids decorating the basic structure of what Young and Balaban, reviewing Brizendine's book in Nature, called "Psychoneuroindoctrinology". Some of the falsifiable factoids have been replaced by vaguer versions, but the basic thesis hasn't budged an inch:

If you were to watch a female and a male brain developing via time-lapse photography, you would see their circuit diagrams being laid down according to the blueprint drafted by both genes and sex hormones. A huge testosterone surge beginning in the eighth week will turn this unisex brain male by killing off some cells in the communication centers and growing more cells in the sex and aggression centers. If the testosterone surge doesn't happen, the female brain continues to grow unperturbed. The fetal girl's brain cells sprout more connections in the communications centers and areas that process emotion. [emphasis added]

I discussed the end-notes provided for this passage at tedious length in an earlier post ("The laconic rapist in the womb", 9/4/2006). In particular, I looked in detail at the six references that are cited to support the passage in bold face, and concluded that "none of the references that Brizendine cites in support of this passage provide any empirical support for them at all".

In the later printings (I'm looking at the 12th printing), these references have been updated as follows. Instead of

14: ". . . both genes and sex hormones.": Glickman 2005; Arnold 2004.
14: ". . . the sex and aggression centers.": Sur 2005.
14: ". . . areas that process emotion.": Hill 2006; Herbert 2005; Sun 2005; Witelson 1995; Goldberg 1994.
14: ". . . women use about twenty thousand.": Deacon 1997; Garner 1997; Lewis 1997; Pease 1997; Lakoff 1976; Thorne 1983.

we now have:

14: ". . . both genes and sex hormones.": Arnold 2004.
14: ". . . the sex and aggression centers.": Sur 2005.
14: ". . . areas that process emotion.": See Chapter 6, "Emotions."
14: ". . . communication than he will.": Tannen 1990.

So the irrelevant Glickman 2005 is gone; the irrelevant review in Sur 2005 remains as the only support for the claim that fetal testosterone kills cells in "communication centers" and grows cells in the "sex and aggression centers"; support for the claim about fetal girls sprouting more connections in the communication centers and the centers that process emotion has been deferred to Chapter 6; and Tannen 1990 is cited in support of the "many more forms of communication" claim.

There's a lot to say about Brizendine's Chapter 6 -- but there's nothing in the text about sex differences in fetal brain development, and I don't see anything in the end-notes that seems relevant either.

I'll take up the Tannen reference and the business about "more forms of communication" another time. As for the rest of it, there's still no scientific support whatsoever offered for the morality play about the fetal origins of male sex and aggression vs. female communication and emotion.

But this morality play is the meat of the matter. The word counts and other shiny factoids are just the sauce.

Of course, that sauce is still being ladled out. Check out the publisher's blurb at Powell's Books for the paperback edition of The Female Brain, due to be released on August 7, 2007:

Louann Brizendine, M.D. is a pioneering neuropsychiatrist who brings together the latest findings to show how the unique structure of the female brain determines how women think, what they value, how they communicate, and who they'll love. Brizendine reveals the neurological explanations behind why

A woman uses about 20,000 words per day while a man uses about 7,000

A woman remembers fights that a man insists never happened

A teen girl is so obsessed with her looks and talking on the phone

Thoughts about sex enter a woman's brain once every couple of days but enter a man's brain about once every minute

A woman knows what people are feeling, while a man can't spot an emotion unless somebody cries or threatens bodily harm

A woman over 50 is more likely to initiate divorce than a man

Women will come away from this book knowing that they have a lean, mean communicating machine. Men will develop a serious case of brain envy.

The same blurb is used on the publisher's web site to promote the e-book version, also due out August 7 -- I snarfed this screen shot a few minutes ago:

[Screenshot: Broadway Books Brizendine blurb]

Posted by Mark Liberman at 03:27 PM

More on Bongs

It turns out that "bong" has even more meanings than I realized. Here is my current list:

N. a pipe used especially for smoking marijuana
N. an instrument used for chugging beer
N. a very wide piton
N. the sound made by, e.g., a sledgehammer bouncing off a large, heavy sheet of metal
Vi. to make the above sound
Vt. to strike something in such a way as to produce a bong sound
N. Indian English slang term for a person from Bengal

The newest one (to me) is the last, of which I have been apprised by a couple of readers from India. They say that it is slang but not pejorative. It is thought to have originated in the elite national schools like the Indian Institute of Technology. There is even a movie about Bengalis that uses the term in its title: The Bong Connection. It is one of a set, such as tam for Tamils, mallu or mal for Malayalis, and gult for Telugu speakers.

There is also a bit more to say about bong as a smoking device. In current usage it appears to describe a water pipe. The Wikipedia article, for example, is devoted entirely to water pipes. As one reader wrote to point out, that is not the original meaning.

Back then a bong was a wide tube open at both ends with a little hole near one end. A joint, or a pipe bowl, was inserted into the small hole, and the user drew in smoke from the other end with his hand over the end near the hole, releasing his hand near the end of his toke to receive a sudden insurge of smoke. What is today called a "bong" was then called a "water bong", because it was a combination of a water pipe and a bong.

This is consistent with my memory. I remember seeing non-water bongs of this type, though I don't recall using one. So, it looks like bongs were originally very wide pipes, and that the term was transferred to water pipes via the intermediate stage of "water bongs". The original component of width appears to have been lost: though most water pipes are fairly wide, they don't have to be.

This raises the question of whether there is a relationship between bong as a wide pipe and bong as a wide piton. I can easily see the term for pipe being extended to pitons. I don't know of any evidence bearing on this.

Addendum: Reader Mark Seidenberg informs me of the existence of still another kind of bong: the beer bong. This is a device used for chugging beer.

Posted by Bill Poser at 02:12 PM

A cat whose owners thought was lost

AP on Yahoo, "Cat survives three weeks crossing ocean", 7/5/2007. First sentence of the story:

A cat whose owners thought was lost spent nearly three weeks crossing the Pacific Ocean in a shipping container with no food or water — and appears to be just fine.

I had to read it three times. It looks almost like a possible parasitic gap, but it's not. I suppose it could be fixed by filling either gap with a pronoun:

A cat who its owners thought was lost …
A cat whose owners thought it was lost …

Why do you suppose they did the illicit parasitic gap? Might the two alternative well-formed possibilities have struck their ear superficially as illicit resumptive pronoun constructions? Is this an attested pattern of hypercorrection?

[Update -- Marilyn Martin did a Google search and sent the following results, all showing exactly the same pattern as the cat example:

(link) Picture: AFP Iraqis carry the coffin of a man whose relatives said was killed during a US military raid on their home in Baghdad. ...

(link) A jury award $15 million to the estate of a resident whose family claimed was given Darvocet (a mild painkiller) in place of morphine (a more powerful ...

(link) The 24 family members, from four states and representing four generations, were there to honor the man whose wife said was among "about three men" of the ...

(link) Early in the week, a man, whose wife said was bipolar, was shot in Miami after allegedly claiming to possess a bomb. Then on Thursday, a six-year old boy ...

(link) there was the boy whose parents thought was a potential genius, yes they could, with a lot of work, get him to Level 5, but for his sake didn't want to. ...

(link) A patent clerk whose parents thought was retarded, couldn't remember his own phone number yet manages to revolutionize science for all time. Mythical: ...

(link) i'm the girl whose sister thought was obnoxious once upon a time. i'm that girl who couldn't bear to miss classes in year 1 yet happily skipped a tute just ...

Similar searches turn up hundreds of other examples, suggesting that there is a real phenomenon here.

Laura Kalin wrote with puzzlement, trying to figure out why the sentence sounds so close to being correct and wondering why it’s actually not correct.

I read the sentence featured in your post ("A cat whose owners thought was lost spent nearly three weeks crossing the Pacific Ocean ….") over and over again, because the sentence almost sounded correct. I have finally figured out what stumped me: why is it that replacing "whose owners" with an 'equivalent' pronoun makes the sentence grammatical? The phrase "A cat they thought was lost" sounds completely grammatical to me. Why the disparity? Is it something to do with the possessive determiner?

For Laura and any other similarly puzzled readers, I think the comments and examples below from Craig Russell show quite explicitly both how the cat sentence comes so close to being correct and why it's not.

I got several messages (thank you all) suggesting hypotheses to explain it. Now that we have Marilyn's examples, I’m convinced it needs explaining; and her examples also serve to eliminate some of the hypotheses that came in. The suggestion that seems most plausible so far comes from Craig Russell. He apologizes for his lack of formal linguistic training, but needn't have -- the result is that he writes in clear language that will be as accessible to non-linguists as to linguists, which is all to the good.

I've been giving some thought to your Language Log posting, and I have come up with sort of a theory to explain the construction. I will apologize, as I always do in my responses to Language Log postings, for my lack of formal linguistic training and familiarity with terminology; I am a graduate student in the Classics whose exposure to the study of language is mostly through Greek and Latin.

Anyway, my cat theory is based on the normal pattern for the formation of sentences with relative clauses: what could be a regular pronoun in a separate sentence can be replaced with a relative pronoun. E.g.

1. Here is a cat. He is orange.
Here is a cat who is orange. (he-->who)

2. Here is a cat. People love him.
Here is a cat who(m) people love. (him-->who(m))

3. Here is a cat. His owners are the Smiths.
Here is a cat whose owners are the Smiths. (his-->whose)

Familiar stuff. But English sentences can sometimes leave out the relative pronoun:

Here is a cat. People love him.-->Here is a cat people love.

Maybe you can see where I'm going with this. That construction could lead to the following sentence:

4. Here is a cat. His owners thought he was lost.
Here is a cat his owners thought was lost.

"His owners thought was lost" is a relative clause with the relative pronoun "who" omitted. But structurally, it looks kind of like my sentence #3--the one where 'his' goes to 'whose' when creating a relative clause.

So my theory is that the sentence in question arose by taking a sentence that already had a relative clause, and falsely changing "his" to "whose" by analogy with sentences like #3. Perhaps a case of overcorrection--are there prescriptive grammarians who condemn omitting relative pronouns in sentences like #4? If so, it is easy to imagine someone accidentally changing "his" to "whose" in an attempt to fix this egregious error.

This looks quite plausible. I don't know if the result is a hypercorrection (because I also don't know if anyone feels any need to avoid no-relative-pronoun relative clauses) or a blend, but in either case the similarity Craig notes between his #3 and #4 may well be the best clue to an explanation.

If there has been real linguistic work done on this, I don't know it. But at any rate, it's a real phenomenon, more interesting than I had originally suspected. I also have no idea whether it's related to how other (well-formed, at least according to current standards) parasitic gaps have arisen. But this is Language Log, not a journal, so we can just observe such things and then go on with our real work, isn't that nice? ]

Posted by Barbara Partee at 03:55 AM

July 05, 2007

The first time?

Here's the References and Notes section of M.R. Mehl, S. Vazire, N. Ramírez-Esparza, R.B. Slatcher and J.W. Pennebaker, "Are Women Really More Talkative Than Men?", Science, 317(5834) p. 82 July 5, 2007:

Mehl et al. References

Reference #4 is a Language Log post from 8/6/2006, "Sex-linked lexical budgets". One of the journalists who interviewed me about this story asked whether this was the first time that a blog entry has been footnoted in a paper in Science. Though I don't know, I suspect that there must have been some others -- but probably not many.

Let me also take the opportunity to qualify what Constance Holden quoted me as saying in her ScienceNOW Daily News piece about the Mehl paper ("Talk About a Gender Stereotype", 5 July 2007):

"At this point, the only remaining scientific question appears to be why so many intelligent and well-educated people have so easily--even eagerly--accepted and spread what appear to be fabricated numbers supporting a false generalization," says linguistics professor Mark Liberman of the University of Pennsylvania in Philadelphia, who was not involved in the research.

The findings confirm other studies in more limited settings that suggested men hold their own in the chattiness department, Liberman says. Even so, Pennebaker's team may have missed important gender differences because they didn't consider the context in which people were speaking, says professor Deborah Tannen of Georgetown University in Washington, D.C. She points out, for example, that men and women differ in their gregariousness depending on whether they're in private or public, same-sex or mixed-sex gatherings.

I can't complain about the quote, since it was taken verbatim from our email exchange. But as soon as I had sent off the email containing it, I realized that it would probably be quoted, and so I sent this qualification:

Reading over what I wrote, I guess I should clarify that

"...the only remaining scientific question appears to be why so many intelligent and well-educated people have so easily..."

was meant to refer to the issue of whether men or women are overall more talkative -- not to the area of research into sex roles and communication, where there are certainly plenty of interesting hypotheses to explore.

Ms. Holden replied:

that was clear!

I certainly hope so.

Tomorrow, I'll take a look at the Mehl study's uptake in the popular press and the blogosphere.

Meanwhile, below is a list of relevant Language Log posts. This is more on the subject than any sane person wants to read, but you may find it amusing to browse.

Other posts on Louann Brizendine's The Female Brain:

"Neuroscience in the service of sexual stereotypes" (8/6/2006)
"Sex-linked lexical budgets" (8/6/2006)
"Sex and speaking rate" (8/7/2006)
"Yet another sex-n-wordcount sighting" (8/14/2006)
"The main job of the girl brain" (9/2/2006)
"The superior cunning of women" (9/2/2006)
"The laconic rapist in the womb" (9/4/2006)
"Open-access sex stereotypes" (9/10/2006)
"David Brooks, Neuroendocrinologist" (9/17/2006)
"Gabby guys: the effect size" (9/25/2006)
""Every 52 seconds": wrong by 23,736 percent?" (10/13/2006)
"Guys are a bit gabbier in Dutch, too" (10/16/2006)
"Two new reviews of Brizendine" (10/30/2006)
"Word counts" (11/28/2006)
"Sex differences in "communication events" per day?" (12/11/2006)

More on the spread of these ideas in the media:

Regression to the mean in British journalism (11/28/2006)
Censorship at the Daily Mail (11/29/2006)
Contagious misinformation (12/1/2006)
Femail again (12/2/2006)
~~Bible~~ Science stories (12/2/2006)
Fabricated but true? (12/3/2006)
The spread of bogus numbers in the meme pool (12/16/2006)
Busy tongues (12/31/2006)
The silence of the men (12/29/2006)
Cerebro de El País (1/28/2007)
The Female Brain is out in Britain (4/4/2007)
The New York Times slyly abets a lie (7/6/2007)
Luann doesn't read Language Log (7/15/2007)

And on Leonard Sax's Why Gender Matters, and Michael Gurian and Kathy Stevens' The Minds of Boys:

"David Brooks, cognitive neuroscientist" (6/12/2006)
"Are men emotional children?" (6/24/2005)
"Of rats and (wo)men" (8/19/2006)
"Leonard Sax on hearing" (8/22/2006)
"More on rats and men and women" (8/22/2006)
"The emerging science of gendered yelling" (9/5/2006)
"The vast arctic tundra of the male brain" (9/6/2006)
"Girls and boys and classroom noise" (9/9/2006)

Brizendine Proposes a New Stereotype!

I tell you, the speed of modern communication is only rivalled by the speed at which people can come up with new unsupported generalizations. Alert reader Tamara Bhandari has pointed us to this article in the LA Times in which Louann Brizendine, author of The Female Brain, responds to the paper just published in Science that demolishes the claim that women are more talkative than men as follows:

"What it really means is not that she talks too much," said Brizendine, who directs the Women's Mood & Hormone Clinic at UC San Francisco. "It's that he doesn't listen enough!"

Of course, the Mehl et al. study doesn't "really mean" this. That is, that men don't listen enough is not a logical implication of the fact that men and women are about equally talkative. What she seems to mean is that in the face of these empirical findings, the closest hypothesis she can think of to the old stereotype that is not ruled out by the data is that men don't listen enough. She doesn't cite any evidence for her new hypothesis. Indeed, it is an inferior hypothesis, from a scientific point of view (but therefore a superior one from the point of view of popular writers), in that the empirical claims that it makes are far from clear. We know what it would mean for women to talk more than men and how we can measure this. What exactly does it mean for men to listen enough, or not enough? How can you measure that?

Posted by Bill Poser at 08:09 PM

Male and Female College Students are Equally Talkative

That women are more talkative than men is a popular belief with little empirical support, in spite of which it is promoted in purportedly serious books like Louann Brizendine's The Female Brain, which Mark has discussed here. A new nail in the coffin for this idea appears in today's (6 July 2007) issue of Science, in a short paper entitled "Are Women Really More Talkative Than Men?" by Matthias R. Mehl, Simine Vazire, Nairán Ramírez-Esparza, Richard B. Slatcher and James W. Pennebaker (Vol. 317. no. 5834, p. 82 DOI: 10.1126/science.1139940).

Mehl et al. studied 396 university students, 51 of them in Mexico, the remainder in the United States, consisting of 210 women and 186 men, who wore specially designed digital audio recorders. They could not tell when the recorders were recording and they could not turn them on and off. The recorders were programmed to record for thirty seconds every 12.5 minutes. In this way, they collected random samples of the participants' speech from which they could extrapolate the number of words each spoke per day. Overall, the women produced an average of 16,215 words per day, the men 15,669. Although a naive interpretation is that this shows that women are more talkative, the variance is large, so the difference of 546 words, only 3.5%, is not statistically significant. Indeed, although I don't think that anything can be made of the fact statistically, inspection of their data reveals that the handful of really extreme magpies, who produced over 40,000 words per day, were all male.
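For readers who want to check the arithmetic, here is a minimal sketch in Python. The sampling schedule and the two daily averages come from the paragraph above. The simple "divide by the sampled fraction" extrapolation and the function names are my own assumptions for illustration, not a description of the procedure Mehl et al. actually used.

```python
# Figures taken from the post: 30 seconds recorded every 12.5 minutes.
SAMPLE_SECONDS = 30
CYCLE_SECONDS = 12.5 * 60  # 750 seconds per recording cycle


def sampling_fraction() -> float:
    """Fraction of the monitored time that is actually recorded."""
    return SAMPLE_SECONDS / CYCLE_SECONDS


def extrapolate_words(words_in_samples: float) -> float:
    """Scale a word count from the sampled snippets up to the whole
    monitored period. Assumption: speech is spread evenly enough that
    the snippets are representative (a simplification)."""
    return words_in_samples / sampling_fraction()


def percent_difference(a: float, b: float) -> float:
    """Difference between a and b, as a percentage of b."""
    return 100 * (a - b) / b


# Daily averages reported in the post.
WOMEN_WORDS_PER_DAY = 16_215
MEN_WORDS_PER_DAY = 15_669

difference = WOMEN_WORDS_PER_DAY - MEN_WORDS_PER_DAY
percent = percent_difference(WOMEN_WORDS_PER_DAY, MEN_WORDS_PER_DAY)

if __name__ == "__main__":
    print(f"fraction of time recorded: {sampling_fraction():.0%}")
    print(f"difference: {difference} words ({percent:.1f}% of the men's average)")
```

On these figures the recorders capture 4% of the monitored time, so every word heard in a snippet stands in for roughly 25 spoken. The 3.5% figure is the difference taken as a share of the men's average.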

P.S.: In addition to the actual report cited above, access to which requires a subscription, there is an article about it, which I think does not. Of course, you really should be a member of AAAS, in which case you would have a subscription.

Posted by Bill Poser at 06:25 PM

Font rage

We've previously documented cases of word rage, in which peevishness about pronunciation, usage and even spelling inspires campy threats of violence. But now, for the first time, we have a case of font rage.

I've seen negativity about comic sans before, but this is extreme. Will Vincent Connare really become the Salman Rushdie of typography?

[Hat tip: Lane Greene]

Posted by Mark Liberman at 09:52 AM

Insert Flap "A" and Throw Away

Geoff Pullum's ruminations on the shortage of determiners in pre-recorded warning announcements ("Please put __ luggage cart brake to on") reminded me of S.J. Perelman's 1944 essay "Insert Flap 'A' and Throw Away".

It begins:

One stifling summer afternoon last August, in the attic of a tiny stone house in Pennsylvania, I made a most interesting discovery: the shortest, cheapest method of inducing a nervous breakdown ever perfected. In this technique (eventually adopted by the psychology department of Duke University, which will adopt anything), the subject is placed in a sharply sloping attic heated to 340° F. and given a mothproof closet known as the Jiffy-Cloz to assemble. The Jiffy-Cloz, procurable at any department store or neighborhood insane asylum, consists of half a dozen gigantic sheets of red cardboard, two plywood doors, a clothes rack, and a packet of staples. With these is included a set of instructions mimeographed in pale-violet ink, fruity with phrases like "Pass Section F through Slot AA, taking care not to fold tabs behind washers (see Fig. 9)." The cardboard is so processed that as the subject struggles convulsively to force the staple through, it suddenly buckles, plunging the staple deep into his thumb. He thereupon springs up with a dolorous cry and smites his knob (Section K) on the rafters (RR). As a final demonic touch, the Jiffy-Cloz people cunningly omit four of the staples necessary to finish the job, so that after indescribable purgatory, the best the subject can possibly achieve is a sleazy, capricious structure which would reduce any self-respecting moth to helpless laughter. The cumulative frustration, the tropical heat, and the soft, ghostly chuckling of the moths are calculated to unseat the strongest mentality. [emphasis added]

As this classic quotation indicates, it's traditional for assembly instructions to be under-determined, in the sense of omitting determiners as well as in the sense of being vague at crucial points: "taking care not to fold __ tabs behind __ washers".

Though actually, I should say that it was traditional, since these days, assembly instructions are usually graphical rather than textual.

Perelman's essay continues:

In a period of rapid technological change, however, it was inevitable that a method as cumbersome as the Jiffy-Cloz would be superseded. It was superseded at exactly nine-thirty Christmas morning by a device called the Self-Running 10-Inch Scale-Model Delivery-Truck Kit Powered by Magic Motor, costing twenty-nine cents.

After some adventures with a knife and "the only sentence I could comprehend, 'Fold down on all lines marked "fold down"; fold up on all lines marked "fold up",'" we encounter another spate of determiner-omission:

"Let's see -- what's the next step? Ah yes. 'Lock into box shape by inserting tabs C, D, E, F, G, H, J, K, and L into slots C, D, E, F, G, H, J, K, and L. Ends of front axles should be pushed through holes A and B.'"

There's some grammatico-cultural generalization here, having something to do with the abstract or underdetermined nature of the conversational context, which makes the writers of instructions and warnings uneasy about choosing any of the available determiners.

In headlines, there's the motivation of saving space; but this doesn't apply to a recorded warning message on a monorail, or to printed assembly instructions for a piece of furniture or a toy.

Posted by Mark Liberman at 08:40 AM

Scriptwriter for the monorail

I am trapped in Munich (southern Germany) at the moment, stashed in a hotel by an airline that failed yesterday to get me from Toulouse (southern France) to San Francisco (central California) via Frankfurt (central Germany). And one of the travel un-pleasures I have to look forward to today is (under the best case scenario) a very long flight from Munich to San Francisco, followed by a chance to hear this line repeated half a dozen times by a familiar recorded female voice as the monorail starts off from the airport to the giant (and extraordinarily ill-designed) San Francisco Airport Rental Car Center:

Please hold on. Please set luggage cart brake to on.

Get me rewrite! Who was the scriptwriter? First, we don't use please with warnings (Please look out! There's a tiger behind you!). Second, we don't leave out determiners in English speech unless we are Russian or Korean or something: we need a determiner such as your on luggage cart brake. Third, we don't say things like "You're going too fast, daddy! Set the brake to on!". What went wrong here? The scriptwriter should have written this:

Hold on; and if you have a luggage cart, put the brake on.

Why do the people who write scripts for recorded announcements in elevators and shuttle buses and subway trains have such a tin ear for ordinary-sounding English? What is wrong with them? Please set ordinary native command of spoken English to on, or else hire a linguist.

[Added later, July 6: I was wrong about how good the best case scenario could be. It turns out that although I did get back to San Francisco (the total time from leaving my Toulouse hotel to arriving back at my Santa Cruz home was a staggering 47 hours), I never did hear the message. My friend Caroline Henton (who is in the relevant industry, working on speech issues for Apple, and sympathized) read the above post on Language Log, took pity on me, and drove up from Cupertino to meet my plane and drive me home (we both live in Santa Cruz), so I didn't need to go to the rental car center and I didn't need to drive. Isn't that cool? The above post now has the special distinction of being the most useful one to me personally that I ever wrote.

One person has now written to me to say that "please" is required because airports have to be polite; but that's nonsense. Announcements like "Hold on!" or "All aboard!" or "Step this way" or "Follow me" or "Mind your head" or "Hang onto your hat" or "Be careful out there" are perfectly polite, given a suitable tone of voice. The occurrences of "please" were in there (and the determiner was missing on "luggage cart brake") because the scriptwriter didn't know how to distinguish natural-sounding speech from brochure prose or written notices.

Another reader asked whether it might have been a translation problem. The answer is no: they announce only in English, and the bad script was written (doubtless by a native speaker) only in English and solely for people who could understand English. This is incompetence, not poor translation skills.]

Posted by Geoffrey K. Pullum at 06:46 AM

July 04, 2007

A New Theory of Language Loss

It isn't often that one encounters an entirely original theory of language loss, but I chanced on one in this comment by one TB Tabby on this blog post:

This is how languages die out: Over time, every single word in the language becomes slang for something dirty. People didn't forget to speak Latin: they just got tired of all the snickering whenever they spoke.

Posted by Bill Poser at 07:59 PM

The Ease of Learning Writing Systems

Yesterday I quoted Stolper and Tavernier's observation that "For a modern student, to learn the Old Persian script is a work of scarcely an hour". I'm not sure if everyone got it, but on the basis of my own experience learning writing systems as well as my experience of teachers of "exotic" languages, I took this to be somewhat tongue in cheek. This view is confirmed by The Mad Latinist, a former student of Professor Stolper, who thinks that most students will require well over an hour. In the comments, Tiye provides some other choice quotes garnered while taking Akkadian from Professor Stolper:

It takes only a couple of years to learn all the Akkadian you need to pass a comprehensive exam. It takes much longer to get a degree in French Lit, so obviously Akkadian is easier than French.

Those scribes had pretty short life spans, so it must not take too long to get good at Akkadian.

Reading a newspaper in English is 100 times more complex than reading a cuneiform tablet.

Posted by Bill Poser at 07:23 PM

Dice-K

Last night, I went to Fenway Park, with my brother and sister and some members of our various families and around 37,500 other people, to see the Red Sox play Tampa Bay. Pitching for the Red Sox was Daisuke Matsuzaka, familiarly known as "Dice-K", and the Devil Rays couldn't do much against him (Daniel Malloy, "Matsuzaka was on the money", 7/4/2007). Before the game, we had dinner at a restaurant in Kenmore Square, and I noticed that the waiters and waitresses were wearing Red Sox jerseys with this on the back:

マツザカ

I wonder if the entry of Japanese baseball stars into the American major leagues will lead to Americans learning some katakana? How about at least enough to spell some suitably negative slogans about the ヤンキーズ on the tasteless t-shirts peddled around the ball park? (That's how they were hawking them last night: "tasteless t-shirts, get your tasteless t-shirts here!")

I also wonder whether the wait staff's jerseys were true to Japanese practice -- do uniforms in Japan carry names in katakana rather than kanji or hiragana?

As for Dice-K's nickname, I gather that it combines the symbol K used to record strikeouts, with the sense that his enormous salary was a gamble for the Red Sox management.

[Update -- Jane Acheson wrote to explain that the katakana version of Matsuzaka on the wait staff's jerseys is an affectation:

If you'll look at this YouTube clip you'll see, in addition to a nice demonstration of the supposed gyroball, that Matsuzaka's Seibu Lions jersey includes "Lions" and "18" on the front in Roman letters, and (at 00.03) "Matsuzaka" on the back, also in Roman letters.

Bruce Ryan confirms this:

FYI: Japanese baseball uniforms have player names written in phonetic English.

Here's an old picture of Hideki Matsui from his days with the Tokyo Yomiuri Giants.

And Robert Hay adds more information:

Here's a recent unveiling of the Seibu Lions' new uniforms as an example: (link).

Notice that not only are the player names in romaji, but team name is written in English, "Lions".

I don't know much about the history of Japanese baseball, but I have found this photo of the 1935 Tokyo Giants, whose uniforms have Kanji numbers: (link).

If you're curious about actual Japanese jersey habits, you might inquire with Paul Lukas, who has a blog about sports uniforms at www.uniwatchblog.com.

Major League Baseball likes to embrace the Japanese fans, and you can get all sorts of Japanese gear. I got a similar shirt for the Yankees last year, but apparently it's not available anymore.

David Massey agrees:

I read your entry about Daisuke Matsuzaka on Language Log and wanted to add a bit about Japanese baseball uniforms. The typical Japanese practice, for teams whose uniforms use a player name, is to use romaji, as this publicity photo of Hanshin Tigers outfielder Takahiro Shoda shows. Seeing katakana, or any Japanese characters at all, on the back of a Tigers jersey would be just as strange as seeing them on a Red Sox jersey.

Jane Acheson offered these further comments on Dice-K's nickname:

My take on it was that the Globe (which is frankly a PR arm for the team) realized that American viewers might have a problem with pronouncing the name -- despite the fact Japanese names are actually really easy, once you get the hang of them -- and "Dice-K", idiotic as it is, won in a poll on Boston.com, on or about the 28th of December.

On the upside, we don't have legions of the Fenway Faithful calling him dai-SOO-kee or other potential manglings. On the downside, calling him a variation on "Daisuke" is like calling Jim Rice "Jim." Kind of -- excessively familiar, I always thought.

(Ironically, since I don't have the right doohickeys installed, your representation of the tasteless t-shirt looks like ???? on my screen. But I've seen the t-shirts enough to know what you're talking about.)

]

Posted by Mark Liberman at 07:17 PM

The right to do process

This Fourth of July, I've been thinking about those "unalienable Rights" that the signers of the Declaration of Independence felt were so self-evident. When it finally came time to spell them out in a Bill of Rights, the list included:

the right of the people peaceably to assemble, and to petition the Government for a redress of grievances (First Amendment);

the right of the people to keep and bear Arms (Second Amendment); and

the right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures (Fourth Amendment).

Syntactically at least, those rights seem pretty straightforward (ignoring for the moment the troublesome commas of the Second Amendment), since "(the) right (of the people)" is simply taking an infinitival verb phrase as its complement. In the First Amendment, two of these infinitival VPs are coordinated ("to assemble" and "to petition the Government"). The Sixth Amendment features a coordination too, but it gets a little stickier:

In all criminal prosecutions, the accused shall enjoy the right to a speedy and public trial, by an impartial jury of the State and district wherein the crime shall have been committed, which district shall have been previously ascertained by law, and to be informed of the nature and cause of the accusation; to be confronted with the witnesses against him; to have compulsory process for obtaining witnesses in his favor, and to have the Assistance of Counsel for his defense.

Here the first complement of "(the) right" is a prepositional phrase: "to a speedy and public trial," which then gets coordinated with a few infinitival VPs. In both the PP and the VP complements, the first word is to, but it's used for two different syntactic purposes: as a preposition and as an infinitive marker. In other words, you can have "the right to something" or "the right to do something." Right isn't the only noun that has this patterning; one can have "an inclination to violence" or "an inclination to commit violence," for example. But the PP/VP alternation for right shows up in so many set phrases in American political discourse that it's particularly prominent. And sometimes the ambiguity of the role of to in these set phrases can lead to confusion. Consider, for instance, the constitutional eggcorn that reinterprets the "due process" of the Fifth and Fourteenth Amendments by substituting the homophonous do for due, resulting in "the right to do process."

Another potential source of confusion is when PP and VP complements for "(the) right" are coordinated in a non-parallel construction, as we find in the Sixth Amendment. If the word to appears at the beginning of both types of complement, there's generally no problem in figuring out that to is filling different grammatical roles. Here's an example often heard in the police dramas of American film and television, when a cop has to provide a Miranda Warning upon making an arrest:

You have the right to an attorney and to have an attorney present during questioning.

Now, what if the cop decided to save some breath and omit the second to?

?? You have the right to an attorney and have an attorney present during questioning.

That's a bit rougher to parse, because it requires realizing that to is pulling double-duty: first as a preposition, then as an infinitive marker. Thus it's an example of syllepsis, aka WTF coordination. And if the above coordination rubs you the wrong way, then you better stay out of New York City taxicabs. Every cab driver is required to post "The Taxicab Rider Bill of Rights," which is not quite as eloquently phrased as the original Bill of Rights it's modeled on. Here's what it says (I've supplied the numbering):

As a taxi rider, you have the right to:

[i] Direct the destination and route used;

[ii] Travel to any destination in the five boroughs of the City of New York;

[iii] A courteous, English-speaking driver who knows the streets in Manhattan and the way to major destinations in other boroughs;

[iv] A driver who knows and obeys all traffic laws;

[v] Air-conditioning on demand;

[vi] A radio-free (silent) trip;

[vii] Smoke and incense-free air;

[viii] A clean passenger seat area;

[ix] A clean trunk;

[x] A driver who uses the horn only when necessary to warn of danger; and

[xi] Refuse to tip, if the above are not complied with.

So to recap, you have the right to two bare-stem VPs [i-ii], eight NPs [iii-x], and one more bare-stem VP [xi]. If you actually read through to the end, you need to make two mental switches in parsing the word to, from reading it as an infinitive marker, to reading it as a preposition, to re-reading it as an infinitive marker. I had my WTF moment sitting in a Manhattan cab a few weeks ago, wondering if I'd ever make it across town through rush-hour traffic. I must have seen this "Bill of Rights" dozens of times, since it has apparently been an obligatory item in New York cabs since 1995, but this was the first time I'd bothered to read it. (I guess that's my Syntaxicab Confession.)
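The tally can be checked mechanically. In the rough Python sketch below, the VP/NP labels are assigned by hand from the list as quoted above (an assumption of the sketch, not the output of any parser), and the names `ITEMS` and `runs` are purely illustrative.

```python
from itertools import groupby

# Hand-assigned category for each item of the posted list (an assumption:
# these labels reflect my reading of the items, not parser output).
ITEMS = [
    ("i", "VP"),     # Direct the destination and route used
    ("ii", "VP"),    # Travel to any destination ...
    ("iii", "NP"),   # A courteous, English-speaking driver ...
    ("iv", "NP"),    # A driver who knows and obeys all traffic laws
    ("v", "NP"),     # Air-conditioning on demand
    ("vi", "NP"),    # A radio-free (silent) trip
    ("vii", "NP"),   # Smoke and incense-free air
    ("viii", "NP"),  # A clean passenger seat area
    ("ix", "NP"),    # A clean trunk
    ("x", "NP"),     # A driver who uses the horn only when necessary ...
    ("xi", "VP"),    # Refuse to tip ...
]


def runs(labels):
    """Collapse consecutive identical labels into (label, length) pairs."""
    return [(label, len(list(group))) for label, group in groupby(labels)]


labels = [label for _, label in ITEMS]
item_runs = runs(labels)

# Each boundary between runs forces one reinterpretation of "to".
switches = len(item_runs) - 1

if __name__ == "__main__":
    print(item_runs)
    print("reparses of 'to':", switches)
```

Three runs of like-typed complements mean two boundaries, which is where the two mental switches come from.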

I haven't managed to find anyone online griping about this, which is surprising since New Yorkers are notoriously quick to gripe about anything. I did find a long funny riff on the list of rider's rights in a post on Tomato Nation, where Sarah Bunting speaks of the times spent in New York cabs where "you glare sullenly at the Taxicab Rider's Bill Of Rights posted in the back seat and wish you'd thought to bring a magic marker." Bunting's imagined magic marker isn't there to amend the grammar, which she seems to have no problem with; rather, it's poised to make those "rights" correspond to the sad reality of the cab-riding experience. If I wanted to avoid any further WTF moments, my marked-up version would read:

As a taxi rider, you have the right:

[i*] To direct the destination and route used;

[ii*] To travel to any destination in the five boroughs of the City of New York;

[iii*] To a courteous, English-speaking driver who knows the streets in Manhattan and the way to major destinations in other boroughs;

[iv*] To a driver who knows and obeys all traffic laws;

[etc.]

But I don't think I'll be bringing a magic marker with me on any cab rides, since I have a hard time getting truly dyspeptic about matters sylleptic.

(Thanks to Arnold Zwicky and Neal Whitman for confirming my native-speaker intuitions.)

Posted by Benjamin Zimmer at 11:29 AM

The state of Natural Language Processing as Revealed by Google

A few minutes ago I googled for information about the incident in 2002 in which the Saudi religious police prevented the rescue of students from a burning girls' school in Mecca because they were not (by the standards of misogynist psychopaths) fully dressed -- killing 15 of them. (In fact, it seems it was even worse; they reportedly actually forced some girls who had escaped back into the burning school.) I used the query "Saudi Arabia girls school fire". In addition to the regular results, which were useful, I got two sponsored links, which Google chooses by means of Natural Language Processing techniques. One of them was for firefighter training; the other was a link to a dating site headed "Hot Dubai Women".

Addendum: as one reader has pointed out, Google allows both for exact matches and "broad matches". The dating site link is presumably the result of a broad match and does not reflect the full capability of Google's matching algorithm. It is still hilarious, though in exceedingly bad taste.

Further addendum: I didn't notice the bad pun with "Hot Dubai Women" being a "broad match", pointed out by a reader. Honest.

Posted by Bill Poser at 05:13 AM

Language Documentation & Conservation journal

The National Foreign Language Resource Center and the University of Hawai'i Press have recently announced the inaugural issue (Volume 1, Number 1) of Language Documentation & Conservation (LD&C). LD&C is an open-access, online journal that is published twice a year, in June and December. Please visit the LD&C webpage and subscribe. It's free.

The inaugural issue of LD&C contains a paper by Paul Newman ("Copyright Essentials for Linguists") that may be of particular interest to readers of this blog. The entire table of contents is below the fold.

Language Documentation and Conservation
Volume 1, Number 1 (June 2007)

Table of Contents

ARTICLES:

Endangered Sound Patterns: Three Perspectives on Theory and Description
Juliette Blevins

Solar Power for the Digital Fieldworker
Tom Honeyman and Laura C. Robinson

Managing Fieldwork Data with Toolbox and the Natural Language Toolkit
Stuart Robinson, Greg Aumann, and Steven Bird

Ethics and Revitalization of Dormant Languages: The Mutsun Language
Natasha Warner, Quirina Luna, and Lynnika Butler

Writer's Workshops: A Strategy for Developing Indigenous Writers
Diana Dahlin Weber, Diane Wroge, and Joan Bomberger Yoder

TECHNOLOGY REVIEWS

Review of TshwaneLex Dictionary Compilation Software
Reviewed by: Claire Bowern

Review of Fieldworks Language Explorer (FLEx)
Reviewed by: Lynnika Butler and Heather van Volkinburg

Review of Computerized Language Analysis (CLAN)
Reviewed by: Felicity Meakins

BOOK REVIEWS

Review of A Grammar of South Efate: An Oceanic Language of Vanuatu
Robert Early

Review of Kerresel a klechibelau: Tekoi er a Belau me a omesodel: Palauan language lexicon
Robert E. Gibson

Posted by Eric Bakovic at 03:04 AM

July 03, 2007

Old Persian News

There has been an interesting development in the study of Old Persian. Old Persian is the language of the royal inscriptions of the Achaemenid kings, such as the Behistun inscription of Darius, and is known to us almost exclusively from these inscriptions. The inscriptions are written in a unique writing system, cuneiform in form, but in structure quite different from the more familiar Sumerian-Akkadian Cuneiform and its derivatives, such as Hittite, Elamite, and Hurrian Cuneiform.

This writing system is generally believed to have been created for Darius and to have been deliberately restricted in use to royal inscriptions. This was not because there was no other application for writing - Achaemenid Persia was a literate society that kept extensive administrative records. However, it kept them in Elamite and Aramaic, not Persian.

The development, reported in a recent paper by Matthew Stolper and Jean Tavernier, is the discovery of an Old Persian administrative text. A nice photograph may be seen here. The tablet was actually excavated in 1934, but no one seems to have noticed until recently that it is in Old Persian. Much of the text is uninterpretable, but enough can be read that it is clear that it is an administrative text: it deals with a transaction of unknown type involving 6,000 or more litres of a dry commodity from a named person in five named villages.

The implications of this discovery are not clear. It may be that Old Persian cuneiform was used for purposes other than royal display and that only one example has thus far been found, or it may be that this example is a fluke. Stolper and Tavernier have an interesting discussion of literacy in the Achaemenid Empire and how it could be that an administrator could get away with writing such a document in a language and writing system not normally used for such purposes.

Incidentally, over at the Harvard Iranian Studies Department Prods Oktor Skjærvø has made available on-line his Old Persian Primer. Those worried about learning to read Old Persian cuneiform will be heartened by Stolper and Tavernier's view that "For a modern student, to learn the Old Persian script is a work of scarcely an hour". Skjærvø also offers Older Avestan, Younger Avestan, and Sogdian, introductions to Zoroastrianism and Manicheism, the two major pre-Islamic Iranian religions, and Wheeler M. Thackston's reference grammars of Sorani and Kurmanji Kurdish. An excellent opportunity to bone up on your Iranian languages.

Posted by Bill Poser at 08:15 PM

Plus Ça Râle...

Arnold's reminder of the long history of complaints about the educational neglect of grammar put me in mind of the parallel history of indignation over the replacement of canonical texts with fashionable fluff, as witness this passage from the 1915 Essentials of English Speech, by the lexicographer and critic Frank Vizetelly:

As these pages are passing through the press, the subject of the school course in English is again receiving the attention of educators. There is a tendency, in certain parts of the country, to modernize the curriculum, and in one of our central States some of the changes proposed include such a radical substitution as the study of "Cabbages and Kings" for that of "Paradise Lost"; that is to say, the writings of the late Sydney Porter, better known by the pseudonym "O. Henry," are to take the place of those of Milton.

The President's addresses to Congress are to be studied in preference to the works of Shakespeare, but while Shakespeare's writings will still be used sparingly (for which one may be excused for offering a prayer of thanks), the monotony of applying the mind to the Bard of Avon's exquisite work is to be relieved by studying the writings of Bernard Shaw and Oscar Wilde! In fine, it is declared that the worth of the English classics is a negligible quantity--teachers, we are told, are "killing the love of literature by forcing on pupils too much Carlyle, Scott, Thackeray, and Dickens." As a further excuse for the substitutions suggested, it is pointed out that the great English poets and masters of literature did not write or speak in the vernacular of the present day....
These proposals imply a morbid abhorrence for the study of the accepted standards of beauty and expression and of form so trying to the patience that one is driven to ask whether it is not the teachers of English rather than the well of good English that has run dry. As the editor of an evening paper recently remarked, "To insist on diluting Shakespeare with Bernard Shaw does, indeed, indicate a certain futility of mental process which does not command respect." It may be pointed out that in regard to forms of speech the present usage of society as a whole--with its jargon and its conventionally imposed bad grammar and vicious syntax--is not more authoritative than the illiterate or obsolescent phrases of past generations.

Since then, of course, the "modernization" that critics deplore has acquired a new prefix post-, but otherwise the timbre of the keening hasn't much changed. Then as now, the well of English is perceived to have run dry (that is, if a well can be said to "run dry" in the first place) a generation earlier -- a constant since the age of Pope. Yet over the long run, it all sorts itself out, doesn't it? Shaw is comfortably canonical now, while O. Henry is a quaint curiosity (destined to be the respective fates, you have the feeling, of Toni Morrison and Rita Dove). Shakespeare and Dickens are still holding their ground, and Jane Austen is an industry. And if some people may feel a bit wistful about Scott, does anybody really miss Carlyle?

Posted by Geoff Nunberg at 03:54 PM

Date that quote

An exercise for the reader: date this quotation, on the basis of its content or form or both:

The present tendency in the teaching of English composition is all for power, for originality, for evidence of intellectual promise and capacity, for striking and vivid expression, -- in a word, for personality.

... There is a gap in the transition from school to college, and the reminders of grammar and good form are too often dismissed in the effort to obtain vigor and freshness of thought.

The general sentiment is a familiar one: critics complain that the teaching of writing has gone to hell in a handbasket because teachers emphasize creativity and the development of an individual "voice", meanwhile slighting grammar and mechanics. These days, such unfortunate trends are usually attributed to permissiveness (dating back to the '60s), the abandonment of traditional grammar (linguists are often identified as the villains here), and a decline in respect for authority.

But in the quotation above, the blame is laid specifically on college teachers. The passage assumes that grammar and good form are taught in the schools, but abandoned in college. Nobody would assume that today, when the critiques are of teaching at all levels.

So we're probably looking at a passage from some time ago, a conclusion supported by aspects of its form ("is all for", "too often dismissed", "obtain").

And so it is: it's from the preface to Manual of Good English by H. N. McCracken and Helen E. Sandison (then the president of Vassar College and an instructor in English there, respectively), published by Macmillan in 1917. The book was apparently a best-seller in the '20s.

Some things remain the same, some change. The "Words" section covers different from/than, fewer/less, oral/verbal, individual for person, unique 'rare, odd', "overworked" very, and many other familiar points of usage. On the other hand, McCracken and Sandison label as "colloquial" -- sometimes acceptably so, sometimes not -- many usages that wouldn't raise an editorial eyebrow these days, for instance the noun raise (in salary) for increase and the verb run (a business) for conduct or manage. And of course they don't discuss points that have become usage shibboleths since their day: notably, speaker-oriented hopefully and restrictive relativizer which vs. that. (M&S don't use hopefully, which didn't become common for decades after their manual, but they do use plenty of restrictive which, beginning on p. xix, which has three occurrences.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:03 PM

Considered harmful

Today's snowclone of the day arrived in email from Josh DeWald:

I'm a regular reader of Language Log and know that sometimes you guys track down sources of particular language use/memes. I've become curious about "X Considered Harmful". I think it originated in the 80s related to LISP or programing languages but that could be completely wrong (I'm a software engineer, so I'm biased).

The rise of "X considered harmful" as a phrasal template was sparked by Edsger Dijkstra's 1968 letter "Go To Statement Considered Harmful", published in the Communications of the ACM, Vol. 11, No. 3, March 1968, pp. 147-148. According to the Wikipedia article on "considered harmful" (you knew there had to be one, right?):

The original title of the letter, as submitted to ACM, was "A Case Against the Goto Statement," but ACM editor Niklaus Wirth changed the title to the now immortalized "Go To Statement Considered Harmful."

This view is supported by a quote from Dijkstra in his obituary in The Register:

"I had submitted a paper under the title 'A case against the goto statement', which in order to speed up its publication, the editor had changed into a 'Letter to the Editor', and in the process he had given it a new title of his own invention! The editor was Niklaus Wirth".

Under Wirth's title, Dijkstra's letter made a big impression, and the title's rhetorical impact was reinforced by a response and counter-response entitled "'GOTO Considered Harmful' Considered Harmful" and "'"GOTO Considered Harmful" Considered Harmful' Considered Harmful?".

However, "X considered harmful" was already a well-established journalistic cliche in 1968 -- which is why Wirth chose it. The illustration above shows the headline of a letter to the New York Times published August 12, 1949: "Rent Control Controversy / Enacting Now of Hasty Legislation Considered Harmful". I'm sure it's not the earliest example of this phrase used in a headline or title, either -- I chose it only as a convenient illustration of usage a couple of decades before the date of Dijkstra's paper.

Note that this example is also in the title of a slightly cranky letter to the editor - it's probably not an accident that the first example that came to hand of "considered harmful" in a pre-Dijkstra title was of this type.

The phrase "is considered harmful", in itself, is an ordinary combination of words, whose form is regular and whose meaning is the usual kind of function of the meanings of its parts, and was in common use a hundred years ago. Thus Mary Rankin Cranston, "Child Wage-Earners in England", The Craftsman XII (4) July 1907, p. 427

"Street trading is considered harmful for girls under sixteen, therefore is some places is prohibited, and in others carefully safeguarded, according to local dangers and customs."

In headlines and titles, "X considered harmful" was a way for an editor to alert readers that the writer is going to be expressing negative opinions about X. In these more informal days, the same function might be fulfilled in headlines by phrases like "seen as bad policy" or "viewed as bad thing".

Dijkstra's argument against the use of the goto statement was an interesting and subtle one:

[O]ur intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed. For that reason we should do (as wise programmers aware of our limitations) our utmost to shorten the conceptual gap between the static program and the dynamic process, to make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible.

[...]

The unbridled use of the go to statement has an immediate consequence that it becomes terribly hard to find a meaningful set of coordinates in which to describe the process progress. Usually, people take into account as well the values of some well chosen variables, but this is out of the question because it is relative to the progress that the meaning of these values is to be understood! With the go to statement one can, of course, still describe the progress uniquely by a counter counting the number of actions performed since program start (viz. a kind of normalized clock). The difficulty is that such a coordinate, although unique, is utterly unhelpful.

A much greater danger, however, is posed by the come from statement and its many practical approximations.

Posted by Mark Liberman at 06:33 AM

Is South America Spanish-Speaking?

In the article Spaniards Had Help in Conquering the Incas the New York Times recently stated that as a result of the conquest:

The great majority of people in South America speak Spanish today.

The Times subsequently added a correction:

A television review on Tuesday about "The Great Inca Rebellion," on PBS, misidentified the language spoken by a majority of South Americans. While Spanish is more widespread geographically, a small majority of the continent speaks Portuguese - not Spanish - because of Brazil's large population.

LL reader Jeff Taylor points out that even this correction is not accurate, depending on how one measures how widespread a language is. Let's begin with a list of the countries and territories in South America and some basic facts about them.

Country	Area (km²)	Population	Official Language
Argentina	2,766,890	39,537,943	Spanish
Bolivia	1,098,580	8,857,870	Spanish
Brazil	8,514,877	187,550,726	Portuguese
Chile	756,950	15,980,912	Spanish
Colombia	1,138,910	42,954,279	Spanish
Ecuador	283,560	13,363,593	Spanish
Falkland Islands	12,173	2,967	English
French Guiana	91,000	195,506	French
Guyana	214,970	765,283	English
Paraguay	406,750	6,347,884	Spanish
Peru	1,285,220	27,925,628	Spanish
Suriname	163,270	438,144	Dutch
Uruguay	176,220	3,415,920	Spanish
Venezuela	912,050	25,375,281	Spanish

If we ask how many people have a given language as the official language of their country, as the correction indicates, the dominant language in South America is, by a small margin, Portuguese.

Language	Population
Portuguese	187,550,726
Spanish	183,759,310
English	768,250
Dutch	438,144
French	195,506

When the Times says that Spanish is the most geographically widespread, they are evidently using as their measure the number of countries that have a given language as their official language:

Language	Number
Spanish	9
English	2
Portuguese	1
French	1
Dutch	1

However, Jeff Taylor points out that this is a bit misleading in that the usual measure of how widespread something is geographically is based on the area of which it is true. If we add up the areas of the countries in which the various languages are official, Spanish is still ahead of Portuguese, but only by a very small margin. The reason is that Brazil is very large, while some of the Spanish speaking countries are fairly small.

Language	Area (km²)
Spanish	8,825,130
Portuguese	8,514,877
English	227,143
Dutch	163,270
French	91,000

Of course, it need not be the case that everyone in a country speaks its official language. Virtually all South American countries have minorities who speak other languages and who may not speak the official language. In most cases these minorities are relatively small, but in Paraguay, they are not. Almost all Paraguayans speak Guarani, while only a minority speak Spanish. If we remove Paraguay as a Spanish-speaking country on the grounds that it is not Spanish but Guarani that should be considered the national language, the language spoken over the widest area becomes Portuguese:

Language	Area (km²)
Portuguese	8,514,877
Spanish	8,418,380
English	227,143
Dutch	163,270
French	91,000

It is true that if you randomly choose a South American country, the odds are it will be one whose official, and most widely spoken, language will be Spanish. Nonetheless, the total numbers of speakers of Spanish and Portuguese are actually about equal, as are the areas over which they are spoken.
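The tallies in this post can be re-derived with a few lines of Python. This is only a sketch: the figures are copied from the country table above, the names `COUNTRIES` and `totals` are made up for illustration, and "removing Paraguay" is modeled simply by leaving it out of the sums.

```python
from collections import defaultdict

# (country, area in km², population, official language), copied from the table above
COUNTRIES = [
    ("Argentina", 2_766_890, 39_537_943, "Spanish"),
    ("Bolivia", 1_098_580, 8_857_870, "Spanish"),
    ("Brazil", 8_514_877, 187_550_726, "Portuguese"),
    ("Chile", 756_950, 15_980_912, "Spanish"),
    ("Colombia", 1_138_910, 42_954_279, "Spanish"),
    ("Ecuador", 283_560, 13_363_593, "Spanish"),
    ("Falkland Islands", 12_173, 2_967, "English"),
    ("French Guiana", 91_000, 195_506, "French"),
    ("Guyana", 214_970, 765_283, "English"),
    ("Paraguay", 406_750, 6_347_884, "Spanish"),
    ("Peru", 1_285_220, 27_925_628, "Spanish"),
    ("Suriname", 163_270, 438_144, "Dutch"),
    ("Uruguay", 176_220, 3_415_920, "Spanish"),
    ("Venezuela", 912_050, 25_375_281, "Spanish"),
]


def totals(rows, exclude=()):
    """Sum area, population and country count per official language.

    Countries named in `exclude` are skipped entirely (an assumption made
    here to model the Paraguay adjustment).
    """
    area = defaultdict(int)
    population = defaultdict(int)
    count = defaultdict(int)
    for name, km2, people, language in rows:
        if name in exclude:
            continue
        area[language] += km2
        population[language] += people
        count[language] += 1
    return area, population, count


area, population, count = totals(COUNTRIES)
area_without_paraguay, _, _ = totals(COUNTRIES, exclude={"Paraguay"})

# Print one row per language, largest population first.
for language in sorted(population, key=population.get, reverse=True):
    print(
        f"{language:<11}{count[language]:>3}"
        f"{population[language]:>14,}{area[language]:>12,}"
        f"{area_without_paraguay[language]:>12,}"
    )
```

Each printed row gives the number of countries, the total population, the total area, and the area with Paraguay left out. Swapping in speaker counts instead of official-language populations would need data that the table above does not provide.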

Update: The European languages listed in the table above are not the only official languages. As I mentioned in a previous post, Aymara and "Quechua" (which is actually a small language family) are both official in both Peru and Bolivia. While this is true in legal terms, in both countries Spanish is the dominant language in terms of what language is actually used in the higher levels of the government and business, by major publications, and so forth. It isn't entirely clear how one might adjust the various calculations above to take into account multiple official languages. My impression is that any such adjustments will not affect the main point, that Spanish and Portuguese are roughly tied in terms of both number of speakers and geographical extent.

Posted by Bill Poser at 12:46 AM

July 02, 2007

Another Devanagari Rendering Problem

Wikipedia isn't the only organization to have trouble with the rendering of Devanagari. LL reader Partha Pratim Talukdar received the airline safety card shown below on a British Airways flight. The Hindi text in the antepenultimate line has /vi/ rendered incorrectly in the word विमान /vima:n/ "airplane".

Airline safety card with Erroneous rendering of Hindi /vi/


Indeed, that isn't the only problem with the Hindi on this card. The third word, /surakʂa:/ "safety", contains a cluster that is normally rendered by a ligature, that is, by an "idiomatic" combination. On the safety card, it is spelled out as a /k/ followed by an /ʂ/, with the default vowel following the /k/ suppressed by the diacritic known as a halant, the little thingie hanging off the lower right edge of the /k/. Even the halant is odd-looking - they are usually more-or-less straight, not curved like this.
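For readers who want to see what the two problem spots look like as stored text, here is a small Python sketch. It assumes the text is encoded in Unicode (nothing is known about how the card itself was typeset), and the names `describe`, `VIMAN` and `KSSA` are invented for the illustration. In Unicode the vowel sign of /vi/ is stored after its consonant even though it is drawn to the left of it, and the /kʂ/ cluster is stored as /k/, halant (called VIRAMA in the Unicode character names), /ʂ/; producing the ligature is left to the font and rendering software.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode character name) for each character of text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# /vima:n/ "airplane": VA, VOWEL SIGN I, MA, VOWEL SIGN AA, NA
VIMAN = "\u0935\u093f\u092e\u093e\u0928"

# The cluster in /surakʂa:/: KA, VIRAMA (halant), SSA
KSSA = "\u0915\u094d\u0937"

for label, text in (("vimaan", VIMAN), ("k+halant+ss", KSSA)):
    print(label)
    for code_point, name in describe(text):
        print(f"  {code_point}  {name}")
```

The stored sequence for the cluster is the same whether or not a ligature appears on screen or in print, which is why a halant-plus-two-letters rendering like the one on the card suggests that ligature formation did not happen somewhere in the typesetting chain.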

Posted by Bill Poser at 10:19 PM

The Universal Declaration of Human Rights in Unicode

The Universal Declaration of Human Rights, proclaimed by the General Assembly of the United Nations in 1948, has been available for some time in close to 300 languages, although often only in the form of images of handwritten or printed text, not as proper text files. The Unicode Consortium has recently adopted the UDHR as a demonstration project and is facilitating the translation of the UDHR into more languages and making it available as ordinary text in several formats, particularly XML. You can view the results of their efforts and tables showing the status of work for various languages at the UDHR in Unicode website.

Unfortunately, this is the easy part. Getting governments to abide by the Declaration is a much harder task.

Posted by Bill Poser at 09:59 PM

You better not call me a puppy

Part of the daily grind of your hard-working staff here at Language Log Plaza is to keep readers abreast of all kinds of language use, including the nasty, insulting expressions that we see and hear around us every day, like the flap in Northern Ireland about words like shite and gobshite, the analysis of the nappy headed ho controversy, or which presidential candidate Ann Coulter has (or hasn't) called a faggot lately. We locate them all for our readers.

Insults have been around for a long time but, like every other aspect of language, they have a way of changing dramatically. And so do their consequences. Today we can be thankful that we have defamation law to remedy such things, but it wasn't always that way. It's easy to forget that only a couple centuries ago, insulting someone often led directly to bloody duels. Folks felt pretty strongly about a code of honor back then -- personal honor, at least. [note to self: maybe it would be good to have some kind of code of honor today]

I've been reading up on dueling lately for various reasons, none of which, I assure you, have anything whatsoever to do with the fact that I still haven't received my key to the executive wash room that was promised me when I was hired on here. Let's just say that I've been reading up on the topic of dueling. In her book, Affairs of Honor (Yale University Press, 2002, xvi), Joanne B. Freeman reports:

A man of honor deserved respect, so signs of disrespect were dangerous. Certain slurs were off limits, tame as they are by modern standards. Rascal, scoundrel, liar, coward and puppy: these were fighting words, and anyone who hurled them at an opponent was risking his life.

I can understand why someone would be offended by being called a rascal, scoundrel, liar, or coward, but puppy puzzles me. Why would that be so insulting as to cause a duel? Aren't puppies cute little bundles of joy? Isn't puppy love sort of nice? Maybe, I thought, it's because puppies aren't housebroken. But enough of the speculation -- off to the book shelf.

The Oxford English Dictionary lists an alternative British meaning: "an unpleasant or arrogant young man," and a similar sense appears in the American Heritage Dictionary without any reference to British usage: "a conceited or inexperienced youth." Okay, maybe these are slurs but I still wonder why the word would lead to bloodshed. Unless there is a meaning of puppy that has changed drastically since about 1800 (tell me if you know), all I can think is that people had some pretty thin skin back then.

[Update] We have great readers, and the following from Thor Lawrence is proof. He sent me the 1811 Dictionary of the Vulgar Tongue, subtitle A Dictionary of Buckish Slang, University Wit, and Pickpocket Eloquence, unabridged from the original 1811 edition, compiled by Captain Grose, assisted by Hell-Fire Dick and James Gordon, Esqrs. of Cambridge and William Soames, Esq. of the Hon. Society of Newman's Hotel.

The preface explains that before a member of the Whip Club altered and enlarged Captain Grose's dictionary, its "circulation was confined almost exclusively to the lower orders of society: he was not aware, at the time of its compilation, that our young men of fashion would at no very distant period be as distinguished for the vulgarity of their jargon as the inhabitants of Newgate."

And yes, it has a brief entry for puppy:

PUPPY: An affected or conceited coxcomb.

I hope this clears things up.

Posted by Roger Shuy at 12:25 PM

Convergence of initiatives

The recent international meeting on Darfur may not have done anything to stop the genocide, but it did have one positive result from my selfish point of view: a lovely addition to my personal stock of empty but positive-sounding phrases (Francois Murphy and Arshad Mohammed, "Session on Darfur ends without action plan", Reuters, 6/26/2007):

Despite the absence of specific action from the meeting, a UN special envoy, Jan Eliasson, said it had been useful.

"There has been a long period now of sometimes competing initiatives. Now there was general agreement that we should have a convergence of initiatives," he told reporters.

I guess that this translates as something like "Working separately, we've done nothing but talk; now most of us are saying that we need to talk about how to do nothing but talk in a more unified way".

This is a pretty common situation, actually. Quite a few of the meetings I've attended in recent years could have ended their work with a similar mention of general agreement about the appropriateness of pursuing a convergence of initiatives.

I'd offer thanks to Mr. Eliasson, but a Google search finds 11,600 prior instances of {"convergence of initiatives"}, which is apparently a standard piece of diplospeak that I've previously failed to notice, despite its obvious value.

Posted by Mark Liberman at 09:00 AM

The Snowclone Database

How many snowclones are there? The Snowclones Database, set up recently by Erin O'Connor, has only a half a dozen listed so far, but she writes that

As of today, I have at least 30 more snowclones queued up to be posted. I am holding off on taking suggestions until I get through them, although comments are open as you can see. I intend to post at least one snowclone a week until I get through my queue. All posts should be searchable, so if you do have a suggested addition to the database, please do a search before sending me a suggestion!

Posted by Mark Liberman at 06:39 AM

July 01, 2007

We are stardust, we are golden, and we've got to get ourselves one of those gizmos

The consumer electronics marketers have had their Woodstock, and so, after their various fashions, have journalism, gamers, physicists, open-source, and evolution science. Capitalism has its every year at the Berkshire Hathaway shareholders meeting, and the members of the Tri-Cities (Kennewick, Pasco and Richland, Washington) Astronomy Club have theirs on the grounds of the Laser Interferometer Gravitational Wave Observatory in Hanford, WA. Still, it's a rare event that can lay claim to being the Woodstock for an entire generation:

Earlier Friday afternoon, it was a techie Woodstock outside the Burlingame store, complete with lawn chairs, laptops, "smart" water vendors and an overwhelming sense of camaraderie as about 150 people waited in line outside the Burlingame Avenue store. Oakland Tribune

It appeared that the true believers were gathering in Palo Alto, where the Apple Store became a destination site for techno-pilgrimages. Atkinson, who left Apple in 1990, said he just dropped by Thursday night to check out the line and ended up sending his daughter home for a sleeping bag. "You know, I missed Woodstock," Atkinson said. "But I wanted to be a part of this." San Francisco Chronicle

Is experiential retail the new Woodstock? Are tech writers the new rock stars? "Inside Chatter"

Just another of the wearily ironic snowclones that journalists keep pulling out of their lunchpails, I'd have said.

But when the going gets tough, ironists aren't conspicuous among the people who are willing to spend 48 hours sitting in a lawn chair on the sidewalk waiting to buy a new telephone:

Camping out last night at the Palo Alto Apple store was not about an iPhone. It was about an experience. Something that I value far more than my new iPhone. There were many highlights for me. Listening to Kristopher react to having traded nods with Steve Jobs. When we left the Apple store we crossed paths with Jobs. I'm embarrassed to say that I didn't even notice. Kristopher did though and he traded nods and a wink with Steve Jobs. Steve Jobs has been a long-time hero of Kristopher's. It was an intensely powerful emotional thing for him. One of the best days of his life I'm sure.

True, the total number of people who were waiting in line at Apple stores across the country was probably about 498,000 short of the half a million that legend ascribes to Woodstock. But within a generation it's a fair bet that the ranks of those who claim to have been there will exceed the 12 million boomers who claim to have been at Woodstock, or even the 17 million or so who say they were present in Hershey, Pennsylvania on March 2, 1962, when Wilt Chamberlain scored 100 points against the Knicks. Well, okay, not there actually, but "there." As in, "Were you really in line at the i-phone launch, daddy?" "Well, no, son, but I eventstreamed it."

Posted by Geoff Nunberg at 02:47 PM

Creationist Linguistics

This just in: the new Creation Museum in Petersburg, Kentucky, claims that language families are a recent phenomenon, and cites linguists as authorities for this claim. Its display shows language families as rays radiating from a sun labeled BABEL, a reference to the Babel story of Genesis 11.

Here's the entire text of the display item:

The Bible claims that God created a number of human languages at the Tower of Babel "according to their families". Nineteenth-century linguists argued that languages evolved slowly, one by one. Today, linguists recognize languages fall into distinct "families" of recent origin.

This text is one of several on a single display board. The other items also contrast nineteenth-century science with purported modern science -- for instance, "The Bible claims that God destroyed the earth in a worldwide Flood. Nineteenth-century geologists argued that rocks were formed slowly. Today geology confirms that many rock layers were deposited catastrophically."

The Babel story seems straightforward in its implications, from verse 1 (in the King James version), "And the whole earth was of one language, and of one speech", to verse 7, "Go to, let us go down, and there confound their language, that they may not understand one another's speech". If you believe the Creationists' Young Earth claim, also derived from Genesis, this is the Biblical authors' way of accounting for the fact that there are thousands of languages on earth today, rather than just one, or at least no more than the languages we could reasonably expect to develop over a period of only four or five thousand years.

Actually, the Bible is rather confusing on the question of when human languages diversified, at Babel or a bit earlier. The quotation "according to their families" in the Creation Museum's text comes from Genesis 10, verse 20 (not from the King James version, but with the same meaning): "They were the sons of Ham, according to their families, according to their tongues, in their lands, in their nations" (and similar text in verse 5). I am not a Biblical scholar, and I don't read either Hebrew or Aramaic, but it seems that the apparent contradiction between Genesis 10, which clearly posits numerous languages and even language families already in existence, and the (supposedly?) later events at Babel in Genesis 11, has bothered at least some devout interpreters (see here, for instance). In any case, the time depth available for the diversification of a single original human language, under a literal interpretation of Genesis, would be no more than a few thousand years.

So who are these modern linguists who "recognize" that languages "fall into distinct `families' of recent origin"? The Creation Museum doesn't say, of course. I can think of three possibilities. First, the recognizing linguists could be mythical, invented by the Creation Museum. Second, they could in fact be linguists, even with Ph.D.s, who are completely innocent of any understanding of historical linguistics. I'd like to think that there are no such linguists; but many linguistics departments nowadays have no historical linguists on the faculty, and some of them don't send their students to anthropology departments or language departments where a few historical linguists might still lurk, so many people with Ph.D.s in linguistics have never been exposed systematically to the study of language change. Even so, they're unlikely to be ignorant enough to subscribe to the Creation Museum's text, so this is the least likely of the three possible sources of that text.

The third possibility is the most interesting one: whoever is responsible for the text on that Creation Museum display might actually have read about recent controversies on establishing language families, and they might have misinterpreted the claim that after enough time has passed (10,000 years is commonly mentioned, as the roughest of rough estimates) it is likely to be impossible to support a hypothesis of relatedness among languages. Among historical linguists and in the popular press, the controversy has focused primarily on the claims of the late Joseph Greenberg and his follower Merritt Ruhlen about much more ancient language families, extending perhaps even to what Greenberg once suggested as "Proto-Sapiens" (see e.g. here for a Language Log post on the subject). Historical linguists' skepticism about such claims stems from the fact that, after some thousands of years have passed, it is likely that too little systematic evidence -- in the form of corresponding sound/meaning pairs of words or other morphemes -- will remain, and without such evidence no hypothesis of relatedness can be tested. We see the decay of the crucial evidence in all well-established language families, and it is certain that more time will mean more decay. A few years ago I posted comments on changing pronoun systems, showing among other things that the words for `I' in three Indo-European languages, though ultimately related, have changed so much in 4,000 (Latin) and 6,000 (Russian, English) years that their connection is no longer recognizable. This sort of example can easily be multiplied, for any language family.

But if language families can't be established beyond a few thousand years, does that mean that all language families arose within the past few thousand years? No, of course not, and that's where the Creation Museum's creators might have misinterpreted the linguists: no one, but no one, believes that an inability to find adequate evidence to support a hypothesis of distant linguistic relationships translates to the non-existence of distant linguistic relationships, including very ancient language families. It is certain that many modern language families are subgroups of more ancient families, but that their historical links are beyond the reach of the well-tested and validated methodologies. It is even quite possible that all human languages arose from a single ancestor. If they did, that ancestor must have existed many thousands of years ago. Twenty-first-century linguists, like nineteenth-century linguists, believe that languages diversify slowly -- that a language family arises when two (or more) subgroups of a single speech community become partly or entirely separated and then, because language change is unpredictable, their dialects inevitably change in different ways, until they have split into separate languages. Depending on external factors such as relative isolation from each other and contacts with unrelated languages, the process of language split might take 500 to 1,000 years. That is, it is gradual. And then you have a language family: a parent language, no longer spoken, and two (or more) daughter languages, split from their common parent.

Regardless of where the Creation Museum got its "information" about a contrast between older linguistics and current linguistics on this subject, the text on language families in their display is completely bogus. It fits well with their other "scientific" claims.

[Thanks to David Brumble, via Dan Everett, for the photograph of the Creation Museum's language-family display item.]

Posted by Sally Thomason at 12:20 PM

Colorblindness on the U.S. Supreme Court

There's been a lot of controversy over the Supreme Court's recent decision in Parents Involved in Community Schools v. Seattle School District No. 1 (see Linda Greenhouse, "Justices Limit the Use of Race in School Plans for Integration", NYT, 6/29/2007).

Deepak Chopra is among those who have commented on the associated battle for rhetorical possession of terms such as "colorblind" ("The Cruelty of Semantics", The Huffington Post, 6/29/2007):

... the conservative movement has a disgraceful track record for covering up cruel intentions with soothing semantics. "Compassionate conservatism" lulled the American electorate into accepting the most far-right presidency in history. "Enemy combatant" has deprived hundreds of prisoners at Guantanamo of basic protections mandated under the Geneva Conventions and opened the door for torture. "Family values" covers up hatred of gays and denial of social tolerance. Now to this legacy we are adding "colorblind" as a disguise for racial neglect.

In my opinion, the most interesting aspect of Chopra's commentary was a turn of phrase in its ending:

Despite the overwhelming public support for school integration in both Seattle and Louisville, five powerful white males were enough to squash a society's better nature. A pall hangs over the court for what they did, to the English language as much as to fair play.

The five "powerful white males" in question? Chief Justice John Roberts, along with Associate Justices Antonin Scalia, Samuel Alito Jr., Anthony Kennedy -- and Clarence Thomas.

Whatever you think about this decision (or about who should get to use "colorblind" to mean what), some things have certainly changed since 1954. And I don't mainly mean that an African-American justice on the Supreme Court was then outside the pale of political possibility. I mean that no commentator of any political persuasion would then have unthinkingly included an African-American justice in a group described as "five powerful white males".

It's ironic that Chopra did this in the context of an argument about reality vs. rhetoric. Perhaps there's some sort of corollary to the Hartman/McKean/Skitt Law that applies to such discussions.

Posted by Mark Liberman at 08:12 AM

3.	A:	Given that we are almost done with the paper, we should celebrate.
	B:	I am not almost done with my paper --
			i. I am done!
			ii. I'm still working on the results section.

8.	Il candidato non ha preso lontanamente 500 voti.
	The candidate didn't get farly 500 votes.

9.a	*Lontanamente la miglior pizza della città
	By far the best pizza in town
9.b	Non è lontanamente la miglior pizza della città
	It is not nearly the best pizza in town

10.	Il candidato non ha preso neanche lontanamente 500 voti.
	The candidate didn't get even by far 500 votes.

[i]		Direct the destination and route used;
[ii]		Travel to any destination in the five boroughs of the City of New York;
[iii]		A courteous, English-speaking driver who knows the streets in Manhattan and the way to major destinations in other boroughs;
[iv]		A driver who knows and obeys all traffic laws;
[v]		Air-conditioning on demand;
[vi]		A radio-free (silent) trip;
[vii]		Smoke and incense-free air;
[viii]		A clean passenger seat area;
[ix]		A clean trunk;
[x]		A driver who uses the horn only when necessary to warn of danger; and
[xi]		Refuse to tip, if the above are not complied with.

[i*]		To direct the destination and route used;
[ii*]		To travel to any destination in the five boroughs of the City of New York;
[iii*]		To a courteous, English-speaking driver who knows the streets in Manhattan and the way to major destinations in other boroughs;
[iv*]		To a driver who knows and obeys all traffic laws;
		[etc.]
