Language Log: July 2004 Archives

July 31, 2004

Chattels personal, personal chattels, and (fresh) fish

Margaret Marks once again demonstrates that my ignorance of legal terminology is even more profound than I had imagined, by exploring the distinction between chattels personal and personal chattels.

I await with interest her judgement on the point raised by this exchange which seems to involve a novel instance of the sorites paradox:

Mr. Alexis FitzGerald: ... There is another point I should like to raise. It is, perhaps, a question I should be able to answer myself. Do we have a definition of personal chattels which entirely, completely and satisfactorily excludes what is intended to be free from the particular transaction which is to be avoided? I do not find anything in the Principal Auctioneers Act which this Bill is amending. While in that Act there is no definition of a personal chattel there is a reference to some particular type of goods which are excluded from the requirement of a licence. If an auction of fresh fish, for example, is not to be the subject of [419] the requirement of the provisions of subsection (1) of section 6, has it to be there spelled out that, as such, it is not a personal chattel? At what stage does it become a personal chattel? When it reaches your dish or when it has been extracted from the sea——

Mr. Cooney: When it is no longer fresh.

If I understand Dr. Marks' reasoning, fish (once caught) are always chattels personal (since they are not real), but may or may not be personal chattels (since they may or may not be things like "toilet articles, bags, umbrella"). Is the crucial question then whether a fish becomes like a toothbrush by virtue of being portable, or by virtue of being inedible? I look forward to the day that Margaret's readers in the Irish legal profession come back on line and clear this up. Or at least provide further elaboration, in the ancient Irish tradition of exact but somewhat puzzling regulations concerning fish:

"For digging in a churchyard to steal from it, for making a dam in a stream to take an excess of fish, or for stealing a hunter's tent, your cattle will be taken to the animal pound for three to ten days, depending on the circumstances."

For some further discussion of "legal regimes based on sharp but unpredictable distinctions among similar objects", see the quote from Eben Moglen in this Language Log post from last fall.

[Update: the Irish are still offline, apparently, but Margaret Marks created, posted and explained this diagram of the ontology of property in English law (diagram not reproduced here).]

Posted by Mark Liberman at 11:31 AM

The status quo just can't stand

In my last post, I would have liked to have given you all a link to the full text of the article by Just et al. from the August edition of Brain, entitled "Cortical activation and synchronization during sentence comprehension in high-functioning autism: evidence of underconnectivity". But instead I just linked to the abstract, because unless you have a subscription to Brain, the full-text link would have taken you to an Oxford University Press screen informing you that "You may access this article (from the computer you are currently using) for 24 hours for US$27.50", and I suspect that only a few of you care enough about the topic to pay that much for that little.

If you really want to read the article, and don't have a subscription, you might be able to find a public library in a big city that subscribes. Otherwise, you'll have to pay, or wait a while.

Brain is published by Oxford University Press, and like many other publishers, OUP is dipping its toes in the water of Open Access. According to this item at Peter Suber's excellent Open Access News, there was a news note in the July 27 Library Journal to the effect that

"After positive initial results from Oxford University Press's open access 'experiment' with Nucleic Acids Research (NAR), the press announced it will move to a full open access publishing model from January 2005. It has been published under a subscription model for 32 years and includes around 1000 original research papers per year; OUP said NAR was 'the first journal of such stature to make a complete switch from a subscription to OA model.' Said Martin Richardson, managing director of Oxford Journals, 'To fulfill our role as a university press we felt a responsibility to the scholarly communities we represent to explore it as a viable publishing model.' Rachel Goode, communications manager, noted that there is a huge correlation between institutions that subscribe to NAR and authors who contribute to it, making the journal a particularly good candidate for open access."

Sounds promising. But there's one more sentence in the quote from Rachel Goode:

"'I don't think the market is ready beyond certain subject areas,' she said."

Oof. But before you get too depressed about this July 27 quote, here's a July 28 quote from another Open Access News posting. The speaker is the director of the National Institutes of Health, and the quotation is taken from an article in The Scientist:

"National Institutes of Health (NIH) director Elias Zerhouni indicated at a gathering of 43 scientific journal publishers and editors Wednesday (July 28) that eventually all NIH-financed research will be freely available to the public. Zerhouni stopped short of setting deadlines for depositing full-text materials in the searchable PubMed database, as recommended in a House Appropriations Committee report released earlier this month. Instead, he asked the publishing executives to inform him how best to manage material so that the public can freely use it. 'The public needs to have access to what they've paid for,' Zerhouni told commercial and nonprofit publishing executives at a meeting he called on the NIH campus....'The status quo just can't stand.' " [emphasis added]

There are significant economic issues here, both theoretical and practical, but he's right.

The Just et al. paper that I wish you could read is directly in Zerhouni's cross hairs. The acknowledgement at the end explains that "This research was supported by the University of Pittsburgh-Carnegie Mellon University-University of Illinois at Chicago Collaborative Program of Excellence in Autism (CPEA), Grant P01-HD35469 from the National Institute of Child Health and Human Development." And NICHD in turn "is part of the National Institutes of Health, the biomedical research arm of the U.S. Department of Health and Human Services".

Posted by Mark Liberman at 09:37 AM

Autism as lack of neurological coordination

In the August issue of Brain, there's an article by Marcel Just, Vladimir Cherkassky, Timothy Keller and Nancy Minshew, presenting evidence from functional MRI brain imaging for a new hypothesis about autism. They suggest that autism is not mindblindness due to a faulty theory-of-mind module, nor is it runaway maleness overwhelming empathy with analysis. Instead, it's underconnectivity: "a deficiency in the coordination among brain areas".

According to the CMU press release:

In explaining the theory, Marcel Just, one of the study's lead authors and director of Carnegie Mellon's Center for Cognitive Brain Imaging, compared the brain of a normal person to a sports team in which the members cooperate and coordinate their efforts. In an autistic person, though some "players" may be highly skilled, they do not work effectively as a team, thus impairing an autistic's ability to complete broad intellectual tasks. Because this type of coordination is critical to complex thinking and social interaction, a wide range of behaviors are affected in autism.

Here's the full abstract:

The brain activation of a group of high-functioning autistic participants was measured using functional MRI during sentence comprehension and the results compared with those of a Verbal IQ-matched control group. The groups differed in the distribution of activation in two of the key language areas. The autism group produced reliably more activation than the control group in Wernicke's (left laterosuperior temporal) area and reliably less activation than the control group in Broca's (left inferior frontal gyrus) area. Furthermore, the functional connectivity, i.e. the degree of synchronization or correlation of the time series of the activation, between the various participating cortical areas was consistently lower for the autistic than the control participants. These findings suggest that the neural basis of disordered language in autism entails a lower degree of information integration and synchronization across the large-scale cortical network for language processing. The article presents a theoretical account of the findings, related to neurobiological foundations of underconnectivity in autism.

Although the findings deal only with language, and in fact only with specific aspects of sentence comprehension, the paper's discussion extends the hypothesis to much broader ideas about autism as characterized by, or even caused by, lack of adequate integration among different brain areas.

This is a plausible and interesting idea, for which the authors cite a range of other evidence. But the contribution of this particular experiment should be interpreted a bit more cautiously, it seems to me. The task studied is a very specific and limited one: visually presented text with binary choices between interpretations:

The cook thanked the father.
Who was thanked? cook -- father

The editor was saved by the secretary.
Who was saving? editor -- secretary.

It's certainly interesting that there's a significant difference between the autistic and the control groups in the distribution of brain activity in performing this simple task. The autistic group showed more activity in Wernicke's area, and less activity in Broca's area. Putting it more generally, the autistics showed more activity in posterior language-related parts of the brain, and less activity in anterior language-related parts of the brain. This suggests greater focus on the words and their meanings, and less focus on how the words were put together. It seems to me to raise a host of interesting questions -- were the autistics just trying harder to figure out who did what to whom on the basis of meaning rather than form, as the authors suggest? or were they composing elaborate taxonomic theories about the entity classes involved? "Let's see, cook::father, secretary::editor, teacher::??" or might they have more efficient Broca's areas, which therefore were working at a lower duty cycle, needed less blood flow, and therefore looked less active to fMRI?

But I'm more concerned about the argument for lack of coordination, which depends on finding lower correlations among activity levels in different brain "regions of interest" (RoIs). fMRI measurements of activity levels are very noisy. A lower correlation in activity levels between two regions might reflect the fact that they are genuinely less coordinated, but it might also reflect the fact that the measurement of one of them has a lower signal-to-noise level. Which it would, given that its task-related activity level is lower.
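The signal-to-noise worry can be illustrated with a small simulation. This is only a sketch under invented assumptions, not a model of the actual fMRI data: two "regions" follow exactly the same underlying time course, each measurement gets independent noise of the same size, and the only thing that varies is how strongly the second region responds. The function names, amplitudes, and noise level are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def observed_correlation(amplitude_b, noise_sd=1.0, n=5000, seed=0):
    """Correlation between two noisy measurements of one shared signal.

    Region A responds with amplitude 1, region B with `amplitude_b`.
    Both are perfectly coupled to the shared signal; only the measurement
    noise (assumed independent and Gaussian) separates them.
    """
    rng = random.Random(seed)
    shared = [rng.gauss(0, 1) for _ in range(n)]
    a = [s + rng.gauss(0, noise_sd) for s in shared]
    b = [amplitude_b * s + rng.gauss(0, noise_sd) for s in shared]
    return pearson(a, b)


# Same coupling in both cases; only region B's activity level differs.
strong = observed_correlation(amplitude_b=1.0)
weak = observed_correlation(amplitude_b=0.3)
print(f"B as active as A: r = {strong:.2f}")
print(f"B less active:    r = {weak:.2f}")
```

Under these assumptions the expected correlation is amplitude_b / sqrt(2 * (amplitude_b**2 + 1)), so it falls from about 0.5 to about 0.2 even though the two regions are equally "coordinated" in both runs. That is the confound in question: a lower r can come from a weaker signal alone.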

So in evaluating this particular argument, I'd like to see the full dataset. There are some convincing-looking pictures

Fig. 2 Examples of functional connectivity between LDLPFC and LIFG (Broca's area) in individual participants, shown as the activation time series in the two brain regions, with vertical bars indicating boundaries between seven epochs of sentences of the same type. (A) Autistic participant with low functional connectivity, r = 0.31, where the two time series do not closely track each other. (B) Control participant with high functional connectivity, r = 0.79, where the activation time series in the two regions is highly similar.

and some significant (though not overwhelming) overall statistics ("When the functional connectivities of the two groups were compared in each ROI pair separately, every single one of the 10 reliable (P < 0.05) differences (out of 186 comparisons) showed a lower functional connectivity in the autistic group ... Although about nine differences might be expected to be reliable by chance, the uniform direction of difference is not expected by chance.") But I'd like to be able to play with the original data, to convince myself that they're not seeing the results of SNR differences rather than coordination differences.
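The arithmetic behind the quoted statistics is easy to check. The sketch below assumes, purely for illustration, that the 186 comparisons are independent and that a chance difference is equally likely to go in either direction. Neither assumption is stated in the paper as quoted here, and ROI pairs that share a region are unlikely to be fully independent.

```python
n_comparisons = 186   # ROI pairs compared, from the quoted passage
alpha = 0.05          # significance threshold, from the quoted passage
n_reliable = 10       # differences reported as reliable

# Expected number of "reliable" differences if nothing is going on at all.
expected_by_chance = n_comparisons * alpha

# Chance that all 10 point the same way, assuming independent 50/50 directions.
p_all_lower = 0.5 ** n_reliable            # all lower in the autistic group
p_all_same_direction = 2 * p_all_lower     # all lower, or all higher

print(f"expected by chance:      {expected_by_chance:.1f}")
print(f"P(all 10 lower):         {p_all_lower:.4f}")
print(f"P(all 10 one direction): {p_all_same_direction:.4f}")
```

So the count of reliable differences (10) is close to what chance alone would give (about 9.3), and the argument rests entirely on the uniform direction. That uniformity is improbable under the independence assumption, but a systematic SNR difference between the groups would produce it just as well as a coordination difference would.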

Let me emphasize that I find the coordination hypothesis interesting and attractive. On one level, it seems directly opposed to the mindblindness or "theory of mind module" hypothesis, due originally to Frith et al.:

"...in a normally developing child, the computational capacity to represent mental states has an innate neurological basis. In the autistic child damage to the circumscribed system of the brain has occurred, and this prevents the normal operation of the critical cognitive mechanism"

If you subtract the idea that "theory of mind" is a highly localized function, the two ideas are less opposed -- "theory of mind" is a late-developing ability that plausibly depends on difficult coordination of several different brain regions, and (to the extent that a single brain region is singled out) shows heavy involvement of anterior regions near those that were less active in the autistic group in this experiment (which of course required no "theory of mind" reasoning at all).

The coordination idea seems less easy to square with Simon Baron-Cohen's notion of autism as "extreme maleness", runaway male analytic thought (with concomitant deficits in female-associated empathizing, natch). Though you could spin out a theory about female brains being more integrated, etc. -- and of course some people have done that...

So maybe Marcel Just and Simon Baron-Cohen are working from different ends of the autistic elephant. Or maybe one of them is struggling in the dark with a palm tree that happened to be nearby. Stay tuned.

[The Just et al. paper was brought to my attention by Fernando Pereira]

Posted by Mark Liberman at 08:07 AM

couldan't, shouldan't, wouldan't

I followed Mark's link to Nathan's Notebook and found the following interesting clipping from a text transcript of Jon Stewart's appearance on Larry King Live on June 25.

Stewart has just mentioned that he is "not a pacifist" -- "As a matter of fact, I like bombing countries." Larry King is surprised, and Stewart clarifies:

Well, just purely for the knowledge of geography. It's just fascinating to learn about these countries. ... I didn't know Kabul was the capital of Afghanistan until we started bombing it. ... If we would haven't gone to war there, I certainly wouldn't have known that.

Would haven't? Totally ungrammatical, I think, and take comfort in the fact that at the top of the transcript page it says plainly:

THIS IS A RUSH TRANSCRIPT. THIS COPY MAY NOT BE IN ITS FINAL FORM AND MAY BE UPDATED.

I've never heard anyone actually utter one of these types of examples; if I didn't have a paper to get back to, maybe I'd hunt down some audio/video of the Jon Stewart appearance on Larry King to hear what it sounds like. It really is so strikingly ungrammatical to me that I can't even clearly imagine how it would sound.

But then I Googled {"would haven't"} and got 5,930 hits. (Of course, Google helpfully asks Did you mean: "would have", which would have gotten me 15,300,000 hits.) Here are some more interesting results (taking reductions of HAVE into account, which as we all know are often spelled in various ways):

| *modal-HAVE-n't* | Hits | *modal-HAVE-not* | Hits | modal-n't-HAVE | Hits | modal-not-HAVE | Hits |
| --- | ---: | --- | ---: | --- | ---: | --- | ---: |
| would haven't, would ofn't, would'ven't, wouldan't | 5,930 | would have not, would of not, would've not, woulda not | 71,900 | wouldn't have, wouldn't of, wouldn't've, wouldn'ta | 1,690,000 | would not have, would not of, would not've, would nota | 2,960,000 |
| could haven't, could ofn't, could'ven't, couldan't | 5,980 | could have not, could of not, could've not, coulda not | 15,400 | couldn't have, couldn't of, couldn't've, couldn'ta | 973,000 | could not have, could not of, could not've, could nota | 1,520,000 |
| should haven't, should ofn't, should'ven't, shouldan't | 5,920 | should have not, should of not, should've not, shoulda not | 20,700 | shouldn't have, shouldn't of, shouldn't've, shouldn'ta | 843,000 | should not have, should not of, should not've, should nota | 1,270,000 |
| TOTALS | 17,907 | | 111,468 | | 3,655,900 | | 4,647,310 |

Although I find all of the modal-HAVE-n't examples in the left-hand column ungrammatical, I'll assume that these are not errors of some sort and that some speakers find them grammatical. But I have vague memories of reading somewhere (probably something by (Pullum &) Zwicky?) that n't can only be ~~enclitic to the highest verb in a verbal projection (in this case, the modal)~~ affixed to finite auxiliaries (including modals) [as Zwicky & Pullum (1983:507) point out; see update below]. So what gives? Here's my hypothesis:

(1) Some speakers have reanalyzed modal + reduced HAVE as a single ~~verb~~ finite auxiliary/modal.
(2) The modal-HAVE-n't examples written with unreduced HAVE are not really pronounced with [hæv] -- they're pronounced the reduced way, with [ǝv] or just [ǝ].
(3) Speakers who find the modal-HAVE-n't forms grammatical are among those who have reanalyzed modal + reduced HAVE as a single verb, which is what allows n't to be ~~enclitic~~ affixed in this context.

I wouldn't be surprised to find that there is work out there somewhere showing that (1) is true, or at least plausible. (Remember, I'm a phonologist, I don't read much of this stuff anymore. For all I know, the whole topic I'm talking about here has already been addressed somewhere.)

The empirical claim in (2) needs verification, and in fact one might surmise that the separated Google results directly contradict it:

| *modal-HAVE-n't* (unreduced HAVE) | Hits | *modal-HAVE-n't* (reduced HAVE) | Hits |
| --- | ---: | --- | ---: |
| would haven't | 5,930 | would ofn't, would'ven't, wouldan't | 29 |
| could haven't | 5,980 | could ofn't, could'ven't, couldan't | 20 |
| should haven't | 5,920 | should ofn't, should'ven't, shouldan't | 28 |
| TOTALS | 17,830 | | 77 |
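For anyone who wants to check the totals and the ratio between the two columns, here is a minimal sketch using the hit counts from the table. The variable names are mine, and the counts are the 2004 Google figures reported above, not anything that can be re-queried today.

```python
# Hit counts copied from the table (Google, as reported in 2004).
unreduced_hits = {
    "would haven't": 5930,
    "could haven't": 5980,
    "should haven't": 5920,
}
# Each figure pools the three reduced spellings (ofn't, 'ven't, -an't).
reduced_hits = {
    "would": 29,
    "could": 20,
    "should": 28,
}

total_unreduced = sum(unreduced_hits.values())
total_reduced = sum(reduced_hits.values())
ratio = total_unreduced / total_reduced

print(f"unreduced: {total_unreduced}, reduced: {total_reduced}")
print(f"unreduced per reduced: {ratio:.1f}")
```

The ratio comes out a little under 232 to 1.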

On average, then, I found about 232 instances of unreduced HAVE for every one instance of reduced HAVE (all three spellings combined) in the relevant examples. That's pretty striking. But here's what I think: a person writing down one of these examples really doesn't have much choice. Consider the options. Even if this person often writes e.g. woulda or would of, ~~adding enclitic~~ affixing n't to one of these is extremely odd. ~~Enclitic~~ The affix n't doesn't really fit well on would've either, because the result is a form with two apostrophes. So the person is left with would haven't -- not perfectly consistent with the reduced pronunciation, but the best orthographic alternative under the circumstances.

I realize this isn't quite sufficient evidence to reach the conclusion in (3), but that's my story and I'm sticking to it until I hear a better alternative. (Arnold? Geoff? Bueller?)

Interestingly, the single hit for could'ven't that I came across is an example sentence in a handout from a talk given by David Lightfoot. The sentence is starred:

34. a. Kim visited NY and Jim could've _VPe.

b. Kim visited NY and Jim couldn't _VPe.

c. Kim visited NY and Jim couldn't've _VPe.

d. I'd've visited NY.

e. *Jim could'ven't seen it.

The reason? Apparently:

32. E. Syntactic rules can affect affixed words, but cannot affect clitic groups.

F. Clitics can attach to material already containing clitics, but affixes cannot.

(As far as I can tell, there are no principles A-D anywhere in the handout. Maybe this is an example of auto-numbering in Microsoft Word gone haywire.)

Update, 8/1/04: Nope, just a case of citation without full representation in the body of the handout. Arnold Zwicky writes to tell me that (32E,F) are the last of six criteria distinguishing (inflectional) affixes from clitics in Zwicky & Pullum (1983), "Cliticization vs. inflection: English n't", Language 59.3, pp. 502-513 (cited in Lightfoot's references).

My defense is in three parts:

It was late. I was tired. Really.
It didn't occur to me to look in the references, since the only citations in the body of Lightfoot's handout are (a) Lasnik & Saito 1984, (b) Aoun, Hornstein, Lightfoot & Weinberg 1987, (c) "Syntactic structures (1957)", (d) Gibson & Wexler 1994, (e) Dresher 1999, and (f) Clark 1992.
I at least remembered having read something by Zwicky (& Pullum) about this subject.

Now that Arnold has called me out, I am re-reading Zwicky & Pullum 1983. If you care to join me, you can download a smaller (low-res) file here or a larger (high-res) file here (courtesy of JSTOR). I'm also very much looking forward to Arnold posting a reply to this post.

Assuming that n't is an affix but that 've and 'd are clitics [as shown by Zwicky & Pullum 1983; see update above], the contrast between (34c,d) and (34e) follows from (32F).

(32E) is necessary to explain the following contrast. (33a) is grammatical because couldn't is an affixed word and thus licensed, by (32E), to invert with the subject Kim. (33b) is ungrammatical because could've is a clitic group and so is not licensed to invert.

33. a. Couldn't Kim see that?

b. *Could've Kim seen that?
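The way (32E) and (32F) sort these examples can be written out as a toy checker. This is a sketch of my own, not Zwicky & Pullum's or Lightfoot's formalism: the function names and the list encoding are invented, and the affix/clitic classification is simply the one assumed above (n't is an affix, 've and 'd are clitics).

```python
# Assumed classification, following the discussion above.
ATTACHMENT_KIND = {"n't": "affix", "'ve": "clitic", "'d": "clitic"}


def satisfies_32f(attachments):
    """(32F): clitics may attach outside clitics, but affixes may not.

    `attachments` lists what is added to the host, innermost first.
    The form fails if an affix is added after any clitic.
    """
    seen_clitic = False
    for item in attachments:
        kind = ATTACHMENT_KIND[item]
        if kind == "affix" and seen_clitic:
            return False
        if kind == "clitic":
            seen_clitic = True
    return True


def can_invert_32e(attachments):
    """(32E): syntactic rules such as inversion apply to affixed words only.

    Any clitic turns the form into a clitic group, which cannot invert.
    """
    return all(ATTACHMENT_KIND[item] == "affix" for item in attachments)


examples = {
    "couldn't've (34c)": ["n't", "'ve"],
    "I'd've (34d)": ["'d", "'ve"],
    "could'ven't (34e)": ["'ve", "n't"],
}
for form, parts in examples.items():
    print(form, "ok" if satisfies_32f(parts) else "ruled out by (32F)")

print("Couldn't Kim ...? (33a)", can_invert_32e(["n't"]))
print("Could've Kim ...? (33b)", can_invert_32e(["'ve"]))
```

On the reanalysis hypothesis, a speaker who treats could've as a single auxiliary would in effect have no clitic in the list at all, so both checks would pass for could'ven't and for inverted could've. That is the prediction about (33b) raised in the questions below.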

So, now I wonder a few things.

What would Lightfoot [rather, Zwicky & Pullum; see update above] say about the grammaticality for some speakers of modal-HAVE-n't examples like (34e)?
How do speakers who find the modal-HAVE-n't examples grammatical judge the form in (33b)? (My hypothesis predicts that (33b) should be grammatical for them.)
What am I still doing awake?

Good night.


Posted by Eric Bakovic at 02:58 AM

July 30, 2004

Mixed action of ejectment

The OED has this to say about the origin in legal history of the standard names John Doe and Richard Roe:

John Doe, (a) (Eng. and U.S. Law), the name given to the fictitious lessee of the plaintiff, in the (now obsolete) mixed action of ejectment, the fictitious defendant being called Richard Roe; (b) name given to an ordinary or typical citizen (see also quot. 1942)

The earliest citation given is 1768:

1768 BLACKSTONE Comm. III. xviii. 274 The security here spoken of..is at present become a mere form: and *John Doe and Richard Roe are always returned as the standing pledges for this purpose.

Further enlightenment, including the distinction between personal chattels and chattels real as applied to fresh and unfresh fish, may or may not be available here, in a fascinating discussion which I have not had time to read.

Posted by Mark Liberman at 01:53 PM

Not Joao Euro

Fernando Pereira emailed to explain that the Portuguese "John Doe" equivalent, given in the list that Robin Stocks copied from Blick Online who copied it from funnyname.com, is not just "Joe Euro" at all.

The Portuguese "João Ninguém" from that list has yet a different meaning from John Doe or Joe Bloggs. It's not (AFAIK) used in legal proceedings, nor does it refer to the average Joe. Instead, it has a picaresque meaning, going back at least to sixteenth century play "Auto da Lusitânia" by Gil Vicente. In it, "Todo o Mundo" ("everybody") is a rich merchant, and "Ninguém" ("nobody") is a poor man who discuss what they wish for in this world, while two demons (Berzebu and Dinato) joke for the gallery playing on the double meaning of the characters' names:

Ninguém Buscas mais, amigo meu? What else do you seek, my friend?

Todo-o-Mundo Busco a vida e quem ma dê. Life and who will give it to me.

Ninguém A vida não sei que é,
a morte conheço eu. Life I don't know, but death I do.

Berzebu Escreve lá outra sorte. Write down this finding.

Dinato Que sorte? What finding?

Berzebu Muito garrida:
Todo-o-Mundo busca a vida,
e Ninguém conhece a morte. Very colorful: Everybody seeks life, and nobody knows death.

Todo-o-Mundo E mais queria o paraíso, quanto devo para isso. I also want to get to heaven without anybody getting in the way.

Ninguém E eu ponho-me a pagar quanto devo para isso. And I am paying what I owe to get there.

Berzebu Escreve com muito aviso. Write (this) down carefully.

Dinato Que escreverei? What?

Berzebu Escreve que Todo-o-Mundo quer paraíso,
e Ninguém paga o que deve. That everybody wants heaven, but nobody pays what they owe.

"Ninguém" is the traditional honest loser who will not get his reward in this world. Later, "João Ninguém" seems to have acquired a more comical or tragicomic cast in Brasil, but I don't know about that in detail.

Posted by Mark Liberman at 01:49 PM

Doing the Kenosha kid

At the end of one of our obsessive efforts to construe a puzzling song line, I mentioned "the extended riff in Gravity's Rainbow on the phrase 'you never did the Kenosha kid'". A couple of readers asked me about this. I guess it's worth going over, if only to show that somebody else can be just as overanalytically playful as we sometimes are. Of course, this was a fictional character in a drug-induced delirium, but we'll take our role models where we can find them.

The context is London in 1944, when a thousand V2 rockets were raining down. The behavior of an American soldier named Tyrone Slothrop has shown an uncanny correlation with V2 landing sites. This is discovered because he puts stars on a map of London to locate his sexual connections, in a pattern which is recognized by a statistician who has been making his own map of rocket strikes. The thing is, Slothrop's hook-ups precede the rockets by a day or so, in what will turn out to be a victory for a farcical generalization of Pavlov over a real application of Poisson. But that's another subject entirely -- the point is that Slothrop's map gets him interrogated under sodium amytal, a technique first documented in 1943 as narcoanalysis, and sometimes referred to as "truth serum".

Sodium amytal, which is known chemically as sodium amobarbitol, is a barbiturate which, when administered intravenously, produces a relaxed and sleepy state in the subject. While in this state the patient tends to become more talkative, uninhibited, and spontaneous with what appears to be less guarded and defensive speech and behavior. Sodium amytal is not “truth serum” and individuals can lie or otherwise report misinformation under the influence of this barbiturate. However, individuals with dissociation of identity typically respond with overt symptoms and signs of their dissociative disorder including flashbacks, abreactions, and visual imagery with narratives by the patient in alternate dissociated identity states.

In Slothrop's case, one of the themes of his reaction to the drug is an obsessive meditation on alternative possible analyses of the six-word sequence "you never did the kenosha kid".

Pynchon's description starts with six alternative construals, organized as four cases with two subcases, and then explains

These changes on the text "You never did the Kenosha Kid" are occupying Slothrop's awareness as the doctor leans in out of the white overhead to wake him and begin the session. The needle slips without pain in the vein just outboard of the hollow in the crook of his elbow: 10% Sodium Amytal, one cc at a time, as needed.

and then gives three more, for a total of nine. The first six construals (four cases) were

(1) A letter is sent from Slothrop (at the address "TDY Abreaction Ward, St. Veronica's Hospital") to "The Kenosha Kid, General Delivery, Kenosha, Wisconsin", asking "Did I ever bother you, ever, for anything, in your life?" The answer comes back

You never did.

The Kenosha Kid

Construals 2, 2.1, 3 and 3.1 are presented as dialogs:

(2) Smartass youth: Aw, did all them old-fashioned dances, I did the "Charleston", a-and the "Big Apple," too!

Old veteran hoofer: Bet you never did the "Kenosha," kid!

(2.1) S.Y.: Shucks, I did all them dances, I did the "Castle Walk," and I did the "Lindy," too!

O.V.H.: Bet you never did the "Kenosha Kid."

(3) Minor employee: Well, he has been avoiding me, and I thought it might be because of the Slothrop Affair. If he somehow held me responsible --

Superior (haughtily): You! never did the Kenosha Kid think for one instant that you ...

(3.1) Superior (incredulously): You? Never! Did the Kenosha Kid think for one instant that you ... ?

Construal (4) is presented in mock-epic narrative form:

(4) And at the end of the mighty day in which he gave us in fiery letters across the sky all the words we'd ever need, words we enjoy today, and fill our dictionaries with, the meek voice of little Tyrone Slothrop, celebrated ever after in tradition and song, ventured to filter upward to the Kid's attention: "You never did 'the,' Kenosha Kid!"

The construal numbered (5) -- which is the seventh if we count the variants (2.1) and (3.1) -- comes as a verb-final accusation:

(5) Maybe you did fool the Philadelphia, rag the Rochester, josh the Joliet. But you never did the Kenosha kid.

Number (6) -- or eight -- is one side of another conversation in a strangely imagined context:

(6) (The day of the Ascent and sacrifice. A nation-wide observance. Fats searing, blood dripping and burning to a salty brown ... ) You did the Charlottesville shoat, check, the Forest Hills foal, check. (Fading now ... ) The Laredo lamb, check. Oh-oh. Wait. What's this, Slothrop? You never did the Kenosha kid. Snap to, Slothrop.

Then there's 10 Kenosha-free pages featuring dream-sequence flashbacks to a jazz club in Roxbury MA (with a cameo appearance by Malcolm X as a shoeshine boy), and a long fantasy about Crutchfield the Westwardman and his Afro-Norwegian sidekick Whappo. Finally, back in dreamland Roxbury, the chapter ends with one last unnumbered variant:

In the shadows, black and white holding in a panda-pattern across his face, each of the regions a growth or mass of scar tissue, waits the connection he's traveled all this way to see. The face is as weak as a house-dog's, and its owner shrugs a lot.

Slothrop: Where is he? Why didn't he show? Who are you?

Voice: The Kid got busted. And you know me, Slothrop. Remember? I'm Never.

Slothrop (peering): You, Never? (A pause.) Did the Kenosha Kid?

[pp. 60-71 of the 1995 Penguin edition]

Posted by Mark Liberman at 10:15 AM

John Doe, Joe Farnarkle and John Furphy

[Following up on my earlier post "Matti, Nanashi and Fred"]. Robin Stocks' original post at carob (a blog) copied a list from Blick Online, which seems in turn to have gotten it from funnyname.com. Erin McKean (re)posted a relevant 2002 Verbatim column by Nick Humez.

I noticed an apparent typo in the Finnish name, but there is clearly a good deal more to say about this, starting with several of the comments on Robin Stocks' post.

Minna from all-things-me added a more complete and believable correction of the Finnish name:

just letting you know that the Finnish version is a bit "off". It's actually Matti Meikäläinen

Minna should know, being actually Finnish, but the fact is that I should have seen the vowel harmony problem too.

entangledbank made an important distinction:

'John Doe' and 'Richard Roe' are legal terms in the English-speaking world. They're not casual terms for the person in the street, which Joe Bloggs and Fred Nurke are, or in the USA Joe Public or Joe Sixpack. So I don't know whether any of these terms are translations of John Doe or just of Joe Bloggs.

Fred Nurke was a minor character in The Goon Show, the 1950s radio comedy, and taken up in wide use in Britain (and evidently in Australia) as a name for 'just some bloke'. I haven't heard of Mr Farnarkle but it's presumably of more recent invention, as farnarkling was a sport full of nonsense names invented by the New Zealand comedian Fred Dagg in the 1970s.

David Nash emailed me to agree, and explained further:

(except I go with the whG majority spelling Nurk rather than Nurke).
Fred Nurk is getting a bit more respectable, as in the model forms at http://www.pks.com.au/company/labwizardmarketfeedback.pdf
And, my parents' generation would instead say "Joe Blow".

A likely source of the furphy of "Farnarkle" is:

English (Australia): Fred Nurk, as in "afraid not" in a deep Aussie accent. Joe Farnarkle is another, a farnarkler is a bullshit artist.

-- Courtesy of Jeremy Ham http://www.funnyname.com/anonymous.html

where I suspect this Jeremy Ham to be indulging in a bit of farnarkling himself -- I've never heard "Joe Farnarkle" and Google only gets us to self-conscious listings.

In the legal context here, I think I've seen "A.N. Other" (but can't Google any for you).

If you wonder about furphy, as I did, here's a gloss and (long) explanation, starting

In the latest edition of The Australian Oxford Paperback Dictionary (1996) I entered furphy as a noun and an adjective and defined it as follows:

furphy n. (pl. furphies) 1 a false report or rumour. 2 an absurd story. adj. (furphier, furphiest) absurdly false, unbelievable: that's the furphiest bit of news I ever heard.

This Ozword comes from the name of [John] Furphy, a blacksmith and general engineer, who went to Shepparton from Kyneton in 1871 and set up a foundry. John Furphy designed a galvanised iron water-cart on wheels and his firm, J. Furphy & Sons, manufactured them. Each cart had the name FURPHY written large on the body. So successful were these carts that during World War 1 the Department of the Army bought many Furphy carts to supply water to camps in Australia and especially to camps in Palestine and Egypt.

Fine -- but how did John Furphy's name come to be associated with rumours and lies? As far as I know, John Furphy was a most respectable and upright man, a Methodist lay preacher, and not in the least bit given to rumour mongering or telling tall tales. As a matter of fact, he often used the cast-iron ends of his carts to carry a variety of engraved moral advertisements, the following being typical:

WATER IS A GIFT OF GOD
BEER AND WHISKY OF THE DEVIL
COME AND HAVE A DRINK OF WATER

The standard account has it that the term furphy arose among Australian soldiers overseas during World War 1. It seems that when soldiers gathered around these water-carts, they became sites for gossip and rumour. Another story has it that the drivers of these water-carts carried gossip and rumour from camp to camp, no doubt making a good story better as they proceeded. Whatever the reason for the nexus, the nexus was soon established between the name on the cart and the rumour-mongering associated with the cart's arrival: the furphy was born as soldier slang. Shortly thereafter furphy (also spelled furfy and furphey) left the confines of the camps and established itself firmly as part of the general Australian language, a position it holds securely to this day.

What a great example of metonymy in action!

But one more thing: in legal parallel with John Doe we have Jane Doe. Female forms are rare in the lists cited so far -- is that because they're really not out there, or because they weren't collected?

[Update 1/6/2005: Philip Ryan writes to say that

Fred Dagg is actually the stage name of John Clarke. Fred Dagg was his alter-ego, the canonical New Zealand sheep farmer.
The choice of the name "Dagg" was because a "dag" in Australian and New Zealand idiom is a silly, dumb, or idiotic person. Think Homer Simpson.
The word "dag" comes from the name for the shit, muck, and mess that gets stuck on sheep wool around the sheep's anus. Farmers and farmers-hands may regularly cut off the dags to keep their sheep clean. Or they are removed from the wool prior to processing, once the sheep is shorn.

]

Posted by Mark Liberman at 07:32 AM

July 29, 2004

Ain't I'm a stinkah

I think Mark's onto something when he writes that "ain't can be a sort of phrase-initial marker of questions and exclamations".

(Blah blah blah the obligatory links to previous relevant posts blah blah blah ...)

Mark cites the following examples:

(link) Ain't I'm a dog, I'm always steppin' around
(link) Ain't this is a great country -- free bologna sandwiches for profaners!!

Mark notes that the first is a "pretty solid citation" and that the second "might be a typo, I admit". My personal judgment is that the first is awful, but then again I didn't grow up hearing the relevant song. The second I find totally normal. If you follow the link, you'll see that this is the closing comment by a person who claims to have been arrested for holding a sign reading "F.U.G.W." along a Bush motorcade route. I take the comment to be sarcastic -- (and I mean sarcastic sarcastic, not could-care-less sarcastic).

Anyway, I think both examples make perfect sense if you take ain't to be a phrase-initial marker meaning something like "isn't it the case (that)", "don't you agree (that)", "it's clear (that)", or the like -- observe:

It's clear that I'm a dog; I'm always steppin' around.
This is a great country, don't you agree? Free bologna sandwiches for profaners!!

Mark then gives props to Trevor's alternative idea that there's a missing subject of ain't in the lyric we've been debating. Mark cites the following four examples to support this idea; curiously, only one of them (#3) has an actual missing subject.

you better focus, cos it ain't no one can quote this (link)
It ain't no cat can't get in no coop (from "Bill Labov's early work")
cos ain't no limitations on the things we do (link)
I know it ain't how it used to be (link)

All of these cases, including the subjectless #3, are examples of existential sentences. There is used as the subject of an existential sentence in standard English while it is used in most other situations in which a "dummy" or expletive subject is necessary. But in many nonstandard varieties of English, including AAVE, it fulfills both roles. Apparently, so may a null subject -- at least in some circumstances, such as the cos ain't no construction (and I mean "construction" in a relatively neutral sense -- I might have said "frame", but that might open up another can of worms).

(By the way, in case you found the 238 ghits for Mark's suggested {"cos ain't no"} search underwhelming, try adding the 18 for {"coz ain't no"}, the 89 for {"cus ain't no"}, the 533 for {"cuz ain't no"}, and the 3620 for {"cause ain't no"}.)

There's a lot I'm unsure of about the Ain't how that God planned it line, but I'm pretty sure it's not an existential that would thereby license a null subject. So what is it? The mystery remains.

Oh, yeah -- Happy Anniversary, Language Log!

Posted by Eric Bakovic at 10:53 PM

Calling Samuel Beckett

At the gym this evening, I didn't bring my mp3 player, thinking it would be a good chance to think. Think again. The most salient sound in the fitness room, cutting through the techno-musak and the thud of feet on the treadmills, was the young woman four machines over talking on her cell phone. She was there when I started, and until she left 25 minutes later, cell phone cradled against her ear, I could hear every word she said. But that wasn't the unusual part. The thing is -- and I swear this is true -- the only contentful noun stem that she uttered during the entire conversation was weather.

The first thing that I heard her say was

So what's the weather there?

OK, fair enough. After a brief silence, she continued

Well, go check weather.com.

Good idea. Then she emitted a series of four equally spaced versions of yeah, each different:

Yeah. [low falling intonation = "I get it"]

Yeah? [high rising intonation = "Really?"]

Yeah?!! [emphatic rise-fall intonation = "yes of course, you stupid idiot!"]

Yeah... [mid level intonation, pharyngealized voice quality = "well, I guess so, but..."]

After a brief pause, she said

Ask him what the weather is like.

Curiously, weather was focused, as if she'd just been talking about other things. She then added

Well, I'd like to know about the weather.

I had figured that out already, and so had everyone else in the fitness room, but apparently the party at the other end of the line was having some trouble grasping the concept. So she repeated

What's the weather there now?

A scattering of back-channel responses intervened:

Yeah...

Right...

Uh huh...

OK...

And then she returned, without any indication of impatience, to her theme:

And how's the weather?

I'll spare you the transcript of the next 23 minutes.

Her accent was generic educated non-southern American, maybe with a touch of California. Her voice struck me as piercing and nasal, but it's a little hard to separate form from content in judging voice quality: "nasal" is often just a phonetic re-interpretation of "annoying".

It's possible to fill in the context, and the other side of her conversation, in a way that redeems this individual from idiocy, if not from rudeness. Maybe she was talking with a whole series of family members, and wanted some independent judgments; maybe she was worried about her rose bushes and her interlocutor only wanted to talk about his tennis elbow; who knows? The trouble with public phone conversations is that you can't stop yourself from trying to fill in the other side, at least half consciously. In my opinion, and Mark Twain's opinion too, and also according to some experiments, this makes such half-conversations seem much louder, more salient (and more annoying) than they otherwise would be.

The trouble with this particular conversation was that it kept getting harder and harder to make sense of it. This was not because it was made up of complex phrases. Nor was it because its phrases were especially ambiguous or contentless -- in fact, a conversation made up of nothing but versions of "yeah" and "OK" would have been less gripping, I think. It was her endless repetition of queries about the weather, performed in an invariant manner as if each was the first, that made this such a special experience.

Oh, as she walked out the door, 25 minutes later, she did introduce a new morpheme:

Well, is it raining?

Posted by Mark Liberman at 09:54 PM

A birthday card from the Trib

Nathan Bierma, whose own weblog Nathan's Notebook is in our blogroll, focuses on Language Log in his column this week in the Chicago Tribune [registration required -- a copy is here].

As Nathan points out, his article appears on the anniversary of Language Log's first post, give or take a day: "Wednesday marked Language Log's first birthday, a major milestone by blog standards."

Though I say it as shouldn't, I'll quote the article quoting Erin McKean:

"Language Log excels not only at delivering readable, witty and informative entries about cool language topics, but also at pointing out the places where linguistics has something pertinent and interesting to say about current events," said Erin McKean, editor of the language quarterly Verbatim and Chicago-based senior editor of U.S. dictionaries for Oxford University Press.

One of the things that struck me about this article was its headline, "Linguists share love of words with amateurs via Weblog":

Back in April, Geoff Nunberg wrote about the "culture of polarization" in the language area, looking at amazon.com's "customers also bought" lists for a sample of popular books on language. He found a striking division between what he called "popularizations of linguistics" and "books by language mavens and word-lore collectors".

As a bunch of academic linguists, we're definitely on the "popularization of linguistics" side of this gulf. In fact, Geoff's list of P of L authors includes three Language Log contributors (if I add him, as he modestly failed to do).

The headline phrase "love of words" evokes the "language maven and word-lore collector" side of things -- maybe the alternative headline "Linguists share love of language with amateurs via Weblog" would have expressed our purpose better. It would have fit in the same space in the on-line version of the paper, as shown above, which has plenty of white space after "with". Of course, I haven't seen the print version, where the space constraints might have been different.

But judging from the links and email that we get, I believe that our readers come from both sides of the linguistic divide. For that matter, many don't seem to be consumers of either kind of book. I'll take this as confirmation of my conviction that language ought to be intrinsically interesting on many levels to almost everyone, and I'll take the headline to mean that our stuff appeals to the word lovers as much as to those with other orientations.

I'll even come out of the lexicographic closet and admit to being a word lover myself, though my affections are not limited to that aspect of language.

[Headlines are usually added by an editor, not written by the journalist responsible for the article that runs under them, so this should not be taken as a criticism of the writer. In fact, it's bad manners to complain at all about such a nice birthday card. I'm just sayin', is all... ]

Posted by Mark Liberman at 12:10 PM

Matti, Nanashi and Fred

Robin Stocks at Carob (a blog) has a fascinating list of John Doe equivalents around the world, occasioned by a translation error in German stories about recent events in Michael Jackson's trial. The child involved in the trial is referred to as "John Doe" for the sake of anonymity, but apparently someone didn't get it, and so the papers write about "die Aufenthalte der Familie Doe" and the like.

I'm not sure that the German references are really wrong, since "Doe" is presumably what the court papers actually said, but anyhow, it's interesting to learn about Matti Meikalainaen (Finland) [which I think is a typo for "Meikalainen", originally the fault of Blick Online], Nanashi No Gombe (Japan), Fred Nurk (Australia) and the rest of them.

[Update: Erin McKean at Verbatim Magazine has re-posted a 2002 column by Nick Humez on the same subject.]

Posted by Mark Liberman at 10:27 AM

Old wine in new bottles

In 2002, translator Sibel Edmonds was let go by the FBI after she complained about poor-quality work and even sabotage in translations produced by other bureau linguists. The Justice Department's inspector general has produced a classified report on Edmonds' charges, some aspects of which were revealed in a July 21 letter to Congress from FBI director Robert S. Mueller III. The NYT has gotten a copy of Mueller's letter, and Eric Lichtblau has an article about it in today's paper.

According to the article,

Ms. Edmonds worked as a contract linguist for the F.B.I. for about six months, translating material in Turkish, Persian and Azerbaijani. She was dismissed in 2002 after she complained repeatedly that bureau linguists had produced slipshod and incomplete translations of important terrorism intelligence before and after the Sept. 11 attacks. She also accused a fellow Turkish linguist in the bureau's Washington field office of blocking the translation of material involving acquaintances who had come under F.B.I. suspicion and said the bureau had allowed diplomatic sensitivities with other nations to impede the translation of important terrorism intelligence.

The inspector general's report apparently concludes that Edmonds "was dismissed in part because she accused the bureau of ineptitude", in the words of Lichtblau's article, and also "found that the F.B.I. did not aggressively investigate her claims of espionage against a co-worker."

A few years ago, people used to say that the classic George Smiley/Harry Palmer type of spy story was finished, because the dramatic frame was gone. But it looks like the classical themes are back, in a new framework.

Posted by Mark Liberman at 08:02 AM

You better focus, cos it ain't no one can quote this

OK, we're having fun now, Eric and Trevor and I. The rest of you can sing along or tune out, your choice. Warning: without the backstory (here, here, here, here, here, here) this might not make sense. In fact, with the backstory... well, anyhow.

There's another couple of possibilities here, both for ain't and for that, in Chuck D.'s puzzling line "Ain't how that God planned it?"

First, ain't. There's some evidence that ain't can be a sort of phrase-initial marker of questions and exclamations:

(link) Ain't I'm a dog, I'm always steppin' around
(link) Ain't this is a great country -- free bologna sandwiches for profaners!!

The first of those is a pretty solid citation, being from the title and refrain of a popular song, repeated many times in each performance. It might be a joke but it's not a mistake. The second one might be a typo, I admit. This construction -- obviously a reinterpretation of subject-auxiliary inversion -- tends to undermine a half-century of arguments about structure-dependence, but never mind.

As an alternative, I like Trevor's idea -- at least I think it was his idea -- that maybe there's a missing subject before ain't, in a construction like

(link)
BAM, I slam cos Lil' Bud is who I am
I guess you noticed but if you haven't noticed
you better focus, cos it ain't no one can quote this
paragraph that I gets ill with
I got the lyrics that them fools can't deal with

There's a famous linguistic example of this construction in Bill Labov's early work, "It ain't no cat can't get in no coop". You can find plenty of examples with a dropped subject, for instance by searching on {"cos ain't no"}: "cos ain't no limitations on the things we do".

And you can find plenty of examples where the predicate is a wh-clause, like

(link)
I know it ain't how it
used to be
but I'm not good
at being me
anymore.

Right, now that. Since God is unique, in the relevant theology, and also invisible, the obvious meanings of the demonstrative (that one not this one, that one I'm pointing at, etc.) don't seem to work, which is why we've been talking about that as a complementizer. But there's another sense for the demonstrative, which the OED glosses as

b. Indicating a person or thing assumed to be known, or to be known to be such as is stated. Often (esp. before a person's name: cf. L. iste) implying censure, dislike, or scorn; but sometimes commendation or admiration.

In this usage, there's no implication that the referenced entity is non-unique or visible:

(link) Man, I really have to get to that Pynchon someday.
(link) Sometimes, when I want to get a little freaky, I turn up some of that Louie Armstrong.

In fact, speaking of Pynchon and Armstrong, this whole thing reminds me of the extended riff in Gravity's Rainbow on the phrase "you never did the Kenosha kid".

Posted by Mark Liberman at 07:05 AM

How (that) now, brown cow?

Trevor at Kaleboel has responded to our responses to his response to ... oh, never mind.

In my last post on the subject, I admitted that I could accept subject-drop in a noninverted declarative, but not in an inverted interrogative:

That ain't how God planned it. → ø Ain't how God planned it.
Ain't that how God planned it? → *Ain't ø how God planned it?

Taking advantage of this admission, Trevor writes:

If you listen to other recordings by Chuck D, I think you may find that he doesn't use intonation to distinguish between questions and statements in the way that speakers of standard English do. That makes me slightly curious as to why we're so sure he's nervously asking us "Ain't how that God planned it?" instead of rounding off the section by telling us emphatically--as his tone suggests--"Ain't how (that) God planned it!"

Consider the line right before the debated part of the lyric again:

All I want is peace and love on this planet

Trevor is suggesting that the line following this one is the emphatic assertion "Ain't how (that) God planned it!", with (I assume) the implicit subject and wh-comp analysis that Trevor originally suggested. So, the whole lyric "should be":

All I want is peace and love on this planet /
That ain't how God planned it!

But what does the "that" (i.e., the implicit subject of the line actually sung) refer to? If Chuck D. is asserting that something is not going according to God's plan, what is that something? Here are the likely phrasal possibilities from the preceding line that could substitute for the implicit subject:

[All I want is peace and love on this planet]_S
[Peace and love on this planet]_NP
[This planet]_NP

Consider now how each of these fits in the line:

[That [all I want is peace and love on this planet]_S]_S' isn't how God planned it!
[Peace and love on this planet]_NP isn't how God planned it!
[This planet]_NP isn't how God planned it!

Only the third of these seems to fit in the line in a way that makes sense in the context of the whole lyric; as my wife Karen has suggested to me, one can imagine "this planet" standing for "the situation on this planet, lacking peace and love", given the context of the whole lyric.

I'm still not persuaded, though. The interpretation I imagine for the interrogative fits much better in my mind:

Ain't [peace and love on this planet]_NP how God planned it? (where "it" = "things (to be)")

(Besides, I still stand by my claim that the unreduced vowel in Chuck D.'s "that" is pretty strong evidence that it is the demonstrative [ðætˀ], not the complementizer [ðǝtˀ]. Trevor is right to point out that Chuck D.'s intonation is not going to give us much in the way of clues, but I'm pretty confident about the unreduced vowel thing.)

So, in my view, we're back to my original question (sort of, since I'm restating it in the light of subsequent discussion): did Chuck D. say "how that" (specifically, wh-subject)? If so, is that an error or a point of variation? If not, what did he say? -- We are obviously having a hard time coming up with alternatives that don't leak.

Posted by Eric Bakovic at 02:57 AM

July 28, 2004

Skazeetch skazootch skazeetch skazootch

That's the sound of a magician sawing a box in half, according to MAD #213, March 1980, Page 15, as documented by The Don Martin Dictionary. You can add this to the other lexicographical references on comic-book sounds that we posted last week.

[link via email from Jeff Erickson]

Posted by Mark Liberman at 05:22 PM

That's not how that ... is it?

Mark narrowly beat me to the punch in commenting on Trevor at Kaleboel's response to my post yesterday about the "Ain't how that God planned it?" lyric. I'm gonna have to (respectfully) go ahead and sort of disagree with Mark; I'm not so sure about the sensibility of Trevor's story.

Trevor writes:

I reckon that when Chuck D of Public Enemy sings
Ain't how that God planned it?
he is using "how that" where standard English speakers would use "how", and that the pronoun "that" is assumed in the "ain't" or what precedes it.

If I understand Trevor correctly -- and I think I do; Mark seems to have arrived at the same understanding -- he is saying that "how that" (wh-comp) in Chuck D.'s speech is "how" (wh alone) in standard English, and that there is an implicit pronominal subject of "ain't" (coincidentally, I assume, also "that"). Translating to standard English, then, we get:

Ain't (um, I mean, Isn't) that how God planned it?

Hmm. Now, I ain't no syntactician, but I'm suspicious of Trevor's wh-comp analysis of the "how that" in this case as well as of his assertion that there is an implicit-yet-unexpressed pronominal subject in this question.

Take the wh-comp analysis of "how that" first. In defense of this analysis, Trevor cites a case of "how that" that is undoubtedly wh-comp, and implies that he has found plenty more such examples:

The "how that"/"how" swap turns up in a variety of sources, including in 1513 in Douglas's Æneis (OED), where the "that" clearly does not refer to one particular (manifestation of) Aeneas:
How that Eneas socht ansueir at Apollyne

I'm not saying that Chuck D.'s speech couldn't have descended directly from the speech/writing of a 16th Century Scottish poet, or from the speech of the writers of any of the other 282 search results for "how that" in the OED quotations for that matter, though we still need to remove all the cases in which the "that" is clearly (part of) the subject of the clause following the "how". But if you just listen closely, Chuck D. pronounces "that" with a completely unreduced vowel; it's clearly [ðætˀ], not [ðǝtˀ]. In contemporary English, the complementizer "that" is only pronounced with an unreduced vowel in hyper-careful speech (if ever); I think we can all agree that Chuck D. is not being -- and has no reason to be -- hyper-careful in this case.

Now consider Trevor's implicit-subject assertion. Are inverted subjects dropped in (non-standard or standard) English? Certainly not in my dialect:

That ain't how God planned it. → ø Ain't how God planned it.
Ain't that how God planned it? → *Ain't ø how God planned it?

(Mark expresses similar skepticism about this point, too.)

These arguments notwithstanding, it just seems like a striking coincidence that these two things should go together in this case. Of course, Mark and I could just be misunderstanding Trevor after all. I hope he says more about it on his blog.

Posted by Eric Bakovic at 05:10 PM

(That's) how that

Trevor at Kaleboel offers a sensible story about where "how that" comes from, in the phrase from a Public Enemy song lyric that Eric Bakovic brought up the other day:

All I want is peace and love on this planet /
Ain't how that God planned it?

Eric observed that Chuck D. is both clear and emphatic in enunciating that particular sequence of words. It goes without saying that a recording like this is examined carefully many times before it's released, and anything regarded as a mistake, or at least as contrary to the artist's intention, would normally be fixed.

Trevor suggests that the expected "that" after "how" is part of the archaic (and dialectal) pattern in which "that" combines with wh-words ("how that", "why that", "who that", "where that" etc.), while the expected "that" before "how" has been elided. He cites analogous cases in Dutch as well.

I'd come to a similar hypothesis about the extra "that" -- minus the Dutch dimension, where Trevor has the advantage of knowing the language -- and had even come up with a sampling of "that's how that" cases from the web. The first two of these are from poems in the style of traditional song lyrics, and thus represent (or imitate) an archaic state of the language preserved in a local dialect. However, the last two are from spoken transcripts. The first transcript is of an interview with a mainstream American speaker (a retired high school principal) while the second comes from an interview with someone who is not a native speaker, but whose transcribed speech seems otherwise quite fluent and correct.

(link) With a door closed in my face now, and that's how that I was greeted
(link) That's how that she paid all her bills.
(link) And that's how that I really got into the principalship.
(link) That's how that I got started thinking of the Equitable Building.

The thing is, getting the extra "that" after "how" is the easy part [Update: as Eric observes, keeping it fully stressed might be a problem, though]. It's the missing "that" that's hard. I didn't post about this because I couldn't think of any plausible way to get rid of that first "that".

It's true that "that" is often optional, but that's the complementizer "that" (as in "I think (that) I know the answer") rather than the demonstrative "that" (as in "Is that what you want?").

A sentence-initial demonstrative can often be deleted, as an example of the general phenomenon of prosiopesis, in which "the speaker begins to articulate, or thinks he begins to articulate, but produces no audible sound (either for want of expiration, or because he does not put his vocal chords in the right position) till one or two syllables after the beginning of what he intended to say".

This can give you things like

That's what I want → 's what I want

but as far as I can see, it only works following a pause, or anyhow not in a context like

Ain't that how that God planned it?

So I wouldn't think twice about hearing "Ain't that how that God planned it?" -- in the right context and company I might even say it -- but the phrase that actually occurs in the song seems surprising to me.

But maybe some people allow more general that-deletion? Or maybe some more generic sort of pro-drop is involved?

Posted by Mark Liberman at 04:24 PM

Estuary English

Adding to our discussion of the intricate dance of regional and social variation in British speech, Kate Joester emails a comment about Estuary English:

I think there's probably another dimension in prejudice against Estuary English in particular. It's associated with "youth culture" and with being a fake accent acquired by speakers who are "really" something else in order to be youthful and cool.
It's the universal accent for imitating the stupid young and the mutton-dressed-as-lamb, no matter where they come from.

The opposite of that ever-popular LL post "When's the last time you heard an old person say 'dadburn it'", perhaps?

Here's a diagram from the Varieties of English site at Arizona, from the page on Estuary English, which makes a related point graphically. The diagram is copied (I think) from the original 1984 study by David Rosewarne, who wanted to describe the development of speech varieties in between the local dialects and the (mostly class rather than locality-based) "Received Pronunciation":

I guess that there might be a similar set of attitudes involved in the U.S. with respect to the spread of speech varieties perceived as originating among southern Californian "valley girls", though in that case there is the additional dimension of sex-related stereotyping.

Posted by Mark Liberman at 03:37 PM

The Queen's business

The following BBC News Online story about a British accent survey comes to some conclusions that complement Ray Girvan's comments and Steve Thorne's findings (both reported by Mark today).

The details of the survey are not particularly clear from the story, but the results are claimed to reflect people's perceptions of different accents in terms of such things as success in business, trustworthiness/honesty, aptitude for hard work, and reliability. Khalid Aziz, chairman of the Aziz Corporation (which carried out the survey), is quoted as saying:

"In terms of success in business the Geordie accent comes out right at the bottom. In terms of associating it with honesty, it comes right out at the top which is why there is a lot of call centres in that neck of the woods. The accent that is really bad was the Scouse accent which comes at the bottom of practically everything, particularly in terms of honesty. It is unfortunate and is clearly a stereotype."

Sid Waddell, a darts commentator, has a nice response to the survey results concerning his native Geordie accent:

"There is a paradox here. If Geordies are seen are [sic] the most honest, but don't get on in business, then what does that say about business? I think these views are a deeply ensconced stereotype coming from the fact that anyone who speaks with a strong regional accent is thick."

Good thing he didn't say "thick regional accent" ... :-)

I sometimes find it hard to blame regular folks (or PITS, as Arnold sometimes refers to them) for their linguistic prejudices, given how badly some of our own representatives are often portrayed in the media. For example, there's a link to the above story from the sidebar of another one entitled "Yoda 'speaks like Anglo-Saxon'". What is this story doing in the Education section? And was David Crystal really interviewed about how Yoda's OSV speech pattern could be used to educate children about embracing linguistic diversity?

Mr Crystal said his mission was for non-standard English to be recognised. "The history of English is a history of the non-standard language. The people I'm attacking are the purists who say language should never change and be 'like it was when I was a lad'. The message should be that we welcome diversity."

Another link from this story's sidebar takes you to one entitled "Do you speak Elf?" (also in the Education section). No linguists comment in this one, but a "special needs co-ordinator" at a Birmingham boys' school, who is offering after-hours courses in Elvish, says:

"The children really enjoy it. It breaks the idea that education should simply be aimed at getting a job. [...] It's very different from just studying a language like French: the boys are doing this for fun, like [Tolkien] did. That has to be a good thing."

I don't disagree, but consider some of the comments from readers of the story:

Great. Will be really handy when these kids grow up into the real world and need to get jobs.
Don't go learnin' Sindarin - it's bad for yer' elf.
No, I am very glad to say that I do not speak "Elvish" and it will be a cold day in hell before I do.
What an absurd idea. Most of the children I come across can't even speak English, never mind French or German or any other useful language. If grunting is a major element of elf vocabulary, the children should be brilliant at it.

Full disclosure: more of the comments are positive, or at least not negative. Some particularly good ones:

I think that it is really great that these children want to learn it, because once you have acquired the skills of language learning, it is much easier to pick up another language, thus what they see as a fun pastime could actually turn out to be a big help to them in the future.
My daughter (aged 14) and her friends have been studying Elvish in their school lunch hours for two or three years now. They really seem to enjoy it. It amazes me how deeply they get into the grammar of it, but this can only be helpful for their learning the more "mainstream" languages.
Excellent stuff. They'll probably learn more grammar studying Sindarin than they will in modern mainstream education French or German.

(But this last one sadly continues: "More elves in our schools, that's what I say! Equip all teachers with longbows? Hmm. Not a bad idea. Now if only we can get that past the whining liberal orcs, there's a fighting chance of improving our education system.")

Finally, let's not forget everyone's real motivation for learning a foreign language: love, sweet love:

I can't speak Elvish. But I wouldn't say no to learning it if all the Elvish-speaking fellas are as hunky as that Aragorn. I wouldn't mind being his Bess!

Posted by Eric Bakovic at 01:06 PM

The beauty of Brummie

Following up on recent posts about accent evaluation, here's a clear explanation by Steve Thorne of an experiment comparing how different English accents are perceived by native and non-native speakers.

In May 2002, I recorded short samples of 20 different accents of English... In order to limit the influence of extraneous variables, the speakers chosen were all male, white, aged between 35 and 40, and upper-working to lower-middle class. These recordings were played to 96 native and 109 non-native English speakers who were then asked to briefly describe each accent and rate each one on a scale of 1-10 (1 = very unpleasant, 5 = neutral, 10 = very pleasant).

According to Thorne:

... the native speakers reacted predictably. The French, Southern Irish, Edinburgh Scottish and Geordie (Newcastle-upon-Tyne) accents received the most favourable responses (none, incidentally, described the very nasal French accent as 'nasal'), the American and rural accents such as Cornish and Norfolk also did well, but Welsh, RP (Received Pronunciation), Northern Irish and accents associated with large urban conurbations such as London (Cockney) and Liverpool (Scouse) fared badly. No prizes for guessing which accent came bottom. Black Country.

Black Country is "an accent associated with the South Staffordshire area of the English Midlands". I'll freely confess that I had never heard of it, and so would not have guessed that it would come in last (or "come bottom", as Thorne puts it in that charmingly quaint UK-ish patois of his :-).

In the cited page, Thorne doesn't explain in detail who the "native speakers" were and where they came from -- that surely would make a big difference, since Americans' ability to distinguish among UK accents is usually rather poor, and their evaluative associations with UK accents are quite limited. I expect that the same is true in reverse for the British speakers with respect to American accents, though perhaps American television and movies have conveyed some clues about the stereotypes involved.

Anyhow, one of Thorne's original goals is to defend the speech of his native Birmingham, which seems to be something like the New Jersey or Brooklyn of England:

Ask a British person what their least favourite accent is, and they will more than likely say 'Brummie' - the variety of English spoken in the West Midlands city of Birmingham. Ask them why, and they will more than likely use adjectives such as 'nasal', 'monotonous', 'miserable' and/or 'ugly' to justify their responses. Such views are based on the belief that all other accents are higher in aesthetic value than Brummie, and even those who are prepared to accept that Brummie is not 'wrong' (and many aren't) seem fundamentally opposed to the idea that other accents are not more aesthetically pleasing. But is Brummie really ugly?

In support of his claim of anti-Brummie prejudice, Thorne quotes from a shockingly smug and nasty BBC page on "How to speak Brummie", which he accuses of spreading untruths:

[A] common misconception about Birmingham intonation... is that it 'falls' at the end of sentences, and this leads to criticisms that it is 'dull', 'miserable', 'depressing' and/or 'downbeat': "In Brummie, the lowering suggests despondency and makes it less attractive to the listener . . . the lack of aural variation quickly begins to grate".

The results of Thorne's experiment support his intuition:

The responses of non-native speakers, on the other hand, were inconsistent - ranging from 'harsh' (for Brummie), through 'nice', to 'melodic', 'lilting' and 'musical', and from 'clear' (for Southern Irish), through 'boring', to 'disgusting'. Although there was no significant difference between the overall scores for each accent, many appeared to prefer the characteristically Brummie 'rising' and 'high tone at the end of sentences', criticising instead the 'cold and unemotional' character of Edinburgh Scottish - one respondent even going so far as to describe the Scottish speaker as 'untrustworthy'. Scouse was also praised on many occasions for its intonational distinctiveness - its clarity, 'pleasant tonality', and dynamic 'rolling of the r', but reactions on the whole were generally mixed, and there was little evidence to suggest that foreign speakers were dipping into the same adjective cluster as their British counterparts - no high occurrence, for example, of the words 'nasal', 'common', 'whingey', or 'wrong' to describe the Birmingham accent.

Thorne's conclusion:

These findings demonstrate that non-native speakers work to a totally different set of criteria when evaluating English accents, and do not discriminate on the same grounds as native English speakers. Judgements of the perceived beauty or ugliness of accents are based almost entirely upon a knowledge of the social connotations which they possess for those familiar with them.

Returning to that BBC Brummie page again, I observe that its author confidently asserts that

The Birmingham accent hits one note - usually a low one - and sticks to it no matter what. It is this lack of aural variation that is the principle cause of irritation for others. It is also the source of the stereotype of the unimaginative Brummie. The accent stays the same and never varies, and so subconsciously people assume the same must be true of the speaker.

although I suspect that this assertion has no factual foundation whatsoever.

It seems much more likely that the individual who wrote this BBC page has a standard old-fashioned snobbish distaste for Brummies, finds their accent to be irritating and unpleasant for the usual social-psychological reasons, and has decided to rationalize these completely irrational prejudices by offering a pseudo-scientific explanation in terms of hypothesized properties of Brummie intonation.

As Thorne points out, what this BBC page says about Brummie intonation is internally contradictory -- it is described as having "a downward intonation at the end of most sentences" which "suggests despondency and makes it less attractive to the listener", but it is also described as "[hitting] one note - usually a low one - and [sticking] to it no matter what". Thorne also points out that the different aspects of the description are in any case factually incorrect, and especially that "[a]n extensive use of rising rather than falling tones ... is typical in Birmingham speech". His non-native listeners bear him out, describing Brummie as "'melodic', 'lilting' and 'musical'".

One could also study this question using the tools of instrumental phonetics rather than native vs. non-native perception. Given an appropriate sample of speech, one could evaluate quantitative properties of pitch contours, such as percentage of final rises, falls and levels, or distributions of rates of change over final syllables or final post-stress regions. One could look at distributions of pitch values and their first couple of derivatives at different time scales, or the distribution of pitch shapes over syllables and words.
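To make the first of those measurements concrete, here is a minimal sketch of how one might tally final rises, falls and levels. It is only an illustration under stated assumptions, not a description of any existing study: it assumes that each utterance has already been reduced to a pitch track over its final region (times in seconds, F0 in Hz, unvoiced frames removed), and the function names, the 100 Hz reference and the threshold of 10 semitones per second are all hypothetical choices that a real analysis would need to justify.

```python
import math
from collections import Counter


def semitones(f0_hz, ref_hz=100.0):
    """Convert a frequency in Hz to semitones relative to ref_hz (an arbitrary reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def final_slope(times, f0_hz):
    """Least-squares slope, in semitones per second, of one final-region pitch track."""
    st = [semitones(f) for f in f0_hz]
    n = len(times)
    t_mean = sum(times) / n
    s_mean = sum(st) / n
    num = sum((t - t_mean) * (s - s_mean) for t, s in zip(times, st))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def classify(slope, threshold=10.0):
    """Label a slope as rise, fall or level; the threshold is an assumed value."""
    if slope > threshold:
        return "rise"
    if slope < -threshold:
        return "fall"
    return "level"


def final_contour_percentages(utterances, threshold=10.0):
    """Percentage of rises, falls and levels over (times, f0_hz) pairs."""
    counts = Counter(
        classify(final_slope(times, f0), threshold) for times, f0 in utterances
    )
    total = sum(counts.values())
    return {label: 100.0 * counts[label] / total for label in ("rise", "fall", "level")}


# Toy, made-up pitch tracks: two rising, one falling, one flat.
toy_utterances = [
    ([0.0, 0.1, 0.2], [100.0, 110.0, 120.0]),
    ([0.0, 0.1, 0.2], [100.0, 120.0, 140.0]),
    ([0.0, 0.1, 0.2], [120.0, 110.0, 100.0]),
    ([0.0, 0.1, 0.2], [100.0, 100.0, 100.0]),
]
print(final_contour_percentages(toy_utterances))
```

Working in semitones rather than raw Hz keeps speakers with different pitch ranges roughly comparable. Running the same tally over matched samples of, say, Brummie and RP speech would give one crude test of the "falls at the end of sentences" claim.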

As far as I know, no one has ever done this, in a systematic way, in comparing different accents of English, though Pierre Delattre did something similar across languages, 50 years ago, in comparing the pitch contours of lectures by Margaret Mead and Simone de Beauvoir.

Posted by Mark Liberman at 10:44 AM

Standard = Neutral?

In response to my post on within-U.S. linguistic prejudice, Ray Girvan emailed:

I can fill in some UK equivalents on the basis of my experience.

Mainstream Scots English speakers ridicule teuchter (outer islands) accents and the Anglicised "pan loaf" accents of middle-class sections of Edinburgh and Glasgow.

In England, many British regional accents are considered acceptable by RP speakers: for instance, Scottish, Welsh, Northern Irish and Irish, Yorkshire, Lancashire and Geordie. But they tend to view speakers of southern rural (e.g. Devon or Norfolk) as yokels; and those of Estuary English, despite its growing user base, as uneducated. ("At the 1995 Conservative party conference the Minister of Education, Gillian Shephard, launched into a denunciation of EE, condemning it as slovenly, mumbling, bastardized Cockney").

There is also the plumminess factor.

In the U.S., the traditionally standard radio or television voice is perceived as being maximally bleached of all marked characteristics ("having no accent"). Linguistically this is nonsense, of course, but it does reflect a democratic set of values, in which the desired reference value is viewed as being at the middle or zero point of the descriptive space, rather than being at one extreme corner.

As I understand it, traditional BBC English, in contrast, is perceived by most people as being a marked value. In this 1999 article, Boris Johnson, after losing a BBC radio gig, claimed to be "the first casualty in a war against those 'what speak proper' - the victim of a fresh assault on the mode of speech once dubbed BBC English". The article says that

Radio veteran John Peel, whose show Home Truths will precede the Borisless Week in Westminster, swapped his public school accent for a Liverpudlian drawl during Beatlemania.

Peel is pictured as moving from one corner of the space (the "plummy" "BBC English" "public school" upper class corner) to a different one (the "Liverpudlian drawl" corner). This perspective is echoed by a quote from linguist J.C. Wells:

"People are no longer automatically inclined to assume what people from the upper classes do is worthy of imitating," he said.

With speakers of received pronounciation no longer monopolising higher education, the media and the government, the accent may have become as much a liability as any other.

"People have prejudices about the social group who use a certain accent, rather than the accent itself," said Professor Wells.

Wells' way of talking reflects the linguistic truth of the matter, which is that every way of talking is one "accent" or another, since the underlying descriptive system has no natural zero point.

However, the same article quotes "Gregory de Polnay, head of voice at the London Academy of Music and Dramatic Art" talking in a way that assumes the contrary:

Polnay thinks Boris could soon master a radio-friendly "neutral" accent.

The quotes from Polnay characterize upper-class speech in a way that is more stylistic than linguistic: "clipped vowels", "nasality", "pushing the sound of his voice down through the nose", "exaggerate his diction by pushing out phrases". These are rather different descriptive categories than a phonetician like Wells would use. I'm not sure whether they can be given any scientific validity by being reduced to properties of physiological or acoustic measurements, or even to properties of intersubjectively valid perceptual scales. I'm skeptical that Polnay's "nasality" has anything to do with the nose or the nasal passages and sinuses, for example. Perhaps these terms are just useful aids to performance, like the golf instructor's admonition to "be the ball".

But the main thing here is that Polnay seems to see the space of accents as having a zero point, a "neutral accent" which is in effect lack of accent, all other ways of talking being deviations from that. In any time and place, this is an evaluation placed on a space of linguistic variation, not any intrinsic property of the system itself. But it's also not the only way to view a linguistic standard.

From my outsider's perspective, it seems to me that the British have traditionally taken the view that the standard to be aspired to -- once known as "received pronunciation" or "RP" -- is definitely an "accent", a particular set of values in the space of linguistic variation. This is in contrast to American folk linguistics, in which the standard is usually seen as a pure and transparent form of speech that lacks all discernable properties.

The American folk view is scientific nonsense, but it reflects a democratic set of values, which I for one find laudable. So if the British are now coming around to the American view, they are improving their social attitudes at the same time that they are moving further away from understanding the linguistic facts of the matter.

Of course a better result, on both sides of the Atlantic, would be for people to learn to understand the space of linguistic variation, the space of social evaluation, and the relationship between them. Then they could make well-informed decisions about both.

Posted by Mark Liberman at 08:57 AM

July 27, 2004

How that?

Reading some of Arnold's posts on the thin line between error and mere variation reminded me of a song lyric that I've frequently wondered about.

The song is Public Enemy's "Fear of a Black Planet" (from the 1990 album of the same name). As you can clearly hear, Chuck D. sings:

All I want is peace and love on this planet /
Ain't how that God planned it?

Interestingly, the lyrics for this song on the band's official site are written as:

All I want is peace and love
On this planet
(Ain't that how God planned it?)

The following review of the album quotes the lyrics as they're actually sung on the album.

Is this an error or mere variation? Inquiring minds want to know.

Posted by Eric Bakovic at 09:58 PM

A new record for within-U.S. linguistic prejudice?

In earlier posts, we've discussed cultural prejudices about the speech of southerners and other groups within the U.S. Up to this point, Michael Lewis' reporting on the Microsoft anti-trust trial was my touchstone for density of linguistic prejudice in journalism, but now there's a new contender. It's a series of columns by Christie Vilsack, published about ten years ago in the Mount Pleasant (Iowa) News.

Christie Vilsack is the wife of the governor of Iowa, and a scheduled speaker (tonight?) at the Democratic convention in Boston. I haven't been able to find a copy of her columns, which I suppose must have been dug up by the active researchers at the RNC, but they were discussed and quoted at length in a 7/26/2004 article by David R. Guarino in the Boston Herald. (Guarino also has an Election 2004 blog, but there's no extra material there, at least so far). The story has also been picked up by the AP, the Washington Post, the Washington Times and other outlets.

Quotations from Vilsack's columns (published in 1994 and 1996) have her visiting Atlantic City, NJ, and blasting people from the part of the country where I now live:

"Later, on the boardwalk, I heard mothers calling to their children, 'I'll meet yoose here after the movie,'" she wrote. "The only way I can speak like residents of New Jersey and eastern Pennsylvania is to let my jaw drop an inch and talk with my lips in an 'O' like a fish. I'd rather learn to speak Polish."

She also attended the Atlanta Olympics, and had this to say about the people she met there:

"When I ask for directions, I can't understand the slurred speech of southern Americans, who are so polite and eager to please," Vilsack said.

And she was made uneasy by those tricky bi-dialectal African-Americans:

"I am fascinated at the way some African-Americans speak to each other in an English I struggle to understand, then switch to standard English when the situation requires."

Guarino positions Vilsack in the traditionally despised role of censorious schoolmarm:

An educator for 30 years and former eighth-grade language teacher, Vilsack has made language and literacy priorities as first lady.

She has become a key power player in Iowa politics and is widely credited with breathing new life into Kerry's flagging presidential bid in January with her endorsment a week before the kickoff Iowa caucuses.

At the Jan. 12 endorsement event, Kerry said of Vilsack, "Christie is the first teacher, not just the first lady."

There is a certain amount of consistency here -- like Michael Lewis, Christie Vilsack is quoted as scorning the speech of southerners and people from New Jersey. Her problem with African-Americans is a somewhat unusual one, since they mostly get slammed for not mastering the standard version of English at all, so to be disturbed that they switch back and forth according to circumstances seems egregiously boorish. (Well, she wrote that she was fascinated, but Guarino is clearly expecting us to conclude that "fascinated" = "disturbed" in schoolmarmish, though without the context it's hard to tell whether this is what she meant -- maybe she was just fascinated by the discovery of dialectal code-switching?) And I wonder whether Vilsack has any idea what Polish sounds like, or was just picking on Poles as an instance of a stereotypically despicable group.

Without the full context of the columns, it's hard to tell, but I'll speculate that what underlies Vilsack's quoted comments is the visceral distaste that some people feel for the speech patterns of particular other groups. Such distaste seems evident in Michael Lewis' reportorial obsession with the "booming hick drawl" of Microsoft's lawyer, for example. A classical example is Henry Higgins' exhortation of cockney Eliza Doolittle not to "sit there crooning like a bilious pigeon". The blogger at justoneminute (himself from New Jersey) cites an even more spectacular example:

This isn't all new - if I recall, it was H.L. Mencken who observed that the most effective method of birth control yet invented was a Brooklyn accent.

It's not easy to see just what determines which accents will seem disgusting to someone, and which will seem merely exotic or even attractive. Certainly racism, anti-semitism and so on are part of the picture, but probably not all of it, since these don't seem to explain the widespread prejudice against New Jersey (where I proudly lived for 15 years, myself), or the analogous prejudice in Europe against the speech patterns of Belgians. I've heard spectacularly prejudiced observations from Dutch people about Flemish speech, or from French people about Walloon speech. (Note to Americans: if a French person comments on your command of the French language by observing that they thought you were Belgian, it's probably not a compliment).

Meanwhile, Christie Vilsack's husband Tom has apologized for signing an English-only bill a couple of years ago (which he says was forced on him by a Republican-controlled legislature).

OK, everyone, make a note: if you want to be a politician in 21st-century America, take a linguistics course and learn how to think and talk about dialect variation in a rational way.

Avoid those embarrassing gaffes! You too can learn to define and promote language standards without treating non-standard speech as lawlessness, stupidity, disease, laziness, duplicity or bad posture!

Posted by Mark Liberman at 02:08 PM

IPA in Japan

A couple days ago, Mark Liberman suggested some ways in which the International Phonetic Alphabet might infiltrate pop culture, creating a more universal awareness of phonetic transcription. He might be pleased to know that in addition to British cameras and German beers, such an effort is also underway in Japan and Korea, where IPA transcriptions show up from time to time on product labels and in commercials.

I first became aware of this phenomenon a few years back. One day I was watching Japanese TV, and was startled to see the following commercial: a dog is shown sitting and looking attentively, while a woman's voice off-screen says "I love you..." The dog cocks its head, looks puzzled, and then barks "roof rooooof roof". The screen fades to black, and we see in neat block letters: [aɪ lʌv juː]. This is repeated several times. (If I ever knew what the commercial was actually for, I've forgotten by now.) I asked a Japanese (linguist) friend of mine about it, and she assured me that IPA familiarity is very high in Japan, because it is used in foreign-language dictionaries. (It has apparently been used in English-Japanese dictionaries since the 20s.)

I had thought the fad was pretty much over by now (decorative writing in French and Italian seems to be the new vogue), but recently I encountered the mysterious drink called [woː], manufactured by Kirin (picture above). It comes in Salty Cat, Monkey Fizz, and Bloody Wolf flavors, modeled on various cocktails. If you're curious and can read Japanese, there is a review here.

Kirin is also responsible for a drink called B-flat (bii furatto), which is written simply with the flat sign: ♭ Maybe their marketing division is actually a bunch of former opera singers?

Posted by Adam Albright at 01:54 PM

IPA in beer (and other) ads

A couple of days ago, I speculated in a jokey way about how to promote the International Phonetic Alphabet by direct appeal to the general public. One of my silly fantasies involved brewers starting to label their products phonetically. It seems that in Germany at least, reality is running ahead of fantasy.

Abby Shoun writes, with photos, to tell me that

You might be interested in an ad campaign that was run recently here in southwestern Germany - you just may have gotten your wish! A German beer company used pseudo-IPA (adapted for German speakers) in a series of advertisements to highlight their connection to the local dialect, Schwäbisch, and its cultural associations. Here attached are pictures of two of the posters from the campaign. They've obviously tried to adopt genuine phonological notation (square brackets and length marks), but they've also deviated from IPA proper to make the ads more accessible to German speakers (<w> for [v], <ch> for [x], etc.). ...

(For a while, by the way, it seemed that the campaign had run its course; most of the posters have been gone for weeks. But I recently started noticing new placement of the ads on café umbrellas and beer coasters. Who knows - maybe IPA sells!)

I like the angle of using IPA to emphasize a local dialect connection. Local microbrewers all over the U.S. could take advantage of this idea: "Have a [ˈɓɘeɹ]!" "A beer?" "No, dummy, this here is a [ˈɓɘeɹ]."

A couple of months ago, way back in May -- what is that in blog years? -- Steven Bird noted the use of IPA in marketing an Olympus camera in Europe. Unfortunately the same camera is marketed in the US without IPA, but perhaps the seed of a trend is there.

If IPA labels, logos and ad copy can come to be associated simultaneously with high technology, high style and gritty local authenticity, the battle will have been won.

Now I'm waiting to hear about the use of IPA in hiphop lyric sheets.

Posted by Mark Liberman at 10:39 AM

Kerry's French cousin, and Derrida's obscurantisme or otherwise

I've learned some more from other bloggers about the kerneuropa ("core Europe") idea, which I previously posted about here, here and here. And along the way, about a number of other things as well.

Chris Waigl at ˌser.ən'dɪp.ɪ.ti has composed a long meditation on "Europe’s left-wing republicans and right-wing liberals", full of fascinating digressions, like this one:

John Kerry ... has a French first cousin, the politician Brice Lalonde. On first sight you would imagine these two being cousins fits very well, in political terms: Kerry is the “left-wing” candidate (to be) in the upcoming US presidential elections and Lalonde used to be an early member of the Green party and was Minister of Environment in a left-wing government under president Mitterrand. But that’s not the entire story. Lalonde, who has travelled to Boston to lend (moral?) support to Kerry, left the Green party in the mid 80s and underwent a large shift to the right. He is now an outspoken opponent of linking together environmental issues and traditional left-wing (socialist) positions, supports Jacques Chirac and considers himself politically closest to Alain Madelin.

Lalonde being Kerry's cousin is well known (e.g. here, here, here, here), and it's even alluded to (though in passing and without a name) in a New Yorker article that we referenced here, but none of this had really registered with me, and I certainly didn't know anything about Lalonde's political trajectory.

Trevor at kaleboel writes more briefly that

I'd have thought that one would describe Derrida and Habermas as conservatives not because of any association with any particular dogma but because they both clearly long for times past.

The trouble with that definition, it seems to me, is that history is so diverse that anyone who doesn't have a soft spot for one past period or another must be completely lacking in opinions. I'm personally quite fond of the salad days of the Enlightenment, for example, but I've never felt that this entitled me to be considered a conservative. Anyhow, the particular period for which Derrida and Habermas pine is more recent, according to Trevor's hypothesis:

The 70s, let us not forget, were a period which combined rampant collectivism with reasonable sales for the writings of the Axis of Retrieval.

Axis of Retrieval? I can appreciate the cleverness of the pun, but I'm worried that the content might be going over my head. "Axis of Evil", OK; "retrieval" = "longing for times past", fair enough. But could there be something more specific? According to Google, no -- "axis of retrieval" turns up only an article on some sort of security technology, and a couple on document search. Still, this is Europe, where words don't always mean what us naive Americans think they do, so I'm on my guard.

Trevor goes on to reference John Searle quoting Michel Foucault saying mean things about Jacques Derrida:

Searle: With Derrida, you can hardly misread him, because he's so obscure. Every time you say, "He says so and so," he always says, "You misunderstood me." But if you try to figure out the correct interpretation, then that's not so easy. I once said this to Michel Foucault, who was more hostile to Derrida even than I am, and Foucault said that Derrida practiced the method of obscurantisme terroriste (terrorism of obscurantism). We were speaking French. And I said, "What the hell do you mean by that?" And he said, "He writes so obscurely you can't tell what he's saying, that's the obscurantism part, and then when you criticize him, he can always say, 'You didn't understand me; you're an idiot.' That's the terrorism part." And I like that. So I wrote an article about Derrida. I asked Michel if it was OK if I quoted that passage, and he said yes.

Chris, on the other hand, says this about Derrida:

Derrida has this, well, technique of always pushing two apparently contradictory points at once; and I have often found the results enlightening (his written work may be difficult, but he is a superb, crystal-clear lecturer, which amazed me when I first heard him speak).

The idea that Derrida is a crystal-clear and charismatic lecturer is new to me. It would explain something that I've never been able to understand: how could someone whose works are so, well, obscure have become so famous? You can speculate about the appeal of esoteric knowledge to adolescents, or the effects of pure fashion, but these are desperate post-hoc theories that have no predictive power, as they would apply equally well to any incomprehensible crank at all. I've never heard Derrida speak, and based on my experience with his writing, I would not have gone out of my way to attend a lecture, but now I'm curious about it. I wonder if there are any lectures of his on the web, in video, audio or transcript form...

[Update -- I haven't located any recordings of Derrida's lectures, but some insight may come from the lectures of Jacques Lacan, whose writings are at least as obscure as Derrida's, but whose performance style creates an unmistakable impression of lucidity.]

Posted by Mark Liberman at 09:45 AM

July 26, 2004

Two paradigms of eloquence

Two blogs that I read regularly have recently featured devastating analyses of shameful behavior by mainstream media figures.

Eugene Volokh has continued his effective critique of Slate's Kerryism feature, which is at least as lame as its catalog of Bushisms. He asks:

When someone asks the author of Slate's Kerryisms "Do you have the time?," does he just say "Yes" and walk on? If someone else says "Yes, it's five thirty," does the author condemn the "it's five thirty" as a "caveat" or "embellishment"?

From these rhetorical questions to the peroration,

...it just galls me to see this sort of stuff -- not substantive, not funny, just empty snideness descending into self-parody -- in a magazine of Slate's prominence and quality.

it's a pleasure to watch Prof. Volokh dissect William Saletan's Kerryism of the day, in a small masterpiece of merciless but controlled outrage. The unavoidable conclusion: either Saletan is stupid, or he was forced to create material to occupy a predetermined slot in the magazine by filling in a predetermined form with the first quotation he could lay his hands on.

The supporting testimony of several decades of research on conversational meaning would be a superfluous addition to Prof. Volokh's commonsense analysis. Still, the pairing of Saletan's paint-by-numbers japery and Volokh's insightful reaction would make a good reading assignment for a course on pragmatics -- or one on rhetoric.

Larry Lessig's takedown of Bill O'Reilly over the Jeremy Glick issue is a rhetorical tour de force of a different kind. It's longer, and is supported by no fewer than 16 hyperlinks documenting the facts of the case, O'Reilly's ongoing campaign of abuse against Glick, and the gulf between the two. Against the background of this documentation, Lessig's rhetoric is as strong and as carefully controlled as Volokh's was:

On February 4, 2003, Jeremy Glick was your guest on THE FACTOR. Glick had lost his father in the attack of 9/11. He had also signed an ad criticizing the war in Iraq. You were “surprised” that one who had lost his father could oppose that war. And so you had him on your show, presumably to ask him why. (Here’s a clip from Outfoxed putting this story together.)

You might not remember precisely what you said on that interview, or more importantly, what Jeremy Glick said. So here’s a copy that you can watch. Nor may you remember precisely what the ad that Jeremy Glick signed said. Here’s a copy that you can read. And when you’ve watched what was actually said, and read what was actually written, I’m sure you will see that the statements you continue to make about Jeremy Glick are just plain false. Not Bill Clinton “depends upon what is is” false, but false the way most Americans learned growing up: just not true.

[... 13 bullet points, with hyperlinks, detailing and refuting O'Reilly's accusations against Glick]

I understand how someone loses his temper, Mr. O’Reilly. I have done the same myself. But a decent man apologizes for his lack of control, and he certainly doesn’t continue to abuse someone he has wronged.

Mr. Glick is not the New York Times. He will not earn more money from higher ratings because you attack him so viciously. Neither he nor his widowed mother get any benefit at all from seeing Glick slandered by your show on a regular basis.

You are wrong about the facts, Mr. O’Reilly. And you are wrong to continue to do such harm. Have the courage to admit your error. Apologize to Mr. Glick, and let him go back to a life that has been made difficult enough by, as you said, the “barbarians” who killed his father. This family has suffered enough from barbaric behavior.

It's nice to see that the tradition of Joseph Welch is in good hands.

Posted by Mark Liberman at 10:28 PM

The thin line between error and mere variation, part 4: Do I misspeak?

Much of my recent research has to do with syntactic variation in English (I really must edit my website to reflect this) -- sometimes on details of constructions that are for the most part uncontroversial, sometimes on phenomena that are very widespread but are condemned by some usage manuals, sometimes on relatively infrequent and largely disregarded phenomena. I seem to have specialized in variation that isn't tied in any obvious way to the standard extralinguistic factors (geographical region, class, age, sex, race/ethnicity), although a few of the variables are associated with informal style or with speech as opposed to writing. (To get the flavor of this research, check out the handout for my "Seeds of Variation and Change" paper at the 2002 NWAV conference.)

Now, I'm used to having people, especially non-linguists, respond to some of my data through the lens of rules they've been taught. Being Blinded By the Rules, I call it. It seems that once you've had a generalization about grammar, however spurious, made explicit for you, you can no longer judge language like a normal person; a little learning is a dangerous thing. You may deny that you use some variant -- possessive antecedents for pronouns, split infinitives, stranded prepositions, certain types of "dangling modifiers" -- when in fact you use it with some frequency. You may make tortured attempts to avoid this variant. You will certainly discredit reports that other people use a variant that you don't -- say, Isis ("The problem is is that I don't speak that way"), GenXso ("I'm so not going to talk about this"), or themself ("Everybody should get themself a research project"). You'll be inclined to treat these usages as errors, not as real linguistic variants, that is, parts of somebody's grammar (maybe your own).

But for some time now I've been getting Blinded By the Rules responses from other linguists.

Sometimes, most annoyingly, on variants that I use myself, like "a morphological rule or rules". I'm told that if I'd only think about these examples, I'd see that they were ungrammatical. This is a covert reference to some hypothesis about the structure of English, in the face of which colleagues just deny other people's judgments (and practice). It's just like a prescriptivist confronting someone who says "Me and Kim did it" with the instruction to just think about the sentence; you don't say "Me did it", do you? (You don't say "a morphological rules", do you?) When variants can be associated with recognizable social categories, most linguists know enough not to treat variation as error -- that would be insulting -- but when the variants aren't socially anchored, linguists tend to revert to behaving like ordinary people, except of course that the linguists are in possession of theoretical hypotheses that probably wouldn't have occurred to non-specialists.

If I'd stuck to looking at the details of constructions that are for the most part uncontroversial, there wouldn't be much of a problem. I could continue looking at stranded to (infinitival to stranded by Verb Phrase Ellipsis, as in "They told me to jump, but I didn't want to"; see my "Stranded to and phonological phrasing in English", Linguistics 20.3-57 (1982)) and the Quasi-Serial Verb construction ("I'll go see who's at the door"; see Pullum's "Constraints on intransitive quasi-serial verb constructions in Modern English", OSU WPL 39.218-39 (1990)), and I'd get little grief from my colleagues. (I do, in fact, continue to look at these.) But when I branch out from there, I start to be confronted by collegial disbelief.

This is easiest to respond to for variants that are condemned by usage manuals, like nominative coordinate object pronouns ("Just talk to Kim and I") or themself (the topic of a 2003 Stanford honors thesis by Joel Wallenberg). Every linguist knows that things get in the usage manuals only because a lot of people are inclined to say/write them, so that if a lot of usage manuals condemn it, there must be a "dialect" (in some sense of this word) in which this variant is grammatical, even if this dialect is not easily associated with extralinguistic factors (as these two are not).

After that, we wade into dark waters. Two sample cases, one involving a variant that I don't have (Wh-that, as in "I wonder how many people that were at the party"; see my 2002 article "I wonder what kind of construction that this example illustrates", in Beaver et al., The Construction of Meaning, 219-48), another involving a variant I do (GoToGo, as in "She's going to San Francisco and talk about firewalls", the topic of a 2004 Stanford qualifying paper by Laura Staum). Note that some sort of error was almost surely involved in the historical origin of each of these constructions, so there's room here for the exercise of the etymological fallacy, which would class them as errors rather than variants, but that's not my point here.

Wh-that is not at all frequent -- from ten years of fortuitous collection, I have only 50 examples -- and it occurs mostly in speech, where it could easily be overlooked. Colleagues I showed examples to were at first inclined to dismiss them all as production errors, blend slips, in fact; these colleagues would point out that if you just thought about the examples, you'd see they have an extraneous that (extraneous from the point of view of the usual formulations of the rules for embedded Wh constructions). Quite possibly some of the examples were production errors. But in other cases, a single speaker produced a number of examples, never self-corrected, and (when I was able to question them) judged them entirely acceptable, typically not understanding why I was asking. So I argued that, for at least some of the speakers, we were looking at a genuine variant, not an error.

GoToGo isn't very frequent, either, but it's been around a long time (David Denison, at Manchester, has a big file of examples going back to ca. 1930), and for those of us who have it (maybe 20% or so of modern American English speakers, including me and Ivan Sag, but not my daughter or Tom Wasow, just to choose people from my immediate milieu) it's unremarkable. When challenged on it, by colleagues who said, helpfully, just think about it, you'll see it's ungrammatical (a failure of parallelism in coordination), my first response was, but how else would you say it? (The answer is: "She's going to go to San Francisco and talk on firewalls", with two occurrences of go rather than the one in my version.)

In both cases, my colleagues' response can be unpacked into two parts: (1) this isn't English, because it's not grammatical for me (the colleague); and (2) it can't be English, because it violates a well-known constraint (on embedded Wh constructions, on parallelism of form in coordination). Part (1) isn't a problem for me; everybody's entitled to their own grammar, after all, and if Wh-that and GoToGo aren't your things, that's ok with me. Part (2) is the sticking point; keep your goddamned hands off my grammar. If my grammar doesn't obey the rules you cite, then either you've formulated the rules wrong (this is, I think, the case for certain instances of "dangling modifiers", as in "As a parent, my concern is with the children"), or there are special constructions to which your rules do not apply (this is what I claim for Wh-that and GoToGo). If the special constructions were associated with, say, the Pittsburgh area, or working-class speakers in Northern cities, or African American men, linguists would readily admit the reality of the variants, however far from their own grammars they might be. But if it's just some random collection of speakers sprinkled across the geographical and social landscape, then even linguists are inclined to think they're looking at error rather than variation. Sigh.

Now for an even trickier case. The story begins with a posting of mine to ADS-L on 7/9/04: "For fans of the type of nonconstituent coordination referred to in the generative literature as Right Node Raising (RNR), here's an extraordinary example from the 7/8/04 Palo Alto Daily News (p. 12), in a letter from Charles Browning, M.D. of Palo Alto, on health care costs: A 2004 Institute of Medicine report... documents that the uninsured, unable to afford health insurance, have less access to, and receive inferior, care."

I should say at the outset that, for me, RNR tends to be formal, self-conscious, and mostly restricted to writing, though there are certain instances of it that are unproblematic for me in that context ("They gave books to, and in return took money from, numerous customers"). The Browning sentence was notable enough for me to label it an "extraordinary coordination", something I wouldn't expect to come across in speech rather than writing. Some ADS-Lers agreed with me, but others had their doubts. Larry Horn fired back wryly that same day (7/9/04) with a flagrantly zeugmatic example, suggesting that the Browning sentence was similarly flawed: "And then there are the unconcerned uninsured, who have less access to, but don't really, care." After that, things tended to decline into a parade of My Favorite Zeugma (Flanders and Swann and all that).

And then (still on 7/9/04), from a distinguished colleague I'll refer to as K, there came this e-mail message: "Hey, Larry, with his wit, has made me see that the example is actually completely ungrammatical." K went on to explain that it couldn't be grammatical, because it violated a constraint on reduced coordination, namely that the shared element had to be assignable to a single category, but that "care" in "have less access to care" was a NP while "care" in "receive inferior care" was only a Nom (depending on your religious beliefs, you can substitute, for NP and Nom, either N" and N' or, oh alas, DP and NP). There is actually a very subtle point here, having to do with K's implicit claim that "care" in "have less access to care" is a NP but not a Nom, but let's not get hung up on that; what so annoyed me about K's response was that it simply denied my grammaticality judgments and cited, in defense, a hypothesis about the analysis of RNR -- the (1)-(2) punch above. K and I have failed to come to any sort of understanding about this.

It gets worse. On 7/11/04 I wrote to K: "by the way, what the hell licenses "a morphological principle or principles" 'a morphological principle or morphological principles'?" This was a matter of some concern to me because I'd just used this very phrase, quite unreflectingly, in a Language Log posting, and then noticed the odd way in which modifiers were distributed across the conjuncts. True to form, K replied, still on 7/11/04: "I don't regard "a morphological principle or principles" as grammatical. I think it sneaks by under the radar like "if you have or would apply"...." That is, it's an error; if I'd only thought about its form, I would have seen that it violates a condition on reduced coordination. Later that day I defended myself: "It's perfectly fine for me, and parallel examples (which I find acceptable as well) are easy to find. In fact, "a specific/particular person or persons" and "another person or persons" are abundant in legal and administrative contexts."

During this exchange, K resurrects another coordination puzzle, from three months before: "If you have or would apply..." On ADS-L on 4/11/04, I reported that a grad student had sent me the example (from a friend in e-mail) "I could (and have) watched people play that game for hours." Here there are two coordinated auxiliary verbs, requiring two different verb forms (base and past participle, respectively), which should (according to some hypotheses about reduced coordination) result in an irresolvable conflict, and ungrammaticality -- but, instead, the form required by the nearer conjunct (the second one) determines the verb form. Similar examples can be found in more elevated contexts:

(Ex 1) There are plenty of venues at which Mr. Chirac could, and has, demonstrated his rapport with Mr. Schroeder. (New York Times editorial, “Playing Politics with D-Day”, 1/19/04, p. A20)

(Ex 2) To the Editor: The United States government's attempts to manipulate the world price of crude oil by increasing gasoline taxes or by releasing oil from the Strategic Petroleum Reserve have and always will fail. (Richard J. Stegemeier, "retired chairman and chief executive of Unocal Corporation", New York Times letter, 5/27/04, p. A30)

As I reported on ADS-L: "The grad student suggested I do a Google search on "could +and have" -- which yielded over 8k hits (without going into Google groups)! Some of these are irrelevant, of course, but, still, the number is huge. Actually, "would +and have" netted over 17k hits. Even "might +and have" got about 2k. So there's a hell of a lot of determination by the nearest going on in unguarded writing. Agreement with the nearest is ridiculously easy to collect from unscripted [English] speech, though most linguists seem to treat it as a performance error. [It's well-known that agreement with the nearest is grammaticalized in certain contexts in some languages.] I'd imagine that government by the nearest is also common in speech, given how incredibly frequent it is in writing. I'm tempted to suggest that government by the nearest conjunct is in fact the rule for vernacular English -- which would explain why it's so hard to teach people to avoid this construction in formal writing."

But K insists on treating government by the nearest as an error, not a variant. After all, it's not in K's grammar (or mine, for that matter), and it violates a generalization about how reduced coordination works.

One brightish note: when I sent K a note on 5/15/04 about "I have had my car washed and hair cut" (from a postcard I'd just written to a friend), which has yet another sort of odd distribution of a modifier across conjuncts, K merely confessed puzzlement about its analysis -- undoubtedly because K found it grammatical (as I do).

Still, I sometimes despair about getting colleagues to take my research program on syntactic variation seriously -- as research on syntax, not on speech errors.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 09:13 PM

You've come a long way, baby

This link says it all.

(By way of Andy Kehler.)

Posted by Eric Bakovic at 02:58 PM

Europeans are Republicans (except for the ones that aren't)

Fernando Pereira has emailed a couple of clues to help clarify European political nomenclature for me.

The term "liberal" in continental Europe mostly means "market liberal," someone who is for reducing the role of the state in economic and social affairs. Like cutting welfare, reducing employment protections, facilitating transnational investment, ... In contrast to US "conservatives", Euro liberals tend to be social libertarians too. For example, the current ruling coalition in Portugal is a combination of market liberals and social conservatives. This may not be so different from the Republican party, come to think of it, except that the market liberals have more relative power in the coalition than the social conservatives.

The term "conservative" in continental Europe tends to be associated with socially conservative, petit bourgeois positions, including concerns with immigration and assimilation, preservation of state funding of religious education, and support for small businesses and farms. Because of this, Euro conservatives tend to support strongly certain aspects of state power (farm subsidies, mandatory religion in schools, law and order). Not so different from some factions of the Republican party.

OK, I get it -- politically speaking, all Europeans are like Republicans?

Fernando clarifies:

Not quite :) There are all of those pesky social-democrats and socialists (communists are out of fashion), not to mention a fair sprinkling of greens.

Well, I'm exaggerating my naiveté here, if only a little. But I'm still puzzled about the kerneuropa business. When Jan Leidecker wrote that "we completely disagree with the whole notion of a European core that conservatives want to create", did he mean Euro-conservative in the sense given by Fernando above? I should have thought that such conservatives would subscribe to narrower forms of nationalism, though perhaps I'm out of date. And is someone like Jan likely to be opposing the kerneuropa idea from the side of wanting less European integration, or more? Or both?

I'll find out soon enough what Jan Leidecker thinks, by reading his web site and by asking him about it via email. But I'd like to understand the landscape that he lives in.

[Update: Jan wrote with very clear explanations, both of the history of the term and concept kerneuropa, and also of his own perspective on it, which I've quoted below with his permission. Key quote: "The 'discourse' on European Integration and European cultural identity has many different layers. It cannot easily be understood under a right/left scheme."

First, you've got to understand that the term "kerneuropa" has a history that goes beyond its use by Habermas. The term was invented by Wolfgang Schäuble and his political ally Karl Lamers in 1994. Schäuble at the time was a central figure of the German conservative (centre-right) Christian Democratic party. As you might know, for a decade or so, Europe was gripped by a debate about political and economic integration. Several treaties, most important the "Maastricht", "Amsterdam" and "Nice" treaty defined this process. The most important achievements were the euro, the European free market and a great number of other things. All of this was overshadowed by the perspective of integrating middle and eastern Europe into the EU. Even further away at the horizon, there is still the unsolved question of the Turkish wish to enter the European Union.

I am generalizing now: Schäuble and others proposed a new model of integration. "Kerneuropa", or the core of Europe, means nothing less than that a few nations should integrate faster than the others (Schäuble and others usually refer to the founding member of the European Community: Germany, France, Belgium, Netherlands, Luxemburg and Italy). He and those lobbying for this concept did not believe all countries could integrate politically at the same time at the same speed. Furthermore the core should have consisted out of the economically successful, rich countries only. Therefore this concept has also been called the "two speed Europe". I have always been convinced that this would lead to a disintegration of the European integration process, a process that I for too many reasons to explain here, strongly support. The concept died pretty soon, after it became clear that the majority of other countries did not want this two-speed Europe to happen. Nevertheless, the word and the idea survived.

The term has also a strong cultural dimension: The people that supported the "kerneuropa" idea believed that there are cultural barriers that could and should not be overcome between the countries at the "heart" of Europe and those at the fringes, especially "underdeveloped" Eastern Europe and the "islamic" Turkey. The German Christian Democrats are strong opponents of a Turkish membership of the European Union, because they believe - and argue - that the EU should be a "christian" community, the "christliches Abendland".

Therefore, this has been a line of political conflict between the German left and right: those who do believe a European Integration should be possible despite cultural differences and those that want to split the European countries into different camps, according to their cultural origin.

"Kerneuropa" had a strong revival after the European heads of state initially failed to agree on a European Constitution in November 2003. That's when some friends and I secured the name and created our weblog. BTW, I am glad you liked the Latvia post :)

Habermas and Derrida's manifesto came pretty much as a surprise to the European left and it was at the time in 2003 seen as an intellectual accident and a political disappointment. It has to be read in the context of the time (February 2003) immediately before the Iraq war. At that time 8 Eastern European heads of state had pledged their support for the war in Iraq. Habermas and Derrida answered this with a call for a "European Identity" and for a political integration of those countries that opposed it. Though I myself opposed the war, I consider this as a pretty weak argument. At the time there was a lot of talk about how the "European Identity" might, or should be shaped by a separation from the United States - remember the debate on the multi-polar world.

The "discourse" on European Integration and European cultural identity has many different layers. It cannot easily be understood under a right/left scheme. I hope I clarified some of the questions without raising too many others.

And no, we're not becoming a political weblog. This is all about lexicography :-). And maybe orthography -- have you noticed that Huntington and Habermas both start with "H"? ]

Posted by Mark Liberman at 12:13 PM

Habermas the conservative?

I'm glad that Sally Thomason is in Hamburg, because I need help. I've just gotten a nice note from Germany, but the more I think about it, the more confused I get. Visiting the University of Hamburg, Sally is in a good position to do some linguistic anthropology to map out the intellectual territory that I'm lost in.

Here's the note:

I am the owner webmaster of kerneuropa.de. I noticed you mentioned my website in your blog on Habermas, though you mentioned you were not sure whether our site has anything to do with the kerneuropa debate started by Habermas and others. Some friends and I secured the domain because we completely disagree with the whole notion of a European core that conservatives want to create: an idea that has unfortunately been endorsed by some German intellectuals. We wanted to counter this idea with a weblog on European Politics and Sports. Therefore, we spent most of June and July on the Euro 2004. Anyway, I just wanted to thank you for noticing us, since you were our #2 referrer url (only beaten by google).

Best regards,

Jan Leidecker

[hyperlinks added]

I wish Mr. Leidecker and his associates well. At least I think I do. How can you not like a site that has posts on sports entitled "Portugal wird Europas Meister" [ translate] and "Latvia, oder die Band auf der Titanic spielte auch bis zum Schluß" [ translate]?

But the thing is, I have to confess that I find European intellectual politics baffling, and so I'm not completely sure what I'm supporting here. Is Habermas being viewed as a conservative? or is he just one of the German intellectuals who have unfortunately endorsed a conservative initiative? Are "conservatives" and "German intellectuals" disjoint sets of people, who occasionally make alliances for convenience? Is the kerneuropa development that motivated Leidecker to get started the same Habermas statement that I blogged about, or something else? Is the idea of "core Europe" being rejected because of opposition to all regional identity politics, or because of support for more local regional identity politics?

Let me remind American readers (and any others who may be puzzled) of what we're talking about. As Nader Vossoughian put it, introducing his January 2004 interview with Richard Wolin:

On May 31, 2003, philosophers Jacques Derrida and Jürgen Habermas published a joint statement in Germany's Frankfurter Allgemeine Zeitung and France's La Liberation, calling for the formulation of a common European foreign policy in order to "balance out" US global hegemony. A greater show of solidarity between the members of "Kerneuropa" or "core Europe," they contend, and the empowerment of intergovernmental organizations like the UN, the World Bank, and the IMF, is the only way to contain (and perhaps combat) the recent "pre-emptive" foreign policy initiatives of the United States.

So is this the "European core that some conservatives want to create"? or just evidence that the "idea ... has unfortunately been endorsed by some German intellectuals"? What does "conservative" mean in this connection? Is it the old meaning of "someone who wants to prevent change", or the new meaning of "someone who wants to change things"? Or one of the many other current meanings, such as "someone who generally dislikes government intervention", or "someone who supports the wealthy and powerful"?

Mr. Leidecker has kindly written to me in English rather than in German, so perhaps he has tried to translate some European political terminology into its American equivalent. Another piece of evidence here is that kerneuropa.de links to Daily Kos and politics1. If so, I'm afraid that the translation didn't work. In the US these days, "conservatism" spreads its tent widely, over many values of many social and ideological variables, but I don't think you could stretch it to cover Habermas.

Of course, I'm in no position to complain about the irrationality of European politics, being myself a citizen of a country in which Ralph Nader has been variously allied with Pat Buchanan, Fred Newman and Lenora Fulani, and the Oregon Republican Party. But the sad fact is that politicians -- and other humans -- are like words. Their meanings shift in ways that make local sense but combine over time to create bizarre large-scale trajectories. And some politicians, like some words, are what Roman Jakobson called "shifters" -- their meanings involve contextual references that land on completely different things on different occasions. Nationalists, for instance.

Anyhow, I look forward to more information, in the naive hope that it will make me less rather than more confused.

[For what little it's worth, the pattern {Habermas conservative} gets 14,500 Google hits (though many are things like "Habermas criticizes the conservative nature of ..."), while {Habermas liberal} gets 38,300.]

Posted by Mark Liberman at 10:20 AM

Famous Authors Debate Oral vs. Written Language

...in a display window along the Rothenbaumchaussee in Hamburg (where I'm currently visiting the Sonderforschungsbereich 538, "Mehrsprachigkeit", at the University of Hamburg). The display in the window, which belongs to the Redaktionsbuero Udo Pini, is striking: it's a collection of antique writing machines -- not all of them typewriters, one or two look like possible early competitors of the typewriter -- with enlarged-type pages of quotations sticking up from their rollers. The texts on the pages all have to do with writing, and two of the quotations might have come from a fantasy debate. First, Goethe:

"Schreiben ist ein Missbrauch der Sprache, stille fuer sich lesen ein trauriges Surrogat der Sprache."

This translates to 'Writing is a misuse of language, reading silently to oneself a sad surrogate for language.' On one of the other quotation sheets in the window, Goethe explains that he never actually does much writing, because he dictates his works. Who knew? (I'm puzzled about the spelling of Missbrauch, which on the sheet has the old German "s-z" letter that looks sort of like a Greek beta. If I ever understood its use fully, I no longer do; why wouldn't there be just a plain s in this word?)

Second, E.M. Forster, who presumably wrote this in English, but I don't know what his exact English words were:

"Wie kann ich wissen, was ich denke, bevor ich lese, was ich zu sagen habe?"

This means, 'How can I know what I think before I read what I have to say?'

For those of us who write for publication, though, a third quotation in the display may mean even more than Goethe's and Forster's words of wisdom. This one is attributed simply to 'Volksmund' (that is, it's a folk saying):

"Wer schreibt, der bleibt."

This is hard to translate gracefully, but it means roughly "He (...s/he...) who writes, lives on." A familiar optimistic thought, but the German saying gives it an especially neat turn of phrase.

(I have written this entire post without, I think, falling once into the trap of typing z instead of y, although the German keyboard tempts an American to do that constantlz I mean constantly. )

Posted by Sally Thomason at 10:08 AM

Ms. Frizzle, meet Claude Shannon

This post has been entered using Dasher. It's a little tricky, but with practice I could get good at this, I'm sure. Dasher is a really nice idea. I don't know what kind of uptake it's getting, but it deserves to succeed. It should be useful for people with various kinds of disabilities, and for people limited to tablet/stylus text input. Its developers have documented up to about 30 wpm with a mouse, and 25 wpm with an eyetracker. With some refinement and practice, it might even turn out to be better than typing.

Well, maybe not. Anyhow, I've stopped using Dasher and gone back to typing. But the idea of using dynamically recalculated probabilistic autocompletion, based on an adaptable language model, is a terrific one. It's another nice idea to navigate through that space using up/down for the paradigmatic dimension and left/right for the syntagmatic one. And it's sheer Borgesian genius to describe the concept this way:

Imagine a library containing all possible books, ordered alphabetically on a single shelf. Books in which the first letter is "a" are at the left hand side. Books in which the first letter is "z" are at the right. ... The first book in the "a" section reads "aaaaaaaaaaaa..."; somewhere to its right are books that start "all good things must come to an end..."; a tiny bit further to the right are books that start "all good things must come to an enema...".

When someone writes a piece of text, their choice of the text string can be viewed as a choice of a book from this library of all books - the book that contains exactly the chosen text. ...

By looking ever more closely at the shelf, the writer can find the book containing the text he wishes to write. Thus writing can be described as zooming in on an alphabetical library, steering as you go. ...

This is exactly how Dasher works, except for one crucial point: we alter the SIZE of the shelf space devoted to each book in proportion to the probability of the corresponding text.
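The shelf-space idea in that description can be sketched in a few lines of Python. This is only an illustration, not Dasher's actual code: the function names and the toy model are hypothetical, and the toy model ignores its context, whereas Dasher is described as using an adaptable language model that would be consulted afresh at every step.

```python
from fractions import Fraction

def allocate_shelf(probabilities, left=Fraction(0), right=Fraction(1)):
    """Split the shelf interval [left, right) among the symbols, in
    alphabetical order, giving each a width proportional to its probability."""
    total = sum(probabilities.values())
    width = right - left
    intervals = {}
    cursor = left
    for symbol in sorted(probabilities):
        share = width * probabilities[symbol] / total
        intervals[symbol] = (cursor, cursor + share)
        cursor += share
    return intervals

def zoom(text, model):
    """Zoom in one symbol at a time; return the stretch of shelf that holds
    every book beginning with `text`."""
    left, right = Fraction(0), Fraction(1)
    for i, symbol in enumerate(text):
        # Ask the model for next-symbol probabilities given what is written so far.
        intervals = allocate_shelf(model(text[:i]), left, right)
        left, right = intervals[symbol]
    return left, right

def toy_model(context):
    # Hypothetical three-letter alphabet with fixed probabilities (an assumption
    # for illustration; a real model would adapt to the context).
    return {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

left, right = zoom("ab", toy_model)
print(left, right, right - left)
```

The width of the final interval is the product of the probabilities of the symbols chosen, so likely texts occupy wide stretches of shelf that are easy to steer into, and unlikely ones occupy slivers.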

But couldn't they do better with the presentation? The colors vary among shades of yellow, green, pink and blue, plus black and white, kind of like a 1950s motel lobby. And the space is represented using unattractive overlapping rectangles that don't give me any sense of an inhabitable landscape.

Compare the base26 interactive visualization (due to toxi) of the space of four-letter English words that I discussed here. That visualization is limited to one serial position in the simple and static universe of four-letter words, with no practical application in view, but it gives a sense of a space that you could sail around in, and (perhaps for the same reason) it's interesting to watch and to interact with, even though it's not good for anything. Using Dasher, or watching someone use it, doesn't convey much of the sense of the organic form of the word space that toxi's visualization does. At least not to me.

The "base26" style of visualization presumably uses a lot more graphics processing, but that shouldn't be an insuperable barrier as time passes. I certainly don't mean that the base26 mode of mapping the word space could or should be adopted exactly and in detail, and I certainly haven't thought through the problem of adapting that style of representation to a Dasher style of interaction. But I think it would be neat to maneuver through a base26 type of space with a Dasher type of control, kind of like The Magic Schoolbus meets Claude Shannon.

Really, it'd be great to have Ms. Frizzle at my side as I write... "take chances! make mistakes! get messy!" But I digress.

There's no daring and no messiness in the WORDCOUNT application by Jonathan Harris from FABRICA, the "Benetton Research and development Communication Centre" in Treviso, Italy. It's presented as "an artistic experiment in the way we use language", based on a frequency list of words from the British National Corpus. Maybe I'm being dense, but this experiment doesn't do anything for me. It lacks Dasher's ugly colors and shapes, but nothing about it seems visually memorable or interesting. The description calls it "minimalist", but sometimes that's just a word for "dull". As for the content, WORDCOUNT doesn't seem to show anything beyond what a simple textual frequency list shows. In contrast, base26 really seems to present something about the informational structure of the orthographic word space that isn't trivially equivalent to a simple list of words and counts.

[link to WORDCOUNT from Abnu at Wordlab]

Posted by Mark Liberman at 12:00 AM

July 25, 2004

IPA in the USA

No, I don't mean India Pale Ale, I mean the International Phonetic Alphabet.

A few days ago, I wrote about opera singers using IPA to learn foreign-language lyrics. Several readers have confirmed that IPA is a standard part of the curriculum at music schools. For example, Richard Alderson emailed:

My son, Rich Alderson, sent me your posting. Rich, of course, is the linguist in the family, but he knows that my work as a voice teacher over the years has included use of the IPA.

Thank you for your posting on the LanguageLog blog, correcting the mistaken impression that music schools do not teach the IPA. Of course, applied voice teachers around the world have used the IPA for decades to teach singing diction. For more than fifty years a major advocate for such instruction has been the National Association of Teachers of Singing, and music schools and departments have long included the IPA in their applied curricula.

Why the system has not caught on with the general public is anyone's guess.

I've always thought that the problem was the educational establishment, not the general public. But this does raise an interesting point -- maybe we should bypass the schools and look for solutions from religion, government and popular culture. Some possible futures, looking past the opera demographic:

There's a new spiritual discipline based on contemplating the mystic relationship between signals and symbols. Madonna gets tattooed with a spectrogram of the syllable ॐ , annotated in IPA as [ɛɔm] .

Some hiphop artists decide to print their lyrics in IPA. English orthography doesn't fit anyhow. IPA is the raw way to write.

Brewers are persuaded that their IPA needs to be labelled with IPA. The fashion spreads to other categories of adult beverage. Congress passes a law making it a felony to teach the IPA to anyone under the age of 21, with the result that every 17-year-old in the country becomes an expert within a year.

Fear Factor adds an event requiring one partner to provide an IPA transcription of the non-lexical vocalizations produced during the next Dumpster Diving episode, while the other partner's performance from the transcription is compared against the original recording, to the amusement of the studio audience.

1337-speak is over -- the next generation of hackers leaves messages in IPA: [ju hæv bɪn o^und dʉdz]

Well, none of these things are likely to happen. But it's still worth thinking about how to create IPA buzz on the street. Maybe we should start with opera fans...

Posted by Mark Liberman at 08:07 PM

The thin line between error and mere variation, part 3: They started to saying

Something I'm always on the lookout for is a kind of speech error, the inadvertent syntactic blend. There are easy cases, and then there are hard ones, where I start by thinking that what I've just read or heard is a slip, but when I Google around, I discover a pile of examples suggesting that some people have grammaticalized the construction; what almost surely began life as a slip has now been installed in the grammars of some speakers as one of the available options.

Case in point: someone who reports on the radio that "economists started to saying that..." I took this to be a simple slip, blending "started to say" and "started saying". And maybe it was, in this case. But it looks like, for some people, "to V-ing" is now a third scheme of complementation for "start" (and perhaps other verbs).

As background, a few cases I think really are inadvertent slips:

(1) It really becomes down to a question of... (commentator on local elections, NPR's Morning Edition, 2/26/04) ["becomes a question of" x "comes down to a question of"]

(2) He'll have a lot of more competition. (posted to ADS-L, 7/15/04, from NPR's All Things Considered, 7/14/04: cycling expert talking about Lance Armstrong in the Tour de France) [presumably: "a lot more competition" x "a lot of competition"]

(3) ...look at the power of art has to... (posted to ADS-L, 6/25/04, from NPR's Fresh Air, 6/25/04, Alain de Botton in an interview) ["look at the power of art to..." x "look at the power art has to..."]

(By the way, these slips could use some serious study. I wouldn't want to give the impression that the blending just involves strings of words. But that's a topic for another day.)

And then, the precipitating example:

(4) ... economists started to saying that... (interviewee on PRI's Motley Fool Radio Show, 7/25/04)

I caught this because it sounded odd to me. On the other hand, it didn't sound like total Martian. Maybe the speaker slipped. But maybe he was using a construction that was in his grammar but not mine. Maybe, even, he was using the V-ing form to indicate a semantic subtlety -- perhaps the continuative sense associated with progressive V-ing forms -- that I didn't get.

So I Googled. There were about 35 relevant Google web hits for "started to saying", 5 google groups hits; about 20 web hits for "start to saying", 4 groups hits; and 5 web hits for "starts to saying", no groups hits. Not too shabby. I tried some other complement verbs: "started to going" got about 300 relevant web hits, about 130 groups hits; and "started to thinking" was similar (about 300 web, about 150 groups). Not shabby at all. It began to look like I was once again contemplating the thin line between error and mere variation (see part 1 and part 2). Look at some of the examples:

(5) Q: I broke up with my ex-boyfriend for various reasons, one of which was his drug habits as he had started to going to clubs and doing pills (he's 22). (www.channel4.com/health/microsites/ H/health/magazine/drugs/youask_ex.html)

(6) However, that wore thin quite quickly - I started to thinking of ways of integrating much deeper layers of meaning. (www.shaav.com/art/art-intro.html)

These are easily seen as continuative (and I find them relatively easy to understand). Examples involving repetition, but over short periods of time, are easy to find; these I find a bit harder to understand:

(7) Then some big dude came over and backed me into the corner. Where I immediately started to saying "I thought she was legal! I thought she was legal! (www.ubersite.com/m/28836)

(8) The walls started receding back and forth. And then the ceiling started to going down and going up. And that hard, cold cement floor started to shaking. (seattlemedium.com/News/article/ article.asp?NewsID=39654&sID=36)

And then there are pure inception examples, which strike me as very weird indeed:

(9) ...Cook campsite, after that my sleeping bag started to getting wet and i decided to move over to the YHA hostel after the 48 hours of rain. (www.bikeforums.net/archive/index.php/t-22925)

So maybe the semantics thing is just (mostly) a red herring. Maybe some people simply have three constructions -- V to V:Base, V V-ing, V to V-ing -- where most of us have only the first two. Maybe some people have a semantic distinction and others don't. Lots of things are possible. Here's a paper topic for linguistics students!
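For anyone who takes up that paper topic, a first pass at sorting raw search hits into the three schemes might look like the sketch below. It is a rough illustration under loud assumptions: the pattern names and the `classify` function are hypothetical, only forms of "start" are handled, and the V-ing test is purely orthographic, so a base form that happens to end in "-ing" ("started to bring") will be mislabeled and every hit still needs checking by hand.

```python
import re

# Forms of the governing verb "start" (other verbs are left out of this sketch).
START = r"\b(?:start|starts|started|starting)"

# Order matters: test "to V-ing" before "to V:Base", since the base-form
# pattern would also match a complement that ends in -ing.
PATTERNS = [
    ("V to V-ing", re.compile(START + r"\s+to\s+(\w+ing)\b", re.IGNORECASE)),
    ("V to V:Base", re.compile(START + r"\s+to\s+(\w+)\b", re.IGNORECASE)),
    ("V V-ing", re.compile(START + r"\s+(\w+ing)\b", re.IGNORECASE)),
]

def classify(sentence):
    """Return (scheme, complement verb) for the first match, or None."""
    for label, pattern in PATTERNS:
        match = pattern.search(sentence)
        if match:
            return label, match.group(1)
    return None

for example in [
    "economists started to saying that",
    "economists started to say that",
    "economists started saying that",
]:
    print(example, "->", classify(example))
```

A sorter like this only counts strings; it says nothing about whether a given speaker has the third scheme in their grammar or simply slipped.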

In any case, there's a lot more variation going on with the government of forms of complement verbs than most scholars of English think. For some entertaining moments, check out the verb "try" with various base-form complements: "try start", "try go", "try say", and the like. Googling will pull up a lot of junk, but also a respectable number of examples from what appear to be competent writers who are native speakers of English. Further study is called for -- like, examining the productions of these speakers over significant amounts of text, interviewing them, etc. -- but I don't think such examples can be dismissed as mere errors just because the people who produce them are in a fairly small minority.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 06:34 PM

The straight poop on Fiiijb

The mystery is solved! Yesterday I came across some OED citations like this:

1579 HAKE Newes out of Powles (1872) Fiiijb, O wylie wincking wyzard Woolues.

and wondered, "what in the world does 'Fiiijb' mean?" Well, now I know.

Here's an authoritative answer from Jesse Sheidlower:

It's the foliation number, or the reference to the particular signature. Medieval and Early Modern books or manuscripts often did not have continuous page numbering, but rather foliation numbers, which had the general form of an initial capital letter indicating the particular quire (F being the sixth, expectedly), the leaf within the quire in Roman numerals ("iiij" being the fourth, with the "j" substituting for "i" in terminal position), and the recto/verso indication ("b" indicating the verso, the recto usually being unmarked).

So anything marked "Fiiijb" in OED, or in other un-Google-indexed bibliographical references, would be the back side of the fourth leaf of the sixth folio of the book/manuscript in question. If you imagine other references, you'll find lots of them--there are 264 examples of "Biij", for example, and 173 of "Cijb", and so on.

OED indicates the foliation number whenever necessary, either because (1) there's no continuous pagination, or (2) the continuous pagination is inaccurate, which is quite common. The foliation number is usually accurate because otherwise the text would never be bound up properly.

Best,

Jesse Sheidlower
Principal Editor
North American Editorial Unit
Oxford English Dictionary

P.S. I've long admired your posts on LL, but have never said so directly before, because there hasn't been an inducement like a FREE YEAR OF LL! Wow! So I'll say so now.

(Some further definitions, diagrams and examples can be found here, and (especially for hand-copied manuscripts rather than printed works) here. Look in the second reference to learn what a "hair/flesh disturbance" is, for example -- I for one wouldn't have guessed.)
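The scheme in the letter above (quire letter, leaf number in Roman numerals with a terminal "j", optional "b" for the verso) is regular enough to parse mechanically. Here is a minimal Python sketch; the names `Signature` and `parse_signature` are hypothetical, and the quire ordinal assumes plain alphabetical order, which matches "F being the sixth" but may not hold for later letters, since printers' alphabets of the period often skipped some letters.

```python
import re
from typing import NamedTuple

class Signature(NamedTuple):
    quire: str         # the capital letter naming the quire
    quire_number: int  # ordinal of that letter (assumes a plain A-Z alphabet)
    leaf: int          # leaf within the quire
    side: str          # "recto" (unmarked) or "verso" (marked with "b")

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50}
PATTERN = re.compile(r"^([A-Z])([ivxlj]+)(b?)$")

def roman_to_int(numeral):
    """Convert a lower-case Roman numeral; additive forms like 'iiii' are accepted."""
    total = 0
    for position, char in enumerate(numeral):
        value = ROMAN[char]
        following = numeral[position + 1:position + 2]
        # Subtract when a smaller value precedes a larger one, as in "iv".
        if following and ROMAN[following] > value:
            total -= value
        else:
            total += value
    return total

def parse_signature(reference):
    match = PATTERN.match(reference)
    if match is None:
        raise ValueError(f"not a signature reference: {reference!r}")
    letter, numeral, verso = match.groups()
    # "j" stands in for "i" in terminal position, so treat the two alike.
    leaf = roman_to_int(numeral.replace("j", "i"))
    quire_number = ord(letter) - ord("A") + 1
    return Signature(letter, quire_number, leaf, "verso" if verso else "recto")

print(parse_signature("Fiiijb"))
```

Variants with a separated letter or an Arabic numeral (such as "E. v" or "E 4") are not handled by this pattern.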

Others who wrote in with essentially the same answer to the Fiiijb question were Steve at Language Hat, Jim at Uncle Jazzbeau's Gallimaufrey, and David Nash.

Lance Nathan took a sort of "new criticism" approach:

Never one to let a mystery lie, I set out to determine what "fiiijb" could mean. Searching the OED for "Powles" as the source of the quotation goes a long way to clearing it up. Every citation for the text (whose name varies in representation--Newes out of Powles Ch. -yd; Newes Powles Churchyarde; Newes out of Powles) includes something etymologically related to fiiijb. The first few are:

Fvj, Giij, Hj, Eijb, Gjb, Diij, Dij, Dijb...

(More or less the same is true of searching for Caxton's "Geoffrie de la Tour", though in some instances the first letter is separated, as in "E. v", and in some what follows is written as a numeral, as in "E 4".)

Alas, I can't find the full text of Hake's poem online (nor Caxton's, which was as far as I looked), and I know I don't have a copy in book form. However, the fact that "O wylie wincking wyzard Woolues" was marked as Fiiijb and occurs in "The syxt Satyr" suggests that the first letter is a section marker. This is followed by a Roman numeral (with a "j"), then an optional b. (And from there, you can write a context-free grammar to generate these things.)

That doesn't completely answer the question. I don't know what the Roman numeral indicates, though if you have the text of the poem you may be able to sort that out.

Since some citations have no apparent Roman numeral ("Bjb", say), I strongly suspect that the "j" is an old way of writing "i" at the end of a Roman numeral--the reason it only sometimes occurs is that not all Roman numerals end in an i. (Hence "Divb" in one citation: no "j" because there's no "i" at the end of the number.) A little Googling confirms this.

So you've got a section marker, a number indicating...stanza? I have no idea, I admit. And then, in some cases but not all, a 'b'. Again, I bet this would make sense if I had the poem in front of me.

This probably doesn't warrant a year's subscription to Language Log, but if it's worth at least a few months, I'll donate it to the Newark Public Library (having already paid in full for the next year myself).

I think it's fair to say that Lance (almost) figured the thing out entirely by reasoning from document-internal evidence. I hypothesize that he took this approach because (like me) he was handicapped by having wasted his youth on mathematics and computers instead of history and manuscripts. Lance sorted out the counting system quite accurately, but was on the wrong track in hypothesizing that the system refers to the author's or editor's divisions of the text, rather than to the physical structure of its production as a book. However, with access to a representation of one of the cited documents, I'm sure he would have gotten that too. I'm impressed -- he should consider a sideline in archeology or cryptology.

Several other people wrote in with the "j for the last i in a roman numeral" business. I also got several extremely creative suggestions that turned out to be not as close to the mark, such as the reader who almost managed to construe the whole thing as an acronym for a term found in the OED's guide to interpreting senses and citations --"Further Information In Index or Bibliography" -- but couldn't quite figure out what to do with the "j". She suggested "Further Information In Index or Joint Bibliography", which shows an admirable combination of gumption and inventiveness, but she was also a scrupulous enough scholar to point out that (unlike the un-jointed suggestion) this lacks specific textual support.

Anyhow, I've awarded a free year's subscription to Language Log to all who wrote in with a piece of the answer (or a sufficiently creative alternative), and another one to the Newark Public Library in the name of Lance Nathan. The rest of you will just have to pay at the usual rate.

[By the way, the full texts of Hake's poems are available from the LION ("LIterature ON line") database ("a fully searchable library of more than 350,000 works of English and American poetry, drama and prose, 131 full-text literature journals, and other key criticism and reference resources"). Many if not most university libraries subscribe to this, and perhaps some public libraries do as well. The full bibliographic record in question, from LION, is

Hake, Edward, fl. 1560-1604(fl.1566-1604) Newes out of Powles Churchyarde Now newly renued and amplifyed according to the accidents of the present time, 1579. and Otherwise entituled, Syr Nummus. Written in English Satyrs. Wherein is reprooued excessiue and unlawfull seeking after riches, and the euill spending of the same. Compyled by E. H.
London
Imprinted ... by John Charlewood, and Richard Ihones 1579

The LION version indicates the pagination (though without page numbers, just with an indication of where in the text the page divisions fall), but not the "foliation".

]

Posted by Mark Liberman at 08:59 AM

July 24, 2004

'My Life', the British way

In an interesting piece in the New York Times today, Edward Wyatt reports that Bill Clinton "authorized changes to a dozen or more passages" of his recent memoir before publishing it in Britain in June. The majority of the changes had to do with differences between American and British libel laws; some of the book's passages about Ken Starr were, shall we say, less flattering in the eyes of Brits than in those of Yanks.

Here's the one example of a changed passage that Wyatt reports. In the American version,

Mr. Clinton speaks of Mr. Starr's "continuing efforts to coerce people into making false charges against Hillary and me, and to prosecute those who refused to lie for him."

In the British version, the quoted passage reads:

"continuing efforts to coerce people into making charges against Hillary and me, and to prosecute those who refused to tell him what he wanted to hear."

Wyatt goes on to (briefly) explain the fundamental difference between British and American libel laws that prompted the changes:

Britain's libel laws are almost the opposite of those in the United States. In Britain the burden of proof is on the defendant, with the law essentially assuming that a published statement is false and requiring proof that it is true. In the United States, however, if the plaintiff is a public figure, like Mr. Starr, he or she must prove both that what was reported was false and that the publisher either knew that or printed the statements with reckless disregard for their possible falsehood.

I'm sure the legal eagles who reviewed all this know what they're doing, but this has made me just a little bit curious about exactly how libel laws work. I imagine two possibilities in this case:

(1) It has already been established beyond reasonable doubt (though not necessarily in a court of law) that Starr coerced people to do things and prosecuted those who refused to do things. However, it has not already been established beyond reasonable doubt that Starr coerced people to lie or prosecuted those who refused to lie. The relevant changes to the passage thus properly separate fact and allegation.
(2) It is easier to defend the allegation that Starr coerced people to do things and prosecuted those who refused to do things than it is to defend the allegation that Starr coerced people to lie and prosecuted those who refused to lie -- easier enough to make the relevant changes to the passage worth the effort.

It seems to me that it is already pretty damaging to Starr's reputation to accuse him of coercing people to do things and of prosecuting those who refused to do things, and if this hasn't been established as in (1) above, then Starr has a solid libel case against Clinton('s publisher) -- at least in Britain. My curiosity is more about (2), though: is the substitution of lie for do things so much more damaging to Starr's reputation that it would worry the British publisher about being sued and losing? I just don't know, but I would think not.

(An almost totally unrelated aside: the title of the NYT article is "Changing His 'Life' to Suit British Law". The single quotes around Life are of course supposed to refer to the title of Clinton's memoir. But the title is not Life; it's My Life. Yes, the My is a possessive pronoun referring to Clinton, and the NYT article title has His more or less in its place. But does that by itself license the omission of My from the book's title, or is the desire for a cute title also necessary?)

Posted by Eric Bakovic at 04:25 PM

Caring less all the time: A variant of the etymological fallacy, and some cautions about the pragmatics-phonetics connection

Mark Liberman (most recently, here and here) has been looking, with increasing skepticism, at Steve Pinker's claim that the idiom could care less is sarcastic in intent and sarcastic in prosody. For current usage, I'm as dubious of these claims as Liberman is.

But if we're right, then Pinker's perceptions of both the pragmatics and the phonetics of could care less are mistaken. How could this happen? I suggest that Pinker is subject to a variant of the etymological fallacy, the idea that the "true" meaning of a word is its historically "original" meaning -- applying here to pragmatics and phonetics, rather than semantics and lexicon.

I also share with Liberman a deep distrust of Pinker's apparent assumption that purposes and bits of phonetics are connected to one another in very simple ways. Instead, I maintain that the usual situation is a many-to-many association, much like what we see in the connection of social meanings (or aspects of a persona) to bits of linguistic form (phonetics included), or the connection of semantics to aspects of syntactic form.

Let's start with the core of Liberman's objections: Pinker maintains that "the expression I could care less 'is not illogical, it's sarcastic.' I agree that the phrase is not a mistake in logic, but I think that Pinker is wrong about the sarcasm. And I'm pretty sure that he was wrong to argue that the melody and stress of the phrase convey -- to those who don't have a 'tin ear' -- that it's being used sarcastically."

In a small defense of Pinker, I note that I have heard occurrences of I could care less with a marked prosody, occurrences that I took to be sarcastic in intent -- in particular, using a positive to convey a negative, and drawing attention to this reversal by phonetic means. On the other hand, I don't hear this often, and in fact don't recall having heard it for some time now. What I hear now, all the time, is phonetically unexceptional productions of an idiom with negative import that happens to contain no standard negative marker.

Now, one plausible hypothesis about the origin of this idiom -- hang on, I'll eventually get to Liberman's alternative hypothesis -- is that it began in sarcasm, both in intent and in prosody. The intent would be easy to miss; sarcasm is notoriously risky, likely to misfire. The prosody, too, would be easy to miss; prosodic contours do lots of things, so that even a remarkable one might be interpreted as just emphasis or foregrounding. The situation is ripe for reanalysis, after which later generations of speakers -- most current speakers -- will lack both the intent and the prosody.

Pinker is maintaining that things haven't really changed. He's fallen into a variant of the etymological fallacy. On the contrary: where something comes from isn't necessarily its description now.

In its textbook manifestation, the etymological fallacy has to do with semantics. People maintain that "decimate" can't mean 'almost entirely wipe out' because it really means 'wipe out one-tenth of'. Or that "since" and "while" can only be used as temporal connectives, not as logical ones (meaning, roughly, 'because' and 'although'), because that was their original meaning. ("Original" is, of course, a moving target here.) What's going on here is a reluctance to recognize change, and that idea can be applied to all sorts of innovations: a [t] in "often" or an [l] in "walk" (note that these things can come around in cycles); the use of past forms in counterfactual conditionals ("if I was your father"); plural subject-verb agreement with "none" ("None of the students were prepared"), because "none" is really "not one" and therefore singular; and so on.

Another manifestation of the etymological fallacy shows up in the way many ordinary people (PITS, People In The Street) think of non-standard, innovative, regional, informal, etc. usages. Many of these usages have their origins in what could broadly be labeled as "mistakes" or "errors" -- via regularization, reanalysis, generalization, hypercorrection, and the like -- and PITS are inclined to see them as errors still, as (inadvertent) failures to attain the correct usage. This attitude towards variation leads to what I think of as the Repetition Annoyance Syndrome, or RAS: PITS are mightily annoyed when speakers or writers keep producing the manifestly incorrect usages, time after time. "There he goes again", they cry out in exasperation, as "She talked with Tom and I about it" is succeeded by "That really pleased Tom and I" and so on, one nominative coordinate object pronoun after another, in what strikes many PITS who abhor this construction as a perverse indulgence in error.

But enough of the etymological fallacy. Let's get back to the origins of could care less. Mark Liberman has suggested an alternative history, via "negation by association", that involves no sarcasm at all. This is an attractive story, but it's not necessarily the whole story; there's no need to claim that idioms and constructions each have a single historical source, and in this case a number of people (Pinker among them, but me too) have reported hearing utterances of could care less that seemed clearly sarcastic in intent (accompanied, perhaps, by a raised eyebrow or a wry smile) and were prosodically marked, so that models were available for the scenario based on the bleaching of sarcasm. We don't have to choose between the negation-by-association and the bleaching-of-sarcasm scenarios. In fact, it seems likely to me that both effects contributed to getting us to where we are now.

A parallel. Consider the Isis construction ("The problem is is that we have to leave now"), last treated on Language Log, I think, here. There are two obvious sources for this construction: (1) the pseudocleft construction of "What the problem is is that we have to leave now" (with restructuring) and (2) a pause and re-start in production, as in "The problem is, um, is that we have to leave now" (with grammaticalization of a production strategy). Opinion on the historical source of Isis is increasingly tending towards the view that both phenomena contributed to its development. Opinion on the synchronic description of Isis, meanwhile, has pretty much crystalized in the view that it doesn't involve a simple encapsulization of either source. For a recent proposal exhibiting both of these views, see Jason Brenier and Laura Michaelis's "Optimization via syntactic amalgam: Syntax-prosody mismatch and copula doubling".

Enough of diachrony vs. synchrony for this posting. What about the claim that there's a "sarcastic prosody"? Liberman is dubious about this, and I agree, whether we interpret the claim as being (a) that there's a prosodic contour devoted to conveying sarcasm (and nothing else, though sarcasm might be conveyed by other means), or as being (b) that sarcasm is (always) associated with a particular prosodic contour (which might have other functions as well). Claim (b) is straightforwardly false, since sarcasm often goes unmarked in any way, sometimes is marked only by gesture or facial expression, and can be marked linguistically by a variety of means. I myself often use creaky voice to flag that an utterance is to be taken in some special way (of which sarcasm is one possibility), and I believe that other speakers sometimes use drawl, a stretching out in time, to flag utterances this way, and that some use falsetto voice. No doubt there are other phonetic resources that could be pressed into service for this purpose.

Claim (a) is subtler. It could, conceivably, be true. There could be a phonetic resource that was used for only one purpose. It's just massively unlikely. The world of phonetic resources is really big, but the world of purposes is really big, including as it does all sorts of stuff: conveying conversational intents (like sarcasm), marking discourse functions (like foregrounding), and displaying social relationships (like intimacy), social identifications (like gender), and aspects of a persona (like flirtatiousness, dependability, authority, or flamboyance). Multifunctionality is pretty much guaranteed.

Much the same is true of other choices from inventories of formal resources: lexical choices, choices of alternative inflectional forms (like past "shrank" or "shrunk"), choices of a particular inflection form (like the present participle or the past participle), choices of one syntactic construction over another. In my current mantra for such things: It's all just stuff. That is, it's all just material that can be invested with some sort of content -- pragmatic, discoursal, social, personal, semantic. Nothing is intrinsically associated with some particular content (using the higher end of your pitch range can convey femininity -- or any number of other things), and even where the associations are conventionalized, as in the associations between semantics and syntax, the usual situation is a many-to-many mapping.

So talk of a "sarcastic prosody" is pretty much sure to be misleading. Though there's nothing wrong with saying that sarcasm can, on occasion, maybe even conventionally, be conveyed through the choice of a particular prosody. Which would be a generous way of interpreting Pinker's original claim. Though then, as Liberman points out, the claim becomes devilishly hard to test.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 04:18 PM

Freedom, piracy and lobbying

In the middle of the 19th century, there was a raging controversy about copyright piracy, and American publishers lobbied Congress repeatedly about it. You may be surprised, however, to learn that the publishers were the pirates, and that for many decades Congress protected their freedom to re-publish and sell works as they pleased -- at least if the authors were foreign.

According to an article by Kevin Baker in the latest American Heritage,

In 1842 there was still no international copyright law, a condition that was stunting American letters and depriving authors on both sides of the Atlantic of a living. Britain was willing to recognize the copyright of foreign writers—but only if their countries reciprocated.

This American publishers adamantly refused to do. Instead, they competed in bribing English pressmen to get early sheets of British books. The sheets were rushed by boat over to the United States, where the jolly pirates churned out cheap editions in a matter of hours.

This was one of the (alas many) things that annoyed Charles Dickens about America. He brought it up a few times during his American lecture tour in 1842, which did not make him popular with editorial writers:

... the Hartford Times bluntly informed him, “It happens that we want no advice on the subject and it will be better for Mr. Dickens if he refrains from introducing the subject hereafter.” Dickens’s biographer Edgar Johnson writes: “Other newspapers asserted that he was no gentleman, that he was a mercenary scoundrel, that he was abusing the hospitality of the United States… . Anonymous letters echoed all these attacks in every key of scurrility.”

Anticipatory plagiarism of the Slashdot hordes?

I should say that I'm a stalwart supporter of free software and of open access to the scholarly and scientific literature. It's normal that over the past 150 years the industry of "content providers" has lobbied on behalf of what they see as their interests. But it's interesting how different their interests have seemed to them at different times. I guess that over in China, there are pirate DVD and CD factory owners who are "lobbying" (which in that context probably means bribing) the authorities to protect their rights. When I first visited Japan, the bookstores were full of pirated editions (mostly from Taiwan, as I recall) of American and European scientific and scholarly books. I suppose that those publishers probably lobbied too, until Taiwan finally caved in and signed up to international copyright agreements, not too many years ago.

In Dickens' case, several publishers added insult to injury by stealing his books and also giving them bad reviews, at least in the case of his American Notes:

James Bennett’s New York Herald pilloried the book as the product of “that famous penny-a-liner” with “the most coarse, vulgar, impudent, and superficial” mind. This set new standards in gall, inasmuch as the Herald had been an active pirate of American Notes. Bennett’s pressmen sold 50,000 copies of the book in two days’ time, without so much as a dime going to that famous penny-a-liner.

Posted by Mark Liberman at 03:27 PM

Fiiijb

Following up on my post about "all's I know", I checked the OED, which was helpful, as it almost always is for questions that can be associated with the usage of particular English words.

The OED's entry for as points to cases where loss of such (in patterns like "such X as Y") leads to as being essentially drafted into service as a "relative pronoun" (though maybe it's more of a "complementizer introducing a relative clause"?):

24. a. The antecedent such is also replaced by that, those, or entirely omitted, leaving as an ordinary rel. pron. = That, who, which. Cf. Norse use of som. Obs. in standard English, but common dial. in England and the United States.

[...some citations omitted]
1525 LD. BERNERS Froiss. II. Pref., The ymages as they used in olde tyme to erecte in worshyp.
1592 SHAKES. Rom. & Jul. II. i. 36 That kind of Fruite As Maides call Medlers.
1603 HOLLAND Plutarch's Mor. 222 To those as have no children.
1645 FULLER Good Th. in Bad T. (1841) 32 It is false that the marigold follows the sun, whereas the sun follows the marigold, as made the day before him.
1747 GOULD Eng. Ants 70 That prodigious Size as we see in many Places.
c1852 Lamplighter (1854) 91 It's he as lives in the great stone house.

The OED also mentions the use of as in introducing sentential complements:

28. Introducing a noun sentence, after say, know, think, etc. Sometimes expanded into as that. Obs. and replaced by that; but still common in southern dialect speech, where often expanded to as how. ...

1483 CAXTON G. de la Tour Fiiijb, I saye not as ye shalle be pryuely and alone one by other.
1578 TIMME Calvin on Gen. 331 It seemeth to be a very absurd reason that he giveth, as that the children of Abram could not be saved.
1689 Tryal Bps. 55 Do you know My Lord Bishop of St. Asaph's handwriting? Not as I know of.
1712 STEELE Spect. No. 508 p. 6 That the Fop..should say, as he would rather have such-a-one without a Groat, than me with the Indies.
1748 RICHARDSON Clarissa (1811) IV. 259 Pray let her know as that I will present her..my Lancashire Seat.
1771 SMOLLETT Humph. Cl. I. 274, I believe as how your man deals with the devil.
1833 MARRYAT P. Simple xiii. (Hoppe) Seeing as how the captain had been hauling him over the coals.
1856 MRS. STOWE Dred xi. 100, I don't know as you'll like the appearance of our place.

This clears almost everything up, I think, except... what in the world does "Fiiijb" mean?

This item occurs 15 times in the OED. Here are a few more examples:

1528 PAYNEL Salerne's Regim. Fiiijb, Suche wynes..amende the coldenesse of complection.

1483 CAXTON G. de la Tour Fiiijb, Ye may eslargysshe yourself to say or do your wylle. Ibid. Iij, God..moueth hym self to pyte and eslargyssheth his misericorde.

1579 HAKE Newes out of Powles (1872) Fiiijb, O wylie wincking wyzard Woolues.

The first person to solve this mystery for me wins a free year's subscription to Language Log.

It should go without saying that Fiiijb is not part of the quotation in any of these examples. For instance, the larger context of the last citation is Edward Hake, The syxt Satyr (from Newes out of Powles Churchyarde, 1579):

297 They talke from feare of check at large.
298 But yet of them there bee
299 That prease amongst professors true,
300 and well with them agree.
301 For why, their lyuings so doe lye,
302 that but they seemed such,
303 They neuer coulde aspire so high,
304 nor yet obtaine so much
305 As now they doe. O Ianus Iacks
306 and double faced Dogs?
307 O wylie wincking wyzard Woolues,
308 O grunting groyning Hogs?

For those of you who may be puzzled by Elizabethan spelling, that's

... O Janus jacks
and double faced dogs?
O wily winking wizard wolves,
O grunting groaning (?) hogs?

In whatever spellings, there's no "Fiiijb" anywhere around in the original, needless to say. And there is no "fiiijb" in Google's index (yet). So Fiiijb must be some OED citational thing, but it's not one that I understand.

Posted by Mark Liberman at 11:31 AM

All's I know is ...

The Language Log bat signal went up in the sky over Strange Doctrines yesterday, where "Tadlow Windsor II" asked:

A blog post by Stuart Buck about his daughter's declination of subject rather than verb reminded me of a generatively unrelated if phonologically similar construction I've been hearing a lot, viz., 'Alls I know is, ...' (or is it 'All's I know is, ...'?).

I'm not sure quite what to make of this. Looks like a job for Language Log.

We're always glad to be of assistance to citizenry in language-related distress. As for "all's I know is...", the linguistic superheroes over at Linguist List handled this one back in 1992, when Mike Carter wrote:

Reply to Ron Smyth on "all's I know...." This looks like a contraction of "all as I know..." similar to "there's some as....." for "there are some who..." a regular feature of my childhood dialect (English, East Kent), and presumably influenced by Scandinavian, where I believe "som" can be both a relative pronoun and the equivalent of "as".

and Scott Delancey expressed a similar opinion:

_All's I know is_ would be perfectly regular in a dialect with _as_ as a relative marker (as in _Them as has, gets_) -- All as (= that) I know is. I've always assumed that this was the analysis; any reason not to think so?

Brian Teaman, on the other hand, said that for him, the contracted form was a frozen idiom without any uncontracted variant:

"Alls I know" is one feature I have in my English that doesn't seem to be common on the East Coast. I must have acquired this in Lorain, Ohio (near Cleveland) where I grew up. Others have pointed out that it struck them as unusual. I think I can credit Peter Patrick for first pointing it out to me.

As for the analysis "All as I know", alls I know is it might be a historical fact but it is not part of my understanding of the form. To me, it was always just "all" with an "s" attached. I absolutely cannot say the full form "as" in this context. And, until now, I don't think I ever wrote it down; perhaps this could be some indication that I recognized it as non-standard, or at least only a spoken form.

while Ellen Contini-Morava offered an explanation for Brian's intuitions:

on "all's I know is...": I had always heard that this was a dialect form of "all", retaining an -s originally derived from German "alles"-- no connection to the copula at all. Is it found outside of areas that were settled by German speakers?

So there's apparently a dialect in which "all's" is a special colloquial form, perhaps influenced by German alles, but the more productive (and I think more widespread) pattern is for as to be used as a substitute for that in certain contexts. Among the contexts that come to mind for me, besides the ones already cited by Mike Carter and Scott Delancey, are

I don't know as I agree.
He's the one as needs to apologize.

As for Stuart Buck's daughter's hypothesis about where English verbal inflections should go

he's want to go
it's make me happy

I agree with Tadlow Windsor II that this is a different thing entirely. And I very much doubt, by the way, that the young lady is really putting verbal endings on the subject -- it's much more plausible that she's just overgeneralizing the location of 's and 'd between subject and verb in phrases like "she's fallen" or "it's raining".

Turning back to "all's I know" and the like, it's not easy to find other examples via Google, but here's one:

( link) Them as can do has to do for them as can't. And someone has to speak up for them as has no voices.

and here's a whole little bouquet, though of artificial flowers:

Them as has, gets.
Them as ain't should have been.
Them as didn't belonged to the wrong pahty.
Them as don't care can go away.
Them as ain't, is.
Them as reads old New England proverbs is teched, I 'low.

By the way, a Google search on "all's I know" turns up the two Linguist List issues that I quoted, as the fourth and fifth items in the list returned -- use the net, Luke!

Posted by Mark Liberman at 07:56 AM

The Cows of Language

You thought the talking dogs were amazing? Well, now here's "Bess the Amazing Talking Cow".

Ok, it's a comic strip, the first in a compilation of Ruben Bolling's "Tom the Dancing Bug's Super-Fun-Pak Comix". Moo. Moo. Moo.

Three panels. Characters, left to right: woman, man, cow.

First panel: Man says, "I've taught Bess a vocabulary of 10,000 words." Bess says, "Moo."

Second panel: Woman says, "So why does she only say "Moo"?" Bess says, "Moo."

Third panel: Man says, "That's all that's ever on her mind." Bess says, "Moo."

What I'm remembering here is Laura Petitto and Mark Seidenberg's reports on their work on Herb Terrace's Nim Chimpsky project. Paraphrasing from memory: what do signing chimps have to say to researchers? "Me banana you banana banana banana banana." Moo.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:06 AM

July 23, 2004

Homiesexuals and guys on the down low

Mark Liberman and Eric Bakovic have discovered the notion of "life on the down low". Here's a version of what I wrote to the American Dialect Society mailing list on 5/1/04 about "homiesexuals" and guys on the DL.

Warning: Some decidedly "adult" sexual content in what follows.

Act 1. I come across "homiesexual", a new word (for me) for a familiar concept (to me):

Noticed in the May 2004 issue of Out magazine, in a fluff piece about brush-off lines for use by gay men in various settings (Hippie Hollow Park, Austin, Texas: "Hell, no! You look like you been rode hard and put up wet." Amoeba Music store in Berkeley, California: "Dude, I'm really, really sorry..."):

Homiesexual bar in the West Village, New York City: "Is you playin'?"

Putting aside the representation of dialect and speech style, this was the first time the almost surely inevitable "homiesexual" 'gay Black' (homey/homie + homosexual) had come past my eyes (or ears). About 7190 hits on Google, though. A great many for Homiesexual, a gay club in Manchester (England England, not New Hampshire). Others for an assortment of Black gay websites, including one for James Earl Hardy, author of (among other things) B-Boy Blues (1994), "a seriously sexy, fiercely funny Black-on-Black love story", as the front cover says; the back cover tells us, in a burst of -in', that the protagonist had "always wished, hoped, and dreamed for a RUFFNECK -- a hip-hop-lovin', street-struttin', cool posin', crazy crotch-grabbin' brotha". Very sweet book, in fact.

End of Act 1. As the curtain opens on Act 2, another ADS-Ler introduces "on the down low":

On May 1, 2004, at 9:36 AM, Nathaniel Thomas wrote:

The term I've heard, mainly from a NY Times article [NYT Magazine, "Double Lives on the Down Low", by Benoit Denizet-Lewis, 8/3/03, with responses in the gay, Black, AIDS, and general media, not to mention an Oprah show], is "on the down low", meaning closeted gay black men who visit gay bath houses (I think). The article is not fresh in my mind, but the essential aspect is that this term does not refer to an openly gay, especially flamboyant, black man, rather one who continues to live in the hip-hop subculture, has a female significant other, and has sex with other men "on the down low".

To which I respond:

That's something else. Homiesexuals are quite clearly Black, gay, out, proud, and identified with ghetto/hip-hop culture. They're not flamboyant, in the sense of outrageous, ostentatiously effeminate, campy, etc.; they're home boys -- queer home boys, but still home boys, with all the displays of hypermasculinity that go along with the homey identity. Hardy starts chapter 3 of B-Boy Blues with a little lecture on this role from his protagonist, who has a jones for these guys:

...[Raheim's] a B-boy -- or banjee/banji/banjie boy, or block boy, or homeboy, or homie, or as MC Lyte tags 'em, "ruffneck." [pages of exposition follow, ending with...] They are the boyz who are the true hip-hopsters, the gangstas, the menaces 2 and of society, the troublemakers, the troubleseekers, the hoods, the hoodlums, the hood-rocks, the MacDaddys, the DaddyMacs, the rugged hard-rocks...

[I continue...] In contrast, guys who are (or live) on the down low, the down-low, or the DL, emphatically don't identify as gay or bisexual and don't necessarily identify as homies (many are middle-class). They're the Black contingent of a group that social services people have come to label MSM, Men who have Sex with Men. MSMs (yes, I know, an awkward plural) in general reject the label gay as an identity label for themselves (which is why the social services folks need another term, if they're going to deal with these guys) and usually reject the label gay for their sexual activity, which they describe as playing with, or hooking up with, other guys. They're not queer, they just like to get it on (or off) with other guys now and then. Like closeted gay men, they don't publicly disclose their interest in having sex with other men, because of strong social disapproval for these activities, but they'd never describe themselves as closeted, because the closet is a gay thing.

When MSMs are willing to talk about the motivations for and satisfactions of their sexual activities with men, they tend to stress the male-bonding aspects of the thing, the celebration, even sharing, of masculinity with their partners (sort of like sports, but with orgasms). In this they are very much like frankly gay men. However, the enormous weight that MSMs put on masculinity tends to make their sex extraordinarily dick-centered, even more so than for frankly gay men. Caressing, kissing, and cuddling are just not on the program for many MSMs, because they're "too queer", too tainted with femininity; for many frankly gay men, on the other hand, displays of affection play a central role in sex.

Living on the DL (described in those terms) is definitely a Black way of being, and most guys on the DL are looking for Black partners. It's a Black-on-Black thing. In part this is because these guys (in common with many African Americans) see "being gay" as a specifically white thing, and in part because in their sexual activities guys on the DL are celebrating not just their masculinity, but their specifically Black masculinity.

I've gone on at such length about these identity categories because I think it's important to try to understand the categorizations that people use for themselves, rather than imposing an external "scientific" classification (based on observable characteristics, like actual sexual activity, or the nature of the objects of desire) on people. This is science too, social science in fact. And, as a practical matter, it's counterproductive to tell people that they are "really" something other than what they believe themselves to be ("really" gay, say, rather than just someone who plays with guys).

I've also tried to describe MSMs (and guys on the DL in particular) in as sympathetic terms as I can manage. Frankly, their way of thinking is just foreign to me, so this takes some work. But then I don't really understand the thinking of people who devote their lives to (as they see it) God's work, or who climb the world's highest mountains, or lots of other people.

In any case, living on the DL has been very much in the news, and there's at least one non-fiction book (which I haven't yet read) about the subject: the best-selling On the Down Low: A Journey Into the Lives of "Straight" Black Men Who Sleep With Men, by J.L. King. [Not an auspicious title. There's the quotation marks around "straight". And the euphemism "sleep with", which is especially off-kilter, since one thing that guys on the DL do little of is actually sleep with other men. According to the (mostly unsympathetic) responses to the book I've seen, King (who has himself renounced life on the DL) is disapproving and judgmental, suggesting that guys on the DL are primarily responsible for the spread of AIDS in the African American community, especially to Black women.] There are works of fiction as well: D.L. Smith, Down Low, Double Life, and of course books by E. Lynn Harris.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 09:16 PM

Down low, quiet tip

I'm probably not much more in the cultural loop than Mark is, but I've heard on the down low (and some of its variations, many mentioned by Mark here) used to refer to pretty much any covert behavior, much in the same way as the perhaps-more-familiar on the quiet tip (or on the QT).

Specific reference to covert homosexual behavior is most likely not as new as the relatively recent attention this has received in books and television might lead one to believe, but reference to other types of covert behavior is at least still salient enough for Scion ("A Marque of Toyota Motor Sales, U.S.A., Inc.") to advertise their hip new cars in a film entitled "On the D.L."

(I'm not saying that Scion wouldn't want to sell their cars to closeted homosexuals, but I kind of doubt that they would willingly use this phrase in an ad for their car if it didn't have other salient meanings. Or, maybe Scion is even cooler than I think.)

On the quiet tip and its variants are apparently also used specifically to refer to covert homosexual behavior, unless I am misunderstanding the title of this magazine. As this article clarifies, however, on the down low is more readily understood in this context:

Kerr tested other street slang that surfaced in his conversations - on the quiet tip, hush-hush, bling-bling - but no word or phrase connected with the audience as solidly as "on the down low."

Perhaps this distinction explains why this lonely prisoner searching for "Women to correspond" is not shy to write:

I like dancing and going to clubs, but I can also enjoy myself on the quiet tip: reading books, going to movies or just talking.

Or maybe this is a covert declaration that he's also searching for men? I'm so out of the loop, I wouldn't even know.

[ Comments? ]

Posted by Eric Bakovic at 05:39 PM

On the DL

A story by Kia Gregory in PW features an expression that was new to me. It's variously written in the article as "on 'da low'", "on the low", and "on the DL". The modifier version is "DL", as in "some DL men", "there are no DL signs", etc. Gregory defines being "on the low" as "having girlfriends, but sleeping with men on the side".

At first I thought that "DL" was an acronym for "da low", so that "on the DL" would be a covert double determiner; but in fact "DL" represents "Down Low". There's a book title that is obviously a version of this same expression: "On the Down Low: A Journey into the Lives of Straight Black Men Who Sleep with Men", and several web sites with discussions of the behavior involved.

Sometimes "on the down low" seems to refer to covert sex of whatever kind. For example, there's another book "Down Low, Double Life" in which the expression seems to refer to heterosexual infidelity.

At least some examples of "on da low" in song lyrics also seem to be associated with covert heterosexual relations:

meet me at da southside
let yah know imma hit that
make sure dat youre daddy just dont know
ill put u in da hoodbug
u no wat we doin cause
we be eatin on da low

meet me at da southside
baby we can go hide
knowin dat muh boys goin have my bakk
only at tha southside girl
southside

Anyhow, the bisexual version of DL was featured on Oprah back in April, so I'm just out of the cultural loop altogether, it seems.

An idea about where the expression might come from is suggested by this usage, in Willie Perdomo's 1996 Poet in Harlem:

That night he went looking for
a poem
he left his electric typewriter humming
on the kitchen table
and ran out to the wide
sidewalks of Lenox Avenue

Aunties sat on their stoop box seats
mixing cheers and gossip
beers on the down low
With arms thrown to the sky
I celebrate a touchdown

This suggests the way that people hold a beer privately in public, so to speak. But whatever the source, the expression is new to me.

[Update: I'm convinced by Eric Bakovic's suggestion that "on the DL" was originally a general term for covert activity. But reader Andy Carvin emailed some evidence that the more specific sexual identity meaning that Arnold Zwicky discussed has been around in popular culture for at least a decade:

Last year in Washington DC a local tv news show did a series on "down low" culture amongst African American men who are married but are secretly bisexual. I thought I'd never heard the term before until I was watching the Top 100 Videos of All Time countdown on VH1, and they played "What a Man" by En Vogue and Salt 'n Pepa. My wife and I both noticed the following lyric:
And although most men are ho's he flows on the down low Cuz I never heard about him with another girl
The song is from 1994, give or take.

But here are the lyrics to that song, and you could also understand them as meaning that the guy is discreet in his connections with other women, if Eric is right. Or you could connect the same lines, as Andy did, to the female version of the attitude expressed in the PW article that started this out for me:

"I'm not with another woman," he says as a consolation to the women he dates. "I'm not being a dog. I'm just fulfilling my needs."
[...]
Eventually Will hopes to find Ms. Right, get married and have kids--ideally a boy and a girl. But with one condition.
"I would never cheat on my wife with another woman," he says, "but I have to have my best friend on the side."

]

Posted by Mark Liberman at 03:49 PM

The best part of a day

Sometimes you read or hear something, and realize that things have changed. You get that "we're not in Kansas anymore" feeling. At a project meeting this morning, one of the grad students was explaining how he'd spent his week:

"...so we ran [Ryan's] gene tagger and compared the results with Yang's list of alternative gene names. It was kind of slow, though -- we did about two million documents, and it took the best part of a day."

The documents were abstracts, so if you do the math, this is only about 12,000 words/second. And he probably ran it on our linux cluster, so you could divide 12K by N for however many nodes he used.
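For anyone who wants to redo the math, here is a rough sketch of one way to get a number in that neighborhood. The two million documents come from the student's report; the other two inputs are my own guesses for illustration, namely about 250 words per abstract and "the best part of a day" taken as roughly 12 hours. Different guesses would of course move the result.

```python
# Back-of-the-envelope throughput check.
# From the post: about two million documents.
# Assumed for illustration only: ~250 words per abstract, ~12 hours of run time.
DOCUMENTS = 2_000_000
WORDS_PER_ABSTRACT = 250   # assumption, not stated in the post
HOURS = 12                 # assumption, one reading of "the best part of a day"


def words_per_second(documents, words_per_doc, hours):
    """Total words processed divided by elapsed seconds."""
    total_words = documents * words_per_doc
    return total_words / (hours * 3600)


def per_node(rate, nodes):
    """Split an aggregate rate evenly over a (hypothetical) number of cluster nodes."""
    return rate / nodes


rate = words_per_second(DOCUMENTS, WORDS_PER_ABSTRACT, HOURS)
print(f"{rate:,.0f} words/second overall")   # 11,574 words/second overall
print(f"{per_node(rate, 8):,.0f} words/second per node, if 8 nodes were used")
```

The node count of 8 is purely hypothetical; the post leaves N unspecified.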

But all the same, when I think about my first experience with computers...

It was on an early PDP-9 with 8192 18-bit words of core memory, whose mass storage devices were paper tape and a magnetic drum whose storage capacity was 256KB, if I remember right... It could do a fixed-point addition in two microseconds -- multiplication was done in software. There was a CRT display driven by a special-purpose vector system. For a while after the machine arrived, it was fairly easy to get console time since there wasn't any software yet.

I wrote a program for measuring durations in speech waveforms.

Posted by Mark Liberman at 02:59 PM

It's about ideas, not words

I understand that George Lakoff will be interviewed this evening on NOW with Bill Moyers, as part of a pre-convention broadcast.

From Moyers' website:

"This Monday when the lights go on in Boston at the Democratic convention, speakers will be center stage and working hard to deliver messages that connect with voters. Analysts say that for two decades conservatives have done a much better job than progressives to frame and talk about their values and some say the convention could be a make or break moment in the election. Do words really have the power to win not just hearts and minds, but votes? David Brancaccio gets a few words on the subject from world-renowned linguist George Lakoff. Dr. Lakoff is a founder of the Rockridge Institute, a new political think tank set up to reframe the terms of political debate to make a progressive vision more persuasive and influential. Lakoff is a professor at UC Berkeley and is the author of 8 books, including the influential MORAL POLITICS: HOW LIBERALS AND CONSERVATIVES THINK and most recently DON'T THINK OF AN ELEPHANT! WHAT EVERY AMERICAN SHOULD KNOW ABOUT VALUES AND THE FRAMING WARS, which is due out next month."

First James Fallows, now Bill Moyers -- at least via ex-Marketplace host David Brancaccio. Good for George.

But there's something funny about that blurb. George's point about how political arguments are framed is not about words, it's about ideas. When he writes that

...there are distinct conservative and progressive worldviews. The two groups simply see the world in different ways. ... these political worldviews can be understood as opposing models of an ideal family -- a strict father family and a nurturant parent family. These family models come with moral systems, which in turn provide the deep framing of all political issues.

he's not talking about vocabulary. He's claiming that there are (at least) two systematically different perspectives on the family, and that the major difference in American politics today is a metaphorical transformation of that opposition.

He could be right or wrong -- or a bit of both -- but he's right or wrong about the nature and role of ideas, not about the nature and role of words.

I think there's a lexical ambiguity that contributes to this confusion. Lakoff talks about "frames", by which he means something like "basic cognitive structures which guide the perception and representation of reality". (The frame terminology is not an original contribution -- it's especially associated with work in sociology by Erving Goffman and others -- though George has his own take on the concept).

So when Lakoff talks about how political debates are "framed", he means (I think) to talk about what frames (in the sense of conceptual structures) underlie them. But the verb to frame has an ordinary meaning "to put into words", and whoever wrote Moyers' blurb seems to have translated George's shtick about how conservatives have done a better job of "framing" their arguments into the rather different idea that "words really have the power to win not just hearts and minds, but votes".

Of course specific words and phrases evoke (aspects of) specific frames. But to say that George is telling us about the role of words in political discourse is like saying that an architectural critic is telling us about the role of building materials in urban culture. There's a layer or two left out.

Posted by Mark Liberman at 09:00 AM

July 22, 2004

La yapita

As a kid growing up in La Paz, Bolivia, I remember using la yapa (less often heard there, at least in my own memory, as la ñapa) in the sense described for lagniappe by Twain. For example, you might be putting sugar in your coffee, and count off the teaspoonfuls this way: "uno, dos, y la yapita" ("one, two, and the little extra bit" -- it was most often used in the diminutive form, with -ita).

I do wonder why the diphthong of the original Quechua form (according to the AHD) was lost, though. People in that part of Bolivia have a lot of Quechua and Aymara words in their vocabulary, and ones with final falling-sonority diphthongs are pretty typical. They even pronounce certain Spanish morphemes with a final diphthong; e.g. hijito ("son, dim.") is often pronounced hijitoy -- the diphthong appears to give the word a little extra bit of the sense of endearment already provided by the diminutive suffix.

[ Comments? ]

Posted by Eric Bakovic at 03:18 PM

Bear-knuckled politics

Jacob Sullum at reason online asks "Do Bears Have Knuckles?"

"From a story by Jonathan Chait in the July 26 New Republic: 'Bush and his allies have been described as partisan or bear-knuckled...' If you've ever seen Yogi and Boo-Boo fight, you know what he means."

In the comments, Phil asks "Are we witnessing the birth of an eggcorn?"

Probably not the birth, exactly:

(link) "Thank goodness actually they're wearing gloves, because I've witnessed bear knuckled boxing in a barn in Somerset, about 3 years ago..."
(link) While is generally recognized that this issue is one of many which underlies the Act 60 debate, the degree of rhetoric, finger pointing and name calling has escalated to a level one would expect in big-city, bear knuckled politics.
(link) Huber and his supporters were somewhat less comfortable engaging in a bear-knuckled internal political struggle for power than Peters' adherents.
(link) In fact, we often disagree on many subjects, leading to a full on, stripped to the waist, bear-knuckled, single-takedown grudge match.
(link) I've always found that part of the NASA Uber Alles crowd to be very troublesome, thinking that, gosh, it is all sweetness and light down there at JSC, when it's a bear-knuckled bureaucracy like anything else.

By the way, Andrew Sullivan quoted the same New Republic passage on July 17 without noticing (or at least without mentioning) the substitution.

Here's what bears really do with their knuckles in political encounters, apparently:

A bear may 'pop' its jaws or swat the ground while blowing or snorting. It may lunge toward you or 'bluff' charge in an attempt to motivate you to leave - usually stopping well short of contact. These are defensive behaviours, signalling you are too close. Remain calm and increase your distance from the bear.

I find that this is good advice in most bureaucracies as well.

Back at reason online, there are some pretty good jokes in the comments on Sullum's post.

Posted by Mark Liberman at 02:25 PM

You want to lose the etymologist vote?

For lagniappe on the whole "big apple" business, Gene Buckley sent in a link to a page at Cecil Adams' Straight Dope site. Adams documents his dialogue with Barry Popik, whom he calls "[b]y day a NYC parking-ticket judge, by night ... an indefatigable word sleuth". Adams' discussion of Popik's position on "big apple" is clearer than what you'll find on Popik's web site, and Adams also discusses Popik's theories about "windy city".

My favorite part?

"... when Popik attempted to notify former Chicagoan but soon-to-be New Yorker Hillary Rodham Clinton of his findings, she blew him off with a form letter--and this from a woman facing a campaign for the Senate. Come on, Hill, quit worrying about the Puerto Ricans and pay attention here. You want to lose the etymologist vote?"

A word to the wise.

Posted by Mark Liberman at 11:57 AM

Michael Moore on back-up vocals at the Aladdin?

According to the OED, the term publicity stunt, meaning an apparently newsworthy event staged to create free advertising, dates from the 1920s:

1926 ‘SAPPER’ Final Count vii. 195 It was just an advertisement -- an elaborate publicity stunt.

Looking this up, I was surprised to find that hype, in the sense of "Deception, cheating; a confidence trick, a racket, a swindle, a publicity stunt", is not attested in print before 1962:

1962 J. BALDWIN Another Country (1963) II. iv. 336 Life is a bitch, baby. It's the biggest hype going.

Anyhow, it seems that future dictionary editions might want to illustrate these entries with pictures of Linda Ronstadt and the management of the Aladdin Theater in Las Vegas, linking to news stories about how on July 17

Aladdin President Bill Timmins ordered security guards to escort pop diva Linda Ronstadt off the property following a concert Saturday night during which she expressed support for controversial documentary filmmaker Michael Moore.

Timmins, who was among the almost 5,000 fans in the audience at the Aladdin Theatre for the Performing Arts, had Ronstadt escorted to her tour bus and her belongings from her hotel room sent to her. Timmins also sent word to Ronstadt that she was no longer welcome at the property for future performances, according to Aladdin spokeswoman Tyri Squyres.

Liz Ditz at I Speak of Dreams suspects that "this whole flap was a publicity stunt by the president of Aladdin to get free publicity for his dive, and to cover up the marketing fiasco that meant the concert lost money". Liz cites pretty good evidence for her suspicions, including a later newspaper story with quotes like this:

Some concertgoers took issue with the Aladdin's accounts of angry patrons tearing down posters and throwing drink cups.

"I was so stunned to read in the newspaper that anyone had a negative reaction," said KLAS-TV, Channel 8, news anchor Paula Francis. "Everyone who was leaving when I was leaving was just thrilled. They thought it was a good concert."

At the end of an hour's worth of singing, "she got a standing ovation, then she came out and did the (`Desperado') encore," Francis said. "There were loud boos and there was quite a bit of applause. But everyone calmed down right away and seemed to enjoy the rest of the encore."

And according to yesterday's Billboard:

Thanks to negotiations today (July 21) between the Recording Artists Coalition (RAC) and the prospective new owners of Las Vegas's Alladin Theater, expect to see RAC member Linda Ronstadt back at the venue this fall -- with filmmaker Michael Moore on backup vocals.

Of course, a guerilla band from the Onion may have taken over Billboard's web site.

By the way, Billboard's "Alladin" for Aladdin does not show the usual preservation of gemination pattern:

	dd	d
ll	7,740	191,000
l	2,350,000	558,000

perhaps because "aladin" brings up lots of non-English sites, project names, etc.
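For readers who want to play with the numbers, here is a minimal sketch that spells out the four cells of the table as full spellings and counts the doubled consonants in each. The counts are copied from the table; the helper name doubled_letters is made up for this illustration, and reading "preservation of gemination" as "a misspelling tends to keep the same number of doubled consonants as the standard spelling" is an assumption on my part.

```python
# Hit counts copied from the table above (row = spelling of L, column = spelling of D).
counts = {
    "alladdin": 7_740,      # ll + dd
    "alladin": 191_000,     # ll + d
    "aladdin": 2_350_000,   # l + dd (the standard spelling)
    "aladin": 558_000,      # l + d
}

def doubled_letters(word):
    """Count adjacent pairs of identical letters (hypothetical helper)."""
    return sum(1 for a, b in zip(word, word[1:]) if a == b)

# List the spellings from most to least frequent, with their doubling count.
for spelling, hits in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{spelling:9} doubled={doubled_letters(spelling)} hits={hits:>9,}")
```

On that reading, the odd cell is "aladin", which drops the doubling altogether and still outnumbers "alladin", which merely moves it.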

Posted by Mark Liberman at 09:59 AM

"He prefers the perverse French esthetes"

Jeff Erickson at Ernie's 3D Pancakes points to Christian Bök's Eunoia (2001), which is named for "the shortest word in English to contain all five vowels." Each of the first five chapters is written using only one of the vowels.

And there's more, such as a chapter entitled "AND SOMETIMES", whose content I bet you can guess.

The book's web site even has a flash animation (of CHAPTER E, natch).

But I think I'll wait for the comic-book format, myself. Or perhaps the mini-series, since according to a blurb available on amazon, "Bök has created artificial languages for two television shows: Gene Roddenberry’s Earth: Final Conflict and Peter Benchley’s Amazon".

The same source tells us that "[h]is conceptual artworks (which include books built out of Rubik cubes and Lego bricks) have appeared at the Marianne Boesky Gallery in New York". As Dave Barry says, I swear I am not making this up.

In other news, "Reading is at risk in the United States, according to a new report from the National Endowment for the Arts".

Posted by Mark Liberman at 08:36 AM

Lagniappe

According to the American Heritage Dictionary, this is lagniappe:

NOUN:	Chiefly Southern Louisiana & Mississippi 1. A small gift presented by a storeowner to a customer with the customer's purchase. 2. An extra or unexpected gift or benefit. Also called Regional boot². See Regional Note at beignet.
ETYMOLOGY:	Louisiana French, from American Spanish la ñapa, the gift : la, the (from Latin illa, feminine of ille, that, the; see al-¹ in Appendix I) + ñapa (variant of yapa, gift, from Quechua, from yapay, to give more).
REGIONAL NOTE:	Lagniappe derives from New World Spanish la ñapa, “the gift, ” and ultimately from Quechua yapay, “to give more.” The word came into the rich Creole dialect mixture of New Orleans and there acquired a French spelling. It is still used in the Gulf states, especially southern Louisiana, to denote a little bonus that a friendly shopkeeper might add to a purchase. By extension, it may mean “an extra or unexpected gift or benefit.”

According to Mark Twain (Life on the Mississippi, ch. 44), this is lagniappe:

We picked up one excellent word--a word worth traveling to New Orleans to get; a nice limber, expressive, handy word--'lagniappe.' They pronounce it lanny-yap. It is Spanish--so they said. We discovered it at the head of a column of odds and ends in the Picayune, the first day; heard twenty people use it the second; inquired what it meant the third; adopted it and got facility in swinging it the fourth. It has a restricted meaning, but I think the people spread it out a little when they choose. It is the equivalent of the thirteenth roll in a 'baker's dozen.' It is something thrown in, gratis, for good measure. The custom originated in the Spanish quarter of the city. When a child or a servant buys something in a shop--or even the mayor or the governor, for aught I know--he finishes the operation by saying--

'Give me something for lagniappe.'

The shopman always responds; gives the child a bit of licorice-root, gives the servant a cheap cigar or a spool of thread, gives the governor--I don't know what he gives the governor; support, likely.

When you are invited to drink, and this does occur now and then in New Orleans--and you say, 'What, again?--no, I've had enough;' the other party says, 'But just this one time more--this is for lagniappe.' When the beau perceives that he is stacking his compliments a trifle too high, and sees by the young lady's countenance that the edifice would have been better with the top compliment left off, he puts his 'I beg pardon--no harm intended,' into the briefer form of 'Oh, that's for lagniappe.' If the waiter in the restaurant stumbles and spills a gill of coffee down the back of your neck, he says 'For lagniappe, sah,' and gets you another cup without extra charge.

The Mark Twain reference was sent in by reader Jerry Kreuscher, to defend the plausibility of John Ciardi's etymology for "big apple" against Barry Popik's objection that "New Orleans is a French town, not a Spanish one". The AHD citation might be enough for a scholar, but Jerry threw in the Mark Twain quote for lagniappe.

Posted by Mark Liberman at 07:56 AM

Uphill and Downhill on the Alpe D'Huez Pitch Track

Dear Mark,

I didn't hear Lance Armstrong's recent post-stage-winning press conference, but the quote you discuss (here) is even more fun than you seem to have realized.

Relaxed and smiling at a news conference, he said, "I didn't think this would be the decisive day of the Tour," implying that it had been.

If the writer, Samuel Abt in the NYT, realized how important it was to clarify what the implication was, then that's really amazingly sensitive of him. You see, what Lance said, in written form, could either imply that the day was the decisive one, or imply the exact opposite.

The key is intonation. So listen up, Mark.

There are several different possible intonational patterns for each of the two readings I'm talking about. The reading that the writer thinks Lance intended is one you can get with very little stress on think, a high pitch accent on decisive and Tour (or on almost any combination of words in the propositional complement of think), ending with a final low at the very end of Tour, which may end up pronounced quite long, almost as if it were bisyllabic.

In the following picture, which I'm hoping your browser will render in a fixed font, the line gives an idealized pitch track (idealized e.g. by not including the declination that would take place throughout the sentence, and ascii-izing the smooth interpolation between various pitch targets, and exaggerating in places). The funny symbols under the pitch track are what intonational phonologists term ToBI labels, H* being a high accent on the stressed syllables of decisive and Tour, and L-L% being a combination of low tones at the end of the prosodic phrase. These are so-called boundary tones, and here are not heard as distinct from each other.

I didn't think this would be the decisive day of the Tour
                             /\________________/\
__________________________________/      \
                                                      \_
                          H*       H*L-L%

The opposite reading, if I remember the classic source on this matter, is one you get if you go low on think, and maintain that low right up to Tour, which then ends with a final high (boundary) tone. This is what is known in the literature as a contradiction contour. To recognize the reading, try adding "...and it wasn't" to the sentence, as opposed to "...but it was" for the reading pictured above.

I didn't think this would be the decisive day of the Tour
                                                           _
__________                                    /
          \____________________________________________/

            L*                      L*L-H%

So, you see Mark, changing the intonation on an attitude report can completely switch the implicatures. But don't trust me. The reference you need, a classic paper which I believe was the first to discuss the contradiction contour, was written by two guys who really understand this stuff:

Liberman, M., and Sag, I., Prosodic form and discourse function, Papers from the Tenth Regional Meeting of the Chicago Linguistic Society, pp. 416--427 Chicago, (1974).

Posted by David Beaver at 01:18 AM

July 21, 2004

Pragmatics and semantics at Alpe d'Huez

Samuel Abt's NYT story about Lance Armstrong's win on Alpe d'Huez has two linguistically interesting quotes from Armstrong's post-stage press conference.

The first one is a classic conversational implicature:

Relaxed and smiling at a news conference, he said, "I didn't think this would be the decisive day of the Tour," implying that it had been.

The hypothesized implicature is easily cancelled ("and sure enough, it wasn't", or "and we won't know for sure what today's result means until the race is over"). It's so easily cancelled that I wonder whether Armstrong meant Abt's conclusion to be as transparent as the casual tag "implying that it had been" suggests. In fact, elsewhere in the interview, Armstrong goes out of his way to express the conventional "it ain't over 'til it's over" athletic perspective.

The second interesting quote is a complex combination of negation, quantification and comparison, expressed as an evaluation of the rowdiness of the spectators. These (as Abt puts it) "raced beside the riders, flapped flags in their path and waited in the road until the last moment to take a photograph of the onrushing competitor". Armstrong's comment:

"There never was a moment when anyone was more aggressive than I've ever seen."

Clear enough, I guess. But a tough exam question would be: translate into predicate calculus, and explain how to derive the meaning from the form. Uphill switchbacks all the way, against the clock.
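For what it's worth, here is one hedged first attempt at the translation, not a worked answer. It assumes a degree function agg(x, t) giving how aggressive spectator x is at moment t, a relation seen(s, y, t') meaning that the speaker s has witnessed y at t', and a variable t ranging over moments of the stage; all three are invented for this sketch.

```latex
% d* is the greatest degree of aggressiveness the speaker has ever witnessed
d^{*} = \max \{\, d : \exists y\, \exists t'\, [\, \mathrm{seen}(s, y, t') \wedge \mathrm{agg}(y, t') = d \,] \,\}

% "There never was a moment when anyone was more aggressive than I've ever seen"
\neg \exists t\, \exists x\, [\, \mathrm{agg}(x, t) > d^{*} \,]
```

Getting from the English word order to this form compositionally, with never and anyone both falling under a single negation and ever buried inside the comparative clause, is the part that remains uphill.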

Posted by Mark Liberman at 08:28 PM

"Ho ho ho", she laughed in a refined feminine way

In reply to my question about lexical extensions in non-English-language comics, Ray Girvan emailed several helpful links.

(link) "Japanese sound effects and what they mean"
(link) Michaela Schnetzer, "Problems in the translation of comics and cartoons"

The Schnetzer paper has a number of other useful links in its bibliography, especially

(link) Gergana Ivanova. “On the Relation between Sound, Word Structure and Meaning in Japanese Mimetic Words”.

A couple of interesting odds and ends about Japanese, comics and otherwise:

In Japanese manga, (according to the first link above), "masculine laughter" is "ha ha ha" or "ahahaha", whereas "refined feminine laughter" is "ho ho ho". This seems to be the opposite phonetic direction from English, where stereotypically feminine laughter is usually represented as something like "teeheehee", and "ho ho ho" is what Santa Claus does. In manga, apparently "a strange laugh" is "hu hu hu" or "fu fu fu". This would be strange in English as well, too strange to use, I think. The English convention for diabolical laughter is more like "bwahaha".

In (ordinary, non-manga) Japanese ideophones (according to the Ivanova paper)--

the pattern /(Ne+CV)²/ means something like "stickiness", "tenacity"; e.g. NEba-NEba (sticky, greasy), NEchi-NEchi (sticky, persistent);

the pattern /(NO+CV)²/ means something like "slow action", "lack of stress/ anxiety/uneasiness"; e.g. NObi-NObi (feel at ease, be relaxed/relieved), NOko-NOko (nonchalantly), NOro-NOro (drag oneself, walk slowly).
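A template like /(Ne+CV)²/ is easy to check mechanically against romanized forms. The sketch below is only an illustration: the function names are made up, and letting C be one or two consonant letters (so that romanized "ch" counts as a single consonant) is my assumption, not something taken from the Ivanova paper.

```python
import re

CONSONANT = "[b-df-hj-np-tv-z]"   # any lowercase letter that is not a, e, i, o, u
VOWEL = "[aeiou]"

def template(prefix):
    """Regex for (prefix + C + V) repeated twice, with a hyphen between the copies."""
    return re.compile(rf"^({prefix}{CONSONANT}{{1,2}}{VOWEL})-\1$")

def fits(prefix, word):
    """True if the romanized word instantiates the reduplicated template."""
    return template(prefix).match(word.lower()) is not None

for word in ["NEba-NEba", "NEchi-NEchi", "NObi-NObi", "NOko-NOko", "NOro-NOro"]:
    print(word, fits("ne", word), fits("no", word))
```

The backreference \1 is what enforces the squaring: the second half has to repeat the first half exactly, so a form like "neba-nebi" would be rejected.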

I'm not clear about whether manga use ideophones in the normal way, or if there are special manga conventions or extensions.

It's worth trying to clarify what we're talking about here. We have to start by distinguishing among several categories of sounds:

1. Sounds not made by humans at all (like things falling, machines working, punches landing)
2. Biologically constrained human sounds (like sneezes, cries of pain, laughter, breathing)
3. Filled pauses and other hesitation sounds (like English uh, um, er)
4. Non-lexical vocal gestures (like clucking the tongue or English "sh+" or "aw+")
5. The wider class of conventionalized interjections (like English whoa or d'oh)
6. Non-phonological onomatopoeic sounds, whether imitations of natural sounds or non-representative evocative noises
7. Ideophonic words and systems of ideophonic vocabulary, fully embedded in a language's phonological system

These categories blend into one another in many cases, but the distinctions are still worth making.

Cross-cutting these distinctions, we need to distinguish between the way that such sounds are performed (or happen naturally), and the way that they're represented orthographically. The orthographic conventions can in turn influence the way that some people perform the sounds, as in the case of "tsk tsk", which starts as a way to represent clucking the tongue, but is often pronounced as if it were a phrase spelled "tisk tisk".

Within the realm of human performance of these sounds, there are some that are completely adopted into the phonological system, and others that are completely outside it, and everything in between. The English word tinkle might be a case of phonetic symbolism, but it's also just an English word, like tanker or pickle. At the other end, there are expressive noises that are completely outside the phonology. Clucking the tongue is a good example, but there are plenty of others -- kissing, spitting or slurping noises, for example, or naturalistic imitations of animal sounds. Such expressive noises are not entirely universal, though, either in their inventory or their modes of performance. And there's a sort of continuum of degrees of phonologization, for instance from a completely naturalistic imitation of a cat's meow to the English word meow, perhaps pronounced in a somewhat cat-like way.

English has a lot of ideophonic words, but it doesn't really seem to have an ideophonic system of the kind that Japanese or Korean or Yoruba have. One of the things that comes close, I think, is the emergent culture of comics sound spelling, which differs in being initially written rather than spoken, but otherwise might lend itself to the kind of analysis performed in the Ivanova paper.

Posted by Mark Liberman at 05:02 PM

Comicbook grammarians

I enjoyed Mark's post on comicbook spellings of vocalizations, interjections, and other noises because I've been reading more comics lately, including The Essential Spider-Man series. (Older) Spider-Man comics are interesting to read for a variety of reasons, including the overabundance of scene-setting dialogue, Stan Lee's editorial comments, and of course the spellings of noises (not to mention the great artwork and stories).

Part of Spider-Man's appeal is his alter ego Peter Parker, at once an everyman and a science genius. Less well-known but of particular interest here is that Peter is apparently also a grammar buff. Here are some examples.

(In all of these examples, the bolded emphasis is in the original.)

In this panel, Kraven the Hunter lunges at Spider-Man and misses, resulting in the following exchange:

Kraven: Once I get my hands on you I'll.. UHHH!

Spider-Man: Tsk-tsk! Didn't anyone ever tell you not to end a sentence with an expletive? I can tolerate your nastiness.. but bad grammar... unforgiveable!

In this panel, Spidey's fighting off some baddies:

Baddie #1: C'mon... grab 'im!! He's no bigger'n us!

Spider-Man: Tsk tsk! You mean "no bigger than we"!

And in this panel, Peter once again corrects his Aunt May's failed attempt to use slang. Aunt May is defended by Mary Jane Watson's Aunt Anna (who seems to know a thing or two about satiation effects):

Aunt May: Isn't that Dr. Bromwell the dearest thing? As you would say Peter.. he's a regular pussywillow!

Peter: No, Aunt May! I keep telling you.. the word is pussycat!

Aunt May: But I think pussywillows are much cuter, dear!

Peter: OK, pretty girl! If you say so, he's a pussywillow!

Aunt Anna: The more May says it, the more it sounds right to me!

More examples coming soon.

[ Comments? ]

Posted by Eric Bakovic at 02:45 PM

Stop us before we derive again

Llanfynydd in Carmarthenshire has renamed itself Llanhyfryddawelllehynafolybarcudprindanfygythiadtrienusyrhafnauole, in protest against a planned wind farm. The new name means "a quiet beautiful village, a historic place with rare kite under threat from wretched blades".

I'm sympathetic -- a nice nuclear power plant would be much less intrusive -- and the change has obviously gotten the villagers a certain amount of publicity. But it's a funny world in which this sort of action poses a threat to developers:

"Boss, we've got big trouble."
"What, protesters are tearing up our new transformer platforms?"
"No, it's that Welsh village again. They've sworn to add two morphemes a week until we halt construction."
"(annoyed grunt)."

[via OnzeTaal]

[Note: I've changed the title of this post because the village's protest name is so long that it messes up the index page.]

Posted by Mark Liberman at 11:54 AM

Dunglish

No, this has nothing to do with fertilizer. According to Vanessa Baks-Pannell, it's "the language that overtakes English-speaking people living in an environment where Dutch and Dutch/English are all around them".

My son speaks fluent Dunglish, days of the week for him are Monday, dinsdag, woensdag, donderdag, vrijdag, zaterdag and Sunday, unless he is speaking Dutch of course. :)

My 21-year-old daughter is also perfecting Dunglish. Her specialty is expressions, oost west home's best, alle hands aan dek, every boontje heeft een toontje and many more ...

[via OnzeTaal]

Posted by Mark Liberman at 11:24 AM

Unh, Ka-BOOM, BZZURKK

Following up on my post about the (non) standard spelling of d'oh and other interjections and non-lexical noises, Jeff Erickson emailed pointers to three highly relevant reference works:

The Unh! Project ("A collection of guttural moans from comics"), including "An analysis of comics vocalizations".

Ka-BOOM! A Dictionary of Comicbook Words on Historical Principles ("Based on the Latest Conclusions of the Most Dubious Wordologists & Comprising Many Hundreds of New Words which Modern Literature, Science & Philosophy have Neglected to Acknowledge as True, Proper & Useful Terms & Which Have Never Before Been Published in Any Lexicon. Compiled & Edited Under the Careful Supervision of Kevin J. Taylor").

BZZURKK! The Thesaurus of Champions ("Providing Correct Spellings Without Mention of Time Wasting Parts of Speech, Pronunciations or Derivations & Yet Including The Most Comprehensive Definitions & Synonyms Available At Any Time Past or Present And Including Many Words Never Before Published in Any Lexicon. Compiled Under The Careful Scrutiny Of Kevin Taylor").

This is an essential but understudied area of (non) lexicography, a corner of English that has escaped the orthographic standardization of the 18th and 19th centuries, preserving the creative exuberance of spelling in Shakespeare's time. The truth behind the (technically false) observation that no one ever misplaces the apostrophe in d'oh is that folks don't normally get upset about how to render such items -- Lynne Truss and her ilk never feel compelled to chuck a dictionary through the window of someone who has misspelled a word like vree.

Jeff's references, valuable as they are, leave out the all-important cross-linguistic dimension. How, you may wonder, do I translate "urk" into Chinese? How does one render an amused snort in Bahasa Indonesia? The answer is, I don't know, though I once listened to a lecture on the phonotactics of comic-book sounds in Finnish, and I have passed many a happy hour in foreign train stations reading the comics noises in diverse languages.

Seriously, this is a problem with many fascinating aspects. There is the phonological space of such 'words', which are phonologized renderings of non-speech noises, human and otherwise; there is the underlying phonetic symbolism and its cultural development through partial lexicalization; there is the orthographic dimension, which has a life of its own that is both influenced by the sound and sense, and also influences them; and there is the sociological and cross-cultural dimension, where traditions grow, influence one another and decay...

One minor but annoying problem is the apparent lack of any accepted term for the whole area of study. It's simultaneously narrower and broader than the study of ideophones; talking about "comics noises" is too limiting, since the forms are used in many other contexts as well, and often are completely extra-comical (like Nero Wolfe's "pfui", for example). I don't know any way to talk about this, other than to give a phrasal definition supplemented with examples.

Simpsoncalifragilisticexpiala(Annoyed Grunt)cious

I have an update on the correct way to spell d'oh -- Don Porges points out to me by email that it's actually "(annoyed grunt)", parentheses included. At least, that's the standard orthographic representation as far as The Simpsons' writers are concerned.

This reference supports him:

One of the most frequent questions from alt.tv.simpsons newcomers is how
to spell Homer's renowned expression. Although the generally accepted
spelling is "D'oh!", many sources feature different versions,
including closed captioning's own "D-oh!"

But the funniest part is, if you ever looked at a Simpsons script,
all you would see are mentions of "annoyed grunts" over and over.
When the series started, Matt and the boys let Dan Castellaneta
choose an interpretation for the "(Annoyed Grunt)" indicator; since
then, Homer's "D'oh!" has always been referred to in that fashion.
(Though we know through 3F24 and 3G01 that the writers acknowledge
the usual spelling.)

All of this makes this episode's official title:
"Simpsoncalifragilisticexpiala(Annoyed Grunt)cious". Some TV Guides
actually printed this version in their listings; Robert Berry also
notes that it was featured in the DSS onscreen program guide.

One of the perennial problems with standardizing transcription practice is to decide on a standard list of non-lexical noises, and a standard way to spell each of them. Here's a reference that standardizes English filled pauses as ah, eh, er, uh and um, English interjections as ach, duh, eee, ew, ha, hee, huh, huh-uh, hm, jeepers, jeez, mm, mhm, nah, oh, okay, oof, ooh, uh-huh, uh-oh, whoa, whew, whoops, woo-hoo, yay, yeah, yep, yup, and English speaker noises as {laugh}, {cough}, {sneeze}, {breath}, {lipsmack}. Similar lists for Chinese and Arabic can be found in the same reference.

D'oh didn't make it into any of the lists, under any spelling. This is a significant omission, in a world in which The Simpsons is now probably quoted more often than Shakespeare is, and Matt Groening has arguably become our culture's Homer.
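To make the gap concrete, here is a minimal sketch that treats the three English lists quoted above as a lookup table. The word lists are copied from the passage; the category labels and the classify function are my own invention, and nothing here reflects how the reference itself is formatted or used.

```python
# Word lists copied from the passage above; the labels are hypothetical.
FILLED_PAUSES = {"ah", "eh", "er", "uh", "um"}
INTERJECTIONS = {
    "ach", "duh", "eee", "ew", "ha", "hee", "huh", "huh-uh", "hm", "jeepers",
    "jeez", "mm", "mhm", "nah", "oh", "okay", "oof", "ooh", "uh-huh", "uh-oh",
    "whoa", "whew", "whoops", "woo-hoo", "yay", "yeah", "yep", "yup",
}
SPEAKER_NOISES = {"{laugh}", "{cough}", "{sneeze}", "{breath}", "{lipsmack}"}

def classify(token):
    """Return the list a transcript token belongs to, or None if it is in no list."""
    t = token.lower()
    if t in FILLED_PAUSES:
        return "filled pause"
    if t in INTERJECTIONS:
        return "interjection"
    if t in SPEAKER_NOISES:
        return "speaker noise"
    return None

for token in ["um", "Whoa", "{laugh}", "d'oh", "(annoyed grunt)"]:
    print(token, "->", classify(token))
```

A transcriber working from these lists would have to flag both spellings of Homer's grunt as out of vocabulary.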

[Update: Don Porges further points out that at least two other Simpsons' episodes use the (annoyed grunt) spelling in their official titles:

231. AABF19   E-I-E-I-(ANNOYED GRUNT)
322. FABF04   I, (Annoyed Grunt)-bot

while at least three use the d'oh spelling instead:

209. AABF02   D'oh-in' in the Wind
244. BABF14   Days of Wine and D'oh'ses
306. EABF10   C.E. D'oh.

]

Posted by Mark Liberman at 06:57 AM

July 20, 2004

Save the Big Apple

You can help rescue the Big Apple from a French brothel-keeper. Etymologically speaking, that is.

If you ask Google for {big apple history}, the top-ranked site (http://salwen.com/apple.html) will tell you about Evelyn Claudine de Saint-Évremond, who immigrated from France in 1803, started a bordello, acquired the nickname "Eve", and therefore "would refer to the temptresses in her employ as 'my irresistable [sic] apples'", blah blah. According to Salwen's (uncited) "unique archival sources".

But the truth, according to Barry Popik, is that the term was invented by African-American stable hands in New Orleans, and was first used in print on May 3, 1921, in a horseracing column by sportswriter John J. Fitz Gerald:

J. P. Smith, with Tippity Witchet and others of the L. T. Bauer string, is scheduled to start for “the big apple” to-morrow after a most prosperous Spring campaign at Bowie and Havre de Grace...

Fitz Gerald explained the term in a column on February 18, 1924:

The Big Apple. The dream of every lad that ever threw a leg over a thoroughbred and the goal of all horsemen. There’s only one Big Apple. That’s New York.
—————————————————
Two dusky stable hands were leading a pair of thoroughbred around the “cooling rings” of adjoining stables at the Fair Grounds in New Orleans and engaging in desultory conversation.

“Where y’all goin’ from here?” queried one.

“From here we’re headin’ for The Big Apple,” proudly replied the other.

“Well, you’d better fatten up them skinners or all you’ll get from the apple will be the core,” was the quick rejoinder.

Fitz Gerald gave a slightly different version of the "big apple" story on December 1, 1926:

So many people have asked the writer about the derivation of his phrase, “the big apple,” that he is forced to make another explanation. New Orleans has called it to his mind again.

A number of years back, when racing a few horses at the Fair Grounds with Jake Byer, he was watching a couple of stable hands cool out a pair of “hots” in a circle outside the stable.

A boy from an adjoining barn called over. “Where you shipping after the meeting?”

To this one of the lads replied, “Why we ain’t no bull-ring stable, we’s goin’ to ‘the big apple.’”

The reply was bright and snappy.

“Boy, I don’t know what you’re goin’ to that apple with those hides for. All you’ll get is the rind.”

So link to Barry Popik's pages, and help Google steer internet pilgrims towards the truth!

[Update 8/2/2004: No, link to this summary page at Popik's web site!]

[Link via Grant Barrett]

[Update 7/21/2004: Jerry Kreuscher emailed

John Ciardi's 1982 "A Browser's Dictionary" has a different story. His story has New Orleans in common with your citations, but claims to begin a little earlier. It is:

The Big Apple The all-but-official nickname of New York City. [Charles Gillett, president of the New York Convention and Visitors Bureau, initiated (c. 1971-1972) the campaign to have the nickname officially adopted. Mr. Gillett calls the nickname a "positive upbeat symbol" and claims that it has become "the most successful city slogan in the history of tourism." So the robotics of enthusiasm.
The term originated among black jazz musicians of New Orleans c. 1910, as a translation of Sp. manzana principal. Manzana means "apple", but also "tract of land" (apple orchard), and in common usage "city block." Manzana principal a main city block, downtown, the main stem, where the action is. The term later passed into show biz with the sense "the big time," and thence prob. to Mr. Gillett, but it has always remained a special term for jazz men. In his book Hi De Ho (1936) Cab Calloway defined the Big Apple as "the big town, the main stem, Harlem."

Barry Popik also discusses Gillett's role, manzana principal, and Ciardi's dictionary entry, but dismisses them as one of the list of eight "false etymologies" that he places down at the very bottom of his interesting but nearly unreadable index page.]

[Update #2: Jerry Kreuscher isn't convinced:

The claim that there wouldn't be a Spanish term in New Orleans slang is wrong. The counter-example "lagniappe", which OED says comes from Spanish "la ñapa", leaps to mind. New Orleans became a Spanish town when France ceded it after the Seven Years War, and there was Spanish land along the Gulf coast just to the east in what was then called "the Floridas." Spanish was not unfamiliar there.
Since this correction came a decade before Mr. Ciardi put his version in his book, I'll suppose that it did not convince him, either. In going this far, I'm over my head, but my affections are with Mr. Ciardi.

I've got no dog in this fight, myself. And I agree that lagniappe is a good example of what the AHD calls "the rich Creole dialect mixture of New Orleans". But judging from his web site, Mr. Popik seems to be so skilled at hiding his light under a series of bushels that Mr. Ciardi might well have remained unaware of the attempt at correction.

More significantly, there is in fact no real contradiction in (this aspect of) the two accounts, since Popik's claim is only that New Orleans stable hands used the term "big apple" for New York around 1920, and that John J. Fitz Gerald publicized this usage in a series of sports columns in the New York Morning Telegraph during the 1920s. Popik has nothing to say (unless it's hidden in a corner of his web site that I didn't explore yet) about where the stable hands' usage came from.

And everyone seems to agree that the (mythical?) Mlle. de Saint-Évremond was not involved.]

Posted by Mark Liberman at 08:10 PM

Woad House

Linus Gelber at Pepper of the Earth has a hilarious review of King Arthur ("The Untold True Story that Inspired the Legend"), starting with this:

The Steppes of Sarmatia. Fires, huts, mud.

Lookout: Look out!
Elders of Sarmatia: What’s wrong?
Lookout: The Romans are coming!
Elders: What about it?
Lookout: They are coming as they do every 15 years to reap their bounty under treaty, to take our children away to the cold north where they will train them as cavalry riders and force them to patrol dangerous lands near Hadrian’s Wall! Shall we rally to arms and stop them?
Elders: No.
Lookout: But why not?
Elders: For one thing, our tribe disappeared over 200 years ago.

Among many other good parts, there's an extended consideration of the Woad Problem:

Woads: Woooooo!
Arthur: Woads!
Knights: What?
Soldiers: Whoa!
Guy Dressed Like Bishop Germanius: Ow!

The Knights slaughter the ~~Picts~~ Woads. Arthur interrogates a survivor at swordpoint.

Woad Survivor: …and really it’s a complete misnomer. Everyone believes the Picts painted themselves blue and tattooed themselves with hallucinogenic dye from the woad plant, but there’s very little evidence in the historical record. Woad isn’t psychotropic, for one thing, and it doesn’t work as a tattoo dye. The only eyewitness account of naked blue warriors is in Caesar’s The Conquest of Gaul, and that justifies both Caesar’s losses in battle and the further commitment of forces to the Gallic campaign, so he needed to make the Picts sound terrifying and fearsome, which makes the text highly suspect.
Arthur: Sort of a Weapons of Mass Destruction thing.
Woad: Exactly.
Arthur: So that kind of messes up Braveheart too, doesn’t it?
Woad: Look, did you see The Passion of Christ? That man wouldn’t know history if it sat on his –

It seems that there's a general tendency to replace old but historically inaccurate myths and legends with new but historically inaccurate myths and legends.

Posted by Mark Liberman at 06:19 PM

Operatic IPA

Dorothea Salo at Caveat Lector wonders "where [Renée] Fleming learned IPA".

Dorothea starts with a description of choristers being forced to sing from a Lord of the Rings score in which "[f]or example, Osgiliath is written 'awss-ghee-lee-ahth'". She commiserates with David Salo, who owns up with a groan to having been forced to provide such pseudo-phonetics, but also tells another story:

Toward the end of the whole long Return of the King scoring ordeal, David got a request for a translation to be sung by Renée Fleming. And this time, please, could we have the transcription in IPA?
“Whoa!” I whoaed. “You’re writing lyrics for Renée Fleming? And she wants ‘em in IPA? Wow. That’s just cool.”

“Who’s Renée Fleming?”

“Argh! Philistine! Only, like, über-diva. And she knows IPA, too! Points for her. And you’re writing lyrics for her! Squee!” And I went on squee-ing until he believed me about her über-diva-dom.

So there it is. Singers are not brainless prats; they can handle smart transcriptions. I do wonder where Ms. Fleming learned IPA, though. Somehow I don’t think they teach it in music schools.

Well, IPA (the International Phonetic Alphabet) seems to be the standard way to represent operatic libretti for singers to use: see for example this discussion of the Leyerle Three-line Phonetic Translation System.

The now famous Leyerle Three and four-line Phonetic-Translation System consists of the International Phonetic Alphabet spelling of the foreign language text on the first line, the original foreign language on the middle line, and the word-for-word English translation on the third line. When further explication of an otherwise difficult to understand passage in the word-for-word translation is needed, a fourth line, presented in a more literary translation, is given.

According to her web site, Fleming studied at SUNY Potsdam, Eastman and Juilliard. I don't know for sure, but I bet that she learned IPA as a standard part of her training, and that most other operatic singers do, too.

[Update: Eric Bakovic testifies to having once witnessed IPA being taught in a drama class at NYU's Tisch School of the Arts.]

Posted by Mark Liberman at 01:09 PM

Causing us the game

Russell Lee-Goldman (Kantol) at Every Way but One documents a conversion experience. The scales fell from his eyes when he caught himself saying something like "That was what caused us the game" (in place of the standard "cost us the game"), and found that the same (mis)analysis has occurred to others, for instance:

(link) Although the Lady Broncos scored one run in the first inning, they left bases loaded coming out of it, which could have caused them the game if it were not for many plays falling their way in the later innings.

As he points out, the re-analysis retains more phonetic similarity in the past tense, and so you can even find examples like

(link) Tim Duncan did not get the ball as much as Pop wanted him too, and I believe it caused the Spurs the game, and it will eventually cost them the series.

Amazing.

However, there are definitely future-tense and other bare-form versions of this eggcorn around:

(link) Being over confident can also cause you the game.
(link) And this movie will cause them the election and that's why this morning you hear so much chatter about cancelling the elections.
(link) I don't think we should elect judges because then they will constantly worry that their choices will cause them the next election.

There are even -ing forms:

(link) Honorable Mentions: The two guys who took the elevator from the 7th floor to the 6th, Howard from Wake Forest for calling a timeout he didnt have causing them the game, and finally Jason Williams for choking against an unranked team and causing them their only chance at co-owning the ACC championship.

Russell's change of heart? He was converted to the view that eggcorns are interesting enough to blog about. At least once.

Posted by Mark Liberman at 09:18 AM

Territory of Information

Russell Lee-Goldman (Kantol) at Every Way but One has an interesting post about the theory of Territory of Information (ToI). This set of ideas is new to me. Kantol attributes it to "Toshio Akio", but the author of this book on the subject is "Akio Kamio". Cultural differences in surname order aside, is "Toshio" an alternative name or a typo? Kantol does start the post by saying that "I've got a few minutes before I pass out from exhaustion (it's 1:41 in the morning)".

According to the John Benjamins site for the book, Kamio's theory "attempts to demonstrate the key function of the concept of territory in the informational structure and syntax of natural language. It offers an analysis of English, Japanese, and Chinese in terms of territory and shows its fundamental importance in the interface of information and syntax in these languages."

Kantol's blog post is much more informative about the theory than the book blurb is. The idea is that "whenever someone speaks, they base the way they speak on the answers to two major questions: (1) is the information I'm about to say in my own territory? and (2) does my interlocutor believe that the information I'm about to say is in their territory? Various combinations yield various grammatical patterns.." Kantol explains that Kamio "uses quantitative analysis to predict" outcomes, and "presents several linguistic problems that seem to be unrelated to the ToI theory, but in fact can (according to him) be solved using it. One is the difference between shiru and wakaru in Japanese, both of which seem to correspond to English know (or understand)".

This stuff sounds interesting, and I'm also interested in the fact that I've never heard of it before. I'm not a specialist in discourse analysis or the structure of conversation, but I'm interested in things like that. A Google search for {Kamio "territory of information"} turns up a few things, including a couple of journal articles (e.g. Akio Kamio, "English generic we, you, and they: an analysis in terms of territory of information", Journal of Pragmatics 33 (7) (2001) pp. 1111-1124), the abstract for a colloquium at Stanford, and a memorial page at Elsevier informing me that "Professor Akio Kamio, of Dokkyo University, Greater Tokyo, departed this life on Sunday, February 24, 2004, jointly with his wife Noriko."

Querying CiteSeer about Akio Kamio turned up only this Colorado tech report, which is interesting in its own right, but does not seem to make use of the ToI theory or even cite it in the bibliography.

So I conclude that there has not been a lot of uptake, anyhow in English-language sources, for the ToI idea. That might be because it doesn't really work very well, at least in application to things that people are working on, but it also might be because most people (like me) don't know about it. Either way, I'm glad that Russell didn't just go to sleep a little earlier on July 16.

[Update: Russell informs me by email that indeed he mistyped Kamio Akio's name, probably due to the influence of Ohori Toshio, whose book he's been reading lately. I do that kind of thing all the time -- I'm still grateful for the tip.]

Posted by Mark Liberman at 08:55 AM

Is the last not to be seen also the last to be seen?

A few weeks ago, entangledbank noted a sentence about Alice Springs in Australia, which is apparently an addition to the set of cases where "one can add or remove a negation without change of meaning", as Chris Potts put it in his post on the subject.

In 1862 the Scot explorer John McDouall Stuart was the first European to travel to the area, one of the last parts of the world not to have been seen with European eyes.

But it seems to me -- though I'm strictly an amateur in semantics -- that there's an implication in one direction, but not in the other. Thus if a particular chocolate was the last chocolate in the box to be eaten, it was necessarily also the last chocolate in the box not to be eaten. However, if a chocolate is the last one not to be eaten, it remains to be seen whether it will be eaten or not.

When the negative version of such a sentence is in the past tense, it's natural to infer that the negative state has since changed to positive. Otherwise why use the past tense? But the implication seems to be cancellable: "the Crunchy Frog was the last chocolate in the box not to be eaten... and in fact it has remained uneaten up to the present moment."

Posted by Mark Liberman at 08:11 AM

July 19, 2004

No respite from the misapostrophers

Julie at No Fancy Name recently mused that

One of my biggest pet peeves is the misuse of "you're" vs "your", "it's" vs "its", "they're" vs "their" and so forth. But you know what? EVERYONE knows how to properly spell "d'oh". Hm.

Nope. And the form without the apostrophe is preferred by the OED, which also gives apostrophe-deficient citations back to 1952. Sorry, Julie.

Posted by Mark Liberman at 10:12 PM

More on Chekhovian relevance in journalism

Earlier today I complained that the NYT shouldn't "put a foreign-language sign on the wall in a picture of an American municipal office, if the story is not going to comment on it", citing Chekhov's law of literary firearms. Nick Allott at Meaning and Thinking has some thoughts on how "Chekhov's law might be derivable from the communicative principle of relevance with some extra assumptions".

Nick's musings are about the application of Chekhov's principle in literature -- he questions whether I'm right to apply the principle to journalism:

... it seems to me that it can't be reasonable for journalists to remove real details that fail to conform with our (stereotypical) expectations: life is richer in details than we expect, perhaps, and I don't think we need to be protected from that.

There is a limited sense in which this is unavoidably true. Any picture of the 311 call center will be rich in details, most of which are irrelevant. A pictured operator, for example, must have some age, some sex, with some details of hair and clothing, and normally we don't expect the story to explain these details or even to mention them.

But in a broader sense, I don't agree. If the call center operator had been dressed in 18th-century costume -- say in a picture taken at a July 4th fancy dress office party -- we'd expect the story to feature the reason for the costume. If the costume doesn't figure in the story, then the photo editor should have picked a different picture. To use such a picture without discussing it is a violation of the journalistic version of Chekhov's law, because the costume is way outside our expectations about what a call center operator should look like.

A Yiddish sign on the call center wall, illustrating a story about helping "thousands of visitors to the Republican National Convention next month navigate the city", is similarly incongruous.

In this case, the photo editor almost certainly had dozens if not hundreds of pictures to choose from. I can assume this from the usual practices of newspapers, but I can also infer this from the image file name, which is hotline.184.1.jpg. In the image that the editor actually chose, the Yiddish sign is the most visually salient component. I suspect that eye-tracking analysis would show that most viewers look at the sign quicker and more frequently than at the operator's face. And there's not much else to look at in the picture -- just the backs of two Dell flat-panel displays, and a standard acoustic-tile ceiling.

I'd guess in this case that the story originally mentioned the 170 languages, and maybe even said something about Yiddish, but lost these aspects in the editorial process.

Either that, or some bored photo editor was exercising a bit more creativity than is normal at the Times.

Posted by Mark Liberman at 09:10 PM

The police blotter watch

Mark Liberman picks up on my posting on hedges in the Palo Alto Daily News police blotter with a celebration of the Arcata Eye blotter, which eschews bare-bones reporting in favor of (wonderfully) rich narrative. I'd just like to make sure that the entertainment value of the Arcata Eye material doesn't overwhelm the point I was making, which was that it makes a big difference what doesn't get said and what does, and in the latter case, how it gets said -- a point that Mark reinforces, with respect to what gets shown, rather than said, in his "Emergency call for the pragmatics police".

For an extended riff on this theme, check out the handout (in a .pdf version) for a paper I gave at the 2002 Stanford Semantics Fest, "What is said and what is unsaid". It's about the Palo Alto Daily News police blotter again, this time focusing on what the Atherton CA blotter reports about where (alleged) miscreants come from. Atherton is a fabulously high-end rich suburban enclave that sees itself as rural (to the point of welcoming neither sidewalks nor commercial establishments). I interpret the police blotter reports as witnessing to a strong impulse towards privatizing public spaces (at two levels: carving out Atherton as a place where outsiders are not welcome, and according residents unofficial authority over the space around the house they live in) and as both reflecting and reinforcing a sense of threat from the outside world. The presumption seems to be that privilege begets further privilege; meanwhile, privilege begets paranoia.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:52 PM

Emergency call for the pragmatics police

There's a story today in the New York Times about a planned "major expansion of the city's information hot line, 311, ... undertaken just in time to help thousands of visitors to the Republican National Convention next month navigate the city by simply picking up a phone". Terrific, but can somebody tell me why the picture that runs with the story -- at least in the online edition -- shows a sign on the wall in Yiddish?

There must be a journalistic variant of the famous Chekhovian law of relevance for suggestive details in literature, two versions of which are:

"One must not put a loaded rifle on the stage if no one is thinking of firing it." ---Letter to A. S. Lazarev-Gruzinsky, Nov. 1, 1889.

"If you say in the first chapter that there is a rifle hanging on the wall, in the second or third chapter it must absolutely go off. If it's not going to be fired, it shouldn't be hanging there." --- from the Memoirs of Shchukin (1911)

Let me propose a journalistic lemma: one must not put a foreign-language sign on the wall in a picture of an American municipal office, if the story is not going to comment on it. If it's not going to be mentioned, it shouldn't be hanging there.

I can guess that the information hotline is multi-lingual -- as it should be -- and that there are signs on the wall in many languages, and that this picture happened to capture one of them. And indeed the caption under another picture says that "New York City's 311 hot line operates 24 hours a day, in 170 languages". However, the story doesn't mention anything about languages, and the other picture caption doesn't tell us anything about the featured sign. Really, Timespeople, read your Chekhov if not your Grice.

[Note: I originally identified the sign as being in Hebrew -- but although it's in Hebrew letters, the language is Yiddish, and it says "Call 311", as reader Geoff Nathan emailed to point out. He adds that "Yiddish is, of course, one of the widely spoken languages of New York ([which is] probably one of the few places on earth where the number of native speakers is actually increasing)". This reinforces my point, since it's interesting to know that there are apparently a significant number of people calling for information in NYC who are more comfortable in Yiddish than in English. Or are there? The story doesn't tell us.]

[Next question: what are the 170 languages? And do they have employees to take care of all of them, or do they contract it out to a company like Language Line -- which only advertises services in 150 languages?

Language Line's 150 languages are given here (in an uneditable MSWord .doc file, for shame! Do they think that their competitors are incapable of retyping the list, or what?) The NYC 311 web site repeats the claim of 170 languages, but if there's a list somewhere, I can't find it.

]

Posted by Mark Liberman at 01:09 PM

DARPA dissed again

Dan Gillmor wrote yesterday about some machine translation (MT) and automatic speech recognition (ASR) technology that's been developed with funding from the DARPA TIDES and EARS programs.

There's been some progress, both in the underlying technology and in its applications, since my two posts on DARPA MT research almost a year ago. But one thing that hasn't changed is this: just like Chris Farah's NYT Circuits story from 7/31/2003, Dan Gillmor's SiliconValley.com eJournal column from 7/18/2004 doesn't mention DARPA.

(D)ARPA is the (Defense) Advanced Research Projects Agency -- it's gained and lost the D from its name and acronym a couple of times over the decades. It got some cred in the past by naming the internet's ancestor the arpanet. To get any kudos in the press for contributions to MT and ASR research, does DARPA need to somehow put its name in a catchy way on the technologies it funds?

Also missing from both last year's NYT and this year's SiliconValley.com: examples of the current quality of ASR, Arabic/English MT, and their combination; and a sense of the trajectory of these technologies in speed, capability and quality. I'm not faulting Gillmor -- his column is more of a "looky here" kind of thing, with limited space and perhaps an audience with little patience for technical details. But there's a story here for someone.

Posted by Mark Liberman at 12:36 PM

Eggcorn dynamics

Following up on Chris Waigl's lovely exploration of the airwaves → airways eggcorn, Ray Girvan discusses some examples of the inverse variety, airways → airwaves. After reading Chris' post, I looked for these but didn't find them, partly because I didn't look hard enough, but mostly because I was looking for them in text about the airline industry. Ray found his examples in texts about asthma and the like.

Chris hypothesized that "a great many of the google hits seem to indicate that those who write airways believe that TV signals travel along something like pipes. And this shapes the way they understand the words they use."

This general idea is clearly true: common eggcorns (including the original "egg corn" for acorn, and for that matter the etymology of acorn itself) always have a motivation in meaning as well as in sound. And I think that Chris makes a persuasive case for the relevance of something like Lakoff's Communication as Conduit metaphor in the airwaves → airways examples.

So what about Ray's airways → airwaves examples?

As Ray says, "It's hard to tell why the misnomer developed in this direction". He offers a couple of ideas: resonance from the Airwaves newsletter (brought to you by Combivent and Atrovent inhalation aerosols); or Wrigley's Airwaves gum ("The Kick that Helps you Breathe Free").

Another idea is suggested by one of his examples, from a TV website (for GMTV, "Europe's Biggest Breakfast Show"), where "Dr. Hilary" answers the question "What is asthma?" The good doctor uses airways correctly in the body of the article -- it's a headline writer who has substituted airwaves, in the subhed "Sensitive or inflamed airwaves can be caused by pollution and smoking".

It makes sense for an editor at a TV website to have airwaves well primed and ready to jump out.

This press release from NCSU tells us that "Normally, when allergic mice are exposed to an allergen, their airwaves swell and mucus production increases dramatically. Treatment with the anti-mucus molecule prevented this mucus build-up." Again, these sentences were written by a PR person whose whole job is focused on getting NCSU discoveries onto the airwaves.

Well, it's too easy to give these post hoc explanations, as convincing as they may be in some cases. But at least in principle, there should be a genuinely scientific way to approach the question. More important, there might be some real value in modeling the relative frequency across contexts of substitutions of this sort, just as psycholinguists have learned a lot from modeling corpora of speech errors such as spoonerisms.

Independent variables would include similarity in sound and/or spelling; metaphorical resonance; and pragmatic association. The dependent variable would be the frequency of a given substitution in a given context or class of contexts. This is in principle a much better situation than analysis of speech-error corpora, since we can control for the relative frequency of the error-free cases.

In the example under discussion, I'm pretty sure that the airwaves → airways substitutions are much more frequent than the airways → airwaves ones; and the airways → airwaves ones seem (almost) always to occur in talking about the bronchial tubes rather than about the airline industry, and mostly in contexts where the writer is focused on media and/or publicity. That's all pretty vague, but that's because I'm not taking the time to frame the contexts and do the counts.

An even more interesting sort of research should soon be possible, if it isn't already, namely the study of eggcorn dynamics. Given a snapshot of substitution frequencies, it might be possible to make predictions (or at least retrodictions) about changes over time. My guess is that you'd need about 20 years of data to be able to see significant changes with any degree of clarity, and we don't have that yet, at least on the scale required. We'd be trying to track frequency ratios of events with frequencies between about 100 and 2,000 whG/bp, and even at the high end of that scale, a corpus of a mere few million words is not going to give us any useful information.

For example, the lexicographers at OUP apparently say that "diffuse" for "defuse" has become (one of?) the commonest substitutions (or what the Guardian calls "word crimes").

We can check the frame "___ the crisis" and see 6,350 hits for defuse vs. 773 for diffuse. That's enough for us to be fairly confident in the value of the ratio (8.2) for this snapshot. We can compare the frame "___ the bomb" and see 7,760 hits for defuse vs. 800 for diffuse, a slightly (but probably significantly) higher ratio of about 9.7. In contrast, in the frame "___ the light" we see 3,890 hits for diffuse, vs. 89 for the substitution defuse, for a ratio of 43.7. In the frame "___ the Gospel" we get 149 hits for diffuse vs. 3 for defuse, for a ratio of 49.7.

So it's clear already that defuse → diffuse is much commoner than diffuse → defuse -- apparently about 5 times commoner. (To do this sort of thing right, you'd have to check for false positives, either exhaustively or by sampling methods, but the basic results are not going to change in this case).
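For anyone who wants to redo the arithmetic, here is a minimal Python sketch of the ratio calculation. The hit counts are the ones quoted above; the names FRAMES, ratio and asymmetry are just labels made up for this illustration, and nothing here checks for false positives.

```python
# Google hit counts quoted in the post, per frame:
# (hits for the standard verb, hits for the substituted verb).
FRAMES = {
    "___ the crisis": (6350, 773),  # standard "defuse", substitution "diffuse"
    "___ the bomb": (7760, 800),    # standard "defuse", substitution "diffuse"
    "___ the light": (3890, 89),    # standard "diffuse", substitution "defuse"
    "___ the Gospel": (149, 3),     # standard "diffuse", substitution "defuse"
}

def ratio(standard_hits, substituted_hits):
    """How many standard uses there are for each substituted one."""
    return standard_hits / substituted_hits

ratios = {frame: ratio(*counts) for frame, counts in FRAMES.items()}

for frame, r in ratios.items():
    print(f"{frame}: {r:.1f}")

# A higher ratio means a rarer substitution, so dividing a
# diffuse -> defuse ratio by a defuse -> diffuse ratio estimates how much
# commoner the second kind of substitution is.
asymmetry = ratios["___ the light"] / ratios["___ the crisis"]
print(f"defuse -> diffuse is about {asymmetry:.1f} times commoner")
```

Comparing "the light" with "the crisis" gives a factor a little above 5; picking a different pair of frames would shift the number somewhat, which is one more reason to treat it as a rough estimate.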

However, in terms of brute frequency, all these patterns are still pretty rare. Google currently indexes 4,285,199,774 pages, so the 6,350 hits for "defuse the crisis" is about 1,482 whG/bp ("web hits on Google per billion pages"). I don't know what the word count of an average Google page is, but if it's as little as 500 words, this comes out to roughly 3 occurrences of "defuse the crisis" per billion words of indexed text.

And "diffuse the crisis" weighs in at 773/4.2852 = 180 whG/bp, which is probably less than 1 hit per billion words, while "defuse the light" is only 89/4.2852 = 21 whG/bp, which is roughly 4 hits per 100 billion words.

So you can see that conventional corpora -- even large ones like the hundred-million-word BNC -- are not going to work for this kind of research.
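The normalization can be sketched the same way. This is only an illustration under the assumptions already stated above: the page count is the figure Google reported, 500 words per page is a guess rather than a measurement, and the function names are invented for the example.

```python
GOOGLE_PAGES = 4_285_199_774  # pages indexed, as quoted in the post
WORDS_PER_PAGE = 500          # the post's guess, not a measured value
BNC_WORDS = 100_000_000       # the "hundred-million-word BNC"

def whg_per_bp(hits, pages=GOOGLE_PAGES):
    """Web hits on Google per billion pages (whG/bp)."""
    return hits / (pages / 1e9)

def per_billion_words(hits, pages=GOOGLE_PAGES, words_per_page=WORDS_PER_PAGE):
    """Estimated occurrences per billion words of indexed text."""
    return hits / (pages * words_per_page / 1e9)

def expected_in_corpus(hits, corpus_words):
    """Occurrences we would expect in a corpus of the given size."""
    return per_billion_words(hits) * corpus_words / 1e9

for phrase, hits in [("defuse the crisis", 6350),
                     ("diffuse the crisis", 773),
                     ("defuse the light", 89)]:
    print(f"{phrase}: {whg_per_bp(hits):.0f} whG/bp, "
          f"{per_billion_words(hits):.2f} per billion words, "
          f"{expected_in_corpus(hits, BNC_WORDS):.3f} expected in the BNC")
```

On these assumptions even the commonest of the three phrases is expected to turn up less than once in a corpus the size of the BNC, which is the point of the paragraph above.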

Posted by Mark Liberman at 09:17 AM

July 18, 2004

The police blotter at the Arcata Eye

Following up on Arnold Zwicky's discussion of police blotter hedges, it occurs to me that some readers may not be familiar with the police blotter at the Arcata Eye. This journal serves Arcata, CA, some 300 miles north of Arnold's location, and its police blotter has been a popular enough web fixture for the past few years that a sample from the archives has been published in book form.

Some of the entries could be from anywhere:

Friday, June 25 12:43 p.m. Police performed a background check for the Riverside County Sheriff’s Office.

Others describe routine incidents, but with a certain style (date and time stamps removed from now on):

A man said that as he turned onto Buckley Road, a bearded man waved his fist and scared him.

Following numerous complaints, an I Street resident was asked to have houseguests do less yelling.

A driver really didn’t need a "glass smoking pipe" to navigate Blue Lake Boulevard, so it was taken away from him.

A K Street dog demonstrated excessive arfistry.

When you or I drive down Foster Avenue with an alleged pound of dope in our car, we normally try not to attract a traffic officer's attention. And if Officer Gatty were to pull us over whilst we was totin' that ell-bee, we'd surely brush any telltale weed vestiges off the passenger-side seat before he walked up and greeted us, right? But not this driver, who wound up popped and potless in the Pink House.

Travelers blocked the H Street sidewalk, again, and a businessperson said she was tired of calling about them every day. The clumpage of fuzzies motated away, but soon another call came in from a nearby hardware store, reporting the group having alighted in an adjacent carport. And on they moved again.

A Valley West argument ended when the guy who’d hurt his arm punching the wall decided he’d take a walk and "cool off."

For hours the man in a cowboy hat and shades sat in his yellow pickup truck outside a 12th Street home, staring. He left before police arrived.

A man sat with a dog four to six feet from one of the signs that says "NO DOGS" on the Plaza. He claimed an officer said he could sit there and dog up the place, but a City ranger said he’d warned the man to remove his dog a half-hour earlier. He was cited, while the dog’s uncomprehending face glowed with unconditional love for all concerned.

Sometimes the stylization advances to the point where the content is obscured, at least for those who are not regular readers:

For the second weekend in a row, a 911 call came in from a Railroad Avenue facility maintained by a local world-renowned school of physical theatre. But all the clown larvae are gone for the summer and when police arrived, no one was there, just like last time.

Among its other claims to fame, Arcata is the site of Humboldt State University, which is not named for a linguist (though it is indirectly named for the brother of a linguist).

For many netizens, including me, Arcata is best known as the home of Grady Ward, author of the Moby lexicon project, who was sued by Scientology over their secret scriptures.

Grady's website, www.gradyward.com (IP 204.62.145.186), seems to be off line -- does anyone know what he's up to these days?

Posted by Mark Liberman at 04:46 PM

Teacher were God Job!

Badaunt at lexical symbolic constructions gives some cute quotes from her Japanese students' course feedback questionnaires:

I was very fun.
This class is much of laugh. This thing is very happy thing.
Teacher were God Job!
Let will be fine.

She discusses two students in particular, who wrote

I could learn that English is joyful. I love this course and I love you.

and

Are you marry? I want you. I need you. I want to meet you everyday.

and says about them

These are both students I have given As. They deserved it. They worked hard. But they didn't start out that way. It just sort of... happened, in the third or fourth week. Something clicked and they really got into their study of spoken English, and made a lot of progress, especially considering that we only had about 13 weeks, one class a week. And because they'd never enjoyed English before (they both said that, on the survey) they think it's me. But all I did was arrange these big classes into pairs or groups and tell them to talk to each other. They did the rest themselves.

They think they have fallen in love with their teacher, when in fact they have fallen in love with learning.

In this case, as her other student put it, it seems that the teacher really did a "God job".

[Update: in a later post referencing this one, badaunt discusses her blogonym, her interesting reasons for preferring anonymity -- which in any case requires no justification, in my opinion -- and her initial confusion and panic over the "flurry of visitors" that my two posts sent to read her blog, for which I apologize.]

Posted by Mark Liberman at 02:59 PM

A terminological rant

There aren't a great many grammatical terms that have made it into (more or less) common usage, but "collective noun" is one. The collective nouns of English include: "group", "committee", "troupe", and "pride" [of lions]. Semantically, they refer to collections of individuals. Grammatically, they are count nouns, pluralizable as: "groups", "committees", "troupes", "prides" [of lions], etc. Nevertheless, in the singular, they share some characteristics with plurals: for instance, they occur with predeterminer "all" ("all the committee", like "all the children"; cf. ??"all the child"), and they can serve as antecedents for plural pronouns ("The committee perjured themselves", like "The children perjured themselves"; cf. ??"The child perjured themselves"). We need a term that distinguishes (at least) two types of count nouns, and "collective noun" is a really wonderful name for one of them. It's not only a good label, it's a well-established one.

So it's appalling to read Bill Walsh (in his new book The Elephants of Style) referring to mass nouns as "collective nouns", and to read Bill Safire (in his "On Language" column of 7/11/04 in the New York Times Magazine, p. 16) aping Walsh in a review of The Elephants of Style. This is just wrong. Especially for people who propose to educate general readers about grammar. How could they treat grammatical terminology in this ignorantly ham-fisted fashion?

Walsh, as quoted by Safire, holds that "it's time to pull the plug and acknowledge that data is a collective noun, like information." As for the word media, Safire tells us that Walsh "holds that it is usually used by people as a collective singular" but parts company with Walsh on this one, saying that he'll "stick with media are".

What these two would-be grammar gurus are talking about here is mass nouns, not collective nouns. To refresh your memory, mass nouns in English include: "information", "rice", "mail", and "pride" (as in "much pride", with no involvement of lions). Semantically, they tend to refer to stuff rather than individuatible things (though there are well-known anomalies here). Grammatically, except in special uses they aren't comfortable with individuating determiners like "each" and "every", are comfortable with the determiner "much", and resist pluralization: *"informations", *"rices", *"mails", *"prides" (as in *"many [non-leonine] prides"), etc. This is baby-level English-structure stuff -- not something like the that-trace effect, or ergativity in verbs, say -- and very long-standard terminology.

I'm willing to believe that Walsh is simply ignorant of rudimentary terminology. He presents himself as a practical, down-to-earth professional writer sharing his experience and opinions with us, and he seems to be devising his terminology as he goes along. "Collective noun" is not such a bad label for mass nouns, since the reference of mass nouns could be viewed (at one level) as a collection of individuals: pieces of information, pieces of mail, grains of rice. Unfortunately, the label's already taken -- what, I wonder, does Walsh call real collective nouns? -- and there's another label already in use, so he's just sowing confusion with his idiosyncratic terminology. (Unless, of course, he supposes that his readers will be just like him, and won't look at other sources of information about English grammar and usage.)

Safire really has no such excuse. I'm sure he knows about the "mass"/"count" terminology. But rather than alter Walsh's usage, he repeats it, probably out of a desire not to burden his readers with technicalities of linguistics. This is an insult to the intelligence of his readers. Surely they could be expected to learn this easy-to-grasp distinction and appreciate why the standard labels are pretty good ones. (No labels are perfect.)

Shame on you, Bill Safire! Shame, shame!

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:37 PM

Blognomens and blognomenclature

Someone whom it might be appropriate to refer to as "badaunt at lexical symbolic constructions" wrote something that I'm going to tell you about in my next post.

But I'm not quite sure how to refer to this author. So first, a digression about blognomens.

The weblog post that I'm going to write about is here, and we're trying to figure out how I should refer to the author and the weblog.

Digressing from this digression, let me point out that there's an additional problem about time: if you're reading this in late July of 2004, the link will probably work. As the calendar continues its advance, the chances diminish. If you're a time traveler from July of 2005, I reckon you've got about a 75% chance -- at least, of the 12 links that I checked in Language Log posts from July 2003, 8 are now good, a year later, and 4 are dead.

Anyhow, as of July 18, 2004, we can find a weblog and weblog author with the following coordinates:

Posting ID: "badaunt".
Index page <title>: "tBLOG - lexical symbolic constructions".
Banner text: "weblog".
Banner picture: second-growth deciduous forest in winter
Basic URL: http://www.tblog.com/templates/index.php?bid=badaunt
...or: http://badaunt.tblog.com/
URL of post: http://www.tblog.com/templates/index.php?bid=badaunt&static=229739

As far as I can tell, "badaunt" is the only person who posts to this particular blog. The name (assuming it's "bad aunt" and not "B.A. Daunt" or something), the references to the author's life companion as "the Man", and some other clues suggest that badaunt is probably female. Other references make it clear that she is teaching English in Japan. There is no "about me" page associated with this blog.

There seem to be two main emerging conventions for blognomenclature. One is "X at Y", where X is a personal name (which can be a complete name, a pseudonym, a first name or nickname, etc.), and Y is the name of the blog. Thus

Rivka at Respectful of Otters
Neal Whitman at Literal Minded
Claire at Anggarrgoon
Iggy at Blogalization

and so on. The other possibility is to use just X or Y by itself. The personal name is often enough, if the association with the weblog is well known to the explicit audience: Glenn Reynolds, Brian Weatherson, Margaret Marks, Cory Doctorow.

The weblog name alone also sometimes works: Language Hat, the Enigmatic Mermaid. But for me, at least, the blog's name is not enough if it doesn't have the form of an English name or a definite description. So I find it hard to write something like "Respectful of Otters pointed out that..." I prefer "Rivka at Respectful of Otters pointed out that..." Sometimes there's no choice, because the weblog's author doesn't provide any identifier other than the weblog name -- this is the case for Semantic Compositions (who also refers to himself in the third person as "SC", but "SC at Semantic Compositions" would be kind of silly).

The blog name alone also doesn't work well when it's a group blog.

There's a long tail of subregularities and special cases -- in language, that's the norm -- but to make a long story short, I usually prefer the "X at Y" form myself, at least for the initial reference. Then the usual mixture of pronouns, definite descriptions, abbreviations and so on takes over.

So I think that the right way to talk about the weblog post currently referenced by http://www.tblog.com/templates/index.php?bid=badaunt&static=229739 is to say "Badaunt at lexical symbolic constructions gives some cute quotes from her Japanese students' course feedback questionnaires..."

By the way, at the moment "blognomen" only occurs once in Google's index, as a nonce "metasyntactic variable" in this tutorial page for the radio blogging tools:

<%siteName%> blognomen
<%description%> (placeholders)

and "blognomenclature" doesn't turn up at all.

So I hereby suggest that blognomen should be used as a term for the "trevor at kaleboel" style of blogger reference. Blognomenclature might be used to refer to the study of such references and their alternatives.

Blogonym is already in pretty wide use for the pseudonyms or nicknames chosen by bloggers: "Jane Galt", "Wonkette", "Kos", "Dr. Weevil", etc. But that's a different thing.

Posted by Mark Liberman at 02:23 PM

Journalists' alleged hedges

In the U.S., land of innocent-until-proven-guilty, both police and journalists describing infractions of the law are famously cautious in the way they attribute responsibility; "allegedly" and "reportedly" and similar hedges are peppered through their reports, sometimes hilariously.

And now, in the Police Blotter section of the Palo Alto Daily News -- I am a great fan -- on 7/17/04 (p. 17), I see exactly two items for Mountain View, both brief, one hedged ("A car was reportedly burglarized."), the other not ("A car was stolen."). What's up here?

A long time ago, Chuck Fillmore noted a newspaper report in which someone was "charged with allegedly" committing some infraction. Chuck observed wryly that if that's what the fellow was charged with, then he was surely guilty.

In a somewhat different vein: I've seen numerous reports of some potentially felonious event, like an assault or a drive-by shooting, in which it is said that "the alleged perpetrator/assailant fled the scene". We're talking about an unidentified -- in fact, for the moment, unidentifiable -- person here, so it's not that anyone's rights are being protected. The hedge is just cautious icing on the journalistic cake. But it does have the side effect of inducing some doubt on the reader's part that the event took place as described: there's a whole lot of alleging going on, after all.

Which seems to be the effect of the variable use of "reportedly" in the Mountain View blotter items. I doubt that either the cops or the PADN intended this subtlety, but there it is.

At least in this particular issue of the PADN, some other cities and towns are more consistent in the way they hedge. The city of Menlo Park has three items, all with "reportedly": "Tools were reportedly taken from a car." "A bike was reportedly stolen." "A car stereo was reportedly taken." These are literally reports, what citizens told the police. The police didn't observe the events, but were told about them.

In the town of Atherton, there were some situations the police actually observed and others they were told about, and the blotter consistently distinguishes these. Of the first type: "A resident needed help getting a bird out of the house." "A man in his fifties was sitting in a silver Mercedes Benz ML with a teenage girl..." (The police talked to the man. Note the inclusion of narratively irrelevant but nevertheless fascinating details -- a hallmark of the Atherton blotter style.) "Three teens were walking up and down the street, looking lost..." (The police talked to the teens, who hailed from Arkansas and had gotten off the Caltrain at the wrong station.) Of the second type: "A resident reported what sounded like an accident..." "A resident reported that their dog had possibly injured a baby raccoon."

But for Palo Alto and Los Altos, as for Mountain View, the blotter items are not so scrupulous. In Palo Alto, a color printer "was taken", but "Thieves reportedly stole two couches" and "A... woman was arrested for allegedly shoplifting at Nordstroms" and "A burglar allegedly smashed a car window and stole a laptop computer, cell phone and pens." In all four cases, the cops didn't witness the events (though in the last one, they got to view consequences of the event, namely the smashed car window); nevertheless, the first report isn't hedged. In three other cases, the police didn't see the original event but did get to see the consequences, and these reports aren't hedged: a man and a woman "got into a non-injury accident" and so did two women, and "A trash dumpster filled with cardboard was lit on fire..." Meanwhile, over in Los Altos, "A car was burglarized" (no hedge), "A man was allegedly sitting in a... car exposing himself" (hedged, and rightly so: by the time the cops got there, the alleged flasher was gone), and "Someone was allegedly drunk in public" (hedged, but we're not told whether that's because the cops didn't see this person, or because drunkenness hadn't been established by the time the item was submitted for publication, or because the police reporter was just being cautious).

The bottom line: Police and journalists are, I allege, often inconsistent in their hedges. If they used hedges randomly, then readers could just ignore them. But there's enough of a pattern -- and one that makes sense -- that it's hard for a careful reader not to read unintended inferences into these reports. Caveat lector.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:16 PM

July 17, 2004

Yiteraw Minded

After a period of guest-blogging on Glen Whitman's Agoraphilia, Neal Whitman has started his own weblog Literal Minded.

I especially like the series of posts on his 3-year-old son's developing mastery of English pronunciation. For example, he describes (and explains) how Doug pronounced /l/ sometimes as [j] and sometimes [w]. (That's IPA [j], as in yet, not English "j" as in jet.) Example: "yawipop" for lollipop. That would be ['jawiˌpap] in IPA.

(Well, I guess technically the "a" vowels should probably be [ɐ] "turned a". Leave it to the IPA to reserve the usual print glyph for "a" for a cardinal vowel sound that is hardly ever used in any language. But that's a rant for another day.)

Posted by Mark Liberman at 02:54 PM

Fly Serendipity Airwaves

Chris Waigl at ˌser.ən'dɪp.ɪ.ti has discovered, explored, documented and explained the eggcorn "airways" for airwaves.

Her first sighting of the substitution was in a Variety op-ed piece by Larry Lessig!

Chris takes us on a genuinely serendipitous tour. There's something for everyone, including an analysis of the metaphor involved, evidence that journalists make this substitution more often than the general public does, and a plausible idea about why. There's even a discussion of Maxwell's equations.

Posted by Mark Liberman at 02:10 PM

Lexical brickbats

Lynne Truss thinks that people who mix up homophones (like stationary for stationery) ought to have bricks thrown through their windows. Bricks with dictionaries attached.

I'm not sure whether that's better or worse than a plain brick, from the point of view of the recipient.

"John, they broke another window today. In the side room this time."
"Another Concise Oxford, I suppose?"
"No, this one was a Collins Cobuild."
"Same difference, we already have two of those. I was hoping for DARE P-Sk."
"Or Slayer Slang. That would be cool."
"No such luck. Well, put it up on ebay, and call the maintenance guy."

At least dictionaries are a step up from meat cleavers, Lynne's weapon of choice against those who misplace apostrophes.

After sweeping up the broken glass, those who want to return the favor might consider chucking her one of these.

Posted by Mark Liberman at 08:33 AM

Brazil

The NITLE blog census says that there are more than three times as many Portuguese-language weblogs as Spanish-language ones (though Portuguese recently slipped into third place as French moved up).

40% of Orkut members are now Brazilian (vs. 25% U.S.), and some people are unhappy about it (more from Reuters here).

There are 212,842 Fotolog participants from Brazil, compared to 1,731 from France and 32,233 from the USA.

Joao Paulo Lopes Batista from Olinda explains that he wound up playing basketball in the U.S. because "[o]ne of my best friends called Kalu Guasco ... was playing basketball for Western Nebraska Community College and we were chatting everyday via the internet, so I told him that I was stuck in Brazil and I didn’t know what to do".

Felipe Fonseca says (about Brazil) that

...if you walk in a public telecenter right now, you would find some people filling their resumes for some employment agency, one or two actually reading something, but most of them, say more than 60%, typing at webchats. People want to talk to each other.

Well, people everywhere want to talk to each other. But it seems that in Brazil, they're doing it to an unusual extent via various text and image-based digital services.

Posted by Mark Liberman at 07:52 AM

Research points to link between thinking, doing

A recent contribution from CNN to the venerable genre of headline humor.

Posted by Mark Liberman at 06:38 AM

English Readings for Chinese Characters

I was in Cambridge, England a few days ago and encountered an interesting spelling on a sign for the Cambridge Chinese Christian Church. The word Cambridge was spelled 劍橋. If we simply read this in Cantonese it comes out [kím kʰjū], which is not very much like Cambridge. The two characters mean "double-edged sword" and "bridge". The first character, 劍, is used for its sound; the second, 橋, is used for its meaning. [Update 2004/07/20: I initially thought that the second character was given its English pronunciation, so as to approximate English Cambridge, but Wolfgang Behr informs me that this is not the case. So this is not an example of a character being given an English reading but rather one in which the Chinese rendering is based on a calque of the English.]

Posted by Bill Poser at 12:11 AM

July 16, 2004

Most of the people in the world could care less

...and I often feel the same way. That's what Charles Bukowski wrote in 1986 about an argument with his "Ivy League friends". I'm beginning to feel like one of those people who answers a dinner-party question at a whole lot greater length than anyone at the table really wanted to hear, but I said I'd explain more about an alternative to Steve Pinker's "could care less is sarcastic" theory, so here goes. I promise that this is the last you'll hear from me on the subject, at least for a while.

Could care less probably lost its "not" by means of a process that John Lawler has called "negation by association". Perhaps the best known example is the history of negation in French.

Stage #1 (archaic)
je	ne	sais
I	not	know

Stage #2 (current formal)
je	ne	sais	pas
I	not	know	(a) step = at all →(second part of negation)

Stage #3 (current colloquial)
je	sais	pas
I	know	not

This process starts out with intensification of negation by providing a minimal object:

I won't move → I won't move an inch.
He hasn't eaten. → He hasn't eaten a bite.
They didn't drink. → They didn't drink a drop.
She doesn't owe you. → She doesn't owe you a red cent.

The general idea is that X won't VERB anything at all, not even a tiny little Y.

For this to work, the object has to evoke a scale of degrees of VERBing, and so many of these objects are really adverb-like measure phrases. An analogous process applies with minimal adverbs of various sorts:

I won't stop for an instant.
She won't put up with the tiniest slight.

A similar process of intensification can be accomplished by maximizing rather than minimizing, for example when the issue is the size of a time interval during which something is not true.

He hasn't visited in ages.

In some cases, the intensifier becomes a "negative polarity item", which strongly prefers to occur in negative contexts (or in questions and some other related places):

She doesn't have a red cent.
?She has a red cent.

I haven't seen them in ages.
?I've seen them in ages.

At this point, the intensifier is no longer a free agent, but has become a sort of contractual associate of the negation. Another reasonable step is for the negative-associated intensifier to learn to stand again on its own, expressing the negative meaning without needing the negative morpheme at all. This has happened in colloquial French with pas. It hasn't happened in American English with most negative polarity items, but it might some day. Sporadic individual developments in that direction are common among children -- a kid I know used to use anything to mean "nothing" -- "What do you want to drink?" "Anything."

The case of "could care less" is a bit more complicated. "Don't care" is intensified by the modal could in combination with the degree adverbial less:

I don't care.
I don't care even a tiny bit.
I couldn't care less (than I do). = "I care so little that there is no careable amount that is less".

The structure is more complicated, but the general method of intensification is essentially the same -- to insist on the minimality of the degree of caring.

The next step is then similar to the loss of ne in French ne...pas, except that the pattern in which the original negation can be lost involves the discontinuous pattern could...care less as well as a negation.

The process has been generalized to give with a variety of MSOs ("minimal scatological objects"):

I could give a {damn|shit|hoot|(flying) fuck|crap|rat's ass}

The MSO can be an elaborate nonce formation ("a gnat's left testicle"; "a fart in a tornado"; "a rat's hairy scrotum").

The corresponding couldn't forms also occur, though mostly less often. This is a case where neither could nor couldn't makes an especially transparent contribution to the meaning -- didn't/doesn't/don't are more straightforward, and also more common. Google has:

	could	couldn't	didn't
___ give a rat's ass	1,650	860	9,850
___ give a shit	4,930	4,860	10,800
___ give a flying fuck	760	568	827
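To make the comparison in the table easier to read, here is a minimal Python sketch that computes, for each frame, the share of "could" among the could/couldn't pair, and checks whether "didn't" beats both. The counts are copied from the table; the variable names are just illustrative, and raw search-engine hit counts are at best a rough guide.

```python
# Google hit counts copied from the table above (as reported in 2004).
# Each tuple is (could, couldn't, didn't).
hits = {
    "give a rat's ass": (1650, 860, 9850),
    "give a shit": (4930, 4860, 10800),
    "give a flying fuck": (760, 568, 827),
}

could_share = {}
didnt_leads = {}
for phrase, (could, couldnt, didnt) in hits.items():
    # Share of "could" within the could/couldn't pair only.
    could_share[phrase] = could / (could + couldnt)
    # Is "didn't" more frequent than either modal form on its own?
    didnt_leads[phrase] = didnt > max(could, couldnt)
    print(f"{phrase}: could share = {could_share[phrase]:.0%}, "
          f"didn't leads = {didnt_leads[phrase]}")
```

On these figures "could" outnumbers "couldn't" in every row, though only barely in the second, and "didn't" is more frequent than either modal form in every row.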

The could/couldn't give a MSO forms seem to be modeled on the could/couldn't care less pattern. And the influence seems to go both ways, since forms like "I didn't care less" and "Why should I care less" seem to be modeled on the give a MSO pattern.

I'll end by reiterating that "could care less" has nothing to do with teenagers or even youth. It appeared in print more than 30 years ago in the Washington Post, and it's recently been used by John Kerry, George W. Bush -- and me. In the unmonitored speech of Americans of all regions, classes and ages, it's much more common than the original form "couldn't care less," and has been for at least ten years.

My current guess is that the ratio is about 5 to 1. As I pointed out earlier, "could care less" occurs in the Switchboard corpus and the American-transcribed portion of a (current, as yet unpublished) collection 16 times, to just one occurrence of "couldn't care less". A large part of the current collection was transcribed in New Zealand; in this portion, "could care less" occurs 37 times, versus 32 (alleged) instances of "couldn't care less". However, I've checked 8 of the 32 "couldn't care less" phrases, and 5 of the 8 are wrongly transcribed. No doubt the New Zealanders heard "couldn't care less" in those cases, but it's plain that the American said "could care less." Applying a similar correction to the rest of the set, we get 85% "could care less" overall in these conversations (73 vs. 13).
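The arithmetic behind that 85% figure can be spelled out in a few lines of Python. This is only a sketch of the correction described above: it assumes that the 5-in-8 error rate found in the spot check holds for all 32 transcribed instances of "couldn't care less", and the variable names are made up for illustration.

```python
# Counts quoted in the paragraph above.
us_could, us_couldnt = 16, 1    # Switchboard plus the American-transcribed portion
nz_could, nz_couldnt = 37, 32   # New Zealand-transcribed portion, as transcribed
checked, wrong = 8, 5           # spot check of the "couldn't care less" transcripts

# Assumption: the spot-check error rate applies to all 32 transcripts.
misheard = nz_couldnt * wrong // checked

could = us_could + nz_could + misheard
couldnt = us_couldnt + nz_couldnt - misheard
share = could / (could + couldnt)

print(f"{could} vs. {couldnt}: {share:.0%} could care less")
```

The corrected totals come out at 73 vs. 13, which is where both the 85% and the rough 5-to-1 ratio come from.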

As one more example of could care less used by a non-teenager, here's more from Charles Bukowski's You Get So Alone At Times That It Just Makes Sense, published in 1986 when the author was 66:

well
you know the old saying: it's all a matter of
taste

and
either they're right and I'm wrong or I'm right and they're all
wrong
or
maybe it's some place in between.
most of the people in the world could care less
and
I often feel the same
way.

If could care less were sarcastic here, it should be equivalent -- pragmatically if not poetically -- to write something like

most of the people in the world care a lot
and
I often feel the same
way.

but I don't think it is.

Posted by Mark Liberman at 06:13 PM

(Auto)biography of a blog thread

As I said before, I am excited about the power of blogging. In the space of eight days -- the blink of an eye, judging by journal turn-around times -- Mark's initial skepticism about Steven Pinker's defense of people who say could care less has evolved into a strong argument (in three parts), pointing not only to a superior alternative analysis but also showing that Pinker was less than careful in his claims about (a) the age group using this phrase and (b) the prosodic difference between this phrase and the allegedly correct couldn't care less.

What's most interesting is that Mark didn't have to do very intricate and time-consuming research to make his case, and that by comparison Pinker seems to have done none at all. This made me simultaneously pleased that I had brought the subject up in the first place and ashamed that I had not been skeptical enough of Pinker's claims myself.

An aside, in my own defense: my original point was not to defend Pinker but to pummel Richard Lederer, as I later clarified. But I have to admit that I didn't seriously question Pinker's claims -- certainly not in the way that Mark did, and certainly not enough to bother testing them.

When I originally decided to start posting my criticisms of Lederer, I was motivated by a very simple feeling that I'm sure many linguists are familiar with. For me, this feeling starts with the fact that, aside from my innermost secrets, there is nothing in the world that I personally know more about than linguistics. So, when somebody makes a linguistic claim that I know to be wrong, I feel compelled to correct them. (This much I suppose I share with your typical prescriptivist.) The more outlandish the claim and/or the bigger the person's audience -- both true of Lederer -- the more I feel this compulsion.

I think that one of the causes of this compulsive behavior is the fact that I hold many beliefs and opinions for which I have largely insufficient evidence, yet they influence a lot of what I do (from how I vote to what music I listen to). Being the one area in which I am most confident in my beliefs and opinions, linguistics is where I am most likely to passionately defend what I think is right. But, as Mark has once again reminded me with his could-care-less approach, that confidence is and always should be based on careful (and skeptical) analysis.

(Note: that last link was chosen more or less randomly. Pretty much any of Mark's posts on Language Log would serve as an example of what I mean.)

[Update: In case you hadn't noticed, Mark added yet another post about the could care less issue, just three hours after this one.]


Posted by Eric Bakovic at 03:18 PM

Speaking sarcastically?

Sarcasm is "a cutting, often ironic remark intended to wound", or "A form of wit that is marked by the use of sarcastic language and is intended to make its victim the butt of contempt or ridicule". People often use "sarcasm" to mean something like "irony with an edge", and specifically the type of irony that involves saying the opposite of what you mean.

That's clearly the sense that Steve Pinker had in mind when he wrote that the expression I could care less "is not illogical, it's sarcastic." I agree that the phrase is not a mistake in logic, but I think that Pinker is wrong about the sarcasm. And I'm pretty sure that he was wrong to argue that the melody and stress of the phrase convey -- to those who don't have a "tin ear" -- that it's being used sarcastically.

There are plenty of utterances that are used to mean the opposite of what they literally say, and there's a lot to say about how this works. I could care less might be an example of this -- though I don't think so -- but its stress and pitch don't bear on the question. There's no such thing as sarcastic intonation. Not in English, anyhow, and I doubt that any other language has such a thing either. Nor is there sarcastic stress, sarcastic pitch, sarcastic voice quality or any other mode of speech production that means "what I'm saying now is the opposite of what I mean."

Here's the backstory. Eric Bakovic got righteously peeved at Richard Lederer for saying that the common expression "I could care less" is illogical. In defense of the users of this colloquial expression, Eric quoted Steve Pinker's argument that "if these dudes would stop ragging on teenagers and scope out the construction, they would see that their argument is bogus ... ['I could care less'] is not illogical, it's sarcastic." Pinker claimed that the sarcasm is clear from the way the two phrases are (always?) pronounced: "The melodies and stresses are completely different." In particular, he suggests that the clue to sarcasm is to be found in an "ostentatiously mannered intonation".

As I just said, and as I wrote in an earlier post, I'm doubly skeptical about this. First, I'm skeptical that people who say "I could care less" are being sarcastic. This is partly from listening to them, and thinking about it in context, but it's also an intuition based on the fact that I'm one of these people, and I don't feel sarcastic or ironic when I use the expression. Second, I don't believe that sarcastic or ironic utterances are inevitably or even normally performed with an "ostentatiously mannered intonation." That's certainly one option, but it's not the only option, and I'm not sure it's the commonest one. In general, I don't think that irony has any particular implications for speech performance; nor do I think that there's any mode of performance that signals irony.

So in this case, Pinker is just as "bogus" as Lederer. The expression could care less, as the OED puts it, is just a "U.S. colloq. phr.". Pinker's story about sarcasm, melody and stress is an interesting idea, but his description of the prosodic contrast between the two expressions is empirically false, and it assumes that sarcasm can be marked by prosody in English, which is also false. Furthermore, there's a better alternative story about the origins of "could care less", which was laid out some time ago by John Lawler and others.

In this post, I'm going to say a bit more about the prosodic issues. I'll say more about the alternative (non-sarcasm) explanation for "could care less" in a later post.

Pinker's story about "I could care less" can be found in his book The Language Instinct, and in a 1994 New Republic article.

Here's his picture of the difference in "melodies and stresses":

[Pinker's diagram: "i COULDN'T care LESS." and "I could CARE LESS.", each set with its syllables at different heights on the page.]

Pinker doesn't tell us what he means by this notation. But I'll assume that he's using two common informal conventions: that words in capital letters are more stressed than words in lower-case letters, and that placing letters typographically higher on the page means that the corresponding pieces of words are higher in pitch.

Let's take stress first. Breaking out the caps=stress part of his notation, and using color to reinforce the case information, we have:

i COULDN'T care LESS vs. I could CARE LESS

There are two questions to ask about this. First, is it true? and second, does it have anything to do with sarcasm or irony?

I believe that the answers are "it's partly true, sometimes", and "no, it has nothing to do with sarcasm or irony in any systematic way."

It's partly true because of the following tendencies in speech rhythm:

Pronouns (like "I") are usually unstressed and therefore rhythmically weak
Verbs (like "care") are usually weak, and monosyllabic auxiliaries or modals (like "could") even weaker; in fact the latter are often completely reduced and turned into clitics
Not and contractions involving not usually want to be strong
Alternating rhythmic patterns are preferred
Starting a phrase with multiple weak syllables is avoided (especially in reading isolated sentences)

So for a phrase like "I couldn't care less", a natural pattern is w(eak) s(trong) w(eak) s(trong), which is nicely alternating, allows the pronoun to be weak, the contracted negation to be strong, and the verb care to be weaker than less. This can of course be overridden by contrast ("Kim is worried, but I couldn't care less") and by many other factors.

As for "I could care less", let's put it aside for a minute and look instead at a perfectly normal, unidiomatic, unironic phrase like "I could buy more". A natural stress pattern is "I could BUY MORE" or maybe "I could buy MORE".

Yes, this notation is ill defined. I didn't choose it here -- in another post, maybe we can talk about how to do better. The point here is that could winds up relatively weak, from a rhythmic point of view, and in fact is likely to be completely reduced. The initial pronoun "I" is likely to be rhythmically stronger than could, at least when you're reading the sentence in an artificial context, because of the desire for alternation and the desire to avoid multiple weak syllables at the start of the phrase. In real conversational usage, the pronoun would probably also be very weak, unless it was being used contrastively. The final word more needs to be at least as strong as buy, unless it's weakened by a contrastive structure like "I could MAKE more or I could BUY more".

But "I could BUY MORE" is exactly the stress pattern that Pinker cites for "I could CARE LESS"! That's because the rhythmic options for "I could care less" are exactly the same as they are for "I could buy more", or any other phrase involving similar words in a similar structure. Irony (or "sarcasm") has zip to do with it. There's no "sarcastic stress" or rhythm here.

What about the pitch? Is that sarcastic?

Well, the first thing to say is that the norms of American English intonation make many pitch contours available for "could care less", and also for "couldn't care less". It's not true that the particular patterns that Pinker cites are inevitable, and it's not even true that they're somehow normal or preferred. It's hard to evaluate his overall claim, since he makes it informally and doesn't say which aspects of the pitch contour are critical to "an ostentatiously mannered intonation" and which are not. But I think we can still reject the theory (that something about the pitch contour tells us that "could care less" is ironic or sarcastic), because a similar distribution of pitch contours can be found on phrases of similar form that are totally lacking in irony. I won't try to prove that, but I'll give some examples that I hope will show you how such an argument would go.

Here's Pinker's picture again:

[Diagram: Pinker's pitch-height renderings of "I couldn't care less" and "I could care less".]

He shows "I couldn't care less" with "couldn't" as the highest pitch and "less" as the lowest. But here's the first roughly comparable example I found, where "less" is the highest pitch, with "couldn't" just a bit up over the subject (which in this case is "you").

[Time in the display runs from left to right; under the transcription there are two panels, first the pitch contour (time function of fundamental frequency of the voice) and then the waveform (time function of deviations from ambient air pressure). The display is a screenshot of the open-source program WaveSurfer.]

[You can click on the display to play the audio clip].

The sentence-wide context, by the way, is

So if you hate somebody, you'll let them do whatever they feel like, and you couldn't care less.

and the clip is taken from a telephone conversation in the 2003 Fisher collection.

As for the phrase "I could care less", Pinker shows it with "I" as the highest point, and "care" close behind, and "could" and "less" at the bottom. But in the example below ("I really could care less"), the highest points are "really" and "less", with "I" a significant amount lower, "care" also lower (in terms of the salient pitches in its high-amplitude center) and "could" very low in pitch and almost completely reduced.

[Again, you can click on the picture to listen to the audio clip, and see whether you think the intonation is "ostentatiously mannered" or in any other way prosodically marked as sarcastic. I don't think so.]

As I explained above, the reduction of "could" is very much like what might happen in a phrase like "I could buy more". (I should also note that the raised pitch values at the very start of "could" are probably caused by the initial voiceless stop [k] in could -- this effect of consonant voicing on pitch has often been involved in tonogenesis, as in the history of Vietnamese and Chinese).

Here's another example: "He just really could care less about it", where "he" and "less" are the highest points, with "care" only a little up over "could":

As we look at more pitch contours for other examples of "could care less" and "couldn't care less", we'll find that a wide range of pitch contours can be employed for each of them; that the two sets of contours overlap considerably; and (most important) that the contours are just like those that we find on similar examples (like "could buy more" or "couldn't find Lee") where the meaning is semantically compositional and pragmatically unironic.

There are plenty of interesting things to say about these pitch contours, but irony and sarcasm are not an essential part of the discussion. Nor do we need to talk about "ostentatiously mannered intonation". Prosodically, "could care less" is just like any other phrase involving similar words in a similar structure in a similar rhetorical context. And neither the structural analysis nor the rhetorical analysis needs to invoke sarcasm or irony.

I enjoy Steve Pinker's writing, and his analyses are often insightful. But in this case, I think he owes William Safire and Richard Lederer -- the "language mavens" whom he accused of "[a] tin ear for stress and melody, and an obliviousness to the principles of discourse and rhetoric" -- an apology.

Posted by Mark Liberman at 08:09 AM

July 15, 2004

Doing the word warp again

I've defended George W. Bush against those who ridicule his alleged violations of the norms of English word formation. This is partly out of a sense of fair play, and partly because I've been known to over-generalize a morphological pattern myself, on occasion. For example, I recall once rambling on for a bit about semanticians, until a friend gently inquired whether I might be talking about semanticists.

So when I followed a referral link to Blogwell's London Journal, browsed along through a few articles, and read this sentence:

Ken's one of the people who first got me interested in nineteenth-century America, though his specialism now is contemporary American fiction.

my reaction was "Aha! another morphology victim. Using specialism where the standard term is speciality (or specialty). Perfectly understandable because of backformation from specialist," and so on.

But the Clinical Medical Services of the University of Newcastle Upon Tyne offer this listing of "Staff by Research Specialism"; Jonathan Lee Recruitment offers this page of "Rail vacancy specialisms"; and so on. Specialism must be a Britishism, I told myself.

Still, the OED gives citations from 1856, so you'd think I might have noticed. The third-oldest citation introduced me to another unexpectedly unfamiliar word, divarication:

1876 GLADSTONE Homeric Synchr. 212 This divarication into specialism..is a sign of an old..condition of study and practice.

And the AHD dictionary just blandly defines specialism as "1. Concentration of one's efforts in a given occupation or field of study. 2. A field of specialization", without any usage note.

I'm starting to feel like I've slipped through one of those hackneyed hyper-dimensional cracks, into a universe that's linguistically just a bit askew. Perhaps here it's semanticians, after all?

Gack -- the OED gives

1921 H. E. PALMER Princ. Lang.-Study 62 The lexicologist or semantician will study the meanings.

1975 Language LI. 207 This distinction [between competence and performance]..is constantly attacked..by the ‘semanticians’.

Still, I can console myself with these Google counts, which suggest that I haven't slipped too far in hyperlexical coordinate space:

medical specialism	349
medical speciality	11,100
medical specialty	193,000
his/her specialism is	310+115 = 425
his/her speciality is	3,240+982 = 4,222
his/her specialty is	17,600+10,800 = 28,400
semantician	224
semanticist	5,160
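
To make the comparison concrete, the counts above can be turned into rough shares. This is a minimal Python sketch that uses only the figures in the table; the grouping labels and the helper name `shares` are my own, and Google hit counts are of course only a crude proxy for usage.

```python
# Google hit counts copied from the table above (July 2004 figures).
counts = {
    "medical": {"specialism": 349, "speciality": 11_100, "specialty": 193_000},
    "his/her ... is": {
        "specialism": 310 + 115,
        "speciality": 3_240 + 982,
        "specialty": 17_600 + 10_800,
    },
    "semantic-": {"semantician": 224, "semanticist": 5_160},
}


def shares(group):
    """Return each variant's fraction of the group's total hits."""
    total = sum(group.values())
    return {word: n / total for word, n in group.items()}


for label, group in counts.items():
    for word, frac in shares(group).items():
        print(f"{label:16s} {word:12s} {frac:7.2%}")
```

On these numbers, "specialism" accounts for well under 2% of either specialism/speciality/specialty group, and "semantician" for roughly 4% of its pair.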

[Update: James Dreier emails to point out that I originally checked only "speciality", not "specialty" (the latter being the American version, at least in this particular parallel universe, and the former being what the AHD calls "chiefly British"). Oops. The omission is now corrected.]

[Update #2: Ray Girvan wrote this:

On no particular evidence (although a skim of Google seems to
bear it out), in UK usage I think there's a difference of tone. To me,
"specialism" implies an area of academic study; while "speciality"
implies a flair or talent in a popular area.

"I've written several monographs in the area of my specialism,
environmental effects of Mousterian variability in peat bogs".

"My speciality is juggling chainsaws".

Interesting. I'm still flabbergasted that I've managed to miss "specialism" entirely, all these years. Then again, I'm the world's worst proofreader, so maybe I just silently adjusted them all to specialty/speciality.]

Posted by Mark Liberman at 07:11 AM

Iraqi chicken

Yesterday I heard a presentation by a U.S. Army linguist (meaning "language specialist") about her experiences during a year with the CPA in Baghdad. She put this joke up on the screen to make a point about the problems of translation in Iraq:

Why did the Iraqi Chicken cross the road?

CPA:
The fact that the chicken crossed the road shows that decision making authority has switched to the chicken. From now on the chicken is responsible for its own decisions.

Halliburton:
We were asked to help the chicken cross the road. Given the inherent risk of road crossing, and the rarity of chickens, this operation will only cost $326,004.

Muqtada as-Sadr:
The chicken was a tool of the evil Coalition and will be killed.

US Army Military Police:
We were directed to prepare the chicken to cross the road. As part of these preparations, individual soldiers ran over the chicken repeatedly, and plucked the chicken. We deeply regret the occurrence of any chicken rights violations.

Peshmerga:
The chicken crossed the road, and will continue to cross the road, to show its independence and to transport the weapons it needs to defend itself. However, in future, to avoid problems, the chicken will be called a duck, and will wear a plastic bill.

1st Cavalry:
The chicken had no right to cross the road as it did not have the correct identification. Thus, the chicken was searched and detained. We apologize for any embarrassment to the chicken.

Al Jazeera:
The chicken was forced to cross the road multiple times at gunpoint by a large group of occupation soldiers, according to witnesses. The chicken was then fired upon intentionally, in yet another example of the abuse of innocent Iraqi chickens.

Blackwater:
We cannot confirm any involvement in the chicken-road-crossing incident.

Translators:
Chicken he corss street because bad she tangle regulation. Future chicken table against my request.

According to a comment on the Soul Pacific website, the author of this joke is Joshua Paul, now working at the U.S. Embassy in Baghdad. This may be the same Joshua Paul whose op-ed piece on an American Foreign Legion is reprinted here.

Among the interesting points that yesterday's speaker (a captain in the Army reserves) made:

Her DLI (Defense Language Institute) training in Arabic was less useful than she would have wished, mainly because it was in MSA (Modern Standard Arabic), whose relationship to Iraqi Arabic is roughly like the relationship between Latin and Italian. As a result, as she put it, "they could understand me but I couldn't understand them". I've heard that DLI used to teach a number of modern Arabic languages (often called "colloquials" or "dialects"), but stopped some time ago because the military's personnel system couldn't deal with the distinctions. As far as the personnel system was concerned, Arabic is Arabic; but sending someone trained in Moroccan Arabic to (say) Kuwait is like sending a Portuguese speaker to Romania. So DLI decided to stick with MSA, which is the language of formal discourse throughout the Arab world, though it's no one's native language. I don't know whether this is the reason, but it's certainly true right now that DLI teaches MSA rather than the local languages.

As a symptom of other sorts of communications difficulties, she cited the fact that it took her several months to locate a translation of the Bill of Rights into Arabic.

She's heading back to Baghdad next week for another tour. I wish her well.

Posted by Mark Liberman at 05:57 AM

July 14, 2004

The rest of the profession

I wrote to Rick Rickerson about his request on LINGUIST List for topic suggestions and script contributions for an NPR program on the apparently-soon-to-be-proclaimed Year of Languages (YOL). (Phew.) I offered to help, but I also mentioned to Rick that I was amazed that the linguistics community seemed to be hardly aware of YOL.

Rick replied and explained:

The "Year of Languages" idea was created at ACTFL [The American Council on the Teaching of Foreign Languages --EB] and (probably through David Edwards, the language profession's lobbyist in Washington) taken up by Sens. Dodd (CT) and Cochran (MO), who proposed it as a bipartisan Senate resolution. It was said to be on the Senate's agenda, apparently with no opposition, and to be approved a week or so ago. It is expected to be endorsed by the President as well. ACTFL has information about the Year of Langs on its website and has encouraged its membership to start thinking of ways to celebrate it; but perhaps they've been waiting until the resolution is passed before alerting the rest of the profession. In any event, I have forwarded your note to Bret Lovejoy, Exec. Director of ACTFL, so they will no doubt start spreading the word further.

Now I'll of course wait for Executive Director Lovejoy to spread the word, but not very patiently. What I worry about is this: are linguists -- by which I mean, perhaps too narrowly, language researchers in social sciences departments -- generally considered part of "the rest of the profession" by the folks at ACTFL and related organizations? I highly doubt it.

Not that this would be entirely the fault of ACTFL et al. I also highly doubt that many of us are or even would be members of ACTFL, etc. Many linguists (probably most, in the narrow sense I defined above) have little if anything to do with language instruction, foreign or otherwise (myself included, though my linguistics department is the home of most of UCSD's basic language instruction). It may be too much to ask individual linguists or linguistics departments to get more directly involved in issues of language instruction, but I don't think it's too much to ask for our organizations (most notably, the LSA) to do so, if only to the extent that this involvement benefits the field. As broadly defined as YOL seems to be, I'm sure we could find a niche or two that suits us.

Our next and best opportunity for public discussion is probably the Annual Meeting in San Francisco. (The Summer Institute in Cambridge is already more than halfway through the YOL, but could be the ideal location for a follow-up discussion.) Or, we could just keep talking about it here on Language Log and rely on Rick Rickerson's efforts on NPR. In any event, I think involvement of some sort behooves all of us. As Philip very nicely put it:

Theoretical questions do not have to be explored without reference to how they might be applied. As the post-Sputnik tide rose, it lifted theory along with engineering in research, in education, and in the national mindset. Why should linguistic theoreticians not benefit in a similar way?

Posted by Eric Bakovic at 04:04 PM

Crime wave

We call them eggcorns -- "tiny little poems, a symptom of human intelligence and creativity". The Guardian calls them "word crimes".

On July 7, Oxford University Press published a new edition of the Concise Oxford English Dictionary, and apparently sent around a press release that described an associated corpus-based study of punctuation, spelling and word-choice errors.

The Guardian reported this as follows:

One of the epidemic errors of the past 30 years - unnecessary, misplaced or omitted apostrophes in the words "its" and "it's" - has dwindled to only about 8% of people, possibly because the mistake has drawn so much ridicule...

But it has been replaced by misuse of "diffuse" or "defuse" (as in "A coach can diffuse the situation by praising the players").

Research for the new Concise Oxford English Dictionary, published today, found that this word crime was committed in some 50% of examples on the database. It is now rated as the commonest in the language.

The comparable story in the Independent describes these substitutions as "mass dyslexia", in quotes. I haven't been able to find the OUP press release, so I don't know if these striking examples of the "linguistic variation is crime" and "linguistic variation is disease" metaphors should be attributed to OUP or to the journalists involved.

As our on-going eggcorn collection indicates, I certainly recognize that things like using "diffuse" for "defuse" or "tow the line" for "toe the line" are non-standard word substitutions. But crime and dyslexia are pretty strong terms for sensible though incorrect ideas about linguistic analysis.

Posted by Mark Liberman at 09:06 AM

July 13, 2004

Negation by Association

Since I just 'fessed up to being a could care less speaker, I'm happy to learn from reader Lisa Davidson that Senator John Kerry came out as one too, in his 60 Minutes interview on Sunday:

Is the Democratic presidential candidate worried about being upstaged by his running mate?

"I could care less," says Kerry. "Look, this is about issues. It's about Americans; it's about problems. If he does a great job of going out, which I know he will, this is why I chose him - because I wanted the best person and I think he's the best."

Lisa added:

I saw this interview tonight (in fact, I have it on tape), and he definitely wasn't being sarcastic or ironic. He just used the wrong phrase.

Excuse me? The wrong phrase? Us carelessians believe that there are times when no other phrase will do.

Just to keep our coverage fair and balanced, I need to point out that President George W. Bush is also a could care less speaker:

(link) I don't care whether you're a Republican or a Democrat, or could care less about political party, you have an obligation to America.

John Kerry and George W. Bush are a bit older than I am, but none of us is a teenager. And I don't think any of us is being sarcastic when we say that we could care less.

Continuing my search through the LDC's archives of conversational speech transcripts, I've started to get a glimmer of what's really going on here. Consider these examples:

... they were just most of them -- were really just looking for money, and you know they didn't care less about the person ...
... a rocket up on Mars? I don't know about you, but I know -- why would I want to live up on Mars? Why would I -- uh why would I care less?

For many people, I think that "care less" has come to be an emphatic form of care, with a tinge of polarity about it -- something like "give a damn". Google finds plenty of other evidence:

(link) He was not a show cat, but I didn't care less about it.
(link) Who will win!? Do you care less?!
(link) What on earth was going on, and why should I care less anyway?

And in other cases, it seems that care less has just come to mean "not care", incorporating the negative even without a could around:

(link) Marcus Wilkins a player with ability but a player who cares less and just wants the paycheck..
(link) Michelle lives in an apartment who can barely afford to pay rent, who also lives with a druggie room mate who cares less about her, and her boyfriend abuses Michelle and she's afraid to break up with him.
(link) The same as my wife (an artist who cares less about motorcars and the like) can start her car and drive around happily, everybody could be able to install Amaya (or any other program) without worrying about the PC internals.

And it looks like John Lawler has picked up on this, some time ago. He calls it "negation by association":

Like could care less, give a damn is a Negative Polarity Item, that is, a phrase that is ordinarily used only within the scope of semantic negation of some kind (not, never, only, rarely, few, etc.). Hence the perceived strangeness of They could give a damn, which has no overt negative, but means the same thing as the same phrase with a negative. I.e, the business manager was saying that his members couldn't give a damn.

Give a damn is a member of the open Minimal Direct Object class of NPI's, like lift a finger, drink a drop, do a thing, eat a bite, etc. The implication of all of them is that, if one can't even Verb a Minimal Direct Object, why, then, one couldn't Verb any Direct Object at all. Thus it's an idiomatic intensification of a negative. But it does usually require a negative to intensify.

However, there apparently is such a thing as negation by association. Like what happened to French pas from ne...pas, which is now usable as a negative in its own right, from long association in the discontinuous morpheme with the overt negative ne, give a damn and could care less have, in American usage at least, come to have their own quasi-independent negative force.

Give a damn has been used independently of negatives for at least 25 years in America. I published a paper (J. Lawler, Ample Negatives, in Papers from the Tenth Regional Meeting, Chicago Linguistic Society (CLS 10)) in 1974 that remarked on this topic, among other negative phenomena.

This makes a lot more sense than Pinker's sarcastic teenage intonation theory. Which I will still test empirically, because the intonation part is fun, even if at this point you could care less about care less.

[Update: David Beaver wrote to point out that the alt.usage.english FAQ has an entry for could care less, just in case you could still care for more.]

[Update 7/14/2004: Jonathan Mayhew writes:

"I (could) couldn't care less" in Spanish is "no me importa un comino" OR "Me importa un comino." Both seem eminently logical to me. "It makes no more difference to me than a cumin seed," or "It makes as much difference to me as a cumin seed." Possibly something like this is going on in English. I can see someone saying "He gives a damn about his workers" to mean "He doesn't give a damn about his workers." That would confirm your analysis.

The Spanish case is similar, in that it's another example of what John Lawler called "minimal direct object" intensification of negation. It's a bit different, in that the literal meanings of the positive version ("it matters (only) a cumin seed") and the negative version ("it doesn't matter (even) a cumin seed") are not pragmatically very far away from one another.

The "give a damn" case is tricky, since one common colloquial generalization is that it's a positive intensive form -- as in the public service ad campaign of that name -- rather than generalization via "negation by association", as in Jonathan's example. I do agree with Jonathan that I sometimes hear "he gives a damn about ..." used to mean "he doesn't give a damn about...", though I can't locate any examples. And there's no question that "could give a damn" is used to mean "doesn't care at all".

As John Lawler pointed out, the traditional poster child for "negation by association" is French "ne...pas", where the historical sequence is (spelling aside) "je ne sais", "je ne sais pas", "je sais pas", all as ways to say "I don't know". Literally, "I not know", "I not know (a) step", "I know (a) step". Not that "pas" has meant "step" in this construction for quite a few centuries. ]

Posted by Mark Liberman at 09:28 PM

"Could care less" occurs more

A few days ago, I called into question Steve Pinker's story about could care less. Pinker claimed that could care less is not just a colloquial phrase meaning couldn't care less, because "The melodies and stresses are completely different, and for a good reason. The second version is not illogical, it's sarcastic".

I expressed the belief "that (a) the two phrases are not generally distinguished prosodically as Pinker asserts they are; and that (b) the cited prosodic difference would not as a general rule yield the asserted (sarcastic vs. non-sarcastic) difference in interpretation".

Let me hasten to say that this doesn't put me in the camp of those who sneer at could care less as foolish and illogical. In fact, I freely admit that I use this phrase myself, and though I'm often foolish and illogical, I don't take this to be an instance. Instead, I agree with the OED that could care less is just a colloquial phrase, which for whatever reason means essentially the same thing as couldn't care less.

In any case, I promised "to examine the examples in some conversational speech corpora to evaluate [Pinker's claim] (a), and show you all the pitch tracks". I also promised to "say more later about [Pinker's claim] (b), and how to evaluate claims like this".

Well, I haven't quite gotten there yet. This is partly because I've got a few other chores to deal with, but it's also because my first search produced a surprising result. According to the Switchboard conversational speech corpus, Americans of all ages are far more likely to use could care less than couldn't care less. In fact, in that corpus, couldn't care less doesn't occur at all.

The Switchboard corpus is "a collection of about 2400 two-sided telephone conversations among 543 speakers (302 male, 241 female) from all areas of the United States". I searched these for the word sequence "care less", and found eight relevant examples -- all eight of which are the could care less variant. The couldn't care less form simply doesn't occur.
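
For readers who want to try a similar tally on their own transcripts, here is a minimal Python sketch. It is not the search I actually ran; the regular expression, the `classify` helper, and the three sample turns (taken from quotations elsewhere in these posts) are illustrative assumptions, and it only handles plain-ASCII apostrophes.

```python
import re
from collections import Counter

# Hypothetical pattern: a form of "could" immediately before "care less".
PATTERN = re.compile(r"\b(could(?:n't| not)?)\s+care\s+less\b", re.IGNORECASE)


def classify(turn):
    """Return 'could', "couldn't", or None for one transcribed speaker turn."""
    match = PATTERN.search(turn)
    if match is None:
        return None  # no "could/couldn't care less" in this turn
    modal = match.group(1).lower()
    return "could" if modal == "could" else "couldn't"


# Sample turns quoted in these posts, used here only to exercise the helper.
turns = [
    "Well, she doesn't do much hunting, she could care less about that.",
    "you'll let them do whatever they feel like, and you couldn't care less.",
    "why would I care less?",
]

tally = Counter(label for label in map(classify, turns) if label is not None)
print(tally)
```

Turns like the third one, where "care less" appears without "could" or "couldn't", are simply skipped, so they would need to be inspected by hand.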

Here are the examples. Each is preceded by the conversational ID, the sex and birth year of the speaker, the start and end times of the speaker turn, and the turn ID.

sw2331 (M-1950) 341.82 347.24 A.105 And it's something we can do together, so. Well, she doesn't do much hunting, she could care less about that.
sw2352 (F-1940) 519.90 540.44 A.175 It'd be great but the problem with that is you have all of these little branches off the main problem and everyone is very concerned about one thing, you know, like the woman down the tip of Texas that, that calls her turtles and kisses everyone of them everyday. She's terribly interested in her turtles she could care less about shrimp --
sw2490 (F-1964) 253.10 269.04 A.103 Um. -- and, and, I think, and that's what they were going after, uh, they went to interview this town, I mean it's a little dinky town, they went to interview them about the gun control laws [lipsmack] and, uh, the police said, (( )) all the people said, that's fine we could care less. You know, we're all honest, and everything like,
sw2586 (F-1961) 380.88 383.73 B.142 Oh. And the little one, of course, is, could care less.
sw2749 (F-1969) 331.18 348.48 B.56 Probably, yeah [laughter]. Yeah. But, uh, no, it's, it's sad, though, that, uh, that the people, though, that you, you, somebody's life is in twelve people's hands and sometimes those twelve people could care less.
sw3090 (F-1942) 204.48 208.84 A.71 Uh-huh. our son could care less about a budget, and our daughter watches her pennies so closely,
sw3415 (F-1956) 110.70 113.40 A.45 I don't mind waiting, that, that, that's, I could care less --
sw3550 (M-1959) 261.74 264.64 B.126 Uh-huh. -- just to be a little, you know, little car that I could care less about --

As a result, this data is not enough for me to be able to check Steve Pinker's hypothesis about the prosodic marking of the contrast. However, we can certainly disprove his suggestion that the speakers who use this form are "teenagers". The conversations were recorded in 1991, so the speakers' ages at the time were 41, 51, 27, 30, 22, 49, 35, and 32. Not a teen in the bunch.
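
The age arithmetic is easy to check. A minimal Python sketch, assuming each age is simply the recording year minus the birth year (so it can be off by one, depending on birthdays); the function name is my own:

```python
RECORDING_YEAR = 1991  # year given above for the Switchboard recordings

# Birth years as listed in the eight examples above, in order.
birth_years = [1950, 1940, 1964, 1961, 1969, 1942, 1956, 1959]


def ages_at_recording(years, recorded=RECORDING_YEAR):
    """Approximate age in the recording year (ignores month of birth)."""
    return [recorded - year for year in years]


ages = ages_at_recording(birth_years)
print(ages)

# True if any speaker would have been a teenager at recording time.
print(any(13 <= age <= 19 for age in ages))
```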

But I haven't given up on testing the "prosody of sarcasm" hypothesis for this contrast.

Luckily we've recorded and transcribed (though not yet published) quite a bit more conversational speech. Looking at one of these other sources, I found eight could care less examples, and one of the couldn't care less variants. And in another collection, I hit the mother lode: 32 examples of couldn't care less, against 37 examples of could care less. (I'm a little worried about the last set, though -- although it was recorded in the U.S., it was transcribed in New Zealand...) If the couldn't care less examples in the last set are not just transcription errors, that'll be plenty to get a good evaluation of intonational differences, if any. Once I track down the audio, anyhow. So stay tuned.

Posted by Mark Liberman at 03:57 PM

White bred

Reader Maya Hermann emailed:

... I think I've found an eggcorn in the wild. Either that, or I've been unwittingly using an eggcorn for years.

The word in question is the substitution of "white bred" for "white bread" to mean homogeneous or plain.

Maya cited this comment, attributed to "Lynne", on an Atrios thread:

I agree wholeheartedly about familiarity. I know my parents are good people at heart, but living in white-bred sub-rural New Hampster, they get hardly any experience with people of color, or any other minority for that matter. I know my mom wouldn't make the comments she does if she had a friend or family member who was black, gay, whatever.

Thank god I moved to MA...to a city with a lot of minorities. I feel much more exposed to life here. Yes, MA is still primarily white-bred, but nothing beats northern NE for closed-mindedness.

It's hard to be sure about Lynne, in this case, since she also uses the jokey substitution "New Hampster" for "New Hampshire" -- maybe white-bred is also a joke, for her? However, there are plenty of examples on the net that seem to reflect sincere misunderstandings. For example:

(link) If you want to hear four white-bred European jerk-offs sing the tunes of ABBA, why wouldn't you just go and buy an ABBA record?
(link) “(Pepperdine) doesn’t have any culture at all – it is very, very white-bred,” said senior Julee Bailey, who is black.
(link) I'm sure there's market research, where it's saying, (in a horribly stuffy white-bred voice) "We've found that the white background on a video cover sells more units than the…."

We've been using eggcorn as a term for the kind of sporadic folk etymology represented by interpreting acorn as "egg corn". "White bred" for "white bread" is an excellent example of the subspecies where the sounds are not just similar, but identical, and where the misinterpretation makes at least as much sense as the original.

The AHD glosses white-bread as "Blandly conventional, especially when considered as typical of white middle-class America"; this is a metonymic generalization of white bread as "Bread made from finely ground, usually bleached wheat flour". The original thought process seems to be that by the middle of the 20th century, white bread had become the standard form of American bread, but was also relatively soft, bland, textureless and geographically uniform. The generic opposition of the term white bread to brown bread probably also helped provoke the metonymic generalization from food to culture.

According to the OED, the use of white bread for actual bread goes back to the 14th century, and brown bread has citations from 1489 onward. The OED lists white-bread, a. as "colloq. (orig. and chiefly N. Amer.). Freq. depreciative", glossed as "Of, belonging to, or representative of the (North American) white middle-classes; bourgeois; (hence) strait-laced, conventional; bland or innocuous." The earliest citations given are these:

1977 Newsweek 3 Oct. 60 He [sc. Richard Pryor] walked off the Aladdin Hotel stage in Las Vegas, fed up with doing ‘white bread’ humor.
1979 TV Guide 13 Jan. 30/2 The contrast between his white-bread liberalism and the boys' ghetto wit is the basis of all the comedy in Diff'rent Strokes.

However, I'm pretty sure that I heard the same usage at least as early as the middle 1960s.

Posted by Mark Liberman at 05:46 AM

July 12, 2004

Conundrums, quibbles and clenches

The Tour de France is taking a day off, and so Samuel Abt has been reduced to writing an article about headline puns on the name of Iban Mayo, now in 89th position, 15:02 behind the leader. "Mayo On a Roll", "Salad Days", "Hold the Mayo", and so on. You can imagine.

To quote Stephen Maturin:

'He that would make a pun would pick a pocket,' said Stephen, 'and that miserable quibble is not even a pun, but a vile clench.'

Then again, it was also Stephen Maturin who explained that the "dog watches" are so called "because they are curtailed".

Dryden -- the same one who invented the "rule" against stranded prepositions, in criticizing Ben Jonson -- is said to have complained of Shakespeare's "comic wit degenerating into clenches".

Another odd piece of pedantic punology is the fact that conundrum (according to the OED) first meant "pedant, crotchet-monger, or ninny" (from 1596), and then "a whim, crotchet, maggot, conceit" (from 1605), and then "a pun or word-play depending on similarity of sound in words of different meaning" (from 1645).

The same is true for quibble, which first meant "a play upon words, a pun", which is the sense intended in Johnson's dictionary definition of conundrum:

1755-73 JOHNSON, Conundrum, a low jest; a quibble; a mean conceit: a cant word.

Posted by Mark Liberman at 04:08 PM

Citing to riot

Janet Maslin has a rave review of Carl Hiaasen's new book "Skinny Dip" in today's NYT. She calls it "a screwball delight so full of bright, deft, beautifully honed humor that it places Mr. Hiaasen in the company of Preston Sturges, Woody Allen and S. J. Perelman." I'm a fan of Hiaasen myself, but I'm writing about the review because I can't figure out what Ms. Maslin thinks the word syntax means.

Beyond its lean, clean prose and riotous syntax (" `I did not feed Bert Miller's dog to my snakes,' he said, almost adding: But accidents happen"), "Skinny Dip" has the advantage of a well-populated plot that doesn't go overboard.

There's also a question here about riotous. Given the Sturges-Allen-Perelman set-up, I took Maslin to mean that Hiaasen's syntax is "riotously funny", but maybe she just meant that it's "uproarious, boisterous", or "abundant or luxuriant".

I'm not sure that I share Maslin's opinion that anything about the Hiaasen quote is riotous, either in the sense of being very funny or in the sense of being out of control in a chaotic way. Here's the quote again:

`I did not feed Bert Miller's dog to my snakes,' he said, almost adding: But accidents happen.

Riotous? Maybe. But riotous syntax? What does that mean?

Is the riotous part "'I did not VERB NOUNPHRASE to NOUNPHRASE,' he said"? Not likely.

What about the adjoined participial phrase "ADVERB VERBing: CONJUNCTION NOUNPHRASEs VERB"? No riot here, officer.

Could it be that participial phrase ("almost adding...") stacked up after the attributive tag ("he said")? No, Elmore Leonard does that kind of thing all the time, and his prose is as sober as you can get.

I'm not trying to give Janet Maslin a hard time here, I'm genuinely puzzled about what she was trying to convey about Hiaasen's style. The example she gives does illustrate something that I like about his writing, namely his ability to suggest cultural and emotional depths by presenting striking and even outrageous details in a deadpan way. You can find this on just about any page -- I just took Basket Case down from the shelf, opened it to a random place (page 40), and read:

    Janet Thrush -- who else could it be? -- takes the stool next to me and says, "First off, nobody calls me Jan."
    "Deal."
    "It's Janet. My ex once called me Jan and I stuck a cocktail fork into his femoral artery."
    I am careful to display no curiosity about the marriage.
    "So, Janet, exactly how did Cleo Rio scam me?"
    "She lied about her new record -- 'Waterlogged Heart' or whatever. Jimmy's not producing it."
    Janet has freckles on her nose and unruly ash-blond hair and green bulb earrings the size of Yule ornaments. She's wearing Wayfarers and a pastel tube top over tight jeans, and looks at least five years younger than her brother.

Funny, yes. Intriguing, yes. Riotous syntax? I'm still baffled.

Last November, William Ivey Long (Harvey Fierstein's dress designer) commented on the diverse construals of Harvey's Mrs. Santa Claus outfit by observing that "the semantics are confusing". I pointed out that Long is clearly using semantics in the ordinary language sense of 'what things mean'. I explained that linguists and philosophers find this usage inappropriate, in terms of the historical origins of the word semantics as well as its present-day meaning as a term of art, especially in contrast to pragmatics, but it's an understandable and unavoidable generalization. We just have to accept that most people use semantics for layers of speculation about communicative intent, exploration of associative nuances, and anything else that's part of a meta-commentary about interpretation. It makes no more sense for linguists to complain about the ordinary-language meaning of semantics, at this point, than it would for physicists to complain about the ordinary-language meaning of force.

So I'd be happy -- or at least willing -- to concede syntax as well to Janet Maslin. If only I could figure out what she wants it for.

[Update: if you're not already familiar with it, Ron Hogan's Maslin Watch feature at beatrice.com is well worth reading. On a quick scan, I don't find anything that helps me to understand the intended meaning of riotous syntax. However, I do begin to get the feeling that perhaps it was a mistake to assume in the first place that "the intended meaning of riotous syntax" is a referring expression...]

[One final aside: I can certainly imagine what riotous syntax might be like, at least in the "boisterous" or "luxuriant" sense of riotous: the structure of highly interactive extemporaneous discourse, which "as sudden torrent in time of speat in the mountain / Hurries six ways at once, and takes at last to the roughest", to quote Arthur Hugh Clough. But that's not Hiaasen.]

Posted by Mark Liberman at 09:22 AM

July 11, 2004

Onze Taal and Perlentaucher

In discussing an article by Thierry Chervel on Europe's failure to take advantage of the internet, I questioned the author's claim that a key problem is the dominance of the English language. "Look at Onze Taal, for example", I wrote.

Marc van Oostendorp responded by email:

Thank you! I share your analysis that "Europeans may be less willing to try informal, inexpensive, bottom-up experiments, of the kind that were involved in the very early stages of Amazon and Google".

Onze Taal is very much an exception, maybe because it is exceptional in other ways: an association of approximately 40.000 'language lovers' (most of them obviously not linguists), and probably the largest association of its kind in the world. Originally (back in 1997) our plan was to cooperate with a few other Dutch organisations, but because they suffered from the problem you sketched (we do not want to start until we have the money, but potential sponsors obviously do not want to give money until they can see what it is that we want to do), we decided we would start independently.

Our website is indeed informal and inexpensive. I am the only person who is employed to do some work for it -- I have been hired by the association for four hours a week, but I also write articles for the monthly journal (the other days I work as a researcher). There are also two editorial assistants to the monthly journal who are allowed to spend some working time on it, esp. for the Taalnieuws. The design, scripting etc. is done by us as well. The website is hosted at a commercial internet provider; the costs of this are a few hundred euros a year. The association Onze Taal thinks it is worth it, because the web site attracts new members, and the association has always seen it as its goal to inform the general public about language. The site attracts approximately 3000 visitors each day. A large majority (70%) is from the Netherlands; 20% are from Flanders (most people there would consider us to be still too 'Netherlandish').

I wonder what other organizations like this exist? If there's anything similar in the English-speaking world, I don't know about it.

Quite apart from his work on Onze Taal, Marc is a serious and accomplished linguistic researcher, with many interesting papers and courses linked on his web site.

Marc also wrote that

I am a fan of Language Log. It is very useful that you often give sensible reviews of misunderstandings in the media.

Well, I hope that they are sensible. There are certainly plenty of misunderstandings to choose from. Of course, we have to accept it cheerfully when others point out misunderstandings and omissions in Language Log, as Margaret Marks did by email with respect to the same post:

...the small bio of Chervel says he was a co-founder of Perlentaucher, and Perlentaucher is a really excellent website that reports on the literature pages of all the German-language newspapers. So even if he hasn't got his own website, he is involved with a good one!

I'll confess that I did find Perlentaucher, both by Googling "Thierry Chervel" and from the link on his little Eurozine bio page. Originally I wrote a paragraph about it, which I cut because the post was pretty long, and I didn't have time to figure out how to integrate it, and so on. However, as a result the post is unfair to Chervel, twitting him for lacking a home page when he is responsible for a major German-language A.L.D.-like site. And my omission also means that I failed to underline one important aspect of Chervel's complaint, namely that he apparently envies Denis Dutton of A.L.D. the (surely much wider) range of English-language web content to draw on.

However, as I wrote back to Margaret:

I might argue that this just reinforces my point. Chervel is obviously web-savvy and so on, but his web participation so far might be seen as indicating a certain kind of formality, and a certain threshold for "seriousness" and level of investment, which tends to inhibit the rather anarchic low-level experimentation out of which most new ideas and new applications develop.

And let me point out that A.L.D.'s editor Denis Dutton -- like Marc van Oostendorp -- has an extensive and interesting personal web site.

Posted by Mark Liberman at 11:58 PM

Typescript finished: a milestone and a rant

A major milestone was reached today in the process of completing A Student's Introduction to English Grammar, the grammar textbook that Rodney Huddleston and I have been working on for over a year. After a marathon work session at his beach house overlooking the Tasman Sea, Rodney completed the analog of what software engineers call "the build": he put together all the pieces and prepared and debugged the final complete electronic version of our book for Cambridge University Press. It has just been emailed from Brisbane (where Rodney's mail server is) to Cambridge, England (where the publisher is), with a copy to Santa Cruz (for the second author, yours truly). And how do I feel with respect to the technology that permitted this? I'm impressed, and furious, and grateful, and disgusted. Let me explain. Or not, if the last thing you want today is to read a rant. This rant is dedicated to Wolf Angel, who understands that the world is a place that one might occasionally want to rant at.

The Internet is of course a major boon to scholars and has changed my life for the better. Once it would have taken two months to get a bulky double-spaced manuscript from Australia to England as a parcel; now it takes roughly six seconds. Six seconds during which our book changed status from "in preparation" to "in press". What's more, hardly any of the six seconds was travel time. It was taken up with waiting for router boxes and CPUs and disk drives on intermediate servers to get to their tasks. The majority of their work involves forwarding spam. Some counts say that as much as 80% of today's email is spam.

(While waiting for my message with the final-build book typescript to arrive I received a message headed "Nude blonde raped" that eluded my spam filter completely. It contained just some ASCII gibberish strings to fool the spam filter into thinking there was ordinary text there and some HTML code with a URL to click on if I was intrigued by the idea of seeing pictures of a nude blonde being raped. I didn't visit. I did resent the intrusion of this unwanted extra communication slipping through the spam net. But even if we could find out who controlled the website that the rented domain name pointed to at the time, and we could also find the sender of the message, we could never gather convincing proof that the owner of the website had authorized the sending of the spam that was supposed to direct traffic to him; he would say he knew nothing about it, and the spam-sender would say he knew nothing about the website or the sending of the messages, and prosecution of either or both would fail. We legitimate Internet users can never stop the spammers by ordinary legal means. We have to kill them. But I digress.)

If it were not a matter of waiting for intermediate machines to get to the task of routing the message, a journey of a mere 12,000 miles, at the speed of electrons, would take only about 64 milliseconds. But this is a digression too. Let me struggle to get back to whatever my point was; I know I had one.
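The 64-millisecond figure can be checked with one line of arithmetic. This is only a rough sketch: it assumes the signal covers the full 12,000 miles at the vacuum speed of light (about 186,282 miles per second), which real cables and fibers do not quite manage.

```latex
\[
t \;=\; \frac{d}{c} \;\approx\; \frac{12{,}000\ \text{miles}}{186{,}282\ \text{miles/s}} \;\approx\; 0.0644\ \text{s} \;\approx\; 64\ \text{ms}
\]
```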

Internet technology was indispensable to the production of our book. I've done the trip to Australia to work directly with Rodney on The Cambridge Grammar five times already, spending in total about a year of my life there, and it's a long, long way, and it's expensive. I couldn't make the trip during the writing of this book. We had to do it entirely by exchanging WordPerfect files in email attachments. The final email message was 2,233,589 bytes — way bigger than what limits on message size used to permit. Came through without a hitch. Decoded like magic under Linux (I use the nail program; old-fashioned, but because I never use a Windows-based downloading email program, I am totally virus-proof). Downloaded to the Windows machine in a blink. Loaded under WordPerfect 11 in a split second. I should be grateful to the industry that made this possible, the industry that gave us the Internet and word processing software. And yet... Let's just say that I'm not a generous-hearted enough human being to be capable of unalloyed gratitude, given what happened toward the end of the build.

During the last frantic day of work, Rodney found he had a single chapter for which the file appeared to be corrupted. On its own, it could be loaded, viewed, and edited. But if he added it to the previous chapters, WordPerfect would freeze when he tried to look at the whole thing. If he split the book into two parts and tried to make the problem file the first file of the second part, then the second part couldn't be loaded. He worked on the problem in lonely agony for four hours straight. He struggled to find a way of using the file, by desperate strategies like retyping parts of it near where the freeze-up seemed to occur. Then eventually he told me about it.

As soon as I read his email with the description of the problem (it came in just after one telling me where I could obtain videos of incest rape if I was fresh out of them), I began thinking: open brackets. Like LaTeX groups that were never closed, or maybe nested keeps. It wasn't hardware memory limits -- the whole book could be loaded all at once as long as the bad chapter on clause type wasn't in there. But something in the file was causing a buffer to open but never close. And I suspected Block Protect.

WordPerfect allows you to mark a block of text as not permitted to be interrupted by a page break. But it is unfortunately possible to not notice that you're in one of these protected blocks, and start putting other material, perhaps dozens or scores more pages, into it. And you can start another protected block inside it. WordPerfect really doesn't like that. (Why am I using WordPerfect if I object to its behavior? Compatibility with a colleague who selected it in 1989 and has built up a library of millions of words and hundreds of macros. I already explained a bit about my experience with WordPerfect here and later here.)

So I began looking for over-extended Block Protects in the apparently corrupted file, and that's what I rapidly found. A Block Protect opened, pages and pages went by, and eventually another one opened; the memory organization couldn't keep track of the illegal structure, and WordPerfect would just freeze up and stop working. Dozens and dozens of times over, without ever giving an error message that would provide a clue as to what was going on.

I deleted the first Block Protect character, and everything was fine. I sent the file to Australia and five minutes later the build could proceed. A happy ending? Well sorry, but not happy enough. I am actually furious, looking back. In the 1970s, the Unix formatter troff was of amply high enough quality to format a professional-looking printed book, it ran fast and was not crippled with bugs, and it had the capability to spot what it calls "illegal nested keeps", or blocks that are opened but haven't been closed when the file ends, or blocks that are closed when they have never been opened. TeX could (and can) do similar error diagnosis too. It is not beyond the powers of software engineering to spot problems of this kind and warn about them in the parsing phase, exiting gracefully whatever happens (by which I mean, leaving the operating system still functional; when I was still using WordPerfect 6.1 on a Windows 95 machine I noticed that every WordPerfect crash would necessitate rebooting the entire machine with loss of all unsaved data in open programs, because after a crash it was not possible for Windows to reinitialize WordPerfect and open it again).
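To make concrete how little is being asked for, here is a minimal sketch in Python of the kind of check described above: one pass over a file that reports blocks opened inside other blocks, blocks closed without having been opened, and blocks still open at the end. The marker strings `<<KEEP>>` and `<<END>>` and the function name `check_blocks` are invented for illustration; they are not WordPerfect's internal codes, and this is not a description of how troff or TeX actually implement their diagnostics.

```python
import re

# Hypothetical markers standing in for the start and end of a protected block.
OPEN, CLOSE = "<<KEEP>>", "<<END>>"
MARKER = re.compile(re.escape(OPEN) + "|" + re.escape(CLOSE))


def check_blocks(text):
    """Return a list of (line_number, message) problems found in text."""
    problems = []
    open_line = None  # line where the currently open block began, if any
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in MARKER.finditer(line):
            if match.group() == OPEN:
                if open_line is not None:
                    # Nested open: report it, keep tracking the outer block.
                    problems.append(
                        (lineno, f"block opened inside block from line {open_line}")
                    )
                else:
                    open_line = lineno
            else:
                if open_line is None:
                    problems.append((lineno, "block closed but never opened"))
                else:
                    open_line = None
    if open_line is not None:
        # End of input reached with a block still open.
        problems.append((open_line, "block opened but never closed"))
    return problems
```

Run on a file that opens one block on line 2 and another on line 4 without closing either, the function returns two problems, each with a line number, rather than freezing.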

By 1990, what was the improved state of affairs? We had word processors that were orders of magnitude bigger in object code and disk space required (libraries of absurd crap like clip art), and ran slower, with editing capabilities that didn't have the regular-expression searching that Unix editors have, but the formatting algorithm didn't come with the capability to warn about what errors it had encountered in a file containing a typing error. WordPerfect would just crash or freeze or "become unstable" when something ugly happens in the document, without an error message. (Don't ask about Microsoft Word. It is, as always, much worse: WordPerfect has the crucially important Reveal Codes feature, which permits the user to look into the file as encoded internally and see what hidden formatting codes are in there. Microsoft won't supply that, although they could. Instead they put an entry in their Help files explaining that you don't need that. Right. Ve vill tell you vhat you need, worthless enduser scum!) I'm saying that word processing technology moved backward to a considerable extent in the period 1980-1990.

Word processors. Can't live with 'em, can't live without 'em. The project Rodney Huddleston and I have just completed was only possible because of modern word processing technology, which enabled us to exchange dozens of drafts a day across the Pacific in seconds. Yet at the same time, modern word processing technology nearly killed it by allowing an algorithmically detectable but unreported and obscure document content error to crash the machine, giving no error message, when one file was concatenated with another, though not when the former was opened. We nearly couldn't deliver a typescript at all. If software technology was automobile technology, the roads would be running with blood and thousands of car manufacturers would be in jail. Where they would deserve to be.

Now we move forward to the stage where a copy editor will work through the typescript, and unless Cambridge University Press has warned them about who we are, they will start changing our whiches to thats and our sinces to becauses and moving our punctuation to the other side of our quotes where we didn't want them and so on and so on. I will try to keep you (or at least WolfAngel) informed with regular rants.

Posted by Geoffrey K. Pullum at 07:24 PM

Sputnik and Language

Eric's posting about the Year of Languages led me to the statement of the program's background, which had me rolling on the floor laughing. A favorite tidbit: "The United States has a history of multilingualism . . . Jefferson himself was fluent in French and could communicate and read in several additional languages. For more than two hundred years, Americans have continued this tradition of valuing other cultures as successive waves of immigrants settled in the United States..." Yup, a great tradition, exemplified so nicely by our nation's coining the term "liberty sandwich" to replace "hamburger" during World War I (during Woodrow Wilson's administration, "[p]laying German music and teaching -- or even speaking -- the German language were prohibited" [Houghton Mifflin, Reader's Companion to American History]), and exemplified more recently by all those good multicultural GOP lawmakers who suggested that we replace "French fries" with "freedom fries" on our beloved fast food menus.

Nonetheless, having been reading a little bit about the space program recently, I think it's worth considering the analogy between today's interest in foreign languages and the surge of math/science interest after the Sputnik launch of 1957.

There may be longstanding gaps between theoretical linguistics and language applications, but the vital national interest in foreign language capabilities (both human and technological) is an opportunity to build bridges that it seems just plain silly to ignore. Fundamental questions from Chomskyan Linguistics 101 are already rising to prominence outside the syntactic orthodoxy. Questions like "what is a language?" (since "Arabic" connotes two very different things in Yemen and in Egypt, but there's a relationship between the two). Questions like "how does linguistic form communicate underlying meaning?" (since characterizing performance on language tasks, such as human and machine translation, involves both). Questions like "how can theories about language be made internally consistent?" (since inconsistent theories are hard to operationalize).

The question is, will linguistic theory rise to the occasion? Theoretical questions do not have to be explored without reference to how they might be applied. As the post-Sputnik tide rose, it lifted theory along with engineering in research, in education, and in the national mindset. Why should linguistic theoreticians not benefit in a similar way?

Posted by Philip Resnik at 04:09 PM

The interpretation of stones and typefaces

Is meaning something that sentences have, or is meaning something that people do? And what about rocks, fonts and cities?

In an essay "Meaning and Truth" (in Logico-linguistic Papers, 1971), P.F. Strawson put it this way:

What is it for anything to have a meaning at all, in the way, or in the sense, in which words or sentences or signals have meaning? What is it for a particular sentence to have the meaning or meanings it does have? What is it for a particular phrase, or a particular word, to have the meaning or meanings it does have?
[ . . .]
I am not going to undertake to try to answer these so obviously connected questions. . . I want rather to discuss a certain conflict, or apparent conflict, more or less dimly discernible in current approaches to these questions. For the sake of a label, we might call it the conflict between the theorists of communication-intention and the theorists of formal semantics. According to the former, it is impossible to give an adequate account of the concept of meaning without reference to the possession by speakers of audience-directed intentions of a certain complex kind. . . The opposed view. . . is that this doctrine simply gets things the wrong way round. . . the system of semantic and syntactical rules, in the mastery of which knowledge of a language consists -- the rules which determine the meanings of sentences -- is not a system of rules for communicating at all. The rules can be exploited for this purpose; but this is incidental to their essential character. It would be perfectly possible for someone to understand a language completely -- to have a perfect linguistic competence -- without having even the implicit thought of the function of communication . . .

I've elided many of his qualifications and asides, but you probably get the drift. And despite the elisions, you're also probably getting a message from his prose style: here's someone trying to express complex ideas with great precision, nuance and care.

All the same, there's one aspect of ordinary-life and ordinary-language meaning that's left out of discussions like this. Strawson frames the pragmatic view of meaning in terms of "communication intention", and talks about "the possession by speakers of audience-directed intentions of a certain complex kind". Others (like Paul Grice) use the term "speaker meaning". However, interpretation -- attribution of meaning in some sense -- also takes place in contexts where there is no individual "speaker", no well-defined "intention" construed as the psychological state of an individual, and no fixed audience. The object of interpretation may be the result of a complex interaction among many people and various non-human forces and contingencies; it may have other functions besides communication; and the interpretive context may develop over time, adding layers that everyone comes to accept as part of the "meaning" of the object being interpreted. Despite this, the kinds of interpretation involved often seem much more reminiscent of the ("pragmatic") interpretation of speaker meaning than the ("semantic") interpretation of sentence meaning.

It's tempting to reject such cases as parasitic on the more basic pragmatics of "real communication" between individuals, and full of confusions engendered by fuzzy metaphorical extension of a problem that's hard enough in its core case of Kim informing Leslie that the water has boiled for tea. If this is true, it's too bad, since it would rule out (for instance) applying pragmatic reasoning to the law.

But that's a topic for another post. Today I'm interested in the meaning of building stones and typefaces. On July 4, the cornerstone of the Freedom Tower was laid at the World Trade Center site in NY City.

On July 8, the NYT ran an article by David Dunlap (permanent link unfortunately not available) that used the language of communication-intention freely, as if the stone was a sentence and the typography its intonation. Dunlap's theme is that the cornerstone's message was an ambiguous one:

As the first tangible element of the Freedom Tower - and, by extension, the trade center redevelopment - and as an image seen nationwide on Independence Day, the cornerstone sent an aesthetic signal of intent.

And the signal seemed to reflect the inherent ambiguity of the project: a solemn memorial to 2,749 lives lost in the worst single catastrophe in New York history that is simultaneously supposed to be a defiant restatement of the city's commercial gigantism.

Seen one way, the cornerstone's darkness and plainness are memorial, even funereal. Seen another, the radiant silver-leaf letterforms conjure the exuberant, modernist, midcentury optimism of New York even as they augur the glass and stainless-steel tower to come.

Much of the article focuses on the meaning of the choice of font. The selected Gotham font was designed in 2000 by Tobias Frere-Jones and Jesse Ragan, of the firm of Hoefler & Frere-Jones. The article quotes a representative of the firm:

"It's one of those typefaces that's open to interpretation," Mr. Hoefler said. "That makes it a good match for this monument."

The article also dissects the meaning of the choice to use all capital letters:

Lines of all-capital lettering, intended to enhance the cornerstone's formality, may have diminished somewhat the idea that it commemorates people and spirit. "Use of upper- and lowercase would have democratized the message, removed its institutional pretensions," said John Kane, the author of "A Type Primer" (Prentice Hall, 2003). "Lowercase would have given the words a human voice."

The 1,100-word article ends with this:

Ann Harakawa, a principal in the Two Twelve Associates design firm, whose office at 90 West Street was destroyed on 9/11, said the typeface was simple, legible and, given its New York provenance, very apt. "The idea of it being slightly ambiguous is interesting," she said, "because no one has any idea of what's going to come."

The cornerstone itself bears these 26 words:

To honor and remember those who lost their lives on September 11, 2001 and as a tribute to the enduring spirit of freedom. July Fourth 2004.

Appropriately enough (as Arnold Zwicky pointed out yesterday), the only letter that the Times has printed in response to this article was from a copy editor named Betsy Wade, who complained about the punctuation of the inscription:

The section of the 9/11 cornerstone inscription depicted in the accompanying photograph clearly shows that the grammatically necessary comma after "2001" ("on September 11, 2001") is absent.
As a longtime editor, I hope that the artisans will be able to correct this omission in the handsome Gotham typeface.

I can't resist echoing Geoff Pullum's plea: where are the T-men when you need them?

Posted by Mark Liberman at 11:10 AM

A post post post toast post toast post toast post toast post

In his original toast post Eric observed that the slippery piece of toast that always hits the floor jelly side down is odd, or at least implies multiple misfortunes for each piece of toast. In my post toast post toast post I noted that the problem is that always is stuck deep in a relative clause, and cannot manage to get hold of the piece of toast at the head of the clause. It would need to do that in order to get wide scope, so as to say for each slippery piece of toast that it hit the floor jelly side down.

But Mark was not satisfied. In his post post toast post toast post toast post he pointed out that sometimes always does seem to do the job that I apparently said was impossible.

Here are two of Mark's examples:

Their sourdough is a light and crispy toast that always hits the floor marmalade side up.
My mother used to make a delicious rye toast that always hit the floor cream cheese side up.

Is there some difference between the slippery piece of toast that and a light and crispy toast that which allows always to get wide scope in the second case, but not in the first? The answer is no. But there is a crucial difference between the examples which gives a similar effect. The slippery piece of toast has a limitation that a light and crispy toast lacks.

Don't get me wrong: I like pieces of things: that piece of paper, a piece of the Berlin wall, and of course, the slippery piece of toast. Nothing against them, no siree. They come in all sorts of shapes and sizes, and, they are almost invariably just delightfully tangible. Oooh, you know what? I'm holding a piece of something in my hand right now and crinkling it.... yum. Even the ones that aren't tangible are eminently givable. I would love to give Mark a piece of my mind. And some of them are so terribly useful. Don't know what we'd do without them. So I hate to criticize the little lovelies. But no more beating about the bush, I'll just come right out with it. Pieces of things don't make good natural kinds. (Ouch: it was no fun saying that, but sometimes as a professional linguist you've just got to ignore your own personal feelings and come right out with the facts.)

Here are some potential kind-denoting expressions: dinosaurs, tall buildings, freak thunderstorms, Danish crispbread, Mark Liberman's blog postings, the long vacation, toast. I say "potential" cos while you can use these expressions to refer to a kind, you can also use them to refer to sets of things, or, in the case of crispbread, a mass of the stuff. What's the difference between a kind denotation and a set or a mass? The ultimate source on this question is Greg Carlson's 1977 UMass PhD dissertation Reference to Kinds in English. There are ways to tell whether you've got a kind NP on your hands, since some predicates like kinds, but not other things. The classic one is is/are extinct, but I prefer is/are more common than ever, and has/have three subtypes, which are more general in their applicability.

Dinosaurs/tall buildings/freak thunderstorms/Mark Liberman's blog postings are more common than ever.
Dinosaurs/tall buildings/freak thunderstorms/Mark Liberman's blog postings have three subtypes.
Danish crispbread/the long vacation/toast is more common than ever.
Danish crispbread/the long vacation/toast has three subtypes.

What sorts of NPs can't refer to kinds? Well, here are some: Mark, my cat, a certain friend of mine. You can tell they are not good kinds, because they don't fit in at all well with my two favorite kind predicates:

* Mark/my cat/a certain friend of mine is more common than ever.
* Mark/my cat/a certain friend of mine has three subtypes.

So now we come to all those pieces of things. These, unfortunately for the original slippery toast example, do not make good kinds:

* That piece of paper/a piece of the Berlin wall/a piece of my mind/the slippery piece of toast is more common than ever.
* That piece of paper/a piece of the Berlin wall/a piece of my mind/the slippery piece of toast has three subtypes.

Here is a slight complication: Elster, the author of the original slippery toast sentence, really was trying to conjure up the notion of a kind, i.e. an awkward kind of food. But I believe he was trying to conjure up that kind by referring to an imagined prototypical example. If it referred to anything, I guess the slippery piece of toast would refer to this prototypical exemplar of the kind, not to the kind itself. This is an awkward idea to get straight, and given the stark results of the above kind-denoting predicate diagnostics, I'm just going to gloss over it. Grant me that the slippery piece of toast cannot refer directly to a kind. So what? We'll have to get a little technical now. Consider an example Mark found:

Their Barbera is a fun and fruity wine that always pleases us

Here the indefinite a fun and fruity wine can be paraphrased as a fun and fruity kind of wine. But always does not take wide scope over this indefinite, which would allow it to quantify over kinds of wine. That is, the meaning of the relative clause in this example does not look like the following gloss, even after pragmatics has filled in the fact that we are talking about events in which someone drinks or tastes the beverage in question:

every e (e is an event involving [drinking] a kind of wine IMPLIES e is an event in which that kind of wine pleases us)

No, what always quantifies over is events involving that particular kind of wine. The meaning of the whole NP a fun and fruity wine that always pleases us can be given as:

a kind of wine x such that RC

The meaning of the relative clause, RC, should then be:

every e (e is an event involving [drinking] x IMPLIES e is an event in which x pleases us)

The crucial point here is that you don't drink or get pleased by kinds as such. Rather, you drink or get pleased by exemplars of the kind. That's what it means to drink or get pleased by a kind. So because of the way we interpret predicates applied to kinds, the meaning of the relative clause, RC, is equivalent to:

every e (e is an event involving [drinking an exemplar of] x IMPLIES e is an event in which [that exemplar of] x pleases us)
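For readers who find code easier to follow than logical glosses, here is a minimal Python sketch of that last formula. It is only an illustration: the `DrinkingEvent` record, its field names, and the sample bottles are invented for the example and are not part of any standard semantic formalism. The point it shows is that the universal quantifier ranges over events, each of which involves a different exemplar, while the kind itself stays fixed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DrinkingEvent:
    kind: str          # kind of wine the exemplar belongs to (hypothetical label)
    exemplar: str      # the particular bottle or glass consumed in this event
    pleased_us: bool   # did that exemplar please us on this occasion?


def always_pleases(kind: str, events: Iterable[DrinkingEvent]) -> bool:
    """every e (e involves drinking an exemplar of `kind`
    IMPLIES that exemplar pleases us in e)."""
    return all(e.pleased_us for e in events if e.kind == kind)


# Invented sample data: two occasions with Barbera, one with Merlot.
events = [
    DrinkingEvent("barbera", "bottle-1", True),
    DrinkingEvent("barbera", "bottle-2", True),
    DrinkingEvent("merlot", "bottle-3", False),
]

barbera_ok = always_pleases("barbera", events)
merlot_ok = always_pleases("merlot", events)
```

Note that `always_pleases` never quantifies over kinds. The kind is passed in from outside, just as x is bound outside the relative clause, and only the events (and with them the exemplars) vary.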

Let's check one of the above toast examples from Mark, a light and crispy toast that always hits the floor marmalade side up. This must mean:

a light and crispy kind of toast x such that RC

Here RC is:

every e (e is an event involving [dropping an exemplar of] x IMPLIES [that exemplar of] x hits the floor marmalade side up)

Hey presto - a light and crispy piece of toast gets to do something a slippery piece of toast couldn't! I realize that I haven't been able to explain this puzzle without introducing some technicalities, and I've left many further complications out of the discussion. None the less, I hope the outlines of the explanation are clear. Here is the whole story in five hard and fast bullets best taken straight to the head:

The slippery piece of toast is not kind denoting;
A light and crispy toast is kind denoting;
It isn't kinds that hit the floor but exemplars of the kinds;
Always cannot quantify over the kind introducing expression, but since events involving the kind actually involve exemplars of the kind, always manages to quantify over those exemplars;
That's a neat trick. Try to remember it next time you are trapped on a scope island with no toast.

Posted by David Beaver at 04:51 AM

"Losing" "the subjunctive"

Following up on Mark Liberman's posting about "subjunctive case" in if I were you (where the appropriate label is "mood", not "case"), Geoff Pullum (7/1/04) notes that "subjunctive" isn't a very good label, either, and suggests "irrealis" as an alternative. I'm going to use Geoff's brief comment as a springboard to combat two common misapprehensions about inflectional morphology and its relationship to syntax and semantics and to question a common analysis of inflectional forms (like irrealis mood) that can easily be seen only for a few lexemes -- what I'll call underbrush forms, because they lurk concealed in the morphological underbrush.

The first misapprehension -- which Geoff clearly doesn't hold, though a careless reader could miss this -- is that there is a simple relationship between syntactic properties of phrases or clauses, morphological properties of words, and semantics. The second misapprehension -- which Geoff also doesn't hold, though the terminological focus of his comment conceals this -- is that the primary issue in analyzing phenomena like the English so-called subjunctive is how to label the forms. The questionable analysis -- which Geoff actually proposes -- is that an underbrush form is simply lacking for most lexemes, with some other form standing in for it.

First, Geoff's comment, in its entirety:

---------------------

It isn't actually the subjunctive. People often call the "were" of "I wish I were" subjunctive, but that term is much better used (as in The Cambridge Grammar of the English Language) for the construction with "be" seen in "I demand that it be done." The "were" form is often wrongly called a past subjunctive, but of course "it were done" is not a past tense of "it be done". The difference between the two is that the subjunctive construction occurs with any verb: "I demand that this cease" is a subjunctive (notice "this cease", not "this ceases"). The relic form in "I were" is only available for "be". For all other verbs you use the preterite: "I wish I went to New York more often." The Cambridge Grammar calls the "were" form the irrealis form. It is surviving robustly in expressions like "if I were you", but even there it has a universally accepted alternate "if I was you", and there is no semantic distinction there to preserve.

---------------------

1. The very careful reader will see that Geoff sometimes talks about "constructions" and sometimes about "forms" (and that he makes no explicit reference to semantics). This is a distinction between syntactic properties of phrases or clauses (the expression "it be done" in "I demand that it be done" has, among others, the property Constr:285, for which CGEL uses the label "subjunctive") and morphological properties of words within these larger expressions (the verb word "be" in "it be done" has, among others, the property Form:U, for which CGEL uses the label "plain form", though others call it the bare, base, unmarked, infinitive, or unmarked infinitive form). Geoff provides no label for the construction of "I were you" (I'll call it the "plain counterfactual", Constr:286). For the property Form:I of "were" in "I were you", CGEL provides the label "irrealis".

Semantics gets into the act by being associated with phrase properties; strictly speaking, a construction is a pairing of (a) a package of conditions on the internal syntax of expressions, like being a finite clause with a head verb word of Form:U, with (b) conditions on the semantic interpretations for those expressions, like denoting an obligation.

In many, possibly most, situations the distinction between syntactic properties and morphological properties is too obvious to be missed. This is because so many inflectional forms are multifunctional, even flagrantly so: they occur in a variety of constructions, of otherwise varied syntax and semantics. Form:U, for example, is used in imperative clauses ("Be quiet!"), with infinitival "to" ("to be quiet"), with modals ("will be quiet"), in complements of certain verbs ("made them be quiet"), and in several other diverse constructions, including the subjunctive.

Every so often, though, an inflectional form is closely tied to one particular construction, and then it's tempting to identify the form with the construction (and with the semantics for the construction). This is the case with Form:I and the plain counterfactual construction. So maybe it's not entirely an accident that Geoff failed to give a label for Constr:286; it might not have seemed necessary.

Tempting, yes. But you should resist the temptation. This line-up of syntax, morphology, and semantics is a fluke, something that happens with new inflectional forms (which have not yet developed further uses) and moribund ones (which have lost their other uses, through replacement by innovative syntax).

Sometimes it turns out that there are other constructions that use an inflectional form. That's true here: there's Constr:287, which I'll call the inverted counterfactual, as in "were I your teacher". This is clearly different syntactically from the plain counterfactual (in addition to Subject-Auxiliary Inversion, the inverted counterfactual is also incompatible with a subordinator like "if"), though it shares with the plain counterfactual the use of Form:I and the contrary-to-fact semantics.

2. I've been providing arbitrary designations for both phrase properties (Constr:286) and word properties (Form:I), along with suggestive labels of my own devising or from CGEL (plain counterfactual, irrealis). It's important to realize that these suggestive labels play absolutely no role in the description of the language. If they're well chosen, they allude to some relevant aspect of syntax or semantics, but the labels are in no way descriptions, of either the syntax or the semantics.

So there's no substantive issue here. "Irrealis" is a much better name for Form:I than, say, "cislocative" or, for that matter, "elephant", but it's at best a hint at the semantics of the constructions in which it occurs.

3. Now we come to the Missing Form analysis of underbrush forms like Form:I. To describe the facts fairly neutrally: only one lexeme, BE, has a Form:I distinct from other forms; for all other verbs, in constructions that call for Form:I, a form identical to Form:T (variously called the past or the preterite) is used. Geoff's translation of this -- this isn't something he devised, it's a very common formulation -- is that only the lexeme BE has a Form:I, and that all other verbs lack a Form:I, using instead Form:T.

There are two parts to the Missing Form analysis. First, the specification of a lexical gap; almost all verbs have a Form:I gap, and in this respect they are like the modal verbs of English, which all lack Form:U, Form:N (the present participle, gerund participle, or -ing form), and Form:P (the past participle, or -en form). But the Form:I gap is unlike the gaps for the modals, in that, in a second piece of the analysis, a subsidiary principle fills the gap by supplying Form:T. There is some question about the details of the Missing Form analysis, but first let's look at the alternative, the Syncretism analysis, and ask why someone might reject it.

In the Syncretism analysis, every verb lexeme has a Form:I. For BE, this is stipulated to be "were"; otherwise, a rule of morphology (a "referral rule", in the terminology of an old paper of mine; the idea is developed at length by Greg Stump in his 2001 book Inflectional Morphology) says that Form:I for a verb is the same as Form:T for that verb.
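A referral rule of this sort is easy to picture as a lookup with a fallback. The following Python sketch is an illustration only: the three-verb lexicon, the `REFERRALS` table, and the `form` function are hypothetical names made up for the example, and the paradigm for BE is simplified to its first person singular forms.

```python
# Hypothetical mini-lexicon listing only the stipulated forms of each lexeme.
# "T" stands for Form:T (preterite), "I" for Form:I (irrealis).
LEXICON = {
    "BE": {"T": "was", "I": "were"},   # Form:I is stipulated for BE alone
    "GO": {"T": "went"},
    "JUMP": {"T": "jumped"},
}

# Referral rule: where no Form:I is stipulated, Form:I is the same as Form:T.
REFERRALS = {"I": "T"}


def form(lexeme: str, prop: str) -> str:
    """Return the word realizing `prop` for `lexeme`, following referrals."""
    stipulated = LEXICON[lexeme]
    if prop in stipulated:
        return stipulated[prop]
    if prop in REFERRALS:
        return form(lexeme, REFERRALS[prop])
    raise KeyError(f"{lexeme} has no form for {prop}")


irrealis_be = form("BE", "I")      # stipulated
irrealis_go = form("GO", "I")      # supplied by the referral rule
```

On this picture every verb answers a request for Form:I, so a construction can simply call for Form:I without knowing whether the answer was stipulated or referred. Nothing here requires a separate stored entry per verb, which is the sense in which the "wasteful list" objection misses the mark.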

So what's the problem? Well, lots of linguists think that there's something wasteful about having all those redundant Form:I's listed for every verb lexeme in the language except one. Frankly, I've never understood this objection. Nobody's claiming that people keep all this stuff stored in their heads as a big list, any more than anyone claims that people keep all those regular, perfectly predictable Form:T's in /d/ ("stored", "jumped", "batted") stored in their heads as a big list.

In any case, these objections normally arise only for a small set of morphological anomalies involving special forms, those in which three conditions are all satisfied: (1) the number of lexemes showing the special forms is very small (for Form:I, this figure is 1); (2) the number of (morphological) principles for alternatives to the special forms is very small (for Form:I, this figure is 1); and (3) the number of constructions that use the special forms is very small (for Form:I, this figure is approximately 2). This constellation of characteristics isn't rare: accusative (vs. nominative) forms for English Ns, partitive (vs. genitive) forms for Russian Ns, locative (vs. prepositional) forms for Russian Ns, vocative (vs. nominative) forms for Latin Ns, and many more. (I use more or less traditional names for the forms, rather than engaging in a gigantic meticulous renaming.)

The difficulty here is that each of these three conditions represents one pole of a cline that extends far in the other direction, and there's no motivated point at which you can say that the numbers are no longer small enough to justify the Missing Form analysis. Look at the far end of cline (1): Form:2 (usually called "plural") in English. For almost all English Ns, Form:2 is distinct from Form:1, but there are a few -- SHEEP is the classic example -- usually analyzed as "zero plurals" (as syncretic). No one says that SHEEP lacks a Form:2, and that constructions calling for a Form:2 use the Form:1 "sheep" instead. But that would be the parallel to the Missing Form analysis of Form:I for verbs. In between these two examples there are all sorts of things, some of them involving very complex patterns of identities (see the examples in Stump's book), that is, high numbers on cline (2), and some involving multifunctional forms, that is, high numbers on cline (3). A good framework for morphology would cover the whole territory, rather than carving out one small portion of the territory for a special Missing Form treatment, admitting syncretism everywhere else.

Now return to the details of the Missing Form analysis for Form:I. The issue is whether the gap-filling is done in the morphology or the syntax. If it's done by a morphological principle or principles, then there really is no difference between the two analyses; "use Form:T if there is no Form:I" is a referral rule under another name. If it's done in the syntax, then any construction calling for Form:I comes with the proviso that if Form:I is lacking, you use Form:T -- a kind of principle linking syntax to morphology. This version turns out to have empirical consequences, and for Form:I, those consequences are not nice.

Here's the thing... As Geoff observed in his comment, English changed, in a way that is usually described as the "loss" of Form:I: no verb lexeme has a Form:I any more, with the result that Form:I was replaced by Form:T. (Well, this isn't what happened, of course. Innovative forms always coexist for at least a while with the forms they replace. In fact, many speakers today still use both versions, and a great many people understand both.) The Missing Form analysis predicts that all constructions using Form:I would shift together, but this is not what happened: "if I were your friend" developed an alternative "if I was your friend", but "were I your friend" didn't develop an alternative "was I your friend (I would tell you the truth)".

On the Syncretism analysis, syntactic constructions behave separately; they can call for Form:I or Form:T or whatever. And they can change independently. Which is what happened in this case. The innovative construction, call it Constr:288, is just like Constr:286 but calls for Form:T. The three counterfactual constructions coexist for some time -- in my variety of English, for example. When Constr:286 and Constr:287 no longer occur with sufficient frequency for new generations to learn them, then there will be no evidence left for a Form:I in the language, and it will genuinely be lost (as it surely is for some speakers already).

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:46 AM

July 10, 2004

If wine and stew can always, why can't toast?

Eric Bakovic recently teased Charles Harrington Elster for a semantic solecism, "the slippery piece of toast that always hits the floor jelly side down."

Don't worry -- I'm not going to stick up for Elster. I might have praised William Safire, and even defended Richard Lederer, but in this case, I agree with Eric that Elster's phrase evokes "the same piece of toast reconstituting itself over and over again and falling, always falling, on its face". My problem is that I'm still not clear about why.

I was initially persuaded by Eric's suggestion that Elster might have tried (and failed) to get a generic malicious-toast category going. Eric's post brought to mind the "food that always" cliché of restaurant reviews:

(link) This is a cosy, unpretentious place with food that always satisfies and sometimes delights.

After all, it's different food on each occasion -- at least I hope so -- but in another sense, it's the same food. And how big a jump is it from "food that always..." to "toast that always..."?

This is a friendly little diner with toast that always hits the floor jelly-side up. [sorry, you can't always find the example you want via Google]

You might think that Elster's problem was using the definite article the in his phrase. But a reviewer might also praise a restaurant for "the seafood that is always fresh" or "the pasta that is always al dente", so why not "the toast that always..."?

Those were my thoughts, before David Beaver laid out a feast of crunchy scopal goodness that included another, more compelling explanation:

The problem with the slippery piece of toast that always hits the floor jelly side down is that the quantificational element always occurs in a relative clause, and the variable for the piece of toast is introduced in a higher clause.

Now David is a trained semantic professional, able to field-strip a donkey sentence in seconds. And he gives persuasive examples (involving diplomats and countries) to show that relative clauses are scopal islands.

But I'm still stuck on these other kinds of food examples, for instance:

(link) Their Barbera is a fun and fruity wine that always pleases us...
(link) My mother used to make a fantastic beef stew that always tasted better the next day.

As in the restaurant reviews, what seems to be going on here is that there's a single abstract wine or stew entity that always behaves in a certain way across occasions, even though its instantiations on the different occasions are completely disjoint in every physical respect.

So we could write, by analogy,

Their sourdough is a light and crispy toast that always hits the floor marmalade side up.
My mother used to make a delicious rye toast that always hit the floor creamcheese side up.

But then, why can't Elster have his "slippery piece of toast that always hits the floor jelly side down"?

I don't know. It just doesn't work. Probably David Beaver will be able to explain it to me.

This is something that distinguishes linguists from "language mavens". If a generalization that we've been taught -- or worked out for ourselves -- seems to conflict with common usage or with our own judgments, our reaction is to question the generalization or its application to the cases in question. Language mavens try to adjust their own usage to fit the "rule", and sneer at those who don't.

[Update: David Beaver, who is a bit busy, sent the following helpful note, pending fuller explication when he has time:

Oh, I see now, ... the difference is that between:

1) the slippery piece of toast that always hits the floor jelly side down
and
2) the slippery (kind of) toast that always hits the floor jelly side down

I think it's "piece of" that rejects being kind denoting in this case. Confusingly, "the piece" can still be quantified over generically, as in:

3) the slippery piece of toast is always the last one eaten.

I can just about conjure up a set of situations involving both one slippery and one or more non-slippery pieces of toast, such that the slippery one is the last one eaten. It was Carlson who distinguished between different types of generic sentences, only some of which involve references to kinds. Something close to a diagnostic is whether you can add: "is extinct" or "is not common these days".

Thus

4) That fruity wine is not common these days
5) ?? That piece of toast is not common these days.

]

Posted by Mark Liberman at 07:00 PM

Trapped on a toastless scope island

Eric Bakovic writes here about Charles Harrington Elster's the slippery piece of toast that always hits the floor jelly side down. Eric says that:

No matter how many pieces of toast Elster has managed to get his readers to round up in their collective imagination always does not quantify universally over a set of objects, even if it tries to do so indirectly by quantifying universally over a set of events associated (one-to-one) with those objects.

And Eric's right that Elster's slippery NP fails to quantify over pieces of toast in the right way. But why? Is it because in general always cannot quantify over objects, or because there's something amiss with the particular structure Elster used?

The semantics literature contains loads of evidence that always (and this applies also to other quantificational adverbs, such as never, usually, sometimes) does not simply quantify universally over times, but can quantify over events, or even situations. The best known reference is David Lewis (1975, "Adverbs of Quantification", In E. Keenan (ed.) Formal Semantics of Natural Language, CUP, 3-15), but more recently there is Kai von Fintel's 1994 UMass dissertation and work of Schubert & Pelletier, Mats Rooth, Ariel Cohen and many others.

Why should we think always can quantify over objects? The following examples should show you why:

I always shoot cats (/a cat) on sight.

[Meaning: for every cat I see, I shoot it. I'm an avid cat photographer you see.]

Mary always beats me in a game of ping pong.

[Meaning: for every game of ping pong I play with Mary, she wins.]

If a farmer owns a donkey, he always beats it.

[Ahh, donkeys. Semanticists at least since Geach love donkeys, and I'm sure the above sentence has come up in a bunch of places. Everybody should read what the inimitable Larry Horn says about donkey sentences here. Anyway, the example above means something like: for every farmer and for every donkey owned by that farmer, the farmer beats the donkey. At ping pong, presumably.]
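The gloss just given treats always as quantifying over farmer-donkey pairs rather than over times. A small Python sketch can make that concrete. Everything in it is invented for illustration: the farmers, the donkeys, the `owns` and `beats` relations, and the helper `always`, which is just one simple way to model a universal adverb with a restrictor and a scope.

```python
from itertools import product

# Invented toy model.
farmers = ["giles", "maggie"]
donkeys = ["eeyore", "benjamin", "dapple"]
owns = {("giles", "eeyore"), ("giles", "benjamin"), ("maggie", "dapple")}
beats = {("giles", "eeyore"), ("giles", "benjamin"), ("maggie", "dapple")}


def always(cases, restrictor, scope):
    """True if every case satisfying the restrictor also satisfies the scope."""
    return all(scope(case) for case in cases if restrictor(case))


# Each case is a (farmer, donkey) pair; no times or events are involved.
cases = list(product(farmers, donkeys))

donkey_sentence = always(cases, lambda c: c in owns, lambda c: c in beats)

# If Maggie stops beating Dapple, one owning pair fails the scope.
gentler_beats = beats - {("maggie", "dapple")}
donkey_sentence_gentler = always(
    cases, lambda c: c in owns, lambda c: c in gentler_beats
)
```

The if-clause supplies the restrictor and the main clause supplies the scope, so a single always ends up binding both the farmer and the donkey at once.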

Lewis famously argued that what always quantified over was what he termed cases, bunches of individuals tied together in some situation. That is, the claim is not that always does not have a temporal quantification reading, as in W.C. Fields's I always keep a supply of stimulant handy in case I see a snake--which I also keep handy. The claim is that sometimes always (and sometimes too!) quantifies over cases. There's a lot more to say about how we decide what always quantifies over, since there's some really fun pragmatics involved, but I'll tell you more about that in a separate post. For now, I'll just comment on why always can quantify over cats, games, farmers and donkeys but not easily over toast in Elster's example.

The problem with the slippery piece of toast that always hits the floor jelly side down is that the quantificational element always occurs in a relative clause, and the variable for the piece of toast is introduced in a higher clause. Semantically, the structure of the NP is something like this:

the x [slippery(x) & toast(x) & RC]

Here RC is the meaning of the relative clause, which could be represented as:

for every e ([e is an event of x with jelly hitting the floor] implies [e is an event of x hitting the floor jelly side down])

The problem is simply that relative clauses, as has often been observed, are what we term scope islands: I've put a quick guide to scope islands and scope terminology in the yellow box below. An operator (like always) within a relative clause does not like to take wider scope than operators outside the relative. So it seems Elster's problem is not that he doesn't understand the meaning of always, but that he formed a relative clause a tad sloppily, leaving within it an always that couldn't quite perform the function that he might have expected it to. Then again, we might hypothesize instead that Elster really is the sort of guy that repeatedly drops the same piece of toast over and over again, and supposes that this is the sort of everyday experience with which his readers will empathize. Ceteris paribus, I prefer a theory in which we simply assume Elster said what he meant, but unfortunately we don't have so much as a single sticky crumb of evidence to go on.

By the way, here's my theory of toast. Toast will tend not to fall horizontally since it is aerodynamically unstable in this configuration. As a result, in flight (and this could also be a result of initial angular velocity) it will tend to head towards a vertical orientation while dropping. The momentum built up during this manoeuvre will normally take it slightly beyond the vertical equilibrium position. The toast is just about to compensate for this over-rotation when SPLAT. It hits the floor close to the vertical, but tilted slightly onto the butter/jelly side. After that, bad news is inevitable. So here's my advice to Elster, apart from being more careful with his relative clauses. You have three options:

Eat the toast upside down, with jelly initially underneath
Eat very close to the ground, so that the toast does not have time to make even a quarter rotation. Eating very high up may also help.
Sandwiches, you fool!

A Quick Intro to Scope Islands

Here's a simple illustration of the scope island effect:

(1) A diplomat visited every country

Example (1) can mean either that there was a diplomat, and that diplomat visited every country or that every country was visited by some diplomat or other. In the first case we say that a diplomat has wide scope over every country (and conversely that every country takes narrow scope), and in the second that every country takes wide scope.

(2) A diplomat who visited every country was exhausted.

In example (2), a diplomat must take wide scope over every country, because every country is in a relative clause, and a relative clause is a scope island. So (2) can only mean that there is a diplomat and that diplomat visited every country and was exhausted. It cannot mean that for every country there was some diplomat or other who visited that country and was exhausted.
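The two readings of example (1) differ only in the nesting of the quantifiers, which a short Python sketch can show. The diplomats, countries, and `visited` relation below are made up for the illustration, and the two function names are just descriptive labels.

```python
# Invented toy model in which no single diplomat covers every country.
diplomats = {"ann", "bob"}
countries = {"france", "spain"}
visited = {("ann", "france"), ("bob", "spain")}


def a_diplomat_wide():
    """There is one diplomat who visited every country."""
    return any(all((d, c) in visited for c in countries) for d in diplomats)


def every_country_wide():
    """For every country, some diplomat or other visited it."""
    return all(any((d, c) in visited for d in diplomats) for c in countries)


reading_a_wide = a_diplomat_wide()
reading_every_wide = every_country_wide()
```

In this model the first reading is false and the second is true, so the two really are distinct. Example (1) allows either nesting. Example (2) allows only the nesting in `a_diplomat_wide`, because every country cannot escape the relative clause to take scope over a diplomat.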

Posted by David Beaver at 04:46 PM

It's all grammar

To PITS, People In The Street, "grammar" embraces pretty much everything having to do with language, spoken or written, so long as it's regulated in some way: syntax, morphology, word choice, pronunciation, politeness, discourse organization, clarity and effectiveness, spelling, punctuation, capitalization, bibliographic style, whatever. I offer yet another example of this broad usage, from a recent letter to the New York Times -- it's about commas in dates -- and then pause to ask what PITS should do. Sure, linguists find the broad usage deeply annoying, because it lumps together such disparate things, some of them of much less consequence than others. But do we have something better to offer?

Here's the letter (in full), from Betsy Wade of New York, dated July 8, 2004, and appearing in the Times the next day (p. A18 in the edition I get out here on the Left Coast):

Cornerstone Grammar

To the Editor:

Re "A 9/11 Cornerstone, Chiseled With a New York Accent," by David W. Dunlap (Blocks column, July 8):

The section of the 9/11 cornerstone inscription depicted in the accompanying photograph clearly shows that the grammatically necessary comma after "2001" ("on September 11, 2001") is absent.

As a longtime editor, I hope that the artisans will be able to correct this omission in the handsome Gotham typeface.

----------------

Now, this is a mind-numbingly inconsequential issue. Nothing would be lost or confused if we wrote, printed, or chiseled "September 11 2001", and, indeed, the other order of month and day normally appears without a comma: "11 September 2001". In the case of the serial comma, vs. its absence, or the quote-punc order (periods and commas outside right quotation marks unless they were in the material being quoted), vs. the punc-quote order, there are actual issues of informational accuracy, as Geoff Pullum laid out in his article "Punctuation and Human Freedom", in The Great Eskimo Vocabulary Hoax. (You will note that, following the practice that Bernard Bloch established for the Linguistic Society of America's journal Language, I'm a quote-punc kind of guy. Also that I'm a serial comma user.)

People sometimes tell me that the comma in "September 11, 2001" represents the suspensive intonation that characterizes, among other things, appositives and parentheticals. But "11 September 2001" has the same intonation after "September" and doesn't get a comma. And anyway everybody knows that intonation is only a very rough guide to where commas should appear in written English.

The only function of the comma in "September 11, 2001" is as a reinforcement of the visual space between one numeral, "11", and another, "2001". That's helpful, I suppose, but it's scarcely necessary. We cope perfectly well with things like "There are 11 2001 models of the Mammoth Modulator, one for every pocketbook", where an intervening comma is absolutely barred.

[Department of Corrections, 7/25/04: Steve at languagehat points out in e-mail that I misread the clear meaning of Ms. Wade's letter; it's not the comma between "11" and "2001" that's at issue -- that one is on the cornerstone -- but one following "2001". As Steve puts it: "In other words, she feels that 'To honor and remember those who lost their lives on September 11, 2001 and as a tribute to the enduring spirit of freedom' requires a comma to separate the grammatically distinct parts of the clause ('to honor and remember... and as a tribute'); you may or may not agree, but calling it 'grammar' is not as obviously ludicrous as in your version." Well, it's still punctuation, not grammar, I say, but I agree that the point is slightly more consequential, since the alternative punctuations convey subtly different views about the purpose(s) of the cornerstone: without the comma, the two purposes are presented as equally significant, while with the comma, the first is presented as primary and the second as subsidiary. In any case, the comma isn't necessary, grammatically or otherwise. End of digression.]

But to get back to "grammar": What should Betsy Wade have written, instead of "grammatically necessary"? "Orthographically necessary", I guess. Would her readers have understood that? Probably not. She would have done fine with the wordier "the comma that written English requires after..." But it's likely that neither of these possibilities occurred to her, because for PITS the written language is the real language. So she had to reach for something that referred to the language system as a whole and to norms, and that word seems to be "grammar"; why, AHD4 records just this usage, in definition 3a for "grammar": "A normative or prescriptive set of rules setting forth the current standard of usage..." (Grappling with a similar problem, reported on here, New York Times Magazine writer Barry Bearak fixed on "syntax" as a way of referring to nonstandard pronunciation.)

Notice that no distinction is made here between grammar and usage. PITS are deeply unclear about this distinction, and the fact that most manuals treat both grammar and usage between the same two covers, and tend to elevate usage advice to the status of rules of grammar, doesn't make the distinction any clearer to PITS. In fact, linguists diverge on this point: Fritz Newmeyer's presidential address to the LSA in 2002 (published in Language in 2003) proclaimed that "Grammar is Grammar and Usage is Usage", but Joan Bybee's presidential address to the LSA in 2005 will maintain just the opposite: "Grammar is Usage and Usage is Grammar".

Still, it's hard for a linguist not to feel that the profession has failed to get across the idea that the conventions for punctuating written English have a different status from, and much less significance than, say, SVO as the default word order for the language, or, for that matter, the injunction to avoid pernicious ambiguities in pronominal reference.

Well, we've fallen down on other fronts as well. For example, we haven't done well in getting PITS to think of the word "linguist" as ambiguous, referring either to someone with a practical interest in language (in learning languages, teaching them, interpreting, or translating), or to someone with an analytical interest in language.

Or are these just lost causes?

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 03:14 PM

Year of Languages

OK, raise your hand if you knew that 2005 is soon to be proclaimed by the United States Senate as the Year of Languages. Anyone? Anyone? Bueller? Well, don't feel bad; until I was catching up on some LINGUIST List reading late this evening, I didn't know either.

Not that there were any direct announcements on LINGUIST List. (Well, maybe there were at some point; I'm having a hard time with their search engine right now.) No, the news came indirectly from one Rick Rickerson of the College of Charleston. The title of the message is "NPR Radio Program on Languages", which sparked my interest because I collect radio clips about language(s) and linguistics for use in my undergrad classes. The message begins:

I'm director of of [sic] the language division at the College of Charleston and am just back from LSA Headquarters in Washington. I had a chat with Maggie Reynolds, who referred me to the LSA website. I'm about to do an NPR radio program in support of the soon-to-be-proclaimed (by the US Senate) Year of Languages in the US.

I'm reading this and thinking, OK, it's only July, maybe this is really news. But then I searched Google for "Year of Languages". I got an overwhelming number of hits for the European Year of Languages (which was 2001, in case you didn't know), but also some not-necessarily-new ones for the US Year of Languages. Here are the four hits from the first two pages of Google results:

A featured article from December 2003 at the Center for Applied Linguistics site.
The information center for the Year of Languages, sponsored by The American Council on the Teaching of Foreign Languages. (Essentially the same page exists here, with a more transparent URL, but doesn't show up until the 10th page of Google results.)
A page linked from the Alabama Association of Foreign Language Teachers (somehow affiliated with the University of Alabama at Birmingham).
A page linked from the Foreign Language Association of Georgia.

The next link doesn't appear until the 8th page of Google results, and there's not much after that.

So then I went to the LSA site, and there's nothing there. (As Mark once said, I think we could do better.) Nothing at LINGUIST List either, as I already noted (modulo search engine difficulties). Even the MLA site is silent on the issue (and it bothers me that I was more surprised by that).

What's going on? Why are linguists the last to know about this important PR opportunity?

Rick Rickerson continues:

There will be 6-minute segments on language topics of interest to the general public, starting in January and airing throughout 2005. Since it's part of a nationwide celebration of language, I'm asking professionals around the country to suggest topics and, if possible, contribute scripts. Would any of you be interested in doing so?

I think we'd better be. Those of us who contribute to Language Log, at least, appear to have some time and energy on our hands for this sort of thing.

Or could I ask you to look over a list of topics and give me the name and e-mail address of linguists you think might want to do a piece on that subject? These are questions like: How do babies learn to talk; Dialect vs. language; What's a Pidgin; What causes foreign accents; Indian languages of Southeast US; Is ASL a language; the Significance of Rosetta Stone; How close are Spanish and Portuguese; etc. I would appreciate your help and suggestions!

In case you missed it: erickerson@comcast.net.

Oh, and apparently, 2004-2005 were already resolved to be 'Years of Foreign Language Study'. Bet you didn't know that either.

[Addendum: there's a passing reference to the Year of Languages about halfway down a page of "news" at the web site for the Joint National Committee for Languages (JNCL) and the National Council for Languages and International Studies (NCLIS).]

Posted by Eric Bakovic at 04:07 AM

July 09, 2004

Spiteful things

Speaking of William Safire and Norma Loquendi, consider the following excerpt from Charles Harrington Elster's September 21, 2003 guest contribution to NYT Magazine's On Language column. Elster -- who until recently was Richard Lederer's radio co-host -- sings the praises of the word resistentialism, the "seemingly spiteful behavior manifested by inanimate objects":

Here, at last, was a word for the rug that quietly curls up so it can snag your toe, the sock gone AWOL from the dryer, the slippery piece of toast that always hits the floor jelly side down.

I've often wondered about the jelly-side-down issue. Personally, I'm not going to eat the toast no matter how it ends up on the floor. I suppose there's more clean-up involved if the jelly hits the floor, but somehow I doubt that this is what self-identified victims of resistentialism are worried about.

But what I'm really concerned with here is Elster's use of the word always, universally quantifying over the predicate of the relative clause (hits the floor jelly side down) that in turn modifies a singular definite noun phrase (the slippery piece of toast). Surely Elster isn't saying that he's been resistentially victimized by the same piece of toast over and over again. Or is he?

Maybe it's relevant that Elster is writing about cases of resistentialism that we are all familiar with, if not from personal experience then from shared cultural references. Elster is appealing to each and every one of his readers to conjure up their own spiteful things: "you know the kinds of rug I'm talking about, you know those ship-jumping socks (or is it thieving dryers?), you know those pieces of toast that always land on their jellied sides" ...

No, that's still bad. No matter how many pieces of toast Elster has managed to get his readers to round up in their collective imagination, always does not quantify universally over a set of objects, even if it tries to do so indirectly by quantifying universally over a set of events associated (one-to-one) with those objects.

I'm starting to think the explanation I first rejected is the correct one: just as true resistentiophobes imagine that rugs are out to get them and that dryers are hungry for their socks, maybe they each believe that it's the same piece of toast reconstituting itself over and over again and falling, always falling, on its face.

Posted by Eric Bakovic at 06:01 PM

Still on the hook

I'm really excited. Last night I wrote a little piece about an alleged intonational distinction between I could care less and I couldn't care less, and just a little more than two hours later Mark Liberman responds with a "promise to examine the examples in some conversational speech corpora". That's the power of blogging.

But if you have read both of these posts, you might ask: shouldn't I also be perturbed that Mark defended Richard Lederer against my criticism of his assertions about speakers who say I could care less? Not at all. Here's why.

As I have noted before, Lederer understands little if anything about the sounds and sound systems of human languages. I think it's a pretty safe bet that even if Lederer had read Pinker's "sarcasm hypothesis" about I could care less, he is ill-equipped to judge it based on anything other than his prescriptive predispositions.

Arnold Zwicky also comments on Lederer's penchant for nastiness and insult:

The fall into error is a brand of humor that tends to have a nasty edge to it, a superior They Should Have Known Better tone when directed (as in this case) towards professionals, a sneering They Are Ignorant Fools tone when aimed at ordinary folks, in particular students, with their extraordinary word choices, erratic spellings, and uncertain grasp of facts. I guess we're supposed to be shamed into learning some important lesson from these bad examples, but it would be hard for even a compliant and well-disposed reader to do much with what Lederer provides in this column.

This only reinforces my belief that I am right in asserting that Lederer's mission is not a considered analysis of Norma Loquendi, but simply to dig up "mistakes" wherever he can find them and to expose them for fun and profit.

So, if Mark is right and demonstrates that I could care less and I couldn't care less "are not generally distinguished prosodically as Pinker asserts they are", and/or that "the cited prosodic difference would not as a general rule yield the asserted (sarcastic vs. non-sarcastic) difference in interpretation" -- and I would be a fool to take Pinker's word over Mark's on matters of intonation anyway -- I'll be happy to read about it. I don't think Lederer would be off my hook, though.

Posted by Eric Bakovic at 01:53 PM

Exit only this lane

Jeff Erickson writes to me from the University of Illinois, apropos of my recent post on exit signs, to remind me (for I had also noticed this myself) about the freeway signs that say EXIT ONLY THIS LANE. Careful observation of lane markings and thought about the logical form (best done when travelling as a passenger) will reveal that sometimes they are clearly intended to have a meaning in which ONLY goes with EXIT ("exit only, if this lane is the one you're in") and sometimes they definitely have a meaning in which ONLY goes with THIS LANE ("only this lane, if exiting is what you want to do"). Jeff adds that sometimes you can read the sign as the conjunction of those two meanings ("only exiting is allowed from this lane AND only from this lane can you exit"). How could gross and consequential ambiguities like this be tolerated in an environment where, to an almost unique extent, accurate reading and parsing is essential at high speed and life and limb may depend upon the interpretation process? All I can add is that we're just lucky that they don't (as far as I have seen) tend to write such signs on the road surface. If they did, you'd be driving over something that looked like this:

LANE
THIS
ONLY
EXIT

Posted by Geoffrey K. Pullum at 01:46 PM

Scheduled outage

Language Log will be unavailable this evening (Friday, July 9, 2004) between about 6:30 p.m. and midnight, Philadelphia time. This is because construction work on a new bioengineering building requires a scheduled power outage that will affect the core network hardware in Penn's School of Engineering and Applied Science.

Our server is in a different building, and will not be directly affected by the power cut, but it will lose external network access.

Posted by Mark Liberman at 12:44 PM

Only lane bike: road surface psycholinguistics

I have remarked elsewhere that I have often been puzzled by seeing ONLY written under a left-pointing arrow. But now here is the single most puzzling practical linguistic thing I know of in any domain. Why the hell do all the authorities who put signs on road surfaces in the USA make the completely false assumption that you are going to read the words in the order in which your front bumper arrives at them? It is madness; psycholinguistic bunk. That is not what happens for me. I can't believe anyone has a different reaction. As soon as you see the block of words, you instinctively read them all, from the top. Look at this, which is painted on a road surface on my campus:

ONLY
LANE
B I K E

What do you see? ONLY LANE BIKE, right? It's the same with XING PED; it's the same with AHEAD STOP; it's the same with CLEAR KEEP. The way they lay them out, backwards them read you. Impeded is comprehension. Am I the only person in the whole United flaming States smart enough to have noticed this and to have realized what the problem is and what the solution would be? Like I say, I'm baffled. Psycholinguists, check your pagers. The state highway authority needs your advice.

Posted by Geoffrey K. Pullum at 12:06 AM

July 08, 2004

Caring less with stress

I thought I had already broken the ultimate taboo, but it turns out that there were some depths yet to be explored. Believe it or not, I'm about to defend Richard Lederer against Steven Pinker.

In a recent Language Log post, Eric Bakovic gives Lederer a hard time for asserting that people who say I could care less are being illogical.

Eric's first argument is that since people use this form just about as often as I couldn't care less, Norma Loquendi has spoken, and we need to listen. I agree with Eric up to this point. Eric offers some Google counts in support: the two phrases are about equally frequent. In an earlier posting here, Chris Potts listed I could care less as an example of the "handful of English constructions in which, quite surprisingly, one can add or remove a negation without change of meaning", and he took it for granted that this is just a usage to explain, not an error in logic.

Speaking for myself, I don't take this to be a point of principle, that whatever is said or written reasonably often must ipso facto be part of the language. Especially where negation is involved, there are plenty of common mistakes, like fail to miss used to mean "miss", or no X is too Y to ignore used to mean "no X is so Y that one should ignore it." It can be a complicated question, conceptually and practically, to decide whether such cases are constructions, fixed expressions, idioms or whatever, as opposed to natural mistakes that people often make in using a psychologically difficult combination of elements and structures. This is partly a question of linguistic analysis, and partly a question of psychological interpretation, and partly a question of social norms. In any case, the categories of "mistake" and "English expression" are obviously overlapping, psychologically, historically and socially.

However, I don't think there's much question at all about could care less, which has clearly become a well-accepted colloquial expression in contemporary American English. This conclusion can claim the sanction of the OED, which gives sense 4 of care as

4. In negative and conditional construction: a. not to care passes from the notion of ‘not to trouble oneself’, to those of ‘not to mind, not to regard or pay any deference or attention, to pay no respect, be indifferent’.

and then among the various subtypes listed (e.g. care a button or a fig) comes eventually to the specific phrase in question,

(c) Colloq. phr. (I, etc.) couldn't care less: (I am, etc.) completely uninterested, utterly indifferent; freq. as phr. used attrib. Hence couldn't-care-less-ness.

for which the earliest citation is from 1946, and then gives an explicit listing to the unnegated form:

(d) U.S. colloq. phr. (I, etc.) could care less = sense (c) above, with omission of negative.

1966 Seattle Post-Intelligencer 1 Nov. 21/2 My husband is a lethargic, indecisive guy who drifts along from day to day. If a bill doesn't get paid he could care less.
1973 Washington Post 5 Jan. B1/1 A few crusty-souled Republican senators who could care less about symbolic rewards.
1978 J. CARROLL Mortal Friends III. iii. 281 ‘I hate sneaking past your servants in the morning.’ ‘They know, anyway. They could care less. Thornton mistreats them horribly.’

with a first citation a mere 20 years later. The OED does tell us that (I, etc.) could care less is a "colloq. phr." -- but so is (I, etc.) couldn't care less. The only difference is that (I, etc.) could care less is a "U.S. colloq. phr."

This is not much of a difference for Lederer to hang his hat on, but then English-language prescriptivism has always had a solid admixture of anti-Americanism.

OK, so far, so good. Eric, Chris, Google, the OED and I are all in agreement.

But Lederer is not off Eric's hook yet. Eric points out that Steven Pinker has written about could care less, both in The Language Instinct and in a 1994 New Republic article. The crucial Pinkerian passage is this:

A tin ear for stress and melody, and an obliviousness to the principles of discourse and rhetoric, are important tools of the trade for the language maven. Consider an alleged atrocity committed by today's youth: the expression I could care less. The teenagers are trying to express disdain, the adults note, in which case they should be saying I couldn't care less. If they could care less than they do, that means that they really do care, the opposite of what they are trying to say. But if these dudes would stop ragging on teenagers and scope out the construction, they would see that their argument is bogus. Listen to how the two versions are pronounced:

    COULDN'T care                I
                  LE                       CARE
i                    ESS.                       LE
                                    could          ESS.

The melodies and stresses are completely different, and for a good reason. The second version is not illogical, it's sarcastic.

(By the way, could care less is hardly used only by "today's youth" -- the authors of the OED's 1966 and 1973 citations are presumably old and gray by now, if they are even still alive.)

As Eric says,

...this hypothesis has the added advantage of not insulting the intelligence of the half of the population that uses the allegedly incorrect form. (The only thing missing is independent evidence that the same intonational distinction holds of other sarcastic-nonsarcastic utterance pairs; Pinker also does not cite any sources for this claim, unlike many other claims made and discussed in the book.)

But unfortunately, that's not the only thing missing. Pinker doesn't provide any evidence that the claimed difference in stress and/or pitch is actually used to distinguish these phrases, or that it would have the asserted effect on interpretation if it did. And unfortunately for this otherwise neat hypothesis, I'm fairly confident that (a) the two phrases are not generally distinguished prosodically as Pinker asserts they are; and that (b) the cited prosodic difference would not as a general rule yield the asserted (sarcastic vs. non-sarcastic) difference in interpretation.

I promise to examine the examples in some conversational speech corpora to evaluate (a), and show you all the pitch tracks. And I'll say more later about (b), and how to evaluate claims like this. For now you'll just have to take my word for it, or rather, take note that I disagree with Pinker's analysis. But I've put in some time working on the analysis and synthesis of English intonation, and I'm fairly confident that Pinker is stretching a bit here, as Tom Sawyer might have said.

Whatever the origin of I could care less -- and it's as likely to have been confusion about negation as sarcasm -- by now, it's just an expression. And as Eric hints, grammar anti-mavens may also sometimes try to make us believe something false just by asserting it.

It wouldn't surprise me to learn that Lederer has never bothered to read Pinker's account of the alleged intonational disambiguation of could care less. But if he had read it, the kindest thing might have been just not to mention it, as indeed he didn't.

[More on this:

"'Could care less' occurs more" (7/13/2004)
"Negation by association" (7/13/2004)
"Speaking sarcastically" (7/13/2004)
"(Auto)biography of a blog thread" (7/16/2004)
"Most of the people in the world could care less" (7/16/2004)
"Caring less all the time: A variant of the etymological fallacy, and some cautions about the pragmatics-phonetics connection" (7/24/2004)
"The future of history of usage" (4/16/2005)
"The care less train has left the station" (6/20/2005)
"Caring more or less" (6/29/2005)

]

Posted by Mark Liberman at 10:25 PM

Left turn only

I have often wondered whether road lane signs with ONLY under a left-bent arrow mean that you can only turn left from that lane or that the only lane you can turn left from is that one. It seems to me dangerous to have to ponder a tricky scope problem, on which life-or-death lane-changing decisions may hang, while driving in heavy traffic. But it is particularly interesting that in 1971 the State of Florida made a mistake about it on a driver's license exam. The question showed a sign like the one shown here, and it asked what the sign meant. The correct answer was supposed to be:

Left turn from left lane only and traffic in adjoining lane may turn left or continue straight ahead.

But almost everyone reads this as a contradiction. Don't you?

The person who wrote the question and its incorrect answer appears to have been one of the minority who utterly confuse "Only if you're in the left lane can you turn left" with "If you're in the left lane the only thing you're allowed to do is turn left." The error was spotted by John Keasler, who wrote about it in the Miami Herald on November 23, 1971, page 8-B. Howard Pospesel gives the task of stating the two meanings explicitly in logical symbols as an exercise on page 61 of his textbook Introduction to Logic: Predicate Logic (second edition; Upper Saddle River, NJ: Prentice-Hall, 2003).
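
As a rough sketch of the two readings (not Pospesel's own solution, which isn't reproduced here), let L(x) stand for "x is in the left lane" and T(x) for "x turns left". Both predicate names are assumptions made for this illustration, and the permission element ("can", "allowed") is left out for simplicity.

```latex
\begin{align*}
\text{Only from the left lane may you turn left:} &\quad \forall x\,\bigl(T(x) \rightarrow L(x)\bigr) \\
\text{From the left lane you may only turn left:} &\quad \forall x\,\bigl(L(x) \rightarrow T(x)\bigr)
\end{align*}
```

The two formulas are converses, and neither entails the other. The exam's intended answer lets traffic in the adjoining lane turn left, which is incompatible with the first formula but not with the second; that is presumably why most readers hear a contradiction.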

Posted by Geoffrey K. Pullum at 09:25 PM

Lederer should care less

Talk about insult to injury ... not that I'm a big fan of kicking someone again when he's down, but Richard Lederer really just asks for it.

On the show that Lederer co-hosts on my local public radio station, people usually call in to ask questions about word and phrase origins and the like. But every once in a while, someone calls to ask about phrasing choices and pronunciation, things that might actually be interesting to the typical professional linguist. Predictably, I am often disappointed, annoyed, and/or seriously irritated at the hosts' answers to these questions. On the weekend of January 24-25, 2004, for example, someone called in with multiple interesting questions, one of which was (this is not a direct quotation):

Why do some people say I could care less instead of I couldn't care less?

Lederer's answer was that even though many people say "I could care less" -- in Lederer's estimation, roughly half of the people who use either form -- it is logically inconsistent with what those people mean to say.

(By the way, Lederer's estimation jibes pretty well with Google. Searching for the strings "could care less", "couldn't care less", and "could not care less", I got 160,000, 131,000, and 19,200 ghits, respectively. Lumping the last two together, we get something pretty damn close to a 50-50 split.)
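
For anyone who wants the arithmetic spelled out, here is a minimal sketch using only the three counts quoted above. The function name `unnegated_share` is just a label chosen for this illustration.

```python
def unnegated_share(counts: dict[str, int]) -> float:
    """Return the fraction of all hits that use the unnegated phrase."""
    unnegated = counts["could care less"]
    # Lump the two negated spellings together, as in the text.
    negated = counts["couldn't care less"] + counts["could not care less"]
    return unnegated / (unnegated + negated)


ghits = {
    "could care less": 160_000,
    "couldn't care less": 131_000,
    "could not care less": 19_200,
}

print(f"negated total: {131_000 + 19_200:,}")            # 150,200
print(f"unnegated share: {unnegated_share(ghits):.1%}")   # 51.6%
```

So the unnegated form accounts for a little over half of the hits, 160,000 against 150,200.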

Now why would someone in Lederer's position (Ph.D. in English and Linguistics, former English instructor, author of books on language, co-host of a public radio show on language) be so willing to think that half of the population speaks so carelessly as to not acknowledge the difference between an attitudinal statement and its negation? It seems to me that Lederer at the very least has a responsibility to the caller and the rest of his listeners to consult some relevant linguistic literature on the topic. (Calls to the show are screened well in advance, so there's no real excuse not to do this.) He does have a list of books that he consults on a regular basis, but the trained eye will note that not one of them is a book by a professional linguist or even about linguistics. Lederer wouldn't even have to meander too far from the popular language-book fodder he appears to favor; the topic at hand is discussed in Steven Pinker's The Language Instinct. The relevant passage can be read here (a January 31, 1994 article in The New Republic based on Chapter 12 of said book).

Pinker's claim is that the intonational pattern of I could care less is sarcastic, not incorrect. Aside from taking into account the obvious intonational distinction between the two forms, this hypothesis has the added advantage of not insulting the intelligence of the half of the population that uses the allegedly incorrect form. (The only thing missing is independent evidence that the same intonational distinction holds of other sarcastic-nonsarcastic utterance pairs; Pinker also does not cite any sources for this claim, unlike many other claims made and discussed in the book.)

The Google search I noted above revealed that someone out there disagrees with Pinker's claim, based entirely on the fact that so many people use the allegedly incorrect form:

The "irony" explanation comes from Steven Pinker in The Language Instinct. (There could be other sources, but I've read it there.) Personally, I don't think that people mean it ironically; one hears "I could care less" far too often for that to be true. I think it's simply become a stock phrase that people use without parsing. Grammarians and purists put far more stock in "logical" usage than empirical evidence suggests is supported by actual utterances.

Posted by: Mike Pope at July 17, 2003 06:32 PM (#link)

Mr. Pope apparently believes that sarcasm/irony should not be frequently found in speech -- perhaps because he, much like Lederer, believes that people are generally not sophisticated enough to speak sarcastically/ironically. (Yeah, right.)

But why does Pope expect hyperbole to be so common in everyday people's speech? Both of these phrases are clearly hyperbolic -- one hears them far too often for there to be that much stuff to not care less about. I'm willing to admit that many folks probably use I could care less without consciously thinking about being sarcastic/ironic, but I am also prepared to claim that just as many folks probably use I couldn't care less without consciously thinking about whether or not they could actually care less.

Posted by Eric Bakovic at 08:13 PM

Filename generation idiocy.doc

No matter how low your esteem for modern word processors might already be, the manufacturers will find ways of adding new features that will drive your respect for them even lower. Take the filename-suggestion feature that evolved over the past few years in both Word and WordPerfect. Start a blank document and put something in it with a note to a colleague at the top — let's say you put "TRY AND MAKE THIS PIECE OF SHIT INTO SOMETHING, HARRY; I'VE TAKEN BRADSHAW'S STUPID IDEA AND DONE WHAT I CAN WITH IT, BUT IT STILL LOOKS LIKE A CROCK TO ME. --- BOB". Now choose Save As from the file menu to save it. The program will choose a default filename for you. Only what it chooses is simply what it thinks is the title, and that it assumes will be whatever you have on the first line, whatever its length, and regardless of whether it contains spaces or not. So if you just say yes to the default you will find you then own a file with a fairly eye-opening name. I tried this with two word processors (your mileage may differ). WordPerfect 11 for Windows gave me a file called

TRY AND MAKE THIS PIECE OF SHIT INTO SOMETHING.wpd

Word for Mac OS-X gave me a file called

TRY AND MAKE THIS PIECE OF SHIT.doc

(And then, incidentally, the program crashed without explanation in a freshly booted environment and exited unexpectedly. Microsoft should really try and make this piece of shit into something.)

The problem is that the designers have no idea how to do the process of mapping documents onto suitable words or phrases that might be good mnemonic filenames for them (it's nontrivial, of course), but they try to do it anyway. It's another case of tomorrow's technology today.
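
To make the complaint concrete, here is a minimal sketch of the kind of heuristic described above: take the first line, stop at the first punctuation mark, and keep as many whole words as fit. The function name `suggest_filename`, the `max_chars` limit, and the punctuation rule are all assumptions for illustration; neither vendor's actual algorithm is known to me.

```python
import re


def suggest_filename(text: str, extension: str, max_chars: int = 40) -> str:
    """Naively propose a filename from the first non-empty line of a document."""
    # Treat the first non-empty line as the "title", whatever it says.
    first_line = next((ln.strip() for ln in text.splitlines() if ln.strip()), "")
    # Cut at the first comma, semicolon, colon, or sentence-ending mark.
    title = re.split(r"[,;:.!?]", first_line, maxsplit=1)[0].strip()
    # Keep whole words only, up to the length limit.
    kept: list[str] = []
    for word in title.split():
        if len(" ".join(kept + [word])) > max_chars:
            break
        kept.append(word)
    return (" ".join(kept) or "Untitled") + extension


note = ("TRY AND MAKE THIS PIECE OF SHIT INTO SOMETHING, HARRY; "
        "I'VE TAKEN BRADSHAW'S STUPID IDEA AND DONE WHAT I CAN WITH IT")
print(suggest_filename(note, ".wpd", max_chars=60))
print(suggest_filename(note, ".doc", max_chars=31))
```

With those two limits the sketch happens to reproduce the two filenames reported above, which does not mean either program works this way. The point is that nothing in such a rule checks whether the first line is a title at all.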

It's always the linguistic factors on which they fall down: silly filename-suggesting feature; text conversion from upper case to lower case with initial caps that doesn't quite know which words to capitalize; hopelessly unreliable conversion of characters from one word processor to another (ask a linguist like Arnold Zwicky who has switched from Windows to OS-X and thus has to convert his WordPerfect documents into Word whether he's ever had a conversion that was trouble-free); and of course a grammar checker that is a complete joke, and a spell checker that is little better (they try to do correction suggestions, another thing they are not any good at: Barbara carelessly accepted one of its suggestions the other day without intending to, and we found that a joint paper of ours referred to Kimchee instead of Chomsky).

It's not that nothing at all has been improving: some of the plain graphics programming improvements are extraordinarily clever. But linguistics and natural language processing have not been feeding into the word processor industry at all.

Posted by Geoffrey K. Pullum at 06:43 PM

Goldsmith on Harris

John Goldsmith has posted on his web site at Chicago a long and thoughtful review of "The Legacy of Zellig Harris: Language and information into the 21st century. Volume 1: Philosophy of science, syntax and semantics".

It's too bad that John Benjamins has priced these volumes so high, as many of the papers in them are accessible to an interested lay audience. One that we've mentioned here is Fernando Pereira's paper "Formal grammar and information theory: together again" (from Volume 2).

Posted by Mark Liberman at 04:58 PM

Wrong

In the process of composing a small rant about the unreliability of journalists, it occurred to me to wonder who invented the phrase "the fourth estate". Google immediately told me: Edmund Burke via Thomas Carlyle, with the first citation apparently being this one:

Burke said there were Three Estates in Parliament; but, in the Reporters' Gallery yonder, there sat a Fourth Estate more important than they all.

Thomas Carlyle (1841) On Heroes: Hero Worship and the Heroic in History

"More important than they all"? Surely that's not grammatical.

I know about than the preposition vs. than the introducer of elliptical clauses, but this example took me (intuitively, not analytically) aback. It seems many people have agreed with me, to the extent of silently correcting Carlyle's text: "more important than them all" has 63 whG (web hits on Google), most of which are modified versions of the Carlyle quote. By comparison, "more important than they all" has only 31 whG.

So we can add this to the list of quotations that are quoted more often in a modified form than in the original.

Posted by Mark Liberman at 08:42 AM

Lavatory lexicography

I happen to know the word that is used by flush toilet designers and testers to denote the solid object or objects that it is the purpose of flush toilets to dispose of. And you probably don't. So that certainly puts me one up on you. Unless you read on. Though you probably don't really want to.

* * * * * * * * * * * *

Well, I was wrong, you did, so here it is. The flushable matter is called the insult. Really. I learned it from a program I heard on the radio. R&D guys at toilet factories talk about how effective particular designs are at flushing away the insult.

You're probably thinking there will be some point to this post. But there isn't.

Except perhaps, if you press me, that it is a good example of why I am not very interested in words per se. You get these odd little factoids and then... what follows from them? Nothing. I'm not really a word man. I know, I know, it does seem strangely sort of fascinating to learn that the Somali vocabulary for concepts relating to camel spit is so rich. But in the end, so what? The interesting stuff, to me at least, is in the whole complex system that does the work: the grammar. The grammar is the working mechanism of lift wire, trip arm system, ball and float rod assembly, refill shutoff valve unit. Words are just individual parts like the flapper or the flush level handle. But I won't go on; this metaphor is not working like I wanted it to (if grammars are toilets, then what are conversations?). And I don't want to offend lexicographers by... adding insult to injury.

Posted by Geoffrey K. Pullum at 01:41 AM

July 07, 2004

Premature multilingualism considered harmful

At first, commentary seems easy.

New Zealand MP Pansy Wong has called for a compulsory second language policy in schools, because "[o]ur children should enjoy the opportunities that are opening up for others by being global citizens and have the confidence to move in the world among people whose language and culture is different". But Jared Savage, writing in an Auckland newspaper, quotes an eminent Auckland University academic to the effect that "[i]mposing a requirement for students to study a second language could hinder learning".

According to "Auckland University's head of applied language studies and linguistics Dr Gary Barkhuizen", this is because

"... if people are required to study in a second language too early it could interfere with their cognitive learning ability. Exactly when is too soon is a topic of huge debate."

He really said this, according to a reputable news source. Quotation marks and all.

So the obvious thing is to point out what an idiot Dr. Barkhuizen is, how wrong his conclusions are, how uninformed he is about research in bilingualism and even about obvious facts known to every sensible person, how incoherent his arguments are, and so on. We can pair him with Dr. Shin Min-sup, the psychiatry professor at Seoul National University that Reuters quoted as saying that "[l]earning a foreign language too early, in some cases, may not only cause a speech impediment but, in the worst case, make an child autistic."

Commentary seems to get even easier, because Dr. Barkhuizen is further quoted as saying that

"The teaching of Afrikaans during apartheid in black schools (in South Africa) sparked the Soweto uprisings in 1976. Different people have different opinions, so it's a contentious issue."

According to Barkhuizen's earlier logic, this was perhaps because the Sowetans knew it would rot their brains to learn a second language? Although most Soweto residents already knew several languages, then as now? Or perhaps they were already cognitively impaired by premature multilingualism? And teaching New Zealanders Chinese or Spanish today is socially and politically just like teaching Sowetans Afrikaans under the apartheid regime, right?

Feh. The trouble is, who knows if this is what Barkhuizen really said. As we've discussed in several earlier posts, specialists (including linguists) are often quoted in the popular press in ways that make them look like idiots. Sometimes it's because they're idiots indeed, but at least as often, the journalist or some editor completely misunderstood the interview, placed a truncated answer to a leading question in a completely new context, misused ellipsis in the service of some personal agenda, or just made the whole thing up. It's really hard to tell.

Gary Barkhuizen seems to be a real and reputable person. He wrote a chapter (on "Social Influences on Language Learning") in Blackwell's Handbook of Applied Linguistics. He's published a book with OUP. He really is a senior lecturer and head of department at Auckland University.

So my guess is that what he told Jared Savage was something very different from what Savage put in his article. I could speculate about what Barkhuizen really said, and why Savage may have misunderstood or misrepresented him, but instead I've sent an email of inquiry to Barkhuizen, and I'll tell you what he says in response.

[Link via email from David Donnell]

[Update: email from Gary Barkhuizen confirms my guess.

The reporter apparently asked him questions about several different things:

(1) whether it can hinder learning to make students study (other subjects) in a language they don't know well
(2) whether there can be social and political controversies about obligatory second language instruction
(3) whether or not it's likely to be harmful, socially and psychologically, to teach New Zealand children a foreign language

The reporter then wrote a story about (3), in which he included a scatter of quotes from Barkhuizen about (1) and (2).

It's surely true that (1) is controversial, as Barkhuizen said, and there can also be no question that the answer to (2) is "yes". The problem is that these controversies are related to question (3) only in that they all have something to do with language instruction. This is roughly like asking someone questions about surfing accidents and hydroelectric power, and then using the answers in a story about a proposed sewage treatment plant. It's all about water, right?

This leaves us with the all-too-common question of whether the journalist who wrote the story was a fool or a knave. That is, did he get completely confused about the issues at stake and their logical relationships? Or did he cynically misuse out-of-context quotations in order to advance a private agenda -- perhaps somehow related to an ethnic power struggle in New Zealand that I don't know anything about? I'll leave that question to New Zealanders and to scholars of journalism. (In fairness, the guilty parties are sometimes editors rather than reporters.)

For us linguists, the take-home message is already clear. When you talk to members of the fourth estate, watch your back. You can try to give them simple slogans that project the message you want to get across. Alliteration might help. But a thick skin -- or at least willingness to accept a certain amount of public embarrassment for the greater good -- seems to be essential.]

Posted by Mark Liberman at 11:19 PM

Don't Dangle Your Participles in Public

That's the title of one of Richard Lederer's syndicated columns on the English language. Like Lederer's columns in general, it's supposed to be funny -- he's fond of puns and other kinds of word play -- but it's also supposed to be instructive. (Lederer had a long career as a private-school English teacher.)

This one is a collection of writing errors, a vein that Lederer often taps. The fall into error is a brand of humor that tends to have a nasty edge to it, a superior They Should Have Known Better tone when directed (as in this case) towards professionals, a sneering They Are Ignorant Fools tone when aimed at ordinary folks, in particular students, with their extraordinary word choices, erratic spellings, and uncertain grasp of facts. I guess we're supposed to be shamed into learning some important lesson from these bad examples, but it would be hard for even a compliant and well-disposed reader to do much with what Lederer provides in this column.

Worse still, Lederer is, whether we like it or not, a significant public face of scholarship in English grammar. This is one of the main ways the literate (but non-linguist) world gets to see what we do. Oi.

Background: I came across this particular Lederer column on page 19 of the March 2004 Funny Times (a Cleveland, Ohio, compendium of recent humorous cartoons and essays, with what now counts as a way-left tinge to its politics). I posted a version of this critique to the American Dialect Society mailing list on March 23, but didn't put it on Language Log because it was so long. But now we're entertaining some longer and pretty detailed log entries, so I'm offering it. (You've been warned; you could bail out now.)

As for students and their foibles, look, I'm a teacher, and have been for forty years, and of course I sit around with other veterans and tell amazing war stories, but these occasions are suffused by sadness, since our students' failures are our collective failures too.

But on to the participles. Or, I should say, "participles", since only five of Lederer's fifteen Horrible Examples actually involve participles (two present participles and three past participles). The rest have modifiers of other types -- six prepositional phrases, one infinitival, one relative clause, one reduced comparative -- plus one pronominal reference example that involves modifiers only because the pronoun is inside one, though this example does superficially resemble classic dangling-modifier cases. The real theme -- twelve of the fifteen examples -- turns out to be attachment ambiguities. Surprise!

So here's Problem 1: The article is actually about modifiers in general, not just participles. This is just inexcusably sloppy for an English teacher offering grammatical advice in public; people are confused enough as it is about grammatical terminology, and now Lederer's column throws "modifier" and "participle" around as if they were synonyms.

The headline is a disaster in this regard, and so is the framing text:

"The AP Press Guide to News Writing advises: "The language has many ways to trip you up, most deviously through a modifier that turns up in the wrong place. Don't let related ideas in a sentence drift apart. Modifiers should be close to the word they purport to modify." These statements culled from newspapers and magazines demonstrate what happens when a writer dangles his or her participles in public:"

Horrible Example 1 has a wonky infinitival modifier:

(1) The family lawyer will read the will tomorrow at the residence of Mr. Hannon, who died June 19 to accommodate his relatives.

and we don't get to an actual participle until example 6:

(6) The burglar was about 30 years old, white, 5' 10", with wavy hair weighing about 150 pounds.

-- an attachment ambiguity that probably results from simple comma omission; certainly, it could be fixed by treating "weighing about 150 pounds" as the fifth in a series of descriptors and setting it off by a comma:

(6') The burglar was about 30 years old, white, 5' 10", with wavy hair, weighing about 150 pounds.

All five of the descriptors apply to the burglar, but there's no way to keep all of them close to "the burglar" -- though if you insisted on keeping "weighing about 150 pounds" right next to "the burglar", you could do it:

(6"a) The burglar, weighing about 150 pounds, was about 30 years old, white, 5' 10", with wavy hair.

(6"b) Weighing about 150 pounds, the burglar was about 30 years old, white, 5' 10", with wavy hair.

These versions, however, highlight the burglar's avoirdupois in a way that the writer almost surely didn't want, and they lose the nice increasing-weight effect of (6'), which puts the longest, heaviest descriptor last. (It would also be possible to shorten "weighing about 150 pounds" to "about 150 pounds" and move it up in the descriptor list.)

Horrible Example 15 might well be a simple comma error too:

(15) An ethnically diverse crowd of about 50 gathered at the Falkirk Mansion in San Rafael yesterday for a speakout against hate crimes organized by the Marin County Human Rights Roundtable.

Here there are two dependents on "a speakout": the prepositional phrase "against hate crimes" and the participial "organized by the Marin County Human Rights Roundtable". Once again, we cannot possibly get both of these dependents up against "a speakout"; we could move the participial up --

(15') An ethnically diverse crowd of about 50 gathered at the Falkirk Mansion in San Rafael yesterday for a speakout organized by the Marin County Human Rights Roundtable against hate crimes.

This is clunky, because of the long-before-short order, and it's also potentially ambiguous, with a "Roundtable against hate crimes" reading that strikes me as about as likely -- not very -- as the "hate crimes organized by the MCHRR" reading of (15). A much simpler fix would be to set off "organized by the MCHRR" in (15) by a comma; a somewhat more complex fix would be to move it forward and set it off:

(15"a) An ethnically diverse crowd of about 50 gathered at the Falkirk Mansion in San Rafael yesterday for a speakout against hate crimes, organized by the Marin County Human Rights Roundtable.

(15"b) An ethnically diverse crowd of about 50 gathered at the Falkirk Mansion in San Rafael yesterday for a speakout, organized by the Marin County Human Rights Roundtable, against hate crimes.

Similar remarks apply to:

(13) Hunting can also be dangerous, as in the case of pygmies hunting elephants armed only with spears.

Still another participle Horrible Example, 12, is almost surely the result of that very common punctuation error, failing to use a comma between conjuncts:

(12) We spent most of our time sitting on the back porch watching the cows playing Scrabble and reading.

(I assume that we're not entertaining a reading in which the back porch watches cows, only a reading in which the cows play Scrabble and read.)

This is spiffily fixed with a single comma:

(12') We spent most of our time sitting on the back porch watching the cows, playing Scrabble and reading.

On the other hand, moving "playing Scrabble and reading" up to the front, real close to the verb "spent", produces a sentence in which our attention is strangely divided:

(12") Playing Scrabble and reading, we spent most of our time sitting on the back porch watching the cows.

One more example, this time with a prepositional phrase instead of a participial phrase: Horrible Example 9.

(9) Residents will be given information on how to reduce the amount of garbage they generate in the form of lectures, printed literature and promotional items.

The problem is to convey that it's the information that comes in the form of lectures and so on, not that it's the garbage that comes that way, or that the garbage is generated in those forms. In this case, moving the "in the form of..." phrase up will work nicely, so long as (once again) it's set off by commas, or the even stronger dashes (otherwise we risk the unintended reading "items on how to reduce..."):

(9'a) Residents will be given information, in the form of lectures, printed literature and promotional items, on how to reduce the amount of garbage they generate.

(9'b) Residents will be given information -- in the form of lectures, printed literature and promotional items -- on how to reduce the amount of garbage they generate.

Or we could leave the "in the form of..." phrase where it is, but set it off:

(9"a) Residents will be given information on how to reduce the amount of garbage they generate, in the form of lectures, printed literature and promotional items.

(9"b) Residents will be given information on how to reduce the amount of garbage they generate -- in the form of lectures, printed literature and promotional items.

Yes, the versions in (9") might still be misparsed, but they could probably succeed just fine in context.

On to Problem 2: As is so often the case, the advisors don't actually come right out and explain why there's an issue with the examples. Lederer and the _AP Guide_ tell you to keep your modifiers close to the things they modify, but they fail to mention why you should do that -- or, in fact, why the fifteen examples are (presented as) ludicrous. The missing word is *ambiguity*. Every single one of the Horrible Examples runs awry, at least in Lederer's judgment, because of an alternative interpretation that the writer didn't intend but the reader might fix on.

Example (1) can be read as saying that Mr. Hannon died to accommodate his relatives (which might or might not be true, but it probably wasn't what the writer had in mind), while the intended reading is that the reading of the will tomorrow is to accommodate Mr. Hannon's relatives. Fixing this can't be managed just by moving "to accommodate his relatives" closer to "read" than to "died", since other adjustments are needed; this is left as an exercise for the reader.

Example (3) could be read as describing a passer-by with a bullet in his head, instead of a bullet-riddled body:

(3) The body was found in an alley by a passer-by with a bullet in his head.

Once again, just moving the prepositional phrase "with a bullet in his head" forward (and setting it off with commas) is not necessarily the best move:

(3'a) With a bullet in his head, the body was found in an alley by a passer-by.

(3'b) The body, with a bullet in his head, was found in an alley by a passer-by.

The problem is the highlighting of the bullet in (3'). What really works here is the simplest of adjustments -- making the shot passer-by reading more unlikely by referring to the body as an inanimate object, via the pronoun "it", with no movement or setting-off at all:

(3") The body was found in an alley by a passer-by with a bullet in its head.

An equally easy fix is available for the following example, which (at least out of context) induces giggles:

(14) Police searched into the night for a man armed with a shotgun that walked into a Boulder pharmacy Thursday morning, demanded drugs and then fled.

This is another one with two dependents of a single head, in this case "a man". Moving "armed with a shotgun" after the relative clause doesn't help things at all, but a simple change in relative pronoun, picking out a specifically human antecedent, does the trick:

(14') Police searched into the night for a man armed with a shotgun who walked into a Boulder pharmacy Thursday morning, demanded drugs and then fled.

Problem 3: The advisors talk as if any potential ambiguity were a problem that needed fixing. This way lies madness. Attachment ambiguities and pronominal reference ambiguities are just everywhere. Telling writers to avoid potential ambiguities is pretty much telling them to give up writing entirely.

Potential ambiguities aren't the issue; effective ambiguities, ambiguities that a reasonable (and well-intentioned) reader might stumble on, are the issue. Picking out potential ambiguities requires (merely!) that you know the syntax of the language. Picking out effective ambiguities, which you might seriously want to avoid, requires factoring in common knowledge, your audience's mind-set, prosodic effects, the discourse context, and much else. Usage manuals are almost all piss-poor at this sort of thing. Lord knows what students are supposed to make of their advice.

Now, some of Lederer's examples really are laughable, at least out of context, because the unintended reading just juts out. The effect is especially strong for very short sentence-final modifiers, which tend to latch onto preceding stuff:

(5) Organ donations from the living reached a record high last year, outnumbering donors who are dead for the first time.

("...dead donors for the first time" would fix this, without movement, though movement works too. Setting "for the first time" off by a comma might work too.)

(7) Beginning with three games on Tuesday, the unmistakable drama of postseason baseball will grip all of us who love the game for a month.

("...all of us baseball lovers for a month" would do, and maybe setting off "for a month" with a comma. All of the movement possibilities for "for a month" sound awkward to me.)

(11) The dog was hungry and made the mistake of nipping a 2-year-old that was trying to force feed it in his ear.

(Moving "in his ear" forward pushes it between a verb form, "nipping", and its object, which is very awkward. Moving it before "nipping" is completely impossible, since "nipping..." is the object of the preposition "of", and you just can't mess with a preposition and its object. Major recasting is best: "the mistake of nipping the ear of a 2-year-old that was...")

Problem 4: Examples are judged out of context, when in fact discourse organization, information structure, common knowledge, and the like play incredibly important roles in the interpretation of sentences -- a fact you might never appreciate from presentations like Lederer's.

Here's the dangler-like pronominal-reference example from Lederer's Failed Fifteen:

(8) Despite its dismal record in human rights, the House of Representatives has granted most favored nation status to China.

In isolation, it would be easy to (mis)interpret (8) as attributing a dismal human-rights record to the House. But in a discourse that's about China, the intended interpretation would come through without any problem.

Similar remarks hold about some of the examples above, not to mention sentences you can collect any day -- "...are trying to stop spam in federal court" (news report on NPR's Morning Edition, 3/11/04), "I found out what we'd done in '92" (line from the tv show Law and Order; in the context, it's absolutely clear that it was the finding out that happened in 1992).

Problem 5: Movement is offered as the only solution (because mislocation is identified as the only problem), despite the fact that, as I pointed out above, movement is sometimes not even possible (preserving the intended meaning), often not especially useful, and sometimes not the best solution.

Look at Lederer's deer-hunting example:

(2) Mrs. Shirley Baxter, who went deer hunting with her husband, is very proud that she was able to shoot a fine buck as well as her husband.

Moving "as well as her husband" after "shoot" favors the unintended meaning. Moving it after "able", and setting it off with commas, is possible but really awkward. The easy solution is not to reduce the comparative quite so much:

(2') Mrs. Shirley Baxter, who went deer hunting with her husband, is very proud that she was able to shoot a fine buck as well as her husband can.

This is not to deny that sometimes movement is the way to go. The one absolutely classic sort of dangling-modifier example in Lederer's list really needs the modifier moved, from:

(10) Aided by a thousand eyes, the author explains how ants navigate and how they use dead reckoning.

to:

(10'a) The author explains how, aided by a thousand eyes, ants navigate and how they use dead reckoning.

(10'b) The author explains how ants, aided by a thousand eyes, navigate and how they use dead reckoning.

Problem 6: Not so crucial as the others, but the assumption that what modifiers modify is a single word (rather than some larger constituent) can present puzzles for the student. Look at Lederer's sixth example (yes, this is the only one left):

(6) The suspect was spotted in a vehicle matching the description of one which had been stolen from the Annabelle area by Sheriff's Office Sgt. Craig White.

The textbook doctrine here is that the problem is that "by Sheriff's Office Sgt. Craig White" needs to be close to the word it modifies, "spotted" (how, by the way, does the student learn what modifies what?). And, in fact, moving it right after "spotted" does the job:

(6') The suspect was spotted by Sheriff's Office Sgt. Craig White in a vehicle matching the description of one which had been stolen from the Annabelle area.

But the clever student will notice that in (6'), "in a vehicle..." isn't close to the word it modifies. (Clever professionals understand that the problem with (6) isn't really separation, but an inadvertent attachment ambiguity.) Why isn't that bad?

An even more clever student will wonder why "The suspect was spotted in a vehicle by Sheriff's Office Sgt. Craig White" doesn't need to be reworded to (the more marked version) "The suspect was spotted by Sheriff's Office Sgt. Craig White in a vehicle". The short answer is that the "by"-phrase modifies a whole VP and that putting it after the VP is locating it next to the thing it modifies. It can occur after the head of the expression it modifies, if that avoids an effective ambiguity and/or allows a long and heavy constituent (like "in a vehicle matching the description of one which had been stolen from the Annabelle area") to come last in the sentence, but that's not its usual location.

So... we can guffaw at these examples, ripped out of their contexts and paraded in front of us under the banner of "dangling participles" perpetrated by writers who should have known better. Or we can wonder, sadly, what a reader who's hoping to learn something from a well-known English teacher can make of all this -- confusingly labeled examples, a rule that drops from the sky, no guidance about when the rule is important and when it's not, instructions for repair that are hard to follow and often lead to very odd results. If you didn't already know how to write well, could you learn anything from Lederer's column?

That's a rhetorical question.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 10:27 PM

Field Notes

The other day, working with a group of elders on the Flathead Reservation in northwestern Montana, I was eliciting words for `tight'. (Some of the elders speak a dialect known as Bitterroot Salish; others speak Pend d'Oreille. Systematic dialect differences are hard to find, though, and I've been using the cover term Montana Salish for this particular dialect complex. Other dialects of the same language, which is nameless, are spoken in Washington state -- Spokane and Kalispel. Four different tribes, one language.) As usual when `tight' is being discussed, someone mentioned kʷ tinps, consisting of the second-person singular intransitive subject particle kʷ `you', the root tin `tight', a prefix n- `in', and a suffix -ups `tail, butt'. The phrase means `you're a tight-ass, you're stingy'; an alternative form, kʷ nc'anps, means the same thing but uses another root for `tight'. Question: is this a traditional native concept, or is it a calque from English tight-ass?

This isn't a simple issue. On just one occasion I've found clear evidence of calquing, and that may have been a case where the phrase was only uttered once. Several years ago, an elder told me a story about one day when his younger brother came home from school (on a week-end, I guess) and found their mother in a foul mood. School, as was typical on Indian reservations, was boarding school at the Mission, where the nuns punished the children for speaking their own language, with two major results: the children learned English quickly, and Montana Salish entered a rapid decline. Anyway, this child reacted to his mother's crankiness by saying, ``Xeyɬ kʷ pocqn!'', meaning `Boy, are you a sorehead!' -- where pocqn is literally `sore' (poc) plus a suffix for `head'.

But except for this single case where I was told that a kid had invented a new Salish word by combining two Salish morphemes in a new way, inspired by an English word, and where the invented word looks like an exact calque of the English expression, I haven't been able to distinguish between calques and independent inventions. And sometimes the elders I work with, who are all native speakers of English as well as Salish, see connections between English and Salish that strike me as doubtful. An example: the word sut's literally means `(he has a) stretched face' (the root is sut' `stretch', and the suffix means `face, fire'). The elders told me that it's used to mean `have a long face, look mean' -- and that it's an English word, not a traditional word, but `Salish people used it'. Trouble is, there's no English source for this calque that I can think of: in English, `have a long face' means `look sad, gloomy', not `look mean'; and in any case `long' doesn't mean the same thing as `stretched'.

Of course, people have put on both gloomy and mean faces forever, and for all I know stinginess is a modern Western concept; if it is, then that would strengthen the case for `tight-ass' as an English calque. But the general problem interests me. Calquing is a subtle sort of borrowing, often (I bet) completely unnoticed because its results are so unremarkable, and hard to establish even when it's suspected.

Posted by Sally Thomason at 12:25 PM

July 06, 2004

An internet pilgrim's guide to accentual-syllabic verse

Yesterday I cited evidence that some English professors may be a little shaky about verse scansion. In the comments, Charles Hartman raised a valid point about the (in)appropriateness of the classical taxonomy of foot types as a basis for metrical description, and also asked what sort of linguistic analysis might be genuinely useful in analyzing poetic rhythm.

I'm going to try to answer Charles' questions in three stages.

First, in this post, I'll lay out a way of thinking about accentual-syllabic verse. This point of view is by no means original to me, but I'll present it without references and in a somewhat idiosyncratic form, for ease of assimilation. Second, in a later post, I'll take up the question of "feet" and the classical bestiary from amphibrach to trochee, and discuss Charles Hartman's implication that this framework can be confusing and even misleading if taken too seriously as a foundation for metrical analysis. In a third post, I'll offer a few suggestions about the basic aspects of language sound structure that (I think) anyone who cares about poetic meter -- and rhythmic patterns in poetry more generally -- should understand and know how to apply descriptively.

I hope that all this will be helpful for readers who are uncertain about how metered verse in English works. I recognize that there are other readers who know much more than I do about some or all aspects of these phenomena. I hope that they'll find a few interesting things here, if only by association or in reaction, and I expect to learn from their comments and objections.

The rest of this post is a mildly-edited version of the metered-verse section of my notes for lecture #16, "Linguistic Form in Art, Ritual and Play", from Linguistics 001, the introductory linguistics course at Penn.

Tune-text alignment in English

Consider the first verse of the simple song Skip to my Lou, as presented in Ruth Crawford Seeger's American Folk Songs for Children (Doubleday, 1948).

In this verse, as throughout the song, a single line is repeated three times, against a simple melody that sketches a major triad in the tonic, the dominant, and then again the tonic. The verse ends with the invariant line "skip-a to my lou, my darling."

The songbook gives a couple of dozen other verses. Each has the same structure -- a single line repeated three times, and the invariant ending "skip-a to my lou, my darling." Thus the problem of fitting words to music can be reduced for each verse to the problem of fitting a single line to the first two bars of the melody -- everything else is just repetition.

This is about as simple as songs get. Nevertheless, a four- or five-year-old learning new verses has to solve a non-trivial problem.

One way to look at the problem is to line a few verses up against a depiction of the metrical structure of the first two bars of the song. These two bars contain four "quarter notes". The metronome marking at the top of the music says that the quarter note equals 132, i.e. 132 quarter notes per minute, or a little more than two quarter notes per second.

Standard western musical structure assumes a regular hierarchical subdivision of time. In this case, each quarter note can be divided into two eighth notes, each eighth note into two sixteenth notes, etc. At each level, the first of the subdividing notes is "stronger" than the other -- it is the "downbeat."

Three levels are enough for this musical example. As for the alignment with the melody, the song provides a separate pitch for each quarter note. If that note is subdivided by the syllables of the verse, then the subdividing syllables just repeat the same note.
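The hierarchy just described can be sketched in a few lines of Python. This is only an illustration and not part of the original lecture notes: the function names (grid_level, grid_rows) and the decision to count positions in sixteenth notes are assumptions made for the example. The tempo of 132 quarter notes per minute is the metronome marking mentioned above.

```python
# Sketch of a three-level metrical grid for two bars with two quarter notes each.
# Positions are counted in sixteenth notes (0..15); all names are illustrative.

QUARTER_NOTES_PER_MINUTE = 132   # metronome marking cited in the text
SIXTEENTHS_PER_QUARTER = 4
QUARTERS = 4                     # four quarter notes in the first two bars


def grid_level(position):
    """Return the strongest grid level that a sixteenth-note position falls on."""
    if position % 4 == 0:
        return "quarter"
    if position % 2 == 0:
        return "eighth"
    return "sixteenth"


def grid_rows(cell_width=4):
    """Build the three rows of X marks, one text cell per sixteenth note."""
    total = QUARTERS * SIXTEENTHS_PER_QUARTER
    rows = []
    for step in (4, 2, 1):  # quarter-note, eighth-note, sixteenth-note rows
        cells = ["X" if pos % step == 0 else "" for pos in range(total)]
        rows.append("".join(c.ljust(cell_width) for c in cells).rstrip())
    return rows


# Duration of one quarter note at the given tempo, in seconds.
quarter_seconds = 60 / QUARTER_NOTES_PER_MINUTE

for row in grid_rows():
    print(row)
print(f"one quarter note lasts about {quarter_seconds:.3f} seconds")
```

Every position that carries an X on a higher row also carries one on each lower row, which is the sense in which the first of each pair of subdividing notes is the "stronger" one.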

Here is the first verse -- this is just a schematic presentation of exactly the information provided by the musical notation above:

E               C               E               G               (pitches)
X               X               X               X               (1/4 notes)
X       X       X       X       X       X       X       X       (1/8 notes)
X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   (1/16 notes)
little red      wa-     gon     pain-   ted     blue

Here are some other verses, aligned under another copy of the same melodic and metrical schema:

E               C               E               G               (pitches)
X               X               X               X               (1/4 notes)
X       X       X       X       X       X       X       X       (1/8 notes)
X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   (1/16 notes)
pig     in  the par-    lour    what'll I       do
cat     in  the but-ter milk    lapping up      cream
rab-bit in  the corn    field   big as  a       mule
hogs in the po- ta- to  patch   rooting up      corn
dad's   old     hat     and     ma- ma's old    shoe

There are many other verses -- Seeger provides a couple of dozen in the publication cited, and says "this song has hundreds of stanzas and is always picking up new ones. One collector alone gives 150, from which the above 22 were selected as encouragement to further improvisation."

The samples given above are enough to give us a guess about the principles involved. For a start, we can say something about what the principles are NOT:

There is not a fixed number of syllables in a line -- the samples have between 8 and 11 syllables.
Even for a given number of syllables, the alignment with the melody and rhythm of the song is not fixed by syllable order, but depends on the stress and structure of the words.

For instance, both "little red wagon painted blue" and "dad's old hat and mama's old shoe" have eight syllables, but if we used the syllable-by-syllable alignment of the first line for the second line, we'd get the impossible pattern:

E               C               E               G               (pitches)
X               X               X               X               (1/4 notes)
X       X       X       X       X       X       X       X       (1/8 notes)
X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   (1/16 notes)
dad's old hat     and     ma-    ma's     old   shoe

which gives the impression of stressing the line as "DAD's-old-hat AND maMA's old SHOE" (where the capitalized syllables correspond to the quarter-note beat of the song, and also to the points of pitch change).

No self-respecting American nursery school graduate would ever think to sing the line that way -- except perhaps as a joke.

[as a party trick, Haj Ross used to sing "Take me out to the ball game" with the words shifted by half a beat relative to the music. It's hard to learn to do, and curiously hilarious to hear.]

The principles of tune-text alignment for this song seem to be:

Each of the four quarter notes is always aligned with a stressed syllable
There is always at least one syllable in between each pair of quarter notes, so that the intervening eighth-note positions are always filled, but these intervening syllables are not always stressed
Syllables aligned with sixteenth notes may be added to taste.

This implies that the minimum plausible line of "skip to my lou" might be seven rather than eight syllables. For instance, "Jane's old hat and Jim's old shoe" might be OK.

We can make some other observations, such as these:

The eighth sixteenth-note position seems to be avoided (the "upbeat" to the second measure); this corresponds to a tendency to break the line into two parts at this point
The tenth sixteenth-note position is often filled, perhaps to emphasize the continuity of the second measure.

You can verify for yourself that the rest of Ruth Crawford Seeger's cited lines follow the same pattern:

Pull her up and down in the little red wagon

Rats in the bread tray, how they chew

Chickens in the garden, shoo shoo shoo

Cow in the kitchen, moo cow moo

Going to market two by two

Back from market, what did you do?

Had a glass of buttermilk, one and two

Skip skip skip-a to my lou

Skip a little faster, that won't do

Going to Texas, come along too

Lost my partner, what'll I do

I'll get another one prettier than you

Catch that red bird, skip to my lou

If you can't get a red bird, take a blue

If you can't get a blue bird, black bird'll do

The only real novelty in these additional examples is in the line ending with wagon, where there is an extra syllable aligned after the fourth quarter-note.

We can rephrase our observations by saying that Skip to my lou has a four-beat line, where the beats correspond to the quarter notes of the first two bars of the song, and where one to three additional syllables occur between each adjacent pair of beats.

I have seen four- and five-year-old children making up new verses to this song. No one has to teach them the rules -- they figure them out easily enough by themselves.

Most songs are more complicated than this one, but the basic principles of tune-text alignment in English remain the same: syllables are aligned with notes so that the stress pattern of the text and the rhythmic structure of the tune are congruent. If you have some familiarity with designing computer algorithms, you might see if you can design one that will correctly specify the tune-text alignment for a simple song like this one.
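Here is one possible starting point for such an algorithm, offered as a rough sketch rather than a finished answer. It assumes that stress has already been marked by hand for each syllable (2 for the main stress of a polysyllabic word, 1 for a stressed monosyllable, 0 for a weak syllable), that every line starts on the first beat, and that the fourth beat falls on the last stressed syllable. The names align, offsets and QUARTERS, and the rule for placing two in-between syllables, are illustrative choices of ours, not part of any established method.

```python
from itertools import combinations

QUARTERS = (0, 4, 8, 12)  # sixteenth-note slots of the four quarter-note beats

def offsets(inner):
    """Slots (relative to the preceding beat) for 0-3 in-between syllables."""
    if len(inner) == 1:
        return [2]                      # a lone syllable takes the eighth note
    if len(inner) == 2:
        # Heuristic (an assumption): a stressed second syllable claims the
        # eighth-note slot; otherwise the first syllable does.
        return [1, 2] if inner[1][1] > 0 else [2, 3]
    return [1, 2, 3][:len(inner)]

def align(syllables):
    """syllables: list of (text, stress) pairs, stress hand-marked as 2, 1 or 0.
    Returns {slot: text} over sixteen sixteenth-note slots, or None."""
    stressed = [i for i, (_, s) in enumerate(syllables) if s > 0]
    best = None
    for beats in combinations(stressed, 4):
        # Assumptions: the line starts on beat one, and the fourth beat
        # is the last stressed syllable of the line.
        if beats[0] != 0 or beats[-1] != stressed[-1]:
            continue
        gaps = [b - a - 1 for a, b in zip(beats, beats[1:])]
        tail = len(syllables) - 1 - beats[-1]
        # At least one and at most three syllables between adjacent beats.
        if any(g < 1 or g > 3 for g in gaps) or tail > 3:
            continue
        # Prefer alignments that keep polysyllabic main stresses on the beat.
        cost = sum(1 for i, (_, s) in enumerate(syllables)
                   if s == 2 and i not in beats)
        if best is None or cost < best[0]:
            best = (cost, beats, gaps + [tail])
    if best is None:
        return None
    _, beats, runs = best
    grid = {}
    for quarter, beat, run in zip(QUARTERS, beats, runs):
        grid[quarter] = syllables[beat][0]
        inner = syllables[beat + 1:beat + 1 + run]
        for off, (text, _) in zip(offsets(inner), inner):
            grid[quarter + off] = text
    return grid

wagon = [("lit", 2), ("tle", 0), ("red", 1), ("wa", 2),
         ("gon", 0), ("pain", 2), ("ted", 0), ("blue", 1)]
dads = [("dad's", 1), ("old", 1), ("hat", 1), ("and", 0),
        ("ma", 2), ("ma's", 0), ("old", 1), ("shoe", 1)]
```

With these hand-marked inputs, align(wagon) puts "wa" on the second quarter note and align(dads) puts "hat" there, so the two eight-syllable lines come out aligned differently, as they should. The sketch does nothing about the tendency to avoid the eighth sixteenth-note position, and it will need more work for lines that begin with an upbeat.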

To make up new verses -- or to sing old ones correctly -- you have to understand, implicitly, the metrical hierarchy of the music, the stress pattern of the text, and the way that they can be aligned. This understanding comes effortlessly to young children, providing more evidence of the psychological reality (and naturalness) of the linguistic (and musical) concepts involved.

Why should there be something natural about the process of aligning two structures so as to make them rhythmically congruent? One plausible hypothesis is that this is the basis of coordination among speech articulators in ordinary talking. On this view, singing is just a kind of regularized and stylized form of speaking. In both cases, rhythmic structures are serving a coordinative function.

Accentual/syllabic verse in English

The principles of tune-text association for Skip to my Lou are basically the same as the principles that underlie most metered verse in English.

This is especially clear if we look at verse with a very clear rhythmic pattern, like Mother Goose rhymes, or Lewis Carroll's The Hunting of the Snark, or Robert W. Service's ballad The shooting of Dan McGrew, or Run-DMC/Aerosmith's Walk this way.

Let's take a look at how McGrew works. The poem has 58 lines, of which the first six are given below.

A bunch of the boys were whooping it up in the Malamute saloon;

The kid that handles the music-box was hitting a rag-time tune;

Back of the bar, in a solo game, sat Dangerous Dan McGrew,

And watching his luck was his light-o'-love, the lady that's known as Lou.

When out of the night, which was fifty below, and into the din and the glare,

There stumbled a miner fresh from the creeks, dog-dirty, and loaded for bear.

If you read these lines out loud, you can hardly avoid getting an impression of the intended rhythm. It's a seven-beat line, with either one or two additional syllables between each pair of adjacent beats. The beginning of the line can start with zero, one or two "upbeat" syllables. There is always a phrasal break between the fourth and fifth beats of each line, and occasionally there is no intervening syllable at this point (as if it were a line break).

We can annotate the rhythmic structure of the next eight lines of the poem by using a sharp sign (#) for each "beat", a period for additional syllables, and a slash (/) for the phrase break:

. # . . # . . # . . # / . # . . # . . #

He looked like a man with a foot in the grave and scarcely the strength of a louse,

. . # . . # . # . . # / . . # . # . . #

Yet he tilted a poke of dust on the bar, and he called for drinks for the house.

. . # . # . # . # / . . # . # . . #

There was none could place the stranger's face, though we searched ourselves for a clue;

. . # . # . . # . # / . # . . # . #

But we drank his health, and the last to drink was Dangerous Dan McGrew.

. # . # . . # . # / . # . # . . #

There's men that somehow just grip your eyes, and hold them hard like a spell;

. # . # . . # . # / . . # . . # . #

And such was he, and he looked to me like a man who had lived in hell;

. . # . # . . # . # / . . # . # . #

With a face most hair, and the dreary stare of a dog whose day is done,

. . # . . # . # . # / . . # . # . #

As he watered the green stuff in his glass, and the drops fell one by one.

This kind of annotation of the rhythmic structure of a verse is called scansion, and the basic rhythmic pattern of a poem (if it has one) is called its meter. The scansion shows us how the underlying pattern (here a seven-beat line with one or two intervening syllables) is realized in each line of the poem.
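Because this notation is so simple, it is easy to check a scansion mechanically. The sketch below is only an illustration: the function names parse_scansion and fits_mcgrew are made up for the purpose, and the test encodes our informal description of the McGrew line (four beats, a break, three beats; zero to two upbeat syllables; one or two syllables between beats; zero to two syllables across the break).

```python
def parse_scansion(marks):
    """Split a scansion such as '. # . . # / . # . #' at the slash.
    For each half-line return (beats, runs, trailing), where runs lists
    the number of weak syllables before each beat and trailing counts
    the weak syllables after the last beat."""
    halves = []
    for half in marks.split("/"):
        runs, run = [], 0
        for symbol in half.split():
            if symbol == ".":
                run += 1
            elif symbol == "#":
                runs.append(run)
                run = 0
            else:
                raise ValueError(f"unexpected symbol: {symbol!r}")
        halves.append((len(runs), runs, run))
    return halves

def fits_mcgrew(marks):
    """True if the scansion matches the seven-beat pattern described above."""
    halves = parse_scansion(marks)
    if [beats for beats, _, _ in halves] != [4, 3]:
        return False
    (_, first, trailing), (_, second, _) = halves
    if first[0] > 2:                       # at most two upbeat syllables
        return False
    if any(gap not in (1, 2) for gap in first[1:] + second[1:]):
        return False                       # one or two syllables between beats
    return trailing + second[0] <= 2       # possibly none across the break

print(fits_mcgrew(". # . . # . . # . . # / . # . . # . . #"))
```

A line with three weak syllables between two beats fails the check, since that falls outside the pattern as we have stated it.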

There is quite a bit to say about meter and scansion, even of metrically simple poems (some might even say doggerel) like McGrew. The point that we want to draw out here is that the basic principles are the same as those that applied in the case of Skip to my lou -- a certain number of beats per line, with variable (but constrained) numbers of syllables between the beats, and a regular break in a certain position.

Theorists distinguish among various kinds of poetic meter. The word meter means measure, and in each case, something is being measured or counted. In syllabic meters (as in French poetry), the only thing that matters is the number of syllables per line. In some languages (Classical Greek, Latin, Arabic, Hausa), the pattern of "long" and "short" syllables is regulated. In accentual meters, what is counted is accents -- or more properly beat-aligned accents. Most English metered verse is accentual-syllabic -- each line has a given number of "beats", but there are also more or less strong restrictions on the number of intervening syllables.

It is important to remember that poetic meter is an abstract pattern, a kind of grid against which the poet arranges his or her lines according to some general principles of congruence. How the congruence is defined depends on the poetic style, but also very much on the sound structure of the language that the poetry is written in. For metered verse to be a living form -- as it has been in many cultures around the world, both ancient and modern -- its patterns have to be defined in terms of phonological categories whose patterns poets and their audience can hear and feel.

Metrical feet

In the notation we've been using, The shooting of Dan McGrew is written in a fairly even mixture of . # and . . # rhythmic elements (225 . # and 198 . . # to be precise). The distinction doesn't really seem to matter to its form, which we described simply as seven beats with one or two intervening syllables, divided into two half-lines of four beats and three beats. The lyrics to the 1986 hit Walk this way (by Run-DMC and Aerosmith) have the same basic pattern: seven beats, divided as four plus three. However, now there can be as many as three weak syllables between each pair of strong syllables (where "strong" means "aligned with the beat"):

 . .  #   .  .    #    .    .  #      .    #   /  .   .  #  .   .  .   #  .  .   #
So I took a big chance at the high school dance, With a missy who was ready to play.

 . .    #   .  .   #  .     .      .   #    .    .  .   # . /   .    .  #    .  .   .  #     .   #
Wasn't me she was foolin' 'cause she knew what she was doin', when she told me how to walk this way.

Thus Walk this way has exactly the same meter and rhyme scheme as The shooting of Dan McGrew, except for a slight relaxation of the meter: instead of one or two weak syllables between beats, Aerosmith's song has one, two or three.

Lewis Carroll's mock epic The Hunting of the Snark also has the same basic meter as The shooting of Dan McGrew. Here are the first two stanzas:

              .     .   #    . .   #       . #   .    #
            "Just the place for a Snark!" the Bellman cried,
                .   . # .   .    #   .    #
                As he landed his crew with care;
             . # .   .     # .    . # .    . #
            Supporting each man on the top of the tide
                 . . # . .   #    .   .   #
                By a finger entwined in his hair.

              .     .   #    . .   #     . .    #   .    #
            "Just the place for a Snark! I have said it twice:
                  . . #     .    . # .     .   #
                That alone should encourage the crew.
             .     .   #    . .   #     . .    #   .     #
            Just the place for a Snark! I have said it thrice:
                  . . #    .     .   #    .    #
                What I tell you three times is true."

Snark has alternate lines of four and three beats -- corresponding to the four/three division of the seven-beat line in McGrew. With the promotion of the half lines to full lines, additional rhymes are added (here cried/tide and twice/thrice) to reinforce the stanzaic form, but the meter is basically identical.

In Snark, however, the balance between . # and . . # shifts dramatically towards . . #

There are 1754 . . # sequences, to only 251 . # sequences, for a ratio of about seven to one, while the . # sequences that do occur are essentially all at the beginning or the end of a line. Thus Snark is moving in the direction of fixing not only the number of "beats" -- of strong syllables in the line -- but also the number and placement of weak syllables.
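Counts like these can be tallied from scansion lines in a few lines of code. The following is a minimal sketch of one way to do it, and an assumption about method rather than a record of how the figures above were produced: each beat is classified by the number of weak syllables that precede it, so a count under key 1 is a ". #" unit and a count under key 2 is a ". . #" unit. The name foot_counts is ours.

```python
from collections import Counter

def foot_counts(lines):
    """Tally beats by the number of weak syllables preceding them.
    Key 1 corresponds to '. #' units and key 2 to '. . #' units."""
    counts = Counter()
    for marks in lines:
        weak = 0
        for symbol in marks.replace("/", " ").split():
            if symbol == ".":
                weak += 1
            elif symbol == "#":
                counts[weak] += 1
                weak = 0
    return counts

# Scansion of the first Snark stanza, copied from the display above.
snark_stanza = [
    ". . # . . # . # . #",
    ". . # . . # . #",
    ". # . . # . . # . . #",
    ". . # . . # . . #",
]
counts = foot_counts(snark_stanza)
print(counts[2], counts[1])
```

Weak syllables that trail the last beat of a line are simply dropped here, and a weak run is never carried across a line boundary; a different convention on either point would shift the totals slightly.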

In order to characterize poetic forms in which the arrangement of strong and weak positions is regulated in this way, poets and critics have borrowed the terminology of Greek (and Latin) metrics. The Greek metrical system was based on patterns of totally different units -- their meters did not care about the location of accented syllables, but rather regulated the pattern of long and short syllables. They then established a congruence between long and short syllables and patterns of long and short time-units in the musical meters of the period. Metrical systems that depend on syllable-length in this way are called quantitative. By contrast, English lyric poets rely on a congruence between stress patterns and the beat structure of our music, resulting in a metrical system that is called accentual or accentual-syllabic.

These different choices of basic poetic stuff are not arbitrary. The (classical) Greek language made a systematic distinction between long and short vowels, whereas English does not; English word-stress organizes the rhythm of English speech in a way that Greek accent did not.

Nevertheless, all poetic forms are based on analogies among different sorts of patterns, and it is easy enough to make an analogy between the Greeks' patterns of long and short syllables, and our patterns of strong and weak syllables. Thus we can borrow the Greek term iamb -- applied to the Greek pattern "short long" -- and apply it to the English pattern "weak strong." The Greeks called these basic patterns "feet" (actually of course they called them the equivalent in Greek). Here are some of the commoner foot names, represented with the typographically convenient (but non-standard!) notation of "." for short positions and "#" for long ones:

iamb	. #
anapest	. . #
trochee	# .
dactyl	# . .
spondee	# #

In discussing classical (Greek and Latin) metrics, it's more common to see the macron used for long positions and the breve for short ones, with a vertically-stacked combination of the two symbols used for "common" positions that might be either long or short.

For English accentual/syllabic verse, we are dealing with patterns of stressed and unstressed (rather than long and short syllables), and the usual notation is something like acute accents over stressed syllables with breves over unstressed ones, as exemplified in this page. Other explanations of English verse scansion use more convenient typography, such as the symbols / and u as substitutes for the acute accent and breve respectively. As you've seen, we've used '#' and '.' -- changing the notation doesn't change the ideas, or shouldn't do so, anyhow.

Using whatever notation, the meters we've been examining (The Hunting of the Snark, The Shooting of Dan McGrew, Walk This Way) combine iambic and anapestic rhythms, with alternating lines of four and three feet. This ballad stanza is a common form in English folk poetry.

The Greeks (and their Roman students) identified types of poetic lines in terms of the type of rhythmic pattern (foot) used, and the number of repetitions of the pattern. Thus a pattern consisting of five iambs would be an iambic pentameter; a pattern consisting of six dactyls would be a dactylic hexameter; and so on.

In this way of talking, the ballad stanza alternates tetrameters (four-foot lines) with trimeters (three-foot lines). A limerick is typically two lines of anapestic trimeter, followed by two lines of anapestic dimeter, followed by a final anapestic trimeter:

. # . . # . . #

I used to think math was no fun

. . # . . # . . #

'Cause I couldn't see how it was done

. # . . # .

Now Euler's my hero

. . # . . # .

For I now see why zero

. # . . # . . #

Is e to the pi i plus 1.
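The naming scheme described above (a foot type plus a count of repetitions) is mechanical enough to write down directly. This is a deliberately strict sketch: the tables FEET and LENGTHS and the function name_meter are our own illustrative names, and the function only recognizes lines that repeat a single foot exactly.

```python
FEET = {
    ". #": "iambic",
    ". . #": "anapestic",
    "# .": "trochaic",
    "# . .": "dactylic",
    "# #": "spondaic",
}
LENGTHS = {2: "dimeter", 3: "trimeter", 4: "tetrameter",
           5: "pentameter", 6: "hexameter"}

def name_meter(marks):
    """Name a scansion that is an exact repetition of one foot, else None."""
    symbols = marks.split()
    for foot, adjective in FEET.items():
        unit = foot.split()
        reps, remainder = divmod(len(symbols), len(unit))
        if remainder == 0 and reps in LENGTHS and symbols == unit * reps:
            return f"{adjective} {LENGTHS[reps]}"
    return None

print(name_meter(". . # . . # . . #"))
```

The second line of the limerick comes out as anapestic trimeter, but the first line, which opens with ". #" rather than ". . #", returns None. That strictness is a reminder that real lines are measured against the abstract pattern rather than copied from it.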

Iambic pentameter

When we look at the scansion of Skip to my lou or The shooting of Dan McGrew, we see that there is a very good correlation between strong positions in the meter ("beats") and main-stressed syllables of content words.

This correlation is not perfect. There are a few cases where stressed syllables of content words are in weak positions. Here is a line with four examples (the two instances of "dead", plus "clean" and "called"):

. # . # . . # . # / . # . . # . #

A half-dead thing in a stark, dead world, clean mad for the muck called gold

The rhythm of this line remains clear, however, since in each case there are adjacent stressed words that are naturally more prominent, and so a completely ordinary reading of the line still gives a direct expression of the "swing" of the meter.

What never happens -- in such verse -- is for the main-stressed syllable of a polysyllabic word to be scanned in a weak position.

There are also examples of a "beat" position occupied by a word -- such as a function word -- whose natural degree of prominence is weak. However, in nearly all of these cases, there are even weaker words adjacent, so that the natural rhythm of the phrase still expresses the meter clearly:

. # . # . . # . #

That one of you is a hound of hell . . .

As a result, it is impossible to read the poem without intuitively grasping its meter -- whether you want to or not!

In most metered verse in English, the meter is not as obvious. Nevertheless, the basic rules of meter and scansion remain the same: there are a certain number of strong ("beat") syllables per line, with a specified number of intervening syllables permitted between the beats. The main-stressed syllable of a polysyllabic word is never allowed to occur in one of the intervening weak (non-beat) positions in the meter. Naturally strong monosyllables may occur in metrically weak positions.
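The one hard constraint stated here, that the main-stressed syllable of a polysyllabic word never lands in a weak position, can be checked against a scansion once each syllable has been labelled. In this sketch the labels "main", "weak" and "mono" are assigned by hand, and both they and the function name stress_violations are assumptions made for illustration.

```python
def stress_violations(syllables, marks):
    """syllables: list of (text, kind) pairs, where kind is 'main' (main
    stress of a polysyllabic word), 'weak' (any other syllable of a
    polysyllabic word) or 'mono' (a monosyllable).
    Returns the 'main' syllables that fall in weak ('.') positions."""
    positions = [m for m in marks.split() if m in (".", "#")]
    if len(positions) != len(syllables):
        raise ValueError("scansion and syllable count differ")
    return [text for (text, kind), pos in zip(syllables, positions)
            if kind == "main" and pos == "."]

# "In search of wit these lose their common sense", labelled by hand.
pope = [("In", "mono"), ("search", "mono"), ("of", "mono"), ("wit", "mono"),
        ("these", "mono"), ("lose", "mono"), ("their", "mono"),
        ("com", "main"), ("mon", "weak"), ("sense", "mono")]

print(stress_violations(pope, ". # . # . # . # . #"))
```

Against the iambic grid the list of violations is empty. Shifting the same words onto a hypothetical trochaic grid ("# . # . # . # . # .") pushes the first syllable of "common" into a weak slot, which is exactly the kind of alignment the rule forbids.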

Perhaps the biggest difference between Service's handling of meter and that of subtler English poets is the treatment of metrically strong positions. In poems where the meter is obvious or even insistent, like limericks or The shooting of Dan McGrew or The hunting of the Snark, metrically strong positions are usually occupied by strongly stressed syllables, with weaker syllables around them. In subtler metrical styles, this correlation is relaxed, so that weak monosyllables often appear in strong positions in the meter.

The result is a flexible sort of meter, able to vary between clear and obvious rhythmicity and more prose-like patterns.

Like much English-language art poetry from Shakespeare to Auden, Alexander Pope's An Essay on Criticism is written in iambic pentameter, that is, in lines of five repetitions of the iambic rhythm /. #/

Sometimes his lines express the meter as directly as any limerick:

. # . # . # . # . #

In search of wit these lose their common sense

Each strong position in the previous line is occupied by the stressed syllable of a content word; each weak position is occupied by a prosodically-weak function word, or by the unstressed syllable of a polysyllabic word. In the line given below, both stressed syllables of imagination are used in metrically strong positions, but otherwise the situation is the same:

. # . # . # . # . #

Where beams of warm imagination play

However, it is easy to find lines in the same poem where unstressed and prosodically weak syllables are put in metrically strong positions, like "is" and "the" in the couplet below:

. # . # . # . # . #

A little learning is a dang'rous thing;

. # . # . # . # . #

Drink deep, or taste not the Piërian spring:

As a result of this (and other aspects of poetic practice, such as frequent inversion of the first foot of a line), there may be a complex relationship between the abstract metrical pattern of iambic pentameter and its linguistic rhythms. This somewhat indirect relationship between the metrical pattern and its linguistic realization is typical, although the details differ from meter to meter and even from poet to poet.

Both the ballad meter and the iambic pentameter (in English) can be seen as involving sequences of alternating weak and strong positions:

. . . w s . . .

The ballad stanza involves 4 + 3 + 4 + 3 of these /w s/ units, while couplets of iambic pentameter (as in Pope's poem) involve 5 + 5. The two different forms also typically exhibit different principles about the instantiation of these weak and strong metrical positions.

In the English ballad meter, the basic idea seems to be that "strong" positions in the meter should coincide with single syllables that are "peaks" of linguistic stress, in the sense that they are naturally more prominent than the syllables around them. The weak positions in this meter are relatively unconstrained, and in particular may correspond to different numbers of syllables with different stress properties, depending on the poet (or the poem). The result is verse in which the natural rhythm of linguistic performance strongly evokes the metrical form.

In English iambic pentameter, on the other hand, the basic constraints seem to be that both strong and weak positions in the meter should correspond to single syllables, and that "weak" positions in the meter should not coincide with stress peaks (that is, syllables that are naturally more prominent than those around them). The "strong" positions are relatively unconstrained. The result is verse in which the natural rhythm of linguistic performance, while metrically constrained, need not evoke the regular alternation of the metrical form very strongly.

[There is a great deal more to be said about all this: what counts as a syllable; where word boundaries are permitted, required or forbidden; the special treatment of positions at the edges of metrical units, and the reasons for this; and so on.]

As the comparison between ballad meter and iambic pentameter illustrates, different poetic styles may constrain the relationship between metrical patterns and the rhythms of language in quite different ways, even within the same language. However, the relationship is usually not a matter of completely arbitrary conventions -- say, "you can add extra syllables if they contain the letter 'x'". Rather, like children's language games and the relation of lyrics to music in songs, it is rooted in the sound structure of the language. Metrics is applied phonology.

[Note: if you have comments, please feel free to send them to me by email.]

Posted by Mark Liberman at 11:42 PM

Comments on comments, again

[Note, 3/15/2005: as a result of an appeal from the person who wrote the first comment discussed below, I've anonymized the names of the commenters involved.]

I've just removed two comments from an earlier post. They haven't been censored -- you'll get a chance to read them, if you want -- but they're no longer where they were. The rest of this post explains why I moved them. I hope that the explanation will make sense to you, and will help make interventions of this kind less frequent in the future.

Yesterday, Geoff Nunberg posted a critical evaluation of the recent Groseclose and Milyo paper on measuring political bias in the media. The G&M paper has been widely discussed among social scientists, as well as in the popular press and the blogosphere. Geoff's essay explains several important problems with the G&M study that I haven't seen presented elsewhere, and it does so in a way that will be accessible to a wide audience. I'm proud to have it in our blog, and I expect it to be widely read and discussed.

Shortly after Geoff's post went up, Xxxx Xxxxx started off the comments with a 600-word essay that began "Let me shift the issue a little bit. In Generative Linguistics done in Brazil there is one ideological bias that is undeniable: the segregationist bias. This idea is not something Brazilians came with. It started with foreigners that began to study 'Brazilian' Portuguese". Xxxx went on to present a long discussion of a controversy in Portuguese dialect studies. You can read the whole thing here. Whatever the thought process that connects this stuff to Geoff's critique of the methodology and implementation of the G&M media bias study, nearly all of it is on a different topic, one that most of the readers of Geoff's essay will find puzzling at best.

Shortly after Xxxx's comment went up, Yyyyy Yyyyyyy responded with a comment of his own, which is reproduced in the same file. Yyyyy's comment was very much on topic -- that is, on Xxxx's topic -- but there was by now no glimmer of a connection to Geoff's topic. Again, a reader who had come for a serious discussion of the G&M media bias paper would be baffled by all this.

The effect of these two long comments would certainly be to prevent most other readers from commenting on the actual content of Geoff's post. Worse, this stuff may give readers the mistaken impression that Geoff (is a member of a community that) believes that the G&M media bias paper is somehow about the ideology of Portuguese dialectology.

So I've removed those two comments.

We welcome comments -- well, to be frank, there's some difference of emphasis if not opinion on this point among us -- but at least, some of us welcome comments. You'll help us to keep the welcome mat out if you can stick to the topic, more or less, and be reasonably brief. If one of our posts reminds you of something interesting but completely different, please blog it yourself and tell us about it, and we'll probably post a link. If you have a lot to say about one of our posts, please put your essay up on your own blog, or your own website, and put a brief description and a link in the comments on ours.

I recognize that topic drift in comments is sometimes a creative process. But this was a carefully researched and focused post, on a topic of broad interest, and the very first comment hijacked the discussion and took it off in a completely different direction, at great length. That's just plain rude.

Posted by Mark Liberman at 02:05 PM | Comments (11)

Eggcorns from all over

Some eggcorns are just non-standard spellings:

(link) As a truly intelligent group, we always find it difficult to widdle down what we do to into a bullet list.

I often experience the same difficulty myself.

And who whittles anymore, anyhow? We used to whittle when I was a kid, but these days, if you gave your six-year-old a jackknife, Child Protective Services would haul you into court for reckless endangerment of a minor. And the old guys who used to sit on the feed store steps whittling are all inside watching TV now.

Come to think about it, maybe widdle down has something to do with cute widdle kitty? It's often hard to tell a simple misspelling from a leaky metaphor -- perhaps a PR person sprays adjectives around like Fl(ieger)a(bwehr)k(anone) [ = anti-aircraft artillery]?

(link) It's Sayonara to Bill Hughes, a flak for IBM's personal systems group.

Here's a case where one metaphor (defusing a bomb) has definitely been replaced by another (diffusing a noxious substance into a lot of water):

(link) Russia has launched a diplomatic push to diffuse the crisis over Pyongyang's nuclear program.

If you diffuse it, it becomes harmless, right? Or it pollutes a bigger area, or something...

Some words just work better as nouns. It's certainly easier to watch out for those quirky stigmatisms if they're not mere stems:

(link) Also, eyesight can change at any time. You may just have a stigmatism, and they can be real quirky.

And when you've got a noun, some actions are just more natural than others. For example, it makes a good deal more sense to tow an ideological line than to toe one:

(link) And some people find it very frustrating that I'm not towing an ideological line.

I mean, you've got an ideological line, you're naturally going to use it to pull something around, right? Why would you even think of toeing it, unless you're some kind of pervert? When do normal people ever toe anything, anyway? [...yes, I know about toeing the line at the start of a race or to dress a line in military drilling...]

Incidentally, do you know the story about the hillbilly who went from bar to bar around town dragging a logging chain behind him? Finally a bartender asked, "Hey, how come you're dragging that chain around behind you everywhere?" The answer: "Hell, did you ever try to push one of these things?"

It's so sensible, I'm not sure why it's even a joke.

Just like you can pay a fee to get something notarized, you can apparently also arrange to get bonified:

(link) Dental degree. School certified or notarized copy; if in a language other than English must be accompanied by a notarized translation from a bonified U.S. translator.

I'm not sure where you go to get a translator bonified, though. Probably Margaret Marks would know.

[Although I've found new examples via Google, all of these re-understandings of English come from chapter 11 of Ken Wilson's 1987 book "Van Winkle's Return". Ken, the author of The Columbia Guide to Standard American English, was my parents' friend when I was a child. I haven't seen him in many years, and was sad to learn by Google search that he died last year.]

Posted by Mark Liberman at 09:39 AM | Comments (12)

July 05, 2004

"Liberal Bias," Noch Einmal

Ever since I got involved -- well, no, make that "got myself involved" -- in a catfight over partisan labeling and media bias a couple of years ago, I've been receiving emails with pointers to new quantitative studies that purport to show that the media really do have a liberal bias, just as conservatives have been claiming all along.

The latest of these -- and certainly the most ambitious and analytically complicated -- comes from Tim Groseclose of the UCLA Department of Political Science and Stanford Business School and Jeff Milyo of the Harris Public Policy Institute at the University of Chicago. Groseclose and Milyo's study has been approvingly cited by Bruce Bartlett in National Review, by Linda Seebach in the Rocky Mountain News, and by Harvard economist Robert J. Barro in Business Week, not to mention conservative bloggers like Instapundit, Andrew Sullivan, and Matt Drudge, among a number of others, who trumpet its "objectivity." (There's a bit of more critical discussion of the study at deadparrots.)

But sand sifted statistically is still sand. If you take the trouble to read the study carefully, it turns out to be based on unsupported, ideology-driven premises and to raise what it would be most polite to describe as severe issues of data quality, however earnestly Groseclose and Milyo crunched their numbers. As we linguists have had ample opportunity to learn, sigmas ain't no substitute for scholarship.

The Study

Groseclose and Milyo describe their method as providing an "objective measure of the slant of the news." They proceeded in several steps. First, they took a list of "200 of the most prominent think tanks" and looked in the Congressional Record for the period between 1993 and 2002 to see how often a member of Congress cited each of them for a fact or opinion. Then they assigned a rating to each group that corresponded to the average ADA rating of the members who cited it. On this basis, for example, the conservative Family Research Council was assigned an ADA rating of 6, and the liberal Economic Policy Institute received an ADA rating of 72.

G & M divided the groups in their survey into liberal and conservative sets, according to whether their derived ADA ratings fell north or south of the House and Senate average ADA rating of 42.2. They then looked to see how often groups from each set were cited by news shows on various media sources, effectively giving the media source a point on one or the other side for each sentence of each citation of a group. On that basis, they calculated a derived ADA rating for the media source.
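To make the logic of the two steps concrete, here is a toy rendering of the procedure as summarized above. It is a sketch of the summary only, not of the study's actual estimation, which is analytically more complicated. The citer scores and citation counts are invented purely for illustration, and the function names are ours; only the 42.2 cutoff comes from the description above.

```python
from statistics import mean

CONGRESS_AVERAGE = 42.2  # House and Senate average ADA rating, as quoted above

def group_rating(citer_scores):
    """A group's derived rating: the average ADA score of the members
    of Congress who cited it (one score per citation)."""
    return mean(citer_scores)

def side(rating):
    """Higher ADA ratings count as liberal, lower ones as conservative."""
    return "liberal" if rating > CONGRESS_AVERAGE else "conservative"

def tally(media_citations, ratings):
    """media_citations: list of (group, sentences) pairs for one outlet.
    Each cited sentence gives the outlet one point on that group's side."""
    points = {"liberal": 0, "conservative": 0}
    for group, sentences in media_citations:
        points[side(ratings[group])] += sentences
    return points

# Invented numbers for two hypothetical groups, A and B.
ratings = {"A": group_rating([10, 0, 8]), "B": group_rating([70, 80, 66])}
points = tally([("A", 3), ("B", 5), ("B", 1)], ratings)
print(ratings, points)
```

Notice that nothing in these steps looks at what a group actually said, or at why it was cited. A group's label depends entirely on who happened to quote it.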

Their results showed, they say, that all the media sources they looked at were far to the left of the center, apart from the Drudge Report and Fox News' "Special Report," which was slightly to the right of center -- its ADA ranking, by their estimate, is equivalent to that of moderate Republicans like Olympia Snowe, and far more liberal than that of the average Republican. (Groseclose and Milyo didn't consider other Fox shows, since they say they were interested "strictly in the news stories of the outlets," the assumption being that a station can run wall-to-wall O'Reilly without being accused of being biased.)

Their conclusion: the media have a "strong liberal bias."

Conceptual Pitfalls

What's wrong with this picture? Just two things: its conception and its execution. Let's begin with the assumption that underlies Groseclose and Milyo's assignment of ratings to the various groups they looked at: if a group is cited by a liberal legislator, it's liberal; if it's cited by a conservative legislator, it's conservative.

On February 24, 2004, for example, in a debate on the medical liability bill, the liberal Senator Christopher Dodd of Connecticut cited "a study conducted by the Rand Corporation and published in the New England Journal of Medicine last year [which concluded] that individuals received the recommended treatment for their condition in only 55 percent of the cases... "

For Groseclose and Milyo, Dodd's citation of the study counts as one piece of evidence that the Rand Corporation is a liberal think tank. In fact, their method assumes that there can be no such thing as objective or disinterested scholarship -- every study or piece of research, even if published in so august a scientific authority as the New England Journal, can be assumed to have a hidden agenda, depending on which side finds its results congenial to its political purposes.

That assumption is of a piece with the neoconservative critique of "objective research," as adopted wholeheartedly by the Bush administration. The tendency is laid out at length in Franklin Foer's piece on the Administration's approach to science and research that appeared in the July 5 New Republic (available here).

As Foer observes, both the administration and neoconservatives have systematically subordinated science to policy. To take just some of the examples that Foer cites, the administration has downgraded the Office of Science and Technology Policy and the Council of Economic Advisors, disregarded the CIA's assessment of Saddam's WMD's and the recommendation of the FDA's scientific advisory panel on the morning-after pill, suppressed passages of the EPA's report on global warming, and blocked the dissemination of a report analyzing the efficacy of congressional legislation limiting the release of sulfur dioxide, nitrogen oxides, and mercury.

As Foer says of the last:

While this suppression seems like naked pandering to the administration's industry friends, there's an ideological superstructure to justify its behavior: Conservatives contend that even scientific conclusions stem from ideological bias. Politicizing Science: The Alchemy of Policymaking, an anthology published by the Hoover Institution and the industry-funded Marshall Institute, is the critique's clearest distillation. The book contends that scientists are driven by a "love of power and domination." They produce studies that show environmental crises, for instance, because these crises spur Congress to spend money on the EPA--which, in turn, finances their research. In other words, as with budget and intelligence analysts, scientists may style themselves as objective, but they are anything but.

This is a convenient position for both the administration and many conservatives. If the facts don't fit the story, ignore them -- after all, scientists have agendas, too. That may explain a tendency for Republicans to cite objective scientific studies less often than liberals do (a confound ignored by G & M, who consider a citation of a Rand-sponsored research paper from the New England Journal of Medicine as qualitatively equivalent to a citation of a position paper from the Heritage Foundation). It also gives conservative scholars ideological license to adjust their methods to produce the desired result -- everybody slants their research, whether they admit it or not.

There are ideological implications, too, in Groseclose and Milyo's decision to split the think tanks into two groups, liberal and conservative. One effect of this was to polarize the data. No group -- and hence, no study -- could be counted as centrist or apolitical. In the event, this entailed that a media citation of the Rand Corporation or the AARP would count as evidence for a liberal bias in the same way that a citation of the Heritage Foundation or the American Enterprise Institute would count as evidence for a conservative bias. (If you're puzzled as to why Groseclose and Milyo count the AARP as a "think tank" in the first place, see below.)

In fact, even though the ADA rating that G & M's method assigned to the Rand Corporation (53.6) was much closer to the mean for all groups than that of the Heritage Foundation (6.17), G & M ignored that difference in computing the effect of citations of one or the other group on media bias, compounding the polarization effect. That is, a media citation of a moderately left-of-center group (according to G & M's criteria) balanced a citation of a strongly right-wing group.
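To see how much information the binary split throws away, compare each group's distance from the dividing line. This is a minimal sketch, and it assumes the 42.2 congressional average (the study's dividing line) as the reference point, since the mean for all groups isn't given here; the function name is invented for the example.

```python
# Sketch only: uses the 42.2 dividing line as the reference point,
# which is an assumption; the mean for all groups is not given in the text.
DIVIDING_LINE = 42.2

# Derived ADA ratings reported for the two groups.
derived_ada = {"Rand Corporation": 53.6, "Heritage Foundation": 6.17}


def distance_from_line(rating, line=DIVIDING_LINE):
    """Absolute distance of a derived rating from the dividing line."""
    return round(abs(rating - line), 2)


for group, rating in derived_ada.items():
    side = "liberal" if rating > DIVIDING_LINE else "conservative"
    print(f"{group}: {side}, {distance_from_line(rating)} points from the line")
```

On these figures Rand sits about 11 points from the line and Heritage about 36, yet under the two-way split a citation of either counts the same.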

(It could be argued, of course, that the same would hold for a moderate right-wing group and an extreme left-wing group, but in fact the "liberal" groups in the study were far more moderate than the "conservative" groups, owing to where G & M drew the line. The average ADA score for the conservative groups in G & M's top 20 was 16.3, whereas the average score for the liberal groups was 65.2 -- slightly less than the ADA rating they calculated for Joe Lieberman. In effect, they achieved their result by classifying a number of moderate groups as liberal.)

This effect was compounded still more when G & M took the dividing line between left-wing and right-wing think tanks to be the midpoint of the House and Senate average ADA ratings, making the voting record of the Congress over recent years the criterion for defining the political center. At another point, G & M defend their decision to use the median ADA ranking of all House members to determine the dividing line between left- and right-wing media outlets. But the Republican majority in the House is proportionally much greater than the disproportion in the popular votes for the two parties in Congressional elections, and the aggregate voting records of House members are hardly representative of voters' views on the issues as revealed in polls. In a Times/CBS poll last year, for example, respondents felt by 46 percent to 36 percent that Democrats would do a better job than Republicans at making the tax system fair, and just 11 percent believed the President's tax cuts were very likely to create new jobs. By G & M's criterion, however, the "centrist" position would be one that supported the administration's tax proposals.

In effect, G & M have located the political center somewhere in the middle of the Republican Party, by which standard the majority of American voters would count as left-of-center.

Implementation Issues

Even if the Groseclose and Milyo study had been implemented carefully, then, it wouldn't justify the claims that its authors make on its behalf. As it happens, though, the execution of the study was flawed in important respects, which made its conclusions even less useful.

Start with the list of groups from which G & M drew their initial sample. They describe this simply as a list of "the most prominent think tanks," but that isn't quite accurate. In fact their list was drawn from the 200 links included on the site wheretodoresearch.com (which actually describes it merely as a list of "major think tanks and policy groups"). The list was compiled by one Saguee Saraf, a free-lance researcher with a master's degree in history who lists among his achievements that he was named Man of the Year by the Cheshire (Connecticut) Republican Town Committee.

Saraf gives no indication of how his list was compiled, or what criteria were used -- nor, what's more to the point, do Groseclose and Milyo say why they consider the list authoritative. In fact its contents are a jumble of think tanks, lobbying groups, trade associations, and advocacy groups, assembled in a catch-as-catch-can manner. It lists the Oklahoma Council of Public Affairs but not California Tomorrow; the National Right to Life Committee but not Planned Parenthood; the National Federation of Independent Businesses but not the National Association of Manufacturers; the NAACP but not the Urban League, the American Jewish Congress, or the Mexican-American Legal Defense and Education Fund; the Cato Institute but not the Reason Foundation; the Sierra Club and the Audubon Society but not the League of Conservation Voters or the Natural Resources Defense Council. On the grounds of sample choice alone, in short, the Groseclose and Milyo study would be disqualified as serious research on "the most prominent think tanks."

Then, too, Groseclose and Milyo's survey of the citations of groups in the Congressional Record shows some results that would most kindly be described as puzzling. In their list of the "twenty think tanks most cited by members of Congress," for example, they list in 13th place the Alexis de Tocqueville Institution (which they refer to as the "Alexis de Tocqueville Institute"), which comes in ahead of Common Cause (14th), the Family Research Council (16th), and the Economic Policy Institute (19th), not to mention a number of much better-known groups that appear on Saraf's list but not in G & M's top 20, like the NRA and the Hoover Institution.

That result is pretty curious, since the Tocqueville Institution hardly counts among the heavy hitters in the think-tank world. In fact when you look in the Congressional Record, you turn up just 16 mentions of the group since 1993, including a few pieces from the Washington Times written by people associated with it that were inserted into the Congressional Record by Republican legislators, and a number of other mentions that would not have counted as citations by the criteria that G & M said they used. By contrast, the Family Research Council, ranked by G & M behind the Tocqueville Institution in Congressional influence, received 186 mentions in the Congressional Record over the same period. And among groups on Saraf's list but not listed in G & M's top 20, the Manhattan Institute received 42 mentions, and the Hoover Institution received 54.

I have no way of knowing why G & M assigned such prominence to the Tocqueville Institution. Whatever the reason, though, it leaves you with the sense that their other results can't be trusted, either. (At another point, G & M explain that they disregarded the ACLU in their final analysis because it turned up with an excessively conservative score, owing to Republicans who cited it for its opposition to McCain-Feingold. Other researchers might wonder whether there might not be similar anomalies in the results obtained for other groups, and might even suspect that this result cast some doubt on their overall method. G & M seem untroubled by that possibility.)

It's clear, too, that a different choice of a sample for the study would have turned out very different results. Among the groups that didn't appear on Saraf's list and so were not examined by G & M, for example, the National Association of Manufacturers received a whopping 617 mentions over the period under consideration -- that is, 60 times as many as the Tocqueville Institution -- the majority of them, not surprisingly, from Republican legislators. And the Conference of Catholic Bishops was mentioned 130 times, most often in connection with the abortion issue. Had those groups and others like them been included in the study, they would presumably have been classified as conservative on the basis of the ADA rankings of members who cited them. (True, those groups aren't "think tanks," but then Groseclose and Milyo did include groups like the AARP, which is hardly a think tank either.)

If those groups had been included, the picture of media bias would have changed considerably, since both groups are widely cited in the media. For example, the Conference of Catholic Bishops has been mentioned on CNN over the past five years more than four times as frequently as the American Enterprise Institute, and the NAM has been mentioned three times as frequently. By excluding conservative groups that are frequently mentioned in the media, the study appears to exaggerate the media's liberal tilt.

I say "appears to" because there is no way to tell from G & M's data what results they would have come up with if they had chosen a genuinely balanced sample that was restricted to think tanks whose prominence was objectively determined, if they had coded the data more reliably, if they had weighted the media citations appropriately, and if they had classified the groups according to a more plausible categorization scheme -- one, for example, in which the AARP was not treated as the "liberal" counterpart of the Heritage Foundation.

It seems a pity to waste so much effort on a project that is utterly worthless as an objective study of media bias. But in the current climate, does anybody care?

[Update 8/4/2004: a response by Groseclose and Milyo can be found here.
-Mark Liberman]

[Update 12/22/2005: other Language Log posts on this topic:
Science, Politics and Fair Play (8/2/2004),
Marx: Red or Blue? (10/31/2004),
A journalist's perspective on (bias in) media citations (11/13/2004),
Linguistics, Politics, Mathematics (12/22/2005),
Multiplying ideologies considered harmful (12/23/2005)
]

Posted by Geoff Nunberg at 09:43 PM | Comments (1)

How Habermas blew it

The June 22 Eurozine has (an English translation of) an (originally German) article by Thierry Chervel, entitled "Europe loses ground".

The Europeans have invented the internet, but the Americans have come up with all business ideas for it. Moreover, American newspapers have proved much more generous when it comes to giving free access to their articles and publications. If Europe wants to create a public sphere, then European newspapers must finally wake up to the chances that the Internet provides.

One symptom of the problems that Chervel describes is the fact that he himself has no home page, as far as I can tell. This is the pathetic best that Google can come up with (translated here for the Eurozine piece), at least on the first few pages it returns. Journalist, document thyself!

Maybe things are turning around a bit for Old Europe: since last December, French has passed Farsi and Polish to vault into third place in the NITLE blog census, and is pressing Portuguese hard for second. And German has moved up from seventh to sixth.

English	1,227,450
Portuguese	79,611
French	79,047
Farsi	63,788
Polish	42,686
German	31,133
Spanish	24,417
Italian	10,235
Dutch	9,047
Chinese (Big 5)	8,587

On the other hand, Spanish has slipped from sixth to seventh, and Italian is unchanged at eighth.

Chervel's essay is an instance of a familiar European hand-wringing type. It's reminiscent in some ways of American political self-loathing, but a closer analogy might be traditional American "missile gap" or "Japanese challenge" jeremiads. Or the old Soviet propaganda about how it was Russians who invented baseball, ice cream and electricity.

Like the American "gap" literature and the Soviet invention stories, Chervel stretches the facts more than a bit, for example when he writes that "Europe supplied the free industrywide standards for the incredible rise of the Internet". However, he makes some interesting points, especially this one:

The saddest embodiment of Donald Rumsfeld's word of the "old Europe" was the work of the German philosophy professor Jürgen Habermas, who wanted to launch his "Kerneuropa -initiative" against the Iraq war and the "new Europe" via various European newspapers. He published his own article in the Frankfurter Allgemeine Zeitung, and assigned his colleagues to the Süddeutsche Zeitung , to the El Pais and in the Corriere della Serra. None of these papers however published the articles online. An interested intellectual in Madrid, Paris or Berlin would have had to go the main train station and purchase four newspapers from three different countries. A few days later, the debate was quickly forgotten.

Had Habermas invested a few thousand Euros to build his small website, had he published his article and those of his colleagues simultaneously in English, the sensation would have been big. Newspapers would have been forced to report. Maybe they would have intervened with their own contributions into the debate. Simultaneously the public would have been able to discuss in forums on "Kerneuropa.org" and through the use of the English language, the entire international public would have been able to participate. [emphasis added].

Would it really have required even a few thousand Euros to build a "small website"? I don't think so.

Like Chervel, Habermas doesn't seem to have even a home page, but there is a web site devoted to his thought -- hosted at California State University, Dominguez Hills, with University of Wisconsin, Parkside. I'll bet that the three professors who founded and run it (two of them emeritus) have not invested any "few thousand Euros".

I can certainly be precise about the financial requirements for Language Log's first year of operation: sweat equity aside, it was $21.85 for the registration of the three domain names languagelog.org/net/com. The web server runs in spare cycles on a (pre-existing) $700 Dell PC used mainly for other purposes in a university office -- its typical load is under .5, even serving around 4,000 pages a day for this site as well as hosting other services for one of my research projects. And my university can handle the (modest) internet bandwidth required for an essentially all-text weblog, without even noticing it (not that any policies are being violated, as far as I know). At some point we might move to a professionally-hosted site, but I don't expect that to cost a great deal either. If it had cost several thousand dollars to start Language Log, I don't think we would have tried the experiment.

It looks to me as if Habermas' university has net access, and I bet there are people there who know how to set up a web site, probably some of them even in his institute. One of the sources of the problem that worries Chervel is that Europeans may be less willing to try informal, inexpensive, bottom-up experiments, of the kind that were involved in the very early stages of Amazon and Google. Certainly my own experience, over 15 years of joint Euro-American research cooperation of various kinds, is that the Europeans generally want to invest a lot of money and time in preparatory studies and surveys and design committees and the like, whereas the Americans would rather "just do it."

Chervel goes on to say that

...the contrast does not lie between Europe and America but between the English speaking public and all others. An Internet service provider such as Arts&Letters Daily which selects the "Articles of Note" for its daily press briefing exclusively from cultural magazines and quality media, can rely on hundreds of sources. Amongst the "articles of note" this week was an article from the English language version of the Arabic newspaper Al Ahram as well as an article from the Guardian or from a obscure journal of an American university institute.

I'm not sure that I believe this. The European intellectual class is multilingual -- if a version of ALD that cited articles in French, German, Italian, Spanish etc. wouldn't work, it's not because of language barriers.

Look at Onze Taal, for example. I read it almost every day, and I'm not even European. I wonder whether Chervel knows that it exists?

[Update: I forgot to check, but there actually is a kerneuropa.de, though I'm not sure that it has anything to do with this discussion.

And searching Google for {kerneuropa Habermas} returns 663 hits, some of which seem to provide an interesting gateway to other information. I don't think that any of them are what Chervel was asking for, though I'm not sure -- and I wonder if he checked, either.

For those interested in the content of the Habermas/Derrida Kerneuropa initiative, here is an interview with Richard Wolin on the subject, and (contra Chervel) a link to a version of the statement on the Frankfurter Allgemeine Zeitung web site. None of this leads me to believe that this initiative's lack of popular response has much to do with European newspapers' lack of internet presence, or with the failure of Habermas and his associates to make more creative use of the medium. The ideas themselves seem less than inspiring. But I'm prepared to believe that there are differences in the role of the net in public discourse in Kerneuropa vs. the U.S. ]

Posted by Mark Liberman at 06:11 PM | Comments (4)

Talkin' about America

For the sake of completeness -- and to give me another excuse to listen to Ray Charles' stunning recording of America the Beautiful over and over again -- there's some direct evidence for Geoff's conjecture that Charles probably used crown as a preterite form; that is, as crowned with word-final cluster simplification.

In the part of the song right before Brother Ray invokes the choir (which you can listen to here), Charles sings:

I'm talkin' about America! Sweet America!
You know, God done shed His grace on thee,
He crowned thy good -- yes, he did! -- with* brotherhood
From sea to shining sea!

The use of the subject pronoun He by itself shows that this line cannot be an example of the archaic prayer-expression use of a subjunctive main clause, and the interjected yes, he did! confirms that the main verb is indeed a preterite form.

Then, once the choir comes in, Charles sings:

America! I love you America!
You see, God done shed His grace on thee -- and you oughta love him for it!
'Cause He crowned thy good -- He told me He would! -- with brotherhood
From sea to shining sea!

Again, note the use of the subject pronoun. The interjection in this case is in principle compatible with the "original" interpretation of the line (He told me He would, so I'm praying He will any time now!) but is more compatible with Charles' reinterpretation (He told me He would, and He did!).

Somewhat ironically, this very same evidence shows that Charles' use of crown as a preterite form may be irrelevant to Geoff's main point that Charles is also using shed as a preterite form (which he clearly did -- I'm certainly not disputing that**). Both times Charles sings these lines, the crown clause is not coordinated with the shed clause, and each has its own subject. There is thus no necessary grammatical connection between the two clauses and each verb is free to be in its own tense and mood. (Charles is probably aware of what the choir is singing, however, so the connection is still at least potentially relevant in exactly the way Geoff notes.)

What a great recording to listen to over and over again, even if only to listen for this grammatical evidence. Thanks for the excuse, Geoff.

* Actually, it's not clear to me that Charles sings "with" here, but that's beside the point. back

** Further evidence that Charles is using shed as a preterite form, by the way, is the and you oughta love him for it! interjection in the second clip cited above. back

[ Comments? ]

Posted by Eric Bakovic at 03:32 PM

Faze phase

It's much worse than base vs. bass, and almost as tough as principle vs. principal. People have trouble with phase vs. faze. Even some journalists and copy editors get them mixed up. But except for the fact that the two words are pronounced in exactly the same way, they could hardly be more different.

A phase is a stage of development, a temporary pattern, a cyclically recurrent form, a stage in a periodic process, a form of matter, etc. Phase is basically a noun, though of course there is a denominal verb meaning to carry something out by phases, etc. According to the AHD, phase is a "back-formation from New Latin phasēs, phases of the moon, from Greek phaseis, pl. of phasis, appearance, from phainein, to show."

To faze someone is to disrupt their composure or disconcert them, from Middle English fesen "to drive away, frighten". There is no deverbal noun in common use.

But Google's News index currently has at least two recent "phase" errors from the NYT (both of which have expired from the paper's free archive):

Pace does not phase Sharapova, as she proved with her come from behind victory over the equally big-hitting American Lindsay Davenport in Thursday's semifinals.

These sorts of demands do not phase Ms. Kramen, who happily considers herself "a lifer."

And the BBC web site has a few more:

(link) But given that Later...is for a more refined audience, that's unlikely to phase him or his loyal supporters who'll continue to look forward to the next collection this time next year.

(link) Alone, his expletive tantrum did not phase the young dude at the wheel.

The internet at large is full of phase/faze confusions, in both directions:

It’s just a faze in the relationship, you’ll get over it sooner or later.

The present paper shows that introduction of an additional faze shift permits determination of very small displacements and also presents the portable interferometer and the technique for measurement of residual stress in field conditions.

It just got to the point where things like that didn't phase me anymore.

Kucinich not phased by Gephardt's early dropout

All the journalistic confusions that I've seen are in the direction of using phase in place of faze, rather than the other way around. If this is true, it's probably because phase seems like a higher-prestige word.

This is also true for the two examples of phase/faze confusion documented by the OED:

1889 'MARK TWAIN' Yankee at Crt. K. Arthur (Tauchn.) II. 154 His spirit -- why, it wasn't even phased.
1898 R. B. TOWNSHEND in Westm. Gaz. 19 Nov. 2/1 It don't seem to 'phase' him in the very slightest.

I first thought that only the accidents of 18th- and 19th-century spelling standardization prevented phase and faze from merging in the way that the Greek and Germanic variants of pole seem to have done. However, the OED's citations for phase start in 1812, and for faze in 1830. So neither word has been in use all that long, apparently, and they've probably been confused from the beginning, at least in one direction.

[NYT Sharapova citation from Maryellen MacDonald]

Posted by Mark Liberman at 10:59 AM | Comments (6)

No Professor Left Behind

A few years ago, after a few drinks, a friend I'll call X told me something shocking about English graduate students. I don't mean students from the southeastern portion of the British Isles, I mean students in English Departments in American universities. And the shocking fact that X revealed to me had nothing to do with sex, money, power or even real estate. Believe it or not, it was a secret about poetry.

According to X, English graduate students can't scan. At least, X told me, elite graduate programs don't require students to learn this skill, or test whether or not they have it. "I bet that two thirds of them wouldn't know a line of iambic pentameter if it bit them on the butt", to quote X more exactly.

At the time, I was skeptical. Maybe programs don't teach scansion of metered verse any more, but how could someone get to be a graduate student in English without picking up, somewhere along the way, the ability to tell a couple of heroic couplets from a ballad stanza? Older intellectuals often get to ranting about how today's youth are a bunch of uncultured yahoos, etc., and my instinct is to stick up for the kids. So I ordered another round and changed the subject.

But now I wonder. The eminent critic Marjorie Perloff, in an online review, complains about the Oxford Anthology of Modern Poetry that

a good portion of [its] pages are taken up by texts classifiable as "poetry" only because they are lineated or, in the earlier part of the century, use meter and rhyme. Here, for example is "The Heart of a Woman" by the African-American poet Georgia Douglas Johnson:

The heart of a woman goes forth with the dawn
As a lone bird, soft winging, so restlessly on;
Afar o’er life’s turrets and vales does it roam
In the wake of those echoes the heart calls home.
The heart of a woman falls back with the night,
And enters some alien cage in its plight,
And tries to forget it has dreamed of the stars
While it breaks, breaks, breaks, on the sheltering bars. (1918)

These chug-chug iambic pentameter stanzas rhyming aabb remind one of a Hallmark card...

Hallmark sentimentality, maybe; and aabb rhyme scheme, for sure; but these lines are not iambic pentameter -- they're anapestic tetrameter. The terminological distinction is not crucial for Perloff's argument, but you'd think that one of the stars of the Stanford English Department would get it right.
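The difference is easy to check mechanically once a line has been scanned by hand. Here is a minimal sketch, in which the stress pattern is one reading of the poem's first line ("The HEART of a WOMan goes FORTH with the DAWN"), with "x" for an unstressed syllable and "/" for a stressed one; the function names are invented for the example, and the rule for naming the meter is a deliberate simplification.

```python
# Sketch only: the stress marks are a hand scansion (one reading, not the only
# possible one), and the classification rule is a simplification.

def stress_gaps(pattern):
    """Count the unstressed syllables ('x') that precede each stress ('/')."""
    gaps, run = [], 0
    for mark in pattern:
        if mark == "/":
            gaps.append(run)
            run = 0
        elif mark == "x":
            run += 1
        else:
            raise ValueError(f"unexpected mark: {mark!r}")
    return gaps


def describe(pattern):
    """Return (number of beats, dominant rising foot) for a scanned line."""
    gaps = stress_gaps(pattern)
    foot = "anapestic" if gaps.count(2) > gaps.count(1) else "iambic"
    return len(gaps), foot


first_line = "x/xx/xx/xx/"              # The HEART of a WOMan goes FORTH with the DAWN
textbook_pentameter = "x/x/x/x/x/"      # five iambs, for comparison

print(describe(first_line))
print(describe(textbook_pentameter))
```

On this scansion the line has four beats, mostly separated by two unstressed syllables (the opening iamb is a common substitution in anapestic verse), whereas a line of iambic pentameter has five beats separated by one.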

And apparently Charles Bernstein has identified this line as pentameter

Just to view de homeland England, in de streets of London walk

(though I read this in Mike Snider's weblog, and haven't checked his reference to Bernstein's Poetics of the Americas).

So I'm still not sure about the students, but I'll accept this as prima facie evidence that there might be a problem with the professors.

I blame the linguists. We've somehow allowed a generation or two of intellectuals to grow up without elementary skills in the formal analysis of speech and language. Simple phonetic transcription, fundamentals of morphology and syntax, elements of logic, basic verse scansion...

Just in case you don't get it, that's a wry joke. There's been a broader educational trend away from formal analysis and specific skills, in favor of problem-solving and "learning to learn". In that context, blaming linguists for the fact that English professors can't scan is like blaming philosophers or religious leaders for the fact that MBAs are unethical.

Still, who else is going to fix the problem?

So maybe it's time for a new national program: No Professor Left Behind.

[Perloff and Bernstein references via Mike Snider via Jonathan Mayhew].

Posted by Mark Liberman at 07:26 AM | Comments (13)

Isis bibliography

For enthusiasts of the Isis, ISIS, double is, is is, double be, be be, whatever, construction in English, here's a very small bibliography of relevant publications: Isis bibliography. The thing is, is this is probably incomplete.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:09 AM

July 04, 2004

Ray Charles, America, and the subjunctive

Being an immigrant American citizen, I am of course even more likely than the average American to get misty-eyed on hearing the great patriotic songs that radio stations tend to find pretexts for playing on Independence Day. And naturally I thrill to the recording of Ray Charles doing his wonderful rendition of "America the Beautiful". Of course, being a grammarian, I also notice an interesting little indication of a misunderstanding of the lyrics caused by an unusually archaic construction that has a non-archaic alternative interpretation. For me, it doesn't detract from the aesthetic experience at all. But if you don't want to become aware of Ray's mistake, if you think it would stop you enjoying his performance of the song, you should stop here and not read on.

* * * * * * * * *

I should mention that Ray Charles was here in Santa Cruz County just last summer. I watched him at the blues festival in Aptos Village Park. He was just a month shy of 73, the age at which he died. Yet it wasn't a question of seeing an old man trying to conjure up the days when he could do his songs (it can be a real disappointment to see one's idols too late). To my astonishment he was in his prime, at his peak. His show was scintillating. I have never seen such accomplished musicians gathered together at any rock or blues venue before; the members of the Ray Charles Orchestra were world class; there are few big bands of that caliber touring anywhere anymore. And center stage, Ray Charles had apparently spent the past fifteen years or so learning new skills. He had a synthesizer, on which he was an absolute expert. At one point his guitarist did a solo which I would have described as competent but not exactly brilliant, and -- as if reading my mind and agreeing with me -- suddenly Ray Charles flipped his synthesizer into a perfect imitation of the Fender Stratocaster guitar sound and did a second guitar solo, a much more accomplished one. I guess the men in his orchestra had to simply put up with that: if you're on stage with a true towering genius and natural showman, you're just not going to have much of the limelight shed on you. Ray Charles dominated the show, he bounced with energy, he loved what he was doing, he absolutely rocked.

I digress, of course; but it is an opportunity to endorse the not very controversial view that Ray Charles was one of the greatest figures in 20th century music. His death put a tear in my eye and an ache in my heart throughout the week of Reagan's extended funeral ceremonies. People must have thought I mourned the dead president, but my sadness was all for a man from the opposite corner of the country who managed to produce, in one career, the best records I have ever heard in at least three or four different musical genres. His taste was impeccable; his soulfulness was real; his artistry was beyond belief. You have to have played his music professionally, as I did in the 1960s, to realize how good he was.

So after all this, what's the slip in "America the Beautiful"? Well, after the spine-tinglingly effective bit where he calls out "I wish I had somebody to help me sing this" and a full choir obligingly comes in to join him, he starts elaborating in gospel style on the lines as the choir sings them. And as the choir does the line "God shed His grace on thee", Ray's out-of-time embellishment, in a deliberately down-home, non-standard variety of English, is: "God done shed His grace on thee."

Now, in the non-standard dialects that have it, this is an indicative past tense. To say My baby done gone is roughly equivalent to saying in Standard English My sweetheart has departed. The line Ray Charles calls out means, in those dialects, that God has at some time in the past shed His grace on America.

But that is not the right interpretation of the line. Addressed to America, it actually expresses a prayer that God should please shed His light on her. It's part of a pair of coordinated subjunctive main clauses. This is highly archaic; one only sees the same construction in a few fixed phrases still in use, like So be it ("Let it be thus"), Far be it from me to... ("May it be far from me to..."), etc.; see The Cambridge Grammar, p. 90. For example, Long live the Queen means "May it be the case that the Queen has a long life." Katharine Lee Bates' lyrics are using this kind of construction:

America! America!
God shed His grace on thee,
And crown thy good with brotherhood
From sea to shining sea!

Why is there a possibility of reading the verb form shed as a preterite? Because of a small morphological point. There are about 24 verbs in English that have identical past participle, preterite, and plain form. Put and hit are examples. And shed is one of that class. So when you hear God shed His grace..., you do know it is not a present tense (that would be God sheds His grace...), but you cannot tell whether it is a preterite (the most likely analysis) or an archaic use of the plain form in some sort of subjunctive construction (e.g., in It is vital that God shed His grace on us).

The clues come later: when you hear on thee you know we are dealing with archaic language. And when you hear crown you have your crucial piece of evidence. The preterite of crown is crowned, so the line And crown thy good with brotherhood cannot be a preterite. Yet it's coordinated with God shed His grace on thee, so it should be the same tense and mood as that. The only solution is that both are uses of the plain form in a subjunctive main clause construction. The meaning of the middle two lines quoted above is "May it be the case that God sheds his grace on you [America], and may He crown your good with brotherhood."
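The reasoning here amounts to intersecting the analyses available to each coordinated verb. What follows is only a minimal sketch in Python, assuming a tiny hand-built table of forms; the table and the function names are hypothetical illustrations, not anything taken from The Cambridge Grammar.

```python
# Hypothetical mini-lexicon: for each verb, the analyses that each surface
# form allows as the main verb after a third-person singular subject ("God").
# "plain" stands for the plain form used in subjunctive constructions.
ANALYSES = {
    "shed": {"shed": {"plain", "preterite"}, "sheds": {"present"}},
    "crown": {"crown": {"plain"}, "crowned": {"preterite"}, "crowns": {"present"}},
}

def analyses(verb, surface):
    """Return the set of analyses available for one surface form."""
    return ANALYSES[verb].get(surface, set())

def coordinated(*pairs):
    """Coordinated verbs should share tense and mood, so intersect their analyses."""
    result = None
    for verb, surface in pairs:
        options = analyses(verb, surface)
        result = options if result is None else result & options
    return result or set()

print(analyses("shed", "shed"))                           # ambiguous in isolation
print(coordinated(("shed", "shed"), ("crown", "crown")))  # only the plain form survives
```

On its own, shed keeps both readings; once it is coordinated with crown, the preterite reading drops out, which is the point made above.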

It's reasonable enough that Ray Charles should have misunderstood that line. His dialect (English as learned by an extremely poor African American child born in Albany, Georgia and raised during the 1930s in the small town of Greenville in north Florida) would have no main clause subjunctives at all. Shed would be encountered as a plain form (in infinitival clauses), as a plain present (used when the subject is not 3rd singular), as a past participle, and as a preterite. The only possible analysis of the lines above is to take it as a preterite. And crown would not be crucial counterevidence for him. Recall that Ray Charles began to go blind at the age of five. He would not have read Katharine Lee Bates' lyrics; he would only have heard them sung, mostly by African Americans, since the South was strictly segregated, and certainly mostly by Southerners. And in the dialects of the South (especially African American vernacular dialects), final consonants are mostly or often dropped in clusters of consonants with the same voicing: land is pronounced lan'. So hearing crowned apparently pronounced as crown would be quite natural. Both shed and crown could therefore be taken as preterites.

Since the only two jobs at which I have ever earned my living are soul musician and linguist, I guess I am a natural to notice the point. National Public Radio had an educated white musicologist on to talk about Ray Charles' music, and he commented on the performance of "America the Beautiful", and specifically mentioned the added force of doing the embellishment on God shed His grace on thee in colloquial dialect, but he didn't notice the point I've made here, that it was a misreading of the line.

It doesn't matter, of course. Ray Charles was still one of the greatest musicians of the 20th century in any genre; I still deeply miss him; and his performance of "America the Beautiful" is still musically stunning, a performance to treasure every Independence Day, forever. Happy Fourth of July.

Posted by Geoffrey K. Pullum at 06:45 PM

I trust not

On June 24th, 1826, the great American linguist Thomas Jefferson wrote and sent a letter to the mayor of Washington DC, who had invited him to attend the July 4th festivities there.

"RESPECTED SIR, -- The kind invitation I receive from you on the part of the citizens of the city of Washington, to be present with them at their celebration on the fiftieth anniversary of American Independence as one of the surviving signers of an instrument pregnant with our own, and the fate of the world, is most flattering to myself, and heightened by the honorable accompaniment proposed for the comfort of such a journey. It adds sensibly to the sufferings of sickness, to be deprived by it of a personal participation in the rejoicings of that day. But acquiescence is a duty under circumstances not placed among those we are permitted to control. I should, indeed, with peculiar delight, have met and exchanged there congratulations personally with the small band, the remnant of that host of worthies, who joined with us on that day in the bold and doubtful election we were to make for our country between submission or the sword; and to have enjoyed with them the consolatory fact that our fellow citizens, after half a century of experience and prosperity, continue to approve the choice we made. May it be to the world what I believe it will be (to some parts sooner, to others later, but finally to all): the signal of arousing men to burst the chains under which monkish ignorance and superstition had persuaded them to bind themselves, and to assume the blessings and security of self-government. That form which we have substituted restores the free right to the unbounded exercise of reason and freedom of opinion. All eyes are opened, or opening, to the rights of man. The general spread of the light of science has already laid open to every view the palpable truth that the mass of mankind has not been born with saddles on their backs, nor a favored few booted and spurred, ready to ride them legitimately by the grace of God. These are grounds of hope for others. 
For ourselves, let the annual return of this day forever refresh our recollections of these rights, and an undiminished devotion to them."

Jefferson was unsure whether he would live to see July 4. On June 24, the same day he wrote the letter to the mayor, he told a friend "that he had called in a physician and to gratify his family would follow his prescriptions, but that it would prove unavailing: the machine had worn out and would go on no longer".

He did live to "breath the air of the 50th anniversary", but barely, dying on the 4th.

Jefferson's old colleague and enemy John Adams died the same day, apparently a bit later. Adams' last words are said to have been "Thomas Jefferson survives", though apparently the last word in this phrase was "indistinct and imperfectly uttered". The transcription is a plausible one, since Adams apparently insisted in his later years on his intention to "outlive Jefferson", despite being seven years older.

In 1798, under the threat of a war with France, Adams' Federalist party passed a series of acts of Congress including the Act Concerning Aliens and the Act for the Punishment of Certain Crimes Against the United States (usually known as the Alien and Sedition Acts respectively). These acts (which George Washington supported) were allowed to lapse after Jefferson's Republican party defeated Adams in the election of 1800.

The start of the Act Concerning Aliens:

That it shall be lawful for the President of the United States at any time during the continuance of this act, to order all such aliens as he shall judge dangerous to the peace and safety of the United States, or shall have reasonable grounds to suspect are concerned in any treasonable or secret machinations against the government thereof, to depart out of the territory of the United States ...

As I understand it, the suspect aliens at that time were mainly French, Irish, and British. Here's a crucial passage from the Sedition Act:

And be it farther enacted, That if any person shall write, print, utter or publish, or shall cause or procure to be written, printed, uttered or published, or shall knowingly and willingly assist or aid in writing, printing, uttering or publishing any false, scandalous and malicious writing or writings against the government of the United States, or either house of the Congress of the United States, or the President of the United States, with intent to defame the said government, or either house of the said Congress, or the said President, or to bring them, or either of them, into contempt or disrepute; or to excite against them, or either or any of them, the hatred of the good people of the United States, or to stir up sedition within the United States, or to excite any unlawful combinations therein, for opposing or resisting any law of the United States, or any act of the President of the United States, done in pursuance of any such law, or of the powers in him vested by the constitution of the United States, or to resist, oppose, or defeat any such law or act, or to aid, encourage or abet any hostile designs of any foreign nation against United States, their people or government, then such person, being thereof convicted before any court of the United States having jurisdiction thereof, shall be punished by a fine not exceeding two thousand dollars, and by imprisonment not exceeding two years.

This would certainly take care of Rush Limbaugh and Michael Moore.

Representatives of 17 of the 20 or so anti-Federalist newspapers were charged under the Sedition Act, and 10 were convicted. Benjamin Bache, Benjamin Franklin's grandson, died while awaiting trial for sedition. Bache's successor as editor of the Philadelphia Aurora, William Duane, was attacked and nearly killed in the newspaper's offices by a Federalist mob, and was then charged with "seditious riot".

On June 1, 1798, Thomas Jefferson wrote to John Taylor:

A little patience, and we shall see the reign of witches pass over, their spells dissolved, and the people recovering their true sight, restoring their government to its true principles.

He was right. From his first inaugural, March 4, 1801:

During the contest of opinion through which we have passed the animation of discussions and of exertions has sometimes worn an aspect which might impose on strangers unused to think freely and to speak and to write what they think; but this being now decided by the voice of the nation, announced according to the rules of the Constitution, all will, of course, arrange themselves under the will of the law, and unite in common efforts for the common good.

[...]

We are all Republicans, we are all Federalists. If there be any among us who would wish to dissolve this Union or to change its republican form, let them stand undisturbed as monuments of the safety with which error of opinion may be tolerated where reason is left free to combat it. I know, indeed, that some honest men fear that a republican government can not be strong, that this Government is not strong enough; but would the honest patriot, in the full tide of successful experiment, abandon a government which has so far kept us free and firm on the theoretic and visionary fear that this Government, the world's best hope, may by possibility want energy to preserve itself? I trust not.

I'll raise a glass to that.

I'd like to think that John Adams' dying words are true now, though false at the time he uttered them: Thomas Jefferson survives.

Posted by Mark Liberman at 09:13 AM | Comments (0)

Unchanging Pronouns?

A while back John McWhorter was discussing here an article on distant relationships that emphasized the popular claim that personal pronouns are super-stable -- that they can confidently be expected to persist in languages over very long periods and are thus reliable indicators of genetic relationships among languages (that's `genetic' in the historical linguist's metaphorical usage). One reason historical linguists are skeptical of this claim is that it's so easy to find languages in which personal pronouns have undergone a lot of change. One source of instability in pronouns is borrowing: in an earlier post Bill Poser pointed out that in some parts of the world, for instance Southeast Asia, pronouns are borrowed freely, so that pronoun borrowing is by no means out of the ordinary (that is, it's not the marked case). Another source of change in pronoun systems is analogy of various kinds. Below are two fairly typical examples of pronoun systems that have changed enough to make detection of historical links difficult or impossible through simple inspection.

First, consider the Indo-European language family, almost everyone's favorite source of illustrations of how languages are related. Here are three sets of Indo-European pronouns, from three different branches of the family:

         LATIN   RUSSIAN   ENGLISH
sg. 1:   ego     ja        I
    2:   tu      ty        you
    3:   is      on        he
pl. 1:   nos     my        we
    2:   vos     vy        you
    3:   isti    oni       they

In the Latin set, other choices could have been made for the third-person forms; these comprise one of several sets of masculine forms. None of the other possible choices resembles any of the Russian or English forms either, however. In the Russian transliteration, the vowel y is a high back unrounded vowel (not found in English); it's not like English y.

The point about these three sets of forms -- which are the forms you'll find if you look up the English meanings in a bilingual English/Latin or English/Russian dictionary (namely, the method used by Greenberg and others who search wordlists for similarities) -- is that not one pronoun obviously matches across all three languages. The three words for `I' are in fact etymologically connected, but a casual inspection certainly wouldn't pick them out as similar enough to count as potential matches. Latin and Russian share a second-person singular pronoun, but English doesn't, because thou is now obsolete, replaced by you by analogy to the second-person plural. None of the words for `he' or `they' match, and it'd be a big stretch to claim a match (on casual inspection) for `we'; one might or might not hypothesize a match for Latin vos and Russian vy, but English you wouldn't fit, if one is using Greenberg's method. So if you were looking for a relationship among just these three languages, the pronouns would give you little or no basis for a `yes' answer.

Now consider the three sets of forms below, from three of the 26 or so Salishan languages (spoken by rapidly dwindling numbers of elders in Washington, British Columbia, Idaho, and Montana):

         SQUAMISH   MONTANA SALISH   COEUR D'ALENE
sg. 1:   ʔəns       qwoyʔe           ʧineʔ
    2:   nəw        anwi             kwuw'e
    3:   tiwa       cniɬc            cenil
pl. 1:   nimaɬ      qejen            ʧlipust
    2:   nəwyap     nple             kwuplipust
    3:   ʔiaʔwit    cniʔɬc           cənililʃ

Salishan languages are more closely related than Indo-European languages are. The standard estimate for the time depth of the Indo-European parent language, Proto-Indo-European, is perhaps 6,000 years BP. (This estimate is unshaken by recent dramatic but not very convincing claims of different time depths.) Latin, 2,000 years older than Modern English and Modern Russian, is of course closer in time to Proto-Indo-European than the modern languages are. Proto-Salishan has been estimated at about 4,000 years; Montana Salish and Coeur d'Alene are considerably more closely related than either is to Squamish, because these two belong to the same branch of the family (Southern Interior Salish). But even with the shallower time depth for Salishan, no single pronoun obviously matches across all three languages, and three or four of the pronouns (1sg., 2sg., 1pl., and probably 2pl.) clearly don't match in Montana Salish and Coeur d'Alene.

So what does this mean for the super-stable-pronouns hypothesis? What it means is that pronouns, like all other words, are changeable, via regular sound changes, analogic restructuring, and borrowing. Sound changes can and often do make the connections between historically related forms unrecognizable, as with the Indo-European words for `I' in Latin, Russian, and English. Analogic change introduces (often restructured) old pronouns into new places in the system, and borrowing brings entirely new pronouns into the system; in both cases the original pronoun is replaced by a different one. This is no surprise to historical linguists. It may be a surprise to those who still believe that it's possible to investigate language history, including language relationships, by taking shortcuts, without doing any actual historical linguistic analyses.

One more question: do pronominal systems change less than other parts of the vocabulary? I don't know the answer. But I don't think anyone else does either.

Posted by Sally Thomason at 12:31 AM | Comments (0)

July 03, 2004

Pronouns with following antecedents in subordinate clauses

The word "antecedent" suggests (Latin ante being a preposition meaning "before") that an occurrence of a pronoun must be linked to a noun phrase (NP) that has occurred before the pronoun occurrence. Not true. The so-called antecedent can come later if the pronoun is in a subordinate clause: When he had finished eating, John got up from the table. There he had finished eating is a subordinate clause and John is in the main clause, and we can certainly understand the pronoun as having John as its antecedent. But can an antecedent come later if the tables are turned? Can you have a pronoun in the main clause coming earlier than an antecedent in a subordinate clause?

The standard view seems to be no. Pronouns can refer back to earlier sentences, and subordinate clause pronouns can refer forward to main clause antecedents yet to come, but main clause pronouns can't refer forward to subordinate clauses yet to come. But the standard view may not be quite right, at least not for pronouns functioning as genitive NP determiners. The following example occurs under the headline "Still taking on the world?" as the subtitle of the first leader in the July 3, 2004, issue of The Economist:

His foreign critics need to notice that George Bush has now done what they want

Since nothing precedes it on the page or in the leader section, the only reasonable analysis is that the pronoun his, which functions as the determiner in the subject NP of the main clause, has as its antecedent a following NP, the subject of a content clause (George Bush has now done what they want) which is the complement of the verb notice, which is in turn head of the subordinate infinitival complement of need, hence is doubly embedded, in the second subordinate clause down. Not every theory of pronominal anaphora predicts this possibility. (The Cambridge Grammar of the English Language cites, on page 1479, the sentence ?Her husband had supported Ann throughout the ordeal, the question mark marking it as being of questionable acceptability. But it doesn't have the antecedent in a subordinate clause. That possibility is not mentioned, either to say it is possible or that it is not. Note, however, the remarks about "Anticipatory anaphora for rhetorical effect" on page 1480, which is very probably what The Economist's subhead is a case of.)

Posted by Geoffrey K. Pullum at 04:41 PM | Comments (5)

Eggcorn of the day: sir name

Reader acw sent in the eggcorn "sir name" for surname. The 3580 hits in Google are things like this:

The goals of this project are to provide others the opportunity to verify their relationship to the Ledford Sir Name Genetically and Identify the Genetic Roots of our Ancestry.

As the AHD explains, surname is from French sur+nom. This has nothing to do with sir, which is a variant of sire, which is from Latin senior by way of Old French and Middle English. But it's a classic "sporadic folk etymology" to make the connection.

Posted by Mark Liberman at 11:01 AM | Comments (2)

Voynich and midfix

The July Scientific American has a feature on the Voynich Manuscript, by Gordon Rugg, whose work we mentioned back in January.

His theory is that the manuscript is a hoax, created by using a Cardan grille,

... which was introduced by Italian mathematician Girolamo Cardano in 1550. It consists of a card with slots cut in it. When the grille is laid over an apparently innocuous text produced with another copy of the same card, the slots reveal the words of the hidden message. I realized that a Cardan grille with three slots could be used to select permutations of prefixes, midfixes and suffixes from a table to generate Voynichese-style words. [emphasis added]

I haven't ever seen the term midfix before -- linguists talk about infixes, but Rugg means something a little different. We usually have in mind a system where a stem is combined with various affixes, which might be prefixes, suffixes, or infixes. But Rugg's hypothesis has no stem/affix distinction, just a simple grammar of "one from column A, then one from column B, then one from column C".
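To make the "one from each column" grammar concrete, here is a minimal sketch in Python. It assumes a toy table: the syllables below are placeholders invented for illustration, not entries from Rugg's tables, and the code simply enumerates every combination rather than simulating the movement of a grille over the table.

```python
from itertools import product

# Placeholder syllables, invented for illustration; not taken from Rugg's tables.
PREFIXES = ["qo", "o", ""]
MIDFIXES = ["ke", "te"]
SUFFIXES = ["dy", "ol"]

def generate_words(prefixes, midfixes, suffixes):
    """Concatenate one item from each column, in column order."""
    return ["".join(parts) for parts in product(prefixes, midfixes, suffixes)]

words = generate_words(PREFIXES, MIDFIXES, SUFFIXES)
print(len(words))   # 3 * 2 * 2 combinations
print(words[:4])
```

Nothing in the sketch treats any column as a stem: the three slots are just positions, which is the difference from the usual stem-plus-affix picture.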

It does seem that midfix is more widely used in system-administration and similar contexts: "The namespace filename has a prefix, midfix and suffix".

Posted by Mark Liberman at 10:42 AM | Comments (2)

Not the * I know: let * be *

This morning I stumbled on a weblog that noted two snowclones in recent political discourse: first John Kerry's slogan "Let America be America again"; and then George Bush's remark "This is not the America I know."

The Kerry slogan comes from the title of a 1938 poem by Langston Hughes. But it reminded me first of Nancy Reagan's presidential debate instruction "let Reagan be Reagan", widely taken up by the American right.

If we leave out again (to get the "core snowclone" :-)), we can find instances of "let X be X" for X = Augusta, Bartlet, Bush, bygones, children, China, Chirac, Clinton, country, Cypriots, dad, Europe, Garfield, God, Iraq, librarians, mom, Rumsfeld, Russia, 'Sheed, Singh, Taiwan, upstate, and very many others.

The usual idea seems to be that some (bad) circumstances or forces are preventing X from expressing its (good) fundamental nature. There's a variant that argues against inappropriate homogenization: "let country be country and pop be pop", or "let China be China and let Taiwan be Taiwan". No doubt someone has sorted out whether all this is just the normal semantics of predicate nominals in generic or habitual settings ("Clinton was just being Clinton", "he's just being himself"), or whether there is something more going on.
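For anyone who wants to hunt for more instances in a text collection, the core pattern can be approximated with a regular expression that uses a backreference. This is only a rough sketch in Python, assuming X is a single word; the pattern and the helper name are illustrative, and multi-word or apostrophe-initial cases such as 'Sheed would need a looser pattern.

```python
import re

# \1 is a backreference: the word after "be" must repeat the word after "let".
LET_X_BE_X = re.compile(r"\blet (\w+) be \1\b", re.IGNORECASE)

def find_x(text):
    """Return the X of every 'let X be X' instance found in text."""
    return [m.group(1) for m in LET_X_BE_X.finditer(text)]

print(find_x("Let America be America again"))
print(find_x("let China be China and let Taiwan be Taiwan"))
print(find_x("let him be happy"))
```

The trailing word boundary keeps the pattern from accepting a second word that merely begins with X.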

Phrases such as "this (or that or these) is (or are) not the X I know" are also in common use: here are examples for X = Afghanistan, Bob, Christianity, England, FDA, God, Hollywood, Hasan, India, Islam, Italy, Jesus, Kuwait, Maine, Mexico, Najaf, New York, NFL, Paris, Pennsylvania, Radiohead, Shakespeare, species, Superman, Tom, Tony.

Here the usual rhetorical force seems to be "this is not typical for X, as I am in a position to know". Again, there's some more general semantics ("the X that S" where X might not normally take a definite article), and maybe something extra.

George W. Bush's use of the phrase "This is not the America I know" (or sometimes "That is not the America I know") in his remarks on the abuse of prisoners at Abu Ghraib has been widely quoted. Curiously, I haven't been able to confirm that he said these exact words. The closest thing that I've been able to find is what he said to Al Hurra Television on May 5, 2004, which according to the whitehouse.gov transcript was

First, people in Iraq must understand that I view those practices as abhorrent. They must also understand that what took place in that prison does not represent America that I know. [sic] The America I know is a compassionate country that believes in freedom. The America I know cares about every individual. The America I know has sent troops into Iraq to promote freedom -- good, honorable citizens that are helping the Iraqis every day.

(I suspect that the missing "the" is a transcription error).

Whether or not he used it in reference to Abu Ghraib, "X is not the America I know" is a phrase that President Bush has used more than once in his career. It occurs on the whitehouse.gov website in six places, starting with the transcript of "Remarks by the President at Islamic Center of Washington, D.C." on September 17, 2001:

Women who cover their heads in this country must feel comfortable going outside their homes. Moms who wear cover must be not intimidated in America. That's not the America I know. That's not the America I value.

and in the transcript of "Remarks by the President on Iraq" at the Cincinnati Museum Center on October 7, 2002:

Failure to act would embolden other tyrants, allow terrorists access to new weapons and new resources, and make blackmail a permanent feature of world events. The United Nations would betray the purpose of its founding, and prove irrelevant to the problems of our time. And through its inaction, the United States would resign itself to a future of fear.

That is not the America I know. That is not the America I serve. We refuse to live in fear. (Applause.)

and in the transcript of presidential remarks at a "White House Conference on Character and Community" on June 19, 2002:

I think it's particularly important in a day and age where some question the value system of America that we teach people to serve a neighbor -- people to love a neighbor like they'd like to be loved themselves. There's a question in our society as to whether or not we're so self-absorbed and materialistic that we won't fulfill our obligations as a nation.

That's not the America I know, and the America I believe exists.

and in the transcript of remarks on October 14, 2002, in reference to the sniper attacks in the Washington DC area:

The sniper attacks, first of all, I'm just sickened, sick to my stomach to think that there is a cold-blooded killer at home taking innocent life. I weep for those who lost their loved ones. I am -- the idea of moms taking their kids to school and sheltering them from a potential sniper attack is not the America I know. And therefore, we're lending all the resources of the federal government, all that have been required to do everything we can to assist the local law authorities to find this -- whoever it is.

and in the transcript of "Remarks by the President at Argonne National Laboratory - Illinois" on July 22, 2002, about anti-terrorism technology:

I don't know what was going through the minds of the enemy when they were plotting and planning. I don't know who they thought they were attacking. They must have thought this country was so materialistic, so self-absorbed that we would sit back and, you know, after the attacks, maybe file a lawsuit or two. (Laughter.) That's not the America I know. And that's not the America you're a part of.

and in the transcript of "Remarks by the President on Education Accountability", in La Crosse, Wisconsin on May 8, 2002:

I can't imagine what went through the minds of our enemy when they attacked us on September the 11th. You know, they must have thought America was so self-absorbed, so materialistic, so selfish that we would cower in the face of a challenge -- well, we might file a few lawsuits or two, but that would be all we would do. But that's not the America I know and that's not the America you're a part of. This is a country that when it comes to defending that which we believe in, when it comes to defending our freedoms, we are patient, we're deliberate, and we are plenty tough. (Applause.)

Note that Bush always uses that, not this, at least in these transcripts. By comparison, Google finds 768 examples of "this is not the * I know" and 756 examples of "that is not the * I know".
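The this/that tally is easy to reproduce mechanically. Here is a minimal sketch in Python, assuming a regular expression of my own devising that allows X to be one or two words; the four test sentences are copied from the transcripts quoted above, and everything else (the pattern, the function name) is a hypothetical illustration.

```python
import re
from collections import Counter

# Demonstrative, then "is"/"are" or the clitic "'s", then "not the X I know".
NOT_THE_X_I_KNOW = re.compile(
    r"\b(this|that|these)(?:'s| is| are) not the (\w+(?: \w+)?) I know\b",
    re.IGNORECASE,
)

def tally_demonstratives(sentences):
    """Count which demonstrative introduces each 'not the X I know' instance."""
    counts = Counter()
    for sentence in sentences:
        for match in NOT_THE_X_I_KNOW.finditer(sentence):
            counts[match.group(1).lower()] += 1
    return counts

quoted = [
    "That's not the America I know. That's not the America I value.",
    "That is not the America I know. That is not the America I serve.",
    "That's not the America I know, and the America I believe exists.",
    "But that's not the America I know and that's not the America you're a part of.",
]
print(tally_demonstratives(quoted))
```

The sniper-attack passage has no demonstrative subject at all, so a pattern like this one would not pick it up.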

Posted by Mark Liberman at 09:29 AM | Comments (0)

July 02, 2004

Two comments on six keys

I can't wait to learn what the other 101 key ideas of 20th-century linguistics are. Meanwhile, I have two small comments on the six personalized ideas that Arnold Zwicky just told us about.

First, the most recent piece of "key" work by any of the cited figures was Grice's 1967 Harvard lectures. So really, it's "six key men of the first 2/3 of the 20th century". Do the comparable lists for astronomy, mathematics, biology, psychology etc. end so early?

Second, I have to quote the Lila Gleitman t-shirt joke again:

[on the front] H(enry). Gleitman: Most great scientists are not great men.
[on the back] L(ila). Gleitman: Yeah. For instance, I'm not a great man.

Posted by Mark Liberman at 06:02 PM | Comments (3)

Six Key Men of Twentieth-Century Linguistics

Teach Yourself Books has a 101 Key Ideas series. "Each book contains short accounts [no more than one small uncluttered page each] of 101 key ideas" -- in fields ranging from Astronomy to World Religions, including, yes, Linguistics, a thin volume written by Richard Horsey and published in 2001.

Well, six of these 101 Key Ideas in Linguistics aren't ideas at all, but people. Men, in fact. None dead a hundred years now. So this part of the book is really a list of Six Key Men of Twentieth-Century Linguistics.

Now, take out a slip of paper and write down your six nominees for the Key Men of Twentieth-Century Linguistics. No cheating: no checking Horsey's book or peeking ahead in this posting. If anyone, absolutely anyone, playing fair, gets the same list as Horsey, I'll be astonished. In fact, if you manage this feat, e-mail me and I'll take you out to dinner at the next conference we're both at.

Ok, here's Horsey's list, in alphabetical order: Leonard Bloomfield, Noam Chomsky, Gottlob Frege, H. Paul Grice, Roman Jakobson, and Ferdinand de Saussure. "Sapir-Whorf hypothesis" gets an entry, but Horsey gives no biographical data on either man, nor any discussion of their intellectual contributions beyond the SWH, so they don't count.

Frege and Grice are the surprises, of course. Getting the other four is no great feat, but if you got both of these names, then you definitely have a Horsey take on things, and you get a dinner.

By the way, a tenth of the book is taken up by entries on divisions of the field: Historical linguistics, Lexicon, Morphology, Phonetics, Phonology, Pragmatics, Psycholinguistics, Semantics, Sociolinguistics, and Syntax. So what counts as an "idea" in linguistics is generally pretty flexible. And applied linguists and computational linguists didn't make the cut.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 05:05 PM

The sins of dialogue attribution

Mark's post from earlier today reminds me that Stephen King also warns against the use of adverbs in dialogue attribution in his book On Writing (p. 124, emphasis in the original):

I believe the road to hell is paved with adverbs, and I will shout it from the rooftops. To put it another way, they're like dandelions. If you have one on your lawn, it looks pretty and unique. If you fail to root it out, however, you find five the next day ... fifty the day after that ... and then, my brothers and sisters, your lawn is totally, completely, and profligately covered with dandelions. By then you see them for the weeds they really are, but by then it's--GASP!!--too late.

I can be a good sport about adverbs, though. Yes I can. With one exception: dialogue attribution. I insist that you use the adverb in dialogue attribution only in the rarest and most special of occasions ... and not even then, if you can avoid it.

After about a page of examples and critical commentary, King assails dialogue-attribution verbs other than "said":

Some writers try to evade the no-adverb rule by shooting the attribution verb full of steroids. The result is familiar to any reader of pulp fiction or paperback originals:
"Put down the gun, Utterson!" Jekyll grated.
"Never stop kissing me!" Shayna gasped.
"You damned tease!" Bill jerked out.
Don't do these things. Please oh please.

King then praises "Larry McMurtry, the Shane of dialogue attribution" (Elmore Leonard gets his due later on in the book) and admits that he himself is "just another ordinary sinner", having "spilled out [his] share of adverbs in [his] time".

On Writing is not a particularly memorable book for its advice on writing -- I picked up the paperback mostly for the autobiographical stuff, especially King's description of his car accident in 1999. But this bit on the sins of dialogue attribution struck me when I read it because I happened to be reading a very sinful book at the same time: David Chilton's financial self-help book The Wealthy Barber.

(Why was I reading a financial self-help book? I had recently landed a steady-ish job, gotten married, and bought a house. Financial education in our nation's high schools is as nonexistent as linguistic education. My wife had received a copy of the book as a gift from her aunt. That's why. Now, with that disclaimer out of the way ...)

Chilton's book has some pretty good advice (for those of us who are financially fairly ignorant, anyway). Mind you, this advice could have been summarized on a single (maybe legal-sized) sheet of paper, but Chilton decided to write it as a kind of play in several acts. The real problem, however, was that I couldn't stand Chilton's style. I couldn't really put my finger on what it was that I hated so much, though, until I read King's bit on dialogue attribution. Here are some examples from a random page of Chilton's book, just to give you a taste (p. 64, if you must know):

"Yeah, Roy, and you know what else?" Cathy beamed.
"[...] Your advice last month may have been great, but I'm not all that happy with my haircut," I kidded.
"I guess I'll have to get a new bowl," Roy quipped.
"Or as long, I hope," Tom interjected.
"Or as long," Roy confirmed.
"And to think that I looked forward to this for a month," Cathy groaned.
"Yeah. I read an article the other day saying that the majority of Americans are overinsured," I pointed out, thinking that for once I'd look informed.
"Wrong," Roy replied.

Flipping over to pp. 134-135, we find said triumphantly, bristled, verified, complained with a wink, replied sheepishly, shrugged, agreed, frowned, noted, responded, consented, cautioned, argued, and laughed.

Larry McMurtry this guy isn't. I hope that Chilton and others like him come across King's book someday -- or, better, Elmore Leonard's legal-sized summary of rules.


Posted by Eric Bakovic at 03:24 PM

Into the blogosphere

I haven't read much of it yet, but this site (collection? volume?) looks interesting [tip from Fernando Pereira via Nick Montfort].

Posted by Mark Liberman at 01:12 PM

Self-exposure at the NY Times

Today's NY Times has one of the most egregious violations of Elmore Leonard's Fourth Rule of Writing that I've ever seen in a news article.

It's in John F. Burns' piece on "the opening of court proceedings on Thursday against 12 of the highest-ranking officials of Iraq's ousted dictatorship". I enjoyed the article, which is unusually long for the Times. It's organized as a series of accounts of the interactions between the court and the former officials being arraigned, with a great deal of subjective evaluation of the defendants' demeanor. The last former official in the parade is Ali Hassan al-Majid, known as "Chemical Ali" for his role in using chemical and biological weapons against the Kurds.

Burns describes Ali's reaction to the charges this way:

"I'm happy with the accusations, because I'm innocent of them, and as you will see, justice will prevail," he said, in an even tone that had something of the quality of a man concerned that he has been overcharged for his car repair, but unwilling to make much of it.

Nice simile. But St. Elmore told us

4. Never use an adverb to modify the verb ''said'' . . .

. . . he admonished gravely. To use an adverb this way (or almost any way) is a mortal sin. The writer is now exposing himself in earnest, using a word that distracts and can interrupt the rhythm of the exchange. I have a character in one of my books tell how she used to write historical romances ''full of rape and adverbs.''

Now, "in an even tone that had something of the quality of a man concerned that he has been overcharged for his car repair, but unwilling to make much of it" is not an adverb, in the sense of a single word modifying a verb or adjective. But it's a prepositional phrase used adverbially, modifying "said".

And if the single adverbial word gravely is enough to "expose" the writer and "[distract] ... and interrupt the rhythm of the exchange", how much worse is it to use a 30-word adverbial phrase?

I like Elmore Leonard's writing, but I also like to read novelists who "expose themselves" in the way that he advises against. One of the champions of self-exposure is Henry James, who often stitches together a few scraps of dialog with acres of inner fustian:

She had wound up with a laugh of enjoyment over her embroidery of her idea--an enjoyment that her face communicated to Strether, who almost wished none the less at this moment that she would let poor Waymarsh alone. HE knew more or less what she meant; but the fact wasn't a reason for her not pretending to Waymarsh that he didn't. It was craven of him perhaps, but he would, for the high amenity of the occasion, have liked Waymarsh not to be so sure of his wit. Her recognition of it gave him away and, before she had done with him or with that article, would give him worse. What was he, all the same, to do? He looked across the box at his friend; their eyes met; something queer and stiff, something that bore on the situation but that it was better not to touch, passed in silence between them. Well, the effect of it for Strether was an abrupt reaction, a final impatience of his own tendency to temporise. Where was that taking him anyway? It was one of the quiet instants that sometimes settle more matters than the outbreaks dear to the historic muse. The only qualification of the quietness was the synthetic "Oh hang it!" into which Strether's share of the silence soundlessly flowered. It represented, this mute ejaculation, a final impulse to burn his ships. These ships, to the historic muse, may seem of course mere cockles, but when he presently spoke to Miss Gostrey it was with the sense at least of applying the torch. "Is it then a conspiracy?"

But is this where NY Times courtroom reporting is heading?

Posted by Mark Liberman at 10:28 AM | Comments (1)

The secret Netherlanders among us

According to this article, the Dutch in Albany (originally Beverwijk) took more than 250 years to give up and accommodate to English language and culture. And maybe even now...

The Dutch lost New Netherland to the British in 1664, but a century later, botanist Peter Kalm "wrote that many people still spoke and read Dutch and that the English and Dutch populations despised each other", and "as late as World War I, a form of Dutch was still spoken in the region, according to Charles Gehring" at the New Netherland Institute, which offers a covertly revanchist Virtual Tour of New Netherland, covering "what are now the states of New York, New Jersey, Pennsylvania, Maryland, Connecticut and Delaware". I hope that somebody will warn Samuel Huntington -- never mind the Palatine Boors and the Frieslanders in Illinois, apparently we've got to worry about those culturally unassimilated upstate NY indigenes from the Low Countries.

Seriously, this sort of thing is now just one of the threads in the American cultural and linguistic tapestry. The debate these days is whether current immigrants are culturally and linguistically a new phenomenon, or just more of the same at an earlier stage.

[Update: Trevor at kaleboel explains that it's worse than I thought: those "Dutch" indigenes were actually a motley multi-cultural crowd of Francophones, East Frieslanders and whatnot: he says "I'm prepared to bet the first-comer my AdSense earnings for June that at no time during the C17th did "Dutch"-speakers constitute the largest language community in Beverwijck; I suspect, in fact, that this honour belonged for most of the period to speakers of Low Saxon variants."

Hey, Trevor, I'd be happy to pay up if I had any AdSense earnings to put up against yours. ]

Posted by Mark Liberman at 09:09 AM | Comments (0)

Analyzing voice stress

Yesterday's NYT had an article on voice stress analyzers. As a phonetician -- someone who studies the physics and physiology of speech -- I've been amazed by this work for almost three decades. What amazes me is that research (of a sort) and commerce (at a low level) and law-enforcement applications (here and there) keep on keepin' on, decade after decade, in the absence of any algorithmically well defined, reproducible effect that an ordinary working speech researcher like me can go to the lab, implement and test.

Well, these days there's no need to go to the lab for this stuff -- you just write and run some programs on your laptop. But that makes the whole thing all the more amazing, because after 50 years, it's still not clear what those programs should do. I'm not complaining that it's unclear whether the methods work -- that's true too, but the real scandal is that it's still unclear what the methods are supposed to be.

Specifically, the laryngeal microtremors that these techniques depend on haven't ever been shown clearly to exist, as far as I know. No one has ever shown that if these microtremors exist, it's possible to measure them in the pitch of the voice, in a way that separates them from all the other phenomena that modulate the pitch at similar rates. And that's before we get to the question of how such undefined measurements might be related to truth-telling. Or not.

How can I make you see how amazing this is? Suppose that in 1957 some physiologist had hypothesized that cancer cells have different membrane potentials from normal cells -- well, not different potentials, exactly, but a sort of a different mix of modulation frequencies in the variation of electrical potentials between the inside of the cell and the outside. And further suppose that some engineer cooked up a proprietary circuit to measure and display these alleged variations in "cellular stress" (to the eyes of a trained cellular stress expert, of course), and thereby to diagnose cancer, and started selling such devices to hospitals, and selling training courses in how to use them. And suppose that now, almost half a century later, there is still no documented, well-defined procedure for ordinary biomedical researchers to use to measure and quantify these alleged cell-membrane "tremors" -- but companies are still making and selling devices using proprietary methods for diagnosing cancer by detecting "cellular stress" -- computer systems now, of course -- while well-intentioned hospital administrators and doctors are occasionally organizing little tests of the effectiveness of these devices. These tests sometimes work and sometimes don't, partly because the cellular stress displays need to be interpreted by trained experts, who are typically participating in a diagnostic team or at least given access to lots of other information about the patients being diagnosed.

This couldn't happen. If someone tried to sell cancer-detection devices on this basis, they'd get put in jail.

But as far as I can tell, this is essentially where we are with "voice stress analysis."

The "National Institute for Truth Verification" offers a page giving "a partial list" of "studies validating voice stress analysis." These go back to work by Lippold and others, starting in the 1950s, that has claimed to identify a "microtremor" with a frequency of about 8-12 (sometimes 8-14) Hz., caused by reflex arcs in the motor system, whose intensity is modulated by stress. This is supposed to be a more general muscular phenomenon, but the "voice stress" applications depend on measuring the intensity of this tremor in the fundamental frequency (pitch) of the voice, caused by "microtremors" in the muscles of the larynx.

One aspect of this stuff that I've always found counter-intuitive is that stress is supposed to diminish these microtremors, not increase them. So it's not that your voice becomes more quavery when you're nervous or upset, rather that it (supposedly) becomes more steady.

Anyhow, I've tried, on and off for almost 30 years, to measure these microtremors, and I can't find them. I don't know any reputable speech researcher who can, at least not in any reproducible way. Shortly after I started work at Bell Labs in 1975, some people in my group took a look at these claims, since voice stress analysis offered an obvious way to prevent telephone credit card fraud. These folks ran into the basic problem right away -- they couldn't find these putative microtremors. This was not because they were stupid people who didn't know how to analyze the frequencies of the voice, believe me.

This is still the situation. Pick some unusual kind of signal processing like "modulation spectrum" and Google will easily find you dozens of pages with equations defining the concepts and source code implementing them. Pick a hard-to-analyze acoustical property of the voice, like "jitter and shimmer" in fundamental frequency (related to hoarseness and so on), and the same thing is true. Look for "microtremor" or "voice stress" and you'll find lots of pages discussing whether or not the methods work -- but nothing, as far as I can tell, telling you in mathematical or algorithmic detail what the methods really are, much less offering code that implements them.

There are some reasons to think that it ought to be hard to measure 8-14 Hz. "microtremors" in the fundamental frequency of ordinary speech, even if they exist. Frequencies of 8-14 Hz. correspond to periods between about 70 and 125 msec. But the durations of phonetic segments corresponding to consonants and vowels overlap that same range -- typically 50 to 250 msec. And most such segments involve vocal gestures that strongly modulate the pitch of the voice, through changes in supralaryngeal impedance, changes in voicing, or the effects of stress, intonation and/or tone. So you're looking for a low-amplitude modulation of a signal that's simultaneously being subjected to a highly variable (because information-bearing) high-amplitude modulation in the same frequency range. Not impossible, but hard.
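The overlap between the claimed tremor periods and ordinary segment durations is easy to check. Here is a minimal Python sketch of that arithmetic; the band edges and duration range are the figures quoted above, and the function and variable names are purely illustrative.

```python
def period_ms(freq_hz):
    """Period in milliseconds of a modulation at freq_hz."""
    return 1000.0 / freq_hz


def ranges_overlap(a, b):
    """True if the closed intervals a=(lo, hi) and b=(lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


# Claimed microtremor band in Hz (figures from the text above).
tremor_band_hz = (8.0, 14.0)

# Typical durations of consonant and vowel segments in msec (from the text above).
segment_range_ms = (50.0, 250.0)

# The highest frequency gives the shortest period, and vice versa.
tremor_period_ms = (period_ms(tremor_band_hz[1]), period_ms(tremor_band_hz[0]))

print("tremor periods: %.1f to %.1f msec" % tremor_period_ms)
print("overlaps segment durations:", ranges_overlap(tremor_period_ms, segment_range_ms))
```

The tremor periods fall entirely inside the range of segment durations, which is the point: whatever modulates pitch at the segmental level does so on the same time scale as the putative tremor.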

The best technical overview that I know is a 1996 special issue of Speech Communication on "Speech under Stress" (vol. 20, issues 1-2, pp. 1-175). The most relevant article in that issue is Robert Ruiz, Emmanuelle Absil, Bernard Harmegnies, Claude Legros and Dolors Poch, "Time- and spectrum-related variabilities in stressed speech under laboratory and real conditions" (pp. 111-129). They define an "index of microprosodic variation" which they dub μ -- for which they give an actual equation! -- and they show that it is affected by (situational) stress in both laboratory and real-world situations. Is this the long-sought technical validation of the microtremor theory?

In a word, no. This "index of microprosodic variation" is defined on individual vowel segments, as f_c/((f_i+f_f)/2), where f_c is the pitch in the center of the vowel, f_i is the initial pitch of the vowel, and f_f is the final pitch of the vowel. In other words, it's simply the ratio of the pitch in the middle of the vowel to the average of the starting and ending pitch. This will tend to be higher under conditions of higher "vocal effort" -- high vocal cord tension, high subglottal pressure, etc., as an elementary consequence of the physical mechanisms involved in vocal cord vibration. It's got zilch to do with "microtremors" putatively caused by a motor system reflex arc, and putatively modified by stress-induced changes in feedback strength. And increases in μ would be caused by lots of things other than stress -- talking more loudly because your listener is farther away, or because of higher background noise, for example.
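To make the definition concrete, here is a minimal Python sketch of the index as described above: the center pitch divided by the mean of the initial and final pitch. The function name and the example pitch values are hypothetical, chosen only for illustration; they are not data from the Ruiz et al. article.

```python
def microprosodic_index(f_initial, f_center, f_final):
    """Ratio of mid-vowel pitch to the mean of the vowel's edge pitches.

    All three arguments are fundamental-frequency values in Hz,
    measured on a single vowel segment.
    """
    mean_of_edges = (f_initial + f_final) / 2.0
    return f_center / mean_of_edges


# A perfectly flat pitch contour gives an index of exactly 1.
flat = microprosodic_index(120.0, 120.0, 120.0)

# A contour that bulges upward in the middle of the vowel gives an index above 1.
arched = microprosodic_index(110.0, 130.0, 110.0)

print("flat contour:   %.3f" % flat)
print("arched contour: %.3f" % arched)
```

Note that nothing in this computation looks at modulation in the 8-14 Hz. range: three pitch samples per vowel can describe the overall shape of the contour, but not a tremor riding on it.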

So I'm still waiting. There's some good test data out there. The Linguistic Data Consortium has published a database of "Speech Under Simulated and Actual Stress (SUSAS)", collected by John Hansen. If someone will send me an equation, an algorithm, or some Matlab code, I'll be happy to try it out -- it would just take a few days to do some initial tests of plausibility -- and if it works, I'll sing its praises.

I'm not prejudiced against the "microtremor" theory -- I'd love to have another measurement dimension for speech analysis. I'm not prejudiced against "lie detector" technology -- if there's a way to get some useful information by such techniques, I'm for it. I'm not even opposed to using the pretense that such technology exists to scare people into not lying, which seems to me to be its main application these days. But when a theory about quantitative measurements of frequency-domain effects in speech has been around for half a century, and no one has ever published an equation, an algorithm or a piece of code for making these measurements, and willing and competent speech researchers (like me) can't create reliable methods for making such measurements from the descriptions we find in the literature... something is wrong.

But maybe the techniques are being kept secret to preserve competitive advantage, but really work anyhow? That's not the way these things are supposed to happen, but this is possible in principle. However, the "National Institute for Truth Verification" does not list on its page the more negative results, like this 2003 study, which tested the Vericator (TM) voice stress analyzer in a test to find (people pretending to be) smugglers at two mock border checkpoints. The study was done by the Department of Defense Polygraph Institute, not likely in principle to be an outfit hostile to such technologies, and found that the miss rate was about 85% (50 of 59 smugglers were missed), while the false alarm rate was about 12% (13 of 111 non-smugglers were falsely flagged). More to the point, the rate at which candidates were identified as "smugglers" was quite similar whether in fact they were in that category (9 of 59, 15%) or were not in that category (13 of 111, 12%).
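The percentages follow directly from the four counts quoted above. A minimal Python sketch of the arithmetic, with illustrative variable names; the last line, the ratio of the two flagging rates, is an added summary figure rather than something reported in the study.

```python
# Counts quoted above from the mock border checkpoint test.
smugglers = 59
smugglers_flagged = 9
non_smugglers = 111
non_smugglers_flagged = 13

# Smugglers who were not flagged are misses.
miss_rate = (smugglers - smugglers_flagged) / smugglers

# Non-smugglers who were flagged are false alarms.
false_alarm_rate = non_smugglers_flagged / non_smugglers

# Rate at which actual smugglers were flagged.
hit_rate = smugglers_flagged / smugglers

print("miss rate:        %.0f%%" % (100 * miss_rate))
print("false alarm rate: %.0f%%" % (100 * false_alarm_rate))
print("hit rate:         %.0f%%" % (100 * hit_rate))

# How much more often a smuggler was flagged than a non-smuggler.
# A device with no diagnostic value would give a ratio near 1.
print("hit / false alarm: %.2f" % (hit_rate / false_alarm_rate))
```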

As I said, if you tried to sell cancer diagnosis equipment on the basis of (non double blind) clinical trial results like that, you'd be in trouble with the law.

[Note to fellow-linguists: if you haven't already figured it out, we're talking here about "stress" as in "I'm so stressed about my job interview", not "stress" as in "giraffe is stressed on the second syllable".]

Posted by Mark Liberman at 07:44 AM | Comments (2)

July 01, 2004

Prescriptivism and ignorance: together again

One of the listener letters read on Here and Now today complained about some newscaster's failure to use "subjunctive case".

... in our segment on the new film The Corporation, we posed the question "if a corporation was a person, what kind of person would it be?" Well, this prompted K.G. Hynes [?] of Jenkintown PA to write:
"While indeed legally a corporation is considered a person, in point of fact it is a company. For instance, a corporation can be sued, but it can't literally be sent to jail, only people can. Ergo, your introductory sentence should have been put in the subjunctive case: 'What if the corporation WERE a person.' The subjunctive case is used when the statement is contrary to actual fact, as in 'I wish I were on vacation.' "

The Here and Now newsreader responded mildly "Oh, don't we all."

I thought this was an especially nice little example of a correlation that might seem surprising, until you think of it: the people who complain the most about linguistic usage are also usually the people who are most ignorant and confused about how to analyze it and describe it.

The subjunctive has been dying out in English for a few hundred years, as the American Heritage Book of English Usage explains:

... over the last 200 years even well-respected writers have tended to use the indicative was where the traditional rule would require the subjunctive were. A usage such as If I was the only boy in the world may break the rules, but it sounds perfectly natural.

But the key point here is that it's the subjunctive mood, not the subjunctive case. "Case" is a property of nouns (and associated categories like adjectives), not of verbs. If you're going to be an annoying prescriptive nag, at least don't be a terminologically ignorant annoying prescriptive nag.

Posted by Mark Liberman at 12:58 PM | Comments (17)

A classical malapropism and a hypercorrect eggcorn

Andrew Sullivan notes a NYT reader review of Fahrenheit 9/11 saying that the "Bush Administration damns itself through its own actions, its own words, its own lies...all documented for prosperity." This is a surprisingly common malapropism. Google has 21 instances of "documented for prosperity", 8 instances of "preserving * * for prosperity", etc., most of which seem to be sincere mistakes:

(link) Moore's collection was aimed at preserving these ballads for prosperity.

A few minutes ago on the BBC Newshour, I (believe that I) heard a newscaster use the phrase "smoked filled room", while reporting on a demand for greater democracy in Hong Kong. Google has 614 instances, many of which are sincere mistakes rather than jokes about preserved meat and the like:

(link) The famous "smoked filled room" of politics was located behind this very bar!

I take it that this is a hypercorrection, patterned on the common loss of "-d" in complex nominals such as ice(d) cream, skim(med) milk, ice(d) tea, wax(ed) paper, roast(ed) beef, shave(d) ice, cream(ed) corn, whip(ped) cream, and the like.

(In the interests of fairness to Mr. Lustig, whom I have a hard time taking seriously for shameful reasons I've mentioned earlier, I'll try to check the audio when it becomes available on line, and make sure that it was really him, and that he really said "smoked filled"...)

[Update: I originally wrote that I thought the BBC newscaster was Robin Lustig, but he has written a comment (below) to indicate that he was not responsible. To my surprise, it appears that the BBC has no accessible archives of its programs. Since I am not about to make a trip to London to arrange a "listening appointment" at the British Library, I'll never know who it was. Sorry.]

Posted by Mark Liberman at 09:45 AM | Comments (13)

Comments: progress and prospects

We're experimenting with comments here at Language Log. So far, reviews are mixed.

There are some positive comments on comments:

I have nothing to say on the subject; but I'm delighted at your decision to enable comments.

I would like to second Ray Girvan's delight. Well done, o brave Loggers!

And some negative ones:

Come on guys, how about some security here? I love languagelog.org because of its quality content. Having to be bothered by a bunch of trolls is going to detract from the site.

Ray Girvan wrote:
"I have nothing to say on the subject; but I'm delighted at your decision to enable comments."

Oh yes, because this won't be abused. ::cough::

Please fix the comment posting system before LanguageLog turns into Slashdot at -1.

Following both these lines of commentary at once, we'll continue to experiment, while intervening occasionally to keep the SNR up. Our current plan is to go to a TypeKey-based system for keeping out comment spam.

However, I'm less interested in the negative/defensive aspects -- the struggle with spammers, trolls and fools -- than in ways to foster productive and/or interesting discussions. Over the course of the summer, I hope to try some experiments that will go beyond enabling comments on the blog site, both in terms of technical structure and in terms of content.

Here are a couple of ideas, varied in form and content.

(1) Start off a limited-time discussion with a provocative post. The discussion could take place on an off-blog site that would maintain a threaded posting structure, for easier posting and reading. Other language blogs could participate with posts on their own sites, using some kind of RSS arrangement to collect the distributed contributions. Or perhaps the kick-off could be orchestrated in advance by an ad hoc confederation of linguabloggers, who would post their individual provocations independently but simultaneously. After a few days, the discussion site would shut down and the originators (and others) could comment via new blog posts on whatever aspects of the discussion interested them.

(2) Try a weblog-mediated "journal club" or "book club", where from time to time (once a month?) an interesting article or book would be featured for discussion. Of course it would be available on line. Ad hoc support by linguabloggers might include informal tutorial explanations of technical aspects, or accounts of the intellectual context of the research. Some sort of open or semi-open threaded discussion should also be part of this.

The idea is not to make anyone do significantly more work, but rather to introduce a minimal amount of maximally informal coordination, so as to enable existing posting and reading and commenting to interact in a way that is more productive and more fun.

Please feel free to comment on these ideas here, and also on your own sites where appropriate.

Posted by Mark Liberman at 09:18 AM | Comments (17)

Just so!

This from Scientific American (full text by subscription) in an article headed Infant pacification may have led to the origin of language:

In a paper slated for the August Behavioral and Brain Sciences, Florida State University physical anthropologist Dean Falk proposes that just as motherese forms the scaffold for language acquisition during child development, so, too, did it underpin the evolution of language. Such baby talk itself originated, she posits, as a response to two other hallmarks of human evolution: upright walking and big brains.

In contrast to other primates, humans give birth to babies that are relatively undeveloped. Thus, whereas a chimpanzee infant can cling to its quadrupedal mother and ride along on her belly or back shortly after birth, helpless human babies must be carried everywhere by their two-legged caregivers. Assuming, as many anthropologists do, that early humans had chimplike social structures, moms did most of the child rearing. But having to hold on to an infant constantly would have significantly diminished their foraging efficiency, Falk says.

She argues that hominid mothers therefore began putting their babies down beside them while gathering and processing food. To placate an infant distressed by this separation, mom would offer vocal, rather than physical, reassurance and continue her search for sustenance. This remote comforting, derived from more primitive primate communication systems, marked the start of motherese, Falk contends. And moms genetically blessed with a keen ability to read and control their children, so the theory goes, would successfully raise more offspring than those who were not. As mothers increasingly relied on vocalization to control the emotions of their babies--and, later, the actions of their mobile juveniles--words precipitated out of the babble and became conventionalized across hominid communities, ultimately giving rise to language.

Just so! And indeed Falk's powerful idea can be applied not only in linguistics and anthropology, but also in zoology and paleobiology. For example, I had always suspected that Kipling was entirely wrong about the elephant's trunk: it is clearly a child rearing adaption which mama Nelly uses not only to put all the little Dumbos in a row, but also to wash them.

Mind you, Kipling has the better punchline (Kipling's full text here):

Then the Elephant's Child felt his legs slipping, and he said
through his nose, which was now nearly five feet long, 'This is
too butch for be!'

But back to linguistics. The SciAm article at least does cite the commentary of a suitably skeptical linguist (though one with his own little story to tell about the origins of language):

Linguists likewise demur. Falk's account sheds considerable light on the origins of speech, writes Derek Bickerton of the University of Hawaii at Honolulu in an accompanying commentary. Unfortunately, he continues, it reveals nothing about the origins of language. He charges that the hypothesis fails to address how the two fundamental features of language--namely, referential symbols and syntactic structure--arose, noting that speech is merely a language modality, as are Morse code and smoke signals. Falk's scenario does not explain how mother's melodic utterances acquired meaning in the first place, Bickerton insists.

Somehow Bickerton's insistence is not quite as persuasive or as damning as I'd like. The thing is: language doubtless has some child rearing benefit just as it has payoffs in all other social arenas, though a little less than in most cos kids are so incredibly stupid and don't have the faintest idea what all the cooing is about. Even if we allow that child rearing benefit is a factor in the evolution of language, why on earth would we isolate it as the prime mover above everything else? I'm curious to know if the actual article when it appears in Behavioral and Brain Sciences (hmm, maybe online already) will contain even one shred of evidence. I mean, an actual observation of language in the process of significant development through child rearing in any species is obviously more than we can expect. But still, some shred, just to convince me that the eds. of BBS are not crazy.

Well, at least Falk can be seen as a gender balancing antidote to Geoffrey Miller's Mating Mind theory of language evolution, a theory which I seem to remember gives males trying to impress mates a similar innovative role in language evolution that Falk wants to give mothers. Mind you, in Miller's model it's still the women that make the selection. I'm not sure whether Falk's model has a place for weaned males. An occasional grunt, perhaps.

Posted by David Beaver at 04:33 AM | Comments (2)

Texting, typing, speaking

According to the AP, Kimberly Yeo "thumbed 26 words in 43.24 seconds into her phone, beating a world record of 67 seconds for the same words set by a Briton last September." That's about 36.1 words per minute, which is good enough to pass some typing tests -- I remember having to beat 35 wpm in order to qualify as a temp when I was in college.

Of course, champion typists routinely beat 100 wpm. And normal speaking rates are in the range of 150-200 wpm -- some people can speak much faster, but this is not an area where there are any contests that I know about.

The 26 words that Ms. Yeo texted were not especially easy ones:

The razor-toothed piranhas of the genera Serrasalmus and Pygocentrus are the most ferocious freshwater fish in the world. In reality they seldom attack a human.

(Note that this is 26 words -- the contest sponsor's count -- if you call "razor-toothed" two words and "freshwater" one word.)
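The counting convention in that note can be checked mechanically. The following is only a sketch: the function name `count_words` and its `split_hyphens` flag are made up for illustration, and the contest sponsor's actual counting procedure is not known.

```python
import re

# The test passage as quoted in the post.
PASSAGE = (
    "The razor-toothed piranhas of the genera Serrasalmus and Pygocentrus "
    "are the most ferocious freshwater fish in the world. "
    "In reality they seldom attack a human."
)

def count_words(text, split_hyphens=True):
    """Count words in text (hypothetical helper, not the sponsor's method).

    With split_hyphens=True, a hyphenated compound such as "razor-toothed"
    counts as two words; with False, it counts as one.
    """
    if split_hyphens:
        pattern = r"[A-Za-z]+"
    else:
        pattern = r"[A-Za-z]+(?:-[A-Za-z]+)*"
    return len(re.findall(pattern, text))

print(count_words(PASSAGE))                       # hyphenated compound split
print(count_words(PASSAGE, split_hyphens=False))  # hyphenated compound kept whole
```

Splitting on the hyphen gives 26; treating "razor-toothed" as a single word gives 25. "Freshwater" is written solid, so it counts once either way.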

By comparison, I just read this passage into the computer, speaking briskly but clearly. With generous segmentation around the edges, the duration of the read passage was 6.97 seconds, or 223.8 wpm. This includes a 490-msec. pause between the two sentences.

The passage is a rather dense one, with 46 syllables, for a relatively high average (for English) of 1.77 syllables/word. This makes my speech rate in the passage about 396 syllables/minute, or about 152 msec./syllable, which is about average for material like this.
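For anyone who wants to rerun the arithmetic, here is a small sketch. The input figures (26 words, 46 syllables, 43.24 and 6.97 seconds) are the ones given in this post; the variable and function names are invented for illustration.

```python
# Figures taken from the post.
WORDS = 26
SYLLABLES = 46
TEXTING_SECONDS = 43.24   # Ms. Yeo's record time
READING_SECONDS = 6.97    # duration of the read-aloud passage

def per_minute(count, seconds):
    """Convert a count achieved in `seconds` into a per-minute rate."""
    return count * 60.0 / seconds

texting_wpm = per_minute(WORDS, TEXTING_SECONDS)
reading_wpm = per_minute(WORDS, READING_SECONDS)
syllables_per_word = SYLLABLES / WORDS
syllables_per_minute = per_minute(SYLLABLES, READING_SECONDS)
msec_per_syllable = READING_SECONDS * 1000.0 / SYLLABLES

print(f"texting:  {texting_wpm:.1f} wpm")
print(f"reading:  {reading_wpm:.1f} wpm")
print(f"density:  {syllables_per_word:.2f} syllables/word")
print(f"rate:     {syllables_per_minute:.0f} syllables/minute")
print(f"duration: {msec_per_syllable:.0f} msec./syllable")
```

On these inputs, reading aloud comes out a bit more than six times faster than the record-setting texting run (223.8 wpm against 36.1 wpm).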

I don't know what normal, sustained texting rates are -- an informal lunch-room guesstimate in Japan was 15 wpm for skilled users. But of course, rates on clotted formal text like the cited test passage are not relevant, since a typical eight-English-word text message is more like "my summr is cwot" than "the piranhas of the genera Serrasalmus and Pygocentrus".

Still, texting is clearly slow and effortful compared to talking. Its popularity in Japan and Europe, relative to the U.S., seems to be due to higher prices for voice calls and/or greater constraints on talking in public places.

Posted by Mark Liberman at 01:16 AM | Comments (3)

34.	a.	Kim visited NY and Jim could've _VPe.
	b.	Kim visited NY and Jim couldn't _VPe.
	c.	Kim visited NY and Jim couldn't've _VPe.
	d.	I'd've visited NY.
	e.	*Jim could'ven't seen it.

32.	E.	Syntactic rules can affect affixed words, but cannot affect clitic groups.
	F.	Clitics can attach to material already containing clitics, but affixes cannot.

Ninguém	Buscas mais, amigo meu?	What else do you seek, my friend?
Todo-o-Mundo	Busco a vida e quem ma dê.	Life and who will give it to me.
Ninguém	A vida não sei que é, a morte conheço eu.	Life I don't know, but death I do.
Berzebu	Escreve lá outra sorte.	Write down this finding.
Dinato	Que sorte?	What finding?
Berzebu	Muito garrida: Todo-o-Mundo busca a vida, e Ninguém conhece a morte.	Very colorful: Everybody seeks life, and nobody knows death.
Todo-o-Mundo	E mais queria o paraíso, sem mo ninguém estorvar.	I also want to get to heaven without anybody getting in the way.
Ninguém	E eu ponho-me a pagar quanto devo para isso.	And I am paying what I owe to get there.
Berzebu	Escreve com muito aviso.	Write (this) down carefully.
Dinato	Que escreverei?	What?
Berzebu	Escreve que Todo-o-Mundo quer paraíso, e Ninguém paga o que deve.	That everybody wants heaven, but nobody pays what they owe.

Kraven:	Once I get my hands on you I'll.. UHHH!
Spider-Man:	Tsk-tsk! Didn't anyone ever tell you not to end a sentence with an expletive? I can tolerate your nastiness.. but bad grammar... unforgiveable!

Baddie #1:	C'mon... grab 'im!! He's no bigger'n us!
Spider-Man:	Tsk tsk! You mean "no bigger than we"!

Aunt May:	Isn't that Dr. Bromwell the dearest thing? As you would say Peter.. he's a regular pussywillow!
Peter:	No, Aunt May! I keep telling you.. the word is pussycat!
Aunt May:	But I think pussywillows are much cuter, dear!
Peter:	OK, pretty girl! If you say so, he's a pussywillow!
Aunt Anna:	The more May says it, the more it sounds right to me!