November 29, 2007

What Linguists Want for Christmas

My friend Mike sent along a pointer to what he wants for Christmas: a Navajo Code Talker GI Joe.

It is reported to produce seven phrases.

Posted by Bill Poser at 06:17 PM

This blogging life: The InterWeb makes people lazy and stupid

Every few days, I get e-mail saying something like, "I'm sending this to you because you're the only person whose address I can find on the Language Log site."  The mail is really intended for some specific other blogger, or offers a topic that some one of us (not necessarily me) might want to blog about.

Over the years, hundreds of such messages.  And then, this morning, a similar message about the eggcorn database:

This ["daring-do"] is probably not a new eggcorn [it isn't], it seems so obvious, but I couldn't find it mentioned in Language Log, and I'm not sure how to find the eggcorn database that's mentioned every so often, so I can't be sure this has already been documented.  [Many Language Log postings about eggcorns, including my recent lolcat/eggcorn posting, have links to the ecdb in them.]

Why are people so incompetent at finding e-addresses and web addresses?  The hypothesis I've developed is that the InterWeb -- the conglomeration of the Internet and the World Wide Web -- makes people lazy and stupid.  Here's this amazing resource, which allows people to track down all sorts of arcane information within (at most) minutes, yet the users have come to expect that sites will be designed to offer them a single-click route to whatever they want.  That's just lazy.  And they seem to have lost the ability to search things out for themselves.  The InterWeb has made them stupid.

[Yes, I'm being hyperbolic here, and I understand that people have always been lazy and not very knowledgeable about how to find information.  I don't actually think that these problems are of recent origin, or that the InterWeb caused them; people can stop mailing me to set me straight on those points.  But I do think that the InterWeb amplifies the problems.  And I want to complain.]

[Added 12/1: Caitlin Light tells me that there's a humor website Something Awful, where "the internet makes you stupid" is a catchphrase: "Something Awful has been mocking itself and the internet since 1999, bringing you reviews of the worst movies, video games, and websites to ever exist. If it's something and it's awful, it's probably on Something Awful, where the internet makes you stupid."]

To take up the immediate problem: if you want to write to a Language Logger, you can go to the list of bloggers (on the right side of the main page, down a bit); click on the blogger's name, and you will be taken to their webpage, where there will be contact information.  Last I looked, this worked for everyone except John McWhorter (who does have a webpage, but it supplies a postal address and not an e-address).

(Many readers have written me to complain that the site is user-unfriendly, because this click-on-the-blogger feature isn't obvious.  Well, we don't WANT it to be instantly obvious; see below.  By the way, some readers have written me to complain that they're annoyed by the slightly coded spam-challenging e-mail address that used to be -- I no longer offer it -- at the end of my postings, because they have to type it out, rather than just clicking on it.  Sigh.)

BUT, much more important,  if you haven't figured this click-on-the-name thing out, you have search engines at your disposal.  Google Is Your Friend, as I keep telling people.  Search on "Mark Liberman", say (in whatever your favorite search engine is) -- I get more mail intended for him than for any other Language Logger -- and you'll quickly get to his webpage and to his contact information.  (And for the ecdb, just search on "eggcorn database", and that should get you a link to the ecdb main page as your top hit.  [Added 12/1: Elizabeth Daingerfield Zwicky points out that if you want to find out if an expression x is on the ecdb, you can just search on "eggcorn x", and that will get you results faster than searching for the site and then searching on the site.])

[Digression: why doesn't everyone just give their e-address on the site?  Because we (well, most of us) want to ALLOW e-mail but not to INVITE it.  If people have to do a few clicks and think a bit about what they want to say, that's a good thing.  There's no reason why we should be instantly available to everyone, all the time.

And if people could mail to all of us at once, that would be a disaster.  We'd be stuck in endless metadiscussions among ourselves about whether this item deserves a posting and who should do it.  We have some of those discussions already, but if we ALL had to look at ALL of the suggestions people send us, the blog would quickly grind to a halt; there are too many of them.  I myself get two to ten such suggestions a day, on top of ten to twenty (some days, a hundred or so) responses to my earlier postings.  There's no way I can give a useful reply to all of these messages and still have a life outside of Language Log.  Hey, I have more than three thousand messages in my inbox as it is, and I have a more-or-less constant queue of 250 postings in preparation, that is, partly written.  Still more I do not need.

I know, you're going to point to Mark Liberman's astonishing productivity on this blog.  Mark's ability to respond rapidly, cogently, and at length just boggles my mind.  I work much more slowly, chewing things over for some time; even short postings often take me six to eight hours to compose, and some of my longer postings have taken twenty to forty hours.  For me, it's much like academic writing, except that I'm writing squibs rather than extended articles, and I'm writing for a general audience rather than a specialist audience.  In any case, there are differences in personal work styles here, so I'm asking you not to judge me by the Liberman Standard.  End of digression.]

On to a related topic.  Some time ago a colleague complained to me, on a Monday, that over the weekend they had wanted to call me at home about a project we were working on but didn't have my telephone number.  (This colleague isn't the only one; I've gotten such complaints a number of times.)  I was astonished.  If this colleague was at Stanford, all the contact information is posted outside my office door.  And my number is in the telephone book.

Ok, maybe this colleague wasn't at Stanford and (like so many people these days) didn't HAVE a phone book.  Well, there's Google again (or just the Stanford Linguistics website), which will get you to my website.  All my contact information is there, available to the world.  Why did it not occur to this colleague to check me out on the InterWeb?

Enough for now.  More, less clear-cut, complaints to come.

Posted by Arnold Zwicky at 01:35 PM

Name Crime

The case of the British teacher in the Sudan who has been convicted of blasphemy for allowing her students to name their teddy bear "Mohammed" has received much attention, as it deserves to, including a good summary by our own Geoff Pullum. As an advocate of freedom of speech, freedom of religion, and freedom from religion, I am appalled that a crime of blasphemy should even exist. What is linguistically interesting is the variation within the Muslim world in the appropriateness of using the name Mohammed. In most of the Muslim world this is a common name for boys, but in Turkey it is considered inappropriate to give a boy the same name as the prophet. In Turkish, the name Muhammed, which closely reflects the Arabic original, is reserved for the prophet. If you want to name your son after the prophet, you must name him Mehmet, the Turkicized form.

Posted by Bill Poser at 01:08 PM

On explicitness and discourse markers

It seems that Language Loggers have done a lot of posting about Thanksgiving this season. The holiday is now over but I have one more thing to say about the "thank-you" part of Thanksgiving (however it's pronounced). Last week I  got a thank-you message from a young lawyer I had helped defend at his criminal trial back in 2001. I've consulted in some 500 cases over the years but I  don't remember ever before receiving a thank-you for my efforts. I've posted about thanking in the past. It seems that we may be forgetting how to use this speech act.

This young lawyer, Brian Lett, had just graduated from law school and his first client was a telemarketer who had some serious legal problems. The police had made tape recordings of two undercover conversations that led them to think that Lett had conspired with his client to obstruct justice. He was accused of trying to steal investigative records from the U.S. Postal Service and was also charged with trying to use his influence to get the prosecutor removed from the case.

Here's the story. It was early in the telemarketing case and the government had not yet turned over the discovery records that Lett needed to defend his client properly. Lett's trouble began when an undercover agent, posing as a Visa investigator, telephoned him. This conversation was the evidence used against him (Lett was not on the second tape). The agent began by telling Lett that copies of those needed files had been made. Then he asked Lett if he'd like to have a copy of them, wording it oddly: "Do you want those retrieved?" Of course Lett wanted them. To him (and I would guess to most of us) the dictionary definition of "retrieved" is discovered, brought in, or recovered from storage. The agent's question contained nothing to suggest that the records would be stolen or otherwise obtained illegally.

Next, the agent upped the ante, saying: "Those things, they gotta disappear. If you don't get rid of 'em all, then you're gonna be stuck." To this, Lett replied: "The huge thing now is the prosecutor." What did the agent's "get rid of 'em all" mean to Lett? As a lawyer, it was his job to get rid of the charges. He hadn't seen the evidence yet and he wanted the prosecutor to turn over the required discovery material so that he could learn the basis of the charges against his client. Picking up on Lett's mention of the prosecutor, the agent then asked, "What do you think the best thing to do with him?" Lett didn't catch the agent's opening to have the prosecutor removed from the case and so he responded: "Just get him to stop doing what he's doing."

At trial the investigator claimed that Lett's observation about getting the prosecutor "to stop doing what he's doing" meant that he wanted that agent to use his influence to do something far stronger than to stop the prosecutor's continuing efforts to block the discovery process. His inference was not supported by the actual language used. But the agent didn't stop there. He tried again: "It would take somebody puttin' pressure on him and then somebody actually obtaining the files that he's got," to which Lett said that this sounded good, if it can be done. "Obtaining" seemed harmless enough and "somebody puttin' pressure on him" could be understood to mean somebody putting pressure on the prosecutor to turn over the files.

To this point the agent had been trying to get Lett to incriminate himself. He had already used "obtain the files" and "retrieve the files" and this hadn't worked well so he decided to be more explicit by using the dreaded S-word, "steal." In undercover cases it's usually prudent first to get the target to implicate himself. But when that fails, you have to use a more direct approach. So the following exchange then took place (discourse markers highlighted):

Agent: Okay, well let me tell you something now. Uh, we're talking about, we, we're talking about, I got somebody stealing these files.
    (short pause)
Lett: Oh!
Agent: They have to be destroyed. You understand that, don't you? Because they don't exist no more.
Lett: But who, who, where are they  going then?
Agent: I thought you wanted them at your office.
Lett: Yeah, I do, but, I mean, okay, I just wanted to know and then---
Agent: (interrupting) Didn't your client say he needed to go through them or something?
Lett: No. We can, I mean, I don't know, I don't think it's, you know, as long as it's done, done with, then that's fine.

When the agent finally uses the explicit verb, "steal," Lett replies with the discourse marker, "oh," an indication that this is new information in which the speaker is undergoing some kind of change in his current knowledge, information awareness, or orientation. His "but" discourse marker indicates a contrast, a challenge, or a disagreement. His "okay" marker is also significant, for this discourse marker of clarification indicates that Lett was about to clarify his intentions. The agent blocks this by interrupting Lett before he can finish explaining his surprise, challenge or desire to clarify.

In my book, Creating Language Crimes (Oxford U Press, 2005) I describe eleven powerful conversational strategies sometimes used by law enforcement in criminal investigations. Three of these occur in this case: (1) camouflaging illegality; (2) interrupting at the point when a target is leading up to an exculpatory clarification; and (3) employing the hit and run strategy after first suggesting something potentially illegal. First, the agent camouflages the illegality with words like "retrieve" and "obtain." Failing to catch his prey in this, the agent then resorts to the explicit use of "steal," which might have worked if Lett had any criminal intent in the first place. His response of surprise, "Oh!" indicates that he didn't have that scenario in mind at all. Growing bolder, the agent then says the records have to be "destroyed." Lett is still in the dark and asks where the records are going, hardly words of agreement. Finally, when Lett stumblingly says, "I just wanted to know and then--" the agent calls on the old favorite conversational strategy of interrupting the target before he can say anything that might be exculpatory, then follows with the hit and run strategy of changing the topic to Lett's desire to see the files. By then, Lett appears to be thoroughly confused, concluding that if it's "done with, then that's fine." The remaining issue is what Lett means by "done with." If it's true that the files no longer exist, his client's case is in good shape -- a defense lawyer's dream.

Lett was acquitted at trial. After I gave my testimony, I flew home and never heard from him again until last week, when his thank-you message popped up in my in-box. He told me that he is now a successful lawyer, practicing law in Canada, where he now lives with his wife and new baby. Thank-you messages can do a great deal for both the writer and the receiver. We ought to be doing more of this. It certainly made my Thanksgiving pleasant.

Posted by Roger Shuy at 12:36 PM


For those interested in language/gender issues in general, and the talkativeness debate in particular, a new quantitative review has just come out: Campbell Leaper and Melanie Ayres, "A Meta-Analytic Review of Gender Variations in Adults' Language Use: Talkativeness, Affiliative Speech, and Assertive Speech", Personality and Social Psychology Review, 11(4), 328-363 (Nov. 2007).

The abstract:

Three separate sets of meta-analyses were conducted of studies testing for gender differences in adults' talkativeness, affiliative speech, and assertive speech. Across independent samples, statistically significant but negligible average effect sizes were obtained with all three language constructs: Contrary to the prediction, men were more talkative (d = -.14) than were women. As expected, men used more assertive speech (d = .09), whereas women used more affiliative speech (d = .12). In addition, 17 moderator variables were tested that included aspects of the interactive context (e.g., familiarity, gender composition, activity), measurement qualities (e.g., operational definition, observation length), and publication characteristics (e.g., author gender, publication source). Depending on particular moderators, more meaningful effect sizes (d > .2) occurred for each language construct. In addition, the direction of some gender differences was significantly reversed under particular conditions. The results are interpreted in relation to social-constructionist, socialization, and biological interpretations of gender-related variations in social behavior.
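To get a feel for why effect sizes like these are called negligible, here is a rough sketch that converts a value of d into the chance that a randomly chosen member of the higher-scoring group outscores a randomly chosen member of the other group. It assumes two normal distributions with equal variance, which is an assumption of this illustration and not something the abstract states; the function name is hypothetical.

```python
from math import erf

def probability_of_superiority(d):
    """Chance that a random draw from the higher group beats a random draw
    from the lower group, assuming two equal-variance normal distributions
    whose means are |d| standard deviations apart (an assumption of this
    sketch). Equals Phi(|d| / sqrt(2)), written here with erf."""
    return 0.5 * (1 + erf(abs(d) / 2))

# The three average effect sizes from the abstract, plus the d = .2 threshold.
for label, d in [("talkativeness", -0.14), ("assertive", 0.09),
                 ("affiliative", 0.12), ("threshold", 0.2)]:
    print(f"{label:14s} d = {d:+.2f}  P = {probability_of_superiority(d):.3f}")
```

Under those assumptions, d = .14 works out to roughly a 54% chance, not far from a coin flip.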

Posted by Mark Liberman at 11:02 AM

Emergent morphology and informative tautologies

One of the emerging political scandals of the day involves some fancy financial footwork in the NYC mayor's office under Rudy Giuliani -- the costs of police protection for travel associated with his extramarital affair with Judith Nathan (now his wife) were apparently hidden in the budgets of various apparently unrelated city offices. ($34K to the New York City Loft Board, $10K to the Office for People With Disabilities, $30K to the Procurement Policy Board, and $400K to the Assigned Counsel Administrative Office.)

Over at the political blog Talking Points Memo, Josh Marshall has been having an inordinate amount of fun with this one, indulging in punning headlines ("Raking the shag") and a flurry of neologisms like these (emphasis added):

On the Rudy shagonomics story, the details are even richer than the headline. Not only did Rudy pick obscure public agencies to bill for his trips out to hang in the Hamptons with Judy Nathan, he seemed to pick them to guarantee the maximum impression of tastelessness and chutzpah should he ever be found out.

Admittedly he only charged $10,000 to the people with disabilities fund. Chump change for the shag fund. But the office charged with getting counsel for indigent defendants got stuck with $400,000.

and these:

Tuned in late here to the Youtube debate. I was thinking Rudy's big problem tonight was the govt-funded shagoramas with then-mistress Judith Nathan. But I think that this answer about gun control might be a bigger problem.

In related linguistic news, Giuliani's explanation is an excellent example of an apparent tautology that is actually informative -- in this case, on several levels at once:

"And they took care of me, and they put in their records, and they handled them in the way they handled them."

Posted by Mark Liberman at 08:43 AM

Arom on polyrhythms

Here's a sort of prequel to my post on "Rock syncopation: stress shifts or polyrhythms?" (11/26/2007). It consists mostly of some notes and quotes from Simha Arom's African Polyphony & Polyrhythm, Cambridge University Press, 1994 (originally published as Polyphonies et Polyrythmies Instrumentales d'Afrique Centrale, SELAF, 1985). The point is to help those of us trained in European metrical traditions to understand some other ways to think about meter and rhythm. My own interest comes from attempts to understand the ways that words fit into American popular and folk music of the past century or so.

The stuff in red is quoted from Arom's book. The rest is my commentary, summary and exemplification.

Some of Arom's terminology (p. 230):

The period provides a temporal framework for rhythmic events. It is invariably composed of whole numbers. These numbers are usually even, i.e. divisible by two ... This means that the structure of the period is symmetric. The constituents of this structure are pulsations.

The pulsation is an isochronous reference unit used by a given culture for the measurement of time. It consists of a regular sequence of reference points in relation to which rhythmic events are ordered. Moreover, in polyrhythmic music, the pulsation is the common denominator, from the standpoint of musical organization, for all the parts in a piece. ...

The pulsation is, however, only rarely given material existence in this region of Central Africa. While it can always be given the material form of handclaps, it is nevertheless usually implicit.

The pulsation can be subdivided in three different ways: binary when it is split into two or four equal parts; ternary when it is split into three or, rarely, six equal values; composite when it is split, in a combination of the two preceding ways, into five equal values. ...

The minimal operational value is the smallest relevant duration obtained after subdivision; all other durations are multiples of this value. The period is thus equal to the total number of these values. A period based on twelve pulsations will then contain thirty-six operational values in the case of ternary, and twenty-four (or forty-eight) in the case of binary, subdivision of the pulsation. ...
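The arithmetic in that last passage is just pulsations times subdivisions. A minimal sketch, with a hypothetical function name and a lookup table that simply restates the subdivision options quoted above:

```python
# Ways of splitting one pulsation, as listed in the quoted passage.
SUBDIVISIONS = {"binary": (2, 4), "ternary": (3, 6), "composite": (5,)}

def minimal_values(pulsations, parts_per_pulsation):
    """Number of minimal operational values in one period."""
    return pulsations * parts_per_pulsation

# A period of twelve pulsations under each kind of subdivision.
for kind, options in SUBDIVISIONS.items():
    print(kind, [minimal_values(12, parts) for parts in options])
```

For twelve pulsations this reproduces the thirty-six (ternary) and twenty-four or forty-eight (binary) values given in the quotation.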

The preceding sections have dealt with the metric organization of the period as a temporal framework for rhythmic events. In African music, however, several rhythmic events are usually found to occur simultaneously. This is what we call polyrhythmics  ... [T]he superposed rhythmic figures in a polyrhythmic context are of varying lengths, yet always stand in simple ratios, such as 2:1, 3:1, 3:2, 4:2, and multiples thereof.

Here's a discussion of one particular class of rhythmic figure (p. 246):

One particular form of asymmetry which is very frequently found in Central Africa may be called rhythmic oddity. When the number of pulsations in the periods involved is divided by two, the result is an even number. The figures contained in this period are nevertheless so arranged that the segmentation closest to the middle will invariably yield two parts, each composed of an odd number of minimal values, wherever the dividing line is placed. These figures are always constructed by the irregular juxtaposition of binary and ternary quantities. The resulting rhythmic combinations are remarkable for both their complexity and their subtlety. They follow a rule which may be expressed as 'half - 1/half + 1'. [...]

The figure with eight minimal values (i.e., containing two binary pulsations) ... is articulated as follows: 3/3.2 = 3/5 = 4-1/4+1.

This, of course, is nothing other than the familiar habanera rhythm, discussed in my earlier post:

 x - - - x - - -|x - - - x - - -
|1 2 3 4 5 6 7 8|1 2 3 4 5 6 7 8|
 o     o     o   o     o     o
 o     o         o     o

Note that the two binary "pulsations" (indicated with x's in my schematic above) are implicit in this figure, though they are where a musician from this tradition would mark time with hand-claps -- and they are positions that might well wind up marked by notes in a different figure performed simultaneously. (See my earlier post for an example.)
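For anyone who wants to draw schematics like the one above for other figures, here is a minimal sketch. It takes a figure written in the dotted notation (e.g. "3.3.2") and the number of minimal values per pulsation, and prints a pulsation row and an onset row. The function names are hypothetical, and the layout is only an approximation of the hand-made diagram.

```python
def onsets(figure):
    """0-based onset positions for a figure such as '3.3.2' or '3/3.2'."""
    durations = [int(d) for d in figure.replace("/", ".").split(".")]
    positions, t = [], 0
    for d in durations:
        positions.append(t)
        t += d
    return positions

def render(figure, values_per_pulsation):
    """Two text rows: pulsations (x) over minimal values (-), then onsets (o)."""
    durations = [int(d) for d in figure.replace("/", ".").split(".")]
    period = sum(durations)
    hits = set(onsets(figure))
    pulse = " ".join("x" if i % values_per_pulsation == 0 else "-"
                     for i in range(period))
    notes = " ".join("o" if i in hits else " " for i in range(period))
    return pulse + "\n" + notes.rstrip()

print(render("3.3.2", 4))      # the eight-value figure, binary pulsations
print(render("3.2.3.2.2", 3))  # a twelve-value figure, ternary pulsations
```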

A figure with a period containing twelve operational values arranged into four ternary pulsations ... may be segmented into 3.2/3.2.2 = 5/7 = 6-1/6+1.

 x - - x - - x - - x - -|x - - x - - x - - x - -
|1 2 3 4 5 6 7 8 9 A B C|1 2 3 4 5 6 7 8 9 A B C|
 o     o   o     o   o   o     o   o     o   o
 o         o             o         o

Here's a midi implementation of this schema -- for clarity, I've put the "minimal operational values" in the background. In Arom's examples, both the "minimal operational values" and the "pulsation" would be implicit, and the whole thing would be much faster -- typically the "pulsations" are around 150 per minute (2.5 per second, 400 msec. each), and thus in this case, the minimal operational values would be around 450 per minute (7.5 per second, 133 msec. each).
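The tempo figures just given follow from simple division. A small sketch (the function name is hypothetical) that turns a pulsation rate and a subdivision into durations in milliseconds:

```python
def timing(pulsations_per_minute, parts_per_pulsation):
    """Return (pulsation length in ms, minimal values per minute,
    minimal value length in ms)."""
    pulsation_ms = 60_000 / pulsations_per_minute
    values_per_minute = pulsations_per_minute * parts_per_pulsation
    value_ms = pulsation_ms / parts_per_pulsation
    return pulsation_ms, values_per_minute, value_ms

pulsation_ms, per_minute, value_ms = timing(150, 3)  # ternary subdivision
print(f"{pulsation_ms:.0f} ms per pulsation, {per_minute} values per minute, "
      f"{value_ms:.0f} ms per minimal value")
```

With 150 pulsations per minute and ternary subdivision, this gives the 400 msec. and roughly 133 msec. quoted above.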

A figure with four pulsations subdivided into sixteen minimal values ... has the following arrangement: 3.2.2/3.2.2.2 = 7/9 = 8-1/8+1.

 x - - - x - - - x - - - x - - -|x - - - x - - - x - - - x - - -
|1 2 3 4 5 6 7 8 9 A B C D E F G|1 2 3 4 5 6 7 8 9 A B C D E F G|
 o     o   o   o     o   o   o   o     o   o   o     o   o   o
 o             o                 o             o

And so on for some longer figures.
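One way to read the "rhythmic oddity" property is that the cycle can never be cut into two equal halves at note onsets, which is the same as saying that no two onsets lie exactly half a period apart. The sketch below checks that reading for the three figures above, taking the sixteen-value figure as 3.2.2/3.2.2.2 (which is what its 7/9 split implies). The formalization and the function names are assumptions of this sketch, not Arom's own wording.

```python
def onsets_and_period(figure):
    """0-based onsets and total length for a figure such as '3.2/3.2.2'."""
    durations = [int(d) for d in figure.replace("/", ".").split(".")]
    positions, t = [], 0
    for d in durations:
        positions.append(t)
        t += d
    return positions, t

def has_rhythmic_oddity(figure):
    """True if no onset has another onset exactly half a period away."""
    hits, period = onsets_and_period(figure)
    if period % 2:
        raise ValueError("this check expects an even number of minimal values")
    hit_set = set(hits)
    return all((t + period // 2) % period not in hit_set for t in hits)

def split_sizes(figure):
    """Sizes of the two parts on either side of the '/' in the figure."""
    first, second = figure.split("/")
    return tuple(sum(int(d) for d in part.split(".")) for part in (first, second))

# The last figure is an evenly spaced control that should fail the check.
for fig in ("3/3.2", "3.2/3.2.2", "3.2.2/3.2.2.2", "3.3/3.3"):
    print(fig, split_sizes(fig), has_rhythmic_oddity(fig))
```

The evenly spaced control (3.3/3.3) fails the check because its onsets sit exactly opposite one another in the cycle, while the three quoted figures pass.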

Since the "pulsations" subdividing the periods in these examples are all isochronous -- 4+4 or 3+3+3+3 or 4+4+4+4 -- should we think of these rhythmic patterns as dislocations of underlyingly evenly-spaced beats, in the style of Temperley's theory of "rock syncopation"? Arom suggests a different sort of generating process for this case:

The technique of rhythmic oddity is based on the principle of progressively inserting binary quantities into configurations bounded by ternary quantities. This is clear from the paradigmatic representation of how this principle applies:

Cycle of 8 minimal values     3 / 3.2
Cycle of 12 minimal values    3.2 / 3.2.2
Cycle of 16 minimal values    3.2.2 / 3.2.2.2
Cycle of 24 minimal values    3.2.2.2 / 3.

The figure shown in ex. 36, however, applies an inversion of this principle, by inserting ternary quantities in configurations bounded by binary values. The twenty-four constituent minimal values are articulated thus:  2.3.3  2.3

Note that these patterns are not yet polyrhythmic -- they're examples of some of the basic figures out of which polyrhythmic music can be made.
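The generating process Arom describes for the first three cycles can be sketched as follows: each half starts with one ternary quantity and is filled out with binary quantities until the halves measure half - 1 and half + 1. This is a sketch of that reading only; the function name is hypothetical, and it makes no attempt to reconstruct the incomplete entries quoted above.

```python
def oddity_figure(cycle_length):
    """Build a 'half - 1 / half + 1' figure of the form 3.2...2/3.2...2.2."""
    half = cycle_length // 2
    if cycle_length % 2 or half % 2 or half < 4:
        raise ValueError("expects a cycle whose half is an even number >= 4")

    def segment(length):
        # One ternary quantity followed by as many binary ones as fit.
        return [3] + [2] * ((length - 3) // 2)

    first, second = segment(half - 1), segment(half + 1)
    return ".".join(map(str, first)) + "/" + ".".join(map(str, second))

for n in (8, 12, 16):
    print(n, oddity_figure(n))
```

Each step up from one cycle to the next adds binary quantities between the ternary boundaries, which is the "progressive insertion" that the quotation describes.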

The simultaneous performance of different rhythmic figures engenders a polyrhythmic block or formula. Since each figure has its own period, but all the periods stand in simple ratios, the period of a polyrhythmic formula will always be the period of the longest figure. (p. 277)

One example of such a polyrhythmic formula, provided by Arom, is the scheme for the mò.kóndí dance of the Aka people, "the music for a ritual dance used to consecrate a new campsite" (p. 289 ff.).

We may remark in passing that the Aka musicians generally perform three different rhythmic figures on a single two-headed skin drum lying on the ground: the è.ndòmbà ['child'] part is played on the narrower end, and the ngúé ('mother') part on the wider end. The two drummers straddle the drum, sitting with their backs turned. The dì.kpàkpà [hard wooden sticks] player crouches between them facing the drum, and strikes the barrel in the middle.

The instrument known as dì.kétò consists of striking together two iron strips.

The four parts of mò.kóndí are interwoven in a way schematized below (in a simplified form in which only the accented notes of each part are shown).

            1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
pulsation:  x  -  -  x  -  -  x  -  -  x  -  -  x  -  -  x  -  -  x  -  -  x  -  -
è.ndòmbà:   o        o     o  o     o  o        o        o     o  o     o  o
dì.kpàkpà:  +     +        +     +        +        +        +     +        +
ngúé:       ^        ^  ^        ^  ^  ^        ^        ^  ^        ^  ^  ^
dì.kétò:    *     *  *     *     *     *     *     *     *  *     *     *     *

The dì.kétò part is one of those "rhythmic oddity" patterns, configured as 3.2.2.2.2.2/3.2.2.2.2 = 13/11 (where the 3s are further divided as 2.1).
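The 13/11 split can be read directly off the dì.kétò row of the schematic. In the sketch below, the onset positions are transcribed by hand from that row (so any transcription slip is an error of this sketch and not Arom's), and the helper names are hypothetical. It computes the gaps between onsets, folds each 2.1 pair back into a ternary 3, and starts a new segment at each 3.

```python
# 1-based positions of the '*' marks in the dì.kétò row, transcribed by hand.
DIKETO = [1, 3, 4, 6, 8, 10, 12, 14, 16, 17, 19, 21, 23]
PERIOD = 24

def durations(onsets, period):
    """Gaps between successive onsets, wrapping around the cycle."""
    following = onsets[1:] + [onsets[0] + period]
    return [b - a for a, b in zip(onsets, following)]

def merge_2_1(durs):
    """Collapse each 2 followed by 1 into a single ternary quantity 3."""
    out, i = [], 0
    while i < len(durs):
        if durs[i] == 2 and i + 1 < len(durs) and durs[i + 1] == 1:
            out.append(3)
            i += 2
        else:
            out.append(durs[i])
            i += 1
    return out

def segments(durs):
    """Start a new segment at every ternary quantity."""
    segs = []
    for d in durs:
        if d == 3 or not segs:
            segs.append([])
        segs[-1].append(d)
    return segs

parts = segments(merge_2_1(durations(DIKETO, PERIOD)))
print("/".join(".".join(map(str, p)) for p in parts),
      "=", "/".join(str(sum(p)) for p in parts))
```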

Note that there is no direct connection between this culture and the polyrhythmic aspects of American popular music, aside from some very recent effects. African musical traditions are extraordinarily diverse. The Aka people are hunter-gatherers of the Central African Republic, whereas the musical traditions that influenced American music would have come mainly from agricultural and city-based cultures in West Africa and Angola. (And perhaps especially through Lagos, via the 19th-century Lagos/Bahía/Havana/New Orleans pathways discussed by J. Lorand Matory in "The English Professors of Brazil: On the Diasporic Roots of the Yorùbá Nation", Comparative Studies in Society and History, 1999.)

However, the patterns and formulae from Central Africa illustrate some general principles of African (and perhaps American) polyrhythmics. Quoting Arom again (p. 211):

In most traditional African music, time is organized according to the following principles:

(1) A strictly periodic structure (isoperiodicity) is set up by the repetition of identical or similar musical material, i.e. with or without variations.
(2) The isochronous pulsation is the basic structural element of the period. Whether the figures it contains are binary or ternary or a combination of these, the period is defined by the invariant number of pulsations which constitutes its temporal framework.
(3) There are no regular accentual matrices. The pulsations or beats on which the period is based all have the same status. There is thus no intermediate level (measure, "strong" beats) between the pulsation and the period.
(4) The pulsation is not necessarily materialized.

It is easy to confirm that these principles exist. It will suffice to ask a member of a given culture to superpose hand-claps on a piece of music from that culture. Whether the experiment involves one or more informants, at brief or longer intervals, we have found that the results are always the same:

- the beat is isochronous and perfectly regular: the beats always fall at the same points in the melodic and/or rhythmic material contained in the period.
- concomitantly, however this material may be distributed, it always reappears after exactly the same number of beats.

These essential principles of all measured Central African music, both monodic and polyphonic, also seem to govern most measured music throughout sub-Saharan Africa.

Perhaps these principles also apply to a certain part of American popular music since 1890 or so, or at least to one available way of conceptualizing some of it. When we think about the patterns of tune-text alignment in that music, we should keep this in mind.

Posted by Mark Liberman at 06:50 AM

November 28, 2007

Linguistic crime report: bear nomenclature flogging threat

The Linguistic Crimes desk at Language Log is closely following the case of Gillian Gibbons. For those who have not heard about her crime, she is a 54-year-old schoolteacher from Liverpool, England, who has been charged with permitting a child to name a toy bear. In case this still sounds a bit baffling, let me clarify: the charge is "insulting religion, inciting hatred and showing contempt for religious beliefs". Perhaps I should give slightly fuller details of her brutal crimes against God and man?

Mrs Gibbons was in charge of an elementary class at a private school. She used a toy teddy bear as a prop in various classes on writing, letting students take the bear home and getting them to write an account (as if written by the bear) about his experiences. There was a shared exercise book with a picture of the bear on the cover. Clearly, the bear needed a name. Mrs Gibbons suggested "Faris". The class disagreed, and accepted the choice of a class member who wanted the bear to have the name of a boy in the class. She agreed to that, and so the bear had a name: Mohammed.

No parents of students in the class (class 2X at the Unity High School in Khartoum, Sudan) complained or felt offended (see this article by a teacher at the school). But it is forbidden under Islamic laws to make an image of the prophet Mohammed in any kind of medium. (Yes, I know, the picture was of Mohammed the bear named after Mohammed the boy; it was not of Mohammed the prophet. But please don't interrupt me; I'm just trying to tell this story the way it has apparently unfolded.) Eventually, although the bear was removed from circulation and replaced by some other soft toy, government officers from the Ministry of Education came calling and asked to interview Mrs Gibbons. Then they took her to the police station and grilled her for another five hours.

The school assured them that the teacher was entirely innocent of anything but a slight and understandable error of cultural misunderstanding. But it was to no avail. She has now been charged with blasphemously defaming the Prophet, and faces a punishment of 40 lashes or jail time or a large fine. Top clerics including the Sudanese Assembly of the Ulemas insist that she is part of a Western plot against Islam and must suffer the full punishment laid down by Islamic law. The school has been temporarily closed, and other teachers who work there are travelling in vehicles from which the school's name has been removed. Angry Muslims are gathering outside the police station where Mrs Gibbons is imprisoned.

Keep these facts in mind: (1) it's a bear, not a pig; (2) some years ago the Islamic Society sold a soft toy made for British Muslim children named Adam the Prayer Bear (Adam is also the name of a Muslim prophet); (3) Mrs Gibbons did not name the bear or advise on the name; (4) she actually recommended a different name; (5) it was a Muslim boy in her class who insisted on "Mohammed"; (6) he was naming the bear after another boy, not after the revered prophet; (7) "Mohammed" (in any of its scores of different romanized spellings) is the single commonest male personal name in the world, and the second most popular in Britain; (8) nobody has suggested that Mrs Gibbons either intended or perceived any possible religious implications; (9) everyone at the school where she works has defended her on those grounds; (10) the Muslim Council of Britain has also defended her, calling the arrest and charges "disgraceful".

Here on the Linguistic Crimes desk we try to highlight the lighter side of language offenses: the zany character of victimless criminality that amounts to no more than uttering strings of letters or syllables, the mad asterisking of words too awful to print, the giggleworthy character of loony attempts to suppress free speech. But we are finding it a bit hard to develop a humorous hook for this story, coming to you as it does from the home of the regime that is raping and slaughtering the people of Darfur. Perhaps we should be seeing a ribald vein of comedy in this story of a place where middle-aged male religious leaders insist that women should dress so modestly as to be almost invisible in public, but those same guardians of morality salivate at the thought of giving a woman forty lashes of the whip in private. But we just aren't seeing it; we just cannot get into the right mindset to see the laughs in this one.

News source links:
Times OnLine;
BBC News;
CNN World News;
Guardian op-ed

Posted by Geoffrey K. Pullum at 02:23 PM

Thanksgiving: the Greek influence

Following up on our (already remarkably long) series on the stress pattern of thanksgiving, Kimberly Belcher writes:

I unfortunately don't have time to test this hypothesis at the moment (I think testing would start at Early English Books Online), but I suspect that the word "thanksgiving" started as a non-compositional technical term in English, that is, as a literal translation of Greek "eucharistia," eucharist. That would explain its limitation to "gratitude of a certain large-scale, ceremonial nature" and would tend to keep it quite distinct from "giving thanks." This is supported by the British, pre-holiday poems you cite, as they tend to have a doxological and even liturgical cast (notice the prevalence of rituals in the Joseph Beaumont excerpt, which would make me tend to think "Thanksgiving" there has a ritual meaning as well, i.e. eucharistic worship).

This translation into English (circumventing the loan-word "eucharist") may have been motivated by a desire, just before, during, and after the Reformation, to vernacularize Christian vocabulary. I have noticed a tendency in England in the 15-16th century, even among Catholics, to prefer vernacular translations in prayerbooks to Latin versions, even, in some cases, for prayers that the books' owners would undoubtedly have memorized in Latin. Expanding this observation to vernacularization generally would be an interesting research project which I might even do at some point.

The OED doesn't give this etymology, but it's true that the earliest citations given there are in appropriate contexts:

1533 TINDALE Supper of Lord Eivb, One or other Psalme or prayer of thankes giuyng in the mother tongue. 1539 BIBLE (Great) 1 Tim. iv. 4 For all the creatures of God are good, and nothing to be refused, yf it be receaued with thankesgeuynge. 1562 WINȜET Cert. Tract. iii. Wks. (S.T.S.) I. 29 Gyf sic zeirlie memorial in blythnes and thankisgeifing wes haldin.

1535 COVERDALE Ps. xxxix. [xl.] 3 He hath put a new songe in my mouth, euen a thankesgeuynge vnto oure God. 1552 Bk. Com. Prayer (heading), The Thankes geuing of Women after Childe birth.

And one later citation is given that makes the relationship explicit:

1708-22 J. BINGHAM XV. iii. (1845) 770 After this the priest went on with the εὐχαριστία properly so called, that is the great thanksgiving to God for all his mercies, both of creation, providence and redemption.

One small piece of contrary evidence is this use in Francis Bacon's The Wisdome of the Ancients (1609), where thanksgiving seems to mean simply "giving thanks" in a secular and individual sense:

There followes next a remarkable part of the parable, That men in steed of gratulation, and thanksgiuing, were angry, and expostulated the matter with Prometheus, insomuch that they accused both him and his inuention vnto Iupiter, which was so acceptable vnto him, that hee augmented their former commodities with a new bountie.

And EEBO similarly gives us a 1572 publication by Thomas Paynell, The moste excellent and pleasaunt booke, entituled: The treasurie of Amadis of Fraunce conteyning eloquente orations, pythie epistles, learned letters, and feruent complayntes, seruing for sundrie purposes. ... Translated out of Frenche into English, whose preface ("To the gentle Reader") ends like this:

Wherfore (gentle Reader) let it not lothe thée (I pray thée) to reade this fine and fruitfull booke, nor to ensue the honest and vertuous lessons, the prudent admonitions and good counsels of the same: for thou shalt not at any tyme (as I thinke) repent thée more for the reading of it, than I for the translating therof, the which although it be but rude and vnpleasant, yet my mynde and hand were neyther negligent nor slacke to profite thée, and to english it to thy consolation and comfort. Therfore receyue it, I pray thée, as it is, in good part and with thanksgiuing for my good will and paines taking, if thou estéeme it thankes worthie, if not, amende it I beséeche thée, and I with all my heart shal thanke thée nowe and euer.

All the same, if thanksgiving was a "non-compositional technical term" from the beginning, that would give us a place to start in figuring out why its stress pattern seems from the beginning to have been different from that of other English compounds of the same form. But I can't think of any other examples of similar compounds with second-syllable stress, and it remains unclear when and how the American south and most of contemporary Britain ended up with first-syllable stress in this word.

Posted by Mark Liberman at 08:15 AM

November 27, 2007

Cupertino sewage

It started yesterday with Dan Goodman quoting the following in the American Dialect Society mailing list:

From a post in the Minneapolis Freecycle mailing list:

"I have separated and will be forcing from my abusive husband."

Forcing for divorcing, obviously.  A strange eggcorn?  Or what?

Kate Daly figured it out.  It was a very familiar type of error:

WordPerfect and Word both offer forcing as a correction for the typo fivorcing. Since the d and f are next to each other on a standard keyboard, it's probably an auto-corrected typo. Given the topic of her post, I think she can be excused for not checking her work more carefully.
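Kate Daly's account can be illustrated with a toy sketch. What follows is not how WordPerfect, Word, or any real spellchecker works internally (their algorithms and word lists are not described here); it simply assumes a checker that suggests the dictionary word with the smallest edit distance from the typo, and a hypothetical word list, TOY_WORDS, that happens to lack divorcing. The names edit_distance, suggest, and TOY_WORDS are invented for the illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the fewest single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# Hypothetical toy dictionary. ASSUMPTION: it does not contain "divorcing".
TOY_WORDS = ["forcing", "flooring", "divorce", "voicing"]


def suggest(typo: str, words: list[str]) -> str:
    """Return the word in `words` closest to `typo` by edit distance."""
    return min(words, key=lambda w: edit_distance(typo, w))


if __name__ == "__main__":
    # With the intended word missing, the nearest neighbour wins.
    print(suggest("fivorcing", TOY_WORDS))
    # Once the intended word is available, it is the closest candidate.
    print(suggest("fivorcing", TOY_WORDS + ["divorcing"]))
```

Under these assumptions the typo is two edits away from forcing and only one away from divorcing, so which of the two gets offered depends entirely on what the checker's word list contains and how it ranks candidates.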

For some time now we've been cataloguing examples of the "Cupertino effect", in which automated spellcheckers yield extraordinary results, as in the "correction" of cooperation (without  a hyphen) to Cupertino that gave the phenomenon its name.  (Well, mostly Ben Zimmer has been collecting them.  See his recent OUP blog entry on the Cupertino effect and his most recent Language Log posting on the subject, on the Muttonhead Quail Movement.)

I'm not a Cupertinologist (pester Ben with your examples, not me), but I very much enjoyed the follow-ups.  First, from Charlie Doyle:

One of the many, many reasons I preferred WordPerfect to MSWord is that every time I typed the word "folksong," WP would tell me I probably meant to say "filching."

Then from Ron Butters, with (in my opinion) the very best so far:

My Blackberry consistently tries to correct "Zwicky" to "Sewage." I hope Arnold will forgive me if that one slips by by mistake.

You can see parts of the program at work here: the W is preserved; before it a Z that could represent an S; after it a K that could represent a G; who knows about the vowel letters.

Posted by Arnold Zwicky at 07:17 PM

In the wake of ThanksGIVing

You never know which ones are going to get around ... The New York Sun gives me the challengingly vague request to "make this week's column about Thanksgiving," I gamely wangle something or other and assume barely anyone will read it since they'll be in transit to join relatives for ThanksGIVing, and much to my surprise the column ends up the occasion for discussion by people other than roughly me and my wife.

Well, Mark, I take your point, of course.

The particular fact that Thanksgiving with stress on the second syllable was already especially common in poetry as far back as the 17th century, whereas it was less so with other compounds such as SHOEMAKER, suggests that the stress pattern had already shifted when the holiday began.

However, I presume that we would rather not just attribute this to mere chance. Rather, the stress shift suggests that the word itself had already drifted from being processed as a strictly literal conjunction of the two words THANKS and GIVING.

That is, the compound word that furnished the name of the holiday was one which had already taken on what linguists would call a noncompositional meaning. From the poetic citations in Mark's post it would seem that THANKSGIVING referred to gratitude of a certain large-scale, ceremonial nature as opposed to simply thanking someone for picking a hair off of your sweater.

However, I do wonder if part of what we see in poetry is due to the word THANKSGIVING, with its ceremonial air, being used more in the poetic register than SHOEMAKER and its like. If so, the word would have been more likely than others to submit to distortion of stress that poets have always considered their license.

I think of examples where poets render compounds in ways that are most certainly foreign to actual speech, rather than reflecting dialectal or idiolectal variation.

Take this passage from Dr. Seuss's On Beyond Zebra, where NOSE-PATTING is rendered with second-syllable stress:

And ZATZ is the letter I use to spell Zatz-it
Whose nose is so high that 'most nobody pats it
And patting his lonely old nose is the least
That a fellow could do for this fine friendly beast
So, to get there and do it, I built an invention:
The Three-Seater Zatz-it Nose-Patting Extension.

Or this from Scott Joplin's opera Treemonisha, in which the stress in booze drinkin' is on the first syllable of drinkin':

It will drive the blues, I'm thinkin',
And will stop Ned from booze drinkin'.

Or this from the recently defunct Broadway musical GREY GARDENS, in which "Big Edie" Beale sings, in the song "Jerry Likes My Corn," about hearing "the old sad SACK sing," whereas in speech it would be "the old SAD sack sing":

Jerry doesn't fight like two fishwives.
Jerry likes relaxing.
Now and then we play my 45's.
Hear the old sad sack sing.

In fact, in this lyric fishwives is pronounced with the stress on wives rather than fish as it would be in speech. The song is brilliant overall, but this small section happens to have bad scansion.

If in fact THANKSGIVING was submitted to this more often than other words, it may well mean that the word was actually widely pronounced with second-syllable stress as early as the 17th century. However, this is the kind of thing that leaves one regretting that we only have a century-and-change's worth of recordings of the human voice.

For whatever it's worth, recorded in my head is that my Philadelphia-born father said "thanks-GIV-ing," whereas my Atlanta-born mother said "THANKS-giving."

Posted by John McWhorter at 12:46 PM


On the op-ed page of the NYT on 11/24/07, L. Jon Wertheim lamented the decline of old-style pool hustling, a decline (Wertheim argues) set off by a series of events, culminating in the establishment of a professional pool tour in 2005, which blew the hustlers' cover by exposing them to public view and then disintegrated financially.  Wertheim writes:

The first three events were smashing successes.  But in keeping with the circadian rhythms of pool, the boom times didn't last.

Whoa!  The circadian rhythms of pool?

I can see what Wertheim was trying to say -- that the rhythms of pool are cyclic (though it hadn't occurred to me that the world of pool HAD a rhythm) -- and I can guess that Wertheim, and no doubt many others, got to the meaning 'cyclical' for circadian from hearing or reading the word in context, not realizing that the word is used as a technical term for a very specific kind of cycle, namely one that is about a day (24 hours) long.

(Biologists built on Latin for this term: circa 'about, approximately' plus dies 'day', as in diary and diurnal -- note the two different senses of 'day'.  That was a nice choice, but you really can't expect ordinary people to appreciate the etymologies of lexical items, whether technical or not.  Etymology is not destiny; if it were, learned societies would be misspeaking if they got their journals out less often than every day.)

The fact is that ordinary language is pressed into service in a number of ways to provide technical vocabulary, which then has a very specialized meaning in certain contexts, and at the same time technical vocabulary "leaks out" into ordinary language.  People get the general drift of the technical vocabulary, but (usually not knowing either the etymology OR the context of its technical use) do their best to interpret what they hear.

And they get a lot of it wrong, from the point of view of people in the technical fields.  Epicenter obviously refers to a location (of an earthquake) -- to, in some sense, the central point where the earthquake took place.  Besides center, there's an extra element epi-, which clearly must contribute something.  So the epi- adds extra stuff, probably something emphatic: the epicenter is, people reason, the EXACT center.  (Technically, it's the location on the earth's surface OVER the place where the earthquake event happened, underground.)  Now, getting all enraged about the common-language use of epicenter for the central point of an event -- it seems to be standard now -- is just as silly as getting all enraged about the common-language use of vegetables to refer to tomatoes, zucchini, peppers, eggplants, etc., all of which are technically fruits in one scheme of biological terminology.

These two cases run opposite to one another.  For epicenter -- and circadian -- the specialists chose the terms, and then ordinary people naturalized them.  For fruit -- and mass, and normal, and many thousands of other technical terms -- the specialists recruited ordinary-language terms, and non-specialists had to try to interpret them in a new context.

But the result is in a way the same: a disparity between technical and ordinary-language understandings of the same expressions.  The larger point is that neither is RIGHT; expressions don't come with deep, essential, eternal meanings.  The different meanings are simply relevant in different contexts.

We're drawn up short when we confront someone saying something we don't understand, or something we can interpret but is not anything we would say, or even anything we recall having heard before.  Social life is full of little surprises and differences in the way people act.  Mostly, we accommodate, doing our best to divine the intentions of the people we're dealing with.

Wertheim's use of circadian rhythm was new to me (please don't write me with citations of earlier instances; I'm not writing about the history of English usage in this case, but about my own experience), but I figured it out, as I assume his readers generally did.  I'd guess that the usage is not yet standard, though on the rise.  I wouldn't use it, but that's just me.

[And now from Darrin Edwards, the entertaining suggestion that Wertheim might have had cicadas -- with their long and regular periods of emergence -- in mind (maybe way in the back of his mind) when he wrote "circadian rhythm".  Edwards searched on {"cicadian rhythm"} and got some hits referring to cicadas; maybe this is an emerging, so to speak, eggcorn.  (None of my dictionaries gives an adjective derived from cicada -- not cicadan, not cicadian, not anything else -- by the way.)]

[Further from Simon Overall, who says he thought for years that circadian rhythms totally had to do with cicadas, and notes that his non-rhotic variety of English probably encouraged this misperception.]

[Still further, 11/29/07: Mark Liberman points out that some people think the insects are circadias; you can google up a few dozen examples.  So the reshaping goes in both directions.]

Posted by Arnold Zwicky at 10:21 AM

Unfolding in the passive tense

What is it with these people who think the passive voice has something to do with tense? Where do they get this stuff? Jim Salant has pointed out to me that Greg Grandin, a historian at New York University, has reviewed two books about Henry Kissinger in the current London Review of Books (see it here), and he says (fourth paragraph back from the end):

For Suri, Kissinger’s ‘career, like the American Century as a whole, unfolded in the passive tense.’ Both were ‘deeply affected — sometimes distorted — by external factors[...]’

This does not seem to be due to any grammar book. I do not know of any grammar book that confuses voice and tense. All grammarians talk about tense as the contrast between present (writes) and preterite (wrote), but about voice as the contrast between active (wrote) and passive (was written). But Language Log has found mentions of a mythical passive tense in The Economist, and in a confessional on a NaNoWriMo forum, and on National Public Radio... And now The London Review of Books joins this List Of Shame.

Probably Arnold Zwicky is right: we're never going to find a grammar-book source for these errors. There is no book that recommends this usage. It's just that so many extensions of the use of the term "tense" have been made in various grammars where "tense is extended to cover all sorts of verbal categories (often realized by morphology on the verb)", and in addition in so many cases verbal form "is extended to cover multi-word combinations -- periphrastic expressions -- as well as single words", we can hardly be surprised if some people think it covers voice as well. Arnold points out that in various grammars he has found references to infinitive tense, conditional tense, subjunctive tense, negative tense, causative tense, permissive tense, inceptive tense, plural tense, imperative tense, and interrogative tense.

Non-finiteness, modality, mood, negation, transitivity, deonticity, aspect, number, clause type... it's all about tense. And will doubtless remain so. Only it isn't.

I don't like to harrumph, really I don't. But honestly... Harrumph.

Posted by Geoffrey K. Pullum at 09:34 AM

More on double modals: Problems of Adjunct Placement

The next episode of my serialized reflections on double modals (the series began here) concerns problems of adjunct placement. Notice, first, that modals like might allow not to come after them (the negation of You may have any home-brewed beer is You may not have any home-brewed beer), while regular verbs don't (the negation of We make home-brewed beer these days isn't *We make not any home-brewed beer these days). That gives us the following contrast between can tomatoes (with the regular transitive verb can meaning "put into cans") and might can tomatoes (the modal verb can meaning "be able to"):

               regular transitive verb can      modal verb can
WITHOUT NOT:   We can tomatoes in the summer.   We can eat tomatoes in the summer.
WITH NOT:      *We can not tomatoes any more.   We can not eat tomatoes any more.

We can exhibit a similar contrast between the regular transitive verb will (meaning "leave as a bequest") and the modal verb will (with meanings involving future time and volition):

               regular transitive verb will            modal verb will
WITHOUT NOT:   He willed his estate to charity.        He will leave his estate to charity.
WITH NOT:      *He willed not his estate to charity.   He will not leave his estate to charity.

So, other things being equal (in the absence of special constraints), one would expect a dialect that allows Senator Fred Thompson's might could get done (meaning "possibly could get done"), if could were finite, to also allow I might could not get done (meaning "possibly could not get done"). Do people report that as well?

Moreover, since we do not find not after adverbs in adjunct function and right before modals (that is, we do not find *He maybe not can get there in Standard English), my speculation that might is simply an adverb in some dialects would predict that in those dialects you cannot say [?*]That might not could get done (to mean "possibly could not get done"). Do people speaking his dialect find that ungrammatical? If they accept it, that would be some evidence supporting the double-modal analysis.

Because of something a student in my class said today, I have realized something else: Marianna Di Paolo's analysis, where sequences like might could are single lexical items, also predicts very firmly that you could not get [?*]That might not could get done — it would involve an occurrence of the independent word not right in the middle of a lexical item!

And because of something Brett Reynolds wrote to me today, I have noticed something else. Adverbs with meanings like "probably" or "possibly" usually occur in quite a few positions: clause-initial, VP-initial, clause-final... and after the tensed auxiliary. This suggests that if might were an adverb there is no reason not to expect alongside I might could get that done by Friday a variant with a flipped order with the adverb after the modal: I could might get that done by Friday. I don't know why, but somehow that doesn't feel plausible.

I don't know the answers to the questions I've been raising; I'm not biased against or in favor of any hypothesis. I really don't know what the syntactic facts about the supposed double modal dialects are. And I think most people who mention them don't; they just have at best a small collection of examples they've heard: anecdotal evidence rather than a systematic basis for a syntactic analysis.

Maybe I could work on the construction here in Scotland: there are Scots dialects with formations like might could (there is at least one speaker in my class). But as far as analysis is concerned, right now I am feeling that (a) my adverb hypothesis about might is basically junk, very unlikely to be true; but (b) every other hypothesis I can think of looks like it has a heck of a hard time too. Nothing seems to fit.

But of course, I need to read more. I'm very ill-educated on this topic. So the next thing I'm going to do is to read the unpublished paper by Chris Barker (New York University) and Cynthia Kilpatrick (University of California, San Diego) that Chris has kindly sent to me. It is about double modals, and Cynthia is a native speaker of a dialect that apparently has them. This should be interesting. In due course, if I learn things, I will report on them here, of course, at your linguistic one-stop shopping location, Language Log.

Posted by Geoffrey K. Pullum at 09:24 AM

More on double modals: double tense?

Consider this question about Fred Thompson's might could: Is the second modal, could, a tensed form, or a plain form? If it is a plain form, then the form could would appear to have broken away from the can lexeme to become the plain form of a new lexeme, and we face the puzzle I briefly discuss here: if modals take the plain form on the following verb, we should expect to find them generally in to-infinitival clauses — we should encounter want to can, tried to could, hope to will, and so on, quite freely. But people don't seem to talk about examples like those nearly so much. (There certainly are large numbers of cases of used to could; but used is a truly strange and baffling form, which some people treat as itself belonging to the modal class, and for now I am ignoring it.) Now let's consider where we are if the could of might could is instead a tensed form. (In Standard English, modals are always tensed forms.) That would be compatible with my idle conjecture that in some dialects might has simply turned into an adverb. But for dialects like that, if there are any, we face a different puzzle: do we get other tensed forms after might? That is, do we find examples like the following in the relevant dialects?

[?*]Jeb might is out back fixing the tractor.
[?*]He might likes her more than he lets on.
[?*]We might didn't turn the oven off.

I do not know the answer to this question. But if we do not get these, and we do not get modals in infinitival clauses very much either, then I can definitely see why Marianna Di Paolo's classic paper on the topic ("Double Modals as Single Lexical Items", American Speech 64(3), Autumn 1989, pp. 195-224) suggested that the modal+modal pairs that are found might just be compound lexical items.

Annie Zaenen suggests that the answer may be that modals do indeed occur in infinitival contexts for a lot of people (which would support the view that modals have plain forms in the relevant dialects, against my suggestion regarding adverb status). She supplies a bunch of examples from corpora. The trouble is that quite a few of these are pretty clearly typing or editing errors. Some are incoherent in any dialect, like these (I have underlined the crucial parts, which look like errors of composition to me):

If the legislation gets out of the Judiciary Committee, to will go to the Rules Committee - the last stop before a bill gets onto the floor.

The following morning the plan is to go to may go to Normandy beaches (Omaha Beach was a very touching place even for Jasmine...)

To know more about our Procurebot we recommend to see the Procurebot Tour and then the Procurebot Demo. To may go deep visiting and/or downloading our White Pages and FAQ.

When I go to will say to him, ' How much for this pad of soles ? ' He will answer, ' Fourteen shillings.' 'Fourteen shillings!' I say, 'I'll give you seven ...

And some are obviously errors by inexpert (probably foreign or illiterate) users of English, like this one:

hi spontaneity!! you are gourgeous!!!! your videos are so sexy,you are so sexy!!!! you are in my friends list, I hope to will be in your! ...

But that doesn't mean they are all to be dismissed. These look like they might genuinely be grammatical and error-free in their dialects of origin:

This year horse racing fans will again flock to the Downs and bet on their favorite contender to will go on to race for the Triple Crown honors.

KW is to write an article to may go to local press.

"With the new functionality provided by the pathology search capability and mining we hope to will provide our staff with a greater level of insight into ...

I took a year off for an internship, and hope to will attend Drexel University in Philadelphia, Pennsylvania starting in April of 2007.

We hope to will have - but have yet to confirm - Limor at our CustomerMade conference in Copenhagen two weeks forward. Limor should be able to give Soeren ...

"Let's wait for the Easter Bunny," referring to his dream claim that GM hopes to will have a test mule with the Volt's drivetrain on the streets by spring ...

With the new 3G network, DST hopes to will bring mobile communication services in Brunei to new heights, offering DST customers one of fastest mobile data ...

So there may be dialects in which will appears freely in infinitival clauses. The topic continues to look intriguing. And quite puzzling.

Posted by Geoffrey K. Pullum at 06:14 AM

November 26, 2007

An experimental control

Brett Reynolds sent in a link to this CBC news story about Canadian Prime Minister Stephen Harper's visit to Tanzania to announce a Canadian donation to "The Initiative to Save a Million Lives". This is a program to train health workers and furnish treatments for diseases like malaria and measles that especially affect mothers and children. In addition to the CBC coverage, there were stories in the Toronto Star ("Harper announces new aid program"), the Globe and Mail ("Harper arrives in Tanzania bearing gift"), and many other places.

But neither the CBC, nor any of the other stories that I've seen so far, picked up on the thing that Brett noticed and pointed out to me.

Brett reminded me that last September, there was a significant commotion over President George W. Bush's use of the pleonastic plural "childrens" in a speech in New York City ("Weisberg wins", 9/30/2007). I've been muttering for several years about the herd-journalism of the "Bushisms" phenomenon (e.g. "You say Nevada I say Nevahda", 1/3/2004). Now, as Brett observes, Prime Minister Harper offers an exact experimental control, which is predictably ignored by the press. This time, the Weisbergists lose.

Posted by Mark Liberman at 07:42 PM

More on double modals: the Case of the Missing Infinitivals

In what I admitted was merely an idle speculation, I recently suggested that before we conclude that locutions like might could are examples of modal auxiliary verbs followed by modal auxiliary verbs, we need to rule out the simple possibility that might has turned into an adverb for some speakers, just as maybe did a long time ago. Some people have written to me to say this seems exactly right. Others have pointed out major difficulties for it. It would probably be appropriate for me to say a little more about the issues that bear on this. I will try to do a post or two on the topic. This will be the first. Read on if you wish to hear what I have. This first episode is about the strange case of the Missing Infinitivals.

Somebody wrote to me to say that Dutch allows double modals all over the place: dat zou niet moeten kunnen literally translates as "that should not must can-be", and more liberally translates as "that ought not to be possible"; of je dat zou moeten willen doen translates literally as "whether you that should must want-to do", i.e., "whether you should desire to do so". This suggests that I have not made the point about double modals clear enough.

Dutch has verbs that are cognate with the English modal auxiliaries, and so does German; but there is a major difference. The English ones only have tensed forms. That is not true of the Dutch or German verbs, so they are entirely irrelevant. The most important of the special features that characterize the modals (also known as the special finites or anomalous finites) is illustrated by the contrast between the regular transitive verb can of can tomatoes and the modal auxiliary can of can go to Spain. In what follows, an asterisk in front of a string of words means it is not a grammatical sentence.

PRESENT TENSE:   Often we can tomatoes.   Often we can go to Spain.
PRETERITE TENSE:   Back then we canned tomatoes.   Back then we could go to Spain.
PLAIN FORM:   We hope to can tomatoes next year.   *We hope to can go to Spain next year.
PAST PARTICIPLE:   For a long time we have canned tomatoes each summer.  *For a long time we have canned go to Spain each summer.
GERUND-PARTICIPLE:   We are canning tomatoes right now.   *We are canning go to Spain right now.

I hope the point is clear: when the syntactic context permits tensed forms, the modal is fine; but when the context demands an untensed form — in infinitival clauses, where the plain form is needed; in a perfect tense clause, where the verb after have must be past-participial; or in the progressive aspect, where the verb after be must be gerund-participial — the modal is banned. It simply doesn't have a plain form or a past participle or a gerund participle. All the modals are like that.

Now, the thing is, in Standard English the verb following a modal is required to be in the plain form. So it immediately follows that in Standard English a modal can never follow a modal. We get this pattern:

FOLLOWING MAY:   They say we may can tomatoes.   *They say we may can go to Spain.
FOLLOWING WILL:   I think we will can tomatoes.   *I think we will can go to Spain.
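The logic of the two tables can be restated as a toy feature check. This is only a sketch under stated assumptions: the dictionary names, the context labels, and the `licensed` function are invented for illustration, and the model covers just the two lexemes discussed here, not English grammar in general.

```python
# Toy lexicon: which inflectional forms each lexeme has.
# The inventory follows the five rows of the first table above.
FORMS = {
    "can (transitive verb)": {
        "present", "preterite", "plain",
        "past_participle", "gerund_participle",
    },
    "can (modal)": {"present", "preterite"},  # tensed forms only
}

# The form that each syntactic context demands (labels are hypothetical).
REQUIRED = {
    "tensed clause, present": "present",
    "tensed clause, preterite": "preterite",
    "after infinitival to": "plain",
    "after perfect have": "past_participle",
    "after progressive be": "gerund_participle",
    "after a modal": "plain",
}

def licensed(lexeme, context):
    """True if the lexeme has the form that the context demands."""
    return REQUIRED[context] in FORMS[lexeme]

# Print one row per context, starring the combinations that are ruled out.
for context in REQUIRED:
    row = []
    for lexeme in FORMS:
        mark = "" if licensed(lexeme, context) else "*"
        row.append(mark + lexeme)
    print(f"{context:26}", " | ".join(row))
```

Because "after a modal" and "after infinitival to" demand the same plain form, the sketch cannot star one without starring the other.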

So this is one point to make about double modals in the dialects that supposedly have them: if we find modals following modals, we should also find that modals can occur in infinitival clauses. That is, we should find not only I might can fix it but also (and I don't know whether to prefix these with an asterisk for the relevant dialects or not) [*?]I hope to can fix it, and [*?]I want to can fix it, and [*?]To can fix it would be nice, and so on. But although people report having heard might can, they never seem to report hearing hope to can. One of the first things I would want to know about the alleged "double modal" dialects would be, why not? Where are all the cases of modals in those other plain-form contexts, like infinitival clauses?

[Update: I'm not claiming sequences like "hope to can" are never found anywhere, of course. Michael Wescoat tracked down a couple of potentially genuine cases:

As an educator I hope to can encourage African-American students to consider nursing as their choice for higher education and future career and profession.

I hope to can keep a running journal of my adventures to keep me organized, and for your enjoyment!

But as he says, we cannot tell whether these were just typing or editing errors. My point is that a serious piece of work on double-modal dialects would have to determine whether modals (or at least some of them) freely occurred in infinitival contexts. Either the syntax of modals is quite different, and modals don't take plain-form verbs in those dialects (which seems strange, for surely everyone accepts It may be raining rather than *It may is raining), or they do take plain-form complements, in which case we should find modals in all plain-form contexts. To study double modals, you can't just concentrate on collecting double-modal examples and pinning them to a board in a display case; there's a whole syntactic and morphological ecology you have to study.]

Posted by Geoffrey K. Pullum at 10:55 AM

FLoP in the funny papers

One of our little pleasures around here is collecting curious conjunctions, and one of the species we collect involves incomplete or hard-to-access parallelism of shared final constituents (e.g. "More risky RNR", "FLoP and anti-FLoP"). Here's one from yesterday's Zits: "... an hour to myself without somebody wanting me to cook for, clean up after or drive them somewhere".

[See here for Neal Whitman's arguments that FLoP should be called RNW ("right node wrapping").]

Posted by Mark Liberman at 07:11 AM

Rock syncopation: stress shifts or polyrhythms?

Here's something that's been on my to-blog list for a long time. The topic is the basic nature of American popular music. The illustrative examples are taken from a talk I gave at Stanford in January of 2005, as part of a workshop on Language and Poetic Form, and the problem is one that I've been thinking about, on and off, since I was in graduate school. I don't expect to have a chance to write it up any time soon, so I'm using this blog post as a way to get some of the issues on the table for discussion with anyone who may be interested.

A good way to explain English accentual-syllabic verse is to compare the metrical scansion of a line to the alignment of a song's lyrics with the meter of the music. In an earlier post ("An internet pilgrim's guide to accentual-syllabic verse", 7/6/2004) I recycled a pedagogical example that I've been using for a long time -- the fact that American pre-school children know, without being told, how to set the many verses of Skip to my Lou, e.g.

 E               C               E               G               (pitches)
 X               X               X               X               (1/4 notes)
 X       X       X       X       X       X       X       X       (1/8 notes)
 X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   (1/16 notes)
little  red     wa-     gon     pain-   ted    blue 
cat      in the but-ter milk    lapping up    cream
rab-bit  in the corn    field   big as  a      mule
hogs in the  po-ta- to  patch   rooting up     corn
dad's    old    hat      and    ma- ma's old   shoe 

It's not just in traditional children's songs that the musical meter and the meter of the lyrics line up, at least to the extent that the strong syllables of the verse generally align with the musical beat. This was the standard pattern in English-language art songs, popular songs and folk songs alike for hundreds of years; and famous examples of some popular styles continue the tradition. For example, the words to Hank Williams' I'm so lonesome I could cry are an example of the traditional ballad stanza, with alternating four-stress and three-stress lines; and the strong positions in the lyrics line up with the beats of the music in a simple way:

      |  #          #           #           # 
       Hear   that lone-  some whip-   poorwill,
      | #           #           #           #
    He sounds too blue     to fly.
      | #           #           #           #
   The mid- night train    is whin-   in'  low,
      | #           #           #           #
I'm so lone-   some I   could cry. 

The music has four beats to the bar, each subdivided into three by the strumming guitar -- this might be notated in 12/8 time, or in 4/4 time with 8th-note triplets. And as usual in traditional sung ballads, each of the strong syllables in the lyrics is aligned with one of the musical beats, leaving one empty musical beat at the end of the even-numbered lines.

In the first two lines, Williams delays the weak syllables so that he can draw out the strong syllables like that lonesome train whistle. (I've marked the musical (and lyrical) beat with #, and guitar-strumming pulse with %.)

   #                       #                       #                       #
   %       %       %       %       %       %       %       %       %       %       %
   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x
  Hear               that lone-              some whip-               poorwill,

In the last two lines of the stanza, the off-beat words align with the musical meter in a more standard sort of way, lining up with the third of the ternary subdivisions of the beat:

           x                       x                       x                       x
   x       x       x       x       x       x       x       x       x       x       x    
   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x
  I'm so  lone-           some     I             could    cry

An 18th-century audience would have been puzzled by the reference to a "midnight train", and no doubt by other aspects of the content, but in general, this style of setting words to music would have been familiar to them.

But the situation is very different in many genres of American popular music -- a different sort of tune-text alignment starts to appear in the early years of the 20th century, and becomes dominant by the century's end. Here's a stanza from James Taylor's 1971 lullaby You can close your eyes:

 #      . |  #   .  #  .    #  .  #  .  | #  .  #  .  #  . #  .  | #    # # # | # # # 
Well, the   sun  is   sure-      ly sink-          in'       down,

 #      . |  #   .  #  .    #  .  #  .  | #  .  #  .  #  . #  .  | #    # # # | # # # 
but   the   moon is  slow-       ly ris-           in'.

 #      . |  #   .  #  .    #  .  #  .  | #  .  #  .  #  . #  .  | #    # # # | # # # 
       So  this  old  world   must still    be      spinnin'      round.
 #      . |  #   .  #  .    #  .  #  .  | #  .  #  .  #  . #  .  | #    # # # | # # # 
       and   I       still          love          you.

The meter of the lyrics is the same traditional ballad stanza, with its alternation of four-stress and three-stress lines: "Well, the SUN is SUREly SINKin' DOWN / but the MOON is SLOWly RISin'". (It's unusual to rhyme the odd lines rather than the even ones, and the third line has an extra stress group in it, but never mind that for now.) The musical meter is simple 4/4 time, with four beats per bar, subdivided into two. The setting is sparse -- there are a couple of wordless bars after every line, so that the ballad stanza turns into a 16-bar form rather than an 8-bar form. But the key thing here is that only the first strong position of each line is reliably aligned with a musical beat. Of the other 10 metrically-strong syllables, 8 are off the beat.

What's going on? The lyrics are metrically clear enough, and the music is plain vanilla 4/4 -- but the musical meter and the lyrical meter don't line up very well at all.

A theory of such tune-text misalignments can be found in David Temperley, "Syncopation in Rock", Popular Music 18(1), 1999:

The Oxford Companion to Music defines syncopation as 'the displacement of the normal musical accent from a strong beat to a weak one'. While I will propose a slightly different understanding of the term, this definition does point to two important aspects of syncopation. First, syncopation involves a deviation from the 'normal' placement of an accent: usually, accents occur on strong beats, but in a syncopation, a weak beat (or rather an event on a weak beat) is accented. Second, syncopation involves displacement; in a syncopation, an accent that belongs on a particular strong beat is shifted or displaced to a weak one (I will suggest that it is actual events, rather than accents, that are displaced, but the idea of something being displaced from where it belongs is essentially right). The distinction between these two points -- the accenting of weak beats, and the displacement of an accent from one beat to another -- is important. Simply understood as the accenting of weak beats, syncopation is commonplace in many kinds of music, including classical music. [...] However, it is not at all clear that these passages involve displacement; are the accents really heard as belonging on some other beat? In the case of rock, however, the sense of displacement is much more apparent, and there are strong constraints on the ways in which this displacement occurs. In this way, I shall argue, the nature of syncopation in rock is fundamentally different from that in classical music. [...]

The clearest evidence of syncopation in rock is found in the setting of text. [...] It is a well-known fact that text tends to be set to music in a way which matches the stresses of the text with the metrical accents or 'strong beats' of the music. In most pre-twentieth century vocal art music or folk song, it will be found that stressed syllables generally coincide with relatively strong beats. This can be seen in the Handel passage in Example 2a. ["The mighty God, the everlasting father, the Prince of Peace".] In the two Beatles melodies, however, this is clearly not the case. [...] In 'Here Comes The Sun', 'been', 'long', 'lone-' and 'win-' are stressed; the syllables in between are unstressed. But in the music, each of these syllables falls on a weak metrical beat [...] At first thought, this might suggest that the rules for text-setting in rock are fundamentally different from those in classical music. Notice, however, that if certain notes in the melodies are shifted over by a quaver or semiquaver, the metrical grid become nicely aligned with the stress pattern of the words. It seems reasonable to suggest, therefore, that in the internal representation we form when listening to rock music, we are understanding the metric grid and the stress pattern as coinciding. We retain, on principle, the assumption that stressed syllables should occur on strong beats, but we understand certain syllables as 'belonging' on beats other than the ones they fall on.

A similar set of issues arises in purely instrumental music, but tune-text alignment provides an especially clear form of the phenomenon, since the intrinsic stress pattern of the words is understood independent of their setting.

Temperley's examples are typical, his description is correct, and his explanation is compelling. But it seems to me that it's not the whole story, and for some American popular music it might not be the right story. In some cases, at least, there's an alternative account, in which stressed syllables don't stylistically shift off of strong beats, but rather land on strong beats that aren't in the standard place -- because we're dealing with polyrhythmic music. The African roots of American music are elaborately and diversely polyrhythmic; and some of this has remained, and been periodically reinforced by influences from Latin America and from Africa directly.

For example, the "habanera rhythm" (originally contradanza habanera) is 8 beats divided as 3+3+2 or 3+(1+2)+2 or (1+2)+(1+2)+2, and at the same time (at least implicitly) divided into the "square" 4+4 rhythm. In musical notation, the 3+3+2 rhythmic pattern might be rendered something like:


Counting in 16th-notes, these would be something like the two lines below the digit string:

o       o       o       o       o
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 ...
o     o     o   o     o     o   o
o o   o o   o   o o   o o   o   o
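For readers who would like to generate grids like the one above, here is a minimal sketch. The function names `onsets` and `render` are invented for this example, and the two-characters-per-pulse layout is simply copied from the diagram; nothing here comes from a music library.

```python
def onsets(grouping):
    """Start positions (0-based) of each group, e.g. [3, 3, 2] -> [0, 3, 6]."""
    starts, position = [], 0
    for length in grouping:
        starts.append(position)
        position += length
    return starts

def render(grouping, bars=2):
    """Draw one 'o' per group onset, using two characters per pulse."""
    bar_length = sum(grouping)
    marks = set(onsets(grouping))
    cells = ["o" if (i % bar_length) in marks else " "
             for i in range(bars * bar_length)]
    return " ".join(cells).rstrip()

digits = " ".join(str(i % 8 + 1) for i in range(16))
print(render([4, 4]))            # the "square" division
print(digits)
print(render([3, 3, 2]))         # the basic habanera division
print(render([1, 2, 1, 2, 2]))   # the (1+2)+(1+2)+2 variant
```

The output covers two full bars, so it stops just short of the extra downbeat that the hand-drawn diagram shows at the start of a third bar.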

According to Steven Loza's article on Latin Caribbean Music in Vol. 3 of the Garland Encyclopedia of World Music,

The habanera was the first Cuban style to strongly influence music in the United States and was probably the most influential style throughout the Americas. Besides directly affecting jazz, it was the root of the Argentinian tango and fed Mexican styles that were to travel north. [...]

In the early nineteenth century, the contradanza preceded the popularity of the habanera. According to many sources, the contradanza was brought to the United States by French refugees fleeing the Haitian revolution. The dance was also influenced by certain African forms. In Cuba it developed two closely related time signatures, 6/8 and 2/4, both of which influenced the later dance form. "It was presumably black musicians who began to syncopate the contradanza's rhythm. A so-called ritmo de tango, extremely similar to the Argentinian tango, became a feature of Cuban contradanzas, and spread into many other local forms" (Roberts 1979:5). In addition to the merengue of Haiti and the Dominican Republic, the cinquillo, a five-beat throb cast in duple meter, also became a feature of Cuban contradanzas, most likely via the French contradanse. Both the ritmo de tango and the cinquillo became fundamental to the contradanza habanera by the early nineteenth century. Black musicians further syncopated the ritmo de tango, Africanizing this widespread European dance. Though not purely a Havana style as implied by its name, and never referred to as such by its creators, the habanera was easily absorbed into American music largely because its rhythmic pattern was contained in a single measure of common 4/4 meter, and it was frequently incorporated into the bass line of piano compositions. The earliest known piano version of the habanera was "La Pimienta," written in 1836.

Jelly Roll Morton called such polyrhythms the "Spanish Tinge":

Now in one of my earliest tunes, "New Orleans Blues", you can notice the Spanish tinge. In fact, if you can't manage to put tinges of Spanish in your tunes, you will never be able to get the right seasoning, I call it, for jazz.

In a flat and ugly midi implementation, a habanera rhythm sounds something like this:

A more graceful and elegant performance is the background to Little Richard's Slippin' and a Slidin':

Here's the time waveform for the background segment, showing the 3+3+2 structure:

But the 4+4 rhythm is also implicitly there, and that's the pattern that the lyrics lock onto for the first eight bars -- which is one line repeated twice, as usual in this song form -- while the instrumental background keeps up the 3+3+2 pattern:

  o          o          o      o          o       o
  1  2   3   4   5  6   7  8   1  2   3   4  5  6 7  8
  o              o             o             o
slippin' and a slid-    in'   peepin' and a hid-  in'

  o          o          o     o          o   o    o
  1  2   3   4  5   6   7  8  1   2   3  4   5  6 7  8
  o             o             o
been  told   a long    time ago

And then in the final four bars of the stanza, the lyrics shift and lock onto the 3+3+2 rhythm:

1  2  3  4  5 6  7 8  1  2  3  4   5 6  7  8  1 2  3  4 5 6 7  8  1 2 3 4 5 6 7 8
o                     o                       o                   o
o        o       o    o        o        o     o       o     o     o
I  been told    baby you been bold   I won't be your fool  no more

You could analyze this as a "rock syncopation" in Temperley's terms -- the strongly-accented words "told", "bold" and "fool" have all shifted from the fifth eighth-note of their bars to the fourth. In the standard, square (4+4) organization of those notes, this is a shift from one of the strongest positions in the meter to one of the weakest. But in the (3+3+2) habanera pattern, the fourth eighth-note is exactly where the ictus belongs.

So was it the "square" setting of the previous line that was actually "syncopated"? Not really -- the (4+4) pattern is also simultaneously available. Neither setting really represents a shift away from the strong positions in the musical rhythm. Instead, the setting is shifting between one polyrhythmic definition of metrical strength and another.
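The claim that the same pulse can be weak under one grouping and strong under the other is easy to check mechanically. The sketch below treats "strong" as nothing more than "begins a group", which is a deliberate simplification, and the name `strong_positions` is made up for the example.

```python
def strong_positions(grouping):
    """1-based pulse numbers that begin a group, e.g. [3, 3, 2] -> {1, 4, 7}."""
    positions, start = set(), 1
    for length in grouping:
        positions.add(start)
        start += length
    return positions

square = strong_positions([4, 4])
habanera = strong_positions([3, 3, 2])

# Compare the fourth and fifth pulses of the bar under each grouping.
for pulse in (4, 5):
    print(pulse, "square:", pulse in square, "habanera:", pulse in habanera)
```

On this simplified definition the fourth pulse begins a group only in the 3+3+2 division and the fifth only in the 4+4 division, while the first pulse begins a group in both.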

But on the other hand, as John Halle has pointed out to me, the final strong syllable "more" anticipates the downbeat, in a way that is so common in this style of music as to be essentially obligatory -- and this seems much more like one of David Temperley's shifts.

It seems to me to be an open question how much of verse-setting in American popular music should be seen in terms of stresses shifted off the beat, in Temperley's sense, and how much in terms of stresses shifting between polyrhythmic figures, as in the Little Richard song analyzed above. There are many examples that seem clearly to call for the syncopation theory, and many others that seem equally clearly to motivate the polyrhythm theory -- and many where either theory might plausibly be applied. It may well be wrong to aim to resolve the question, at least in any overall way -- perhaps this ambiguity is one of the symptoms of the cross-fertilization of musical traditions in the past century of popular music. The same piece may even have a "European" interpretation for some performers and listeners, and an "African" interpretation for others.

A great deal of evidence is out there, in the patterns of thousands of performances in dozens of styles. There's been relatively little research on this, oddly enough; and the field is open to anyone with a laptop and an interest in the question.

Posted by Mark Liberman at 06:13 AM

November 25, 2007

GOP double modal face-off!

Last week I noted a double modal construction with might (or, if you prefer, a case of non-standard adverbial might) as used by Republican presidential candidate Mike Huckabee on "Fox News Sunday." Huckabee, who hails from Arkansas, told Chris Wallace, "if this bipartisan commission had actually studied the fair tax, they might would have had a different conclusion." I wrote (more presciently than I could have guessed):

Perhaps Huckabee's "might would have" is a dog whistle to working-class Southern supporters, demonstrating he's really a man of the people. If so, look for some one-upsmanship from Tennesseean Fred Thompson.

As if on cue, Thompson went on "Fox News Sunday" earlier today and responded thusly to a question contrasting his position on abortion to Huckabee's (Thompson favors repealing Roe vs. Wade and leaving abortion regulation to the states, while Huckabee favors a "human life amendment" to the Constitution):

What the situation is now is as follows. Because of Roe vs. Wade, all states are restricted from passing rules that they otherwise would maybe like to pass with regard to this area. If you abolish Roe vs. Wade, you're going to allow every state to pass reasonable rules that they might see fit to pass.
When we had control of the House, had control of the Senate, had control of the presidency, there wasn't a serious effort to put forth a constitutional amendment because people knew that it couldn't pass — couldn't pass, wouldn't pass.
What I've been talking about is directing our energy toward something that was halfway practical, something that might could get done. [audio]

Might could is in fact the most common double (or multiple) modal construction — see, for instance, the corpus of examples from the Carolinas collected by Margaret Mishoe and Michael Montgomery ("The Pragmatics of Multiple Modal Variation in North and South Carolina," American Speech, Vol. 69, No. 1, Spring 1994, pp. 3-29), where might could appears in 57 out of 236 multiple modal tokens attested. Might would have is much rarer (it's unclear if Mishoe and Montgomery collected any examples in their Carolina corpus, or if a few were folded into the 37 might would tokens).

So Thompson might could be matching Huckabee's dog whistle, while at the same time avoiding any of the more exotic constructions found in the relevant dialect regions. That way Thompson can sound Southern but not too Southern... call it the grammatical Goldilocks effect.

Posted by Benjamin Zimmer at 06:32 PM

Words of One Syllable Department

In the NYT of 18 November, Sam Roberts wrote about "Mailer's Nonfiction Legacy: His 1969 Race for Mayor", saying that

Even his three-word campaign slogan -- a vulgarization of "No More Bull" -- was unprintable.

(Hat tip to Barry Popik, in ADS-L yesterday.)  As Mark Mandel remarked, that's really getting it backwards.  Whatever the history of bull 'trivial, insincere, or untruthful talk or writing; nonsense' (OED) and bullshit 'rubbish, nonsense' (OED again) -- this isn't entirely clear -- there's no doubt that in the popular view most occurrences of the former are seen as euphemisms for the latter, rather than the latter being seen as a vulgarization of the former.

Posted by Arnold Zwicky at 01:51 PM

November 24, 2007

A Non-Monolingual Prime Minister

This isn't the place for a discussion of the consequences of the Australian election, which resulted in the defeat of the right-wing Liberal Party, but it is nice to see that Kevin Rudd, the leader of the victorious Labor Party, who will soon be Prime Minister, speaks fluent Mandarin. You can hear him greet Chinese President Hu Jintao here. If only he spoke an aboriginal language, he'd be perfect. :)

Posted by Bill Poser at 10:16 PM

Does Cosma deserve a refund?

Cosma Shalizi tears William Saletan up, blow-torches the shreds, and jumps up and down on the ashes. And then he starts in on Saletan's employers:

The editors of Slate have just demonstrated that they either cannot or will not do their job. Someone who reads a story there now must ask themselves "Is this appearing here because the editors are incapable of recognizing that it's worthless? Is this appearing here because the editors want to make propaganda, to manipulate me into believing something, truth be damned? Is this appearing here because the editors owed someone a favor, or wanted to get into someone's pants, or wanted to acquire a reputation for being edgy and contrarian, truth be damned?"

This is a great rant, and eminently well deserved -- but it seems to me that we all ought to ask these same three questions about everything we read. And all too often, the answers are going to be "yes, the editors are quantitatively illiterate", "yes, the editors have an ideological agenda", and "yes, the editors were motivated in part by personal affinities and by the desire to pander to their readers or to shock them".

It's still worthwhile to get angry about egregious examples. Writers and editors at places like Slate certainly care about their reputations, which means that they need to pretend to care about the truth, whether or not it actually ranks very high among their goals. A healthy intellectual ecosystem requires that mistakes sometimes be caught, or bullshit would reign supreme. And journalistic mistakes about science and mathematics -- unlike mistakes about geography, politics and commerce -- have traditionally (i.e. in pre-blog days) been free of consequences, since the people who know better could only mutter into their oatmeal.

Linguists are particularly sensitive to this problem, since we've failed so badly in our duty to educate contemporary intellectuals -- including writers and editors -- about syntax and phonology and so on. But the statisticians have done even worse. William Saletan's Slate series on racial differences in IQ should remind us that as a whole, our society is as willfully ignorant of basic statistical concepts as the Pirahã are of counting.

And just as people are confused by the ordinary-language meaning of terms like "sentence" and "vowel" and "modifier", so they often mistakenly think they understand terms like "correlation" and "factor" and "heritability" because these words have ordinary-language counterparts with vaguely analogous meanings. This is like believing that 7 means "small pile" and 25 means "big pile" -- sometimes the translation works, but often it doesn't. And reasoning based on these false translational equivalences can lead almost anywhere.

How about you? This is an important debate, and as a citizen of the world, you ought to try to understand the arguments for yourself, rather than just supporting the "expert" with the most evocative anecdotes, the best slogans, or the biggest bullhorn. The required mathematics is not very complex or difficult, and Cosma recently posted two long discussions of the relevant issues: "Yet More on the Heritability and Malleability of IQ" (9/27/2007), and "g, a Statistical Myth" (10/18/2007), which are free and accessible.

Cosma begins the first post (well, after some pro forma moaning about how depressing the whole thing is) by saying

I am going to assume that you know what "variance" and "correlation" are, but not too much else.

So to start with, you should ask yourself whether you can define and calculate the variance of a set of numbers, or the correlation between two sequences of numbers. If not, then read the (linked) Wikipedia articles -- and spend a little time playing with the concepts in the context of an interactive program like R. Once you've paid that entry fee, read Cosma's posts. (It's more fun than you might think -- I especially recommend the discussion of the heritability of zip codes, and you could go back and read the prequel about the heritability of accent.) And then go through William Saletan's articles, and decide for yourself what they mean about the abilities and motivations of the writer and his editors.

Too big a price to pay? OK. But watch out for that river trader who's telling you about how big a pile of brazil nuts he wants for how many pieces of cloth.

[Cosma replies:

I am not sure that "we all ought to ask these same three questions about everything we read" isn't an impossible (and questionable) ideal. Consider the difficulties of actually answering those questions in any particular case; they will often be substantial. And yet if the media ecology is functioning acceptably, the answers will either be "no", or perhaps better "not so much as to matter", and so the cost/benefit ratio is pretty skewed.

Well, perhaps I should have said "...keep these three questions in the background while processing everything we read", or something like that. Institutions have ideological biases and commitments, and so do individuals, including thee and me; we all have limitations of knowledge and skill; editors and journalists, like the rest of us, make choices in part based on personal connections and affinities or enmities.

Everything that we read is influenced, often significantly, by these factors. That doesn't mean that it's all crap -- it's not even true (in my opinion) that there's a strong negative correlation between quality and degree of influence. Ideological commitment can provide the motivation for difficult work with interesting results; ignorance can occasionally protect against the falsehoods that "everybody knows", or promote creativity by forcing someone to develop an unusual route to a solution. And personal networks are the fabric of cultural evolution, after all.

That's not to say that bigotry, ignorance and favoritism are always or even usually good things, or that truth is not the paramount virtue. But the problem with Saletan's series of articles, it seems to me, was a different one, namely irresponsibility. He used his powerful voice as Slate's "national correspondent" to address a sensitive and important issue that he doesn't understand very well, and to promote a point of view that has already done a lot of damage, and is likely to do more.]

[Update 11/27/2007 --

Cosma warned me that this discussion is easier to get into than to get out of, but I didn't listen.

I'm interested in the rhetoric of public discussion of science, and especially the rhetorical interpretation of statistical concepts; and I'm also interested in what you might call the ecology of journalism, especially as it influences public discussion of science. All this led me to link (above) to Cosma's rant against William Saletan's recent series on race and IQ. My basic observation was that Cosma has unrealistically high expectations about the relationship among journalism, expertise and the pursuit of truth. In passing, I observed that Cosma's earlier posts on heritability and g are an accessible tutorial on an important and often-misunderstood set of topics, and I urged readers to consider reading them.

This in turn has led P-ter at Gene Expression to take me to task, under the title "Linguist: I can use R, you can't. Thus, your motives are questionable. QED." The post says things like "Dr. Liberman assumes that Cosma concludes that heritability estimates are worthless. This is not the case."

A modest point in self-defense: I don't (and didn't) interpret Cosma's posts to mean that "heritability" and "g" are worthless concepts (though I think that they are easy to misuse individually, and often toxic in combination). All that I meant to say about these concepts is that the scientific versions are not at all the same as the ordinary-language versions, and that as a result, much of the discussion of the heritability of intelligence is confused and ill-founded. I'm well aware that simple mathematical models are often of great scientific and technological value. I spend a good deal of my time trying to persuade people that this is true, and teaching them how to put that belief into effect.

But there's a difference between using a map and thinking that it's equivalent in every way to the territory it describes. The only way to avoid serious conceptual errors is to understand the tools you're using, and the material you're using them on.

If you're going to debate the price of eggs, you should be able to count and to multiply. Of course, you can count and multiply, and still be completely out in left field on the price of eggs. But if you can't count and multiply, then you shouldn't waste our time with your arguments about why the conventional wisdom on egg-pricing is all wrong. If you insist on doing so anyhow, you'll forgive us for wondering why.

Similarly, if you're going to debate the heritability of g, you should understand variances (and the problems of estimating how to divide them up) and correlations (and the meaning of the results of factor analysis of positively-correlated variables). That's because heritability is a ratio of estimated variances; and g is a construct emerging from performing factor analysis on certain kinds of positively-correlated test results. If you don't understand those things, then you can't understand the literature on the heritability of intelligence, and you shouldn't waste our time with your arguments about why the conventional wisdom on that subject is all wrong.
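To make "a ratio of estimated variances" concrete, here is a deliberately artificial sketch. Every number in it is made up, the function name `heritability` is hypothetical, and it assumes the simplest textbook setup: a trait that is the sum of a genetic and an environmental contribution, with the two uncorrelated. Real studies cannot observe those two components directly and have to estimate how to divide the variance up, which is exactly the hard part that this toy sidesteps.

```python
from statistics import pvariance

# Made-up toy values, not data: a "genetic" and an "environmental"
# contribution for four hypothetical individuals, chosen to be uncorrelated.
genetic = [1, -1, 1, -1]
environment = [1, 1, -1, -1]

def heritability(genetic, environment):
    """Share of trait variance due to the genetic component, assuming
    trait = genetic + environment with the two parts uncorrelated."""
    trait = [g + e for g, e in zip(genetic, environment)]
    return pvariance(genetic) / pvariance(trait)

print(heritability(genetic, environment))

# Same genetic values, but environments that vary twice as much.
wider_environment = [2 * e for e in environment]
print(heritability(genetic, wider_environment))
```

The second ratio comes out smaller than the first even though the genetic values are identical, because the ratio describes a population in a particular range of environments rather than a fixed property of the trait.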

The point is not that simple statistical models are useless, or that people who can't use R are barred from discussing them. What I actually said was that this is an important topic, and therefore you should want to learn about it, so as to come to an informed conclusion; and that Cosma's two long posts on the subject are fairly accessible -- except that the price of admission is to understand the meaning of "variance"  and "correlation".

The definitions of those terms are simple -- variance is the mean squared difference from the mean, and correlation is the inner product of mean-corrected, length-normalized vectors. The equations are even simpler, as you can learn from the Wikipedia entries. But there's a catch -- you probably can't assimilate these simple definitions and equations unless you already know enough mathematics that you already know these definitions and equations. As a result, if you don't already know about variances and correlations -- or if you once sort of knew, but have more or less forgotten -- there's a problem.

So I suggested a way out. At least for someone like me, a good way to understand new concepts is to play around with them in a practical setting. And luckily there's free software that makes this pretty easy -- a lot easier than it used to be in the days of pencil and paper, or even electronic calculators -- easy enough that for simple concepts like variance and correlation, you can probably demystify them for yourself, from a standing start, in a couple of hours. That's where programs like R come in.
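For a taste of what that kind of playing around looks like, here is a minimal sketch. It is written in Python rather than R, purely as a choice for this illustration, and the data are invented. It spells out the two definitions directly: variance as the mean squared difference from the mean, and correlation as the inner product of mean-corrected, length-normalized vectors.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Mean squared difference from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def correlation(xs, ys):
    """Inner product of mean-corrected, length-normalized vectors."""
    mx, my = mean(xs), mean(ys)
    dx = [x - mx for x in xs]          # mean-corrected
    dy = [y - my for y in ys]
    len_x = math.sqrt(sum(d * d for d in dx))
    len_y = math.sqrt(sum(d * d for d in dy))
    # Dividing by both lengths normalizes each vector to length 1.
    return sum(a * b for a, b in zip(dx, dy)) / (len_x * len_y)


# Invented numbers to experiment with.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
doubled = [2.0 * x for x in xs]
backwards = list(reversed(xs))

print(variance(xs))                # 2.0
print(correlation(xs, doubled))    # very close to 1
print(correlation(xs, backwards))  # very close to -1
```

Change the numbers, add some noise, and watch what happens to the results; that is the whole exercise.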

This was all pretty compressed, I admit, and easy to misunderstand. However, my point was not to exclude anyone from the conversation, but rather to suggest how they can find a way in. If, after understanding the arguments on both sides, they agree with Saletan's conclusions, then we can argue about it. But if they don't understand what they're talking about, what's the point of arguing?

And just so that we're clear on what the issues are here, Josh Marshall reminds us of Saletan's basic premise, which is that the genetic mental inferiority of Africans is as well established as the theory of evolution is, and that well-meaning liberals who try to deny the genetic mental inferiority of Africans are pitting their faith against scientific fact, just like the well-meaning believers who don't want to accept evolution. Here's Saletan, as quoted by Marshall:

If this suggestion [i.e., the genetic mental inferiority of Africans] makes you angry—if you find the idea of genetic racial advantages outrageous, socially corrosive, and unthinkable—you're not the first to feel that way. Many Christians are going through a similar struggle over evolution. Their faith in human dignity rests on a literal belief in Genesis. To them, evolution isn't just another fact; it's a threat to their whole value system. As William Jennings Bryan put it during the Scopes trial, evolution meant elevating "supposedly superior intellects," "eliminating the weak," "paralyzing the hope of reform," jeopardizing "the doctrine of brotherhood," and undermining "the sympathetic activities of a civilized society."

The same values—equality, hope, and brotherhood—are under scientific threat today. But this time, the threat is racial genetics, and the people struggling with it are liberals.

Evolution forced Christians to bend or break. They could insist on the Bible's literal truth and deny the facts, as Bryan did. Or they could seek a subtler account of creation and human dignity. Today, the dilemma is yours. You can try to reconcile evidence of racial differences with a more sophisticated understanding of equality and opportunity. Or you can fight the evidence and hope it doesn't break your faith.

To repeat, this is not about whether "heritability" (in the technical sense of the ratio of genetically-associated variance to total variance) is ever or always a useful concept, or whether "intelligence" (in the technical sense of a linear combination of correlated test results) is ever or always a useful concept. It's not even about the difference between applying the technical concept of heritability to a metabolic disorder like diabetes and to a statistical construct like IQ. Rather, it's about Saletan's assertion that a starkly racist theory is just as scientifically well established as the evolution of species is.

Josh Marshall calls this "an equation of almost unparalleled absurdity". I agree, not out of political correctness, but out of scientific conviction. And if the point is going to continue to be debated, I want as many people as possible to understand the concepts involved, precisely because I believe that the scientific evidence supports my views, and refutes Saletan's.

Josh offers a suggestion about  Saletan's motivation, or at least about the pattern of his rhetoric:

... right-wing inability to come to terms with modernity and modern science must be equalled by something on the other side of the aisle. [...] I'm not sure I've ever seen a more nonsensical example of that TNR-originating disease of facile contrarianism for its own sake.

This makes sense to me.

Saletan sees this as a debate between clear-sighted scientific racists and muddle-headed politically-correct do-gooders. As for P-ter at Gene Expression, I take it that he is just trying to defend the use of simple linear models in science in general, and in empirical population genetics in particular, from what he understood as a general attack. But both of them are wrong about the nature of the argument. ]

Posted by Mark Liberman at 01:07 PM

November 23, 2007

Be off with you!

What on earth does Be off with you! have as its understood grammatical subject? (I know the phrase is a bit archaic and fairy-storyish; but it gets 40,000 Google hits. We've all encountered it.) You see, the understood subject can't be second person, as in imperatives, because then the pronoun in the with-phrase would be obligatorily reflexive (as in Be gentle with yourself), and we would find *Be off with yourself! instead. But we don't.

Updates, November 24th:

1. To the various people who have asked me whether I know the paper by the mythical Quang Phuc Dong (now known to be the late James McCawley) on English sentences without overt grammatical subject, yes, of course. Every linguist who remembers who broke the glass (it was Floyd), or who can recall when Mick Jagger first said he wouldn't want to still be performing when he was 40, remembers Jim McCawley's lovably puerile experiments in pseudonymous pornolinguistic and scatolinguistic underground essays. Quang discusses expressions such as Fuck you, which have a similar lack of overt subject. (Lonnie Chu seems to have stashed a copy of his paper on the web here.)

2. To the several people who have suggested that Be off with you is a calque on an Irish phrase like Amach leat "out with you", I would point out that this says nothing about the problem I raised, which is about what the understood subject could be.

3. To the people who have suggested that the understood subject is "the devil", so that Be off with you really communicates The devil be off with you, or May the devil be off with you, I have a simple refutation to offer (and it is suggested by an argument of Quang's): it is simply that if you were saying "Be off with you!" to Satan himself, then under this hypothesis we would expect a reflexive:

*Hey Satan, be off with yourself!

But that doesn't seem to be what one would say.

4. The best hypothesis so far (thanks to Chris Weimer for this) is that the you of Be off with you is an archaism from the time when the accusative forms of the pronouns were frequently used instead of the reflexive forms. One well-known line showing this clearly is Now I lay me down to sleep (me for myself), and more recently, in Wilbert Harrison's grand old blues number "Kansas City":

They got some crazy little women there,
And I'm going to get me one.

This non-reflexive-form pronoun usage could perhaps be the solution for some of Quang's cases as well: after all, in modern colloquial English we find Fuck you alongside Go fuck yourself. However, expressions like Damn you and To hell with you remain a problem, because we never find *Damn yourself or *To hell with yourself. Yet as Quang notes, Damn God! is grammatical, and *Damn himself! is not (and the same applies to To hell with God! and *To hell with himself!), which means the understood subject in these cases cannot be God either.

5. For a truly radical suggestion about Damn you, Fuck you, etc., see The Cambridge Grammar of the English Language, pages 1360-1361, especially footnote 70 on page 1361. It astonished me when we thought it up. It is too radical for me to repeat here. You are just going to have to lift CGEL down from the shelf, or make a trip to the library.

6. One other observation, which could be quite important, is from David Denison: he points out that there are with-phrases within which the reflexive requirement does not apply, e.g., Take me away with you (it is not *Take me away with yourself). Notice also, He's taken her away with him (*with himself), I'll keep it with me always (*with myself always). Now, there are also with-phrases into which the reflexive requirement does apply; e.g., I'm so pleased with myself (*with me), or You're never happy with yourself (*with you). But as a rough first guess at how things fall out, it looks to me like in the latter cases the with-phrases are complements, and in Take me away with you the with-phrase might be an adjunct. (It's a bit hard to tell, because the phrase take X away with Y is so familiar, it's almost a fixed formula.) It could just be that the right answer to my original question is that simple: with you in Be off with you! is an adjunct; end of story. However, it also might not be; see the next update.

7. Randy Alexander has pointed out something really important (and I can't believe I overlooked the task of checking this, because it is so simple): he searched with Google for the string "be off with yourself"; and there are thousands and thousands of hits, many of them from a long time ago. We even find (in an 1852 story) "Say to him, Be off with yourself, Satan"! So it looks like there has been plenty of variation between Be off with you and Be off with yourself down the years. That fact returns us to favoring update 4 again.

Do you see how complex and fascinating and yet ultimately amenable to investigation this syntax business is?

Posted by Geoffrey K. Pullum at 03:39 PM

Flacks and hacks and brainscans

On November 11, the New York Times published an unusually scientific Op-Ed piece (Marco Iacoboni et al., "This is Your Brain on Politics"), accompanied by an impressive multimedia slide show of fMRI brain scans. On November 14, the NYT published an extraordinary letter, signed by 17 eminent neuroscientists, which asserts that the Iacoboni Op-Ed "uses flawed reasoning to draw unfounded conclusions" (Adam Aron et al., "Politics and the brain"). An extraordinary editorial yesterday in Nature ("Mind games: How not to mix politics and science") delivers an additional stiff rebuke to the NYT editorial board:

The results described in the op-ed are apparently the claims of a commercial product posing as a scientific study. [...]

Articles on The New York Times op-ed pages are opinionated by definition, and shouldn't normally require peer review. But here, the paper's editors have instead published the results of (to put it mildly) questionable scientific research, disseminating this information to millions of their readers who may not have the background to recognize for themselves the absurdity of some of the authors' conclusions.

Although it is a gross disservice to science and indeed to politics, it is a great deal for the company. Scientific publication would have required the authors to divulge their data and qualify their assumptions — and some journals might even have required that they declare their financial interests. Whatever the motives, seducing The New York Times' editors with the allure of Technicolor brains lighting up with Hillary Clinton angst yielded no more or less than a multimedia advertisement for the company's product to millions of readers.

The basis of Nature's complaint is that the "research" behind the 11/11 NYT Op-Ed came from FKF Applied Research Inc. ("The Leader in NeuroMarketing"), and was scientifically questionable (though it's hard to tell how bad it is, since none of the details were published), and was clearly intended to promote FKF's commercial services.

While I share Nature's outrage, I feel that the moral tone of the editorial comes uncomfortably close to Captain Renault's famous line "I'm shocked, shocked to find that gambling is going on here!"

The NYT's provision of free advertising to FKF follows a pattern that's much commoner than this extraordinary Nature editorial might lead you to think. We've documented many examples here over the years, where the science is even more obviously absurd, and the commercial motivation even plainer. For example: "Enhance breast size by 80%", "It's always silly season in the (BBC) science section", "The brave new world of computational neurolinguistics", "Another day, another reprinted press release", "Quit email, get smarter?". And there are many other examples every week of the year -- see the archives of Ben Goldacre's Bad Science for a larger sample.

Some of the examples are subtler, and some are even more blatant. But they all have the same source: the hidden symbiotic relationship between flacks and hacks. This relationship was discussed a couple of weeks ago, in passing, as a result of a public tantrum by a magazine editor who was annoyed by getting too many emails from publicists (Andrew Adam Newman, "Things Turn Ugly in the 'Hacks vs. Flacks' War", NYT, 11/5/2007):

Journalists often call publicists “flacks” and publicists call journalists “hacks,” though rarely in earshot of one another. But the gloves came off last week after Chris Anderson, the executive editor of Wired magazine, chided “lazy flacks” who deluge him with news releases “because they can’t be bothered to find out who on my staff, if anyone, might actually be interested in what they’re pitching.” [...]

After picking the fight, he then made it personal, posting the addresses of 329 unsolicited e-mail messages he had received and telling the senders that he had permanently blocked them. [...]

Roy Peter Clark, vice president of the Poynter Institute for Media Studies, said that the hostilities Mr. Anderson had stirred up were less about technology than about territory.

“I grew up in this business 30 years ago learning that flacks were your enemies, with an asterisk,” Mr. Clark said. “The asterisk was unless you really needed them, when you were on a tough deadline and couldn’t get around them or through them.”

There has long been a “love-hate relationship” between the two professions, said Sheldon Rampton, co-author of “Toxic Sludge Is Good For You! Lies, Damn Lies and the Public Relations Industry” and research director of PR Watch, a publication that reports on dubious public relations practices. “In a lot of ways P.R. people do the legwork for journalists — feeding them stories and sources, and doing research,” Mr. Rampton said.

I wonder: what fraction of mass-media stories about science or technology are initiated, directly or indirectly, by a press release or by an email or phone call from a publicist? It's surely a large proportion, and I wouldn't be surprised to learn that it's within hailing distance of 100%.

It's worth pointing out that the flack/hack symbiosis also leads to absurdly over-hyped stories even in cases where there is no immediate commercial motivation, but a University or scientific-society publicist has done a good job at getting media interest for an evocative piece of research or development, and ignorant or self-interested journalists run with it. Again, it's easy to find examples every week of the year: here's one, dissected in detail: "Envy, navy, whatever", 10/27/2006.

It's not obvious what would be a better system. But readers should understand that the flacks have one set of motivations, and the hacks have another, and that truth doesn't rank very high in either set.

[Note: I recognize that access to the NYT Op-Ed page is a somewhat special case, and a chain of personal connections between FKF and the paper, perhaps involving some political operatives, may have been involved here. But the basic dynamic is the same: someone wants coverage, someone else needs material.]

Posted by Mark Liberman at 12:04 PM

Thanksgiving variation

In response to yesterday's post on Thanksgiving stress (the phonological kind), several readers have reminded me that many Americans do have first-syllable stress on the word in question. The general situation seems to be that southerners say THANKSgiving while northerners say thanksGIVing.

This is consistent with an overall tendency, also manifest in cases like UMbrella vs. umBRELla. Thus Connie Eble writes (American Voices: How Dialects Differ from Coast to Coast, p. 47) that "New Orleanians also place the word stress on the first syllable in adult, cement, insurance, and umbrella". In some words (POlice vs. poLICE, GUItar vs. guiTAR), the initially-stressed pronunciations seem to have become stigmatized, and have been abandoned by many better-educated or more upwardly-mobile people. But in the case of thanksgiving, the pronunciation with first-syllable stress is supported by the otherwise exceptionless pattern of stress in object+gerund/participle compounds, and seems to have remained regionally dominant.

[I should say that the pattern of historical, geographical and social variation here is probably much more complex than this. All that I really know is that there's a general southern tendency towards stress retraction in nouns, and that some specific examples of it have become stigmatized while others (apparently including thanksgiving) have not.]

Craig Russell provided some geographical detail:

*I* say the word with the stress on the first syllable. I was born in North Carolina, raised there until I was 13 years old, and my recollection is that "THANKSgiving" is the normal way to say the word in that part of the country.

At thirteen, my family and I moved to Oregon, where the "thanksGIVING" pronunciation prevails. It took me years to even notice the difference, and, it being a word that only gets said with any frequency for a few weeks a year, the pronunciation was not teased out of me by the other children like much of the rest of my southern accent.

Now I have moved back to North Carolina for work, and I hear the familiar "THANKSgiving" pronunciation again. However, the town that I moved to, Fayetteville, is home to one of the country's largest army bases, Fort Bragg, so the population here comes from all over the country. This gives me an opportunity to hear all sorts of different dialects side by side, and I have confirmed my suspicion: the pronunciation "THANKSgiving" seems (from my unscientific observations) to be associated with native southerners, and "thanksGIVING" seems to be more normal in the rest of the country (damn Yankees!).

Kris Rhodes wrote:

I have no poetry for you, but my wife and I and one pair of friends of ours all say THANKSgiving. I have heard "thanksGIVing" before, but always thought of it as a feature of some marginal dialect or something. I am surprised to discover that the reverse is the case.

Well, I guess that everyone's margin is someone else's core. Anyhow, Kris is from Arlington TX, his wife is from Holliday TX ("north of Dallas near the Oklahoma border"), one of his cited friends is from the Dallas area, and the other friend "was raised in Zimbabwe by missionary parents, and those parents are from Houston TX". They now all live in California.

Craig added:

... assuming that "THANKSgiving" is the original pronunciation (conforming to other gerund/participle-object compounds like the ones you mention), it's conceivable that the change in pronunciation happened after the US colonies were established and regional pronunciations started to drift, and we preserve the original. Of course, the opposite could have happened as well.

If the overall southern-states stress pattern was the older one, I'd expect to see some pockets of UMbrella and POlice in the British Isles -- but I've never heard of this. That doesn't mean it doesn't exist. I'll ask some experts.

However, in the particular case of thanksgiving, the OED gives only the pronunciation [ˈθæŋksˌgɪvɪŋ], with first-syllable main stress, while the (1891 American) Century Dictionary follows Noah Webster's (1828) lead in giving only the second-syllable stress:

And as I wrote yesterday, the evidence from metered verse seems to suggest second-syllable main stress in poems by British writers going back to the 17th century. So I'm now puzzled about the history, and for that matter about the current British pronunciation -- if you can supply some relevant evidence or references, please let me know.

[Update -- Vicky Larmour writes:

I'm British (English) and have only ever heard it said here as THANKSgiving (not that it is said here that much at all, of course, being as we don't celebrate it). I mentioned this to my British (Northern Irish) husband last night after reading your earlier post and he agreed with me, THANKSgiving is the only current pronunciation he's heard.

I've heard ThanksGIVing but only from American friends.

And from Jo Kershaw:

I'm not sure whether or not this is relevant to the issue of stress patterns in British dialect, but I have noticed that -- although 'police' is pronounced with a second syllable stress by Scottish speakers using formal speech -- the Scots regional form 'polis' definitely has the stress on the first syllable. I've never heard anyone from any part of Britain say UMbrella, though, and the tendency to refer to it as a 'brolly' certainly suggests to me that the second-syllable stress is more normal.

And Andrew Weir:

I don't know about UMbrella, but in Scots initial-stressed ["polIs] for "police" is very common. The entry in the Dictionary of the Scots Language states that this was a common 18thC English pronunciation, so it might well be that the change from initial stress came after the establishment of the American colonies.


[Update #2 -- Mike Pope writes:

I don't think the word-initial stress is strictly regional. It's either socioeconomic, or maybe it's an urban-rural split (which might be saying the same thing, dunno). Out here in Wa(r)shington, one tends to hear pronunciations such as MONroe and DUvall from the residents of those comparatively small burgs, whereas we city slickers tend to say monROE and duVALL. Ditto POlice and INsurance and a host of similar examples that I can't think of at the moment, as usual.

And Arnold Zwicky does what I should have done in the first place, namely checks a bunch of dictionaries. He adds some other useful comments as well:

(In what follows, 1 means first-syllable stress, 2 means second-syllable stress.)

1. First, a quick dictionary search, in volumes I have at home or can find on-line...

British: OED has only 1 (as you have already noted). NSOED4 (1993) has 1, 2 (in that order).

American: AHD4, NOAD2, Free Online Dictionary all have only 2. Merriam-Webster Online gives 2, but "also" 1. Webster Dictionary 1913, online, appears to give only 1; well, it says "Thanks"giv`ing (?), n.". (contrast your citations of Noah Webster (1828) and the 1891 Century, with only 2.)

Historically, this is bewildering. the current situation looks pretty clear: 1 in the U.K. (and, i think, Australia) and the southern U.S., 2 in the rest of the U.S. (what do Canadians say, i wonder.)

2. The connection between 1 for this word and 1 for police/cement/umbrella/etc. in the southern U.S. isn't clear to me. I don't recall the facts in detail -- I do recall that there is literature on this -- but I do remember that there is a tendency for the shifts to 1 to co-occur, but that it's only a tendency. My guess is that the stressings are learned word by word, so that the co-occurrence of 1 for different words is just like the co-occurrence of other dialect features, and is not part of a regular phenomenon of stress-shifting. after all, the affected words don't seem to form a natural class phonologically.

What this means is that there is no reason to expect 1 for "thanksgiving" (which is just what you'd expect from the regularities of stress in English, as you point out) to co-occur with other 1 stressings in the U.K.

3. If 2 is now general U.S. for the holiday, but 1 is what you'd expect for the compound meaning 'the giving of thanks', then i'd expect there to be at least a few people with both stressings, but semantically differentiated.


[Update #3 -- Jarek Weckwerth writes:

For me, as a non-native speaker, the possibility of having the stress on the second syllable was a total surprise. (OK, I do try to follow a British model, and we don't celebrate Thanksgiving in Poland.) So I checked. I'm away at a conference, so I could only consult the dictionaries I have installed on my laptop. The lowdown:

(1) Longman Advanced American dictionary and the Kosciuszko Foundation Dictionary (a large bilingual Polish-English dictionary based on an American model) both give thanksGIVing only.

(2) The Oxford-PWN bilingual dictionary (the largest English-Polish dictionary on the market, based on BrE): THANKSgiving for BrE and thanksGIVing for AmE. (They do give you the American pronunciations if they think they're different, but they're extremely inconsistent, especially in their AmE recordings, or their relation to the transcriptions.)

(3) Now the really interesting thing: the Cambridge English Pronouncing Dictionary (17th edition) gives only thanksGIVing for BrE, and both options for AmE. Funny, innit? (Screenshot attached. Red = UK, blue = US.)


[Update #4 -- Deborah Pickett writes:

Born and raised Australian, it never occurred to me that "THANKSgiving" was even a possible pronunciation. It isn't a holiday here, but we deal with enough Americans that we aren't surprised by the word. Everyone I've ever known in this country calls the holiday "ThanksGIVing".

Conveniently, I live with a Canadian (from Alberta), and asked her what holiday Americans had last weekend. Result: "ThanksGIVing".


Posted by Mark Liberman at 09:10 AM

November 22, 2007


Mark Liberman is unlikely to announce this, so let me do it for him: he has been elected a Fellow of the American Association for the Advancement of Science (AAAS).  The citation:

Dr. Mark Liberman, professor of linguistics; cited for contributions to phonological theory, the computational analysis of language, and the practical applications and popular understanding of linguistics.

Note "popular understanding of linguistics".  That's Language Log, folks.

(Hat tip to John Lawler.)  Over the years, as Language Log has become such a prominent part of the public face of linguistics (along with Steve Pinker and Geoff Nunberg in his life as a public intellectual outside of Language Log), some of the bloggers have become concerned about the responsibility involved.  I certainly have.  Mark and Geoff started the blog as a way of having fun with serious linguistics for "a general non-linguist readership" (as they put it in the introduction to Far From the Madding Gerund).  Well, we have that readership, in very large numbers, and they send us e-mail, in very large amounts -- in my case, unmanageably large amounts.  I kid that I have trouble coping with my new life as a semi-public intellectual, but the mail is daunting, especially since so much of it is leads to things the writers hope (and, in some cases, expect) we will blog about.  I do my best to try to mix up geeky stuff about technical linguistics with cartoons that have some point of linguistic interest, commentary on language in the media and public life, formulaic language, errors of various types, reactions to things I've been reading, and so on, but I get around to responding to only a small part of my mail.  So I feel I'm letting the field of linguistics down.

We do get some good press.  Most recently, a nice little piece in the 17 November New Scientist that quotes both Mark and me.  I'm quoting the whole thing here, because if you go to the NS site, unless you're a subscriber, you can view only the very beginning of the piece:

DO YOU long for the not-so-distant days when people used "hopefully" correctly - as in "to travel hopefully" - rather than to modify a whole sentence, as in "Hopefully, today I will win the lottery"? Were you amazed and appalled when the expression "Not!" was invented in the late 1980s? Do you feel like stabbing yourself with a fork when a brand-new word like Stephen Colbert's "truthiness" becomes popular?

If so, you might be surprised that according to the Oxford English Dictionary, all these questions are based on a false premise. "Hopefully" has been used to mean "let us hope" since at least 1932. "Not!" has been used in the Wayne's World sense as far back as 1860. And "truthiness" was used in 1824.  [AZ note: but not in the Colbert sense.]

Being convinced of the newness of a word, meaning, construction or phrase that is in fact long established places you among the victims of the "recency illusion", one of the hazards that plague people who take an interest in language.

Stanford linguist Arnold Zwicky coined the term, and he defined it on the Language Log linguistics blog as: "the belief that things you have noticed only recently are in fact recent". A typical example discussed on the blog involved a blogger who believed that the sports expression "pulled within two points" was a recent invention, probably by US sportscaster Marv Albert. In reality, variations date back to the 19th century.

Zwicky's recency illusion has alerted Language Log bloggers to other word-watching pitfalls, including the infrequency illusion: the belief that a feature you have just noticed is rarer and more notable than it is. Then there is the antiquity illusion: that something familiar to you has been common for a long time. Or the out-group illusion: the tendency to blame undesirable language trends on some other group of people. And finally, there is the adolescent illusion: a "kids these days" version of the out-group illusion.

A gender-specific case of the adolescent illusion came to light when Mark Liberman, a linguist at the University of Pennsylvania in Philadelphia and co-founder of Language Log, received an email expressing surprise that George W. Bush had used the words "like totally". As many people do, the emailer associated "like totally" with young people, especially young women and girls. Liberman consulted various collections of phone-conversation transcripts, and found that "like totally" was widely used by men and women of all ages - and middle-aged men actually use it more.

Why are there so many ways of guessing wrong? Much of it is to do with a tendency to overestimate the importance of our own experience: if something is new to us, we assume it is new to everyone. And perhaps the universality of language exaggerates the effect. We might not put so much trust in our gut feelings about archaeology or nuclear medicine.

(Hat tip to Ben Zimmer.) 

Back to Mark and the AAAS.  As far as I can figure, Mark and I are the only Language Loggers to have been elected as Fellows.  Over at the American Academy of Arts and Sciences, there are four of us: Lila Gleitman, Barbara Partee, Geoff Pullum, and me.  And at the Linguistic Society of America, four again: Lila, Barbara, Sally Thomason, and me.  So we're doing pretty well on Fellowships.  (These are all elections for life; they are honors.  A number of us have held one-year fellowships to do research at institutions like the Center for Advanced Study in the Behavioral Sciences and the Stanford Humanities Center.  I know, it's confusing that the word fellow gets used for two different kinds of academic awards.)

Posted by Arnold Zwicky at 01:09 PM

A Thanksgiving discussion

When columnists and feature writers need to find something to say about a seasonal topic, they naturally turn to linguistics, and sometimes even to linguists. For example, Annie Groer in the Washington Post ("Th(angst)giving", 11/22/2007) quotes Deborah Tannen at length about how "different conversational styles among families and friends can create problems", and also about the problems of "adult children returning home and 'always feeling 12 years old'".

That headline writer's clever interpolation of angst into Thanks- feeds right into John McWhorter's essay in the New York Sun ("THANKS-giving", 11/21/2007), which starts:

You know how you can tell that we don't truly think of Thanksgiving as being about thankfulness anymore? Which syllable most of us put the accent on. Most people say thanks-GIVING. But think about it -- you don't say horse-RACING; you say HORSE-racing. BABY-sitting, not baby-SITTING. [...]

The accent has changed as our concept of the holiday has. THANKS-giving would convey that we were giving thanks. When we say thanks-GIVING we are just uttering a string of sounds only vaguely connected to what the words thanks and giving mean. It's rather like ice cream: we don't really conceive of the stuff as "iced cream;" in our heads it's more like a single word "eyescream."

John is a linguist himself, and a member of the Language Log family. And the second most important Thanksgiving tradition, after the sacramental meal itself, is animated discussions among family members about everything from football games to stuffing recipes. So I'm going to take up brother John's point and run it back the other way.

I agree that thanksgiving has an unexpected stress pattern, and I agree that this is somehow connected to its status as a word whose meaning is not entirely predictable from the meanings of its parts. But this isn't enough to explain the stress shift -- there's no stress shift in other lexicalized compounds like shoemaker, foot-soldier, cock-fighting, etc. And whatever caused the stress to shift in thanksgiving, it happened before the Thanksgiving holiday became an American ritual in the middle of the 19th century, and it also affected people in England, Scotland, Ireland, Australia and other places that don't participate in our turkey/stuffing/cranberry-sauce culture.

English-language writers have been leaving the space out of thanksgiving since the 16th century, suggesting that the compound was already becoming lexicalized even if the meaning still had more to do with giving thanks than with eating turkey and watching football. From the OED:

1539 BIBLE (Great) 1 Tim. iv. 4 For all the creatures of God are good, and nothing to be refused, yf it be receaued with thankesgeuynge.
1535 COVERDALE Ps. xxxix. [xl.] 3 He hath put a new songe in my mouth, euen a thankesgeuynge vnto oure God.
1641 Nicholas Papers (Camden) 10 It was resolved that there shalbe on ye 7th of September next a publique thanksgiving for this good accord betweene ye 2 nacions.

But how can we tell what the stress pattern was, in those days before voice recording? It's possible that some early dictionaries mark the stress -- but Samuel Johnson's 1755 opus doesn't even have thank or thanks in it, much less thanksgiving, and I don't have any other early dictionaries on my shelf. In any case, a better way is to look at the way the word is placed in metered verse.

Unfortunately, however, the traditional norms of accentual-syllabic verse in English mean that we can't use binary meters (iambic or trochaic) for this purpose. In such verse, thanksgiving has always been set with the second syllable in a strong position.

We can see this in these lines of iambic pentameter written by Joseph Beaumont (1616-1699), from Psyche, Canto XIV, "The Death of Love":

      .  # .   #  .     #  .   #  . #
1045 Disloyal Murmurs; Pulpit Villanies;
      .      # .  #       .    #  .     # . #
1046 Curs'd Holy Leagues; and zealous Profanations;
      .   #  .    #       .    # .    # .    #
1047 Sin-fatning Fasts; Thanksgiving solemn Lyes;
      .    #  . #     . #   .    # .  #
1048 Bold Sacrilege; rebellious Reformations;

And some time in the early 19th century Samuel Taylor Coleridge (1772-1834) wrote in the same meter:

26 He knows (the Spirit that in secret sees,
27 Of whose omniscient and all-spreading Love
28 Aught to implore were impotence of mind)
29 That my mute thoughts are sad before his throne,
     . #       .   #  .    # .    #    .   #
30 Prepar'd, when he his healing ray vouchsafes,
     .    # .    #   .   #     .    #  .   #
31 Thanksgiving to pour forth with lifted heart,
32 And praise Him Gracious with a Brother's Joy!

The LION database offers us 1202 examples of  the word thanksgiving in English poetry, and a significant fraction of them are in such iambic meters. While I've only checked a small fraction of those, I'm confident that all will show the same alignment, which is consistent with the modern pronunciation. But in fact this gives us no evidence at all about the relative stress of the first two syllables. We can see this by observing that in iambic verse, the same metrical alignment will apply to compounds like horse-racing and psalm-singing:

Thus Walter C. Smith, 1902, "Provost Chivas":

205 And he must choose between this chance
206 And being led a bonny dance,
207 Through courts of Law, for crimes and debts,---
208 Hame-sucken, stouthrief, common theft,
209 Smuggling, and heavy claims he left
     .   #   .   #    .     # .    #
210 For gambling and horse-racing bets.

John Wilson (1785-1854), "The City of the Plague":

252 Ha! thou'rt a scrupulous robber! and the sound
    .    #     .    #  .       #    .       # .  #
253 Of these psalm-singing, shrill-voiced choristers

So what good is the metrical evidence? Well, we need to look at ternary meters -- ones with anapestic or dactylic feet -- where in principle either the first or second syllable of these three-syllable compounds can be aligned with the ictus (main beat) of the foot. That seems to be because words like racing are happy to fit in the two weak positions -- but whatever the reason, both alignments of words like horse-racing are found.

For example, in the anapestic poem "A Case of Conscience" by John Godfrey Saxe (1816-1887), we get one of each. (I've omitted the full scansion of the relevant lines, and only indicated the alignment of the crucial word by using boldface for the ictus syllable. That doesn't necessarily imply how the word should be read, merely how it scans, i.e. aligns with the meter.)

1 Two College Professors,---I won't give their names
2 (Call one of them Jacob, the other one James),---
3 Two College Professors, who ne'er in their lives
4 Had wandered before from the care of their wives,
5 One day in vacation, when lectures were through,
6 And teachers and students had nothing to do,
7 Took it into their noddles to go to the Races,
8 To look at the nags, and examine their paces,
9 And find out the meaning of "bolting" and "baiting,"
10 And the (clearly preposterous) practice of "waiting,"
11 And "laying long odds," and the other queer capers
12 Which cram the reports that appear in the papers;
13 And whether a "stake" is the same as a post?
14 And how far a "heat" may resemble a roast?
15 And whether a "hedge," in the language of sport,
16 Is much like the plain agricultural sort?
17 And if "making a book" is a thing which requires
18 A practical printer? and who are the buyers?---
19 Such matters as these,---very proper to know,---
20 And no thought of betting, induced them to go
21 To the Annual Races, which then were in force
22 (Horse-racing, in fact, is a matter of course,
23 Apart from the pun) in a neighboring town;
24 And so, as I said, the Professors went down.
61 The race being over, quoth Jacob, "I see
62 My wager is forfeit; to that I agree
63 The Fifty is yours, by the technical rules
64 Observed, I am told, by these horse-racing fools;
65 But then, as a Christian,---I'm sorry to say it,---
66 My Conscience, you know, won't allow me to pay it!"

[It's relevant, but secondary, that this matches the effect of the stress shift in pre-nominal modifiers known colloquially as the "thirteen men" rule, after the stress shift in pairs like "The answer is thirteen" vs. "Thirteen men on a dead man's chest".]
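
The two alignments of horse-racing in lines 22 and 64 can be checked mechanically. What follows is only a rough Python sketch, not a real scansion tool: the function names are hypothetical, the syllable divisions are my own, and it assumes strictly ternary spacing, with the last beat on the final syllable of the line and the other beats found by counting back in threes.

```python
def ictus_positions(n_syllables, feminine_ending=False):
    """0-based indices of the beats, counting back in threes from the last beat.

    Assumption: strictly ternary spacing, with the final beat on the last
    syllable (or the second-to-last, for a feminine ending).
    """
    last_beat = n_syllables - (2 if feminine_ending else 1)
    return set(range(last_beat, -1, -3))


def ictus_in_word(syllables, word_start, word_len, feminine_ending=False):
    """Which syllables of the word (1-based, within the word) fall on a beat."""
    beats = ictus_positions(len(syllables), feminine_ending)
    return [i - word_start + 1
            for i in range(word_start, word_start + word_len)
            if i in beats]


# Hand syllabification of Saxe's lines 22 and 64 (my own division, an assumption).
line_22 = ["Horse", "ra", "cing", "in", "fact", "is", "a", "mat", "ter", "of", "course"]
line_64 = ["Ob", "served", "I", "am", "told", "by", "these", "horse", "ra", "cing", "fools"]

print(ictus_in_word(line_22, 0, 3))  # [2]: the beat falls on "ra"
print(ictus_in_word(line_64, 7, 3))  # [1]: the beat falls on "horse"
```

Counting in threes will misplace the beats in any line that mixes binary and ternary feet, so this is no substitute for scanning by ear; it only makes explicit what "aligns with the meter" means in the simple cases.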

If we check the alignment in ternary verse of such words -- compounds combining a monosyllable with a trochee -- we find that the ictus falls on the first syllable most of the time. A nice example is a lyric by the pseudonymous "Claudero", entitled "On St. Crispin's day, October 25th, 1763", in which the word shoemaker occurs five times, four of them with first-syllable ictus and one with second-syllable ictus:

1 Come let us prepare,
2 Jolly hearts ev'ry where,
3 Each shoemaker sing and be merry,
4 Let mirth now abound,
5 And bumpers go round,
6 Of Claret, Champaign, and Canary.
19 We still bear in mind,
20 And show to mankind,
21 Our loyalty by a procession,
22 To Crispin the great,
23 Who left kingly state,
24 And liv'd in a shoemaker's station.
37 The King on the throne,
38 The Prince too his son,
39 Without our Craft's friendly assistance,
40 They bare-foot might go,
41 Thro' frost and thro' snow,
42 If shoemakers were at a distance.
61 Our very great care,
62 Is to pleasure the fair,
63 Whom shoemakers fit always neatly,
64 Our sweet-hearts and wives,
65 We love as our lives,
66 And by them are loved compleatly.

67 To sum up the whole,
68 Let us Crispin extol,
69 And be of his virtues partakers;
70 Then all will applaud,
71 And sing loud as Claud,
72 The fame and great worth of shoemakers.

The same is true for compounds of the same form written with hyphens. Here's Charles MacKay's mid-19th-century "My Heart's in the Highlands":

9 My heart's in the Highlands, the long summer day
10 Breathing the health-giving breeze on the brae,
11 And braving the tempests that gather below;
12 My heart's in the Highlands, wherever I go.

Or Ambrose Bierce's "In His Hand":

1 De Young (in Chicago the story is told)
2 "Took his life in his hand," like a warrior bold,
3 And stood before Buckley---who thought him behind,
4 For Buckley, the man-eating monster is blind.
5 "Count fairly the ballots!" so rang the demand
6 Of the gallant De Young, with his life in his hand.

Or John Betjeman's 1958 "The Sandemanian Meeting-House in Highbury Quadrant":

13 Away from the barks and the shouts and the greetings,
14 Psalm-singing over and love-lunch done,

There are a few examples with second-syllable ictus, like this one in Dugald Ferguson's 1912 "Verses Addressed to My Brother on Exchanging Boots":

17 But when I was out, I declare---
18 Cock-fighting this fairly confutes---
19 To leave me your shabby old pair
20 In lieu of my best Sunday boots.

21 I have read of the wars of the Turk---
22 Their slaughters, their sacks and pursuits;
23 But whoever heard tell of such work---
24 To rob a poor man of his boots?

But such examples are distinctly in the minority, amounting to a tenth of the settings or even fewer. When we look at thanksgiving in ternary meters, the situation is completely different. A quick scan of LION turns up dozens if not hundreds of cases -- I'll spare you most of the 30 or so that I've looked at -- and every one of the examples that I've seen so far has second-syllable ictus.

This is true of American poems referring to our secular holiday, starting with Lydia Maria Francis Child's 1854 "The New-England Boy's Song About Thanksgiving Day":

1 Over the river, and through the wood,
2 To grandfather's house we go;
3 The horse knows the way,
4 To carry the sleigh,
5 Through the white and drifted snow.

6 Over the river, and through the wood,
7 To grandfather's house away!
8 We would not stop
9 For doll or top,
10 For 't is Thanksgiving day.
21 Over the river, and through the wood,
22 To have a first-rate play---
23 Hear the bells ring
24 Ting a ling ding,
25 Hurra for Thanksgiving day!

But it's also true for much earlier British lyrics, such as Alexander Brome's 1661 "A new Diurnal of passages more Exactly drawn up then heretofore":

71 His Excellence had Chines and Rams-heads for a present,
72 And his Councel of War had Woodcock and Pheasant.
73 But Ven had 5000. Calves heads all in carts,
74 To nourish his Men and to chear up their hearts.
75 This made them so valiant that that very day,
76 They had taken the Town but for running away.
77 'Twas Ordered this day, that thanksgiving be made,
78 To the Round-heads in Sermons, for their beefe and their bread.

Or "A Thanksgiving Hymn" by John Byrom (1692-1763):

1 O come, let us sing to the Lord a new Song,
2 And praise Him to whom all our Praises belong;
3 While we enter His Temple with Gladness and Joy,
4 Let a Psalm of Thanksgiving our Voices employ;
5 O come, to His Name let us joyfully sing;
6 For the Lord is a great and omnipotent King:
7 By His Word were the Heav'ns and the Host of them made,
8 And of all the round World the Foundation He laid!

Or "Britons Awake" by Joseph Mather (1737-1804):

11 He stretched the last link to collect a vile crew,
12 To render thanksgiving where stripes were most due.
13 He rallied his forces his cause to maintain,
14 At Bang-beggar Hall he assembled his train,

Or Bernard Barton's 1837 "The Morning is Breaking":

25 The stars in their courses
26 Now marshal their forces;
27 The moon in pale splendour walks up the blue sky;
28 While Philomel's numbers,
29 'Mid earth's placid slumbers,
30 Seem lauds of thanksgiving ascending on high.

31 Oh! thus, when stars glisten,
32 With none near to listen,
33 Should spirits awaking their melodies raise
34 To Him who sleeps never,
35 But merits for ever
36 Glad songs of thanksgiving, and honour, and praise.

There are many cases where it would be easy to adjust the setting of thanksgiving to put the ictus on the first syllable. Thus William Canton's 1887 "Christmas Eve" mixes binary and ternary feet, as anapestic meters often do -- thus:

    .     .  #    .    .   #  .    .   #   .   #
13 Near the rails of the chancel the crib was seen,
    #     .     #  .     .    #  .    #
14 Roofed and clustered with winter-green;

and so it would have been easy for him to replace

46 The joy of that birth, the thanksgiving of song.

with

       *The joy of that birth, the thanksgiving song.

But he didn't.

Likewise in Coleridge's dactylic hexameter "Hymn to the Earth", thanksgiving is set with second-syllable ictus:

12 Into my being thou murmurest joy, and tenderest sadness
13 Shedd'st thou, like dew, on my heart, till the joy and the heavenly sadness
14 Pour themselves forth from my heart in tears, and the hymn of thanksgiving.

He could well have chosen to write something like

*Pour themselves forth from my heart in tears, and the thanksgiving music

But he didn't. I think that it should be possible to use thanksgiving with first-syllable ictus in ternary verse in English -- if you find an example, please tell me -- but so far, after looking at dozens of poems from the mid-17th century onwards, written by poets from the British Isles and Australia as well as the U.S. and Canada, I've come up empty. This is in striking contrast to the treatment of words like shoemaker and psalm-singing, for which first-syllable ictus in ternary verse is overwhelmingly more common.

I conclude that thanksgiving has generally had second-syllable main stress in English at least since the 17th century. I don't know why this shift happened -- but I believe that we can rule out John McWhorter's theory that the stress shift occurred because of recent developments in an American cultural practice that was proposed by Sarah Josepha Hale and the Boston Ladies' Magazine in 1827, and established by Abraham Lincoln after the battle of Gettysburg in 1863.

On the other hand, Sarah and Abe deserve credit for lots of other good things, including the meal we'll be sharing later today, and the tradition of friendly debate on the conversational topics that we structure our lives around. So John, do you think that the Eagles have a chance to keep it close against the Patriots?

Posted by Mark Liberman at 10:49 AM

November 21, 2007


From Grant Barrett, on behalf of the American Dialect Society:

Nominate your 2007 Word of the Year

The American Dialect Society's word-of-the-year vote--the longest-running such vote anywhere--takes place in Chicago in January at its annual meeting. The academic society is now accepting word-of-the-year nominations at Word of the Year is interpreted in its broader sense as "vocabulary item"--not just words but phrases. Your nominations do not have to be brand-new, but they should be newly prominent or notable in the past year, and should have appeared frequently in the national discourse. The word-of-the-year vote is not a formal induction of words into the American language, but a whimsical affair. Nominate accordingly.

Please don't mail to me, or to Grant, but use the address above.

Posted by Arnold Zwicky at 08:18 PM

What old linguists do after they retire

The energetic staff here at Language Log Plaza tries to deal with all aspects of language life, including geezerdom. For some unknown reason, I was assigned to the Geriatric Desk, and I've posted in the past about how geezerdom feels. At this holiday season, it might be appropriate to point out what us old guys are thankful for, along with some advice to those approaching that stage of life.

Most of us work hard, earn a living, reach 65 or so, then retire. I know. I did this twelve years ago. But I had the mistaken notion that when I retired, I'd move to a comfortable setting, take up oil painting, enjoy Montana's beautiful scenery, do a little fishing, take lots of trips, and lounge around in my easy chair. I deserved it, didn't I? But it didn't take long for me to realize that I sadly missed my former work life, or at least parts of it.

One of the things I found that I missed most was that I no longer got to help grad students with their linguistic research papers and dissertations. That was real fun. The classroom was okay too, but I really liked individual planning and teaching the most. Other things, like serving on promotion and tenure committees, taking my turn as department chair, and the seemingly endless bickering about smallish things in departmental politics, were a lot easier to give up. It may surprise some to learn that there is often relatively little sense of collegiality among faculty members working in the same department, so even my best efforts to discover or develop something like teamwork and closeness showed me how hard it was to find pleasure there. In a few cases, yes. In most cases, no. I found that my fellow faculty members were often lost in their own work and too busy competing with each other. And, if you're trying to find a reasonable fun context, the university administration is usually not a great place to look. For me it was the individual grad students who energized me most. And now I don't have them anymore (note: my dialect permits me to use the positive "anymore" and I'm even permitted to prepose it, but I defer to those who find this construction odd -- so anymore I try not to use it).

But wait. When I retired from the classroom twelve years ago, electronic communication existed but it hadn't yet developed into the way we know it today. Things have changed. Today my classroom is the Mac in my home office, where I spend hours each day communicating with grad students from various parts of the world. Currently I'm helping an Iranian student with her Master's thesis and a Malaysian student with her dissertation proposal. And there are others as well. The most frequent messages I get are from students who have recently discovered linguistics and want some help about how to go about studying it in grad school. They see my website, contact me, and I eagerly rise to the bait. In short, I haven't lost my opportunity to work with individual grad students at all, thanks to email.

I also continue teaching outside of the classroom by writing books, largely directed to students, about my area of linguistics. These also generate correspondence, questions, and problems to solve. Meanwhile, I still do some consulting with lawyers on civil and criminal law cases, but not as much as I used to. And, of course, there is Language Log, where I find lots of individual colleagues and great readers. So even in retirement, my work day is as full as I want it to be.

So although I'm a happy retiree, "retirement" is an odd way to describe how I spend my current days. In fact, it's probably a very wrong word for it. Most people don't give much thought to what they'll do after they retire -- or, like me, they have some pipe dreams about it. Going on cruises, taking up oil painting, golfing, or playing shuffleboard in Florida hold no interest for me. Before I retired I had some pretty false notions about what my retirement from teaching would be like and so I encourage people to think about this realistically long before they take that step. I'm thankful to be able to continue doing some meaningful work. Personally, I want to continue to flunk the course in Retirement 101 (as conventionally defined). So far I'm grateful that I've done pretty well at it.

For me that's something to be grateful for this Thanksgiving.

Posted by Roger Shuy at 11:13 AM

Positive "let alone"?

In the third paragraph of a Reuters wire service story on the latest Reuters/Zogby poll, John Zogby, the pollster, is quoted as saying "This race is just beginning, let alone all over." (John Whitesides, "Democratic 2008 presidential race tightens: Reuters poll", 11/21/2007).

I'm used to positive anymore ("Thanksgiving is becoming so commercialized anymore"), but this use of let alone (apparently) outside of a polarity context -- a negation or question -- is new to me.

If I search Google News for "let alone", most of the hits involve traditional polarity contexts:

Truck 'not fit for animals, let alone people'
Are You Sure You Know The Best 2? Let Alone 1?
...David Beckham cannot take defenders on at the best of times, let alone when he is no more than semi-fit.
Gene Steratore Is No Ed Hochuli, Let Alone Hulk Hogan
Proton does not even have this at home, let alone overseas.
...the market still doesn't appear to get dual-core products, let alone the quad-core offerings Intel and AMD will be fighting over.
...inexperienced and incompetent they don't know how to file a motion, let alone properly get evidence examined.

As usual, the negation or question is sometimes only implicit:

Flintoff: too drunk to throw, let alone catch
You barely know what goes into your mouth sometimes, let alone a deer's mouth.
But nowadays UK is having a hard time beating Vanderbilt, let alone Tennessee...
I will be mightily surprised if any of our strikers are sold, let alone Keane or Defoe.

And occasionally, the polarity construal is more subtle:

"How they managed to stretch and kvetch it out to three hours defies description and challenges the basics of human decency, let alone good TV," the Chicago Tribune's Marilynn Preston wrote at the time.

This example doesn't trouble me at all -- I suppose that I'm interpreting "challenges the basics of human decency, let alone ..." as something like "is not consistent with the basics of human decency, let alone ..."

And after scanning thirty or forty current let alone uses, I didn't find any others that troubled me. But Zogby's sentence "This race is just beginning, let alone all over" gives me a classic grammatical WTF reaction.

I wonder whether Zogby might have meant something like "this race is not past its beginnings, let alone all over" -- and if so, why I seem to have such difficulty following him down that path. Alternatively, perhaps he represents a speech community that no longer sees "let alone" as a polarity item at all.

Or perhaps my view of how to analyze let alone is too hasty. If you can offer relevant examples or analysis, let me know.

[Update -- on reflection, it occurs to me that the issue here might have to do with the interpretation of just. In most contexts, just and merely seem similar to barely and hardly:

He has barely/hardly three friends.
He has just/merely three friends.

But (for me at least) they contrast sharply in whether or not they provide an appropriate context for polarity-sensitive items:

He has barely/hardly any friends.
*He has just/merely any friends.

Perhaps John Zogby (or the reporter who transcribed the quotation, perhaps as loosely as journalists generally do) puts just in the polarity-context category.]

[Update with mail from readers -- David N. writes:

i had my own grammatical wtf moment with this just last night. i was on the subway with my girlfriend and she said something along the lines of "i have enough of my own problems, let alone yours". it totally threw me off.

Elise Kendall writes:

After reading your Language Log post this morning about positive 'let alone' I blogged my way over to Evolving Thoughts, where I found the following sentence...

"It is too easy to come up with "possible scenarios", let alone possible adaptations."

Is this another example of what you were talking about in your post? It "feels" wrong to me when I read it but not to the same extent as "This race is just beginning, let alone all over".

Often, it seems that "let alone" is licensed by a paraphrase or implication of the matrix clause, rather than by the clause as it stands. Thus someone who says "I have enough of my own problems, let alone yours" presumably means something like "I can't deal with my own problems, let alone yours".

As for "too", there's a use that licenses polarity items, as in the example cited above "too drunk to throw, let alone catch", which implies "so drunk that (s)he can't throw, let alone catch". The trouble with the Evolving Thoughts phrase is that the negative implication doesn't seem to have the right scope -- presumably it's something like "it's so easy to come up with 'possible scenarios' that success doesn't mean anything". The author meant that this is even more true for "possible adaptations", but I agree that the sentence as written doesn't quite work.]

[Update 12/20/2007 -- David Bull writes:

I think this might be relevant.

Here in the UK we had an advert that ran on TV a couple of years ago for a mortgage company. I can't remember which one though, I think every time I saw it I was having my own WTF moment.

The advert consisted of a group of attractive thirty-somethings in business suits in a bar or diner somewhere talking about switching their mortgaging to a different provider. One of them pipes up about how good a deal you can get from Mortgage Company X and how they give you a lump sum cashback when you transfer your current mortgage to them.

I can't remember what the comment refers back to, but the advocate of Mortgage Company X finishes the advert by saying, "Then you could buy that horse, let alone back it."

I always thought that sounded strange, although I understood the sentiment, and thought it would have sounded better if he had said, "You couldn't back that horse, let alone buy it," but then of course it wouldn't have made any sense in the context of the advert.

I can offer no analysis, I'm afraid, other than the scriptwriter's tight timescale perhaps.

Oh, and I'm a little behind with my RSS feeds.]


Posted by Mark Liberman at 09:45 AM

Crazy English

Victor Mair sent in a link to an article about Li Yang, the English-teacher-as-cult-leader (Wu Nan, "Is Crazy English Here to Stay?", China Digital Times, 10/16/2007):

Shanghaiist has collected reports and videos of Li Yang teaching his Crazy English. If you watch the video by muting the sound, you may think you're watching a religious gathering with fervent worshipers waving their arms in spiritual communion.

Victor comments that Crazy English "has both Hitleresque and Maoist aspects, plus quite a bit of Falun Gong cultishness as well". The religious/political aspects are even clearer if you leave the sound on, in my opinion:

[Linked in the Shanghaiist article, that video was recorded at the Hunan University of Science and Engineering on May 20, 2007, at 6:30 a.m.]

Wu Nan's article draws out one unexpected aspect of this style of English language instruction -- Chinese nationalism:

If you carefully listen to one class Li taught, you'll sense something in his speech -- nationalism. Li makes jokes about Japanese pronunciation of English and the sounds Japanese make while bowing. Laughing at the Japanese is a common pastime for nationalists in China. Li also talks about how Hong Kongers look down on people from mainland China who go shopping there. Li says that in that situation, he would pretend to be Chinese-American speaking Chinese with an American accent to scare the Hong Kongers.

You won't be surprised to learn that there's also a financial aspect:

One thing that should also not be neglected is the profits Li has made from his students. Even though there's no data from the official "Crazy English" website, the 18000 Yuan (USD 2300) price he's charging for a "diamond seat" to his gatherings gives us an idea.

The article ends with a (translated) quotation from Wang Shuo, the famous Beijing colloquial writer:

I know that Li Yang invented a method of learning English called "Crazy English"; I know that his ambition is to let 300 million Chinese open their mouth and speak English, and then to let 300 million foreigners learn Chinese; I also know that through practicing English he turned inferiority to extreme self-confidence; I also know that he is very patriotic. This sounds good, but after watching the film, my personal reaction to him is uncomfortable. ... I do not know how many people in the end master the English language by Li Yang's proficiency, not just shout in the public, "I love disgrace." The posture of the men, women and children practicing 'Crazy English' is more like a massive pledging ceremonies. Li Yang is more interested in arousing people, or I'd rather say inciting mob. I have seen such incitement, it is an ancient voodoo, gathering a large group of people, making them excited through your words, then producing the illusion of strength like the waves of ocean. At that moment, the poorest people will suddenly feel invincible. This is not so cheering as it is more like to fool people. Many of things are done this way in China --after a dream, all the problems are solved. Li Yang was trying his best to disseminating of knowledge, beating his chest and stamping his feet, while those who were welcoming his enthusiasm with stimulated faces looked very foolish. He publicized patriotism, which makes him sounds to me like a racist. His clothes, behavior and attitude for people creates him a success figure, which inevitably makes you to think of a successful crook. [...] To attract full house applause he frequently use those warm-up words which are really blatant racial supremacy clamor. This is not humorous. I do not believe that our country's being strong is for the purpose to spread your racism.

According to the Wikipedia article, the Crazy English enterprise was founded in 1994, and now has 20 million practitioners, so there has been plenty of time to evaluate its effectiveness as a language teaching method.

[Update -- Randy Alexander writes:

The video you linked isn't Li Yang (as he is known here). It's just a bunch of college students recreating the atmosphere of one of his sessions. The first (long) video in the article is him, and is very typical of his English cheerleading.

In October 2004, Li Yang came to Jilin City to promote some new books and tapes he came out with. His company has published many many books, tapes, and other learning materials over the years. I was curious about seeing what all the hoopla was about, so I went to one of his sessions in a huge arena filled with about 2,000 or so people. He invited several local English school headmasters up on the stage to talk about how to learn English. It was pre-arranged that they would come up and talk, but he was giving the appearance that he just kind of randomly pulled them out of the audience. Taking advantage of that false appearance, I went up on stage too (nobody stopped me!), and he rolled with it and soon had me giving advice to the throng as well.

Over the next few days, I ended up having a few meals with him and going up on stage for his remaining sessions in the city (this time invited), and he even had me participate in a book signing at a big bookstore. It was a little weird signing my name in his book for people, but sitting next to him there I got to see some interesting things. Many of the college age students would tell him, with almost indecipherable English pronunciation, that they learned English exclusively from his books and tapes.

His publications are thrown-together collages of exercises, tips, and inspirational sentences and stories. The content of his publications focuses on quantity rather than quality, and a good deal of the material is lifted from other books. I asked him, "you just take things out of other books and put them in your books?" He said, "sure, nobody cares about that in China".

His company is pretty big; when he was here, he said it had an income of over 10 million US dollars per year. He also said that he's really burned out traveling around giving his charismatic speeches, but he can't just stop, because that's a major part of the company's income. He is very different offstage than on. He is quiet, reserved, and has a deadpan sense of humor. To his credit, his English is in fact rather good. Unfortunately on stage he ends up obscuring his good pronunciation by talking in all kinds of ridiculous accents for comedic effect, and the audience just blindly follows along, caught up in the rapture.

Students have often told me about "the Li Yang Effect": after attending one of his cheerleading sessions, they study fervently for three days, and then the inspiration wears off. They say it is completely temporary and has no effect on their English ability.

So the right analogy is not to Hitler but to Elmer Gantry?

There is more discussion here and here.]

Posted by Mark Liberman at 07:36 AM

"My confidence damaged," Chancellor doesn't say

The Chancellor of the Exchequer of the United Kingdom, Alistair Darling, was being interviewed on BBC Radio 4 this morning about the Great Missing Personal Data Scare (a government clerk put 25 million people's personal details on a CD ROM and sent it to an audit department in the ordinary post, and it never turned up at its destination). I think the Chancellor has been very bold, forthright, and honest about the situation, with a clear and forthright genuine data blunder apology, but the job of the interviewer this morning was to get him in an embarrassing posture, and the technique used was to ask him very pointedly, "Doesn't it damage your confidence, that a thing like this could happen?"

The Chancellor replied, "Of course it damages... confidence." (The gap there was a sort of 200 milliseconds of silence plus what I thought was perhaps a very tiny gulp. If I had a phonetic analysis laboratory beside my breakfast table the way Mark Liberman does, I'd have a sound clip and a spectrogram for you.) The interviewer stopped him instantly and asked, "Including your own? It damages your own confidence?" And the discomfited Chancellor responded, sounding very slightly nervous now, "Of course it... does."

Two little linguistic tricks of evasion there: substituting the indefinite NP confidence for the definite NP my confidence (permissible because non-count noun heads of indefinite NPs do not require a determiner), and saying it does for it damages my confidence (permissible under the Verb Phrase Ellipsis construction because the interviewer had just used a VP with the needed sense, "damages X's confidence"). In each case he had only fractions of a second for planning (he may even have decided to use does while he was starting to pronounce the initial [d] of damages), yet he pulled it off. No newspaper can truly report this morning that he said "My confidence has been damaged" (which would of course weaken him politically), because he didn't; he avoided it both times. Isn't language wonderful?

You know, just between you and me, I sometimes worry that there is a naive view loose out there -- most students come to linguistics believing it, and there appear to be some professional linguists who regard it as central and explanatory -- that language has something to do with purposes of efficiently conveying information from a speaker to a hearer. What a load of nonsense. I'm sorry, I don't want to sound cynical and jaded, but language is not for informing. Language is for accusing, adumbrating, attacking, attracting, blustering, bossing, bullying, burbling, challenging, concealing, confusing, deceiving, defending, defocusing, deluding, denying, detracting, discomfiting, discouraging, dissembling, distracting, embarrassing, embellishing, encouraging, enticing, evading, flattering, hinting, humiliating, insulting, interrogating, intimidating, inveigling, muddling, musing, needling, obfuscating, obscuring, persuading, protecting, rebutting, retorting, ridiculing, scaring, seducing, stroking, wondering, ... Oh, you fools who think languages are vehicles for permitting a person who is aware of some fact to convey it clearly and accurately to some other person. You simply have no idea.

Posted by Geoffrey K. Pullum at 05:13 AM

November 20, 2007

secondo fiddle

Geoff is right to say that Umberto Eco's use of "according to me" is odd in English, but it's a natural mistake for a native speaker of Italian, in which secondo X can mean either "according to X" or "in X's opinion." I've been tripped up by this construction myself -- not in English, but in French, where I have heard myself using selon moi (which can only mean "according to me") by analogy to the Italian secondo me, thereby assuming an authoritative status that it would doubtless be more judicious to leave to others to invest me with.

Posted by Geoff Nunberg at 08:20 PM

The locavore chronicles: the birth of a word

After locavore ('one who endeavors to eat only locally produced foods') was named New Oxford American Dictionary's Word of the Year, the coiner of the word, Jessica Prentice, was kind enough to respond to my post here with a brief email explaining how she came up with it. Now we get the full story in a guest column on OUPblog. It's a fun read -- and it turns out that an item on Language Log let her know she had really arrived:

And just to put icing on the cake, someone has turned my picture into a lolcat-style "lolcavore". I have to admit I'd never even heard of lolcats before, but now I am just so proud... so very very proud.

Posted by Benjamin Zimmer at 05:01 PM

According to Umberto Eco

I was surprised to hear Umberto Eco, interviewed on BBC Radio 4 this morning, using the phrase according to me several times. He seemed to think it is synonymous with "in my view", or "the way I tell it". It is not.

According to X has the peculiar property of only being properly used by people other than X. We can say, "According to her, the Jews control world banking", and we mean that this global banking stranglehold stuff is her story about the Jews, and we are by no means committed to it.

The constraint is (somewhat) analogous to a similarly odd fact about lurk: you only describe other people's actions using it. If I wait around outside your office trying not to be seen (not that I would, but I could), someone might say "Geoff Pullum has been lurking outside your office", which is normal use of the language describing slightly nefarious behavior on my part. But if I say "I'm planning to come and lurk outside your office", that would be deeply weird in a linguistic way, unless it was a joke.

I have only ever heard according to me from foreigners who have learned English imperfectly. One tends to think of Umberto Eco as a sort of polymathic cultural and linguistic European academic superstar who would spot this sort of subtlety. But no, there he was, talking about what he says in his new book, and saying "according to me". Stop it, Umberto. Get a clue. This is not an idiom to use about yourself. Use it when imputing views to others, especially (though not exclusively) when you are skeptical about those views. Never use it to say that something is your own view.

There, now I've been prescriptive. See what you made me do? Still, it's in a good cause. Think of me as a language teacher. An EFL instructor to a polymathic cultural and linguistic European academic superstar.

Updates, November 21: just in case you thought I had overlooked the following points, let me assure you that I was aware of them, and you do not need to join the swarms of people who are flooding the mail servers with messages pointing them out to me.

  1. The net forum usage — "I usually only lurk on this forum, but I'd like to make one comment" — is of course a special one, with jocular origin. Those of you who are writing to me about it, please don't; it underlines my point rather than refuting it. People are coining words like "delurk" and "relurk" now, to describe (I assume) coming in out of the lurker's shadows and retreating back into them. If you were just in the middle of writing to me about this, please relurk.
  2. There is a disanalogy between the two expressions I discuss in that lurking in its original sense describes nefarious activity or impure motives, so there is a reason for the normal practice of not using it about ourselves. This aspect is not there with according to me: being an information source is nothing to be ashamed of. It's just that English speakers normally use according to X for attribution to information sources external to themselves.
  3. I simplified the constraint to make the point more briefly, but in fact there is a more general formulation which is more interesting. It is not just about the speaker, because the same oddness arises in indirect discourse. That is, it is not just ?According to me, US policy is all wrong that is odd; the same oddness is there in ?John explained that according to him, US policy was all wrong. The phenomenon this illustrates is called logophoricity. Some languages have special logophoric pronouns so that (among other things; see Christopher Culy, "Logophoric pronouns and point of view" Linguistics 35 [1997], 845-859) they can keep grammatical track of the difference between a pronoun referring back to the person whose point of view is being taken or whose thoughts or experiences are being represented and a pronoun referring back to someone else. (This is not a full explanation of logophoricity, or a particularly good one; but you see, I started out just trying to write a one-liner about Umberto Eco not appreciating that according to me is not used in logophoric contexts in English. But then people started mailing me and it all got weird. People often ask why Language Log doesn't have open comments and doesn't publish email addresses all over its pages. The reason is that there are roughly 8.34926 gzillion things to be said about almost any piece of language we comment on, roughly 1.02981 gzillion of them being true -- though often irrelevant -- and 7.31945 gzillion being false, and we try to protect ourselves from having more than a few thousand of them being reported to the mighty Language Log organization on any given day.)

Posted by Geoffrey K. Pullum at 04:27 PM

Do double modals really exist?

I have not studied the primary data on the so-called "double modals" of which Ben Zimmer speaks, though I have of course been hearing about them occasionally for decades (if you ever give a lecture about the syntax of the English auxiliaries, people who can't think of anything else to ask will ask you in the question period whether you have anything to say about double modals). As a member of a distinguished department of Linguistics and English Language, I might be assumed to have done some research on their origins in their emergence in the American South or their possible antecedents in the British Isles; but I have not. I am off duty at the moment. But I just want to air a speculation. I think there might be no double modals at all. I think it might be just a matter of the emergence of a small number of new adverbs with a rather strong preference for being used before certain modals. One new adverb was clearly spawned from a modal when may and be merged to form maybe. That is now standard, and can be placed wherever adverbs of probability go. I think it is possible that might (in origin, the preterite form of may) has also turned into an adverb, of rather limited distribution, in certain non-standard dialects in the South. That would account for all the sequences Ben mentioned: might could, might should, might oughta, might would, might would've; all of them.

The thing is, nobody seems to get random double modals. There are between eight and a dozen modal verbs: definitely can, dare, may, must, need, ought, shall, and will, and a few other marginal items: the had of the idiom had better, the would of the idiom would rather, the is of the is to sequence that is roughly synonymous with must, and possibly for some speakers also the used of the used to construction (though for me the latter does not have auxiliary behavior at all, so it can't be a modal auxiliary; it all depends on whether you can say Used you to live near there? and It usedn't to be allowed). Some of the modals have quite peculiar restrictions: (i) shall is almost extinct for many Americans; (ii) must has no preterite so it is limited to present tense; (iii) ought takes a to-infinitival complement for many but not all English speakers; and (iv) dare and need are limited to non-affirmative contexts (though they have non-modal regular verb twins that are not). But ignoring those peculiarities, there are 56 logically possible two-modal combinations of the eight most basic modals (8 × 7, leaving out a modal paired with itself). Here they all are for your perusal:

can dare      can may       can must      can need
can ought     can shall     can will      dare can
dare may      dare must     dare need     dare ought
dare shall    dare will     may can       may dare
may must      may need      may ought     may shall
may will      must can      must dare     must may
must need     must ought    must shall    must will
need can      need dare     need may      need must
need ought    need shall    need will     ought can
ought dare    ought may     ought must    ought need
ought shall   ought will    shall can     shall dare
shall may     shall must    shall need    shall ought
shall will    will can      will dare     will may
will must     will need     will ought    will shall

(If you want to bring in had and would and is in addition, then there are 11² - 11 = 110 combinations.)
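
For anyone who wants to check the counting, here is a minimal Python sketch. It treats a "combination" as an ordered pair of two different modals, which is how the list above is laid out; the variable and function names are just illustrative choices, not anything from the literature on double modals.

```python
from itertools import permutations

# The eight basic modals named in the text, in the order of the list above.
BASIC_MODALS = ["can", "dare", "may", "must", "need", "ought", "shall", "will"]

# The three marginal items mentioned in the text (had better, would rather, is to).
MARGINAL_ITEMS = ["had", "would", "is"]


def two_modal_combinations(modals):
    """Return every ordered pair of two distinct modals as a string."""
    return [f"{first} {second}" for first, second in permutations(modals, 2)]


basic = two_modal_combinations(BASIC_MODALS)
extended = two_modal_combinations(BASIC_MODALS + MARGINAL_ITEMS)

# n items give n * (n - 1) ordered pairs, i.e. n squared minus n.
print(len(basic))     # 8 * 7
print(len(extended))  # 11 * 10
```

Note that might is not among the eight, so none of the might-initial sequences people actually report appear anywhere in this space of pairs.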

How many of these are actually attested? My (admittedly very shallow) acquaintance with the dialects exhibiting the so-called double modals suggests that approximately none of them are. People report might being used in a way that might suggest it is an adverb favoring the position between subject and predicate, and they report a sprinkling of other combinations; but nothing systematic.

Notice, in I might have one out back you can't tell whether might is a modal or an adverb. That would be one of the key facts that led to the emergence of the adverb: if you misanalyze it as having the same structure as I possibly have one out back rather than I could have one out back, you have assumed that might is an adverb.

So like I say, this isn't research I'm doing here, I'm just speculating; I'm not on duty qua linguist. But I'm thinking it is just possible that double modals don't exist. Research could be done that might would bear on the question.

Update: I've revised a little of the above text since first writing it. And Ben Zimmer has kindly provided me with a list of the modal-modal combinations in Marianna di Paolo's data (1989). In addition to the cases where might occurs before a verb phrase beginning with a modal (she has might could, might oughta, might can, might should, might would, might had better/might better, and might supposed to, also incorrectly listing might've used to as a double-modal use), there are cases where may is used in a very similar way, i.e., in a way that suggests it could just be an alternant for maybe (she gathered may could, may can, may will, may should, may supposed to, and may used to; she also incorrectly cites may need to, but that is just may with the regular verb need).

However, apart from the may/might examples there are just these cases:

should oughta
can might
used to could
musta coulda
would better
could might
oughta could
better can

Plenty of food for thought there; all I'm pointing out is that (as di Paolo herself noted) it's not a broad representative sample of all the logically possible combinations.

Oh, one other thing: she also gathered a case of might woulda had oughta. Now that's more like it! If cases like that spectacular beauty were commonplace (and they might be for all I know), then I wouldn't remain skeptical very long about double modals.

Posted by Geoffrey K. Pullum at 05:28 AM

Might would have

You don't get to hear a finely turned double modal from a major presidential candidate very often these days. On "Fox News Sunday," the host Chris Wallace grilled former Arkansas Governor Mike Huckabee about his plan for a "fair tax," i.e., a sales tax of 23 percent on goods and services that would supplant the federal income tax. Wallace pointed out that a bipartisan commission appointed by President Bush had firmly rejected this type of tax as unworkable. Huckabee responded:

Well, the only problem is twofold. First, they didn't really study the fair tax. They simply studied a type of consumption tax, not the actual proposal that was designed by some of the leading economists in this country. The fair tax was not just something I cooked up. Frankly, I wish I had. But it was designed by the leading economists from MIT, Boston University, Harvard, Stanford, people like Arthur Laffer, one of Reagan's key economic architects. Had $20 million of research that went into the proposal.
So, if this bipartisan commission had actually studied the fair tax, they might would have had a different conclusion. [audio]

Double modals with might, such as might could, might should, might oughta, and might would, are chiefly limited to Southern and South Midland dialects, and they can sound downright peculiar to anyone outside of those regions. MWDEU states:

The might in these constructions seems to intensify the notion of possibility or speculation; to the outsider who does not use the forms, the might seems to be similar in force to perhaps.

Marianna Di Paolo deals with these constructions in greater detail in her article "Double Modals as Single Lexical Items" (American Speech, Vol. 64, No. 3, Autumn 1989, pp. 195-224). Huckabee's "might would have" falls under Di Paolo's "hypothetical" category, as illustrated by:

I might would've done it if he'd've told me to.

Further examples of this type can be found in "The Pragmatics of Multiple Modal Variation in North and South Carolina" by Margaret Mishoe and Michael Montgomery (American Speech, Vol. 69, No. 1, Spring 1994, pp. 3-29). Mishoe and Montgomery note an unusual case of "might would have" spoken by a television character well outside of the expected dialect region for the construction:

I'm gonna give a little demonstration of what might would have happened to that guy if we'd have fought.

That sentence was spoken by Cliff Clavin on "Cheers," set in Boston. (And the actor, John Ratzenberger, is from Bridgeport, Connecticut.) That's a real anomaly, since as Mr. Verb recently pointed out, double modals tend to be heavily stigmatized even in regions where they're known and used. But perhaps Huckabee's "might would have" is a dog whistle to working-class Southern supporters, demonstrating he's really a man of the people. If so, look for some one-upsmanship from Tennesseean Fred Thompson. Maybe he'll tell Huckabee that he might should oughta change his mind about that "fair tax."

[Update: More from Geoff Pullum here, and from Joe Salmons here.]

[Update, Nov. 25: Fred Thompson did indeed match Huckabee's double modal on "Fox News Sunday" the following week. Details here.]

[Update, Nov. 27: Further explorations from Geoff Pullum here, here, and here, and from Joe Salmons here.]

Posted by Benjamin Zimmer at 01:12 AM

November 19, 2007

Exhausted grammar

"Pardon My Planet," (Nov. 14):

Brett Reynolds posted this on his English, Jack blog, with a note: "I wonder if Shatner ever felt that he was hyphenating his words. Likely this is more of a cartoonist's thing." What I'm wondering is, why the use of hyphens rather than some other punctuation to separate the words?

One common method of representing this sort of prosodic pattern (with each word treated as a distinct intonation unit) is separating the individual words by periods. This type of word-by-word emphasis came up here in a few posts back in March (here, here, and here). At the time we were considering the snowclone "Best./Worst. X. Ever." — modeled on the putdown uttered by Comic Book Guy on "The Simpsons": "Worst. Episode. Ever." (For extra oomph, exclamation points could be used, as I suggested would be appropriate for ESPN's Chris Berman when he uses his irritating gimmick in NFL recaps: "He! Could! Go! All! The! Way!")

The "exhausted" style exemplified by the "Pardon My Planet" strip is certainly different from the sneering delivery of Comic Book Guy, but the separation of words by periods (rather than hyphens) works well here too, to indicate a melodramatically labored effort to get each word out. Here are some period-delimited examples of exhausted grammar from online sources where, as in "Pardon My Planet," the first-person pronoun is dropped before modals like must or can't:

Must. Vent. Spleen. (Barbelith Underground)
Must. Force. Self. To. Act. (Grayblog)
Must. Get. Tunes. Out. Of. Head. (Drowned in Sound)
Must. Fight. Urge. To. Become. Apple. Developer. Again. (Channel 9 Forums)
Must. Work. Harder. Must. Secure. Tenured. Position. At. Oxford. (Ghost of a Flea)
Must. Fight. Urge. To make. Snarky. Comment. And. End up. Under. NSA. Surveillance. (Blog, MD)

Can't. Stop. Laughing. (Culture Strain)
Can't. Sleep. Must. Blog. (Ghost of a Flea)
Can't. Stop. Tinkering. Must. Turn. Off. Computer. (Film Experience Blog)
Can't. Rest. Till. Every. Perpendicular. in World. Abolished! (Catholic and Enjoying It)

Even more common in the exhausted/labored/breathless style is delimitation by ellipsis:

Must... control... fury... (Volokh Conspiracy)
Must... buy... sugary... products. (Blah Blah Flowers)
Must... stop... cognitive... dissonance... (Corrente)
Must.....reassert....intellectual...credentials. (Venusburg)
Must ... restrain ... sarcastic ... remarks ... (Brainsluice)
buy...all things...Jeep. (Engadget)

Can't... stop... searching... (Metafilter)
Can’t... resist... the Dickies. (See Magazine)
Can't ... hold ... out ... much ... longer! (Philadelphia Weekly)
Can't.....distinguish.....brilliant....satire....and.....incoherent......vitriol.... (Barbelith Underground)

(Google doesn't help too much in searching for examples like those above, except for characteristic phrases like "must fight urge," but I'm fortunate to have access to an online corpus that allows searches on punctuation.)
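
For readers who have their own pile of text to search, the two punctuation styles can be roughly approximated with regular expressions. This is only a sketch under my own assumptions: the patterns and the function name classify_exhausted are hypothetical, they only look for Must and Can't at the start, and they only handle one-word units (so "To make." or "in World." would break a match). It is not a description of how the corpus mentioned above does its searching.

```python
import re

# Period style: "Must. Vent. Spleen." -- the opener plus at least two
# further single words, each closed off by its own period.
PERIOD_STYLE = re.compile(r"\b(?:Must|Can['’]t)\.(?:\s+[A-Za-z]+\.){2,}")

# Ellipsis style: "Must... control... fury" -- the opener, a run of three
# or more dots, then at least two more words separated by further dot runs.
ELLIPSIS_STYLE = re.compile(
    r"\b(?:Must|Can['’]t)\s*\.{3,}(?:\s*[A-Za-z]+\s*\.{3,})+\s*[A-Za-z]+"
)


def classify_exhausted(text):
    """Return 'period', 'ellipsis', or None for a line of text."""
    if PERIOD_STYLE.search(text):
        return "period"
    if ELLIPSIS_STYLE.search(text):
        return "ellipsis"
    return None


for line in ["Must. Vent. Spleen.", "Must... control... fury...", "I must stop now."]:
    print(line, "->", classify_exhausted(line))
```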

Now, what about the Shatner factor? William Shatner, especially in his Captain Kirk persona on the original "Star Trek" series, has been widely mocked for his hammy elocution. One notorious example is his dramatic reading of Elton John's "Rocket Man" at the 1978 Science Fiction Film Awards ("I'm a rock ... it ... man!"). But I don't know if Shatner is really the ultimate model for the exhausted word-by-word delivery (unless there's a relevant "Star Trek" episode that I'm unaware of). It seems more likely that comic-book superheroes (and supervillains) provide a firmer historical basis for this. I'm thinking particularly of Superman, when immobilized by exposure to kryptonite:

Then again, there are probably numerous sci-fi/fantasy predecessors for this — much like the snowclone "What is this X of which you speak?" (discussed here and here) — congealing into commonly recognized (some would say clichéd) conventions for representing jocular pseudo-exhaustion.

Posted by Benjamin Zimmer at 11:42 PM

Ritardando al fine

One of the oldest and best established analogies between speech and music is the tendency to slow down at the ends of phrases. This is a natural consequence of the way our motor system performs rapid temporally-complex actions. But it has a perceptual side as well, since it reliably marks the structure of performances; and it can be consciously or culturally modulated, as the existence of the musical instruction ritardando al fine suggests.

In a post last year ("The shape of a spoken phrase", 4/12/2006), I showed some pictures of this effect in conversational speech; and a couple of days ago, I laid out a detailed design for a simple experiment, suitable for use in an introductory phonetics course, to examine the phenomenon in the laboratory ("Design for a class unit on cross-linguistic final lengthening", 11/17/2007).

That experiment consisted of reading 100 7-digit strings, in the style of American telephone numbers, arranged in such a way that each of the 10 digits occurs equally often in each of the 7 positions, and each of the 100 two-digit sequences occurs equally often spanning each adjacent pair of positions.
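
One way to get a list with those balance properties, offered as a sketch and not as the construction actually used for the experiment: start each string with one of the 100 possible two-digit pairs, then make every later digit the sum (mod 10) of the two before it. Because each step maps adjacent pairs one-to-one onto adjacent pairs, every two-digit sequence turns up exactly once at each pair of neighboring positions, and so every digit turns up exactly 10 times in each position. The function names here are hypothetical.

```python
from collections import Counter
from itertools import product

N_POSITIONS = 7


def make_digit_strings():
    """Build 100 seven-digit strings from all 100 starting pairs."""
    strings = []
    for first, second in product(range(10), repeat=2):
        digits = [first, second]
        while len(digits) < N_POSITIONS:
            # Each new digit is the sum of the previous two, mod 10.
            digits.append((digits[-1] + digits[-2]) % 10)
        strings.append(digits)
    return strings


def is_balanced(strings):
    """Check the two balance conditions described in the text."""
    # Each of the 10 digits must occur 10 times in every position.
    for pos in range(N_POSITIONS):
        counts = Counter(s[pos] for s in strings)
        if len(counts) != 10 or set(counts.values()) != {10}:
            return False
    # Each of the 100 two-digit sequences must occur once at every
    # adjacent pair of positions.
    for pos in range(N_POSITIONS - 1):
        counts = Counter((s[pos], s[pos + 1]) for s in strings)
        if len(counts) != 100 or set(counts.values()) != {1}:
            return False
    return True


def format_phone(digits):
    """Lay out seven digits in the 3+4 style of an American phone number."""
    text = "".join(str(d) for d in digits)
    return f"{text[:3]}-{text[3:]}"


strings = make_digit_strings()
print(len(strings), is_balanced(strings))
print(format_phone(strings[12]))
```

A list built this way includes some odd-looking items (the all-zeros string, for one), and the order of the strings would presumably want shuffling before anyone reads them aloud.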

While watching the first half of an uncompetitive football game, I've segmented my own reading of the list.

Here's what the overall duration pattern looks like:

That is averaged across all 10 digit types. The individual digit types all show the general pattern of final lengthening, including the smaller amount of lengthening at the end of the first sub-phrase, but the details are quite different, case by case:

In particular, the phrase-position effects are superimposed on a large difference in basic duration. These differences are due in part to different numbers of syllables and segments, but also to different intrinsic durations of the vowels and consonants in question. This large effect of intrinsic duration is one of the things that separates speech rhythm from musical rhythm, where such effects (e.g. due to notes or note-sequences that are difficult to perform) do occur, but are suppressed as far as possible. (The implications of this for traditional distinctions between "stress-timed" and "syllable-timed" languages have yet to be entirely straightened out, since there are no reliable tendencies for speakers to adjust their performances even slightly in the direction of the allegedly isochronous intervals.)

If the digit-string data were segmented more finely (e.g. distinguishing the various pieces of six or seven), we could learn some things about the linguistic character of the effect, which is by no means a simple slowing of overall time. There is also something to be learned from repeating the experiment at two different speaking rates; and from comparing individual differences across members of a class, or across languages for which speakers are available.

For those who are tempted to head in that direction, some additional help may be provided by the R script that generated the pictures in this post, and the relevant data files.

[Readers with sharp eyes and good memories may have noticed that at least one minor aspect of these pictures seems inconsistent with the durational shape of 4-word phrases shown for conversational speech in my earlier post:

There, the second of four words was slightly shorter than the first and the third; here, it's slightly longer. My guess is that this is a rhythmic effect, caused by the typically alternating pattern of phrasal stress in telephone numbers. A variation on this experiment would be to look at stress patterns (e.g. ONE two three FOUR five six vs. ONE two THREE four FIVE six) crossed with phrasing (e.g. 12 - 3456 vs. 123 - 456 vs. 1234 - 56).]

[Update -- John Burke writes:

In the film "Class Action," a lawyer cross-examining an elderly witness asks him if he's familiar with several number strings; one of these turns out to be his own phone number, but with the spoken digits grouped unconventionally, rather than in the standard pattern with lengthening of the third digit. (I'm fairly sure the conventional division of a phone number into area code, prefix, and individual four-digit number harkens back to the days when the prefix was an "exchange" like "Mission 6" or "Rhinelander 4," a real building in a neighborhood where all the phones with that prefix were located.) The witness doesn't recognize the number, and the lawyer uses this to cast doubt on the accuracy of his memory for other events.

As someone who used to work for the phone company, and who remembers when "area codes" were introduced, I can confirm that the 3+4 division dates to the days when "exchanges" (corresponding to the first two digits) had names. When I was growing up, for example, my family's phone number was HArrison 3-4488. The detailed history is discussed (of course) in the Wikipedia entry.]

Posted by Mark Liberman at 07:10 AM

Syncretic resonance of the day

Jim Gordon writes:

I was brought up short in an English lesson by a Brazilian student's use of a term, seeming originally from English, that I'd never heard: "making off", having nothing to do with carrying away. The term apparently has gained some currency in French, Italian, Spanish and Brazilian Portuguese, although I can't find it in on-line dictionaries or translation engines. A search returned a fair number of G-hits, some of which were for the usage I sought, but without conveying enough defining comment or context. Finally, in the blog of an anglophone in Galicia, Spain, I found the following:

Un making off — The recording of the director and actors describing the making of a film.

And, indeed, it seems to be a drift away from "making of", and now the wave is coming back to us.

I was intrigued and amused that the only occurrence of "making off" in Language Log seems to have been in an article about semantic drift, starting from "making off with booty."

The same blog also presents this lovely example of the bebop-aroo effect:

Early yesterday evening, my film director friend called to say that the 'making off' scheduled for yesterday wasn't now going to take place. As if I hadn't guessed. Instead, we would meet today at 10am in Vigo. When I asked for the location, things went like this:-
In Café Bangkok, in Rosaléa de Castro street.
Bangkok? The Asian capital?
No, Bangkok, the painter.
Bangkok the painter? Could you spell it, please
Oh, Van Gogh!
Yes, Bangkok.

Here's another example of making off in Spanish:

...mas que un tutorial podría ser un making off, pero bueno asi veis como lo he realizado...

And in French:

Il s'agirait d'une video du making-off du film Titanic...

And in Portuguese:

Participar do "making off" dessas fotos maravilhosas, foi mais do que um grande prazer.

And in Italian:

Così mi è venuta l'idea del making off, che nella prima stesura rappresentava quasi il sessanta per cento del film.

It's enough to make you suspect an origin in vulgar Latin -- except that there are plenty of examples in German ("Nachdem es ja jetzt hip ist über alles und jeden ein Making off zu machen, können wir uns dieser Sache an dieser Stelle auch nicht entziehen."), and Dutch ("Vandaag is op het net het making-off filmpje opgedoken waarbij je de 'gangster' aan het werk kunt zien."), and probably in other non-Romance languages as well.

Instead, pending confirmation from the Language Log Terminology Committee (paging Arnold Zwicky...), we can tentatively identify this as a cross-linguistic eggcorn.

[Update -- Fernando Pereira writes:

There's a v-b gradient between Portuguese and Galician (Galego), with "v" dominating South of the Douro, and "b" becoming stronger as you move North, across the border, and to Vigo. A famous white grape that the best Portuguese vinho verdes are made from is "Alvarinho" for us, "Albariño" in Galicia. A cow "vaca" is pronounced "baca" in extreme North Portugal and in Galicia. This difference was used in comedy in Portugal because people from Galicia immigrated to Lisbon from the late 19th century to the mid 20th century, opening small restaurants, greengrocers, and other neighborhood businesses.]


[Update #2 -- Slavomír Čéplö writes:

In response to your challenge ("and probably in other non-Romance languages as well"):

"Prosim ta ked by si mal niekedy cas mohol by si uploadnut nieco z nasledujucich making off´s"
( note the English plural, the Slovak plural would be "making off-ov")
"nijaky premium pack s making off videami..."
"Teraz v apríli sme robili spoločne réžiu záznamu a making-off rakúskeho muzikálu R. Baumgartnera Sisi"

"Madonny, ke kterému je přifařen i vlastní Making Off (tedy film o tom, jak se klip natáčel)."
"V upoutávce na making off zvolili dost úsměvné věty"
"nakonec bude předveden i nějaký autorův Making-off"

"Strona składa się z 4 dużych części: portfolio agencji, nagrody, szkolenia i making-off, czyli kulisów produkcji."
"A jak oglądam na tv making off to cały czas przewijam ten moment jak Tom jest w tej kurtce"

"Föleg ha egy making off-ot megnézünk a GT-röl"
"A Bad Boys II making off-ját megtekintve számos helyen mutatták be a digitális művészek, hol, ki volt digitális karakter."

"Making off-pätkät on tarpeellisia varsinkin muille videoita tekeville."
"DVD:n extroissa oleva making-off oli mielenkiintoinen..."

"ma nafx imma waqt li kont qed nara il-making off u rajt lil kristina..."

While searching for Slovak examples, I have also come across this interesting bit in what I assume is Breton:
"Aze e vo kavet ganeoc'h pep tra diwar benn pennoberenn Diwan, interview ar c'hoarierien, ur making off, an arvestoù c'hwitet gant hag all hag all. Pediñ a ran ac'hanoc'h da bellgargañ ar film en e bezh a benn kaout plijadur ha c'hoarzhioù e leizh dirak ho skramm."

Meanwhile, Arnold Zwicky wrote:

I think you've definitely found a re-shaping here, possibly an eggcorn (I'm not sure about the semantics), but not something that's crucially bilingual. There's a big pile of hits for {"the making off"} in English, from presumably native speakers:

I already talked about the amazing new Sony Bravia ad (THERE) but here is an even more amazing video about the making-off of the advert.

part of the making off of the album imagine, they are rough cuts ... at one point John Lennon complaining that the technician has to change tape in the ...

Forced Founders: Indians, Debtors, Slaves, and the Making off the American Revolution in Virginia. (Book Reviews).~(book review) from Journal of Southern History ...

Mostly you seem to get "the making off of" a film or recording or whatever, but the third one above, from a history journal, was startlingly different. The actual book under review has "of", by the way.

The third one seems like an old-fashioned typographical error, though perhaps it was influenced by the "making off" version.

As for the semantics, I construe it as related to the uses of "off" in offshoot, offprint, run-off (in all three senses), cast-off, etc.: something extra created as a by-product or residue.

I have no idea what the order of influence has been, but (like Jim) I'd never heard of this before, although it seems to have become a common usage in several other languages; so it seems plausible that it might have spread among non-native speakers with limited understanding of the vagaries of English spelling-sound correspondences in the area of fricative voicing.]

Posted by Mark Liberman at 06:49 AM

November 18, 2007

Grunt Glossary

This morning's Zits provides a "Glossary of Grunts", subtitled as a "public service guide to interpreting the language of the teenage species":

Based on my recent experience of middle-schoolers, this seems accurate if incomplete.

I don't think that the inventory of such sounds is much different now from what it was like when I was 12, except for Homer Simpson's "d'oh" (which is really Dan Castellaneta's "d'oh"). But I believe that the frequency of grunts relative to other forms of communication may well be higher. On the other hand, maybe it's just that grunts are more salient when you're the audience and an adolescent is the author, as opposed to the other way around.

I don't know any real "glossary of grunts", e.g. a transcription standard with examples and guidelines for distinguishing one kind from another, and perhaps a scheme for quantifying relevantly graded properties. If you know of any candidates, please tell me.

[Update -- Gwillim Law writes:

If you go to the Turner Classic Movies website, click on November 14 to bring up Matt Groening, and then click on Video Interview, about halfway through the film clip you will hear MG say that Dan Castellaneta credited James Finlayson, in the Laurel and Hardy movie "Way Out West", as the prototype for his "D'oh" rendition.

I'm very glad to learn this, both for the value of the particular piece of information, and as an example of paralinguistic antedating. I hope that my colleagues at the Oxford English Dictionary are paying attention.

...And Ben Zimmer quickly points out

Not to worry-- the OED entry for "d'oh" already recognizes Castellaneta's explanation...

Popularized by the American actor Dan Castellaneta who provides the voice for the character Homer Simpson in the U.S. cartoon series The Simpsons. The quotation below is his own description of its origin: 1998 Daily Variety (Nexis) 28 Apr., The D'oh came from character actor James Finlayson's "Do-o-o-o" in Laurel & Hardy pictures. You can tell it was intended as a euphemism for "Damn". I just speeded it up. Although the word appears (in the form D'oh) in numerous publications based on The Simpsons, the scripts themselves simply specify annoyed grunt (as did the very earliest). Unofficial transcripts of the programme suggest the first spoken use was in a short episode, Punching Bag, broadcast on 27 Nov. 1988 as part of The Tracey Ullman Show. Its earliest occurrence in the full-length series was in the first episode Simpsons roasting on an Open Fire, broadcast on 17 Dec. 1989.

I should have checked! ]

Posted by Mark Liberman at 04:41 PM

November 17, 2007

On the screen

Caught on a screen very near me two weeks ago: a cute playing with anaphoric pronouns in an old episode of Taxi; an incredibly irritating editing of Eddie Murphy's 1987 Raw performance; and some great London street speech in the movie Metrosexuality.

I rarely work in complete silence.  I almost always have the radio, the TV, a movie, or my iTunes playing in the background.  I've  done this since I was a child.  (Please don't write me about my work habits.)  But this stuff has to be a background track, especially since when I'm deeply into the flow of writing or thinking I entirely cease to attend to the background.  Things that actually REQUIRE close attention just won't work: movies in languages I can't follow (so I have to look at the subtitles), for instance, or music that moves me strongly in one way or another.

Trash television and trashy movies are especially good for my purposes, and they have the virtue of providing me with occasional (but not too frequent) examples of linguistic phenomena I collect.  Mutant-creature movies -- giant or newly vicious (or both) ants, yellowjacket wasps, bees, snakes, fish, cats, whatever -- are especially fine.  Also mostly fine are things I've seen or heard many times before; I can tune in occasionally when something memorable comes along.  Monty Python can run in the background, and every so often I'll take a break to speak lines along with the Pythons.

ANAPHORA.  Which brings me to Taxi, a classic American sitcom I've seen all the episodes of several times.  Saturday two weeks ago, spurning KFJC's Norman Bates Memorial Soundtrack Show (which is somewhat distracting because Robert Emmett, the host, is a very heavy user of Extris), I went through the first third of Taxi's first season on DVD.  Not perhaps the best choice in the world, since I kept finding things to take note of.  Including this wonderful exchange, from "Bobby's Acting Career" (first shown on 10/5/78):

[Alex, the show's main character, comes into the Sunshine Cab garage with a great dane]

Bobby: Where'd you find him?
Alex: I took him away from some guy in my cab; he was whipping him with his leash.
Tony [to the dog]: Hey, you shouldn't do that, boy!  You could hurt somebody.

Tony Banta, a cabdriver who's also a boxer and who's a bit on the slow side, gets the (intended) antecedents reversed (this is endearing, because he's looking at things from the point of view of the dog).  No sensible person would get the antecedents wrong.

(But simpler examples than this are trotted out, out of context, in textbooks and advice manuals, as instances of "unclear antecedents" for pronouns, in this case antecedents that are labeled unclear because they're said to be ambiguous.  I have a whole series of postings in the works on "unclear antecedents", including one on "more than one antecedent".  Bottom line: the advice material totally fails to take into account real-world plausibility and discourse organization, and these factors are absolutely crucial, here and with regard to the recently-discussed modifier attachment.  There are some bad-news examples -- I have a collection of them and occasionally post about them here -- but people mostly don't have trouble locating antecedents.)

The problem for me as someone working while viewing was that I had to stop and get this whole exchange down.  Not conducive to work.

BLEEP.  The next morning I thought I'd catch Eddie Murphy's 1987 performance Raw (at Madison Square Garden) on the Bravo Channel.  I wasn't prepared for Bravo's massive bleeping of all the taboo vocabulary.  It was seriously disconcerting: whole chunks of Murphy's shtick were reduced to function words with blanks, and since the routines were fast-paced, you actually had to listen carefully to them to guess at what had been elided.

Frustrating indeed.  After a little while I abandoned this bizarre event.

INNIT.  And passed on to a DVD of Metrosexuality, a film (originally, a TV show) set in London's Notting Hill district, with a large cast of characters, of several ethnicities, sexualities, and dialects.  It's very fast-paced, with lots of quick cuts.  Not really an Arnold Zwicky work thing, because it requires so much attention.  But it has some wonderful London street speech, including this telephone exchange right at the beginning, between an adolescent and his (flamingly) gay father, both black:

Dad: Just tell me what you want and be done.
Son: How about a lift home, yeh?  See, your no-good ragamuffin ex-husband ain't shown up, innit.  And I don't got no bus fare, innit.
Dad: But you do got legs, innit.  And you do can walk, innit.

(Plot point: the son is trying to get his two dads, separated for 18 months, to reconcile, and is contriving to get them both in the same place at the same time, on his behalf.)

There's just so much here: the deployment of innit (which has several uses in current London street speech, going well beyond its use as a fixed question tag) and the do's in the dad's last two sentences, in particular.

Sadly, the interpretive burden was just too great for me, and I moved to less challenging things, in the mutant-creature genre.

Posted by Arnold Zwicky at 08:30 PM


One of the latest YouTube sensations is, surprisingly enough, a metalinguistic exploration of the speech patterns of northeastern Pennsylvania. It's called "Heynabonics," and it's the handiwork of the comedy troupe One Laugh at Least. The sketch takes place in a classroom, with a hyper-demotic teacher instructing clueless outsiders in "the unofficial dialect of nort'eastern Pennsylvania." Though the short film now posted to YouTube was created in 2005, the troupe has actually been performing the sketch since 1998, when the Oakland Ebonics controversy was still fresh in the national consciousness. (More background here and here.)

The word Ebonics has generated various other jocular folk-dialectal labels via blending, notably redneck-bonics and y'allbonics — though both of those examples are used to mock routinely stigmatized Southern dialect features. The "Heynabonics" sketch, on the other hand, pokes fun at a dialect region that most Americans don't consider salient enough to stigmatize. For instance, if you watch the US version of "The Office," which takes place in Scranton, Pa., you won't hear much of anything identifiable as local dialect (besides a suitably glottalized pronunciation of "Scra'on"). So the YouTube video is for many viewers probably their first exposure to the regional speech of northeastern Pennsylvania, or at least a caricature thereof.

For students of perceptual dialectology the video is reminiscent of many other folk-dialectal representations from around the U.S. Even staying within the state of Pennsylvania one can find many self-mocking catalogues of features from the Pittsburgh dialect region. Barbara Johnstone, as part of her Pittsburgh Speech & Society project, has analyzed local works of folk lexicography that instruct readers "how to speak like a Pittsburgher." And just as y'all is taken as emblematic of Southern speech, the second-person plural pronoun yinz or yunz is seen as embodying Pittsburghese, to the extent that Pittsburghers sometimes call themselves yinzers and refer to their home town as Yinzburgh. In the "Heynabonics" video, the regionally emblematic feature is not a pronoun (even though the instructor drills the students on non-standard youse) but rather a tag question: heyna.

The Dictionary of American Regional English treats this tag question under the heading haina and supplies the following 1986 citation from its files, given by a respondent from northeastern Pennsylvania:

Putting haina on the end of a statement makes the statement a question. It doesn't matter who you're talking to, or when the thing happened. "You're going dancing Friday night, haina?" means "Are you going dancing Friday night?" "He did that last night, haina?" means "Did he do that last night?"

Coalspeak, an online glossary of Pennsylvania's anthracite coal region, provides this definition:

hayna or heyna or henna or haynit: request for affirmation, like "ain't it so?" or "isn't that right?". See hain't. This is primarily a Luzerne County word, very common in Hazleton, Wilkes-Barre, and surrounding areas. On a related note, many Pennsylvania Dutch sentences end with the phrase "say not". "That sure smells like sulfur, henna?" "It sure is cold tonight, heyna or no?"

And under hain't it has:

hain't: another variation on ain't. "Hain't it?" is likely where heyna comes from.

DARE supports this derivation, since it cross-references haina with ainna, identified as a contraction of ain't it, influenced by German nicht wahr. (The citations for ainna come from areas with historically large German settlements, such as Milwaukee.) Coalspeak's linkage of heyna/haina to Pennsylvania Dutch say not may have some validity — in a discussion on the American Dialect Society mailing list last year, Arnold Zwicky mentioned another Pennsylvania Dutch variant, ai not. Elsewhere, Arnold explained that "the Pa. Dutch English tag is just a straight-out borrowing from Pa. Dutch (which is surely a calque on a German dialect tag)." Paul Johnston, meanwhile, suggested a possible connection to the Scottish tag ai no, since Ulster-Scots were also early settlers in the region, though that seems less likely.

Interestingly enough, Hindi has a similar tag question formation, hai na, roughly translatable as 'is not?'. The ADS discussion last year was sparked by a mention of the Hindi tag in a BBC News report on a dictionary of "Hinglish" as used by South Asian immigrants in Britain. The dictionary, The Queen's Hinglish, evidently derives the popular British tag question innit from Hindi hai na. An article in The Telegraph elaborates, "The ubiquitous tag 'innit' is believed to have started among Indians in the West Midlands in the 1970s via 'haina'." This conjecture, however, fails to account for much earlier attestations of innit in British English, which would seem to be a simple shortening of isn't it. Granted, the spread of innit as an invariant tag (meaning that it does not require agreement for person, tense or number) could very well have been helped along by similar tags in the native languages of immigrants to Britain, but according to Peter Trudgill the key influence was from the Caribbean rather than South Asia:

There is an enormous body of literature on the invariant tag innit in English English. The origin appears to be in London-based Caribbean-influenced varieties, where it seems to have served originally as a 'translation' of Caribbean English Creole "no?". It is worth noticing that such invariant tags are very common in areas where English has a history of being learnt as a second language, e.g. Welsh English invariant "isn't it?"; broad South African English "is it?"; West African English "is it?"; Indian English "isn't it?"; Singaporean English "isn't it? / is it?"

In any case, I'm pretty sure that Hindi hai na has no connection to northeastern Pennsylvania's heyna, but we might want to check with Kelly Kapoor.

[Update #1: Craig Close points out some blog synchronicity: on the Milwaukee Journal-Sentinel's Word to the Wise blog, Tom Tolan discusses ain(n)a, a remnant of Milwaukee's German settlers. Tolan writes:

My favorite short explanation of where aina came from I found in an article online about German-American dialect songs, by University of Wisconsin-Madison professor James Leary. He writes: "'Aina' or 'Enna,' from the English 'ain't' and the German 'ne' (a dialect rendering of 'nicht'), is a venerable 'Milwaukeeism' meaning roughly 'isn't that so?'" (This in a passage about a song called "Aina Hey" by Milwaukee musician Sigmund Snopek III, in which "Snopek chants the phrase over and over, creating a musical and verbal icon of Upper Midwestern Anglo-Germanic fusion — an abstract, modernist dialect song.")

And see Arnold Zwicky's post immediately after this one for a bit more on British innit.]

[Update #2: A nice note from the creators of "Heynabonics":

We can now say "WE'VE MADE IT." We noticed Heynabonics was featured on the U of P Language Log. Whoo Hoo!
Thanks for your great article.
By way of background, Chris O'Donnell is the writer and the 'student' with glasses was born and raised in Wilkes-Barre, the heart of Northeastern Pennsylvania. The 'teacher' Greg Korin, lived in Las Vegas and Montana before moving to NEPA for a radio gig. Shivaun O'Donnell, the female 'student' moved to NEPA to go to college. Born to two Wilkes-Barre residents who'd had the heyna wacked out of them, she spent her first two years in Wilkes-Barre trying to figure out what people were saying.
The other core members of the troupe, Karen Novick, Jack Gibbons and John Schurgard were all born and raised in NEPA.
Thanks again.
Shivaun O'Donnell and all of us at One Laugh at Least.

Also, discussion about heyna and other tag questions has been going on over at Languagehat.]

Posted by Benjamin Zimmer at 05:14 PM

Definite descriptions

Geoff Pullum has posted about the considerable difference in the acceptability of "singular they" in the two sentences

(1)  Do not speak to the driver or distract their attention without good cause.
(2)  Do not speak to the king or distract their attention without good cause.

(which differ only in (2) having king where (1) has driver, yet (2) is much less acceptable than (1)).  Geoff's explanation turns on a difference in the way "definite descriptions" (roughly, singular count NPs that have the determiner the and denote contextually unique individuals) pick out referents -- a difference made famous by the philosopher Keith Donnellan in a 1966 Philosophical Review paper "Reference and Definite Descriptions", under the labels ATTRIBUTIVE and REFERENTIAL: the driver in (1) is used attributively, the king in (2) referentially.  (In fact, Geoff uses the term referential in distinguishing (2) from (1).)

What's cool here is that Donnellan's distinction shows up in a fact about how English they is used.

[Digression: definite descriptions, as understood in the philosophical literature, have both the properties D (roughly, uniqueness) and ArtDef (having the determiner the) that I talked about in an earlier posting, on (an)arthrousness, so they're "definite" in two ways at the same time.  The standard examples of definite descriptions are singular count NPs, though there are other NPs with both the properties D and ArtDef, like the recipients of this year's Nobel Prize in chemistry.]

[Another digression: Donnellan was responding to Bertrand Russell's analysis of the semantics of definite descriptions and Strawson's challenge to it.  Donnellan's analysis, in turn, has been disputed and defended over the years, in a rich and complex literature.  (For a summary of this history, look here.  And see below.)]

Taken out of context, the driver can be used either attributively (picking out whoever is uniquely the driver in the context) or referentially (picking out some specific person and saying that, in the context, this person is the driver).  The driver of this bus in

(3) The driver of this bus is insane.

will probably be interpreted referentially: person x, who is driving this bus, is insane.  So we'd usually use a singular pronoun, he or she, for anaphora to the driver of this bus, since the speaker of (3) will know the sex of x.  They (or he or she) would be much less felicitous.

But the driver of this bus in (3) can have an attributive interpretation: whoever is driving this bus is insane.  Perhaps the speaker of (3) judges that only an insane person would drive the bus the way this person does.  It's even possible that the bus company hires only insane people to drive this particular bus.   In such contexts, the sex of the driver is not particularly important, and might well be unknown to the speaker -- so anaphoric they (or he or she) is entirely natural.

So much for the main theme.  Now, a coda.

While searching on "definite descriptions" for links to add to this posting, I came across a book by that name, edited by Gary Ostertag and published in 1998 by the MIT Press.  I somehow managed not to buy it when it came out -- I buy an awful lot of books -- but it looks like something that would interest me: it's a compendium of the classic philosophical literature on definite descriptions.

You're thinking that it's been less than ten years since this book was published, and it's obviously a valuable resource, so it should be possible to find copies for sale.  Well, on the MIT Press site, Definite Descriptions is OUT OF STOCK.  Otherwise, things are dire; the book's a rare and expensive item.  (Maybe everybody who bought a copy has hung onto it.)  For the PAPERBACK (!) edition, we find:

Alibris lists one used copy, at $298.66.  Barnes & Noble lists one used copy, at $285.58.  These appear to be the same copy, from Actinia Bookstores in Baltimore.

Amazon has one used copy, at $211.01, from Specialty-Book in Ohio (which I've been unable to find anything about).

Books-A-Million's hard-to-find inventory has one used copy, at $437.10 (or a mere $393.39, if you belong to their "Millionaire's Club").

Powell's, AbeBooks, and Biblio list no copies at all.

Somehow I don't think I'll be filling this gap in my library any time soon.

Posted by Arnold Zwicky at 12:49 PM

Design for a class unit on cross-linguistic final lengthening

Anya Lunden asked:

In a recent LL post ("The perils of mixing romance with language learning", 11/7/2007) you describe a very interesting class project on final lengthening cross-linguistically, and, even more enticingly, a class *unit* on FL. I would be very grateful for any references on cross-linguistic FL you might happen to have handy. I'm particularly interested in FL at the word level, and although this phonetic effect is established and appears to be cross-linguistic, I have been able to find relatively little data on it.

There are two questions here. One is about the literature on cross-linguistic final lengthening --  I suspect that Anya knows more about that than I do, but I'll send her a few thoughts separately. The second question is about a class project on final lengthening, suitable for comparing the size of the effect across sounds and individuals and languages.

This is something that I did about a dozen years ago, as a lab project in an introductory phonetics course. I'm afraid that the materials from that course are buried in old boxes and back-up tapes somewhere, if they still exist at all. But for my second Breakfast Experiment™ this morning, I'll try to reconstruct the recipe for the lab exercise, and show how quick and easy it is to put into effect. On some other morning, I'll take a shot at explaining why the question is interesting, for those of you who aren't already disposed to believe that it is.

The idea is to look at how the duration of a word (or syllable, or phonetic segment) is modulated by its phrasal position. In general, there are several kinds of barriers to doing this. For one thing, different segments and syllables and syllable-sequences have very different intrinsic durations, which may be modulated in different ways by emphasis and speaking rate and phrasing and so on. And there are many different sorts of phrases and phrasal relations, which may have different effects on speech timing. And rhetorical structures, and word frequencies, and the various vagaries and artistries of performance all have their important consequences. And then there are all the differences among languages. Putting it all together, an experiment to compare final lengthening across languages can be hard to design and interpret.

One way to deal with these difficulties is to look at a great deal of data, and hope that all the complexities balance out somehow. Jiahong Yuan, Chris Cieri and I took this approach, in some research discussed in an earlier Language Log post ("The shape of a spoken phrase", 5/12/2006) and published in part as "Towards an Integrated Understanding of Speaking Rate in Conversation", ICASSP 2006.

Another approach is to ask speakers to read (or repeat) phrases that have been artificially designed to put a designated set of elements in a carefully balanced and controlled set of positions in a limited set of phrases. An especially easy way to do this -- and one that is well adapted to use as a lab project in an introductory phonetics course -- is to look at structured sequences of elements like numbers and letters, such as telephone numbers, catalog identifiers and the like. These are structurally simple, semantically "flat", and rhetorically neutral, so that every element can freely occur in every structural position, and each such sequence of elements is open to about the same range of rhetorical and emotional interpretations as every other one. And such sequences translate trivially into other languages, at least those that have names for the digits or other elements that you use.

I originally tried this back around 1980, when I was at Bell Labs, as a simple and crude way to estimate the appropriate durational modifications for a speech synthesis system. It was the phone company, after all, and showing that we could do a decent job on telephone numbers made sense! But it's simple and quick enough to make a good lab project for a phonetics course. Each subject's recording takes only five or ten minutes to make and a couple of hours to measure; the students learn a fair amount about what speech sounds look like and how they interact; and the resulting data is intricately regular, offering plenty of fun statistical modeling.

Here's a bit of fancy footwork in R that does the right thing to create a balanced list of 7-digit (American-style) telephone numbers. You probably don't want to make your students understand how this works, or even to show it to them at all -- you just need to use it (or something equivalent) to create the patterns that they'll use in the experiment.

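X <- matrix(nrow=100,ncol=7)             # 100 strings of 7 digits each
X[,1] <- sample(rep(0:9,10),100)         # each digit 10 times in column 1, shuffled
for(c in 2:7){
   for(p in 0:9){
      # rows with digit p in the previous column get each of 0-9 once
      X[X[,c-1]==p,c] <- sample(0:9,10)
   }
}
write.table(X, file="sequence1.txt", row.names=FALSE, col.names=FALSE)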

(You could create similar collections of e.g. 10-digit numbers by changing 7 to 10 in the first and third lines.) The result is 100 rows of seven digits each, with the property that each of the ten digits occurs equally often (ten times) in each column, and each of the 100 possible pairs of digits occurs equally often (exactly once) across each pair of adjacent columns. The first five rows are here:

2 6 2 2 8 0 2
6 4 4 1 9 0 3
3 7 7 7 7 3 3
1 7 2 7 6 7 2
2 4 3 2 1 2 0

The whole output of this particular run is here. Of course, your results would be randomly different, since each run will have different pseudorandom permutations at each choice point.
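If you want to confirm that a generated list really has the balance properties described above, a quick check is easy to sketch. The following is one possible version in plain awk wrapped in a shell function; the name check_balance is made up for illustration, and it assumes the input is a file of whitespace-separated single digits, one string per row, like the rows shown above. It counts how often each digit occurs in each column and how often each ordered pair of digits occurs across each pair of adjacent columns.

```shell
# check_balance FILE
# Hypothetical helper (not part of the original recipe): reads rows of
# whitespace-separated digits and checks the two balance properties.
check_balance() {
  awk '
    {
      if (NF > ncol) ncol = NF
      for (c = 1; c <= NF; c++) {
        single[c, $c + 0]++                          # digit seen in column c
        if (c > 1) pair[c, $(c - 1) + 0, $c + 0]++   # (previous digit, digit) ending in column c
      }
    }
    END {
      bad = 0
      for (c = 1; c <= ncol; c++)
        for (d = 0; d <= 9; d++) {
          # every digit should fill one tenth of each column
          if (single[c, d] + 0 != NR / 10) bad++
          if (c > 1)
            for (e = 0; e <= 9; e++)
              # every ordered pair should fill one hundredth of the rows
              if (pair[c, d, e] + 0 != NR / 100) bad++
        }
      if (bad == 0) print "balanced"
      else print "unbalanced: " bad " counts off"
      exit (bad != 0)
    }
  ' "$1"
}
```

Called as check_balance FILE on the table of digits, it prints "balanced" and returns success only if every count matches; otherwise it reports how many counts are off. Nothing here is specific to gawk, so any POSIX awk should do.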

It's a good idea to format the strings so as to signal the grouping that you want speakers to use. A simple way to do it in this case, using gnu awk, might be

gawk '{printf("(%d)  %d%d%d - %d%d%d%d\n",NR,$1,$2,$3,$4,$5,$6,$7)}' sequence1.txt >sequence1a.txt

The results would start this way:

(1) 262 - 2802
(2) 644 - 1903
(3) 377 - 7733
(4) 172 - 7672
(5) 243 - 2120

The whole output in this format is here.

And if we wrap it with a bit more formatting, we can make an xhtml slide show, with some initial instructions on the first slide, and then each digit string on its own page, so that students (or the friends and acquaintances that they recruit) can easily keep track of where they are in the sequence of strings. (Someday, browsers will have advanced to the point where you could actually record each string via a javascript call, or the like, and perhaps even do automatic -- and accurate -- phonetic-segment alignment.)

I recorded the list -- it took me about seven and a half minutes. I'll post the measurements when I have a chance to do them -- segmenting this much stuff by hand takes a couple of hours, which is too much labor for one Breakfast Experiment™, especially on a morning when I'm doing two of them.

Posted by Mark Liberman at 08:57 AM

Do men and women use different parts of their natural pitch ranges?

In response to my earlier post on gender polarization in F0 ("How about the Germans?" 11/14/2007), Bob Ladd wrote:

I was delighted to see the Pride and Prejudice graph, because it shows very clearly the distinction that I talked about in my book as the difference between "level" and "span". Karen Savage has a wide span and Annie Coleman has a narrow span, but their level (defined as mean F0) is more or less identical.

But the more interesting thing that caught my eye was a small but clear difference between males and females in all the graphs you show. For the men, you have a function with gradually increasing slope as you move from one decile to the next higher one. Translating this into a histogram of F0 values, it means that the male distribution is quite skewed toward the bottom of the range. The females, however, tend to have a slightly more S-shaped function as you go from the lowest to highest decile - steeper at the extremes, and flatter in the middle.

What this seems to mean is that men are talking closer to the physiologically determined bottom of their range - baseline, if you like - whereas women are talking further above the baseline.

Bob has a good eye for trends in graphical data, as I've noted before, and so I decided to check his impression quantitatively.

One obvious number to look at would be the difference between the median pitch and the bottom of the pitch range, as a fraction of the pitch range as a whole. Since even the best pitch trackers make some octave errors, using the actual extreme values would be an even worse idea than usual, and I decided to use the 10th and 90th percentiles of each speaker's F0 values as a proxy for his or her pitch range.

If we call the three percentiles involved P10, P50, and P90, this proxy for "pitch range utilization" is then (P50-P10)/(P90-P10).
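For anyone who wants to try this on their own pitch tracks, here is a minimal sketch of the calculation in shell and awk. Several things in it are assumptions rather than a description of how the numbers in this post were produced: the function name range_utilization is invented, the input is taken to be one F0 estimate (in Hz) per line for a single speaker, and percentiles are computed by the simple nearest-rank rule, since no particular percentile method is specified here.

```shell
# range_utilization: F0 values in Hz, one per line, on standard input.
# Prints (P50 - P10) / (P90 - P10) to two decimals.
# Assumption: nearest-rank percentiles; the post does not name a rule.
range_utilization() {
  sort -n | awk '
    function pct(p,    k) {        # p-th percentile of sorted v[1..NR]
      k = int(p * NR / 100)
      if (k < p * NR / 100) k++    # round the rank up
      if (k < 1) k = 1
      return v[k]
    }
    { v[NR] = $1 + 0 }
    END {
      # no data, or a degenerate range, gives no meaningful ratio
      if (NR == 0 || pct(90) == pct(10)) { print "NA"; exit 1 }
      printf "%.2f\n", (pct(50) - pct(10)) / (pct(90) - pct(10))
    }
  '
}
```

You would run it as range_utilization < FILE, with one file of F0 values per speaker. If your pitch tracker writes 0 (or some other placeholder) for unvoiced frames, filter those lines out first, or they will drag the 10th percentile down to the placeholder value.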

I had all the F0 estimates for the 150 speakers from my earlier post lying around, so these numbers just took a few minutes to calculate -- perhaps the easiest Breakfast Experiment™ ever! I've plotted them below, on the vertical axis of a scatterplot whose horizontal axis is the median pitch.

For the females, the median pitch was on average at 0.4 of the span from the 10th percentile to the 90th percentile; for the males, the median pitch was on average at 0.28 of the 10%-90% span. So we can mark up another score for Bob's ability to see trends in graphs!

(These measurements came from 75 telephone conversations in Japanese, English, and German, with a male speaker on one side and a female speaker on the other. See my earlier post for a bit more detail about the source of the digital audio, which is part of some speech corpora published by LDC in 1996 and 1997.)

Since there's O(100,000) pitch estimates per speaker, the individual data points are unlikely to be affected much by sampling error. On the other hand, there are very likely to be some systematic pitch-tracking problems in some of the files, for example because of background noise or channel distortion -- I took a look at a few of the pitch tracks to satisfy myself that things are working approximately as they should, but I didn't check systematically. (There may also be some issue with non-linear effects in some speakers' voices, leading to lots of period-doubling in low-pitched regions. It wouldn't surprise me to find that this was happening with some of the speakers with relatively high values on the y axis in the graph above.)

Note that there's relatively little overlap in the median pitch values -- pitch of the voice is one of the few secondary sexual characteristics where distributions for the sexes are almost completely separated. In contrast, there's quite a bit of overlap in the pitch-range-utilization statistic, despite the clear trend.

The most obvious explanation for the pitch-range-utilization effect is that males are choosing (perhaps unconsciously) to speak lower in their pitch range, and females are choosing to speak higher in their pitch range, in order to exaggerate the natural sex difference in the pitch of the voice. But there might be some anatomical and physiological reasons as well -- e.g. it takes more energy to stretch larger vocal cords, or the effects of a given increase in subglottal pressure are smaller for longer and more massive vocal cords. (Talk is cheap, metabolically speaking, so I would be inclined to discount those particular explanations, but there could well be some effects of that general kind.)

Posted by Mark Liberman at 08:36 AM

November 16, 2007

Opening Parliament and deliver a speech

Back on 11/5/07, I caught this on BBC News on NPR:

Queen Elizabeth will be opening the British Parliament ... and deliver a speech.

I balked.  It sounded to me like a GoToGo sentence --

I'm going to the library and study for my test.

in that it seemed to have a present participial VP (opening the British Parliament, going to the library) paired in coordination with a base-form VP (deliver a speech, study for my test), when only the first structure would be licensed by the relevant head V, (apparently) be.  But the cases differ: the details aren't quite the same (GoToGo mostly has going to, or at least some present-participial motion verb plus infinitival to, while the QE example seems to have no such restriction on its first conjunct); and for me (and some others), the GoToGo examples are just fine, while the QE example struck me as bizarre.

Then I saw that the QE example had an acceptable parsing, though not one that was easy to discern.

This is the parsing in which the VP is

[ will ]  [ [ be opening the British Parliament ] and [ deliver a speech ] ]

that is, as a reduced variant of

[ will [ be opening the British Parliament ] ] and [ will [ deliver a speech ] ]

from which the shared head will can be "factored out".  The result is a coordination of two base-form VPs (be opening the British Parliament, deliver a speech), so what's the problem?

Well, two problems.  The first, and more subtle, is that though both VPs in this analysis are in the base form, and so are in some sense parallel, they are different internally: be opening the British Parliament is a progressive VP, deliver a speech an unmarked-aspect VP.  Now though it would be lunatic to require in general that conjuncts be internally parallel (see my extended attack on this idea here), sometimes the internal composition of the conjuncts does make a difference.  The trick is to figure out when and how.

I'm a linguist, so I play with the variables and see what happens.  First, I note that other progressive+plain examples are also dubious:

I must be going soon and deliver a speech.

I have been going to Paris and gone to Vienna as well.

but that the reverse ordering is much less jarring:

I will deliver a speech in Antwerp and be going to Brussels soon after.

I have gone to Vienna and been going to Paris as well.

I've jiggled the context a bit to improve these, but the important points are (a) that it's incredibly hard to improve the progressive+plain examples much by jiggling the context, and (b) the reordered versions are hugely better than the originals.  Similarly, compare the original QE example with this (improved) variant:

Queen Elizabeth will open the British Parliament ... and then be delivering a speech.

So, yes, ceteris paribus, we'd often prefer parallel conjuncts to be parallel internally.  BUT there's something else going on.

The extra thing is a processing strategy, an analogue to what's known in the trade as Low Attachment: when a modifier follows a modified XP ending in an XP, the default is to interpret the modifier as attached to the structurally lower of the two XPs.  So, for example, confronted with a NP of the form

NP1  [ P NP2 ]  [ P NP3 ]

without any information about the content of these expressions, most people will take [ P NP3 ] as modifying NP2 rather than NP1, that is as having the structure

NP1  [ P [ NP2 [ P NP3 ] ] ]

rather than

[ NP1  [ P NP2 ] ]  [ P NP3 ]

Advice manuals warn you against high-attachment structures, telling you that modifiers MUST be next to the things they modify, and they supply what they take to be dire examples, like this one from Richard Lederer:

An ethnically diverse crowd of about 50 gathered at the Falkirk Mansion in San Rafael yesterday for a speakout against hate crimes organized by the Marin County Human Rights Roundtable.

(Lederer understands the sentence to be saying that it was hate crimes, rather than the speakout against them, that the Roundtable organized.)

But in the real world, low attachment is not a rule but a (default) preference, and context and real-world knowledge often favor high attachment (as indeed they should in Lederer's example).  Compare the low-attachment

an inventory of errors that were printed in the NYT

with the indisputably high-attachment

an inventory of errors that was larger than any previously published

I have a pile of high-attachment examples, and so do other people.  Most such examples pass by without notice, because their interpretation is clear in context.  There is no grammatical rule here.

Back to the original case, involving coordination (rather than modification).  The be going to segment in the QE example can be parsed as be followed by a coordination:

[ will ]  [ be [ [ going to ... ] and [ deliver a speech ] ] ]

(a kind of low attachment for deliver a speech), or it can be parsed with be going to as the beginning of the first conjunct:

[ will ]  [ [ be going to ... ] and [ deliver a speech ] ]

(a kind of high attachment for deliver a speech).

The first is the analysis I gave to the QE sentence at first, without reflection.  For whatever reason (psycholinguists, to your labs!), we very much prefer low attachment here -- but it just won't fly, because of the non-parallel conjuncts.  High attachment hard to get, low attachment not parallel.

The reason reordering the conjuncts improves things so much is that in the reordered coordinations there is no choice in the parsing of the be going to... segment, since it's in the second, and final, conjunct, which has to be taken, as a whole, to be in coordination with deliver a speech.  Any residual problems with the reordered coordinations presumably have to do with the semantics and pragmatics of progressive aspect and unmarked aspect.

Posted by Arnold Zwicky at 04:57 PM

Communication by omission

Or by failing to comment on an omission. Apparently, some forms of communication consist of what is not said about the fact that what was not said was not said:


Posted by Mark Liberman at 08:24 AM

Think globally, protect amorphously

Because of earlier Language Log posts on constructions like "Drive Safe" and "Think Different", a reader in Ridgefield CT thought we might be interested in the latest episode in this grammatical saga:

Recently, there's been a slight fracas at my local planning and zoning commission over signs saying "Shop Local". The board almost denied permits to put up the signs, with one member of the board saying they were "horrific grammar" and should instead say "shop locally." One board member (John Katz) was quoted in the local paper as saying "Just as art is amorphous, so is the concept of protecting the health, safety and welfare of the public. I believe exposing people to the horrific grammar of these signs is in direct opposition to protecting the public's welfare."

There's a related case, highlighted by the recent spread of locavorosity, where the distinction between local and locally seems to me to make a difference. Many of the 1,130,000 Google hits for {"buy local"} seem to come from the movement to buy locally-produced food. The top hits for {"buy locally"} are connected to the same movement, but there are only 353,000 of them. Some of these are from phrases like "buy locally-grown food"; some come from the catchphrase-substitution "think globally, buy locally"; others may be the result of a copy-editor's intervention to correct someone's attempt to write "buy local". But it seems to me that "buy locally" commits me only to carrying out the transaction of purchasing in the local area, without any implication about where the stuff I buy comes from. In contrast, "buy local" is naturally interpreted to mean "buy local stuff".

It seems to me that local is being used in this slogan as the object of buy, not as a non-standard adverb modifying it. I very much doubt that the slogan's proponents would countenance a form like "*buy food local" as opposed to "buy food locally" (though in either case, that's not what they mean). The same thing seems to be true of slogans like "buy American", which is about what to buy, not how (or where) to buy it.

The Ridgefield case may be different -- it's not clear whether "shop local" has something to do with sustainability and locavorosity, or is simply supposed to exhort us to buy from local merchants the stuff that they've had shipped in from California and China. And in the latter case, it's not clear whether local is a non-standard adverb, or some sort of adjectival appositive ("I wandered local as a cloud...").

My Ridgefield correspondent writes that "[t]he article's not up on [the newspaper's] web site yet", and I see that the minutes of the most recent meeting are not yet up on the Ridgefield Zoning Commission's website, but we'll look forward to additional details when they're available.

[Update -- Ben Zimmer points out that the "local food" movement also often uses the slogan "eat local".]

[Update #2 -- the Ridgefield Press now has an article up on this: Chipp Reid, "Local or locally, shop signs are up in Ridgefield", 11/19/2007. And the larger context of Mr. Katz's remark makes the underlying constitutional issue clear:

Commission member Patrick Walsh and Mr. Katz engaged in a spirited discussion of whether the zoning panel could regulate grammar. "Is there anything in our regulations that allows us to regulate grammar?" Mr. Walsh said.

"No," said Ms. Brosius, a finding Mr. Katz quickly challenged.

"Just as art is amorphous, so is the concept of protecting the health, safety and welfare of the public," he said. "I believe exposing people to the horrific grammar of these signs is in direct opposition to protecting the public's welfare."


Posted by Mark Liberman at 08:12 AM

Drivers and kings: a model answer

Here is a model answer for the essay question set on November 14th.

First, some background. Singular antecedents for they are most commonly quantified noun phrases like nobody or any student. Indefinite noun phrases are less common, but plentifully attested and quite acceptable. Definite noun phrase antecedents are still less common, and sometimes seem distinctly unacceptable to people who otherwise accept singular antecedents. Singular proper name antecedents are almost unattested (for a very rare and somewhat peculiar exception attested one Saturday night in Las Vegas see this post). Basically this is a scale of definiteness of reference: as we move closer toward the kinds of noun phrase that uniquely pick out a specific person with the usual sex characteristics, it becomes less and less plausible to use they as a pronoun dependent on it. This insight is due to the excellent doctoral dissertation Singular They by Rachel Lagunoff (Applied Linguistics, UCLA, 1997).

Now, the factor that makes a personal name all but impossible as an antecedent for they is that if we know we are dealing with a specific person, then we know that they are of either the male or the female sex, and it feels highly unnatural not to use the appropriately gendered pronoun. In the rare cases where the speaker knows the name of someone and can use it to refer to that person but does not know the sex of the person and thus cannot use a gendered singular pronoun, it feels better to say he or she than they.

A definite NP that refers directly to a specific person is also implausible as an antecedent for they. But the more it is clear that it does not have a specific person as referent, but rather means "whoever turns out to meet this definite description", the more singular they will be acceptable.

When the sign on the 29 buses says that a passenger "must not speak to the driver or distract their attention", it means "the driver of this bus, whoever it may be at the time in question". And Lothian Buses does employ female bus drivers. There is no way to tell who might be the person picked out by the driver on a particular occasion. The phrase is interpreted rather like whatever person happens to be driving this bus, which has a covert universal quantification. That is what makes they fully acceptable with a singular definite NP as antecedent in that case.

What is entirely different with the king is that it is so obviously referential. The king (when there is one) is a unique individual, and always male. We always know the sex of the king if he exists at all. So singular they is entirely unmotivated in this case. It is used to allow third person singular pronominal reference in cases where the antecedent is interpreted as a quantifier binding a variable, or is indefinite, or has some similar interpretation involving unknown or suppressed sex information and thus unassignable or arbitrary morphological gender choice. A definite NP referring to a unique person guaranteed to be male is just about as unable to support an anaphorically dependent they as a personal name would be.

In short, it is not really about a difference between drivers and kings, but about the difference between (i) a phrase referring to the unknown arbitrary person of whatever sex who may be driving a particular bus at some relevant time in the future, and (ii) a phrase referring directly to the unique male person who currently holds the monarchy. One noteworthy point this illustrates is that pronoun choice is not the province of any one component of grammar on its own: morphology, syntax, semantics, and pragmatics are all relevant to the choice between the four 3rd-person pronoun lexemes he, she, it, and they.

P.S. Paul Postal has written to point out that he finds it utterly ungrammatical to say Congress shall not harass the president nor interfere with their control of the armed forces. All that one can say is that (i) he may be more subliminally influenced by prescriptive injunctions than he would be inclined to think he is; (ii) he may have a stronger sensitivity to the hierarchy of referentiality and draw the cutoff line below indefinites and above definites; and (iii) it is a bit early to judge sentences like this in view of the fact that US presidents have been very much like kings thus far (always male, with no females even nominated by major parties for election to the post, and sometimes members of quasi-royal dynasties that contribute more than one male scion to the presidency). Perhaps ten years from now we shall be looking back on a situation where, although the tendency toward family dynasties in American politics has not lessened, the choice of he as the pronoun to be anaphoric to the president does not look quite so obvious any more. On a point like this one, grammatical acceptability is evolving in parallel with social mores and the politico-cultural context. An enormous amount of English syntax is remarkably stable, but this little bit of it is not.

Posted by Geoffrey K. Pullum at 03:51 AM

Sally Thomason Apoplectic?

As someone who knows Sally Thomason pretty well, both linguistically and personally, having served as an Associate Editor under her Editorship and stayed many times in her Montana cabin, I can attest that Tecumseh Fitch's suspicion that Sally belongs to "a certain cadre of linguists" who become "apoplectic" at "the mere mention of Chomsky's name" is wholly unfounded. Not only is he mistaken as to her theoretical orientation, but Sally doesn't become apoplectic about much of anything, except maybe knapweed.

Posted by Bill Poser at 03:16 AM

November 15, 2007

Apoplectic? Who, Me? Nah.

Tecumseh Fitch suspects that I disliked his News & Views piece in Nature because I belong to "a certain cadre of linguists" who become "apoplectic" at "the mere mention of Chomsky's name" (and therefore resent Fitch because he has published famously with famous people in Cambridge). His reaction reminds me of one of the more entertaining weeks during my seven-year stint as a journal editor. Back in the early 1990s, when I was editor of Language, I had an 85-90% rejection rate, which meant that there were a lot of disappointed manuscript submitters. Most of those people were very gracious; one friend of mine even offered to let me cite her rejected paper as evidence that I didn't favor some group or other (I forget the specific accusation that triggered her offer). But one week I got two irate letters from authors whose papers I had turned down. One of them accused me of never publishing papers in generative linguistics because I was biased against generativists; the other accused me of publishing only papers by generativists because I was biased against non-generativists. It cost me some effort to suppress the urge to send each author's letter to the other complaining author. That week convinced me, in case I needed convincing, that some people really do see only what they want to see. It also reinforced my deep discomfort with academics (and others) who substitute conjectures about motivations for discussion of substantive issues.

Posted by Sally Thomason at 09:08 PM

The moral of losing your pants, your suit, and your job

Back in mid-June I posted about the case of a DC administrative law judge who sued a local cleaning establishment for losing his pants. He requested a settlement of 54 million dollars for inconvenience, mental anguish, and attorney's fees. Later in the same month I wrote about how this judge lost his lawsuit against the cleaners. Now we learn that he has lost his job as well.

Judges without pants (not to be confused with doctors without borders) can be embarrassing, both to themselves and to the courtroom.  But that didn't turn out to be the case here. Apparently he covered his bottom with another pair and never missed a step as he passed his judgments in the courtrooms of DC. Until this month anyway.

As far as I can tell, even though the judge felt dishonored, he didn't physically attack the business owners who had lost his pants. That's progress of a sort, I suppose, because in the olden days, duels often ensued when a person's honor was questioned. Today, we have courts of law, where fights like that can be resolved (by awards of monetary compensation) using negotiated language rather than with swords or pistols. But even in the days of dueling, negotiation was possible. Parties on both sides had "seconds," whose job it was to deliver the challenge, request an apology, and negotiate an amicable settlement short of bloody combat. Careful uses of appropriate speech acts were highly regarded. Short of success in this, they stood by their friends during the duel. So even in those bloodier days, the language of negotiation played an important role. The judge in this case refused negotiation and went for the financial kill instead.

It's not totally clear whether his pants problem was the reason that his employment was terminated but it's hard to see how it helped him keep his job. The press drools over stories like this and it's likely that the DC Commission on Selection and Tenure of Administrative Law Judges was concerned about its own image. Try to imagine a court case in which one business brings a grievance against another. Further imagine that a judge who had sued a business over losing his pants was presiding. Well...

However anguished and angry the judge was (and haven't we all been frustrated by little things like our cleaner losing our pants?), wouldn't it have been prudent for him to have taken a deep breath, gathered his wits about him, considered the relative importance of it all, and resorted to using language to negotiate a reasonable outcome? Negotiate? We haven't seen a lot of that lately in the news. As humans progressed from fighting bloody duels to using language to resolve disputes, we have learned something about the way negotiation trumps physical fighting. Or have we? Maybe recent international events are an exception. By comparison, the judge's failed lawsuit pales in significance to many other more important failures to use diplomatic language to settle disputes. If there's a moral to the judge's case, maybe it's that people in high office could learn something about the need to use language to get better results.

Hat tip to Amy Forsyth

Posted by Roger Shuy at 12:56 PM

Fitchifying biology, memes and historical linguistics

Guest post by
Tecumseh Fitch

My recent News & Views article in Nature titled "Linguistics: An invisible hand" (Nature 449: 665-667, 11 October 2007) garnered more attention than I expected, both from the media and scientific colleagues. Though this attention was mostly kind and positive, my article aroused the ire of historical linguist Sally Thomason who flamed me quite thoroughly on Language Log a few weeks ago in her two posts (this one here and this one here) on "fitchification". Alerted to this attack by Geoff Pullum, I was trembling in my boots (OK, my house slippers to be perfectly honest) as I pointed my browser to the link. But as I read my fears subsided: despite the denouncements of the "hair-raising errors" in my article, Thomason's screed displayed the sure signs of nearly complete misunderstanding of my article's function.

However, I slowly recognized that Professor Thomason has unwittingly given me the perfect gift to illustrate what was the point of my article, and it is because of this that I write. Thomason has introduced to the English lexicon a new verb fitchify (thankfully left undefined). Although a Google search reveals "Abercrombie-and-Fitchification" in the prior record, I vote for independent treatment of Thomason's new coinage, and here I shall attempt to unpack this verb, using the evidence at hand.

Thomason's intended meaning for fitchify seemed reasonably clear, but, as is so typical in language, context left open a number of other possibilities for the "true" meaning of fitchification. Indeed, as I read further, an explosion of possible definitions presented themselves. The question is especially intriguing (to me, and perhaps some other Fitches out there on the planet) given that "Fitchism", "fitchosity", "fitchulation" and the like are words quite unlikely to have much staying power, for good phonological, not to mention intellectual, reasons. In contrast I find "fitchify" rolls fetchingly off the tongue, and look forward to future uses.

But, first things first. I gather that Thomason's original intent is something like:

fitch'ify v.t. 1. ruin, spoil, botch, foul up or otherwise make a mess of an attempt to summarize, in the space of a few paragraphs, some complex topic.

The complex topic in question was "the history of historical linguistics" (N.B.: a topic with worrisome signs of infinite regress worn right on its sleeve), and the botching in question (near as I can judge) was my inexplicable failure to mention the contributions of the Neogrammarians to said field, and the importance of regular sound change.

However, this first proposed definition of "fitchify" runs into immediate problems, and can't be the correct one. 1

Thomason's post is titled "Fitchifying the history of linguistics," and decries my supposed attempt to explain why "historical linguistics failed". The problem with definition one is that my article did not present a history of linguistics, nor of historical linguistics, but rather of the links between historical linguistics and evolutionary biology (especially between Darwin and the philologists of his time).

My appointed task in the article (as is the explicit nature of "News & Views" pieces) was (1) to offer an accessible introduction to and summary of the two main scientific reports that appeared in that issue (that's the "News" part) and (2) perhaps offer some contextualization or mild elaboration thereupon (that's the "Views"). Those two papers, by Lieberman et al. and Pagel et al., analyzed historical linguistic change using powerful statistical techniques adapted from evolutionary biology. The "News" part of my article attempted to summarize those articles, and to place their findings, and their importance, in a larger scientific context. I further suggested that such work might offer a bridge across a long-standing gap between biological and more traditional historical approaches to language, and more importantly between diachronic and synchronic linguistics.

Perhaps, then, in this context, fitchify means "offer an accessible summary"?

fitch'ify v.t. 2. summarize in accessible English an otherwise daunting or technical work of science, and place it in a broader context.

The critiques Thomason directs at my article, and the supposed errors and misunderstandings she found, were directed nearly exclusively at the "News" portion, which simply reported the findings of the Lieberman and Pagel articles. The class of possible "errors" for a summary of this sort is rather limited:

(1) Propagation of errors in the target articles themselves, as stupidly relayed in a lay-person's summary; whence:

fitch'ify v.t. 3. repeat or transmit uncritically, in the lay press, errors committed in a scientific paper.

or, perhaps, (2) Inaccurate, insensitive or otherwise inadequate summary of the (correct) target articles, giving us:

fitch'ify v.t. 4. bowdlerize, inadequately abridge, or otherwise make a hack job of an attempt to summarize a scientific paper.

Unfortunately for either definitional gambit, Thomason admits to not having found the time to read the articles summarized; 2  so neither of these can be the correct definition in this particular context. But, at least logically speaking, both definitions are still viable contenders. Perhaps Professor Thomason has since read the articles and can breathe new life into these otherwise moribund possible definitions.

Until then, we seem, by process of elimination, to be narrowing the range of possibilities, and thus honing in on an adequate definition of fitchification. The "Views" portion was my little opportunity to say something about where I thought this new work might lead, and to integrate these findings with the insights of eminent historical linguist Rudi Keller (from whom the term "invisible hand", as well as the example of the pejoration of "wench", were borrowed, with proper attribution). I also mentioned exciting new work combining theoretical and experimental laboratory work by linguists like Simon Kirby and his colleagues. Fortunately, however, this portion of the article escaped the wrath of Thomason's sharp pen, so I needn't bore everyone by defending it here: please read the originals if you're interested.

I further suggested that drawing explicit parallels between evolutionary theory and historical linguistics might be of benefit to both fields, and more particularly for some possible future field of "memetics". And it is here that I must thank Thomason for her little gift: a new meme whose fate I can now track.

"Meme" is a rather successful coinage of Richard Dawkins. The term refers to a transmissible chunk of imitated form, or of meaning. Thus, a meme is an idea that can spread from mind to mind in much the same way as genes are transmitted from body to body through the generations, and it invites all sorts of analogies (the "meme pool", "memetic evolution", the struggle among competing memes, etc). Unfortunately, till now, there has been little use of the most apt referent of the term: namely the changes in word structure and/or meaning that are the traditional bread-and-butter of historical linguistics. If ever there is to be a rigorous, empirical approach to memetics, the richest source of data will be that of historical linguistics: observing the fates of new words as they struggle for survival, mutating their form and their meaning through successive iterations of "cultural evolution". I made this point precisely because I agree with Thomason that historical linguistics is "one of the most successful historical sciences you'll find anywhere".

We are left, at this moment in the evolution of the brand new meme fitchify, with a few remaining top contenders for its meaning. One remaining contender derives from the familiar if tiresome fact that the mere mention of Noam Chomsky's name seems to be enough to drive a certain cadre of linguists apoplectic. My suspicion is that part of Thomason's readiness to skewer my article was that she read my summary of Chomsky's famous I- vs. E-language distinction as some sort of vindication or adulation of the currently popular synchronic approach to linguistics: a reflection of Chomskyan triumphalism or the like. This gives us:

fitch'ify v.t. 5. incite linguists to riot and mayhem with the mere mention of Noam Chomsky's name, or brief summary of one or more of his ideas.

Given my association with the infamous paper by Chomsky and Marc Hauser in Science in 2002, or my subsequent use of the term "Chomsky hierarchy" in some of my experimental work on artificial grammar learning (both contributions well roasted in various Language Log posts), I must be honest and admit that this meaning might have some serious staying power. But I hope not. More likely, I think, the top contender from my analyses for the future most successful interpretation (and I admit to being biased) is:

fitch'ify v.t. & i. 6. summarize in accessible English an exciting new scientific result or subject, in a fashion liable to incite the ire of traditionalists (whence fitchification, n., the act or result of fitchifying)

I, for one, intend to keep fitchifying (senses 2 and 6) to the best of my ability, avoiding senses 1, 3 and 4 at all costs, and sadly accepting the continued possibility of sense 5. In any case, now the horses are out of the gate and neither Thomason nor I can control where they shall go: at this point the matter is in the "invisible hands" of memetic evolution, and the mouths and keyboards of future language users.

May the best meme win!

W. Tecumseh Fitch
University of St Andrews


1. I do admit to one actual error, noticed by Bob Ladd and pointed out to me and to Nature by Geoff Pullum: an artist's error which bizarrely granted the ancestor of the Slavic languages the title "Islamic" (see the published correction). A supposed "simple mistake" concerned the age of Proto-Indo-European, which Thomason pegs at 6000 years. I'll stick with my more conservative estimate -- "some 10,000 years" -- given the outer range estimate of the divergence of PIE at 9800 years before present, reported in an earlier paper in Nature by Gray & Atkinson, and derived using reasonably rigorous mathematical methods. But obviously little rides on this estimate, and of course the topic is controversial, because the fact is no one actually knows.

2. In her second post, almost midway through, Thomason writes "(and I admit that I haven't yet read the articles he refers to, to see if they make any distinctions according to the type of lexical change)". Thomason does, however, gallantly concede that the findings in said articles (at least as fitchified by me) are "not necessarily ... trivial".

Posted by Geoffrey K. Pullum at 04:11 AM

November 14, 2007

Linguistics at Guantanamo Bay

The Standard Operating Procedures manual for Camp Delta at Guantanamo Bay has been leaked. You can download it (238 page pdf document) here. (Feel free: it isn't classified - just "for official use only".) Most of it is rather dull stuff about procedures and schedules and who is responsible for what. In my quick pass through it the only thing that stuck out was the discussion of procedures for giving access to the International Committee of the Red Cross, which mentions that the ICRC is to be denied access to some prisoners. That's disturbing and, I believe, illegal. The linguistic part is Chapter 15, "Linguist Operations".

Now before you get all excited and start speculating that some senior person in the Bush Administration has a deep, secret interest in Iranian dialects, remember that in military-speak a "linguist" is an interpreter or translator. At Guantanamo Bay the "linguists" interpret for the interrogators, guards, and other staff, and they translate the letters received by and written by the prisoners so that they can be reviewed and censored. They are also given permission (unlike most other personnel) to loiter in the cell blocks so as to gauge the level of tension among the prisoners and pick up on anything of intelligence interest. They also have the responsibility for vetting the publications in the library made available to prisoners (nothing is permitted that promotes jihad or anti-American, anti-Western, or anti-Semitic views and nothing on a military or sexual topic).

The document lists the languages for which translation is available locally: Arabic, Pashtu, French, Farsi, Urdu, Tajik, Uzbek, Uighur, Russian, Turkish, Spanish, and German, and those that have to be handled off-site: Bengali, Divehi, and Kurdish. I can see outsourcing the translation of mail and vetting of library books, but how do they communicate with prisoners who speak only Bengali, Divehi, or Kurdish? It is probably safe to assume that anyone who speaks Kurdish also speaks Arabic, Turkish, or Farsi, but I'm not sure that such an assumption can safely be made for Bengali. My impression is that younger Bangladeshis do not necessarily speak Urdu.

I'm also curious about some of the languages. Most of them are obvious: they are the languages spoken in Afghanistan and Pakistan and in other Muslim countries from which al-Qaeda members are likely to come. Russian is presumably for Chechens. They probably don't have any Chechen linguists, and Chechens can be assumed to know Russian. Spanish could be for talking to the Cubans across the fence. But why do they need French and German? French, for example, is spoken by many people in North Africa, but I would think that any North Africans likely to be at Guantanamo Bay would also speak Arabic and probably prefer it to French. A guess is that some of the prisoners are Arabs brought up in France or Turks brought up in Germany, who are not literate in Arabic or Turkish respectively.

Posted by Bill Poser at 06:15 PM

Drivers and kings

All right, class, put your books away; this will be a closed-book surprise quiz. It will count toward your final result in the course. Put your name legibly at the top right corner of a clean sheet of paper, and write a short essay answer to this question.

Compare the following two sentences:

  1. Do not speak to the driver or distract their attention without good cause.
  2. *Do not speak to the king or distract their attention without good cause.

Example 1 is closely modeled on a sign found behind the driver's cab on route 29 Lothian Buses in Edinburgh. It is clearly grammatical and acceptable. (Prescriptivists might object to it, but as you know, singular antecedents for forms of the pronoun they are attested in the finest English authors since Middle English times; the prescriptivists just haven't paid attention to the evidence of literary usage.) Example 2 contrasts in only one word, yet is clearly ungrammatical (or strikingly unacceptable at the very least). Why? What is the difference between driver and king that is responsible for the contrast?

You have five minutes. Then I'll collect them in and we'll discuss it.

A model answer for this question has now been posted on Language Log as promised; follow this link.

Posted by Geoffrey K. Pullum at 05:22 PM

Thoughtless contempt

Matt Richtel, "Devices Enforce Cellular Silence, Sweet But Illegal", NYT 11/4/07, p. 1 (yes, on the front page):

SAN FRANCISCO, Nov. 2 -- One afternoon in early September, an architect boarded his commuter train and became a cellphone vigilante.  He sat down next to a 20-something woman who he said was "blabbing away" into her phone.

"She was using the word 'like' all the time.  She souded like a Valley Girl," said the architect, Andrew, who declined to give his last name because what he did next was illegal.

The story manages to compress a lot of stereotypes into a very few words: a young speaker, female, talkative (not just talkative, but "blabbing away"), using variants that annoy the hearer (a professional man, presumably older than the speaker): using the word 'like' ALL THE TIME (my emphasis), sounding "like a Valley Girl".  And using a cellphone.

It's hard to imagine the NYT printing a news story (especially one on the front page) in which someone conveys so much thoughtless contempt for, say, black people, or gay people -- unless, of course, the contempt was the point of the story, which it isn't here: THIS story is about contempt for cellphone use; the architect is about to wield a cellphone jammer.  But young women perceived to be chatty and using youth-marked style features are fair game.  (So are working-class men and rural Southerners.)

We've noted many such cases before on Language Log.  I'm inclined to view them as upwellings of small-scale misogyny and anti-youthism.  (In somewhat less contentious terms: disdain for women and young people.  In still less contentious terms: a devaluing of women and young people.)

It's hard to know whether there's any way to confront people who talk like the architect: they know what they hear, so to speak, and anything a linguist or other academic can say about who uses cellphones, or who uses (various kinds of like), or who talks a lot, and so on, is just going to be seen as beside the point.  Who are we to deny their reality?  "I know what I hear", they say, "and I don't like it."

Of course, quite possibly Richtel intended to convey that the architect was not only a cellphone vigilante but a lout as well.

Posted by Arnold Zwicky at 04:05 PM


Faithful reader Andrew Glines has created what might be the first-ever Language Log mashup...

Long time reader, first time poster, and linguistics graduate student who should really be doing other things than photoshopping recent Language Log entries into eggcorns. But I could not help misreading locavore / localvore as being prefixed by lol-, and then could not help but apply appropriate text to a photo... resulting in the pic attached. Enjoy!

(That's a picture of the coiner of locavore, Jessica Prentice, who probably never guessed she'd get the lolcat treatment.)

Posted by Benjamin Zimmer at 09:28 AM

How about the Germans?

Following up on earlier posts about "Nationality, gender, and pitch" (see also here and here), Linda Lanz sent in some personal experience supporting the view that Japanese have different expectations than Americans do about the appropriate degree of gender polarization in fundamental frequency:

I once worked for a Japanese travel agency in the United States, and as part of that job I frequently fielded phone calls from our Japanese clients. I was never aware of any problems, but after working there for some time, the boss informed me that I needed to speak "higher" on the phone. At first I thought he had a problem with my politeness level, since Japanese phone conversation is highly formal (at least for a business), but that wasn't it. He said that while my grammar was fine, I just talked too low, and the callers thought I was angry or unwilling to help them because of it. Apparently they'd had complaints that the woman answering the phone wasn't very friendly. After that, I tried to remember to speak with a higher pitch when I was answering phone calls in Japanese.

And Tilman wrote to propose another stereotype of national difference in this characteristic:

... speaking of anecdotal and subjective evidence, the impression I get is also that on average American and British women tend to speak with a higher pitch than German women, and one of my English Language lecturers in college confirmed that, again from personal experience. (BTW, I am a historian who studied English as his minor). Thus quite a few American actresses sound rather squeaky to German ears, while as far as I gather the voices of the likes of Marlene Dietrich were perceived as more unusually low-pitched in a US context than they were in a German one.

Well, it would take more work to compare actresses, but the idea that Germans in general might be less gender-polarized than Americans, in terms of voice pitch, is easy to test by the same method that I used to test the difference between Japanese and Americans. The same CallHome collection that I used to compare English and Japanese had a German edition as well. Of the 100 German conversations that were published, 30 involved one male and one female. So for this morning's Breakfast Experiment™, I've added those 30 German conversations to the same plot as the 18 Japanese and 27 American conversations that I used before:

Here's the version plotted in Hz instead of in semitones:

Rather than being less gender-polarized than the Americans, the Germans are actually slightly more gender-polarized. Overall, the German pitch values fall roughly in between the Americans and the Japanese. Another way to see this is to look at a table of the difference (in semitones) between female and male speakers of each language at each percentile:

            .1   .2   .3   .4   .5   .6   .7   .8   .9
Americans  6.0  8.4  8.1  7.5  6.9  6.1  4.8  4.4  3.7
Germans    6.0  8.3  8.5  8.3  8.1  7.5  6.4  5.3  3.7
Japanese   9.2  9.5  9.2  9.1  9.3  9.0  8.3  7.6  6.2

(The same caveats about possible non-sampling error apply as in the earlier case, of course.)

I'd also like to clarify where these numbers came from.

When we track F0 values as a function of time, we get a lot of numbers. Here's the opening phrase of Jane Austen's Pride & Prejudice, as read by Chris Goringe, one of the readers whose versions are available from the free LibriVox audiobooks effort:


The F0 values in Hz estimated for the first word, "it", look like this:


If we take Chris Goringe's recording of the first three chapters, and estimate the F0 200 times per second, we wind up with 93,816 values. (In regions where Chris is not talking, or is producing voiceless sounds, no pitch estimate is produced -- as long as the F0 estimate algorithm is doing its job. The algorithm that I used is a good one, and doesn't make too many errors on the material I've used, so I believe the percentile estimates should be fairly accurate and stable.) If we translate the F0 into semitones relative to A 110, and then calculate the 10th, 20th, ..., 90th percentiles of those values, we get the points that are plotted with the blue 1's in the graph below.

The other plots are (2) Annie Coleman reading the first four chapters of Pride & Prejudice, (3) Micah Sheppard reading the first two chapters of Pride & Prejudice, and (4) Karen Savage reading the first chapter of Pride & Prejudice, all based on the mp3 files available from LibriVox.

Note that the "percentiles" involved are NOT percentiles of people, but percentiles of pitch values. In Hz (= cycles per second), the percentiles look like this:

             10%    20%    30%    40%    50%    60%    70%    80%    90%
Reader #1  147.6  159.0  170.7  183.3  196.1  212.0  230.3  254.0  286.7
Reader #2  195.3  206.9  216.0  224.9  234.4  245.0  257.5  275.9  305.8
Reader #3   88.0   92.4   97.2  102.6  109.8  117.7  128.8  142.9  165.3
Reader #4  166.4  188.1  201.5  213.3  226.8  245.4  267.4  295.9  345.7
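
For anyone who wants to try this at home, here is a minimal Python sketch of the conversion and percentile steps described above. It is not the code used for these posts. The function names are made up for illustration, the pitch track is assumed to arrive as a plain list of Hz values with unvoiced frames marked as 0 or None, and the interpolation rule for the percentiles is whatever Python's statistics.quantiles does by default, which may differ slightly from the method behind the numbers in the table.

```python
import math
import statistics

A110_HZ = 110.0  # reference pitch: the A at 110 Hz


def hz_to_semitones(f0_hz, ref_hz=A110_HZ):
    """Convert a frequency in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def pitch_deciles_in_semitones(f0_track_hz):
    """Return the 10th, 20th, ..., 90th percentiles of a pitch track.

    Assumption: frames with no pitch estimate (silence, voiceless
    sounds) are marked as 0 or None, and are dropped first.
    """
    voiced = [f for f in f0_track_hz if f]
    semitones = [hz_to_semitones(f) for f in voiced]
    # n=10 yields the nine cut points between the ten deciles.
    return statistics.quantiles(semitones, n=10)
```

As a sanity check on the scale: an octave above the reference (220 Hz) comes out as 12 semitones, and Reader #1's median of 196.1 Hz comes out at roughly 10 semitones above A 110.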

You shouldn't be surprised to learn that readers 1, 2, and 4 are female, while reader 3 was male. [Update 11/16/2007: Chris Goringe has written to inform me that he is, in fact, male; so much for naive connections between pitch range and sex. There is more sexual dimorphism in pitch range than in height, for example, because of testosterone-induced "voice change" at puberty -- but there is still some overlap in the distributions.] If we pool the estimated F0 values for the three female [rather, higher-pitched] readers, and plot the percentiles for the pooled data, we get a picture like this:

In order to look at gender polarization among Japanese, Americans, and Germans, I pitch-tracked all the conversations (after separating the two channels), pooled all the pitch estimates for each combination of nationality and sex (generally resulting in more than a million numbers per category), and determined the percentiles for each collection of pooled estimates. [I took the identification of speakers by sex from the demographic information given in the database -- I trust that it was more reliable than my identification of the sex of the Pride & Prejudice readers... In both cases, though, the fact that the data is published makes it more likely that errors can be found and corrected, even -- or perhaps especially -- in informal Breakfast Experiments.]
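
The pooling step can be sketched the same way. Again, this is an illustration under stated assumptions, not the actual analysis code: the input format (one tuple of nationality, sex, and Hz values per conversation side), the "F"/"M" labels, and the function names are all hypothetical. The one piece of arithmetic worth noting is that the difference in semitones between two frequencies is 12 times the base-2 log of their ratio, so the female-minus-male gap at each percentile can be computed directly from the Hz values.

```python
import math
import statistics
from collections import defaultdict


def pool_by_group(conversation_sides):
    """Pool pitch estimates for each (nationality, sex) combination.

    conversation_sides: iterable of (nationality, sex, f0_values_hz),
    one entry per speaker channel (a hypothetical input format).
    """
    pooled = defaultdict(list)
    for nationality, sex, f0_values_hz in conversation_sides:
        # Drop frames with no pitch estimate (assumed to be 0 or None).
        pooled[(nationality, sex)].extend(f for f in f0_values_hz if f)
    return pooled


def polarization_by_decile(pooled, nationality):
    """Female-minus-male pitch difference, in semitones, at each decile."""
    female = statistics.quantiles(pooled[(nationality, "F")], n=10)
    male = statistics.quantiles(pooled[(nationality, "M")], n=10)
    return [12.0 * math.log2(f / m) for f, m in zip(female, male)]
```

Calling polarization_by_decile once per nationality gives one row of nine numbers each, which is the shape of the difference table earlier in this post.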

This is not the only way to examine the question quantitatively, and it's surely not the best way, but it gives a simple picture of what's happening, and it's easy to calculate. For a Breakfast Experiment™, that matters -- and in fact, my second cup of coffee is getting cold, and I have a 9:00 appointment, so bye for now.

Posted by Mark Liberman at 08:15 AM

The Perils of Translation

We all know how easy it is to criticize translators for the perceived errors of their work, but some cultures take this to extremes. Reuters reports that Ghaus Zalmai was arrested as he tried to cross from Afghanistan into Pakistan for the "crime" of publishing an unauthorized translation of the Qur'an. The publication of Zalmai's translation triggered an emergency session of the Afghan parliament, which prohibited Zalmai from leaving the country, and, according to the Gulf Times, a demonstration in Jalalabad by over 1,000 university students demanding that he be put to death. The Supreme Court of Afghanistan has ordered an investigation into the publication of the translation, and Afghan police are reported to be looking for Qari Mushtaq, an imam who reportedly gave his approval to the translation.

These people need to learn about book reviews.

Posted by Bill Poser at 03:21 AM

November 13, 2007

Music Review: ********

On Monday the New York Times ran a review of a punk rock concert by a band named, um, "********". How do you pronounce that, exactly? Is it anything like !!!, the almost unpronounceable dance-punk band out of Sacramento? No, as it turns out, that's just some heavy-handed taboo avoidance on the part of the Times, without even a single unasterisked letter to give the reader a clue of what the unexpurgated name might be.

The critic, Kelefa Sanneh, wryly addresses the newspaper's profanity policy early on in the review:

Pink Eyes is the lead roarer in a ferocious band from Toronto. What band? Well, the name won’t be printed in these pages, not unless an American president, or someone similar, says it by mistake. Suffice it to say that this is an unruly hardcore punk band with a name to match. (You can find out more at the official Web site,

Following that link will reveal that the name of the band is (gasp) "Fucked Up". So not only did the Times censor the entire band name, they didn't even give the proper spacing of "****** **". And certainly "F***ed Up" would have been sufficiently concealing?

Sanneh's comment that the name would only be printed by the Times if it were uttered by "an American president, or someone similar" is apparently a reference to editor Abe Rosenthal's famous dictum that the paper would "only take shit from the President." Neal Ungerleider of FishbowlNY was not convinced:

Except for one little problem... The New York Times had no problem putting "shit" in print, asteriskless, in a transcription of a phone call from Roger J. Stone to father-of-the-governor Bernard Spitzer.
So is profanity okay in some contexts but not in others? Is it only okay in transcripts? Does the Times need some sort of internal Lenny Bruce trial to sort out "good" profanity from "bad" profanity? Who the f***k knows.

Arnold Zwicky noted the groundbreaking appearance of non-presidential shit in the Stone-Spitzer article here, linking to a post by Patrick LaForge on the newspaper's "City Room" blog explaining the editorial decision. "We rarely permit the use of profanity in our columns, even in quotations," LaForge wrote. "We made a rare exception in this case because we felt that readers would more easily understand why the Spitzers were so upset about the message if they knew what the language was."

Even if the band name "Fucked Up" wasn't seen as significant enough to escape censorship, the blunt asterisking of "********" just seems like overkill. But perhaps the Times simply isn't accustomed to asterisking practices, since, as Arnold has observed, the paper usually eschews that form of taboo avoidance in favor of indirect allusion. Such an allusive style works fine in the body of Sanneh's review, but not in the "Music Review" header, where the band name is supposed to go.

The Times ran into the same problem last year when they tried to review the documentary Fuck, featuring appearances by everyone from Ice T to our own Geoff Nunberg. At the time of its theatrical release Geoff called it "the film that dare not speak its name," and the Times, naturally, reviewed the movie under the title "****". Film critic A.O. Scott provided whimsical metacommentary similar to Sanneh's:

Just to clear up any confusion: the four stars in the box accompanying this article do not represent a rave review, though I did quite enjoy the movie in question. Really, what sort of a critic do you think I am? Certainly not one who resorts to nonverbal, quantitative means of expressing opinions. This just isn’t that kind of newspaper.
Nor, however, is it the kind that will permit me to print the title of Steve Anderson’s rowdy and contentious new documentary, which consists of a single, highly versatile English word.

Based on these asterisked reviews, you might think the Times holds fuck to a more stringent standard than shit. But the F-word has in fact appeared at least once in the paper's history: on Sep. 12, 1998, when it printed the entire text of The Starr Report. Buried inside, the careful reader will find this passage:

So there you go. According to the Times, the F-bomb is acceptable in a transcribed snippet from a presidential intern, but not in a band name or a movie title. All clear?

[Update, 11/15: Daniel Helm notes that referring to a "ferocious band from Toronto" with eight asterisks is especially ambiguous because there's another Toronto band called Holy Fuck.

And for more discussion, see the comments on Languagehat.]

Posted by Benjamin Zimmer at 11:07 PM

Like a ring in a bell

So far today, four people -- in order, Avi Rappoport, Colin Barrett, Jason Wright, and Jon Peltier -- have written to point out an odd expression, "like a ring in a bell", in the second panel of an xkcd cartoon: an eggcorn, or what?  The first three (plus posters on the xkcd forums) identified its source as Chuck Berry's "Johnny B. Goode" (released in 1958), where what Berry actually sings is "like a-ringin' a bell" (Johnny B. Goode can play the guitar just like ringing a bell); the original has been re-worked.  The cartoon:

What happened here?  It starts with a kind of mishearing, in this case a misparsing of the original, with the participial prefix a- interpreted as the indefinite article a and the participle ringin' (a verb form) taken to be the noun ring plus the preposition in.  This is a classic mondegreen (indeed, a phonetically perfect one).  In the resulting interpretation, the expression looks like some puzzling sort of idiom.  But then idioms generally don't make perfect sense; sometimes we just live with them, content to see meaningful parts in them.

Who knows who was the first to mishear Berry's original, or how many people independently came to this analysis, but there are a few webhits for it, most of them quotations of the song lyrics, as in

Because despite his challenges, Eddie has a gift for music and he can play his uncle's guitar "like a ring in a bell."  (link)

In this quote, and in the cartoon, what seems to be conveyed is the ease or naturalness of an activity (something pretty close to the reading I get for Berry's original).  But in others, as in this guitar review, the reference is to beautiful bell-like tone:

Beautiful sound, like a ring in a bell that would make Johnny B Goode swoon! Really- it has a VERY nice ring- good harmonics without brittle, ...  (link)

This looks like the road to eggcornville, with an opaque expression reinterpreted so as to make its parts contribute more to the meaning of the whole.  Mondegreen, then eggcorn: a MONDEGGCORN, to use a term Ben Zimmer suggested on ADS-L back in August.  On the 14th, there was this memorable exchange between Ben and Joel Berson:

Zimmer: To be fair, Mark Peters' Babble article on eggcorns includes a mondegreen ("You're a grand old flag, you're a high-fivin' flag"), so it would be easy for a non-initiate to miss the distinction.

Berson: But high-fiving is a celebratory act, so might not someone think it fits with "grand old"?

Zimmer: Sure. It could very well be a mondeggcorn.

The day before, Wilson Gray had considered a blend analysis of an example from his past:

Back in the day, a friend of ours was under the impression that the once well-known brand-name, "Richard _Hud_nut" was "Richard _Hug_nut," interpretable as "Hug_testicle_." We laughed with him, till we realized that he was serious, at which point we laughed *at* him.

And once Ben had posted on "high-fivin' flag", I replied to Wilson:

This looks like a mis-hearing of the non-word "hud" as the phonetically *very* similar actual word "hug", followed by a rationalization of this perception in an analysis of the result as involving "nut" 'testicle'.  a little mondeggcorn (tm B. Zimmer).

No doubt there are other potential mondeggcorns in the Eggcorn Database.

Back to "a ring in a bell": Several correspondents noted the marked character of the form a-ringin' for them -- archaic or regional or something.  Quite so.  The a-Vin' form has been much studied (in recent decades, by Walt Wolfram with various collaborators); in the U.S. these days, it's pretty much limited to some relatively isolated areas (the Appalachians, the Ozarks, the Sea Islands), and even there its use is declining.  (For a summary of its properties, see the discussion in this handout of mine.)  It was once much more widespread in the U.S. (and the U.K.), even standard, but now it reminds most people of Hee-Haw or The Beverly Hillbillies.

Posted by Arnold Zwicky at 06:12 PM

Locavore vs. localvore: the coiner speaks

As I announced yesterday, locavore ('one who endeavors to eat only locally produced foods') has been selected as the New Oxford American Dictionary's Word of the Year. I wondered why the original "locavores" — four women in San Francisco who challenged local residents in 2005 to eat only food grown in a 100-mile radius — chose that particular spelling instead of localvore with an extra l, favored by some other groups. I wrote: "Unlike other word formations lost in the mists of time, this is a case where the origin can be firmly pinpointed, so perhaps the true story of loca(l)vore will be revealed in more detail by the coiners themselves." Well, sure enough, the coiner of locavore, Jessica Prentice, emailed to explain how she came up with the word.

Jessica writes:

I thought about both "localvore" and "locavore" and decided on the latter. First of all, it's easier to say, has a better flow, and almost sounds like a "real" word. But also my understanding is that the prefix "loc(a)" has to do with place — as in "location", "locomotive" and "locus"... The ending "vore" has to do with eating, and is the same root as the word "devour". To me the word "locavore" means, in a sense, "a person who eats the place" or even "one who eats with a sense of place" or, better yet, "one who devours the place" (I enjoy eating). To have used "localvore" would have limited the possible resonances and shades of meaning of the word — in my opinion.

New England locavores added the "l" because (I believe) they didn't like the association with "loca" as in the Spanish for "crazy." I live on the West Coast, where "loca" in that sense is more a positive than a negative. We're less serious out here... :-) Also, if journalists wanted to question me on that association, it would be an opportunity to explain that what is really crazy is the amount of unnecessary importation and exportation of food that currently happens in our globalized food system. So again in that way I find it to be a more expansive word.

(You can hear me talking about locavore and other Word of the Year candidates on "Here & Now," broadcast earlier today on WBUR in Boston.)

Posted by Benjamin Zimmer at 02:11 PM

Mailbag: F0 in Japanese vs. English

Email responding to the recent posts on pitch in Japanese and English ("The perils of mixing romance with language learning", 11/9/2007; "Nationality, gender and pitch", 11/12/2007) provides some additional support for the idea that there are really some cultural differences in this area. Most of the evidence is anecdotal (from non-Japanese speakers) or subjective (from a Japanese speaker), but there is also a reference to some experimental evidence.

Lindsay wrote:

One thing that seems not to have been mentioned is pitch difference when speaking different languages. I recall clearly noticing that the recorded announcements (female voice) on a bus from Narita airport to Chiba were noticeably different in pitch between the Japanese version and English Version. The English sounded "normal" but the Japanese sounded artificially high pitched.

I have also noticed another effect with an airport announcement at Heathrow where a flight to Tokyo was delayed by a late passenger. They called in English a couple of times, then they called in Japanese and finally they called again in Japanese, but this time sounding much more "angry" and with a definite drop in the pitch of the announcer's voice.

Grace wrote:

I've been reading with interest the recent posts about pitch in Japanese and I just wanted to chime in with my own experience. I, too, speak higher in Japanese than in English. I don't want to, but while I have managed to train myself out of using many other female speech patterns, I still automatically pitch my voice higher when speaking Japanese, regardless of who I'm speaking to or what the situation is. It's just what feels "right".

Melanie wrote:

This is only anecdotal evidence, but when I was in Japan, often I would be having a conversation with a Japanese woman who would be speaking at a "normal" pitch. Then she would get a phone call and answer the phone in a very artificial-sounding (and maybe nasalized?) high pitch, speak in that pitch for the duration of the call, and then continue her conversation with me in a "normal" pitch. I'd always assumed that it was a marker for formal speech for females, but I'd trust a native speaker judgment over my own. This is so prominent I can't believe there hasn't been work on it.

And Axel wrote:

I've been following with interest the posts on Language Log concerning the pitch employed by female gendered (this is obviously not a question about sex) Japanese speakers. In fact, this is an area where I myself have done some reading and today I finally managed to re-find a reference that I don't think have appeared in your posts yet.

The source (Ohara. “Japanese Pitch from a Sociolinguistic Point of View.”女性語の世界. (The World of Women’s Language) Ed. Ide, Sachico. Tokyo: Meiji Shoin, 1997) is unfortunately unavailable to me, but I can quote a summary of the work from a term paper in Japanese linguistics I found some years ago.

Traditional view about phonological differences between the speech of men and women is that this is the result of the natural difference between male and female vocal tract. However, a research done by Ohara[iii] disproves this idea.

In this experiment, Ohara asked twelve subjects, six men and six women, to produce speech in four different contexts: Japanese conversation, Japanese reading, English conversation, and English reading. All subjects were native speakers of Japanese. The result was that female subjects clearly employed higher pitch in conversation than in reading for both languages, and higher pitch in Japanese than in English. On the other hand, the recordings of men were scattered so that no particular patterns could be observed, although the pitches of the men were lower than that of women’s in all four contexts. If physiological reasons can solely account for the difference of pitch found in men and women’s speech, then the female subjects should not have employed different pitches for Japanese and English when the male subjects did not. Physiological difference between the sexes does explain the fact that the women spoke with a higher pitch than the men in all four contexts, but it is undoubtedly insufficient to account for the rest of the findings; there seems to be cultural factors involved. In order to further explore the problem, Ohara made a second experiment concerning attitudes toward different pitches.

In the second experiment, Ohara asked two female subjects to utter three greeting words among which there was no difference in politeness level. Ohara then altered each of the six recordings into three different pitches, producing eighteen different pitches in total. Male and female subjects were then asked to assess the pitches, rating the eighteen recordings by terms that define impressions about personal characteristics such as hospitable, smart, or rude. The result showed that both men and women associated lower pitch with characteristics such as “stubbornness,” “selfishness,” and “strength”, and higher pitch with “cuteness,” “politeness,” and “kindness.” This experiment not only proves that personal characteristics are considered closely related to pitch but also that for a woman, higher pitch is generally more desirable than lower pitch (if we agree that “kindness” or “politeness” is more desirable than characteristics like “selfishness”.) The fact that the physiologically more childlike higher pitch is considered more desirable proves the high valuation of young age in Japanese culture. However, Ohara’s experiment did not include assessment of male pitches; we do not know what kind of men’s pitch would be considered more desirable.

Axel didn't say who the author of that term paper was.

Posted by Mark Liberman at 09:18 AM

November 12, 2007

Great moments in antedating, part 2: all nine yards of goodies

Back in June I reported on a newly discovered citation for the expression "the whole nine yards" from April 1964, two years earlier than what had previously been the first known appearance of the phrase. It was a small but significant antedating of an idiom whose origin remains surprisingly mystifying, despite its relatively recent vintage. Now, once again thanks to the American Dialect Society mailing list, we have news of another incremental step backwards in the "nine yards" paper trail. This time around it's Bonnie Taylor-Blake who has made the discovery, hunting down a lead on Google Book Search to find this passage from a letter to the editor in the December 1962 issue of Car Life:

The letter writer, one Gale F. Linster of Decatur, Georgia, refers to "all nine yards of goodies" in a decidedly offhand way. As Bonnie wrote in a follow-up post, this seems to indicate "the letter-writer's and editor's nonchalance about the expression and their implicit confidence that the figurative use of 'all nine yards of' would be easily understood by readers of Car Life." It also moves us away from the various theories that "all/the whole/the full nine yards" had a military origin of some sort, for instance that it had to do with the nine yards of ammunition carried in a fighter plane (not to mention more fanciful explanations). Mr. Linster could have been a military man, but his usage appears in an entirely generic context, as a way of describing the full extent of "goodies" or optional features on a Chevy Impala.

Thus, despite all the theorizing about the provenance of "nine yards," it appears that the phrase merely began as an exaggerated way to describe a long "laundry list" of items, not necessarily some material like ammunition or fabric that can be measured in yards. On the ADS list, Doug Wilson suggests the image here is that of a long list of features on a window sticker at an auto dealership. So "nine yards" could simply refer to the length of an itemized list of this sort, with "nine" a more or less arbitrary figure for hyperbolic purposes.

I should note that this discovery is typical of how Google Book Search now provides limited assistance to participants in what Erin McKean recently called "the competitive sport of antedating." Bonnie Taylor-Blake happened upon the relevant volume of Car Life but had no way of determining the precise context or even the correct issue and page number because of the limitations of Google's "snippet view." Fortunately, the metadata for this record includes accurate volume information ("v.9 1962-1963"), which allowed Bonnie to zero in on the correct page in a library copy of Car Life. (She was gracious enough to send me the page scan that I have excerpted above.)

I resorted to much the same tactics when locating early citations for the "crisis = danger + opportunity" meme in issues of the missionary journal Chinese Recorder from 1938. The moral is that Google Book Search can be an extremely powerful research tool, but very often it must be complemented by intensive library investigations. At least, that's how things will be until improvements are made to the implementation of "snippet view" and to the accuracy of bibliographic metadata. Caveat lector.

Posted by Benjamin Zimmer at 11:45 PM

Locavore or localvore?

One of the more enjoyable duties I have as an editor at Oxford University Press is working with OUP's lexicographers to select the Word of the Year. This year the selection is locavore, used to describe those who endeavor to eat only locally grown foods. You can read all about it, along with the runners-up, over on OUPblog.

When I researched the brief history of the word since its 2005 coinage, I found that there are actually two competing forms: locavore and localvore. The four women in San Francisco who launched the movement two years ago prefer locavore (and run the movement's website). But soon after the Bay Area initiative, other regional groups took up the cause, sometimes (as in Vermont) spelling it localvore. (Still others prefer locatarian, but that seems to be running a very distant third.)

Locavore and localvore are new enough that people are still trying to decide which variant to use, often making their choice for personal, stylistic reasons. The blogger Cincinnati Locavore explains: "I settled on locavore for reasons of simple laziness: it's easier for me to pronounce." Noe on My Live Earth agrees that locavore is "a little easier to say," but finds it "a bit too close to 'locovore.'" Locovore does actually show up as an occasional variant, and there's even some precedent for using loco- as a prefix for 'local' in "loco-descriptive poetry," though the OED says the loco- in loco-descriptive was "erroneously taken as a combining form of L. locus place" based on the word locomotion. In any case, Spanish loco 'crazy' would now preclude the spread of locovore and may even cause some semantic interference with locavore — after all, loca is just the feminine form of loco. And who wants to be misinterpreted as a crazy eater (or an eater of crazy things)?

Despite the loco complaint, it looks like locavore has emerged as the preferred form, and the "ease of pronunciation" rationale might have something to do with it. Granted, there are plenty of words in English with the [lv] consonant cluster appearing intervocalically, like silver, pelvis,  and salvage. Historically, the [lv] cluster has been simplified to [v] in some environments, as in halve, calve, and (for some speakers) salve, indicative of a more widespread process of l-dropping in Middle English before labial or velar consonants (see also talk, folk, calm, etc.). In those cases, however, [lv] appears word-finally, so what's the problem with medial [lv], especially when it straddles a morpheme boundary, as in local + -vore?

I think in the case of localvore, the fact that the preceding vowel is unstressed (as opposed to silver, etc.) means that the [lv] cluster is more prone to l-dropping or l-vocalization, even among speakers whose dialects don't normally make them l-vocalizers. In American English, at least, the second [l] in local and hence localvore is typically a "dark l," making it more susceptible to vocalization or reduction. It seems roughly equivalent to speakers who drop (or reduce) the [r] in words like neighborhood but are otherwise rhotic. (Note the similar stress patterns of localvore and neighborhood, with the liquid [l/r] occurring at the end of a syllable with an unstressed nucleus and before a syllable with secondary stress.) So that means, unless you're being very careful, localvore may end up sounding like locavore anyway. And if you're pronouncing it without the [l], why not spell it that way too?

This phonological situation may, in fact, have had something to do with the coinage of locavore in the first place, since there's no precedent that I'm aware of for using loca- as a prefix meaning 'local.' Maybe when the San Francisco group was tossing around possible names for their movement, someone suggested localvore and it was construed by someone else in the group as locavore. Unlike other word formations lost in the mists of time, this is a case where the origin can be firmly pinpointed, so perhaps the true story of loca(l)vore will be revealed in more detail by the coiners themselves.

[Update, 11/13: The coiner of locavore explains how she came up with the word here.]

Posted by Benjamin Zimmer at 12:44 PM

How (not) to develop a cognitive neuroscience of politics

[This is a guest post by Martha Farah, also posted at the Neuroethics and Law Blog. Some other comments on the same topic can be found here.]

This morning's New York Times Op Ed page presents us with dazzling pictures, from the lab of Marco Iacoboni, of the brains of swing voters as they react to photos and videos of the leading presidential candidates. Accompanying these pictures are interpretations of the patterns of brain activation offered by Iacoboni and his collaborators. Mitt Romney evokes anxiety -- this is deduced from amygdala activation. John Edwards' detractors feel disgust toward him -- this is apparent in the insula of these subjects.

I suspect that most of the New York Times-reading cognitive neuroscientists of the world spent some of their Sunday morning grousing to their breakfast companions about junk science and the misapplication of functional brain imaging. Having just finished my own grousefest, I would like to undertake a slightly more constructive task -- distinguishing between what I consider to be good and bad reasons for skepticism about the conclusions of Iacoboni and colleagues, and suggesting a way to validate this sort of work.

First, some criticisms that I don't think this work necessarily deserves, starting with the old "you can process brain imaging data to make it show anything" criticism. There is indeed a large amount of data processing involved in creating functional brain images, and in the hands of naïve or unscrupulous researchers this can distort the evidence. But the idea that functional brain images are more susceptible to fakery than many other kinds of scientific evidence is debatable. I think the extreme skepticism about image processing that one sometimes encounters is an overreaction to the realization that functional brain images are not as simple and straightforward as, say, a photograph. At present I see no reason to suspect that Iacoboni and colleagues did anything stupid or sleazy with their image processing.

Another common criticism leveled against various commercial and "real world" applications of brain imaging is that such imaging simply cannot provide useful information about the mental states of individuals, for example their reactions to specific political candidates, and that any use of brain imaging for such purposes is junk science. Functional MRI is a relatively new method, and its potential for measuring all kinds of psychological phenomena is still a matter for experimentation and exploration. Although the most tried and true applications of fMRI involve generalizations about groups of subjects performing scores of repetitions of tightly controlled experimental tasks, there are also indications that it can be extended beyond such uses. We should keep our minds open to the possibility that fMRI can indicate the kinds of attitudes and feelings that are relevant to political campaigns.

So why do I doubt the conclusions reported in today's Op Ed piece? The problems I see have less to do with brain imaging per se than with the human tendency to make up "just so" stories and then believe them. The scattered spots of activation in a brain image can be like tea leaves in the bottom of a cup -- ambiguous and accommodating of a large number of possible interpretations. The Edwards insula activation might indicate disgust, but it might also indicate thoughts of pain or other bodily sensations or a sense of unfairness, to mention just a few of the mental states associated with insula activation. And of course the possibility remains that the insula activation engendered by Edwards represents some other feeling altogether, yet to be associated with the insula. The Romney amygdala activation might indicate anxiety, or any of a number of other feelings that are associated with the amygdala -- anger, happiness, even sexual excitement.

Some of the interpretations offered in the Op Ed piece concern the brain states of subsets of the subjects, for example just the men or just the most negative voters. Some concern the brain states of the subjects early on in the scan compared with later in the scan. Some concern responses to still photos or to videos specifically. With this many ways of splitting and regrouping the data, it is hard not to come upon some interpretable patterns. Swish those tea leaves around often enough and you will get some nice recognizable pictures of ocean liners and tall handsome strangers appearing in your cup!

How can we tell whether the interpretations offered by Iacoboni and colleagues are adequately constrained by the data, or are primarily just-so stories? By testing their methods using images for which we know the "right answer." If the UCLA group would select a group of individuals about whom we can all agree in advance on the likely attitudes of a given set of subjects, they could carry out imaging studies like the ones they reported today and then, blind to the identity of personage and subject for each set of scans, interpret the patterns of activation.

I would love to know the outcome of this experiment. I don't think it is impossible that Iacoboni and colleagues have extracted some useful information about voter attitudes from their imaging studies. This probably puts me at the optimistic end of the spectrum of cognitive neuroscientists reading this work. However, until we see some kind of validation studies, I will remain skeptical.

In closing, there is a larger issue here, beyond the validity of a specific study of voter psychology. A number of different commercial ventures, from neuromarketing to brain-based lie detection, are banking on the scientific aura of brain imaging to bring them customers, in addition to whatever real information the imaging conveys. The fact that the UCLA study involved brain imaging will garner it more attention, and possibly more credibility among the general public, than if it had used only behavioral measures like questionnaires or people's facial expressions as they watched the candidates. Because brain imaging is a more high tech approach, it also seems more "scientific" and perhaps even more "objective." Of course, these last two terms do not necessarily apply. Depending on the way the output of UCLA's multimillion dollar 3-Tesla scanner is interpreted, the result may be objective and scientific, or of no more value than tea leaves.

[The above is a guest post by Martha Farah.]

Posted by Mark Liberman at 11:52 AM

Nationality, gender and pitch

Do the Japanese exaggerate the natural differences between men and women in the pitch of the voice? Some people say so, and so in this morning's Breakfast Experiment™, I take a look at the facts of the matter. The results are interesting -- and I hope you'll enjoy the discussion, which takes in empirical sociology, cognitive neuroscience, and the U.S. presidential campaign.

First, some background. This all started when we took a look at a report by Matthew Rusling about the problems he created for himself by trying to learn Japanese from his girlfriend ("The perils of mixing romance with language learning", 11/7/2007):

I thought my Japanese was fine, while in reality the effeminate, almost childish twang I had been learning made me sound very much like a 20-something, pink miniskirted Japanese woman.

The female features that he copied included certain modes of self-reference, elongation of certain utterance-final syllables, use of certain female-associated particles, and perhaps a more feminine pattern of honorifics. But the biggest issue, he reports -- at least the one that annoyed his girlfriend the most -- was pitch:

Most of all, she said, I needed to take the pitch of my voice down several notches from the tone I had learned.

I can see someone learning the wrong particles and modes of self-reference -- though the gendered nature of these things in Japanese is widely discussed in language-learning texts -- but it seemed to me

... a little surprising that Rusling never realized that women have higher-pitched voices than men do, and that exaggerating the difference is a way of explicitly drawing attention to sexual identity -- this general pattern is hardly unique to Japanese.

But various readers took me to task for this. Randy Alexander wrote:

Come on, that's a little below the belt! I'm 100% sure that he's factoring in a male-female difference of about an octave. His girlfriend meant that he was talking too high in his range. I think if you listen to a few Japanese people having a conversation in Japanese, you might notice that the women talk higher in their range than the men do.

Karen Kay wrote:

My voice is higher when I speak Japanese. When John Wayne's voice is dubbed into Japanese, he sounds like Barry White instead of John Wayne because that's his cultural image. This is one of the things I taught when I taught Japanese language, pitching your voice higher or lower.

And some observations from the blog of an English teacher in Korea ("Deep-Voiced Japanese and Pitch Parrots", 11/10/2007):

While I was in Japan to renew my visa I thought the bus drivers talked in an unnaturally low voice. I remember the first bus I rode, the driver was a big guy, looked like he might've been a former sumo wrestler, so when he got on the intercom and announced a stop I thought his extremely low voice could've been natural. But then other bus drivers much smaller in stature also spoke in the range of two octaves below middle C. Thanks to this recent post at Language Log it seems there's some method to this madness.

Referring to the Language Log post: In class when I ask students to model my pronunciation I'm always taken aback when a young woman not more than 90 lbs not only copies my pronunciation but also drops her pitch to match mine. But then again, I've done the same thing when Ju-yeong has taught me Korean; not thinking about my tone I tend to match hers - which gives her and her friends a good laugh.

My own impression, for what it's worth, is that some Japanese women speak in a higher pitch range than I expect, while Japanese men use about the same range of pitches as Americans. But I don't trust such impressions very far, so I decided to check.

For a start, I took some recordings of telephone conversations made about a dozen years ago in the "CallHome" project, and published by the Linguistic Data Consortium in 1996 and 1997. There were 120 Japanese conversations of about half an hour each; I decided to focus on 18 conversations that involved one male and one female participant. (The rest of the conversations involved two males, two females, or more than one participant on each side of the conversation.) For comparison, I took the 27 CallHome English conversations with the same characteristic -- just two participants, one male and one female.

I pitch-tracked all of the conversations using the get_f0 program from the ESPS software system. [This was originally written by Dave Talkin based on an algorithm by George Doddington -- this is the pitch tracker used in WaveSurfer from KTH in Stockholm, but I used a standalone version available as part of a free package here.]
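For readers who want a feel for what a pitch tracker does, here is a minimal sketch of the simplest approach: pick the autocorrelation peak within a single frame. This is not the get_f0 algorithm, which is considerably more robust. The function name, the 60-400 Hz search range, the 8 kHz sampling rate, and the synthetic test tone are all assumptions made for illustration.

```python
import math

def estimate_f0(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Rough F0 estimate (Hz) for one frame, from the autocorrelation peak.

    Illustrative only: production pitch trackers typically also decide
    whether a frame is voiced and smooth the estimates across frames.
    """
    n = len(frame)
    min_lag = int(sample_rate / fmax)               # shortest period considered
    max_lag = min(int(sample_rate / fmin), n - 1)   # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        overlap = n - lag
        # Average product of the frame with a copy of itself shifted by `lag`.
        score = sum(frame[i] * frame[i + lag] for i in range(overlap)) / overlap
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic 100 Hz tone: 50 ms at an assumed 8 kHz sampling rate.
rate = 8000
tone = [math.sin(2 * math.pi * 100 * i / rate) for i in range(400)]
f0 = estimate_f0(tone, rate)
```

On a clean tone like this the estimate lands within a few Hz of the true value; on real speech, a bare autocorrelation peak is prone to octave errors, which is one reason to use an established tracker instead.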

This produces quite a bit of data -- around four and a half million pitch values, divided among the four categories of nationality and sex.

One simple way to compare those four categories is to lump all the pitch data from all the male Japanese speakers together and look at the quantiles of fundamental frequency values -- the 10th percentile was 88.5 Hz., the 50th percentile was 122.1 Hz., the 90th percentile was 207.0 Hz. -- and do the same for the female Japanese speakers, the male Americans, and female Americans.
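A sketch of that pooling step, using only the Python standard library. The function name and the tiny made-up tracks are hypothetical, and I am assuming that unvoiced frames are marked with an F0 of 0 and should be dropped before computing quantiles.

```python
from statistics import quantiles

def pooled_deciles(tracks, unvoiced=0.0):
    """Pool per-speaker F0 tracks and return the 10th..90th percentiles."""
    pooled = [f0 for track in tracks for f0 in track if f0 != unvoiced]
    # n=10 yields the nine cut points between deciles.
    return quantiles(pooled, n=10, method="inclusive")

# Made-up tracks for two speakers in one category; 0.0 marks unvoiced frames.
tracks = [
    [0.0, 95.0, 102.0, 110.0, 0.0, 118.0],
    [121.0, 0.0, 130.0, 144.0, 160.0, 0.0],
]
deciles = pooled_deciles(tracks)
p10, p50, p90 = deciles[0], deciles[4], deciles[8]
```

Note that pooling weights each speaker by how many voiced frames they contributed, so talkative speakers count for more than quiet ones.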

If we plot the results in terms of semitones (relative to A 110), we get this:

The same data plotted with pitch estimates in Hertz (cycles per second):

So sure enough, the Japanese speakers are more gender-polarized -- the male Japanese speakers are pitching their voices somewhat lower (overall) than the male Americans, while female Japanese speakers are overall somewhat higher-pitched than female Americans.
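The semitone scale used for the first plot is just a logarithmic rescaling of frequency. A minimal sketch of the conversion, applied to the male Japanese percentiles quoted earlier (the function and variable names are mine, not part of any package):

```python
import math

A_110_HZ = 110.0  # reference pitch "A 110" from the text

def hz_to_semitones(f_hz, ref_hz=A_110_HZ):
    """Convert a frequency in Hz to semitones above (or below) the reference."""
    return 12.0 * math.log2(f_hz / ref_hz)

# Percentiles quoted above for the pooled male Japanese speakers.
male_japanese_hz = {10: 88.5, 50: 122.1, 90: 207.0}
male_japanese_st = {p: hz_to_semitones(f) for p, f in male_japanese_hz.items()}
```

An octave is 12 semitones, so 110 Hz maps to 0 and 220 Hz maps to 12; values below 110 Hz come out negative.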

How big is the effect? The table below shows the overall female-male F0 difference in semitones at percentiles from 10% to 90%:

Percentile   .1   .2   .3   .4   .5   .6   .7   .8   .9
Americans   6.0  8.4  8.1  7.5  6.9  6.1  4.8  4.4  3.7
Japanese    9.2  9.5  9.2  9.1  9.3  9.0  8.3  7.6  6.2

Overall, the Japanese (in this sample) separate the sexes by one to three semitones more than the Americans do. Since each semitone corresponds to a pitch difference of about 6%, this is a difference with a certain amount of oomph. (With 4.5 million data points, the difference is highly "significant" in the statistical sense, though that fact is of no value or consequence whatsoever.)
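To spell out the arithmetic: subtracting the American row of the table from the Japanese row gives the extra separation at each percentile, and a semitone is a frequency ratio of 2^(1/12). A short sketch, with the table values typed in by hand and a helper name of my own choosing:

```python
# Female-male F0 difference in semitones, copied from the table above.
americans = [6.0, 8.4, 8.1, 7.5, 6.9, 6.1, 4.8, 4.4, 3.7]
japanese = [9.2, 9.5, 9.2, 9.1, 9.3, 9.0, 8.3, 7.6, 6.2]

# Extra separation in the Japanese sample at each percentile (10% to 90%).
extra = [round(j - a, 1) for j, a in zip(japanese, americans)]

def semitones_to_ratio(st):
    """Frequency ratio corresponding to an interval of `st` semitones."""
    return 2.0 ** (st / 12.0)

one_semitone_pct = (semitones_to_ratio(1) - 1) * 100   # about 5.9
largest_gap_pct = (semitones_to_ratio(max(extra)) - 1) * 100
```

The extra separation runs from 1.1 semitones (at the 20th and 30th percentiles) to 3.5 semitones (at the 70th), and the largest gap corresponds to a frequency ratio of about 1.22.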

Does this mean that the stereotype about national differences is correct, this time?

Maybe so, but I'd recommend caution.

And I don't mean to repeat my usual warnings about pop platonism, which mistakes overall group differences for essential properties of individual group members. It's certainly true that the distributions overlap: in the sample I used in this experiment, there are plenty of pairs of Japanese male and female speakers whose pitch ranges are closer than some pairs of American male and female speakers.

But that's not what worries me most in this case. My main concern is that these speakers may not be typical of the categories we're trying to learn about by studying them.

The speakers in these conversations were not randomly selected Japanese and American male and female speakers. They were recruited by offering free overseas telephone calls in the pre-Skype days of the mid-1990s, when international calling rates were often several dollars per minute. All calls originated in the U.S., and so the Japanese participants were (I think) mostly students calling their parents, while the American participants were a more mixed group.

I picked the calls purely on the basis of nationality and sex, but my sample was not controlled for age, class, caller's relationship to callee, or for the interaction of those categories. So perhaps we've discovered that male Japanese students and their mothers tend to polarize their pitch ranges; or that American married couples tend to harmonize their pitch ranges; or something else entirely. I haven't looked into the ages and relationships of the participants in these conversations, so I don't mean to suggest that these explanations are likely ones -- I'm just spinning out some ideas about things that might be going on.

This is why social scientists put a lot of effort into controlling the demographic characteristics of survey participants. This is also partly why they use large sample sizes -- no matter how carefully you control for the obvious things, there are always lots of subgroup or individual differences that you have to treat as noise (and you hope you're lucky enough that all the other stuff averages out in your sample -- it probably usually doesn't, alas).
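The point about noise can be made concrete with a small simulation. This is a sketch under stated assumptions, not a model of any real study: two groups are drawn from the same standard normal distribution, so any gap between their means is pure sampling noise, and the group sizes (10 and 400), trial count, and seed are arbitrary choices of mine.

```python
import random
import statistics

def mean_gap(n, rng):
    """Difference in sample means for two groups of size n drawn from the
    SAME distribution (mean 0, s.d. 1), so any gap is pure sampling noise."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.fmean(a) - statistics.fmean(b)

def gap_spread(n, trials=200, seed=0):
    """Standard deviation of the spurious gap across repeated experiments."""
    rng = random.Random(seed)
    return statistics.stdev([mean_gap(n, rng) for _ in range(trials)])

spread_small = gap_spread(10)    # e.g. 10 men vs. 10 women
spread_large = gap_spread(400)   # a more survey-like sample
```

In theory the spurious gap has a standard deviation of sqrt(2/n), so groups of 10 produce phantom "group differences" several times larger than groups of 400 do, even when the underlying populations are identical. And this is the optimistic case, since it ignores the uncontrolled demographic factors discussed above.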

For some reason, experimental psychologists in general, and neuroscientists in particular, don't seem to have learned these same lessons. And their willingness to draw conclusions about group properties from tiny and uncontrolled samples is amplified by journalists and politicians.

I've cited many cases where brain imaging studies involving a handful of subjects -- and often with marginal results on those -- have been interpreted as telling us something about men and women in general, or boys and girls in general, or members of other general categories. For example, here's a study of 9 boys and 10 girls used to argue that "Girls and boys behave differently because their brains are wired differently"; here's a study of 10 female and 10 male medical students at UCLA used to argue that "Women really do enjoy a good laugh as much as you do; they are just wired to focus on different aspects of humor."

A beautiful example of the same thing was published yesterday in the New York Times: Marco Iacoboni et al., "This is your brain on politics":

In anticipation of the 2008 presidential election, we used functional magnetic resonance imaging to watch the brains of a group of swing voters as they responded to the leading presidential candidates. Our results reveal some voter impressions on which this election may well turn.

Our 20 subjects -- registered voters who stated that they were open to choosing a candidate from either party next November -- included 10 men and 10 women. In late summer, we asked them to answer a list of questions about their political preferences, then observed their brain activity for nearly an hour in the scanner at the Ahmanson Lovelace Brain Mapping Center at the University of California, Los Angeles. Afterward, each subject filled out a second questionnaire.

We don't learn anything more about who these 10 males and 10 females were -- medical students again? It doesn't matter, anyhow, since in such a small sample, it's impossible to control for the range of demographic, cultural, individual and random factors that are likely to swamp whatever conclusions you'd like to draw about the political reactions of American voters in general.

This doesn't prevent the authors from making sweeping generalizations about voters' reactions to particular parties, issues and candidates, e.g.

The two areas in the brain associated with anxiety and disgust -- the amygdala and the insula -- were especially active when men viewed "Republican."

And as this example indicates, they're particularly eager to draw conclusions about gender differences:

Men show little interest in Mrs. Clinton initially but after watching her video they react positively. Women respond to her strongly at first, but their interest wanes after they watch her video.

With Mr. Giuliani, the reactions are reversed. Men respond strongly to his initial still photos, but this fades after they see his video. Women grow more engaged after watching his video.

This is evidence that swing voters' responses change when they see these two candidates in action. For men, Mrs. Clinton is a pleasant surprise. For women, Mr. Giuliani has unexpected appeal.

I wonder how consistent these reactions were in their sample. I'll bet there was some overlap, and maybe a lot. But again, it doesn't matter -- it's irresponsible to take the responses of 10 medical students (or whatever) recruited at UCLA as a proxy for the reactions of 55 million male or 55 million female U.S. voters.

Their conclusions might be true, or they might not be, but the fact that some of their evidence comes from high-tech brain-imaging machines doesn't make the results any more likely to generalize to American voters as a whole than if they asked for a show of hands in their Introduction to Neuroscience class.

I'm not surprised to see that some political scientists understand this. Thus Brendan Nyhan writes ("Watch out for brain scan hype"):

When you read about brain-scanning studies like the one in today's New York Times, remember that interpreting fMRI data is more art than science and that the sample sizes are tiny and unrepresentative. I don't know how to interpret any of the claims in the article without way more information, which will hopefully be forthcoming in the authors' academic work.

Many people are concerned about the misuse or misinterpretation of this technology because brain scanning carries perceived scientific authority -- so much so that even irrelevant neuroscientific information can be perceived as more persuasive. As a result, even preliminary and not particularly newsworthy studies like this one may receive a great deal of hype.

What are the prospects that neuroscientists and journalists might learn these lessons? I try to stay optimistic, but each group has an interest in failing to understand the limitations of such studies.


[In passing, let me point out that the discussion above ignores the possibility that a national difference in gender polarization might be genetic rather than cultural. That's because I think a genetic explanation is vanishingly unlikely to be true -- if the pitch-range polarization effect is real, it's probably a fact about Japanese vs. American societies, not Japanese vs. American genomes.

But in many analogous cases, similar or even smaller group differences have been presented as evidence for genetic differences (between women and men, or between Africans and Europeans, or whatever). And in this case, we know that the basic sex difference in pitch range is genetically based, caused by the effect of testosterone on the male larynx at puberty; and we know that this particular form of sexual dimorphism is a relatively recent innovation in human evolution, not shared with chimps and gorillas, so that it must have been under selective pressure in recent times, and might still be.

It wouldn't be nuts to imagine that the selective pressures on this trait might have been unusually great, in Japanese society over the past 1,000 years or so, creating an increase in average laryngeal dimorphism. But I doubt it.]

Posted by Mark Liberman at 07:39 AM

November 11, 2007

Lolcat, meet eggcorn

This item will eventually go on the Eggcorn Database (it's not there yet, nor is it in Brians), but since the eggcorn comes with a lolcat, I'm putting it on Language Log as well:

(Hat tip to Ken Mallott.)

Porthole for portal is astonishingly common.  In fact, there are so many relevant Google webhits for {"porthole to"} that it looks like a candidate for the "nearly mainstream" category on the database.  A few examples:

Porthole to the Past
is a total view of the old town of Deadwood from 1876-9 before it all burned down in 1879.  (link)

Commenting on her new business, she told the Hull Daily Mail: "To me, this is a porthole to the rest of the city." (link)

Your porthole to Mars! Time on Mars! Your Mars Clock!  (link)

Semantically, it's a natural.

Posted by Arnold Zwicky at 12:55 PM


Mark Liberman has pulled up an item from his to-blog list which coincides with an item on MY to-blog list, about "translations" of music videos in one language into another language, adhering fairly closely to the sound of the original, while sacrificing the sense entirely.  As Ben Ostrowsky noted in mail to Mark, these videos represent a resurgence of an old form of language play; he cited the 1967 book Mot D'Heures: Gousses, Rames, which provides a number of Mother Goose rhymes in English "translated" into French.

A little while ago, Laura Kalin wrote me about what's presented as a "phonetic transliteration of a Dutch kiddies song" entitled "Fart in the duck" (already you know it's a joke), which Mark also mentioned in his posting.  She and I wondered what such things should be called.  Ben Ostrowsky suggested "autour du mondegreens", and Mark used "cross-linguistic mondegreens" in his posting.  The connection to mondegreens strikes me as not quite right, and it turns out that there are two phenomena other than mondegreens that are very similar to these cross-language re-workings of texts.

What's going on in these re-workings is the deliberate substitution of words in one language for phonetically similar words in some text in another language.  It's a kind of word-by-word TRANSLATION (based on sound rather than meaning), not transliteration.  There are large-scale translations, like the English-to-French wonders in Mot D'Heures: Gousses, Rames, and small-scale ones, like the German-to-English "donkey fieldmouse" for "danke vielmals".  They differ from mondegreens in that the classic mondegreens ("Excuse me while I kiss this guy") are accidental mishearings, and these translations are deliberate plays on words.  (They also differ from mondegreens in that they cross languages, but that difference is recognized in the labels "autour du mondegreens" and "cross-linguistic mondegreens".)

Not surprisingly, people have wondered about such things before.  There was a Linguist List discussion back in 1999 (summary 10/13/99 by Anatol Stefanowitsch) on "bilingual puns", taken to extend to phenomena like these.  But they're not much as puns; good puns evoke two different interpretations, and there are bilingual puns that do this.  From the Wikipedia page:

Q: Did Herr Beethoven write ten symphonies?
A: Nein.

The duck-fart type, in contrast, is fine in the original language, but just nonsense (though often suggestive nonsense) in the recipient language.

Then there's literature on "homophonic translation" WITHIN a language; see Heidi Harley's Language Log posting "O grammar, water bag noise!" on things like "Ladle Rat Rotten Hut" (for "Little Red Riding Hood", in case you missed Heidi's posting).  (You could quibble about "homophonic" in this label, since almost all the substituted words are not identical in sound to the originals, but merely similar.  But of course, substituting true homophones wouldn't work, since the translation would then be identical to the original; there would be no joke at all.)

What we have in the duck-fart examples is a combo of real bilingual puns with (monolingual) homophonic translations: bilingual homophonic translations.  Ok, that's a really good label in some ways, but it's long and clunky and hard to love.  BHTs?  Or a label like "mondegreens" and "eggcorns": duck farts, donkey fieldmice, donkey fieldmouses, ...?

Posted by Arnold Zwicky at 11:35 AM

Ask Language Log: See spot run

Nic Bommarito wrote to ask about a famous sentence from the infamous Dick and Jane textbooks of the 1950s: "See Spot Run."

I don't know how to explain what is happening in this sentence. 'See' is an imperative but 'run' I can't explain.

Searching the internet, Nic found an analysis of this sentence on Cecil Adams' excellent site The Straight Dope. After considering three theories from "a survey of experts" and finding them wanting, Adams turns to his "trusty 1936 edition of A Writer's Manual and Workbook by Paul Kies", and finds a sentence diagram that "looks like the Flying Wallendas playing jai alai".

Adams leaves it there, after asking the gnomic question "This week sentence diagramming, next week world peace?" This might be taken to imply that an understanding of syntax must await the Messiah, and it left Nic understandably puzzled as to whether the original question had been answered or not.

To Nic and any others who are interested, I recommend chapter 14 of the Cambridge Grammar of the English Language, "Non-finite and verbless clauses", where in section 5.4, they'll find see numbered among the verbs of class 3Bii, and learn that "run" in "See Spot run" is known as a bare infinitival.

These verbs, together with smell from Class 3Cii below, are the verbs of sensory perception. All take bare infinitivals, while [some of them] take a to-infinitival as well. For see we have the following possibilities:

[43] i a. We saw Kim leave the bank. b. *Kim was seen leave the bank.
  ii a. We saw Kim leaving the bank. b. Kim was seen leaving the bank.
  iii a. We saw Spurs beaten by United. b. ?Spurs were seen beaten by United.
  iv a. We saw him to be an imposter. b. He was seen to be an imposter.

We put the to-infinitival last because this does not represent the primary sense: it is not a matter of sensory perception but of mental inference. In this construction, see behaves like the verbs of cognition/saying (Class 3Aii), following their pattern of favouring a matrix passive and the verb be in the subordinate clause, of allowing the perfect (He was seen to have altered the figures), and of alternating with the finite construction (We saw that he was an imposter).

The primary sense, illustrated in [43i-iii], involves two arguments, an experiencer and a stimulus (the situation perceived): Kim in [i-ii] thus does not represent an argument of see.

Just for grins, let's compare the four viewpoints presented in Cecil Adams' 1998 answer.

His first "expert" says that the sentence is simply "malformed" and should be replaced by "Did you see Spot running?". Adams, quite appropriately, rules this out as "cheating", since it pretends to answer the question by claiming that it never should have been posed in the first place.

The second "expert" says that Spot run is a "small clause", which Adams describes as "a piquant term brought to us courtesy of transformational grammar, a field of linguistics we might think of as sentence diagramming for adults". Adams is not satisfied by this answer: having been told that "[a] small clause consists of a noun phrase followed by some other kind of phrase" and that "[in] 'See Spot run,' Spot is the noun phrase and run is a verb phrase", he concludes that "your transformational grammarians do not trouble themselves with such details as what kind of verb" run is.

This is more than a little unfair to modern linguists in general, and in particular to my former classmate Edwin Williams, who introduced the term "small clause" in his 1974 dissertation "Rule Ordering in Syntax", a work that was quite explicit about its theory of what kinds of entities make up a sentence like "See Spot run". If Cecil had managed to reach him, I'm sure that Edwin would have been happy to describe and explain the nature and distribution of verbs like run in such sentences -- and in descriptive terms, at least, Edwin's discussion would have been quite similar to the one that I've quoted above from CGEL.

On the other side, it's fair to complain that before the publication of CGEL in 2002, modern linguists had not provided a systematic and accessible account of English syntax. (Though I'm sure that Quirk, Greenbaum, Leech & Svartvik's Comprehensive Grammar also gives a clear and straightforward analysis of the "See Spot run" type of construction.) Edwin's dissertation would not be an easy read for an interested member of the general public -- and an effort to master the technicalities necessary to follow its details would be an exercise of mainly historical interest, since the field's treatment of syntax has undergone several transformations and bifurcations of notation and terminology since 1974.

Adams' third "expert" presented an analysis somewhat like CGEL's, with slightly different terminology and an unnecessary generalization of the notion of "direct object":

Spot run is an "objective infinitive." Now we're getting somewhere. An infinitive is an uninflected verb form commonly beginning with to, as in to run. In an objective infinitive (and doesn't that sound like something you could get Unitarians to pray to?), the noun is modified by the infinitive, and the two parts together--in this case Spot run--are the direct object of the predicate, see. One may object: How can run be an infinitive? There's no to in front of it. My informant explained this by saying the to was "understood." "See Spot [to] run"? I don't think so.

This is again a bit unfair -- asking "How can run be an infinitive? There's no to in front of it." reminds me of the class of bad jokes "If you're a captain, where's your ship?"

And indeed, Adams is happy to take Paul Kies's word for the notion that to, though it's the "sign of the infinitive," is "frequently omitted, especially after such verbs as help, make, bid, feel, see, hear, dare, need". And he adopts the associated sentence diagram as the background for the picture at the head of his entry:

Unfortunately, CGEL lacks evocative graphics. Nevertheless, pending messianic developments, it should be the go-to source for questions about English syntax.

Posted by Mark Liberman at 08:21 AM

November 10, 2007

Autour-du-mondegreens: bunkum unbound

Ben Ostrowsky writes:

I've noticed a new tendency toward an old kind of language play ("Mots D'Heures: Gousses, Rames"). The new rules:

1. Find a music video in a language you don't understand.
2. Construe the lyrics as if they were in a language you do understand. Aim for the surreal and outrageous.
3. Add your subtitles and upload the video to YouTube.

Ben offered two examples, first a hacked video of Dschinghis Khan's Moskau (from which the still is taken), and an Icelandic children's video reborn under the title "You are a Pirate! RAUNCHYY" ("Riding a lemur is alright with me...").

The first of these that I saw was another children's video, originally Dutch, that was posted on YouTube in early September as "Fart in the duck". Chandan Narayan sent me the link, and it's been on my to-blog list ever since.

Ben adds:

I can think of no better name for these than "Autour-du-mondegreens" and selfishly hope nobody else can, either.

One lesson to learn from these subtitling efforts is how easy it is to find non-systematic phonetic similarities across languages, of the sort that Daniel Cassidy has used to see the relationship between bunkum and Buanchumadh and more broadly to argue that The Irish Invented Slang.

Robert Cumming writes:

My guess is that the first autour-du-mondegreen to really take off could have been the Arabic-to-Swedish 'Hatten är din' in 2000. See Wikipedia and YouTube. Another well-known example from about the same time - and the same artist, Azar Habib - is 'Ansiktsburk' (face-tub/carton). In Swedish, the genre is known as 'turkhits' and there's a list of the best known on Wikipedia.

At least one blogger has speculated that what Azar Habib was singing about was in fact not face-tubs but social networking sites - 'ansiktsbok' is the literal translation of Facebook.

And Caitlin Light points out that another form of this phenomenon, Flash videos known as fanimutations, has been around at least since 2001, with "one of the earliest/most famous being [...]". Caitlin observes that "for a while there were a whole lot of videos like this popping up all over the internet".

Kim Belcher wrote to trace the animutation back to a 2001 effort by then-14-year-old Neil Cicierega, Hyakugojyuuichi!!!. Kim explains:

According to wikipedia, the pieces derived from Cicierega's stuff are now called animutation. As well as mondegreens, they usually include visual non-sequiturs, especially faces of famous people or characters. I don't know if this was derived from your Swedish-out-of-Arabic connection or is entirely independent. Definitely Hyakugojyuuichi!!! is still referenced in many tongue-in-cheek fanart pieces on the web.

And Gwyan Rhabyt writes:

There was an article in Wired this month about Buffalax (who did the Dschingis Khan video you mentioned). [Monty Phan, "Buffalax Mines Twisted Translations for YouTube Yuks", 11/6/2007.] As someone who teaches both video production and postmodern art theory, I'm sure there's a paper in here somewhere, though the genre needs a better title than autour-du-mondegreens.

In my book, Buffalax's masterpiece is:

Gwyan added in a later note:

As a follow up to my last email, the video I mentioned and the male lead, a major Tamil star by the name of Prabhu Deva, have both become known by the name "Benny Lava" after the subtitles from Buffalax. A quick Google search reveals 130,000 hits for "Benny Lava", marking some serious cultural momentum.

[Rob Stryker observes that "what my friends and I know as the best-done misheard lyrics out there today" is Kewen's version of the Nightwish song Wishmaster.]

[Olaf Hellman writes:

The song-lyrics-in-a-different-language wordplay is a long-running gag on the Tamori Club TV show in Japan -- It is called 'soramimi'.

Japanese wikipedia lists the Soramimi Hour segment of the 'Tamori Club' TV show as dating from April 1992.

We can take the idea of cross-linguistic mondegreens even further back, to Luis van Rooten's 1967 book Mots D'Heures: Gousses, Rames, which Ben Ostrowsky mentioned in starting this discussion.]

[Cosma Shalizi writes:

I'm not sure if it really counts, since it's nonsense in the base language, but I have always been struck by the following passage from James's Principles of Psychology (ch. 11):

In the meaningless French words 'pas de lieu Rhône que nous,' who can recognize immediately the English 'paddle your own canoe'? But who that has once noticed the identity can fail to have it arrest his attention again?


[And Laura Kalin sends a link to the original Dutch lyrics of "Fart in the Duck", which feature the refrain "er zit een gat in mijn dak" (= "there's a hole in my roof").]

[Dan Tobias reports a couple of genuine cross-linguistic mondegreens, based on mishearing (or at least a naively new hearing) rather than an intentional misconstrual:

Regarding your comments about odd renditions (sometimes done intentionally for comic effect) of lyrics from one language to another, I can recall as a kid in first grade being taught the song "Alouette" (with lyrics in French), and singing it using similar sounding words that made more sense to me:

I am wet, ah
Jumping, I am wet, ah
I am wet, ah
Jumping in the rain

Jumping in the rain, I bet
I will get very wet
Very wet, very wet
I am wet, I am wet

I am wet, ah
Jumping in the rain

Also, I've always heard the French lyrics in the middle of the Beatles song "Michelle" as "Sunday monkey won't play piano song".


[Maria Garibotti writes:

As another example, the song "Dragostea Din Tei" (that Romanian song that was a meme a while ago) was, uhm, translated into Spanish by a Spanish band called "Los Morancos". The song is an absurd gay pride song, and it was a big hit in South America and Spain back in 2004 (or 05? Maybe Summer of 04-05). The lyrics are

Marica quien?
Marica Tu
Marica Yo
Marica HAHA! x 4

Valor, a la luz
si eres un Gay tú
Piensalo (piensalo)
es tu vida y si dicen po que digan (que digan lo que quieran)

Valor... valor (mucho valor)
Que oscuro es un armario
Sal de ahí (sal de ahí)
y vente aquí
Tu destino es ser feliz...

Fiesta Fiesta
Y Pluma pluma Gay
Pluma pluma Gay
Pluma pluma pluma Gay x 4

Que importa si el niño sale gay
tu has nacido gay
aunque cueste...
Hay que gritarlo...
¡¡¡ SOY GAY !!!
Fiesta Fiesta
Y Pluma pluma Gay
Pluma pluma Gay
Pluma pluma pluma Gay x 4

Marica quien?
Marica Tu
Marica Yo
Marica JAJA!


Posted by Mark Liberman at 12:15 PM

November 09, 2007

No comment on this street name

Talking of street names in Scottish cities, such as Edinburgh, I need to just mention that I went to the ancient city of St Andrews last weekend for a philosophy conference (an excellent one), and I happened to notice the name of one long narrow street down which it was often necessary for me to pass. Its very striking name was Butts Wynd. Now, I am not going to make any comment on this, because I think American readers will agree that comment would be flatulent. Sorry, I mean fatuous. Superfluous. Whatever. Sorry.

By the way, I found the above picture of the sign in the Dictionary of Words in the Wild, where it was accompanied by the legend: "street sign amused dumb Americans in St. Andrews, Scotland." Well, really! We Americans are dumb simply for having a sense of humor?

It made me recall with a certain sympathy the patient in the very old joke whose psychological evaluation reveals that he finds every one of the Rorschach inkblots violently and obscenely sexual. "Me sexually obsessed? You're the one who brought out the collection of filthy pictures!"

Posted by Geoffrey K. Pullum at 03:30 PM

Gullibility in high places

Suppose you hold some crank theory for which there is no evidence but which is likely to appeal to some specific audience.  Suppose, for instance, that you believe that Jesus and all of his apostles were gay, an idea that might appeal to some gay people (not me, but tastes and opinions differ).  You then write a book of stories detailing the hot hot man-man sexual exploits of these men, keying each story to a biblical passage.  You manage to get it published.  Does the New York Times then write an enthusiastic feature story about you and your work?  Do you win an American Book Award -- "the purpose of the awards is to acknowledge the excellence and multicultural diversity of American writing" -- for non-fiction?

It sounds unlikely, doesn't it?  But Daniel Cassidy has managed something similar with his book How the Irish Invented Slang (CounterPunch/AK Press, 2007), which maintains that great chunks of English slang came from Irish (an idea that is likely to appeal to some English-speaking people of Irish descent), supplying for each slang expression a (putative) Irish expression that resembles it in pronunciation or spelling.  And now the NYT has (gullibly) celebrated Cassidy and his preposterous book, and the book has (alas) gotten a 2007 American Book Award for non-fiction.

Of course, there ARE some Irish loan words in English -- galore, for instance (see the OED) -- and Cassidy catches many of these (though even there he doesn't cite his sources).  But the problem with the book is that there is no scholarship or real evidence at all in it.  As Grant Barrett says at the beginning of his blog entry on the book:

It is quite incredible that Corey Kilgannon would write in the New York Times about Daniel Cassidy's book How the Irish Invented Slang without talking to historical lexicographers, historical linguists, or experts in Irish Gaelic linguistics.

They would tell him that Cassidy's theories are insubstantial, his evidence inconclusive, his conclusions unlikely, his Gaelic atrocious and even factitious, and his scholarship little better than speculation. In short, his book is preposterous.

Cassidy paints himself as the maligned scholar, the unappreciated genius, the outsider. He may be all of those things, but he is them by choice: his work cannot withstand scholarly scrutiny so he simply cannot afford to join forces with any larger body of experts who do this sort of thing for a living. His book falls apart on first reading by anyone with some expertise in the field.

Read Barrett's blog for details.  For your immediate entertainment, here's a piece from the announcement of a reading and performance by Cassidy on the 6th at the Irish Arts Center in New York City:

In a fast-paced spiel (speal, cutting, sharp speech) of monologues, stories, and songs, Daniel Cassidy slices through the current Anglo-academic baloney (beal onna, foolish blather) which claims that the Irish have had no influence on the American language.

On the gullibility front: the event was sponsored by the Irish Arts Center in association with the Irish American Cultural Institute, NYU's Glucksman Ireland House, and CUNY's Institute for Irish-American Studies, and Pete Hamill and Peter Quinn were there as special guests.  You might want to check out the "critical acclaim" in the announcement, which includes this gem:

"Save the Irish dúid from the Oxford English dictionary!  Daniel Cassidy has shaken the study of linguistics in the U.S. with a startlingly new theory -- that much of American slang has been borrowed from Irish... Cassidy's ideas have rapidly gained academic respectability since the publication of his book early this summer. This book is truly amazing!" (Eamonn McCann, Belfast Telegraph)

By the way, dúid is Cassidy's source for English dude.

[Added 11/11/07: Mark Liberman took on Cassidy on the word bunkum last year -- and made the connection to the father in My Big Fat Greek Wedding, who declares that English words (most remarkably, kimono) all come from Greek.]

Posted by Arnold Zwicky at 01:47 PM

More autantonymic slogans

A couple of days ago, I cited "Go, Musharraf, go!" as a slogan with a special kind of ambiguity: it has two interpretations, one of which is the opposite (in some evaluative sense) of what its users intend. I suggested the Olympia beer slogan "It's the water" as another with the same property.

Several readers have sent in additional examples. Gregg Drube wrote:

Ten or fifteen years ago, the slogan used for the combined meetings of the South Dakota school superintendents association, the School administrators of South Dakota, and the South Dakota Education association was "Educating for Change." At that time (and still, I think) South Dakota ranked 51st (out of 50) in average pay for teachers. I'm not sure what the intended meaning of the slogan was. I was teaching math at a high school in South Dakota at that time...I know what I thought it meant.

Don Porges suggested: "You'll never outlive your money" and "See how far your dollar can go".

Bruce Rusk:

At a conference I once attended, a speaker was delayed and a written statement was read in her absence. It stated that the field of study in question "isn't going anywhere," which depending on intonation could mean either that it's "here to stay" or "just spinning its wheels."

Randy Alexander reminded me of "Nothing sucks like an Electrolux". It seems to be a matter of debate whether this was an intentional pun or not -- Wikipedia says:

In the 1960s, the company successfully marketed vacuums in the United Kingdom with the slogan "Nothing sucks like an Electrolux." British consumers took the slogan literally because "sucks" as a term of disparagement is strictly an Americanism. However, US Americans often incorrectly believe that this was a brand blunder, and this is erroneously claimed even in college business textbooks. In fact, the informal US meaning of "sucks" was already well known in the UK at the time, and the company's "marketing people were fully aware of the possible double entendre and intended it to gain attention".

But apparently this is a case where the debunkers are themselves debunked -- Jesse Sheidlower, whose business it is to know these things, writes:

The Wikipedia description of this issue is completely at variance with the facts. The verb suck 'to be notably bad' was not at all common in the 1960s, even in America (OED's first quote is 1971, and though I've since found evidence back to 1964, there isn't all that much of it; 1970 is when it really takes off). I've never seen a British example from the 1960s, and while there might be one, I absolutely refuse to believe that it "was already well known in the UK at the time".

Given the sources in which this appears in the 1960s, I'd be somewhat surprised if the marketing people even in America were aware of the meaning.

But Bob Ladd writes to disagree, at least in part:

I take issue with Jesse Sheidlower (debunking the debunker of the debunkers). The modern sense of "suck" was thoroughly alive in young people's colloquial American English when I was an undergraduate from 64-68, and I presume you can confirm that as well, unless they had a different dialect at Harvard. The key difference between then and now is that it was very definitely felt to be rude -- it had clear sexual overtones, especially when directed at a person instead of (as seems to be more common now) at things and situations. (I still find it vaguely disconcerting when my 13-year-old son uses it entirely unself-consciously in polite company.) Consequently it didn't generally make it into print for Jesse and Co. to find, any more than there were very many printed occurrences of fuck, etc. Obviously, I have no idea whether it was around in British English as well, but I'd bet anything that people my age who noticed the Electrolux ad in the 1960s (I don't remember it myself) would have got a double meaning out of it.

My recollection agrees with Bob's, and I'd extend the period back to secondary school during the period 1959-1965; but I don't have any specific episodic memories that I can place definitively in time, and so I admit that I might be wrong. If the phrase was as common (although taboo) as Bob (and I) remember, it's surprising that it wouldn't occur more often in print, at least in settings like "underground newspapers" where taboo words were freely used.

Meanwhile, several people have pointed out that there is already a perfectly good word autoantonym, so what's with the form that's missing the first 'o'? I don't know -- you have to ask Ryan North, the author of the dinosaur cartoon that started this thread off.

[Update -- from Ben Zimmer:

In the mid-'90s Burger King had an ad campaign for the Whopper using Marvin Gaye's song "Ain't Nothing Like the Real Thing." If one were so inclined, one could interpret that as "It ain't nothing like the real thing" rather than "There ain't nothing like the real thing."

A more recent example of unfortunate burger-chain sloganeering is "You gotta eat!" for Checker's/Rally's. As King Kaufman wrote, this could be interpreted as: "(What the Heck), Ya Gotta Eat! (or you'll starve, and eating our burgers is marginally better than starvation)."

And from Bruce Rusk:

Your post today reminded me of another, intentionally punning slogan: the Fluke trucking company proudly states, "If it's on time, it's a Fluke."


[More from the mailbag -- Robert Lieblich:

I doubt I'm the first to mention the sign in a pharmacy window: "We dispense with accuracy."

I remember the first time I was stopped by a policeman while driving in California. He walked up to me and announced that he was going to give me a citation. It was all I could do not to respond, "Thank you officer, but I think driving well is its own reward." The citation cost me a visit to traffic court and about a hundred bucks.

And then there's "Shoes and shirts must be worn." What do those of us do whose shoes and shirts are new?

Or people carrying their purchases of shoes and clothing. Compare the (linguistically) famous London underground sign "Dogs must be carried", discussed here.

From Vicky Larmour:

In the UK one of the big courier companies is Business Express. Their slogan is "A promise means nothing until it's delivered".

I think this is meant to mean "You can make promises to your customers and then trust us to deliver the goods to them on your behalf". However, once I had been given the run-around with them repeatedly promising to deliver an awaited item to me at a particular time, but completely failing to do so, I started to read the slogan in a different way....

And Gregg Drube sends another, from the Dakota Food Court: "We put the fast in breakfast!".]

[Bill Barnett writes:

Before Hillary Clinton declared her candidacy for President, a number of websites popped up selling "Run, Hillary, Run!" bumper stickers, advising Democrats to affix them to their rear bumpers, Republicans to the front. Google finds 952,000 hits for the phrase.

And Mary Tabasko writes:

I live in Pittsburgh, PA, and there used to be a local hamburger chain called Winky's. At one point, they used the slogan "Winky's makes you happy to be hungry." Since the chain is defunct (has been since the late 80s, I think), perhaps more customers went with the unintended reading.


[Tilman Stieve points out that "Nixon's the one" came to have a very different meaning during the Watergate scandal than it had during his re-election campaign.]

[Jerry Kreuscher writes:

If you have time for another, here's my long-standing favorite: "It doesn't get any better than this." That one first caught my ear decades ago as the tag-line of a TV commercial for a brand of cheap, watery beer. Since then it has amused me how people incline to understand it as high praise rather than the despair that could be intended equally well.

You could put a couple of them together: "It doesn't get any better than this -- it's the water."]

Posted by Mark Liberman at 08:15 AM

November 08, 2007

The Google

In my seminar yesterday, I spent a little time reviewing the facts about (an)arthrousness in English proper names (recently discussed on Language Log here and here, with more to come), and the students brought up George W. Bush's reference to "the Google" last year, which was widely mocked, but not commented on here.

The event was a CNBC interview by Maria Bartiromo on October 23.  This site has a video clip, with the transcription:

HOST: I'm curious, have you ever googled anybody? Do you use Google?

BUSH: Occasionally. One of the things I've used on the Google is to pull up maps. It's very interesting to see -- I've forgot the name of the program -- but you get the satellite, and you can -- like, I kinda like to look at the ranch. It remind me of where I wanna be sometimes.

The WSJ version quotes "CNBC's unofficial transcript", which differs from the ThinkProgress transcription above in several respects: it gives Bush's "kinda" and "wanna" the standard spellings "kind of" and "want to"; it has "I forgot" instead of the non-standard "I've forgot" (I can't be entirely sure after many listenings to the passage, but I'm inclined to go with the ThinkProgress version); and it corrects "it remind me" (which seems entirely clear to me in the video) to "it reminds me".  But everybody has "the Google" (as well as the odd syntax surrounding it: "One of the things I've used on the Google is to pull up maps").

The class consensus was that Bush was analogizing to "the Internet" and "the (World Wide) Web", without realizing that Google doesn't take a determiner.

[Added later in the day: I've removed a brief attempt at an explanation for the arthrousness in "the Internet" and "the Web", since all that's important here is that they ARE arthrous.  Eric Christopherson asks about the history of arthrousness in "the Internet"; he notes that other computer networks and online services that were around in the early days of the Internet had/have anarthrous names: EFnet, DALnet, Fidonet, Usenet, AOL, CompuServe.  So he wonders if Internet once was anarthrous too.  An interesting question, which I hope someone will investigate.  And Alexis Grant reports another twist: "my mom used to say 'on the email', as in, 'I sent you something on the email'" (so treating email like phone or fax).]

[Added 11/13/07: Mail on "the Internet" is pouring in; I will eventually post a summary.  Meanwhile, David O'Callaghan has pointed me to an Onion article "Google Launches 'The Google' For Older Adults".]

The main point I'd been making in class was that we need to distinguish several senses of DEFINITE, that is, several distinct properties that expressions can have.  In particular, we need to distinguish (as I did in the first of my recent postings) between NPs that are semantically (or pragmatically) definite (conveying uniqueness or givenness or both) and those that have the definite article in them.  There's clearly a connection -- it's not an accident that there's a custom of using the label definite in both cases -- but they are not coextensive.

First, there are NPs in English that are semantically definite, but don't have the as their determiner.  Several kinds of them: personal names, like Arnold Zwicky; other proper names, like, yes, Google, and Lake Worth, MIT, NATO, and several other types; NPs with possessive determiners, like Mary's father, and with demonstrative determiners, like this dog; and a number of others.

Second, there are English NPs with the as their determiner that aren't semantically definite.  Again, there are several types; here are two in which the NP is understood as referring to a type rather than an individual:

(1) the hospital in "She's in the hospital".  This is the American version; the British version uses a bare (anarthrous) NP -- "She's in hospital" -- to convey something like 'She's been hospitalized'.  The facts about these type-denoting location nouns, of both the arthrous and the anarthrous varieties, are quite complex, but the point here is only that there are some arthrous examples.

(2) the bus in "I came on the bus" 'I came by bus'.  Note the anarthrous variant with by rather than on.  Again, the facts about these conveyance-denoting nouns, of both the arthrous and anarthrous varieties, are quite complex, but the point here is only that there are some arthrous examples.

Further wrinkle: there is some question about whether there is a specifically syntactic (rather than semantic) property of "definiteness" for NPs in English (and indeed in many other languages).  At issue is whether there is a class of NPs in English that plays some role in generalizations about the syntax of the language and is not simply identical to the class of semantically definite NPs.  But in any case, we do need to distinguish NPs with a semantic property I'll label D and NPs with the syntactic property I'll label ArtDef; ArtDef NPs are those with the as their determiner.  Most, but not all, ArtDef NPs are D NPs, and ArtDef NPs are in some sense the canonical examples of D NPs.

I've chosen somewhat arbitrary labels for these classes of expressions to discourage people from reasoning about them on the basis of the meanings and uses of the English words definite and definiteness; labels are not definitions.  It's sometimes been suggested to me that anarthrousness in proper names can be predicted, at least in part, from the extent to which the referent is a definite, or defined, entity, in the sense that it has clear boundaries.  I'll take up this idea in a later posting, but for the moment I'll just note that connecting ArtDef to this sense of definite looks like reasoning from the customary label.

Posted by Arnold Zwicky at 02:31 PM

Ask Language Log: is "regime" a loaded word?

Joseph Kynaston Reeves wrote:

If you have the time and inclination, I’d be interested to hear your opinion on the word “regime”.  To me, the word, applied to a government, implies a lack of freedom and is implicitly critical of that government.  Going simply by intuition, I’d say that “the Stalinist regime” sounds right while “the Chirac regime” sounds faintly ridiculous.  But I have been informed that the word is “officially” neutral.

If you want to waste a huge amount of time reading the spiteful arguments that led to this query, they’re here. It’s a typically annoying blog-type argument, so I strongly advise you not to read the whole thing.

Joseph's intuition certainly corresponds to the facts: the word regime, used with respect to a specific government, seems in practice to carry a strong sense of disapproval, usually implying a lack of democracy.

The most recent ten examples from a search on the New York Times' web site:

"the Iranian regime", "Saakashvili's regime", "the regime of Saddam Hussein", "the Iranian regime", "the Pyongyang regime", "this regime" [referring to Myanmar], "the Deby regime" [Chad], "the regime in Tehran", "his regime" [referring to Musharraf], "the communist regime", "another military regime in Guatemala" [referring to a hypothetical situation].

All of these are clearly intended as negative characterizations: thus the phrase about the current Georgian government occurred in a quotation, "Saakashvili's regime showed us that it is in no way different from the communist regime whose soldiers beat their citizens with shovels in the same place".

Searching the Washington Post, I get

Musharraf and his regime, the Iraqi regime [= Saddam Hussein], The shah ignored America's admonitions to clean up his undemocratic regime, a regime representative [Myanmar], the Iranian regime, naked political prisoners were tossed from planes to their deaths in the waters under the military regime [in Argentina], condemning the Syrian regime, a heroically observed story of a woman's abortion during the final years of the Ceausescu regime, Irate lawmakers accused them of collaborating with an oppressive communist regime, [etc.]

This is not just a fact about American usage -- a similar search on the BBC News web site (which seems to be ordered by relevance, or perhaps randomly, rather than by date) turns up these ten headline fragments at the top of the list:

Torture Victims During Pinochet Regime, The Tsar's regime till 1914, Burma regime 'frees 70 detainees', Cambodia's brutal Khmer Rouge regime, Slovak bishop praises Nazi regime, the regime of Robert Mugabe, Saddam Hussein's regime, Swazi lawyers sue king's regime, Horrors of the Abacha regime, Machel death 'linked' to apartheid regime

Similar searches on the sites of a number of other U.S. and U.K. publications have similar results. I won't bore you with the entire list of examples, but only one example of regime applied to a contemporary European government by a mainstream publication turned up -- in an editorial on the Telegraph's web site criticizing the new prime minister's first Queen's Speech ("Fiddling, not vision", 11/7/2007):

Many of the measures were, in fact, resumptions of business as usual under the Blair-Brown regime of the past decade: yet more legislative mechanisms in which the intrusive hand of the bureaucratic state meddles in areas that are inappropriate or faintly absurd.

Even if we search explicitly for such uses, the results underline the point. A search on Google News for "Merkel regime" comes up empty, though "Merkel government" has 52 hits; and a search for "Prodi regime" turns up just one example, in the communist paper Worker's World ("1 million-plus take to streets", 10/24/2007): "The strong Oct. 20 action shows that a large section of the Italian working class refuses to accept this collaboration with the 'lesser-evil' Prodi regime."

The OED takes note of this negative connotation. Its first sense for regime is the (here irrelevant) one that means "regimen", i.e.

This is not a diet to enter upon without medical prescription... To embark on this régime without due regard to the consequences may delay diagnosis of other disorders.

The second sense is the one in question:

2. a. A manner, method, or system of rule or government; a system or institution having widespread influence or prevalence. Now freq. applied disparagingly to a particular government or administration.

The earliest OED citation that seems to carry the disparaging connotation is this one:

1955 Times 2 May 8/3 But none of us is prepared, either, to bolster up the aging régime of Chiang Kai-shek. Ibid. 11/5 Only King Saud and the régime in the Yemen (which recently survived in undiminished medieval splendour an abortive coup d'État) remain patently faithful to Egypt.

There are other meanings of regime that are evaluatively neutral -- e.g. "regime theory" in political science, where regime refers to a set of "principles, norms, rules, and decision making procedures" that apply in international relations. But anyone who claims that regime is a neutral term for "government" in contemporary usage is either dishonest or tone deaf.

[Benjamin Zimmer sent in a famous example from the pop culture of the 1970s, applied to a recent Western European government but also supporting the view that regime is a negatively-evaluated word:

God save the Queen
The fascist regime
Made you a moron
A potential H-bomb.
God save the Queen
She ain't no human being
There ain't no future
In England's dream.
-- The Sex Pistols, "God Save the Queen" (May 1977)]


Posted by Mark Liberman at 06:45 AM

November 07, 2007

Homographic homophonic autantonymic chants and signs

The current crisis in Pakistan has brought us a class of chants and signs that can mean two opposite things. Thus Stephen Graham, "Pakistan Police to Stop Bhutto Rally", AP 11/6/2007:

In Islamabad, about 200 attorneys held a rally inside the district court, chanting "Go Musharraf, Go" and "No Musharraf, No."

Or "'Go Musharraf Go!'", News24 11/6/2007:

In the biggest gathering, about 2 000 lawyers congregated at the High Court in the eastern city of Lahore. As lawyers tried to exit onto a main road to stage a rally, in defiance of a police warnings not to violate a ban on demonstrations, hundreds of officers stormed inside.

Police fired swung batons and fired tear gas shells to disperse the lawyers, who responded by throwing stones and beating police with tree branches. The protesters shouted, "Go Musharraf Go!"

It's clear from context that this has the opposite meaning -- at least emotionally -- from "Go, Dog. Go!", or "Go Speed Racer, Go!", or "You go, girl!".

As a phrase that can mean two more-or-less opposite things, this is much better than the examples with dust and overlook.

There must be some other political slogans that can mean the opposite of what their users intend, but none come to mind just now. I can think of one advertising slogan with this property: the old Olympia beer slogan "It's the water".

[Hat tip: Andy Hollandbeck.]

Posted by Mark Liberman at 02:14 PM

The perils of mixing romance with language learning

A few days ago, Michael Chen drew my attention to a piece by Matthew Rusling in the Christian Science Monitor, "I Sound Like What In Japanese?" (9/17/2007):

Like many Western men who spend more than a year in Japan, I learned most of my intonation, expressions, and slang – the things not taught in the classroom – by mimicking a Japanese girlfriend.

I thought my Japanese was fine, while in reality the effeminate, almost childish twang I had been learning made me sound very much like a 20-something, pink miniskirted Japanese woman.

Although I know very little about Japanese, one of the features that Rusling cites came up about a dozen years ago in an introductory phonetics course -- but in reverse, and given a different interpretation. He writes that

... women tend to ... [elongate] their word endings in an almost coquettish attempt to flatter the listener.

In the course I was teaching, in the unit on phrase-final lengthening, we did an exercise in which each student recorded a list of 100 sequences of digits, like phone numbers or social-security numbers. The lists were arranged so that each digit occurs ten times in each position in the list, with each digit-pair occurring once spanning each pair of positions in the sequence. We then graphed the average duration of each digit in each position. This shows a sort of durational profile representing the local modulation of speaking-rate: the shape of a spoken phrase.
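For readers who want to try something similar, here is a minimal sketch of one way to build lists with the balance properties just described, and to compute the per-position averages. The construction (digit at position i is (a + i*b) mod 10, for each of the 100 pairs a, b) is my own illustrative choice and not necessarily how the course lists were built; the seven-digit length, the function names, and the use of milliseconds for durations are likewise assumptions.

```python
from collections import Counter
from statistics import mean


def balanced_digit_lists(length=7):
    """Return 100 digit sequences, one for each pair (a, b) of digits.

    Sequence (a, b) has digit (a + i*b) % 10 at position i. Across the
    100 sequences, each digit then occurs ten times in each position, and
    each ordered digit pair occurs once at each pair of adjacent positions.
    This is an assumed construction, chosen only because it is easy to check.
    """
    return [[(a + i * b) % 10 for i in range(length)]
            for a in range(10) for b in range(10)]


def is_balanced(seqs):
    """Check the two balance properties described in the text."""
    length = len(seqs[0])
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        if len(counts) != 10 or set(counts.values()) != {10}:
            return False
    for i in range(length - 1):
        pairs = Counter((s[i], s[i + 1]) for s in seqs)
        if len(pairs) != 100 or set(pairs.values()) != {1}:
            return False
    return True


def mean_duration_by_position(durations):
    """Average the measured durations (e.g. in ms) position by position.

    `durations` holds one list per recorded sequence, all the same length.
    """
    return [mean(column) for column in zip(*durations)]
```

Plotting the output of `mean_duration_by_position` against position gives the kind of durational profile described above; with real recordings, phrase-final lengthening would show up as a rise at the last position.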

Each student used his or her own native language, and so in addition to English, we had Chinese, Spanish, and a number of other languages including Japanese. We could then compare across languages -- and across phonetic categories and syllable types -- how much modulation of duration was produced. In that year's data, as I recall, Chinese had the most contextual modulation of duration, and Japanese the least -- by far. This was especially striking because some of the digits are essentially the same, the Japanese digit-names having been borrowed from Chinese.

When we discussed this in class, the Japanese participant -- a young woman -- explained that this might not mean that Japanese had less final lengthening in general. She said that she perceived elongation of final syllables as a marker of young women's speech in informal settings, and that she had therefore consciously minimized this feature in her own reading of the digit sequences. But she didn't present the elongation feature as "coquettish", i.e. flirtatious: the example that she used was two high-school girls talking together. (This anecdote is in itself no more reliable than Rusling's offhand characterization. I haven't been able to find any sociolinguistic studies of this phenomenon -- if you know more about it, please tell me.)

Rusling cites a number of other female-associated features:

Because the Japanese tend to avoid any form of confrontation, my girlfriend would never correct me. That is, until one day in an ice-cream shop when she couldn't take it anymore. She snapped, "Don't say it that way – you sound like a girl!" referring to my choice of words to describe the ice cream we were sharing.

I didn't mind being corrected on my pronunciation. But I was disappointed to learn that for the past 2-1/2 years, I had not been speaking good Japanese.

Suddenly, she fired off a list of the mistakes I had apparently made umpteen times. She said her friends had often snickered when I referred to myself in the third person, as many Japanese women and girls do, and when they heard me end sentences with the particle "wa," which is usually used by women to soften the tone of a sentence. Most of all, she said, I needed to take the pitch of my voice down several notches from the tone I had learned.

Final particles aside, it's a little surprising that Rusling never realized that women have higher-pitched voices than men do, and that exaggerating the difference is a way of explicitly drawing attention to sexual identity -- this general pattern is hardly unique to Japanese.

As for the choice of final particles and so on, it's also surprising that Rusling didn't notice the sections in textbooks that comment on the stereotypical gender associations. It's a matter of some controversy how these differences should be portrayed -- thus Meryl Siegal and Shigeko Okamoto, "Toward Reconceptualizing the Teaching and Learning of Gendered Speech Styles in Japanese as a Foreign Language". Japanese Language and Literature, 37(1): 49-66, 2003:

... the actual speech of Japanese men and women often diverges from these gender "norms." For example, many women, including younger women and speakers of regional dialects, do not use many of the "female" forms given in these textbooks ... Furthermore, they widely use many of the "male" forms ...

Likewise, men also use many of the female forms, such as no, na no, and plain forms of verbs and adjectives, while the uses of such male forms as kui, dui, zo, and ze are situationally restricted

I guess it's possible that Rusling was exposed only to politically-corrected textbooks that pretend that the stereotypical gender differences don't exist, but my impression is that there are no texts of this kind. There don't seem to have been any in 2003: Siegal and Okamoto surveyed "seven popular textbooks as representative of the texts used in Japanese language classrooms in the United States", and found that these texts "portray stereotypical images of Japanese men and women" and "emphasize the gender differences in speech patterns, referring to male/female differential uses of a number of linguistic features, such as sentence-final particles, honorifics, and self-reference and address terminology".

Things have become more complicated recently, I think, because some of the traditionally gendered features of Japanese speech are dying out or coming to be used both by men and women, while other new features associated with particular groups -- especially young women -- are spreading.

It's easy to see how the situation that Rusling describes can arise -- the son of a Japanese colleague who was raised in the U.S. once told me that when he visited Japan, he was told that he spoke "lady Japanese", because he had learned Japanese mostly from talking with his mother. Still, given that Rusling learned his basic Japanese not from his girlfriend but in language classes, I suspect that the gender-associations of some of the features that he cites should not have come as such a surprise to him.

For more, see Cindi Sturtz Sreetharan, "Students, sarariiman and seniors: Japanese men's use of 'manly' speech registers", Language in Society 33:81-107; and Shizuka Lauwereyns, "Hedges in Japanese conversation: The influence of age, sex, and formality", Language Variation and Change 12:239-259 (2002).

[Update -- commenting on the pitch range issue, Randy Alexander (AKA LanDi Liu) writes:

Come on, that's a little below the belt! I'm 100% sure that he's factoring in a male-female difference of about an octave. His girlfriend meant that he was talking too high in his range. I think if you listen to a few Japanese people having a conversation in Japanese, you might notice that the women talk higher in their range than the men do. But I don't get the impression that people do that (exaggerating the difference in voice range between sexes -- men speaking lower in their range and women speaking higher) in most languages, certainly not in English or Chinese. I think it's a notable feature of Japanese; one that men especially need to be careful about when studying it as a foreign language.

This may be more common in Japanese culture than elsewhere, but there are certainly contexts in American life where sex differences in pitch range are exaggerated for effect. And I have the impression (unsupported by any systematic measurements) that British women of a certain class use an unexpectedly high pitch range, though (like many of my other opinions about British linguistic culture) this may be the result of excessive exposure to Monty Python skits.

Anyhow, it does seem naive to me for someone spending a year or more in Japan to remain unaware of the gender associations of pitching one's voice higher or lower, since (whatever the ethnographic details) the basic association is an entirely natural one.]

[Richard Gabbert also comes to Rusling's defense:

A friend sent me a link today to your blog entry on gender stereotyping in Japanese, and I must say that I sympathize completely with Rusling. I lived and worked in Japan for most of the nineties and spent much of my time in an all-Japanese environment. I speak and read the language well, but I have constantly struggled with the very issues Rusling describes in his article. Most of my colleagues were women, and my Japanese teacher was female, and my speech patterns inevitably mimicked those of the people with whom I spent most of my day.

Your criticism of Rusling--if I understand it correctly--seems rather naïve. I mean, it's one thing to have a textbook understanding of language norms; it's quite another to (1) be able to conform to those in daily speech and (2) recognize how one's deviations from those norms affects native speakers' perceptions of one's character, personality, and intelligence. I never used textbooks with any regularity, but I was well aware of the gender-specific conventions of speech that still prevail--even if less so than in the past--in day-to-day conversation. That awareness, however, did not (and still does not) make conforming my speech to the patterns expected of male speakers any easier; with sufficient effort, I can avoid stereotypically female usages during a business meeting, but it's incredibly difficult over a sustained period of time. In fact, one of the reasons I finally gave up speaking to my (Japanese) wife in Japanese was that she found my speech mannerisms too effeminate and eventually--after a couple of years of marriage--told me that those mannerisms lessened her respect for me. Only toward the end of my eight years there did I finally achieve the level of competence in the language--and the familiarity with its idioms and nuances--that allowed me to appreciate just how jarring and childish my own speech often sounds.

In sum, my experience of learning a foreign language is that we model the language of those we interact with on a daily basis; indeed, assuming that one's learnings from a language textbook can trump one's daily experiences with the language reflects a misunderstanding of how most people learn language. (My grandmother, if asked, would readily acknowledge that the third person past plural form of "to be" is "were," but that does not prevent her from using the form that apparently predominated in her childhood home: "They was . . . ") One can be beat over the head with the warning that certain expressions will make one sound unintelligent or inappropriately effeminate (or masculine), but until one begins to hear the language with the ears of a native speaker, that reality seldom hits home (particularly where the people one speaks with on a daily basis all use those forms of speech).

I don't find anything in this to disagree with. My observation about Rusling was not that he had trouble sorting out the gender associations of various phonetic, phonological and morphological features of Japanese -- these are clearly complex and situation-dependent. Nor was I surprised that he tended to accommodate to the speech patterns of the women he mostly spoke with, especially his girlfriend. What surprised me was the implication that the very existence of gender-associated speech patterns in Japanese was an unexpected discovery for him. But perhaps that's just a consequence of the newspaper-article format -- and it might even have come as a result of changes made by an editor at CSM.]

[Karen Kay writes:

This was something I read about when I was a grad student at Yale. Haskins had all the issues of the... shoot, it's been 20 years. Journal of Logopedics and Phoniatrics. Something like that. It's a Japanese phonetics journal, or was. One of the studies in there indicated that Japanese women's voices are higher than American women's voices, and Japanese men's voices are lower than American men's voices. AFAIK, it's not actually a matter of different structure, it's cultural. My voice is higher when I speak Japanese. When John Wayne's voice is dubbed into Japanese, he sounds like Barry White instead of John Wayne because that's his cultural image. This is one of the things I taught when I taught Japanese language, pitching your voice higher or lower.

I think that Karen may be referring to the Annual Bulletin of the Research Institute of Logopedics and Phoniatrics, at the University of Tokyo, but a quick scan of the titles over the years through 1997 doesn't turn up any work on sex differences in pitch. However, checking Google Scholar for {Japanese male female pitch} does turn up Leo Loveday, "Pitch, Politeness and Sexual Role: An Exploratory Investigation into the Pitch Correlates of English and Japanese Politeness Formulae", Language and Speech, 24(1): 71-89 (1981).]

[Laura Ahearn writes:

While it is not exactly the kind of work you were looking for, one book-length work that immediately popped into my mind as I read your post on "The Perils of Mixing Romance with Language Learning" was the ethnography by Miyako Inoue, a linguistic anthropologist (note: she is a linguistic anthropologist, *not* a sociolinguist) at Stanford. Inoue's book, entitled, Vicarious Language: Gender and Linguistic Modernity in Japan (Univ. of California Press, 2006), analyzes the contemporary and historical linguistic practices and language ideologies surrounding "women's language" in Japan. Here is what the blurb says about her book:

"This highly original study provides an entirely new critical perspective on the central importance of ideas about language in the reproduction of gender, class, and race divisions in modern Japan. Focusing on a phenomenon commonly called "women's language," in modern Japanese society, Miyako Inoue considers the history and social effects of this language form. Drawing on ethnographic fieldwork in a contemporary Tokyo corporation to study the everyday linguistic experience of white-collar female office workers and on historical research from the late nineteenth century to 1930, she calls into question the claim that "women's language" is a Japanese cultural tradition of ancient origin and offers a critical genealogy showing the extent to which this language form is, in fact, a cultural construct linked with Japan's national and capitalist modernity. Her theoretically sophisticated, empirically grounded, interdisciplinary work brilliantly illuminates the relationship between culture and language, the nature of power and subject formation in modernity, and how the complex nexus of gender, language, and political economy are experienced in everyday life."

This book might not help male language learners seeking to speak Japanese in a "gender-appropriate" way, but it does address the broader social, cultural, and historical issues influencing ideas about language and gender in Japan.

I see that the same author has written "Gender, Language, and Modernity: Toward an Effective History of Japanese Women's Language", American Ethnologist 29(2): 392-422 (May 2002).]

[A reader, self-described as bilingual in Japanese and English, writes:

The description you have from the young Japanese participant is accurate, and I would expect a young female to sound "neutral" in such a controlled situation as you described with the survey.

"Flirty" is definitely an oversimplification: Female patterns of speech are strongest in the most casual and familiar situations, just like you would expect from any young female English speaker. There are other related "ways of speaking" for every female character description in Japanese, but to me it has always felt equal to but slightly exaggerated and more systematic than the same trends in English and other languages.

In a very casual situation (i.e. talking to close friends, brothers and sisters, school mates, generally people on the "inside" of their circle that they feel comfortable talking casually with) girls will extend the final vowels on a number of words with grammatical functions (things like "but", "because", "well", "anyways"), as well as tense marking particles and sentence final particles.

There is a stereotype in American English of girlish emphasis on various interjections and intensifiers, but I'm not familiar with any English female-associated pattern of exaggerated final lengthening. But maybe the difference is that the functional particles in Japanese tend to come word- or phrase-finally. I don't know of any studies of this phenomenon -- if you do, please tell me.]

[Ray Girvan draws our attention to this humorous account of the travails of American students trying to learn Japanese, which includes this observation:

Politeness depends on many things, such as age of the speaker, age of the person being spoken to, time of day, zodiac sign, blood type, sex, whether they are Grass or Rock Pokemon type, color of pants, and so on. For an example of Politeness Levels in action, see the example below.

Japanese Teacher: Good morning, Harry.
Harry: Good Morning.
Japanese Classmates: (gasps of horror and shock)

The bottom line is that Politeness Levels are completely beyond your understanding, so don't even try. Just resign yourself to talking like a little girl for the rest of your life and hope to God that no one beats you up.]


Posted by Mark Liberman at 07:40 AM

Dilbert explains the maxim of manner

Posted by Mark Liberman at 07:37 AM

A poem

Just ran across this poem in the excellent and accessible '180' collection.
The Grammar Lesson

Steve Kowit

A noun's a thing. A verb's the thing it does.
An adjective is what describes the noun.
In "The can of beets is filled with purple fuzz"

of and with are prepositions. The's
an article, a can's a noun,
a noun's a thing. A verb's the thing it does.

A can can roll - or not. What isn't was
or might be, might meaning not yet known.
"Our can of beets is filled with purple fuzz"

is present tense. While words like our and us
are pronouns - i.e. it is moldy, they are icky brown.
A noun's a thing; a verb's the thing it does.

Is is a helping verb. It helps because
filled isn't a full verb. Can's what our owns
in "Our can of beets is filled with purple fuzz."

See? There's almost nothing to it. Just
memorize these rules...or write them down!
A noun's a thing, a verb's the thing it does.
The can of beets is filled with purple fuzz.

Posted by Heidi Harley at 01:18 AM

November 06, 2007

More risky RNR

Ben Zimmer ran across this puzzling coordination on the website for the University of Chicago's Workshop for the Anthropology of Latin America and the Caribbean, at the very beginning of an announcement for a workshop (on responding to the movie Apocalypto) this Wednesday:

Do, and if so, how should academics engage with popular culture?

This would appear to be a reduced coordination, of the sort known in the syntax literature as Right Node Raising (RNR).  The unreduced version would be:

Do academics engage with popular culture, and if so, how should they engage with popular culture?

RNR is versatile, but not versatile enough to treat a yes-no question and a WH question as equivalent structures.

RNR (last discussed here back in August) allows for coordinations of  the form

[ X Z ] and/or [ Y Z ]

(where Z is a constituent, but X and Y are not both constituents) to have the right constituent Z "factored out", giving

[ [ X ___ ] and/or [ Y ___ ] ]  Z

So, for the coordination of VPs

[ bring liquids to this machine ] or [ place liquids on this machine ]

we get the RNR version

[ [ bring liquids to ___ ] or [ place liquids on ___ ] ]  [ this machine ]

(where bring liquids to and place liquids on are not constituents), which is the VP in

Do not bring liquids to, or place liquids on, this machine.

This is an acceptable reduced coordination.  (It's the acceptable version of the damaged RNR "Do not bring to, or place liquids on, this machine" in the August posting.)
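The factoring step itself can be sketched mechanically. What follows is a minimal illustration, assuming the conjuncts are given as plain lists of words; the function names are hypothetical, and the code deliberately ignores the crucial syntactic condition (that the shared material Z be a constituent), so it shows only the string side of RNR, not a grammar.

```python
def factor_right(x_z, y_z):
    """Split two word lists into (X, Y, Z), where Z is their longest shared suffix."""
    n = 0
    while n < min(len(x_z), len(y_z)) and x_z[-1 - n] == y_z[-1 - n]:
        n += 1
    cut_x, cut_y = len(x_z) - n, len(y_z) - n
    return x_z[:cut_x], y_z[:cut_y], x_z[cut_x:]


def rnr(x_z, y_z, conj="or"):
    """Build the reduced coordination "X, conj Y, Z" from the two full conjuncts."""
    x, y, z = factor_right(x_z, y_z)
    return f"{' '.join(x)}, {conj} {' '.join(y)}, {' '.join(z)}"


reduced = rnr("bring liquids to this machine".split(),
              "place liquids on this machine".split())
```

Here `reduced` is the VP of the warning-label sentence. Because the sketch only compares strings, it would just as happily "factor" material that is not a constituent, and checking constituency requires a structural analysis of the conjuncts.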

On to academics engaging with popular culture.  The unreduced version seems to have the form

[ do academics engage with popular culture ] and [ how should academics engage with popular culture ]

(omitting the if so, which is important for the meaning but not crucial for the structure).  So X is the inverted auxiliary do; Y is the fronted WH word how in combination with the inverted auxiliary should (a combination that is certainly not a syntactic constituent); and Z is apparently a base-form clause, academics engage with popular culture.  If so, then the badness of the RNR version

[ [ do ___ ] and [ how should ___ ] ]  [ academics engage with popular culture ]

is mysterious.

But there might be an explanation.  The English Subject-Auxiliary Inversion (SAI) construction has been the focus of quite a lot of literature over many years, and one of the issues surrounding it is the question of what the structure of inverted clauses is.  There are three possibilities (assuming that constituent structures are continuous):

(1) two-part: Aux + base-form clause (consisting of Subj + complement of Aux)
  i.e., [ do ]  [ [ academics ] [ engage with popular culture ] ]

(2) two-part: Aux and Subj together in a constituent + complement of Aux
  i.e., [ do academics ]  [ engage with popular culture ]

(3) three-part: Aux + Subj + complement of Aux
  i.e., [ do ]  [ academics ]  [ engage with popular culture ]

Theoretical considerations have figured very prominently in the literature on the structure of SAI clauses.  Within some theoretical frameworks, only structure (1) -- the one I assumed at first in my discussion of the popular culture example -- is possible.  In other frameworks, what the correct structure is is an empirical question, and many writers have opted for structure (3) (which is, in fact, my choice).  The evidence in favor of structure (2) -- with Aux + Subj as a constituent of some novel type -- isn't zero, but it's not very compelling, so the matter pretty much comes down to a choice between (1) and (3).

And now RNR is relevant.  With structure (1), we have no obvious explanation for the problem in our original example.  But with structure (3), we have an immediate explanation: the "factored" material, academics engage with popular culture, does not constitute a single syntactic constituent; instead, it's just two constituents in sequence, and the conditions for RNR are not satisfied.
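The contrast between structures (1) and (3) can be made concrete with a toy constituency check. This is only a sketch under simple assumptions: trees are nested Python tuples, a string leaf counts as one (unanalyzed) constituent, and the names `s1`, `s3`, and `is_constituent` are mine.

```python
def words(tree):
    """Flatten a tree (nested tuples with string leaves) into its list of words."""
    if isinstance(tree, str):
        return tree.split()
    return [w for subtree in tree for w in words(subtree)]


def constituents(tree):
    """Yield the word sequence covered by each node of the tree."""
    yield words(tree)
    if not isinstance(tree, str):
        for subtree in tree:
            yield from constituents(subtree)


def is_constituent(tree, phrase):
    """True if some node of the tree covers exactly the words of `phrase`."""
    return phrase.split() in list(constituents(tree))


# Structure (1): Aux + base-form clause
s1 = ("do", ("academics", "engage with popular culture"))
# Structure (3): Aux + Subj + complement of Aux, as three sisters
s3 = ("do", "academics", "engage with popular culture")

factored = "academics engage with popular culture"
```

With these toy trees, `is_constituent(s1, factored)` is true and `is_constituent(s3, factored)` is false, which is the asymmetry the argument turns on: only under structure (1) would the factored material satisfy the constituent condition on RNR.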

Some other, invented, examples that seem to me as bad as the original:

Will, and if so, when will, you finish the project?

How do, in fact, do, you work long hours?

(It hadn't occurred to me when I started writing up this one odd example that it might bear on the analysis of SAI.  This is an example of how "little" Language Log postings can turn into something more substantial.)

Posted by Arnold Zwicky at 03:39 PM

November 05, 2007

Of mice and men and women: news from the fifth estate

This is Wikipedia quoting Jeffrey Archer quoting Edmund Burke:

"In May 1789, Louis XVI summoned to Versailles a full meeting of the 'Estates General'. The First Estate consisted of three hundred clergy. The Second Estate, three hundred nobles. The Third Estate, six hundred commoners. Some years later, after the French Revolution, Edmund Burke, looking up at the Press Gallery of the House of Commons, said, 'Yonder sits the Fourth Estate, and they are more important than them all.'"

The clergy and the nobility have been replaced, I suppose, by the scientists and the politicians. Or perhaps the religious right and the corporate lobbyists. Or some other groups -- social categories are not so easy to differentiate and enumerate these days.

But there's one group whose influence is hardly ever discussed, although they may have more cultural and political impact than the scientists and the press combined. I mean, of course, the consultants.

I was reminded of this a couple of weeks ago, when Josh Jensen sent me a link to an article about Deloitte & Touche's use of the TrendSight Group to teach its employees how to "approach women clients differently from men" (Erin White, "Deloitte Tries a Different Sales Pitch for Women", WSJ 10/8/2007).

Body language differs by gender. Men tend to stare as they listen and nod to signify they understand. Women may nod when they don't yet understand to encourage the speaker to keep talking. And while consultants often seat themselves beside a male client as their "right hand man," women are more comfortable seated face to face.

Since Deloitte is itself in the consulting business, this makes TrendSight into meta-consultants, in effect teaching theology in the seminaries. And press coverage also helps to spread the gospel. Thus the WSJ article was picked up by the influential radio evangelist Albert Mohler:

... every husband to whom I have shown this had what amounts to a communication Great Discovery. ...

Ahhh . . . so when she nods it means something different? This is useful.

TrendSight's web site claims that their GenderTrends™ product is scientifically based:

It's no secret that men and women have different communication and decision making styles as well as different priorities and preferences. The TrendSight Group bases their strategic insights on leveraging these differences. Our research methodology and proprietary marketing models are based on science, not stereotypes. We combine gender expertise and multi-industry experience to help companies build sales and share and improve recruiting and retention effectiveness with women.

Since I'm interested in human communication, I was struck by the claim about sex differences in the use of head nods. Yesterday, I happened to be talking with Jens Allwood, who has put a lot of effort into looking across cultures at things like this. He didn't recall any relevant research on sex differences in nod function, and he expressed some skepticism that he would find a large male/female difference in head-nod function if he looked for it, in the many video recordings of conversations that he and his colleagues have coded for exactly these sorts of gestures.

I hope that Jens will look into this. As I pointed out to him, pseudo-science is beneficial to scientists in a certain way, because negative results that would not otherwise have much interest (say, that men and women use about the same number of words per day, on average) can become worthy of publication. (If you know of any research on sex differences in the function of head-nodding in contemporary American culture, please tell me. Of course, there are well-established cultural differences in the signals that manage or comment on interpersonal interactions, and no doubt head nods are subject to such differences as well. The question at hand is whether in contemporary American business settings, men tend to use head nods to mean "I understand, go on" while women tend to use them to mean simply "go on".)

I didn't find any references to empirical studies of sex differences in communication on the TrendSight web site, but I did find a "speaking demo" featuring the company's leader, Marti Barletta. I thought that this presentation was an especially nice example of the rhetoric of an increasingly common kind of contemporary gender-difference ideology, so I've transcribed a characteristic passage:

When we're talking in a personal setting or in a social setting, everybody'll pretty much give you men and women are- Are men and women different? Oh you betcha, of course, they're very different.
But when you talk in a business setting,
sometimes people are a little bit
careful about it, you know, there's been a lot of energy
over the last twenty or thirty years --
including by women like myself --
over the proposition that men and women are NOT different.
And then they did this human genome- or they did these genome sequencing projects,
and they found out that actually mice and human beings share
ninety five percent identical D.N.A.
Ninety five percent identical!
And I'm looking at this two different organisms, and I'm thinking to myself, well OK, if I've got to work with this, they've both got two eyes, and two ears, and
heart-lung system, breath- carbon-breathing organisms.
I guess I can see they do have a lot more in common than I thought.
But you're telling me
that the difference in the size, and the fur, and the ears, the- the brain power,
all of that is due to five percent difference in D.N.A.?
And I think it might be kind of like that with men and women as well.
I think men and women may actually be ninety five percent identical.
But boy that last five percent makes a big difference.
And the point of this whole thing is,
we're different.
Men do things one way,
women do things a different way,
it doesn't mean one way is better than another way.

Fair enough, right?

But when we pay someone to give us "scientifically-based" insight into what those differences are -- and the money that Deloitte pays TrendSight comes from Deloitte's customers, and thus indirectly from all of us to one extent or another -- it would be nice if they had something to back up their assertions besides analogies to the differences in DNA between humans and other "carbon-breathing organisms" such as mice.

[Update -- Stalina Villareal has pointed me to Marie Helweg-Larsen et al., "To nod or not to nod: an observational study of nonverbal communication and status in female and male college students", Psychology of Women Quarterly, 28(4): 358-361, 2005. The abstract:

Gender studies show that women and men communicate using different styles, but may use either gender style if there are situational status differences. Considering the universal gesture of head nodding as a submissive form of expression, this study investigated head nodding by observing female and male college students in positions of subordinate and equal status. We observed head nodding (N= 452) in classroom interactions between professor-student and student-student dyads. Overall, women nodded more than men and students nodded more to professors speaking than peers speaking. In addition, female and male students nodded equally to professors speaking, but men nodded less to peers speaking than did women. Thus, both men and women attended to the status and not the gender of the speaker. Future research using varying contexts should further examine the effects of dominance, context, and gender.

One striking effect not mentioned in the abstract (perhaps because it did not accord with the authors' preconceptions) was that male students nodded more than twice as often to female peers as they did to male peers (in 19 of 57 opportunities, or 18%, compared to 2 of 30 opportunities, or 7%).

This study didn't try to determine the function of individual nods -- but the fact that male and female students nodded about equally often while professors were speaking makes it seem unlikely that the males generally meant "I understand, go on" while the females generally meant only "go on". (For example, male students nodded 57% of the time while male professors were speaking, while female students nodded 61% of the time.) If the TrendSight theory about the gendered function of nods were true (that men nod "to signify that they understand" while women "nod when they don't yet understand to encourage the speaker to keep talking"), this would imply that the male students understood what was being said much more quickly than the female students did. This seems very unlikely to have been true. ]

[Update #2 -- In Barletta's book Marketing to Women: How to Understand, Reach, and Increase Your Share of the World's Largest Market Segment, which is kind enough to let me search online, she tells a somewhat different story about the gender difference in head-nod meaning (p. 186):

For men, head nods mean agreement: the listener agrees with the speaker. For women, though, head nods are how they encourage participation; they mean "go on" rather than "yes, I buy what you're saying".

There are at least three quite distinct messages under discussion here: (1) "I'm interested, please go on"; (2) "I understand"; and (3) "I agree".

In the WSJ piece, Barletta is quoted as saying that women's head nods "may" mean (1) while men's nods "tend to" mean (2). In the book, she says (much more categorically) that men's nods mean (3) while women's mean (1). No references are cited to support this assertion.

It's worth noting that in this book, Barletta pledges allegiance to the new biologism (p. 21):

What makes a woman a woman? Is it "sugar and spice and everything nice" with some maturity thrown in for good measure? Actually, it's more like chromosomes, hormones, and brains. In reality, the deciding factors are far more related to proven evolutionary and biological factors than they are to fairy tales, myths, or stereotypes.

There's another dozen pages of pop biology after that. Here's a characteristic sample (p. 23), illustrating the carelessness with which such popularizers turn facts into factoids:

Headline: "Brainy sons owe intelligence to their mothers." It turns out that the primary genes for intelligence, all eight of them, reside on the X chromosome. Men get one X chromosome from their mothers, whereas women get two Xs, one from Mom and one from Dad. So, while women's intelligence is a composite of both parents' "smarts," men get all their intelligence from their mother.

There's a footnote: Gillian Turner, "Intelligence and the X chromosome," The Lancet 347, No. 9018 (20 June 1996), pp. 1814-15. Cited on, revised by Clifford Morris, 16 July 2000.

That page on Clifford Morris's web site seems to have been removed -- though it was once there, since he links to it. So let's look at the original article -- I've posted a copy here for purposes of criticism and discussion.

If you read it, you'll discover that Turner does not say that "the primary genes for intelligence, all eight of them, reside on the X chromosome". Rather, she says this:

In primary or non-specific X-linked mental retardation (XLMR) ... eight discrete localisations have emerged, which define the lowest limit of the number of genes involved. They extend over the short and long arm of the X chromosome. The genes themselves are not sequenced and their individual functions are unknown.

These are not "the primary genes for intelligence" -- rather, they are the regions affected by "non-specific X-linked mental retardation", which by definition must involve the X chromosome.

Turner does believe that this indicates a special role for the X chromosome in the genetics of "intelligence", but she cites the fact that

Morton's counterargument was that there were a calculated 325 recessively inherited genes associated with mental retardation. Therefore by calculating total DNA content of all the chromosomes the contribution of the X chromosome should be 17 genes.

There's further back-and-forth in the subsequent literature about the relative importance of the X chromosome in mental retardation -- thus Jamel Chelly et al., "Genetics and pathophysiology of mental retardation", European Journal of Human Genetics 14:701-713 2006.

Affecting 1-3% of the population and resulting from extraordinary heterogeneous environmental, chromosomal and monogenic causes, MR represents one of the most difficult challenges faced today by clinician and geneticists. Detailed analysis of the Online Mendelian Inheritance in Man database and literature searches revealed more than a thousand entries for MR, and more than 290 genes involved in clinical phenotypes or syndromes, metabolic or neurological disorders characterized by MR. We estimate that many more MR genes remain to be identified.

... For monogenic causes, genes have mainly been found on the X chromosome than on any other comparable segment of the autosomes. This is partially related to the greater ease in pointing out X-linked genetic disorders (including those characterized by MR) and in identifying the corresponding genes and mutations involved. ... However, recent molecular studies, in combination with clinical follow-up of a large cohort of patients, are suggesting that the proportion of monogenic XLMR in sporadic MR males would account at best for 8-10% of the genetic causes of MR.

So Barletta's factoid about "the primary genes for intelligence, all eight of them" is clearly a wild exaggeration of a misreading of a controversial claim -- all too typical of the "science" that is advanced to support this ideology.

In the case of her factoid about the gendered meaning of head nods, it remains to be seen whether it is grounded in some empirical research, or spun up out of a misreading of some research, or simply made up out of thin air. At the moment, my money would go on the third option.]

Posted by Mark Liberman at 10:12 AM

November 04, 2007

Sliver me timbers!

Yes, sliver.  Alison Purnell found this eggcorn in an episode of the television show Due South.  Not just the eggcorn, but a discussion about it, complete with a rationale for it.

From the episode "Mountie on the Bounty Pt.1", originally aired 3/15/98 (transcription by Purnell):

Ray: Frannie! Can you run some prints for me, check 'em against any known pirates?

Francesca: Pirates?!  What do you mean, like "pieces of eight" and "sliver me timbers"?

Ray: It's "shiver me timbers."

Francesca: It's "sliver."

Ray: Frannie!

Francesca: Ray! What can that mean, "shiver me timbers"? That doesn't mean anything.

Ray: Sure it does, like, "shake your booty," something like that.

Francesca: Ray.  Pirates.  They slide down masts.  Wooden masts.  "Sliver," y'get it?  Sliver in their timbers?  [in disgust as she walks away] "Shiver."

Ray: [to himself] I never got that.

There are a few dozen webhits for "sliver me timbers", but they all look like plays on words.  This one seems to have been invented just for the show.  And with a rationale that's like the ones we get for eggcorns in the wild.

Posted by Arnold Zwicky at 02:02 PM

Lexical repulsion

This is not word aversion. Nor is it word rage. In fact, no human emotions are involved, at least not in any obvious way.

The source is Antoinette Renouf and Jayeeta Banerjee, "Lexical Repulsion between sense-related pairs", International Journal of Corpus Linguistics 12(3): 415-443, 2007. From their abstract:

We have proposed that there is a hitherto unexplored textual feature, which we call 'repulsion', which operates on the construction of meaning in an opposing way to that of word collocation. ... We focus on 'lexical repulsion,' by which we mean the intuitively-observed tendency in conventional language use for certain pairs of words not to occur together, for no apparent reason other than convention.

For example, they suggest, merry tends to collocate with christmas, and happy with birthday; but also, merry actively resists combination with birthday. Thus they cite these counts from a corpus:

Word 1   Frequency   Word 2      Frequency   Collocates
merry    2326        christmas   90670       450
happy    8323        birthday    2416        526
merry    2326        birthday    2416        0
happy    8323        christmas   90670       299

Google counts show a similar pattern:

Word 1   Frequency   Word 2      Frequency   Collocates
merry    38.4M       christmas   288M        2.64M
happy    519M        birthday    185M        13.2M
merry    38.4M       birthday    185M        27.9K
happy    519M        christmas   288M        1.53M
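
One rough way to read these tables is to ask what share of each first word's occurrences are paired with each second word. The sketch below does that arithmetic for the corpus counts quoted above. It is only an illustration: the "pairing rate" measure and the names `rows`, `pairing_rate` and `rates` are my own assumptions, not anything taken from Renouf and Banerjee's paper, which may well use a different statistic.

```python
# Corpus counts copied from the first table above:
# (word 1, frequency of word 1, word 2, frequency of word 2, collocates)
rows = [
    ("merry", 2326, "christmas", 90670, 450),
    ("happy", 8323, "birthday", 2416, 526),
    ("merry", 2326, "birthday", 2416, 0),
    ("happy", 8323, "christmas", 90670, 299),
]


def pairing_rate(collocates, word1_freq):
    """Share of word 1's occurrences that are paired with word 2.

    This is a hypothetical, illustrative measure, not the one used in the paper.
    """
    return collocates / word1_freq


def rates(table):
    """Map each (word 1, word 2) pair to its pairing rate."""
    return {(w1, w2): pairing_rate(c, f1) for w1, f1, w2, _f2, c in table}


if __name__ == "__main__":
    for (w1, w2), rate in rates(rows).items():
        print(f"{w1} + {w2}: {rate:.1%}")
```

On this measure, merry pairs with christmas in roughly a fifth of its occurrences and with birthday in none, while happy spreads itself across both nouns. That asymmetry is the pattern the authors are pointing at, though a serious test would also need the overall corpus size, which isn't given here.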

This is an interesting paper. But even if you're a pro in the field of text analysis, you probably haven't come across it. The authors have not (as far as I can tell) posted a copy on their websites, or deposited one in a repository. (John Benjamins, the publisher, is missing from this list, suggesting that perhaps they don't allow such archiving.) And the International Journal of Corpus Linguistics is not very widely available. I happen to have an individual subscription, but my university's (excellent) library does not subscribe -- IJCL is available through ingentaconnect, but apparently only after a delay. To get a copy of this single article on line, I (or the library) would have to pay $41.88. That's a lot for 28 pages.

Although I'm in general in favor of open-access journals, I'm not an open-access absolutist. Someone has to pay, somehow, for the legitimate costs of running a journal. But for a journal with limited circulation, the IJCL pricing model could hardly have been better designed to minimize impact. As a result, I would think very hard before hiding my work under the proverbial barrel by publishing it in IJCL, and I'd certainly advise students and postdocs and junior faculty to think carefully through the issues as well.

[Update -- Tanja Säily writes:

Having just read your Language Log post on lexical repulsion, I'm happy to let you know that a related article by Renouf and Banerjee will be published by the end of the year in the second volume of a new e-series entitled Studies in Variation, Contacts and Change in English. This series will be freely available on the web at


[Graeme Hirst writes:

What Renouf & Banerjee call "lexical repulsion" seems to be pretty much the same as the idea of "anti-collocations" proposed by Darren Pearce in 2001, which was subsequently developed by my former student Diana Inkpen in her 2003 dissertation and published in papers by Inkpen & Hirst in 2002 and 2006.

Inkpen, Diana Zaiu and Hirst, Graeme. "Acquiring collocations for lexical choice between near-synonyms." SIGLEX Workshop on Unsupervised Lexical Acquisition, 40th meeting of the Association for Computational Linguistics, Philadelphia, 12 July 2002, 67--76.

Inkpen, Diana and Hirst, Graeme. "Building and using a lexical knowledge-base of near-synonym differences." Computational Linguistics, 32(2), June 2006, 223--262.

Pearce, Darren. 2001. "Synonymy in collocation extraction." In Proceedings of the Workshop on WordNet and Other Lexical Resources, Second meeting of the North American Chapter of the Association for Computational Linguistics, pages 41-46, Pittsburgh, USA.


Posted by Mark Liberman at 07:43 AM

Solving the mysteries of the ages via semantic search

It's breakfast time here in Reykjavik, and the morning mail includes this from Tim Finin:

AskWiki answers your questions using Wikipedia as its knowledge base. Solve the mysteries of the ages: Where is the source of the Nile? Who shot JFK? Would Kirk beat Picard? Is waterboarding torture? Can computers think? Does P=NP? Is there a God?, Is Jimbo Wales God?. As an extra feature, if you don't like the answer, you can edit the knowledge base to make it so!

I tried the questions in Tim's note, and got these answers:

Where is the source of the Nile?

Origin: Africa.

Who shot JFK?

Lee Harvey Oswald was arrested, in a theatre about 80 minutes after the assassination and was charged by Dallas police for the murder of Dallas policeman J.D. Tippit, before eventually being charged for the murder of Kennedy. Oswald denied shooting anyone, claiming he was a patsy, and was killed by Jack Ruby before he could be indicted or tried.

Would Kirk beat Picard?

As a proper noun, The Kirk is an informal name for the Church of Scotland, the country's national church. The Kirk of Scotland was in official use as the name of the Church of Scotland until the 17th century, and still today the term is frequently used in the press and everyday speech, though seldom in the Church's own literature.

Is waterboarding torture?

Recent torture methods include waterboarding, sexual humiliation and sexual abuse, and the use of dogs against prisoners.

Can computers think?

However, computers cannot "think" for themselves in the sense that they only solve problems in exactly the way they are programmed to.

Does P=NP?

No answer found for the question "Does P=NP?"

Is there a God?

Arguments for and against the existence of God have been proposed by philosophers, theologians, and others. In philosophical terminology, existence of God arguments concern schools of thought on the epistemology of the ontology of God.

The debate concerning the existence of God raises many philosophical issues. A basic problem is that there is no universally accepted definition of God. Some definitions of God's existence are so non-specific that it is certain that something exists that meets the definition; in stark contrast, there are suggestions that other definitions are self-contradictory.

Is Jimbo Wales God?

No answer found for the question "Is Jimbo Wales God?"

Score: 3.5 out of 8.

I made up a question of my own, and got an answer that suggests a future in politics for AskWiki:

Can I edit your knowledge base?

The best knowledge bases have carefully written articles that are kept up to date, an excellent information retrieval system (search engine), and a carefully designed content format and classification structure.

But the follow-up was not so deftly handled:

Do you plan to run for public office?

Niskanen went on to serve on the US Council of Economic Advisors under President Reagan, and his model provides a strong underpinning for the worldwide move towards cutbacks of public spending and the introduction of privatization in the 1980s and '90s.

Concerned about the effects of the hotel's breakfast buffet, I asked "Does herring cause cancer?" and got this stern warning:

The viral promoter or other transcription regulation elements in turn cause overexpression of that proto-oncogene, which in turn induces uncontrolled cellular proliferation.

I guess I'll stick to muesli -- "Does muesli cause cancer?" produces a much less alarming answer:

The term muesli is a diminutive of the Swiss German noun "Mues" (German: "Mus"), a cooking term for a semi-liquid made from raw or cooked fruit that lacks an exact English equivalent, but that is related to mush, paste, compote or the French purée.

Being a kind person, I closed with the traditional slow floater over the plate:

Who killed Cock Robin?

Who killed Cock Robin is a nursery rhyme beginning:

And there AskWiki stopped, apparently too choked with emotion to continue.

[Update -- a much better title over at opendotdotdot: "I'm Sorry Dave, I Can't Tell You That..."]

[Update -- Fernando Pereira writes:

Answer to Your Question: "What is the best approximation algorithm for traveling salesman?"

Salesman is a 1969 direct cinema documentary film which follows four salesmen as they travel across New England and Florida trying to sell expensive Bibles door-to-door in low-income neighborhoods.

Answer to Your Question: "What did the Michelson-Morley experiment show?"

Better yet, the light emitted in one cavity can be used to start the same cascade in another set at right angles, thereby creating an interferometer of extreme accuracy.


Posted by Mark Liberman at 03:01 AM

November 03, 2007

Phylogeny recapitulates ontogeny

Rhymes With Orange takes on the development of civilization, using the familiar framing in which the language of early human beings is like the language of early childhood:

There's a bit of child phonology ([d] for [ð]) and two bits of child syntax (negation via sentence-initial no, the fronted topic NP all dose).  Of course, these features also appear in the English of adults whose command of the language is imperfect -- in "foreigner talk".  In either case, the English is that of speakers who are aiming at English but haven't entirely gotten there yet.

Posted by Arnold Zwicky at 03:00 PM

More Colbert

Two follow-ups to my posting about riffs on the Colbert title I Am America (And So Can You!): one about why we should care about distinguishing snowclones from playful allusions, and one about the possibility that the title is itself an instance of a pre-existing form.

There are cases where we're not entirely sure (here at Language Log Plaza) whether something is a snowclone, so why bother making distinctions?  Well, as I pointed out in my last posting, if we go this route, there will be zillions of "snowclones" -- many of them entertaining, like the ones I cited back in 2005 in postings on playful allusions (here and here), but few of them actually formulaic.  Most of them are based on some model expression, and ring variations on it, but they vary different parts in different ways; they're all over the map.  And in most cases the model contributes little or no meaning to the variants, or different bits of meaning for different ones.

The clear examples of snowclones, on the other hand, have both a form, like "X is the New Y", and a (rough) meaning, like 'Y now plays the role that X used to play'.  In this respect they are like syntactic constructions and like idioms or clichés with open slots in them and like productive derivational formations.  People who use them pull them "off the shelf", so to speak.  Playful allusions, in contrast, are invented "on the spot", not pulled off the shelf.  (Of course, different people have somewhat different things on their shelves.  What's a snowclone for one person might be a playful allusion for another.)  So snowclones and playful allusions have a different psychological status.

Meanwhile, reader Ken Mallott wrote to suggest that the Colbert title exemplifies a formula "not uncommon in slogans and titles".  He googled up six pre-Colbert book titles that had so can you/we in them -- that is, which had Verb Phrase Ellipsis (VPE) in so tags with the modal can in them.  Here's the set:

I Can Count 100 Bunnies: And So Can You  (link)

Pre-Setting Dice--I Beat the Bastards, So Can You!!!  (link)

If Lazarus Did It, So Can You!  (link)

Money Talks and So Can We  (link)

I Feel Wonderful : So Can You [1956]  (link)

They Lost More Than 40 Pounds! . . . And So Can You  (link)

The first thing to say about these titles is that they're composed of everyday ingredients from English syntax and lexicon, in particular the ingredients that would allow the writers to express the meanings they had in mind.  (They also lack the grammatical oddity of the Colbert title, which we've already explained.  The current point is not the grammaticality of the Colbert title, but the possibility that it's an instance, however odd, of an existing formula.)  I can't see anything formulaic here: people are just deploying the lexical items (can, you) and syntactic constructions (the so tag, which involves both subject-auxiliary inversion and VPE) that are available to them in ordinary English.  In addition, so can you/we is scarcely a slogan or title specialty, as you can check by some googling.

Just because some expression type occurs with some frequency doesn't mean that it constitutes a formula.

Is there a special form for these expressions?  Mallott suggested that there was: "<Something Good> <Inclusion and Encouragement>".  But that's not a matter of linguistic FORM, it's a matter of linguistic CONTENT; this is just a meaning that people sometimes have reason to want to express.   There are plenty of other ways to do it without using a so tag.  Here are a few alternatives to the last title above:

They lost more than 40 pounds,

... and you too can lose more than 40 pounds.
... and you can lose more than 40 pounds too.
... and you can too.
... and you could become one of their number.
... and it's possible for you to achieve the same goal.
... and with some work you'll be able to do the same.

Now, it's certainly true that alternative expressions of the "same" content won't be used with equal frequencies (in a linguistic community or for an individual speaker); people who use more than one variant will have preferences for one or another, in general or in particular contexts.  Where both passive and active clauses are available, English speakers in general use the active alternative much more often.  But everybody (even those who inveigh against passives) uses some passives.

So it is with so tags.  There are a number of alternatives, but I suspect that so tags and reduced too tags (with VPE) predominate statistically, just because they're so compact.  But people use the others too.

Just because people use some expression type more often than the alternatives doesn't mean that it constitutes a formula.

Posted by Arnold Zwicky at 01:13 PM

Homographic homophonic autantonyms

These are "words that do their job in the most sarcastic, sullen, passive-aggressive way possible, and they totally get away with it!"

[Hat tip: Amy de Buitliér]

[Update -- Laura Dickerson writes:

The series of children's books featuring Amelia Bedelia are based on the idea that she takes everything literally. In the first book of the series, the woman employing her as a house cleaner has to remember to say things like "un-dust" the furniture, having learned the hard way that saying "dust the furniture" will prompt AB to put dusting powder all over it.

But Neal Whitman, who is really Literal-Minded, begs to differ.]

[This concept was discussed here, with many examples, using the term "contronym". Or perhaps I should say, a family of similar concepts...]

Posted by Mark Liberman at 12:36 AM

November 02, 2007

Lost in translation

Today's Cathy:

Posted by Mark Liberman at 10:19 AM

November 01, 2007

An open letter

Lance Nathan, doing business as eternally stressed semanticist, posted a wonderful "open letter" on LiveJournal. It begins:

Dear [□ Sir / □ Madam / □ Representative / □ Journalist / □ Idiot],

I know you believe you know a great deal about

[□ linguistics / □ children's literature / □ law / □ psychology / □ other (please specify)]

simply because you

[□ use language / □ read Harry Potter and Goodnight Moon / □ watch Law & Order / □ have a mind],

or because you've read a newspaper article about

[□ the lack of numbers in Pirahã / □ Dumbledore being gay / □ some Supreme Court decision / □ Prozac].

Go read the rest, and the follow-up post too. (Hat tip, The Ridger.)

Posted by Benjamin Zimmer at 11:49 PM

The perpendicular pronoun

From a review on

I'm very much of two minds about this book. There's little need to offer further comment on:

1. The author's ego (in one paragraph on page 59, he uses the perpendicular pronoun 7 times; the possessive first person another 5) ...

This somewhat cutesy way of referring to the pronoun I (which in its spelling is nothing but a line perpendicular to the line of text) was new to Elizabeth Daingerfield Zwicky (who reported it to me) and to me, but it's fairly well represented on the web (1,590 raw hits just now).  Perhaps it's a way of avoiding actually USING the pronoun, which gets a bad press in a lot of advice to writers.

On the web you can find a 2003 column "The Perpendicular Pronoun" by Maureen Dowd, contrasting Bush 41's avoidance of this pronoun -- according to Dowd, he often omits the subject I entirely, sometimes uses an inclusive we (though just how much "modesty and self-effacement" -- Dowd's words -- this shows is debatable) -- with Bush 43's bold use of it.

Twenty years before this (on 11/25/82) we find the expression in an episode of Yes Minister, in the mouth of a stunningly circumlocutory character:

Sir Humphrey: "Minister, I think there is something you perhaps ought to know."

Jim Hacker: "Yes Humphrey?"

Sir Humphrey: "The identity of the Official whose alleged responsibility for this hypothetical oversight has been the subject of recent discussion, is NOT shrouded in quite such impenetrable obscurity as certain previous disclosures may have led you to assume, but not to put too fine a point on it, the individual in question is, it may surprise you to learn, one whom you [sic] present interlocutor is in the habit of defining by means of the perpendicular pronoun."

Jim Hacker: "I beg your pardon?"

Sir Humphrey: "It was...I."

(This quotation is used in the Wikipedia page on logorrhoea (or verbal diarrhea) to illustrate the phenomenon.)

And you can find plenty of badmouthing of I, as on the blog Daily Diatribe (written by an Australian):

One of the difficulties of writing about personal experiences and thoughts is the perpendicular pronoun: "I". Allan Moult, the editor who guided my early writing efforts [note missing comma] was quite fierce about eliminating every possible occurrence of the perpendicular pronoun [many style manuals would require a comma here too] and my gratitude to him on this account is immense. On the other hand, it can be quite difficult to achieve. One way out of the difficulty is to invent an alter ego, hence The Pompous Git.

The idea that I (also me) is immodest has a long history, but many writers on usage regard avoiding the pronoun as false modesty (or, in Bill Safire's phrasing in No Uncertain Terms, p. 183, "phony humility").  That is especially so because the strategies of avoidance -- the author, the present writer, yours truly, myself (which Safire calls the "horizontal pronoun", in contrast to the perpendicular pronoun), recasting in the passive, etc. -- mostly offend writers like Safire and Bryan Garner (see his article on FIRST PERSON in GMAU, p. 349) much more than straightforward I and me would, particularly in writing about personal experiences or opinions.  And in this case I agree with them.

Posted by Arnold Zwicky at 07:16 PM

The Muttonhead Quail Movement

Today on OUPblog I delve into a topic I've discussed on several occasions on Language Log (here, here, here, and here): the modern phenomenon of spellchecker-induced slipups, a.k.a. "the Cupertino effect." (We can also think of this as a computer-aided genus of "incorrections," i.e., corrections that are themselves incorrect.) Though much of the column covers old ground, there are a few new Cupertino-isms in the mix. My favorite recent example is Reuters accidentally referring to Pakistan's Muttahida Quami Movement as the Muttonhead Quail Movement.

Speaking of spelling, I'm a bit surprised that MQM, one of the largest political parties in Pakistan, chooses to transliterate its Arabic-derived Urdu name متحدہ قومی ('United National') as Muttahida Quami rather than Muttahida Qaumi. It looks like the typical transliteration of قومی ('national') as qaumi has been "incorrected" to quami somewhere along the line, based on Anglophone expectations that u should always follow q. But this seems to be a long-standing respelling in Roman Urdu: the old nationalist newspaper Qaumi Awaz, for instance, shows up transliterated as Quami Awaz in dozens of books and periodicals on Google Book Search. I'd imagine the quami spelling encourages speakers not familiar with Arabic or Urdu to pronounce the word as /kwami/ rather than /qaumi/. And in the Reuters goof, quami abets the Cupertino effect, since it's closer to quail!

Posted by Benjamin Zimmer at 10:45 AM