May 31, 2005

That's such a, like, coincidence

Another one for the Language Log like list -- the following posters greeted me this morning as I was walking from the shuttle stop to my office.

Thanks to Vic Ferreira and Dennis Fink for taking the pictures, and to Chris Barker for suggesting the most likely explanation for them: a UCSD art project. (Confirmation of this hypothesis still pending.)

[Another question: is the idiosyncratic spelling in "Academy of Linguistic Awarness" part of the artistry? or is it an example of Hartman's Law of Prescriptivist Retaliation? or both? -- myl]

[Update, 6/7/2005: Jesse Ruderman, who found this post here, writes to note that he's got better pictures of the posters here. Note that the first comment on that post notes the same thing as Mark does above.]

[ Comments? ]

Posted by Eric Bakovic at 08:03 PM

Stanley Fish moves into linguistics

Today the New York Times published an article by Stanley Fish (printer-friendly version here; it may disappear behind a pay wall if you don't take a look now) in which he explains how he teaches freshman writing classes at the University of Illinois at Chicago in which content is banned, forbidden, verboten. No opinions allowed, just work: the work is that the students have to create a language. Seriously. Look:

On the first day of my freshman writing class I give the students this assignment: You will be divided into groups and by the end of the semester each group will be expected to have created its own language, complete with a syntax, a lexicon, a text, rules for translating the text and strategies for teaching your language to fellow students. The language you create cannot be English or a slightly coded version of English, but it must be capable of indicating the distinctions — between tense, number, manner, mood, agency and the like — that English enables us to make.

Stanley Fish is famous for the way he built up the English department at Duke University during the heyday of postmodernism in American universities. (He is also famous for something else: he is generally held to be the original model for the character named Professor Morris Zapp in David Lodge's novels Changing Places and Small World.) He moved to the University of Illinois at Chicago as a dean in 1999 to improve that university's standing in humanities disciplines, and reportedly resigned the deanship when he found that the institution was not standing behind its original financial commitments.

Of course, at first the students don't know what he's talking about when he tells them to devise a language, having never heard of tense, agency, and such. But by the end of the semester they get it. To invent a language of adequate expressive power you have to develop a grasp of syntax. His point is that you can never be a really effective and confident writer unless you know something about sentence structure, and you'll be distracted from sentence structure if you start paying attention to content and writing about your experiences and opinions and have the writing instructor pay attention to them. No content, he insists, because the topic of this course is pure linguistic form. Professor Fish has turned into a linguistics instructor, only I suspect he doesn't know it.

I first heard about this course from a group of applied linguistics professors at his campus that I met while I was in Chicago last year. They say it works pretty well. Though they also say that while he was Dean of the College he never paid much attention to them, and when they pointed out to him that he was now doing a linguistics course, he looked surprised, and simply said "Oh." But it's certainly right, he really is doing linguistics (if a little unconventionally). In fact, you could almost define the fields of syntax and semantics as the study of the ways in which a language might be designed to be able to indicate the distinctions between tense, number, manner, mood, agency and the like that English enables us to make (and other languages enable us to make).

Posted by Geoffrey K. Pullum at 05:59 PM

Knowingly corruptly persuade

Today's U.S. Supreme Court decision in Arthur Andersen LLP v. United States hinges on a point of linguistic analysis.

The decision said:

As Enron Corporation's financial difficulties became public, petitioner, Enron's auditor, instructed its employees to destroy documents pursuant to its document retention policy. Petitioner was indicted under 18 U. S. C. §§1512(b)(2)(A) and (B), which make it a crime to "knowingly ... corruptly persuad[e] another person ... with intent to ... cause" that person to "withhold" documents from, or "alter" documents for use in, an "official proceeding." The jury returned a guilty verdict, and the Fifth Circuit affirmed, holding that the District Court's jury instructions properly conveyed the meaning of "corruptly persuades" and "official proceeding" in §1512(b); that the jury need not find any consciousness of wrongdoing in order to convict; and that there was no reversible error. [emphasis added]

Held: The jury instructions failed to convey properly the elements of a "corrup[t] persuas[ion]" conviction under §1512(b).

The cited portion of the law, 18 USC §1512(b), reads in a less abridged form as follows:

(b) Whoever knowingly uses intimidation, threatens, or corruptly persuades another person, or attempts to do so, or engages in misleading conduct toward another person, with intent to -
(1) influence, delay, or prevent the testimony of any person in an official proceeding;
(2) cause or induce any person to -
(A) withhold testimony, or withhold a record, document, or other object, from an official proceeding;
(B) alter, destroy, mutilate, or conceal an object with intent to impair the object's integrity or availability for use in an official proceeding;
[...]
shall be fined under this title or imprisoned not more than ten years, or both.

The body of the opinion explains:

This Court's traditional restraint in assessing federal criminal statutes' reach [...] is particularly appropriate here, where the act underlying the conviction--"persua[sion]"--is by itself innocuous. Even "persuad[ing]" a person "with intent to ... cause" that person to "withhold" testimony or documents from the Government is not inherently malign. Under ordinary circumstances, it is not wrongful for a manager to instruct his employees to comply with a valid document retention policy, even though the policy, in part, is created to keep certain information from others, including the Government. Thus, §1512(b)'s "knowingly ... corruptly persuades" phrase is key to what may or may not lawfully be done in the situation presented here. The Government suggests that "knowingly" does not modify "corruptly persuades," but that is not how the statute most naturally reads. "[K]nowledge" and "knowingly" are normally associated with awareness, understanding, or consciousness, and "corrupt" and "corruptly" with wrongful, immoral, depraved, or evil. Joining these meanings together makes sense both linguistically and in the statutory scheme. Only persons conscious of wrongdoing can be said to "knowingly ... corruptly persuad[e]." And limiting criminality to persuaders conscious of their wrongdoing sensibly allows §1512(b) to reach only those with the level of culpability usually required to impose criminal liability.

Homework questions:

1. What are the plausible parses for 18 USC §1512(b)?

2. What is the scope of modification of the adverbs knowingly and corruptly in each plausible parse?

3. Do you think laws might be clearer if lawmakers normally took a couple of linguistics courses?

[Link via email from Lane Greene, who also draws attention to this zinger at the end of the opinion, which was written by Chief Justice William H. Rehnquist:

The government suggests that it is "questionable whether Congress would employ such an inelegant formulation as 'knowingly ... corruptly persuades.' " Long experience has not taught us to share the Government's doubts on this score...

]

Posted by Mark Liberman at 03:03 PM

How soon before we see the complexities?

My posting on soon before missed at least one important complexity, which correspondents have now pointed out: how soon before in the examples I gave has how soon modifying before, but there are plenty of elliptical questions in which how soon does not: How soon before we have to leave? 'How soon will it be before we have to leave?'  These elliptical questions, which I believe are unproblematically acceptable, change the Google statistics somewhat, but without obscuring the main point I wanted to make with them.

More important, they provide a possible source favoring how soon before (with how soon modifying before) even for speakers who reject soon before otherwise.

In addition, one correspondent has suggested looking at future-oriented sentences like How soon before midnight will they meet? -- my earlier examples, like How soon before midnight did they meet?, were all in the past tense -- to see if their "basic query" (e.g., 'How soon will they meet?') improves their acceptability.  Whether or not this idea pans out, it is true that the Google examples of both types are heavily future-oriented.


I begin with e-mail from Chris Maloof, who pointed out the many elliptical questions among the how soon before cites that a Google web search provides.  (Marilyn Martin also offered an elliptical question example.)  These are of the form how soon  +  before-clause, and they lack both a subject and a verb.  On the other hand, the examples with how soon modifying before are just ordinary interrogatives, with fronted how soon before X (where X is a clause, as in (2b,c) below, or a NP object, as in (2a,d)), followed by a clause (in inverted or uninverted order, depending on whether the whole thing is in a main or subordinate clause: (2a-c) vs. (2d) below).  Some examples from Google:

Elliptical questions:

(1a)  "How soon before I can ski?" Ankle injuries are common...
www.stoneclinic.com/index_ankle.htm

(1b)  How soon before every state has conflicting laws on the subject? The states can't currently agree upon ages at this time...
castlecops.com/article5830.html

(1c)  And how soon before we will see weirder instruments like Futures being traded on virtual currencies?
terranova.blogs.com/terra_nova/2005/02/no_shortage_of_.html

(1d)  If I order now, how soon before I get it?
www.scarepros.com/questions.html

Ordinary interrogatives:

(2a)  How soon before a grant deadline should I submit a protocol?
www.umass.edu/research/comply/humanfaq.html

(2b)  How soon before I travel can I apply for my WHM visa?
www.australian-embassy.de/visa/faqs/faq_whm.html

(2c)  How soon before the quarter begins may a student be placed in homestay?
www.skagit.edu/news.asp_Q_pagenumber_E_380

(2d)  ... they will help you determine what book to write, how quickly to write it, and how soon before publication you need to start your marketing efforts.
entrepreneurs.about.com/cs/marketing/a/aa091803.htm

The elliptical questions should be generally acceptable, since they don't have soon (with its usual component of afterness) in combination with before.  So far as I know, this is the case, but it needs examination.  (At this point, I'm hoping to encourage someone else to take up soon before as a project.  My plate is pretty full.)

It turns out to be no easy task to estimate the relative frequencies of the two types; the Google cites are full of repetitions and near-repetitions.  (Many of the ordinary interrogatives are, like (2a-c) above, from faq's, which tend to have similar form.)  My first impression -- again, this should be investigated further -- is that the two types are roughly even, which means that the number of relevant how soon before hits should be cut roughly in half, and the relevant after/before ratio roughly doubled.  Even with this adjustment, the frequency of how soon before is still hugely less than the frequency of soon before without modification by how.  There's still something to be explained.
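The adjustment just described can be worked through with the raw counts from the original soon before posting (20,300 hits for "how soon before", 86,400 for "how soon after"). What follows is only a rough sketch in Python: the one-half share of elliptical questions is the first impression stated above, not a measured value, and the variable names are purely illustrative.

```python
# Raw Google web hit counts reported in the original "Soon before" posting.
how_soon_before = 20_300
how_soon_after = 86_400

# Assumption (a rough impression, not a measurement): about half of the
# "how soon before" hits are elliptical questions and should be set aside.
elliptical_share = 0.5

relevant_before = how_soon_before * (1 - elliptical_share)
raw_ratio = how_soon_after / how_soon_before
adjusted_ratio = how_soon_after / relevant_before

print(f"raw after/before ratio:      {raw_ratio:.2f}")
print(f"adjusted after/before ratio: {adjusted_ratio:.2f}")
```

Under that assumption the adjusted ratio comes out at roughly 8.5, still far below the 83.70 reported for soon before and soon after without how.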

But the elliptical questions might not just be confounding data; they might have something to say to us.  They provide a pool of acceptable clauses beginning with how soon before and might therefore boost the acceptability of ordinary interrogatives of this form, even for people who don't otherwise accept soon before.  Something to consider.

Finally, Marilyn Martin has suggested looking at future-oriented sentences like How soon before midnight will they meet? to see if their "basic query" (e.g., 'How soon will they meet?') improves their acceptability.  This is, in effect, a suggestion that the future-oriented examples might be treated as amalgams of a how soon question (How soon will they meet?) with a neutral duration question (How long before midnight will they meet?).  I'm dubious about this suggestion, because the past-tense examples could be given a similar analysis (How soon before midnight did they meet? = How soon did they meet? + How long before midnight did they meet?), so I would predict no difference in acceptability between past and future examples.  Something else for someone to look at.

Still, the ordinary interrogatives from Google are heavily future-oriented; the examples in (2) are all in the present tense, understood with a future orientation relative to the temporal reference point. (The elliptical questions are all future-oriented.)   Of course, the future orientation pretty much comes along with the genre of most of the Google examples, so it remains to be seen whether there is any actual association between how soon before interrogatives and future orientation.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:52 PM

Autoantonymy

My partner Barbara is one of the most careful and precise speakers I know, and I just heard her refer to a brand new computer for somebody that she had seen "in the department office unpacked." Almost holding my breath, I asked her whether the carton was still sealed. "Yes," she said, puzzled why I would ask that (the computer is of no interest to me; it is sitting in a department office 35 miles away from where I work and had only peripherally come into the conversation). But of course you regular Language Log readers will know what had engaged my interest. Her answer confirmed that she had meant it was still un-unpacked. Thus was I finally convinced in a few seconds that we have a lexical phenomenon here, not a sporadic error heard occasionally here and there. Unpacked sometimes means "not in a state of having been packed" and sometimes (even for Barbara) it means "not in a state of having had the packing operation undone", i.e., "not in a state of having been unpacked".

But this is not the first autoantonymous lexical item (word serving as its own opposite), even if we ignore the existence of idioms like could care less (= couldn't care less). Sanctioning something can mean either permitting it or setting penalties for it; renting an apartment can mean either being a tenant or being a landlord; and there are other examples. They don't occur to me right now, but I once heard the guys on Car Talk come up with a dozen of them, and other Language Log contributors will soon come up with plenty. You'll see. Watch this space.

Posted by Geoffrey K. Pullum at 01:29 PM

Addictive to eggcorns?

Fernando Pereira emailed an example from a mailing list: "become addictive to" in place of "become addicted to". He also sent a sample of web examples of the same substitution.

Why is it so difficult to maintain good habits, when it’s so easy to become addictive to bad ones?
I didn't realise that one could become addictive to this drug.
If you do this over and over you can become addictive to it.
Sometimes I think I am addictive to shopping on EBAY.

Is this an eggcorn, as we've taken to calling a word or phrase given a new, etymologically incorrect morphological analysis, similar in sound and plausible in meaning? I'm not sure. The substitution of "addictive to" for "addicted to" is certainly an example of an etymologically incorrect analysis that is similar in sound. But there are two other processes that might also be at play. First, there's a particular kind of slip of the fingers that results in typing the wrong ending on a word -- -ing in place of -ed, for instance, or -ation instead of -ator. I do this all the time when I'm typing, though I can't recall ever having done it in speech. And second, there's a rarer process by which the logical structure of derived adjectives gets tangled, without the endings necessarily being similar in sound. A good example is the prescriptively-deprecated use of nauseous to mean nauseated.

The substitution of addictive for addicted is exactly parallel to the substitution of nauseous for nauseated. In both cases there is an affecting substance or activity (call it "argument 0"), and a creature that experiences its effects (call it "argument 1"). Traditionally it's argument 0 that is nauseous or addictive -- the innovation is to apply those adjectives to argument 1.

As the AHD usage note indicates, "it appears that people use nauseous mainly in the sense in which it is considered incorrect". The OED's first citation for this usage is from 1949:

1949 Sat. Rev. 7 May 41 After taking dramamine, not only did the woman's hives clear up, but she discovered that her usual trolley ride back home no longer made her nauseous.

Curiously, the very earliest citations in the OED are for a similar usage, in which nauseous means "inclined to sickness or nausea":

1613 R. CAWDREY Table Alphabet. (ed. 3), Nauseous, loathing or disposed to vomit.
1651 J. FRENCH Art Distillation V. 144 It may be given..to children or those that are of a nauseous stomack.
1678 J. RAY Coll. Eng. Prov. (ed. 2) Pref., I have..so veiled them, that I hope they will not turn the stomach of the most nauseous.

It'll be interesting to see whether the innovative meaning for addictive grows and takes over, as the innovative use of nauseous did, or whether it remains (as it is now) a sporadic mistake.

Another question: are there other adjectives where a similar process is taking place?

[Update: Ben Zimmer has tracked nauseous="nauseated" way back before 1949:

A usage no doubt repulsive to the John Simons and Robert Fiskes of this world is the equating of "nauseous" with "nauseated" (rather than the earlier sense of "nauseating"). The OED3 draft entry dates this sense of "nauseous" to 1949, but surely we can do better...

1885 Daily Gleaner (Kingston, Jamaica) 14 Apr. 2/5 I saw the long and white helmeted troops march in apparent comfort on their way, while I swayed to and fro and was bumped up and down and oscillated and see-sawed from side to side until I became nauseous and had exhausted my profane Arabic vocabulary in the vain attempt to induce "Daddles" to consider my comfort more than his own.

1903 Coshocton Daily Age (Ohio) 16 Sep. 1/1 Her voyage through the spirit land made her somewhat nauseous and was not the most pleasant journey imaginable, but she is on the high road to recovery now.

1906 Daily Gleaner (Kingston, Jamaica) 7 July 7/3 (advt.) When you feel nauseous and dizzy, don't take brandy or whisky -- try Nerviline.

1927 Chicago Tribune 9 May 10/3 This lasts ten or fifteen minutes, and then I have a terrible headache and I feel nauseous.

1933 Los Angeles Times 21 Sep. II6/1 (advt.) The salts that do not make you nauseous.

The 1885 cite is from an unnamed piece entitled, "In the Camps at Korti: Terrible March across the Heated Sands of the Soudan" ("Daddles" is the name of the writer's camel). So perhaps British (or Commonwealth) sources antedate American ones for this usage (despite the OED's "orig. U.S." tag).

Here is the earliest cite I could find expressing concern over the proper use of "nauseous" (from Frank Colby's column, "Take My Word For It!"):

1946 Los Angeles Times 8 Nov. II7/7 From a recent issue of Look: "Stefan became nauseous." Could that be right? ... Yes, if the author intended to say that Stefan was loathsome; so disgusting as to cause nausea. Obviously he meant to write: Stefan became nauseated.

 

]

Posted by Mark Liberman at 08:16 AM

May 30, 2005

Soon before

The story begins with an American Dialect Society posting by Alison Murie on 5/18/05.  Murie found the soon before in "...security official said in an interview soon before the transfer of sovereignty that..." (Seymour Hersh, Chain of Command, pp. 355-6) to be very odd, adding that she would have expected shortly before here.  Others agreed, though I myself found no problem with soon before.

Google web searches quickly turned up a huge disparity between (infrequent) soon before and (very frequent) soon after, an unsurprising difference given OED2's definition of soon as "within a short time (after a particular point of time specified or implied)", which has afterness as well as shortness in it (plus a reference point).  Those (like me) who accept soon before are innovators who have extended the meaning of soon by dropping the afterness component.

Then it turned out that the disparity between soon before and soon after essentially disappears when we look at how soon before and how soon after.  From the Google figures and preliminary judgment collection, it appears that there are three varieties: one without soon before, one with soon before only when modified by how, and one with soon before generally.  I'll speculate about how the intermediate variety might have arisen.

Finally, later discussion on ADS-L suggested that acceptability judgments on soon before might be like acceptability judgments on positive anymore (as in Gas is expensive anymore), in that these judgments are sometimes unreliable.  I'll argue that the two situations aren't parallel, and that judgments on positive anymore aren't chaotic or generally unreliable.


The examples in question are ones like the following:

(1)  Soon before, soon not modified by how:
an interview soon before the transfer of sovereignty
an interview soon before sovereignty was transferred
They met soon before midnight.

(2)  Soon after, soon not modified by how:
an interview soon after the transfer of sovereignty
an interview soon after sovereignty was transferred
They met soon after midnight.

(3)  Soon before, soon modified by how:
How soon before the transfer of sovereignty was he interviewed?
How soon before sovereignty was transferred did it happen?
How soon before midnight did they meet?

(4)  Soon after, soon modified by how:
How soon after the transfer of sovereignty was he interviewed?
How soon after sovereignty was transferred did it happen?
How soon after midnight did they meet?

In what follows, I'll assume that types (2) and (4) are generally acceptable; the meaning of soon is entirely compatible with the meaning of after.  It's types (1) and (3) that we're interested in.

Now, the raw Google web hits:

"soon before" -how     31,900   |  "how soon before"     20,300  |  "shortly before"      2,960,000
"soon after" -how   2,670,000   |  "how soon after"      86,400  |  "shortly after"      11,400,000
after/before ratio      83.70   |  after/before ratio      4.25  |  after/before ratio         3.85

"very soon before"      3,900
"very soon after"     187,000
after/before ratio      47.95

In the left column of the first table, we see the gross disparity between soon before and soon after when soon isn't modified by how, a disparity reproduced in the second table for modification by very.  By themselves, these figures suggest a general disfavoring of soon before, which is entirely consonant with its being an innovative combination.  Still, the numbers for soon before aren't tiny; my variety is well represented.

The center column of the first table has the surprise: under modification of soon by how, the disparity between before and after essentially disappears, falling almost to the level of shortly before vs. shortly after, where after is favored over before, though not hugely.  It looks like there are rather a lot of people who use how soon before, but little or no soon before otherwise.
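For anyone who wants to check the after/before ratios, they can be recomputed directly from the raw hit counts reported above. This is a minimal sketch in Python; the dictionary labels are illustrative names, not anything taken from the Google searches themselves.

```python
# Raw Google web hit counts as reported above: (before, after).
hits = {
    "soon (no how)": (31_900, 2_670_000),
    "how soon": (20_300, 86_400),
    "shortly": (2_960_000, 11_400_000),
    "very soon": (3_900, 187_000),
}

# The after/before ratio is the number of "after" hits per "before" hit.
ratios = {label: after / before for label, (before, after) in hits.items()}

for label, ratio in ratios.items():
    print(f"{label:15s} after/before = {ratio:.2f}")
```

The how soon ratio works out to about 4.256, which rounds to 4.26 here and appears in the table as 4.25; the other three figures match the table to two decimal places.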

This impression is borne out by preliminary (and still unsystematic) collections of judgments.  So far most informants fall clearly into three types: full innovators, those who accept soon before generally, in examples (1) and (3); partial innovators, those who accept how soon before (in (3)) but not otherwise (as in (1)); and conservatives, who reject soon before, in both (1) and (3).

Where do the partial innovators come from?  Their comments on examples like (3) are telling.  How else would you say it, they ask?  For soon before in (1), these informants offer paraphrases with other adverbs denoting short duration: shortly before, just before, right before.  But these adverbs either resist modification by how (shortly: ?how shortly before/after) or reject it entirely (just and right: *how just/right before/after).  There are adverbs that are fine modified by how -- long and much, as in how much/long before/after -- but these lack the semantic component of short duration.  Short of recasting the question thoroughly, there's no way to package short duration into a how question.  That is, how soon before fills an expressive gap.  Even if you won't go all the way to (1), you might be willing to go as far as (3).  Yes, this is all highly speculative.

In further discussion on ADS-L (5/19/05), Ron Butters suggested that soon before might be like positive anymore, in that informants' judgments are unreliable, not always in accord with their practice.  But the two situations aren't parallel: a great many positive anymore speakers have been confronted with criticism or correction from others -- the feature has even made it into some usage manuals, as a regional variant to be avoided in formal writing -- while soon before seems to have escaped notice.  Nothing confounds acceptability judgments quite so much as explicit regulation, so that it's scarcely a surprise that some people who use positive anymore claim not to.  (Lots of people who use restrictive relative which -- E. B. White and Jacques Barzun, for example -- claim not to, after all.)  For soon before, there is no explicit regulation and no reason to treat informant judgments as any more suspect than informant judgments on other unregulated features.

In any case, informant judgments on positive anymore aren't simply a morass.  Some people don't use positive anymore and report, accurately, that they don't.  Some people use positive anymore and have escaped explicit regulation or failed to attend to it, and they report, accurately, that they use it (and where they use it).  Alas, some people are unreliable judges.  But that's no reason to throw everybody out; our task is to figure out who's who.

There is one way in which soon before probably is like positive anymore: it's not simply a matter of having the feature or not having it.  Instead, the feature is allowed or prohibited, favored or disfavored, in certain contexts, and the details of these distributions differ from speaker to speaker.  That's the way variation works.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 08:11 PM

Still unslacked

...the interest, that is, in the "still unpacked" conundrum I raised in some earlier postings, now the subject of the most recent column by the Boston Globe's always engaging Jan Freeman (there must be something in the semantics literature about what the definite is doing in NP's like that). Freeman offers some terrific examples from the Globe, the Washington Post, and The New York Times Magazine, and ultimately throws the question to her readers. (When last I checked, sentiment was running 73% in favor of declaring the construction illogical, but this is Not a Scientific Poll, as the cable news stations are always reminding us.) In the course of things she has some nice things to say about the blog in general. Well, altrettanto back at you, Jan.

Posted by Geoff Nunberg at 03:27 PM

Moreso

From a student's homework assignment, a week ago:
(1)  Number one is not necessarily a problem with pronouns, but moreso just an incorrect slip of the tongue.
The moreso caught my eye.  In a minor way because of the spelling, as a single word.  In a major way because of the usage; spelling moreso as two words doesn't make the sentence any less baffling to me:
(2)  Number one is not necessarily a problem with pronouns, but more so just an incorrect slip of the tongue.
More so is functioning here like contrastive more (not far from rather in its effect), reinforcing the but.   I can use more, without the so, this way --
(3)  Number one is not necessarily a problem with pronouns, but more just an incorrect slip of the tongue.
or more of (preferably losing the just) --
(4)  Number one is not necessarily a problem with pronouns, but more of (just) an incorrect slip of the tongue.
But more so just doesn't cut it for me; the so seems extraneous.

Recent versions of the OED recognize the spelling moreso (in the U.S.) but not the innovative usage in (1) and (2), or further innovations to be found in writing on the web.  I offer this material as fodder for lexicographers, along with some speculations about the development of innovative moreso/more so.


First, the spelling question.  The spelling moreso is deprecated in several usage sources (for example, Paul Brians's Common Errors in English Usage and Evan Morris's Word Detective site), but it is very common, and the relevant subsection on more in the Dec. 2002 draft revision of the OED mentions it as an American variant:
(5)  With ellipsis of the word or sentence modified.  Now freq. with anaphoric so... in more so (also, chiefly U.S. moreso).
In what follows, I'll mostly be citing examples with moreso, simply because searches for this variant produce less junk than searches for more so.  But all the uses below could be equally well documented with more so examples.

Now, on to the uses, beginning with those documented by the OED.  As background, there are occurrences of more with "zero anaphora":
(6a)  1852 M. ARNOLD Farewell viii, I too have wish'd, no woman more, This starting, feverish heart away.

(6b)  1862 G. BORROW Wild Wales lii, 'Are the Welsh..as clannish as the Highlanders?' said I. 'Yes', said he, 'and a good deal more'.

And then more followed by anaphoric so:
(7a)  1735 G. BERKELEY Def. Free-thinking in Math. §28 This is so plain that nothing can be more so.

(7b)  1788 J. MADISON in Federalist Papers lvii. 158 The districts in New Hampshire in which the senators are chosen immediately by the people, are nearly as large as will be necessary for her representatives in the congress. Those of Massachusetts are larger than will be necessary for that purpose. And those of New-York still more so.

(7c)  1816 J. AUSTEN Emma I. xii. 209 'I only want to know that Mr. Martin is not very, very bitterly disappointed.' 'A man cannot be more so,' was his short, full answer.

(7d)  1997 C. SHAW Sc. Myths & Customs x. 223 Anyone perceived as being different from society's norms was a potential target--no-one moreso than the local wise-woman.

The important point here is that more with zero anaphora and more with anaphoric so are in alternation.  Either could replace the other in the examples in (6) and (7); (7d) with plain more is especially felicitous, to my ear.  The choice between one variant and the other is a stylistic one.  One relevant effect is that, in general, explicit anaphora, as in more so, tends to be seen as more emphatic or contrastive than zero anaphora, as in plain more.

This is as far as the OED goes.  The examples are all anaphoric.  But the so in (1) and (2) is not anaphoric.  It is, however, contrastive/emphatic, a fact that suggests a possible route from the system illustrated in the OED to the system illustrated in (1) and (2): alternating more and moreso have been reinterpreted as mere plain and emphatic counterparts, with no necessary anaphoricity.  This system is amply illustrated on the web, notably by many examples with moreso followed by than:
(8a)  Even moreso than in our dealings with our fellow humans, our dealings with the life world are mired in traditions which vary from gratitude and awe to harvesting and stewardship and on to subdue and exploit.
http://meme.com.au/theoria/ethical_tripod.html

(8b)  I would say that the term Theism implies a system of belief based on tradition and dogma moreso than on logic and empirical observation.
http://www.improvedclinch.com/index.php/weblog/comments/imore_gay_stuff_i/

(8c)  The grace of the art is there moreso than in karate, etc., but it is not dancing. It is relying on physics and other principles moreso than karate etc. ...
forum.japantoday.com/m_375137/mpage_2/tm.htm
but also without an explicit than for comparison, as in (1) and
(9)  My wife and I were watching the making of Disney's Oliver and Company on DVD. Unlike today's extensively (and sometimes exhaustively) researched "making-of" featurettes, you could tell this one was used moreso as a marketing tool.
http://www.kartooner.com/archives/2005/03/28/the-state-of-animation/

At this point, moreso is open to a further reinterpretation, as a simple contrastive sentence adverbial glossable as 'even more, it is even more the case (that/since)', without any specific standard of comparison implicated:
(10a)  Unpopulated structural elements of 'national households' evolved; moreso, those structural elements were developed (from the beginning of the most harsh environmental requirements for expenditure of human life on Earth).
http://www.ovaloffice.org/

(10b)  Of course I want this guy to stop spamming people, but moreso, I want this guy to at least take my email off his return path when sending out these...
forum.spamcop.net/forums/ lofiversion/index.php/t3738.html

(10c)  Hello,
Am mr donald ,a pilot by proffesion and i reside in the usa.am glad to inform you that am sicerely interested in the adoption of your pet. am guarantee the pet a lovely home .  Moreso the pet is coming to a large and a fenced garden in which it will be comfortable in playing up to any length. At the risk of sounding rude kindly let me know the last asking price , moreso i would love to see the pics and as well i want to know the name of the pet.  [AMZ: Except for bolding, I've left this remarkable item exactly as it appeared online.]
http://www.petpages.com/Forum/MSGViewThread.asp?CMD=NEW&ID=327
or as a simple contrastive adverb combining with a NP and glossable as 'especially':
(11a)  People and moreso, soldiers, should know that conscientious objetion is not about any particular war, but to war per se,...
www.haaretz.com/hasen/pages/ArticleNews.jhtml?itemNo=492015&contrassID=13&subContrassID=1&sbS

(11b)  Yeah, you'll notice that most of the retards you run into play hunters and rogues, hunters moreso.
forums.gamedaily.com/index. php?act=findpost&pid=1451187

In all these developments, the connection to anaphoric comparative more (so) hasn't been entirely lost, though the uses have drifted pretty far from the models in the OED.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:09 PM

The best of one

According to The Ethicist at The New York Times Magazine, Randy Cohen, something can be the best only if there are at least three things in the comparison set.  This, Cohen tells us (5/29/05, p. 22), is a matter of fact, and it's a matter of fact because it's a matter of grammar.  This grammarian objects.


Cohen is replying to a query from Steven Tanzer of Bayside NY:

My son's school announced that a $750 scholarship would be awarded to a senior submitting the best short essay by Feb. 1.  After the deadline, the school announced that because only one student had applied for the scholarship, it was extending the deadline.  My son protested: according to the rules, he should be the winner because he submitted the only and best essay.  Was it ethical to extend the deadline?

No, Cohen replies:

Even if your son were content to win on a technicality, he doesn't have much of a case.  If the prize is for "the best short essay," the school may not award it to him.  The superlative "best" necessarily refers to the most impressive of three or more -- good, better, best.  If there are not at least three entries, there can be no best essay.  Live by legalisms; die by legalisms.

On the question of whether it's ok for the school to extend the deadline (for whatever reasons), I will not pronounce.  But on the grammatical question I have an opinion -- which is that Cohen's dictum, that it takes three to make a superlative, is not a rule of English and is therefore irrelevant to any ethical considerations.

Writers and speakers of English frequently use superlatives for reference classes of unknown size.  If I offer something for auction to the highest bidder, if I advertise that I will award a contract to the lowest bidder satisfying the requirements I stipulate, if I place a personals ad and tell my friends that I'll go with the guy whose photo strikes me as the handsomest, in all these situations it might turn out that the reference class is huge, but it might turn out that it's empty (in which case nothing happens), and it might turn out that it's of size 1 (in which case that one's the winner) or 2 (in which case the respectively higher, lower, or more handsome candidate wins).  In a slightly more subtle example, if I offer concert tickets to the first person who requests them and only one person responds, that person (the first of one respondents) gets the tickets.  This is everyday reasoning, using everyday language.

(Mathematicians, with their passion for generality, take the same route.  If you're looking at sets of elements with a total ordering on them and defining "least" and "greatest" on these sets, then your definitions will extend to sets of cardinality 1 and 2.  As a result, two positive integers always have a greatest common divisor, even if -- as is the case with relatively prime numbers, like 15 and 16 --  they have only one common divisor, 1.)
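The mathematical point can be checked directly. The following is only a small illustrative sketch; the helper name common_divisors is made up for this example. It collects the common divisors of two positive integers and takes the greatest of them, which is perfectly well defined even when the set has a single member.

```python
from math import gcd

def common_divisors(a: int, b: int) -> set[int]:
    """Return the set of positive integers dividing both a and b."""
    return {d for d in range(1, min(a, b) + 1) if a % d == 0 and b % d == 0}

# 15 and 16 are relatively prime, so the comparison set has exactly one member.
divisors = common_divisors(15, 16)
print(divisors)       # {1}
print(max(divisors))  # 1: the greatest of one
print(gcd(15, 16))    # 1: the standard library agrees
```

Nothing in the definition of "greatest" requires three candidates, or even two.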

Now, back in the real world, you might want to set a size for the reference class.  Maybe you'll insist that there must be at least four qualified bidders on your contract.  Or three.  Or two.  That's up to you.  In the case of the school essay contest, it might have been wise (as Cohen himself observes) for the school to have prepared for the contingency of only one applicant and to have set, in advance, a minimum number of applicants.  But none of this has anything to do with grammar.

There is much silliness abroad on the "logic" governing the use of comparatives and superlatives.  Check out, for example, the entertaining entry in Merriam-Webster's Dictionary of English Usage for superlative of two (as in Thomas Gray's "if one is alive and the other dead, it is usually the latter that is the handsomest").  If there's ever another edition of MWDEU, maybe it should have an entry on superlative of one (citing Cohen, of course).

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:15 AM

Preparing (for?) a pandemic

In response to my post on George Bush and Sally O'Reilly, where I (rather unfairly) hypothesized that "Nature's Paris correspondent is using anti-American and anti-Bush prejudice to promote awareness of an international public health issue", several European readers proposed other motivations.

For example, Moritz Schallaboeck argued that "the US market is larger than any other single national market, if perhaps not larger than the European market taken as a whole", or "maybe the author just felt like it, or had an American acquaintance the story is loosely based on".

John Kozak suggested that Nature is "writing to an international audience which, for better or worse, will be collectively far more familiar with the verbal idiosyncrasies of the incumbent US president than any other statesman". (But John, isn't that what I said?)

Trevor at Kalebeul suggested that I've misunderstood "the role of the intellectual in European society, which is something like that of a Catholic bishop in the States". He quotes a character in a Max Aub novel to the effect that "an intellectual is someone for whom all political problems are fundamentally moral problems". His conclusion: "That on this basis George W Bush emerges as the greatest intellectual of our time has naturally led to some jealousy over here".

And Declan Butler, who actually wrote the piece, emailed to explain that his intention was to highlight the need for U.S. leadership on this issue, and to try to make the issues more real to that public who can perhaps most effectively bring about real change in the way the world handles this threat: the American public. On this account, Sally O'Reilly's fictional blog was in fact pro-America's capacity to influence world events for the better.

I'll confess to the obvious: I was being one-sided and provocative. Butler's explanation is entirely reasonable, and I might have made the same choice in his place. However, there's a tricky contextual dynamic here, in my opinion.

These days, many Europeans seem to see Americans as the main source of international agency, especially in respect to blame. If something goes wrong, the default explanation is that it's because of something that the Americans did, or something that they failed to do. If Americans take an initiative (like Google Print), it's interpreted as a challenge and even an attack. If Americans fail to take an initiative, as in the case of developing techniques for rapid vaccine development in response to new flu strains, or monitoring and rapid response systems for extinguishing the disease in animal reservoirs, then the threat to world health is seen as a primarily American failure.

In military matters, there's some objective foundation for this view. But the size and sophistication of Europe's biomedical R&D establishment easily rivals America's. European political opinion is more friendly to major government-funded R&D initiatives, outside of military and national-security areas. And European government budgets are somewhat less out of control than the American budget is. So all in all, Europe is arguably better positioned than the U.S. to launch a major new R&D initiative on responses to influenza pandemics.

Nature's editorial on the bird flu threat, as Declan Butler pointed out to me, implicitly highlighted the failings of the international community and all countries. However, the U.S. is the only country whose preparations are explicitly criticized in the issue, as far as I can tell, and Bush is certainly the only political leader to be ridiculed.

Luckily, Fortune has dealt Nature another opportunity to ridicule a politician for saying dumb things about bird flu, just three days after the publication of the recent special issue. And this one is non-fiction:

Réagissant à cet article, le ministre de la Santé Philippe Douste-Blazy a déclaré que la France est "le premier pays européen à avoir parfaitement préparé une éventuelle épidémie de grippe aviaire".

Reacting to this article, the Minister of Health Philippe Douste-Blazy has declared that France is "the first European country to have perfectly prepared (for?) a possible epidemic of bird flu."

Selon le ministre, "la France a parfaitement préparé une éventuelle épidémie en commandant en octobre 2004, aux laboratoires Roche, 13 millions de traitements de Tamiflu ... Le Tamiflu est un antiviral efficace sur l'ensemble des souches grippales".

According to the minister, "France has perfectly prepared (for) a possible epidemic by ordering in October 2004, from the Roche laboratories, 13 million treatments of Tamiflu ... Tamiflu is an antiviral effective against all strains of flu."

Cinq millions de traitements sont déjà en stock dans notre pays et la quasi-totalité des stocks sera constituée avant la fin 2005. La totalité le sera avant mars 2006, a précisé le ministre.
Il a souligné que "la France est un des rares pays avec les Etats-Unis, la Canada, le Japon et l'Australie, à avoir constitué de tels stocks, permettant un traitement précoce d'éventuels malades atteints de grippe aviaire durant la pandémie".

Five million treatments are already in stock in our country and nearly all the stock will be complete by the end of 2005. The whole order will be complete before March 2006, the minister explained.
He underlined that "France is one of the few countries, with the United States, Canada, Japan and Australia, to have established such stocks, permitting an early treatment of possible patients stricken with bird flu during a pandemic."

Before we get to the content of these remarks, there's an interesting linguistic point here. I'm surprised to learn that the French verb préparer can be used with the prepared-for threat or challenge expressed as a direct object. I thought that (as in English) one could prepare a meal, or prepare a patient for an operation, or prepare (oneself) for an ordeal; but apparently one can also prepare a pandemic. Curiously, the DAF doesn't seem to register Douste-Blazy's usage either.

With respect to the public health issue, Douste-Blazy's notion of "perfect preparation" is a stock of 13 million treatments of Tamiflu for the French population of about 61 million, a precaution that he compares in degree of perfection to the status of four other countries including the U.S. I'll look in the next issue of Nature for the appropriate editorial comment: "parfaitement préparé, mon cul!"

[Update: Declan Butler points out via email that Douste-Blazy's remarks fit well with the schema of common denial and attempts to reassure, described in this essay by Peter Sandman and Jody Lanard. He also observed that the stocks of Tamiflu in Britain, France and Canada, though sure to prove inadequate in the face of a pandemic with high mortality, are much higher than those in the U.S., which has some 2.3 million courses of treatment available. ]

Posted by Mark Liberman at 07:45 AM

May 29, 2005

Smokin' too much Fowler

In my mail on 27 May, from Neal Goldfarb (of the Tighe Patton Armstrong Teasdale law firm in Washington DC), a pointer to a most remarkable (and disturbing) claim in a 2003 Supreme Court Review article (Of "This" and "That" in Lawrence v Texas, 55 Sup. Ct. Rev. 75) by Mary Ann Case.  Examining the following sentence from the Court's October Term 2002 decision invalidating Texas's anti-sodomy law --

(1)  The Texas statute furthers no legitimate state interest which can justify its intrusions into the personal and private life of the individual.

Case maintains that it is ambiguous as to whether the relative clause in which is restrictive or non-restrictive.  That is, she maintains that (1) has an interpretation as in

(2)  The Texas statute furthers no legitimate state interest, which can justify its intrusions into the personal and private life of the individual.

which, according to her, entails

(3)  The Texas statute furthers no legitimate state interest.

She appeals to the authority of "strict grammarians" (citing, yes, Fowler), maintaining that "a classically trained grammarian" would in fact say that (1) was interpreted as in (2).  It's that pesky That Rule again, last discussed in Language Log here.

Case is "blinded by the rules", applying something she was presumably once taught, rather than using her own knowledge of the language.  Sentence (1) is not ambiguous in the relevant respect; it has only a restrictive interpretation.  Indeed, the purported paraphrase in (2) is ungrammatical, for reasons that are well understood.  Case has been smokin' way too much Fowler.

How remarkable that two topics of great concern to me — the modern advice literature on English grammar and usage, in particular the That Rule, and the politics of homosexuality, in particular the regulation of sodomy (between consenting adults in private) — should come together this way.  But how sad that a fundamental misunderstanding about the grammar of English should have made its way into the Supreme Court Review.

Case's claim in her article is that the majority opinion in Lawrence v Texas (written by Justice Kennedy) exhibits a considerable degree of unclarity, in part because of "ambiguity of referents".  Sentence (1) is just part of the web of unclarity she sees.  In more detail:

At least for strict grammarians, perhaps the most significant "that" in the entire majority opinion is the one that isn't there, in the sentence dissenting Justice Scalia describes as the opinion's "actual holding:"  "The Texas statute furthers no legitimate state interest which can justify its intrusions into the personal and private life of the individual." Note, the majority says "which can justify ..." rather than "that can justify...." A classically trained grammarian would observe that this should signal the majority's intention for the clause to be a non-restrictive rather than a restrictive one (or, as Fowler puts it "non-defining" rather than "defining"). "Non-restrictive clauses are parenthetic.... A non-restrictive clause is one that does not serve to identify or define the antecedent noun."  Thus, if the majority opinion is careful about its grammar, the question of whether the opinion applies heightened or rational basis scrutiny can be answered by noting that, technically, the sentence can be shortened to "The Texas statute furthers no legitimate state interest" without altering its meaning.  In other words, the statute fails the lowest level of scrutiny; no heightened scrutiny is required. Had the sentence continued with "that" rather than "which," it could correctly have been read to suggest instead that, while the Texas statute did "further a legitimate state interest," the interest was not one "that can justify its intrusion into the personal and private lives of individuals"; in other words, the majority would have been acknowledging a need to apply heightened scrutiny.

The main legal point here concerns the relevant level of scrutiny to be applied.  In Goldfarb's words, from his e-mail to me:

Regarding the phrase "heightened or rational basis scrutiny": This refers to an important aspect of the methodology of deciding whether a statute is unconstitutional. One of the central issues in making that decision is what "level of scrutiny" the court should apply. In other words, should the court take a critical look at the statute and strike it down unless the government presents a convincing justification for it (heightened scrutiny), or should it give the statute the benefit of the doubt and uphold it as long as there is some rational argument that could be made in its support (rational basis scrutiny). This is an oversimplification, but it will do for now.

In any case, the grammatical point is perfectly clear: which is entirely acceptable in restrictive relatives, so that (1), punctuated as above, is understood as having a restrictive relative.  In fact, a non-restrictive interpretation isn't possible at all; (2) is simply ungrammatical, because the NP no legitimate state interest isn't referential.  The point is an old one.  It's explicit in The Cambridge Grammar of the English Language (p. 1060):

Expressions consisting of no, any or every morphologically compounded with -one, -body or -thing, or syntactically combined with a head noun, have non-referential interpretations and cannot serve as antecedent of a [non-restrictive] relative, but they can be followed by [restrictive] relatives.

CGEL gives this rule (contrasting *No candidate, who scored 40% or more, was ever failed with the grammatical No candidate who scored 40% or more was ever failed), but not, of course, the That Rule, since the That Rule "is not descriptive of actual usage" and so "had no place in a descriptive grammar" (as Huddleston put it in e-mail on 28 May).

Goldfarb notes that Justice Kennedy "routinely violates the prescriptive which/that rule" -- as any reasonable person would. Here are three more instances, supplied by Goldfarb, of restrictive which (in bold) from the Lawrence opinion:

For many persons these are not trivial concerns but profound and deep convictions accepted as ethical and moral principles to which they aspire and which thus determine the course of their lives.
  Romer invalidated an amendment to Colorado's constitution which named as a solitary class persons who were homosexuals, lesbians, or bisexual either by 'orientation, conduct, practices or relationships,'...
 If protected conduct is made criminal and the law which does so remains unexamined for its substantive validity, its stigma might remain even if it were not enforceable as drawn for equal protection reasons.

The first of these is especially compelling, since the which is parallel to the earlier to which, and we all know that restrictive which, rather than that, is obligatory with fronted prepositions, with the result that parallelism can be maintained only by the choice of which in the bolded position.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:23 PM

Case nuances

My sophomore seminar students have weekly assignments to collect real-life examples relevant to some point of usage in English and to discuss their significance.  Some of these examples are surprising and thought-provoking.  Here are three from the last (both most recent and final) assignment, on pronoun case, illustrating nuances in the choice between nominative and accusative case.


First, what looks like a simple example of non-standard accusative case in a coordinate subject:
    (1)  Me and Paco are best friends.
Ah, but things shift when I tell you that Paco is a chihuahua.  Suddenly, (1) doesn't sound so bad any more, and its standard version --
    (2)  Paco and I are best friends.
no longer sounds so good; (2) humanizes Paco, inappropriately to my mind and to the minds of others I've consulted.  (Serious dog-lovers might feel otherwise.)

Faced with the choice between (1) and (2) in a moderately formal setting, I'd reject them both and go for something like Paco's my best friend.

Second, an example of non-standard nominative case in a coordinate object:
    (3)  Rachel wants you and I to...
Google turns up over 500 examples of "wants you and I to" (many of them in religious material, for some reason), and over 700 for "want you and I to":

The Star wants you and I to register to access stories on the site.
www.polspy.ca/items/2004/08/20/736.php

He wants you and I to be evangelists, to help others to go to Heaven, with our word and our deeds, with our life!
biblia.com/jesusbible/isaiah7.htm

Now, many people who reject things like
    (4)  Rachel likes you and I.
find examples like (3) considerably better -- not fully acceptable, but considerably better.  I share this judgment.

The effect seems to have something to do with the fact that the coordinate NP is interpreted as the subject of the VP that follows it.  It also seems to be specific to the verb want; hits for other verbs are less than 10% of those for want: raw hit numbers of 48 for "expects", 34 for "forces", 33 for "needs", 12 for "tells", 10 for "asks", and 0 for "likes".  (These are almost all religious in content.)  In any case, the phenomenon deserves some further exploration.
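The "less than 10%" comparison is easy to verify from the raw counts. This is a rough sketch under one assumption: the "over 500" figure reported above for "wants you and I to" is treated as a baseline of exactly 500, so the shares it prints are upper bounds on the true proportions.

```python
# Assumed baseline: the "over 500" hits for "wants you and I to",
# taken here as exactly 500 (a lower bound on the real count).
want_hits = 500

# Raw hit counts for the other verbs, as reported in the text.
other_verbs = {
    "expects": 48,
    "forces": 34,
    "needs": 33,
    "tells": 12,
    "asks": 10,
    "likes": 0,
}

# Share of each verb's hits relative to the baseline for "wants".
shares = {verb: hits / want_hits for verb, hits in other_verbs.items()}

for verb, share in shares.items():
    print(f"{verb}: {share:.1%} of the hits for 'wants'")
```

Even the most frequent of the others, "expects", comes in under a tenth of the baseline.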

Third, accusative us (rather than we) as a determiner in a subject NP:
    (5)  All us old folk are going to bed now.
The usual examples of personal pronouns as determiners are things like
    (6)  We/Us old folk are going to bed now.
in which we is labeled as the norm, with "very colloquial and dialectal varieties having accusative us" (Cambridge Grammar of the English Language, p. 459).  My own judgments are that we in (6) is hyper-formal, while us is decidedly informal, so that neither variant is comfortable for me in most formal contexts.  In (5), on the other hand, we strikes me (and a fair number of others) as simply unacceptable:
    (7)  ??All we old folk are going to bed now.

The subject NPs in (5) and (7) have an instance of "predeterminer" all -- a use of all in which it combines with a full NP (that is definite and plural), in which use it alternates with a construction having an explicit partitive in of.  The explicit partitive has accusative objects of of, of course:
    (8)  All of us/*we old folk are going to bed soon.
My hypothesis is that the contrast between (5) and (7) reflects the contrast within (8).  (For what it's worth, the contrast between (5) and (7) is even starker for me when the predeterminer is both rather than all.)

And that's the top of the crop for this week.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:44 PM

Locating the sarcasm bump?

In a significant advance for the modern science of phrenology, Dr. Simone Shamay-Tsoory and others at Haifa University have located the brain regions responsible for "understanding sarcastic comments": the right ventromedial prefrontal cortex. (S.G. Shamay-Tsoory and R. Tomer, "The Neuroanatomical Basis of Understanding Sarcasm and its Relationship to Social Cognition". Neuropsychology, 19(3), pp. 288-300 (2005)). The abstract:

The authors explored the neurobiology of sarcasm and the cognitive processes underlying it by examining the performance of participants with focal lesions on tasks that required understanding of sarcasm and social cognition. Participants with prefrontal damage (n = 25) showed impaired performance on the sarcasm task, whereas participants with posterior damage (n = 16) and healthy controls (n = 17) performed the same task without difficulty. Within the prefrontal group, right ventromedial lesions were associated with the most profound deficit in comprehending sarcasm. In addition, although the prefrontal damage was associated with deficits in theory of mind and right hemisphere damage was associated with deficits in identifying emotions, these 2 abilities were related to the ability to understand sarcasm. This suggests that the right frontal lobe mediates understanding of sarcasm by integrating affective processing with perspective taking.

I shouldn't be too sarcastic here -- the paper is interesting and suggestive. However, it exemplifies the tendency of scientists to assume without discussion that the common-sense categories of conscious experience must be in one-to-one correspondence with brain regions and with components in a functional "boxology". (And often with genes as well, though that's a different story.) So when I read a paper whose second section heading is "The Anatomical Basis of Sarcasm", I get a sinking feeling: here we go again.

There's a compelling critique of neo-phrenology in Martha Farah's 1994 article, "Neuropsychological inference with an interactive brain: A critique of the locality assumption", Behavioral and Brain Sciences, 17, 43-61. Plenty of others, before and since, have highlighted the problems that arise when we reason uncritically from lesion effects or from subtractive functional imaging to the functional and anatomical locality of some mental process or content. This is not to say that brain function is homogeneous, or that it is necessarily wrong that the right ventromedial prefrontal cortex "mediates understanding of sarcasm by integrating affective processing with perspective taking".

The BBC News story about this publication is significantly less credulous than the BBC norm, using phrases like "scientists say", and moderating the suggestion that these "findings might help to explain autism features" with quotes from the National Autistic Society that "The causes of autism are still being investigated", and that "Many experts believe that the pattern of behaviour from which autism is diagnosed may not result from a single cause".

I learned about this from Justin Busch at Semantic Compositions, who has posted an interesting discussion of some other aspects of the paper, especially the authors' claim that the sarcastic passages in their experiments were read with "sarcastic intonation". Justin linked to an old post of mine, which offered an alternative to Steve Pinker's theory of "could care less" as sarcastic. He didn't link to the posts where I criticized the idea that such a thing as "sarcastic intonation" actually exists at all (here, here and especially here), and this reminded me again of the lack of a topical index to Language Log.

I still don't have any general solution to that problem, but here is a list of our posts on sarcasm:

Reverse sarcasm? (Mark Liberman)
Scalar inversion and the unique cephalopod of negation (Mark Liberman)
How does the devil admonish Kerberos? (Mark Liberman and David Beaver)
Improve your love life through the power of pragmatics (Mark Liberman and David Beaver)
The FCC and the S-word (again) (Mark Liberman)
Aw+ (Mark Liberman)
From Just So Stories to science, in biology and in pragmatics (Mark Liberman)
Lederer should care less (Eric Bakovic)
Caring less with stress (Mark Liberman)
Still on the hook (Eric Bakovic)
"Could care less" occurs more (Mark Liberman)
Negation by association (Mark Liberman)
Speaking sarcastically (Mark Liberman)
Most of the people in the world could care less (Mark Liberman)
Caring less all the time: a variant of the etymological fallacy, and some cautions about the pragmatics-phonetics connections (Arnold Zwicky)

I should also cite the Second Law of Pragmatodynamics (from Duding out): "In all isolated cultural exchanges, irony increases."

Posted by Mark Liberman at 10:03 AM

Language Quiz #4: the answer

The answer to language quiz #4: Hausa.

This was a single sentence (audio clip here) from a VOA radio broadcast about events in Uzbekistan.

In usual Hausa orthography, it would be Jami'an Gwamnati sun ce mutane tara sun mutu, wasu kuma talatin da huɗu sun jikkata, lokacin da sojoji suka yi harbi cikin taron masu zanga-zangar, meaning "Government officials confirmed 9 people dead and 34 injured when soldiers opened fire during a protest." [The transcription and translation were kindly provided by Will Leben].

Quite a few readers were able to find the answer quickly, by listening to the clip, picking out a clear word (usually the last one, zangazangar meaning "protest"), and searching on Google. Either {"zangazangar"} or {"zanga zangar"} produces lots of Hausa pages and not much else. The combination of alphabetic orthography and Google is an interesting new tool for language identification!

Some people documented their guesses on line, for example this weblog entry by Patrick Hall at Infundibulum. Some others contacted me by email with the correct answer: Aron Burrell, Jarek Weckwerth, Language Hat, Artur Jachacy and Bay Elliot (I hope I didn't forget anyone!) There were a couple of plausible wrong answers as well.

Hausa is an Afroasiatic language, spoken by about 25 million people in northern Nigeria and nearby areas, and used as a lingua franca by tens of millions of others across west Africa. Some resources on Hausa are available from a page at UCLA, especially five hours of videos on Hausa Language and Culture for which mp3 audio and transcripts are available on line.

Here's a version of the quiz sentence with long vowels and tones:

Jaami'an Gwamnati sun cee mutaanee tara sun mutu,
 L  L H    H  H L  H   L   H L  H   H L  H   H L  

wasu kuma talaatin da hu'du sun jikkata,
 H H  H H  L L  H   L  H  H  H   L  H L 

lookacin da soojoojii suka yi harbii cikin taaron maasu zangazangar.
 H  L H   L  H  H  H   H L  H  H  L   H H   L  H   L  H  L  L L  HL

Here are spectrograms, pitch tracks and waveforms for the three subphrases, with a time-aligned orthographic transcript:







[I originally promised to post some hints, and to post the answer on Thursday. I apologize for being late -- a cancelled flight on my return from Chicago in mid-week disrupted my schedule a bit. The Language Log Circulation Department offers, as always, a full refund of subscription fees to any reader who is less than fully satisfied.]

Posted by Mark Liberman at 08:04 AM

Pass the hát.

a -a = 35,800,000. According to Google that is. The "-" sign is advertised by Google as a way to remove stuff from a search. So you would have thought that any string of the form X -X would produce 0 hits. But it doesn't. Try it: a -a.

Or try espanol -espanol.

Or achete -achete: infantile as I am, I really like this one since it produces 1,890,000 hits, while Google helpfully suggests the alternative acheter -acheter, which produces no hits, surely a new record of bad performance for a search enhancing feature.

Or else try agreable -agreable, which yields 1,450,000 hits. Restricting a search for agreable to French pages only also yields 1,450,000 hits. Coincidence, n'est-ce pas?

By now you must think you've figured out what Google is doing. Simple positive queries return hits that ignore diacritics, while the negation operator removes stuff that perfectly matches its ASCII argument, right? But Google search results are never as simple as they appear.
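That guess can be written down as a toy model. The sketch below only illustrates the hypothesis; it is not a description of what Google actually does. The function names `fold` and `matches` and the three-page "web" are made up for the example. Positive terms are compared after stripping diacritics, while negated terms must match a word exactly.

```python
import unicodedata

def fold(word):
    """Strip diacritics, e.g. 'hát' -> 'hat' (decompose, drop combining marks)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def matches(query, document):
    """Hypothesized rule: positive terms ignore diacritics, '-' terms are exact."""
    words = document.lower().split()
    folded = {fold(w) for w in words}
    for term in query.lower().split():
        if term.startswith("-"):
            if term[1:] in words:        # exact match removes the page
                return False
        elif fold(term) not in folded:   # diacritic-insensitive match
            return False
    return True

# Hypothetical three-page "web" (an assumption for illustration only)
pages = ["elle achète du pain", "achete maintenant", "red hat linux"]
hits = [p for p in pages if matches("achete -achete", p)]
print(hits)
```

Under this rule a query of the form X -X keeps exactly those pages where X shows up only in accented form, which would account for the French and Spanish examples above.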

First, try resume -resume. No hits. Weird. You might speculate that the earlier explanation is right, but words that Google thinks are of financial value, i.e. adwords that might be sold to a vendor, are indexed using a different algorithm that ignores diacritics completely.

No, no, no, a thousand times no. You still don't get it, do you? Google hits are not only never as simple as they appear, they are never simple at all. Although the problems with Google's wildcard * described in this earlier post of mine seem to be all but fixed, Google's secret algorithms are still tied up in knots that nobody understands. You don't believe me? Try some searches that include accents:

 
hat 172,000,000
matches "hat" and who knows what else.
hat hat 52,500,000
matches "hat", but now without mistaken Google extrapolation. For explanation of extrapolation see this earlier language log post (and this one too), based on a superb analysis by Jean Veronis.
hát 52,500,000
similar to above, but the results are at least ordered differently. Could diacritics be another trick we could use to remove mistaken extrapolation without repeating the whole query?
hát -hat 664,000
Matches hát with accent, loads of Vietnamese hits. It means "to sing", apparently.
hat -hát 52,400,000
Similar to hát, but different, e.g. our friend languagehat is on the first page!
hát -hát 52,400,000
Like the previous one, except "red hat" is higher ranked.
hat -hat 0
Uh, yeah, right.
hát hát -hat 11,200,000
My head aches.
hát hát hát -hat 11,300,000
Your head aches.
hát hat -hat 11,300,000
What do you mean you knew that would happen?
hát -hat -hát 0
Oh, OK, I think I get it, perfectly logical after all...
hát hát -hat -hát 11,200,000
Seems sensible, same number as hát hát -hat. But wait! Those hits were Vietnamese and these are in English. And none of them are for "hat". They are for "hats". Plural. WTF?
hát hát -hat -hats 671,000
OK, so there were "hats" in the "hát hát -hat" count, albeit not in any pages of hits I sampled.
hát hát -hat -hát -hats 0
Back to zero again! But what were the hats doing there in the first place?

 
This is all scary stuff if, like me, you want to use Google counts for complex boolean searches to get a statistical handle on how language works. Using Google to measure language frequencies is like trying to measure the circumference of the Earth by putting live snakes of unknown length end to end around the equator. If I tell you the answer is 68,000,000 snakes, will you be any the wiser?

Posted by David Beaver at 03:44 AM

May 28, 2005

Gone to get pants: a handwriting recognition story

This story is from The New Yorker (May 30, 2005, p. 95), in the bottom-of-column series headed "Constabulary Notes from All Over", and it is repeated here because it did make me giggle. The linguistic angle (apart from the fact that real syntax aficionados will see an occurrence of may with past time reference) is a pessimistic point about artificial intelligence: nonlinguistic context affects handwriting recognition in ways that computers are unlikely to be able to simulate any time soon. I don't think you'll see the punchline coming.

From the Boulder (Colo.) Daily Camera.

   At least one driver reported Saturday evening that a nude man was "streaking" on eastbound U.S. 36 in Broomfield, according to police and police scanner reports.

   Broomfield police officer Jim Alston said there are no suspects. He said an officer found an abandoned car in the vicinity of the incident with a note that he thought read: "Gone to get pants," and that he thought the car may have belonged to the streaker.

   However, Alston said, it turned out the note read: "Gone to get parts."

Posted by Geoffrey K. Pullum at 03:32 PM

Communication via erectile appendages

Here's another remarkable new invention from those wacky engineers at MIT. Robotic appendages are built into the pelvic area of an article of clothing. When a member of the opposite sex gets close, proximity sensors register the approach. This automatically triggers the response of an erectile system that stiffens two pointed, foot-long probes, extending them at jaunty angles in the direction of the Approaching Other. The idea is to send a clear message about sexual availability.

No, it's not what you think. At least, I don't think it is. We're talking about J. Meejin Yoon's Porcupine Defense Dress:

It seems that erectile appendages, like phrases, are interpreted in context.

Posted by Mark Liberman at 07:41 AM

May 27, 2005

George W. Bush and Sally O'Reilly's ass

The May 26 issue of Nature dramatizes the danger of bird flu, using the medium of a fictional weblog.

The format is backwards, in the sense that its entries run forwards in chronological order. This first-things-first order is the reverse of the blog-standard last-things-first order, presumably to make it easier to follow the story in fictional time. In other ways, though, it's an honest attempt to imitate a bloggy style, including what must be the first use in Nature of the expression "my ass" to communicate skepticism.

The fictional blogger is Sally O'Reilly, "a freelance journalist based in Washington DC", who has "been researching a book on pandemic preparedness" when a pandemic breaks out for real. In fact, the piece was written by Declan Butler, described as "Nature's senior reporter in Paris". I didn't realize that Nature had reporters resident around the world -- I thought it was a scientific publication.

Anyhow, the bird flu story is indeed an important one, well worth dramatizing. See the whole issue, mostly accessible to the public, for more information, or follow the story as it develops on ProMED-mail. But the "ass" comment is really kind of curious. It's presented as a reaction to a statement by U.S. President George Bush:

"At this hour, the World Health Organization has declared a full-scale pandemic influenza alert, with person-to-person spread lasting more than two weeks in Cambodia and Vietnam. During previous influenza pandemics in the United States, large numbers of people were ill, sought medical care, were hospitalized and died. On my orders, the Department of Homeland Security and the Department of Health and Human Services have today implemented the nation's draft Pandemic Influenza Response and Preparedness Plan. It will serve as our road map, on how we as a nation, and as a member of the global health community, respond to the pandemic. We are ready. Thank you, and may God bless America."

Declan-as-Sally's reaction is "Ready, my ass!"

Two things struck me about this. First, this is another milepost in the process that John McWhorter wrote about in the WaPo a couple of years ago: a "narrowing of the gap between the formal and the informal in public discourse". A few days ago, the New Yorker's film critic used the word "fucking" to emphasize his dislike of a new movie, and now one of the world's top scientific journals expresses its opinion of a U.S. President's remarks by reference to a (fictional) writer's metaphorical butt.

Second, the whole thing here is fictional -- the flu pandemic and the presidential statement as well as the blogger and her informal reaction. It's true enough that a flu pandemic is likely, sooner or later, and that the world is both woefully underprepared and not nearly concerned enough. But why should a Paris-based writer, in a London-based publication, choose George W. Bush for fictional skewering by a Washington-based freelancer? If a flu pandemic breaks out next winter, is Bush's statement likely to be any more contemptible than Blair's -- or Chirac's, or Schroeder's, or Hu's? Or (EU President Jean-Claude) Juncker's or Kofi Annan's, for that matter?

Well, the answer is clear. Nature's Paris correspondent is using anti-American and anti-Bush prejudice to promote awareness of an international public health issue. I wonder if Nature's editorial board explicitly considered this step. My guess is that they did not -- having decided to dramatize a possible pandemic in blog form, Butler was just doing what comes naturally, these days, to European intellectuals.

Posted by Mark Liberman at 04:25 PM

May 26, 2005

Memory of qualities: armed structure and crystals

My brother Richard brought me, all the way from southern Spain, a precious object: a color brochure for a new residential condo development in Pilar de la Horadada, near Alicante. It has been built with an eye to the many British people who are buying vacation properties in Spain, and located (so the brochure says puzzlingly) "From 2 km. to the sea." The brochure goes on to claim that in this development:

Each detail meticulously has been studied to create
a confortable home, a warm space for a stay wel
coming. In Basi Residential we have designed each
corner thinking about its welfare.

Meticulously has been studied? Wel  coming? Confortable? Each corner designed by thinking about its welfare? Even a casual browse of the pamphlet gets one's linguistic antennae tingling. A look at the floor plans of the units reveals that the floors are labeled "LOW PLANT", "FIRST PLANT", and "SECOND PLANT". And as we go on into the description of the condos, we increasingly realize that something terrible has happened. I have a suspicion that these people have done something truly catastrophic: I think they have trusted a free Internet translation service they found on the web. Look at the rest of the text:

MEMORY OF QUALITIES

Armed Structure of concrete.

Facade of brick and remainder of outsider plastered wall.

Carpentry exterior of aluminium lacquered blank.

Crystals in double glass climalit or similar.

Carpentry interior of Word lacquered blank, with the armored door of entrance of high security.

Interior to puta an end to painting plastic color.

Baths and kitchens shaped to the ceiling.

Floors of platform of Word in the living rooms and in the remainder of the dwelling. I pave earthenware.

It cooks furnished with sink of stainless steel of a breast and plate rack, fridge, plate oven and extractor fan.

Heater of gas or electric in the patio or laundry.

Preinstalation of air conditioned.

Strong box of security.

Built-in cabinets in dormitories an attic.

Installation of pumbing in copper, complying the regulation in force.

Installation of electricity, according to regulation with antenna t.v.

Solarium with pergola of wood.

Common pool, green zones and parking.

Then comes a description of the town of Pilar de la Horadada:

In the south of the Valencian Community, opening step to the Costa Blanca, its found Pilar de la Horadada. Its coastal seaboard of more than 4Km.,its spring climate during all the year an the behavior of its peoples do of this municipality a privileged place to pass some unforgettable holidays.

To travel through the maritime walks, to enjoy the sun, of the nature, to take a bath in its tranquil an transparent water and to navigate the Mediterranean are some of the attractive tourist that offers this nail.

Leaving behing the coast we enter in the zone of mountain, that constitute areas of great environmental and ecological value, arranging of a natural area protected where besides the enjoyment of the nature activities they can be practiced and sports related to this.

Besides Pilar de la Horadada counts on numerous sports facilities : fiel of golf, sports port, air conditioned swimming pool, sports trails, covered building, etc.

Your place of leisure and rest!!!

When one has finished giggling, and one has noted that "The development business is reserved the right to make any change that be necessary", one finds oneself wondering: who on earth could approve English this bafflingly dreadful for publication in a full-color brochure that must have cost thousands of euros, while knowing so little English that they could not see they were signing off on impenetrable gibberish? Why was no one who knew English from long acquaintance brought in to cast an eye over it? My brother spends several months each year in Alicante; they could have found him. After reading two lines he could have told them, "Hold the presses; you don't want to print this." For fifty euros he could have spent a couple of hours finding out what they intended (what an armed structure is, what those crystals are, what is meant by carpentry exterior of aluminium lacquered blank, how the baths and kitchens could be "shaped to the ceiling", and so on) and could have rewritten it for them. Why did they not call him, or call someone, any visitor from England who was able to read?

Could it be pride, an unwillingness to admit to not being adequately fluent in the nascent global language of commerce and the most frequently encountered language spoken by visitors to Spain? ("I speee-eek Eeng-lish," says the waiter Manuel from Barcelona in the John Cleese "Fawlty Towers" series, very proudly and dramatically; "I learn it from a boo-ook!") The frequent uses of its for is and an for and suggest the material really was typed by the hands of a human being, someone who thought they knew what they were typing, and did know various common small English words, though not enough to tell them apart. I'm still not sure what I think. But perhaps someone called up a web-accessible translator, typed in the Spanish text, and — not realizing that the machine translation problem has yet to be solved — trusted the output to be reliable, and handed it over to someone who retyped it adding extra errors. Why "Word" would be used for wood, with a capitalized initial, I have absolutely no idea. It happens twice in the material above, but not three times (the third time we get the correct "wood"). That is not predicted by machine translation, word-processor spell-checking, or incompetent retyping; it's just an inscrutable mystery.

[Added later: Thanks to Richard Pullum for supplying the brochure, and to Candace Freiwald for doing the painful job of typing it out for me. Ray Girvan has pointed out that you can read the text online in both the English version and the original Spanish, and that if you use Google's translation engine you get something very similar to the English but not identical, as you can see here for yourself. Ray thinks it is as I suspected: a human editor who didn't know English not only trusted the Google translation (mistake number one) but also saw fit to slightly modify it (mistake number two).]

Posted by Geoffrey K. Pullum at 06:31 PM

Yoda is Luce reborn

Q Pheevr has figured it out: not only where Yodic comes from, but why it really chaps Anthony Lane's grits.

Has no one considered the possibility that Yoda is channelling the spirit of Henry Luce?

Luce, of course, was the perpetrator of Timespeak, the peculiar language of Time magazine, which Wolcott Gibbs memorably lampooned in a profile of Luce in The New Yorker; Gibbs's "backward ran sentences until reeled the mind" neatly prefigures Lane's "break me a fucking give."

Q also offers a modified version of Geoff Pullum's syntactic analysis of Yodic, and (crucially) a cartoon:

The cartoon, alas, depicts an English class long ago, in a culture far away.

I'm not convinced that Q's syntactic analysis of Yodic as (Pred S Aux) is enough. It doesn't address the ternary swaps, where there is double or nested fronting:

To fight this Lord Sidious, strong enough, you are not.
To question, no time there is.

To trace these patterns back into Time texts of the 1930s, going forward research is.

Posted by Mark Liberman at 05:35 AM

Language Log like list

Cristi Laquer at Invented Usage has recently posted "on like usage". She cites a number of blog posts on the various innovative uses of like (the hedge, the quotative and so on), including a Language Log post, and asks "If anyone knows of anything else out there, please let us know!"

The classic (non-blog) reference is Muffy Siegel's paper "Like: The Discourse Particle and Semantics" (J. of Semantics 19(1), Feb. 2002). In thinking about other references on our site, I came to three conclusions at almost the same time. There have been quite a few Language Log posts that are relevant to the use of like; it's hard to find them; and none of them summarizes the epic panorama of that protean word's patterns of usage.

To start with, here's a reasonably complete list, in chronological order, of Language Log posts relevant to like:

It's like, so unfair (Geoff Pullum)
Like is, like, not really like if you will (Mark Liberman)
Exclusive: God uses "like" as a hedge (Geoff Pullum)
Divine ambiguity (Mark Liberman)
Grammar critics are, like, annoyed really weird (Mark Liberman)
This construction seems that I would never use it (Mark Liberman)
Look like a reference problem (Eric Bakovic)
Seems like, go, all (Mark Liberman)
I'm like, all into this stuff (Arnold Zwicky)
I'm starting to get like "this is really interesting" (Mark Liberman)
This is, like, such total crap? (Mark Liberman)

It's hard to find these because we don't have a subject index or a lexical index. You can search by strings, and that works fine for (say) Pirahã, but it's essentially useless in searching for something like like. I guess some day we should fix that. Meanwhile, there's the like list. I think. (I probably missed a couple.)

As for my observation that none of these posts is, like, a systematic guide to all the, like, meanings and syntactic patterns of all the forms of like -- well, this is a blog, not a dictionary. So it's not like we should feel bad about this. Still, a survey of the origin and progress of like would make an interesting post. Someday.

Looking over the list of Language Log like titles, I also notice that different contributors have different ideas about like-related punctuation in the hedge and quotative cases. I seem to prefer commas fore and aft, while Geoff Pullum and Arnold Zwicky favor following commas only. There'll be a meeting at 8:00 in the Board Room at Language Log Plaza to settle the matter...

Posted by Mark Liberman at 05:00 AM

May 25, 2005

Mascot Names and Etymology

Frank Deford had an excellent piece today on NPR's Morning Edition praising the NCAA's decision to review its policies regarding the use of Indian names for team mascots. As I mentioned in a post some time ago, I served as the expert for the Indians who challenged the Washington Redskins' trademark on the grounds that the Lanham Act prohibits the registration of marks that are "disparaging" -- a petition upheld by the Trademark Board but ultimately reversed by a Washington D.C. district court judge. So I was naturally glad to hear a sportswriter as influential as Deford condemn Redskins as "the most offensive" of all Indian mascot names.

But Deford made one assertion that needs correcting. "Redskins," he said, "does not refer to skin color. . . A redskin was a scalp taken by Americans as bounty. The red in redskin is blood red." Not so.

True, that tale has been around for a while, and has been widely repeated by opponents of the use of the nickname, including many Indians, among them Suzan Harjo, who as it happens was one of the petitioners in the case I worked on. It makes for effective propaganda for a just cause -- I think we made an unimpeachable case that the term is and has always been racist, whatever the District Court judge may have ruled.

I don't know where the "scalp" story originated, but it isn't very plausible. The OED gives the first citation for redskin from 1699, well before the date of any of the stories about paying bounties for scalps. Some people have suggested that the phrase derives from the European and Algonquian name for the Delaware Indians, whose men would streak their faces and bodies with red ocher and blood-root. Could be, though I'm not aware of any contemporary evidence for that claim, either.

In any event, even if one of those tales were a true account of the first use of the term, neither would account for its spread and persistence in English. "The public has a feeling for utility," Michel Bréal wrote in his Essai de Sémantique, "but does not trouble itself over history." No, Indians don't really have red skin, any more than other perceived racial groupings are really white, black, or yellow, but the color names reflect an urge for synaesthetic categorization that runs very deep in cultures, reducing racial groups to elemental primaries. Even if the scalp story happened to be true, it wouldn't explain terms like redmen.

Linguists pooh-pooh the idea that the original meaning of a term can somehow persist in the collective unconscious after it has been lost to individual recall. But that picture still has a pervasive hold in popular thinking about language. I'm of two minds about the "scalp" story about the origin of redskin: as a linguist I feel obliged to correct it, but it's doing good work, and part of me is inclined to let it pass. Se non è vero. . .

Posted by Geoff Nunberg at 10:13 PM

Joy and contempt

A few weeks ago, Liz Ditz sent in a link to an article on Equestrian vocabulary from the (London) Times. The part of the article that caught my attention was its discussion of how the word hack has "moved from contempt to joy".

Hack is shortened from Hackney, which was a horse that you could hire. Therefore it was not up to much else, like the car you hire at the airport. So a hack was a sorry drudge, a horse from which not too much was expected. It was used figuratively and came to mean a literary drudge, a penny-a-liner, a term used of journalists with amiable contempt, and by journalists of themselves with a kind of epic false modesty.

But the word has been reborn in the horsey life. A smart trainer at Newmarket will ride out on the Heath on his hack, which may be a sumptuous former racehorse. It has become a verb: riders hack out on their horses, riding for the straightforward pleasure of it. A hack is not the horse but an out-and-back journey on horseback. The word has moved from contempt to joy.

In the language at large, hack seems to have at least three different etymological sources, and a dozen areas of practical association, with a bewildering variety of emotional connotations. And it's ironic that horsey hack has moved from contempt to joy, since technological hack has moved in the opposite direction, from joy to contempt.

According to the OED, the three sources for modern English hack seem to be a word for a kind of horse; a word for cutting with heavy, irregular blows; and (more obscurely) a word for the racks used to make food available to cattle or to falcons.

Specifically, we have hackney defined as

A horse of middle size and quality, used for ordinary riding, as distinguished from a war-horse, a hunter, or a draught-horse; in early times often an ambling horse.

From an early date mention is found of hackneys hired out; hence the word came often to be taken as, A horse kept for hire.

with the etymology

[a. OF. haquenée fem. ‘an ambling horse or mare, especially for ladies to ride on’; cf. OSp. and Pg. facanea, Sp. hacanea, It. acchinea (Florio), chinea ‘a hackney or ambling nag’: see Diez, Scheler, etc. (In 1373 latinized in England as hakeneius: see Du Cange.)
  It is now agreed by French and Dutch scholars that MDu. hackeneie, hackeneye, Du. hakkenij, to which some have referred the French word, was merely adopted from the French, thus disposing of conjectures as to the derivation of the word from MDu. hacken to hoe. The French haquenée and its Romanic equivalents had probably some relationship with OF. haque, OSp. and Pg. faca, Sp. haca ‘a nag, a gelding, a hackney’ (Minsheu): but, although the word-group has engaged the most eminent etymologists, its ulterior derivation is still unknown.

After being shortened to hack, hackney underwent a sequence of extensions along the lines sketched in the Times.

Meanwhile, long ago in falconry, hack was a noun for

The board on which a hawk's meat is laid. Hence applied to the state of partial liberty in which eyas hawks are kept before being trained, not being allowed to prey for themselves. to fly, be at hack , to be in this state.

This is probably connected with another kind of food-availability hack:

A rack to hold fodder for cattle. to live at hack and manger, i.e. in plenty, ‘in clover’.

The OED suggests that at least the cattle-feeding version comes from hatch (perhaps based on the design of traditional systems for controlling access to feed?):

[... another form of the words HATCH and HECK, having the consonant of the latter with the vowel of the former; cf. hetch, a variant of hatch. The other senses do not run quite parallel with those of hatch and heck, and it is possible that some of them are of different origin.]

For the commonest form of hack, the OED gives the gloss and etymology:

To cut with heavy blows in an irregular or random fashion; to cut notches or nicks in; to mangle or mutilate by jagged cuts. In earlier use chiefly, To cut or chop up or into pieces, to chop off. Const. about, away, down, off, up.

[Early ME. hack-en, repr. OE. *haccian (whence tó-haccian to hack in pieces): Common WGer. *hakkôn: cf. OFris. to-hakia, MHG., MLG., MDu., G. hacken, mod.Du. hakken.] 

There are many extended senses that seem to connect to one or another of these sources, but others whose connections are more obscure.

For example, it's plausible that the expression hack off in the sense of "to annoy" is extended from hack in the sense "to cut with heavy blows":

I am getting really hacked off now with NTL email.
I wouldn't be so hacked off about it if I didn't love country music.
But here's what really hacks me off. WHAT REALLY HACKS ME OFF. When I give a plant every advantage and it DIES ANYWAY.

But it's less clear to me where the expression can't hack [something] in the sense "be unable to manage or tolerate" comes from:

It's not that I think Janeway can't hack it alone ...
You can't hack the tactics / Of a semi automatic full rap fanatic
If he couldn't hack the accent, why did he get the part?
Partway through the gig the bloody thing disintegrated in me fist 'cos the sellotape couldn't hack the sweaty heat.

When I was a kid, we used expressions like hacking around (and sometimes hacking off) to mean something like playing or fooling around. It had to be a directed activity -- dozing in the sun would not be hacking around, but building a dam in the creek would be -- and it also had to be fun and self-motivated, so mowing the lawn would definitely not be included. Of all the OED offerings for hack, the ones about young falcons being "at hack", or cattle being "at hack and manger", seem closest to this, though the connection is far from exact. Our use certainly had no sense of chopping or cutting about it, though I guess there might have been some resonance of the "irregular or random" component that the OED attributes to hack-as-chop.

In the late 1960s, when I heard people at MIT talking about "the model railroad club hackers" or "hacking ITS" or "hacking TECO", I just assumed from context that this was the same sense of hacking as goal-directed play that I'd grown up with. This isn't exactly the sense of hacking as "an appropriate application of ingenuity" suggested by the Jargon File, but there's some sort of connection.

The black-hat hack senses "to gain unauthorized access to computer files" or "to break into a computer system by hacking" came later, as is well known. The activities denoted are roughly the same, but the connotation has changed from joyful play to devious threat.

This sort of situation, in which several different historical sources half-way merge into a highly polysemous collection of incompletely-differentiated words and phrases, seems to be commoner than one might think. Cases previously discussed here include diet and pole.

One last mysterious hack, from the OED:

The sense of hack in SHAKES. Merry W. II i. 52, ‘These knights will hack’, is doubtful. The senses, To be common or vulgar; to turn prostitute; to have to do with prostitutes; and ‘to become vile and vulgar’ (Johnson and Nares), have been suggested; but the history and chronology of this verb, and of the n. whence it is derived, appear to make these impossible.

[Update: the indefatigable Ben Zimmer reports that

Fred Shapiro on ADS-L uncovered a 1963 article in MIT's student paper, The Tech, which discusses the "hacking" of the Institute phone system. Even early on, the connotation was more "devious threat" than "joyful play".

1963 The Tech (MIT student newspaper) 20 Nov. 1 Many telephone services have been curtailed because of so-called hackers, according to Prof. Carlton Tucker, administrator of the Institute phone system. ... The hackers have accomplished such things as tying up all the tie-lines between Harvard and MIT, or making long-distance calls by charging them to a local radar installation. One method involved connecting the PDP-1 computer to the phone system to search the lines until a dial tone, indicating an outside line, was found. ... Because of the "hacking," the majority of the MIT phones are "trapped."

I can see that the "administrator of the Institute phone system" would see this as a threat or at least an annoyance, but from the perspective of a mid-1960s undergraduate, I'd have to say that this stuff sounds more playful than malicious. ]

Posted by Mark Liberman at 12:29 PM

May 23, 2005

The temptation of overnegation

In October of 1973, Saudi Arabia declared an oil embargo against the United States, to protest U.S. support for Israel in the Yom Kippur war. In March of 1974, after Henry Kissinger helped negotiate a disengagement in the fighting, the Saudis lifted the embargo. The embargo caused an oil shock that "doubled the real price of crude oil at the refinery level, and caused massive shortages in the US".

In an interview with Business Week, Kissinger said, "I am not saying that there's no circumstances where we would not use force." See "Kissinger on Oil, Food, and Trade," Business Week, 13 January 1975, 66-76. [Gawdat Bahgat, "Oil and militant Islam: strains on U.S.-Saudi relations", World Affairs, winter 2003, Footnote 14]

Henry Kissinger was the U.S. Secretary of State at the time, and he was answering a question about whether or not the U.S. would invade Saudi Arabia. Presumably he weighed his words carefully, and meant to convey a threat. But the words that he chose are puzzling.

Kissinger might have said "I'm not saying that there's no circumstances where we would use force", to hint that we might in fact use force. The statement is in some sense meaningless, since it could be truly uttered by any government official in the world at any time in history: there are some circumstances where any nation will use force, and everyone knows it. All the same, for a practiced diplomat to say this, in reference to a particular source of tension, still communicates something. The repetition of well-known facts and even tautologies is often informative, if only by communicating that a certain issue is relevant: "Money doesn't grow on trees"; "What's right is right".

But Kissinger didn't say that. He threw in an extra negative: "I'm not saying that there's no circumstances where we would not use force". Did he mean that his default framework was "there's no circumstances where we would not use force", i.e. "we will use force in every circumstance"; and then back off from this uniform belligerency a bit by saying "I'm not saying that..."?

I don't think so. This seems like a classic overnegation. One way of looking at this is that in sentences with multiple negations, people get confused about how the polarity works out, and therefore put in the wrong number of negatives and end up saying the opposite of what they mean. Another perspective is that negation is a feature that sometimes seems to spread across multiple locations in a phrase. Though formal modern English is not a negative concord language, speakers are still often tempted by the old negative-concord patterns that still apply in colloquial phrases like "it ain't no cat can't get in no coop".
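For readers who like to see the polarity worked out mechanically, here is a minimal sketch. It rests on a loud simplification: "I'm not saying that P" is treated as a flat denial of P, which is stronger than what a hedging diplomat actually commits to. The function names and the three-circumstance toy world are my own invention for illustration, not anything from Kissinger or the literature.

```python
from itertools import product

def intended_threat(force):
    # "I'm not saying that there's no circumstances where we would use force"
    # Simplified reading: NOT (there is NO circumstance in which we use force).
    return not (not any(force))

def overnegated(force):
    # "... no circumstances where we would NOT use force"
    # Simplified reading: NOT (there is NO circumstance in which we refrain).
    return not (not any(not f for f in force))

# Each tuple says, for three hypothetical circumstances, whether force is used.
for force in product([False, True], repeat=3):
    print(force, intended_threat(force), overnegated(force))
```

On this reading the intended version comes out true whenever force is used somewhere, while the version Kissinger actually uttered comes out true whenever force is withheld somewhere. That is a reassurance rather than a threat, which is why the extra negative looks like a slip.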

Some may speculate that Kissinger did this on purpose, to make the interpretation of his threat even more obscure. Maybe so, but I suspect that this reaction falls into the pattern described by another Kissinger quote: "The nice thing about being a celebrity is that when you bore people, they think it's their fault."

[Kissinger quote via Gabriel Nivasch]

Posted by Mark Liberman at 09:29 AM

May 22, 2005

Does Hayden Christensen have a clitic problem?

An email from Lane Greene about the phonology and phonetics of Episode 3:

It seems you haven't seen Star Wars yet. If you do, I'd love your take on the phonetics of Episode 3. John McWhorter's old post "Clitics on Broadway" was useful because it's clarified for me why Hayden Christensen's acting in the movies is SO ANNOYING: he has a clitic problem.

Except for Obi-Wan, the good guys in the Star Wars original trilogy all spoke colloquial American English, clitics and all. But Christensen irritatingly feels the need to look Deep and Serious, which of course requires full "THEM" and never "THUM" or "'EM", "YOU" instead of "YA". Yet at other times, palling around with Obi Wan near the beginning, he speaks plain American clitic-filled English. So when the clitics go and we're back to full pronouns, you just tense up: Anakin is having another Moment and you feel it coming like a disturbance in the Force.

Eric Bakovic points out Leia's code-switching in the original trilogy, but it's far less annoying to me.

Ugh. Still, light-sabers and all. I'm a Star Wars baby. I had fun.

I expect that I will, too, but I probably won't be able to get to see the movie until next weekend. And we'll have to wait for the DVD to check out Hayden Christensen's clitic problem instrumentally (because we wouldn't analyze a pirated copy, needless to say). But Lane's analysis rings true: there's a certain kind of solemn High Seriousness for which No Weakened Words is a common actor's emblem.

Posted by Mark Liberman at 10:53 PM

Language Quiz #4

It's been a while since we had a Language Quiz, so here's the audio for another one.

You can see some links to examples of how people went about solving another quiz here.

I'll provide some hints over the next couple of days, and the answer on Sunday.

Posted by Mark Liberman at 10:43 PM

Five more thoughts on the That Rule

As the mail on restrictive which vs. that pours in, I have the feeling of deeper understanding about some of it, and of deeper bafflement about other aspects.  Five thoughts on the That Rule, which I considered most recently in these precincts here:

  • First thought: There  is a sense (alluded to by James Smith in an ADS-L posting of 13 May) in which text that obeys the That Rule is clearer than text that does not, though text of the latter sort is not actually UNclear.
  • But on second thought, this extra bit of clarity is achieved by a prescription that has at least three odd characteristics: it seeks to eliminate an option long available in the formal standard written language; in doing so, it insists on increasing the redundancy of this variety (though prescriptions are usually profoundly conservative in this regard, insisting that redundancy in the standard language is exactly right in amount and in exactly the right places); and it opts -- surprisingly -- in favor of the variant (that) which is widely perceived as being the more INformal alternative.
  • Third thought: The That Rule has disseminated very unevenly.  The primary agents of its spread seem to be those responsible for overseeing the editing of copy for newspapers and book publishing, especially in the United States.  American journalists figure prominently in the story.  Meanwhile, many people not involved in the editing enterprise (including scholars of grammar and usage) seem to have missed the "rule" entirely or to have tuned it out as irrelevant to their concerns.  One result is a startling disparity between, on the one hand, the advice books and style sheets that presses put out and, on the other hand, the grammars of English (some intended for students) that these same presses publish.
  • Fourth thought: In the process of dissemination, the That Rule has made its way into textbooks and manuals for writers.  Once there, the prescription might well go on forever as a "zombie rule"; no matter how many times, and how thoroughly, it is executed by authorities (like Quirk, Biber, Huddleston & Pullum, or, for that matter, me), it continues its wretched life-in-death in style sheets and grammar checkers and the like.
  • Fifth thought: For some, the zombie quasi-life of the That Rule has led to its being seen as a mere matter of "house style", like capitalization practices or font choice.  Well, this might explain the deep puzzlement that I get from editors when I insist on my right to use restrictive which, and their reaction when I charge that the disparate recommendations of their presses make them look like a pack of hypocrites or fools -- which is to conclude that I'm a lunatic.  They don't see an issue.  House style is house style, right?

Now for some details.

Thought 1 begins with James Smith's posting:

I support "which/that" prescriptivism, in particular in formal language.  I find documents written in compliance with this rule are easier to read and clearer than those that ignore it...

In a sense, this is true, though documents that "ignore" the rule are not in any way unclear, as I pointed out in my last posting on this subject: so long as the punctuation is correct, there's no ambiguity or unclarity.  Nothing's unclear in "This is a day which will live in infamy" (FDR, via Huddleston & Pullum's Student's Introduction to English Grammar).  For that matter, nothing's unclear in "This is a day, which will live in infamy", though it's a really stupid sentence.

But following the That Rule makes your text EVEN CLEARER, since it will have redundant indications of the distinction between restrictive and non-restrictive relatives.  Every relative clause will have the distinction marked BOTH by punctuation AND ALSO by the choice of subordinator.  Hammer.  Nail.  Bang TWICE.  Who could argue with that?

Smith in fact goes on to suggest that the That Rule is really important only in the special circumstances of formal written English:

For general and informal usage, I - like most people IMO - ignore the "rule" with no great harm.  I first encountered this rule in graduate school - perhaps it is not meant for the unwashed masses.  :)

On to my second thoughts, which like Gaul come in three parts.

The That Rule is a "prescription by excision": it proposes to eliminate an alternative that had been long available in the formal written standard language. (The availability of which as a restrictive relative subordinator for hundreds of years is amply demonstrated in the literature on English grammar, in particular on the syntax used by "good writers".)  This is important: the That Rule is a proposal to CHANGE the formal written standard, by removing some of its flexibility.

That's already peculiar, though not unprecedented.  There's the Possessive Antecedent Proscription (*Mary's father adores her), the No Stranded Prepositions rule (*Which parent does the child take after?), the No Split Infinitives rule (*I'm going to France to not get fat), and a number of less well-known proscriptions that I hope to talk about in this space eventually.  All of these are attempts to get people who write formal written standard English to give up some of the options that have long been available to them.

Prescriptions by excision arise, I think, from two motivations: misguided "theoretical" considerations -- possessives are adjectives, so pronouns can't refer back to them, infinitival to forms a word with the verb that follows it, English should be like Latin, etc. -- and well-meant, but also misguided, attempts to prevent writers from falling into error by totally keeping them away from the structures in question -- avert the ambiguity of Mary's mother thinks she is adorable by outlawing ALL possessives as antecedents of pronouns (recall Smith on the "great unwashed" above).  Or, of course, both. 

What unites these two sorts of motivations is that someone has to have the bright idea.  Somebody has to think through to a hypothesis (however wrong) about English syntax.  Somebody has to note a problem in writing or reading and formulate a "rule" (however overbroad) to cover the case.  Prescriptions by excision have originators; if we are lucky, we can even identify them.  And these originators had to make their bright ideas explicit, put them into words.

Now, this sort of explicit attempt at tinkering with the language just can't be common, welcome, or particularly successful.  People have to get on with their lives, after all.  Prescriptions by excision are, as a result, pretty odd ducks.

The That Rule is not only a prescription by excision.  It's also a prescription in favor of redundancy.

The usual prescriptive take on redundancy is that whatever the formal written standard is, it has EXACTLY the right amount of redundancy, in EXACTLY the right places.  Preserve things just as they are, that's the ticket.  Reject the non-standard, the informal, the innovative, the regional, the spoken.  Irregardless, return back, continue on, etc. are pleonastic, but lack of 3sg -s in non-standard varieties is insufficiently informative.  There is no reasoning from first principles here: whatever is, is right.  (The prescriptive take on (ir)regularity is similar.)

So it's really odd to hear advice that redundancy in the formal written standard language should be increased.

As if this all weren't odd enough, there's the fact that the prescribed variant, that, is the one that's widely perceived as being the more informal alternative.  Prescriptions are generally hostile to variants that are perceived (correctly or, as is often the case, incorrectly) as being informal.  We are told not to strand prepositions, not to use reduced auxiliaries (I'm), not to use negative verb forms in n't (don't), etc., all on the supposition that these are markedly informal alternatives.  (In fact, in most contexts the other alternatives are markedly formal and these variants are stylistically unmarked, but let that pass.)  The variant that is unaccented, "more reduced", than the variant which, so many careful writers choose which in order to convey seriousness and emphasis; they are then baffled and outraged -- I think, rightly so -- when teachers take grades off for their choice of restrictive which over that.

Ok, that's all three parts of my second thoughts.

Thought 3 begins with a disparity I've written about here before.  The big modern scholarly grammars of English -- Quirk et al., Biber et al., Huddleston & Pullum et al. -- don't recognize any such thing as the That Rule.  Q and H&P simply list that and which (without comment) as alternative subordinators in restrictive relatives, while B goes to the trouble to provide corpus evidence in favor of restrictive which.

The grammars intended for students don't recommend the That Rule either.  The Oxford English Grammar (by Greenbaum, from the Quirk shop) just lists the alternatives, while the Cambridge Guide to English Usage (Peters, 2004) allows both variants and cites the Biber evidence, though noting that the Chicago Manual of Style endorses the That Rule "and American editors and writers more often seem to be exponents of it than their counterparts elsewhere."

As I've said here before, the high-end advice manuals are more open to which than the low-end guys.  But, as it turns out, I have been unfair to Bryan Garner, author of several Oxford-published usage dictionaries.  (The most recent -- 2003 -- of these is Garner's Modern American Usage, the title of which I'm inclined to see as OUP's attempt to distance themselves from Garner's idiosyncrasies.  Note that comprehensive reference works on English grammar and usage have for some years been the collaborative work of many people.  This makes sense, given the scope of the task.  Works by individuals, like Garner's dictionaries, look like expressions of individual and eccentric taste, in a tradition from another time.  I'm not saying people shouldn't be allowed to publish whatever their personal opinions are about their language, but I'm seriously unhappy when a major press publishes these assembled crotchets as a manual of usage.  But I wander from my point...) 

So what does Garner say in GMAU?  He's an idiot.  On page 832:

Suffice it to say here that if you see a which with neither a preposition nor a comma, dash, or parenthesis before it, it should probably be a that.

(This is a somewhat improved, though still baroque, version of the proscription I formulated in my last posting here.  Neither version encompasses the obligatory which after that in things like "That which does not kill me makes me stronger".  This one's in H&P.  But the larger question is whether a proscription is appropriate at all.)
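Garner's formulation is mechanical enough to try out. What follows is a rough sketch of it as a checker, not a reconstruction of any real grammar checker: the function name, the short preposition list, and the decision to skip sentence-initial which are all assumptions of mine.

```python
import re

# Illustrative and deliberately incomplete preposition list (an assumption).
PREPOSITIONS = {"in", "of", "to", "for", "on", "by", "with", "at",
                "from", "about", "under", "through"}

def flag_restrictive_which(text):
    """Return character offsets of each 'which' the heuristic would question."""
    flagged = []
    for m in re.finditer(r"\bwhich\b", text, flags=re.IGNORECASE):
        before = text[:m.start()].rstrip()
        if not before or before[-1] in ".?!":
            continue  # sentence-initial 'which': outside the heuristic
        if before[-1] in ",(-\u2014":
            continue  # comma, parenthesis, or dash before it: left alone
        words = re.findall(r"[A-Za-z']+", before)
        if words and words[-1].lower() in PREPOSITIONS:
            continue  # preposition before it: left alone
        flagged.append(m.start())
    return flagged

print(flag_restrictive_which("This is a day which will live in infamy"))
print(flag_restrictive_which("This is a day, which will live in infamy"))
print(flag_restrictive_which("That which does not kill me makes me stronger"))
```

The first call flags FDR's which and the second flags nothing. The third flags the obligatory which after that, which is exactly the gap in the proscription noted above.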

This is Strunk & White, the Associated Press style manual, the American Psychological Association style manual, the Chicago Manual of Style, and tons of house style sheets.  And the magisterial Robert Hartwell Fiske in The Dictionary of Disagreeable English (2004), who labels restrictive which as "solecistic for that" (p. 320) and goes on to tell us:

In the United States, the restrictive, or defining that is used when the clause it begins is necessary to the meaning of the sentence; the nonrestrictive, or nondefining, which is used when the clause is not necessary, when it is parenthetical, to the sentence.  Which clauses are generally separated by commas...; that clauses are not.  Observing the distinction between these two words and their clauses is indispensable to understanding clearly and effortlessly the sentences in which they appear.

Oi.  How did we get to this, from Fowler's 1926 tentative suggestion that English might be better if the functions of relative clauses were clearly distinguished by their subordinating words?  And what's this "in the United States" stuff?  I mean, Fowler was unquestionably an Englishman.

I'm still pawing through this stuff, but it looks like the path of dissemination for the That Rule was through people who oversee the editing of copy for publication, in newspapers, magazines, or books.  The early figures in the spread of the "rule" have connections to journalism or other forms of editing for publication.  Even today, people report that they first came across the "rule" in these contexts: on ADS-L, Bethany Dumas (13 May) tells us she "didn't always pay attention to the rule" until she got to law school; Paul Frank (14 May) says it was drummed into him in grad school at Harvard and Michigan; Jon Lighter (16 May) heard about it from "a journalism major at NYU around 1972".  Meanwhile, Doug Wilson (10 May) learned about using commas, but says he never had to deal with the That Rule until the Microsoft grammar checker began to ding at him.  Actual journalists, like Linda Seebach (e-mail of 10 May) just take the "rule" for granted.

Well, they do if they're American.  Paul Frank observed -- correctly, I think -- that British publications like the Guardian simply don't observe the "rule", and added that The Economist, which has a transatlantic audience, includes it in its style sheet but often flouts it in articles.

There is a certain tradition for prescription by excision in American journalism -- I intend to write about some other cases here -- and I don't entirely understand it.  But it seems to have contributed to the spread of the That Rule in the U.S.

Ok, thought 4.  Once the That Rule has some status in American copy editing, the institutions that prepare people for serious adult life are going to work to enforce that rule.  It's going to appear in school texts, in advice books for business people, and the like.

Once a "rule" gets this status, it's pretty well entrenched.  It will be handed down from one advice manual to another.  It will appear on standardized tests.  No matter how passionately authorities like Q, H&P, and B (and the rest of us) argue that it's a fiction, no matter how thoroughly we try to drive the stake into its heart, it will lurch on, perhaps for centuries.  (I will eventually write about other zombie rules with weaker legs than the That Rule.  But still they go on.  Buffy, we need you.)

Finally, thought 5.  Every so often, I've had to deal with editors from presses who are genuinely puzzled by the passion I have invested in protesting the That Rule.  It's just a matter of house style, they say; it has nothing to do with syntax.  You say how capitalization works, you tell people what fonts to use and how paragraphing is indicated and all that.  And you tell people which subordinators to use in restrictive relative clauses.  Why are YOU getting your knickers in a twist?  I mean (they say), this is basically all arbitrary stipulation, the only function of which is to create and maintain consistency in the press's publications.  (Some writers, like Louis Menand, even revel in arbitrary "rules" for their own sake.)

Twice, my aggressive truculence about the That Rule (and a collection of other zombie rules) has prompted editors to cave in to my craziness and let me do whatever I want.  Me.  Not anyone else, just me, for this one book.  They were then baffled that I didn't view this response as really satisfactory.  I pointed out that the scholarly books their firms published on English grammar uniformly failed to subscribe to the That Rule, so that their presses looked like packs of hypocrites and fools.  They simply didn't get it.  For them, one thing is scholarship, the other thing is practice.  They're just different.

Every so often I really run off the rails and rant.  Paraphrasing some from my e-mail to one of these presses:

Sometimes I wonder: if the people who make up style sheets and enforce them are so damn fond of arbitrary and indefensible "rules" not grounded in usage, even the usage of the intellectual elites, why don't they just invent some?  Say, your press won't publish any word with the letter "z" in it, or any sentence that begins with a vowel letter, or any occurrence of the pseudocleft construction, or the sequence "is for" (no matter how it arises)?  I can think of hundreds of entertaining "rules" of this sort.  You could hire people to enforce them, and make every book published by your press ENTIRELY CONSISTENT with them.  And then schoolchildren everywhere could be drilled on these "rules".  Your press could go down in history.

Hey, John Dryden did it for stranded prepositions.  Some still-unidentified person(s) did it for possessive antecedents for pronouns, less than a century ago.  There's plenty of territory still available.  Talk it up to your board.

Somewhat more seriously (though my rant is not entirely unserious), there are hundreds and hundreds of stylistic choices that could be excised.  The option between that relatives and zero relatives, for example: the people (that) I met.  The option between complementizer that and no complementizer, for another: I think (that) we should go.  I could go on for quite a while.  Why are we being allowed to make these choices willy-nilly?  Why isn't there a CLEAR RULE about which choice to make?  How is the That Rule different from these putative rules?

As a final little twist, I should note that at least once the choice of that over which has been justified to me by someone who pointed out that that has one letter less than which.  Brevity rules.  It's like not using the serial comma; after all, that final comma isn't necessary because the and signals the end of the list.  That is, the final comma is redundant, and therefore not necessary, so we can save a little bit of space.

Whoops.  DON'T use the final comma because it's redundant (and therefore unnecessary).  USE that because it's redundant (and therefore clearer).  What's a poor boy to do?

zwicky at-sign csli period stanford period edu




Posted by Arnold Zwicky at 02:00 AM

May 21, 2005

Secure and insecure scare quotes

A staff member at my university gave me a document to review, and the Post-It note on the front said ‘See "tagged" pages.’, referring to little colored tags sticking out to mark some of the pages. I asked her why she had put tagged in quotation marks on her note, and as I expected, she said she wasn't quite sure whether tag was a proper verb, didn't want to say anything that was wrong, and so on. The quote marks were a sign that this might not be the correct word to use. In other words, they were what scholars call scare quotes. And that's when it struck me that there were two shades of meaning for scare quotes, pragmatically distinguished.

When a full professor who works in some field like linguistics or philosophy puts a word or phrase in scare quotes, it's about the word or phrase: it's an indication that it may be the wrong one, an expression that ignorant and careless writers elsewhere have used but which really should be eschewed. Professors (well, professors of subjects like linguistics, logic, and philosophy, anyway) write from a standpoint of feeling linguistically fairly secure.

But when a staff member of lower perceived status uses the same device, the semantics is the same — the quote marks mean that this word or phrase may not be strictly correct — but pragmatically it's quite likely to indicate a very different situation, one in which the user feels insecure about whether the right word or phrase has been chosen.

One more reason why the people who try to say there isn't a distinction between semantic and pragmatic aspects of meaning (and there still are a few such people) just can't be right.

Posted by Geoffrey K. Pullum at 09:54 PM

More on Canadian French preposition stranding

Hervé Saint-Amand, a "23 year old French Canadian from Montreal" wrote to set me straight about sentences like "Le gars que je te parle de". He explains that they are far from being "unthinkable in Montreal":

I know many people who use that type of construction regularly, and sometimes I use them too. We all know they are incorrect, but many of us (chiefly young people) use them.

He goes on to say that "personally I never thought these constructions came from English", noting that he feels it has something to do with wanting to avoid the relatively formal relative-clause introducer dont, as in the "correct" alternative "Le gars dont je te parle". He cites as other examples:

l'hôpital auquel je vais
l'hôpital où je vais
?l'hôpital que je vais à

l'homme pour lequel je travaille
l'homme pour qui je travaille
?l'homme que je travaille pour

la ville d'où je viens
?la ville que je viens de

where ? marks phrases that have the same sort of status for him as "Le gars que je te parle de", i.e. understood to be non-standard but often used in informal contexts anyhow. He adds that

It's interesting (I myself am discovering all of this as I type), that some prepositions can't be used with such constructions, it would just sound completely goofy and unacceptable. For instance,

la fille chez qui je vais
*la fille que je vais chez

I can't picture anyone saying that. It's not just a matter of being incorrect with regards to "standard French". I can't imagine even the low-class school drop-outs saying that.

I imagine that some of the excellent syntacticians in Francophone Canada have explored this further. Perhaps there is relevant stuff in Yves Roberge and Nicole Rosen, "Preposition Stranding and que-deletion in Varieties of North American French", Linguistica Atlantica, v. 21 pp. 153-168 (1999) -- but I can't read it on line.

Anyhow, Hervé agrees with me that the "French" examples that I cited from Elizabeth Bear's SF novel Hammered are not French at all, standard or not standard. With respect to "Comment massif parlons-nous de?" as an answer to someone asking for a "massive favor", he wrote that

I think nobody would use such a syntax, at least not in Montreal. To me it's an obvious word-by-word translation from English.

First of all, "massif" is not an adjective one could use in this sentence. The meaning of "massif" is slightly different from that of "massive", and off the top of my head I can't find any example of "massif" being applied to an abstract object such as a favour. I don't think Montreal people would ever say "une faveur massive" (other, more English-influenced regions may use that anglicism, though).

Then the syntax is also unacceptable. We may use similar constructions, though, such as

    on parle de combien gros ?
    on parle de comment gros ?

which is definitely not "standard French", but is commonly used in Quebec, and roughly means "how big (an amount) are we talking about?". But intuitively I feel uneasy about putting other, more semantically refined adjectives in that mold, and this:

    ?on parle de combien massif ?
    ?on parle de comment massif ?

sounds awkward.

Note that "comment" is commonly used in place of "combien" in Quebec.

With respect to the second example I cited, "Est-cela si beaucoup de demander?", Hervé's opinion was:

That one's even worse. First of all, "Est-cela" is not French, neither standard, literary, Quebec, informal slang or anything. "Est-ce" would be the correct form. "C'est-tu" would be the typical Canadian form.

"si beaucoup" is not something we'd use either. I'd be surprised if anyone, in any region, used that (I've been surprised a few times in the past, though). "Tant" is much shorter and easier to use. And it's even the correct, standard way of saying it! "Tellement" could also be used.

Finally, the "de" is awkward. Maybe "à" would be better, but even that would be odd.

These variants could be used in Montreal:

    C'est-tu tant demander ?
    C'est-tu tellement demander ?

The correct French way to say it would be

    Est-ce tant demander ?

but nobody talks that way, not round these parts anyway. It would sound theatrical.

I can confidently say that it wouldn't be normal for someone from Montreal to utter these sentences as they appear in the book, though.

So here's a suggestion to Bantam/Random House and Elizabeth Bear: before the sequel to Hammered comes out, hire Hervé to check out the Canadian French bits! C'est-tu tant demander?

Posted by Mark Liberman at 09:19 PM

A 16th-century eggcorn

Language Hat points out the workings of coincidence in the origins of the verb press, in the sense "To engage (men) with earnest-money for service; to enlist by part-payment or ‘bounty’ in advance". I've given the gloss from the OED, and here is the OED's account of the etymology:

[Altered from or substituted for PREST v.2, by association with PRESS v.1: see PRESS-MONEY.
This result may have been facilitated by the fact that the pa. tense and pa. pple. prest could be the pa. tense and pple. either of prest v. (cf. cast, cost, thrust), or of press v. (cf. drest, past, tost), so that ‘he was prest’ could be understood either as ‘he was prested’ or ‘he was pressed’.]

This is a perfect example of the kind of process that creates an eggcorn: a word or phrase is given a new, etymologically incorrect morphological analysis, which is similar in sound and plausible in meaning. But the change prest → pressed went to completion by 1600 or so, and I didn't even know of the connection until I read Hat's post.

Posted by Mark Liberman at 05:18 PM

An avalanchlet of snowclones

Now that we've revisited the wonderful world of snowclones (here, here, and here), they seem to be everywhere.  Here are three more that have recently come to my attention: the N that is N (the abomination that is Jar Jar Binks), from Aaron Dinkin in e-mail (19 May); one man's X is another man's Y (one man's terrorist is another man's freedom fighter), from Rachel Shuttlesworth on the American Dialect Society mailing list (20 May); and color me X (color me surprised), which I was reminded of this morning when I ran across references to Color Me Arnold (a coloring book of sorts, aimed at Arnold Schwarzenegger, the Governator of my state).


But first, three comments.

First comment: the line between clichés, some of which can have open slots (the wonderful world of X, as in the wonderful world of snowclones above), and the somewhat more complex classic snowclones, like the X have N words for Y (which gave the genus its name), is not at all clear.  Probably it's like the line between idioms and constructions: there are pretty clear examples at the extremes (the idiom by and large, the construction Subject Auxiliary Inversion), but a range of intermediate types, with varying degrees and kinds of freedom as to what can fill the slots in the pattern and with varying degrees of semantic and pragmatic specialization.

As for the wonderful world of X, besides the very familiar X = Disney, Google's 600,000 raw web hits for "the wonderful world of" include, in no particular order, the following fillers for X:  border collies, insects, trees, Linux 2.2, Linux 2.6, the manatee, Calli And Graphy, renewal energy, coins, Paso Fino horses, weather, animation, Larry Carlson, poodles, wine, Narnia.  There's one open slot, and the expressions are semantically and pragmatically transparent.  It's just that wonderful and world collocate much more often than the other (non-alliterative) possibilities: 63,900 hits for amazing world (roughly one-tenth of the wonderful world count), 3,480 for marvelous world, 1,140 for astounding world, and a mere 617 for wonderful universe.

Contrast this simple collocational pattern with Geoff Pullum's characterization of the snowclone as "a multi-use, customizable, instantly recognizable, time-worn, quoted or misquoted phrase or sentence that can be used in an entirely open array of different jokey variants by lazy journalists and writers."

Second comment: an update on the once a X, always a X snowclone.  Roger Depledge writes (on 17 May) to point out that the legal Latin formula semel X, semper X 'once a X, always a X' works semantically as well as phonologically, since semel and semper share an element sem- 'one, together' that goes back to PIE.  In any case, it seems likely that the fixed version of the English snowclone was strongly influenced by the Latin formula.  Depledge points out that the formula also occurs in French (X un jour, X toujours) and German (einmal X, immer X).  (For all I know, it might occur in Finnish, Hungarian, and Russian as well.)  Whether different languages were separately influenced by the Latin formula, or whether the formula spread from one modern language to others, or both, is for textual scholars to discover.  I am so not a textual scholar.

Third comment: Barry Popik (ADS-L, 18 May) adds an entry to the X is the new Y inventory: Chocolate is the new black (which he first observed at a Godiva chocolate store).  It's not entirely clear to me from the links that Popik supplies, but I think that the intention is to convey that chocolate is an affordable luxury, like the famous little black dress.  In any case, X is the new Y was one of the first snowclones to come to our attention here at Language Log Plaza, back when the furniture was still being installed in our gleaming office tower.

And now on to the new entries.  First, the X that is Y, where X is a descriptive noun (with strong evaluative content) and Y refers to the thing or person (usually person) that is being described.  Dinkin's geeky examples, gleaned from fanblogs (Dinkin notes defensively, "Not that I ever read fanblogs or anything like that. This is purely for research purposes, of course."):

    the abomination that is Jar Jar Binks
    the greatness that is Yoda
    the manliness that is Jayne
    the weirdness that is Xander
    the gorgeousness that is Viggo Mortensen
    the beauty that is Dominic Monaghan
    the enigma that is Snape

    the failure that is Enterprise
    the wretchedness that is Matrix: Reloaded

    the miracle that is George Lucas's imagination

(Please remember that Dinkin and I are merely reporting these evaluations, not agreeing with them. I myself am, like the writer of the above, partial to Viggo Mortensen, but find Xander charming rather than weird and could describe George Lucas's imagination as a miracle only in a sarcastic moment.)

Dinkin observes that it's essentially impossible to do a Google search for this snowclone, since "the * that is *" turns up millions of false positives.  He found the ones above by trying various instances of Y that were likely to elicit strong feelings from geeks.  Undoubtedly the formula occurs in non-geek contexts; Dinkin just happened to have noticed examples in fanblogs.

Goodness knows how you'd track down the origins of something like this.

On to the second example, one man's X is another man's Y, for which we have a pretty good idea of the source, namely one man's meat is another man's poison.  Rachel Shuttlesworth found examples in exactly this form (well, with variant spelling: "one man's meate is another man's poyson") from 1618 (where it was already referred to as "a proverb").  The OED Online (March 2005 draft revision) has a slightly earlier version, from ca. 1576: "þat which iz on bodies meat iz an oþerz poizon."  The OED also has a 1604 cite that refers to "That ould moth-eaten Prouerbe..One mans meate, is another mans poyzon."  Moth-eaten already, four hundred years ago!  Variant formulations (including the reversed "One man's poison, another man's meat", from 1902) appear throughout the centuries, but the archetype is clear.

The ultimate source again appears to be Latin.  Lucretius, De Rerum Natura, in fact.  The closest dictionary of quotations (a Bartlett's 15th edition, of 1980) gives, from book IV, line 637 in the Rouse translation, "What is food to one, is to others bitter poison."  In the original: "Ut quod ali cibus est aliis fuat acre venenum."

In any case, the fixed meat... poison version has been robust for centuries, and now (as Shuttlesworth observed) serves as the model for all sorts of variations, including the one in the Paul Simon song "One man's ceiling is another man's floor".  Among Google's 120,000 raw web hits for "one man's * is another man's *" are the following pairings: weed... ground cover, coconut... grenade, junk... treasure, security... prison, pork [in the legislative sense]... dinner, home... castle, vice... virtue, trash... treasure, data... metadata, meat... girlfriend (sigh), mistake... smart move.  There are other variants out there; Shuttlesworth unearthed the following 1853 quote from the New York Daily Times, which refers to the "old, musty, but true proverb" and then plays with it: "What was one man's loss was another, yes, a thousand ladies' gain."

At this point, the ADS-L discussion turned to the question of which wag first varied the formula to the punning "One man's Mede is another man's Persian".  Definitive results not yet in.

Finally, the third example, color me X 'I am X'.  Googling for this one requires sorting through names of coloring books and straightforward instructions to "color me green/black/etc."  But there's plenty of gold left, with X = surprised, impressed, jealous, sensitive, beautiful, confused, underwhelmed,...  There are plenty of song titles, too: Color Me X, with X = Badd (Young, Gifted & Badd), Blind (Extreme), Gone (Rhonda Hampton), Impressed (The Replacements), for example.  And, of course, the Streisand song, and album, Color Me Barbra (1966).

There are plenty of examples with other object pronouns: color her angry, color him [designer Tibor Kalman] a provocateur, color them [Nokia] booming, color them confident.  No doubt plenty of non-pronominal examples can be found as well.

The ultimate source is surely instructions in coloring books, involving a stretch from things like "color the pig pink" to things like "color me happy".  At some point, the expression became fashionable (and therefore annoying), but I'm not quite sure what the precipitating events were.  I have a haunting feeling that La Streisand was not the origin, that there was a song, or book, from the '50s.  If so, I'm sure some student of popular culture will let me know, in e-mail beginning "How could you possibly have forgotten...?"  I will be appropriately humble, and thankful.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:01 PM

The condescension of descriptivism

Languagehat has weighed in, as always interestingly, on the curious use of unpacked in sentences like The boxes were still unpacked that Mark and I talked about in some recent posts (here, here, and here). LH argues that the usage shouldn't be ruled out as "a part of English" just because people who use the construction generally renounce it once its apparent illogicality is pointed out to them:

People who think language should be a certain way even though it's not, even in their own usage, are perfectly willing to condemn their own usage and say "it's wrong, I won't do it again..." You can't depend on users' judgments in these matters, you have to look at the facts of usage, and based on what I've seen at the Log, one meaning of unpacked is '(still) packed.'

LH compares this case to the which-that rule which, as Arnold observed in an earlier post, is routinely violated even by writers who recommend it. Descriptivists, the story goes, should pay attention only to what people say, not what they say about what they say.

Am I the only one who worries that this view may be a bit condescending?

True, there are times when externally imposed prescriptive norms color speakers' views of acceptability -- "I said literally but I really should have said figuratively," to take the analogy Ben Zimmer offered -- and in those cases lexicographers and linguists are surely right to dismiss speakers' protestations as a reflex of linguistic false consciousness and to record the relevant use of literally as "part of English," as LH puts it, or as what Ben Zimmer calls "a legitimate word."

There's no question that this use of unpacked is quite frequent in contemporary English, as Ben, Jesse, and Mark have all demonstrated -- in fact a search in the Making of America collection turns up examples from early 19th-century prose, as well:

It was sad waste of time, indeed, to be sketching and staring about, when the cold chickens were still unpacked, and the damask napkins undistributed. The Living Age, 20, 243; 1849

Thus the attempt to enforce the Stamp Act proved utterly ineffectual. The bales of stamped paper remained unpacked at Castle William; no man being bound to open and distribute them. The North American Review, 11,29; 1820

For descriptivists, that's reason enough to count the construction as "a legitimate usage," as Ben Zimmer puts it, and record it as a form of English. But despite that continued use, no prescriptivist has ever condemned it as a solecism, perhaps because it's hard to cotton to. (Merriam-Webster's Dictionary of English Usage doesn't mention it, nor does any usage guide I'm aware of). In any event, if writers consistently repent of the construction when the problem is pointed out to them, it can only be in consultation with their own inward manuals of English morphology, not with some rule they learned at the end of Sister Petra's ruler. And in ignoring those judgments, as uniform and as uncoerced as they are, what lexicographers would be saying in effect is that these people really have no right to their opinions -- "Thanks, but if it's all the same, we'll tell you what's part of English."

It goes without saying that a comprehensive description of English should take note of this curious use of unpacked. But there are ways of doing this without seeming to recognize it as a fully naturalized citizen of English -- that's why we have usage notes, after all. Let's not be so quick to throw out native speakers' Sprachgefuehl. They have the sense they were born with.

More, yet

Several people have written me to note other constructions that seem to resemble this one. Kate Gregory noted that:

I have a friend who consistently says "unthaw" as in "I have a steak out of the freezer and I just have to wait for it to unthaw." When someone once pointed out that you would have to wait a long time for a steak on the counter to spontaneously refreeze, she was truly puzzled for a long long time.

And John Cowan wrote to remind me that unloose "clearly means 'loosen', not 'tighten' or 'fasten'." In fact Hans Marchand noted in his classic Categories and Types of Present-Day English Word-Formation that

Occasionally un- redundantly intensifies verbs which have in themselves a privative meaning, as in unloose 1362, unpick 1377... [and] undecipher 1654.

But this isn't really pertinent to the case of unpacked. For one thing, the verb pack isn't privative the way loosen or decipher is (for some speakers, presumably, thaw is one of these as well). For another, the relevant meaning isn't found with the verb unpack, which can only be reversative (i.e., "cause to be no longer packed").

Then too, the participles of verbs with intensifier un- don't generally permit stative readings. Were/was/been unloosed get around 1400 Google hits; still unloosed gets exactly one involving the relevant sense of still, with one more for remain unloosed ("No dog or cat shall be allowed to remain unloosed at anytime except when fenced," a usage that I personally find pretty weird). The corresponding ratio for was/were/been unpacked to still unpacked is about 60 to 1. That is, the appropriate analysis of still unpacked is one where unpacked is an adjective containing the negative prefix un-, not a participial form of a verb *unpack that contains an intensifier prefix.

That helps to explain why speakers who are ready to accept words like unloosen and the like as curious but idiomatic continue to reject unpacked in its "full" reading, even after centuries of common use. What it does not explain is why the error is so inviting. Watch this space.

Posted by Geoff Nunberg at 01:15 PM

"Est-cela si beaucoup de demander?"

Like Geoff Pullum and many others, I deal with air travel by reading. Before heading out, I try to visit the bookstore and buy a few mysteries or SF novels or similar things. If I forget this crucial step, or I don't have the time, I need to make do with what happens to be available at the airport. Often this works out well. But on a trip to Pittsburgh last week, the results were interestingly mixed.

The book I settled on was an SF debut by Elizabeth Bear, "Hammered", published by Bantam in January of 2005. There were a lot of things about it that I liked, including a setting in post-apocalyptic Hartford, Connecticut -- near where I grew up -- and an escaped AI chatbot based on Richard Feynman. However, there was one thing about it that really troubled me.

The central character is a former Canadian military pilot named Jenny Casey. Despite her name, she's from a Montreal francophone background, and so are several of the other characters. Their internal monologues and many of their conversations are liberally sprinkled with French words and phrases. There are a few indications of familiarity with French Canadian features, like the spelling "marde" for merde. But most of the French seasoning appears to have been created by translating idiomatic English phrases nearly word-for-word.

For example, on p. 200, a pre-teen girl named Leah Castaign asks her father for something:

She took a breath. "Can I ask you a huge, gigantic, massive favor?"

"Comment massif parlons-nous de?"

In English, if someone asks you for a "massive favor", asking "how massive are we talking about?" is an idiomatic response. Now I blogged a while ago about Ruth King's documentation of sentences like "Le gars que je te parle de" ("the guy that I'm talking to you about") in French from Prince Edward Island; but she indicated that this kind of preposition-stranding is unthinkable in Montreal as well as in standard French; and even on Prince Edward Island, I wonder if Bear's French calque for "how X are we talking about?" is likely.

On p. 301, Jenny Casey is going through physical therapy, and not enjoying it. Her thoughts go like this:

I want a drink and a quiet window. I want to take Gabe out and sit in the sun and drink beer and eat poutine and get silly with the girls. Est-cela si beaucoup de demander?

Again, "est-cela si beaucoup de demander" seems like a word-for-word rendering of English "is that so much to ask". My command of French -- Canadian or otherwise -- is not very good, but it doesn't seem right to me. Google tends to support me, in that {"so much to ask"} gets 28,000 hits, while {"si beaucoup de demander"} gets none at all.

These are not isolated examples -- there are dozens of others, sprinkled throughout the novel.

So there are three possibilities here:

1. Bear knows an undocumented variety of radically English-influenced Canadian French.
2. Bear is imagining a future (45-55 years from now) in which that kind of Canadian French has come into existence.
3. Bear knows a little bit of French, which she eked out for purposes of this book by writing phrases in idiomatic English and translating them (almost) word-by-word into French.

My money is on (3), and so I have a question: doesn't Bantam (a division of Random House) have editors that can find someone who knows Canadian French to help out in a case like this? It would be very easy for someone to substitute reasonable translations. And what about the Online Writing Workshop for Science Fiction, Fantasy and Horror, and the 13 participants whom Bear thanks by name "especially but not exclusively" as "first readers"?

This is a small flaw, but it really degrades the book for anyone with even the marginal sort of command of French that I have. It's surprising that someone along the way didn't notice the problem and find a way to help fix it. There's a sequel due out in a few months -- is Dell's editorial staff still asleep at the wheel? Or should I say, imitating Bear's method, dormant à la roue ?

Posted by Mark Liberman at 11:47 AM

May 20, 2005

Unclear of Yoda's syntax the principles are, if any

A bit of empirical investigation has left me more puzzled about Yoda's syntax than I was before.

I looked at all 60 lines attributed to Yoda in the script for Episode 3. Most of them have the preposed-constituent structure suggested by Geoff Pullum's analysis, where the expected order of two substrings in an English clause is swapped:

<substring1> <substring2> → <substring2> <substring1>

In general, <substring2> is plausibly a syntactic unit ("constituent"), while <substring1> may not be.

To a dark place this line of thought will carry us.
Obi-Wan, my choice is.
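
The two-substring swap can be sketched in a few lines of Python. This is only an illustration: the function name swap_two and the hand-picked split points are my own assumptions, not anything in Geoff's analysis, and the sketch ignores capitalization and punctuation.

```python
def swap_two(words, split):
    """Swap two adjacent substrings of a clause.

    words[:split] plays the role of <substring1> and words[split:]
    the role of <substring2>; the split point is chosen by hand.
    """
    substring1, substring2 = words[:split], words[split:]
    return substring2 + substring1


# Expected English order, split just before the constituent to be fronted.
clause = "this line of thought will carry us to a dark place".split()
print(" ".join(swap_two(clause, 7)))

clause = "my choice is Obi-Wan".split()
print(" ".join(swap_two(clause, 3)))
```

Note that the hard part, deciding where the split goes, is exactly what the code leaves to the caller.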

However, sometimes the constituency of <substring2> may be unclear:

Good to see you, it is.
Still much to learn, there is.

The transformation is typically carried out clausewise:

At an end your rule is and not short enough it was, I must say.

And <substring2> is often inserted after an initial adverb or conjunction:

Then now the time is, Commander.
If a special session of Congress there is, easier for us to enter the Jedi Temple it will be.

And as Geoff observed, about a fifth of the clauses have the unmarked English order:

Death is a natural part of life.
The fear of loss is a path to the dark side.
A Master is needed, with more experience.
They are our last hope.

However, there are several other strange things. Sometimes the pattern is

<substring1> <substring2> <substring3> → <substring3> <substring2> <substring1>

as in

Master Kenobi, our spies contact, you must, and then wait.    (← you must contact our spies)
To fight this Lord Sidious, strong enough, you are not.    (← you are not strong enough to fight this Lord Sidious)
To question, no time there is.    (← there is no time to question)

(Or as in Anthony Lane's "Break me a fucking give".)

These triple swaps could be analyzed as two nested or iterated frontings, say 1 2 3 → 2 1 3 → 3 2 1, but I don't see any particular evidence either for or against this view.
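
For what it's worth, here is a small Python sketch of the iterated-fronting idea, showing that two frontings do yield the reversed order. The helper name front and the division of the sentence into three chunks are assumptions made for illustration; nothing here decides whether this is the right analysis.

```python
def front(chunks, i):
    """Move the chunk at index i to the front, keeping the rest in order."""
    return [chunks[i]] + chunks[:i] + chunks[i + 1:]


# Chunks 1, 2, 3 of the expected English order (division chosen by hand).
chunks = ["you are not", "strong enough", "to fight this Lord Sidious"]

step1 = front(chunks, 1)  # 1 2 3 -> 2 1 3
step2 = front(step1, 2)   # 2 1 3 -> 3 2 1

print(", ".join(step1))
print(", ".join(step2))
```

The second print gives the chunks in the order of Yoda's line, without the capitalization.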

In some cases, the triple swap can be analyzed as involving a fronted constituent plus inversion of the subject and auxiliary, as in the normal English sentence "With no job would John be happy":

Not far, are we, from the emergency ship.    (← we are not far)
Master Kenobi, dark times are these.    (← these are dark times)
Heard from no one, have we.    (← we have heard from no one)

but this may be a coincidence.

Furthermore, Yoda sometimes seems to front an intransitive verb and add tensed do to take its place, analogous to the phenomenon that linguists call "do-support":

Stink, this mud does.

I've given this analysis because there are no instances of extra do without the verbal fronting.

There are some cases where we can't tell whether there is a triple-re-ordering, fronting with subject-aux inversion, or just a strange copular order:

Disturbing is this move by Chancellor Palpatine.    (← ? this move by chancellor palpatine is disturbing)

There are some odd ellipses mixed in with the re-orderings:

Killed not by clones, this Padawan. By a lightsaber, he was.
Visit the new Emperor, my task is.

The first one might be

Killed not by clones, this Padawan.    (← [he was] not killed by clones, this padawan)

but it could be generated in lots of other ways, too.

Oddest of all, the fronted element is sometimes inserted between subject and predicate:

That group back there, soon discovered will be.

(though this might also be from "... will soon be discovered" with two criss-cross frontings?)

I might take a look at the broader corpus of Yoda-speak found in the other movies, but I have little hope that the principles will become clearer. If Yoda-speak were a natural language, I'd expect that more data would tend to support some accounts, and disconfirm others. In this case, the generalization that I expect to emerge is that George Lucas designed his plots on simple and consistent grammatical principles, but invented Yoda's sentences without any.

[Update: whatever the non-principles are, Mr. Sun has them down pat (scroll down to his update at the bottom):

You may stop e-mailing me now with advice to tell my son, "Talk to girls, learn you must", "Begun, your lameness has", and "In your mother's basement, live you will."

]

Posted by Mark Liberman at 02:28 PM

May 19, 2005

Juliet was wrong

Standing at a window overlooking her family orchard in Verona about 700 years ago, Juliet Capulet is reputed to have developed a famous hypothesis which Shakespeare later recorded.  Details may have been lost in translation, transmogrified through the passage of several hundred years before her words were set down, or magnified from nought by the pen of a man whose poetic license has never been paralleled. This is what she hypothesized:

What's in a name? that which we call a rose
By any other name would smell as sweet;
So Romeo would, were he not Romeo call'd,
 
The guy smelt pretty damn sweet for Juliet to scent him from a window high above an orchard, but remember, these were the middle ages. And, anyhow, it turns out she was wrong.

Maybe it's no surprise that she was wrong, since she's well known for her tragic mistakes. Yet despite her famously bad judgment, the Rose Hypothesis is oft cited and widely believed. No longer, perhaps! According to this article in today's Guardian, a group of psychologists at Oxford University has determined that words you see affect what you smell. Via the website of the journal Neuron, I located the full reference to the original article:

Cognitive Modulation of Olfactory Processing, Ivan E. de Araujo, Edmund T. Rolls, Maria Inés Velazco, Christian Margot, and Isabelle Cayeux, Neuron, Vol 46, 671-679, 19 May 2005

The full text is here, but I am not sure whether access is free to all, or whether it will remain so. Here is the summary of the article:

We showed how cognitive, semantic information modulates olfactory representations in the brain by providing a visual word descriptor, "cheddar cheese" or "body odor," during the delivery of a test odor (isovaleric acid with cheddar cheese flavor) and also during the delivery of clean air. Clean air labeled "air" was used as a control. Subjects rated the affective value of the test odor as significantly more unpleasant when labeled "body odor" than when labeled "cheddar cheese." In an event-related fMRI design, we showed that the rostral anterior cingulate cortex (ACC)/medial orbitofrontal cortex (OFC) was significantly more activated by the test stimulus and by clean air when labeled "cheddar cheese" than when labeled "body odor," and the activations were correlated with the pleasantness ratings. This cognitive modulation was also found for the test odor (but not for the clean air) in the amygdala bilaterally.

So, by a quite unscientific though not implausible extrapolation, that which we call a rose by the name of a dog turd might not smell half as sweet.

And while we're on the subject, why are there so many different names for roses, but so few for dog turds? Is the intrinsic variation of dog turds so much less?

And while we're not on the subject, Juliet was not trying to rid Romeo of his given name, but of his family's name: she wanted him a Capulet. As a semanticist, I should love to have been born with the name Montague. However, he left no children. Like Romeo, but for different reasons. I won't go into them here, except to say: it ended badly.

Posted by David Beaver at 02:12 PM

Obligatorily split infinitive in real life

I just heard Alex Chadwick, on NPR's program "Day to Day", say the following in a discussion about military policies on women in combat:

But still, the policy of the Army at that time was not to send — was specifically to not send — women into combat roles.

Note the obligatorily split infinitive. Saying The policy was not to send women into combat was far too likely to be understood as the weaker claim that sending women into combat wasn't the policy, and Alex realized that on the fly, and corrected himself. He wanted to refer to the stronger claim that not sending women into combat was the policy, and there was simply no way for him to do it, given that he was using an infinitival clause after the copula (was), unless he placed not between to and send. So he correctly did so. Far from being ungrammatical, split infinitives are (as we have explained before on Language Log) always an option for modifiers of infinitival clauses, and sometimes the only option. Far from being impermissible, they are sometimes required.

Note added later: I've corrected the quoted sentence (which I originally heard while driving) after listening to the program as archived here.

Posted by Geoffrey K. Pullum at 01:35 PM

Ginormous

Merriam-Webster had a contest, and ginormous won. Read all about it.

Posted by Mark Liberman at 11:57 AM

Syntax is a disturbance in the there

Anthony Lane doesn't like “Star Wars: Episode III”, and one of the many things about it that gives him heartburn is Yoda's word order:

...what’s with the screwy syntax? Deepest mind in the galaxy, apparently, and you still express yourself like a day-tripper with a dog-eared phrase book. “I hope right you are.” Break me a fucking give.

The quoted example ("I hope right you are") follows Geoff Pullum's analysis that Yoda

[favors], almost to excess, certain special constructions ... [which] take not only an adjunct but also a predicative complement or a nonfinite catenative complement and prepose them (pop them at the front of the clause).

But Lane's punch line -- "break me a fucking give" -- is an example of some different process. It still gives a vaguely Yodian impression, but what's the syntactic generalization here? Since the New Yorker doesn't have joke checkers, I'll take up the challenge in this case.

Leaving syntax behind for a minute, I can't resist a pragmatic question. Is this the first time that the New Yorker has dropped the F-bomb, not in a quotation or a piece of fiction, but to express the author's own attitude in a review or non-fiction piece?

Returning to the syntax, we can observe that Lane has swapped the first and last words of the sentence:

give me a fucking break → break me a fucking give

But this is surely not the process in general, since it would map "I hope you are right" to "Right hope you are I", or "There is a disturbance in the force" to "Force is a disturbance in the there." These are not Yoda-ish, just foolish.
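
The first-and-last swap is easy to state as a procedure, which makes it easy to see how badly it generalizes. Here is a minimal Python sketch; the function name swap_ends is my own invention, and it works on lowercase input without fixing up capitalization or punctuation.

```python
def swap_ends(sentence):
    """Exchange the first and last words of a sentence, leaving the middle alone."""
    words = sentence.split()
    if len(words) < 2:
        return sentence
    words[0], words[-1] = words[-1], words[0]
    return " ".join(words)


for s in ["give me a fucking break",
          "I hope you are right",
          "there is a disturbance in the force"]:
    print(swap_ends(s))
```

The first result is Lane's punch line; the other two are the foolish ones quoted above, give or take capitalization.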

Maybe we can get a clue by coming at the problem from another direction. Given that Lane wanted to Yoda-ize "give me a fucking break", what were his alternatives?

None of the strings created by moving chunks from the end to the beginning work:

a. *Break give me a fucking.
b. *Fucking break give me a.
c. ??A fucking break give me.
d. *Me a fucking break give.
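
The four candidates are just the ways of moving the last k words to the front, for k from 1 up to one less than the length of the sentence. A small Python sketch (the function name end_frontings is a label of mine, and case and punctuation are ignored) enumerates them:

```python
def end_frontings(sentence):
    """List every string made by moving the last k words to the front."""
    words = sentence.split()
    return [" ".join(words[-k:] + words[:-k]) for k in range(1, len(words))]


for label, candidate in zip("abcd", end_frontings("give me a fucking break")):
    print(f"{label}. {candidate}")
```

The enumeration is mechanical; the stars and question marks are judgments that the code knows nothing about.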

I think that there are some prosodic constraints at issue here, quite apart from the syntactic issues and the problems with breaking up an idiom.

As is well known, "fucking" wants to precede a strong stress, even to the extent of forcing its way inside words: "in-fucking-credible", "tele-fucking-graphic". This eliminates (a).

The indefinite article "a" is a proclitic, a word that wants to merge phonologically with the word that follows it. This eliminates (b).

The object pronoun "me" is arguably an enclitic, merging phonologically with the word that precedes it. This eliminates (d).

As for (c), there seems to be a wider problem, for modern English speakers in general, about applying Geoff's "special constructions" to sentences with imperatives. For instance:

A. Want a beer?
B1: A beer you could give me, but not cider.
B2: ??A beer give me, but not cider.

The problem is not with the imperative itself, since "sure, give me a beer" is a fine answer.

So as you've probably guessed, I'm giving a sort of constraint-based analysis here. "Break me a fucking give" is a pretty bad Yoda-ization, but it's better than the alternatives. Of course, it also helps a lot that the result is a scrambled form of a fixed expression. If you have a better analysis, let me know (myl@cis.upenn.edu) and I'll post it.

[If you didn't follow the link to Lane's review, or if it's expired, here's the rest of his let-it-all-hang-out take-no-prisoners anti-Yoda screed. Perhaps Lane should get together with Kelly Dobson, the inventor of Blendie, so as to try Machine Therapy with the blender application that he proposes in this passage:

No, the one who gets me is Yoda. May I take the opportunity to enter a brief plea in favor of his extermination? Any educated moviegoer would know what to do, having watched that helpful sequence in “Gremlins” when a small, sage-colored beastie is fed into an electric blender. A fittingly frantic end, I feel, for the faux-pensive stillness on which the Yoda legend has hung. At one point in the new film, he assumes the role of cosmic shrink—squatting opposite Anakin in a noirish room, where the light bleeds sideways through slatted blinds. Anakin keeps having problems with his dark side, in the way that you or I might suffer from tennis elbow, but Yoda, whose reptilian smugness we have been encouraged to mistake for wisdom, has the answer. “Train yourself to let go of everything you fear to lose,” he says. Hold on, Kermit, run that past me one more time. If you ever got laid (admittedly a long shot, unless we can dig you up some undiscerning alien hottie with a name like Jar Jar Gabor), and spawned a brood of Yodettes, are you saying that you’d leave them behind at the first sniff of danger? Also, while we’re here, what’s with the screwy syntax? Deepest mind in the galaxy, apparently, and you still express yourself like a day-tripper with a dog-eared phrase book. “I hope right you are.” Break me a fucking give.

In his interview with Robert Birnbaum on identitytheory.com, Anthony Lane identifies himself as someone who appreciates a brisk checking, whether of facts, theories or jokes, and might also benefit from the release of sensorial energies promised by Machine Therapy. ]

[Update: Will Fitzgerald sent a link to a purported script for the movie, which includes the line

    Yoda: A moment to bathe, give me.

As Will points out, this is exactly comparable to my alternative (c),

    A fucking break give me.

Will also supplied a dialogue context to help overcome the reluctance of idioms to be fragmented:

X: What do you want me to give you? Some kind of special treatment?
Y: A fucking break, give me. That's all I ask.

Will suggests that this response is plausible for Yoda, though perhaps not for the rest of us.

So why wasn't this good enough for Anthony Lane? My own guess is that Lane just put "give me a break" in his mental blender, spun the dial up to puree, and poured the results out on the page, "break" first. The result is more fluent prosodically, though incoherent syntactically. ]

Posted by Mark Liberman at 07:17 AM

May 18, 2005

Fasten = Grecian?

Andrew Sullivan comments on the NYT's decision to charge $50/year for on-line access to columnists:

By sectioning off their op-ed columnists and best writers, they are cutting them off from the life-blood of today's political debate: the free blogosphere. Inevitably, fewer people will link to them; fewer will read them; their influence will wane faster than it has already. The blog is already becoming a rival to the dated op-ed column format as a means of communicating opinion journalism. My bet is that the NYT's retrogressive move will only fasten the decline of op-ed columnists' influence.

I think he's right about the effects of the decision. But will the NYT's move really "fasten" their columnists' decline? If so, what will it be fastened to?

Sullivan apparently meant that charging for access will make print columnists' decline happen fast+er, and thus will fast+en it, just as making something deep+er deep+ens it, or making it dark+er dark+ens it, or making it moist+er moist+ens it.

However, making verbs from adjectives by adding +en is not a productive derivational process. You can't hotten or coolen your drink, or louden your ipod. The Springfield Theme Song "Embiggen his soul", and the town motto "A noble spirit embiggens the smallest man", are creative jokes that embiggened the English vocabulary. (Especially because embiggen is one of the few words in English involving both a prefix and a suffix with the same basic form...)

And though fasten does exist as a deadjectival verb in English, it's from the wrong sense of fast, the one that means "fixed firmly in place". (The "rapid" sense of fast developed from the adverbial form of the same word, apparently -- that's another story, though one that's worth telling in connection with the development of unpacked.)

I'm not trying to pick on Andrew Sullivan, who is a first-rate writer and needs to make no apologies for his command of the English language. In this case, he generalized a limited morphological pattern to a case where it's not sanctioned by history or current usage. This is something that almost all of us do from time to time. It's a symptom of the fact that we have brains that are capable of learning patterns and applying them in new ways. But when George Bush does it, one of Jacob Weisberg's staffers picks it up and publishes it as the latest Bushism.

[Note: I guess it's possible that Sullivan really meant fasten, in a sense like "cause to come to be fixed firmly in position". I doubt it, though: I think he meant to write hasten, and "fasten" slipped out of his fingers because of the association with "cause to happen faster". ]

[Update: John McChesney-Young pointed out by email that the key 'f' is only two over from the key 'h' on a standard QWERTY keyboard; and 'f' is touch-typed with the index finger of the left hand, while 'h' is produced with the index finger of the right hand; all of which makes a slip from hasten to fasten easier. ]
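The keyboard geometry in that update is easy to check. Here is a minimal sketch, assuming a standard QWERTY home row and the conventional touch-typing split between the hands; the names HOME_ROW, LEFT_HAND, key_distance, and hand are made up for illustration.

```python
# Letters of the home row on a standard QWERTY layout, left to right.
HOME_ROW = "asdfghjkl"
# Assumption: conventional touch typing, where the left hand covers a-g.
LEFT_HAND = set("asdfg")

def key_distance(a: str, b: str) -> int:
    """Number of key positions separating two home-row letters."""
    return abs(HOME_ROW.index(a) - HOME_ROW.index(b))

def hand(key: str) -> str:
    """Which hand types a given home-row letter."""
    return "left" if key in LEFT_HAND else "right"

print(key_distance("f", "h"))  # 2
print(hand("f"), hand("h"))    # left right
```

So 'f' and 'h' sit two keys apart and fall under the two index fingers, one per hand, which is all the update claims.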

Posted by Mark Liberman at 12:42 PM

Speak this way I do because wiser than I actually am I sound

Geoff's post on Yoda language got me thinking about language use in science fiction and fantasy in general. There's always been a small part of me that can't suspend disbelief long enough not to ask myself (or anyone unlucky enough to be sitting next to me) the obvious question: how is it that these disparate people from different planets (or from different cultures, or of different races, whatever) so often speak the same language? I realize, of course, that it's an extremely useful literary device; unless the plot of the episode/series/whatever depends somehow on there being some misunderstanding or rift between groups of people, then it would be a serious pain for both writer and reader/viewer for some random groups of people to speak a completely different language, requiring a translator when they encounter other groups, etc.

But Yoda's odd way of speaking raises an interesting question. What's the literary purpose, if any, behind having (characters like) Yoda speak so differently?

In response to what seems to be exactly this question with respect to Yoda, Mark Peters ("who writes about language for Verbatim and The Vocabula Review and keeps a Weblog on words www.wordlust.blogspot.com") is quoted as follows (in the Chicago Tribune article that Geoff cited in his post):

"In addition to making him sound like Kermit the Frog crossed with a fortune cookie, these Yodaisms mirror how Luke's world is being turned upside down (at times, literally, with the help of Jedi levitation)," Peters writes by e-mail. "If a green Muppet living in a swamp can be as smart and powerful as Yoda, and a mass murderer like Darth Vader can be Luke's (eventually) redeemable daddy, then maybe subjects, verbs and objects can play musical chairs too."

My own take is a little more cynical. I think it's just that Yoda is old and wise and therefore speaks in a way that sounds like he's saying something much deeper than he actually is. Where Obi-Wan Kenobi said fatherly-advice things to Luke like "You must learn the ways of the Force", Yoda says "Learn the ways of the Force you must". The fact that you have to spend a little more time untangling that into normal English word order makes you think harder about what Yoda is saying -- or at least notice that he did in fact say something, which thereby makes it seem more important than what Obi-Wan said.

But my feeling is that this is all too specific to the question about Yoda. Personally, I tend to lump Yoda-speak together with a bunch of other curious language facts about the Star Wars universe. (Here, I simply use "English" to refer to the generic language of the humanoid characters. Since I'm not a big enough fan of the new trilogy, most if not all of my references below are to the "original" trilogy, Episodes IV-VI.)

  • Overall, the members of the Rebellion speak in very casual American English, as do many of the foot soldiers in the Empire. But officers in the Empire tend to speak a more refined-sounding variety of (British) English. (Note that Princess Leia code-switches in Episode IV, A New Hope, depending on who she's talking to; otherwise, Obi-Wan Kenobi is apparently the only Brit on the good side.)

  • Wookiees understand English. Only Han Solo, Lando Calrissian, and C-3PO seem to understand Wookiee, or at least Chewbacca. (A similar point can be made of several other characters, like Jabba the Hutt, Greedo, and so on.)

  • R2-D2 understands English, but only speaks in series of beeps, which only C-3PO can understand. (See footnote.)

  • It's possible to build a droid who can speak English and translate many languages, like C-3PO, yet they still build droids that don't speak English, like R2-D2.

  • When C-3PO actually overtly speaks other languages -- if I'm not mistaken, not until Episode VI, Return of the Jedi -- he speaks them with a heavy (and presumably amusing) English accent.

I'm sure that if I took more time I could come up with several more examples, but these will do. I think that the broader literary motivation behind all these examples is simply to emphasize the diversity of the Star Wars universe. In each specific case, of course, there are very practical local reasons for why things are the way they are: Han and Lando (and to a lesser extent, C-3PO) serve as translators for Chewie, R2 and C-3PO have a special bond and must always be together, and Episode VI was just full of cutesy crap like the music video performance for Jabba, the Ewoks, the revelation that Luke and Leia are not only siblings but twins, and Vader's redemption and ascendance into Jedi heaven.

Compare this with, for example, the much more direct handling of the issue of linguistic diversity in Hitchhiker's Guide. Immediately upon Arthur Dent's first encounter with the alien language Vogon, Ford Prefect gives him the incredibly useful Babel fish to put in his ear. In one swell foop, Douglas Adams neutralizes the literary problem that can be posed by intergalactic linguistic diversity and provides a compelling argument for the nonexistence of God. George Lucas could learn something from this.



Footnote
Luke interacts once with R2 in one brief scene in Episode IV without the help of C-3PO or the translation screen in his X-wing fighter, but it's really obvious that he's faking it. Luke's boarding his X-wing fighter and some guy is loading R2 into the back; I think the guy asks Luke if he wants another droid, and Luke says no thanks, that he and R2 have been through a lot together. As if to prove that, Luke asks R2 how he's doing. R2 responds in some series of beeps, and Luke smiles and says, "Good." For all we know, R2 has just said: "Well, since you asked, I really don't want to go up there in this thing because my head's kind of exposed and I'm afraid it'll get shot off." (Which, of course, it does.)

Posted by Eric Bakovic at 12:31 PM

Huzod?

Returning from Pittsburgh yesterday, I took a taxi from the airport in Philly, and happened to notice the following message printed on a small sticker pasted to the bullet-proof partition between the rear and front seats:

PARTITION MANUFACTURED BY SAFE AMERICA CORP
USING HUZOD MATERIAL

The word "huzod" puzzled me. Google finds nothing in response to a query for {"huzod material"}, so I'm still puzzled. There are 1,380 hits for "huzod", all apparently in Hungarian, which doesn't help.

Safe America Corp. is indeed listed by the New York City Taxi & Limousine Commission as a "partition manufacturer", located in Long Island City. (Most Philadelphia taxis are recycled New York taxis, brought here after they've exceeded their legal lifetime there...)

[Update: mystery solved. Ray Girvan emails:

Almost certainly a typo for HYZOD polycarbonate.

Ray
ex materials scientist

Ah, HYZOD.]

Posted by Mark Liberman at 10:10 AM

The hounds of ADS-L

While Mark Liberman was tracing the snowclone "once a X, always a X" back to a Ralph Waldo Emerson essay of 1856 ("once a crab always a crab"), the textual hounds of the American Dialect Society were baying on the trail, back through the 19th century and before.  If Latin counts, way way before.

(Mark maintained that I had asked for the earliest example.  Actually, I merely reported Larry Horn's asking for the earliest example.  Not that I'm not interested.  But I'm just a reporter here.)

As a reminder: this all started with Barry Popik, who was tracking the baseball slogan "Once a Dodger, always a Dodger" and had gotten back to 1934.  Then Larry Horn generalized the question, noting huge numbers of Google hits for various values of X in the formula.

Almost immediately, Ben Zimmer was onto the Making of America database, and had extracted an 1842 "Once a subject always a subject" and an 1844 "once a clergyman, always a clergyman"; for details (on this and other ADS-L postings), see the ADS-L archives.  Zimmer added some slightly later Dickensiana: "Once a captain, always a captain" (Bleak House) and "Once a gentleman, and always a gentleman" (Little Dorrit).

Bill Mullins chimed in with examples from newspapers: X = subject (1853), state (1866), judge (1867), soldier (1878), sailor (1887), miner (1892), sailor (1909), cowboy (1909), Marine (1928).  Barry Popik added fireman (1884).  [I just now noticed the heavily masculine bent of these quotations.  Where is X = nurse, queen, mother, cook, maid, princess?]

Then Ben Zimmer pushed things further back, with X = captain from 1786 and 1792.

At this point Ben did what we ALL should have done at the beginning, just in case: check the OED.  Which -- surprising to me -- does indeed have a subentry for once a --, always a -- and its variants ("indicating that a particular role cannot be or is unlikely to be relinquished"), with cites in the spirit (though not quite the form) of the familiar formula, from 1566 and 1613.  But by 1622 we get "Once a knaue, and euer a knaue", and then X = captain in 1705 and knight in 1760 (with and joining the two parts of the formula).

There's no question that the formula was well established in the 19th century.  Nada O'Neal in e-mail to me supplies an 1846 example of "once a general, always a general" and an 1851 "once a Major always a Major, and once a Governor always a Governor".  By then the formula seems to have been fixed without an internal and or ever, which is to say, really fixed.  As icing, O'Neal adds a fool cite from Frazer's Golden Bough and a clergyman cite from the Sherlock Holmes story, "The Adventure of the Solitary Cyclist".
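For anyone who wants to hunt through a corpus for the formula and its looser early variants, here is a rough sketch of a search pattern. It is only an illustration: the names SNOWCLONE and find_x are invented here, the pattern assumes modern spelling (so it would miss "euer"), and it only accepts a single word as X.

```python
import re

# "once a(n) X, (and) always/ever a(n) X", with the same one-word X both times.
SNOWCLONE = re.compile(
    r"\bonce an? (?P<x>\w+),? (?:and )?(?:always|ever) an? (?P=x)\b",
    re.IGNORECASE,
)

def find_x(text: str):
    """Return the X of the first match in lowercase, or None if there is none."""
    m = SNOWCLONE.search(text)
    return m.group("x").lower() if m else None

print(find_x("Once a Dodger, always a Dodger"))          # dodger
print(find_x("once a crab always a crab"))               # crab
print(find_x("Once a Knave, and ever a Knave."))         # knave
print(find_x("once a Villain, and always so"))           # None
```

The optional comma, the optional "and", and the always/ever alternation cover the variation in the cites above; the 1696 "and always so" example falls outside the pattern because X is not repeated.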

But it gets better, thanks to Roger Depledge (in e-mail to me).  First, Depledge tells me that the Oxford Dictionary of Quotations (5th ed.) backs up the OED -- big surprise -- in saying that "the formula is found from the early seventeenth century".  Then he goes on to quote an Italian legal site with the Latin proverb "Hodie et olim possessor, semper possessor" ('Today and once the owner, always the owner').  More impressively, there's Gaius the jurist (fl. 130-180), who's credited with the saying "Semel heres semper heres" ('Once an heir, always an heir').  This is really cool, given the "semel... semper" phonological parallelism, which (as Depledge said to me) suggests that the formula is probably older and probably oral.

Well, in Latin.  There's a hell of a long time between the 2nd century and the 17th.  The early OED cites (1566 and 1613) have the right sense, but are not yet fixed in form.  So, either the English formula developed on its own, or there was a 17th century source using Latin models with "semel... semper" (note: without an "et" 'and').  Or, of course, both.

Remember, I'm not a historical linguist, or a historian of language and culture.  I do syntax and morphology and variation.  The rest of this stuff, I'm just passing it on.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:40 AM

Yoda's syntax the Tribune analyzes; supply more details I will!

Nathan Bierma has a piece in the Chicago Tribune on the syntax of the strange variety of English spoken by Yoda in the Star Wars epic series (to be completed this week by the release of "The Revenge of the Sith"). It quotes me. I said too much in my messages to Nathan (my fault; when you talk to the press, you should talk briefly), and you'll see portions of two separate points I made to him. For those who are concerned, I can separate them, and supply further details.

One way to look at Yoda's syntax is that it shows signs of favoring OSV syntax (Object-Subject-Verb) as the basic order in the simple clause. In fact one could call it XSV syntax, where the X is whatever complement would appropriately go with the verb, whether it's an object or not. This is a fantastically rare kind of clausal syntax. Desmond C. Derbyshire and I began looking at the available information from fieldworkers in Amazonia in the late 1970s, and we found maybe eight or ten languages with OVS order in simple transitive clauses (Hixkaryana, which Des knows fluently, is one of them), and maybe four with OSV. In some cases the evidence for those was a bit shaky (some languages have highly variable constituent order, making it quite hard to tell from texts what the most straightforward and basic would be). Des and I thought that the evidence suggested certain little-known Amerindian languages of the Amazon basin might be OSV: Xavante, Apurinã, Urubu-Kaapor, Nadëb, and possibly Warao. But it really is a rare way for clauses to be organized.
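To make the XSV idea concrete, here is a toy sketch that linearizes the same three pieces of a clause in ordinary English order and in XSV order. It is not a parser: the clause is split by hand into subject, verb, and complement, and the function names svx and xsv are invented for the illustration.

```python
def _sentence(words: str) -> str:
    """Capitalize the first letter and add a final period."""
    return words[0].upper() + words[1:] + "."

def svx(subject: str, verb: str, complement: str) -> str:
    """Ordinary English order: subject, verb, complement."""
    return _sentence(f"{subject} {verb} {complement}")

def xsv(subject: str, verb: str, complement: str) -> str:
    """Complement first, then subject, then verb."""
    return _sentence(f"{complement} {subject} {verb}")

clause = ("you", "still have", "much to learn")
print(svx(*clause))  # You still have much to learn.
print(xsv(*clause))  # Much to learn you still have.
```

The X slot takes whatever complement goes with the verb, object or not, which is why the label XSV fits better than OSV.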

But there is another way to see Yoda's syntax: you could see him as using SVO (or SVX) but favoring, almost to excess, certain special constructions that English allows only as stylistic variations in special discourse contexts. In English you can take not only an adjunct but also a predicative complement or a nonfinite catenative complement and prepose them (pop them at the front of the clause) for a special effect. (For the terminology I'm using here, see The Cambridge Grammar or A Student's Introduction.) Note the position of the preposed phrases at the front of the following sentences (a blank, ___, marks where each phrase would normally have been):

Inspect the car they certainly did ___; they crawled all over it as if they thought we were smugglers.

Angry about it I may once have been ___, but I'm not any more.

The difference here is that the phrase that gets popped to the front does not have to belong logically to the main clause. It can be associated with a subordinate clause (linguists call this "extraction" from a subordinate clause):

Inspect the car I imagine they probably will ___; so don't try to bring in any drugs.

Angry about it I'm sure they probably think I am ___, but they're wrong.

In the first case inspect the car belongs logically in the will clause, not in the imagine clause. And in the second example, angry about it goes with the am clause, not the think clause that contains it, or the clause with am sure that contains that. It's two clauses down. Cases of this sort are rare in texts, but when they are found, they show it's not just a matter of reordering the members of one clause, it's a matter of extracting phrases right out of the clauses they logically belong to. In simple cases like Inspect the car they did, you can't tell whether it's extraction or just XSV ordering.

Yoda uses loads of sentences where phrases are popped to the front in their clauses, and sometimes it's possible to regard them as simply XSV ordering in a simple clause:

Always two there are, no more.
Truly wonderful, the mind of a child is.
Much to learn, you still have.
When nine hundred years old you reach, look as good you will not.

But Yoda also extracts verb phrases that are catenative complements of auxiliary verbs, so those auxiliary verbs are left stranded at the end of the sentence (this is what Nathan means by saying that sometimes Yoda "will separate helping verbs from main verbs"):

Agree with you the council does.
Your apprentice Skywalker will be.
Lost a planet Master Obi-Wan has.
Begun, the Clone Wars has.

But then, as Nathan correctly notes, Yoda also uses relatively straightforward English word order sometimes:

The shroud of the Dark Side has fallen.
War does not make one great.
You must unlearn what you have learned.
A Jedi uses the Force for knowledge and defense, never for attack.

His English is an odd mix, as if he were sometimes thinking in terms of XSV constituent order, and sometimes just over-using English stylistic variant orders, and sometimes getting the idiomatic English word order just right. But heck, he's an alien. I bet we wouldn't do so well learning whatever his first language was, the one that he learned nine hundred years ago at (one assumes) his mother's knee. (Hmm. Do the females of Yoda's species even have knees?)

Posted by Geoffrey K. Pullum at 12:31 AM

May 17, 2005

Cartoon in the snow

Ok, three cartoons on usage matters -- split infinitives, dangling participles, and stranded prepositions -- which makes a lot of sense for someone (like me) who's teaching a course entitled "Split infinitives, prepositions at end, and other horrors" (among those other horrors being dangling modifiers).  And now for something moderately different: the Eskimo words for snow, as treated by Bruce Eric Kaplan in The New Yorker of 3/21/05:

For what it's worth, Geoff "Mr. Great Eskimo Vocabulary Hoax" Pullum doesn't think this cartoon is particularly funny.  So the Language Log staff won't be getting him a framed print as a gift.  But some of us laughed.

Posted by Arnold Zwicky at 10:39 PM

"still unpacked": threat or menace?

Mark's description of the use of still unpacked to mean "not yet unpacked" as an "insidious overnegation" suggests he concurs with my characterization of the construction as a "glaring error."

Not so fast, says the OED's Jesse Sheidlower: "unpacked doesn't mean what you think it means."

Jesse and Ben Zimmer both wrote to point out that the construction is extremely common, not just on the Web, as Mark showed, but in the OED's files and in books by well-known authors that Ben located in an Amazon search. A few examples of each type:

1994 B. Anderson All Nice Girls ii. 22 The brown suitcase returned with his receipt from Patient's Effects remained unpacked, stowed deep in Win's wardrobe.

1994 Sports Illustr. 29 Aug. 53 Patty comes into the garage, seeking an extension cord from one of the many still unpacked boxes.

1995 N. Blincoe Acid Casuals i. 4 She stepped over the suitcases that remained unpacked on the floor of the apartment.

2002 Chron. Higher Educ. 5 Apr. A13 His small, semidetached house there has been sold--there was no point in keeping it....Besides, he grins, it's full of unpacked boxes from 1984.

2004 National Rev. 31 May 52, I have sworn never to move house again, having boxes still unpacked from our last move twelve years ago.

There are boxes of books, still unpacked; it is obviously newly rented. (Race, by Studs Terkel)

Then Mother was bored with music, searching now for one of her books. So many books still unpacked. (Blonde: A Novel, by Joyce Carol Oates)

I recall that I was sitting on the edge of a chair in our still-unpacked kitchen, holding my huge body together with both hands as we listened to the radio. (The Poisonwood Bible, by Barbara Kingsolver)

What's more, the OED actually gives an entry for a related sense of unpack to mean "not taken out of a pack or parcel," giving a cite from 1721:

Loads of ill Pictures, and worse Books.., lye unpacked and unthought of when they come into the Country.

As Ben puts it, not unreasonably, "How many examples would you need to see before considering this to be a legitimate usage?"

Well, "legitimate" comes with a lot of ideological lint clinging to it, but my sense is still that this is an error, if a common and inviting one. After all, it's hard to see how un- could be plausibly reanalyzed as a mere intensifier; more likely this is an idiosyncratic sort of haplology, where the form unpacked stands in for ununpacked. The decisive question, I suggested to Jesse and Ben, would be whether the writers of these passages would defend the usage if the apparently anomalous use of unpacked were pointed out to them. To which Jesse responded:

I did try to contact the authors of the quotes I provided. The only one I managed to reach was John Derbyshire, who wrote the line I quoted from National Review, so he's conservative, and spoke with a very plummy RP British accent. When I first asked him he didn't see a problem, but when I pointed out unpacked he paused for a very long time, then said, "It's a mistake," and, in a manner typical of linguistic conservatives, he said, "I wouldn't have noticed it, but it's wrong, I won't do it again, I've learned something, it's my editor's fault," etc.

I've asked several more people with my constructed sentence, who continued the trend of not having a problem with it. One was a fact-checker at The New Yorker, who thought it was fine, still thought it was fine when I asked about unpacked, and only when I said, "the issue is that unpacked is here being used to mean 'packed'" did he say, "Oh, yes, that doesn't make any sense at all."

The one exception was an editor I know at Slate, who immediately said "unpacked isn't used right."

This seems to support my contention that few if any people are actually willing to stand up and defend their use of unpacked to mean ununpacked once the apparent illogicality of the construction is made clear. Note that by "apparent illogicality," I don't mean according to the pseudo-logic that prescriptivists invoke to justify their condemnations of double negation and the like; this one is clearly inconsistent with the morphological rules of the speakers' own grammars, unless they're willing to countenance it as an idiosyncratic exception.

But is that conclusive? Jesse and Ben would say, as I understand it, that once a form is widely used with a particular meaning, it merits a lexical entry, whether or not its users are willing to go to the mattresses on its behalf. I would argue that these are more on the order of performance errors, or of overnegations like Mark's example of don't fail to miss this one. Have Ben and Jesse fallen prey to loosey-goosey permissivism? Or am I in a state of stiff-necked lexicographical denial?

Posted by Geoff Nunberg at 08:11 PM

Antique snowclones

Arnold Zwicky asked for the earliest example of "once a/an *, always a/an *". The oldest one I can find is in a curious passage of Ralph Waldo Emerson's racist essay entitled Race (1856):

When it is considered what humanity, what resources of mental and moral power, the traits of the blond race betoken,—its accession to empire marks a new and finer epoch, wherein the old mineral force shall be subjugated at last by humanity, and shall plough in its furrow henceforward. It is not a final race, once a crab always a crab, but a race with a future. [emphasis added].

Can anyone take it further back?

[Update: well, as usual, it was a mistake not to check the OED. Benjamin Zimmer did, and sent in the results:

16. Proverb. _once a --, always a --_ and variants, indicating that a particular role cannot be or is unlikely to be relinquished.

1566 W. P. tr. C. S. Curio Pasquine in Traunce f. 107v, The olde rule: he that is once a false knaue, it is maruell if euer he be honest man after.
1613 H. PARROT Laquei Ridiculosi II. cxxi sig. N2v, Well you may change your name, But once a Whoore, you should be still the same.
1622 J. MABBE tr. M. Aleman Guzman d'Alf. I. I. II. i. 7 Once a knaue, and euer a knaue:..For he that hath once beene naught, is presumed to bee so still.
1696 Cornish Comedy IV. i. 30 'I'll do so no more.' 'Not till next time; once a Villain, and always so.'
1705 P. A. MOTTEUX Amorous Miser III. 58 Once a Captain and always a Captain.
1760 W. KENRICK Falstaff's Wedding IV. i. 52 As to the matter of knighthood; once a knight and always a knight, you know.

]

[And here's some bilingual commentary by Robert Dixon, from Canidia (1683), The Third Part, Canto XVI:

722 Inheritances must not ascend, I pray,
723 Then hang poor Parents out of the way.
724 To what Absurdities will you hale us?
725 Semel malus semper præsumitur esse malus.
726 There is a saying that we have,
727 Once a Knave, and ever a Knave.
728 It is a Saying of the Devil,
729 Once Evil, and ever Evil.
730 It is a Saying of Robin-Hood,
731 Once good, and ever good.
732 When will Follies have an End,
733 If that which is bad can never mend?
734 'Tis a Saying of as good Delivery,
735 Qui nescit dissimulare, nescit vivere.
736 Vox Populi, vox Dei; How so?
737 Then they may let all Truth go.

Bouvier's 1856 Law Dictionary quotes:

Qui semel malus, semper praesumitur esse malus in eodem genere. He who is once bad, is presumed to be always so in the same degree. Cro. Car. 317.

]

Posted by Mark Liberman at 05:18 PM

Still unpacked after all these years

As Geoff Nunberg has pointed out, it's disturbingly natural to assume that "still unpacked" means "not yet unpacked". This overnegation may be even more insidious than "fail to miss."

A Google search for {"still unpacked"} finds 785 examples, and almost every single one of them is backwards:

Somewhere, in one of 30 boxes of books still unpacked, is my copy of Genealogy.
There are still unpacked boxes on the floor...: but the computer is assembled and I'm back to limited blogging, probably for about another week.
There are the boxes still unpacked along the wall where you dropped them upon moving in four months ago.
42 boxes here... about 1/4 of which are still, 7 months later, still unpacked... guess that's why I never got the unpacking cardio workout you did!

I left him; I left the box, still unpacked at our old house.
Basil had unearthed an old office stereo system from the still unpacked Hawaii boxes in the garage...
A crate of this pottery, still unpacked, had just been delivered to Pompeii from Gaul when the town is buried by the eruption of Vesuvius.
At first, I scramble to find the binoculars I have stashed somewhere in my still unpacked bags ...
I've been scrubbing floors, unpacking boxes of still-unpacked-debris, rearranging closets and doing laundry...
[and so on...]

This has reached the point where "unpacked" by itself often now means "not unpacked":

Amid the clutter and unpacked boxes, Charlotte schleps her laptop around the house, trying to get a grip on the erotic novel she's working on.
Are you the kind of person who puts unpacked boxes in the basement of your new home to be unpacked at a later date ( 5 years later!)?
I finally got fed up that our office is a labyrinth of unpacked boxes from when we moved like five months ago, so to make a point, I hid the computer chair and stacked a bunch of boxes in front of the computer, so that we can't do anything on the computer until the boxes are unpacked.
Unpacked boxes make me crazy, so I dove in head first and didn't come up for air until it was almost all done.
Unpacked boxes from previous moves may be filled with items you will never use again.
I knew that the Box That Hath Never Been Unpacked Ever contained my Precious Moments collection ... and I thought that the Other Unpacked Boxes were mostly my yarn stash and books.

Posted by Mark Liberman at 05:01 PM

Ending with a preposition

Beware, those of you who would prohibit stranded prepositions!  That's one lesson you could draw from this cartoon, "Grammar Wizard", by Nicholas Gurewitch, from his series "Perry Bible Fellowship" (check it out), and reproduced here with permission.




My thanks to Jerry Zee and Matt Maguire, who led me to Gurewitch's site.  As it happens, Gurewitch lives east of Rochester, New York, and I have good friends who live in Perry, New York, south of Rochester (who were visiting me in Palo Alto when I discovered this cartoon).  I wondered if the Perry Bible Fellowship was connected to this Perry, but no, the PBF in question is a church in Perry, Maine.  So much for wonderful coincidences.
Posted by Arnold Zwicky at 12:36 PM

Once a snowclone, always a snowclone

It's been a while since we talked about snowclones; "every schoolboy knows...", on 2/27/05, seems to have been the most recent formula to get attention here.

Now, over on the American Dialect Society mailing list, the snowclone "once a X, always a X" has come up, in Barry Popik's pursuit of the instance "Once a Dodger, always a Dodger", which he's (so far) traced back to 1934. You can follow the discussion on the archives stored at the ADS site.

Today Larry Horn took to wondering about the general formula:

I wonder if there are earlier cites for "Once a(n) X, always a(n) X". When I was a young'un, I remember being taught on my professor's knee "Once a phoneme, always a phoneme". That was after 1934, though.

I too was taught on my professor's (figurative) knee about the "once a phoneme, always a phoneme" principle, but that was about 25 years after 1934. And I too wonder about other instantiations of the formula.

Update from Larry Horn, a few hours later:

There are 886K google hits for "once a" "always a", featuring on the first couple of pages no Dodgers or phonemes, but a motley collection of marines, cheaters, friends, Caesarean[s] (which come to think of it I've read about), deserters, tuba players, and such. Plus another 32.4K for "once an" "always an", featuring Arabs, Indians, English majors, orphans, and addicts. Some of these (marines, Caesareans, addicts) are no doubt proverbial, others are formed productively as needed. Anyone want to tackle the issue of first cite for the construction type?

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:11 PM

Indirect speech acts in Detroit

From Elmore Leonard's City Primeval (1980). The characters are Sandy Stanton and Clement Mansell, the book's designated bad guys.

Clement lay around the rest of the day while he thought and stared out at Motor City. Sandy sat at the desk to write a letter to her mother in French Lick, Indiana, that began "Dear Mom, The weather has been very warm for October, but I don't mind it a bit as I hate cold weather. Brrr." And stopped there. She rattled the ballpoint pen between her front teeth until Clement told her to, goddamn-it, cut it out.

She went over and turned on the TV and said, "Hey, Nashville on the Road. . . my God, anybody ever tell you you look like Marty Robbins? You and him could be twin brothers." Clement didn't answer. Sandy turned to him again after a few minutes and said, "That doesn't make any sense, does it? Marty goes, 'Would you like to sing another song for us?' And Donna Fargo -- you hear her? -- she goes, 'I can't hardly pass up an offer like that.' What offer? Marty didn't offer her nothing." Clement was staring at her, hard. Sandy got dressed and left the apartment without saying another word.

Posted by Mark Liberman at 06:52 AM

Missing Links

I've been too busy to keep up recently. Here are a few of the many good things I've missed, based on random stabs at our blogroll...

Some ruminations on butterfly by Q. Pheevr at A Roguish Chrestomathy, along with a picture of Freddy Sourcream (who, I suspect, must have been the great-great-grandfather of Suzy Creamcheese).

A new program at NEH on documentation of endangered languages, for which Language Hat has a set of links (and keep clicking for lots more great stuff in recent Hat posts).

An observation about Coleridge's pants by Edward Cook at Ralph the Sacred River, and a biographical sketch of George Psalmanazar at the same site.

I was unable to find any pictures of Coleridge's pants, with or without butterflies. However, at Technologies du Langage, Jean Veronis points out that Jean-Noël Jeanneney, while defying the crushing domination of American culture, has simultaneously embraced the history of France in America, if not America itself; and Google image search turned up a photo of M. Jeanneney enduring if not embracing the handgrip of James Hadley Billington, head of the Library of Congress.

A link from Des von Bladet to a BBC News quotation of Luxembourg Prime Minister (and EU president) Jean-Claude Juncker, on the subject of the EU budget: "I'm happy to be able to announce to you an agreement, in the sense that it is clear that we don't agree." As Des says, "Persons often agree with us in just that way or manner!"

And yes, this is the same M. Juncker who said «Je dis ‘oui’ parce que l’Europe ne doit pas se soumettre devant la virulence de l’attaque des autres» ["I say yes because Europe must not submit to the virulence of others' attacks"], explaining his reasons for supporting M. Jeanneney's proposal of a European Digital Library as a way to resist Google's "attacks" on European culture.

[And here's one that came in this morning by email: Geoff Pullum in a guest appearance on Nathan Bierma's new blog Inflections, discussing the typology of Yoda's syntax. ]

Posted by Mark Liberman at 05:08 AM

May 16, 2005

a bad 'un

Mark's post about the how's-that-again headline "Cedar Point Ride Can't Get Untracked" put me in mind of a story I heard many years ago from my friend Dan Menaker, who's now the head of Random House but in those days was a lowly fiction checker at the New Yorker. (Yes, they have fiction checkers too -- you wouldn't want a character in a short story getting on a Fifth Avenue bus at 59th street and getting off at 86th.) Reading one story, Dan ran across a sentence with a glaring error that had escaped three other editors. Or at least it's glaring after the fact -- it can be hard to spot, even when you've been tipped off in advance to look for it. The sentence read:

They had only just moved in; their boxes lay on the kitchen floor, still unpacked.

What the writer meant to say, of course, was "not yet unpacked," or more accurately if less grammatically, "still ununpacked." But most people take a moment to realize that the sentence doesn't actually mean that. And indeed, if Menaker hadn't spotted the mistake, it would have made it into the pages of the New Yorker. What's going on here?

Posted by Geoff Nunberg at 11:22 PM

The Dangling Participles

Following on Geoff Pullum's musings on the Fellowship of the Predicative Adjunct, via Ben Zimmer: Jeff MacNelly's cartoon Shoe of 5/13/05.



Posted by Arnold Zwicky at 07:56 PM

Either too high or too low

Mark Liberman posted here recently about instances of both that were moved either to the left (and upward in constituent structure) or to the right (and downward), and Mark noted the parallel to similar examples with either. Now Philip Hofmeister has assembled a brief summary of his recent work on either in non-parallel structures, in his blog entry for May 16, 2005.

Posted by Arnold Zwicky at 06:29 PM

May 15, 2005

Pass the Oat-Fiber Cubic

At the wonderful Dai Thanh supermarket, a large Vietnamese grocery store in San José, Barbara recently bought a packet of some pretty good light, flaky, square-shaped, sesame-flavored cookies made in China. However, as you know, we here at Language Log are gifted with the ability to tell whether new words will catch on, and I am here to tell you that the name of these cookies is not going to catch on. What the manufacturers decided to call them was Oat-Fiber Cubic Pastry. I can't know exactly how I know (it's subtle), but I know that no one is ever going to say, "Mmm! Pass the Oat-Fiber Cubic!"

I'll tell you a very strange thing about them, though. The ingredients are (1) wheat flour, (2) vegetable from soybean and cottonseed, (3) sugar, (4) wheat shoots, (5) sesame, and (6) wheat fiber. No oats in them at all. What's going on? Did they think oats was a more sexy word? Is there only one word for "oats" and "wheat" in Chinese? It can hardly be a translation mistake, unless the translation for the name was done by people who were not involved in the translation of the ingredients list. There is a real puzzle here. I'm going to take the original package in to Language Log Plaza and have the Chinese text analyzed by some of our in-house sinologists. I'll be stopping by Poser's office first. Whatever I learn will be reported right here. Watch this space.

Posted by Geoffrey K. Pullum at 02:12 PM

Linguistic pseudoscience lives in academia!

I'm not talking here about the silly things pundits say about language, or about how your linguistic theory (as opposed to mine) is so far out on the fringe that it's actually pseudoscientific. Here the topic is genuine linguistic pseudoscience, in this case claims that people can remember (and speak) languages that they spoke in their previous incarnations. Usually such topics are reported in breathless tones in the popular press or in depressingly popular books written by psychiatrists or non-academics, but this example, though in the popular press, is by someone with academic and maybe even linguistic credentials. A friend of Bill Poser's brought it to Bill's attention, and Bill passed it on to me because I am (I think) the only Language Logger who has published in a serious linguistics journal on languages of reincarnation (see an earlier Language Log discussion of this topic here).

The article, entitled "The Practical Linguist/Recalling Past Languages from Past Lives", was written by Marshall R. Childs, Ed.D., who "teaches TESOL...and other subjects, such as psychology, at Temple University Japan, Tokyo", as a Special to The Daily Yomiuri, an English-language newspaper published by the publisher of a major Japanese newspaper, Yomiuri Shimbun. Childs is not only an academic, but apparently an applied linguist, so his credulity is more surprising than it would be if he were, say, a physicist. (Lots of physicists, for some reason, seem to be eager to believe just about any nutty thing about language. But psychiatrists are the highly-educated champions in this domain.)

Childs says that he is "keeping an open mind on the existence and nature of reincarnation", but he is obviously much impressed by the work of a psychiatrist named Brian L. Weiss, who uses hypnotic regression to "past lives" to treat his patients. Childs goes on to discuss xenoglossy, defined as the ability to speak a language that you haven't learned in your present lifetime, and its most-published proponent, the psychiatrist Ian Stevenson of the University of Virginia. Childs is wowed by both doctors' eminence and expertise, observing that "Scholars as distinguished as Weiss and Stevenson do not normally embrace mystical ideas."

Here he is sadly mistaken. He describes in some detail, quite uncritically, the case of Gretchen, as reported in Stevenson's 1984 book Unlearned Languages: New Studies in Xenoglossy (published by The University Press of Virginia!). The few details Stevenson "discovered" about the life of this purported 19th-century German girl are wildly improbable, and he offers bizarre speculations about why her German (as "remembered" by his modern subject) is so very poor. Of course absolute proof of non-reincarnation in this case and others is unavailable, and one can always account for weaknesses in the subjects' language skills by the presumed difficulty of speaking a language you haven't spoken for the past hundred and fifty years or so -- or, in the case of one of Weiss's patients, 4,000 years. But abandoning skepticism in these cases is rash, especially as most of the questions addressed to "Gretchen" in German were yes/no questions -- easily recognizable as such by their intonation, even if you don't know more than a few isolated words of German -- and her answers ("Ja", "Nein") could not be checked for accuracy, since she was the only person present at the interviews who could reasonably be expected to know anything about the details of her 19th-century "life". Now if only they had managed to find a modern reincarnation of one of "Gretchen's" relatives who could be age-regressed to the same earlier context to carry on a conversation with her...

Childs is impressed with "Gretchen's" "championship performance", though. I can only conclude that he either doesn't know any German or simply didn't bother to study the transcripts that Stevenson provides as evidence of her ability to speak the language. I've done this (in a 1987 article in The Skeptical Inquirer), and it's all too clear that "Gretchen" does not in fact know more than a handful of isolated words of German, and that she had both opportunity and motivation to learn these words in her current lifetime. Childs also wonders why historians, linguists, and anthropologists are not "flocking to the doors of people who can give testimony from previous lives."

Finally, he says that his Practical Linguist column is intended to "promote fruitful collaboration among theorists and practitioners of language teaching in Japan", and that he plans to follow this column with others reporting on similarly "amazing language learners". Heaven help the theorists and practitioners of language teaching in Japan.

Posted by Sally Thomason at 08:59 AM

This is, like, such total crap?

For Poetry Month, back in April, NPR featured Taylor Mali reading "Totally Like Whatever", which starts

In case you hadn't noticed,
it has somehow become uncool
to sound like you know what you're talking about?
Or believe strongly in what you're saying?
Invisible question marks and parenthetical (you know?)'s
have been attaching themselves to the ends of our sentences?
Even when those sentences aren't, like, questions? You know?

The idea that "uptalk" and tag questions are weak and self-doubting is a commonplace one, you know?

But it's not necessarily true? In fact, it may be completely false? Mali's performance reminds me of a radio ad I once heard, in which a hyper-aggressive car salesman deployed repeated final rises like a sonic finger poking you in the chest.

So when I hear him saying things like

Declarative sentences -- so-called
because they used to, like, DECLARE things to be true
as opposed to other things which were, like, not -
have been infected by a totally hip
and tragically cool interrogative tone? You know?

it's tempting to poke a literal finger in his direction while explaining, rising on every phrase, that

That is, like, such total crap?
You've got no idea whatever
about how people actually, like,
                 communicate, you know?

However, this would not only be impolite, it's unnecessary. If you want to hear what a strong, aggressive use of tag questions and final rises sounds like, you could hardly do better than to listen to Taylor Mali subverting his own poem's message by reading it.

In a couple of earlier posts, I've mentioned the story of tag questions. Robin Lakoff wrote in her book Language and Woman's Place that tags are "associated with a desire for confirmation or approval which signals a lack of self-confidence in the speaker." However, later studies of the actual distribution of tag questions in a spoken-language corpus found that they were mainly used by "powerful" speakers, those "institutionally responsible for the conduct of the talk" -- teachers, doctors, talk-show hosts and so on. I also mentioned Cynthia McLemore's observation that in a University of Texas sorority, final rises were used in chapter meetings to signal the presentation of significant new information by institutionally powerful individuals.

The latest issue of IJCL has a paper that provides another nail for the coffin of the idea that final rises are a sign of inadequate conviction. (Now if we can only get it to lie down long enough to get the lid on...)

The paper is Winnie Cheng and Martin Warren, "// ↗ CAN i help you //: The use of rise and rise-fall tones in the Hong Kong Corpus of Spoken English". International Journal of Corpus Linguistics, 10(1), 2005. (link). Here's the abstract:

This paper examines the use of two tones by speakers across a variety of discourse types in the Hong Kong Corpus of Spoken English (HKCSE). Specifically, it focuses on the use of the rise and rise-fall tones by speakers to assert dominance and control in different discourse types. Brazil (1997) argues that the use of the rise and the rise-fall tones is a means of exerting dominance and control at certain points in the discourse and that while conversational participants have the option to freely exchange this role throughout the discourse, in other kinds of discourse such behaviour would be seen to be usurping the role of the designated dominant speaker. The findings suggest that the choice of certain tones is determined by both the discourse type and the designated roles of the speakers, but is not confined to the native speakers or determined by gender.

David Brazil was a British linguist who died in 1995, and laid out an interesting theory of the discourse functions of intonation back in 1985. One of his ideas was that what he called "rise tones" can be used to "assert dominance and control" by holding the floor, by exerting pressure on the hearer to respond, or by reminding the hearer(s) of common ground.

But Cheng and Warren don't just present a theory with illustrative examples, they also count things. For example, in four business meetings, two chaired by women and two by men, the chairs used rise tones almost three times more often than the other participants did (329 times vs. 112 times). In conversations between academic supervisors and their supervisees, the supervisors used rise tones almost seven times more often than the supervisees (765 times vs. 117 times).
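As a quick check on the "almost three times" and "almost seven times" figures, here is a minimal Python sketch that recomputes the ratios from the raw counts quoted above. The dictionary labels and the function name are just illustrative choices, not anything from the paper.

```python
# Rise-tone counts quoted above from Cheng and Warren (2005):
# (designated dominant speakers, other participants).
counts = {
    "business meetings (chairs vs. others)": (329, 112),
    "academic supervision (supervisors vs. supervisees)": (765, 117),
}


def dominance_ratio(dominant, other):
    """How many times more often the dominant speakers used rise tones."""
    return dominant / other


for setting, (dominant, other) in counts.items():
    print(f"{setting}: {dominance_ratio(dominant, other):.1f}x")
```

The ratios come out at about 2.9 for the business meetings and about 6.5 for the supervision sessions.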

Cheng and Warren summarize their findings by placing different sorts of discourses on a scale with respect to the distribution of rise tones:

Conversations are at one extreme end where the use of the rise tone is chosen equally by participants enjoying equal status. As we move towards the other end of the continuum, we find that the degree to which designated dominant speakers use the rise tone more frequently than the other discourse participants steadily increases. The first discourse type on the continuum is the service encounter, followed by placement interview and informal office talk, next is the business meeting and, finally, academic supervision which is the furthest removed from conversation in this respect.

The authors also speculate that

...the effect of choosing to use the rise tone is probably cumulative in that the isolated use of a rise tone by a speaker might pass unnoticed, whereas repeated use might be perceived by the hearer as the assertion of dominance and control.

So maybe the problem with "Valley Girls" and other youth of the past couple of decades is really that they're, like, totally self-confident and socially aggressive? You know?

[Other Language Log "uptalk" posts:

Uptalk uptick (12/15/2005)
Angry Rises (2/11/2006)
Further thoughts on "the Affect" (3/22/2006)
Uptalk is not HRT (3/28/2006)

]

Posted by Mark Liberman at 08:03 AM

May 14, 2005

More things that aren't eggcorns

I've been entering occasional items on the eggcorn database with the labels "questionable" or "not an eggcorn", but I can't of course enter all the dubious examples that are suggested to me, or otherwise the database would become an inventory of non-eggcorns rather than eggcorns.  I try to restrict these entries to items that are very frequently suggested as eggcorns, but seem to be better analyzed as classical malapropisms that are not reanalytic, or as blends, simple (though common) misspellings, phonological variants spelled "by ear", and the like.  My collection includes some examples that do (or might) involve reanalysis, but where the reanalysis is motivated not by semantic considerations but by morphological (or morphophonological) considerations, involving some kind of analogy to other words.  The famous nucular (discussed here on many occasions and in many contexts) illustrates the type: this version of the word looks like it has the analysis nuc-ul-ar, similar to molecular and other words.

Here I report on two more examples of this type: doctorial (for doctoral) and the rather more interesting overature (for overture).  And I consider pronunciations like chicking (for chicken) and childring (for children), which look like simple hypercorrections, but might also involve reanalysis favoring the suffix -ing.

Favorites from my recent "not eggcorn" collections:

Blend: a wholescale ban...  [AMZ in class, 4/15/05: wholesale + full-scale]

Simple misspellings (recorded in the database): without undo pressure and the reverse, to undue the buttons on his jeans

Phonological variant spelled "by ear": If you have any questions please fill free to ask away. [Reported by Fritz Juengling on ADS-L, 5/11/05, eliciting much discussion of the laxing of /i/ in various contexts (especially before /l/ and before /g/) in various American varieties of English.]

But on to morpho(phono)logical reanalyses. 

First case, the very common doctorial (ca. 35,100 raw Google web hits), as in the following:

Post Doctorial & Research Scientist Lab Technicians | Graduate Students | Undergraduate Researchers |... (www.yale.edu/breaker/postdocs.htm)

(This one has even made the OED.  OED2 has the variant doctorial from 1729 through 1843, with cites all in university contexts.  Plus an occurrence of doctorially from Trollope (1858) that seems to refer to physicians.  By the way, OED Online, draft of December 2003, has nucular 'nuclear' with cites beginning in 1943. )

English has alternative suffixations -al and -i-al.  The first places stress either on the penult of the stem (orIGin-al, with source ORigin; VIRgin-al and proFESSion-al already have the stress on this syllable in the source), or on the final syllable (diaLECTal, with source DIalect).  For the source word DOCtor, the standard derivative in -al has the first pattern, DOCtoral, though a non-standard stressing docTORal also occurs (parallel to standard maYORal).  Suffixing in -i-al requires final stress on the stem: profesSORial, with source proFESSor; meMORial, with source MEMory.

So we have doctor, which looks like it has a morphological component -or and is semantically parallel to professor, ambassador, senator, and many others, almost all of which take -i-al rather than -al.  (I think that pastoral is the only reasonably common adjective of this class besides mayor that takes -al, and its connection to pastor is not terribly clear.)  So doctor shifts to be like the rest of the herd.  (The stressing docTORal might have had a hand.)

Second case, the much less common (though not fabulously rare) overature -- ca. 5,600 raw Google web hits (a fair number for commercial products with the name Overature) -- as in the following:

William Tell Overature Yankee Doodle. Military Themes Airforce Marine Navy Semper Fidelis Taps... (www.sministry.org/PatrioticMusic.htm)

Nouns in -ture fall into two classes: a large class in which the -ture is preceded by an unstressed syllable, usually spelled with an a (caricature, miniature, signature, temperature, literature, curvature), though sometimes with i (expenditure, furniture); and a smaller class in which the -ture is preceded by a stressed syllable ending in a consonant (dePARture, adVENture, manuFACture).  Now the word overture has -ture preceded by a syllable that ends in a consonant but is not stressed.  The way to preserve the stressing while accommodating to the prevailing patterns is then to insert an unstressed vowel, which would normally be spelled with an a: overature!

(No, the OED has no cites for overature.)

Not eggcorns, but still reshapings, and interesting in their own right.

One last case.  Every so often on ADS-L the topic of hypercorrect -ing comes up; in this case, as for nucular, doctorial, and overature, the new version appears first in pronunciation and only later in spelling (if at all).  We've assembled examples of all of the following: chicking for chicken, childring for children, kitching for kitchen, and cushing for cushion, and a little while ago (from Wilson Gray) been taking care of for been taken care of (plus an assortment of other instances of velar nasal for alveolar, as in ongions for onions and mongsters for monsters, and at least one instance of velar nasal for oral stop before /n/, as in prengnant).  The obvious source of chicking and its kin is the instruction not to "drop your g's" in -ing, with the resulting "restoration" of engma to words that didn't have it in the first place.

But in addition, or in fact instead, the velar nasal might be appearing because people (even people who are not anxious about their g-dropping) are trying to find as much morphological structure as possible, and end up seeing -ing in places where it's not etymologically justified.  That is: a partial reanalysis.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 04:13 PM

Moving both forward and backward

A 5/13/2005 NYT article by Ian Fisher and Laurie Goodstein, running under the headline "Pope Names American to Be Guardian of Church Doctrine", features some very odd writing. In particular, the authors seem to have a thing for floating conjunctive both past a clause-initial participle. There is one example in the very first sentence:

Acting both symbolically and consolidating his young rule, Benedict XVI announced today his first major public acts as pope: He named an American archbishop to be the guardian of church doctrine and he said he would speed up the process to make his popular predecessor, John Paul II, a saint.

And then another, a few paragraphs later:

And so, the double-barreled announcements today marked the most significant public day so far in his brief reign, looking both forward to his own future and tying up unfinished business around his old boss and predecessor, John Paul II, that is likely to be well-received among Catholics.

This placement of both is so weird, to my taste, that it must be either the leading edge of a syntactic change that I've completely missed, or a particularly egregious misapplication of some prescriptive stricture that I've never heard of.

The general idea of conjunctive both floating around a bit is perfectly reasonable, in some cases. The canonical pattern is

both [ .......... ] and [ .......... ]
He both [leaves for work] and [arrives home] in darkness.

but (as CGEL points out on p. 1307) both can sometimes wander rightwards inside the first conjunct:

[ ... both ........ ] and [ .......... ]
[He both overslept] and [his bus was late]

and also leftwards outside it:

both ... [ .......... ] and [ .......... ]
This was made clear both to [the men] and [their employers]

This both-floating makes some people uneasy wherever it occurs, but at least the second case has been around for a long time, as in these examples from the OED:

1649 SELDEN Laws Eng. I. lxvii. 176 (1739) The Guardian in Socage remaineth accomptant to the Heir, for all profits both of Land and Marriage.
1766 GOLDSM. Vic. W. ii. (1806) 7, I looked upon this as a masterpiece both for argument and style.
1874 THEARLE Naval Archit. 116 The pillaring of a frame adds..to its strength, by acting both as a strut and a tie.

With determiners, the only real options are to float both to the left ("both her fork and spoon") or to repeat the determiner ("both her fork and her spoon"), since both is not allowed inside the determiner in modern English ("*her both fork and spoon"). The pattern both DET [...] and [...] seems to be several times more common, across the board, than the pattern both DET [...] and DET [...]:

{"both the first and last"} 14,200 {"both the first and the last"} 4,320
{"both our clients and candidates"} 5,250 {"both our clients and our candidates"} 1,350
{"both our internal and external"} 3,570 {"both our internal and our external"} 29
{"both the singles and doubles"} 772 {"both the singles and the doubles"} 189
{"both his|her life and death"} 452 {"both his|her life and his|her death"} 260
{"both the full and half"} 295 {"both the full and the half"} 30
{"both my home and business"} 101 {"both my home and my business"} 46
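To see how far "several times more common" stretches, here is a rough Python sketch that turns the raw Google counts above into ratios. The short labels are my own abbreviations, and raw hit counts are noisy, so the ratios are only indicative.

```python
# Google hit counts quoted above:
# (both DET X and Y, both DET X and DET Y).
pairs = {
    "the first / last": (14200, 4320),
    "our clients / candidates": (5250, 1350),
    "our internal / external": (3570, 29),
    "the singles / doubles": (772, 189),
    "his|her life / death": (452, 260),
    "the full / half": (295, 30),
    "my home / business": (101, 46),
}

# Ratio of the shared-determiner pattern to the repeated-determiner pattern.
ratios = {label: shared / repeated for label, (shared, repeated) in pairs.items()}

# Print from the smallest ratio to the largest.
for label, r in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{label:28s} {r:6.1f}x")
```

Every pair favors the unrepeated determiner, though the ratio ranges from under 2 (life / death) to over 100 (internal / external).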

With prepositions, left-floated both is now fairly common but by no means dominant, as these Google counts indicate:

                                both for __ and __  both for __ and for __  for both __ and __  for both __ and for __
men / women                     11,900              1,400                   644,000             107
boys / girls                    1,080               82                      128,000             24
hardware / software             412                 19                      29,500              0
home / office                   1,860               1                       14,500              0
audio / video                   407                 9                       7,690               2
color|colour / black and white  21                  2                       736                 0
food / drink                    37                  5                       405                 0

The rightmost column (in blue) might be analyzed as the third column with both floated into the first conjunct: "both for boys and for girls" → "for both boys and for girls". It feels like a mistake to me; and the very low counts (roughly 1 in 6,000 overall, or 1 in 12 relative to the cases with repeated for) suggest that most people agree.
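The two proportions can be recomputed from the counts in the table above. This is a minimal Python sketch, and it makes one assumption: that "relative to the cases with repeated for" means comparing against the "both for X and for Y" column alone.

```python
# Google counts from the table above, one row per noun pair, in column order:
# (both for X and Y, both for X and for Y, for both X and Y, for both X and for Y)
rows = {
    "men / women": (11900, 1400, 644000, 107),
    "boys / girls": (1080, 82, 128000, 24),
    "hardware / software": (412, 19, 29500, 0),
    "home / office": (1860, 1, 14500, 0),
    "audio / video": (407, 9, 7690, 2),
    "color|colour / black and white": (21, 2, 736, 0),
    "food / drink": (37, 5, 405, 0),
}

# Sum each column across all noun pairs.
totals = [sum(column) for column in zip(*rows.values())]
both_for, both_for_for, for_both, for_both_for = totals

# Share of the odd pattern among all four patterns combined.
overall = sum(totals) / for_both_for
# Assumption: compare against the "both for X and for Y" column only.
relative = both_for_for / for_both_for

print(f"for both X and for Y: 1 in {overall:,.0f} overall")
print(f"for both X and for Y: 1 in {relative:.1f} vs. both for X and for Y")
```

On these counts the pattern accounts for 133 hits out of 842,199, which is in line with the rough figures quoted above.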

In any case, the two WTF examples from the NYT article that started out this post are cases of right-floating both, and these seem to be as rare as my reaction suggests they should be. After a fair amount of searching, I was not able to find any other examples with participles, and only one (marginal) example of any kind at all (aside from the possible prepositional examples above):

A state collapses when both its repressive and ideological apparatuses disintegrate and when the central authority is unable to check rampant corruption.

Perhaps the counts for the P both [...] and P [...] case are representative of the frequency of such patterns, in which case you'd have to sift a lot of ore to find any, or else think of a very clever way to frame the search.

I guess there's another possible explanation, besides language change in progress or a copy editor with delusions; perhaps this article, like The Dante Club, has slipped through some compositional wormhole from a parallel universe where linguistic norms are slightly different. Certainly there are some other constructions in the same article that took me aback. For instance, consider the second sentence:

Until he was elected pope last month, Benedict XVI, then Cardinal Joseph Ratzinger, held for 24 years the doctrinal job, one of the most powerful and contentious posts in the Roman Catholic church.

I guess the following appositive ought to make "the doctrinal job" heavy enough to shift to the end of its clause, but it still seems wrong to me.

[By the way, Arnold Zwicky mentions the analogous rightward shift of either in his post on Astounding coordinations (continued), including a terrific example from Darwin's Origin of Species: "... the stripes are either plainer or appear more commonly in the young". It's easier to find examples of this type, perhaps because they're more commonly accepted and used, or perhaps because it's easier to eliminate irrelevant cases from the search than it is with both. Here are a few:

All people aged 16 - 74 who are either resident in the area or work in the area...
It also seems to be more common on animals which are either sickly or have sustained some sort of injury.
Sarice are ankle length and are either sleeveless or have long sleeves...
These are people who are either paid or will receive some form of compensation for praising the product.
Of students admitted in the past five years, about 80% have either graduated or are in good standing...

]

Posted by Mark Liberman at 04:46 AM

May 13, 2005

The essence of this post, which you are about to read, ...

Check out the caption underneath the photograph to the right (from an article in today's NYT about the Pentagon's military base-shutting proposal). Does it strike you as odd? That nonrestrictive relative clause (which suffered the biggest loss in terms of jobs, modifying Connecticut) just throws me, but I'm not quite sure why. It's not like any old modifier in this same context will throw me; for example, this is a definite improvement:

Senator Joe Lieberman of Connecticut, the state that suffered the biggest loss in terms of jobs, ...






[ Comments? ]

Posted by Eric Bakovic at 02:15 PM

The syntax/prosody interface: a cartoon version

Geoffrey Nathan emailed pointers to a couple of recent PreTeena cartoons about syntactic ambiguity in the context of a fifth-grade dance.

The first one (from May 4) is about decoration:

When Teena's friend asks "Do they need people to decorate?", she has in mind a structure analogous to "Do you want somebody to leave?" But Teena answers as if the structure were like that of "Do you want somebody to love?"

Translated into heavy English, the first version is something like "Do the dance organizers need people such that those people decorate the dance location?" while the second version is "Do the dance organizers need people such that the organizers decorate those people?".

Some people sometimes feel that this difference can be disambiguated by the placement of phrasal accent, e.g.

Do they need PEOPLE to decorate? vs.
Do they need people to DECORATE?

I think that I can get either reading with either accent pattern, but as Geoffrey pointed out, this is an instance of a problem that's been the subject of linguists' discussion for more than 30 years, starting with Joan Bresnan ("Sentence Stress and Syntactic Transformations", Language, 1971), and continuing with Dwight Bolinger ("Accent is predictable (if you're a mind reader)") and many others since.

Many complex factors are interacting here -- the syntactic structure and the discourse context among others -- and intuitions about stress, accent and intonation are not very reliable. As a result, the nature and source of the interaction between prosody and meaning in such cases remains controversial. Now that it's possible in principle to do studies based on thousands of hours of transcribed natural speech, it may be time to look at these questions in a new way.

The second cartoon (from May 6) is about compliments:

Here the ambiguity is more straightforward, and so is its relationship to prosody.

Finally, this Sunday strip has no connection to syntax or prosody, but does highlight the inadequacy of modern lexicography:

Posted by Mark Liberman at 06:33 AM

May 12, 2005

Insurgent Midgets in Iraq?

On the local news last night there was a report about the death of a local man in Iraq. According to the report, he was killed by small enemy arms fire. In spite of the extensive news coverage of the Iraq war, I haven't heard a word about the enemy being unusually small, or one subgroup of them being unusually small. In all probability what was meant was enemy small arms fire. Size may not matter, but word order does.

My guess is that the reporter started out saying that the soldier was killed by small arms fire, presumably in contradistinction to a bomb, which seems to be the most common cause of violent death these days in Iraq, and to heavy weapons such as artillery and aircraft cannon fire. Then he or she decided to add the information that this was not a case of a soldier being killed by what is euphemistically called friendly fire, but mistakenly inserted the word enemy into the middle of the phrase small arms. I'm not sure why. Maybe it's a production error: perhaps the reporter had already started to say small and rather than back up and start over inserted enemy at the earliest opportunity.

Posted by Bill Poser at 05:27 PM

The Fellowship of the Predicative Adjunct

As I have previously mentioned, I belong to a small circle of syntacticians collecting attested "dangling modifiers". Dangling modifiers are (and I will not be very precise about this) predicative adjunct constituents, usually at the beginning of a clause, that are semantically in need of a target for the predication they express, but don't get one in the sentence where they appear. Usage books tend to illustrate them with examples like Trembling with fear, the clock suddenly struck midnight (who was trembling?), and they condemn them more or less sternly. Our group is on the lookout for them not because we want to rail against them or ridicule their authors for grammatical sins, but, in a way, for the opposite reason: we are convinced that dangling modifiers are so frequent in real life that they cannot possibly be syntactically forbidden. The prohibition has got to be due to a more subtle preference, and we are not quite sure how it should be made precise. Most people don't notice dangling modifiers at all when they occur. We notice them, however, and we thought the following three (one from the front page of The New York Times!) were really quite striking.

  1. After reporting that the University of California, Santa Cruz, planned to increase its enrollment considerably, a Santa Cruz local newspaper said:

    Thrown on top of such already existing problems as traffic congestion, a water shortage and housing capacity, the angry roar of a response from residents and local government was deafening.

    What was thrown on top of traffic congestion? Surely not the angry roar?

  2. In the same paragraph the same newspaper remarked that the new Chancellor at UCSC had negotiated that her partner would also be offered a high-salaried position, and the paper continued:

    While a common practice in any large corporation or university, students in Santa Cruz, who have seen tuition double and classes cut, were none too pleased.

    What's the common practice? Students?

  3. And don't think it's just local papers in small towns. Arnold Zwicky just caught this one in a teaser on page 1 of The New York Times:

    That Wolfgang Puck introduced a new latte line may not be surprising, but the container, which heats itself, is. By pressing a button on the bottom, water mixes with quicklime, producing a chemical reaction that heats the coffee.

    Who presses the button? Water?

These are classic dangling modifiers, as clear and plangent as we can imagine. To me they look startlingly bad; I don't think it would be at all silly to propose going back and rewriting to correct them. (Compare with the nonsense about changing which to that in integrated relative clauses: the editors who demand that are just being silly, and wasting the authors' time. There is no unclarity introduced if which is used to introduce an integrated relative; there is no rule forbidding it, and no grammarian who has worked on describing Standard English thinks there ever was.)

If the three examples above seem fine to you, then you use a variety of English that doesn't have the prohibition against dangling that most usage handbooks try to enforce (there may well be such varieties, perhaps quite widespread). But if the above examples seem ungrammatical to you and you are amazed they appeared in print, then get ready to be surprised by the linguistic world around you, because they are coming thick and fast, and we collect new ones all the time.

(We need a name for our group, incidentally. But ‘The Danglers’ sounds like a rockabilly band. Something Tolkienesque would be nice. I wonder if my colleagues would go for ‘the Fellowship of the Predicative Adjunct’?)

Posted by Geoffrey K. Pullum at 03:07 PM

How many scientists does it take to see a rock rat?

According to a NYT article by John Noble Wilford, the Wildlife Conservation Society has announced that the Laotian Rock Rat, known to locals as the kha-nyou, represents a new family of mammals, and has been given the scientific name Laonastes aenigmamus. The NYT article includes a turn of phrase that I found odd:

Dr. Timmins, who is based in Madison, Wis., but concentrates on research in Southeast Asia, said in an interview that he first came on the animals laid out on market tables. Local farmers and hunters trapped or snared the animals, which they also referred to as rock rats, slaughtered them and took them to market. As far as he knew, Dr. Timmins said, no Western scientists have ever seen a kha-nyou alive. [emphasis added]

The phrase "no Western scientists have ever seen a kha-nyou alive" raised two questions for me. First, why Western? do Timmins and/or Wilford mean to imply that some Eastern scientists have observed living kha-nyous? Second, why scientists in the plural -- and kha-nyou in the singular? does it take more than one scientist to see a given rock rat?

The answers to these two questions may be related, and may have little or nothing to do with the structure or meaning of the phrase itself.

Timmins and Wilford can't be sure that no Laotian (or other Southeast Asian) scientist has ever seen one of these creatures alive, just as they can't be sure that no western scientist (say some wandering 19th-century French botanist) has seen one either. So since the statement is already qualified by "as far as he knew", why not just say "no scientists"?

Part of the answer is that the passage sets up a contrast between "local farmers and hunters" and "Western scientists", because those are the categories of people involved in the history of the discovery. Of course there are many local farmers and hunters, and there were apparently two scientists who collected specimens in Laos independently around 1990: Robert J. Timmins and Mark F. Robinson. There is little reason for the authors to think about the categories represented by switching the modifiers: Western farmers and hunters are not about to go to Laos in search of rock rats, and local scientists were apparently not involved (though they might well have been).

In the phrase "no Western scientists have ever seen a kha-nyou alive", I suppose that kha-nyou is in the singular in order to emphasize that not even one of them has been seen alive by any scientist. But why doesn't the same logic apply to "no Western scientists"? If no single scientist has seen a kha-nyou alive, then plural scientists have not done so either, though the opposite need not be true.

Well, this compositional choice surely started with the rhetorical opposition between the (many) local rat-catchers and the (two) western scientists. But perhaps another influence was purely linguistic: "western scientists" is a fairly common collocation, with 34,700 web hits on Google, as opposed to "western scientist" with a mere 826. (By contrast, "no scientist" outvotes "no scientists", 49,200 to 9,890.)

If this analysis is correct, then Timmins (or more likely Wilford) started with an implicit opposition between local hunters and western scientists, and then stuck with the plural in "no Western scientists have ever seen...", because "Western scientists" is a statistically amiable bigram.

[Note that I'm not criticizing Wilford's prose here -- the meaning is clear and the choice of plurality in the cited phrase should not really bother anyone. I thought about it for a minute, as people interested in language sometimes do in such cases, and decided that the explanation offered an interesting glimpse of the complex and subtle rhetorical network that ties texts together internally and connects them to general patterns of usage.]

[ Update 5/14/2005: Gene Buckley emailed a quote from another NYT story:

On the theme of odd plurals, here's a caption from an article in today's Times [5/12/2005], under a photo of many cups of water arrayed on a table:

"Athletes are warned that too many liquids can be deadly."

The idea is "too much liquid", but the writer was apparently thinking about athletes who drink too many cups of liquid in the course of a marathon or the like.

]

Posted by Mark Liberman at 09:45 AM

May 11, 2005

What's 420 feet high and travels at 120 mph?

Unmentioned in the foregoing post with the quote about the rollercoaster that can't get on track (but hasn't yet gotten untracked) is another interesting linguistic point. Here's the quote:

Cedar Point's Top Thrill Dragster stands 420 feet tall, goes from zero to 120 mph in four seconds and frequently doesn't work.

But of course the part that stands 420 feet high never goes over zero mph, and the parts that may attain 120 mph don't stand 420 feet high.

Linguists have often discussed other such anomalous examples of mixing the distinct modes of reference of polysemous words (anomaly is commonly marked with a # prefix). For example:

#The ham sandwich in the corner says he wants some mustard and should have been on wheat bread.

You can call him the ham sandwich, but then that phrase can't also refer to the sandwich.

#The Cambridge Grammar is careful in its scholarship and eleven pounds in weight.

The work in question is careful in a sense that would carry over to a CD ROM edition in a way that the weight would not. The abstract multi-authored entity must be distinguished from the heavy concrete objects through which Cambridge University Press permits it to intrude on the physical world.

It has also been noted that this sentence has a curious property which is perhaps worth mentioning in this context (though as John Baker has pointed out to me, this one slips down too smoothly to be regarded as anomalous):

The temperature is ninety and rising.

(Whatever the temperature is, if it has the property of being 90, and it has the property of being currently rising, then it would appear to follow by substitution that 90 must be rising.)

Semantics is not my thing, but the general recommendation here would appear to be that you should decide how you're going to conceptualize the referent of the phrase you're using, and stick with it consistently. The writer of the first quotation above didn't do that.

Posted by Geoffrey K. Pullum at 06:18 PM

Luckily still not untracked

Jill Beckman sent in a link to a 6/18/2003 story by Neal Rubin in the Detroit News, about a balky rollercoaster:

Cedar Point's Top Thrill Dragster stands 420 feet tall, goes from zero to 120 mph in four seconds and frequently doesn't work.

Some half-asleep (or subversive?) editor ran this story under the headline "Cedar Point ride can't get untracked". As Jill observed, getting untracked is one of the few things that hasn't gone wrong with the ride, which (as of the date of the story) had experienced a series of problems with its hydraulics and its electrical system, but had never come off the rails. And Lila Gleitman pointed out that the headline writer has clearly "lost track of the internal morphology".

Posted by Mark Liberman at 12:55 PM

Stuff

Another piece of baseball lingo: Stephen Laniel points out that "only pitchers have stuff":

Hitters don’t have stuff. When a pitcher comes out and rocks his opponents with pitches that loop and drop and do all kinds of craziness, commentators say that the pitcher has “good stuff.” If the pitcher consistently does this (Josh and I discussed Pedro), you stop talking about how he “brought out good stuff”; instead you say, in a tone of awe, that his stuff is amazing.

But still there’s always stuff. Only for pitchers, though. If a batter comes out, has a great stance, lots of power, good speed, lots of base stealing — in other words, he’s doing everything that a batter should do — you never say that the batter brought out his stuff.

He's surely right. Google News currently has 4,400 hits for {pitcher stuff}, mostly stuff like this:

Sparks thinks he's a better pitcher, stuff-wise, than he was in his first stint in Winnipeg, seven years ago.
He didn't have his best stuff today, and that's the sign of a good pitcher: When you don't have your best stuff, and you go out and win.
When your stuff's the way you want it to be and you've faced a team already that you pretty much know the hitters, it gives the pitcher a little bit more of an advantage
I had good stuff, but I never figured out how to throw strikes.
His secondary stuff also was effective.
The stuff is there, but the consistency isn't.
But for all the stuff a pitcher may have, it's more important that he has the right stuff.
I'm not afraid to go out there with nothing. I had better stuff at the end. I just waited and waited and waited until it came around.

Although {hitter stuff} also has 3,350 hits, the stuff is all about pitching:

I go with my strong stuff first, and then go with my junk.
Maddux (2-1) handcuffed the Mets with an array of off-speed stuff.
Her stuff is really doing well. Her stuff is moving around and she is using all of her pitches.
Brandon Webb, who has one of the game's nastiest sinkers (just ask any major league hitter), has ace-like stuff.
It is tough for a hitter to hit his stuff, especially when his pitches are ranging from 75 to 95 [mph].

And similarly for {batter stuff}.

Natalie just didn’t have her good stuff, and I could tell she was a little frustrated.
That would be Pat McCrory, 8-1, who came on in the sixth and showed a fastball and breaking stuff maybe a notch better than Whitmer's.
"Josh elevated his stuff tonight and they didn't miss 'em," said New Hampshire manager Mike Basso.

For some reason, pitching is viewed as a substance with varied characteristics (strong, secondary, off-speed, breaking, elevated) as well as qualities (good, better, best), while hitting isn't.

There's a similar way of talking about writers and writing:

And not everything that King wrote was as outstanding as his best stuff.
Like I said before, I love his older stuff. I really, really do. He's a great writer.
Her earlier stuff is better than her latest.
Her Celtic stuff bores me a bit, although I still read the whole book.
Or CL Moore if you can find any of her stuff in used book stores.
I read him in the Voice, so maybe I missed his good stuff, which people say was in Creem, which I didn't read.
Read her better stuff, like 'Hollywood Wives'.

So maybe pitchers, like writers, create stuff, which batters, like readers, react to?

The earliest use of pitcher's stuff that I've found is from the 1915 NYT, though the example is a marginal one since it talks about an essential property -- what stuff a pitcher is made of -- rather than a variable one:

[October 10, 1915: WILSON WATCHES RED SOX WIN, 2-1] Then Foster pulled himself together. [...] He has been knocked and bumped until now, in the heat of battle, he is as cool as a cucumber. he showed that he was made of real pitching stuff today. He settled down after the fifth, and for the rest of the game the Phillies made only one hit.

This example from 1924 is a bit closer to the modern idiom:

[July 2, 1924: YANKS AGAIN DIVIDE 2 WITH ATHLETICS] Hoyt had no time to warm up and he had but little of his best pitching stuff with him.

In 1950, the NYT still used scare quotes for some versions of this idiom, suggesting that it was still in the process of formation:

[January 13, 1950 (by Roscoe McGowan): Hartung, Lohrke in Giants Fold] One of Clint's weaknesses, in common with many other strong-armed young pitchers, was lack of control of his "good stuff" -- with the usual results.

The OED doesn't recognize this usage, but the AHD gives sense 5 for stuff as:

5. Sports a. The control a player has over a ball, especially to give it spin, english, curve, or speed. b. The spin, english, curve, or speed imparted to a ball: "where we could watch the stuff, mainly curves, that the pitchers were putting on the ball" (James Henry Gray).

This gloss doesn't entirely correspond to the current facts of usage, it seems to me. Stuff is often used as if it were the opposite of control; and pitchers always impart some amount of spin, english, curve or speed to their pitches, but they don't always have their stuff. But more to the point, it's not just any player who has stuff or imparts stuff to a ball -- no one seems to talk or write about the stuff that a batter puts on a ball, and I haven't seen stuff used to talk about a catcher's throws to second, or other non-pitching throws.

[Update: Benjamin Zimmer sent in some stuff citations back to 1910:

1910 _Washington Post_ 22 Feb. 8/4 Lack of control was Gray's only failing last season. He had as much stuff as any lefthander in the league, but when he got himself in a hole he would have to let up in order to locate the place, and under such conditions he was usually hit hard.

1910 _Chicago Tribune_ 3 Jun. 10/1 It was at this juncture that McIntire called into usage all his stuff and struck out both Frock and Collins.

1910 _New York Times_ 21 Oct. 8/5 Coombs pitched much the same kind of game as in Philadelphia last Tuesday. When he started out he was wild and didn't seem able to put all his "stuff" on the ball.

1910 _Washington Post_ 23 Oct. S1/8 In today's game Bender seemed to lose his stuff toward the finish, and the way the Cubs switched their plan of attack would seem to indicate that they figured that he was going. ... Cole worked his strike-outs in at the right time. He did not seem to have as much stuff as Bender, who had extreme speed and a finely breaking curve ball, up to the last few minutes.

]

[Update #2: Ben took the pitching stuff citations back to 1905:

1905 _Washington Post_ 27 May 9/1 Long Tom Hughes, on the other hand, had plenty of undecipherable stuff, and the Browns had no key to his hieroglyphic code. He pitched magnificently throughout.

And amazingly, he found some 1905-era examples of stuff referring to the spin that a batter is able to put on the ball:

1905 _Los Angeles Times_ 5 Jul II8/4 A good outfielder ... must know the kind of stuff the batsmen get off the delivery of the pitchers, so as to make allowance for the kind that curve in an eccentric manner as they shoot out toward the lots.

1906 _Washington Post_ 19 July 8/3 As Hahn got to it, the ball dropped fair then quickly turned with a reverse English curve and dashed under the stands. Not a batter in a million can put that reverse on there like "Cy." It had just enough stuff to accomplish "Cy's" fiendish purpose.

Ben also turned up some examples where the "stuff" that a pitcher puts on a ball is of the salivary variety:

1908 _Los Angeles Times_ 13 Feb. 7/5 Harry Howell, one of the original exponents of the spitball, says that he will continue to use the wet ball this year. "Deprive me of the wet ball," said Harry, "and I will be forced to quit the game. Regardless of the talk of Fielder Jones, Clarke Griffith, Chance, Jennings and the other leaders, you will find Walsh, Chesbro and myself using the same stuff the coming season."

Ben wonders whether the spitball is at least partly the linguistic source of pitching "stuff" -- as it surely was in some cases the physical source -- and suggests that "it would be interesting to see if the term shows up in other sporting contexts, e.g., cricket or billiards, to refer to the spin or 'English' placed on a ball". ]

[Update #3: Richard Hershberger emailed:

I set out to see if I could beat Benjamin Zimmer's 1910 citations of "stuff". I wasn't able to, but I found this interesting, from the June 8, 1910 New York Times, reporting on a game between the Giants and the Cardinals:

"St Louis not only worked all their baseball stuff to win, but they used all the chin music within reach."

"Stuff" here seems to be used in a more general sense, but I'm not sure if it means "general baseball skills" which could be better or worse, or if it is the common use of "stuff" with "baseball stuff" contrasted with "chin music". (The subsequent sentence makes clear that "chin music" means arguing with the umpire. Nowadays it more commonly means pitching the ball at or near the batter's head.) I am leaning toward the latter interpretation, which makes it not so much on point to the topic at hand, but still interesting.

]

Posted by Mark Liberman at 10:12 AM

Historically untracked

I think Lila Gleitman is right that "untracked" is an eggcorn for "on track" -- Arnold Zwicky has entered it as such in the eggcorn database. She is certainly also right that it's mainly a sports usage these days. Google News currently has 113 hits for {"get|got|getting|gets untracked"}, and every single one of them is in a sports story. Given how sports metaphors pervade language in other domains these days, this surprises me. I would have expected at least to see a couple of companies or government agencies getting untracked.

Lila's idea that the underlying metaphor is getting out of a rut makes sense. But something else is going on as well. The pattern of usage in sports stories makes it seem like a matter of warming up -- it's as if an individual or a team naturally starts out tracked (not that anyone ever says that), and then if things go right they can get untracked.

In any case, it seems that the history goes back further than Lila thought. The ProQuest Historical Newspapers Archive, which indexes the NYT back to 1857, finds an example of "got untracked" from 1927.

[August 5, 1927 (By John Drebinger): CLARK HALTS CARDS, ROBINS WINNING, 4-2] As might be expected, there had to be some peculiar baseball before the Flock family got itself untracked, and in the third it did the rather amazing thing of leading off with double and a single without scoring a run at all.

For some reason, the form "get untracked" doesn't occur until 1940. This one certainly seems to be a "get out of a rut" metaphor, describing problems running in mud:

[April 19, 1940: ROCHESTER VICTOR AT SYRACUSE, 6-1; Gornicki Restricts Chiefs to Four Hits in International League Inaugural Game ] The field was so heavy that several of the hits would have been easy outs if the fielders had been able to get untracked, and the deciding run, in the fourth inning, came as Crabtree tripled after Longacre fell trying to make the catch and Kurowski flied to right.

The earliest modern-style usage of the form "get untracked" in the NYT is from an AP wire story in 1946:

[September 14, 1946: DONS TRIP DODGERS, 20-14] The Los Angeles entry snapped into high for 14 points before the Dodgers could get untracked.

The earliest "get untracked" story by a Times sportswriter seems to be from 1947, describing a mile race:

[January 19, 1947; By Joseph M. Sheehan: MANHATTAN'S TEAM IS SURPRISE VICTOR IN A.A.U. TITLE MEET] Walsh, a step behind Hulse, was caught napping, and Quinn had seven yards on him before he could get untracked.

LexisNexis indexes the NYT only back to 1980, and finds several untracked usages from that year. There's one from an AP wire story in July:

[ July 6, 1980: SIMPSON WESTERN LEADER BY 5 (AP)] Bean, the 1978 Western champion, could not get untracked. He was in sand traps twice and the rough twice for a double bogey at No. 10. But he birdied the long 12th hole, and got another at No. 14 before falling back into trouble.

And again by an NYT reporter in September:

[September 22, 1980 -- By ED CORRIGAN, Special to the New York Times: Rutgers Overcomes Cincinnati by 24-7]

Rutgers sputtered through an uncomfortable first half today against the University of Cincinnati, a team that was supposed to offer only token resistance,
The Scarlet Knights, however, got untracked in the second half with Al Ray rushing for 114 yards and went on to defeat the Bearcats, 24-7, before a crowd of 17,800 in 82-degree heat at Rutgers Stadium.

The earliest of the (much less frequent) non-sports uses that I've found so far was a 1978 quote from Chemical Week, but I'm sure there must be earlier examples. I just don't have the patience to wade through all the sports examples to find them.

[August 2, 1978: Pretreatment isn't a treat for industry] Pretreatment of wastes discharged from industrial plants to municipal sewers for treatment at public utilities have been a concern of the CPI since EPA published its first proposed standard in July 1973. That concern seemed to wane when EPA appeared to be unable to get untracked on policy. But the agency was finally pushed by legal action by the Natural Resources Defense Council. The consent decree in that case resulted in EPA focusing on the 65 chemicals cited in the suit.

In some nonsports stories, untracked means "falling apart" rather than "getting it together". However, these never seem to involve "getting untracked" but rather "becoming untracked" and the like:

[ June 13, 1980: $13.6 BILLION CITY BUDGET IS VOTED] Partly because of those uncertainties and the requirement that the State Legislature approve some of the city's taxes, the Council added three days to the Mayor's normal nine-day veto period in the event the plan becomes untracked.

By the way, the OED so far knows only the "not furnished with a track or path" and "not tracked or traced" senses of untracked; Merriam-Webster's 3rd Unabridged has essentially the same two senses; and the AHD doesn't have it at all. Encarta had nothing, and suggested helpfully (though bizarrely, in my opinion) that I might be interested in unfrocked. It's curious that such a common usage is lexicographically ignored -- I wonder if sports terms in general are similarly underdictionaried, and if so, why?

Posted by Mark Liberman at 09:00 AM

Stottlemyre's Save Helps Brown Gets Untracked

Above you see (I kid you not!) a headline which I copied from the NY Times Sports page.   It appears to have an egregious grammar error ("gets") unless there's a parse that I'm just not seeing. And yes, this was printed on one line, so it couldn't be that the line space was meant to serve the place of a punctuation indicator, like, "Stottlemyre's save helps.  Brown gets untracked."  Maybe this would indeed in some ways preserve the sense of what's said in the article. Though Stottlemyre is the pitching coach and Brown is the pitcher, so I guess even this wouldn't make sense.

But I mainly mean to point to the word "untracked."   I first thought I heard this when I moved to Philly in about 1960 and was listening to an announcer call a hockey game:  "If the Flyers don't get untracked in the next period, they're gonna lose this game."  Everyone said I had misheard and that the announcer must have said "on track."   I have heard this more and more often in recent years, yet many people still tell me there's "no such word" and that nobody ever says that.  Hooey, as you see. Note that "getting on track" and "getting untracked" are both positive states of affairs though at first hearing they would seem contradictory.   I believe, though, that the earlier locution alluded to railroads (where it's good to be on track) and the later urban analogy must have been to getting your car wheel stuck in the trolley tracks (a condition from which you'd like to recover, thus getting "untracked" would again be good).   After all, there's the old "in a rut" which clearly comes from muddy dirt roads and wagon wheels and is a parlous condition.

So I consider a language change to be complete when it shows up IN PRINT in the NY Times.   Especially in a headline.  Or maybe they're trying to be bloggy?

Posted by Lila Gleitman at 07:38 AM

May 10, 2005

What I currently know about which and that

Following on my postings about that, which, Bellow, and Safire and about the Committee for the Certification of Good Writing, people have been writing me about versions of the That Rule in the style sheets of institutions that should really know better.  Roger Shuy, in particular, wrote, somewhat sheepishly, about his sheeplike compliance for one of these institutions.  In reply, I summarized for him what I currently know, and think, about the That Rule, and now I'm passing a version of this summary on to you.

Two of my correspondents have dealt with large, famous, venerable, and transatlantic university presses that publish extensively in linguistics (actually, I've dealt with these presses too, but I didn't have their style sheets to hand).  In exhibit #1, which came to me on 5/3/05, the style sheet says sternly:

"That" will be used with a restrictive clause; "which" will be used with a nonrestrictive clause and set off by commas...

This one comes to me from someone who has done copyediting for the U.S. office of one of these large, famous, etc. presses.  I'm keeping my correspondent's name confidential, just in case there's a chance of more copyediting gigs in the future.

In exhibit #2, dated 5/8/05, the safely nameable Roger Shuy passes on a style sheet from the other of these transatlantic academic behemoths, a style sheet in which we are (merely) advised:

Generally use "that" not preceded by a comma, in essential (or restrictive) clauses (i.e., clauses that are essential to the meaning of the sentence) and "which," preceded by a comma, in nonessential (or nonrestrictive) clauses (i.e., clauses that can be omitted without altering the meaning of the sentence).

This is just to show that the That Rule lives on even in the genteel ivied walls of academia, not just in PSAT prep manuals and the like.

So, what do I currently know about the justification for the That Rule?

  1. The usual claim (now, although this was not Fowler's claim in 1926) is that the That Rule avoids ambiguity between the two sorts of relative clauses -- or, in some muddier formulations, that it avoids "unclarity".
  2. Taken at face value, this claim is just silly.  The guidelines in exhibits #1 and #2 recommend marking the distinction in TWO ways -- by punctuation, and by the choice of that (vs. which).  But the punctuation ought to be sufficient, and it's required in other cases (see below) where the that/which choice is not involved.  In any case, there should be no chance of ambiguity, or any other sort of unclarity, so long as the relative clause is properly punctuated.
  3. So, apparently, the ambiguity justification really is cogent only if people don't punctuate correctly.  Well, plenty of writers are not especially competent at punctuation; in particular, commas with restrictive which AND that are both pretty common (not in material from "good writers", but in, say, student writing).  So what the That Rule does is try to fix the punctuation problem by having the punctuation follow automatically from the choice of relative marker.  It seems to me it would make more sense to get at the punctuation directly; in fact, as I argue below, it's necessary to get at the punctuation directly.
  4. One baleful consequence of trying to use the that/which choice to fix a punctuation problem is that students are likely to learn the alternative rule, the Relative Punctuation Rule: use a comma with which, no comma with that.  Now, since virtually everybody uses some restrictive which in speaking and writing, this alternative rule actually INDUCES incorrect punctuation with restrictive which.

    (It's one thing to promulgate a rule.  How your formulation will be understood by your audience is quite another matter.  The universe of "rules" of grammar and usage is peppered with unintended consequences.)

    The underlying problem is that almost every English speaker's Sprachgefühl occasionally calls for restrictive which, and then the Relative Punctuation Rule can lead people to mispunctuate.  (For the brief story on restrictive which, see the MWDEU entry for that; the studies mentioned there, and now a number of others, show that edited prose contains quite a lot of restrictive which.)

    However, there are at least two cases where the That Rule isn't sufficient to fix the punctuation of relative clauses.

  5. The first case involves relative markers in combination with prepositions.  Contrast:
     The only case of which I have direct knowledge occurred in 1972.
     The only case, of which I have direct knowledge, occurred in 1972.

    Here, the punctuation alone marks the restrictive/nonrestrictive distinction, since (as is well known) that is unacceptable in combination with a preposition:

    *The only case of that I have direct knowledge...
    *The only case, of that I have direct knowledge,...
  6. If you insist that restrictive relatives must ALWAYS have that rather than which, then the only option open to you is to strand the preposition:
    The only case that I have direct knowledge of...
    Unfortunately, the same manuals that insist on restrictive that also recommend against preposition stranding.

    The That Rule could, of course, be altered to exclude this case, by saying something like:

    That Rule 2: Use that for restrictive relatives, unless the marker is the object in a prepositional phrase; otherwise, use which.

    I assume that something like That Rule 2 is what the manuals have in mind, though how a student is supposed to figure that out, I have no idea.

  7. The second case involves relative who(m).  If there is ambiguity or unclarity with relative which, the same ambiguity or unclarity exists with relative who(m).  The following contrast is made entirely by punctuation:

    The only cheater who(m) I know is Kim.
    The only cheater, who(m) I know, is Kim.
  8. You might expect the manuals to insist on restrictive that here as well as in the earlier cases; it would, after all, fix any problem of unclarity.  And restrictive that is available:

    The only cheater that I know is Kim.
    Unfortunately, the same manuals that insist on restrictive that in non-human relatives disfavor it in human relatives.

    The That Rule could, of course, be altered to explicitly apply only to non-human relatives:

    That Rule 3: Use that for non-human restrictive relatives, unless the marker is the object in a prepositional phrase; otherwise, use the appropriate wh-word.

    Again, I assume that something like this is what the manuals have in mind, though, again, I don't know how students are supposed to figure that out.
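To see how many conditions That Rule 3 packs into one sentence, here is a minimal Python sketch of it as a decision function. The function name and its three boolean parameters are invented for illustration; they are one reading of the rule as worded above, not anything the manuals themselves provide.

```python
def relative_marker(restrictive, human, object_of_preposition):
    """Pick a relative marker following That Rule 3 as worded above.

    All three flags are hypothetical inputs: a real checker would have
    to work them out from the sentence, which is the hard part.
    """
    # "Use that for non-human restrictive relatives, unless the marker
    # is the object in a prepositional phrase"
    if restrictive and not human and not object_of_preposition:
        return "that"
    # "otherwise, use the appropriate wh-word"
    return "who(m)" if human else "which"


if __name__ == "__main__":
    print(relative_marker(True, False, False))   # restrictive, non-human
    print(relative_marker(True, False, True))    # "of which"
    print(relative_marker(True, True, False))    # human restrictive
```

Three inputs are needed just to choose one word, and none of them can be read off the page without already understanding the sentence.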

    I'd guess, by the way, that the standard That Rule actually ENCOURAGES people to use that in human relatives (against the intentions of the manual writers).  More unintended consequences.

  9. So far I've stated the rule as, literally, a prescription: advice about which words to use where.  Viewed this way, the rule is baroque.  But, actually, the manuals (in this case and most others) provide prescriptions only secondarily.  Their primary purpose is to issue PROSCRIPTIONS; the prescriptions are fixes for the bad stuff.

    The proscription is one that's pretty easy to check for mechanically:

    Which Hunting Rule: Look for a noun immediately followed, without a comma, by which; the which probably should be replaced by that.

    (There are a few cases where this sequence doesn't involve a relative clause, as in "I asked the man which (one) he wanted", so you can't be COMPLETELY mechanical about it.)  The near-mechanical character of the rule in this form undoubtedly has contributed much to its popularity on tests of "grammatical competence", its endurance in style sheets, and its incorporation into "grammar checkers".  An actual STYLISTIC CHOICE would involve judgment.
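As a rough illustration of how near-mechanical the Which Hunting Rule is, here is a minimal Python sketch. It makes two simplifying assumptions that are not in the rule itself: any word stands in for "noun" (a real test would need a part-of-speech tagger), and a short, hand-picked list of prepositions is skipped so that "of which" and the like are left alone. The names `hunt_which`, `PREPOSITIONS` and `WHICH_AFTER_WORD` are invented for this sketch.

```python
import re

# Assumption: a small, incomplete list of prepositions to skip.
PREPOSITIONS = {"of", "in", "on", "to", "for", "with", "by", "at", "from", "about"}

# A word, then whitespace only (so no comma), then "which".
WHICH_AFTER_WORD = re.compile(r"\b([A-Za-z]+)\s+(which)\b", re.IGNORECASE)


def hunt_which(text):
    """Return (preceding_word, offset_of_which) for each candidate."""
    hits = []
    for match in WHICH_AFTER_WORD.finditer(text):
        word = match.group(1)
        if word.lower() in PREPOSITIONS:
            continue  # "of which", "in which": not what the rule targets
        hits.append((word, match.start(2)))
    return hits


if __name__ == "__main__":
    for sentence in (
        "The only case which I know occurred in 1972.",
        "The only case, which I know, occurred in 1972.",
        "I asked the man which one he wanted.",
    ):
        print(sentence, hunt_which(sentence))
```

The third sentence is flagged even though it contains no relative clause, which is the sort of case where judgment still has to step in.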

    Note that the Which Hunting Rule works well only when the writing you apply it to is generally punctuated correctly.  Which is almost always the case for the material that the venerable transatlantic presses' style sheets are applied to.  That is, this fairly mechanical rule can be used only to alter the writing of people who are in fact making a stylistic choice between which and that; the effect of the rule is to limit this choice.  That's why I object to it.

    As for those who are less competent writers than I am, I think their problem in this domain is mostly one of punctuation, and that's where they need help.  Trying to package the punctuation together with the choice of relative marker is a really bad idea.

zwicky at-sign csli period stanford period edu


Posted by Arnold Zwicky at 08:25 PM

A lot depends on how you frame it

Flash! Surprise! Gay men are attracted to the scent of other men!

While this is scarcely astonishing -- at least to gay men, who routinely report the subjective experience of such an attraction -- it's nice to have some preliminary lab results supporting the claim. But some of the reports are framed not the way I just did, but in terms of a similarity between gay men and women, thus reflecting (and also advancing) the widely held position that gay men are, psychologically and even neurologically, female.

The story made the front page of today's (5/10/05) New York Times, under the headline "For Gay Men, Different Scent of Attraction" (by Nicholas Wade), which frames things in terms of gay men's differences from men in general, that is, from straight men. Another piece of ideology about gender and sexuality: gay men fail to achieve, or reject, normative masculinity. The first paragraph slightly re-frames things, with gay men viewed not as different from men in general, but with gay and straight men merely viewed as different from one another. And then comes the comparison of gay men to women:

Using a brain imaging technique, Swedish researchers have shown that homosexual and heterosexual men respond differently to two odors that may be involved in sexual arousal, and that the gay men respond in the same way as women.

The effect is subtle, since the claims "gay men respond differently from other men", "gay men and straight men respond differently", "straight men respond differently from gay men" (whoa!), "straight men and gay men respond differently" (small whoa!), "gay men and women respond in the same way", "women and gay men respond in the same way" (another small whoa!), "women respond like gay men" (big whoa again!), and "gay men respond like women" are effectively equivalent in context. Yet they differ in which group is taken as the reference class, as expressed in the "from X" or "like X" phrase, and in which group the claim is about, as expressed by which group is mentioned first. There's no totally neutral version, but some are more neutral than others.

The world of discourse about gender is full of such subtle effects, many of them well known. My all-time favorite is Stuart Flexner's claim of 45 years ago that women use less slang than men. Aside from the problem that the claim is apparently about all people, women and men in general (how on earth would you test that?), the problem of figuring out what counts as slang, and the problem of figuring out how to quantify slang use across all occasions of speaking and writing, there are two, interrelated, framing problems: men are taken as the reference class, and the claim is about women (the claimed effect is the result of something that women do). Like I said, the effect is subtle. But likely to be more consequential because of that; every time the claim is repeated in this form, a particular gender ideology is reinforced, and no one is consciously aware of what's going on.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:36 PM

Darwinian evolution is so over

No, I'm not talking about the hearings in Kansas. The topic is Freeman Dyson's article "The Darwinian Interlude" in the March issue of Technology Review, which I just got around to reading yesterday.

Dyson's idea is that the first period of life on earth was a sort of bacterial version of primitive communism,

a golden age of pre-Darwinian life, during which horizontal gene transfer was universal and separate species did not exist. Life was then a community of cells of various kinds, sharing their genetic information so that clever chemical tricks and catalytic processes invented by one creature could be inherited by all of them.

Then, Dyson writes,

one evil day, a cell resembling a primitive bacterium happened to find itself one jump ahead of its neighbors in efficiency. That cell separated itself from the community and refused to share. Its offspring became the first species. With its superior efficiency, it continued to prosper and to evolve separately.

With individualism came sex -- negotiated bilateral genetic commerce -- and hierarchy -- structured multicellular organisms -- and "brains, which opened a new world of coördinated sensation and action, culminating in the evolution of eyes and hands". And now, those brains have allowed us to return to genetic communism:

[T]he Darwinian era is over. The epoch of species competition came to an end about 10 thousand years ago when a single species, Homo sapiens, began to dominate and reorganize the biosphere. Since that time, cultural evolution has replaced biological evolution as the driving force of change. Cultural evolution is not Darwinian. Cultures spread by horizontal transfer of ideas more than by genetic inheritance. Cultural evolution is running a thousand times faster than Darwinian evolution, taking us into a new era of cultural interdependence that we call globalization. And now, in the last 30 years, Homo sapiens has revived the ancient pre-Darwinian practice of horizontal gene transfer, moving genes easily from microbes to plants and animals, blurring the boundaries between species. We are moving rapidly into the post-Darwinian era, when species will no longer exist, and the evolution of life will again be communal.

In other words, Intelligent Design starts here? Not everyone is confident that the human designers will be intelligent enough; and presumably nature will continue to act and react; but maybe the Kansas state board of ed should call Dyson to testify. Though somehow, I don't think his brand of anti-Darwinism is quite what they're looking for.

Anyhow, it's curious how people like to force scientific theories into the shape of political metaphors. I guess Dyson's little bio-marxist riff is a fair reaction to a hundred and fifty years of social darwinism.

Dyson is taking off from Carl Woese's "A New Biology for a New Century", Microbiology and Molecular Biology Reviews 68(2) pp. 173-186. Woese's thesis is not anti-Darwinian, but rather anti-reductionist, or maybe post-reductionist:

Biology today is at a crossroads. The molecular paradigm, which so successfully guided the discipline throughout most of the 20th century, is no longer a reliable guide. Its vision of biology now realized, the molecular paradigm has run its course. Biology, therefore, has a choice to make, between the comfortable path of continuing to follow molecular biology's lead or the more invigorating one of seeking a new and inspiring vision of the living world, one that addresses the major problems in biology that 20th century biology, molecular biology, could not handle and, so, avoided. The former course, though highly productive, is certain to turn biology into an engineering discipline. The latter holds the promise of making biology an even more fundamental science, one that, along with physics, probes and defines the nature of reality. This is a choice between a biology that solely does society's bidding and a biology that is society's teacher.

Dyson's bio-marxism is his own overlay on Woese, who doesn't stress any similar rhetorical opposition of communal vs. individual in his article, though he does suggest that there was a "Darwinian threshold or Darwinian transition" which "took the cell out of its initial primitive state in which HGT [Horizontal Gene Transfer] dominated the evolutionary dynamic (and evolving cells had no stable genealogical records and evolution was communal) to a more advanced (modern) form (where vertical inheritance came to dominate and stable organismal lineages could exist)." Woese draws the lesson that we should "hold classical evolutionary concepts up to the light of reason and modern evidence before weaving an evolutionary tapestry around them. Most of them will turn out to be fluid conjectures that 19th century biologists used to stimulate their thinking, but conjectures that have now, with repetition over time, become chiseled in stone: modern concepts of cellular evolution are effectively petrified versions of 19th century speculations." And he concludes that

The molecular cup is now empty. The time has come to replace the purely reductionist "eyes-down" molecular perspective with a new and genuinely holistic, "eyes-up," view of the living world, one whose primary focus is on evolution, emergence, and biology's innate complexity. (Note that this does not mean that the problems worked on in any new representation of biology will not be addressed by customary molecular methodology; it is just that they will no longer be defined from molecular biology's procrustean reductionist perspective.) [emphasis added]

Though this may be a minority view within biology -- just as molecular biology perceived itself as an embattled minority in 1963, when I had a summer job in a molecular biology lab -- it's by no means an isolated or heretical perspective. I recently heard a form of this same message from David Searls, SVP Worldwide Bioinformatics at GSK, in his presentation at a symposium on Formal Grammars, DNA and Linguistic Theory. David suggested, in effect, that we now understand the biological equivalents of phonology, morphology, syntax and even semantics, and therefore are ready to face the challenge of understanding whole biological "discourses" in organisms, ecologies and societies.

Posted by Mark Liberman at 11:23 AM

Realistic surrealism

Jean-Frédéric Jauslin, director of the Swiss Office Fédéral de la Culture ("Federal Culture Office"), needs to learn to do research on the web, or to use a calculator, one or the other. Or perhaps he just needs a little more common sense and a little less arrogance. All of this, of course, is supposing that he was quoted accurately by the reporter from silicon.fr. In such cases, my normal rule of thumb is to blame the journalist, but I might need to make an exception for culture bureaucrats.

Jauslin was apparently an observer at the recent meeting of EU Culture ministers. In any case, he has cast his lot with the Chirac/Jeanneney initiative, saying in a press conference that "la Bibliothèque nationale suisse pourrait participer aux futures réalisations européennes en matière de bibliothèques virtuelles pour contrer le projet de Google" ("the Swiss National Library could participate in future European activities in the area of virtual libraries, to counter Google's project").

The French geekoid publication silicon.fr quotes Jauslin as echoing the now-usual sentiments about the "danger non négligeable pour la pluralité culturelle" ("non-negligible danger for cultural pluralism") and the "risque de prédominance de la notion de profit et de l'anglais" ("risk of predominance of the idea of profit and of English"). However, he goes that one chin-pull too far:

Il modère ainsi l'enthousiasme légitime de son confrère français le président de la Bibliothèque nationale Jean Noël Jeanneney à l'initiative du projet européen, en déclarant qu'une "numérisation systématique est surréaliste". En effet, numériser 100.000 pages par jour prendrait pas moins de... 400 ans!

He thus moderates the legitimate enthusiasm of his French colleague, the president of the National Library Jean Noël Jeanneney, for the initiative of the European project, declaring that "a systematic digitization is surrealist". In fact, to digitize 100,000 pages per day would take not less than... 400 years!

Well, "systematic digitization" might well be surrealist, but it's not unrealistic.

The picture on the right shows the Digitizing Line automated book scanner. The page turning system is made by the Swiss company 4digitalbooks, and the scanning system and camera by the French company i2s. It can scan 30 pages a minute -- "up to 1500 pages per hour in unattended operation", with "constant process reliability operating 24 hours a day", according to the maker. One of these machines was at the center of the Stanford digital library project where Larry Page and Sergey Brin got their start, before they went off to found Google, according to a May 12, 2003, NYT story, a paragraph of which can be found for free here.

Another "robot scanner", made by Kirtas Technologies of Rochester, NY, claims 1,200 pages per hour combined with "ultra-gentle handling". This machine is said to be "about the size of a small kitchen refrigerator" (much smaller than the Swiss unit, which is described as "the size of an SUV"), able to be "easily moved between locations", and (as of 1/15/2004) on sale for "$150,000 a pop".

The Google Library Project page says that "We have developed innovative technology to scan the contents without harming the book". This may imply that Google has sponsored the design of some new machines, which presumably are at least as fast as the 4digitalbooks and Kirtas products; or it may be that whoever wrote that page is using "we" in a rather inclusive sense.

Anyhow, let's do the arithmetic. The Google Library Project has been quoted as aiming at 4.5 billion pages -- the content of 15 million 300-page books -- for a cost of $150-200M, over a number of years. 1,200-1,500 pages per hour is 28,800 to 36,000 pages per day -- let's bring that down to 25,000 pages per day, to allow for maintenance and whatnot. Then 4.5 billion pages is 180,000 machine-days, or 493 machine-years. Perhaps that's where Jauslin (or the silicon.fr reporter?) gets his sneering estimate of "not less than... 400 years".
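For anyone who would rather check this with a script than a calculator, here is a minimal Python sketch of the same arithmetic. Every figure comes from the paragraph above; the variable names are just labels chosen for the sketch.

```python
PAGES_TOTAL = 4_500_000_000      # 15 million books x 300 pages
PAGES_PER_MACHINE_DAY = 25_000   # rounded down to allow for maintenance

# Raw throughput range quoted by the two scanner makers, per 24-hour day.
raw_low = 1_200 * 24
raw_high = 1_500 * 24

machine_days = PAGES_TOTAL / PAGES_PER_MACHINE_DAY
machine_years = machine_days / 365

print(f"raw throughput: {raw_low:,} to {raw_high:,} pages per day")
print(f"{machine_days:,.0f} machine-days, about {machine_years:.0f} machine-years")
```

The result is a count of machine-years, not calendar years: it says how long one scanner would take working alone.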

OK, let's round 493 up to 500. There are 25 EU countries. If they split the chore among them, that would be 20 machine-years each. If each of them had 20 machines, they could do it in a year.

There are five libraries participating in Google's project, so for them, it reduces to 100 machine-years per library. If they spread the work over five years, each would need 20 machines "the size of a small kitchen refrigerator". I'm sure that each of those libraries already owns and operates many more than 20 copiers of that size or larger.

At the 1/2004 quantity-one price quoted by Kirtas, this would require $15M for the 100 scanners. But this is not an unduly large proportion of the budgeted $150-200M, and surely Google will get a volume discount. More likely, Google will be able to take advantage of economies of scale in other, more serious ways.
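The division of labor in the last three paragraphs can be sketched the same way. This is a rough Python illustration using only the figures quoted above; the helper `machines_per_party` is a name invented here, and the even split among parties is the same simplifying assumption the text makes.

```python
import math

TOTAL_MACHINE_YEARS = 500        # 493 rounded up, as in the text
PRICE_PER_SCANNER = 150_000      # quoted 1/2004 quantity-one price, USD


def machines_per_party(total_machine_years, parties, calendar_years):
    """Scanners each party needs to finish an equal share in the given time."""
    share = total_machine_years / parties
    return math.ceil(share / calendar_years)


# 25 EU countries, done in one year.
eu_machines = machines_per_party(TOTAL_MACHINE_YEARS, parties=25, calendar_years=1)

# 5 libraries in Google's project, spread over five years.
library_machines = machines_per_party(TOTAL_MACHINE_YEARS, parties=5, calendar_years=5)

capital_cost = 5 * library_machines * PRICE_PER_SCANNER

print(eu_machines, library_machines, f"${capital_cost:,}")
print(f"share of budget: {capital_cost / 200e6:.1%} to {capital_cost / 150e6:.1%}")
```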

I found all this out in less than half an hour of searching on line, and did the math in a few seconds. I'm sure that M. Jauslin has subordinates who know how to use Google and a calculator as well as I do, but apparently it never occurred to him to ask them to deploy their skills.

Of course, the EU sometimes seems to run according to a different system of arithmetic. According to this recent article on book digitization technology for European libraries at another French geekoid publication, 01net:

C'est dans ce contexte qu'Infotechnique, une filiale de Getronics spécialisée dans la gestion électronique des documents, notamment pour le compte de l'Union européenne, vient d'inaugurer Eurodema (pour « Europe dématérialisation ») à La Walck, à 40 kilomètres de Strasbourg. Le premier contrat d'ampleur engrangé par ce centre porte sur la numérisation des 32 millions de pages issues des livres d'actes notariés accumulés en Alsace-Moselle depuis plus d'un siècle. Montant de l'addition : 23 millions d'euros, facturés au Gilfam, le groupement d'intérêt public constitué par les départements du Bas-Rhin, du Haut-Rhin et de la Moselle.

It's in this context that Infotechnique, a subsidiary of Getronics specialising in the electronic administration of documents, especially for the European Union, has just inaugurated Eurodema (for "Europe Dematerialization") at La Walck, 40 kilometers from Strasbourg. The first large contract collected by this center deals with the digitization of 32 million pages of books of certificates (?) notarized in Alsace-Moselle over the past century. Adding up the bill: 23 million euros, billed to Gilfam, the public-interest group made up of the departments of Bas-Rhin, Haut-Rhin and Moselle.

This looks like an extraordinarily good deal to me -- for Infotechnique!

32 million pages for 23 million euros -- €0.72/page = $1.05/page. If I could get that contract, I'd be tempted to take a leave from Penn and do the job myself. I often scan articles and book chapters to put on reserve for students in my courses. Using my cheap, unautomated commodity scanner and Adobe Acrobat, I generally allow for a rate of 2 scans per minute. For most book formats, each scan is two pages, so I can do 240 pages per hour. Thus at Infotechnique's rate I could earn $252/hour, which I view as a pretty good wage. Since 32 million pages would get tiresome, even at that sort of rate, I'd be happy to split the work with some colleagues and friends.

And in fact we could do much better for ourselves. We could invest in one of those Kirtas scanners for $150K. Then all we need to do is load a new book in, one every 15 minutes or so, and the scanner would earn us up to $28,800 per day, making its cost back in less than a week. So we could easily buy several such scanners. At the rate of 25K pages per day, the whole job would take 1280 machine-days. With four machines, and (say) a dozen congenial partners to do the work in shifts -- in a nice place, with all the amenities -- we could do it in a year, and divide more than $29M among us, or almost $2.5M each.

Well, I know that there would be other costs. Let's allow $1M for renting a tasteful chateau in the neighborhood, and another $1M for spares, supplies, utilities, legal fees and whatnot. The split would still be about $2.25M each. Now if only Jean Véronis had kept his eye on the politico-digital-library ball, rather than doing all that clever reverse engineering of indexing methods! Or perhaps Chris Waigl might have been cultivating contacts in Alsace-Moselle rather than planting eggcorns...
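The chateau business plan holds up when run through a few lines of Python. This is only a sketch: the dollar figures are the round numbers used in the last two paragraphs, and the variable names are made up for the occasion.

```python
JOB_PAGES = 32_000_000
PAGES_PER_MACHINE_DAY = 25_000
MACHINES = 4
PARTNERS = 12
REVENUE_USD = 29_000_000        # the "more than $29M" above, taken as a floor
OVERHEAD_USD = 2_000_000        # chateau, plus spares, supplies, fees and whatnot
SCANNER_PRICE_USD = 150_000
DAILY_EARNINGS_USD = 28_800     # the "up to" figure quoted above

machine_days = JOB_PAGES / PAGES_PER_MACHINE_DAY
calendar_days = machine_days / MACHINES
payback_days = SCANNER_PRICE_USD / DAILY_EARNINGS_USD
gross_share = REVENUE_USD / PARTNERS
net_share = (REVENUE_USD - OVERHEAD_USD) / PARTNERS

print(f"{machine_days:.0f} machine-days, {calendar_days:.0f} days with {MACHINES} machines")
print(f"scanner pays for itself in {payback_days:.1f} days")
print(f"per partner: ${gross_share:,.0f} gross, ${net_share:,.0f} net")
```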

Note that Google is projecting 4.5 billion pages for $150-200M -- between $.033 and $.044/page. If we call it $.04/page, that's 26 times cheaper than Infotechnique's rates. At Google's estimated prices, I'd only earn $9.60/hour with my old HP flatbed scanner, which would not tempt me, though many honest and respectable people work for less. In fact, come to think of it, it's almost $10/hour more than Language Log contributors get...
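The per-page comparison can be checked with a short Python sketch. The $1.05 per page for Infotechnique is taken as stated earlier in this post (the exchange rate behind it is not given there, so none is assumed here), and the other inputs are the figures quoted above.

```python
GOOGLE_PAGES = 4_500_000_000
GOOGLE_BUDGET_USD = (150e6, 200e6)
GOOGLE_ROUND_RATE = 0.04               # "if we call it $.04/page"
INFOTECHNIQUE_USD_PER_PAGE = 1.05      # as stated earlier in the post
HAND_PAGES_PER_HOUR = 2 * 2 * 60       # 2 scans a minute, 2 pages per scan

low_rate, high_rate = (budget / GOOGLE_PAGES for budget in GOOGLE_BUDGET_USD)
ratio = INFOTECHNIQUE_USD_PER_PAGE / GOOGLE_ROUND_RATE
wage_at_google_rate = HAND_PAGES_PER_HOUR * GOOGLE_ROUND_RATE
wage_at_infotechnique_rate = HAND_PAGES_PER_HOUR * INFOTECHNIQUE_USD_PER_PAGE

print(f"Google: ${low_rate:.3f} to ${high_rate:.3f} per page")
print(f"Infotechnique is about {ratio:.0f} times dearer")
print(f"hand scanning: ${wage_at_google_rate:.2f}/hour vs ${wage_at_infotechnique_rate:.2f}/hour")
```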

[Note: I recognize that the Alsace-Moselle contract may well involve all sorts of special and labor-intensive circumstances. Perhaps, for example, large numbers of hand-written documents need to be transcribed and edited; perhaps a textual (as opposed to image) form of the output needs to be certified by notaries; etc. All this might mean that the contract is not outrageously padded, but just atypical. However, the 01net article does not mention any considerations of this kind, and instead presents this contract as a representative sample of the virtual library activities to come...

And as further evidence of sometimes-odd EU arithmetic in this general area, you can refer to my earlier discussion of on-going digitization efforts at Jeanneney's BNF, where a decade of work seems to have resulted in fewer than 1,500 books processed, or some 18 days' work for one of the automated scanners. ]

Posted by Mark Liberman at 08:34 AM

The Dan Brown Beat

With Geoff Pullum understandably distracted by the activities of the religious fanatics in Kansas, it looks like it falls to me to take up the slack on the Dan Brown beat. A Mr. Lewis Perdue is claiming that The Da Vinci Code improperly draws on his novels Daughter of God and The Da Vinci Legacy. In response to his claim, much-maligned author Dan Brown and his publisher, Random House, have sued for a declaratory judgment that The Da Vinci Code did not infringe Perdue's copyrights. Mr. Perdue has countersued, adding as defendants divisions of Sony Pictures and Columbia Pictures that are at work on a movie based on the book.

A good deal of information about the dispute is available. Lewis Perdue's web site has links to many of the legal documents as well as reviews of his books and other relevant material. He's also got a blog. This case seems likely to prove a good deal more entertaining than The Da Vinci Code itself.

Part of the basis for Perdue's case is a comparison between The Da Vinci Code and Lewis Perdue's books by the Forensic Linguistic Institute. The report presents quite a few similarities between Brown's book and Perdue's, but I was disappointed, in light of the name of the Institute, to find that there are no linguistic similarities. Not having read any of Mr. Perdue's books I can't say for sure, but it may well be that he can take comfort in this. The lack of notable linguistic similarities could be due to Perdue being a better writer.

According to Saturday's New York Times (B11), Judge George B. Daniels has acceded to the urgings of the lawyers for both parties to read the three books at issue in their entirety rather than relying on the excerpts in the court filings. I'm sure that I speak for all of us at Language Log in wishing iron-boweled Judge Daniels the best of luck in surviving the ordeal to which he has submitted himself.


Posted by Bill Poser at 02:38 AM

Lost in Translation

In an effort to prop up the use of Irish, the government in the Gaeltacht, the region in which Irish is still in common use, has replaced the previously bilingual road signs with monolingual signs. According to this article in the Vancouver Sun (which you probably can't read unless, like me, you subscribe), this is causing problems for many tourists, who are no longer able to find their way. Many placenames are so different in their written Irish and English forms that people equipped with English-language maps can no longer navigate. The reporters encountered two busloads of lost French tourists who reported that they had passed through a village but had been unable to find out its name.

There is an amusing aspect to this, but it illustrates the difficulty of preserving endangered languages. Taking down English-language road signs reduces the amount of English in the environment and may thereby give a bit of support for Irish. At the same time, it risks discouraging tourism and other activities that may be important for maintaining the local economy, and with it, the infrastructure that supports the language.

Posted by Bill Poser at 01:57 AM

May 09, 2005

Kansas: a joke?

To my utter amazement, some people out there in cyberspace read my post on linguists boycotting intelligent design hearings in Kansas and thought its claims were serious and true. I really shouldn't have to do this (there is nothing so plonkingly dull), but I think I'm going to have to carefully explain my joke. Sigh.

I'm not making this up: There really were people who were taken in. Follow these links, and read not just the main posts but, especially, some of the comments:

http://pharyngula.org/index/weblog/comments/also_god_speaks_english/

http://forums.projectam.com/index.php?showtopic=8885

http://discuss.joelonsoftware.com/default.asp?off.9.121435.15

http://raath.org/v5/archives/2005/05/signs_of_intell.html

http://teflsmiler.typepad.com/weblog/2005/05/the_crazies_are.html

Some of these people definitely assumed (having perhaps not read carefully enough before they punched the Submit Comment button) that my post was a serious news report, or at least were initially undecided about whether or not I was serious (Mark tried to clarify things by doing a jokey follow-up post about Chomsky testifying for the anti-evolutionists, but knew enough by this time to add a statement that he was joking).

My post was drafted by simply copying and pasting actual current articles about the Kansas state board of education and falsifying them by inserting references to linguistics (insert "linguistic" before "evolution"; change "scientists" to "linguists"; change "the origin of life" to "the origin of the English language"; and so on). Yes, it was a joke.

The basis, though, was real news. The Kansas board of education really is under the control of conservatives again, and they really are determined to sneak religiously motivated anti-Darwinism into the curriculum again (they tried it six years ago), and they really are holding hearings this month on whether to require the teaching of intelligent design as an alternative view to natural selection in biology classes in the Kansas schools. Because the story about all this had just broken, I thought people would see what I was doing with my merry prank.

And scientists are boycotting the hearings: the American Association for the Advancement of Science, in particular, declined in writing to dignify the hearings with an official representative. They apparently don't want to lend any credence to the posturing of intelligent design advocates (who never publish papers in science journals clarifying their view or supporting it with evidence, but spend all their efforts on trying to sway the public, exerting influence on school boards, lobbying gullible politicians, etc. — not what scientists do with a serious theory).

I did use some real names in my spoof: Alexa Posny and George Griffith are genuine members of the Kansas education community, and I took their names from the real press coverage, e.g. the widely quoted UPI story repeated or excerpted all over the place in sources like Science Daily. But all the quotes from linguists are fake (Mark Liberman exists, but is not the utterer of any of the quotes I stuck mischievously in his mouth; the most recent president of the Linguistic Society of America really does work at the University of New Mexico, but she was not approached for comment).

"Immanuel Quierbaiter" is a pure invention, but of course he is designedly reminiscent of a real resident of Kansas, the ghastly Pastor Fred Phelps of Topeka, the bitter old maniac who took a busload of supporters from Kansas to the funeral of murdered gay student Matthew Shepard in Wyoming so that they could confront Matt's grieving mother with GOD HATES FAGS placards (how tasteful; how Christian). Phelps is (as far as I know) quite irrelevant to the current resurgence of the anti-evolution forces in Kansas schools, but he must bear some of the guilt if Kansas is becoming more generally a byword for intolerance, cruelty, and 19th-century intellectual attitudes.

It is not fair, of course, that the whole state of Kansas should be stereotyped as a place of cruel, homophobic, atavistic, bible-thumping, anti-scientific crackpots when most of the state's people are (I've been there) perfectly sensible. What's my excuse for contributing to such stereotyping? I believe that when people start pulling damaging political stunts like the one the conservatives on the board of education are currently pulling, risking irreparable harm to their own state's reputation for quality education, it will be marginally better for Kansas if we laugh at them than if we simply weep for America.

Posted by Geoffrey K. Pullum at 05:05 PM

May 07, 2005

Not to or to not

No one here at Language Log Plaza seems to have commented on the cartoon in The New Yorker of 4/18/05, p. 14, in which one woman says to another, on the street, walking past a restaurant, "I'm moving to France to not get fat."

It's a virtually obligatorily split infinitive. The not can't move "down" into the VP get fat, because of the conditions on the VP negator not. If it moves "up", to before the to, then we get something with the wrong meaning.

The crucial thing here is that we're dealing with a purpose adverbial to get fat 'in order to get fat'. Not in combination with its VP (get fat) at least implicates agency on the part of the referent of the higher subject; 'in order to not get fat'. (I am not a semanticist, though I play one at Language Log Plaza, so go easy on me here.) But not to get fat is going to get the wider scope semantically: 'not in order to get fat'. And that's not what this woman wanted to say.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:18 PM

The people from the CCGW are here to see you

In response to my posting on Safire, Bellow, and which vs. that, Richard Hershberger has written with a little rant on the idea (espoused by Safire, among many others) that only good writers, with good reasons, have the freedom to violate the "rules" of grammar that bind all the rest of us.  How does someone like Bellow achieve this happy state, he asks.

Now the truth can be told.  Through moles I have planted at PEN (a "worldwide association of writers" pledged to "fight for freedom of expression" through its 141 centers around the world), I have discovered a dark side of the organization, the misleadingly blandly named Committee for the Certification of Good Writing (CCGW), which enforces the separation of the few who are truly free from the many who are enslaved to the rules of grammar.


But first, Hershberger's heartfelt questions:

I am always amused by the admonishment that we should not model our writing off the best writers. Apparently the rest of us are only permitted to strive for mediocrity.  And how, I wonder, did someone like Bellow achieve this happy state of freedom from arbitrary rules?  Was he a great writer from the first time he put pen to paper, or did he become great at some point in his career?  If the latter, did he formerly carefully observe the arbitrary rules?  If not, then surely his writing was substandard, not even rising to the level of mediocrity the rest of us strive for.  How then did such a poor writer achieve greatness?  And does his earlier writing benefit retroactively from his greatness dispensation?  If he did formerly observe the rules, when did he start not observing them?  How did he know he was now great enough to do this?  What is the notification process in these matters?

All of this, it turns out, is managed by the CCGW, through operatives more shadowy even than those of the MacArthur Foundation.  (My moles suggest that once MacArthur figures out how to plug its security leaks, the two organizations will merge.  It's a natural pairing.)  These operatives scan through trillions of words of text, of all sorts, every month, to find those that score high on each of two measures: the Writing Excellence, or WE score, a measure of creative thought and rhetorical excellence; and the Grammatical Purity, or GP score, a measure of adherence to the rules put forward in in-house style manuals, lists of dos and don'ts in grammar, and secondary school textbooks.

The vastness of this enterprise is incredible.  The eyes of the CCGW see (and judge) all: elementary school essays and stories, Post-It notes, e-mail, zines, little poetry magazines, college writing samples, doctoral dissertations, porn stories, television and movie scripts, assembly instructions, technical manuals, pulp fiction, serious novels, political blogs, biographies, livejournals, letters to politicians, interoffice memos, tabloid newspapers, and of course The Guardian, The Economist, The New York Times, The New Yorker and their counterparts in other languages.  And much much more.  There is no hiding from the CCGW.  You can toe the line in your articles in Harper's, but if you split infinitives or use restrictive which in newsgroups, your GP score is going to take a nosedive.  Your submissions to Poetry magazine might be models of grace and clarity, but if your letters to your agent are muddy and have clunky transitions, it's bad news for your WE score.  This is a harsh world, folks, and only a very few float to the top.

Those who do are tapped for the Good Writer Certificate, which is not an actual piece of paper with things written on it (that could fall into the wrong hands, you know), but an oral oath, administered in a most solemn private ceremony by two members of the CCGW.  The lucky writer is granted lifetime freedom from the rules, but must not refer in any way to the certificate, on pain of having both hands amputated and the larynx ripped out.  (One of my moles had to write notes to me with a pencil held between her teeth, and the other communicated by blinking his eyes in Morse code.)

Saul Bellow apparently showed extraordinary promise early in life, and was tapped while still in high school.  As he tells it (without reference to the certificate, of course):

At school, we, the sons and daughters of European immigrants, were taught to write grammatically.  Knowing the rules filled you with pride.  I deeply felt the constraints of "correct" English.  It wasn't always easy, but we kept at it conscientiously, and in my twenties I published two decently written books.
   ("'I Got a Scheme!': The words of Saul Bellow", The New Yorker, 4/25/05, p. 76)

The man was Free at 17, and then blazed on, writing according to what sounded right to him.  Lucky bastard.  Well, he put in his time.

Now: the dark side of the dark side.  What happens to those who are high on WE but low on GP?  Those who risk writing well while breaking the rules without permission?

Again, you are visited by two CCGW operatives.  They are dressed all in black, including black ski masks that reveal only their dark, steely eyes.  (Note the genre-appropriate "steely".  I know how to sling this stuff.)  They explain, in expressionless voices, what awaits those who exhibit "prematurely free grammar": the retracted royalty checks, the canceled book tours, the devastating reviews by famous people writing in prominent places, the accusations that you have been molesting children of your own sex, and on and on.  Their weapons are many, all fearsome.

Reader, I know this.  They came to me.  I took their words to heart.

I vowed to cut my WE score in half, so as to stay clear of the CCGW's notice.  I stuck to awkwardly technical academic writing, hastily scribbled postings to newsgroups, mailing lists, and blogs, poetry only my friends would publish, and fiction that only my friends would even read.  This has served me well for a lifetime of writing.  I have managed to make a decent living while flouting the rules of grammar, without being ground to dust under the heel of the CCGW.

Learn from my story.  It's Bellow's way, or mine.

[Awkward academic that I am, I can't resist an actual observation about grammar and usage proscriptions.  Rodney Huddleston noted in e-mail to me yesterday that "as far as I'm aware, prescriptivists don't actually say which in restrictives is incorrect: it is, rather, a matter of that being (much) preferable."  This is in fact true of what I think of as the "high end" of the modern advice literature on grammar and usage, from H. W. Fowler through Bryan Garner; they warn you about possible problems (not always realistically, I must add), rather than issuing blanket prohibitions.  Meanwhile, the low end -- the in-house style manuals, lists of dos and don'ts for writers, and secondary school textbooks that I mentioned above -- tends strongly towards Just Saying No.  The idea seems to be that it's easier for people if they don't have to use their judgment, but can rely on simple, clear rules.  Oh, this is where we came in.]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 07:54 PM

Ethnologue: Languages of the World: Fifteenth Edition

The fifteenth edition of the Ethnologue has now been published, in both web and print versions. "The Ethnologue is an encyclopedic reference volume that catalogs all the known living languages in the world today. The Ethnologue has been an active research project for more than fifty years. Thousands of linguists and other researchers all over the world rely on and have contributed to the Ethnologue. It is widely regarded to be the most comprehensive listing of information of its kind." (excerpt from book announcement).

Features:

  • 6,912 language descriptions organized by continent and country
  • 39,491 primary names, alternate names, and dialect names
  • 208 color maps showing location and distribution of languages
  • Unique three-letter identifier for each language from the Draft International Standard ISO 639-3
  • Statistical summaries with numerical tabulations of living languages and number of speakers by continent, by language size, by language family, and by country

Browse the web version

Posted by Steven Bird at 04:58 PM

Chiune Sugihara

Thursday night the documentary Sugihara: Conspiracy of Kindness was shown on television here. It is the story of Chiune Sugihara נ"ע, who as Japanese consul in Kovno (Lithuanian Kaunas) Lithuania in 1940 defied the orders of his government and issued 2,139 visas, thereby saving thousands of lives. Among those he saved were the students and teachers of the famous Mir Yeshiva. The Japanese government needed his skills (among other things, he spoke both German and Russian) and so kept him on during the war, but after the war he was forced out by the Foreign Ministry, and for most of the remainder of his life lived in poverty and disgrace. Recognition in Japan, where a monument in his honor was eventually erected in his home town of Yaotsu, came only after his death in 1986, but in 1984 he was awarded the designation of חסידי אומות העולם [xasidei umot ha-olam] Righteous Among the Nations and his name was inscribed on the Wall of Honor in the Garden of the Righteous at Yad Vashem in Jerusalem.

The central aspect of the story of this great hero is not linguistic, but there is a little linguistic point that warrants elucidation. Sugihara is known by two names: Chiune Sugihara and Sempo Sugiwara. This is frequently mentioned in accounts of his life, but few seem to understand the relationship between his two names. For example, this article on the web site of the United States Holocaust Memorial Museum says:

Sugihara is sometimes also referred to as "Chiune," an earlier rendition of the Japanese character for "Sempo," part of his formal name.

This wrongly suggests that his true given name was sempo and that chiune is an erroneous pronunciation based on an older reading of the characters. (It is also incorrect in its use of the singular character. Sugihara's given name is written with two Chinese characters.)

The real story is quite different. Sugihara actually used two names. His original name was Sugihara Chiune (since in Japanese the family name precedes the given name). He used the name Sempo Sugiwara when he worked as the representative of a Japanese company in Moscow from 1960 to 1975. He is said to have used a pseudonym to prevent the Soviet government from recognizing him as the Japanese diplomat who in 1932 had outsmarted them and obtained a very good deal for Japan when it purchased the Northern Manchurian Railroad. The Soviets were so angry at what they considered his fast dealing that when the Japanese government later attempted to post him to the Soviet Union, they refused to accept him. If not for this he would not have been in Lithuania in 1940.

Sugihara's pseudonym was not arbitrarily chosen. His name is written like this: 杉原千畝. 杉原 [sugihara] Japanese cedar + field is his family name; 千畝 [chiune] one thousand + furrows is his given name.

Once upon a time, in Proto-Japanese-Ryukyuan, the word for 'field, plain' was pronounced [para]. It isn't entirely clear whether the [p] was still pronounced [p] in Old Japanese, but eventually non-geminate [p] in Japanese became [ɸ] (like English [f], but bilabial rather than labiodental). [ɸ] in turn developed in different ways in different positions. Word-initially it became [h] except before the vowel /u/, where it remained [ɸ]. Intervocalically it became [w], which was then lost before all vowels other than [a].

In compounds like Sugihara there is, and has been, variation as to what counts as intervocalic position. If the components are treated as separate words, the /h/ is retained, while if they are treated as more tightly bound, it is intervocalic and becomes /w/. The family name that Sugihara used in the Soviet Union is thus a slight variant of his real family name arising from the different effects of different treatments of the compound.
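For readers who like to see such things mechanically, here is a minimal sketch in Python of the variation just described. Everything in it is my own illustrative assumption rather than an established formalism: the function name, the `tightly_bound` flag, and the placeholder letter `F` standing for the consonant that descends from older [ɸ]. It also assumes the first component ends in a vowel, as sugi does.

```python
def join_compound(first, second, tightly_bound):
    """Join two components; `second` may begin with 'F', a hypothetical
    placeholder for the reflex of older [ɸ].

    Simplified illustration only, not a full account of Japanese phonology.
    """
    if not second.startswith("F"):
        return first + second
    rest = second[1:]
    if tightly_bound:
        # Treated as intervocalic: [ɸ] > /w/, and /w/ survives only before /a/.
        onset = "w" if rest.startswith("a") else ""
    else:
        # Treated as word-initial: [ɸ] > /h/, except before /u/,
        # where it remains [ɸ] (written 'f' here).
        onset = "f" if rest.startswith("u") else "h"
    return first + onset + rest


print(join_compound("sugi", "Fara", tightly_bound=False))  # sugihara
print(join_compound("sugi", "Fara", tightly_bound=True))   # sugiwara
```

The only thing that differs between the two calls is how the compound is treated, which is the whole difference between the two family names.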

Sugihara's real given name chiune and the pseudonym he used in the Soviet Union, sempo, are also versions of the same name, but the relationship between them is different. When Chinese characters are used to write Japanese, they sometimes represent native Japanese words and sometimes represent Chinese morphemes borrowed into Japanese. For example, the character 水 water may, depending on context, be pronounced /mizu/, the native Japanese word, or /sui/, the loan from Chinese. Most Chinese characters used in Japanese have both types of readings. In a fair number of cases there is more than one reading in a class. The character 生 has the Sino-Japanese readings /sei/ and /syo:/, and at least seven native Japanese readings, covering a range of meanings from give birth and live to fresh, raw.

What Sugihara did in creating his pseudonym was to use alternate readings of the Chinese characters used to write his real given name. The first character, 千 one thousand, has the native Japanese reading /chi/. Its Sino-Japanese reading is /sen/. The second character, 畝 furrow, has the native Japanese reading /une/. Its Sino-Japanese readings are /ho/ and /bo:/. When /sen/ and /ho/ come together, /ho/ becomes /po/ by a phonological rule of Japanese. The final /n/ of /sen/ then assimilates in point of articulation to the /p/, yielding /m/. Sugihara's Soviet given name was thus obtained by using the Sino-Japanese readings of the characters used to write his real given name, in which the characters have their native readings.
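The two steps in that derivation can be sketched in a few lines of Python. This is only a toy: the function name is hypothetical, and the conditioning I give the first rule (an /h/ directly after a morpheme ending in /n/) is my simplifying assumption, since all the account above requires is that /ho/ becomes /po/ when it follows /sen/.

```python
def join_sino_japanese(first, second):
    """Combine two Sino-Japanese readings using the two rules described above.

    Toy illustration; the environment of rule 1 is an assumption.
    """
    # Rule 1: /h/ becomes /p/ after a morpheme ending in /n/.
    if first.endswith("n") and second.startswith("h"):
        second = "p" + second[1:]
    # Rule 2: final /n/ assimilates in place of articulation to a
    # following /p/, yielding /m/.
    if first.endswith("n") and second.startswith("p"):
        first = first[:-1] + "m"
    return first + second


print(join_sino_japanese("sen", "ho"))  # sempo
```

The order matters: the nasal can only assimilate to /p/ once the /h/ has become /p/.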

In sum, Sugihara's alternate name is not a mistaken reading by others. It is a pseudonym that he actually used, which he created by using alternative pronunciations of his names.


Posted by Bill Poser at 04:32 PM

May 06, 2005

She and he in the Wall Street Journal

As part of a project on non-standard pronoun case in coordination (Me and him did it; between you and I), Stanford student Tommy Grano did some searching through various sources of data, among them the Wall Street Journal corpus (formal writing, containing very few non-standard pronouns) and AltaVista (more informal writing, with more non-standard pronouns, but still not many), and stumbled on a much larger and more striking difference between the two sources as representatives of different genres: a big sex bias in the WSJ.


There were five relevant variables in this little study, which looked at conjunctions of personal pronouns with nonpronominal NPs: person/number of the pronoun; case of the pronoun (nominative vs. accusative); order of pronoun and the NP; grammatical function of the coordination (subject vs. direct object vs. object of a preposition); and source (WSJ vs. AV).   Grano found small numbers for some person/number combinations (1/pl and 3/pl), for some orders of pronoun and NP, and for non-standard case choices.  But substantial numbers of examples appeared for the standard subject case choices NP and I, he and NP, and she and NP. (Other studies suggest that I prefers second position, while the other pronouns tend to prefer first position, which puts light elements before heavier ones.)

There were 85 conjunctions of pronoun and NP in WSJ, 574 in AV.  The results, expressed as percentages of these totals for particular combinations, source by source:

conjunction    WSJ    AV
NP and I        8%     9%
he and NP      64%    18%
she and NP      6%    26%

Conjunctions involving 1/sg are pretty much the same in the two sources, but those involving 3/sg are wildly different: in AV, male and female are more or less comparable, though with an advantage to female; but in WSJ, it's male over female by an enormous margin.  The Wall Street Journal seems to talk about men in connection with others (other people, or ideas, or whatever) vastly more than women.  In everyday life, as sampled (however imperfectly) by AltaVista, women are slightly in the majority, but in the world of public events, women are of little note.  Well, we knew that, but, still, I was a bit shaken by the size of the difference.

[Buried in that table is the fact that in WSJ, a full 70% of the conjunctions are 3/sg, while in AV it's only 44%.  Not surprising, since AV has a lot of second-person reference (which was not of much interest to Grano, since you shows no case differentiation).]
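The arithmetic behind those figures is easy to check. Here is a small Python sketch that uses only the percentages reported above; the variable and function names are mine, and the he-to-she ratio is an extra derived figure, not something Grano reported.

```python
# Percentages of all pronoun + NP conjunctions in each source, as reported above.
shares = {
    "WSJ": {"NP and I": 8, "he and NP": 64, "she and NP": 6},
    "AV": {"NP and I": 9, "he and NP": 18, "she and NP": 26},
}


def third_singular_share(source):
    """Percentage of conjunctions in `source` involving he or she."""
    row = shares[source]
    return row["he and NP"] + row["she and NP"]


def he_to_she_ratio(source):
    """How many 'he and NP' conjunctions there are per 'she and NP'."""
    row = shares[source]
    return row["he and NP"] / row["she and NP"]


for source in shares:
    print(source, third_singular_share(source), round(he_to_she_ratio(source), 2))
```

On these numbers the WSJ has more than ten instances of he and NP for every she and NP, while in AltaVista the ratio is below one.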

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 05:01 PM

Chomsky testifies in Kansas

In further news from Kansas, the state board of education obtained testimony by deposition in absentia from Noam Chomsky, on the advice of Daniel Dennett:

If Darwin-dreaders want a champion who is himself deeply and influentially enmeshed within science itself, they could not do better than Chomsky. [from Daniel Dennett, "Chomsky Contra Darwin: Four Episodes", in Darwin's Dangerous Idea pp. 384-393]

Some selections from Chomsky's testimony:

It is perfectly safe to attribute this development [of innate language structures] to "natural selection", so long as we realize that there is no substance to this assertion, that it amounts to nothing more than a belief that there is some naturalistic explanation for these phenomena. [Noam Chomsky, Language and Mind, 1972, p. 97]

In studying the evolution of mind, we cannot guess to what extent there are physically possible alternatives to, say, transformational generative grammar, for an organism meeting certain other physical conditions characteristic of humans. Conceivably, there are none -- or very few -- in which case talk about the evolution of the language capacity is beside the point. [Chomsky 1972 p. 98]

It surely cannot be assumed that every trait is specifically selected. In the case of such systems as language or wings it is not even easy to imagine a course of selection that might have given rise to them. A rudimentary wing, for example, is not "useful" for motion but is more of an impediment. [Noam Chomsky, Language and Problems of Knowledge: the Managua Lectures, 1988, p. 167]

It may be that at some remote period a mutation took place that gave rise to the property of discrete infinity, perhaps for reasons that have to do with the biology of cells, to be explained in terms of properties of physical mechanisms, now unknown. . . . Quite possibly other aspects of its evolutionary development again reflect the operation of physical laws applying to a brain of a certain degree of complexity. [Chomsky 1988, p. 170]

[For the commenters at Pharyngula -- this is a joke, OK? Chomsky did not testify in Kansas nor did he submit a deposition. As far as I know, the Kansas state board of education is unaware of his views.]

Chomsky defends himself here, against observations made by John Maynard Smith in a review of Dennett's book.

My own impression is that Chomsky has always been motivated by rationalist epistemology, and has always rejected the idea of Darwinian evolution of mental abilities, which he sees as a sort of genetic empiricism. He simply and consistently dislikes the idea that language might be learned, whether by neurons or by genes. As a result, he prefers to wait for "physical mechanisms, now unknown" at the level of cell biology, or necessarily-emergent properties of complex-enough brains, or some other now-mysterious form of explanation.

I very much doubt that he believes in "intelligent design", but his skepticism about the efficacy of natural selection makes him a natural ally for its partisans, who might well be happy to supplement the biology curriculum with references to "physical mechanisms, now unknown" and to the emergent properties of complex systems. After all, such emergent properties provide a natural programming language for the Divine Watchmaker to use to encode her plans for creation.

And given Chomsky's taste for intellectual provocation, I can imagine him flirting with such an alliance. In the first lecture I ever heard him give, in 1965, he asserted that psychology had in no way improved on Plato's theory that learning is remembering past lives. Hilary Putnam interrupted from the back of the room: "Wait a minute. You're not seriously suggesting that reincarnation is a plausible explanation?" Chomsky held his ground: "Why not? It certainly makes more sense than associative learning does."

Well, OK, it's not very likely that we'll see papers co-authored by Chomsky and Michael Behe, despite the analogy between Behe's notion of irreducible complexity and Chomsky's views on the uselessness of "a rudimentary wing" or an imperfect language. I imagine that C. dislikes creationism as an explanation as much as he dislikes learning (whether Hebbian or Darwinian), and the intelligent-designers are clearly creationists in disguise.

[Update 5/8/2005: Kerim Friedman at Keywords "elaborate[s] upon [what he takes to be] the scientific foundations for Chomsky's skepticism", specifically Gould's idea about spandrels. I certainly agree with Kerim that "skepticism about some of the core assumptions of evolutionary biology ultimately strengthen, rather than weaken, science; as well as Darwin's legacy". However, I wanted to suggest that the foundation of Chomsky's attitude on this subject seems to me to be epistemological rather than biological.

Cosma Shalizi raised by email a question I've always wondered about, namely the connection between Chomsky's anti-Darwinism and that of his co-author Marcel-Paul Schuetzenberger (N. Chomsky and M. P. Schuetzenberger, The Algebraic Theory of Context-Free Languages (Studies in Logic and the Foundations of Mathematics). Amsterdam, The Netherlands: North-Holland, 1963, pp. 118-161; M. Schuetzenberger. 1967. "Algorithms and neo-Darwinian theory." In Paul S. Moorhead and Martin M. Kaplan, ed. Mathematical challenges to the neo-Darwinian interpretation of evolution, p. 73. The Wistar Institute Symposium Monograph No. 5.) Cosma pointed out that Schuetzenberger (who outside of this context was an accomplished mathematician) "influenced David Berlinski, and I think Behe and Dembski too, though I'm less sure of that. For Berlinski, see his characteristically idiotic essay "The Deniable Darwin", _Commentary_, vol. 101, no. 6 (June 1996). Dembski quotes a particular argument Schutzenberger gave, for the impossibility of evolving computer programs, in 1966; of course by 1975 John Holland and his group had done enough work on genetic algorithms that Holland could publish a classic book on the subject... "

Schuetzenberger's anti-Darwin arguments have nothing to do with Gould's spandrels, but rather involve calculating the (in S's view vanishingly small) probability that random processes could result in observed biological complexity. I imagine that Chomsky heard these arguments in the early 1960s, and they probably form part of the history of his opinions about neo-Darwinism.

Cosma also pointed out that the ID people generally hate the idea of self-organization and emergence. This seems to me to be short-sighted on their part, since there is no other naturalistic mechanism available for an intelligent designer to use, even to "program" the genotype-phenotype relationship, much less the observable dynamics of population genetics interacting with the environment. Perhaps the ID-ers are just tripping over that "self-" morpheme, which unnecessarily implies that no one established the initial conditions that led to a particular outcome. ]

Posted by Mark Liberman at 09:04 AM

May 05, 2005

Linguists boycott Kansas intelligent design hearings

The state board of education in Kansas plans to hold hearings in May on the "intelligent design" theory of the origin of English, which claims that the language was constructed in the early 16th century by a committee of unknown experts guided by a Supreme Grammarian. But professional linguists are mostly boycotting the hearings.

Six years ago, when conservatives previously held a majority of seats on the Kansas board of education, they established guidelines encouraging schools to give equal time to the theory of linguistic creationism, which claims that English was created directly by God five hundred years ago at the start of the Great Vowel Shift so that the King James Bible could be translated into it. But this triggered a backlash, and they lost control of the board, which repealed the guidelines. Now that conservatives are back in a majority position, they are instead promoting the teaching of the intelligent design theory. But linguists are not willing to appear at their scheduled hearings on the subject.

Alexa Posny, a deputy commissioner with the state department of education, informed the Kansas City Star that linguists have declined to testify on the linguistic evolution side. "We have contacted linguists from all over the world," Posny said. "There isn't anywhere else we can go."

But at Language Log Plaza in Philadelphia, the director of the giant Language Log organization, Dr Mark Liberman, charged that the hearings were a set-up, designed to have a pre-ordained outcome. He said that testifying would only permit the intelligent design theory to take on the appearance of legitimacy.

"Intelligent design is not going to get its forum, at least not one in which they can say that linguists participated," he said. "Language Log is in full support of Kansas Citizens for Linguistic Science on this."

In a letter to George Griffith, science consultant to the Kansas State Department of Education, Liberman said: "After much consideration, Language Log respectfully declines to participate in this hearing out of concern that rather than contribute to linguistic education, it will most likely serve to confuse the public about the nature of the linguistics enterprise."

Supporters of intelligent design say it is a theory with scientific backing. According to dissident linguist and itinerant preacher Immanuel Quierbaiter, who plans to testify at the hearings in favor of intelligent design, "Anyone who takes an unbiased look at the intricacies of the English language as detailed in Harvey's English Grammar will see that it shows evidence of having been carefully designed for its communicative purpose. It is beyond belief that such a system could have simply evolved through random processes of change."

Historical evidence of such processes of change has been refuted, he claims: "There are gaps in the literary record that the linguistic evolutionists have never explained."

Opponents of the theory, however, believe that it represents an attempt to smuggle religion into English classes. Said Dr Liberman: "If intelligent linguistic design is a viable theory, it should be defended through articles in peer-reviewed journals, not lobbied for politically in school boards."

Privately, linguists are more outspoken. A senior official of an important professional organization of linguists, reached at her office at the University of New Mexico in Albuquerque, said, "Are those crazies at it again? Well I don't want them driving down here from Topeka and picketing my classroom; no goddamn comment from me."

And one Language Log staff writer who declined to give his name, speaking briefly with a reporter while waiting in line at Language Log Plaza's café, the Latté Linguistica, snapped: "Intelligent design my ass. Have you looked at the lie / lay situation? It's a total disorganized mess. One thing I'm sure of: we're not looking at the product of a perfect mental organ created with the guidance of a higher grammatical power."

Posted by Geoffrey K. Pullum at 02:19 PM

Why is "compound" code for "cult"?

In an earlier post, I offered some statistical evidence that Mark Steyn was right when he wrote that "the [New York] Times seems to use the term [compound] as universally accepted shorthand for 'wacky cult'". Lane Greene suggested by email that other uses, like "Kennedy compound", may be similar sorts of writerly sneers:

... I more or less  intuitively agreed with Steyn's thrust when I first read it - "compound" seems to imply a certain sneering at someone different.

But then I thought of "Kennedy compound". Have you ever noticed it's always the "Kennedy compound"? That's journalese for "these rich jerks have so much money they can just do whatever they want, including perhaps raping a girl here and there." [...]

Belonging to rich people who don't have particularly bizarre religious beliefs, the Kennedys' home is surely more like an "estate". So maybe the common denominator in "compound" is sneering by the person writing it.

I get the same feeling that Lane does about "Kennedy compound", but I'd go back one step further.

It seems to me that all the common uses of compound for "residential complex" -- oriental trading post, military compound, diplomatic compound, rich folk's compound, family compound in Africa, and so on -- have in common the idea of a protected area in the middle of an excluded and generally hostile world.

In the case of the military and the diplomats, we accept that they have conventional reasons to set themselves apart and to shut out those outside. Likewise, I guess, for a family in a place like Nigeria, where we accept that protective exclusion is also conventional and reasonable.

In the case of the Kennedys and the cults, the motives for exclusion are less conventional and more a matter of a choice that the compound owners have made. Also, we ourselves are among those being excluded. So the use of the word "compound" in those cases evokes an "us" vs. "them" division, set up by the compound owners as "them" on the inside, with the writer and the reader as "us" on the outside. In effect, they're the Europeans and we're the natives.

At that point we can ask why the writer chose a word with that effect, and there's where the sneering comes in, I think.

There is probably also a certain amount of collocational conventionality associated with such phrases, as these Google counts suggest:

 
                             compound   estate   compound/estate ratio
"Kennedy __" Hyannisport        1,480       55                    26.9
"Bush __" Kennebunkport           620      576                    1.08
"Gates __" Seattle                 41      231                    0.18
"Jackson __" Neverland            140      988                    0.14

I suspect that Bill Gates is at least as resented as the Kennedys are, and he is certainly the object of much more widespread and active sneering these days. And I'm sure that his place in Seattle is just as well protected from intrusion as the Kennedys' place in Hyannisport is. Still, Gates' compound/estate ratio is much more favorable.
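For anyone who wants to redo the comparison with fresh counts, the ratios are just the "compound" hits divided by the "estate" hits. A minimal Python sketch, using the Google counts given above (the dictionary layout and function name are my own choices):

```python
# (compound hits, estate hits) for each search pattern, as reported above.
counts = {
    '"Kennedy __" Hyannisport': (1480, 55),
    '"Bush __" Kennebunkport': (620, 576),
    '"Gates __" Seattle': (41, 231),
    '"Jackson __" Neverland': (140, 988),
}


def compound_estate_ratio(compound_hits, estate_hits):
    """Ratio of 'compound' hits to 'estate' hits for one search pattern."""
    return compound_hits / estate_hits


for pattern, (compound_hits, estate_hits) in counts.items():
    print(pattern, round(compound_estate_ratio(compound_hits, estate_hits), 2))
```

A ratio above 1 means "compound" is the more common collocate; below 1, "estate" is. Web hit counts are noisy, so only the large differences deserve much weight.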

Posted by Mark Liberman at 11:06 AM

Restoring the trivium (+ one)

In this morning's email, from Barbara Partee:

Semanticists!

If the proposals of the author of the third letter in the NYT May 4 (responses to Friedman's April 29 column about American education falling behind) were ever implemented, there would be lots of jobs for semanticists! And no one could say "mere" semantics ever again!

"I would limit the mandatory classes to a few core life skills: public speaking, self-defense, basic logic and semantics."

The letters: (link)
The original Friedman column: (link)


If we could persuade the NYT letter-writer that semantics has to be grounded in the rest of grammar, we would have restored the trivium in its traditional form. Well, he gave the order wrong, and threw in self-defense in an odd place. Grammar, rhetoric, logic, martial arts is much more, um, logical. Don't you think?

Posted by Mark Liberman at 08:00 AM

Is "compound" code for "cult"?

From Mark Steyn's defense of polygamy in the May Atlantic:

A nice middle-aged gay man in a committed relationship, with a weekend home in Connecticut, where he serves as a popular longtime usher at the local "open and affirming" Congregational Church? Alas, no. Owen Allred was a proponent of a far less fashionable minority marriage cause: he was the patriarch of the Apostolic United Brethren, Utah's second largest polygamous group, a church with some 5,000 to 7,000 believers, many of them living a confetti throw from Allred's home in Bluffdale, on the edge of Salt Lake City. [...]

I say "home," though The New York Times preferred "compound." The precise point at which a "ranch," a "bungalow," or an "eighteenth-century saltbox with many original features" becomes a "compound" is best left to real-estate agents ("Extensively remodeled compound with drop-dead views of ATF agents at the tree line calling for backup"). But the Times seems to use the term as universally accepted shorthand for "wacky cult"; and certainly Owen Allred attracted his share of lurid headlines over the decades.

Steyn is claiming that home / compound is one of those bandit / terrorist / militant / freedom fighter kind of things, where the choice of terminology tells you something about the attitudes of the writer and the publication as well as about the nature of the person or thing being described.

But is it true? In terms of description of real estate, it's hard to tell, because we don't know what the "compounds" are physically like, in comparison to the "homes", the "estates", the "campuses" and so on. But what about on the linguistic side? When does the NYT use "compound" as a word for "[a] building or buildings, especially a residence or group of residences, set off and enclosed by a barrier"?

To start with, I was surprised to learn that this residential sense of compound comes from Malay kampong "village", rather than from Latin con+ponere "put together", as the "mixture" sense of compound does. The spelling "compound" for the residential sense is apparently a sort of folk etymology, an eggcorn that made it. The OED gives as the original gloss

The enclosure within which a residence or factory (of Europeans) stands, in India, China, and the East generally.
Supposed by Yule and Burnell to have been first used by Englishmen in the early factories in the Malay Archipelago, and to have been thence carried by them to peninsular India on the one hand and China on the other. In later times, it has been taken to Madagascar, East and West Africa, Polynesia, and other regions where Englishmen have penetrated, and has been applied by travellers to the similar enclosures round native houses.

Note that factory here means a trading station, where a factor does business, not "manufactory". Also note that it's sometimes hard to distinguish this use from what the AHD gives as an alternative sense, "[a]n enclosed area used for confining prisoners of war".

Anyhow, in the past month, the NYT has used the word "compound" 47 times, according to the LexisNexis index. Of these, one was the "prisoner" use:

The G.I.'s at Abu Ghraib lived in cells while most of the detainees were housed in large overcrowded tents set up in outdoor compounds that were vulnerable to mortars fired by insurgents.

while 24 were examples of the "enclosure of buildings" use. Of these, two were associated with a cult (the Branch Davidians), while the others involved military compounds (four times), diplomatic compounds (four times), and compounds of very wealthy individuals (three times -- Richard Geffen twice and Herbert Bayard Swope once). The other 11 were all over the place, e.g.

[Governors Island Wants a Developer, a Future...an Idea, Anyway] Among the many possible uses already envisioned are an academic compound; a hotel, spa and conference center; film production facilities; museums; office space; sites for concerts; and a marina. Plans also call for the maintenance of a public path along the island's 2.2-mile waterfront perimeter, and the creation of a 40-acre public park.

[Secluded Retreats on the Big Island] Ramashala, a two-building compound that opened this year in Pahoa in the Kehena area, just off the coastal road, functions both as a conventional inn and a spiritual retreat.

[Where West Africa Goes Straight to Video] Iroh, a 43-year-old real estate broker from Nigeria, whose film purchases that day included ''Behold Family Life,'' about rivalries within a family compound in Nigeria. ''Maybe in the next 10 years, it will come up. It will be as big as Bollywood.''

[Bus and Bridge Reunite Kashmiris Long Kept Apart] On Wednesday, they stormed a government tourism compound where Indian officials said scheduled passengers were being housed as a protective measure after repeated threats.

Mark Steyn may feel, on reflection, that an academic compound is indeed a cult building, but I don't think that's what he meant when he wrote that "the Times seems to use the term [compound] as universally accepted shorthand for 'wacky cult'". And I'm sure that he doesn't think that Nigerian family life is an intrinsically cultish subject. Thus the probability that a given NYT usage of (residential) compound implies "wacky cult" has not been very high in the past month -- 1 in 12, or about .08.
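
For anyone who wants to check the arithmetic, here is a minimal Python sketch of that estimate. The counts are the ones given above; the category labels and variable names are mine, introduced only for illustration.

```python
from fractions import Fraction

# Past month's NYT uses of "compound" in the "enclosure of buildings" sense,
# grouped as in the text above (labels are illustrative, not LexisNexis categories).
enclosure_uses = {
    "cult": 2,
    "military": 4,
    "diplomatic": 4,
    "wealthy individual": 3,
    "other": 11,
}

total = sum(enclosure_uses.values())  # should come to the 24 uses counted above

# Estimated probability that a residential "compound" refers to a cult's buildings.
p_cult_given_compound = Fraction(enclosure_uses["cult"], total)

print(total, p_cult_given_compound, round(float(p_cult_given_compound), 2))
```

With only 24 examples this is a rough estimate at best, which is why it's worth looking at a longer stretch of time.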

The NYT usage over longer periods of time suggests that the true rate is even lower. Over the past year, there have been 15 stories that mention both a cult and a compound. As it turns out, only 8 of these are real cults and building-type compounds -- the others are things like cult movies or designers, or compounds that are mixtures or drugs. The real cults named were the Branch Davidians (twice), the Aryan Nation (twice), the Manson family, the Yearning for Zion Ranch (polygamists in Texas), the Oneida commune (a 19th-century social experiment), and Sri Chinmoy. In comparison, there have been 275 stories that include the words compound and military, and 71 stories that include both compound and embassy.

This confirms the evidence of the past month's stories: P(cult | compound) appears to be well under 0.1. We can't just say that the NYT uses compound to mean residential buildings of a cult, unless we add a long list of exceptions -- a military compound, a diplomatic compound, a very wealthy person's compound, an academic compound, a Nigerian family compound, and so on.

Still, Steyn may be on to something. Maybe he's intuiting a different sort of lexical inference: not the probability of cult given compound, but rather the probability of compound given cult, compared to the probability of other choices such as estate under the same condition. And maybe all he means is that this ratio of conditional probabilities is significantly higher for cult than for other types of residence-owners -- and that this creates an association that can have communicative force in the case of groups like polygamists. So let's come at the problem from that direction.

Faced with a particular residential complex -- some buildings and grounds where people live -- a writer can choose among several different English words. Two choices with rather different connotations are compound and estate. If Steyn is right, then the choice is more likely to be compound in the case of the residence of a "wacky cult" than in the case of the residence of an industrial mogul or a rock star. Contrariwise, the mogul or rock star's home is more likely to be called an estate than the cult's home is. Of course, the mogul's home will probably be more luxurious as well, consistent with the posh connotations of estate. Still, the tendency [cult → compound, mogul → estate, interpreted as P(compound | cult)/P(estate | cult) > P(compound | mogul)/P(estate | mogul)] will plausibly generate the indexical or connotative meaning that Steyn calls a "universally accepted code".

We can explore this idea using the (admittedly crude and inaccurate) measure of how often specified words occur in the same story (according to LexisNexis):

Word        Count  Time period  With compound  With estate  compound/estate ratio  P(compound|w)
heiress       145  1 year                   1           36                   0.03           .007
mogul         268  1 year                   5           58                   0.09           .02
rock star     288  1 year                   4           23                   0.17           .014
cult          597  1 year                  15           38                   0.39           .025
neo-nazi      172  5 years                 12            6                   2.0            .07
polygamous     78  10 years                 8            1                   8.0            .10

Because of word-sense ambiguity, let's stipulate that these numbers are worth very little until the usage is checked, case by case. I'm sure, for example, that many of the heiress & estate stories are about real estate or estate-the-inheritance, and not estate-the-residence. Still, this table has the shape of a plausible argument, even if it isn't yet an argument that anyone should rely on.
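
The two derived columns are simple to recompute. The sketch below is only an illustration, with the story counts copied from the table and function and variable names of my own invention. It divides the "with compound" count by the "with estate" count, and by the total number of stories for each word.

```python
# Story counts copied from the table above:
# (word, total stories, stories with "compound", stories with "estate").
# The search windows differ by row (1, 5, or 10 years), so only the
# within-row ratios are meaningful, not the raw counts across rows.
counts = [
    ("heiress", 145, 1, 36),
    ("mogul", 268, 5, 58),
    ("rock star", 288, 4, 23),
    ("cult", 597, 15, 38),
    ("neo-nazi", 172, 12, 6),
    ("polygamous", 78, 8, 1),
]


def summarize(rows):
    """Return (word, compound/estate ratio, P(compound | word)) for each row."""
    return [(word, c / e, c / n) for word, n, c, e in rows]


summary = summarize(counts)
for word, ratio, p_compound in summary:
    print(f"{word:<11} ratio={ratio:5.2f}  P(compound|w)={p_compound:.3f}")
```

The ratio column climbs steadily from heiress to polygamous, which is the pattern the argument needs. The same caveat applies to the code as to the table: these are co-occurrence counts, not checked usages.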

Q.E.more-or-less.D.

Posted by Mark Liberman at 07:21 AM

May 04, 2005

Queens, but

Responding to recent posts on phrase-final particles like eh, ne, yo and hey, Cameron Majidi wrote in with something new to me: English dialects in parts of New York City where this role is played by but.

I've never heard the Brooklynese hey that you find attested in Krazy Kat, but there is a similar usage I've heard among working class white New Yorkers, not from Brooklyn, but from Queens. In certain neighborhoods of Queens there is a distinctive accent that also preserves some weird usages that you don't normally run into anywhere else. I'm thinking primarily of places like Ridgewood and Maspeth. One of the characteristics of that accent is a sentence-final "but."

As in: "You can't park over by the pump, but." (A "pump" in this context is what most people call a fire hydrant - in places like Ridgewood you'll sometimes hear the longer form, "johnny pump".)

I've often compared this sentence-final "but" to the "what" that you find in representations of upper-class English speech from the early 20th century (cf. Wodehouse, etc.). But the "but" is generally not voiced to sound anything like a question. It's a very short syllable, almost a grunt, and the final /t/ tends to vanish into a glottal stop.

"I gotta stop off, pick up some breakfix, but."

Posted by Mark Liberman at 12:30 PM

Google library ninjas launch "virulent attacks" on European culture

BBC News tells us that plans for "a vast digital library to preserve Europe's cultural heritage" have taken another step, as various EU culture ministers responded favorably to Jacques Chirac's call. And of course, money was also mentioned:

Luxembourg Prime Minister Jean-Claude Juncker warned the Comedie Francaise meeting that such a massive project would only be possible if sufficient funding was made available.

Digital libraries, especially vast ones, are surely a Good Thing, but some of these people are saying weird stuff:

"We have to act," Mr Juncker, whose country is the holder of the EU presidency, told the meeting of culture ministers, artists and intellectuals who gathered to come up with a European charter for culture.

"That's why I say 'yes' to the initiative of the French president (Jacques Chirac) to launch a European digital library. I say 'yes' because Europe must not submit in the face of virulent attacks from others," he said.

If the BBC is quoting him accurately -- always a matter for concern -- this man is insane. Remember, the "virulent attack" that he's talking about is Google's deal with some research libraries to digitize books and let the public read them online.

Other (indirectly quoted) remarks are equally detached from reality, if not as bizarrely emotional:

EU officials and cultural commentators have voiced concern that Google's ambitious plans could result in important European literary works missing out and being lost to future generations.

There are good reasons for pluralism in this as in most things, and good reasons for a mix of public and private initiatives. The more digital library projects, the better (though rivalry may create artificial barriers to search and information integration of new kinds). But is it even remotely plausible that Google Print's Library Project will cause "important European literary works" to be lost? Do these folks really think that the great works of European literature are systematically missing from The New York Public Library and the libraries of the University of Michigan, Harvard University, Stanford University, and Oxford University? And do they fear that Google's crack cultural commandos are even now infiltrating across the channel, preparing to burn libraries and destroy scanners from Brest to the Danube?

Well, EU politicians are not individually insane, I know. But there is something collectively out of joint in European culture, if rhetoric like this really resonates with the public. Anyhow, maybe a better title for this post would be "Culture ministers are in favor of more money for EU digital library".

Posted by Mark Liberman at 11:33 AM

Ne, innit?

In response to my post on the Japanese sentence-final particle ne, several readers have written to point out that some varieties of Portuguese have a particle with a very similar distribution. Quoting an email from Arlo Faria:

As a Brazilian, I was very surprised when I first learned of Japanese "ne" because it was coincidentally similar in usage and pronunciation to the "né" used in some dialects of Brazilian Portuguese. It might be common in other dialects as well, but I associate it with the Minas Gerais region of central Brazil, whose speakers are notoriously laconic by means of compressing phrases into single syllables: "né" is a shortened version of "não é" (translated "isn't it").

Arlo did some searching on the web and came up with plenty of evidence for non-paradigmatic and narrative uses of the contracted form:

You cannot say: são bons, não é? (they're good, isn't it?)
You should say: são bons, não são? (they're good, aren't they?)
Yet this is fine: são bons, né?

It doesn't have to follow the é ___, né construction (it's ___, isn't it?). Né can end any sentence, regardless of the preceding verb forms. Here, the meaning of né is like Japanese "ne", or English "right?":

Saiu né ("he left, right?")
quando estiver, né ("when he would be, right?")

... And you can also find examples where it doesn't occur in the sentence-final position. It's generally used, I feel, for telling stories. Here's an excerpt from an interview with an old lady:

"... foi o que aconteceu comigo, né ... hoje em dia a mulher trabalha, né, depois que ... as mulheres trabalhá fora, né ... não saindo pra trabalhá fora, sabe?!"
(... that's what happened with me, {né} ... these days the woman works, {né}, after which ... women work outside, {né} ... not going out to work outside, you know?!)

Note this woman's parallel usage of né with sabe.

This seems entirely analogous to the development of the tag innit in British English. Jenny Cheshire, Paul Kerswill and Ann Williams "On the non-convergence of phonology, grammar and discourse" give examples like

We might as well go home, innit?

and variously cite Hewitt 1986, Rampton 1995 and Andersen 1999 to the effect that invariant innit started in London "in the speech of British ethnic minorities" (though they don't specify which), and is "rapidly innovating [i.e. spreading] in the urban centres of Britain", starting with working class speakers.

The use in narrative parallels the examples with Japanese ne and Canadian eh. I'm not sure whether innit has a similar narrative use, but I'm sure someone will tell me about it soon...

Meanwhile, I can't resist quoting Des von Bladet's subversive translation of a Norwegian linguistic bureaucrat:

Og Norsk språkråd gnir seg i hendene.
- Det er svært hyggelig at ungdom velger å skrive tekstmeldinger på dialekt. Å ha dialektvariasjon gjør oss til et sterkere språksamfunn, sier direktør Sylfest Lomheim til VG.


And the Norwegish langwidgecouncil is rubbing its hands.
"It is wicked cool that The Kids choose to write textmessages in dialect. Having dialektvariation makes us a stronger speechcommunity, innit?", direktör Sylfest Lomheim told VG.

 

Posted by Mark Liberman at 10:23 AM

May 03, 2005

Things may be more complex than they seemed at first

A while back (March 28th, in fact) I spent a little time examining the claim that the modal verb may was encroaching on the territory of modal might, and suggested that (insofar as this encroachment was actually going on, which wasn't entirely clear) it might have something to do with the perception that may is more informal than might.

Since then, Elizabeth Traugott has pointed me to David Denison's article "Counterfactual may have" (in Gerritsen & Stein, Internal and external factors in syntactic change, Mouton de Gruyter, 1992), which makes it clear that things are a whole boatload more complex than I'd first thought.  There probably isn't just one shift in usage going on; the shifts probably have different motivations; and different people probably are moving in different directions, in different constructions.

The larger lesson is that the details of linguistic variation -- what forms are available, with what meanings, by whom, in what settings, with what effects -- can be VERY hard to discern indeed.


So, is may expanding in use?  Many have claimed that things like If he'd have released the ball a second earlier..., he may have had a touchdown are evidence that it is.  But Denison suggests otherwise: in general, he sees a contraction of may, though with increased specialization -- expansion in just a few contexts (like the counterfactual).  He looks at the factors that might have favored the spread, a great many factors, but the bottom line is that there are tugs in different directions.  And for good reasons.  Let me speculate...

The alternation is between may (originally a present tense form) and might (originally a past tense form).  So we start with closeness (in time) with may versus distancing (in time) with might.  The concomitant of this difference that pretty much everybody has noticed is the greater tentativeness of might: further off in time is further off in certainty.

But there are other possibilities: greater subjectivity for may, greater objectivity for might (expressing belief in a possibility vs. reporting the possibility); or greater social closeness, more informality, for may, vs. greater social distance, more formality, for might.  There's more than one way to extend the present-past distinction metaphorically or metonymically.

Denison's 1992 article (primarily concerned with U.K. English) suspects that counterfactual may have might have spread from the U.S.  This is not an unreasonable idea.  Meanwhile, Denison has some evidence that some U.K. speakers consider may MORE formal (or standard or correct) than might, probably as a result of "corrections" of root may as a replacement for vernacular can, as in May/Can I have a cookie?

The landscape of variation we then see looks pretty lumpy, not unlike what we see with the famous (morpho)phonological variable (ING), where the same stuff bears very different social/discourse/personal meanings in different contexts.  Are you:  Competent?  Educated?  Cool and easy?  Southern?  Friendly?  Stupid?  Upper class?  Gay?  Distant?  Or what?

zwicky at-sign csli period stanford period edu


Posted by Arnold Zwicky at 11:19 PM

News flash: the effect of politics, athletics and sex on IQ

Inspired by Glenn Wilson's demonstration that IQ is lowered 10 points by "infomania" -- the distracting effects of email and telephone calls -- Language Log Labs is proud to announce a series of even more striking discoveries.

Experiment 1. Purpose: to determine the effect of heated political argument on intelligence. Procedure: Each subject will take the Wechsler Adult Intelligence Scale twice, once in a quiet room and once while at table with Andrew Sullivan, Arianna Huffington, Michael Kinsley, Ann Coulter and a partner of the subject's choice. The order of testing will be randomly counterbalanced across subjects.

Experiment 2. Purpose: to determine the effect of athletics on intelligence. Procedure: Each subject will take WAIS-III twice, once in a quiet room and once while occupying one of the goals during a practice session for a high-school soccer team, or sitting in the middle of the court during a basketball scrimmage. The order of testing will be randomly counterbalanced across subjects.

Experiment 3. Purpose: to determine the effect of sexual intercourse on intelligence. Procedure: [redacted].

Preliminary results:

political argument causes a 20-point reduction in IQ;
soccer causes a 30-point reduction in IQ;
basketball causes a 40-point reduction in IQ;
sex causes [redacted].

[Update 9/25/2005: for the truth about the experimental design in the original study -- ironically, exactly the one lampooned here -- and an apology for blaming the media's excesses on Glenn Wilson, see this post.]

Posted by Mark Liberman at 04:46 PM

Don't do this at home, kiddies!

Some people never met a rule of grammar they didn't like.  And some people seem not to have read the books they recommend with enthusiasm.  And some people believe that freedom from social constraints -- those rules of grammar, for instance -- is allowed only for the elite, the professionals and the artists: Don't do this at home, kiddies!

Bill Safire seems to be all three of these, to judge from his exchange on restrictive which versus that with Saul Bellow, as reported in his "On Language" column (in the New York Times Magazine of 1 May 2005, p. 26).


Safire tells us:

...some years back I performed an exegesis in this space on a beautiful extended metaphor the novelist used in one of his rare Op-Ed essays.  Snowbound in Boston, he wrote: "Let the pure snows cool these overheated minds and dilute the toxins which have infected our judgments."

In case anyone complained about his use of "the toxins which" instead of that introducing the restrictive clause "that have infected our judgments," I noted that "you get Nobel prizes for literature, not grammar."  Bellow promptly responded: "I'm only fair at relative pronouns.  I do know the restrictive from the nonrestrictive.  'Which' sounded better than 'that,' and I do go by sounds as well as by grammar."

That I took as a lesson for the overheated minds in the endless struggle of Language Snobs against Language Slobs.  Good writers are free to break the rules of grammar, but their freedom gains meaning when they know the rules and overrule them only for an artistic or polemical reason.

Point 1: Safire just accepts the advice "use that for restrictive relatives, which for nonrestrictive relatives" as a genuine rule of English grammar.  Even H. W. Fowler, whose 1926 formulation of this advice seems to have been the source of the astounding popularity of this "rule" (which has found its way into the practice of thousands of copyeditors, not to mention the Microsoft Word grammar checker), didn't go this far.  Fowler observed: "Some there are who follow this principle now; but it would be idle to pretend that it is the practice either of most or of the best writers."

Fowler's idea was that, since the variation between that and which in nonrestrictive relatives had by 1926 pretty much been eliminated, in favor of which, things would be neat and clean if this variation in restrictive relatives were also eliminated, in favor of that.  He apparently saw no basis for choosing between the two in restrictives and abhorred a choice that would be made entirely on the basis of murky considerations like the "sound" or "feel" of a sentence; there should be Only One Right Way to do anything, and it was the business of those who gave advice on grammar and usage to dictate how people should behave, or at least to exhort them to choose the Right Way.

Most linguists -- especially sociolinguists -- think this a really silly idea, but some people, like Safire, seem to have never met a rule they didn't like, especially if the rule would bring order into apparent chaos. 

In any case, Merriam-Webster's Dictionary of English Usage details the sad history of this "rule", noting wryly that authors who recommend it routinely violate it and that the facts of usage are squarely against it.   MWDEU  concludes, "You can use either which or that to introduce a restrictive clause--the grounds for your choice should be stylistic" (as it was for Bellow), and adds, "Formality does not seem to be much of a consideration in the choice", despite what a number of commentators have claimed.

Point 2: The cover of MWDEU carries an enthusiastic recommendation from Safire: "One of the great books on language..."  Now, if only Safire would read the damn book and take its lessons to heart!  Interestingly, the column from which the quotation above is taken is mostly about the puffery of blurbs: "Literary editors have learned to be suspicious of all endorsements." Rightly so, I gather.

Point 3:  The "Don't do this at home, kiddies!" advice -- leave the breaking of rules to the competent professionals, and then only if they have a good reason for breaking them -- is really condescending.  Writers like Bellow are allowed the freedom to have a personal style, but not the rest of us, who are expected to be compliant to arbitrary authority.  Hell no, I won't go.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 03:19 PM

Laying in the past

A reader named Kelly pointed out to us that as of 1 p.m. Eastern today, despite many urgent email requests, the CNN main news page was showing a Technology News headline link saying:

CHANGING EARTH
Clues to future may lay in past
Climate change could have drastic consequences

Grammar change could have drastic consequences too! As Kelly has pointed out to these people, lay is the transitive verb, not the intransitive one. (That is, you lay carpet or eggs, and you lay down the law; then when you're tired you lie across a big brass bed. Those are the rules. So those clues to the future lie in the past. Where are the copy editors when we need them?) Kelly was frantic: was nothing going to be done? Do these CNN types care nothing about vital lexemic distinctions? This issue has been carefully treated on Language Log, in a post offering a disastrously confusing account of the situation, but no one is listening. It is time for something to be done. Readers concerned to preserve confusing English irregular verb lexeme distinctions should start calling CNN on the phone to complain. (No, of course I don't know the number. Big websites never give phone numbers any more. They don't want you to call them. They want you to just fumble around clicking on stuff.)

The headline of the archived version of the story was eventually changed to "Past may hold clues to climate's future", so don't look for the erroneous version now.

Posted by Geoffrey K. Pullum at 01:17 PM

Linguistically noteworthy dates in May

We professional linguists often find ourselves thinking about the late genius James D. McCawley, but even more so in April and May than at other times. Jim died in April six years ago (April 10, 1999 — can it really be six years?). He had dinner, attended a Saturday evening concert on the campus of the University of Chicago, started to walk home, and fell dead of a heart attack. He was just 61. He was not just a great linguist — phonologist, grammarian, semanticist, and polyglot — but a musician, musicologist, Orientalist, and gourmet (he wrote a book called The Eater's Guide to Chinese Characters); and a wonderful humorist. How we would love to have him writing for Language Log today. May has just begun, and that reminds us of one famous humorous publication of Jim's — just a sheet of paper he typed out and duplicated for friends: a calendar of (completely fake) alleged linguistically important dates in the month of May. It's a bit arcane, I know; all I can say is that the more you know about linguistics, the funnier it gets. Read on if you'd like to see it. Old-timers in linguistics will be able to explain the ones you don't get. You may notice them wiping away tears, sometimes from laughter.

DATES IN THE MONTH OF MAY THAT ARE OF INTEREST TO LINGUISTS

By Jim McCawley

May 2, 1919. Baudouin de Courtenay concedes defeat in his bid for the presidency of Poland.
May 3, 1955. Mouton & Co. discover how American libraries order books and scheme to cash in by starting several series of books on limericks. The person given charge of this project mishears and starts several series of books on linguistics. No one ever notices the mistake.
May 5, 1403. The Great English Vowel Shift begins. Giles of Tottenham calls for ale at his favorite pub and is perplexed when the barmaid tells him that the fishmonger is next door.
May 6, 1939. The University of Chicago trades Leonard Bloomfield to Yale University for two janitors and an undisclosed number of concrete gargoyles.
May 7, 1966. r-less pronunciation is observed in eight kindergarten pupils in Secaucus, N.J. The governor of New Jersey stations national guardsmen along the banks of the Hudson.
May 9, 1917. N. Ja. Marr discovers ROSH, the missing link for Japhetic unity.
May 11, 1032. Holy Roman Emperor Conrad II orders isoglosses erected across northern Germany as defense against Viking intruders.
May 12, 1965. Sydney Lamb announces discovery of the hypersememic stratum, setting off a wave of selling on the NYSE.
May 13. Vowel Day. (Public holiday in Kabardian Autonomous Region). The ceremonial vowel is pronounced by all Kabardians as a symbol of brotherhood with all speakers of human languages.
May 14, 519 B.C. Birth of Panini.
May 15, 1964. J. Katz and J. Fodor are separated in 5-hour surgery from which neither recovers.
May 17, 1966. J. R. Ross tells a clean joke.
May 18, 1941. Quang Phuc Dong is captured by the Japanese and interned for the duration of hostilities.
May 19. Diphthong Day. (Public holiday in Australia)
May 20, 473 B.C. Publisher returns to Panini a manuscript entitled Saptadhyayi with a note requesting the addition of a chapter on phonology. Panini begins struggling to meet the publisher's deadline.
May 21, 1962. First mention of The Sound Pattern of English as ‘in press’.
May 23, 38,471 B.C. God creates language.
May 26, 1945. Zellig Harris applies his newly formulated discovery procedures and discovers [t].
May 27, 1969. George Lakoff discovers the global rule. Supermarkets in Cambridge, Mass. are struck by frenzied buying of canned goods.
May 29, 1962. Angular brackets are discovered. Classes at M.I.T. are dismissed and much Latvian plum brandy is consumed.
May 30, 1939. Charles F. Hockett finishes composing the music for the Linguistic Society of America's anthem, ‘Can You Hear the Difference?’
May 31, 1951. Chomsky discovers Affix-hopping and is reprimanded by his father for discovering rules on shabas.

[Note: Yes, we are aware that May 31, 1951, was a Thursday. Jim didn't check. It was only a joke, after all.]

Posted by Geoffrey K. Pullum at 12:25 PM

Never mind

If you've been worrying about losing IQ points to "infomania" -- constant email and phone interruptions -- you can put your mind at ease. Once you get off the phone and clear your inbox, that is.

In an earlier post, I listed some qualms about the experimental design of Glenn Wilson's widely-reported study. But in the discussion on PsyBlog, people have described design flaws that it didn't even occur to me to imagine: could it really be true that Wilson tested people's IQ while they were trying to ignore ringing phones and email messages popping up on a screen in front of them? I don't know, and neither do you. And we apparently never will, according to reader Ewan Dunbar.

Ewan emailed Glenn Wilson, who responded that "the study has not been published", and directed inquiries to Lucy Thomas at HP's PR firm. Ewan duly wrote to her, and she responded that "the study is not available to the public."

[Update 9/25/2005: for the truth about the experimental design, and an apology for blaming the media's excesses on Glenn Wilson, see this post.]

Posted by Mark Liberman at 12:18 PM

The Poetry Corner

From the American Dialect Society mailing list: a little hymn by Clive James to Windows and its grammar checker, forwarded by ADS-Ler Neil Crawford.


'A poet [...] must master the rules of grammar before he attempts to bend or break them.'

--Robert Graves, letter to 'The Times', 1961 (cited by David Crystal, The Stories of English, Allen Lane, London, 2004, 182)

Clive James rush in where angels fears to tread!

Saturday Poem by Clive James

WINDOWS IS SHUTTING DOWN

Windows is shutting down, and grammar are
On their last leg. So what am we to do?
A letter of complaint go just so far,
Proving the only one in step are you.

Better, perhaps, to simply let it goes.
A sentence have to be screwed pretty bad
Before they gets to where you doesnt knows
The meaning what it must of meant to had.

The meteor have hit. Extinction spread,
But evolution do not stop for that.
A mutant languages rise from the dead
And all them rules is suddenly old hat.

Too bad for we, us what has had so long
The best seat from the only game in town.
But there it am, and whom can say its wrong?
Those are the break. Windows is shutting down.

—The Guardian Review, 30 April 2005, 36

Posted by Arnold Zwicky at 12:13 PM

Canadian "eh", Brooklyn "hey", hiphop "yo"

In the flurry of posts about Canadian eh, I neglected to link to John McWhorter's post from a year ago about English final yo. John makes a connection to George Herriman's use of final hey in his Krazy Kat strips, and to an episode of the 1940s radio sitcom My Favorite Husband where Lucille Ball characterizes "a gum-chewing girl from Brooklyn" by "postposing HEY to every second sentence".

Posted by Mark Liberman at 06:27 AM

May 02, 2005

Canadian "eh" and Japanese "ne"

In response to recent posts here on Canadian eh, Russell Lee-Goldman emailed

Looking at your recent post regarding "eh," in particular the table of types of eh, I immediately thought of the Japanese sentence-final particle ne.  Like eh, ne has been given many characterizations by both Japanese and foreign linguists and especially L2 educators.  Just by referencing the list in your 1 May post, it seems that uses 1, 2, 3, 4, 5, 7, and 10 are among the most common uses of Japanese "ne."  The broadest characterization of ne that I've come across is that it is a marker of "affective common ground," that is, it foregrounds interspeaker interaction rather than informational content.  Whether this can be said of eh, though, I'm not sure.

(Also, I have heard that there have been cases of Japanese people traveling to Canada and noticing a strong resemblance between the two words.)

The "affective common ground" theory of ne is (I think) due to H.M. Cook, and is described in these on-line notes from a seminar at Berkeley.

Here's a discussion of ne from Peter Payne at the j-list side blog:

If you've paid attention while watching Japanese anime or JAV, you've probably picked up on the word "ne." This is an interesting Japanese grammatical particle that usually goes on the ends of sentences and serves several purposes, mostly related to asking for confirmation of information or agreement with an opinion. Here are two examples:

"Aisu kohii futatsu desu ne?" You'd like two glasses of ice coffee, is that right?
"Kyou wa atsui desu ne." It's hot today, isn't it?

Other functions of the all-purpose Japanese particle "ne" include softening a sentence so its meaning is less harsh ("Chotto futorimashita ne." You've gained a little weight, haven't you?); emphasizing what you want to say ("Kondo chanto kiite kudasai, ne." Please listen closely next time, alright?); working as a pause in sentences, like "um" in English; and to get the attention of the listener before saying something. Girls use "ne" more often than men and with a higher intonation, so males should use the word with caution lest they appear effeminate.

Apparently there is a "narrative ne" in Japanese that is often deprecated, just as the Canadian "narrative eh" and the repeated intonational rises of English "uptalk" are. Thus a page for English learners on Japanese particles has a section called "Terribly overused ne":

The correct place for ne is at the end of a sentence, where it is used to check or request the agreement of the listener:

  • Ashita watashitachi to issho ni ikimasu ne. (You're going with us tomorrow, right?)
  • Ii otenki desu ne. (Nice weather, isn't it. [with dropping intonation])

However, like "y'know" in English, too many people grossly overuse ne. I've even heard speeches where it was put between almost every word. Don't let it become a bad habit.

If the "terribly overused ne" is a female-associated usage, then that is apparently a difference from the "narrative eh", which was diagnosed as male-associated by Elaine Gold's survey of Toronto students.

Fernando Pereira emailed an anecdote about intensive use of eh:

Heli-skiing in British Columbia this February, our main guide was a young(ish) guy who grew up in the prairies (Saskatchewan if I recall correctly) and moved a while ago to interior BC (Golden). He had the highest "eh?" density I've ever heard in Canada. He had to talk to the group a lot, to give safety instruction, to direct us about where to go, etc. Pretty much every clause was punctuated by "eh?". It felt as if he used it as a way of asking implicitly whether we were paying attention, and creating opportunities for questions. This doesn't seem to quite fit the categories in Gold's paper, eh?

I'd assume that this was a variant of what Gold calls the "narrative eh", though she defines this category only by example, and doesn't speculate about what its function really is. The ski guide was not telling stories, but he was certainly foregrounding the interaction, perhaps asking for attention and evoking signs of uptake, and so on.

One of the interesting questions about both ne and eh is whether their various uses are socially symmetrical or not. For instance, if a schoolteacher is talking with a young student, would these tags mainly be used by the teacher, by the student, or equally by both? Would it be different for different uses of these tags?

Robin Lakoff's 1975 account of English tag questions, based on her introspective judgments, was that such tags "are associated with a desire for confirmation or approval which signals a lack of self-confidence in the speaker." But when Cameron et al. 1988 looked at the distribution of tag questions in nine hours of unscripted broadcast talk, they found that such tags were used only by the participants that they characterized as "powerful" -- in other words, those "institutionally responsible for the conduct of the talk". These were doctors as opposed to patients, teachers as opposed to students, talk show hosts as opposed to guests. [See this post on Gender and Tags for more details.]

Similarly, although "uptalk" (frequent use of final rises on statements in English) is often perceived as a sign of uncertainty, Cynthia McLemore's 1991 dissertation documented the use of such rises to signal the presentation of significant new information by institutionally powerful individuals. In her study, the speakers were the leaders of a sorority (as opposed to the pledges), and the rises marked announcements of new items in chapter meetings. Fernando's ski guide was also in an institutionally powerful position, and he may have been using eh in a somewhat similar way, to command attention and involvement on the part of his listeners.

I don't know whether there have been any empirical studies of the distribution of eh (or ne) that have looked at this aspect of their patterns of usage.

Posted by Mark Liberman at 06:49 PM

um, em, uh, ah, aah, er, eh

In the Canadian provincial Hansards that I wrote about earlier, some of the instances of what is spelled "eh" seem like "filled pauses", which might be transcribed as "uh" in an American context:

Mr Murdoch: [...] A couple of other ones: the stockyards, the money you talk about, is that the province's money? It is, eh, the money that you're -- who owns them?

If this is a filled pause, it might still be an instance of "eh" -- different languages and dialects have different sounds for filled pauses. And it still might have a "meaning". I first thought about the meaning of "uh" many years ago, when I lived in northern New Jersey and often heard (former New York mayor) Ed Koch on the radio. Koch speaks rapidly and fluently, but with a large number of filled pauses, and I noticed that he often inserted filled pauses in places where it was quite implausible that he was actually pausing to think of what to say next. For example, he would say things like "this is uh Ed Koch" or even "this is Ed uh Koch". I concluded that he used "uh" as a sort of emphasis marker, for verbally highlighting or underlining whatever came next.

If that's what he was doing, it probably worked, according to J. E. Fox Tree, "Listeners' uses of um and uh in speech comprehension", Memory & Cognition, 1 March 2001, vol. 29, no. 2, pp. 320-326:

Despite their frequency in conversational talk, little is known about how ums and uhs affect listeners' on-line processing of spontaneous speech. Two studies of ums and uhs in English and Dutch reveal that hearing an uh has a beneficial effect on listeners' ability to recognize words in upcoming speech, but that hearing an um has neither a beneficial nor a detrimental effect. The results suggest that um and uh are different from one another and support the hypothesis that uh is a signal of short upcoming delay and um is a signal of a long upcoming delay.

It's pretty easy to distinguish "uh" from "um", but I'm more worried about what is going on in this transcript of an interview with Billy Boyd, where things written "ah", "aah" and "em" join "um":

Stewart: Hi, ah, you join us here on Billy's official website in the first of hopefully many interviews with Billy just having a chat with the big things with Billy's career and certain movies that he's done certain stage plays that he's done. How are you Billy? Welcome.
Billy: Ah I'm very well thank you Stewart.
Stewart: Good good, eh I think for this interview what I'd like to talk about is what you did and how you started and all the rest of it take you back to your childhood days.
Billy: Aah
Stewart: Running about barefoot
Billy: [Laughing softly]Halcyon days
Stewart: On the streets of Glasgow, um did you always know that you were gonna be an actor I mean was that your dream and your ambition?
Billy: Em from, from very early on actually it was. I can't think of a actual moment that I thought I'm definitely gonna be an actor from now on but I remember being in eh guidance meetings which you used to have a school when they'd ask you what do you wanna be and I said an actor and the guidance teacher said well I- I- I wouldn't tell anyone else that. [both laugh] Honestly that's eh growing up in Glasgow maybe it wasn't the best thing to be but em yeah so from, from quite en early age

Something spelled "uh" is in there too, as in this sequence from the beginning of the third segment of the interview:

Stewart: Welcome back, eh we're now gonna have a chat with Billy about his career, his fledgling career in theatre after college, so Billy you got a job in St. Andrews?
Billy: Yeah, yeah
Stewart: Just as you left college, what was that?
Billy: It was em, while we were still at college eh, St. Andrews had a theatre called The Byer Theatre and they were doin some shows and they came to audition the people who were about to leave theatre and they were doing a show called 'The Slab Boys' by John Byrne a fantastic play ah which was made into a movie actually a few years on and 'The Diary of Adrian Mole' which is a musical on stage and its actually very funny
Stewart: And who did you play in that?
Billy: I played Adrian,
Stewart: Adrian?
Billy: [Laughing] Yeah,
Stewart: Don't look like an Adrian
Billy: Eh, uh That's coz I've no got my glasses on
[Stewart laughs like the Dr from The Simpsons again]

Does Glaswegian really have five different filled pauses ("eh", "ah", "uh", "em", "um"), and an interjection "aah" as well? Are some of these examples of tags (as Canadian "eh" usually is) rather than filled pauses at all? At this point, I think I'd like to have the recording as well as the transcript, and have the option of working from vowel formant frequencies and other phonetic measurements, or from phonetic perceptions, rather than only from orthographic transcriptions.

By the way, Billy's Simpsons reference is not irrelevant -- there's another Simpsons/linguistics connection here for Heidi Harley's collection:

In Montreal for a performance of The Simpsons In the Flesh stage show at the Just for Laughs comedy festival, the show's creator Matt Groening noted Thursday his dad was born in Canada. Homer being named after Groening's father, so where does this lead Homer?

"That would make Homer Simpson a Canadian," Groening said in an interview. "I hope Canadians won't hold it against the show now that they know."

Not all too surprising, as one fan noted, "Homer eats foods commonly associated with Canada: donuts, beer, bacon, and has been known to have a glass of maple syrup for breakfast."

[Fox Tree reference via email from Hugo Quené. ]

Posted by Mark Liberman at 09:44 AM

Open access eh

By now, everyone understands the human value of freely indexed and openly accessible online information. Well, almost everyone -- Michael Gorman thinks that the "boogie-woogie Google boys" are on the wrong track entirely, and Jean-Noël Jeanneney echoes Gorman's concern about the "throbbing anxiety for anything and everything, scattering knowledge like dust". Still, most folks are pleased.

Linguists have a special reason to be happy about these developments, as the Economist pointed out back in January. That's why we here at Language Log have posted so often about new web search techniques, digital library developments, the open access movement and so on. I was reminded of this again when I recommended yesterday that people interested in the meaning of eh should look at actual patterns of use, not just native speakers' intuitions about possible use. Where, I wondered, could you find an accessible archive of Canadian English to study? A few minutes of searching turned one up.

Because the Canadian Hansards are available on line, and accessible to indexing by Google and others, I can search for {eh} on the site of the Legislative Assembly of Ontario, and turn up 452 examples. While some of these are French "eh bien", there are plenty of stereotypically Canadian English usages. Looking across all the provincial Hansards, I think you could put together a modest corpus of eh usage, well worth examining in detail.
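A rough sketch of how that kind of filtering might look once some transcript text is in hand. It assumes the Hansard text has already been saved locally as plain strings; the function name `find_eh` and the sample lines (the second one is made up) are illustrative only, not part of any Hansard or Google tool. The pattern matches the standalone word "eh" and skips the French "eh bien".

```python
import re

# Standalone "eh", skipping the French expression "eh bien".
EH_PATTERN = re.compile(r"\beh\b(?!\s+bien\b)", re.IGNORECASE)

def find_eh(lines, width=30):
    """Return (line_number, context) pairs for each English-looking 'eh'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in EH_PATTERN.finditer(line):
            # Keep a little text on either side so the usage can be judged.
            start = max(0, match.start() - width)
            end = min(len(line), match.end() + width)
            hits.append((number, line[start:end].strip()))
    return hits

# Hypothetical sample: lines 1 and 3 echo the excerpts quoted in this post,
# line 2 is an invented French sentence.
sample = [
    "Mrs Haslam: Give us a break, eh?",
    "Eh bien, je vais continuer.",
    "Mr Marchese: Brilliant, eh? He's good.",
]

for number, context in find_eh(sample):
    print(number, context)
```

A keyword-in-context listing like this only collects candidates; deciding which of Gold's categories each one belongs to would still have to be done by reading them.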

Here are a few examples, from Ontario unless otherwise specified:

The Chair: Do we have any nominations for the subcommittee?

Mr Craitor: It's me again, eh? I'm pleased to move that a subcommittee on committee business be appointed to meet from time to time, at the call of the Chair or on the request of any member thereof, to consider and report to the committee on the business of the committee, and that the subcommittee be composed of the following members -- the committee Chair as Chair, Mr Gravelle and Mr Klees -- and that the presence of all the members of the subcommittee is necessary to constitute a meeting.

I guess this is an instance of Gold's category 2, "statements of fact", though the example has some other interesting properties. An American equivalent here might be "OK". The same substitution would work in other cases that are rather different in force, for example this instance combining Gold's category 3 ("commands") and category 8 ("insults"):

The Chair (Mr Paul R. Johnson): ... The first order of business I'd like to deal with is, I would like to know if the committee members feel that it's necessary -- I suspect not -- that this portion of our deliberations be televised. We would like to have it recorded in Hansard, no doubt.

Mr Gary Carr (Oakville South): My mom likes to watch me.

Mrs Karen Haslam (Perth): Yes, but we don't and we have to.

Mr Carr: You're going to hear me anyways.

Mrs Haslam: Give us a break, eh?

In other cases, the American equivalent would (I think) be "huh" (or maybe "right"), for example after sarcastic evaluations (which are perhaps instances of Gold's category 4, "exclamations"?):

Mr Marchese: I don't know that any one individual could take credit for that, but Tom Long evidently is a pretty powerful guy. Do you know what he's proposing these days? That we have national testing for teachers. He says, "If Mike Harris could have such popularity now with the general public, I, as the potential leader, am going to suggest that we test all teachers, not just in Ontario but across the land." Brilliant, eh? He's good. Tom Long is so good at this bullshit-I mean this kind of-

Interjections.

or again:

Mr. Christopherson: [...] Beautiful, eh? Beautiful for the employer, but the worker’s out of luck. That’s what this really means, and that’s what they’re hoping will happen. They know what will happen, and so will anyone who’s watching this who’s either had experience being a part of an organizing drive bringing their union in or has negotiated on either side.

Other cases are somewhat opaque to me:

The Chair: You have about seven more minutes you can use. Mr Duncan, do you want to say anything? No. Very well. Third party, Ms Lankin.

Ms Lankin: I'd gladly use that extra time. No, eh? I'm going to speak very briefly on the disability income support program and then turn it over to my colleague to address the Ontario Works bill.

This seems to mean something like "No, actually, I'm going to speak briefly and then turn the rest of my time over to my colleague". I don't know whether eh can be used this way in general.

In other cases, eh seems purely to be a device for commanding attention [this one is from Manitoba (link)]:

Hon. Glen Cummings (Minister of Natural Resources): Mr. Chairman, I think the best thing to do is--

Mr. Chairperson: Do you have the minister's mike on? He is not sitting at his seat, eh. He is sitting at Mr. Downey's seat.

There are also a few self-referential uses, as in this speech by Mr. Ramsey of Timiskaming (11 June 1985), discussing "why the guys across the chamber are going out next Tuesday" [i.e. losing an election]:

I do not hold any grudges against these people. We are all going to be making the same salary now. These fellows have probably forgotten how to drive their own cars. We will all be sitting at the back of a Toronto Transit Commission subway or streetcar -- huddled in the back, with our tuques on in the winter and talking as one hoser to another.

I noticed that my legislative assistant kept changing that word to "loser," thinking I was not spelling it right. I meant "hoser." She is not familiar yet with what a hoser is, not being from the Great White North as I am. "Loser" is probably also appropriate here. But we poor hosers could be in the back -- Leo, Alan, Mike and myself -- and we can talk about the problems of the north.

I can think of a new name. Mr. Speaker, I know you are aware of the rat pack in Ottawa. I would have a name for this club of travelling minstrels on the TTC in the wintertime. We could be called the "eh team." I do not mean as in capital A but as in "Good day, eh." "Good day eh," we could say on the TTC.

The Canadian provincial Hansards don't seem to have any examples of "narrative eh", either because there aren't enough examples of the right sort of narrative, or because "narrative eh" is too stigmatized for use in such a context. Perhaps there are some oral histories online that would remedy the deficit?

Posted by Mark Liberman at 08:06 AM

May 01, 2005

Le plan biblio advances

Another development in a story we've been following for a few months: six European heads of state signed a letter in favor of a European Digital Library.

According to Le Nouvel Observateur on 4/28/2005, the letter was signed by Jacques Chirac of France, Gerhard Schroeder of Germany, Silvio Berlusconi of Italy, Jose Luis Zapatero of Spain, Aleksander Kwasniewski of Poland, and Ferenc Gyurcsany of Hungary, and was addressed to the European Commission and the EU president. It's not clear to me whether the other EU countries declined to sign, or weren't asked.

Ils souhaitent donc «prendre appui sur les actions de numérisation déjà engagées par nombre de bibliothèques européennes pour les mettre en réseau» et «constituer ainsi ce qu'on pourrait appeler une bibliothèque numérique européenne».

They therefore wish to "support the digitization activities already underway at several European libraries so as to put them into a network" and "thus to constitute what one could call a European digital library".

I suppose that "les mettre en réseau" means something different from "put them on the internet", since I imagine that's already done -- we're talking about coordination of activities and harmonization of standards and so on, a set of problems that can consume amazing amounts of resources when the network is connecting a set of established institutions, as Fernando Pereira pointed out a week or so ago. Fernando's observations are worth quoting at length:

Large government efforts in information technology have a way of failing to deliver even a fraction of what they promise. The reasons are complicated and varied, but one of the main ones is mission creep: since a large project involves many constituencies, all of which have different concerns, and all of which can stop or slow down the project, the path of least resistance is to make the project a union of the requirements from all of those constituencies, regardless of whether the requirements are compatible, or fit within the project budget. This happens too in large, established corporations with many power centers and legacy systems. In addition, such projects are often managed by institutions involved in standards that are too fond of complex designs that are supposed to increase modularity and interoperability. The end results are often a bureaucracy of objects and interfaces -- plumbing -- with very limited actual functionality. In contrast, all successful information access projects I know of started with a clear goal and a few simple (but not necessarily obvious) design ideas, created and managed by one person (arXiv, ...) or a small team (CiteSeer, Altavista, Google, ...). Early success puts demand pressure on such projects that helps steer their growth in useful directions. In contrast, large bureaucratic projects spend most of their resources in pre-deployment design, planning, management, and conflict resolution, with the result that they do not acquire an early user base who advocate for the project and help make it better before the money runs out.

Returning to the story in Le Nouvel Obs:

Dans ce cadre, ils proposent que l'Union «fournisse le cadre d'une concertation entre les institutions concernées» et «apporte sa contribution à la solution des problèmes à surmonter».

In this framework, they propose that the Union "should furnish the framework of a consultation among the institutions involved" and "supply its contribution to the solution of the problems to be overcome".

I believe that concertation can mean either "talking together" or "working together". It's not clear which meaning is intended here. Either way, I think we just heard about it from Fernando...

Les six chefs d'Etat et de gouvernement souhaitent que cette question fasse l'objet, «à brève échéance», «d'un débat entre les ministres de la Culture» et de la Recherche des 25, «à la lumière d'une première communication de la Commission».
Jacques Chirac devrait évoquer le sujet dans son discours d'inauguration des rencontres européennes de la culture, lundi prochain à Paris.

The six heads of state wish this question to be the object "in the near term" "of a discussion among the ministers of culture" and of research of the 25 [EU countries], "in the light of a prior communication from the [European] Commission".
Jacques Chirac should raise this issue in his inaugural address at the European cultural meetings, next Monday in Paris.

There's an AP wire story, and a brief report in English from the American Library Association, and some commentary at The Economist.

Meanwhile, there's also a relevant piece in the most recent Science magazine (which you'll need a subscription to read) about a "digital libraries" area where Europe is apparently ahead of the U.S. -- though Jacques Chirac will probably not be talking about it. The article is Gretchen Vogel and Martin Enserink, "Europe Steps Into the Open With Plans for Electronic Archives", Science, v. 308, issue 5722, 623-624. Here's their lede:

BERLIN AND PARIS--While moves in the United States to make scientific research results available--for free--at the click of a mouse have generated intense debate, European research organizations have quietly been forging ahead. Slowly but surely, they are starting to build and connect institutional and even nationwide public archives that will, according to proponents, be the megalibraries of the future, allowing anyone with an Internet connection to access papers that result from publicly funded research. "The cutting edge of the Open Access movement is now in Europe," says Peter Suber of Public Knowledge, an advocacy group in Washington, D.C.

The article cites detailed progress, both technical and social, in many European countries, towards realizing the goals of the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. I'll summarize their findings in another post -- if I can't find an open-access article covering the same ground!

[Update 5/2/2005: In his speech today, Jacques Chirac mentioned the proposed "bibliothèque virtuelle européenne", and said that "à travers des initiatives comme celles-là, l'Europe sera aux avant-postes du combat pour la diversité culturelle" ("through such initiatives, Europe will be in the vanguard of the fight for cultural diversity"), as opposed to leaving things "au jeu aveugle du marché" ("to the blind play of the marketplace"). ]

Posted by Mark Liberman at 03:07 PM

The meaning of eh

Sarosh Kwaja, responding to an earlier post about gender and tags, sent email this morning to ask about the function of the stereotypically Canadian tag "eh". I didn't know the answer, so I did a few minutes of poking around.

Searching on Google Scholar for {Canadian tag eh} turned up as the first hit an abstract by Elaine Gold and Mireille Tremblay, "Canadian English, Eh? Canadian French, Hein?", which in turn includes a bibliography full of interesting-looking stuff, including most recently Elaine Gold, "Canadian Eh?: A Survey of Contemporary Use", Annual Conference of the Canadian Linguistic Association, University of Manitoba, Winnipeg (2004). Searching again for {Elaine Gold} turned up a copy of the cited paper itself.

Gold's paper begins

Although eh is widely considered to be a marker of Canadian speech, there has been little research done into its use or meaning and none in the last 25 years.

and takes up the question of whether eh is really a Canadianism or not:

Avis wrote an article entitled 'So eh? is Canadian, eh?' in which he argued against eh being a Canadianism, citing examples of eh from English writings around the world. However, he undermined his own argument, in that he could find only Canadian examples of two types of eh: the narrative eh illustrated in (3), and eh used as a reinforcement of an exclamation in (4):

(3) "He's holding on to a firehose, eh? The thing is jumping all over the place, eh, and he can hardly hold onto it eh? Well, he finally loses control of it, eh, and the water knocks down half a dozen bystanders." (Avis: 103)
(4) "How about that, eh?" (Avis: 99)

The body of Gold's (excellent) paper "focuses on the results of a survey of eh usage conducted on students at the University of Toronto", covering 30 males and 61 females who were Canadian-born, and 5 males and 13 females who had immigrated to Canada within the past 5 years. All were under 30 and enrolled in the introductory linguistics course at the University of Toronto. The participants were asked about experience with, usage of, and attitudes towards various examples of eh:

Type of eh                 Sample sentence
1. Statements of opinion   Nice day, eh?
2. Statements of fact      It goes over here, eh?
3. Commands                Open the window, eh?
                           Think about it, eh?
4. Exclamations            What a game, eh?
5. Questions               What are they trying to do, eh?
6. To mean 'pardon'        Eh? What did you say?
7. In fixed expressions    Thanks, eh?
                           I know, eh?
8. Insults                 You're a real snob, eh?
9. Accusations             You took the last piece, eh?
10. Telling a story        This guy is up on the 27th floor, eh? and then he gets out on the ledge, eh?

You can read the paper to see the details. There were few sex differences among native Canadians in answers to the "have you heard" questions -- significance levels are not presented, so maybe there were none, it's hard to tell. There was little overall sex difference in answer to the "do you use" questions -- 49% sometimes+often for males, 45% for females; 15% often for males, 14% for females.

There were larger differences in the responses for some particular cases: 92% of the females said they used the expression "I know, eh?" sometimes or often, while only 72% of the males admitted to this; on the other hand, only 13% of the females reported use of eh in commands like "Open the window, eh?", while 45% of the males did. (Though this last difference may have more to do with the rather brusque tone of the command, rather than the use of eh...)

It's clear from the attitude survey that some of the uses of eh are stigmatized. In particular, about half the Canadian respondents reported a negative attitude towards "narrative eh" (34% of males, 57% of females, 49% overall), while very few reported a positive attitude (3% of males, 2% of females, 2% overall). In general, the negatively-evaluated cases tend also to be the ones where males reported higher usage than females did.
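The overall figures can be checked against the group sizes given above (30 Canadian-born males and 61 Canadian-born females). This is a minimal sketch, assuming that the overall percentage is simply the headcount-weighted average of the two groups and that every respondent answered the question; the function name `pooled_percent` is made up for illustration.

```python
def pooled_percent(male_pct, female_pct, n_male=30, n_female=61):
    """Headcount-weighted average of two group percentages."""
    total = n_male + n_female
    return (male_pct * n_male + female_pct * n_female) / total

# Attitudes towards "narrative eh" among the Canadian-born respondents.
negative = pooled_percent(34, 57)  # reported negative attitude
positive = pooled_percent(3, 2)    # reported positive attitude

print(round(negative), round(positive))
```

Under those assumptions the rounded values agree with the 49% and 2% reported overall.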

The recent immigrants seem to be picking up eh pretty fast, although their overall rate of (self-reported) usage was somewhat lower, and they mostly don't get some of the subtleties, especially the use with fixed expressions:

As was noted with recognition rates, the immigrants' usage rates do not reflect the pattern of use they hear from native speakers. This can be seen in Table 14 where the responses of the native Canadian speakers regarding use (Table 4) are compared with the use reported by the new immigrants. On the one hand, immigrant speakers do not seem to be picking up set expressions like Thanks, eh? [11% vs. 53%] and I know, eh? [28% vs. 85%]. On the other hand, their use of eh for pardon is higher than that of native speakers [56% vs. 39%].

Gold offers an interesting and plausible theory about what's going on with the immigrants:

One explanation for the immigrants' different pattern of use might lie in their interpretation of the function of eh. It is possible that the immigrants are interpreting eh strictly as a question particle, equivalent to tags like 'isn't it' or 'don't you think'. This interpretation of eh is consistent with eh following statements of opinion, accusations or fact; these can all be rephrased as questions, such as Nice day, isn't it? or It goes over here, doesn't it? However, this question particle meaning of eh is not compatible with expressions like Thanks or I know which make no sense when rephrased as questions. This would explain why immigrants are not picking up these expressions as quickly as some of the others, even though they are exposed frequently to them.

Gold's survey did not address the questions that Sarosh asked me about, which dealt with the distinction, originally made by Holmes 1984, between modal tags (which indicate speaker uncertainty) and affective tags (which may be softeners, conventionally mitigating the force of possibly-impolite or aggressive remarks, or facilitative tags, which invite the listener to take a conversational turn to comment on the speaker's contribution). Nor did Gold deal with the effects on eh usage of the structure of the interaction, which Cameron et al. (1988) found to interact with speaker sex in a crucial way. In their study, men were found to use modal tags more often than women, and affective tags less often; while for both sexes, the affective tags were only used by "powerful" participants (those "responsible for the conduct of the talk", like a teacher or talk-show host). (See the cited gender and tags post for more details.)

There are several different sorts of things at issue in these different studies: the form and function of the phrase to which a tag like eh is appended (e.g. commands, questions, fixed expressions, etc.); the discourse context (e.g. narrative eh); the function of the tag itself (e.g. Holmes' distinctions among uncertainty, softening and turn-taking); the relationships among the speakers (e.g. whether one of them is institutionally "in charge" of the interaction). Cross this with formality (and other aspects of "register"), age, sex, geography, ethnicity and so on, and you've got plenty of topics for empirical research. More important, the results of such research bear on basic questions about language, communication and identity.

Surveys, though very useful, are not by themselves an adequate way to study such patterns of usage. There are several serious problems with trying to resolve such questions using intuitions, whether the intuitions of a linguist or the intuitions of survey respondents. It's not just that people tend to underestimate their usage of stigmatized features, though this is true. It's also true that people may have no useful intuitions at all about some crucial factors. An even bigger problem is that the survey designer may not know to ask the most important questions. And the biggest problem of all is that we humans are incorrigible theorizers. As soon as we start reflecting about our own behavioral dispositions, we start to organize our impressions and reactions into more or less coherent patterns, whose relationship to our actual behavior can be remote.

This used to be a daunting and even depressing line of thinking, but now it's become invigorating and exciting. For those of us who are interested in language use, this is a great time to be alive and working. We've got large digital archives of speech and text, we've got (semi-)automatic techniques for search and analysis, we've got great tools for exploratory analysis of large bodies of data. Go to it, eh?

Posted by Mark Liberman at 01:11 PM