Language Log: August 2005 Archives

August 31, 2005

Legislatively Challenged

Last week, according to a widely reproduced AP story, New York Governor George Pataki vetoed a bill that would have required the use of politically correct terminology in laws, regulations, and charters when referring to people with disabilities. Pataki said that he vetoed the bill because it established "vague and subjective" standards and observed that not only do preferences change over time but that people with the same disability disagree as to what terminology they prefer.

I'm afraid that I have to side with Governor Pataki on this one. The bill isn't about avoiding obviously and egregiously offensive terminology like gimp and crip. To my knowledge not a single New York law or regulation uses such terms. According to its sponsor, Harvey Weisenberg, the bill would have required the disabled to be called instead people with disabilities. Hunh? I see no relevant difference in meaning between the two. The main difference is that expressions like people with disabilities are longer and in many contexts will be more awkward and less readable. Yet another impediment to clarity and readability of laws and regulations is something we really do not need.

The bill also reflects the assumption of a form of the Sapir-Whorf hypothesis. Assemblyman Weisenberg is quoted as saying:

By using the correct language in legislation, New York state lawmakers can make a positive impact on how people with disabilities are perceived by society,

I doubt it. It could be true, but I find it striking that this and other similar ideas are put forward by serious people without a shred of evidence.

I have no idea what disabled people, and people with disabilities, think about this. Their thoughts on the matter, if any, are not mentioned in any of the news articles and do not turn up on casual Googling, but I know from my own experience that well-meaning people seem to come up with non-existent distinctions that mean nothing to those they are trying to benefit. An acquaintance once explained to me that he thought that one should never say that someone is a Jew, always that someone is Jewish. He thought that He is a Jew. is somehow offensive, while He is Jewish. is not. I don't know of any linguistic or grammatical principles from which this would follow, nor of even a smidgen of evidence that anyone else shares his perception. I certainly don't, and I am a Jew. That's how I put it. If anything, I prefer He is a Jew. to He is Jewish because I associate the latter usage, rightly or wrongly, with people who wrongly consider being Jewish to be purely a matter of religion, like being a Baptist or a Hindu.

Posted by Bill Poser at 11:31 PM

Leading questions and frickin' cooks

According to a CNN story about New Orleans mayor Ray Nagin's frustration over lack of coordination,

"There is way too many fricking ... cooks in the kitchen," Nagin said in a phone interview with WAPT-TV in Jackson, Mississippi, fuming over what he said were scuttled plans to plug a 200-yard breach near the 17th Street Canal, allowing Lake Pontchartrain to spill into the central business district.

Arnold Zwicky, who told me about this by email, wondered about three editorial differences between this textual version and what he remembered from hearing the clip on CNN TV news, which he rendered as

There's way too many frickin' -- excuse me -- cooks in the kitchen.

CNN has the clip on their website (if the link doesn't work, try going through the story linked above), so I was able to verify that Arnold's memory is exact. I've extracted just the cited phrase here.

The three differences, obviously, are

un-contraction of there's to there is
standardized spelling of frickin' as fricking
elision of "excuse me"

With respect to the first and second points, Arnold was curious about whether his memory was wrong, or perhaps the Mayor was using informal language in a formal register. But Arnold's memory was exact, and I suppose that the partial formalization of the quote comes from a CNN copy editor, or an assumption by the writer that these are the right ways to render such things.

Opinions differ about how to transcribe informal speech. I tend to agree with the practice of writing -ing rather than -in', as Mark Twain did, though "fricking" in particular seems kind of silly. I feel that removing contractions is a bad idea, since in conversational English the lack of contraction in cases like this often indicates either some sort of emphasis or some extra dose of formality. In this case, the un-contraction is especially odd, since the singular is itself non-standard. Still, Mark Twain himself did this, as in his maxim "The trouble ain't that there is too many fools, but that the lightning ain't distributed right".

Eliding the excuse serves to make a punchier quote, but a bigger issue is the extent to which the quote itself was set up by the interviewer. Here's the whole context:

Ray Nagin:	Uh I'm a very impatient person, I would love to see those uh resources come a lot quicker, and I would love to see some of the chiefs that keep showing up down here to kind of stay away for a minute and let us get to- these implementations uh phases adequately done.
Interviewer:	Are- are there too many cooks in the kitchen, is that what I hear you saying, Mr. Mayor?
Ray Nagin:	Absolutely, in my opinion, there's way too many frickin' -- excuse me -- cooks in the kitchen, we had this implementation plan going, they should have done these uh sandbagging operations first thing this morning, and it didn't get done and I- quite frankly I'm very upset about it.

This is the sort of ritual exchange that Rasheed Wallace lampooned here. It serves its purpose -- here we are, writing about Mayor Nagin's remarks, which took on a force that they would have lacked if the quote were just "I would love to see some of the chiefs that keep showing up down here to kind of stay away for a minute" or something similar.

Nagin apparently had good reason to be upset, and the reporter helped him to express his anger in a way that got people's attention:

The National Weather Service reported a breach along the Industrial Canal levee at Tennessee Street, in southeast New Orleans, on Monday. Local reports later said the levee was overtopped, not breached, but the Corps of Engineers reported it Tuesday afternoon as having been breached.

But Nagin said a repair attempt was supposed to have been made Tuesday.

According to the mayor, Black Hawk helicopters were scheduled to pick up and drop massive 3,000-pound sandbags in the 17th Street Canal breach, but were diverted on rescue missions. Nagin said neglecting to fix the problem has set the city behind by at least a month.

"I had laid out like an eight-week to ten-week timeline where we could get the city back in semblance of order. It's probably been pushed back another four weeks as a result of this," Nagin said.

"That four weeks is going to stop all commerce in the city of New Orleans. It also impacts the nation, because no domestic oil production will happen in southeast Louisiana."

So it's easy to sympathize with what the reporter did: the conventions of the genre force him to make the point by asking the mayor a leading question, rather than expressing an opinion in his own voice.

[Update: Arnold Zwicky emailed

small additional subtlety on "there's" vs. "there is": "there is" + NPpl is indeed nonstandard (and somewhat more common in the south and south midlands than elsewhere, i believe -- i'm away from my sources on this today), but "there's" + NPpl should really be characterized, in current english, as merely informal/colloquial, rather than nonstandard. millions of people (like me) who wouldn't use "there is two people at the door" are entirely happy with "there's two people at the door". so the two versions differ not only in emphasis and/or formality, but also (for many of us) in standardness.

]

Posted by Mark Liberman at 02:04 PM

Killing me softly with their slides

Yesterday's Washington Post featured a column by Ruth Marcus entitled "Powerpoint: Killer App?" It begins with this very provocative first paragraph:

Did PowerPoint make the space shuttle crash? Could it doom another mission? Preposterous as this may sound, the ubiquitous Microsoft "presentation software" has twice been singled out for special criticism by task forces reviewing the space shuttle disaster.

The rest of the column is, as columns like these often are, equal parts funny and disturbing -- and each in several ways. I'm one of those sad folks who use Microsoft products like PowerPoint out of some ill-defined sense of necessity, and I'm always down for some Microsoft (product) bashing. However, I won't tolerate the gratuitous bashing of second-grade students' writing abilities nor that of those students' teachers' abilities to instruct them in writing.

I'm referring to this paragraph from the column, with emphasis added:

The most disturbing development in the world of PowerPoint is its migration to the schools -- like sex and drugs, at earlier and earlier ages. Now we have second-graders being tutored in PowerPoint. No matter that students who compose at the keyboard already spend more energy perfecting their fonts than polishing their sentences -- PowerPoint dispenses with the need to write any sentences at all. Perhaps the politicians who are so worked up about the ill effects of violent video games should turn their attention to PowerPoint instead.

Almost certainly, Marcus makes this claim in the absence of (a) any qualitative evidence of how "tutoring in PowerPoint" proceeds in second grade (was this vulnerable age group used simply for rhetorical effect?) or (b) any quantitative evidence of the ratio of time/"energy" that students spend "perfecting their fonts" vs. "polishing their sentences". Reading this criticism-founded-on-PTA-anecdote of both students and teachers has the bitter aftertaste of poor research on important issues -- I'm not saying the claims are false, only that I'm certain that they haven't been shown to be true in any significant way and that I don't see what good they do for any kids.

Though I do agree about the video games bit at the end of the above-quoted paragraph, at least when it comes to politicians. Leave the real work to the psychologists.

[Link to the column courtesy of Paul de Lacy.]

[

Update: John Lawler writes to tell me how well-worth reading Edward Tufte's piece "The Cognitive Style of PowerPoint" (cited by Marcus) is. I haven't spent the $7 to order it yet, but Marcus also cites a freely available story by Tufte that appeared in Wired Magazine in 2003, "Powerpoint is Evil" ("Power corrupts. PowerPoint corrupts absolutely."), where Tufte mentions the use of PowerPoint in "elementary school":

Particularly disturbing is the adoption of the PowerPoint cognitive style in our schools. Rather than learning to write a report using sentences, children are being taught how to formulate client pitches and infomercials. Elementary school PowerPoint exercises (as seen in teacher guides and in student work posted on the Internet) typically consist of 10 to 20 words and a piece of clip art on each slide in a presentation of three to six slides -a total of perhaps 80 words (15 seconds of silent reading) for a week of work. Students would be better off if the schools simply closed down on those days and everyone went to the Exploratorium or wrote an illustrated essay explaining something.

I don't think I can accuse Tufte of not thoroughly researching this, but I would like to see more evidence (and will therefore probably fork over the $7) for the references here to "teacher guides" and "student work posted on the Internet" -- this hardly seems like cause for this level of alarm, or for the conclusion that PowerPoint tutelage is somehow replacing "learning to write a report using sentences".

(Sidenote: Tufte's Wired story is accompanied by a somewhat different point of view on PowerPoint by David Byrne.)

Plus: Language Log's own Geoff Nunberg writes to remind me of his 1999 piece "Slides Rule" from Fortune Magazine, which also appeared in The Way We Talk Now, pp. 213-215.

]

Posted by Eric Bakovic at 01:52 PM

Google Purge

Last spring, Jean-Noël Jeanneney warned us about "cette inquiétude lancinante du n'importe quoi, de la dispersion du savoir en poudre" ("this throbbing anxiety for anything and everything, for scattering knowledge like dust"). Well, according to The Onion, here comes the vacuum cleaner: Google Purge.

"Our users want the world to be as simple, clean, and accessible as the Google home page itself," said Google CEO Eric Schmidt at a press conference held in their corporate offices. "Soon, it will be."

As John Battelle explains

"Thanks to Google Purge, you'll never have to worry that your search has missed some obscure book, because that book will no longer exist. And the same goes for movies, art, and music."

As a phonetician, I'm especially excited about Google Sound:

"Book burning is just the beginning," said Google co-founder Larry Page. "This fall, we'll unveil Google Sound, which will record and index all the noise on Earth. Is your baby sleeping soundly? Does your high-school sweetheart still talk about you? Google will have the answers."

Page added: "And thanks to Google Purge, anything our global microphone network can't pick up will be silenced by noise-cancellation machines in low-Earth orbit."

Finally, speech and language scientists will be able to do away with old-fashioned sampling methods, and rely instead on statistics calculated from the entire domain of phenomena under investigation! In fact, scholars and scientists of all types will be able to complete their transformation from field and lab-bench investigations to purely digital research:

Although Google executives are keeping many details about Google Purge under wraps, some analysts speculate that the categories of information Google will eventually index or destroy include handwritten correspondence, buried fossils, and private thoughts and feelings.
The company's new directive may explain its recent acquisition of Celera Genomics, the company that mapped the human genome, and its buildup of a vast army of laser-equipped robots.

I guess this is what Jean-Claude Juncker and other European politicians were talking about when they warned of "virulent attacks" on European culture, fearing that "Google's ambitious plans could result in important European literary works missing out and being lost to future generations". Did Jacques Chirac slip them classified reports from a DGSE mole in Mountain View?

I feel in honor bound to warn readers that the Onion is a satirical publication, and this post is a joke... However, I do think that there is a serious point to be made here. And it's not that there is a paranoid strain in European intellectual culture, or that Google's servers are the leading edge of The Matrix.

For me, the lesson is a narrower one, directed at publishers in general, and scientific and scholarly publications in particular. There is growing evidence that Open Access increases impact. In my opinion, this effect is certain to increase, asymptotically approaching the point where publications that are not indexed and accessible on line will effectively cease to exist. No one will have to purge them -- they will have purged themselves.

[Onion link via Kerim Friedman]

Posted by Mark Liberman at 11:24 AM

New Orleans is essentially an arm of the Gulf of Mexico

So say Cornelia Dean and Andrew C. Revkin in today's NYT. After contributing to one of the disaster relief organizations, you might distract yourself by taking a look at John Cowan's page of Essentialist Explanations. This is essentially a list of 736 sentences of the form <Language X> is essentially <Language Y> <produced under conditions Z>. Some are funny, some are silly, some are mildly offensive, some are nearly true.

A sample:

English is essentially Norse as spoken by a gang of French thugs.
English is essentially the works of Joyce with the hard bits taken out.
Swedish is essentially Norwegian spoken by Finns.
Danish is essentially Norwegian, only you drop out all the consonants, skip all the vowels and then mispronounce the rest.
Spanish is essentially Italian spoken by Arabs.
Francophones are essentially Germans speaking the bad Latin they were taught by Gauls.
French is essentially an attempt by the Dutch to speak a Romance language.
French is essentially a language that elides everything that doesn't get out of the way fast enough, and nasalises everything else.
Russian is essentially Punjabi that fell off the wagon. Contrariwise, Punjabi is essentially Russian with better spices.
Modern Greek is essentially Classical Greek as spoken by Venetians.
Mandarin is essentially Chinese as spoken by Mongols.

Posted by Mark Liberman at 09:18 AM

Encoding Puzzle Answer

Here's the solution to yesterday's encoding puzzle. If you look at the HTML metadata, the page claims to be in ISO-8859-1 (aka Latin-1), an ASCII extension in which things like accented characters occupy codepoints above the ASCII range, while still remaining in a single byte. The claim, though technically true, is misleading. All of the characters are ASCII characters. That is, not a single byte on that page has a value greater than 0x7F. Technically, you can call that ISO-8859-1, since it is consistent with it, but really the page is in the ASCII subset of ISO-8859-1.
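The "no byte above 0x7F" claim is easy to check mechanically. Here is a minimal Python sketch of such a check; the function name and the sample byte strings are made up for illustration and are not taken from the actual page.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if no byte in data has a value greater than 0x7F."""
    return all(b <= 0x7F for b in data)


# Hypothetical samples: the accent spelled out as entities vs. a real Latin-1 byte.
entity_version = b"repr&#195;&#169;sente"
latin1_version = b"repr\xe9sente"

entity_is_ascii = is_pure_ascii(entity_version)   # every byte is <= 0x7F
latin1_is_ascii = is_pure_ascii(latin1_version)   # 0xE9 is above the ASCII range

# A pure-ASCII byte string decodes to the same text under ASCII, ISO-8859-1, and UTF-8,
# which is why the ISO-8859-1 label is consistent with the page without being informative.
same_everywhere = (
    entity_version.decode("ascii")
    == entity_version.decode("iso-8859-1")
    == entity_version.decode("utf-8")
)
```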

Inspection of the page source reveals that the accented letters are each represented by a sequence of two HTML decimal numeric character entities. For example, é, e with acute accent, is not represented by the single byte with value 0xE9 as it would be in ISO-8859-1. Rather, it is represented by a sequence of twelve bytes: &#195;&#169;. &#195; is an HTML representation for Ã, upper case a with tilde; &#169; is an HTML representation for ©, the copyright symbol. That's why the word représente comes out as reprÃ©sente on your terminal. (Don't anybody write in to say that this is the usual spelling used by dyslexic speakers of North African French when text-messaging after they've had a few drinks or something like that. Writing Unix man pages is a serious, indeed sacred, matter. Learned authors have compared the interpretation of Unix man pages to the study of the Talmud.)

What do I mean by saying that &#195; is an HTML representation of Ã and that &#169; is an HTML representation of ©? In HTML, characters may be represented in as many as four ways:

They may be directly encoded. For example, a byte with the numerical value 0x26 (38 to the hexadecimally challenged) is the ASCII code, and therefore also the ISO-8859-1 and Unicode, for the ampersand character &. Most of the text that you see on web pages is represented this way.
They may be represented by character references. These are little labels enclosed between an ampersand and a semi-colon. For example, ampersand may be represented as &amp;. Such character references exist for most of the common symbols, such as © and ®, and for letters with diacritics, such as é and à. (You may be wondering how it is that I am writing things containing & if it is used to introduce character references. The answer is that I am very clever. If that doesn't satisfy you, the view page source command in your browser should clue you in.)
Characters may be represented by means of hexadecimal numeric character entities. A numeric character entity begins with an ampersand and a cross-hatch and ends with a semi-colon. Between them is a numerical representation of the character's Unicode value. If the number is base 16, it is preceded by an x. The hexadecimal numeric character entity for ampersand is &#x26;.
Characters may be represented by decimal numeric character entities. These are just like hexadecimal numeric character entities except that the number is in base 10. That it is decimal is marked by omission of the x that marks hexadecimal numbers. Ampersand is represented as a decimal numeric character entity as &#38;. No one knows for sure why decimal numeric character entities exist since they are wholly redundant and not nearly as elegant as their hexadecimal counterparts. Some scholars suspect that they have a symbological value. Perhaps a sequel to The Da Vinci Code will enlighten us.
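The four representations can be compared side by side with Python's standard html module. This is only an illustrative sketch: html.unescape follows HTML5 parsing rules, which is an assumption about how a browser would read these strings rather than anything stated in the post.

```python
import html

# The four ways of writing an ampersand described above:
# direct encoding, named character reference, hexadecimal entity, decimal entity.
ampersand_forms = ["&", "&amp;", "&#x26;", "&#38;"]
decoded = [html.unescape(form) for form in ampersand_forms]

# All four decode to the single character U+0026.
all_ampersands = all(ch == "&" for ch in decoded)

# The same holds for e with acute accent (U+00E9, decimal 233).
e_acute_forms = ["\u00e9", "&eacute;", "&#xE9;", "&#233;"]
e_acute_decoded = [html.unescape(form) for form in e_acute_forms]
```

Note that html.unescape deliberately maps numeric entities in the range 128 to 159 to Windows-1252 characters, so it is not a byte-for-byte decoder for arbitrary numeric entities.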

So, how did é end up represented as &#195;&#169;? Well, &#195;&#169; is an ASCII-fied representation of a sequence of bytes whose numerical values are 0xC3 (aka 195) and 0xA9 (aka 169). Notice how the use of decimal numeric character entities obscures things. It just happens that 0xC3 0xA9 is the UTF-8 encoding of UTF-32 0xE9. In its pure and ethereal form, Unicode codepoints are all 32 bits, or 4 bytes. For various reasons (discussed previously on Language Log and in more detail here) the preferred form for exchange of Unicode-encoded text is UTF-8, in which most characters are encoded as two or more bytes.
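The arithmetic behind "0xC3 0xA9 is the UTF-8 encoding of 0xE9" can be worked through by hand. The sketch below covers only the two-byte case of UTF-8 (codepoints 0x80 through 0x7FF); the function name is made up for illustration.

```python
def utf8_two_byte(codepoint: int) -> bytes:
    """Encode a codepoint in the range 0x80..0x7FF as two UTF-8 bytes.

    Layout: 110xxxxx 10yyyyyy, where xxxxx are the top five bits of the
    codepoint and yyyyyy are the low six bits.
    """
    if not 0x80 <= codepoint <= 0x7FF:
        raise ValueError("only the two-byte range 0x80..0x7FF is handled here")
    lead = 0xC0 | (codepoint >> 6)            # 0xE9 >> 6 = 0x03, so 0xC0 | 0x03 = 0xC3
    continuation = 0x80 | (codepoint & 0x3F)  # 0xE9 & 0x3F = 0x29, so 0x80 | 0x29 = 0xA9
    return bytes([lead, continuation])


by_hand = utf8_two_byte(0xE9)
by_library = "\u00e9".encode("utf-8")
utf32_form = "\u00e9".encode("utf-32-be")   # the four-byte form: 00 00 00 E9
decimal_values = list(by_hand)               # the numbers that show up in the entities
```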

To pull all this together, the garbled man pages are what you would get if you started off with a page in UTF-8 and, mistakenly thinking that it was in ISO-8859-1, ran it through an HTML-izer that converted anything outside the ASCII range to numeric character entities.
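The whole chain of events can be reproduced, and undone, in a few lines of Python. This is a reconstruction under the assumptions stated above (UTF-8 input misread as ISO-8859-1, then converted to decimal entities); it is not the software that actually produced the pages, and the function names are invented.

```python
import re


def garble(text: str) -> str:
    """Simulate the mistake: UTF-8 bytes misread as ISO-8859-1, then HTML-ized."""
    utf8_bytes = text.encode("utf-8")
    misread = utf8_bytes.decode("iso-8859-1")   # each byte becomes one character
    return "".join(
        ch if ord(ch) < 0x80 else f"&#{ord(ch)};" for ch in misread
    )


def repair(garbled: str) -> str:
    """Undo the mistake: entities back to bytes, then decode those bytes as UTF-8."""
    # A plain regex is used instead of html.unescape, because HTML5 rules remap
    # entities 128..159 to Windows-1252 characters, which would corrupt the bytes.
    as_chars = re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), garbled)
    return as_chars.encode("iso-8859-1").decode("utf-8")


garbled_word = garble("repr\u00e9sente")
restored_word = repair(garbled_word)
entity_length = len("&#195;&#169;".encode("ascii"))   # the twelve bytes mentioned above
```

The repair step assumes every entity on the page came from this one mistake; a page that also contained legitimate entities above 255 would need more careful handling.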

The reason for using an HTML-izer is that some software, such as the software that runs this blog, cannot handle bytes whose high bit is set. If you enter such a byte into a Language Log entry, it looks fine when you enter it, but you will find the post truncated immediately before the first such byte. So if you want to use non-ASCII characters with confidence in web pages, it is wise to convert them all to character entities. I have written a couple of programs that do this myself.
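An HTML-izer of this kind is short to write. The sketch below is not the programs mentioned above, which are not shown; it is a hypothetical Python equivalent. The decimal version relies on Python's built-in xmlcharrefreplace error handler, and the hexadecimal version is written out by hand.

```python
def htmlize_decimal(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character entity."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def htmlize_hex(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal numeric character entity."""
    return "".join(
        ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};" for ch in text
    )


decimal_result = htmlize_decimal("repr\u00e9sente")
hex_result = htmlize_hex("repr\u00e9sente")
```

Neither function escapes characters that are already ASCII, so a literal & or < in the input is passed through unchanged and would still need separate handling.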

Several of our readers figured this out: Diane Bruce responded in the wee hours of last night not long after I posted the puzzle. The others are Aaron Elkiss and John O'Neill.

Posted by Bill Poser at 01:54 AM

August 30, 2005

An Encoding Puzzle

Recently I looked something up in the GNU/Linux manual pages at http://maconlinux.net/linux-man-pages/fr/strtol.3.html, which are in French, and couldn't get them to display correctly. Most of the text came out fine, but accented letters, and generally anything outside the ASCII range, came out garbled. At first I thought that the browser might be displaying the page using the wrong encoding, but changing encodings didn't solve the problem. The Spanish manual pages at http://maconlinux.net/linux-man-pages/es/strtol.3.html exhibit the same problem.

[Note: when I checked these URLs just now, I got a server error. If it is still acting up, here's a link to the Google cache of the French page.]

Although I couldn't get these pages to display correctly, short of writing a little script to transform them before letting the browser at them, after a few minutes I figured out what had happened to them. There is a perfectly straightforward explanation for what happened to them. For now, I'm going to leave the solution as an exercise for the ling-technically inclined reader. I'll post it tomorrow.

Posted by Bill Poser at 12:30 AM

"Grammar cranks" of the right

[Guest post by Benjamin Zimmer] Linguistic persnicketiness is certainly not restricted to any particular political ideology. But prescriptivist gripes are sometimes grounded in a conservative distaste for loosey-goosey moral relativism and the like. Here are two defenders of language conventions hailing from the political right: one a comic-strip character and one the current Supreme Court nominee.

The first example is Bruce Tinsley's comic strip Mallard Fillmore, marketed as a conservative answer to such left-leaning fare as Doonesbury and The Boondocks. Tinsley describes his protagonist (who bears a striking resemblance to Daffy Duck) as "a seasoned, rumpled ex-newspaper reporter" who "thinks we average, hardworking Americans need a break instead of a lecture." On Sunday, however, Mallard apparently thought we average, hardworking Americans needed a lecture after all (albeit in rhyme): a punctuation rant in the manner of Lynne Truss.

Mallard must have been reading Truss's best-seller, Eats, Shoots & Leaves: The Zero Tolerance Approach to Punctuation, since he echoes her impassioned plea, "How much more abuse must the apostrophe endure?" But Truss never states a black-and-white rule that apostrophes are only used "when you leave a letter out, or if you want to show possession." There are plenty of exceptions, such as the use of apostrophes in the pluralization of letters (e.g., "mind your p's and q's") or the pluralization of words used citationally (e.g., "the's and a's are reduced before consonants").

Mallard's examples of rampant apostrophization not surprisingly include the much-maligned greengrocer's apostrophe ("fresh apple's"), along with extraneous apostrophes in decade names ("the 80's") and pluralized family names ("the Smith's"). But as "a seasoned, rumpled ex-newspaper reporter," Mallard should know that the jury's still out on apostrophizing decade names. The New York Times house style, for instance, keeps the apostrophe in for names of decades; a search on the Times archive finds five examples of "the 80's" in Sunday's paper alone. On the other hand, given his usual tirades against the liberal domination of the media, Mallard might simply take this as an indication that the bastion of the MSM is too weak-kneed and morally relativistic to enforce proper rules of punctuation.

Rush, the little boy in the strip, warns Mallard that he's "turning into one of those grumpy old grammar cranks." (Linguists have grown accustomed to seeing "grammar" used as a catch-all term to encompass any number of perceived linguistic conventions, from punctuation to usage to pronunciation, depending on the pet peeves of the writer.) As it happens, the apostrophe-abusing New York Times had an article on Monday about a grumpy young grammar crank in the 80s (or the 80's), one who would later in life be nominated to become a Supreme Court justice. Under the headline "In Re Grammar, Roberts's Stance Is Crystal Clear," Anne Kornblut approaches the recently released Reagan-era memos of John Roberts with an eye towards his "tendencies as a grammarian." Roberts, we learn, "frequently peppered notes and documents with minor syntax corrections even when the basic legal arguments were sound." Some of his corrections really did have to do with grammar or syntax, such as his insistence on maintaining consistent parallel structure and pronominal reference. Other cases noted by Kornblut simply indicate a pickiness regarding word choice, such as Roberts's preference for voluntarism over volunteerism, ensuring over insuring, and multilateral over plurilateral. (We would need to see the context of these memos to know what beefs Roberts might have had with the offending words.)

Roberts also took issue with the phrasing of Neil Armstrong's famous line when setting foot on the moon (which was to be quoted by his boss, White House counsel Fred F. Fielding, in remarks at a Kennedy Space Center picnic):

"It is my recollection," Mr. Roberts wrote, "that he actually said 'one small step for a man, one giant leap for mankind,' but the 'a' was somewhat garbled in transmission. Without the 'a,' the phrase makes no sense."

Roberts is right that the phrase makes no sense without the "a", but he should take it up with Armstrong himself. The line wasn't actually "garbled in transmission" — Armstrong flubbed it, as the Snopes urban legends website has documented.

Kornblut writes, "If Judge Roberts is confirmed, and his word-consciousness follows him to the court, it will put him in the upper tier of justices who have put a premium on the English language." It's difficult to tell from the analysis of the memos if Roberts's "word-consciousness" will rise above the level of mere curmudgeonliness. But one auspicious omen appears in the graphic sidebar accompanying the article. In a memo from 1983, Roberts complains about how newspaper columnists focused on Ronald Reagan's memorable use of the word keister:

"Frankly, I've had it up to my keister with newspaper columns about an expression fairly common to those of us reared in the Midwest. I have drafted a reply." He concluded: "It is interesting how familiarity with slang phrases often varies among different parts of our country. In this case, excuse the bad pun, but I suppose it may depend on where one was reared."

Roberts makes a rather obvious dialectological point, but it's one that is frequently lost on self-appointed guardians of good "grammar". I take this as a hopeful sign that Roberts is no strict constructionist when it comes to linguistic variation.

[Update (by Mark Liberman): Bez Thomas, among others, reminded us of a classic apostrophe rant in cartoon form, from the left end of the political spectrum:

Both from the right and from the left, (some of) these defenses of linguistic norms are notable for their moral and emotional fervor. As reported in the NYT, Judge Roberts's linguistic strictures are rather even-tempered in comparison.]

Posted by Mark Liberman at 12:11 AM

August 29, 2005

Encounters with Silliness

I've been catching up on Language Log and various other things. Mark's post about silly things people say about linguistics reminded me of a visit I had from a student some years ago when I was teaching at the University of Northern British Columbia. She came to discuss with me the topic on which she wanted to write her term paper for someone else's course, namely her idea that the Gitksan-Witsuwit'en are the Lost Tribes of Israel.

I began by objecting to the idea that the Gitksan-Witsuwit'en could be the Lost Tribes of anywhere, on the grounds that they aren't a unitary group at all. The Gitksan speak a Tsimshianic language, closely related to Nisga'a, whereas the Witsuwit'en speak an entirely different Athabaskan language, whose closest relative is Carrier, which I have mentioned here from time to time. Their languages are no more similar to each other than English and Navajo. The reason that the term Gitksan-Witsuwit'en exists is that the two were for a time allied for political and legal purposes in the form of the Office of the Gitksan-Witsuwit'en Hereditary Chiefs. This is the organization behind Delgamuukw v. British Columbia, the lawsuit that ultimately led the Supreme Court of Canada, in 1997, to rule that aboriginal title still exists in British Columbia as a burden on the title of the Crown. The fact that these two quite different groups formed an alliance no more means that they shared a common history than does the fact that Turkey and Germany were allied in the First World War. We do not draw from this fact the inference that there is a Turco-Germanic people.

In addition to pointing out, with no evident impact, the fact that there is no such tribe as the Gitksan-Witsuwit'en, I enquired as to what precisely the evidence was, in her view, that the Gitksan-Witsuwit'en were the Lost Tribes of Israel. I was pretty certain that they were not mentioned in the Bible. Was there some other evidence she had in mind?

She immediately demanded to know whether I believed in the Bible. I responded that my view of the truth of the Bible was irrelevant since the Bible had nothing to say about the matter. We went back and forth on this briefly. Then she stalked off, convinced that I was yet another unbeliever whose denial of the truth of the Bible led him to reject her hypothesis about the Gitksan-Witsuwit'en. There's no point in arguing with some people.

Posted by Bill Poser at 11:10 PM

Scammers' Language

I just got the following email message:

Dear user of babel.ling.upenn.edu, mail system administrator of
babel.ling.upenn.edu would like to inform you that,
 
We have found that your account was used to send a huge amount of spam
messages during the recent week.  Most likely your computer had been
infected and now contains a trojan proxy server.  We recommend you to
follow our instruction in order to keep your computer safe.
 
Best regards,
babel.ling.upenn.edu user support team.

It is accompanied by a zip file putatively containing the instructions that I am supposed to follow. I imagine that it actually contains a virus, though I'm not going to go to the trouble of finding out. (This is the one downside to running GNU/Linux - if I actually want to try out a virus I have to go find a machine running Microsoft Windows. I feel so left out...)

Anyhow, any native speaker of English will detect a number of errors in the above message, some of them errors or deviations from standard written usage of the sort that a native speaker is not likely to make at all, or even a non-native speaker who has been here long enough to be working as a system administrator. There's the use of a comma at the end of the salutation in place of a colon, the failure to start the first sentence on a new line, the failure to capitalize the first letter of the first word of the new sentence, and the omission of the before mail. Then there is the use of a comma rather than a colon before something set off like a quotation or list entry and the incorrect treatment of a subordinate clause as such. A native speaker would not say recent week instead of past week, or had been infected instead of has been infected. The construction We recommend you to follow... is not English.

Such a plethora of errors should alert just about anyone that the message is a fake. Are the scammers so foolish or ignorant that they don't realize this? It probably wouldn't be too hard to get someone to polish their prose. Or are enough computer users too dense to realize that messages like this are fake that the scammers don't bother?

Posted by Bill Poser at 08:25 PM

The modish macron

Now joining the heavy metal umlaut is, apparently, the modish macron. To the right is a picture of the awning sign of a local hair place, VŌG. I walk past it frequently, wondering who's supposed to be attracted by evocations of Vogon style, but I didn't realize it was part of a trend. Recently, Phillip Jennings wrote in with news of "a new downtown Minneapolis salon named all-caps-something-or-other BLŪ", and also a magazine called "Modern HŌM". I can't find any web presence for either of these, but I'll take Phillip's word for it.

If you know of any other examples, send them along. Extra points for cases that don't involve back vowels or capital letters.

This usage apparently imitates the conventions of pronunciation fields in (some) American dictionaries, rather than drawing on the sort of diacritical associations involved in the heavy metal umlaut, or the more general allure of foreign branding. This may be related to Qwest's belief that badly faked dictionary pronunciations are authoritative. However, I imagine that the real motivation is the difficulty (both legal and psychological) of establishing a brand around common words like vogue or home.

Unfortunately, the modish macron doesn't help our campaign to promote the IPA through popular culture.

[Update: Jesse Sheidlower points out PŪR, and mentions

"another one I'm thinking of, that I can't quite place, that so irritates me that I deliberately mispronounce it because I feel so manipulated by the macron".

Marilyn Tarnowski points out the Sprint WordTraveler FŌNCARD.

Eric Bakovic writes that

I used to laugh at a commercial from the (early? mid?) '80s for a shampoo called FOHO ('For Oily Hair Only'), but a quick google search fails to confirm my possibly wrong memory that both Os had macrons over them. But this doesn't really disconfirm my memory either; the product's been (predictably) discontinued, and the few hits I got only had ASCII-text examples, no images of the labels or anything like that. I did discover that it used to be a Gillette product, but that's about it.

Aaron Dinkin writes that

I seem to remember that there was a brand of juice box called "Boku" - macrons over the O and the U, and it was pronounced "beaucoup".

More information about BŌKŪ can be found here.

And Chris Waigl gets extra points for reminding me of her 2/8/2005 post on IPA and exoticism, which includes the example of séxūal, with a lower-case u macron.]

[Update #2: David Low was the first of several readers to point out that in Episode 9F22 of The Simpsons, Sideshow Bob is shown with LUV and HĀT on his knuckles (like other characters on that show, he has just three fingers plus a thumb on each hand). This seems less like a "modish" macron and more like a creative way to update an old movie reference (picture here) for consistency with a cartoon anatomical convention.]

[Update #3: David Doherty also gets extra points for the lower-case o with macron in the Seattle nightspot TōST.]

[Update #4: Ed Keer at Watch Me Sleep pointed out that the board game Hūsker Dū uses macrons, which the rock band Hüsker Dü changed into heavy metal umlauts; according to the wikipedia entry, "The name of the game is spelled with macrons to emulate Scandinavian letters with macrons over them (even if macrons are only used in hand-written text)", and the game was originally published that way in Sweden in the 1950s, so if there's any connection to the new VŌG for macrons, it can only be because of some childhood experience of today's marketeers.]

[Update #5: Rebekka Puderbaugh mailed in a link to Zōe's Flax & Soy products.]

[And reported by Andrew Malcovsky, the PAYDĀTA company in Vermont...]

[And here's another example, the Riō mp3 player, submitted by Kilian Hekhuis.]

[And another: Cepacol, submitted by Mark Wayne.]

Posted by Mark Liberman at 08:45 AM

Never mind the storm surge, watch out for those toponyms

In the midst of the disaster, some people are still worried about usage and pronunciation:

When this is over, someone please tell Tucker Carlson and the other national newscasters that "St. Louis" is in Missouri, and we call our town "Bay St. Louis". Hope they don't try to pronounce Pascagoula, Gautier, or Delisle.

Here's hoping that Tucker Carlson's misrendering of toponymic shibboleths is the worst damage they suffer.

Posted by Mark Liberman at 08:22 AM

August 28, 2005

Which It's Happy Bunny are you?

I first saw the new antihero last year on a waitperson's chest (slogan: "Cute but psycho"), but I didn't know her name then, or even that she was a nameable phenomenon. A few days ago, courtesy of a junior-schooler excited about her new t-shirt (slogan: "not listening") I learned that it's "Happy Bunny". Wikipedia shows Happy Bunny in the (literal) mug shot linked on the right.

But actually, it's not Happy Bunny, it's It's Happy Bunny, even in subject position: "Does It's Happy Bunny dislike Boys?? Of course not. It's Happy Bunny dislikes everybody." Likewise after which, as in the numerous "which IT'S HAPPY BUNNY are you?" quizzes.

Joanne Jacobs reports that

Some blunt-spoken Happy Bunny messages, including "You're ugly and that's sad" and "It's cute how stupid you are," wouldn't make the cut at Highland Park High School.

"We consider that harassment, and we just don't allow it," Principal Jack Lorenz said.

Though the target demographic is very different, this reminds me of the BOFH phenomenon -- both BOFH and IHB involve openly flaunting well known but traditionally covert hostility.

IHB slogans like "I think I gave you crabs" hint that the original It's Happy Bunny target might have been a bit older and more cynical than the group that has responded is. And indeed this article quotes IHB's inventor, Jim Benton, confirming this:

When Benton originated It's Happy Bunny, he expected the products bearing his artwork -- including a handful containing anti-boy phrases -- to appeal to young women ages 16 to 26. "It actually turned out to be much broader in appeal than we thought," he says. In the Bay Area, for instance, It's Happy Bunny can be found in shopping malls at Claire's, a nationwide retail chain that targets its accessories to girls ages 7 to 12.

IHB's role in validating adolescent female hostility is none of our business here. Instead, I want to make a linguistic point -- phrasal names like It's Happy Bunny introduce into English, in a small way, the phrasal names that are dominant features of many other cultures and languages. The most common source for this kind of thing in English has been bands whose names are sentences like They Might Be Giants or Frankie Goes to Hollywood. The ease with which such phrasal names enter general use seems to show that the difference in this respect between English and (for example) Yoruba is more a matter of general cultural choice than of linguistic structure.

[As far as I know, IHB consumers are all female, but there seems to be some uncertainty about the gender of the bunny itself.]

Posted by Mark Liberman at 03:05 PM

'Tis the gift to be simplistic

"Refreshingly simplistic," was how a VH1 reviewer described a new CD by some artist whose name I didn't recognize. I couldn't jot it down, since I was wheezing on a treadmill at the time, but a Google search turns up 425 instances of the phrase, with results that are variously comical and bizarre. A Web design company boasts that its work is "stylish and refreshingly simplistic." SunMex Vacations tells vacationers that "most of Mazatlan remains refreshingly simplistic." And an Amazon.com customer review of Nelson Goodman's classic Fact, Fiction, and Forecast says, "The way that Goodman perceives our inductive system is unique and refreshingly simplistic." The press isn't immune, either -- the phrase is rare there, but it turns up in a 1996 article in Billboard and a 1998 article in The Independent.

A familiar sort of malaprop, but there's a bit more going on here.

That analysis of simplistic as merely a fancy synonym for simple seems to be implicit in the word oversimplistic. If you accept Merriam-Webster's definition of simplistic as "oversimple," then oversimplistic would be a pleonasm. Yet the word gets more than 14,000 Google hits and appears in 178 stories in Nexis major newspapers (the earliest cite I've found is from a 1970 story in The New York Times, but this would probably be easy to antedate). In fact Merriam-Webster's gives oversimplistic as a run-in in the entry for the prefix over-, and while the OED doesn't list oversimplistic as a word, the editors actually use it in their definition for nothing-but-ism: "An oversimplistic approach to the explanation of a phenomenon, which excludes complicating factors; reductionism."

You could argue, of course, that the over- of oversimplistic is chiefly an intensifier, the way it is in items like overbrutal, overfacile, overfussy, and overhasty, in all of which the root itself carries an implication of excess. But the existence of phrases like "refreshingly simplistic" shows that for some people, at least, simplistic itself has acquired a purely positive meaning. My guess is that this development is helped along by an analogy with simplicity. Someone looking for an adjectival version of "refreshing simplicity" (6290 Google hits) might be drawn to "refreshingly simplistic," particularly given the effective absence of the intermediate forms simplism and simplist that words ending in -istic tend to imply. (The words actually exist, but are rare and recondite.)

Posted by Geoff Nunberg at 02:35 PM

August 27, 2005

Disowning The Brothers Grimm

No, I don't want to disown Jakob and Wilhelm Grimm, the first of whom is something of a hero of historical linguistics. I want to disown the movie The Brothers Grimm, and I'm doing this on behalf of linguists everywhere.

What the movie has in common with the real world is: two brothers named Grimm, early-19th-century Germans who were involved with fairy tales. As far as I can tell, that's it. Imagine a Life of Noam in which, through the miracle of miniaturization, the heroic Chomsky (played by Brad Pitt in a revealing latex bodysuit) takes a band of brawling adventurers into the deepest recesses of the human brain, to recover bits of the language organ for sale through his start-up company -- a sort of cerebral 21st-century Fantastic Voyage. Appalling.

In any case, not a movie to put on the recommended viewing list for students in your intro linguistics classes.

[Update, 8/30/05: Correspondents have now suggested two alternative scenarios. First, from Andrew Malcovsky, on his blog, a proposal that sticks much more closely to the historical facts than Terry Gilliam did, yielding something that might be entitled The True Adventures of Will and Jake.

And then from Tim Fitzgerald, who finds my Brothers Grimm/Chomsky comparison unfair (since "these two men have been chosen for their role in storytelling"), a counterproposal for "a much more apt comparison":

... 200 years from now a movie (or whatever form of mass entertainment they may use) on Spielberg's harrowing attempt to fight off dinosaurs from the Temple of Doom with the help of his loving extra-terrestrial friend.

Ah, California Spielberg and the E-Temple of Doom. Please, don't write to tell me that Steven Spielberg was born in Cincinnati, Ohio. I know that. But he belongs, truly belongs, to California.]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:12 PM

No more inveigling in California courts

No more inveigling in California courts, according to a story by AP legal affairs writer David Kravets that appeared on 8/25/05 in the San Francisco Chronicle:

When California jurors sit on kidnapping cases, judges will no longer be required to explain that the perpetrator had to "inveigle" his victim.

Instead, as part of an eight-year effort to simplify jury instructions, the judge may say it like it is -- "enticed" his victim.

The new guidelines also revise the characterizations of (among others) "reasonable doubt" and "mitigation" and, in a move objected to by many prosecutors, have them referred to as "prosecutors" rather than as "the people". Though the changes are modest (intentionally so, according to lawyer-linguist Peter Tiersma, who helped craft them), some judges maintain that they "dumb down the justice process", an accusation that would be hard to make stick on the basis of the examples Kravets provides; "entice" for "inveigle", for instance, is scarcely a giant step away from judicial clarity and towards street speech.

A Google web search gives ca. 74,800 hits for "inveigle" in its various forms, vs. ca. 4,740,000 for "entice" in its various forms, so "inveigle" seems to be enormously less frequent -- less familiar -- than "entice". The disparity is much greater than this, though, since a huge number of the "inveigle" hits are mentions rather than uses -- they're from discussions of the meaning of "inveigle", including as a legal term -- and many more are uses in specifically legal contexts. Not that "inveigle" lacks ordinary-language uses; consider "He had slyly inveigled her up to his flat / To view his collection of stamps" (Flanders and Swann, "Have Some Madeira, M'Dear") and many everyday occurrences like these:

... as per usual, was one poor SOB trying to inveigle shoppers into buying ... (Leah Garchik, "The In Crowd" Column, San Francisco Chronicle, April 14)

inveigle yourself into the homes and wineries of a few big names whose egos ... (link)

Still, "entice" is probably a small improvement on "inveigle".

The changes seem to be mostly in vocabulary. For instance, the old version defines "mitigation" as

any fact, condition or event which does not constitute a justification or excuse for the crime in question, but may be considered as an extenuating circumstance in determining the appropriateness of the death penalty

which is now

any fact, condition, or event that makes the death penalty less appropriate as a punishment, even though it does not legally justify an excuse for the crime

This maintains the two-clause syntax, with coordination replaced by subordination, and it reverses the order of the proviso (about not justifying the crime) and the main part of the definition (about allowing certain factors to be taken into account), in favor of putting the main part first, which is surely an improvement. It also reduces the nominalization quotient a bit, by replacing "justification" by "justify" and "appropriateness" by "appropriate". And it replaces the restrictive relativizer "which" by "that", which could be seen either as a move towards informal English or as a move towards prescriptively standard English, depending on who you read.

But mostly what it tries to do is unpack the meaning of the term of art "extenuating circumstance".

Another change tries to unpack "innocent misrecollection", also a term of art (ca. 447 Google webhits, all of them apparently in legal contexts), via replacing

Innocent misrecollection is not uncommon.

by

People sometimes honestly forget things or make mistakes about what they remember.

More side-by-side comparisons in Kravets's article.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:43 PM

"Approximate" quotations can undermine readers' trust in The Times

So says the NYT code of ethics, as well it should. For the past couple of months, I've been muttering about sloppy if not dishonest quoting practices in print media, including at the NYT. There's a particularly striking example in an 8/18/2005 article by Joel Brinkley and Steven R. Weisman, based on an interview with Condi Rice, which ran under the headline "Rice Urges Israel and Palestinians to Sustain Momentum".

The NYT article starts like this:

Secretary of State Condoleezza Rice on Wednesday offered sympathy for the Israeli settlers who are being removed from their homes in Gaza but also made it clear that she expected Israel and the Palestinians to take further steps in short order toward the creation of a Palestinian state.

"Everyone empathizes with what the Israelis are facing," Ms. Rice said in an interview. But she added, "It cannot be Gaza only."

The transcript of the interview was posted by the U.S. State Department web site under the title "Interview With The New York Times", dated August 17, 2005. In that transcript, the only occurrence of the string "empathize" is this one:

I know, in having talked to them and watched how hard and I think everybody empathizes with what every Israeli has to be feeling and with people uprooting from homes that they have been in for a generation and the difficulty and the pain that that causes.

And the only place where "Gaza only" occurs is here:

The other thing is, just to close off this question, the question has been put repeatedly to the Israelis and to us that it cannot be Gaza only and everybody says no, it cannot be Gaza only.

In between those two sentences are more than 1,300 words and 20 conversational turns.

Taking the first "quote" first, and ignoring the problem of yanking a phrase out of context, we've got an "approximate quotation" by anyone's standards (NYT quote on the first line, State Department transcript on the second):

                 Everyone  empathizes with what  the  Israelis are       facing
...  and I think everybody empathizes with what every Israeli  has to be feeling and ...

On the construal most favorable to the NYT -- scoring only the fragment from "everybody" to "feeling", and giving maximum credit for substitutions instead of insertions and deletions -- we have 5 substitutions and 2 deletions relative to 10 original words, for a word error rate of 70%. The meaning is similar, but that makes it a paraphrase rather than a quote.
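
The 70% figure can be checked mechanically. What follows is a minimal sketch, not the scoring tool used for the count above: it assumes the usual definition of word error rate (word-level edit distance divided by the number of words in the reference), treats the transcript fragment as the reference, and ignores capitalization. The function name word_error_rate is made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Return (edits, wer): the word-level edit distance and edits / len(reference)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dist[i][j] = fewest substitutions, deletions and insertions needed to
    # turn the first i reference words into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    edits = dist[len(ref)][len(hyp)]
    return edits, edits / len(ref)


# Reference: the transcript fragment from "everybody" to "feeling".
transcript = "everybody empathizes with what every Israeli has to be feeling"
# Hypothesis: the sentence as printed in the NYT article.
nyt_quote = "Everyone empathizes with what the Israelis are facing"

edits, wer = word_error_rate(transcript, nyt_quote)
print(edits, wer)  # 7 0.7
```

The sketch reports only the total number of edits; splitting the 7 into 5 substitutions and 2 deletions takes the alignment shown above.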

In the case of the second quoted fragment, which Secretary Rice is said to have "added", there are three obvious problems. First, it's wrong to take a clause out of an indirect quotation and pretend that it's direct speech. If you say "everybody tells me that X", I can't quote you as asserting X -- you might well go on to add "but I don't believe it for a minute". In this case, Rice does seem to include herself among the "everybody" who says that "it cannot be Gaza only", but that brings us to the second problem: she goes on to explain what she (at least) means by "not Gaza only", and it's not very much. Specifically,

There is, after all, even a link to the West Bank and the four settlements that are going to be dismantled in the West Bank. Everybody, I believe, understands that what we're trying to do is to create momentum toward reenergizing the roadmap and through that momentum toward the eventual establishment of a Palestinian state.

And finally, the linking phrase "but she added" seems to me to be the most dishonest thing of all. The meaning of add in question is something like "to say or write further", with the implication that the addition is in immediate rhetorical contiguity with what is added to. The use in Brinkley and Weisman's third sentence carries the clear implication that Rice chose to extend her remarks about empathy for the Gaza evacuation with a contrasting reminder of the need for further Israeli territorial concessions.

Now, that's the NYT's editorial line, and it might be the right line to take, but it's not really what Rice said. Not only was her "addition" yanked out of indirect speech attributed to others, not only was it hedged immediately by a reference to the four West Bank settlements already being evacuated and a vague commitment to "momentum towards reenergizing the roadmap", but most important, it was in response to a different question, roughly eight minutes later, following 9 other intervening questions and answers.

I surmise that Brinkley and Weisman (or their editor) wrote the lede based on what they wanted to project as Rice's intent, and then looked through their notes on the interview for an illustrative quote. Not finding one, they stitched something together out of widely-separated fragments taken out of context. Somehow it's more surprising to see this done to the U.S. Secretary of State than to the San Antonio Spurs' scoring leader. But whether the speaker is Tim Duncan or Condi Rice, we should be able to believe that words in quotation marks in newspaper stories are an accurate reflection of what was said, and give a fair impression of what was meant.

This is not just my own opinion. I've previously cited the NYT's own code of ethics on quotations:

Readers should be able to assume that every word between quotation marks is what the speaker or writer said. The Times does not "clean up" quotations. If a subject’s grammar or taste is unsuitable, quotation marks should be removed and the awkward passage paraphrased. Unless the writer has detailed notes or a recording, it is usually wise to paraphrase long comments, since they may turn up worded differently on television or in other publications. "Approximate" quotations can undermine readers’ trust in The Times.

The writer should, of course, omit extraneous syllables like "um" and may judiciously delete false starts. If any further omission is necessary, close the quotation, insert new attribution and begin another quotation. (The Times does adjust spelling, punctuation, capitalization and abbreviations within a quotation for consistent style.) Detailed guidance is in the stylebook entry headed "quotations." In every case, writer and editor must both be satisfied that the intent of the subject has been preserved.

Assuming that the State Department's transcript is accurate, the Brinkley and Weisman article seems to be a clear violation of both the letter and the spirit of this policy. Unfortunately, such violations are the norm rather than the exception, not only at the NYT but in print media in general.

Posted by Mark Liberman at 07:09 AM

Ultima Toolies

I'm used to noticing new things in old books, usually descriptions or situations or emotions that went past me before but now catch my attention for some reason. Last night I was surprised to learn a new word from a book that I've read at least once before: Ross Macdonald's Black Money, originally published in 1965.

Lew Archer has tracked Leo and Kitty Ketchel from LA to a mansion in "Santa Teresa", Macdonald's alias for Santa Barbara. Kitty is speaking. Lew, as usual, is thinking.

"Leo made a lifetime of enemies. If they knew he was helpless, his life wouldn't be worth that." She snapped her fingers. "Neither would mine. Why do you think we're hiding out in the tules here?"

To her, I thought, the tules meant any place that wasn't on the Chicago-Vegas-Hollywood axis.

[p. 189, 1990 Warner paperback edition]

When I hit that passage, I had absolutely no memory of ever having seen or heard the word tule, in that book or anywhere else.

The OED says that tule refers to

Either of two species of bulrush (Scirpus lacustris var. occidentalis, and S. Tatora) abundant in low lands along riversides in California; hence, a thicket of this, or a flat tract of land in which it grows.

and gives citations back to 1837

1837 P. L. EDWARDS Jrnl. 20 July (1932) 26 Driving her along the margin of a bulrush or Tule pond she turned about.
1845 J. C. FRÉMONT Rep. Exploring Expedition 252 They..live principally on acorns and roots of the tulé, of which also their huts are made.
1850 W. R. RYAN Personal Adv. Upper & Lower Calif. I. 298 The Indians of the party were despatched to hunt up the banks of the river for toolies.

The etymology is given as

[ad. Aztec tullin, the final n being dropped by the Spaniards as in Guatemala, Jalapa, etc.]

and the pronunciation is as suggested by the alternative spelling toolies.

The AHD entry explains further that

Low, swampy land is tules or tule land in the parlance of northern California. When the Spanish colonized Mexico and Central America, they borrowed from the native inhabitants the Nahuatl word tollin, “bulrush.” The English-speaking settlers of the West in turn borrowed the Spanish word tule to refer to certain varieties of bulrushes native to California. Eventually the meaning of the word was extended to the marshy land where the bulrushes grew.

Merriam-Webster's Unabridged has similar information, as does Encarta, which adds that "to be in deep tules" is a Hispanic expression meaning "to be in trouble with the law".

The OED has toolies, glossed as "Backwoods; remote or thinly populated regions.", with citations back to 1961 -- but curiously, flags it as a Canadian regional term rather than a Californian one:

1961 R. P. HOBSON Rancher takes Wife i. 22 We're plenty far back in the toolies at Batnuni.

Kenneth Millar (who wrote as Ross Macdonald) was born in Los Gatos but educated in Canada, for what that's worth.

Among the dictionaries I checked, none besides the OED gives tules, under any spelling, the meaning that's apparent in the Black Money passage. And glancing through the first hundred Google hits for {"in the tules"} didn't turn up any similar figurative uses, except that Bret Harte's short story In the Tules does make an implicit pun on Ultima Thule. However, the hits for {"in the toolies"} are a different matter:

Ok proof I've lived in the toolies just a tad too long, as I find that amusing.
There was a sense that we were out in the provinces, in the toolies.
At the time, this stretch of the old Route 66 was still "out in the toolies."
Please picture me and two tiny little kids in a very small stone house WAYYY out in the toolies.
You may find that it sometimes have you stopping so far out in the toolies that no hotels/campgrounds are anywhere nearby.

And so on. This seems to be a case of a word in fairly common use that is spelled one way when it's meant literally, and a different way in a figurative meaning. I wonder if it was Millar's choice to spell it "tules" in Black Money, or the idea of a copy editor at Knopf?

[Update: several readers have pointed me to a lovely page about the natural history of tule marshes of California's central valley, which also cites "out in the tules" as an equivalent to "out in the sticks". This page also mentions the "tule fogs", which several correspondents including Arnold Zwicky have described to me as their strongest association with the word.]

Posted by Mark Liberman at 05:58 AM

August 26, 2005

If it kwa's like a duck...

OK, this is "Language Log", not "Complaining about Editorial Standards at the New Yorker Log", so I was going to let it pass. But several readers have written to point out something strange in the little Mountweazels item that I linked to yesterday:

Anne Soukhanov, the U.S. General Editor of Encarta Webster’s, was the first to weigh in. “Ess-kwa-val-ee-ohnce—I want to pronounce it in the French manner—is your culprit,” she said.

It's the status of the made-up word esquivalience that's at issue, and Tom Rossen's reaction was the most pungent:

Kwa she talkin' 'bout, Willis? If that's what Microsoft's finest think is the French pronunciation of "qui", I'm at a loss for mots!

"Ess-kwa-val-ee-ohnce" is indeed a strange notion of how to pronounce esquivalience "in the French manner", but I don't think that it's safe to attribute the idea to Soukhanov. The pages of the New Yorker are by no means bereft of linguistic carelessness -- we've documented hallucinations about pronunciation and a preposterous transcription error, among other things, and the Soukhanov quote's chain of transmission is unclear. Henry Alford writes that "The six words and their definitions were e-mailed to nine lexicographical authorities", which suggests that the responses might have come by email as well; but then he uses the tag "she said", not "she wrote" or "she e-mailed", so maybe he talked with Soukhanov on the phone. If her answer was spoken, then the lamely fake representation of pronunciation is entirely Alford's. And if Soukhanov answered by email, that part of the quote might have been edited, either by Alford or by someone else at the New Yorker. This is the familiar problem of attributional abduction.

But even if Soukhanov provided the pronunciation as printed -- which I doubt -- it seems to me that the magazine is at fault. Depicting a respected senior lexicographer as ignorant of French pronunciation is a distraction from the light-hearted point of the piece. The spirit of Miss Gould is fading further.

Posted by Mark Liberman at 08:52 AM

August 25, 2005

Mountweazel and esquivalience

"It's like tagging and releasing giant turtles", says Erin McKean. Read all about it in Henry Alford's Talk of the Town piece on lexicographic honeypots (though they are not identified by that name, which comes from the computer security area). I note that Alford says that esquivalience "has since been spotted on Dictionary.com, which cites Webster’s New Millennium as its source", but it's not there now.

Posted by Mark Liberman at 02:51 PM

Journal-mediated scholarly debate: slow and ceremonial conversation

Whether or not you're interested in the content of the on-going debate in Cognition about approaches to language evolution, you might find it interesting to contemplate its schedule. Here's a summary of the time line:

Hauser, Chomsky and Fitch (HCF): published in Science November 22, 2002.
Pinker and Jackendoff (PJ): Received 16 January 2004; accepted 31 August 2004. Available online 19 January 2005. Published March 2005.
Fitch, Hauser and Chomsky (FHC): Received 5 November 2004; accepted 15 February 2005. Available online 19 August 2005. Not published yet.
Jackendoff and Pinker (JP): posted on J's website March 23 2005. No information yet available from Cognition.

So Pinker and Jackendoff took a year or so to decide to respond to HCF and to send in their critique PJ, which arrived at Cognition roughly 14 months after HCF was published. It took 7.5 months for it to be accepted, and then another 4.5 months for it to be put on line, and another 2 months to appear in paper form. The response FHC to PJ was sent in 2 months after PJ's acceptance, 2.5 months before PJ appeared on line, and 4.5 months before PJ appeared in print; FHC was accepted 3 months after submission and published on line after an additional 6 months, and it has not yet appeared in print. Meanwhile, the response JP to FHC was completed and put on line (and thus I assume was sent to Cognition) about 1 month after FHC was accepted, and five months ago; apparently Cognition has accepted it, but it has not yet appeared on the Cognition web site.

(I'm not singling Cognition out for criticism — this is a typical sort of schedule for such sequences.)

This conversational tempo is reminiscent of 18th century correspondence between Europe and North America, or Europe and India, when a message could take as much as six months to reach its destination. If we start the conversational clock at the point where PJ was accepted by Cognition, and entirely ignore the timing and distribution of actual print media, we get the following sums for the conversational sequence PJ/FHC/JP:

Thinking and writing: 3 months
Review: 10.5 months + unknown time for JP, still not known to be accepted -- say 13.5 months?
Waiting for publication on line after acceptance: 10.5 months + unknown time for JP -- say 11.5 months?

The sums are uncertain because the time periods involved are not entirely disjoint (so that only 19 months in total have elapsed since PJ was received by Cognition), but it still seems likely that the mechanics of the system have slowed this conversation down at least as much as sending the manuscripts by square-rigger across the oceans would have done. Shouldn't there be a way to carry out scientific discussion that's a bit brisker? Certainly one good candidate for elimination is the 11.5 months or so clocked in this case by waiting for accepted articles to appear on line.

Somewhat more tentatively, I'd like to raise the question of whether the review process is always worth its cost in time. As is normal and probably inevitable in the refereed literature, this particular back-and-forth includes a number of factually doubtful statements and presuppositions. I'll cite just one example: in PJ we read that

HCF do discuss the ability to learn linearly ordered recursive phrase structure. In a clever experiment, Fitch and Hauser (2004) showed that unlike humans, tamarins cannot learn the simple recursive language AⁿBⁿ (all sequences consisting of n instances of the symbol A followed by n instances of the symbol B; such a language can be generated by the recursive rule S→A(S)B).

and in FHC, we read the response that

The inability of cotton-top tamarins to master a phrase-structure grammar (Fitch & Hauser, 2004) is of interest in this discussion primarily as a demonstration of an empirical technique for asking linguistically relevant questions of a nonlinguistic animal.

The reader will naturally conclude from this that Fitch and Hauser (2004) actually did establish something about the abilities of tamarins and humans to learn the language AⁿBⁿ, whereas the sad fact is that this conclusion was a serious over-interpretation of a rather limited experiment, and seems to be incompatible with later research.
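
For readers who haven't met the notation, the language AⁿBⁿ and the recursive rule S→A(S)B mentioned in the PJ quote can be sketched in a few lines of Python. The function names `generate` and `is_anbn` are invented for this illustration, and the sketch says nothing about how the experiment itself was run; it only shows what strings the rule produces and how membership can be checked.

```python
def generate(n: int) -> str:
    """Apply the rule S -> A (S) B a total of n times (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return "AB"  # the rule with the optional inner S omitted
    return "A" + generate(n - 1) + "B"  # the rule with the inner S expanded


def is_anbn(s: str) -> bool:
    """Check membership in A^n B^n (n >= 1) by undoing the rule:
    strip one A from the front and one B from the back, then recurse."""
    if s == "AB":
        return True
    return (
        len(s) >= 4
        and s[0] == "A"
        and s[-1] == "B"
        and is_anbn(s[1:-1])
    )


if __name__ == "__main__":
    for n in range(1, 4):
        print(n, generate(n))
    for candidate in ["AABB", "ABAB", "AAB"]:
        print(candidate, is_anbn(candidate))
```

Note that a string like ABAB, which alternates the two symbols, is rejected: the counts match, but the nesting does not.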

In addition to such (inevitable) mistakes, the programmatic nature of this exchange results in an unusually large fraction of statements of opinion, where the role and value of the review process is especially unclear. I'll also point out that the review process, though regarded as a sacred ritual by our academic culture, is a relatively recent development. I recall reading that when Albert Einstein moved to the U.S. in the 1930s, and first submitted an article to an American journal, he was shocked and offended to learn that it was being sent out for review. This was not because he thought himself in particular above such things, but rather because he had never encountered the practice before, so that his first reaction was that he was being singled out as an untrustworthy source. (Memory says, perhaps falsely, that I read this in Abraham Pais' wonderful biography of Einstein, Subtle is the Lord, which I don't have at hand.)

I'm a conservative sort of person, though not nearly as conservative as most academics are about their culture, so I'm not about to propose that we scrap the existing journal system. As Churchill is said to have said about democracy, it's the worst possible system, except for all the others.

But all the same, among the emerging technologies of networked text archives, links, indices and so on, there is a wide range of other possible solutions to the problems of scientific and scholarly communications that refereed journals have evolved to solve. And as a result, I'll predict that within 50 years, scientists and scholars will use a very different set of methods for communicating and discussing their research results, and the existing system of scientific and scholarly journals will survive only in a vestigial form, analogous to the caps and gowns that academics once wore all the time, and now put on only for ceremonial occasions.

[Update: Jay Cummings writes

In the upcoming Physics Today (September 2005), the story of Einstein's objection to being reviewed is told. The reviewer (with some trepidation, because after all, he knew this was Einstein) suggested a correction. Einstein refused the correction, but he turned out to be wrong.

I haven't seen the article -- I'll look forward to learning the details.]

Posted by Mark Liberman at 11:20 AM

JP versus FHC+CHF versus PJ versus HCF

On August 19, 2005, the journal Cognition posted on line a 19,000-word article by Tecumseh Fitch, Marc Hauser and Noam Chomsky, entitled "The evolution of the language faculty: Clarifications and implications" (free version here), referencing an additional 6,000-word appendix "The Minimalist Program". This is the third turn in a (so far) four-turn, three-year debate with Steve Pinker and Ray Jackendoff.

Chris at Mixing Memory has posted on FHC 2005, asking especially for help in decoding Chomsky's Minimalist appendix. I'll limit myself to observing that it's entirely "inside baseball": seven pages of text that mention no linguistic facts and no specific languages, nor any simulations, formulae, or empirical generalizations. Aside from a very general and abstract account of Chomsky's view of the goals of his research, the only topic is who said what when, sometimes with a very abstract explanation of why. It's an odd document -- I can't think of anything at all comparable from a major figure in a scientific or scholarly field, except perhaps some controversies over precedence (which is not an issue here). I agree with the judgment of Jacques Mehler, the editor of Cognition, who asked for it to be cut; and it seems to me that it's a distraction for outsiders (including most of the normal readership of Cognition) to try to understand it.

However, the larger discussion of language evolution has many points of general interest, which we've touched on in this blog from time to time, and will again. So as a public service, here's a quick overview, with links, of the Chomsky/Fitch/Hauser vs. Jackendoff/Pinker story so far:

Step 1 (HCF, 2002): Marc Hauser, Noam Chomsky, and Tecumseh Fitch wrote an article in Science entitled "The Faculty of Language: What Is It, Who Has It, and How Did It Evolve?" (Vol 298, Issue 5598, 1569-157 , 22 November 2002). A free version is available here.

Step 2 (PJ, 2004): Steven Pinker and Ray Jackendoff responded with an article in Cognition entitled "The faculty of language: what's special about it?" (Volume 95, Issue 2 , March 2005, Pages 201-236 -- free version here).

Step 3 (FHC, 2005): Fitch, Hauser and Chomsky have responded, with an article due out in Cognition entitled "The evolution of the language faculty: Clarifications and implications" (free version here). The abstract refers to an "online appendix" where "we detail the deep inaccuracies in their characterization of [the Minimalist Program]". The appendix does not seem to be linked anywhere in the online paper, but it is on line here, with the authors ordered as "N. Chomsky, M.D. Hauser and W.T. Fitch", entitled "Appendix. The Minimalist Program."

Step 4 (JP, 2005): Jackendoff and Pinker will respond to the response, in an article entitled "The Nature of the Language Faculty and its Implications for Evolution of Language" (listed as "in press" at Cognition, but not yet available on line -- free version of 3/25/2005 here).

If you want a quick overview of what the conversation is about, without reading all 57,440 words so far expended by all sides, here are the abstracts, again with links to the full versions:

Step 1 (2002): Marc Hauser, Noam Chomsky, and Tecumseh Fitch wrote an article in Science entitled "The Faculty of Language: What Is It, Who Has It, and How Did It Evolve?" (Vol 298, Issue 5598, 1569-157 , 22 November 2002). A free version is available here. The abstract:

We argue that an understanding of the faculty of language requires substantial interdisciplinary cooperation. We suggest how current developments in linguistics can be profitably wedded to work in evolutionary biology, anthropology, psychology, and neuroscience. We submit that a distinction should be made between the faculty of language in the broad sense (FLB) and in the narrow sense (FLN). FLB includes a sensory-motor system, a conceptual-intentional system, and the computational mechanisms for recursion, providing the capacity to generate an infinite range of expressions from a finite set of elements. We hypothesize that FLN only includes recursion and is the only uniquely human component of the faculty of language. We further argue that FLN may have evolved for reasons other than language, hence comparative studies might look for evidence of such computations outside of the domain of communication (for example, number, navigation, and social relations).

Step 2 (2004): Steven Pinker and Ray Jackendoff responded with an article in Cognition entitled "The faculty of language: what's special about it?" (Volume 95, Issue 2 , March 2005, Pages 201-236 -- free version here). The abstract:

We examine the question of which aspects of language are uniquely human and uniquely linguistic in light of recent suggestions by Hauser, Chomsky, and Fitch that the only such aspect is syntactic recursion, the rest of language being either specific to humans but not to language (e.g. words and concepts) or not specific to humans (e.g. speech perception). We find the hypothesis problematic. It ignores the many aspects of grammar that are not recursive, such as phonology, morphology, case, agreement, and many properties of words. It is inconsistent with the anatomy and neural control of the human vocal tract. And it is weakened by experiments suggesting that speech perception cannot be reduced to primate audition, that word learning cannot be reduced to fact learning, and that at least one gene involved in speech and language was evolutionarily selected in the human lineage but is not specific to recursion. The recursion-only claim, we suggest, is motivated by Chomsky's recent approach to syntax, the Minimalist Program, which de-emphasizes the same aspects of language. The approach, however, is sufficiently problematic that it cannot be used to support claims about evolution. We contest related arguments that language is not an adaptation, namely that it is “perfect,” non-redundant, unusable in any partial form, and badly designed for communication. The hypothesis that language is a complex adaptation for communication which evolved piecemeal avoids all these problems.

Step 3 (2005): Fitch, Hauser and Chomsky have responded, with an article due out in Cognition entitled "The evolution of the language faculty: Clarifications and implications" (free version here). The abstract:

In this response to Pinker and Jackendoff's critique, we extend our previous framework for discussion of language evolution, clarifying certain distinctions and elaborating on a number of points. In the first half of the paper, we reiterate that profitable research into the biology and evolution of language requires fractionation of “language” into component mechanisms and interfaces, a non-trivial endeavor whose results are unlikely to map onto traditional disciplinary boundaries. Our terminological distinction between FLN and FLB is intended to help clarify misunderstandings and aid interdisciplinary rapprochement. By blurring this distinction, Pinker and Jackendoff mischaracterize our hypothesis 3 which concerns only FLN, not “language” as a whole. Many of their arguments and examples are thus irrelevant to this hypothesis. Their critique of the minimalist program is for the most part equally irrelevant, because very few of the arguments in our original paper were tied to this program; in an online appendix we detail the deep inaccuracies in their characterization of this program. Concerning evolution, we believe that Pinker and Jackendoff's emphasis on the past adaptive history of the language faculty is misplaced. Such questions are unlikely to be resolved empirically due to a lack of relevant data, and invite speculation rather than research. Preoccupation with the issue has retarded progress in the field by diverting research away from empirical questions, many of which can be addressed with comparative data. Moreover, offering an adaptive hypothesis as an alternative to our hypothesis concerning mechanisms is a logical error, as questions of function are independent of those concerning mechanism. The second half of our paper consists of a detailed response to the specific data discussed by Pinker and Jackendoff. Although many of their examples are irrelevant to our original paper and arguments, we find several areas of substantive disagreement that could be resolved by future empirical research. We conclude that progress in understanding the evolution of language will require much more empirical research, grounded in modern comparative biology, more interdisciplinary collaboration, and much less of the adaptive storytelling and phylogenetic speculation that has traditionally characterized the field.

Step 4 (2005): Jackendoff and Pinker will respond to the response, in an article entitled "The Nature of the Language Faculty and its Implications for Evolution of Language" (listed as "in press" at Cognition, but not yet available on line -- free version of 3/25/2005 here). The abstract:

In a continuation of the conversation with Fitch, Chomsky, and Hauser on the evolution of language, we examine their defense of the claim that the uniquely human, language-specific part of the language faculty (the “narrow language faculty”) consists only of recursion, and that this part cannot be considered an adaptation to communication. We argue that their characterization of the narrow language faculty is problematic for many reasons, including its dichotomization of cognitive capacities into those that are utterly unique and those that are identical to nonlinguistic or nonhuman capacities, omitting capacities that may have been substantially modified during human evolution. We also question their dichotomy of the current utility versus original function of a trait, which omits traits that are adaptations for current use, and their dichotomy of humans and animals, which conflates similarity due to common function and similarity due to inheritance from a recent common ancestor. We show that recursion, though absent from other animals’ communications systems, is found in visual cognition, hence cannot be the sole evolutionary development that granted language to humans. Finally, we note that despite Fitch et al.’s denial, their view of language evolution is tied to Chomsky’s conception of language itself, which identifies combinatorial productivity with a core of “narrow syntax.” An alternative conception, in which combinatoriality is spread across words and constructions, has both empirical advantages and greater evolutionary plausibility.

Posted by Mark Liberman at 10:04 AM

Robertson, Bonhoeffer and "taking out" dictators

Theologians can debate whether it counts technically as bearing false witness against the AP, but in this passage from his show yesterday (8/24/2005), Pat Robertson certainly seems to be straying from the intent of the ninth commandment. The relevant portion of the transcript:

Thor Halvorssen:	Essentially, Hugo Chavez has turned Venezuela into a dictatorship. Now, I think that it's very important to also note that your comments were about assassination. The person- I think that alternative is lowering to his level-
Pat Robertson:	Uh I didn't say "assassination", I said our- our special forces should quote "take him out", and "take him out" can be a l- a number of things including kidnapping, there are a number of ways to take out a dictator from power besides killing him. Uh I was misinterpreted by the AP, but that happens all the time.

What Pat actually said on 8/22/2005 was:

Thanks, Dale. If you look back just a few years, there was a popular coup that overthrew him and what did the United States State Department do about it? Virtually nothing. And as a result, within about forty eight hours that coup was broken, Chavez was back in power; but we had a chance to move in; he has destroyed the Venezuelan economy, and he's going to make that a launching pad for communist infiltration and- and uh muslim extremism all over the continent. You know, I don't know about this doctrine of assassination, but if he thinks we're trying to assassinate him, I think that we really ought to go ahead and do it, it's a whole lot cheaper than starting a war, and uh uh I don't think any oil shipments will stop. But this man is a terrific danger, and the United- this is in our sphere of influence, and we can't let this happen. We have the Monroe Doctrine, we have other doctrines that we have announced, and uh without question, this is a dangerous uh enemy to our south controlling a huge pool of oil, that could hurt us very badly. We have the ability to take him out, and I think the time has come that we exercise that ability. We don't need another two hundred billion dollar war uh to get rid of one you know, strong-arm dictator. It's a whole lot easier to have some of the covert operatives do the job and then get it over with. Kristi? [emphasis added]

It's hard to listen to this passage and understand "take him out" to mean anything other than "kill him". It's true that "to take someone out" can mean to escort them on a date, and that a football block can take someone out of a play. Googling "took him out of the" gives us completions like "the chariot", "the town", "the school", "the picture", "the fraternity", "the crate" and so on. However, the obvious meaning in this context was "assassinate".

And in any case, the first quote given in the 8/22/2005 AP story was not the "take him out" business, but this:

"You know, I don't know about this doctrine of assassination, but if he thinks we're trying to assassinate him, I think that we really ought to go ahead and do it," Robertson said. "It's a whole lot cheaper than starting a war ... and I don't think any oil shipments will stop."

Aside from cleaning up a few uhs, the AP quote is completely accurate, and gives the lie to Robertson's assertion that "I didn't say 'assassination'".

It's especially curious that Robertson makes this feeble attempt to deny his own words, given that a few minutes earlier in the same (8/24/2005) show he sketches a justification for assassination, based on the ethical theory and practice of Dietrich Bonhoeffer:

Ladies and gentlemen, you can see Dale's full story on cbn.com, and uh a number of people have expressed their opinions about Chavez, including a man named Thor Halvorssen. He's a Venezuelan and president of the Human Rights Foundation, and he's going to join us in just a minute from New York, but before I get to that, I want to tell you about a statement of uh the great Dietrich Bonhoeffer, who suffered under Adolf Hitler, and wondered what would be the case of a wicked dictator like Hitler, how would Christians react to that. And uh Dietrich Bonhoeffer is reported to have said "if you see a car going out of control, and heading toward a group of people, do you try to stop the car or ((do)) you console the victims after it hits them?" And he said after weighing the moral consequences of that, he determined it would be better to stop the car and therefore he allied himself with those who were attempting to assassinate Adolf Hitler, and to take this monster off the world stage. That by the way cost uh this brave soldier of the cross his life, because one did not speak out against Adolf Hitler.

You can read more about Dietrich Bonhoeffer here and here. According to the second link, Bonhoeffer was indeed hung, not for speaking out against Hitler, but for participating in a plot to assassinate him:

Bonhoeffer's role in the conspiracy was one of courier and diplomat to the British government on behalf of the resistance, since Allied support was essential to stopping the war. Between trips abroad for the resistance, Bonhoeffer stayed at Ettal, a Benedictine monastery outside of Munich, where he worked on his book, Ethics, from 1940 until his arrest in 1943. Bonhoeffer, in effect, was formulating the ethical basis for when the performance of certain extreme actions, such as political assassination, were required of a morally responsible person, while at the same time attempting to overthrow the Third Reich in what everyone expected to be a very bloody coup d'etat. This combination of action and thought surely qualifies as one of the more unique moments in intellectual history.

The car story seems to come from a memoir by G. Leibholz, reproduced in the beginning of The Cost of Discipleship, where he writes of Bonhoeffer (in the English translation)

As he used to say: it is not only my task to look after the victims of madmen who drive a motorcar in a crowded street, but to do all in my power to stop their driving at all.

Shortly after Robertson's Bonhoeffer passage, when Halvorssen objects to the assassination idea, Robertson doesn't deny it (link):

Thor Halvorssen:

Now I- I did wanted to mention, Pat, that the report filed by Dale Hurd yesterday, that aired on CBN, is without question one of the most accurate um reports that have appeared in the media; certainly a lot more accurate and uh alarming than what has appeared in the mainstream media; and for that I would like to commend you. Uh by the same token, I would categorically like to say that I- I disagree with you on uh in terms of the solution to some of these issues and I do not think that assassination is- is a- the route that any country should take in the case of Chavez.

Pat Robertson:

Well I appreciate that; what would you do to stop him?

So it's all the stranger that he breaks in later to claim that he didn't say what his own archives clearly document him as saying. I can't imagine Bonhoeffer attempting to deny his own words in the way that Robertson has done.

[Update: as Timothy Noah points out in Slate, Robertson also appeals to the example of Bonhoeffer in the text of his official apology. In fact, Robertson's statement says that calling for assassination was wrong:

Is it right to call for assassination? No, and I apologize for that statement. I spoke in frustration that we should accommodate the man who thinks the U.S. is out to kill him.

but never that performing an assassination would be wrong; and the extended discussion of Bonhoeffer that follows makes Robertson's views on that matter clear enough. This is a non-apology of a different kind than those that Geoff Pullum dissected earlier: it does indeed have the grammatical form of an apology. It's rather like saying "I apologize for calling you a liar. I spoke in frustration, because I was upset about all the times you said things that aren't true." ]

Posted by Mark Liberman at 12:39 AM

August 24, 2005

An unspeakable title

My friend Caroline Henton lent me her copy of another book with an unspeakable title. In this case it is not a matter of modesty, as when NPR refuses to read a title out loud because it contains an Anglo-Saxon term for excrement, but rather that there is strictly no possible out-loud reading: it is a book whose orthographic title has no phonetic counterpart (like the two films I have mentioned elsewhere). The author is Sterling Johnson, an ESL teacher and lecturer from Pacific Grove, California, and the title is Watch Your F*cking Language: How to Swear Effectively, Explained in Explicit Detail and Enhanced by Numerous Examples Taken from Everyday Life (New York: Thomas Dunne Books). The asterisk is there even in the Library of Congress cataloguing data. Incidentally, I am not recommending the book. It only just manages to get to page 100 by dint of much wasted space and lots of large gaps between paragraphs, and in my opinion it s*cks. At least, I did not s*cc*mb to its charms. Caroline will get her copy back *ns*llied and relatively *nth*mbed. It's not the first book with an asterisk in the title; it's not even the first by Sterling Johnson, who has an earlier book entitled English as a Second F*cking Language.

Posted by Geoffrey K. Pullum at 06:40 PM

Remedial history for Pat Robertson -- better editors for the MSM

Amid all the fuss about Pat Robertson's assassination suggestion, no one seems to have picked up on what I thought was the oddest part of his outburst, namely the reference to the Monroe Doctrine.

I transcribed the entire passage, from the video in the 700 Club archive. The format is that of a news program. We're at the end of a canned segment on Venezuela, and about to start a segment on some events in Iraq. We switch to Robertson behind the anchor desk, and he says:

Thanks, Dale. If you look back just a few years, there was a popular coup that overthrew him and what did the United States State Department do about it? Virtually nothing. And as a result, within about forty eight hours that coup was broken, Chavez was back in power; but we had a chance to move in; he has destroyed the Venezuelan economy, and he's going to make that a launching pad for communist infiltration and- and uh muslim extremism all over the continent. You know, I don't know about this doctrine of assassination, but if he thinks we're trying to assassinate him, I think that we really ought to go ahead and do it, it's a whole lot cheaper than starting a war, and uh uh I don't think any oil shipments will stop. But this man is a terrific danger, and the United- this is in our sphere of influence, and we can't let this happen. We have the Monroe Doctrine, we have other doctrines that we have announced, and uh without question, this is a dangerous uh enemy to our south controlling a huge pool of oil, that could hurt us very badly. We have the ability to take him out, and I think the time has come that we exercise that ability. We don't need another two hundred billion dollar war uh to get rid of one you know, strong-arm dictator. It's a whole lot easier to have some of the covert operatives do the job and then get it over with. Kristi?

I expect that Mr. Robertson is old enough to have learned in school what the Monroe Doctrine is. Does he really think that Hugo Chavez represents a case of European intervention?

For a change, the reproductions of Robertson's quotes in the media are pretty accurate (thus Laurie Goodstein's NYT story quotes 38 words that entirely agree with my transcript, punctuation choices aside), but there are a few oddities. For example, the story on the Bloomberg wire replaces "have" with "let", introduces an ungrammatical "to", and deletes "then" in one of Pat's phrases:

It's a whole lot easier to let      some of the covert operatives to do the job and      get it over with.
It's a whole lot easier to     have some of the covert operatives    do the job and then get it over with.

And the Knight Ridder story deletes "of the", changes "covert" to the ungrammatical "cover", makes "operatives" singular, and also elides the "then":

It's a whole lot easier to have some                         cover operative do the job and      get it over with.
It's a whole lot easier to have some of the covert operatives                do the job and then get it over with.

The Bloomberg version of the phrase has 3 errors in 21 words (one substitution, one insertion and one deletion), for a word error rate of 14%; Knight Ridder has 5 errors in 21 words, for a W.E.R. of 24%. This is better than you often see -- and the rest of the reported quotes from Robertson were generally even closer to what he actually said. But really, is there any excuse for not getting it completely right in this case, where the reporters were presumably not basing their quotes on notes from a live presentation, but were transcribing from the same archival recording that I used?
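
For the curious, here is a minimal sketch of how a word error rate can be computed: the smallest number of word substitutions, insertions and deletions needed to turn the reference into the reported version, divided by the number of words in the reference. The function and variable names are my own inventions for illustration, tokenization is plain whitespace splitting, and this is the standard edit-distance convention; a hand tally that scores things differently may not match it exactly.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of word substitutions, insertions and deletions
    that turn ref into hyp (Levenshtein distance over words)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference words
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r
                curr[j - 1] + 1,     # insert h
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the number of words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    return word_edit_distance(ref, hyp) / len(ref)


# The transcript and the two reported versions quoted above.
REFERENCE = ("It's a whole lot easier to have some of the covert operatives "
             "do the job and then get it over with.")
BLOOMBERG = ("It's a whole lot easier to let some of the covert operatives "
             "to do the job and get it over with.")
KNIGHT_RIDDER = ("It's a whole lot easier to have some cover operative "
                 "do the job and get it over with.")

if __name__ == "__main__":
    for name, text in [("Bloomberg", BLOOMBERG), ("Knight Ridder", KNIGHT_RIDDER)]:
        edits = word_edit_distance(REFERENCE.split(), text.split())
        print(f"{name}: {edits} edits, WER {word_error_rate(REFERENCE, text):.0%}")
```

Dividing by the length of the reference rather than the reported version is the usual choice, since the reference is what the speaker actually said.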

Worse, each transcription error introduced a solecism: "...let some of the covert operatives to do the job..."; "have some cover operative do the job". Shouldn't an editor have noticed this, and asked someone to spend a few minutes to check whether a highly verbal media personality like Robertson really said it that way?

This sort of carelessness with elementary facts, which seems to be the norm rather than the exception in newspapers today, cuts the ground out from under arguments about the value of editors.

[Update: as several readers have suggested, Mr. Robertson's reference to "other doctrines that we have announced" probably was a swipe in the general direction of the (Theodore) Roosevelt corollary to the Monroe Doctrine. However, Calvin Coolidge and Herbert Hoover explicitly repudiated the Roosevelt Corollary in 1928 and 1930, as did FDR in 1934 and others since. ]

Posted by Mark Liberman at 08:40 AM

August 23, 2005

Tar heel, Brahmin; edited, unedited; whatever...

In today's WSJ, Philip Howard has a review (subscribers only) of the new edition of "Webster's New World College Dictionary". Howard is a writer at the London Times, and he takes the opportunity to meditate on differences between British and American varieties of English, sprinkling his review with gracious little transatlantic compliments that are so forced as to seem almost like insults:

"It may be painful for a Little Englander to admit, but Webster leads Oxford in priority, in the same way that the U.S. leads the U.K. in technology, fashion, and the thousand other variables that make up modern living."

Another dribble of soft soap:

"We can conclude (e.g., from rhymes) that the pronunciation of Shakespeare was closest to that of a Boston Brahmin."

But the BBC told us not long ago that original-accent Shakespearean English is "completely intelligible if you happen to come from North Carolina".

However, I don't think we should trouble ourselves examining Howard's scholarship too deeply. He ends his review with some remarks on the virtues of printed books over on-line text, and the benefits of editing, which he demonstrates by telling a funny story about the bad things that happen when something is published without being edited. Oops, make that a funny story about the bad things that happen when something is subjected to editing....

I trust books from a reputable house to have been edited: I don't trust anything on the Internet. At the London Times we have a correspondent called Brian Cosker, the economics head of a group of English schools, who writes to us from Baldock in Hertfordshire. A copy editor in a hurry ran his letter through Spellcheck and Mr. Cosker appeared in print as "Drain Coaster from Padlock." What Mr. Cosker thought of that, you can be sure, will never make it into Webster's.

This is one of those stories that sounds good over a pint, but seems increasingly implausible if you think about it seriously. I tried searching the Times archive, which is unaware of any articles authored by "Drain Coaster" since the start of the archive in 1985. I rather doubt that an editor would have run "Spellcheck" over submitted copy at the Times, in a hurry or otherwise, before 1985. So either Mr. Howard is embroidering, or the on-line version has been corrected.

In any case, his logic is odd. He trusts books from reputable houses, because they are edited; but he doesn't trust "anything on the internet", because a Times editor once turned Brian Cosker from Baldock into Drain Coaster from Padlock. Should we conclude that the Times is not "a reputable house"? Surely not -- rather, it seems that Mr. Howard is angling for the prestigious Michael Gorman Prize for Pleistocene Punditry, and lost the thread of his argument while trying to maximize the number of pokes at computers and the internet he could fit into the few dozen words available to him.

Philip Howard's normal beat seems to be the Modern Manners column, which makes sense, since logic and historical accuracy are less relevant to advice about etiquette than they are in other areas of modern journalism.

Posted by Mark Liberman at 08:48 PM

Units

An article by Edward Wyatt in today's NYT calibrates Paul Anderson's new 1,360-page, four-pound-nine-ounce novel "Hunger's Brides" as weighing as much as 2.5 copies of "The Da Vinci Code". Next, maybe someone will figure out what all the pages laid end-to-end would add up to in smoots.

Posted by Mark Liberman at 03:12 PM

You can call it x all you want

Back in June, I puzzled over a particular example of a snowclone that I heard in a movie and on the radio: "that's why they call it acting". Several responses came in almost immediately, some saying basically the same thing as (or agreeing with) Mark's analysis: the dictionary definition of acting offers appropriate multiple senses of the word to render the example not so remarkable after all.

I'm not here to argue against this completely reasonable, polysemy-based analysis of the "... acting" example, but I do still wonder -- as do some of my correspondents -- if that's what the people who used the example were thinking when they decided to use it. My doubts are primarily fueled by other examples of the "that's why they call it x" snowclone that I've come across, including the "that's why they call it money" example that I noted at the end of my original post. I think a strict polysemy analysis of any of these examples (if one is even possible) is far more of a stretch than it is for the "... acting" example.

What I mean by "strict polysemy" here is critical, because I'll contrast it with "loose polysemy" in a moment. The way I'm defining it, the word substituting for x in the "that's why they call it x" snowclone is "strictly polysemous" if it has a generally agreed-upon set of senses (as defined, say, by a dictionary), at least two of which can be convincingly argued to be invoked and compared/contrasted by the snowclone. (I do realize that "generally agreed-upon" and "convincingly" are major points of weakness in this definition, but I'll go on for lack of a better way to put it.) For example, money has at least the following seven senses:

1. A medium that can be exchanged for goods and services and is used as a measure of their values on the market, including among its forms a commodity such as gold, an officially issued coin or note, or a deposit in a checking account or other readily liquifiable account.
2. The official currency, coins, and negotiable paper notes issued by a government.
3. Assets and property considered in terms of monetary value; wealth.
4a. Pecuniary profit or loss. b. One's salary; pay.
5. An amount of cash or credit: raised the money for the new playground.
6. Sums of money, especially of a specified nature. Often used in the plural.
7. A wealthy person, family, or group.

Unlike Mark's analysis of the "... acting" example, I just don't see how any two of the senses above can be contrasted to explain the "... money" example, so money is not strictly polysemous in the sense that I've defined to be relevant to this post (though I'm looking forward to the flood of correspondence I'm likely to get on this conclusion).

What I'd like to suggest is a weaker, "loose polysemy" analysis of the "that's why they call it x" snowclone: two relevant senses are coerced for x, even when the two senses can't be matched up (by the listener) with generally agreed-upon senses of x. In other words, what makes the "that's why they call it money" example interesting is the fact that you are forced to imagine what money might mean other than the obvious sense in 1. above -- you might even go through the other senses in 2. through 7. in your head, find that none does the trick, and arrive at the (I think intended) interpretation that the speaker is obsessed with money. (The "... acting" example was also interesting to me in this vague sort of way, at least until Mark and others showed me that relevant senses are available.)

Here are a few more examples I've been collecting, all of which have the same basic loose-polysemy flavor (to me) as the "... money" example.

From the pilot episode of Monk:

Adrian Monk (Tony Shalhoub): "How long have you and Warren been married?"
Miranda St. Claire (Gail O'Grady): "Five years."
Monk: "Must be tough -- he's so busy, and now he's running for mayor. I would think that would be kind of stressful."
St. Claire: "You've been married, right?"
Monk: "Yes, I have."
St. Claire: "Then I don't have to tell you: every marriage is stressful. That's why they call it marriage."

marriage:
1a. The legal union of a man and woman as husband and wife. b. The state of being married; wedlock. c. A common-law marriage. d. A union between two persons having the customary but usually not the legal force of marriage: a same-sex marriage.
2. A wedding.
3. A close union.
4. Games The combination of the king and queen of the same suit, as in pinochle.

From 3rd World Bomb Squad (warning: graphic/tasteless/not-for-the-faint-of-heart), "an apparently real-life video clip" (forwarded to me by Neil Whitman, who heard about it elsewhere) with accompanying commentary (insert [sic] where appropriate):

Frame 1: Let me get this straight. You find a briefcase abandoned in a third world country and you think it might be a bomb.
Frame 2: What should you do?
Frame 3: (A) Open it and find out what's inside.
Frame 4: (B) Allow bystanders to look over your shoulder and crowd around
Frame 5: (C) Open it yourself without any protective equipment while being assisted by another officer equally unprotected, [pause] all while other officers are present who at least have body armor on.
Frame 6: (D) All of the above [pause] What do you think Third World Police Officer picked......
[This is followed by a 35-second video clip of a group of men crouched around a briefcase, which explodes, apparently killing some and injuring others.]
Frame 7: Thats why they're called 3rd world countries.

(Speaking of Neil Whitman: back in December, he discussed another kind of "that's why they call it x" example.)

One more example: I agree with Bridget at Ilani Ilani that the Elton John / Bernie Taupin lyric "I guess that's why they call it the blues" doesn't make much sense: the only sense being talked about in the song, as far as I can tell, is "feeling blue" -- sure, there's the salient sense of "blues music", but ...

And another, by way of Ben Zimmer (added 10/7/2005):

In Tuesday's episode of "Veronica Mars", Veronica is investigating a man's death, and she brings together his grieving daughter (Jessie) and his mistress (Carla) for the first time. The dialogue goes:

Carla: You look just like your picture.
Jessie (bitterly): That's why they call them *pictures*.

Here, the sense seems to be "Duh, the whole purpose of a picture is to resemble the person pictured." So the snowclone works to underscore the tautological obviousness of Carla's opening pleasantry, which Jessie explicitly rejects. (Jessie later warms up to Carla, of course.)

[ Comments? ]

Posted by Eric Bakovic at 02:28 PM

August 22, 2005

When "fuzzy" means "smoothed piecewise linear"

In yesterday's NYT magazine, Peter Maas has an article called "The Breaking Point", which features the concerns of Matthew Simmons about Saudi oil reserves, and puts Simmons' report of Saudi "fuzzy logic" to important rhetorical use:

Two years ago, Simmons went to Saudi Arabia on a government tour for business executives. The group was presented with the usual dog-and-pony show, but instead of being impressed, as most visitors tend to be, with the size and expertise of the Saudi oil industry, Simmons became perplexed. As he recalls in his somewhat heretical new book, ''Twilight in the Desert: The Coming Saudi Oil Shock and the World Economy,'' a senior manager at Aramco told the visitors that ''fuzzy logic'' would be used to estimate the amount of oil that could be recovered. Simmons had never heard of fuzzy logic. What could be fuzzy about an oil reservoir? He suspected that Aramco, despite its promises of endless supplies, might in fact not know how much oil remained to be recovered.

We can deduce from Simmons' ignorance of "fuzzy logic" that he hasn't bought a rice cooker recently. Not that buying a fuzzy logic rice cooker, or riding in a fuzzy logic elevator or a fuzzy logic subway train, would offer much insight into what the term really means. Of course, he could have checked with Google or looked at the Wikipedia entry.

On the other hand, looking the term up might not have helped. According to Simmons' book, when he asked the Aramco manager "what fuzzy logic precisely [means]", he got the standard sort of answer describing the work of Lotfi Zadeh on the logic of statements that are not crisply true or false, but are true to some intermediate degree. Thus the various Saudi oil fields, he was told, are neither exactly young and vigorous, nor old and played out, but somewhere in between. Simmons was not impressed by this answer, and writes that "hearing the Aramco manager's comment was one of the little events that tipped my thinking about the Saudi Arabian Oil Miracle towards skepticism". In fact, I suspect that the "fuzzy logic" presentation was based on relatively sensible methods (though I have no idea whether Simmons' skepticism about Saudi oil projections is justified on other grounds or not).

This SF Chronicle story "Rice goes digital cooked the fuzzy logic way" gives a similar sort of formulation:

Fuzzy logic recognizes more than simple true and false values; it sees degrees of truthfulness, for example, in the statement, "There is a 25 percent chance of rain today." Fuzzy logic deals with complex real systems. The Japanese learned exactly how well it worked when they used fuzzy logic to operate subway cars, which then ran and stopped more smoothly than when they were human-operated or automated. Fuzzy logic balanced out the complex components of acceleration, deceleration and braking.
Rice cooks in basically four stages: It stands in water, it boils, it absorbs (the "steamed stage") and then it rests. Heat is accelerated or decelerated for each stage and in different ways for each variety of rice.

This also is likely to leave a logical reader somewhat puzzled. Why are the complexities of subway car operation, or the four stages of rice cooking, improved by an approach that treats propositions as (say) 25% true?

I learned about Zadeh's fuzzy logic when I was a graduate student, back in the paleolithic era, but despite the intrinsic interest of the idea, there didn't seem to be any really impressive results or really useful applications. When I first heard about "fuzzy logic" control systems (during the neolithic age, about 20 years ago -- before Google or Wikipedia), I was puzzled. What exactly does the degree of truth of statements have to do with algorithms for controlling trains or elevators? When I asked this question after a dog-and-pony show at a Japanese research lab in the mid-1980s, I got answers like those that Simmons and the SF Chronicle got, repeating what I already knew about fuzzy logic, without adding anything convincing about the application to control theory. It sounded to me like technological double-talk. I was sure that the engineers were doing something relevant to control in complicated situations, but the "fuzzy logic" label seemed like a flack's evocative slogan for a variety of different technologies that didn't seem to have anything much to do with logic, fuzzy or otherwise.

A friend with a background in chemical engineering set me straight. His explanation went something like this: Standard control systems are linear. That means that controllable outputs (heating, accelerating, braking, whatever) are calculated as a linear function of available inputs (time series of temperature, velocity, and so on). Linearity makes it easy to design such systems with specified performance characteristics, to guarantee that the system is stable and won't go off into wild oscillations, and so on. However, the underlying mechanisms may be highly non-linear, and therefore the optimal coefficient choices for a linear control system may be quite different in different regions of a system's space of operating parameters. One possible solution is to use different sets of control coefficients for different ranges of input parameters. However, the transition from one control regime to another may not be a smooth one, and a system might even hover at the boundary for a while, switching back and forth. So the "fuzzy control" idea is to interpolate among the recipes for action given by different linear control systems. If the measured input variables put us halfway between the center of state A and the center of state B, then we should use output parameters that are halfway between state A's recipe and state B's recipe. If we're 2/3 of the way from A to B, then we mix 1/3 of A's recipe with 2/3 of B's; and so on.
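To make my friend's explanation concrete, here is a minimal Python sketch of the interpolation idea. Everything in it is my own illustration: the function names, the two made-up linear "recipes", and the choice of a single input variable are assumptions, not anything taken from a real rice cooker, subway controller, or Aramco model.

```python
def linear_controller(gain, offset):
    """Return a hypothetical linear recipe: output = gain * x + offset."""
    return lambda x: gain * x + offset


def blend_weight(x, center_a, center_b):
    """Fraction of the way from state A's center to state B's center.

    Clamped to [0, 1], so inputs outside the A-to-B range simply use
    the nearer state's recipe unmixed.
    """
    t = (x - center_a) / (center_b - center_a)
    return min(1.0, max(0.0, t))


def fuzzy_output(x, center_a, recipe_a, center_b, recipe_b):
    """Mix the two linear recipes in proportion to where x sits between the centers."""
    t = blend_weight(x, center_a, center_b)
    return (1.0 - t) * recipe_a(x) + t * recipe_b(x)


# Illustrative numbers only: state A is centered at 20, state B at 80.
recipe_a = linear_controller(gain=2.0, offset=0.0)
recipe_b = linear_controller(gain=0.5, offset=10.0)

halfway = fuzzy_output(50.0, 20.0, recipe_a, 80.0, recipe_b)      # equal mix of A and B
two_thirds = fuzzy_output(60.0, 20.0, recipe_a, 80.0, recipe_b)   # 1/3 of A, 2/3 of B
```

The point of the blend is that the output changes continuously as the input drifts from A toward B, so the system never has to flip abruptly between two sets of coefficients at a boundary. Real controllers of this kind presumably handle many inputs and more than two states; this sketch only shows the mixing step.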

In the case of the four stages of rice cooking, I suppose that a fuzzy logic controller is able to treat the process as a series of fuzzy or gradient transitions rather than a series of hard, stepwise transitions. I suspect that Simmons' Aramco executive was trying to present research that used a vaguely analogous method to fit a smoothed piecewise linear model to data about oil recovery as a function of various independent variables, including oil field "age". In both cases, the fuzzy approach might well be appropriate, under whatever name (though here's an alternative story about heating control -- and I have to say that I'm still quite happy with my old non-fuzzy thrift shop rice cooker...).

If you've shopped for a rice cooker recently, you'll have seen the addition of yet another buzzword: some cookers are not just "fuzzy", they're "neuro fuzzy". That term has a "what is this appliance doing to my brain" vibe that may not appeal to Americans -- I notice that our malls are not yet flooded with neuro fuzzy microwaves, for example. And indeed even plain fuzzy is by no means an entirely positive word. When George Bush famously accused Al Gore of "disparaging my [tax] plan with all this Washington fuzzy math", it was not a warm fuzzy moment.

But if you want to understand what "neuro fuzzy" means, you can read about it here. And there is a whole fuzzy world out there, as these links can help you discover. Though you might want to read this semi-skeptical review first.

[Update: Fernando Pereira emailed

Petroleum geologists have been pioneers on pretty sophisticated spatiotemporal estimation and smoothing techniques, for instance kriging (aka Gaussian process regression for statisticians). There are tight connections between GP regression and spline smoothing (via the theory of reproducing kernel Hilbert spaces). Either the Saudis are not hiring the best petroleum geologists, or they are being deliberately obfuscating with marketroid talk. I can't think of any situation in which fuzzy ideas (pun intended) would be preferable to Bayesian statistics for inference.

Well, if the "fuzzy logic" stuff in this case was for marketing purposes, it clearly had the opposite of the desired effect on at least one of its targets.]

[Update #2: Mike Albaugh emailed links to an interesting review article by Daniel Abramowitch, with an associated set of slides. ]

Posted by Mark Liberman at 12:53 PM

August 21, 2005

British Science: West Point takes the lead

Well, John Dryden and the Duke of Buckingham are still leading the words as turds parade in the general category, but the U.S. Military Academy's "annual yearbook" the Howitzer has dethroned T.S. Eliot and Ezra Pound in the race for the earlier reference in writing to the specific term B.S. and its relatives. Ben Zimmer pointed out by email that HDAS cites B.S. from the Howitzer in 1900, in a volume not yet available digitally. Ben also cites a glossary of West Point Slang ("Published for the benefit of our struggling relatives and others who try to read our letters"), from the 1905 issue of the Howitzer and available from the U.S. Military Academy Digital Library, with these entries:

B-essy -- An adjective used to describe a person addicted to the use of superfluous or flowery language.
B.S. -- British science: the English language. Superfluous talk.
Big Green B. S. -- Popular name for Williams' "Composition and Rhetoric."
Little Green B. S. -- Abbot's "How to Write Clearly."
Red B. S. -- Meiklejohn's "English Language."

The gloss for "B.S." is especially nice.

Ben also made the general observation that

Taboo restrictions on "shit", "crap", etc. certainly limit historical investigations into the "language as excrement" metaphor. But it's interesting to note that various nonsensical terms for nonsense have been *interpreted* to refer to excrement in some euphemistic fashion. So, for instance, "horsefeathers" is widely believed to be a euphemism for "horseshit", though there is no solid evidence for that derivation [1], [2]. Similarly, "poppycock" is often reported to be an Anglicization of a Dutch word, pappekak, meaning 'soft dung'. But no such word has been found in Dutch dictionaries, and the etymological conjecture was put forth by Webster's New International 2nd Edition (1934) more than eighty years after the earliest known usage of "poppycock" [3]. Then there's "bushwa", which HDAS says is probably from French bourgeois, though it is now taken to be a euphemism for "bullshit".

It's readers like Ben who have enabled us to establish the position of Language Log in the highly competitive field of B.S. scholarship.

Posted by Mark Liberman at 08:06 AM

August 20, 2005

Maybe it was John Dryden

He certainly didn't scoop T.S. Eliot on bullshit, but he might still have been the first poet to use the excretion of bodily wastes as a metaphor for the deprecated expression of ideas. Dryden was definitely the one who invented the idea that preposition-stranding is wrong, but it's odd to think that someone could have invented the Language is Excrement metaphor. This connection seems so natural that I thought it must be as old as the concepts are, like the relation between increasing and rising. (Or is there a culture where your age goes down as you get older?) However, a bit of poking around in the excremental vocabulary of the classical languages failed to discover any examples of crap words used in a figurative sense to describe the expression of false or foolish or shoddy ideas. And a search of LION turned up a poem from the late 17th century, entitled "A Familiar Epistle to Mr. Julian, Secretary to the Muses", and variously attributed to Dryden and to George Villiers, Duke of Buckingham, in which this metaphor is proclaimed:

1 Thou common-shore of this Poetick Town,
2 Where all our Excrements of Wit are thrown:
3 For Sonnet, Satire, Baudry, Blasphemy,
4 Are empty'd and disburden'd all on thee.
5 The cholerick Wight untrussing in a Rage,
6 Finds thee, and leaves his Load upon thy Page.

LION says that "The attribution of this poem is questionable", and does not date its composition, but Villiers lived 1628-1687, and Dryden 1631-1700. Pending new claims (and I'll be surprised not to get some), I'll take this to be the earliest documentation of the Language is Excrement meme, of which the term bullshit is a later instance.

Another candidate from the same LION search is an extraordinary piece of poetic vituperation by John Oldham, entitled "Upon the Author of a Play call'd Sodom", and dated 1680. He certainly makes many comparisons between deprecated writing and various noxious materials:

22 Vile Sot! who clapt with Poetry art sick,
23 And void'st Corruption, like a Shanker'd Prick.
24 Like Ulcers, thy impostum'd Addle Brains,
25 Drop out in Matter, which thy Paper stains:
26 Whence nauseous Rhymes, by filthy Births proceed,
27 As Maggots, in some T---rd, ingendring breed.

and he ends with his own excremental comparison, where however the crucial analogy seems to be between a deprecated work and pieces of toilet paper:

47 Or (if I may ordain a Fate more fit)
48 For such foul, nasty, Excrements of Wit,
49 May they condemn'd to th'publick Jakes, be lent,
50 For me I'd fear the Piles, in vengeance sent
51 Shou'd I with them prophane my Fundament)
52 There bugger wiping Porters, when they shite,
53 And so thy Book it self, turn Sodomite.

Posted by Mark Liberman at 10:57 PM

August 19, 2005

Minorities as legal minors?

Yesterday the AP wire ran a story by Erin Texeira with the lede

What do you call a minority that is becoming the majority? News that Texas is the fourth state in which non-Hispanic whites make up less than 50 percent of residents has renewed discussion about whether the term 'minority' has outlived its usefulness; critics include both liberals and conservatives.

Texeira quotes Roderick J. Harrison, identified as "a demographer with the Joint Center for Political and Economic Studies", as presenting a bit of etymology that surprised me:

"The word's origins are that these are populations that once had the status of minors before the law," Harrison said.

It's certainly true that minority can mean "The period of a person's life prior to attaining full age", and that this usage is very old, with OED citations back to 1493.

However, the OED sets up a separate sense for

3. a. A group or subdivision whose views or actions distinguish it from the main body of people; (originally spec.) a party voting together against a majority in a deliberative assembly or electoral body.

with citations from 1716:

1716 J. ADDISON Freeholder No. 9 p.11 The Parliament of Great Britain, against whom you bring a stale accusation which has been used by every minority in the memory of man.
1736 R. AINSWORTH Thes. Linguæ Latinæ, Minority (lesser number).
1765 L. STERNE Life Tristram Shandy VIII. xix. 66 To prevent your honours of the Majority and Minority from tearing the very flesh off your bones in contestation.
1790 E. BURKE Refl. Revol. in France (ed. 2) 186 In a democracy, the majority of the citizens is capable of exercising the most cruel oppressions upon the minority.

These uses don't seem to be derived from the legal term minor (which is attested from 1552), but instead seem to be transparently related to the ordinary word minor meaning "lesser" or "relatively small" (attested from 1230 or so), as applied to the counting of heads in a political contest.

The OED then treats the sense

3.b. A small group of people differing from the rest of the community in ethnic origin, religion, language, etc.; (now sometimes more generally) any identifiable subgroup within a society, esp. one perceived as suffering from discrimination or from relative lack of status or power.

as an extension of this "political minority" sense, with citations from 1837:

1837 U. S. Mag. & Democratic Rev. Oct. 3 Though we go for the republican principle of the supremacy of the will of the majority, we acknowledge, in general, a strong sympathy with minorities, and consider that their rights have a high moral claim on the respect and justice of majorities.
1855 N. Amer. Rev. Jan. 171 The nucleus afforded by a vast and unappropriated country for the establishment and growth of political and religious minorities transplanted from ancient states and hierarchies.
1888 S. MOORE tr. Marx & Engels Manifesto Communist Party i. 11 All previous movements were movements of minorities or in the interests of minorities.
1917 Times 28 Dec. 8/1 According to the declarations of..the quadruple alliance, protection of the right of minorities forms an essential component part of the constitutional right of peoples to self-determination.

The point of Texeira's article is that the term minority is being overtaken by demographic events. Harrison's argument (if he was quoted correctly, which is always a gamble) seems to be that it's appropriate to go on using the term, even if the groups so named become collectively the numerical majority, because it referred originally not to demographic statistics, but to the legal status of being a child before the law. But none of these groups have such a legal status today, so why would this etymology be relevant, even if it were true?

It's common enough for the literal sense of a word to evaporate in favor of what started as connotation. If that's happening to minority, so be it -- we can go on using it, if we want to, without making up factually and logically dubious excuses.

Posted by Mark Liberman at 08:43 PM

Interprète, L'

Reading through a fairly positive NYT review of the new movie The 40 Year-Old Virgin, I found out that it co-stars Catherine Keener. I had one of those tip-of-the-tongue-type reactions where I recognized the name but was having difficulty matching it with a face, so I IMDB'd -- and found that Keener also co-starred in the recent movie The Interpreter (mentioned at least a couple times here on Language Log). I also found, much to my surprise and amusement, that the convention of putting articles (a, the) at the end of a movie (or book, etc.) title for alphabetizing purposes has a funny result in French (and, I assume, other languages that are like French in relevant respects).

Quick background, for those who may not be (so) familiar: in French, the article corresponding to English the is le (masculine) or la (feminine), both of which lose their vowel (which is replaced orthographically by an apostrophe) when they precede a vowel-initial word: le livre 'the book' but l'homme 'the man' (the initial 'h' of homme is silent); la table 'the table' but l'église 'the church'. (This rule applies in pretty much the same way with several other function words, such as de 'of', je 'I', etc.)

The rule is strictly based on the sound that the immediately following word starts with; for example, 'the big man' is le grand homme or l'homme grand, depending on where you place the consonant-initial adjective. There's no rule for how to pronounce/write one of the relevant function words when it appears phrase-finally -- these words never appear in such contexts under natural circumstances -- but all signs point to the prevocalic form being the special case and the preconsonantal form being the elsewhere, default case.

(Update, added immediately after posting: this may depend on the variety of French you speak, as devoted Language Log readers may know. For at least some Canadian French speakers, for example, a sentence may end in a preposition such as de; some European French speakers, on the other hand, accept stranded prepositions except de and à.)

Except, of course, in the case of this convention of putting articles at the end of a title: the rule, at least orthographically, appears to be to use the form of the article that is used when the convention is not in force: The Interpreter is L'Interprète, so Interpreter, The is Interprète, L'. (Figure, go.)
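For the programmatically inclined, the orthographic behavior is what you'd get from the most naive possible implementation of the convention: peel the article off the front exactly as written and stick it on the end. Here's a minimal Python sketch of that. The article list and the function name are my own assumptions, and I have no idea whether IMDB or anyone else actually does it this way.

```python
# Hypothetical article list; each entry includes the separator that follows it
# (a space, or nothing at all in the case of the elided French form).
ARTICLES = ("The ", "A ", "An ", "Le ", "La ", "Les ", "L'")


def alphabetizing_form(title):
    """Move a leading article to the end of the title, copied exactly as written.

    Nothing here restores an elided article to its full form, so the
    prevocalic L' survives even though no vowel follows it any more.
    """
    for article in ARTICLES:
        if title.startswith(article):
            rest = title[len(article):]
            return f"{rest}, {article.strip()}"
    return title


english = alphabetizing_form("The Interpreter")
french = alphabetizing_form("L'Interprète")
```

A version that "undid" the elision would have to decide between le and la, which means knowing the gender of the noun -- rather more than a string operation on the title can be expected to know.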

(Continuation of update: And I doubt that those Canadian French speakers would either pronounce or write d' when the fronted object of the stranded preposition is vowel-initial ...)

(Cross-posted, mutatis mutandis, on phonoloblog.)

[ Comments? ]

Posted by Eric Bakovic at 04:40 PM

August 18, 2005

The duty to correct

On Sunday (8/14/05), the New York Times Magazine's Ethicist, Randy Cohen, took on some ethical issues in publishing, in response to a translator who had discovered that an article she was translating (from Hungarian into English) "was copied in large part from a lexicon published in 1929" and asked whether she should report her discovery to her employer ("a major American research institution"). Yes, says Cohen, not surprisingly. But then he goes on to enunciate a duty to correct errors in the language of texts -- a position that strikes me as well-intentioned but potentially troublesome in practice.

Cohen begins by observing that if the translator doesn't report the copying, probably no one will. And she has a duty to:

When it comes to ordinary civilians, both law and ethics impose only a limited duty to report wrongdoing... But you are not an ordinary civilian; you are part of a scholarly community, and different contexts entail different obligations. Intellectual integrity can be maintained only if members of your community report transgressions. Without this self-policing, the field cannot sustain its own values.

So far, we're in familiar territory. Now come the language issues:

You also have a duty to your employer. Everyone in the publishing process should report a solecism that would otherwise go undetected--a misspelling, a grammatical error. Similarly, all should report a serious ethical transgression. To keep silent would undermine the project on which you are employed.

There are two duties here, one apparently more weighty than the other: to report serious ethical transgressions and to report solecisms in language. Perhaps the second duty falls short of being an ethical imperative, but it is still a significant responsibility, according to Cohen. Cohen might want to think about how to frame this responsibility, since "solecism" covers a lot of territory -- and just how much depends on who you read. The authorities are by no means on the same page here, so to speak.

No one's denying that writers fall into error. There are typos, "cutnpaste errors" (in which parts of two different formulations survive the editing process), inadvertently omitted words, ill-chosen words, and much more. If you're part of the publishing enterprise and these come by your eyes, you should of course report them. But then there's a large gray (or grey) area, which includes matters on which there are house styles, different styles for different houses, and also "usages people keep telling you are wrong but which are actually standard in English" (in the words of Paul Brians, on his non-errors page). Brians's page is a place to start, but for serious, detailed advice you'll need to consult MWDEU.

What you don't want to do is start reporting all those things that some manual or other says are solecisms: people used as the plural of person, over used as a quantifier meaning 'more than', once used as a subordinator meaning 'after, when, as soon as', restrictive relative which, and on and on. That will only make you a pest to your colleagues and employers, and a monkey wrench in the works of the publishing process.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:40 PM

Reading(s) in the social sciences

From Des von Bladet, two (implicitly) related items. First, a link to a story at BBC News telling us that

Former Spice Girl Victoria Beckham has admitted she has never read a book in her life - despite having apparently written her own 528-page autobiography.

and then a quote from the introduction to the 165-page Routledge textbook Knowledge and the social sciences: theory, method, practice:

When it snows I just see snow. If I think hard about it I might see sleet. But an Inuit living in Northern Canada, whose language includes over a dozen words for snow, will see a much more nuanced snowstorm than I ever can.

Des explains:

The claim is utterly false, of course. (See the title essay of G. Pullum's The Great Eskimow Vocabulary Hoax for an entertaining discussion, or his remarks here.)

Also, the Inuit are the Greenlandic bunch and the Canananadians aren't keen on the name; their habitat is technically an Arctic desert, so they don't get as much snö as all that, and the "than I ever can" is just plain icky - learn, you couldn't?

But most exasperating of all is that this necessarily unsourced factoid stars in the introduction to the block on "knowledge and knowing" where we get to be all epistemological for once. It'll star in my essay, too, I think.

With an appropriate modification of Des' jokey spelling, you can find Geoff's book here.

The second (and most recent) edition of the Routledge text, published in 2000, can be searched at amazon.com, and the quote Des cites is really there. A bit of the prior context:

Three key elements of the social construction of knowledge are explored in this book

the role of language and discourse
the role of institutions
the role of different types of social power

Language is a social phenomenon and no description or explanation of the world can be created without recourse to it. But the language we inherit shapes what it is we see in the world and what we cannot see, what we know and what we cannot know.

And then on to the subtle cognitions of the frozen north. Some of what comes next:

Institutions are equally important in shaping the content and standing of knowledge systems. At one extreme, the dominance and public legitimacy of knowledge systems has been backed up and underscored by the use of force, terror and censorship. But even in the context of more diverse, open and plural societies, institutions exert powerful effects.

And then:

Which brings us to power. The production, dissemination and legitimization of knowledge requires access to and use of resources - economic, political and cultural - and, as the examples above suggest, these resources are rarely equally distributed.

The author of this introduction is David Goldblatt. I wonder if the stuff about Institutions and Power in this textbook is any better researched than the stuff about Language seems to be. Des, who is apparently reading the book, may tell us. It's certainly a lot easier to write a textbook if you can just kind of make up stuff that sounds plausible.

Posh Spice is straightforward about her scholarship:

The 31-year-old wife of England captain David Beckham told a Spanish magazine she does not have time to read.

I wish that academics who make things up about language were equally forthright.

Posted by Mark Liberman at 10:40 AM

The birth of an eggcorn

"Spin Bunny" hears segue, thinks it's a neologism based on a metaphorical use of segway, blogs about it:

Played buzzword bingo in meetings? We thought so. PR is such a sad industry that such pasttimes are required in order to keep your mind on the job.

Yet buzzwords have sunk lower than we ever thought possible. Overheard in a recent fluffette meeting was (albeit by a client) the use of the word Segway as a verb. As in "Oh I guess now's the time to just Segway into that issue a little".

Bunny is not the first to perceive segue this way: Izzy's Guide to Starting and Running an Underground Paper sensibly suggests:

Before you craft the first sentence in a paragraph, ask yourself, "What the hell is my point for this paragraph?" Think about your thesis and what you're trying to argue. The first few sentences must support or relate to the strong stand you've already made on an issue.... The rest of the paragraph should be spent arguing this one point. Don't segway into another point. You want coherent writing, not chaos.

And at bookrags.com ("Writing and Studying Skills and Tips"), as part of "How to Write a Five Paragraph Essay", we have the nominal form:

The Introduction consists of an opening line. This opening line can be a generalization about life that pertains to your topic. It can also be a quotation. Another segway into the introduction is to start it with a little anecdote (or story). By "breaking the ice" so to speak with the reader, you are luring him or her into the rest of your essay, making it accessible and intriguing.

Not all uses are in advice to writers:

The second X-Files movie may or may not reveal straight undisputeable facts, but the last episode was still a great closer and segway into the movie, which I sure as hell can't wait for.

Overall, this little book offers much as a solid segway into intro Perl programming for bioinformatics.

As it is, this can be a good segway into an art lesson you have planned.

Someone asked me recently, "What's up with the 'jay Is' thing anyway?" (well, that's not entirely true, but makes a nice segway into a new blog entry)...

Gee, you spout a boatload of nonsense and gibberish about God knows what, then segway into something about your hard drive going bye bye because you picked up a virus?

As usual with eggcorns, this is a perfectly sensible metaphor, and wouldn't raise any problems if it weren't blocked by an existing usage.

[Update: Ben Zimmer points out that "segway" as an oddball spelling for segue pre-dates the naming of the scooter:

Just read your LgLog post on "segway". Though your first example is clearly an eggcorn, the other examples may simply be spelling errors, with no semantic reinterpretation (though the prominence of the Segway brand may have popularized the error). One can find the "segway" spelling in the Usenet archive all the way back to 1985:

http://groups.google.com/group/net.sf-lovers/msg/011176c532f8f296
Note: The June issue of JSF is a segway into the July issue and is therefore more enjoyable if you know the characters.

Also, "segway" has appeared as a conscious misspelling long before the introduction of the Segway scooter. Larry Monroe has hosted an Austin-based radio and TV show called "Segway City" since 1977:
http://www.larrymonroe.com/program/SC.html
http://www.kut.org/site/PageServer?pagename=mus_larrymonroe

]

[Update #2: Rich Baldwin had another theory...

An anecdote relating to your segue post, which you may find funny.
When I was very much younger, and having never understood it the few times I saw it in print, I thought the word segue was actually spelled segway. Further, I was sure that it was a borrowing from pig-latin, like ixnay. But I could never figure out what a "weseg" was; I kept expecting to find a well used phrase from somewhere in the worlds of stage and screen describing scene changes that started with "w" and had a "seg" in it, but I always came up empty handed.
Boy was I surprised when I learned the correct pronunciation of "seh-gooey"!

]

[Another anecdote by email from Ella at Cherrier:

My boyfriend produces an internet radio show, and the description for it in the directories used to read that it had "a million different segways going all over the place". He never really understood why this image put me into paroxysms of laughter - but eventually my excess of hilarity shamed him into changing it, more's the pity.

And Neal Goldfarb sent in citations showing that the spellings {segueway} and even {sequeway} are fairly common as well. ]

Posted by Mark Liberman at 06:57 AM

August 17, 2005

More illusions

In my posting on Dr. Language and I, I pointed out two seductive effects of selective attention: the Recency Illusion (if you've noticed something only recently, you believe that it in fact originated recently) and the Frequency Illusion (once you notice a phenomenon, you believe that it happens a whole lot). The point here is that your impressions are unreliable; you need to find out what the facts are.

Now my colleagues Elizabeth Traugott and Isa Buchstaller have pointed out that when people lament, "Those kids today!", they're likely to be victims not only of the Recency Illusion ("today") but also of a related illusion that I'll call the Adolescent Illusion, the consequence of selective attention paid to the language of adolescents ("those kids") by adults. This illusion is a special case of a much broader effect, in which people pay attention selectively to members of groups they don't see themselves as belonging to and so locate phenomena as characteristics of these groups: an Out-group Illusion.

There are many familiar examples. Ask people about retro-not ("I think that's a smart idea -- not!") and lots of them will tell you it's both recent and characteristic of teen speech. As Larry Horn has observed (in, for example, his 1992 paper "The said and the unsaid", in Ohio State University Working Papers in Linguistics 40.163-92), neither of these impressions is really accurate.

Teenagers are likely to be blamed for most things that (some) people find reprehensible in language. This is not an entirely unreasonable view, since a great many linguistic changes do seem to originate in adolescent language. But, of course, you have to figure out whether the phenomenon you're looking at actually is one of these changes in the early stages of progress. Sometimes it is, sometimes it isn't.

More generally, people sometimes are exquisitely sensitive to some linguistic feature in groups they don't belong to, while missing it almost totally within their group. My current favorite example of the Out-group Illusion is a contribution to a Linguist List discussion of "double be" last year (issue 15.535, 2/9/04). Jill Murray, writing from Australia, joins the conversation:

Just as I was reading this posting I had a phonecall from an Irish speaker who used the construction twice in a five minute conversation. It is not a feature of Australian English and I had never heard it before. Both were "The thing is, is that ..."

Pat McConvell, who had been posting and writing about the phenomenon for over 15 years, then chimed in (issue 15.560, 2/12/04) to flatly contradict Murray's subjective impressions: most of his examples were from Australian speech, and he collected new examples "virtually every day" from Australian-born colleagues, on the radio in Australia, etc. Murray was detecting the feature only when it came from people whose speech she was likely to judge as unusual, exotic, marked.

Like I said, you just have to go and find out. I no longer trust my own subjective impressions, or those of other linguists, no matter how reputable. The OED, for instance, sometimes gives judgments about how frequent certain uses were in particular periods (many of them James Murray's, from well over a hundred years ago), as do reference works like Tauno Mustanoja's Middle English Syntax, but those are impressions based on experience with unsystematic samples, and they simply aren't reliable. There's lots of work to do.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 09:06 PM

No fuckin' winking at the Times

In my recent posting on linguistic modesty at Henry Holt, I reviewed some episodes of primness in the New York Times, which caused reader Matthew Hutson to point out one that I had somehow missed: Michael Brick's "Longing for a Cuss-Free Zone", in the Fashion and Style section of 7/31/05. Brick fastidiously avoids even the indirect f-bomb (a vivid version of f-word) in favor of the still more indirect word-bomb (and its abbreviated version bomb), used as both noun and verb in his piece.

Brick glosses word-bomb, which he admits is a "clunky construction", at arm's length:

I've made ["word-bomb"] up as a stand-in for a well-known hyphenated term that refers to an actual profanity. In use for at least a decade, the original hyphenated term (which begins with the first letter of the profanity and ends with "bomb") gives a knowing wink to the actual profanity's paradoxical place as a taboo in wide circulation.

And the second-level indirection (avoiding the contaminated letter f) doesn't come with a knowing wink? Yeah, sure. What's next? The word-b, which sidesteps the now possibly contaminated word bomb? Obzo, the rot-13 version of bomb? (Shpx would clearly be too racy.)
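For readers who haven't met rot-13: it swaps each letter with the one 13 places further along the alphabet, so applying it twice gets you back where you started. Here is a minimal sketch using the rot_13 codec from Python's standard library (the helper name rot13 is just my own label), in case you want to check the arithmetic on Obzo for yourself:

```python
import codecs


def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; leave everything else untouched."""
    return codecs.encode(text, "rot_13")


print(rot13("Bomb"))         # Obzo
print(rot13(rot13("Bomb")))  # Bomb (rot-13 is its own inverse)
```

Since the same call both encodes and decodes, anyone curious about Shpx can run it through the same function.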

Apparently the whole exercise really is designed to keep the NYT from winking at its readers (don't you just hate it when newspapers do that?):

... very rarely does the paper print those obvious, winking, letter-word stand-ins. As the Times's two-page stylebook entry on obscenity says, "An article should not seem to be saying, 'Look, I want to use this word but they won't let me.'"

So what, kids, does word-bomb say? "I'd never use this word in polite company, and can barely bring myself to allude to it, even very obliquely"? Well, aren't you fastidious!

As icing on the cake, there's a letter on 8/7/05 objecting to Brick's verbing of the noun (word-)bomb, as in: "Outside office buildings smokers bomb their bosses"! A. Scott Falk writes, "This inconsistent use of a misguided neologism strikes me as a greater affront to the English language in polite society than any familiar four-letter word could ever be." You read it here: verbing is so evil, such a symptom of the breakdown of society, that it's even worse than fuck. So: no fuckin' verbing! And no fuckin' winking, either! And wipe that smirk off your face!

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 08:18 PM

Do so if you gotta

I've learned my lesson about misquotations in the New York Times, so I'm taking this one from today's Maureen Dowd column with a grain of salt (and added emphasis):

Pressed about how he could ride his bike while refusing to see a grieving mom of a dead soldier who's camped outside his ranch, he added: "So I'm mindful of what goes on around me. On the other hand, I'm also mindful that I've got a life to live and will do so."

This is a nice example of do so anaphora in which the referent is not exactly syntactically present in the discourse (do so = "live my life / live the life I've got"). These types of examples (and even more striking ones) have been studied extensively by my colleague Andy Kehler in collaboration with Gregory Ward. Some of their joint papers are available in PDF format from Greg's publications page; their most recent statement, "Constraints on Ellipsis and Event Reference", appeared in The Handbook of Pragmatics (Blackwell, 2004).

The source of the quote in Dowd's column is, of course, George W. Bush. As usual (IMHO), Dowd puts it well, but I think Ann Telnaes puts it better:


Posted by Eric Bakovic at 07:37 PM

Bullshit: invented by T.S. Eliot in 1910?

In discussing Jim Holt's New Yorker piece on bullshit, I neglected to mention what Holt has to say about the etymology:

The word “bull,” used to characterize discourse, is of uncertain origin. One venerable conjecture was that it began as a contemptuous reference to papal edicts known as bulls (from the bulla, or seal, appended to the document). Another linked it to the famously nonsensical Obadiah Bull, an Irish lawyer in London during the reign of Henry VII. It was only in the twentieth century that the use of “bull” to mean pretentious, deceitful, jejune language became semantically attached to the male of the bovine species—or, more particularly, to the excrement therefrom. Today, it is generally, albeit erroneously, thought to have arisen as a euphemistic shortening of “bullshit,” a term that came into currency, dictionaries tell us, around 1915.

Holt apparently got his information from the OED, which is much more dismissive of those "venerable conjectures" than his description might lead a reader to believe:

No foundation appears for the guess that the word originated in ‘a contemptuous allusion to papal edicts’, nor for the assertion of the ‘British Apollo’ (No. 22. 1708) that ‘it became a Proverb from the repeated Blunders of one Obadiah Bull, a Lawyer of London, who liv'd in the Reign of K. Henry the Seventh’.

Though the OED admits the word bull in the sense "Trivial, insincere, or untruthful talk or writing; nonsense" is "of unknown origin", it directs our attention to

OF. boul, boule, bole fraud, deceit, trickery; mod.Icel. bull ‘nonsense’; also ME. bull BUL ‘falsehood’, and BULL v.3, to befool, mock, cheat.

all of which seem sounder references than those that Holt chooses to quote.

Holt also neglects to tell us that the OED's two earliest citations for bullshit are from Wyndham Lewis and E.E. Cummings:

c1915 WYNDHAM LEWIS Let. (1963) 66 Eliot has sent me Bullshit and the Ballad for Big Louise. They are excellent bits of scholarly ribaldry.
1928 E. E. CUMMINGS Enormous Room vii. 194 When we asked him once what he thought about the war, he replied, ‘I t'ink lotta bullsh--t.’

But the first citation is self-debunking, since it refers to the title of something written earlier! And indeed, according to this paper (Loretta Johnson, "T.S. Eliot's Bawdy Verse: Lulu, Bolo and More Ties", Journal of Modern Literature 27.1 (2003) 14-25), the letter in question was sent to Ezra Pound and refers to some poems that Eliot had written earlier:

On February 2, 1915, Lewis wrote to Pound, "Eliot has sent me Bullshit & the Ballad for Big Louise." He mistitles "The Triumph of Bullshit" and "Ballade pour la grosse Lulu." "They are excellent bits of scholarly ribaldry ... I am longing to print them in Blast; but stick to my naif determination to have no 'Words Ending in -Uck, -Unt and -Ugger.'"

Johnson explains that:

"The Triumph of Bullshit" and "Ballade pour la grosse Lulu" address the vagaries of publishing and the mediocrity of the press. ... In three octaves and a final quatrain, the narrator thumbs his nose at the "Ladies" who are reading his work and determining its fate. In the first three stanzas he addresses the "Ladies, on whom my attentions have waited," "Ladies, who find my intentions ridiculous," and "Ladies who think me unduly vociferous." Then each stanza ends with "For Christ's sake stick it up your ass." The abababab, cdcdcdcd, and efefefef rhymes are unconventional, linking "waited" with "alembicated," "constipated," and "imitated." And "small" rhymes partially with "galamatias" and "ass." The final stanza refers to the word "bullshit" in the title. "It" shall triumph when "with silver foot" they step in it, "among the Theories scattered on the grass." "And then for Christ's sake," the narrator adds, "stick them up your ass."

Apparently Eliot's Bullshit was originally ungendered:

The first version of "Triumph," written or transcribed probably in 1910, addresses the "Critics" instead of the "Ladies." When it first was written, Eliot was not in print, except for poems in the Smith Record and the Harvard Advocate. In 1914, he wrote to Aiken stating he was writing and enclosed his "war poem," entitled "UP BOYS AND AT 'EM," which ends: "But the cabin boy was sav'd alive/ And bugger'd, in the sphincter." Eliot, perhaps amused by the idea of offending sensitive female taste, joked about publishing the Notebook, naming it "Inventions of the March Hare." He wrote he could give a few lectures and become a "sentimental Tommy," punning on his name and alluding to poetry readings at Harold Monro's Poetry Bookshop in London and the popularity of J.M. Barrie's Sentimental Tommy (1896). Critical of the "Ladies" who influence popular taste, Eliot yearned to be published, in part to impress his father. According to Valerie Eliot, when they parted for the last time at the end of his 1915 visit, Eliot was "convinced that his father thought him a failure." Publication might reverse that problem.

When Eliot changed "Critics" to "Ladies" in 1916, he changed the meaning significantly. Ricks suggests that Eliot may have felt "at the mercy" of several women, including his wife. Other "Ladies" could have been Dora Marsden and Harriet Weaver of The New Freewoman, editors from whom Pound, in contest with Amy Lowell, tried as early as 1913 to wrest some editorial control. Pound was also working on Harriet Monroe to publish Eliot's poetry. After his premature discontent and following the instrumental encouragement of Pound, Eliot began to publish. Monroe published "The Love Song of J. Alfred Prufrock" in the 1915 issue of Poetry. She also accepted "The Boston Evening Transcript," "Cousin Nancy," and "Aunt Helen" for the October 1915 issue.

I guess this must be the 1916 version of The Triumph of Bullshit:

Ladies, on whom my attentions have waited
If you consider my merits are small
Etiolated, alembicated,
Orotund, tasteless, fantastical,
Monotonous, crotchety, constipated,
Impotent galamatias
Affected, possibly imitated,
For Christ's sake stick it up your ass.

Ladies, who find my intentions ridiculous
Awkward insipid and horribly gauche
Pompous, pretentious, ineptly meticulous
Dull as the heart of an unbaked brioche
Floundering versicles feebly versiculous
Often attenuate, frequently crass
Attempts at emotions that turn isiculous,
For Christ's sake stick it up your ass.

Ladies who think me unduly vociferous
Amiable cabotin making a noise
That people may cry out "this stuff is too stiff for us" -
Ingenuous child with a box of new toys
Toy lions carnivorous, cannons fumiferous
Engines vaporous - all this will pass;
Quite innocent - "he only wants to make shiver us."
For Christ's sake stick it up your ass.

And when thyself with silver foot shalt pass
Among the Theories scattered on the grass
Take up my good intentions with the rest
And then for Christ's sake stick them up your ass.

I haven't been able to find the 1910 version.

But anyhow, I don't believe that T.S. Eliot really invented bullshit in 1910. He could hardly have aimed to shock the "ladies" by naming his little poem "The Triumph of Bullshit" if the term had not already been a commonplace vulgarity.

[Update: Steve from Language Hat emailed

To complete the modernist trifecta, Ezra Pound used it in 1914 in a letter to Joyce:
"I enclose a prize sample of bull shit."
(That's the first clearly metaphorical use cited in HDAS; they include a couple of references to the actual excrement of the bull from much earlier.)

"HDAS" is the Oxford Historical Dictionary of American Slang.

And Uche Ogbuji at Copia has some thoughts about Eliot's Triumph:

Horrid genius. Eliot attaches several senses to "ladies", including (and this is the sense that does find best concord with the poem), the society matrons who influenced popular, and hence critical, taste. But Eliot is also a bit of a coward here. ...

... when it's time for brave, open sally, Eliot prefers weak targets.

Or at least targets that he can treat as weak.

Anyhow, it's -- poetic justice? -- to find Pound, Eliot and Joyce all lexicographically implicated in the origins of bullshit. ]

Posted by Mark Liberman at 05:43 PM

Modesty at Henry Holt

One of the staff assignments here at Language Log Plaza is to keep track of instances of conspicuous linguistic modesty in the media. It all started with a little rant by Geoff Pullum about how NPR managed to broadcast an entire talk show about Harry Frankfurt's book On Bullshit without a single mention of the title. Meanwhile, the very modest New York Times refers to the book as On Bull _ _ _ _. (It's been on the NYT Book Review's nonfiction best seller list for 20 weeks now, so the issue comes up at least once a week.)

A while back, I noted the way the Book Review coped with a double-whammy, the title of Nick Flynn's memoir Another Bullshit Night in Suck City: Another Bull _ _ _ _ Night ... (avoiding shit with one kind of ellipsis mark and Suck City with another).

And now, under the imprint of Henry Holt and Company (the Metropolitan Books division), Guy Deutscher goes to such lengths to avoid the naughty word fart that he has to supply clues to its identity.

The word comes up on p. 85 of Deutscher's entertaining and informative The Unfolding of Language, in a discussion of Grimm's law and the doublets it gives rise to in modern English, like pater(nal) and father:

Sometimes the siblings have gone such separate ways that upon meeting up they would hardly give each other a second glance. This is the case with the borrowed part(ridge) and the native ****. (The Greeks, who are the ultimate source of the loanword partridge, presumably gave it this name because of the loud whirring sound it makes when suddenly flushed out.)

We have the Grimm's law context, which suggests that the averted word begins with the Germanic counterpart to p, that is, f -- and, of course, that it ends with the Germanic counterpart to t, that is, in Deutscher's transcription th (as in tooth). That would suggest something like farth. The allusion to loud whirring sounds is, I figure, supposed to bring the reader to a similar-sounding naughty English word that has something to do with sound. That's probably enough to lead you to the word fart.

But why such indirection? Even fuck doesn't usually get all four of its letters ellipted. Surely f**t would have been sufficiently prim, and if the editors were worried that readers might think of foot first, why then f*rt would have worked. The story about whirring partridges would still be necessary, to account for the semantic relationship, but the whole business would have been easier on the reader.

[Added 8/21/05: Deutscher has written to ask: "... and what about the joy of discovery - is that worth nothing?" Ah, the "****" and the whirring partridges were meant as a little puzzle for the reader, but I didn't see that.]

Me, I would have gone for fart, flat out. It's a bit on the vulgar side, but we're not in the prim pages of the NYT here, and we're all adults. (Not that I would warn children away from Deutscher's book, but I suspect that few children would make it through an extended discussion of triliteral roots in Semitic, or of the laryngeal hypothesis and the discovery of Hittite. Great stuff for teenagers, though.) Anyway, sometimes a little vulgarity is just the ticket, as in one of John Mortimer's essays in Where There's a Will, p. 141:

Some of the best things in life, works that are a pleasure to be handed on to the generations to come, have vulgarity and sentimentality in spades... Indeed it's impossible to read through, say, the novels of Virginia Woolf without longing for a touch, a mere hint of vulgarity or sentimentality, a tear-jerking scene perhaps, or even a joke about a fart.

[Added 8/21/05: A correspondent going by the name Yarrow has written to observe that Virginia Woolf was not above a certain coarseness on occasion. Yarrow points to the beginning of Orlando:

He--for there could be no doubt of his sex, though the fashion of the time did something to disguise it--was in the act of slicing at the head of a Moor which swung from the rafters. It was the colour of an old football, and more or less the shape of one, save for the sunken cheeks and a strand or two of coarse, dry hair, like the hair on a cocoanut.

The "no doubt of his sex" is indirect, but the description of the old head is vividly earthy.]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:18 PM

Labov's Test

According to Jim Holt's discussion in the New Yorker of Gerald Cohen's paper "Deeper into Bullshit", Cohen independently discovered what I have always privately called Labov's Test:

...how could one prove ... that a given statement is hopelessly unclear, and hence bullshit? One proposed test is to add a “not” to the statement and see if that makes any difference to its plausibility. If it doesn't, that statement is bullshit.

I described and exemplified Labov's Test in one of my first Language Log posts, though I attributed it to an anonymous "colleague" rather than to its author, Bill Labov. Since I was relying on my memory of a dinner-table anecdote from many years ago, I wasn't sure that I had the story exactly right. And Bill is a person who is very serious about quantitative validation of his empirical claims, so I thought he might be uneasy about being cited as author of an informal experiment without a large enough N or a double-blind design. And I was writing about whether or not there is any signal in Jacques Derrida's noise, so I felt constrained by contrast to be careful and exact.

But finding the truth depends as much on open presentation and discussion as on private research and reasoning; and this is a blog, not a refereed journal; and Labov's Test is a worthwhile rule of thumb, whose (co-)invention Bill deserves credit for. So I'm outing him as the author. I suppose we could call it the Labov-Cohen test. Some might object that Cohen actually wrote about it, while Labov's contribution was only through the oral tradition, but basic methodological innovations of this kind are often introduced by what scholars call "personal communication".

Anyhow, I have another purpose in bringing this up, which is to criticize Holt for carelessness in interpreting this useful test. (I'm not sure whether to bring Cohen into this or not, since I haven't yet read his paper “Deeper into Bullshit,” in Sarah Buss, Ed., Contours of Agency: Essays on Themes from Harry Frankfurt.)

A key thing about the test, missing from Holt's discussion, was front-and-center in Bill Labov's original presentation, as I recalled and described it:

This ... reminds me of a parlor game that a colleague of mine claims to have played, back in the day when it was easier to find academics who took Derrida seriously.

My colleague would open one of Derrida's works to a random page, pick a random sentence, write it down, and then (above or below it) write a variant in which positive and negative were interchanged, or a word or phrase was replaced with one of opposite meaning. He would then challenge the assembled Derrida partisans to guess which was the original and which was the variant. The point was that Derrida's admirers are generally unable to distinguish his pronouncements from their opposites at better than chance level, suggesting that the content is a sophisticated form of white noise. On this view, as Wolfgang Pauli once said of someone else, Derrida is "not even wrong."

The point is that this is a test of communication from author to audience, not a test of the author's meaningfulness in itself. And it is framed as a behavioral test, not as a test of the author's intentions with respect to the relation between text and truth, or any other aspect of the author's state of mind. Labov's test could fail as easily because the audience is ignorant as because the text is nonsense.

For example, a 1999 JHEP paper by Seiberg and Witten contains one of these two sentences, differing only in the introduction of "not" in the second one:

(1) ... at nonzero B (unless B is anti-self-dual) a configuration of a threebrane and a separated −1-brane is BPS, so an instanton on the threebrane cannot shrink to a point and escape.

(2) ... at nonzero B (unless B is anti-self-dual) a configuration of a threebrane and a separated −1-brane is not BPS, so an instanton on the threebrane cannot shrink to a point and escape.

Both sentences seem equally plausible to me. However, I don't take this as evidence that string theory is bullshit, but rather as evidence that I don't understand its mathematics. If I claimed to understand the mathematics of string theory as discussed in this paper, but was unable to pass Labov's test with respect to a set of pairs of sentences like this one, you'd be justified in concluding that I was bullshitting about understanding the paper, but not that Seiberg and Witten were bullshitting in writing it. (By the way, the second sentence is the original one; and BPS is short for Bogomol'nyi, Prasad and Sommerfield, and is discussed at greater length here, if that helps...)

(My interpretation of) Labov's claim about Derrida and similar writers is that all of his readers will fail the test (statistically speaking) all the time. If this were true, then we could conclude that everyone who claims to have understood Derrida (for example) is a bullshitter, or at least is in some sense deluded. This universal obscurity would certainly raise the suspicion that there was no suitable object of understanding available, for instance because the work is simply (or rather, complexly) nonsense.

I suspect that (my memory of) Bill's experimental hypothesis is often true, in the sense that in a controlled experiment, the partisans of such "theory" would fail to distinguish its statements from their negation at greater than chance level, in a large proportion of replications across statements and subjects. Of course, the ostentatious obscurity of such work suggests that its practitioners might be pleased rather than distressed by this result. And we'd need another test to distinguish this case from the similar results obtained for any sufficiently difficult mathematics. However, the point of my original post was that the test would not always be negative. Sometimes, Derrida was just wrong.
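To make "greater than chance level" concrete, here is a minimal sketch of how one session of the parlor game might be scored. It assumes a forced choice between original and variant, so that a pure guesser is right half the time, and it treats the guesses as independent. The function name and the numbers in the example session are hypothetical; nothing here comes from Bill's anecdote or from any actual experiment.

```python
from math import comb


def p_at_least(correct: int, trials: int, chance: float = 0.5) -> float:
    """Probability of getting `correct` or more right out of `trials` by pure guessing."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical session: 20 sentence pairs, 12 originals correctly identified.
print(f"{p_at_least(12, 20):.3f}")
```

A small value would suggest that the reader is getting some signal from the text; a large one is what you'd expect from white noise, or from my own performance on the string theory sentences above. One session settles nothing, of course, which is why the hypothesis is stated in terms of replications across statements and subjects.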

Posted by Mark Liberman at 09:58 AM

August 16, 2005

Onpassing, outgassing, and upskirting

Peter Suber sent in a usage that struck him as odd, from an article by John Blossom:

Usage statistics are the lifeblood of this exercise and they can illuminate a collection's importance to some degree, but what happens to content once it's away from the bounds of centralized statistics? It gets referenced in citation links, onpassed in emails and generally works its way into the infrastructure of an organization. [emphasis added]

I speculated by return email that Blossom's past role as VP of Outsell, Inc. might have given him a taste for words made from a preposition and a verb written solid. Peter's response:

I read enough stuff from Outsell and John Blossom to think that this isn't a house style at Outsell. But I like the idea that it's a house joke and that just this once it slipped through the copy editor (or throughslipped the copy editor).

Google finds 338 instances of {onpassed} and 186 of {onpassing}. A number are from South Asian sources, but some appear to come from native speakers of English -- often from Australia or New Zealand:

While tracking back through the snail trail left by my Nike email, a friend onpassed a story in The Financial Times where Doug Miller, of the Toronto-based consultancy, Environics warned ...

Unfortunately, those myths have been onpassed to current generations.

The major payments are for public housing (onpassed by the Budget to the Department of Housing - a Non Budget Sector agency) and roads.

The onpassing of these receipts to the ACT Government’s Central Financing Unit...

The first category relates to payments for onpassing to other bodies and individuals.

And at least two are from other articles by John Blossom, so if it's a joke that throughslipped the copy editor, (s)he upmessed more than once:

Though their consumer orientation would probably keep them out of the EContent 100 directly, expect companies such as Shared Media Licensing, Inc., creators of the Weed rights management system enabling content to be monetized as it's onpassed, and new companies such as SNOCAP to represent the beginning of a new wave of rights management capabilities that enable both traditional publishers and individuals and institutions to find profits from content in ways that they had never considered before.

Think long and hard about how the digital objects that you distribute can have life in the hands of your users beyond the first glance and as they get onpassed from one person to another.

There are plenty of precedents, such as bypass and uplift and overflow, and apparently onpass is well established for some people. But it's news to some others, one of whom asks:

(Q. I'm onpassing to you an email...)
A. Is onpassing sort of like outgassing?

No, I think that outgassing is more like upskirting... While outgassing is not related to a prepositional phrase of the form "out the gas", it could be related to a phrase like "let out the gas", just as upskirting could be related to a phrase "look up the skirt".

[Update: Drew Smith writes:

I did a quick LexisNexis Academic search and found 92 probable uses of "onpass", "onpassed", or "onpassing" in major papers dating back as early as the April 23, 1982 issue of The Financial Times of London, which had this by Peter Montagnon on page 25 in Section II: "Although the borrowing vehicle is Morgan's offshore subsidiary J. P. Morgan International Finance N.V. proceeds of the note will be onpassed to Morgan Guaranty Trust in the form of a subordinated capital note."

Thanks to Drew for uplooking this and alongsending it! I just outchecked Google News, and found just one example, from a South African source: "He says the Protector avoids dealing with this core issue -- Imvume's onpassing of R11m of taxpayers' money to the ANC -- by 'a neat yet outrageous ... " A search of the current indices of the NYT and the WaPo didn't upturn anything.

Seriously, it's clear that onpass is an established usage, though it's one that some people (including Peter Suber and me) have managed to miss up to now. I note that many of the citations, from the web as well as this journalistic example, seem to come from the vocabulary of finance-speak, so perhaps it's become established there and is now outleaking into more general use.

Posted by Mark Liberman at 06:49 PM

August 15, 2005

Missing decades in Dutch, French and German

In response to my post "What happened to the 1940s?", Michel Vuijlsteke sent in the results of some experiments of his own on the relative frequency of 20th-century decade names in Dutch, French and German. His conclusions (which he qualifies as "preliminary and unscientific"):

- Dutch, French or German: no one writes of the 1940s much on the web, as you pointed out.
- Dutch speakers don't care much for the 1990s
- German speakers on the other hand *love* the 1990s (fall of Berlin Wall / reunification perhaps?)
- There's something fishy going on with Google's French pages: it contradicts all of the other trends in all of the other languages

Ah, Google and the French: wheels within wheels.

Anyhow, here are Michel's graphs for Dutch:

French:

and German:

and his counts for Dutch:

	de jaren 10	de jaren 20	de jaren 30	de jaren 40	de jaren 50	de jaren 60	de jaren 70	de jaren 80	de jaren 90
Google	895	31900	56100	29200	122000	192000	228000	295000	162000
Yahoo	6430	103000	175000	81300	374000	538000	651000	750000	513000
MSN	2842	26255	41826	19876	74246	104467	136817	144827	119179

for French:

	les années 10	les années 20	les années 30	les années 40	les années 50	les années 60	les années 70	les années 80	les années 90
Google	6330	143000	269000	124000	712000	948000	654000	730000	822000
Yahoo	19100	395000	684000	316000	1E+06	2E+06	2E+06	3E+06	2E+06
MSN	12445	76693	141856	61837	219592	326127	397380	516601	367362

and for German:

	1910er	1920er	1930er	1940er	1950er	1960er	1970er	1980er	1990er
Google	77900	219000	263000	139000	341000	454000	496000	510000	692000
Yahoo	92300	443000	437000	243000	701000	891000	980000	1E+06	1E+06
MSN	6075	51201	54000	19345	77429	101193	107841	112712	142733

[Update: Trevor at Kalebeul emailed:

My lunch companion suspects that the 1940s score so low because people often refer to them as "the war years." The yuppie years were neither so dramatic nor so focussed, so "the 1980s" tends to be preferred.

Yes, this was also my hypothesis.

In addition, I believe that "the war years" (essentially the first half of the decade) are felt to be very different from "the postwar years" (the second half of the decade and onward), so that there is a smaller tendency to refer to the decade as a whole under any name.

When people talk about "the 1960s" they really mean "1965-1969" or so, in most cases, but the 1960-1965 period has no real identity of its own, so "the 1960s" is used.

Trevor offered another suggestion as well:

Another more creative excuse for under-represented 40s would be that, because of paper shortages, they were not very good at documenting themselves, thus depriving us of primary sources. My grandad wrote many of his letters from the front on toilet paper, and the censors fortunately understood the concept of added value.

Though I'm no archivist, my guess is that the WWII years are nevertheless pretty well documented, and they were certainly full of events for people to write about in retrospect.]

[Update #2: Andrew Gray emailed with the plausible suggestion that the smaller number of references to the teens is essentially morphological in character:

...in English, we don't really ever say "the tens" (though we will say "the nineteen-tens"), because it seems verbally clumsy, and it's quite possible that this will spill over into writing about the decade. If you never *say* a phrase, you're less likely to use it in writing, in my experience. As an aside, try saying it to yourself - does it evoke any mental images? It draws a blank for me - "the 1910s" is a phrase with very few connotations, and so is probably less likely to be used idiomatically.
I went and did a little searching. I've taken Google numbers, as you did, for every decade back to 1510, so we can compare five centuries, and in all cases the numbers drop off sharply in the --10s decade. I missed out the first decade of each century, "1700s" and the like, since these usually refer to the century as well as the decade and so are pretty skewed.
http://www.generalist.org.uk/decades.png
The numbers on the Y-axis are "percentage of hits for that century which were this decade"; I normalised it to this to allow comparison across centuries, as the absolute numbers steadily shrank over time.

]

Posted by Mark Liberman at 04:42 PM

Silly talk about linguistics

In response to a call from Ben Goldacre at Bad Science, the folks at Cosmic Variance are swapping "silly talk about science" anecdotes, and Brian Weatherson is collecting "silly talk about philosophy" tales.

Some of the best stories come from forced conversations -- on a plane, or getting a haircut. For example, one from a comment at Cosmic Variance:

Woman on plane: “So, what do you do?”
me: “I’m an astronomer.”
woman: “That must be fun. But… what’s left to do? I mean, we already know the names of all the stars!”

and one from a comment at TAR:

The setting: Prof. Garrett, on the plane, sitting next to a middle-aged woman.
She asks, "So, what do you do?"
Prof. Garrett, "I'm a philosopher."
"Oh! What are some of your sayings?"

Here's my contribution to the genre:

Person at party: "Someone told me that you know how to interpret spectrograms. That's so interesting! Could you teach me?"
Phonetician: "Well, sure, it's not hard to learn the basic techniques."
Person at party: "That would be so exciting! I've always been sensitive to communications from the spirit world, and with the help of scientific instruments, I can only imagine..."

("spectre-grams", get it?)

My own feeling about these situations is that they present a wonderful opportunity to emulate Ali G, but in reverse, so to speak. Any self-respecting philosopher ought to be prepared with some gnomic sayings that can bear several interpretations, at least some of them scandalous. An astronomer might point out, deadpan, that with the fall of communism, all the stars, comets and asteroids named by the Russians are up for grabs again, with the rights going for big bucks on the international cosmology auction circuit. And spectrographic interpretation of the voices of the dead is a piece of cake, actually, but the real research challenge is to analyze the voices of those who haven't been born.

Anyhow, if you have any good silly-talk-about-linguistics stories, send them to me and I'll add them to this post.

[Well, several people sent in the inevitable "you're a linguist? so how many languages do you speak?" question. So far only Eric Bakovic has supplied a good answer: "Both of them."

Joshua Guenter offers the following: "Once, at a party, upon telling a person that I was studying Linguistics, I got the reply 'Oh, so William Safire must be the bigwig in your field, right?'"

And a literary silly-talk from Carrie Shanafelt:

Once, when I lived in Cleveland, I went to my favorite diner in the middle of the night to read Boswell and drink coffee. A woman at a nearby table yelled, "Miss! What the hell you readin'? That thing's biggern the goddamn Bible!" I said it was the Life of Samuel Johnson. She nodded knowingly, smiled, and said, "I loved him in Pulp Fiction."
]

[TStT has a lovely silly-talk anecdote about a mistaken (Western) folk tale about Japanese -- no, Tokyo is not Kyoto backwards -- see his post for the details. My favorite part of the story is not about linguistics or about silly talk, however, but about etiquette and self-presentation:

"At this point in the conversation, I was presented with a dilemma. In social situations, I don't like to act like that guy—you know, the guy who has to be right all the time and rubs your face in it? (Note that I say I don't want to "act like" him, because I am in fact that guy, but I try to keep a lid on it.)"

Words of wisdom for us all.]

[Emily Bender sent in three examples. One is the other commonest comment on being a linguist:
Aside from "How many languages do you speak?" the other one I get all the time is "I better watch what I say around you!"

I've never been able to come up with a better answer to this one than "Good plan!" or "Glad to hear it!" (Maybe "I promise to be merciful"?) Emily didn't help with this, but she did provide an excellent generic answer to questions about clothing text in foreign languages:

I have a t-shirt with the name of the university I studied at in Japan (Touhouku Daigaku), in kanji. When I wear it, someone invariably asks me "What does your shirt say." I usually answer "It says 'Ask me what my shirt says'." ... and people usually buy it!

Unfortunately I don't have any such shirts. And Emily also supplied a more personal story:

When I graduated from college, my family suddenly decided to try to understand what I had been studying. When I tried to explain Linguistics to my great-grandmother, she concluded that I was going to be a judge. The chain of reasoning apparently went like this:
My great-granddaughter is studying Linguistics. That's about languages. She can speak lots of languages. Where do they need people who can speak lots of languages? In the courts! But my great-granddaughter is going to be *important*. ... She's going to be a judge!

So, better watch what you say. It all makes sense now...]

[Jesse Sheidlower offers some other come-backs to "I'll have to watch what I say in front of you":

This is by far the most common question I get as well after identifying myself as a dictionary editor. My stock response is usually "That's OK, I don't give tickets," with a smile.
I have long wanted to use an answer that someone suggested in a Miss Manners column to a similar question: "Thank you, but I am perfectly capable of forming a low opinion of you on entirely different grounds." But I just don't have the guts.

]

[From Robert G. Lee:

As a Certified ASL Interpreter, my favorite is this (all too common) exchange:

Person: So what do you do for a living?
Me: Among other things, I am an American Sign Language interpreter.
Person: Wow! So you know Braille!
Me: {sigh}

]

[This one is from Ella:

I get mostly a lot of puzzled looks when people hear that I'm a linguist (even more difficult to explain that I'm a phonetician working in computational linguistics for an IT company but I don't know how to code. I've taken lately to telling people that I'm a taxidermist just to put them at ease). But the oddest question I ever had about linguistics was - 'so if I learn Russian will I be a good chess player?'

]

Posted by Mark Liberman at 12:09 AM

August 14, 2005

To stumble in their ways from the ancient paths

According to David Remnick's retelling of New Yorker lore,

Six decades ago, not long after being hired by Harold Ross as a copy editor at The New Yorker, a shy young woman, an Oberlin graduate, set to work on a manuscript by James Thurber and soon came across the word “raunchy.” She had never heard of the word and thought it was a mistake. “Raunchy” became “paunchy.” Thurber’s displeasure was such that the young woman barely escaped firing.

But Eleanor Gould Packard, who died in February, would surely have modified "parauque" to "pauraque" before it made it into print for Steve at Language Hat to catch. I expect that she would also have doubted whether the head of the Family Research Council asked that every lion tongue would be cast down.

Posted by Mark Liberman at 06:11 PM

What happened to the 1940s?

There are several different ways to refer to decades -- "the seventies", "the 1970s", and so on. Of these, the textual form YYY0s is the most unambiguous -- a bit of web searching for patterns like {1990s} should convince you that nearly all uses refer to ten-year spans of time rather than to model numbers or the like.

And looking through the counts for the decades of the 20th century shows a main effect of recency: counts decline more or less linearly as the dates move backwards. But there are some divergences from this trend. In particular, the 1940s (and less clearly, the 1910s) are under-represented.

Here the effect is shown graphically:

In order to compare the counts from Google, Yahoo and MSN, I've expressed each search engine's results in terms of ratios to its average for the nine decades cited. The actual counts that I got are given in the table below, in millions. Note that MSN gives exact-seeming counts, while Google and Yahoo give approximations.

	Google	Yahoo	MSN
1990s	27.4	50.1	4.524152
1980s	23.9	48.4	5.078620
1970s	21.3	41.4	4.502463
1960s	18.8	36.7	4.219955
1950s	13.7	28.7	3.249855
1940s	6.53	15.0	1.908465
1930s	9.19	20.8	2.609433
1920s	6.38	15.6	2.000209
1910s	0.801	2.04	0.301935
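For anyone who wants to redo the normalization, here is a minimal Python sketch of the "ratio to the engine's own average" step described above. The counts are copied from the table; the names `DECADES`, `COUNTS`, `ratios_to_mean` and `NORMALIZED` are invented for illustration and are not taken from whatever was actually used to draw the graph.

```python
# Counts in millions, copied from the table above (1990s first, 1910s last).
DECADES = ["1990s", "1980s", "1970s", "1960s", "1950s",
           "1940s", "1930s", "1920s", "1910s"]
COUNTS = {
    "Google": [27.4, 23.9, 21.3, 18.8, 13.7, 6.53, 9.19, 6.38, 0.801],
    "Yahoo": [50.1, 48.4, 41.4, 36.7, 28.7, 15.0, 20.8, 15.6, 2.04],
    "MSN": [4.524152, 5.078620, 4.502463, 4.219955, 3.249855,
            1.908465, 2.609433, 2.000209, 0.301935],
}


def ratios_to_mean(counts):
    """Divide each count by the mean of the whole nine-decade series."""
    mean = sum(counts) / len(counts)
    return [c / mean for c in counts]


# One normalized series per search engine, in the same decade order.
NORMALIZED = {engine: ratios_to_mean(values) for engine, values in COUNTS.items()}

if __name__ == "__main__":
    for i, decade in enumerate(DECADES):
        print(decade, "  ".join(f"{NORMALIZED[e][i]:.2f}" for e in COUNTS))
```

Dividing by each engine's mean puts the three series on a common scale, so that Yahoo's larger index does not swamp the comparison; it does nothing to correct for the fact that Google and Yahoo report rounded estimates.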

The three search engines disagree about the status of the 1990s, but the 1940s and the 1910s are apparently below the trend line in all three counts. Is this because the 1940s and the 1910s were dominated by WW II and WW I respectively? I'm not sure.

I stumbled on this particular oddity because I was setting up to write something about an interesting recent paper by Anatol Stefanowitsch entitled "The function of metaphor" (International Journal of Corpus Linguistics, Volume 10, Number 2, 2005, pp. 161-198). You can't read it, unless you have a subscription or are willing to pay the extraordinary sum of $37.17 -- a dollar a page plus change -- because, alas, the IJCL is not open access. If you could read the article, you'd find some interesting ideas about using collocational frequencies to explore the functions of metaphorical language. Specifically, Stefanowitsch contrasts what he calls "cognitive" and "stylistic" theories about the nature of metaphor. I thought I'd try to explain the basic issues for people who might not otherwise come across the paper, and especially to describe the methods used, which could be applied much more widely. So I started out to reproduce and extend a couple of Stefanowitsch's test cases, which deal with the distribution of metaphorical (e.g. "the dawn of <time period X>") and literal (e.g. "the beginning of <time period X>") expressions that are more-or-less referentially equivalent.

Stefanowitsch's idea is that we ought to find clues to the nature of the choice between metaphorical and literal expressions by looking at the words that tend to be more closely associated with each of them. He calls these collocational associates "collexemes". Thus with respect to "dawn of" vs. "beginning of", he writes that

...the events and time spans referred to by the collexemes of the literal pattern ... are much shorter and much more clearly delineated than those referred to by the distinctive collexemes of the metaphorical expression ...

I'll save the details (of S's paper and my reactions to it) for another post or two. But as a teaser, here's a bit of the data that I collected for looking at collocations relevant to metaphorical and literal time-period references. S used the British National Corpus, which is only about 100 million words; using the 5-10 trillion words on the web, we can examine some particular semantic fields in much more detail than he did.

For example, we can see that the 21st century is about 35 to 55 times more dawnish than the 18th. This makes metaphorical sense, I guess, but not because the 18th century is either shorter or more clearly delineated:

	(Google) "the dawn of the_ "	(Google) "the beginning of the _"	(Google) Ratio	(Yahoo) "the dawn of the_ "	(Yahoo) "the beginning of the _"	(Yahoo) Ratio	(MSN) "the dawn of the_ "	(MSN) "the beginning of the _"	(MSN) Ratio
21st century	64,900	129,000	2.0	206,600	390,000	1.9	44,177	91,834	2.1
18th century	466	33,600	72.1	1,280	114,000	89.1	252	29,322	116.4
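The "Ratio" columns here are the count for the literal phrase divided by the count for the metaphorical one, and the "times more dawnish" figure is the 18th-century ratio divided by the 21st-century ratio. A minimal Python sketch of that arithmetic follows; the hit counts are copied from the table, while the names `HITS`, `literal_to_metaphor_ratio` and `relative_dawnishness` are made up for illustration.

```python
# (hits for "the dawn of the _", hits for "the beginning of the _"),
# copied from the table above.
HITS = {
    "21st century": {"Google": (64900, 129000),
                     "Yahoo": (206600, 390000),
                     "MSN": (44177, 91834)},
    "18th century": {"Google": (466, 33600),
                     "Yahoo": (1280, 114000),
                     "MSN": (252, 29322)},
}


def literal_to_metaphor_ratio(dawn, beginning):
    """How many 'beginning of' hits there are per 'dawn of' hit."""
    return beginning / dawn


def relative_dawnishness(engine, more="21st century", less="18th century"):
    """Factor by which `more` is more dawn-prone than `less` for one engine.

    A lower literal-to-metaphor ratio means a more dawnish period, so the
    factor is the ratio for `less` divided by the ratio for `more`.
    """
    return (literal_to_metaphor_ratio(*HITS[less][engine])
            / literal_to_metaphor_ratio(*HITS[more][engine]))


if __name__ == "__main__":
    for engine in ("Google", "Yahoo", "MSN"):
        print(engine, round(relative_dawnishness(engine), 1))
```

Working from the unrounded counts gives factors of about 36, 47 and 56 for the three engines, a shade higher at the top end than dividing the rounded table entries.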

As another example, the 1960s seem to have been two or three times as dawnish as the 1980s:

	(Google) "the dawn of the_ "	(Google) "the beginning of the _"	(Google) Ratio	(Yahoo) "the dawn of the_ "	(Yahoo) "the beginning of the _"	(Yahoo) Ratio	(MSN) "the dawn of the_ "	(MSN) "the beginning of the _"	(MSN) Ratio
1980s	831	60,500	72.8	2,530	135,000	53.4	762	44,519	58.4
1960s	983	19,500	19.8	2,410	48,100	20.0	464	12,300	26.5

Again this makes post hoc sense, but again the key is not the concreteness of the time period referenced.

[Update: Rob Malouf writes to draw my attention to Pollmann T. and R.H. Baayen, "Computing Historical Consciousness. A Quantitative Inquiry into the Presence of the Past in Newspaper Texts", Computers and the Humanities, Volume 35, Number 3, August 2001, pp. 237-253(17). From the abstract, it looks relevant and interesting:

In this paper, some electronically gathered data are presented and analyzed about the presence of the past in newspaper texts. In ten large text corpora of six different languages, all dates in the form of years between 1930 and 1990 were counted. For six of these corpora this was done for all the years between 1200 and 1993. Depicting these frequencies on the timeline, we find an underlying regularly declining curve, deviations at regular places and culturally determined peaks at irregular points. These three phenomena are analyzed.
Mathematically spoken, all the underlying curves have the same form. Whether a newspaper gives much or little attention to the past, the distribution of this attention over time turns out to be inversely proportional to the distance between past and present. It is shown that this distribution is largely independent of the total number of years in a corpus, the culture in which it is published, the language and the date of origin of the corpus. The phenomenon is explained as a kind of forgetting: the larger the distance between past and present, the more difficult it is to connect something of the past to an item in the present day. A more detailed analysis of the data shows a breakpoint in the frequency vs. distance from the publication date of the texts. References to events older than approximately 50 years are the result of a forgetting process that is distinctively different from the forgetting speed of more recent events.
Pandel's classification of the dimensions of historical consciousness is used to answer the question how these investigations elucidate the historical consciousness of the cultures in which the newspapers are written and read.

Unfortunately, this is not an open-access journal, and the Penn library doesn't have an electronic subscription to it, nor do I, and I'm not yet curious enough to pay $40 plus tax for a sixteen-page article relevant to a blog post -- nor even to make a special trip to the library stacks.]

Posted by Mark Liberman at 05:10 PM

Desertion and plundering -- not!

For a moment, it looked like a scandalous story of desertion and plundering by American forces in Iraq. The headline (p. 1 of the New York Times, 8/13/05) read:

G.I.'s Deployed in Iraq Desert
With Lots of American Stuff

Oh, the noun desert, not the verb desert.

Headline writing is full of perils.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:40 PM

August 13, 2005

This time it matters

Well, I figured this would happen pretty soon. Al Feuer's 8/12/2005 NYT story on the Air America financing scandal, running under the headline "Bronx Boys Club's Finances Investigated", had a correction appended on 8/13/2005:

Correction: Aug. 13, 2005, Saturday:
An article yesterday about state and city investigations of a loan made by a Bronx social service agency to the liberal radio network Air America quoted incorrectly from comments made on the air by Al Franken, the host of an Air America program. Referring to Evan M. Cohen, a former official of the network whom Mr. Franken accused of having engineered the loan, from the Gloria Wise Boys and Girls Club, Mr. Franken said: "I don't know why they did it, and I don't know where the money went. I don't know if it was used for operations, which I imagine it was. I think he was robbing Peter to pay Paul." (He did not say: "I don't know why he did it. I don't know where the money went. I don't know if it was used for operations. I think he was borrowing from Peter to pay Paul.")

The reason for the correction? Bloggers compared the original quote to the NYT version, and complained.

Here's the crucial bit of the original NYT article:

“I don’t know why he did it,” Mr. Franken said, according to a transcript of the broadcast made by the Department of Investigation. “I don’t know where the money went. I don’t know if it was used for operations. I think he was borrowing from Peter to pay Paul.”

Here's what Brian Maloney at The Radio Equalizer transcribed:

I don’t know why they did it, and I don’t know where the money went, I don’t know if it was used for operations, which I imagine it was. I think he was robbing Peter to pay Paul.

I went to the audio linked at Brainster's blog, and did my own transcription, which agrees in all relevant details with Maloney's. Here's an aligned comparison between the NYT version and the truth:


I don't know why      he did it /   I don't know where the money went 
I don't know why they    did it and I don't know where the money went

I don't know if it was used    for    operations
I don't know if it was used uh for r- operations which I imagine it was 

I think he was         borrowing from Peter to pay Paul
I think he was robbing                Peter to pay Paul

By my count, leaving out the disfluencies, that's 38 words in the genuine quote. To get the NYT version, we need to remove 8 words and add 3, for a word error rate of 11/38 = 28.9%. We can give them a bit more credit, since they split the quote into two pieces, and not charge them for the missing "and" at the break. That would make it 10/37 = 27% W.E.R.
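The count above can be reproduced mechanically. Below is a minimal Python sketch that finds the longest run of words the two versions share in order (the longest common subsequence) and treats everything else as a removal or an addition. The function and variable names are invented for illustration, and the two strings are the disfluency-free versions of the aligned lines above.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def removals_and_additions(reference, hypothesis):
    """Return (words removed, words added, reference length)."""
    ref, hyp = reference.split(), hypothesis.split()
    common = lcs_length(ref, hyp)
    return len(ref) - common, len(hyp) - common, len(ref)


ACTUAL = ("I don't know why they did it and I don't know where the money went "
          "I don't know if it was used for operations which I imagine it was "
          "I think he was robbing Peter to pay Paul")
PRINTED = ("I don't know why he did it I don't know where the money went "
           "I don't know if it was used for operations "
           "I think he was borrowing from Peter to pay Paul")

removed, added, total = removals_and_additions(ACTUAL, PRINTED)
rate = (removed + added) / total  # 11 / 38
```

This scheme charges a changed word as one removal plus one addition, which is how the total of 11 is reached. The edit-distance version of word error rate usually counts a substitution as a single error, so it would come out somewhat lower for the same pair.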

But any way you count it, that might actually be better than the norm for quotation accuracy at the NYT! See this 7/30/2005 Language Log post, "'Quotations' with a word error rate of 40-60%" for documentation.

So Michelle Malkin may be right that

The omission of those five little words ["which I imagine it was"] matters because Al Franken's actual statement suggests that the money was in fact stolen from poor kids to pay Air America's bills--a speculation that the Times attributes to "conservative-leaning blogs," but not to the Times' favorite liberal talk show host who said it himself.

And there might be some truth to other speculations that the switch of "they" to "he" was politically motivated (one rotten apple, not a barrelful), and likewise softening "robbing" to "borrowing from".

But then again, maybe it was just the print media's astonishingly cavalier standards for quotation accuracy. Sometimes it doesn't matter, but this time it bit them.

When digital recordings of the original source are available on the web, you can count on someone checking the accuracy of cited quotations, especially when there's a question of bias. Why not take a few seconds to get the quote right?

Some other relevant Language Log posts:

"Quotations" with a word error rate of 40-60% and more (7/30/2005)
Linguists beware (7/9/2005)
Quotes from journalistic sources: unsafe at any speed (7/9/2005)
More comments on quotes (7/1/2005)
Bringing journalism into the 21st century (6/30/2005)
Down with journalists! (6/27/2005)
Ritual questions, ritual answers (6/25/2005)
Ipsissima vox Rasheedi (6/25/2005)
What did Rasheed say? (6/23/2005)

Posted by Mark Liberman at 04:29 PM

Lack of editorial oversight in the mainstream media

Newspapers and wire services are certainly welcome additions to the world's information economy, but these media, valuable as they are, can never be fully accepted as sources of information until they put into place some reasonable standards of editorial oversight and some workable mechanisms for detecting and correcting errors.

Um, that's a joke, sort of, but its conclusion is all too true. A lovely example was recently documented by Steve Outing at Poynteronline (and others too numerous to mention -- I got it via Peter Suber).

It seems that on Monday, Reuters ran a story claiming that

Wikipedia, the Web encyclopaedia written and edited by Internet users from all over the world, plans to impose stricter editorial rules to prevent vandalism of its content, founder Jimmy Wales was quoted as saying Friday.

In an interview with German daily Sueddeutsche Zeitung, Wales, who launched Wikipedia with partner Larry Sanger in 2001, said it needed to find a balance between protecting information from abuse and providing open access to improve entries. ...

Restricting access to entries particularly susceptible to unwanted attention could be one way of preventing [abuse], he said.
Wales has been at a meeting of those behind the successful free encyclopaedia in Frankfurt, which lasts until Monday.
He said that setting up a form of "commission" might be one way of deciding which entries could be "frozen" in perpetuity.

But apparently Wales said no such thing:

"The interesting thing is that the media simply made up the story about us permanently locking some pages. It's just not true. ... There is absolutely no truth at all to the story. None, zero. It is a complete and total fabrication from start to finish."

How did this canard get started? According to Outing,

Wales says the problem appears to be in the translation. He was in Germany recently and was interviewed by dozens of reporters, including from the Sueddeutsche Zeitung. He thinks the SZ reporter may have misinterpreted his comments. Then Reuters apparently translated his comments in German back to English, and his meaning got turned into something he didn't say.

What did he really say? According to Wales' explanation on slashdot,

"I spoke to one journalist about our longstanding discussions of how to create a 'stable version' or 'Wikipedia 1.0.' This would not involve substantial changes to how we do our usual work, but rather a new process for identifying our best work."

So, sloppy quoting at SZ, sloppy reporting and no quote checking at Reuters -- no biggie, this could happen even to a blogger. In fact, it did, because many bloggers credulously picked up the Reuters story. The real problem is lack of any interest at all in corrections, according to Wales (as quoted by Outing):

"The story seems to have legs, even though we've contacted Reuters and every other outlet to try to get a correction, no one seems to care at all. ... No response. We're important enough to write about, but not important enough for them to listen to at all."

As Peter Suber writes:

It's obvious but I'll say it anyway. An error like this would not have lasted 10 minutes on Wikipedia.

I just checked Google News, and found 564 hits for Wikipedia. I checked the first 20, and found about a dozen versions of the Reuters story, but no corrections. On Technorati, out of the first ten hits for "Wikipedia Reuters", three were corrections roughly equivalent in content to the listing above. On Blogdigger, five of the first ten set the record straight.

Posted by Mark Liberman at 11:18 AM

Another open access experiment

The journal Information and Computation has announced that

for one year, effective immediately, online access to all journal issues back to 1995 will be available without charge. This includes unrestricted downloading of articles in pdf format.
Retrieval traffic during the open access period will be considered as future subscription policies are formulated.
Journal articles may be obtained on Elsevier's Sciencedirect at http://www.sciencedirect.com/science/journal/08905401

This is obviously not the first journal to adopt an open access policy, experimentally or permanently, but is it the first one published by Elsevier to do it via Elsevier's site?

According to Peter Suber's page of Open Access lists, editors at several Elsevier journals have declared independence over the years (though not all of these have gone all the way to open access):

In 1998 most of the editorial board of the Journal of Academic Librarianship resigned to protest the large hike in the subscription price imposed by Pergamon-Elsevier after it bought the journal from JAI Press. Several of the editors who resigned then created Portal: Libraries and the Academy at Johns Hopkins University Press.
In November 1999, the entire 50 person editorial board of the Journal of Logic Programming (Elsevier) resigned and formed a new journal, Theory and Practice of Logic Programming (Cambridge).
Early in 2001, a handful of editors of Topology and Its Applications (Elsevier) resigned in order to create Algebraic and Geometric Topology (University of Warwick and International Press), a free online journal with an annual printed volume.
Elsevier has published the European Economic Review since 1969. In 1986 the European Economic Association (EEA) adopted it as its official journal. But the EEA grew increasingly unhappy with Elsevier's subscription price and its requirement that the publisher, not the association, hire the journal's editors. In 2001 the EEA started the process of declaring independence from Elsevier. In March 2003 its new official journal, the Journal of the European Economic Association, was launched by MIT Press at about one-third of the Elsevier subscription price.
On December 31, 2003, the entire editorial board of the Journal of Algorithms resigned in order to protest the high price charged by the publisher (Elsevier). On January 21, 2004, the same board then launched a new journal, Transactions on Algorithms, published by the Association for Computing Machinery (ACM).

Elsevier is by no means the only publisher to have been put into the role of King George by rebellious journal editors, but I've quoted these examples from Suber's list to indicate one of the sources of pressure that may have led to a change in policy at Elsevier specifically.

Several other recent conversions to open access are documented on Suber's Open Access News blog, such as the Netherlands Journal of Medicine, which was once an Elsevier publication.

Suber's blog also quotes from a recent article by Robert Kiley, Head of Systems Strategy for the Wellcome Trust, announcing that

...from 1 October 2005, all new grant recipients will be required to deposit in PMC [PubMed Central -- myl], or a UK equivalent, any papers arising from Trust-funded research. This condition will be extended to all existing grant holders from October 2006. All papers deposited with PMC will be made freely available to the public, via the Web, within 6 months of the official date of final publication.

Kiley goes on to write that

Ultimately, for the benefits of open access to be fully realised, we need to win over the hearts and minds of those who actually do the research and write the papers – the scientists and researchers. For this group, the key drive behind publishing is a desire for their research to be read and cited. To misquote President Clinton ‘it’s about impact, stupid’. Fortunately for advocates of open access, research is starting to show that open-access articles were cited between 50–300% more often than non-open access articles from the same journal and year....The developments announced by the Wellcome Trust over the past couple of months – coupled with the public access initiatives at the US National Institutes of Health and the recent announcement from Research Councils UK (RCUK) in support of open access – all suggest that we are witnessing a sea-change in the way research findings will be disseminated and made accessible in the future.

Are there any language-related journals that have moved in this direction? DOAJ lists 37 open-access journals in the subject area of linguistics, but these don't include the titles that I for one would like to see there.

August 12, 2005

The Jerk-o-Meter

Today's news includes this article about the "Jerk-o-Meter", described as a device which can tell you whether the person you're talking to on the phone, or you yourself, is paying attention. It analyzes speech for features that reflect "activity" and "stress". The same technique provides better-than-chance predictions of the outcome of speed dates.

How useful a device this is is not so clear. Anmol Madan suggests that it might help to improve relationships by preventing arguments. Maybe, but I'm not so sure. Is receiving confirmation that the person you're talking to is not interested really going to improve the relationship? And note that it doesn't predict the outcome of speed dates in any useful way: its "prediction" is based on the conversation during the date, so it doesn't save you any time or angst. It's kind of like the situation in weather prediction about twenty years ago, where they could predict the weather three days in advance but the computer model took three days to run, only in this case the problem won't disappear with faster computation.

Another suggested use is that:

it might assist telephone sales and marketing efforts

I thought technology was supposed to improve our lives. As an antidote, I recommend the National Do Not Call Registry.

If you read the paper that underlies the press reports, it turns out that there is something interesting here.

we propose that minute-long averages of audio features often used to measure affect (e.g. variation in pitch, intensity, etc.) taken together with conversational interaction features (turn-taking, interrupting, making sounds that indicate agreement like 'uh-huh') are more closely related to social signaling theory rather than to an individual's affect.

In other words, these features aren't uncontrollable subconscious cues to the speaker's mental state but controllable aspects of communication.

Posted by Bill Poser at 10:12 AM

GNU GSS in Kinyarwanda

Some time ago we had a discussion of whether speakers of minority languages have much interest in the localization of computer software. I just encountered an interesting datapoint while checking out Freshmeat, a catalogue of Unix and cross-platform software whose front page contains announcements of new releases. One program on today's front page is version 0.0.16 of the GNU Generic Security Service Library. Here's the blurb:

Generic Security Service (GSS) is an implementation of the Generic Security Service API (GSSAPI). It is used by network applications to provide security services, such as authenticating SMTP/IMAP, via the GSSAPI SASL mechanism. It consists of a library and a manual, and a Kerberos 5 mechanism that supports mutual authentication and the DES and 3DES ciphers.

The first of the changes announced today is:

A Kinyarwanda translation has been added.

I don't know who did the Kinyarwanda translation or why, but somebody evidently sees a need for a Kinyarwanda translation of a fairly technical piece of software that will be used only by programmers.

Posted by Bill Poser at 01:40 AM

Rorschach Science

The stimulus? A journal article about functional brain imaging of men listening to variously-hacked men's and women's voices.

The response? Worldwide resonant evocation of sexual stereotypes, congruent and contradictory alike.

Some headlines: "Er, you what, luv?" -- "Man Leaves Wife, Realizes Six Hours Later" -- "Female Voices are Easier to Hear" -- "What We Have is Failure to Communicate" -- "Men do Have Trouble Hearing Women" -- "Why Imaginary Voices are Male" -- "It's official! Listening to women pays off" -- "Men do have trouble hearing women, scientists find".

The blogospheric reactions are just as creative: "I can't hear you, honey...you're just too difficult to listen to" -- "What to tell your wife when you didn't hear her" -- "Men who are accused of never listening by women now have an excuse -- women's voices are more difficult for men to listen to than other men's, a report said" -- "I've been waiting for this for a long time. I'm often accused of 'selective hearing' in which certain statements just disappear from my consciousness - often statements made by Mrs. HolyCoast. It usually occurs when I'm multi-tasking, such as watching TV or blogging while listening to my better half..." -- "Science explains patriarchal monotheism!" ...

So I went and read the journal article: Dilraj S. Sokhi, Michael D. Hunter, Iain D. Wilkinson and Peter W.R. Woodruff, "Male and female voices activate distinct regions in the male brain", In Press, NeuroImage. I'm deeply puzzled by some of the research that paper describes -- if Sokhi et al. really did what they seem to be saying they did, I don't see how the results can be interpreted at all -- but I'm pretty sure that the experiment doesn't mean most of the things that people are saying it does. Maybe it doesn't mean any of them.

Here's what they did. They recorded 12 male and 12 female speakers reading some "emotionally neutral" sentences, "balanced in being directed to three main cortical modalities: vision ('look in the newspaper'), auditory ('listen to the music') and motor ('open the kitchen door')". As expected, the average pitch (F0) was different for the two groups -- "112.01 ± 8.11 Hz for male speakers and 204.68 ± 19.31 Hz for female speakers". They took from the literature the observation that there is a "gender-ambiguous" F0 range in the region of 135 to 181 Hz., where the typical "tessitura" of male and female speakers overlaps, and so they scaled each phrase in four steps from its original speed to a speeded-up or slowed-down version whose average pitch would be at 135 Hz. (for female speech) or 181 Hz. (for male speech).

If that sounds like a strange thing to do, it is. But here's what they say:

We calculated the difference between a speaker's F₀ and the ‘target F₀’ which, defined by the GAR F₀ (see above), was 181 Hz for male speakers and 135 Hz for female speakers. We then derived speaker-specific scalar factors (SF_qs) to pitch-scale a speaker's F₀ in four equidistant steps (q = 1 to 4) to the ‘target F₀’ without preserving F_n.

When they say "without preserving F_n", what they (seem to) mean is that they didn't do any fancy processing to change the pitch (F₀) without changing the vocal-tract resonances (F₁, F₂, F₃ etc.); instead they just increased or decreased the overall playback speed in proportion to the needed F₀ changes. And the maximum amount of change was considerable -- an average male recording would have been sped up to as much as 181/112 = 162% of the original rate, while an average female recording would have been slowed down to as much as 135/205 = 66% of the original rate. (It's possible that I've misunderstood this, but I don't see any other way to interpret what they say...)
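To make the arithmetic concrete, here is a minimal sketch of how such speaker-specific scaling factors might be computed. It assumes that "four equidistant steps" means equal steps in Hz, which the paper does not spell out; the function name and the rounded average F₀ values (112 and 205 Hz) are mine, not the authors'.

```python
# Sketch only: playback-rate factors for four steps from a speaker's F0 to a target F0.
# Assumption: "equidistant" means equal steps in Hz (the paper may have meant otherwise).

def scale_factors(speaker_f0, target_f0, steps=4):
    """Return the playback-rate factor for each step q = 1..steps."""
    step_hz = (target_f0 - speaker_f0) / steps
    return [(speaker_f0 + q * step_hz) / speaker_f0 for q in range(1, steps + 1)]

# Rounded group averages from the article; targets are the edges of the ambiguous range.
male = scale_factors(112.0, 181.0)    # male speech is sped up toward 181 Hz
female = scale_factors(205.0, 135.0)  # female speech is slowed down toward 135 Hz

for q, (m, f) in enumerate(zip(male, female), start=1):
    print(f"q={q}: male rate x{m:.2f}, female rate x{f:.2f}")
```

The last step in each list is the maximum shift discussed above. Since a plain rate change is assumed here, the same factor would also apply to the vocal-tract resonances and (inversely) to the duration.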

In fact they didn't use quite this much shifting, because they did perceptual tests to find the amount of shift that would produce "gender-ambiguous" stimuli, "defined by reaching the 50% mark .. for accuracy in reporting the gender of a given set of voices", and this was achieved by shifting a selected subset of stimuli to "159.13 ± 5.52 Hz for male speakers and 156.83 ± 4.09 Hz for female speakers, where the F₀s for the corresponding selected natural stimuli were ... ‘male gender-apparent’ = 107.55 ± 6.46 Hz and ‘female gender-apparent’ = 211.77 ± 14.07 Hz."

So they speeded up the male recordings, on average, by a ratio of 159.13/107.55 = 1.48, and they slowed down the female recordings by an average ratio of 156.83/211.77 = 0.74. Still quite a big shift -- I'd expect these stimuli to be species-ambiguous, not just gender-ambiguous. Also, note that the duration of the phrases will be changed by the same factors, so the female phrases slowed down so as to be sexually ambiguous will be roughly twice as long as the male phrases speeded up so as to be sexually ambiguous.
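The "roughly twice as long" estimate can be checked with a few lines of arithmetic. This is just a sketch using the mean F₀ values quoted above; it assumes the rate-change interpretation is right and, hypothetically, that the original male and female phrases were about equally long.

```python
# Rate factors implied by the reported mean F0 values (Hz).
male_rate = 159.13 / 107.55    # male recordings sped up
female_rate = 156.83 / 211.77  # female recordings slowed down

# Changing the playback rate by a factor r changes the duration by 1/r.
male_duration = 1 / male_rate
female_duration = 1 / female_rate

# Relative length of a slowed female phrase vs. a sped-up male phrase,
# assuming (hypothetically) that the two originals were equally long.
ratio = female_duration / male_duration

print(f"male rate {male_rate:.2f}, female rate {female_rate:.2f}")
print(f"duration ratio (female/male) {ratio:.2f}")
```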

Why did they do this? Well, they say that the shifted recordings "were selected for the fMRI experiment as these stimuli were matched for F0, thus removing the confound of simple pitch effects during perception of gender from heard speech". The basic idea is a sound one, since they want to be able to claim that they're seeing the effects of perceived speaker sex, not just the effects of higher versus lower pitched voices. But this is a strange way to go about it, since the shifted stimuli (according to their perceptual experiments) were selected so as to be identified as male and female about equally often! (There are indeed other cues to sex in the voice besides F0, as the authors mention, but they've specifically selected the artificial stimuli so that sex judgments are roughly equal...). Logically, I would have expected them to choose naturally-occurring male-perceived voices and female-perceived voices with F₀ in an overlapping range, but they didn't try to do this. And the rate-shifting manipulation that they did (apparently) use not only doesn't preserve perceived sex, it introduces some other non-sex-linked acoustic factors (like duration differences) that seem just as problematic as the F₀ difference it eliminated. They could have used pitch-shifting technology to change the pitch without changing the duration or the vocal-tract resonances, but they didn't; again, I'm not sure why.

In any case, they've got four classes of stimuli:

The original samples of the sentences recorded from each speaker, with the original F₀, together with the set of new stimuli of frequency F_0(g-amb) gave 96 stimuli falling into four categories: ‘male gender-apparent’ (unaltered in pitch), ‘male gender-ambiguous’ (pitch-scaled and ‘gender-ambiguous’), ‘female gender-apparent’ and ‘female gender-ambiguous’ stimuli.

They played these 96 stimuli to 12 male subjects. It's not clear why they only studied males -- at least I couldn't find any reason for this. Maybe they're planning to look at female subjects in a different study. But 12 subjects is not a big fMRI experiment, so I'm not clear why they didn't look at both sexes. (And as you'll see, having female subjects would make a big difference in interpreting the results...)

Anyhow, the key thing in such functional imaging studies is that you can't just look at one condition. You need to compare the distribution of cerebral blood flow when subjects are doing X to the distribution when they are doing Y, or some more complicated sort of comparison of a similar kind. This is roughly for the same reason that in studying the effect of a drug on a disease, you can't just give it to some patients and see how many get well; you need to compare the results for a matched set of patients who didn't get the drug. In this experiment, they defined their comparisons as follows:

(i) [(‘female gender-apparent’ > ‘male gender-apparent’) AND (‘female gender-ambiguous’ > ‘male gender-ambiguous’)] = “female versus male”;
(ii) [(‘male gender-apparent’ > ‘female gender-apparent’) AND (‘male gender-ambiguous’ > ‘female gender-ambiguous’)] = “male versus female”;
(iii) [(‘male gender-apparent’ > ‘male gender-ambiguous’) AND (‘female gender-apparent’ > ‘female gender-ambiguous’)] = “‘gender-apparent’ versus ‘gender-ambiguous’”; and
(iv) [(‘male gender-ambiguous’ > ‘male gender-apparent’) AND (‘female gender-ambiguous’ > ‘female gender-apparent’)] = “‘gender-ambiguous’ versus ‘gender-apparent’”.

Thus when they say (in their press release) that "when a man hears a female voice" such-and-such a region of his brain is activated, what they mean is that the specified region is (among the regions where) the two conditions specified in (i) are met: first, 'female gender-apparent' recordings create significantly more activation than 'male gender-apparent' recordings, and second, 'female gender-ambiguous' recordings yield significantly greater activation than 'male gender-ambiguous' recordings.
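The logic of such a conjoint contrast can be sketched in a few lines. The activation numbers below are invented purely for illustration (they are not from the paper), the helper name is hypothetical, and a plain ">" stands in for whatever per-voxel significance test the authors actually ran.

```python
# Hypothetical activation values for one brain region, invented for illustration only.
activation = {
    "female gender-apparent": 1.4,
    "male gender-apparent": 0.9,
    "female gender-ambiguous": 1.2,
    "male gender-ambiguous": 1.0,
}

def conjoint(a, b, c, d, act):
    """True when act[a] > act[b] AND act[c] > act[d]."""
    return act[a] > act[b] and act[c] > act[d]

# Contrast (i): "female versus male" requires both comparisons to hold.
female_vs_male = conjoint(
    "female gender-apparent", "male gender-apparent",
    "female gender-ambiguous", "male gender-ambiguous",
    activation,
)

# Contrast (ii): "male versus female" is the same test with the roles swapped.
male_vs_female = conjoint(
    "male gender-apparent", "female gender-apparent",
    "male gender-ambiguous", "female gender-ambiguous",
    activation,
)

print(female_vs_male, male_vs_female)
```

Note that nothing in the test itself refers to sex: any property that separates the two halves of each comparison (pitch, duration, direction of rate change) would pass it equally well.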

But there are some other descriptions you could give of that set of conditions. For example, you could say that these are the brain regions that respond more to higher-pitched speech than to lower-pitched speech; and for speech in a medium pitch range, respond more to recordings that have been slowed down to reach that level than to recordings that have been speeded up to reach that level. Or perhaps, respond more to phrases that are longer in duration than to phrases that are shorter in duration. This last is not a trivial issue, especially since the subjects were listening to the stimuli against the background of scanner noise, which is roughly like being in a boiler factory inside one of the boilers. (It's possible to arrange the scanning acquisition so that audio stimuli are played in silent intervals, but that was not done in this experiment). So higher-pitch or longer-duration stimuli will probably be more acoustically salient, especially in this very noisy environment, and therefore might show increased auditory activation, quite apart from any sexuality judgments. And lower-pitch or shorter-duration stimuli will be harder to hear, and therefore might engage some additional attention-focusing mechanisms, again apart from any sexuality judgments.

Whatever the reasons, their results were these:

Conjoint contrast	Brain region
“Female vs. male”	Right anterior superior temporal gyrus
“Male vs. female”	Right precuneus
“‘Gender-apparent’ vs. ‘gender-ambiguous’”	Posterior superior temporal plane contiguous with inferior parietal lobule
“‘Gender-ambiguous’ vs. ‘gender-apparent’”	Right anterior cingulate gyrus

So as I said, I'm really puzzled about how to think about what these results mean. Whatever is going on, though, there's nothing in their results to stand behind statements like "[t]he female voice is actually more complex than the male voice, due to differences in the size and shape of the vocal cords and larynx between women and men", as the Sheffield press release asserts.

And the same press release says that "when a man hears a female voice the auditory section of his brain is activated, which analyses the different sounds in order to 'read' the voice and determine the auditory face" -- are we supposed to conclude that males hear male voices in a way that by-passes the auditory cortex? Well, they go on to say that "[w]hen men hear a male voice the part of the brain that processes the information is towards the back of the brain and is colloquially known as the 'mind's eye'. This is the part of the brain where people compare their experiences to themselves, so the man is comparing his own voice to the new voice to determine gender."

But even if their conjoint contrast (ii) is really male-vs-female and not lower-pitch-and-shorter-phrases-vs.-higher-pitch-and-longer-phrases (and similarly for the other three contrasts), the results are still not about males-hearing-sex-identified-voices. They're (at best) about males-hearing-males after you subtract out everything this has in common with males-hearing-females; and males-hearing-females after you subtract out everything this condition has in common with males-hearing-males. And because they don't have any data on females-hearing-males vs. females-hearing-females (or females-hearing-lower-pitch-and-shorter-phrases, and so on), interpretations in terms of "people comparing their experiences to themselves" are at best highly speculative.

Unless I'm missing something, it seems to me that the increased STG activation in their condition (i) -- which they explain as males hearing females in areas adjacent to the auditory cortex -- might just as well be explained as subjects responding to acoustically more salient stimuli (higher pitch or longer duration) with more activation in acoustically-specialized areas of the brain. As for the increased precuneus activation in their condition (ii) -- which they explain as males responding to males by self-comparison in "the mind's eye" -- the precuneus (a structure in the parietal lobe) has been implicated in all sorts of things, from representation of the visual periphery to motor imagery of finger movement, with some stuff about attention along the way, so that I'd think you might just as plausibly explain this effect in terms of subjects attending more closely to acoustically less salient stimuli in a noisy environment, while thinking harder (or for a longer time) about which button to press to register the perceived sex of each stimulus.

The journal article starts out with some statistics about auditory verbal hallucinations in schizophrenia -- "The voices of AVHs are perceived as male 71% (and female 23%) of the time irrespective of the patient’s gender. The characteristics of the voices of AVHs are also commonly middle-aged, external to the person, right-lateralised, ‘‘BBC newsreader’’ accent in quality and derogatory in content (Nayani and David, 1996)." This is interesting, but I'm not convinced that the fMRI findings help us to understand this, especially the middle-aged, BBC newsreader, derogatory parts, which are properties totally orthogonal to anything in the experiments.

And as for the Rorschach-blot reactions in the popular press and the blogs, about how this explains why men have a hard time paying attention to women, or why women's speech is more valuable, or why men and women often fail to communicate... Well, what's responsible for these responses is not the STG or the precuneus, it's the limbic system. When people have strong and complex feelings about a topic, research results become a screen for them to project their preconceptions onto.

Posted by Mark Liberman at 12:20 AM

August 10, 2005

Chimwiini exemplified, tax perceived, uncertainty admired

A few days ago, the mail brought me a copy of The Chimwiini Lexicon Exemplified, by Charles W. Kisseberth and Mohammad Imam Abasheikh, which is no. 45 in the Asian and African Lexicon series published by the Research Institute for Languages and Cultures of Asia and Africa, of the Tokyo University of Foreign Studies.

The book came in a cardboard package without any stamps or any other indication of a specific amount of postage having been paid. The upper right corner of the package is blank, but in the top center there is a circular stamping like a postal cancellation, which (after being de-circled) reads:

BUREAU DE POSTE

MUSASHIFUCHU
JAPON

TAXE PERÇUE

I didn't know that percevoir can mean "to receive (payment)" as well as "to perceive" or "to comprehend". Taxe perçue is a charming expression, as if it's enough for the Japanese postal authorities that TUFS has perceived its financial obligation in this matter. Or perhaps, since the perceiving agent is unspecified, it's only someone in the Musashifuchu tax office who perceived it? In any case, despite Shintaro Ishihara's insults, the Japanese government is apparently still using French to let the rest of the world know that adequate tax-perception has occurred. And the fact that the tax was perceived by someone in Japan, and thus noted in French, was also enough to persuade the U.S. Post Office to deliver the book to my office in Philadelphia.

Chimwiini is a Somali dialect of Swahili. Ethnologue specifies

Region: The Mwini live in Baraawe (Brava), Lower Shabeelle, and were scattered in cities and towns of southern Somalia. Most have fled to Kenya because of the civil war. The Bajun live in Kismaayo District and the neighboring coast.

Dialects: Mwini (Mwiini, Chimwiini, Af-Chimwiini, Barwaani, Bravanese), Bajuni (Kibajuni, Bajun, Af-Bajuun, Mbalazi, Chimbalazi).

Comments: Reported to have come centuries ago from Zanzibar. Mwini: artisans (leather goods); Bajun: fishermen.

According to Kisseberth and Abasheikh's Preface,

Chimwiini is a dialect of Kiswahili which has, for some centuries, been spoken in the town of Mwiini (generally known as Brava or Barawa) in southern Somalia. Brava was at one time not the only location in Somalia where forms of Kiswahili were spoken. Historical evidence shows that some centuries ago, Kiswahili was spoken at least as far north as Mogadisho. The Somali language eventually displaced Kiswahili in Somalia except for Brava. The people of Brava (numbering roughly 10,000 in the early 1970's, according to MIA's estimate) somehow resisted the Somali language hegemony. Civil war and the political chaos in Somalia in the first part of the 1990's have apparently led to the dispersal of the population of Brava, with many people currently refugees in Kenya or further afield. The present outlook for the language's continued existence looks bleak indeed.

While Chimwiini is a dialect of Kiswahili, its differences from Kiswahili in phonology (especially in the prosodic features of length and accent), morphology, and lexicon (due in large part to the significant influence of Somali) warrant detailed study of all aspects of its structure.

The preface also explains what the authors mean by "exemplified":

This volume attempts (a) to document the lexicon of Chimwiini and (b) to exemplify the morphological, phrasal and sentential patterns of the language as fully as possible given the limitations of our research. ... The examples include single words, phrases and sentences. From the point of view of a purely lexical study, the examples are often redundant (i.e. do not provide new information about the meaning or use of the item in question). They do, however, serve the purpose of richly documenting a little studied, endangered language.

Although the authors have been working on this project on and off since 1973 (among many other activities for both of them), and although one of the authors is a native speaker of Chimwiini, the work often admits to uncertainty, in some cases about fairly basic things. For example, one of the exemplifications given for ma-haba "love, affection" is

wa'ishiize pamooyi ka mapeenḏo na mahabbá they lived together in love and affection (phon. This item was recorded with gemination, but the precise status of gemination in the language is not easy to determine -- is it entirely stylistic? is it a combination of both stylistics and lexicon? in the case of borrowed words, what is the relevance of gemination in the source language to its treatment in Mw.?)

I think this frankness about scholarly uncertainty is refreshing and praiseworthy.

Most of the uncertainties are more local, like about the relationship of words to possible cognates in Kiswahili or Somali. There are also quite a few entries whose gloss is "[unfortunately we did not obtain a gloss for this item]", and notes like this one, for a word glossed "at a hotel":

(Note: Doubtless the basic form of this noun exists in Mw., but we only recorded the locative form and thus now cannot be certain about what the correct vowel quantity is in the basic form.)

The work is full of helpful little grammatical notes, as in the entry for iḻa "defect", whose exemplification includes

[numba yaa wé/ nt^hukiingilá/ híiwi/ iḻáye] [prov.] the house that you/ have not entered/ you cannot know/ its defects (notice that a negative relative verb does not end in o but rather a)

And sometimes the grammatical notes are not so little. My favorite part is the discussion, scattered throughout the work, of the interaction of lexical, morphological and phrasal factors in determining which syllables are accented. Describing the development of the authors' ideas about this aspect of the language, the preface says:

Lexical items are characterized by penultimate accent in the unmarked case, but there are morphosyntactic factors that trigger ultimate accent. The principles governing vowel length and accent are critically dependent on the parsing of sentences into "prosodic phrases" [=PP]. Whether a vowel can be long depends on its position in the PP; whether a vowel is accented or not depends on its position in the PP. ...

In principle, we would have liked to record each and every example in what we might call a narrow transcription. That is, we would have liked to indicate whether a given vowel was long or short, accented or not, and how each example is organized into prosodic phrases. Many of the examples in this book are in fact given in such a narrow transcription. These examples can be recognized as follows: there is a left bracket ("[") at the beginning of the example and a right bracket ("]") at the end; the right edge of all phrases except the last is marked by a slash ("/"); short vowels are written with a single symbol (e.g. a) while long vowels are written double (e.g. aa); and accented vowels are written with an acute accent mark over them (e.g. á) while unaccented vowels have no accent mark.

While we would have liked to always give a narrow transcription, this has not been possible. Unfortunately, at the time when the data was collected, we did not fully understand the accentual system. While we made an attempt to accurately transcribe the vowel length facts of every example we collected (and believe that our observations in that regard are generally accurate), we could not mark the accent fully. ...

Recently, we have achieved a much better understanding of accent, and armed with that understanding, it is possible to re-examine material that was tape-recorded and assign such material a narrow transcription. It is also possible to return to many examples that we collected (but did not tape record) and assign them an accentual structure and a PP-phrasing that is undoubtedly correct. But there are various reasons why this is not always possible ...

So they use three other kinds of transcription: what they call a broad transcription, which indicates vowel length and accent to the extent that they are sure of them, and prosodic phrasing to the extent that it is determined by that information; what they call a phrasing-free transcription, which indicates some vowel length and accent position but does not attempt to mark any prosodic phrase boundaries; and what they call a prosody-free transcription, which makes no attempt to mark vowel length, accent or prosodic phrasing. Prosody-free transcriptions are required in cases where they were given examples in written form by others, or where examples are known to them only from song recordings (from which vowel length and accent can't reliably be determined in this language).
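The narrow-transcription conventions quoted above (brackets around the example, slashes at phrase edges, doubled letters for long vowels, acute accents for accented vowels) are regular enough to read mechanically. Here is a rough sketch of such a reader, tried on the proverb cited earlier; the function names are hypothetical, and the sketch assumes a five-vowel inventory written with the plain letters a, e, i, o, u, which is my simplification rather than anything the authors state.

```python
import re
import unicodedata

ACUTE = "\u0301"  # combining acute accent, as it appears after NFD decomposition

def parse_narrow(example):
    """Split a bracketed narrow transcription into its prosodic phrases."""
    text = example.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("narrow transcriptions are bracketed")
    return [p.strip() for p in text[1:-1].split("/")]

def describe(phrase):
    """Count accented vowels and long (doubled) vowels in one phrase."""
    decomposed = unicodedata.normalize("NFD", phrase)
    accents = decomposed.count(ACUTE)
    # Drop all combining marks so that "íi" is seen as the long vowel "ii".
    bare = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Assumption: a long vowel is the same plain vowel letter written twice.
    long_vowels = re.findall(r"([aeiou])\1", bare)
    return {"accents": accents, "long": len(long_vowels)}

example = "[numba yaa wé/ nt^hukiingilá/ híiwi/ iḻáye]"
for phrase in parse_narrow(example):
    print(phrase, describe(phrase))
```

A broad, phrasing-free or prosody-free transcription would simply lack some of these marks, so a reader like this could report what is missing but could not supply it.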

Anyone who has worked on an undocumented language or dialect will be familiar with this kind of situation. In fact, any honest observer who has worked on even the most extensively documented speech communities will recognize the sort of thing that they are writing about. For example, in the work recently sketched here on the pronunciation of the and a in English, there are some transcriptional uncertainties that are quite similar to the sorts of uncertainty that Kisseberth and Abasheikh discuss so frankly with respect to the Chimwiini examples. I'll pick this up again in another post.

[Update: Steve from Language Hat checked the OED, as I neglected to do, and discovered that English used to have the same sense of "take into possession" for perceive. This makes perfect sense, since the Latin root meant "to take"...:

II. To take into possession. Cf. L. percipere, F. percevoir, in lit. sense, from L. capere to take.

8. trans. To receive (rents, profits, dues, etc.).

1382 WYCLIF Tobit xiv. 15 Al the eritage of the hous of Raguel he perceyvede [Vulg. percepit]. 1472-3 Rolls of Parlt. VI. 4/2 Every of the seid men Archers, to have and perceyve vid. by the day oonly. 1512 Knaresb. Wills (Surtees) I. 4, I will that my forsaid doghters have and persaive all the revenieuse. 1596 BACON Max. & Use Com. Law I. xx. (1636) 73. 1625 Concession to Sir F. Crane in Rymer Fædera XVIII. 60 To have, houlde, perceive, receive and take the said annuitie or yeerely pension of two thousand pounds.

b. in gen. sense: To receive, get, obtain. Obs.

1482 Monk of Evesham (Arb.) 75 Gretely merueylde why he yat was so honeste of leuyng..had not yette perceiuyd fully reste and ioye. 1540-54 CROKE Ps. (Percy Soc.) 19 Full spedely let me obteyne Thy socoure, and perceyue the same. 1591 SHAKES. Two Gent. I. i. 144 Pro. Why? could'st thou perceiue so much from her? Sp. Sir, I could perceiue nothing at all from her; No, not so much as a ducket for deliuering your letter. 1748 J. NORTON Redeemed Captive (1870) 22 Mrs. Smeed was as wet.. but through the good providence of God, she never perceived any harm by it.

]

Posted by Mark Liberman at 10:43 PM

Disambiguating

Prescriptive grammarians routinely disparage innovative usages as introducing ambiguities: speaker-oriented hopefully, logical rather than temporal since and while, and on and on. Non-standard usages, like multiple negation, are sometimes attacked on the same grounds. Yet everyday language (even in conservative and standard varieties) is jam-packed with ambiguity, not all of it easily resolved in context. We end up having to ask whether someone meant 'spicy hot' or 'hot in temperature', 'funny-ha-ha' or 'funny-peculiar', 'just now' or 'just-only', etc.

Non-standard varieties not infrequently have usages that help to disambiguate; the choice in AAVE between a tensed copula ("They are sick"), the zero copula ("They sick"), and invariant be ("They be sick") is a famous case in point. This morning the New York Times (8/10/05, p. A15) provided another example, having to do with the ambiguity of have 'own, possess' vs. 'have on/with one'.

The example comes in Michael Winerip's "On Education" column, "Essays in Search of Happy Endings", about teachers and students in the dysfunctional setting of Locke High School in Los Angeles:

They were supposed to do a half-hour of silent reading and write about it, but only a handful brought books. The rest... were allowed to write an essay on why it's important to bring your book. "If I write, 'I ain't got it; that's why I don't got it,' is that worth points?" asked one of three boys who taunted the young teacher the entire two hours.

The relevant bit is the student's quoted sentence, 'I ain't got it; that's why I don't got it', in which the 'own, possess' sense is conveyed by negation with ain't, while the 'have on/with one' sense is conveyed by negation with don't. The student could have said, "If I write, 'I don't have it; that's why I don't have it'...", but that would have been just baffling. The student could have said, "If I write, 'I don't own it; that's why I don't have it with me'...", and that would have more or less worked (though own isn't quite the right verb here, since students don't usually buy their books, but have them issued to them). What the student did say was both clear and succinct (brevity is also a much-touted virtue), though seriously non-standard.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:30 PM

August 09, 2005

Under the constitution and not over it

Can you guess who delivered this speech, and when?

"The [Supreme] Court has been acting not as a judicial body, but as a policy-making body. ... The Court in addition to the proper use of its judicial functions, has improperly set itself up as a third house of the Congress, a super-legislature, as one of the justices has called it, reading into the Constitution words and implications which are not there, and which were never intended to be there.

We have, therefore, reached the point as a nation where we must take action to save the Constitution from the Court and the Court from itself. ... We want a Supreme Court which will do justice under the Constitution and not over it.

I want - as all Americans want - an independent judiciary as proposed by the framers of the Constitution. That means a Supreme Court that will enforce the constitution as written, that will refuse to amend the constitution by the arbitrary exercise of judicial power. ... I will appoint Justices who will act as Justices and not as legislators.

During the last half century the balance of power between the three great branches of the Federal Government, has been tipped out of balance by the Courts, in direct contradiction of the high purposes of the framers of the Constitution. It is my purpose to restore that balance."

Was this Ronald Reagan explaining why he nominated Robert Bork? Was it George W. Bush during the 2004 campaign, describing his philosophy on judicial appointments? Or was it Rick Santorum, explaining why he's decided to run for president in 2008?

No, it was Franklin Delano Roosevelt, in his ninth Fireside Chat, "On the Reorganization of the Judiciary", delivered on March 9, 1937.

A (rather errorful) transcript is here; a streaming Real Audio version is here, and a downloadable mp3 here.

My corrected transcript is here. I've fixed a variety of omissions, insertions and substitutions; divided the speech into breath-group-sized phrases; and noted the pronunciation of the indefinite article "a", with reduced forms ("uh", IPA [ə]) in blue and unreduced forms ("ay", IPA [ej]) in red.

This being Language Log, the pronunciation was what motivated me to listen to this speech. It's another data point in the on-going saga of article unreduction. In this 34-minute speech, FDR almost exactly splits the difference -- 41 of his a's are reduced and 40 unreduced.
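To make "almost exactly splits the difference" concrete, here is a minimal sketch of the arithmetic, using only the two counts given above (41 reduced, 40 unreduced). The variable and function names are illustrative, not taken from the transcript or from any tool used to prepare it.

```python
# Counts reported in the post for FDR's ninth Fireside Chat.
reduced = 41      # "uh" pronunciations of the article "a"
unreduced = 40    # "ay" pronunciations of the article "a"


def share(count: int, total: int) -> float:
    """Return count as a fraction of total."""
    return count / total


total = reduced + unreduced
reduced_share = share(reduced, total)
unreduced_share = share(unreduced, total)

# Print the two shares as percentages with one decimal place.
print(f"{total} tokens: {reduced_share:.1%} reduced, {unreduced_share:.1%} unreduced")
```

With 81 tokens in all, each pronunciation accounts for very nearly half, which is what makes the distribution hard to explain by a simple default.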

If you look over the transcript and listen to the audio, I think you'll find that it's not trivial to predict where unreduction will strike. It's clearly not a marker of disfluency, but it doesn't always seem to be a phonetic hi-liter either. For example, FDR makes a contrast in which the first a is unreduced while the second one is reduced:

The Court has been acting not as a judicial body, but as a policy-making body. (audio link)

In this case, there's no pause after the unreduced a, or anywhere else in the phrase "a judicial body", but he does pause in "a policy-making ___ body".

A little later he gives two phrases in apposition where the first has a reduced a and the second an unreduced one:

The Court in addition to the proper use of its judicial functions
has improperly set itself up as a third house of the Congress
a super-legislature,
as one of the justices has called it (audio link)

In this case, he pauses in the middle of the phrase with the unreduced a: "a super ___ legislature".

And a bit later, there's a list where the first two instances of a in "a Chief Justice" are reduced while the third one is unreduced:

President Taft appointed five members
and named a Chief Justice;
President Wilson, three;
President Harding, four,
including a Chief Justice;
President Coolidge, one;
President Hoover, three,
including a Chief Justice. (audio link)

Go figure. Anyhow, Chris Waigl and I are still gathering data on this phenomenon -- you may have noted some interesting pronunciations of the in these FDR audio clips as well -- and you'll hear more from us on this over time. I'll admit, though, that I've posted about this speech because I thought the content was an interesting counterpoint to the current debate over judicial philosophy. I learned in high school about Roosevelt's attempt to add six justices to the Supreme Court, in order to overcome judicial resistance to his legislative agenda. I didn't know that he used this "courts should not legislate" rhetoric, though of course it makes perfect sense.

Posted by Mark Liberman at 02:08 AM

August 07, 2005

Just between Dr. Language and I

Stanford student Tommy Grano's been looking at pronoun case usage in English, especially in coordination. This has led him to the "Dr. Language" column on yourdictionary.com, specifically to a piece entitled "Are you and I you and me?":

The piece retails the standard hypercorrection story for "between you and I" and similar expressions, and in addition locates this hypercorrection as quite recent -- so recent that it could be nipped in the bud by quick action. Possibly, Dr. Language got this idea from James Cochrane's annoying Between You and I; on p. 14, Cochrane says: "This oddity, which seems to have emerged only in the last twenty or so years, presumably arises from a feeling of discomfort about using the word me, a sense that it is somehow impolite or 'uneducated.' "

Well, they're both wrong, pretty spectacularly, though Dr. Language's discussion has some amusement value. If only they'd thought to consult some standard sources or look at some facts, they might not have fallen into error and spread this error to their readers. Instead, they depend entirely on their subjective impressions about the facts of English usage -- impressions that are very likely to be skewed in systematic ways.

From Dr. Language's column:

... The result is that prescriptive grammar books used in U.S. schools for years have taught children to avoid constructions like "me and X" in favor of "X and I," where "X" represents any other noun or pronoun referring to a human being. They seldom make clear that this rule applies only in the Subject position. The critical grammatical rule, that "I" appears only in the Subject while "me" must be used in all Object positions gets lost in the concern for etiquette.

Young people in the U. S. have been so exposed to this oversimplified explanation of the "me-and-you" problem, that about 20 years ago U.S. English-speakers began switching "me and X" to "X and I" everywhere the phrase occurs -- in Subject and Object positions. When actors and others on TV and radio began speaking with this error, it spread like wildfire.

However, since yourDictionary.com has caught this speech error in its early stages, it is possible to stop its spread. The prescription is simple: first, we must all stop making the error. Second, we must make sure that when we, as teachers and parents, correct "me-and-you" problem [sic], we keep in mind that it is a dual error: the grammatical error of using the Object form of "I" in the Subject position and a point of etiquette that is at best optional. It is crucial that everyone understands that changing the "me" to "I" is restricted to Subject position.

... Keep the following mnemonic sentence in mind: "I" am the Subject but the Object is "me." There are no exceptions. Join yourDictionary in the fight to nip this linguistic virus in the bud!

Now for my rant. Why do people (like Cochrane and Dr. Language) who propose to offer authoritative advice to educated people not use standard sources of information? ("You could look it up", as Casey Stengel is reported to have said, with reference to his claim that most people his age were dead.) A quick trip to the OED would show a longer and more complicated history, and the MWDEU entry on "between you and I" would be a real eye-opener. The facts look complex, but it's safe to say that the rise of "between you and I" in Late Modern English goes back at least 150 or 160 years, not 20; earlier uses go back about 400 years. There's no way it can be blamed on modern education, as John Simon suggested in 1980 (see MWDEU), unless Simon was just playing with different senses of "modern".

In any case, we have here another instance of the Recency Illusion, the belief that things YOU have noticed only recently are in fact recent. This is a selective attention effect. Your impressions are simply not to be trusted; you have to check the facts. Again and again -- retro not, double is, speaker-oriented hopefully, split infinitives, etc. -- the phenomena turn out to have been around, with some frequency, for very much longer than you think. It's not just Kids These Days.

Professional linguists can be as subject to the Recency Illusion as anyone else. Charles Hockett wrote in 1958 (A Course in Modern Linguistics, p. 428) about "the recent colloquial pattern I'm going home and eat", what Laura Staum has been investigating under the name (due to me) the GoToGo construction. Here's an example I overheard in a Palo Alto restaurant 8/6/05: "I'm goin' out there and sleep in the tent." But Hockett's belief that the construction was recent in 1958 is just wrong; David Denison, at Manchester, has collected examples from roughly 30 years before that.

Another selective attention effect, which tends to accompany the Recency Illusion, is the Frequency Illusion: once you've noticed a phenomenon, you think it happens a whole lot, even "all the time". Your estimates of frequency are likely to be skewed by your noticing nearly every occurrence that comes past you. People who are reflective about language -- professional linguists, people who set themselves up as authorities on language, and ordinary people who are simply interested in language -- are especially prone to the Frequency Illusion.

Here at Stanford we have a group working on innovative uses of all, especially the quotative use, as in the song title "I'm like 'yeah' and she's all 'no'". The members of the group believed that quotative all was very common these days in the speech of the young, especially young women in California, and the undergraduates working on the project reported that they had friends who used it "all the time". But in fact, when the undergrads engage these friends in (lengthy) conversation, tape the conversations, transcribe them, and then extract occurrences of quotatives, the frequency of quotative all is very low (quotative like is really really big). There are several interpretations for this annoying finding, but we're inclined to think that part of it is the Frequency Illusion on our part.
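The "transcribe, then extract and count" step can be sketched roughly as follows. This is only an illustration of the idea, not the group's actual procedure: the transcript lines are made up, and the regular expression (a form of be, then like or all, then an opening quotation mark) is an assumed and deliberately crude definition of a quotative.

```python
import re
from collections import Counter

# Made-up transcript lines, for illustration only (not project data).
transcript = [
    "and I'm like 'no way'",
    "she's all 'yeah right'",
    "he was like 'whatever'",
    "so I said 'fine'",
    "and they're like 'come on'",
]

# Assumed pattern: a form of "be", then "like" or "all", then an opening quote.
QUOTATIVE = re.compile(
    r"\b(?:i'm|you're|he's|she's|we're|they're|am|is|are|was|were)"
    r"\s+(like|all)\s+['\"]",
    re.IGNORECASE,
)


def count_quotatives(lines):
    """Tally quotative 'like' and 'all' across a list of transcript lines."""
    counts = Counter()
    for line in lines:
        for match in QUOTATIVE.finditer(line):
            counts[match.group(1).lower()] += 1
    return counts


counts = count_quotatives(transcript)
for form, n in counts.most_common():
    print(f"quotative {form}: {n}")
```

The point of counting this way is that every token is tallied, not just the ones that happen to catch the listener's ear, which is exactly where impressions and counts can come apart.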

Nominative coordinate objects are also a lot less frequent than you might have thought, according to Grano's searches through several types of corpora.

Of course, sometimes your off-the-cuff frequency estimates are right. Quotative like really IS incredibly common for some speakers. Double is -- "The thing is is that we've gotta go" -- really IS incredibly common for some speakers; I've come across one speaker who appears to use it very close to categorically, producing an extra form of be in virtually every place he could. But the point is that you actually have to look at the facts; your impressions are unreliable.

People like Dr. Language are just too lazy to look it up in reference works (so they fall into the Recency Illusion) or to look at the facts (so they fall into the Frequency Illusion). They just go on their seat-of-the-pants guesses; don't confuse me with facts. And so they spread error. And on top of that, some of them make reputations and actually earn money doing this.

[This is a lightly edited version of a posting to the American Dialect Society mailing list, 8/7/05.]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:48 PM

Never thought the day would he see

Once again Doonesbury has Zonker's surfing master talking in what is supposed to be Yoda's syntax, and as Mark noted here a couple of months ago, Garry Trudeau has no real idea of what the syntactic characteristics are meant to be. At one point the master says "Never thought the day would I see." This is meant to mean "I never thought I would see the day [when beach access for surfers at Malibu was finally restored]." He has the direct object of see in the subordinate clause fronted, and also subject-auxiliary inversion in that clause (the auxiliary would is positioned before the subject I), but in the main clause the verb precedes its complement and the subject is missing... Even Yoda would be surprised at this syntax, I think. People who try to write Yodic seem to imagine that if you just sling the words around a bit in random directions, that will count. The real Yoda from the Star Wars scripts has somewhat more in the way of syntactic regularities: the basic sequencing principle for clause constituents is Complement-Subject-Verb (which if applied here would yield "See this day I would, never [I] thought", or if applied only to the subordinate clause, "Never [I] thought see this day I would"). However, Mark found that extending the data under consideration (he looked at all of Yoda's utterances in episode 3) made the syntactic situation less clear rather than clearer. In real natural languages, looking at a larger quantity of data generally makes it clearer what the grammatical principles are. If that wasn't so, linguistics as a field would not exist.
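The Complement-Subject-Verb principle described above can be sketched as a simple reordering of three clause constituents. This is a toy under stated assumptions: the Clause type and its field names are invented for the illustration, the clause is split into constituents by hand, and nothing here attempts to cover the messier data from the actual scripts.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    subject: str      # e.g. "I"
    verb: str         # the finite verb or auxiliary, e.g. "would"
    complement: str   # what the verb takes as its complement, e.g. "see this day"


def standard_order(clause: Clause) -> str:
    """Ordinary English order: Subject-Verb-Complement."""
    return f"{clause.subject} {clause.verb} {clause.complement}"


def yodic_order(clause: Clause) -> str:
    """Complement-Subject-Verb order, with the first letter capitalized."""
    text = f"{clause.complement} {clause.subject} {clause.verb}"
    return text[0].upper() + text[1:]


subordinate = Clause(subject="I", verb="would", complement="see this day")
print(standard_order(subordinate))  # I would see this day
print(yodic_order(subordinate))     # See this day I would
```

A rule this regular is what the cartoon's "Never thought the day would I see" fails to follow: no single reordering of subject, verb and complement produces it.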

Posted by Geoffrey K. Pullum at 01:03 PM

August 06, 2005

Contextual tautologies

I'm not the only Language Log contributor who's a fan of Rob Balder's wonderful PartiallyClips, a cartoon strip made by the very simple device of adding speech balloons to an uncopyrighted clip art picture repeated three times. This recent strip is about language and logic.

Rob's male character is wrong about paradoxes (looking up the right definition is left as an exercise for the reader); but he is right that sometimes the effect an utterer's speech activity has on the context can affect the possible truth values of a statement. "I am now moving my lips", which contains four bilabial consonants, is another example: the moment you say it, it becomes true.

Not logically true, though; contingently true in virtue of a property that the context of utterance picks up because of what you are doing. It might be called (if you want a term for it) a contextual tautology.

Posted by Geoffrey K. Pullum at 08:46 PM

Science vs. semantics

A letter in the New York Times of 8/6/05 (p. A26) opposes science and semantics in a way that will strike linguists (who are already sufficiently annoyed by people who say "but that's all just semantics", as if the meanings of words and constructions weren't important) as very odd. It turns out that there is an important distinction here, between (to use technical language) technical language and ordinary language, which in ordinary language is sometimes referred to as a distinction between science and semantics. This is confusing, but I'm not sure what linguists can do about it, any more than I know how to untangle the thicket of meanings and uses surrounding the word gender.

Here's the letter, from Richard P. Binzel, a professor of planetary science at M.I.T., about whether the newly discovered celestial body orbiting the sun, beyond Pluto and somewhat larger than it, should be counted as a tenth planet (or whether Pluto should be demoted from planetary status):

    Re "Too Many Planets Numb the Mind" (editorial, Aug. 2 [suggesting the demotion of Pluto]):
    There is great difficulty in reaching a scientific consensus on defining a "planet" because this is not a scientific question. It is a question of semantics.
    A semantic solution is best reached using a historical context. Pluto as the "ninth planet" for eight decades sets the historical precedent for what size should serve as the dividing line for planetary status.
    You imply that Pluto's planetary status is a mistake. Modern trends toward inclusiveness across society argue differently. To exclude Pluto as a planet would be a mistake.
    Ten planets or more is a terrifically exciting and inspiring prospect worthy of expanding the mind.

What we see here is a privileging of science -- and scientific language -- over ordinary language, which for non-linguists is just "language", "language" having "semantics" for its words. On this view, the terminology of scientific (and other technical) disciplines gets its meaning from the categories of nature; science "carves nature at its/the/her joints", as they say. (I'm having some trouble finding out who first put it this way. Philosophers regularly put the expression in quotation marks, but they also regularly fail to cite a source.) Meanwhile, in plain-ol' "language", what words mean -- their "semantics" -- is a matter of convention, mostly arrived at through common practice, so that historical precedent is a relevant consideration. (I am reporting on this view, not necessarily advocating it.)

[E-mail has now rushed in with sources for the carving-nature image: Plato's Phaedrus 265d-266a (of course it goes back to Plato!). Thanks to, so far, Aaron Boyden and Jamie Dreier; and to John Lawler, who supplied an echo from Chuang Tzu.]

But in fact, both in technical language and in ordinary language, we have words and meanings. The meanings of technical terms are also matters of convention -- explicit convention, rather than implicit as in ordinary language. We also think, or hope, that we've fixed on the "right" set of scientific concepts, so that, in combination with a set of hypotheses about their relationships, they will allow us to explain and predict phenomena.

What's at issue for Binzel -- correctly, I think, no matter how much I cavil at his wording -- is whether the CONCEPT of a planet (referred to by a technical term planet) plays a role in scientific theories, or whether planet is only a word of ordinary language. (It could, of course, in some sense be both. That is, the same phonology and orthography could be used differently in the two domains -- usually because ordinary-language words were borrowed as technical terms, with different meanings: fruit, herb, bug, force, mass, element, class, group, and so on.) Binzel in effect claims that there's no scientific theory in which planets play a role; nothing of scientific significance would follow from the classification of the new celestial object as a planet, or from its classification as something else, for that matter. He's saying that planet is, nevertheless, a useful word of ordinary language, and we can discuss whether older conventions of ordinary language should take in this new object.

On this question, there are arguments on all sides: for conservatism (many object to abandoning material memorized in school) or for generosity (Binzel's option, with its out-of-left-field appeal to a wider notion of social inclusiveness as well as to the unsurprising size criterion) or for retrenchment (the NYT's suggestion, using the chemical composition of celestial bodies and the nature of their orbit, as well as their size, as criterial). In the end, the people who write textbooks will probably tip the scales in favor of one usage or another. Remember, these are the folks who gave you indigo as one of the "colors of the rainbow", not to mention the label violet for purple. (They didn't originate these usages, but they sure did make them the coin of the realm.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 03:44 PM

The thug and the slut

Granted, this extended metaphor occurs in a piece that is both playful and artful. But still, but still... this is surely the summer's loopiest grammar/meaning metaphor:

The structure of language lurks below the meaning of words. Chomsky wrote, "Colorless green ideas sleep furiously." This grammatical sentence illustrates that grammar and meaning have about as much relationship to one another as strangers on a blind date. Grammar is the towny. This dude, this thug, knows the ins and outs of the place by heart. He runs the show, and he practically owns the territory. His date just blew into town. She's all fluttery in this gaudy multipart outfit she copped at various exotic bazaars and flea markets. Half the time she's got no idea what she's saying, but she's easy, in actual fact a slut willing to go along with just about anything.

Oh my.

[Priscilla Long, "Genome Tome: Twenty-three ways of looking at our ancestors" (The American Scholar, Summer 2005), p. 34, the end of section 11, "The Grammar Gene", on the innateness of grammar]

When I try to work out the details of Long's thug-and-slut metaphor, my head threatens to explode. Like, Meaning has no idea what she's saying? I would have thought that that's pretty much ALL she had an idea about. And how come Grammar's the one with all the information?

Someone should investigate the ways in which the grammar/semantics distinction is personified. Grammar is often cast as a fussy schoolteacher (a schoolmarm, in particular: Miss Fidditch) or some other kind of authority figure, a legislator or judge or priest (almost surely male). But grammar can also be seen as empty form, which on its own produces mere chatter without substance -- a female stereotype. Meaning, in contrast, is configured either as substantial and significant (so: agentive and male) or as "natural", even earthy (so: passive and female). You can get pretty much any assignment of the sexes to the two actors, Grammar and Meaning. (Though the fact that grammar almost always gets mentioned first, as in the passage from Long, suggests that it's more likely to be personified as male.)

The thug-and-slut story is, I guess, a version of the male authority figure (wielding the authority of the streets) vs. the passive, pliant female. But it's still loopy.

As a grammarian, it tickles me to see myself cast as a classic Bad Boy, the tough hoodlum with a sneer on his face (top of the world, Ma!), but then I dissolve into giggles because the picture is so far from my actual presentation of self. And I guffaw while trying to visualize the semanticists of my acquaintance -- David Beaver (of this parish), Stanley Peters, Barbara Partee, David Dowty, Angelika Kratzer, Hans Kamp, Sally McConnell-Ginet, Gennaro Chierchia, and so on -- as Women of Easy Virtue, dressed eccentrically in thrift-shop clothing and willing to go along with just about anything (they have always depended on the kindness of strangers).

I'm not quite done quoting Long, though. It gets weirder. Here's her entire section 12, "Grammar Gene Mutation":

Courtly cows dispense with diphthongs. Chocolate-covered theories crouch in corners. Corners rot uproariously. Refrigerators frig the worms. Catastrophe kisses the count of five. A statement digests its over-rehearsed rhinoceros. Bookworms excrete monogamous bunnies. Blue crud excites red ecstasy. All this during the furious sleeping of colorless green ideas.

Yes, a fresh contribution to the poetry of "colorless green ideas sleep furiously" (ca. 21,100 raw Google webhits, including its own Wikipedia entry). From which "frig" stands out. I would have written "fuck", but no doubt that wouldn't have been acceptable to The American Scholar; the kissing, excretion, and excitement to ecstasy are quite enough, thank you. (Dorothy Parker, surely apocryphally, to Norman Mailer, who was obliged to use "fug" in The Naked and the Dead: "So you're the young man who can't spell fuck?")

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:50 PM

A field guide to grammar

I swear, I'm not one of those people who thinks that Western Civilization is entering its Last Days. At least, not in general. I've defended modern students, writers and others from Camille Paglia's charge that "interest in and patience with long, complex books and poems have alarmingly diminished not only among college students but college faculty in the U.S.". I've defended email and cellphone usage against shoddy pseudoscientific indictments. But there are a few areas where I'll agree that civilization has indeed been overwhelmed. In particular, when it comes to elementary usage of linguistic terminology, intellectuals have joined the general population in untroubled ignorance, and even the sacred groves of academe have been clear cut, strip mined and used as a landfill.

Here at Language Log, we've noted example after example of this. Arnold Zwicky documented another one yesterday, and it's a doozy. William Howarth, writing about Rachel Carson in The American Scholar, mis-identified progressive verbs ("eels were leaving the marshes") as "passive gerunds". The walls of the city have fallen, and some Visigothic looter, swilling cognac, complains to his companions that the scotch tastes funny. But this is not a kid sounding off on his blog, or even a journalist mis-using terminology that he doesn't care to check: William Howarth is a professor of English at Princeton University, and The American Scholar is the "literary and intellectual quarterly" of the Phi Beta Kappa society.

What to do? We could simply abandon the terminology to the vagaries of current usage. For example, we could admit that the word passive just doesn't mean "passive" any more, but has developed two new senses, one for phrases whose subject is not an agent, and another for phrases involving a form of "to be" anywhere in the vicinity. Then we would have to make up new terms, like "perispectual verboid", to replace the old ones.

But I'm not ready to give up yet. Another option is suggested by Geoff Pullum's recent post on his sighting (well, hearing) of an aux-initial clause with complex subject, and Arnold Zwicky's old post about his search for first-mention possessive antecedents. Both pieces resonate with the joy of a birdwatcher adding a rare species to his Life List.

So what we need is a new social phenomenon: verbwatching. There would be books ("The Verbs of North America", "Geoff Pullum's Field Guide to Predicative Adjuncts", "Preterites of the Central Brazos Valley"), web sites, clubs, field trips, videos, ...

Well, it could happen. Seriously, although there are excellent books on English Grammar, I don't know any that entirely solve the access problems sketched in this overview of field guides, so perhaps there is a niche for a Field Guide to English Grammar. And someday, the editors of intellectual periodicals will have learned enough from their field trips to correct a Princeton professor who submits a piece identifying robins as warblers.

[Update: Linda Seebach emails

Heh. Try Googling "passive tense."

The presence of some form of "to be" isn't even necessary. I used to be on a listserv for writers, all professionals and many highly regarded -- the person who organized it, Jon Franklin, has won two Pulitzers -- and he, as well as another list member who taught journalism, were both convinced that "she looked sad" was a passive sentence.

Well, that comes under the "subject is not an agent" heading, I guess.

For other examples, see this earlier post and those it links to.]

Posted by Mark Liberman at 12:10 PM

August 05, 2005

Tossing technical terms around

If you're going to wield technical terminology in a critical way, you really should know how to use it correctly. Case in point: William Howarth's critique of Rachel Carson's writing in Under the Sea Wind, in his article "Turning the Tide: How Rachel Carson became a woman of letters" (The American Scholar, Summer 2005, p. 46). But first, take a look at the passage from Carson, below, decide what a copy editor might challenge in it, and describe these problematic stylistic choices using appropriate technical vocabulary. Then you can compare your account with Howarth's.

As long as the tide ebbed, eels were leaving the marshes and running out to sea. Thousands passed the lighthouse that night, on the first lap of a far sea journey--all the silver eels, in fact, that the marsh contained. And as they passed through the surf and out to sea, so also they passed from human sight and almost from human knowledge.

Now here's Howarth's critique:

The flaws that a copy editor might challenge here--passive gerund ("were leaving ... and running"), pointless aside ("in fact"), and closing fragment ("And ...")--are less glaring than the dark tone, as embarkation becomes not hopeful but an emptying of the marsh womb, with the sea viewed as an alien future.

There are three technical terms (of grammar and usage) here: "passive", "gerund", and "fragment". The first two are flat wrong. Expressions like "were leaving" are progressive, not passive, and the -ing-form verbs like "leaving" in them are labeled participles in many manuals on English grammar and usage, but never, so far as I know, are they labeled gerunds, a term usually reserved for Poss-ing ("I'm tired of their complaining") and Acc-ing ("I'm tired of them complaining") constructions, and possibly action nominals ("The dissolving of parliament was a surprise") as well. Howarth was no doubt misled by the fact that the English passive ("They were left by their partners") and progressive ("They were leaving their partners") constructions share the auxiliary be, and by the fact that the things commonly labeled gerunds and participles are both uses (among a great many) of the -ing-forms of verbs. These are confusions of the sort that students stumble into when they try to memorize grammatical terminology without understanding what it's for. But they're inexcusable in a periodical published by Phi Beta Kappa.
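The surface difference between the two constructions can be sketched as a toy check on what follows the auxiliary be. This is a deliberately crude illustration, not a parser: the function name is invented, the list of past participles is a tiny hand-made sample, and the heuristic will misfire on cases like adjectival participles ("was tired") or gerund complements ("the problem is finding time").

```python
# Forms of the auxiliary "be" shared by the passive and the progressive.
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}

# A tiny hand-made sample of irregular past participles (illustrative only).
PAST_PARTICIPLES = {"left", "seen", "taken", "written"}


def classify(aux: str, verb: str) -> str:
    """Label a be + verb sequence as progressive, passive, or other."""
    aux, verb = aux.lower(), verb.lower()
    if aux not in BE_FORMS:
        return "other"
    if verb.endswith("ing"):
        # be + -ing form: progressive ("were leaving")
        return "progressive"
    if verb in PAST_PARTICIPLES or verb.endswith("ed"):
        # be + past participle: passive ("were left")
        return "passive"
    return "other"


print(classify("were", "leaving"))  # progressive
print(classify("were", "running"))  # progressive
print(classify("were", "left"))     # passive
```

Both labels depend on the form of the verb after be, not on the mere presence of be, which is the distinction the misused terminology loses.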

On the stylistic issue -- whether Carson's past progressive ("were leaving ... and running") should have been edited to a simple past ("left ... and ran") -- I think there's plenty of room for argument. Carson's choice views the leaving and running out as extending continuously throughout the ebbing of the tide, and that seems to me to be a defensible way of framing the description.

Similarly, it's not obvious to me that the aside, "in fact", is pointless. Carson is telling us that all the eels from the marsh passed the lighthouse, and that there were a great many of them. She chose to package these items separately, with the observation about size leading and the observation about totality in a parenthetical. (She could have packaged them together -- "All the thousands of the silver eels that the marsh contained passed the lighthouse that night..." -- but that would have made for a pretty topheavy sentence.) What the "in fact" does is indicate a logical relationship between the two observations: not just a lot of eels, but the whole crop. Without the "in fact", the sentence is a mere inventory of observations, only implicitly related, and this is so whether the parenthetical comes late ("Thousands passed the lighthouse that night, on the first lap of a far sea journey--all the silver eels that the marsh contained") or early ("Thousands--all the silver eels that the marsh contained--passed the lighthouse that night, on the first lap of a far sea journey"). Somewhat better is: "Thousands of silver eels--all that the marsh contained--passed the lighthouse that night, on the first lap of a far sea journey." Even better would have been leading with totality rather than size: "All the silver eels that the marsh contained--thousands of them--passed the lighthouse that night, on the first lap of a far sea journey." Simply deleting the "in fact", however, doesn't improve Carson's sentence.

But maybe what Howarth is thinking of as the aside is not just the "in fact", but the whole parenthetical. I hope not, because the information that every single damn eel in the marsh set off on the journey has a lot of surprise value.

Now, to the presumed fragment. What we're dealing with here is the presumed proscription against beginning sentences with conjunctions that Mark Liberman has discussed here, as an instance of a zombie rule, with no basis in the practice of competent writers. That is not, however, the way Howarth frames things; he says we're talking about fragments. Two questions then: (1) are fragments bad? and (2) is Carson's last sentence a fragment?

On question (1): a lot depends on who you read. If you look at advice meant for novice writers, or advice in test prep books, you'll probably find sentence fragments unconditionally deplored, but if you look at manuals for college students, you'll probably find a more nuanced warning: sentence fragments are used by good writers, but you should be aware that you're choosing them and be sure that you're getting the effect you want with them.

On question (2): again, opinions differ, but, so far as I can tell, hardly any college manuals treat sentences beginning with and, but, or so as sentence fragments; a fair number don't even mention sentence-initial and, but, and so. I hope to take up sentence-initial conjunctions in a future Language Log posting, but for the moment it's enough to point out that many authorities wouldn't treat these as starting sentence fragments, and that in any case there's no reason to edit out sentence-initial conjunctions in the work of a writer like Carson, even in her early work for publication. I don't object at all to the "And" in her last sentence above. It provides a connection to what went before, without laying too much emphasis on this connection. It marks the end of a sequence of (three) events. And it initiates a parallel "and as... so also..."

Howarth seems to have wanted to find Carson's early writing inept -- he describes it as "semi-autistic prose" and speculates that this sort of writing might have resulted from a process of heavy revision, involving both Carson and her mother -- but in his zeal to fault the early Carson he runs off the rails himself.

By the way, a fair number of advice givers -- the famous Strunk & White among them -- would deprecate Howarth's conjoining of an AdjP with a NP in "becomes not hopeful but an emptying of the marsh womb". Some might find expressions like "marsh womb" and "alien future" to be too ostentatiously poetic for comfort. And what the hell is "semi-autistic prose", anyway?

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 08:31 PM

One world, how many bytes?

Victor Mair sent me some interesting observations about the slogan for the 2008 Beijing Olympics. The English version is "One World, One Dream", while the Chinese version is "tong2 yi1 ge shi4jie4, tong2 yi1 ge meng4xiang3" in pinyin, or 同一个世界 , 同一个梦想 in simplified characters, or 同一個世界 , 同一個夢想 in traditional characters.

Victor has interesting things to say about the source of the slogan (it was devised in English and then translated into Mandarin), the slogan's division into words, the history of the words, and so on. But early in his note he makes a quantitative comparison

Mandarin:	10 syllables, 8 words	75 pen(cil) strokes (traditional) / 58 (simplified)
English:	4 syllables, 4 words	approximately 25 pen(cil) strokes

and asks (what I take to be) a rhetorical question about it:

In cybernetic / IT terms, which is more economical? This is NOT even taking into account that there are only 26 letters of the alphabet to deal with, in contrast to at least 26,000 characters that have to be separately considered when determining memory size.

Now, to a first approximation, I reckon that the cost of text storage is now zero -- a compressed copy of all the text I've ever written is roughly the size of one high resolution digital photograph -- and so the answer to Victor's question may not matter very much, since a mere factor of two or three hardly matters in a situation where pictures, audio and video consume many orders of magnitude more storage than text does. However, I was still curious about the facts of the matter in a larger sample than just this one slogan. So I turned to the LDC's catalog of Chinese/English parallel text.

One of our offerings in that area is a body of United Nations documents. There are 7,070 pairs of documents. I believe that in most cases, the documents were written in English and then translated into Chinese. These are essentially plain text documents -- no mark-up -- and they are not compressed. The Chinese is GB encoded. The total byte counts are:

Chinese: 54,640,469
English: 123,301,197

Now, the fact that English puts spaces between words, while Chinese does not, accounts for some of this difference. But in any case, the direction of the difference in bytes is the opposite of Victor's counts of syllables, words, strokes etc., and the magnitude of the difference is a factor of about 2.26.

In the Olympic slogan, the Chinese version is 11 characters, including the comma. Even encoded as 2 bytes per character, that's only 22 bytes. The English version is 20 characters -- at one byte per character, that's 20 bytes. This suggests that the slogan is not typical of other material.
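For anyone who wants to check the slogan arithmetic, here is a minimal sketch in Python. It assumes that the Chinese comma is the full-width "，" (U+FF0C), which GB2312 stores in two bytes like any other character in the string; the variable names are mine, purely for illustration.

```python
# Byte counts for the two versions of the Olympic slogan.
# Assumption: the Chinese comma is the full-width form (U+FF0C), which
# GB2312 encodes in two bytes, like each Chinese character.
chinese = "同一个世界，同一个梦想"
english = "One World, One Dream"

chinese_chars = len(chinese)                      # characters, comma included
chinese_gb_bytes = len(chinese.encode("gb2312"))  # two bytes per character
english_bytes = len(english.encode("ascii"))      # one byte per character

print(chinese_chars, chinese_gb_bytes, english_bytes)
```

With an ASCII comma instead of the full-width one, the Chinese count would come out one byte smaller.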

Another database on which we can make comparisons is some material from Hong Kong. There are three subcorpora: the "Hansards", which are the parliamentary records; the legal code; and an archive of news stories. This data includes some formatting information, but it's largely the same in both languages. This table shows the disk usage in megabytes for the various subcorpora:

	Chinese	English	English/Chinese Ratio
Hansards	158.454	270.472	1.71
Laws	50.094	68.796	1.37
News	78.898	117.890	1.49

I'm not sure why the ratios vary so much, nor why they're all lower than the UN ratio (perhaps because these were written in Chinese and translated to English?), but they certainly all favor Chinese texts as being smaller than the corresponding English texts.

Several people have written in to ask about the size relationship once the files have been compressed. I wondered too, but didn't have time earlier to check. The results of a couple of experiments suggest that compression reduces but does not eliminate the discrepancy in size. For example, the Hong Kong News corpus, put into a tar archive and compressed with gzip, is 33,535,939 bytes in English, and 29,291,135 bytes in Chinese, for a ratio of 1.14. This is smaller than 1.49, but it's not 1.
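The English/Chinese ratios discussed so far can be recomputed from the quoted sizes with a few lines of Python. This is only a sketch: the dictionary layout and the function name are my own, and the figures are simply the ones given above (bytes for the UN and gzipped News data, megabytes for the Hong Kong subcorpora).

```python
# (Chinese size, English size) as quoted in the post.
# Units differ between rows, but each ratio is unit-free.
sizes = {
    "UN": (54_640_469, 123_301_197),
    "Hansards": (158.454, 270.472),
    "Laws": (50.094, 68.796),
    "News": (78.898, 117.890),
    "News (tar + gzip)": (29_291_135, 33_535_939),
}

def english_to_chinese_ratio(chinese_size, english_size):
    """Return how many times larger the English text is than the Chinese."""
    return english_size / chinese_size

ratios = {name: english_to_chinese_ratio(c, e) for name, (c, e) in sizes.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Every ratio comes out above 1, so the English side is the larger one in each case, compressed or not.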

[Update: Xiaoyi Ma observes that the LDC parallel Chinese/English corpora in general amount to some 218M English words and 370M Chinese characters, or about 1.7 Chinese characters per English word. In terms of byte count, he gets the following English/Chinese ratios:

	text	gzipped
FBIS	2.27	1.41
Sinorama	1.95	1.19
UN	1.96	1.24

All FBIS and Sinorama text was translated Chinese to English, while 90% of the UN data was translated English to Chinese. The ratios again are variable, but clearly show that Chinese texts are smaller than the corresponding English texts, with the difference shrinking but not disappearing under compression.]

Posted by Mark Liberman at 07:45 PM

Finna and tryna

The future certainly isn't what it used to be. That's true in general, but I'm talking about English verbal morphology here. In response to my post on future on, Darryl McAdams emailed to point out that there's been a development of fixing to into finna, parallel to the development of going to into gonna (and want to into wanna). He observes that {"I'm finna"} has some 5530 ghits, and turns up examples like this Kanye West lyric:

I wanna tell the whole world about a friend of mine
This little light of mine and I'm finna let it shine
I'm finna take yall back to them better times
I'm finna talk about my mama if yall don't mind

For an example with a subject that's not a pronoun, and with the copula deleted, there's this from Missy Elliott:

Missy finna spit this simply raw
Misdemeanor always make MC's feel small
Stick you on the table with a plastic cup
Say grace, then eat ya ass up

Darryl adds that "[a] friend learned it in middle school (Apollo, in Hollywood, FL)".

This one was completely new to me, since fixing to isn't part of my dialect. However, I do use a contracted form of trying to that might be put into IPA as [ˈtɹɐj.nə], and seems to be represented in conventional English orthography as "tryna". This one comes up in recent song lyrics too:

You tryna wear my shoes
You tryna wear my clothes
You tryna be like me,
I'm tryna be like you bro,
What I'm really tryna say
You got to keep it all real

but I can testify that it's been a normal part of American English pronunciation for a long time, even though I don't recall ever having seen it written before I (just now) looked for it on line. I wonder why "gonna" and "wanna" have been standard non-standard orthography for so long, while "tryna" has lagged? Is it because the contraction is newer -- you couldn't prove that by me, I've used all of them from the cradle -- or because "tryna" is just orthographically weirder?

Posted by Mark Liberman at 09:47 AM

August 04, 2005

"on" time

My flight from Philly to Denver was a couple of hours late, I missed my connection to San Jose, and I'm waiting for the next flight. The good part is I had plenty of time to read Elmore Leonard's Mr. Paradise, which I bought to read on the plane. On page 332 (of the 2004 Harper paperback), I noticed a way of marking immediate future time, in the family of gonna, gone, I'ma, I'monna and so on, that's new to me. Well, really it's not, in the sense that I even blogged about it before, but I didn't understand what I wrote at the time.

The characters are Frank Delsa, acting lieutenant of Squad Seven, Homicide Section, Detroit Police Department, and Orlando Holmes, a drug dealer who killed three of his suppliers and cut one of them up with a chain saw. Delsa speaks first.

"You know who put the stuff on you?"
"Somebody close to me, his girlfriend's punk-ass brother. Is how it goes. But listen, I'm on tell you something, I was scared."

Google finds a discussion of "I'm on tell you" in an amazon.com reader review of Donna Tartt's "The Little Friend":

Tartt has written a novel with all of Faulkner's insights about the South in clear, enjoyable prose. She adds the element of likeable characters and believable women, both black and white. She has captured the language of the white "redneck" class: "on" is exactly how we say "going to," "I'm on tell you one more time."

Leonard's Orlando Holmes is an African-American living in Detroit. I've heard the form that this orthography represents, I think, though my own dialect's version of it is I'monna. At least I think they're equivalent. But Carrie Shanafelt characterizes I'monna as a "deep Southernism".

(And I should have seen the connection to "on", given Carrie's observation that "I'monna go run" is "more often 'I'monngo run'" -- but I didn't unpack the run-together orthography and the doubled 'n'.)

Now, I'monna is not exclusively southern, since I use it, and I was born and raised in rural eastern Connecticut. However, I'll take Carrie's observation as evidence that the form is used in the south, wherever else it may show up. But then what's the difference between "I'monna tell you" and "I'm on tell you"?

Is this just a variant pronunciation, an instance of the sporadic loss of unstressed vowels in some (I think mostly rural) southern dialects? That's what Carrie's remark suggests.

Or have some speakers re-analyzed the form as a version of the spatial preposition on? That would be a sensible thing to do, but if it happened very often, I'd expect to see more hits for spellings like {"I'm on tell you"}.

Posted by Mark Liberman at 03:14 PM

August 03, 2005

Emphatic unreduction again

In my spare time, I've been continuing to chip away at the the-and-a reduction problem that Chris Waigl and I took up a little while ago. The question at hand is, when (and why) do the and a appear with unreduced vowels? One interesting answer is given in the psycholinguistic literature; but (I think) it's wrong, or at least it's incomplete. In fact, we've already seen some examples of phenomena that this theory doesn't cover, and in this post, I'll give some others. For extra bloggy relevance, I'll take some of the examples from an interview with Glenn Reynolds, the Instapundit.

Here's the background of the problem, which you can skip if it's old hat to you. (Even if it's new to you, you might want to skip to the examples and come back to this list later.)

1. In standard dialects, English the and a are pronounced as IPA [ði] and [ej] -- sometimes symbolized orthographically as "thee" and "ay" -- when they are used in citation forms ("the word 'the' is spelled tee aitch ee") or when they are contrastively stressed ("it's *A* factor, but not *THE* factor").
2. In fluent speech, when followed by a word starting with a consonant, both words are usually pronounced with a schwa-like reduced vowel, in IPA [ðə] and [ə], sometimes symbolized in conventional spelling as "thuh" and "uh".
3. When fluently followed by a vowel, the is usually pronounced with a higher vowel, roughly the same as in the second syllable of slithy. In most American dialects, this is the same vowel quality as in a stressed monosyllable such as fee, and is sometimes symbolized in conventional spelling as "thee" ([ði] in IPA). In some British dialects, the vowel is somewhat lower, more like the vowel in fin or this.
4. In all dialects, when a is fluently followed by a vowel-initial word, the form "an" is normally substituted.
5. Some fraction of the's and a's, followed phonetically by a consonant, show the forms [ði] and [ej] instead of the forms [ðə] and [ə].
6. A fraction of pre-vocalic the's and a's show the schwa-voweled forms [ðə] and [ə] instead of [ði] and [ej].
7. When followed fluently by a "filled pause" (the various sounds usually written as "uh" or "um" or "ah"), the pronunciations [ði] and [ej] are usually but not always used. Note that this is expected for the (because the pause sounds are vocalic) but not for a.
8. When followed by a disfluent pause, all of the various alternative forms occur, though [ði] and [ej] are fairly common.

The question, again: why are the "full" forms [ði] and [ej] (thee and ay) sometimes used where the standard rule would say that the "reduced" forms [ðə] and [ə] (thuh and uh) should appear?

For the case of the, Jean Fox Tree & Herb Clark gave an answer in their 1997 Cognition paper "Pronouncing 'the' as 'thee' to signal problems in speaking" (Cognition 62 (1997) 151–167). The answer is telegraphed by the title; as David Beaver summarized it in an earlier Language Log post, people "use the full form when they can't figure how to say whatever the hell they want to say next". It would make sense to extend the same model to the pronunciation of a as well.

But we've seen several examples already that don't seem to fit that mould. In one post, for instance, I cited FDR's use of unreduced a five times in his famous "infamy" speech, along with one use of unreduced the, all six cases in fluent performance of a prepared speech, with no signs of compositional or reading difficulty. And in another post, I cited a single non-reduced a in voice-over by George Vecsey, which struck me as expressing a rhetorical underlining of the following noun phrase, not any compositional problem.

A real solution to this sort of problem requires careful compilation and statistical analysis of many examples from many sources, with audio as well as transcriptions available. However, there's some initial value in looking at anecdotal clips, if only to get a sense of the range of phenomena to be counted, and the aspects of the performances that might be relevant.

The next few examples come from Chris Lydon's 2003 interview with Glenn Reynolds. In the portion of Reynolds' speech that I've transcribed, 7 of 97 phonetically preconsonantal the's are unreduced ([ði] rather than [ðə]), and 5 of 74 phonetically preconsonantal a's are unreduced ([ej] rather than [ə]). These rates are fairly typical of what we've seen in other cases. The point here is not the rate of unreduction, but its context.

Chris Lydon opened the interview with this long-winded question:

Let me just say, you know when I was in school, my idea of a god of journalism was Walter Lippmann, he had lunch at the Metropolitan club every day, talked to big shots, and then well sometimes talked to them at home next to the National Cathedral there in Washington and ((then)) he turned out these beautifully phrased short essays for American newspapers twice a week, distilling the mood and the mind of Washington, and of course shaping it. Uh today, the Walter Lippmann is a University of Tennessee law professor with a thing about guitars and Mazda sports cars, uh who's reading hundreds, maybe thousands of web sites all the time, and cuing the rest of the world to where the good stuff is. I want you to tell me how the world created this monster "Instapundit".

Glenn Reynolds' answer began:

uh monster's probably the¹ right word. [audio link]
um
I am uh hardly in the² Walter Lippmann category, uh about all I can say is that my rate of fire exceeds his [audio link]
um
but uh but that's about all.

Professor Reynolds' first "the" is pronounced [ðə], as we expect before a consonant; but the second one is [ði], despite the fact that it's produced in fluent sequence with the following consonant-initial word "Walter". Furthermore, "Walter Lippmann" is hardly new information, since the full form of the name was used twice in Lydon's question, just a few seconds before, and it's not very credible that Reynolds is having trouble remembering it. Nevertheless, Reynolds seems to want to emphasize it a bit, and he accomplishes this in part by the non-reduction of the preceding "the". While he's speaking deliberately overall -- his verbal "rate of fire" in the interview as a whole is rather slow -- I don't hear (or see) any evidence in the prosody of any sort of phrasal juncture before "Walter".

Another example, about 13 minutes in, occurs when Reynolds is talking about future "horizontal models" of journalism:

and I suspect we will see that sort of thing grow, as the¹ software gets better and as the² network gets larger. [audio link]

Here both of the the's are unreduced, without any indication that Reynolds is having any trouble fetching the words "software" and "network".

In the other direction, there are several examples in the interview of Reynolds' using reduced (though elongated) articles in front of quite long pauses-for-thought, for example at about 2:50 of the recording:

uh though I'm not sure that the
tone is all that different
than it would have been if I had a couple of hundred readers
because for me the experience is the same. [audio link]

There's a 980 msec. pause between "the" and "tone", and the vowel of "the" is 340 msec. long, but "the" is still pronounced [ðə]. The word "tone" is new to the interview, and Reynolds appears to be giving himself a "think-pause" before choosing it, but the prepausal article is still produced with a schwa.

That's not to say that disfluency and uncertainty are never relevant. However, they are neither necessary nor sufficient conditions for unreduction. We already know that uncertainty (real or feigned) is not an essential ingredient in unreduction, because of citation-form and contrastive pronunciations. What we're adding here is the idea that there's a species of article-unreduction that is mainly about vocal underlining of the following word or phrase. (There's clearly another form of article-unreduction as well, a sort of reading pronunciation that can occur even in quite fluent reading from some speakers, especially those that are less well educated.)

Sometimes when Reynolds uses unreduced articles, it does indeed seem plausibly to be linked to uncertainty about what to say next. Here's an example from about 7:20 of the interview:

uh in fact I got an email just today about that, I had linked to the¹ blog of a² military guy in Iraq
uh named L T Smash, that's not his real name, I
actually know his real name, but
but he blogs anonymously [audio link]

Here both the "the" and the "a" are unreduced; there are no silent pauses or overt disfluencies, but Reynolds slows down as he thinks about how to describe Lt. Smash and his blog, perhaps inhibited by the problem of internally swapping the pseudonym for the true name.

Switching away from Reynolds for a moment, here's another example of emphatic unreduction of a, from NASA's 7/29/2005 Mission Status Briefing. Phil Engelauf is answering a question from the AP's Marcia Dunn, about 17 minutes into the briefing. I've divided (this small piece of) his answer into breath groups:

There has been some discussion about whether or not we might send the crew
to uh take a close look at or remove one of those gap fillers that's protruding
uh that is a¹ very very preliminary discussion at this point, ((it-)) we've been sort of asked to uh [audio link]
take a look at what the impact of doing that would be
uh I don't think that there's a consensus that that's required yet
it's really just a-² a preliminary "what if" discussion [audio link]

Case 1 is "a" pronounced [ej] without any pause or pseudopause and without any indication of disfluency or uncertainty. Nor is the following word technical or rare or hard-to-understand -- it's just plain old very, somewhat emphasized. In my opinion, this is basically the same phenomenon as the unreduced a in George Vecsey's comment that Lance Armstrong "goes out as a great champion with a clean record".

Case 2 is "a" followed by a short pause and a repetition. Despite the disfluency and the speaker's clear momentary uncertainty about how to go forward, "a" is pronounced [ə] here.

And for another interesting bit of anecdotal phonetics, this time from Britspeak, here's another example that I heard this morning as I was writing this post (from the BBC Newshour 8/3/2005 12:00 GMT edition, about 48 minutes into the hour). Former Ford president Sir Nick Scheele is being interviewed:

BBC:	... the American car companies are terminally uncompetitive, aren't they?
Scheele:	ah th- they have a huge problem there is no question that the health care cost problem allied to declining profitability is causing a major squeeze -- however I think to say that this is terminal is m- a vast exaggeration. [audio clip]

In this case, "a major squeeze" has [ej] (or really in this case more like [e]), while "a huge problem" and "a vast exaggeration" have [ə]. The three phrases "huge problem", "major squeeze" and "vast exaggeration" are all reasonable candidates for being underlined, while the only one of them near a disfluency is "vast exaggeration". So the emphatic unreduction theory can't claim any sort of clean sweep here; but the think-pause theory doesn't help at all. In fact, these few data points might make you think that Sir Nick has some sort of vowel harmony thing going on...

In a later post, I'll take a critical look at the details of the Fox Tree & Clark paper. In particular, I'll look at their finding that

"About 20% of the time, speakers continue after THIY without further disruption, apparently able to repair the problem in time. But about 80% of the time they deal with the problem by pausing, repeating the article, repairing what they were about to say, or abandoning their original plans altogether"

which was based on counts made from the transcriptions in a British speech corpus for which audio was not available to them, but seems quantitatively very far away from the numbers that we've been seeing in material for which we have the audio. It's hard to tell whether this is because of dialect differences or because of some sort of transcription bias.

Posted by Mark Liberman at 03:08 PM

On making stuff up

Mean media metaphor of the month: Jack Shafer's judgment on Judge Richard Posner's essay "Bad News":

Maybe Posner should stop composing his essays with a paint roller and switch to a Sanford Uniball Micro.

Courtesy aside, Shafer's criticisms are reasonable ones: Posner's piece links broad-brush conventional wisdom about lowered barriers to entry with mostly-unsupported assertions about increased sensationalism and polarization. However, Shafer ends his critique with an astonishing and gratuitous piece of quantitative idiocy, which significantly undermines his whole "let's draw rational conclusions from documented facts" stance.

First, let's set the stage. Here's Judge Posner's conclusion:

Thus the increase in competition in the news market that has been brought about by lower costs of communication (in the broadest sense) has resulted in more variety, more polarization, more sensationalism, more healthy skepticism and, in sum, a better matching of supply to demand. But increased competition has not produced a public more oriented toward public issues, more motivated and competent to engage in genuine self-government, because these are not the goods that most people are seeking from the news media. They are seeking entertainment, confirmation, reinforcement, emotional satisfaction; and what consumers want, a competitive market supplies, no more, no less. Journalists express dismay that bottom-line pressures are reducing the quality of news coverage. What this actually means is that when competition is intense, providers of a service are forced to give the consumer what he or she wants, not what they, as proud professionals, think the consumer should want, or more bluntly, what they want.

This is a plausible story, but as Shafer observes

The authentic media maven understands that newspapers have been "dying" since the advent of radio in the 1920s, with the number of titles dwindling steadily with the rise of every new media (television, cable, the Web) and their share of the audience shrinking.

(A linguistic aside: note that media, like data, is now firmly singular in general usage...)

Shafer persuaded me that Posner's essay combined fuzzy thinking with factual carelessness. But Shafer's take-down makes an astonishing claim in its conclusion, a quantitative assertion that a few seconds of common-sense reasoning will show to be several orders of magnitude off.

Posner reveals the sort of rigor he applied to this piece of hackwork in his conclusion, where he notes that a survey by the National Opinion Research Center recorded the public's confidence in the press declining from 85 percent in 1973 to 59 percent in 2002 "with most of the decline occurring since 1991." He writes:

So it seems there are special factors eroding trust in the news industry. One is that the blogs have exposed errors by the mainstream media that might otherwise have gone undiscovered or received less publicity. Another is that competition by the blogs, as well as by the other new media, has pushed the established media to get their stories out faster, which has placed pressure on them to cut corners.

How could blogs have played any role in eroding public trust by 2002 when almost nobody in the mainstream had heard of them? The press loves to seize on new trends, especially techno-trends, but the word "blogs" doesn't appear in a Nexis search of all U.S. newspaper and wire stories until 2000, when it was mentioned in 22 stories. In 2001, the word appeared in 67 stories. In 2002, the concluding year of the survey cited by Posner, it appeared in 359 stories. That's too few by a factor of about 100,000 to have had an impact on the public's view of the press.

Does Shafer really mean that for blogs to have an impact on the public's view of the press, the word blogs would have to appear in about 359*100,000 = 35.9 million newspaper and wire stories within a calendar year?

The version of Lexis-Nexis that I have access to won't give me a response if the size of the set returned is greater than 1,000. So as a proxy, I tried single-month searches, with the results as follows. All searches were done on Lexis-Nexis Academic, in the category of "General News", source "Major Papers", search terms "blogs" in "Full Text".

	March	April	May	June	Sum March-June	Full year	Shafer's counts
2001	1	10	7	4	22	50	67
2002	8	19	15	24	66	205	359
2003	78	57	58	66	259	883
2004	119	129	201	170	619	?
2005	524	539	619	673	2,355

(I guess that Shafer has access to a "media pro" version of Lexis/Nexis that indexes a somewhat larger set of sources -- but his counts are within a factor of 2 of mine, and the exaggeration we're talking about involves a factor of 1,000 or so.)

Shafer's basic point against Posner is obviously correct. To attribute to the influence of blogs something that happened over the period 1991-2002 is preposterous. But in his excess of indignation, Shafer does something that Posner doesn't -- he pulls a specific number out of nowhere that is roughly three orders of magnitude too large. Here's a reprise of this bit of froth:

In 2002, the concluding year of the survey cited by Posner, it appeared in 359 stories. That's too few by a factor of about 100,000 to have had an impact on the public's view of the press.

Again, 359*100,000 = 35.9 million. My Lexis-Nexis count for blogs in the March-June period of 2002 is 66, and for the same period of 2005 it's 2,355. That's an increase by a factor of 35.7, which is way less than 100,000. It's 2,801 times less, to be precise.
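The arithmetic is easy to reproduce. Here is a minimal sketch in Python, using only the numbers quoted above; the variable names are mine, and the last step follows the text in dividing by the rounded growth factor of 35.7.

```python
# What Shafer's "factor of about 100,000" implies, versus observed growth.
stories_2002 = 359          # Shafer's Nexis count for 2002
claimed_factor = 100_000    # "too few by a factor of about 100,000"
implied_stories = stories_2002 * claimed_factor

mar_jun_2002 = 66           # my Lexis-Nexis March-June counts
mar_jun_2005 = 2_355
observed_growth = round(mar_jun_2005 / mar_jun_2002, 1)

# How many times smaller the observed growth is than the claimed factor.
shortfall = round(claimed_factor / observed_growth)

print(implied_stories, observed_growth, shortfall)
```

Dividing by the unrounded growth factor instead shifts the last figure by a story or two, which hardly matters when the gap is three orders of magnitude.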

One way to read this is that blogs are not yet having an impact on the public's view of the press, and won't do so until there are 36 million newspaper and newswire stories a year that include the wordform blogs. But surely this is not what Shafer means. If that's the criterion, there can't be many developments that actually do have any impact on the public's view of anything. I mean, it might have seemed like there were 36 million stories about Michael Jackson last year, but there weren't -- checking Lexis-Nexis for "Michael Jackson" in June of 2004 turns up a mere 319 stories...

No, I think Shafer just pulled a big number out of the air. It wasn't a number based on careful sociological studies of the impact of media on public opinion, and it wasn't even a number that Shafer bothered to evaluate for common-sense plausibility. It was just a big-ass number. So if I were Richard Posner, I'd offer to stop writing my essays with a paint roller if Jack Shafer agrees to stop doing arithmetic with his rear end.

[Update: a couple of readers have suggested that maybe Shafer meant that the number of stories in 2002 was too low by an additive increment of about 100,000, not a (multiplicative) factor of 100,000 -- 359+100,000, not 359*100,000. Frankly, I don't see any evidence that he gave the matter enough thought to distinguish those two cases. In any event, this would be contrary to the ordinary-language meaning of the word factor, e.g. "A quantity by which a stated quantity is multiplied or divided, so as to indicate an increase or decrease in a measurement", as the American Heritage Dictionary puts it. And if you're beating up on someone for sloppy thinking, careless writing and poor factual support, and you want to avoid charges of hypocrisy, this is not a good mistake to make.

Even an additive increment of 100,000 is probably hyperbole, since extrapolation from my 4-month Lexis-Nexis counts for 2005 suggests fewer than 10,000 stories in major newspapers containing the word "blogs" this year.]
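A rough sketch of the two readings and of the extrapolation, again in Python. Two assumptions here are mine and not established above: that the 4-month count in question is the March-June 2005 figure of 2,355, and that stories keep appearing at the same rate for the rest of the year.

```python
stories_2002 = 359
increment = 100_000

# The two readings of "too few by a factor of about 100,000".
multiplicative_reading = stories_2002 * increment   # 35,900,000
additive_reading = stories_2002 + increment         # 100,359

# ASSUMPTION: the 4-month count is the March-June 2005 figure,
# and the rate stays flat across all 12 months.
four_month_count_2005 = 2355
annual_estimate_2005 = four_month_count_2005 * 12 // 4

print(f"{additive_reading:,}")       # 100,359
print(f"{annual_estimate_2005:,}")   # 7,065
```

On those assumptions the yearly total comes out a bit over 7,000, which is consistent with "fewer than 10,000" and well short of 100,359.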

Posted by Mark Liberman at 07:15 AM

August 02, 2005

Illustrations

Get Fuzzy for 7/29/2005 illustrates creativity with quotations:

And the 8/01/2005 strip exemplifies "What is this 'snowclone' of which you speak?"

[links via Ben Zimmer]

Posted by Mark Liberman at 07:30 AM

Google gods: please make the * shine again!

Once upon a time, on a world wide web far, far away, Google wildcards seemed to work pretty well. Must have been all of a year ago. Now, the Google gods must be crazy. Consider this: a search on any pattern with "* X Y", now matches strings of the form "X * Y", and most of the latter are included in the count estimate.

Example: "whether nobler in the mind to suffer" produces (until Google indexes the current page!) 0 hits. Eminently reasonable, since Hamlet never said that. On the other hand, he didn't say "* whether nobler in the mind to suffer" for any choice of word for the * either. But that produces 15,500 hits, only 500 less than "whether * nobler in the mind to suffer", and a few thousand more than "whether tis nobler in the mind to suffer".

And no, the * does not merely hop over one word. It jumps anywhere into a string. More than once. The search "whether tis in the mind to suffer" gives no hits, but "* whether tis in the mind to suffer" produces 14,900. And "* whether tis nobler in the mind suffer" produces 14,700. And "* whether tis in the mind suffer" gives 14,900, although without the * we get none.

Heck, let's try to pretend this is a feature rather than a bug. "To be, or not to be: That is the question:-- Whether tis nobler in the mind to" produces 3440, which is plausible. "To be, not to: That is question:-- Whether nobler in mind to", which has every third word removed, produces 0, which again seems fair (until this post is indexed). "* To be, not to: That is question:-- Whether nobler in mind to", which is the same quote with a *, produces 7690. And "* To be, not to: That question is:-- Whether nobler in mind to", which is just like the 7690 search but with the order of two words swapped, produces 0 again. So sticking a star at the start of a quoted string will tell you whether that sequence of words occurs on the net in that order with any combination of single words stuck in between. But I can't really turn this into something useful. If you leave out pairs of words, weird stuff happens: you only get a tiny fraction of the results. "* To be, not to: That is :-- Whether nobler in mind to" gives 16 hits, apparently full Hamlet quotes. And I tried once taking out three words ("* To be, not to: That Whether nobler in mind to") and got zero hits. I'll leave you all to experiment.
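To make that generalization concrete, here is a minimal sketch in Python of the behavior as I have described it: the words of the query must appear in order, with at most one extra word between each adjacent pair. This is only a toy model of my guess, not anything Google has documented, and the function names are made up for illustration. It deliberately does not try to reproduce the weird results you get when pairs of words are left out.

```python
import re

def tokenize(s):
    # Crude tokenizer (an assumption): lowercase runs of letters only,
    # so punctuation such as ":--" is simply dropped.
    return re.findall(r"[a-z]+", s.lower())

def _match_at(q, t, qi, ti):
    # Word q[qi] must sit exactly at position ti of the text.
    if ti >= len(t) or q[qi] != t[ti]:
        return False
    if qi == len(q) - 1:
        return True
    # The next query word follows immediately, or after one extra word.
    return _match_at(q, t, qi + 1, ti + 1) or _match_at(q, t, qi + 1, ti + 2)

def matches_with_single_gaps(query, text):
    q, t = tokenize(query), tokenize(text)
    if not q:
        return False
    return any(_match_at(q, t, 0, start) for start in range(len(t)))

full = ("To be, or not to be: That is the question:-- "
        "Whether tis nobler in the mind to suffer")
thinned = "To be, not to: That is question:-- Whether nobler in mind to"
swapped = "To be, not to: That question is:-- Whether nobler in mind to"

print(matches_with_single_gaps(thinned, full))   # True
print(matches_with_single_gaps(swapped, full))   # False
```

The thinned quote matches because each missing word falls in a separate gap; the swapped version fails because "question" would have to come within one word of "That".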

By the way, you can put the star elsewhere in the string and get similar results but I think there's a proviso: as well as any number of extra words before or after the location of the star, there must be a match at the location of the star. Thus "To be, not to: That * is question:-- Whether nobler in mind to" has one hit, and the match is: "To be, or not to be, that is the question;/ whether 'tis nobler in the mind to suffer.", so * matched "/" or ";/". However, "To be, not to: That is * question:-- Whether nobler in mind to" gives 8040 hits, most of them presumably where * matches "the". I'll leave you all to experiment even more.

I want to be able to make linguistic claims based on web counts, and wildcards allow me to get at really thorny data amazingly quickly. But I cannot trust the wildcard results any more. Let us all pray to the Google gods that one day we shall return to that land of innocence we knew a year ago, a far off place where the * shone and it never rained on the linguists' parade.

Here's an index of past LL posts on Google count problems:

Pass the hát.

Type twice for truth?

August 01, 2005

Mordac visits Geoff Pullum

Inspired by Scott Adams' favorite Language Log posts:

Well, those might be his favorite posts, if he reads Language Log...

Posted by Mark Liberman at 08:53 AM

Anxious and pleistocene musings

That's what Karen G. Schneider at Free Range Librarian calls Michael Gorman's interview with Josh Sanburn of the Cox News Service. Gorman, you remember, is the new president of the American Library Association, who did so much to inspire Jean-Noël Jeanneney's campaign against

"that throbbing anxiety for anything and everything, scattering knowledge like dust", characteristic in his view of Google's project, "which the president of American libraries" -- Michael Gorman -- "has so persuasively and disturbingly denounced"

(as Le Monde put it). Now Gorman is (quoted as) rallying the troops to keep "The Education of Henry Adams" from being digitized:

"It's a kind of foolishness to say that just because you want to digitize the Oxford English Dictionary and the Yellow Pages, therefore you should be digitizing a biography of Henry Adams," he said.

News flash, Mr. Gorman: it's too late.

Meanwhile, Ms. Schneider has been wondering "Why am I not as famous as Stephanie Klein?", complaining (perhaps too politely for someone who aspires to notoriety) about the lack of "kicky phrases" and high-quality one-liners in the links we send her way. She explains:

O.k., maybe I do see why this blog has not led to fame, a New York Times article, or a book deal. But I can change, starting today!

First, let me adopt a more au courant writing style. No more biblish, no more tiresome polysyllabic nonsense, no more mundane middle-class mutterings. From now on, in the words of Ms. Klein, "Yeah, right. Okay. Whatever." No more talk of buying sports bras at Target (though mind you, I did finally settle on the two-for-$8 deal and I like these bras better than much more expensive over-the-shoulder-boulder-holders I have purchased in the past. See how casual I can be?). No more free verse. No more discussion about the American Library Association. And many more kicky phrases, such as "I love etymology almost as much as karaoke." (Why can't Language Log come up with one-liners like that?) Not to mention Klein's soliloquy to her date that made my toes curl with envy: "I just spent half a day telling you, communicating with you, saying things that were really hard for me to admit, and then, you apologize, say it won't happen again. Then, BAM! You pull a fcuking Emril on me."

Then--let's get to why people really read Klein's blog--there's the sex and the other lurid personal details (because it certainly isn't the writing, and is this what Barnard turns out these days?). Yes. As soon as Sandy comes home this afternoon I will ask for her permission to write about our sex life, past, present, future, and imagined. She is very supportive of my writing endeavors (oh dear; "endeavor" is not a very Klein sort of word) and I am sure she will agree that splashing our personal life onto this blog, where it will then have a digital half-life in perpetuity, is a reasonable exchange for my personal gain, particularly for a book that very important people will read for at least one season.

I know it awaits me: the celebrity, the book deal, the book jacket with the pink cover and the high heel and martini glass on it. It can be mine! I just have to--BAM!--change my tiresome ways.

Does it help for me to point out that a librarian ought to start with an advantage in reaching at least some segments of the American reading public, as documented in Dan Lester's scholarly study The Image of Librarians in Pornography? No, I thought not. Well, I'll work on those kicky one-liners.

Posted by Mark Liberman at 12:37 AM

Welcome to the NFL

That's the message at the top of the National Forensics League's home page. In response to my post on dramatic license at the Globe Theatre, Ryan Miller wrote that

...high school thespian competition on a national scale in the United States is under the auspices of the National Forensics League whose rules are the following:

1) Any amount of cuts can be made as long as the original word order is not changed.
2) Up to 10% of the production by time can be words or phrases not actually present in the original.
3) The above changes must be consistent with authorial intent.

Well, Colin Hurley's version of what Shakespeare wrote for Thersites wouldn't pass muster with the NFL, since the original order was changed. But Peter's performance on the cell phone would be fine, since the message received was exactly what its author intended...

Posted by Mark Liberman at 12:32 AM

And every lion tongue cast down

The New Yorker has a well-deserved reputation for being carefully (if sometimes eccentrically) edited. As Tom Rossen pointed out to me today by email, however, something strange has happened on page 49 of the current issue. The scene is a gala dinner for Tom DeLay at the Capitol Hilton:

Finally, Tony Perkins, the head of the Family Research Council, delivered a benediction. "Heavenly Father," he said, "we are here tonight to thank you for our leader, Tom DeLay. We thank you for him, and we want to pray for him and Christine," -- DeLay's wife. "We lift them up before you, and we ask that you put a shield around them. Father, we pray, your own word over them, that no weapon formed against them would prosper. Lord, that every lion tongue would be cast down. And we pray, Lord, that they will come out on the other side of this, servants more usable in your kingdom. [emphasis added]

[John Cassidy, "The Ringleader: How Grover Norquist keeps the conservative movement together", The New Yorker, August 1, 2005, p. 49]

I've got to assume that "lion tongue" is a slip of the ear for "lying tongue". The King James Version has 5 instances of "lying tongue", but none of "lion tongue". "Lying tongue" makes sense in the context, while "lion tongue" makes no sense at all. If there were any lions besetting Tom and Christine DeLay with their tongues over at the Capitol Hilton, John Cassidy didn't learn about them. At least he doesn't tell us, and you'd think that if he had, he would have.

Every tongue cast down is perhaps not the most coherent of images -- I see them draped over the landscape like Dali watches -- but extracting the tongues from every member of some relevant set of lions doesn't help. Google has 637 hits for {"lion tongue"}, but they seem to deal with the actual tongues of lions, which as I've said seem to be thin on the ground at the Capitol Hilton. In contrast, there are 22,100 hits for {"lying tongue"}, many of them in religious contexts similar to Perkins' benediction.

The error must have happened when Cassidy (or some underling) transcribed the benediction. There's no indication that Cassidy was given Perkins' prayer in writing, if a written form ever existed; and if the phrase had come in writing from Perkins, I imagine that Cassidy would either have silently corrected it or added a sic.

And then Cassidy's transcriptional eggcorn made it through the New Yorker's copy-editing process. Not to speak of the famous fact checkers. But I doubt that even the New Yorker fact-checks prayer, so maybe this is a case where theory checkers would have been more advisable: "Mr. Perkins, I'm a theory checker from the New Yorker, and we're trying to make sense of those lions whose tongues you asked to be cast down. Can you offer any coherent story about just where these beasts are, and what they have against the DeLays?"

Posted by Mark Liberman at 12:27 AM