April 30, 2005


In reference to one of my recent posts, where the Bibliothèque Nationale de France was abbreviated as "BNF", Mike Albaugh complained that

BNF still means Backus/Naur Form to me, at first glance. (or Normal :-)

OK, me too. But in interpreting abbreviations, you've got to consider the context, as an Australian traffic court judge recently explained to the author of a clever new defense against a speeding ticket:

An early contender for the 2005 Nice Try But No Cigar award goes to Carl Ross La Riviere. Yesterday in the District Court La Riviere was avidly defending himself against a speeding charge, even though the Crown appeared to have irrefutable proof: a photo of his car travelling 56 "KPH" in a 40 "KPH" zone.

La Riviere presented the court with the National Measurement Act and Regulations to prove these initials did not mean what everyone thinks they do. His argument: K stands for Kelvin, a measurement of thermodynamic temperature; P for Poise, a measurement of viscosity; and H for Henry, which measures electric inductance.

Entertained but unmoved, Judge Megan Latham agreed the abbreviation might be illogical or incorrect, but had to be seen in context, which clearly suggested it related to a vehicle speed and meant kilometres per hour. La Riviere's sentence - a good behaviour bond - was upheld.

Mike also wondered about the relationship between plan calcul and Plankalkül,

...the names given to the French national computing initiative (circa 1966) and Konrad Zuse's proposed algorithmic language (or notation :-) (circa 1946).

When I pointed him to the entry in the French Jargon File that says there's "aucun rapport" ("no relationship"), Mike answered:

Well, that's just what they would say, isn't it? :-)

and (returning to BNF) suggested that it's

Time to revive SAFEBAGEL (Scientists Against Far-out, Extensive, Burdensome Acronyms Getting Entrenched in Language).

Scientists are the worst offenders. A couple of years ago, I wrote a little program to find acronyms in the MEDLINE corpus. There are lots of them -- my not-very-smart program found more than 78,000 distinct acronym/definition pairings, many of which occurred many times. Thus GM-CSF was defined 2,401 times as "granulocyte-macrophage colony-stimulating factor", but was also defined by 150 other strings. In this case, these are basically all variant forms of the same term (including a shocking number of typos -- it seems that biomedical journals are not always very well copyedited) -- see this page for the complete list of variants, each preceded by the number of times my program found it as a definition for GM-CSF in MEDLINE.

There were also plenty of acronyms whose definitions were not just different versions of the same term. For example, ABA was variously abscisic acid, Agaricus bisporus agglutinin, aminoalkyl-iodobenzamides, aminobenzamide, aminobenzanthrone, anti-biotin antibody and azobenzenearsonate. With a bit of extra context, ABA could be part of I-ABA (Iodo-4-aminobenzyl adenosine), PABA (para-aminobenzoic acid or pyridylamino butylamine), and hundreds of other things.
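For readers curious what a "not-very-smart program" of this kind might look like, here is a minimal sketch in Python. It is not the program described above, whose details aren't given here; it assumes the simplest possible heuristic, namely that a definition is the run of words immediately before a parenthesized acronym whose initials spell that acronym. The names `find_pairs` and `tabulate` are invented for illustration. Note that this heuristic would miss cases like "abscisic acid (ABA)", where the letters don't line up one per word.

```python
import re
from collections import Counter, defaultdict

# A parenthesized candidate acronym such as "(ABA)" or "(GM-CSF)".
ACRONYM = re.compile(r"\(([A-Z][A-Za-z0-9-]{1,9})\)")

def find_pairs(text):
    """Yield (acronym, definition) pairs of the form 'long form (ACRONYM)'.

    Assumption: one word of the definition per letter or digit of the acronym.
    """
    for match in ACRONYM.finditer(text):
        acronym = match.group(1)
        letters = [c.lower() for c in acronym if c.isalnum()]
        # Split on anything non-alphanumeric, so that a hyphenated word
        # like "granulocyte-macrophage" contributes two initials.
        words = re.findall(r"[A-Za-z0-9]+", text[:match.start()])
        candidate = words[-len(letters):]
        if len(candidate) == len(letters) and all(
            word[0].lower() == letter for word, letter in zip(candidate, letters)
        ):
            # Definitions are lowercased and de-hyphenated so variants can merge.
            yield acronym, " ".join(candidate).lower()

def tabulate(texts):
    """Count how often each definition is found for each acronym."""
    table = defaultdict(Counter)
    for text in texts:
        for acronym, definition in find_pairs(text):
            table[acronym][definition] += 1
    return table

docs = [
    "The American Bar Association (ABA) met.",
    "The Australian Broadcasting Authority (ABA) ruled.",
    "Levels of granulocyte-macrophage colony-stimulating factor (GM-CSF) rose.",
    "Members of the american bar association (ABA) objected.",
]
table = tabulate(docs)
for definition, count in table["ABA"].most_common():
    print(count, definition)
```

Even on this toy input, ABA comes out with two unrelated definitions, which is the ambiguity at issue.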

For most Americans, ABA is the American Bar Association. Down under, it could be the Australian Broadcasting Authority. For others, it might be the Association for Behavior Analysis or the American Board of Anesthesiology or the Antiquarian Booksellers Association. It shows my age that for me, ABA will always be first and foremost the American Basketball Association.

Life, like language, is ambiguous. Without the effect of context, referential communication would hardly ever succeed.

Posted by Mark Liberman at 11:51 AM

A tale of two media

You've probably read about or heard about Glenn Wilson's study allegedly showing that email lowers IQ more than marijuana does, and you probably even remember the amount of alleged damage (10 points) and the alleged explanation (the cognitive wear and tear of jumping around among many topics developing in parallel). Now you have a chance to read Steven Johnson's argument in the NYT Sunday Magazine that "Watching TV Makes You Smarter", and evaluate his proposed explanation:

Think of the cognitive benefits conventionally ascribed to reading: attention, patience, retention, the parsing of narrative threads. Over the last half-century, programming on TV has increased the demands it places on precisely these mental faculties. This growing complexity involves three primary elements: multiple threading, flashing arrows and social networks.

In other words, for Johnson, jumping around among many topics developing in parallel is a "cognitive benefit".

It's certainly possible that dealing with the multiple email threads linking your own social network makes you stupider, while dealing with the multiple TV-show threads linking Tony Soprano's social network makes you smarter. On the other hand, a more parsimonious explanation is available: both Glenn Wilson and Steven Johnson are blowing smoke.

Let's look at the evidence.

You probably don't remember anything about Glenn Wilson's evidence for the effects of email on IQ, except that "a study showed" it, because he hasn't presented any. Not even a sketch of how the experiment was done has appeared in any of the stories that I've read, and searching several databases of scientific information leads me to conclude that no details have so far been published anywhere at all, not even in the most obscure psychometric journal. It's possible that no details will ever be published, because this was apparently part of a study privately commissioned by HP, and its author has a history of hopping around from topic to topic himself: political psychodynamics, sex differences, love addiction, and the psychology of bubble baths, among other things.

As I suggested in the blog entry linked above, there could be lots of confounding factors in an experiment on this topic. In fact, it's not easy to see how to design an experiment on the cognitive effects of email that doesn't have serious confounds. But instead of a series of carefully-documented and well-controlled experiments, we've got a single small experiment, documented only by a press release that doesn't even sketch the experimental design, and carried out by a psychiatrist whose previous work strikes me as higher in topicality than in scientific rigor.

What about Steven Johnson's evidence that TV makes you smarter? Well, Johnson is a writer of popular science books -- his last book was a tour of neuroscience called Mind Wide Open, and the NYT piece is adapted from his forthcoming book "Everything Bad Is Good for You: How Today's Popular Culture Is Actually Making Us Smarter." So he's wearing his bias on his sleeve, so to speak -- we can assume that he's looking for a good story that will sell books, not seeking the truth in a careful and dispassionate way.

Still, in contrast to Wilson's press release on email and IQ, which was basically just a guy in a metaphorical white coat pushing the media's buttons, Johnson actually presents evidence and makes an argument. His evidence and arguments are all about developments in modern culture, specifically the design of TV shows, which have been getting more complicated in specific ways that he describes. His argument about the psychological effects of these cultural changes is a pretty weak one -- basically, he just asserts that more complicated experiences must make you smarter than simpler ones do. You could make the same argument about email.

Nevertheless, Johnson does actually present some supporting evidence. The part I liked best was the way he represents the plot of TV shows, as a sort of checkerboard graph in which "the vertical axis represents the number of individual threads, and the horizontal axis is time." Here's his graph of an episode of The Sopranos:

I'm not sure whether this sort of plot graph ("plot plot"?) is Johnson's invention -- he doesn't credit it to anyone else -- but I haven't seen it used before. I'd think that graphs like this would be a natural starting point for critical study of story-telling techniques, and it's easy to think of all sorts of interesting measures to derive from them. They should apply well to TV dramas like The Sopranos, where I imagine that the writers have such displays in mind as they plan episodes and seasons, and maybe even use something like them explicitly. The notion of "thread" may be harder to define for the plots of some other genres, where more of the structure is in the evolution of individual narrative strands than in the way the strands are woven together.
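To make "interesting measures" a bit more concrete, here is a rough Python sketch of a plot plot as a data structure. Everything in it is an assumption made for illustration: the episode is invented (it is not one of Johnson's examples), each scene is simply the set of threads it touches, and the function names and the particular measures are mine rather than his.

```python
# A hypothetical episode: one entry per scene, in broadcast order, listing
# the narrative threads that scene touches. Thread names are invented.
episode = [
    {"A"},
    {"B"},
    {"A", "C"},
    {"B"},
    {"A", "B", "C"},
    {"C"},
]

def plot_plot(scenes):
    """Render threads (rows) against scenes (columns) as a text grid."""
    threads = sorted(set().union(*scenes))
    return "\n".join(
        f"{thread} " + "".join("#" if thread in scene else "." for scene in scenes)
        for thread in threads
    )

def measures(scenes):
    """Compute a few simple summary measures; assumes at least one scene."""
    return {
        # Total number of distinct threads in the episode.
        "threads": len(set().union(*scenes)),
        # Scenes that touch more than one thread at once ("chordal" scenes).
        "chordal_scenes": sum(1 for scene in scenes if len(scene) > 1),
        # Average number of threads active per scene.
        "mean_threads_per_scene": sum(len(scene) for scene in scenes) / len(scenes),
        # Number of scene boundaries where the set of active threads changes.
        "switches": sum(1 for a, b in zip(scenes, scenes[1:]) if a != b),
    }

print(plot_plot(episode))
print(measures(episode))
```

Computed over a reasonable sample of episodes from each era, numbers like these would let claims about "typical" episodes be checked rather than asserted.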

Anyhow, Johnson's plot plots impressed me, but his use of them didn't. He supports his generalizations with examples, without demonstrating (other than by assertion) that the examples are typical; some of the crucial cases are what we might call "generic examples", unsupported claims about typical examples of a type; and it turns out that some of the crucial aspects of his examples are not actually exemplified in the specific cases that he presents. This is normal and reasonable for journalism, but Johnson is presenting an original argument, not reporting on someone else's scholarship.

He asserts that the complexity of TV dramas has developed in four stages, of which The Sopranos is the culmination. He describes the first two stages this way:

Draw an outline of the narrative threads in almost every "Dragnet" episode, and it will be a single line: from the initial crime scene, through the investigation, to the eventual cracking of the case. A typical "Starsky and Hutch" episode offers only the slightest variation on this linear formula: the introduction of a comic subplot that usually appears only at the tail ends of the episode ...

and he depicts the "typical" episode of Starsky and Hutch like this:

The next stage is represented by the plot of a specific Hill Street Blues episode:

I'm sure that Johnson is describing a real trend, but it bothers me that the first two stages in his claimed evolution are not supported by any specific facts at all, and the change from Hill Street to The Sopranos turns out not really to be about the number of parallel plot threads at all, but about some more subtle developments:

The total number of active threads [in "The Sopranos"] equals the multiple plots of "Hill Street," but here each thread is more substantial. The show doesn't offer a clear distinction between dominant and minor plots; each story line carries its weight in the mix. The episode also displays a chordal mode of storytelling entirely absent from "Hill Street": a single scene in "The Sopranos" will often connect to three different threads at the same time, layering one plot atop another. And every single thread in this "Sopranos" episode builds on events from previous episodes and continues on through the rest of the season and beyond.

Again, I'm sure that there's some truth here, but I can remember plenty of examples of "chordal storytelling" and cross-episode continuity in Hill Street Blues. And I'm sure that Dragnet and Starsky and Hutch had a very different plot layout from current shows, but it'd be nice to see at least one specific example rather than an assertion about what is typical. For all four stages, it'd be even better to see an argument based on analysis of a reasonable sample of shows. Overall, this is the kind of argument from assertion that often establishes as conventional wisdom a proposition that turns out to be completely false when someone finally gets around to checking it.

I was going to start the conclusion by writing "If Johnson were a scientist...", but that's misleading. This is not about science vs. the humanities, or even about good science vs. bad science. It's about rational investigation.

Everyone these days seems to believe that modern life is complex, fragmented and disjointed, in comparison to the life of the past. And most people have some kind of opinion about the effects on our attention span, our intelligence, our culture, our politics, our morals. Given how important this is, you'd think that people would look at it in a serious way, rather than just marshalling stereotypes and counter-stereotypes in rhetorical parade.

It's unfair for me to complain so much about Steven Johnson's article. He makes a clear and interesting argument about the evolution of modern mass culture, leading to a contrarian conclusion, and he supports it with a large number of interesting examples. I wish that more contemporary literary critics did this sort of thing as well as he does.

But why didn't Johnson consider doing this kind of investigation with some scholarly (or scientific) care? Alternatively, why hasn't someone else done this, so that Johnson could base his popular book and articles on a solid foundation of fact rather than a flimsy scaffolding of anecdote and rhetoric? Oh, I know, it's because the fragmentary and disjointed nature of modern life has left us without the attention span required by scholarship and science. Or wait, I guess it's actually because the experience of modern complexity has made us smart enough to transcend the plodding path of scholarship, leaping to valid conclusions in a cyberintuitive blink. One or the other, anyhow: whatever.

[Update 9/25/2005: for the truth about the experimental design of the "email lowers IQ" studies, and an apology for blaming the media's excesses on Glenn Wilson, see this post.]

Posted by Mark Liberman at 09:39 AM

April 29, 2005

Linguistic candidate coverage

Last month on phonoloblog, Bob Kennedy commented on an LA Times story on the difficulty local voters have pronouncing LA mayoral candidate Antonio Villaraigosa's name. Last weekend, there was another story about Villaraigosa and linguistic difficulty in the Times, this time about Villaraigosa's apparent difficulties with Spanish. This article is interesting for purely sociolinguistic reasons -- as with many Latinos in the U.S., Villaraigosa's Spanish was not exactly aided along during his early years, with largely predictable effects on his competence in the language -- but I wonder what effect this sort of "candidate coverage" has on the average LA voters who may read it. Will any of them judge the candidate, positively or negatively, on either of these completely irrelevant linguistic grounds?

Posted by Eric Bakovic at 10:50 PM

Crisis ≠ Danger + Opportunity

A few months ago, Mark Swofford at Pinyin.info posted Victor Mair's terrific essay debunking the "widespread public misperception ... that the Chinese word for 'crisis' is composed of elements that signify 'danger' and 'opportunity'". As a result, I can link to it, in response to an email from Robert Neal Baxter, who quotes from an article by Xavier Queipo on a Galizan news and opinion site:

Os chineses, na súa lingua de ideogramas, non teñen un ideograma específico para designar o concepto "crise" e recorren a unión de dous ideogramas, o que representa "riscos" e o que representa "oportunidade".

Chinese people, in their ideogram language, don't have a specific ideogram to refer to the concept 'crisis', resorting instead to joining together two ideograms which represent 'risks' and 'opportunity' respectively.

The cited article is of course not about Chinese at all, but about political issues in Galiza, and the author is just using this (false) linguistic cliché as a rhetorical framing device.

Robert doesn't know any Chinese, but (being well educated linguistically) he sees that nothing about this trope makes sense, and observes that

People really shouldn't just make stuff up as they go along about other peoples, cultures and languages just to suit their rhetorical or stylistic needs.

Indeed. Of course Queipo didn't make this up, in the sense of employing any creative invention. He just deployed a cliché. But someone once made this up, and people have been repeating it ever since, just like the nonsense about Eskimo snow words.

Robert continues:

What this shows, at best, is a profound misunderstanding of the way Chinese works. [...]

At worst, it reveals a journalistic willingness to exploit people's fears and ignorance about far-flung peoples with weird habits and customs and their corresponding willingness to believe any old bullshit that people make up about them. [...]

Would it be fair to assume that English has no word for what the French refer to as 'papillon', resorting instead to a compound made out of the words 'butter' and 'fly'. What would such a statement, even if it were linguistically valid - which it isn't - show about the language or the speakers of the language in question? Probably very little. In fact it's about as likely that a Chinese speakers using the word 'crisis' made up of whatever morphemes it happens to be made up of is to be aware of this secondary reading as an English speaker would be to think that butterflies are some sort of 'air-borne grease balls'.

Didn't Michel Foucault once point out that butterfly expresses a fundamental contradiction in anglophone culture: the libertarian urge to take wing, subverted by the consequences of a diet too rich in animal fat?

Posted by Mark Liberman at 08:28 AM

April 28, 2005

News flash: European national libraries are willing to take EU money

According to an AFP-based article in Deutsche Welle (4/28/2005),

In a stand against a deal struck by five of the world's top libraries and Google to digitize millions of books, 19 European libraries have agreed to back a similar European project to safeguard literature.

Nineteen European national libraries have joined forces against a planned communications revolution by Internet search giant Google to create a global virtual library, organizers said Wednesday. The 19 libraries are backing instead a multi-million euro counter-offensive by European nations to put European literature online.

The 19 signatories are "national libraries in Austria, Belgium, the Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Italy, Lithuania, Luxembourg, Netherlands, Poland, Slovenia, Slovakia, Spain and Sweden". Apparently the British National Library "has given its implicit support to the move, without signing the motion" (whatever exactly that means), and Cyprus, Malta and Portugal are expected to sign up as well.

This all started with a warning a couple of months ago by Jean-Noël Jeanneney, head of the Bibliothèque Nationale de France (BNF), that Europe faces the "crushing domination of America" in the cultural arena, and an initiative by Jacques Chirac to promote Jeanneney's proposal for a pan-European publicly funded competitor to Google Print.

It's not an enormous surprise that the national libraries are in favor of "a multi-year plan" with a "generous budget" to provide for them to plan, implement and deliver this service. And I sincerely hope that this turns out to be a success, as Airbus has been, and not another "plan calcul". This was a badly conceived and badly implemented Gaullist plan to promote the French computer industry, 1966-1975, discussed in context here:

From 1965 on, General de Gaulle and the government devoted their attention to developing a national computer and communication technology industry. After blocking the acquisition by a giant US firm, General Electric, of what was at the time France's only computer company, Compagnie des Machines BULL, the government decided to create the Compagnie Internationale pour l'Informatique (CII) as part of its computer development plan or "Plan Calcul" (13 April 1967). In its early years, the company would enjoy "national preference" from users in the public and semi-public sectors.

The lightning pace of development in the field of computer science, however, doomed these efforts to failure.

This concern is not mine alone: an op-ed by Bertrand Le Gendre in Le Monde of 4/21/2005, discussing the Jeanneney/Chirac initiative, is entitled "Le plan calcul de la BNF" ("The plan calcul of the BNF").

Le gaullisme était particulièrement chatouilleux sur ce chapitre de la grandeur et longtemps il a fait illusion : paquebot France, supersonique Concorde, plan calcul. Trois fiertés nationales, trois gouffres financiers, trois échecs commerciaux, qui incitent à se demander si ce "Google à la française" dont la Bibliothèque nationale de France (BNF) serait le pivot n'est pas, lui aussi, un pari risqué.

Gaullisme was especially ticklish about this business of grandeur, and has had a long history of delusion: the ocean liner France, the supersonic Concorde, the plan calcul. Three objects of national pride, three financial sinkholes, three commercial failures, which lead us to ask whether this "Google French style" based on the BNF is not, also, a risky bet.

I see little reason to be confident that the 20-odd national libraries will be able to work together efficiently to plan and implement this massive digitization process, and to make the results available to the public in an effective way. There is likely to be a substantial communications overhead, and perhaps some issues of local technical competence. Of course, Europe has many highly skilled technical managers who could make a success of such a project, but I wonder if the politics of the situation will allow any of them to be given the opportunity to do it.

[Deutsche Welle reference via email from Benjamin Zimmer]

Other current stories: in Le Monde, the Inquirer (where the likely cost is underestimated by an order of magnitude), from the AFP wire in the Sydney Morning Herald, and (a generally-negative opinion piece on the Jeanneney/Chirac initiative as a whole, by Pierre Buhler from Sciences-Po in Paris) in the IHT.

From the Le Monde article:

Pour M. Jeanneney, il n'est pas question de laisser les politiques se mêler directement des contenus. Des conseils scientifiques européens, composés de bibliothécaires, de conservateurs, d'informaticiens et de savants de toute nature, pourvoiraient à les définir. Une instance qui en serait l'émanation déterminerait une stratégie collective. Elle "s'attacherait à encourager tous les choix privilégiant la mémoire des échanges d'une nation à l'autre." Et devrait répondre "à cette inquiétude lancinante du n'importe quoi, de la dispersion du savoir en poudre" , caractéristique à ses yeux du projet Google, "dont le président des bibliothèques américaines - Michael Gorman - s'est fait le dénonciateur persuasif et inquiet."

For M. Jeanneney, it's not a question of letting the politicians meddle directly in the content. European scientific councils, made up of librarians, conservators, computer scientists and scholars of all kinds, will arrange to define it. A decision-making body that would result from this process would decide on a collective strategy. This body "would dedicate itself to encouraging all options, favoring the retention of the exchanges between one nation and another." And it should respond "to that throbbing anxiety for anything and everything, scattering knowledge like dust", characteristic in his view of Google's project, "which the president of American libraries" -- Michael Gorman -- "has so persuasively and disturbingly denounced".

Reste un risque majeur : face à la souplesse et à la détermination d'une entreprise privée, disposant de moyens financiers très importants, l'Europe risque d'opposer à la firme californienne une complexe usine à gaz, addition d'administrations atomisées, jalouses et paralysées par des interférences politiques.

There remains a major risk: in contrast to the flexibility and determination of a private enterprise, able to spend large sums, Europe risks opposing to the California firm a gasworks project, adding atomised bureaucracies, paralyzed by administrative jealousies and political interference.


Previous Language Log coverage of this story:

2/01/2005 Revenge of the Codex People [a roundup of Gorman links]
2/20/2005 Google challenges Europe?
3/08/2005 The Progress and Prospects of the Digital BNF
3/19/2005 France challenges Google
3/23/2005 EuGoogle advances
3/26/2005 Europe's Response to Google to be Managed by ... Microsoft?
3/27/2005 Tomorrow was Yesterday

Posted by Mark Liberman at 05:21 PM

Replyese, or everyday English?

I recently had the following exchanges with technical staff at Stanford. The first relevant message went as follows (I suppress irrelevant details):

From: A...
Date: April 27, 2005...
To: zwicky@Turing.Stanford.EDU (Arnold Zwicky)
Subject: Re:...
In soc.motss, you wrote...

I forwarded a copy of A's message to B, who replied, in part:

I can't tell from below who the "you" is who wrote something in soc.motss, but perhaps it's *you*.

B can't tell who the "you" is? What's going on here?

My hypothesis is that B is treating e-mail as an instance of a special register of English, Replyese, while A and I are reading it as an exchange in everyday English, supplemented by a variety of extra information (like times in GMT). In particular, A and I think that since A was writing to me -- a fact made clear by the "From:" and "To:" headers -- the pronoun "you" refers to me, just as it would in a note to me or a phone call to me. B, on the other hand, expects (I surmise) that persons will be identified by their full names (and e-addresses) in the body of the message; the headers are irrelevant. B was expecting A to have written something like:

In soc.motss, Arnold M. Zwicky (zwicky@Turing.Stanford.EDU) wrote...

Or perhaps:

In soc.motss, Zwicky, Arnold M. (zwicky@Turing.Stanford.EDU) wrote...

Or even:

In soc.motss, Woolly Mammoth (zwicky@Turing.Stanford.EDU) wrote...

Or simply:

In soc.motss, (zwicky@Turing.Stanford.EDU) wrote...

In everyday English, such a use of proper names, nicknames/pseudonyms, or addresses would be just bizarre. If Dan Jurafsky, say, greeted me in the halls of Language Log Plaza by saying "In soc.motss, Arnold M. Zwicky (zwicky@Turing.Stanford.EDU) wrote..." or any of the other variants above, I would be seriously concerned about his mental state.

I suppose A and I, clinging in our quaint way to the conventions of two-person interchanges, even in e-mail, are hopelessly Out of It. Oh, I mean that I suppose A (supply.e-address.here) and Arnold M. Zwicky (Turing.Stanford.EDU) are hopelessly Out of It.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:31 PM

Voilá: the movie

To explain the fractured syntax of a New Yorker Infiniti ad, I invented an elaborate plot sequence, despite having no relevant knowledge or experience of the advertising business. Then I came across some misspelled French decorating a wine ad in the same magazine, and concluded (with equal ignorance) that the responsible parties are probably just incompetent and careless. Now Ed Keer, a linguist working as an advertising copywriter, sets me straight by supplying a believable plot for the wine ad blooper.

Having now been on the inside of an agency, I can tell you how this went down. The copywriter most likely made a mistake and wrote voilá in the manuscript. Then the sharp-eyed editor noticed the problem and changed it to voilà. All was fine until it went to the client for review. The client remembered back to her highschool French and changed it back to voilá. The editor at the agency flew into a rage. The account person asked if the client is right. The editor composed a heated email explaining the problem, complete with scanned dictionary and style book pages. The account person gently tried to explain the problem to the client. By this time the client had found a few colleagues to back her up. The writer, exhausted from coming up with 50 different concepts to sell cheap wine, ignored the whole thing. At some point after that, the account person uttered the phrase, "We're not going to die on our sword for this." And so it went to print.

That makes sense. I can see Bill Murray as the copywriter, Melanie Griffith as the client, and John Lithgow as the editor. Maybe the copywriter and the client are former lovers... and you can make up the rest for yourself.

I should know better than to ascribe to simple human error something that could instead be explained on the basis of a complex network of ignorance, interpersonal conflict, defensiveness and communications failure :-).

Posted by Mark Liberman at 11:47 AM

Standing out by blending in

(Annals of post-modern advertising, part 3.) Nissan North America bought another two-page spread in the front of the May 2 New Yorker, as they did in the April 25 issue. This time the featured model is the Infiniti FX rather than the Infiniti M, and there are no incoherent sentences in the ad copy. Well, there are no syntactically incoherent sentences, anyhow.

The background is black, as before, with a picture of the featured vehicle on the right-hand page. At first I thought that the left-hand page was solid black, but then I realized that there are some large letters in a slightly lighter shade, roughly like this:

if you are not
generic and
ordinary and
you stand out
no matter how
similar your

(If you have trouble reading it, try highlighting the text.)

Perhaps Saul Gorn's Compendium of Rarely Used Cliches and Self-Annihilating Sentences should be expanded to include a section on "Self-Refuting Advertisements".

It's easy to trace the associative process that produced this ad. The TV advertising for the Infiniti FX includes the catchphrase "The only thing it won't do is blend in", and the slogan on the FX website is

You've never seen anything like it.
Then you see nothing else.

(Or maybe "nothing at all"...)

So the theme is "standing out"; but people who buy luxury cars want to be fashionable as well as admirable; so they need a countervailing theme of fitting in, though of course not by being in any way ordinary or generic or homogenized. Perhaps someone at TBWA\Chiat\Day decided to embody the contradiction typographically. Then again, maybe they were just flailing around with associated concepts, and thought that black letters almost blending in on a black page would be a cool way to make a point about the Infiniti FX not blending in on the street.

Either way, it's not making me want to buy an Infiniti. Not that I'm in their target demographic anyhow.

Posted by Mark Liberman at 09:41 AM

April 27, 2005

Bad translation?

I just heard a brief interview on BBC World Service with French author Frederic Beigbeder and Frank Wynne, the translator of Beigbeder's novel about 9/11, Windows on the World. The novel was recently announced as having won the Independent Foreign Fiction Prize. The author and translator -- who hadn't met until now, having collaborated entirely by phone and e-mail -- will split the £10,000 prize equally. When asked what it was like working with a translator, Beigbeder -- whose English was excellent -- said: "I speak very bad English, but I can read Frank's work." For a second there, I wondered whether Wynne was being praised or panned.

Posted by Eric Bakovic at 10:13 PM

"This was a total embellishment"

It's not just copywriters. Graphic designers could use a bit of fundamental education in linguistics, too. Mark Swofford at Pinyin News takes a swipe at Chris Calori and David Vanden-Eynden, for the quotes attributed to them in an April 12 article in Metropolis Magazine, under the headline Graphics That Bridge a Linguistic Divide.

I'll refer you to Mark's post for the critique, and here just let Calori and Vanden-Eynden misspeak for themselves. First, their take on Chinese characters as "ideas":

The Chinese language has more than twenty thousand characters in common usage, and they function in fundamentally different ways than Western letters. “Each character is really an idea,” Calori says. “They’re called ideograms. The Chinese can say in four characters what it might take us a paragraph to describe.”

Then the really fun part, their analysis of the Chinese text of the sign whose design they're explaining (苏州国际博览中心 Sūzhōu guójì bólǎn zhōngxīn = “Suzhou International Exposition Centre”). I especially like the part (at the end) about how "this was a total embellishment" and "they wanted the warm, fuzzy heart center, as opposed to the cold, hard center of hell".

“Historically, Suzhou was known for its gardens and greenery,” Vanden-Eynden says. “So the top particle of the first character means ‘grass.’ The second character translates roughly to ‘green state.’

“The next two characters combine to create International. The third character stands for ‘nation;’ the big box around it is ‘mouth’ or ‘center.’ The strokes inside of the box denote ‘jade,’ which is highly prized. The fourth character represents ‘border’--but one part also symbolizes ‘the ear,’ another part ‘to demonstrate.’ So, literally translated, you’re demonstrating that you’re the prize or the center or the mouth.

“How do you describe an Expo? It’s a notion. We shortened it, because ‘exposition’ was too damn long. So what takes place at an expo? Well, a bunch of people and companies get together in one spot to show each other new products and ideas. That’s a lot to describe in one word. The Chinese manage to do it in two characters, which stand for ‘abundant’ and ‘view.’ The top part of the fifth character is ‘noting like it’ or ‘of itself’ (the cross doesn’t have a lot of significance); the second part means ‘subsidiary,’ and the bottom is the unit name for the Chinese inch, which implies a multitude of something. The sixth character is ‘to look’ or ‘view.’ So, an abundant-amount-of-things-to-look at equals expo.”

“The last two characters for Centre--it’s interesting they went with the British spelling--are actually redundant,” Calori says. “Often you see the seventh character--it means ‘middle’--for center. But the client also added the eighth character, which is the symbol of ‘heart.’ The heart is the middle, so they reinforce each other. This was a total embellishment.” Adds Vanden-Eynden: “They wanted the warm, fuzzy heart center, as opposed to the cold, hard center of hell.”

There are more than a few grains of truth in there, but they're embedded in a dense matrix of confusion. I can't resist quoting Mark's comment on the "total embellishment" of the "warm, fuzzy heart center":

This is so wrongheaded and absurd it’s hard to know whether to laugh or cry. The client didn’t add the eighth character (心). It’s used in writing the word for “center,” which is zhōngxīn (中心). The only thing “fuzzy” here is the thinking behind this nonsense.


Posted by Mark Liberman at 07:57 AM

Bad ads

A couple of days ago, I went into a long song-and-dance to explain the grammatical incoherence of an ad for the Infiniti M on pages 2 and 3 of the April 25 New Yorker. Two sentences, 29 words, $200k to run it, and the second sentence is not English. Not informal English, not dialectal English, just what looks like a careless editing error.

I made up a whole one-act-play's-worth of backstory about this, driven by the assumption that everyone involved was competent and careful. Class anxiety, clash of egos, high drama. But now I'm starting to think I was wrong. Maybe advertising copywriters are just ignorant and careless.

The inside of the back cover of the same issue of the New Yorker is an ad for Turning Leaf Vineyards. More black background, here the night-time wall of a McMansion. Warm orange window in the middle, with gauze curtains outlining the shape of a wine bottle. Behind the gauze, the silhouettes of a man and woman at dinner. He's pouring the wine, she's using chopsticks to serve the Chinese take-out. The copy is again in off-white letters, this time in a quiet space in the lower right:

An empty refrigerator
turns into a great excuse

Fine china and silver
make a surprise appearance

And voilá, Kung Pao chicken
becomes Le Kung Pao chicken

OK, no surprise for a New Yorker ad, there's more class anxiety here. Turning Leaf is Gallo's middlebrow brand, aiming for a higher level of snob appeal while staying in contact with everyday life. Adding a French definite article to "Kung Pao chicken" is a fine poetic emblem for that striving. And using an acute accent (voilá) instead of the correct grave accent (voilà) is a poignant, pathetic reminder of the potential for humiliation that social climbers expose themselves to.

Was this some copywriter's ironic subversion of the campaign's message, crystallized in one subtle little diacritical error? I doubt it. My money is on the theory that no one associated with the campaign knows any better. If the agency or the client has anyone literate in French, they weren't paying attention.

Now, I'll freely admit that I'm a careless typist, an occasional misspeller, and the world's worst proofreader. Geoff Pullum deserves course relief from Santa Cruz for all the time he puts into correcting my posts. But if I were spending $100,000 to put a full-page ad onto the back page of the New Yorker, with 26 total words of copy, I think I could manage to check the spelling.

Posted by Mark Liberman at 07:02 AM

April 26, 2005


Apparently in response to my April 19th joke, the top ten listing on amaztype™ zeitgeist, in the TITLE in ALL MEDIA category, is now (Apr 26, 2005, 22:10:00 GMT)

1 LINGUISTICS 2889 hits
2 SEX 2559 hits
3 LANGUAGE 1442 hits
4 FUCK 1148 hits
5 TOM HANKS 883 hits
6 FLASH 482 hits
7 PORN 442 hits
8 BOOBS 393 hits
9 LOVE 379 hits
10 HARRY POTTER 351 hits

Even more amaz-ingly, the top ten in the AUTHOR in BOOKS category:

1 YUGO 1444 hits
2 LEONARD TALMY 591 hits
3 MARK LIBERMAN 434 hits
4 NEIL GAIMAN 327 hits
5 STEPHEN KING 262 hits
6 ARNOLD ZWICKY 249 hits
8 ILLIAD 135 hits
9 SCOTT MCCLOUD 127 hits
10 PICKOVER 351 hits

So who is this "Yugo"? Searching Amazon for "Yugo" in the Author field turns up, in order, Nuclear Reactor Safety Heat Transfer, by Dubrovnik, Yugo Summer School on Nuclear Reactor Safety; El triunfo de la gracia sobre el pecado, by Pedro Yugo Santacruz; The recycling of plastic wastes in packaging, by Yugo Suzuki; Semeynyye obryady i verovaniya karel, by Yugo Yul'yevich Surkhasko; and Hokkaido no shizenshi: Hyoki no shinrin o tabisuru, by Yugo Ono. Perhaps there's a signal there...

Anyhow, once past that obscure and perhaps-coded reference to "Yugo", there's Len Talmy! Go Len! I'm honored to be on a list with him, not to speak of Stephen King, Mercedes Lackey, Illiad and Scott McCloud.

Here's a short Illiad sequence with a lexicographical theme:



And here's Scott McCloud's site.

But I wonder why Patrick Farley isn't on the list? Or Rob Balder:

Language and gender: the cartoon version

Simon Baron-Cohen has been promoting the idea that autism is a symptom of an "extreme male brain" -- runaway male-associated systematic and analytic thought with associated deficits in female-associated social and empathetic processing. It makes me nervous when a scientific theory lines up so nicely with current cultural stereotypes. An earlier Language Log post featured Marcel Just's alternative idea -- that autism is a lack of neurological coordination. And a year ago, I discussed some examples of the problems that come up when scientific research engages gender stereotypes about language use.

Today, we'll look at some cartoon versions of these ideas.

Yesterday, Tank McNamara got his first lesson in the International Women's Code:

...and today he verified the translation:

A couple of weeks ago, Sara Toomey demonstrated Jeremy Duncan's social and interactive cluelessness (from Zits):

Finally, Deadlock. This is a wonderful account of dating as game theory, by Dan Zettwoch. It was previously discussed on Language Log 11/24/2003, but deserves to be linked again.

There are hundreds of similar examples out there -- 10-20% of all Cathy strips, for a start. For example, this year's Cathy series on income tax preparation (an annual feature) began on April 3 with a strip about gender styles

and ended on April 16 with another one:

With so much evidence to support them, how could these ideas be wrong?

Posted by Mark Liberman at 07:46 AM

Strange bookfellows

Q: What do Geoff Pullum and Emily Dickinson have in common?

A: They are the only two authors in whose works the phrase gratuitous capitalization is currently identified by amazon.com as "statistically improbable".

Let me re-phrase that, with the help of amazon's "learn more" pop-up for "Statistically Improbable Phrases (SIPs)":

Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most distinctive phrases in the text of books in the Search Inside! program. To identify SIPs, our computers scan the text of all books in Search Inside. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside books, that phrase is a SIP in that book.

SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside. For example, most SIPs for a book on taxes are tax related. But because we display SIPs in order of their improbability score, the first SIPs will be on tax topics that this book mentions more often than other tax books. For works of fiction, SIPs tend to be distinctive word combinations that often hint at important plot elements.

Click on a SIP to view a list of books in which the phrase occurs. You can also view a list of references to the phrase in each book. Learn more about the phrase by clicking on the A9.com search link.

But the funny thing is, "gratuitous capitalization" only occurs once in Geoff Pullum's The Great Eskimo Vocabulary Hoax and Other Irreverent Essays on the Study of Language, and once in The Complete Poems of Emily Dickinson. So can it really be true that this is a phrase that "occurs a large number of times in [those] particular [books] relative to all Search Inside books"?

It seems misleading, in ordinary language terms, to say that once is "a large number of times".

Nevertheless, there can be a plausible argument for characterizing a phrase that occurs only once -- or perhaps never occurs at all -- as more or less "statistically improbable". This is a point that Noam Chomsky got wrong in 1957, but it's a commonplace idea by now.

It's ironic that Geoff -- the author of the Once-is-Cool-Twice-is-Queer (OICTIQ) principle for linguists and philologists -- is tagged by amazon for a "statistically improbable phrase" that he used only once. All the same, this might be a feature of the SIP algorithm rather than a bug. In an earlier post, I asked "how many times does a word or phrase need to be repeated in order to seem characteristic of a speaker or author?" and answered "not very many times, maybe only once or twice, if the use in context is salient enough".

This might be such a case -- it must be admitted that the phrase gratuitous capitalization does, as amazon puts it, "hint at important plot elements" in Geoff's oeuvre.

Still, I'd like to know more about the algorithm that amazon is using. As I observed in the previously-cited post

Simple ratios of observed frequencies to general expectations will not work..., because ... such tests will pick out far too many words and phrases whose expected frequency over the span of text in question is nearly zero.

This is an instance of the problem that troubled Noam Chomsky in 1957. There are many, many two-word sequences in Geoff's book that do not occur at all in the other works indexed so far by amazon's "search inside" program. Looking at the context in which gratuitous capitalization occurs in Geoff's book, the immediately following sentence is

The harsh yoke of (e.g.) Academic Press and MIT Press copy-editing practices imposes on authors pointless and information-destructive capitalization of `significant' words (roughly, words that belong to the categories, N, A, or V) in titles.

Choosing at random, I find (by searching on A9.com) that the sequence "authors pointless" occurs in no other work known to amazon.com (check the returns for books, not the results borrowed from Google...). So why is "authors pointless" not in the SIP list for The Great Eskimo Vocabulary Hoax? Amazon must be doing something clever.

Ah, you may say, but "gratuitous capitalization" is a syntactically and semantically meaningful unit, while "authors pointless" is not. This is certainly an issue for such algorithms -- SIPs ought to be meaningful phrases of some sort, not just random uncommon word sequences. However, it's not obvious that amazon's cleverness is based on this sort of linguistic analysis of the content of the books indexed. Take a look at the actual occurrence of "gratuitous capitalization" in The Complete Poems of Emily Dickinson (pages x-xi of the Front Matter, written by the editor Thomas H. Johnson):

I have silently corrected obvious misspelling (witheld, visiter, etc.) and misplaced apostrophes (does'nt). Punctuation and capitalization remain unaltered. Dickinson used dashes as a musical device, and though some may be elongated end stops, any "correction" would be gratuitous. Capitalization, though often capricious, is likewise untouched. [emphasis added]

So in this case the "statistically improbable phrase" is no phrase at all, but a word sequence spanning a sentence boundary.
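Here is a minimal sketch of how that could happen, assuming (and this is only my guess) that the indexer lowercases the text and discards punctuation before it looks at adjacent word pairs. The function names are made up for illustration; nothing here is known to be amazon's code.

```python
import re

def degraded_tokens(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def bigrams(tokens):
    """Adjacent word pairs from a token list."""
    return list(zip(tokens, tokens[1:]))

def sentence_bigrams(text):
    """Word pairs that never cross a sentence-final punctuation mark."""
    pairs = []
    for sentence in re.split(r"[.!?]+", text):
        pairs.extend(bigrams(degraded_tokens(sentence)))
    return pairs

johnson = ('any "correction" would be gratuitous. '
           'Capitalization, though often capricious, is likewise untouched.')
target = ("gratuitous", "capitalization")

crosses_boundary = target in bigrams(degraded_tokens(johnson))
respects_boundary = target in sentence_bigrams(johnson)
print(crosses_boundary, respects_boundary)
```

With punctuation thrown away, the pair turns up; if sentence boundaries are respected, it does not.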

On the other hand, looking over some longer lists of Statistically Improbable Phrases, it does seem that they are limited to things that are plausibly phrases to start with. (See for example the SIP list for Ray Jackendoff's Foundations of Language.)

So here's what seems to be going on:

  1. amazon is indexing books by a method that throws away all punctuation, case (and stop words?), and identifying possible SIPs by reference to (2- and 3-element?) subsequences of the resulting degraded strings;
  2. amazon is limiting SIPs to things that are plausibly phrases in a linguistic sense, as they might occur in undegraded text, independent of their context of occurrence in any particular work -- or they are imposing some other condition that has this effect;
  3. candidate SIPs (identified as in [1], and limited as in [2]) are accepted iff their probability (estimated from a model derived from all books indexed) is below some threshold (and perhaps if some other conditions are met).

I'm pretty sure about [1] and [3] (though I'd like to know more about the probability estimation method, and any other conditions that may be used). [2] is the part that is least clear to me. All the methods that occur to me will either miss genuinely characteristic phrases (problems with "recall"), or flag sequences that should not be considered phrases at all (problems with "precision").
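To make the three guessed-at steps concrete, here is a toy sketch. Everything in it is an assumption on my part: the stop list standing in for the phrase filter in [2], the use of raw relative frequency as the "probability" in [3], the threshold value, and the three tiny "books". It is not a reconstruction of what amazon actually does.

```python
import re
from collections import Counter

# Hypothetical stop list standing in for whatever filter step [2] really uses.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}

def bigram_counts(text):
    """Step [1]: drop case and punctuation, then count adjacent word pairs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(tokens, tokens[1:]))

def plausible_phrase(bigram):
    """Step [2], crudely: reject any pair that contains a stop word."""
    return not any(word in STOP_WORDS for word in bigram)

def candidate_sips(book, other_books, threshold=0.1):
    """Step [3]: keep pairs whose relative frequency over all books is below threshold."""
    in_book = bigram_counts(book)
    overall = Counter(in_book)
    for other in other_books:
        overall.update(bigram_counts(other))
    total = sum(overall.values())
    return sorted(
        " ".join(bigram)
        for bigram in in_book
        if plausible_phrase(bigram) and overall[bigram] / total < threshold
    )

book = "Gratuitous capitalization is a copy editing habit. Copy editing is hard."
others = [
    "Copy editing takes time. Copy editing pays badly.",
    "Good copy editing is rare.",
]
sips = candidate_sips(book, others)
print(sips)
```

On this toy data the sketch keeps "gratuitous capitalization" and drops "copy editing", which is common across all three texts. It also keeps "habit copy", a pair that straddles a sentence boundary and is no phrase at all, so a stop list by itself is plainly too weak a stand-in for [2].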

A few minutes of poking around turned up plenty of other mistakes like the Emily Dickinson one, where a SIP is not actually being used as a phrase in the cited context, but no examples at all where a SIP might not plausibly be a meaningful phrase in some context. Thus amazon must be tuning its algorithm (sensibly) for high precision at low(er) recall. But I'd still like to know how it works.

And I'm distressed to learn that Geoff and Emily are not really textual siblings after all.

Posted by Mark Liberman at 06:07 AM

April 25, 2005

And the answer is: abemus

Never in the history of blogging was a post so rapidly and decisively refuted and crushed as my profoundly ignorant remark on Cardinal Estevez's h-less pronunciation of habemus papam. The best defense that could be offered is that my post was partially right: the language is called Latin, it does have a verb habere, and papa (accusative form papam) does mean "pope". But the accurate content of my post mostly stops there. A number of correspondents with more knowledge of Latin than I will ever have (I who failed high school Latin at the age of 16 and never got much better at it than I was then) wrote lengthy emails to correct me.

Eliah Hecht happened to have just been reading the book I should have looked at (if the library had been open, or if I had owned the book): W. Sidney Allen's wonderful Vox Latina. And it reports that /h/ had started to disappear by the end of the Roman republic, as various omissions and misapplications show (you get ORATIA for HORATIA, AUET for HAUET, and so on); Allen says that "by the classical period in fact knowledge of where to pronounce an h had become a privilege of the educated classes." The educated Roman classes, that is.

Geoff Nathan confirms this: the /h/ was gone in Latin by the third century CE or so, and the Appendix Probi (a third-fourth century prescriptive spelling manual for Latin) has corrections that put h's back in, a key sign that the sound had all but disappeared.

But we're not half done with how wrong I was. Nathan Vaillette points out to me that

if Cardinal Estevez was not speaking flawless *Classical* Latin, you still can't complain about his *Ecclesiastical* Latin pronunciation. This norm seems to be (semi)standardized and established. For instance, the following page on the Global Catholic Network site ("adapted from the Liber Usalis [sic], one of the former chant books for Mass and Office") tells us not to pronounce orthographic "h":


I found several choral sites with similar recommendations for singing church Latin, e.g.


(I also seem to remember that John F. Collins' popular Primer of Ecclesiastical Latin says the same about "h", but I don't have it in front of me.)

Furthermore, if indeed Estevez spoke Latin with "a Chilean Spanish accent you could cut with a knife", you would expect the [b] in "habemus" to come out as a voiced bilabial fricative. And I'm pretty sure Chilean Spanish is among the "s" aspirating varieties, where [s] in some environments—syllable codas at least—is either replaced by [h] or lost completely. So unless you heard [aBemuh papam], where [B] = voiced bilabial fricative, I think you're being a tad harsh.

That last point is so obvious that I actually knew the facts, but forgot to apply them. He did indeed have a [b] between vowels, not a bilabial fricative [β]. I'm writhing on the floor with humiliation here.
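For anyone who wants Nathan's prediction spelled out mechanically, here is a rough sketch. It treats spelling as if it were pronunciation and applies three rewrite rules in order: silent h, [b] becoming [B] between vowels, and s becoming [h] when no vowel follows (a crude stand-in for "syllable coda"). The function name and the rule formulations are illustrative assumptions, not a serious model of Chilean Spanish.

```python
import re

VOWELS = "aeiou"

def predicted_accent(latin):
    """Apply three simplified rewrite rules, in order, to an orthographic string."""
    s = latin.lower()
    s = s.replace("h", "")                                   # written h is not pronounced
    s = re.sub(rf"(?<=[{VOWELS}])b(?=[{VOWELS}])", "B", s)   # b -> [B] between vowels
    s = re.sub(rf"s(?![{VOWELS}])", "h", s)                  # s -> [h] unless a vowel follows
    return s

prediction = predicted_accent("habemus papam")
print(prediction)
```

The h-deletion rule has to run before the s rule, since otherwise the newly created [h] would be deleted too.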

There is more. John Cowan writes to point out that the Cardinal was almost certainly using

the standard pronunciation of ecclesiastical Latin ("c" and "g" as in Italian, vowel length lost, etc.). In that tradition, written "h" is not pronounced except intervocalically, where it is pronounced /k/; thus mihi is ['miki].

We see an example of /h/ > /k/ in the name of the letter "h": /'aha/ > /'aka/ > OF /atS@/ > ME > ModE /eitS/. Some people supply an unhistorical /h/ at the beginning to make it /heitS/; this is often heard in Ireland, and it's said that terrorists on both sides have used this feature to separate h-ful Catholics from h-less Protestants.

The English names of the letters, because they have never had a standard written orthography, are a juicy example of "pure" sound-change and analogy at work; they were apparently invented by the Etruscans (an Etruscan?) and borrowed into Latin, and tracing them tells us both the Latin > Old French and the ME > ModE sound changes as well as the history of the modern Latin alphabet!

John McChesney-Young has written with more details (I am too exhausted to repeat them), citing another important book that I was too slothful to get up out of my reclining chair to go and check, Sturtevant's Pronunciation of Greek and Latin (see pp. 155-157). And Bob Kennedy writes from Santa Barbara's Institute for Social, Behavioral and Economic Research to provide additional evidence of vacillation with [h-] in Classical Latin:

A poem by Catullus mocks someone who hypercorrectively inserts [h] at the starts of vowel-initial words. The man in the poem is named Arrius, and "hinsidiously" insists on referring to Ionia as Hionia.

There is a discussion at http://community.middlebury.edu/~harris/Texts/catullus3.html, which includes this:

The Romans had trouble with the initial aspirate / h /, which they sometimes omitted, other times produced without reason. The wide prevalance of Romans as soldiers and adminsitrators in the Greek speaking world may account for the fact that the Greek grammarians of Alexandria felt it necessary to introduce the "smooth and rough breathing" marks at the start of Greek words which have an initial vowel. Everyone in a decent position at Rome had to know Greek, but this Latin Cockneyism would still be a problem for men like Arrius when they tried with difficulty to talk in public.

This morning in Language Log Plaza little knots of staff writers were talking to each other in low voices and then breaking off when I came by. Now when I go into our ground-floor coffee shop, the Latté Linguistica, people get theirs to go so that they won't have to talk to me; they rush off, or pretend to be looking down into their coffee cup as if they thought they'd seen a bug floating in it... I'm being ostracized. I made a remark on Language Log without doing my fact-checking. I am the lowest form of linguistic slime. I am no better than a BBC science reporter.

I am probably not going to be here very much longer. The call will come to present myself in the Big Office where MYL sits, and after a brief and painful talking-to I will be introduced to the security guard who will help me carry the things from my desk to the front door. Then they will shut down my email account and scrub the hard disk on my desktop machine in preparation for handing it to the new staffer who will replace me.

Which, on the bright side, will at least save me from having to answer quite a lot of email. It occurs to me that there are about a billion Catholics, and so far I have only heard from two or three of them.

Posted by Geoffrey K. Pullum at 12:57 PM

News about brain structure in Williams Syndrome

In the latest Journal of Neuroscience, there's an interesting paper about brain structure in Williams Syndrome, a disorder caused by deletions of variable length in a gene on chromosome 7 (7q11.23) that codes for the connective-tissue protein elastin, and perhaps in other adjacent genes. Among the many symptoms of the syndrome are mental retardation with hypersociability, relatively spared language, and relatively spared musical abilities that sometimes rise to savant levels. What's new in this paper is a systematic and thoughtful examination of differences in brain structure between WS subjects and controls.

The reference is Thompson PM, Lee AD, Dutton RA, Geaga JA, Hayashi KM, Eckert MA, Bellugi U, Galaburda AM, Korenberg JR, Mills DL, Toga AW, Reiss AL. "Abnormal Cortical Complexity and Thickness Profiles Mapped in Williams Syndrome." Journal of Neuroscience, 25(16):4146-4158, April 20, 2005.

The background finding (in keeping with earlier studies) is one of general reduction in brain size, and especially in "white matter" (i.e. neuronal interconnections consisting of myelinated nerve fibers, as opposed to "grey matter", consisting mainly of cell bodies):

...the WS group had greatly reduced overall brain volumes (...left and right hemisphere volumes were reduced by 13.3 and 12.2%, respectively)...
... this overall deficit was attributable to primarily a far more dramatic reduction in white matter (left hemisphere, -18.0%...; right hemisphere, -18.3% ...) than gray matter, although gray matter also was reduced severely (left hemisphere, -6.8%...; right hemisphere, -6.2%...).
[The] WM deficit was found somewhat uniformly across all lobes (frontal, -15.9%; parietal, -18.4%; temporal, -21.2%; occipital, -20.0%...). Lobar GM volumes also appeared uniformly reduced (frontal, -5.9%; parietal, -7.4%; temporal, -5.3%; occipital, -8.6%).

However, against this background, there were striking local exceptions:

The WS group had greatly increased cortical thickness in a large neuroanatomical region encompassing the perisylvian language-related cortex. This region surrounds the posterior limit of the Sylvian fissures and extends inferiorly into the lateral temporal lobes (Figure 4c, red colors denote a 10% thickening of the cortex relative to controls). The region of significant thickness increases also extended over the inferior surface of the right temporal lobe (Fig. 4e) into the collateral and entorhinal cortex. This region included the fusiform face area, which processes facial stimuli, a cognitive ability in which WS subjects show notable strengths.

Here's the picture from their Figure 4c:

The authors also

...developed an algorithm to measure the fractal dimension, or complexity, of the human cerebral cortex, based on a previous algorithm that we developed for mapping the complexity of deep sulcal surfaces in the brain (Thompson et al., 1996).

According to this measure,

Cortical complexity was ... significantly increased in WS for both brain hemispheres.

The differences were small (e.g. left hemisphere Williams 2.2522 +/- 0.0016 SE, left hemisphere controls 2.2457 +/- 0.0014) but significant (p < 0.00145; two-tailed t test in this case). The authors comment that

Although these differences appear to be small in magnitude (~0.1– 0.3%), this can be misleading because they are computed from a log–log plot, in which small differences in slope translate into very large differences in gyral complexity.
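As a back-of-the-envelope check on the left-hemisphere numbers quoted above, the relative difference and the size of the gap in standard-error units can be worked out directly. This is only a sketch: it combines the two reported standard errors as if the groups were independent, which may not be exactly the test the authors ran, and it makes no attempt to reproduce their p value.

```python
import math

# Left-hemisphere fractal dimension, mean and standard error, as quoted above.
ws_mean, ws_se = 2.2522, 0.0016
control_mean, control_se = 2.2457, 0.0014

diff = ws_mean - control_mean
rel_pct = 100 * diff / control_mean

# Standard error of the difference, assuming independent groups (an assumption).
se_diff = math.sqrt(ws_se ** 2 + control_se ** 2)
ratio = diff / se_diff

print(f"difference: {diff:.4f} ({rel_pct:.2f}% of the control mean)")
print(f"difference / standard error: {ratio:.2f}")
```

The difference comes to a bit under 0.3% of the control mean, at the top of the authors' "~0.1-0.3%" range, and to roughly three standard errors: small, but not easily dismissed as noise.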

As their plot shows, there is a great deal of overlap in the values for individual subjects:

Their discussion of functional interpretations is interesting:

One simplistic interpretation is that thicker cortex is better, and that regionally thicker language cortex in WS subjects may account for their verbal strengths and unusually expressive language. By a similar argument, WS subjects are also prone to seek the gaze of others (Mervis et al., 2003), and the thicker cortical region in WS also encompasses the superior temporal sulcus, an area important in face and gaze processing (Kanwisher et al., 1997; Zeineh et al., 2003). However, this interpretation is unduly simplistic for several reasons. First, WS subjects have relatively intact language, but they do not outperform controls, which would be implied by the idea that thicker cortex is better (Haier et al., 2004). Second, WS subjects do not have enhanced function in other systems with thicker cortex (e.g., posterior and lateral occipital and inferior occipital-temporal regions), which subserve visuospatial functions impaired in WS. Third, a similar thickening of perisylvian cortex in fetal alcohol syndrome (FAS) (Sowell et al., 2002b) is not associated with better language function.

They add that

In both WS and FAS, excess cortical gray matter is most likely a result of a failure of cortical formation during gyrogenesis or a concomitant failure or delay in myelination, perhaps specifically in subcortical U-fibers (conventional MRI cannot distinguish these two possibilities).

In thinking about genetically-related brain structure/function correlations, I always wonder how much of such patterns is genetically determined in an open-loop sort of way, and how much depends on developmental processes that involve feedback from experience (which might amplify initial differences in ability, interest and motivation). So I was very happy to see the authors explicitly raise those issues:

Cortical architecture in WS is likely influenced by haploinsufficiency for specific deleted genes, but it is also dynamic and environmentally influenced throughout life. This study correlates genetic mutation and anatomical change, but causality cannot be determined. There is no way to make any simple categorical interpretation of genetic and nongenetic influences, because these cannot be disentangled, and both may occur downstream of a genetic lesion. The observed cortical thinning may be shaped primarily by negative genetic influences (that impair parietaloccipital structure and function). Nonetheless, the cortical increases may represent increased use or overuse of specific networks. Even if the thickening represents an adaptive response to the genetic deletion, whether or not it is functionally advantageous cannot be assessed without additional testing. Conversely, a cortex that appears relatively intact on MRI is not necessarily functionally intact. The notion that there are functions left intact in developmental disorders is likely incorrect; massive reorganization is likely standard across developmental disorders, and the resultant functionality is probably deviant (Karmiloff-Smith et al., 1997; Mills et al., 2000; Thomas and Karmiloff-Smith, 2002; Grice et al., 2003). In particular, language processing, musical abilities, and face processing in WS are not par with normal performance (Karmiloff-Smith et al., 1998, 2004).

Overall, this is fascinating work, not least because it's a welcome corrective to the simplistic interpretations that are sometimes given to the relative sparing of linguistic abilities in this syndrome.

Posted by Mark Liberman at 11:51 AM

Think on, think off

Almost two weeks after Geoff announced the end of his public radio station's pledge drive, I'm still suffering through mine. This morning, I heard this wonderful attempt to break up an idiom (paraphrasing slightly; I didn't record the exact quote):

We know you've been thinking about becoming a member off and on during our pledge drive. This morning, we want you to think about it on.

[ Comments? ]

Posted by Eric Bakovic at 10:17 AM

Save those scraps

A big posthumous payday for Norman Mailer's mom.

Posted by Mark Liberman at 09:43 AM

Better a spectacular blunder than a hint of unseemliness

In the April 25 New Yorker, pages 2 and 3 are a spread for the "all-new Infiniti M". The right-hand page shows a driver's view of the high-tech cockpit in glowing beige and brown. Above the picture, a few words of normal-looking text tell us about the Lane Departure Warning System, the Bose Studio Surround Sound, the Bluetooth Wireless Technology, and the exhilarating 335 horsepower.

On the left-hand page, the cockpit photo fades elegantly into a warm brownish blackness, against which enormous glowing off-white letters are laid out as if on a surface slanted away from us toward the center of the spread -- an open driver's door? -- with textual perspective lines leading us back into the picture to the right:

Designed to think the way you do, the technology is smart, simple and invisible. A fresh new contemporary space that takes luxury into the wireless modern world it belongs.

Uh, like, "to"? Or "in"?

Unless I'm being really dense, or missing some language change in progress here*, that second sentence is ungrammatical. Not non-standard-English ungrammatical, not made-up strunkadelic pseudo-rule ungrammatical, but just plain everybody-knows-it's-wrong inco-freaking-rrect.

What's the story here? This two-page ad must have cost Nissan North America about $200,000 to run, and Lord knows how much to design, so we can assume that the copy was proofread once or twice. Surely this is not a typo.

Well, I have a theory.

Although "to" or "in" would fit the lay-out easily, the other obvious alternative wordings wouldn't: "the wireless modern world where it belongs"; "the wireless modern world to which it belongs"; "the wireless modern world in which it belongs". For any of these, you'd have to change the font sizes and redo all the line divisions. That would be hard, since the existing lines are only 15 or 16 characters long. To add the five characters of "where" or the eight characters of "to which" would take some big changes, spoiling the whole feel of the lay-out.

So here's what I think happened. The copy started out as "...the wireless modern world it belongs in", or "... the wireless modern world it belongs to". Then at the last minute, someone at Nissan North America looked at the ad and said "Wait a minute, that sentence ends with a preposition. What will people think?"

The ad agency team (from TBWA\Chiat\Day, according to BrandWeek) trotted out the usage books that say stranded prepositions are OK, and even the fake Churchill quote. But the auto execs weren't having it: "This is a luxury car, you can't use that tacky syntax!" What to do? There was no room for "where" or "to which", and no time to re-do the whole thing. Other local substitutions raised other troubling associations: "the wireless modern world it controls"? "... deserves?" "... inhabits"? "... inherits"? "... traverses"? In desperation, they decided to leave the preposition out, hoping that most people wouldn't notice. Better a genuinely mistaken sentence than the social anxiety associated with violating an absurd "rule" that was invented out of thin air by John Dryden in 1672 and has been scorned by every competent expert since.

This is all just a theory, mind you. I'll let you know if I hear another story from anyone in a position to know.

* To be fair, it's not totally out of the question that some people might be moving in the direction of using "belong" with a location as direct object. There's a model in relative clauses with the place as the head:

Lay out all the cards and the drawings and work with your child to match each machine with the place it belongs.
The place that I belong right now is home.

However, other words of similar meaning don't work for me in the same construction:

???...match each machine with the location it belongs.
???The location that I belong right now is home.

...though Google finds a few people who think this sort of thing is fine:

...copy the new directory into the location it belongs.
...returning anything out-of-place to the location it belongs.
Either send it to the location that it belongs or send it to Susan Fanning and ask that she forward it to the proper person.

Even place doesn't work for me other than in such relative clauses:

*This machine belongs that place.

and in this case, Google doesn't seem to find any native speakers who disagree with me. I think this is something about place, not something about belong. All these internet examples (with place as head and other verbs in the relative clause) are fine for me:

Check out the place that we're going, Burke's Canoe Trips,
All the places I've looked just have it up for sale without listing what the DVD contains.
Congress needs a crash course in Internet technology followed by a swift kick in the place it sits.

even though all the non-relative-clause versions strike me as hopeless or at least questionable even with place, and worse with other noun phrases for locations:

*We're going a place on the river.
*We're going Burke's Canoe Trips.
?I've looked many places.
*I've looked many music stores.
*He's sitting a place halfway between his shoulder blades and his knees.
*He's sitting his rear end doing nothing.

The string "the world that it belongs" occurs 40 times in Google's index, and none of them have the structure required by the Infiniti ad. The string "the world it belongs" occurs 1,940 times, mostly irrelevantly:

Before Him, individual distinctiveness belongs to the 'nothingness' of being in the world. It belongs on a basis that is not God...
And the world it belongs to me...
But world music is still something that belongs to the world. It belongs to the people.

There are too many for me to want to check them all, but after looking at the first couple of hundred hits, I don't think we're going to find anything like the structure used in the Infiniti ad. If there's a change in that direction, it's way too early to use it in a luxury car ad.

[Update 4/29/2005: Neal Whitman emails:

I've noticed the kind of missing preposition that you noticed in the Infiniti ad, too. Sometimes I think the situation is what you hypothesize. In fact, I think your analysis of this ad is on the money, since it really does sound bad. Other times, though, deletion of the preposition can be gotten away with:

1. It works with antecedent-contained deletion. For example, someone giving me advice on exiting a tight parking space said, "Go out at the angle you came in." Not "...at the angle you came in AT." But the key is that the 'at' already appears earlier in the sentence. I'm not sure of the exact conditions when this can happen, but I'm pretty sure the name for it is "antecedent-contained deletion." In this case, the preposition omission is almost obligatory, since (to my ear) the repeated 'at' sounds funny, even though it parses out right. In fact, maybe the 'into'/'to' repetition was close enough to activate the ACD rule in the ad-writers' grammar, but just not in yours or mine.

2. Or, as you note, the noun heading the adverbial relative clause might be a special one such as 'place,' which allows the omission of a needed preposition. These have been written about by Richard Larson in a couple of LI papers in the 1980s, and by McCawley. And by yours truly, in a 2002 issue of Journal of Linguistics (where full bibliographic info on the other sources is listed).

That's Whitman, N. (2002) "A categorial treatment of adverbial nouns." Journal of Linguistics 38.521-597]

[Update 5/1/2005: Andrew Palumbo observes that I could use negative conditions like -"belongs there" to eliminate spurious matches (along perhaps with some real ones) from the Google search for other examples of phrases like "into the world it belongs". The search {"into the world it belongs" -"belongs there" -"belongs to" -"belongs in"} returns nothing at all; {"into the * world it belongs" -"belongs there" -"belongs to" -"belongs in"} returns one spurious hit; {"into the * * world it belongs" -"belongs there" -"belongs to" -"belongs in"} returns only this very page itself!

We can combine negative conditions with a wildcard "*" {"into the * it belongs" -"belongs there" -"belongs to" -"belongs in" -"the place it" -"the places it"} to find 94 possible examples of this construction with head words other than place or places, such as

Either move in behind it, or pass it, giving it opportunity to move back into the lane it belongs.
As you continue to evaluate, improve, and adjust it will bring your marriage back into the arena it belongs.
...you have to make your trail map narrow to fit say 400 or 450 pixels wide to get that map back into the area it belongs...
Ask the priest to go kick some other church's backside into the spot it belongs.
The ABRA's goal is not to take over the barrel racing industry, but rather to take the barrel racing industry back into the Hands it belongs.
The title of Chapter 18 puts oral sex into the context it belongs...
Album reviews, like most other reviews, should either give the album kudos, or knock it into the shitcan it belongs.

All of these strike me as pretty bad, but after I've read 50 or 60 of them in a row, they're beginning to move out of the WTF category. ]

Posted by Mark Liberman at 12:42 AM

Habemus or abemus?

I only just noticed, when NPR played back a selection of "voices of the week" this morning, that what Cardinal Estevez actually said as he announced the choice of Cardinal Ratzinger was (in phonetic transcription) [a'bemus 'papam]. No [h] on the first word. According to the St. Louis Review, the weekly newspaper of the archdiocese of St. Louis, "At 6:40 p.m., Chilean Cardinal Jorge Medina Estevez, the senior cardinal in the order of deacons, appeared at the basilica balcony and intoned to the crowd in Latin: 'Dear brothers and sisters, I announce to you a great joy. We have a pope.'" Well, I'm sure it was supposed to be in Latin, but unless I am much mistaken, Latin would have had that [h]. Hence the spelling. Of course, I am not philologist enough to know the exact century when the [h] disappeared (as it certainly did: there is no [h] in Spanish or French or Italian, and I can't name any modern Romance language that has preserved it; philologist acquaintances, please correct me if I'm wrong), so the Cardinal could perhaps be defended on the grounds that he was using the Latin of some later period when the [h] was already gone. But my money would be on the simpler hypothesis that he speaks Latin with a Chilean Spanish accent you could cut with a knife.

[Added later: Actually, just about everything in this post is wrong except that Cardinal Estevez may indeed come from Chile and may have spoken at roughly twenty to seven. Many philologist acquaintances and even total strangers have rebuked me on Latin pronunciation issues, some very sternly indeed. Click here for awful details of my rank ignorance. It's going to be a long time before I get invited to any classics parties or Catholic church events, that's for damn sure.]

Posted by Geoffrey K. Pullum at 12:16 AM

April 24, 2005

The King of Linguistics

Thanks to Mark Liberman's having pointed us to the amaztype zeitgeist site, I can now report that the title of King of Linguistics has been taken, by one Andrew Laidlaw, a young white UK hip hop artist who performs under the name MC (or M.C.) Unique. The title is, presumably, a boast about his facility with the English language.

How did I find this out? Well, first, zeitgeist informed me, a few minutes ago, that LINGUISTICS had now pushed PHP out of the #10 slot in the rankings of words recently requested by users of amaztype (which spells out words using thumbnails of titles, from the amazon.com database, with those words in them). (Meanwhile, LANGUAGE is holding on at its recently achieved #3 position, behind SEX and FUCK.) So I went to the amaztype homepage and checked on LINGUISTICS in book titles; there are, no surprise, a great many such titles. No dvd or video titles with LINGUISTICS in them, however. And only one music title with LINGUISTICS in it: yes, MC Unique's "The King of Linguistics", named after one of the raps on the cd.

While I was playing with the software, I checked out LINGUIST. No music, but there's a series of dvds entitled "X 101 (Learn to Speak X with the Travel Linguist)", where X is French, German, Italian, (Brazilian) Portuguese, or Spanish.

But it's shocking that linguistics and linguists haven't been celebrated in the titles of music and films. "Lay, Linguist, Lay". "The Lilt of Linguistics". "Let Me Call You Linguist". "The Lady from Linguistics". "Lassie Saves the Linguists". "The Last Linguist". So many possibilities, not one of them yet exploited.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 10:47 PM

Tho Fan

Here's a welcome contrast with the shabby treatment of Said el-Gheithy, the inventor of Ku (= "Ch'toboku"). When Stephen Totilo wrote on 4/19/2005 in the NYT about the language Tho Fan, invented for an X-Box game called Jade Empire, linguist Wolf Wikeley is front and center. With a picture and everything.

Now we need to work on those linguists' consulting fees. The NYT article says that BioWare paid Wolf "just over $2,000" for four months of work. Unless he gets big-time residuals, that must be way under minimum wage, even if it was US dollars rather than Canadian ones. (And the University of Alberta Linguistics Department web site suggests that he had to take a leave to do it.) I'm sure BioWare can't get programmers or graphic artists to work for wages like that.

Wikeley says he had fun ("This was really a dream job because as a hobby I write and as a profession I work with languages so to combine those, working creatively and scientifically, was a blast"), but all the more reason to give him a proper payday, says I.

[The University of Alberta press release is here.]

Posted by Mark Liberman at 01:54 PM

Rampant malaprops, companionable

Not all substitutions of one word for another are eggcorns: some are typos, some are misspellings, some are mishearings, and some are plain old malapropisms, not involving any sort of reinterpretation or reanalysis. Of the malaprops, a few have become rampant, usually because the words in question are similar both semantically and phonologically: militate/mitigate, flout/flaunt, and flounder/founder are familiar examples (and the first two are discussed in the eggcorn database).

New to me, though probably not to more experienced collectors of these things, is eccentric/eclectic, as in the following:

Some of the most fascinating passages of the book are anecdotes in the first chapter about Bouissac's adventures with lions and bears. To dream of running off to join a circus is clichéd; to actually do so is eclectic. (Ken Schellenberg, review of The Pleasures of Time: Two Men, A Life by Stephen Harold Riggins, Lambda Book Report, Jan.-March 2005, p. 25)

I really can't see how running off to join a circus is an eclectic action, in the sense that it combines diverse elements of something or other. But eclecticism is both odd and conspicuous, so you can see how thinking about eccentricity might lead you to eclecticism. Especially when the words eccentric and eclectic are so similar phonologically (and morphologically). If you reach for a fancy word that has the meaning you want, eclectic is likely to be close to hand.

Apparently, lots of people have picked up eclectic on the way to eccentric, so many that some others have come to think that eclectic means something like 'notable and unusual, eccentric', or at least that the two words are related. And lots of other people have connected the two words in their minds and use the two together; they've become companions (something that hasn't happened with the other rampant malaprops above). A Google web search on "eclectic" and "eccentric" in proximity to one another turns up ca. 647,020 hits, including items like these:

Like the big, broad, bountiful country that it is representing in Athens, the 2004 US Olympic team is eclectic, eccentric, brash, rambunctious and very ... (www.freenewmexican.com/artsfeatures/3122.html)
What I mean by eclectic and eccentric actor, is that Johhny Depp not only chooses the deeper meaning roles, or oddly twisted characters, he tries to embellish them with his idea of the characters personality (which he forms from his experiences or people!) (elon.powerfulintentions.com/forum/topic/251)
Block has subsequently written five books about Weetzie and her eclectic, eccentric friends (all five books are now bound into one volume, ... (www.teenreads.com/authors/au-block-francesca.asp)
But call them eclectic, eccentric or quirky, this band's elusive quality has taken them far in the past two decades: across the country and around the ... (redclayramblers.tripod.com/1992carolina_alumni.htm)
Eclectic/Eccentric:. Washington Square Hotel - $110 to $225 103 Waverly Pl Phone: (212) 777-9515 Fax: (212) 979-8373 The North Square Restaurant offers ... (www.sachsnet.com/contact/hotels.html)
HITCHCOCK ON DVD ECCENTRIC, ECLECTIC AND ESSENTIAL. With an abundance of documentaries supporting the films themselves, and a 4-page booklet in each ... (www.urbancinefile.com.au/ home/view.asp?a=7022&s=Features )
An Eclectic(Eccentric?) List by cammykitty, ex poet, aspiring YA writer ... a book from Nancy Willard, one of the most eclectic/eccentric writers alive. (www.amazon.com/exec/obidos/ tg/listmania/list-browse/-/14FMBF4LGY8R3)

I'm not claiming that everybody who uses eclectic and eccentric together sees them as near-synonyms, or at least as overlapping in meaning. Some of the cites above are clearly intended to convey 'both diverse and unconventional' (though sometimes with an extension to new sorts of referents for eclectic, as with that eclectic team and those eclectic friends). With others, I'm inclined to think that the writer was hedging bets by using both words, just to be sure that one of them would convey something in the vicinity of the intended meaning.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:16 PM

Another day, another reprinted press release

Mary Blume has an article in Friday's IHT about Jean-Paul Nerrière's "Globish". Read her article, read about Charles Kay Ogden's "Basic English", and then tell me what's really new here. It's easy to see why Nerrière doesn't tell us about the history of this concept, since he's trademarked "Globish" and is selling various products based on it. But it's harder to understand why the IHT would essentially reprint Nerrière's press release, without giving the context that any culturally literate person should know about.

I'm not a difficult person, really I'm not, but here I am carping about incompetent journalists again. Please send me links to well-researched and insightful articles in the popular press, so I can balance blame with praise. [That's myl at cis.upenn.edu].

Benjamin Zimmer sent email to point out one difference between Ogden and Nerrière -- Ogden could write reasonably well.

In Nerrière's English explanation of Globish, we're invited to "Read the two documents below, in sequence as presented here, and then ask yourself the one and only key question: 'if I wanted to help someone in Zanzibar or Oulan-Bator understand what is the idea behind globish, which of the two documents should I send?'".

Here's the start of the so-called "American version":

This little tidbit of literary joy is amiable and a slam dunk to peruse, notwithstanding the fact that it has the overwhelming gall to propose a revamping of our methods of verbal exchange around the world.

Here's the equivalent portion in "Globish":

This book is easy to read and with pleasure. Still, it proposes a complete change in the way we communicate around the world.

Here's a comment in a more genuinely American idiom: "bullshit". I apologize for using a philosophical term of art, but you can find an explanation by following the links.

Posted by Mark Liberman at 09:46 AM

Words needed for words used for special reasons

Prentiss Riddle at aprendiz de todo asks

There must be a term for bogus content intentionally included in a text to show that the readers don't get it, sort of like easter eggs in software.

This occurs to him in the context of Laura K. and SCIgen. The concept is analogous to the copyright traps in maps (which are apparently not legally effective). I recall being told of a lexicographers' term for similar copyright traps in dictionaries, but I don't remember what it is.

I've recently come across another kind of communicative act whereby words are used for something other than their conventional effect, in a way that doesn't seem to have a conventional name. This is where you say something not because you mean it, exactly, but because it gives you a chance to use a word or phrase you've been saving up. The cartoon version:

There's a possible real-world example right at the start of Matt Taibbi's entertainingly vicious pan of Thomas Friedman's new book The World is Flat. Taibbi describes hearing about the book a few months before publication, under the title "The Flattening" rather than "The World is Flat":

It didn't matter. Either version suggested the same horrifying possibility. Thomas Friedman in possession of 500 pages of ruminations on the metaphorical theme of flatness would be a very dangerous thing indeed. It would be like letting a chimpanzee loose in the NORAD control room; even the best-case scenario is an image that could keep you awake well into your 50s. [emphasis added]

The chimpanzee-at-NORAD thing is a good line, as is the awake-well-into-your-50s tag. The only trouble is that there's essentially no connection to Friedman's book or the rest of Taibbi's review. It's clear that Friedman is the metaphorical chimp -- Taibbi refers to him as "a genius of literary incompetence", and cites chapter and verse to establish the point -- but what and where is the metaphorical NORAD control room? A blank sheet of paper and the concept of flatness? Friedman's copy of MS Word? The NYT bestseller list? The modern world?

I'd guess that this is a witticism that Taibbi heard, or used himself, in some other (more appropriate?) context. He's been looking for a place to use it in writing; this context is only half-way appropriate, but the phrase is primed and ready to go, so out it pops. The ironic thing is that his review's main point is Friedman's thoughtless use of half-appropriate metaphors.

[Send any suggestions to myl at cis.upenn.edu, as your contribution to lowering my IQ. ]

[Update: Andrew Gray emailed:

You mentioned today that: "I recall being told of a lexicographers' term for similar copyright traps in dictionaries, but I don't remember what it is"

Would this be a Nihilartikel?

"A Nihilartikel is a deliberately fictitious entry in an encyclopedia or dictionary, which is intended to be more or less quickly recognized as false by the reader. The term "Nihilartikel" is German and combines "nihil" (Latin for "nothing") and "Artikel" (German for "article"). There does not appear to be any commonly used English-language term for this phenomenon."
- http://en.wikipedia.org/wiki/Nihilartikel

(I have nothing to add, I just like to say Nihilartikel. Which really ties in to the rest of the comment...)

This seems to fit Prentiss Riddle's question almost exactly (depending on what "more or less quickly" and "the reader" mean). It's not the dictionary term I may or may not vaguely remember, which was not a German compound. The gloss first given for Nihilartikel makes it seem unsuitable semantically as well, since a copyright-trap entry is supposed to be hard to spot. However, later in the Wikipedia entry, it says "Besides the obvious possibility of simple playful mischief, Nihilartikels may be composed for other purposes. Chief among these is to catch copyright violators..."

In any case, Nihilartikel is new to me, and I'm glad to learn about it. ]

Posted by Mark Liberman at 08:11 AM

April 23, 2005

Unreliable dialect identification help

The dialect test available if you click here is based purely on words, and in some cases basically slang items (question 10: what do you call an easy course, a crip course, a gut, or a blow-off?). You can decide whether it does a good job of classifying you, but my results were:

Your Linguistic Profile:

40% General American English
35% Yankee
15% Dixie
5% Midwestern
5% Upper Midwestern

Well, the truth is that I'm originally a speaker of middle-class Southern British, with some phonological and lexical features modified over the past 25 years by living on the West Coast of the USA. I've spent no time in Dixie; hardly any in New England; a little in the northern Midwest. The excess of alleged Southern features over Midwestern is hogwash (oh, all right, maybe that's a word from a dialect other than my original one; let's call it balderdash).

Now, you might say that my dialect is confused by my emigration, and it misled the test. But my friend Gerald Gazdar is not confused. He speaks a very cultured 100% Southern British English with no American features whatsoever (he couldn't supply any answer for question 10 at all: they don't have easy courses in Britain), and he got a report of 45% General American English, 25% Yankee, 20% Dixie, 5% Upper Midwestern, 0% Midwestern. Gerald Gazdar one fifth pure Dixie? Y'all bin drinkin' moonshine or sump'n?

So the test is highly erratic when used on British English speakers. Perhaps that's an unfair criterion. Those born and raised in America who have definite ideas about their dialect can check things out for themselves, but in general I'd say it will be purely for personal amusement; it's not science. Look at the territory covered by the test's 20 questions:

Your Test's Profile:

75% presence of selected geographically variable lexemes
25% phonology/phonetics of particular lexical items
 0% general phonological rules
 0% morphology
 0% syntax

I doubt that's a large enough or diverse enough portfolio of test items for a standardized test of local dialect in a community of over 280,000,000 speakers who have been spreading over a continent for four hundred years.

[Added later: It has been pointed out to me by Wes Meltzer that the test appears to be based on the dialect survey devised by linguist Bert Vaux of the University of Wisconsin, Milwaukee; see this page. And in defense of the test, John Cowan writes to say:

Well, of course a test of American English dialects is going to cough up a hairball when taken by people whose English comes from a different dialect group, even if they have some overlay from living in this country a while.

I'm from New Jersey, just outside the New York City isogloss bundle (I'm rhotic, e.g.), and my parents (who can influence your lexical choices if you are fairly isolated from your peer speech community as a kid, what in AAVE and Labov is called a "lame") were from Detroit and Philly. Bingo:

55% General American English
35% Yankee
10% Upper Midwestern
 0% Dixie
 0% Midwestern

Couldn't be more on-target.

Point taken. Try it yourself. But keep in mind the sharp limitations: it's assuming you live in the continental US and have the features associated with your region as assessed by fifteen lexical items and five phonological properties.]

Posted by Geoffrey K. Pullum at 05:28 PM

Quit email, get smarter?

There's a lot of buzz about a new study that (allegedly) shows that email lowers IQ more than pot.

"This is a very real and widespread phenomenon," said Glenn Wilson, a psychiatrist from King's College, London University, who carried out 80 clinical trials for TNS research, commissioned by the IT firm Hewlett Packard. The average IQ loss was measured at 10 points, more than double the four point mean fall found in studies of cannabis users.

Now, I have a lot of sympathy for Don Knuth's attitude about email. As far as I'm concerned, it's usually somewhere between a necessary evil and a major distraction -- and the fact that I sometimes enjoy it just makes things worse.

However, I'm pretty skeptical about the cited study. I can't be very exact about my skepticism, because I haven't been able to find out any details about the experiments. As far as I can tell, nothing has been published so far. Perhaps nothing ever will be published -- this is a privately commissioned study described in a press release, with some quotes from the author in the resulting popular-press articles.

The MSM articles are mostly as careless as usual: the Times indicates that "Eighty volunteers took part in clinical trials on IQ deterioration and 1,100 adults were interviewed", though it doesn't tell us anything about how the IQ experiments were designed; most of the other articles I've seen, such as the Bloomberg wire story, were worse, saying things like "the study of 1,000 adults found their intelligence declined as tasks were interrupted by incoming e-mails and texts. The average reduction of 10 IQ points, though temporary, is more than double the four-point loss associated with smoking cannabis. A 10-point drop is also associated with missing a night of sleep."

I certainly don't expect newspaper stories to be like scientific journal articles, but couldn't they give us one or two sentences about how the IQ study was actually carried out? I'm not just being a fuss-budget here. Think about it. Were the subjects people whose work and social lives normally require email? If so, were they in effect being compared in normal life and on vacation? Or if they were not normally users of email, were they being tested while trying to master a new set of skills such as typing and computer use? If the study was done in a lab setting with concocted emails to read and answer, what was the control activity? Or were subjects simply tested before and after a day of intensive email interaction? Was it even a within-subjects design, with the same subjects tested with and without email and similar distractions, or did the study compare the effects of a day at work on subjects who used email and text messaging vs. subjects who didn't? Without answers to questions like these, I'm not convinced that such a study necessarily tested the things attributed to it at all. (And some answers might well convince me that the study definitely didn't test what is claimed for it.)

The author of the IQ portion of the study is this Glenn Wilson, said to be an expert in "Personality; sexual behaviour; male-female differences; social behaviour; performing arts psychology; fame and celebrity". He's previously written an apparently controversial popular book called The Great Sex Divide; another apparently controversial book called The Psychology of Conservatism; found (surprisingly large if true) differences in startle responses based on sex and sexual orientation; examined the role of hormones in the physiology of "love junkies"; and studied the psychological benefits of bubble baths.

None of the links in the previous paragraph fill me with confidence that Wilson's experimental design can be trusted to have avoided the many obvious confounding factors, or that the popular-press summaries mention any caveats required by its design or its results.

I can certainly believe that mental distraction and/or fatigue temporarily decreases problem-solving ability. But I wonder how the effect of a given period of time reading email compares with the effects of spending the same amount of time in other sorts of potentially distracting or fatiguing mental activities, such as reading novels, doing mathematics, praying or writing poetry. All of these sometimes seem to leave people somewhat dazed. I can see the headlines now: "Science Fiction lowers IQ more than pot does".

You could even look to see whether different sorts of distracting activity affect performance on different neurocognitive tasks differently. But I forgot: Dr. Wilson is an expert on "fame and celebrity".

This seems to be another case where the press is happy to publicize a plausible alarmist result of wide interest, without any hint of the sort of aggressive skepticism that they are famous for applying to the pronouncements of politicians. Is this because there are no journalists who are smart enough and well enough educated to ask the obvious questions? Or is it a matter of high-level editorial policy? Most likely, I guess, it's a combination of laziness and lack of editorial attention.

(More MSM coverage here, here, here, here, here and here.)

[Update 9/25/2005: for the truth about the experimental design, and an apology for blaming the media's excesses on Glenn Wilson, see this post.]

Posted by Mark Liberman at 02:16 PM

Language: the anti-beer?

According to BlogPulse's "Trend Search", "language" is negatively correlated with "beer":

The beer spike in the middle of the plot was of course St. Patrick's Day.

For the rest of it, there's no mystery: more people blog about "language OR languages" during the week, and fewer on weekends, whereas "beer" is the opposite. The sad thing is that on any given day, only about 1 blog in 60 mentions either language or beer.

It's interesting that "drugs" doesn't show any clear weekly rhythm:

Nor is there any obvious longer-term relationship between drugs and language:

But what's with the linear downtrend in language since mid-January? Some sort of semesterly rhythm, I suppose...

It would be nice to be able to see a list of words sorted by the relative magnitude of the weekly component in the Fourier transform of the time function of their blog frequency -- that would be the relative spectral amplitude at 1/(60*60*24*7) = 1.653439e-06 Hz = 1.65 µHz, for those who insist on SI units.
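For anyone who wants to try this at home, here is a rough sketch of how such a ranking might be computed. It assumes one blog-mention count per word per day, which is my guess at what a trend tool would give you and not anything BlogPulse is known to export. The function names are made up for the illustration, and "relative" is taken here to mean the weekly amplitude divided by the mean daily count, which is only one of several reasonable normalizations.

```python
import cmath
import math

# The weekly frequency from the text: one cycle per 60*60*24*7 seconds.
SECONDS_PER_WEEK = 60 * 60 * 24 * 7
WEEKLY_HZ = 1 / SECONDS_PER_WEEK  # about 1.653439e-06 Hz, i.e. 1.65 µHz


def weekly_amplitude(daily_counts):
    """Relative amplitude of the 7-day component of a daily series.

    Hypothetical helper: evaluates a single Fourier coefficient at a
    period of 7 samples (days), after removing the mean, and divides
    the resulting amplitude by the mean daily count.
    """
    n = len(daily_counts)
    mean = sum(daily_counts) / n
    if mean == 0:
        return 0.0
    coefficient = sum(
        (count - mean) * cmath.exp(-2j * math.pi * day / 7)
        for day, count in enumerate(daily_counts)
    )
    # The factor 2/n turns the coefficient into the amplitude of a cosine
    # when the series covers a whole number of weeks.
    return (2 * abs(coefficient) / n) / mean


def rank_by_weekly_rhythm(series_by_word):
    """Return the words sorted from strongest to weakest weekly rhythm."""
    return sorted(
        series_by_word,
        key=lambda word: weekly_amplitude(series_by_word[word]),
        reverse=True,
    )
```

Given a dictionary mapping words to daily counts, rank_by_weekly_rhythm would put a weekend-heavy word like "beer" and a weekday-heavy word like "language" near the top, and a word with no weekly cycle near the bottom. Evaluating one coefficient directly avoids needing a full FFT, at the cost of assuming the series spans a whole number of weeks.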

Posted by Mark Liberman at 06:40 AM

April 22, 2005

Ku two

Yesterday I asked a few questions about the made-up language Ku, used in Sydney Pollack's new movie "The Interpreter". This morning's mail brought some additional information, in a note from David Nash.

It's the London Language Institute, according to http://portal.telegraph.co.uk/arts/main.jhtml?xml=/arts/2005/04/01/bfpollack.xml which Googles to an outfit in Ontario -- which clashes with "we went to a language center in England" that you quote, so we're not much better off. London, UK, would seem a better bet to find Africanists. I'll be interested to see whether you find the adviser.

After we saw the movie last Saturday, I was iChatting with Bill Poser, and he said his field methods consultant's mother is a Shona speaker, so I tried to interest Bill in getting an opinion of the "Ku" that way.

Anyway, there's not much of "Ku" in the movie really, and learning to rattle off a few sentences and expressions is hardly being "fluent in this tongue" eh. Also, it struck me that Our Nicole's character didn't use "Ku" to converse with native speakers (only to interpret) (with one exceptional moment when she barks out some "Ku" to startle a native speaker who doesn't know her -- then they proceed to converse in English -- with not a syllable of Ku thrown in even.) (Not that I want to pan the movie --it has some other strengths I think.)

In addition to the adviser's name and some information about the construction of the language, I'd still like to understand why the journalists involved are so incurious about the details of this aspect of the movie.

[Update: the indefatigable Benjamin Zimmer reports

I found the name of the linguistic adviser for "The Interpreter" from a Nexis search:

Financial Times (London), July 12, 2004, p. 11
The actors have spent time with UN and secret service staff, but the most elaborate arrangement has been commissioning Said el-Gheithy, an African linguist in London, to create a language for the fictional country of Matobo. He used Swahili and Shona as the models for the language, which will be called Matoboan or Ku, depending on whether it is a national or tribal language in the final version of the film. "It has its own internal dictionary, so you can speak it," Misher says. "The guy created a whole culture and history in his mind."

Googling on el-Gheithy's name finds that his affiliation is with the "Centre for African Language Learning," rather than the "London Language Institute."

Benjamin adds in a follow-up that "the Centre for African Language Learning has a whole page with information on el-Gheithy's work on the film..." The page includes this helpful paragraph, suggesting that 'Ku' is supposed to be a sort of approximation to proto-Bantu, and that the name is short for Chi'itoboku:

Although known as 'Ku' to foreigners, the actual language spoken by the Tobosa people of the fictional Democratic Republic of Matoba is indigenously known as Chitob uk u, literally meaning 'the language of the Tobosa people'. Ch'itoboku is the only surviving ancient Bantu language, and the Tobosa oral traditions indicate that 'Ku' is the root of modern Bantu languages spoken in contemporary sub Saharan Africa. The structure of Ch'toboku is characterised by its use of indicators to make up words. For example, 'tobo' is the root and 'sa' is the indicator for 'they'. There is no gender distinction as in French, hence the word for 'he' or 'she' is the same, 'a'. Verbosity is positively valued in Ch'toboku, and ordinary speech should approximate the elegance of poetry. This could be the reason for Sylvia's hesitation when interpreting.

Said el-Gheithy ends his discussion with a Ku proverb:

Truth requires no translation — Angota ho ne njumata


[Update #2: Jean Véronis at Technologies du Langage has a lot more (in French).

And if you're curious about Bantu languages, you can find a lot of links at the Comparative Bantu Online Dictionary (cBold).]

Posted by Mark Liberman at 07:51 AM

Smoke signals and sounds

Geoff Pullum, being a syntactician, looked at the smoke over the Sistine chapel on 4/19 and saw a moral about the complex relations between form and meaning in language:

The white smoke emerging from the chimney ... to announce the election of Pope Benedict XVI was unquestionably a communication, but not a linguistic one. ...

If all human communication were done in ways similar to the way the cardinals initially signal their votes (as opposed to the way the camerlengo ultimately makes the official announcement to the waiting crowd), then although there might be a discipline of semiotics (created by extra-terrestrial visitors, presumably, since such crude forms of communicative signalling would hardly put humans in a position to create academic disciplines), there would be no linguistics.

Being a phonetician, I saw a different moral, one about the difficult relations between messages and signals in speech.

According to an article by Alessandra Stanley, published on 4/20/2005 in the NYT:

Infallibility is expected of popes and television anchors, so there was something arresting about the confused scramble to interpret the first creamy wisps of smoke floating from the Vatican chimney yesterday.

"Darned if it doesn't look darker," said Charles Gibson of ABC, trying to square the appearance of white smoke with the absence of confirmation from the Vatican bell tower. All the networks went live at the first puff of smoke and as they waited, watched and deliberated (beige? charcoal?), none of the anchors could be certain of what they were seeing.

The first few newswire reports (found on Google News) were equally confused and confusing. The confirmatory bells also were rung, but it was almost time for them to sound the hour anyway, and so some sources discounted this signal and called the whole thing a false alarm, until that camerlengo came out and spoke.

The problem with the smoke signals is that everyone involved gets so little practice. The Vatican employees who burn the ballots don't get any rehearsals, at least not in the real setting, and the people watching outside don't get (what psychologists would call) practice trials with feedback. I'm sure that with a few dozen rounds of practice, everyone involved would get their signals straight.

There's a lesson here for language as well as for communication. These smoke-signaling problems help explain why in human spoken languages, the sound of a word is not defined directly (in terms of mouth gestures and noises). Instead, it's encoded in terms of a phonological system, whereby a word's pronunciation is defined as a structured combination of a small set of elements, meaningless in themselves. This was called "duality of patterning" by Charles Hockett in his celebrated list of characteristic properties of human language. More concretely, we could call it the "phonological principle".

Why is phonological encoding needed? Here's the math: a typical child learns about 40,000 words in the ten years between the ages of 3 and 13. 40,000/(10*365) = 10.96 words per day on average. Most of this learning is without explicit instruction, just from hearing the words used in meaningful contexts. Usually, a word is learned after hearing only a handful of examples. Experiments have shown that young children can learn a word (and retain it for at least a year) from hearing just one casual use.

Let's put aside the question of how to figure out the meaning of a new word, and focus on how to learn its sound.

You only get to hear the word a few times -- maybe only once. You have to cope with many sources of variation in pronunciation: individual, social and geographical, attitudinal and emotional. Any particular performance of a word simultaneously expresses the word, the identity of the speaker, the speaker's attitude and emotional state, the influence of the performance of adjacent words, and the structure of the message containing the word. Yet you have to tease these factors apart so as to register the sound of the word in a way that will let you produce it yourself, and understand it as spoken by anyone else, in any style or state of mind or context of use.

In subsequent use, you (and those who listen to you speak) need to distinguish this one word accurately from tens of thousands of others. (The perceptual error rate for spoken word identification can be less than one percent, where words are chosen at random from a list of dictionary headwords and spoken by arbitrary and previously-unknown speakers, and transcribed by careful and motivated listeners under good acoustic conditions.)

Let's call this the pronunciation learning problem. If every word were an arbitrary pattern of sound, this problem would probably be impossible to solve.

The phonological principle solves this problem by splitting it into two problems, each one easier. One problem is to learn the general relationship between phonological "spellings" and sounds; the other problem is to learn the specific phonological "spellings" of individual words.

  • Phonological representations are digital, i.e. made up of discrete elements in discrete structural relations.
  • Copying can be exact: members of a speech community can share identical phonological representations.
  • Within the performance of a given word on a particular occasion, the (small) amount of information relevant to the phonological identity of the word is clearly defined.
  • The acoustic interpretation of phonological representations is general, i.e. mostly independent of word identity.
  • Thus every performance of every word by every member of the speech community teaches about the system as a whole, and therefore helps listeners to sharpen up their perception of all words, not just the particular one spoken.

Later in the NYT article there is a telling phrase about the color of the Sistine smoke:

It was those few moments of uncertainty, however, that haunted those who had to hold forth, live, on the air, for minutes with no idea what color smoke was floating to the sky.

But that's not true. They knew what color it was: it was right there in front of their eyes. They just didn't know what its color meant, because they didn't know where to put the threshold in their perceptual space, because they hadn't had enough practice with Sistine smoke, and they didn't have any other relevant experience to bring to bear. At least not in a precise enough way.

Once the feedback came, the watchers tried hard to adjust their thresholds:

Mr. Blitzer on CNN kept going back to the tape.

"It's clearly white," he said. "In hindsight."

It's not only the news anchors who had trouble interpreting what they were seeing. Newsday quotes another watcher whose experience of the smoke's color was also semiotically uncertain and temporally unstable:

"It looks white," said the Rev. Carlos Encina, 40, who is from the small European country of Liechtenstein, "but at the beginning it was black."

Ah, but that was before he knew what it meant.

[Note: some bits of this post are recycled from my lecture notes for ling001]

Posted by Mark Liberman at 07:20 AM

April 21, 2005


According to a 4/21/2005 story in Newsday:

Nicole Kidman learned a made-up language called "Ku" for The Interpreter -- and forgot every last word once the project wrapped.

"I remember nothing," she said Tuesday night at the Ziegfeld, where the movie's premiere kicked off the Tribeca Film Festival.

"It was like studying for an exam," she said. "You just cram it in the night before and then it's completely gone."

The production information at allmovieportal.com says that

[Charles] Randolph [one of the screenwriters] imagined an entire political reality for the made-up country based on Southern Africa's modern history, involving post-colonial struggles, warring tribal factions and institutionalized corruption. Then, working with linguists, the filmmakers helped to forge an imaginary language for Matobo, dubbed "Ku," that would sound entirely real to most ears.

Sydney Pollack explains, "We went to a language center in England and worked with a professor there to develop a tongue that's a cross between Swahili and Shona, two common African languages in Eastern and Southern Africa. This new language, Ku, has elements of both of those languages, along with a number of unique elements. And Nicole Kidman had to become fluent in this tongue that doesn't truly exist."

So shouldn't it be "Ki-Ku", or something like that? I wonder what the "language center in England" was, and who "the professor there" might be. It's striking that these pages, which name literally hundreds of names, don't choose to identify these two other than by rather generic definite descriptions. In the same vein, it's interesting that the trailer for the movie doesn't have even a single syllable of "Ku" in it.

The allmovieportal.com site also explains that

Like Nicole Kidman, [Earl] Cameron spent weeks learning to speak the fictional language, Ku, for the role, as did Jesper Christensen, who plays President Zuwanie's head of security. "Learning a language that at first made no sense at all was extremely difficult for all of us," says Christensen. "But it also became quite fun after awhile. I think the whole achievement of creating this new language was quite brilliant."

It doesn't say whether he forgot his piece of the brilliant achievement just as quickly.

[Update: more here.]

Posted by Mark Liberman at 11:52 PM

Everybody can write and nobody writes well

Arthur Hugh Clough (that's "Cluff", not "Clue" or "Clow" or whatever) was a 19th-century British poet who deserves to be better known than he is. I cited a passage from his poem The Bothie of Tober-Na-Vuolich in an earlier post, and I'll try to find excuses for quoting more of him in the future. His poetry is simultaneously ornate and informal, in a manner that seems characteristic of his times. He was born in 1819 and died in 1861, and many people today view his stretch of the 19th century as a Golden Age of the English Language. Clough himself, however, saw the 19th century as an age of linguistic iron if not lead.

Here's a passage from his (posthumously published) lecture "On the Formation of Classical English":

The English diction of the nineteenth century has no Burke or Chatham to boast of, nor any Hume or Johnson.

There may be some superiority in matter. We have had a good deal of new experience, both in study and in action---new books and new events have come before us. But we have not yet in England, I imagine, had any one to give us a manner suitable to our new matter. There has been a kind of dissolution of English, but no one writer has come to re-unite and re-vivify the escaping components. We have something new to say, but do not know how to say it. The language has been popularized, but has not yet vindicated itself from being vulgarized. A democratic revolution is effecting itself in it, without that aristocratic reconstruction which pertains to every good democratic revolution. Everybody can write and nobody writes well. We can all speak and none of us know how. We have forgotten or rejected the old diction of our grandfathers, and shall leave, it seems likely, no new diction for our grandchildren. With some difficulty we make each other understand what we mean, but, unassisted by personal explanations and comment, it is to be feared our mere words will not go far. Our grandfathers read and wrote books: our fathers reviews: and we newspapers: will our children and grandchildren read our old newspapers?

And from his "Lecture on the Development of English Literature":

They [the writers of the 18th century] constitute our ordinary standard literature, and for models in English writing the tradition, not yet obsolete, of our fathers refers us imperatively hither. We cannot, with any safety, follow examples anterior to them; nor easily find any amongst their successors. Our own age is notorious for slovenly or misdirected habits of composition, while the seventeenth century wasted itself in the excesses of scholastic effort.

The prose writers that he's (apparently) slighting as "notorious for slovenly or misdirected habits of composition" include Jane Austen, Mary Shelley, Thomas Carlyle, Thomas Macaulay, J. S. Mill, Benjamin Disraeli, William Thackeray, Charles Dickens, the Brontë sisters, George Eliot, and more. (It's hard for me to place these lectures in time, since they were not published until after Clough died, but from his biography I would guess that they belong to his period as a professor of English at University College around 1850-1852.)

It's clear that Clough was not a stupid or a tasteless man. But there seems to be something about looking backwards that often blinds people to what is happening around them.

Posted by Mark Liberman at 04:19 PM

April 20, 2005

A new incompleteness theorem

Or is it just a new proof by talk-page diagonalization of the same old result? "No web forum sufficiently powerful to express interesting things can be established as coherent by arguments within its own format..."

[Note: a couple of literal-minded readers have emailed to clue me in that the linked page is not in fact an example of the proof technique known as diagonalization. I do know this: it's supposed to be a joke, not a theorem...]

Posted by Mark Liberman at 04:19 PM

(Mis)Informing Science

Jeff Erickson at Ernie's 3D Pancakes has an extensive review and discussion of the SCIgen affair, in which three MIT grad students got a randomly-generated paper accepted at one of the IIIS/SCI spamferences, as Jeff calls them. Jeff's post features an analysis of the response by the president of IIIS, Nagib Callaos, which Jeff calls a "mindboggling rambling rationalization".

Against this background, I thought I'd take a look at Prof. Callaos' own scholarship. When I checked a couple of days ago, Google Scholar had 33 hits for {Nagib Callaos}, just one of which was a link to a paper by Prof. Callaos in person, rather than a reference to his role as an editor of conference proceedings: Nagib Callaos and Belkis Callaos, "Toward a Systemic Notion of Information: Practical Consequences", Informing Science, 5(1) 2002. This paper has got some mindboggling properties of its own, epitomized by its observation that "[one] bit is the minimum information that a systems [sic] of two states can provide".

It's worth looking at the paper in a bit more detail, as a window into a curious quasi-technical demimonde.

The paper begins:

The meaning of “information systems” has been growing in diversity and complexity. Several authors have pointed out this fact, described the phenomena and tried to bring some order to the perceived chaos in the field. Cohen (1997, 1999, 2000), for example, after describing the attacks on the Information Systems (IS) field, for “its lack of tradition and focus” and the “misunderstandings of the nature of Information Systems,” examines “the limitations of existing frameworks for defining IS” and reconceptualizes Information Systems and tries “to demonstrate that it has evolved to be part on an emerging discipline of fields, Informing Science” (Cohen, 2000). Our objective in this paper is to participate in the process of conceptualization and re-conceptualization required in the area of Information Systems and in Cohen’s proposed Informing Science. We will try to do that making a first step in the description of a systemic notion of information, by identifying, first, the meaning of information. ...

Let's pass over the authors' discussion of what they call "The Subjective Conception of Information" and get to the section on "The Concept of Information as Objective Form or Order", which begins

Lately, an increasing number of authors are showing an objectivist bias in their conception of the notion of “information”. Shannon’s definition of information is at the roots of this perspective, and information technologies authors provided its strong impulse. Shannon, in his 1938 paper, "A Mathematical Theory of Communication," proposed the use of binary digits for coding information. ...

Shannon's paper was published in 1948, not 1938 (specifically, it was originally published in two parts: The Bell System Technical Journal, Vol. 27, pp. 379-423, 623-656, July, October 1948). Am I betraying my "objectivist bias" by fussing about the actual date? In any case, Shannon 1948 is not in the bibliography of the Callaos & Callaos paper, despite being cited and discussed at some length.

Perhaps this bibliographic omission is an honest one -- at least, Callaos & Callaos seem confused to me about the "objectivist" ideas that they are rejecting, although I'm no kind of expert on information theory. They explain that "the information expected value of an n states system" is given by the equation (image copied from their paper):

The core formula is correct. The equation given in Shannon 1948 is

H = –K Σ pi log pi

where K is a positive constant. But nothing is lost if K is set to 1 -- as Shannon explains, "the constant K merely amounts to a choice of a unit of measure", and Shannon also uses the equation without the constant, as we'll see below.

However, it's unexpected for Callaos & Callaos to equate this formula to "–Entropy", since Shannon's formula defines entropy, not negative entropy. The reason for the minus sign in Shannon's formula is that the p's here are probabilities, positive quantities between 0 and 1, whose logs are therefore all negative. (Well, non-positive, allowing for the case of only one option with p=1.) Without the minus sign, the sum would always be less than or equal to zero.

At first I thought that the minus sign in Callaos & Callaos' "–Entropy" was just a typographical error. But no, they go through the case of what they call a "two states system" in detail, concluding that the "minimum information" in this case is obtained when "p1 = p2 = 1/2",

And, if the logarithmic base is 2, then I = log₂ 2 = 1, which is the definition of "bit", i.e. a bit is the minimum information that a systems [sic] of two states can provide, or the information that could be provided by a 2 states systems [sic] with maximum entropy.

This seems deeply confused. One bit is the maximum quantity of information that can be provided by a choice between two alternatives, not the minimum. Shannon equated his quantity H directly with Boltzmann's entropy, and described entropy as "a reasonable measure of choice or information", not as a measure of the opposite of information:

Quantities of the form H= –Σ pi log pi (the constant K merely amounts to a choice of unit of measure) play a central role in information theory as measures of information, choice and uncertainty. The form of H will be recognized as that of entropy as defined in certain formulations of statistical mechanics where pi is the probability of a system being in cell i of its phase space. H is then, for example, the H in Boltzmann’s famous H theorem. We shall call H= –Σ pi log pi the entropy of the set of probabilities p1,...,pn. If x is a chance variable we will write H(x) for its entropy; thus x is not an argument of a function but a label of a number, to differentiate it from H(y) say, the entropy of the chance variable y.
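To check the two-state case numerically, here is a small sketch that evaluates Shannon's H (with K = 1 and logs to base 2) for several values of p1. The function name and the grid of probabilities are my own choices for illustration, not anything from either paper.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)).

    Written as sum(p * log2(1/p)), which is the same quantity, and
    skipping p == 0 terms, which contribute nothing in the limit.
    """
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


# A system of two states with probabilities p1 and 1 - p1.
grid = (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0)
for p1 in grid:
    h = entropy_bits([p1, 1 - p1])
    print(f"p1 = {p1:4.2f}   H = {h:.4f} bits")

# The value of p1 on the grid that gives the largest H.
best = max(grid, key=lambda p1: entropy_bits([p1, 1 - p1]))
print("H is largest at p1 =", best)
```

On this grid H runs from zero at the ends (no uncertainty, nothing to learn) up to its largest value at p1 = p2 = 1/2, so the equiprobable case marks the maximum for a two-way choice, not the minimum.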

How did Callaos & Callaos get this backwards? A clue is provided by a passage later in their paper:

Shannon’s Theory provided the grounds for a strong support to the objectivist position, where information is conceived as completely independent from their senders and receivers, and as a neutral reflection of real world structure or order. The identification of information with negative entropy, or negentropy, made by Shannon, gave the foundation of the increasing emphasis in the objectivist conception of information. Shannon found out that his equation was isomorphic with Boltzmann’s equation of entropy. So, equating both of them, he equalized information to negative entropy. This made some sense, because since entropy is conceived as disorder, negative entropy and information (its mathematical isomorphic) might be both seen as order. Then, anyone who conceives an independent order in the Universe would accept that information, its ‘synonym’, is independent, from any subject. This explains the increasing number of authors endorsing the objectivist position.

This passage seems to me to suffer from several basic confusions, which point to a sort of coherent pattern of error consistent with the earlier oddities in the paper.

Shannon's monograph was entitled "A Mathematical Theory of Communication", not "A Mathematical Theory of Real World Structure" or "A Mathematical Theory of Independent Order in the Universe". His theory is all about senders and receivers and communications channels. It does assume that we can tell whether the message received is the same as the message sent, and it offers a way of thinking about what happens to messages in noisy channels that are independent of both senders and receivers. But it applies just as well to messages whose content is false or undecidable as it does as to true ones. And to the extent that it's used for modeling conceptions of states of the world, as it is for instance in research on perception, this is done by casting the objective world in the role of the sender of a message.

The term "negentropy" was apparently coined by Schrödinger, in his 1944 book "What is Life?" (which apparently inspired James Watson's DNA research):

It is by avoiding the rapid decay into the inert state of `equilibrium' that an organism appears so enigmatic....What an organism feeds upon is negative entropy.

The Wikipedia stub for negentropy says that

Schrödinger introduced that term when explaining that a living system exports entropy in order to maintain its own entropy at a low level. By using the term "Negentropy", he could express this fact in a more "positive" way: A living system imports negentropy and stores it.

Schrödinger also apparently suggested that "negative entropy" is something like "free energy". To understand what Schrödinger might have been getting at, and its relations to (the later development of) information theory, look at Tim Thompson's What is Entropy? page (especially his equations 3, 4 and 5). For some thoughts on difficulties with a simple-minded "entropy = disorder" equivalence, see Doug Craigen's summary, and his longer discussion of the same point.

So now I think I see what has happened. Callaos & Callaos start out thinking in terms of rather vague metaphorical relationships like "entropy is disorder" and "information is order", which predispose them to see entropy and information as opposites. Then they trip over the fact that in thermodynamics, entropy is sometimes expressed in terms of the number of states of a system, rather than the probabilities of those states. Thus the equation carved on Boltzmann's tomb is

S = k log W

where S is entropy and W is the total number of microstates available to the system. Obviously in this case, W is a large positive quantity, and so log W is also positive. If all the states are equally probable, then the probability of each is 1/W. Since log(1/W) = –log(W), Boltzmann's tomb equation is equivalent to

S = –k log(1/W)

and this is the form in which Shannon adopted it, since that form generalizes suitably to the case where the probabilities are not uniform.
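The equivalence is easy to confirm numerically. In the sketch below (my own illustration, with the constant k set to 1 and natural logs), the probability-weighted sum reduces to log W when all W states are equally likely, and falls below log W when they are not. The function name and the example distributions are assumptions made for the demonstration.

```python
import math


def entropy_nats(probs):
    """-sum(p * log(p)) with natural logs and the constant set to 1.

    Written as sum(p * log(1/p)); zero-probability states are skipped.
    """
    return sum(p * math.log(1 / p) for p in probs if p > 0)


# Uniform case: every one of the W states has probability 1/W.
for W in (2, 10, 1000):
    uniform = [1 / W] * W
    print(f"W = {W:5d}   sum form = {entropy_nats(uniform):.6f}   "
          f"log W = {math.log(W):.6f}")

# Non-uniform case over four states: the sum form still applies,
# and it comes out smaller than log 4.
skewed = [0.7, 0.1, 0.1, 0.1]
print(f"skewed: {entropy_nats(skewed):.6f}   log 4 = {math.log(4):.6f}")
```

Either way the quantity is non-negative, which is the point: the minus sign in the probability form cancels the negative logs, and doesn't turn entropy into its opposite.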

Finally, this misunderstanding apparently resonated for Callaos & Callaos with some exposure to Schrödinger's idea of "negative entropy" as the essential stuff of life.

So, we start with a fuzzy conception that "entropy is disorder, information is order"; we add the existence of the term negentropy for "negative entropy", identified by Schrödinger as "what an organism feeds upon" (aha! life feeds on information!); we mix in a confusion over log W vs. –log 1/W ... and hey presto, we've apparently got a couple of deeply confused partisans of "informing science".

If this stuff were in a paper submitted by an undergraduate in a survey course I was teaching, this is the point at which I'd feel like I was starting to earn my salary. I've found a point of significant confusion and a hypothesis about its origin, and now I can sit down with the student and help them on the way to a clearer and more useful understanding of some basic and important ideas. I've also learned something myself (since the Schrödinger "negentropy" business was new to me).

However, according to the biographical sketches given at the end of the cited paper, the authors have been teaching for 32 and 25 years, respectively, on topics including "Informations Systems", "Operations Research", "Software Engineering" and so forth. The first author is president of the Venezuelan chapter of the IEEE/Computer Society. And the two authors are president and vice-president, respectively, of the International Institute of Informatics and Systemics (IIIS), the sponsor of the "spamferences" that started this whole discussion. In the face of these facts, I concur with Prof. Nagib Callaos in "having a huge sadneess".

[P.S. There are a number of other curious points in the cited Callaos & Callaos paper. For example, the biosketch for Nagib Callaos at the end of the paper tells us that

The core of most of his research is based on the Mathematical Solution to the Voter Paradox (or Condorcet Paradox) he discovered in his Ph. D. Dissertation, in opposition to Nobel Prize Kenneth Arrows [sic] who gave a mathematical proof (his Impossibility Theorem) of the impossibility to find a solution to the Voter Paradox. Professor Callaos showed, in his dissertation, several inconsistencies in Arrows’ axioms.

I'll leave it to someone else to track this one down.]

Posted by Mark Liberman at 12:59 PM

English-Only in West Virginia

According to various press reports, the West Virginia state legislature recently passed a bill declaring English the state's official language only to have it vetoed by Governor Joe Manchin III because it violated a provision of the West Virginia constitution that requires each piece of legislation to deal with a single topic. Many legislators did not realize what they had done: the English-only provision snuck by the legislature as a rider on a bill increasing the size of municipal park and recreation boards. Even so, it sounds like it has a fair chance of being reenacted as a separate bill. Governor Manchin favors it, as does Senate Majority Whip Billy Wayne Bailey.

Mr. Bailey is quoted by the Associated Press as explaining:

I just told the members that the amendment clarifies the way in which documents are produced.

Where I come from that is called "lying". I knew that politicians routinely lied to the public; I wasn't aware that it was smart for a Majority Whip to lie to his own caucus.

You'd think that such a bill would be a response to the perception by English speakers that their language was being overwhelmed by others. Here, by way of example, is an editorial by David Gibson advocating Rep. Steve King (R-IA)'s English Language Unity Act, which would make English the official language of the United States. Analysis of the flaws in Mr. Gibson's piece is left as an exercise for the reader. I note only that it contains an error found in most "English only" advocacy, namely the belief that immigrants to the United States from non-English-speaking countries do not wish to learn English and do not do so.

In fact West Virginia has very few speakers of other languages, with only 2.7% of its people speaking a language other than English at home according to the 2000 census. That's the lowest percentage of speakers of other languages in the United States. It's hard to come to any conclusion other than that the motivation for declaring English official is jingoism. Charleston Gazette columnist Phil Kabler put it nicely:

House Judiciary Chairman Jon Amores, D-Kanawha, nailed it when he said (paraphrasing here) that the real intent of the legislation is to send a big up-yours message to non-English-speaking immigrants.

Mr. Kabler laments the fact that West Virginia has been singled out for derision when 28 other states have passed similar legislation, citing a previous incident in which West Virginia was made the butt of jokes when it passed a law allowing people to eat roadkill. I at least have no intention of singling out West Virginia. As far as I'm concerned, West Virginia is merely one of 29 states with an excess of legislators who at best are misguided and at worst are ignorant bigots.

Quebec English Teachers

Most of the time when I misinterpret a headline, after the fact I realize that the intended interpretation is reasonable, but I just encountered an example where that isn't the case. The headline of this CBC News article is Quebec English Teachers Stage 1-Day Strike. My interpretation was that this was about a strike by people who teach English in Quebec. Indeed, I wondered whether it was a routine labor dispute or whether it had something to do with language politics.

It turns out that the article is not about teachers of English: it is about the fact that teachers in the English-medium school system are staging a one-day strike in solidarity with the teachers in the larger French-medium school system. In an alternative universe I can imagine the phrase English teachers meaning "teachers in the English-medium school system", but the association of this phrase with the meaning "people who teach English" is for me so strong that even after the fact I find the headline inappropriate and misleading. I want it to be something like English-medium Teachers Stage 1-Day Strike or Teachers in English Schools Stage 1-Day Strike. You might think that the editor was forced to use a headline he or she didn't consider entirely well formed for reasons of space, but in fact CBC News allows headlines to spill over onto a second line, as in the headline for the current lead article Witness denies discussing sponsorship program with Martin, so it seems that not everyone agrees with my reaction to this headline.

Posted by Bill Poser at 12:00 AM

April 19, 2005

Habemus linguam?

The white smoke emerging from the chimney on the roof of the Sistine Chapel to announce the election of Pope Benedict XVI was unquestionably a communication, but not a linguistic one. It's a rather useful example for drawing the distinction, in fact.

Confusing language with communication is the overwhelmingly most frequent linguistic misconception among those who have not studied linguistics. If elephants make low-frequency noises, ordinary folks think, then elephants must have a language. If flowers can be used to say thank you, there is a language of flowers. And so on for the language of love, the universal language of music, etc. etc. When non-linguists are told that mathematical linguists and theoretical computer scientists often conceptualize a language as simply a set of algebraically defined objects such as bracketed symbol strings, they react with blank incomprehension (I vividly remember doing so myself: I simply could not take in the first sentence on page 13 of Chomsky's 1957 book Syntactic Structures). "But where does meaning come into that?", the naive non-specialist immediately wants to know.

When people learn that Chomsky denies that communication is the main function of human language (he thinks that its primary role is just to provide for structured internal representation of thought), they tend to be baffled and incredulous. Indeed, Chomsky's view on this is rather unusual. Ludwig Wittgenstein tried to make out a case that a private language for the internal representation of thought, one that no one else could know even in principle, was an impossibility, an incoherent idea. That seems closer to the common idea that language always has a social status and a communicative function, and Chomskyan and Wittgensteinian thought are in sharp controversy on that matter. But my point is not about that. My point is that even if human languages were always and necessarily used for interpersonal communication, that isn't a license for going the other way, and saying that wherever there is communication there is language (so that whales must have language, too, and so must ants, etc. etc.).

The smoke signals from the Vatican were certainly this week's most newsworthy communicative acts. The black smoke rising from the first ballot conveyed a message of the utmost importance to the Catholics who wait and watch in St Peter's Square, and the white smoke signalling the announcement "Habemus papam" confirming Cardinal Joseph Ratzinger's election, when it finally came, even more so. All that linguists are pointing out is that while "Habemus papam" is a linguistic communication, the smoke of burning ballots mixed with damp straw is not. If all human communication were done in ways similar to the way the cardinals initially signal their votes (as opposed to the way the camerlengo ultimately makes the official announcement to the waiting crowd), then although there might be a discipline of semiotics (created by extra-terrestrial visitors, presumably, since such crude forms of communicative signalling would hardly put humans in a position to create academic disciplines), there would be no linguistics. It takes more than a few pre-assigned (or intuitively grasped) meanings for specific signals to make a language, in anything remotely like the sense in which English or Latin are languages.

Posted by Geoffrey K. Pullum at 01:19 PM

Could language be more popular than porn?

I intend this question in a rather limited sense, as I'll explain below.

By now you must know that if you go to amaztype™, you can see the word of your choice spelled out in letters made up of thumbnails of the publications whose titles contain it. (You can also ask to collect the works by authors rather than titles, or use thumbnails from the covers of music CDs or video/DVDs rather than books.) But now, amaztype™ zeitgeist lists for you the most popular requests.

The current lists (valid as of Apr. 19, 2005, 8:10:01 GMT) have some surprises. For example, the TITLE in ALL MEDIA category is

1 sex 2529 hits
2 fuck 902 hits
3 harry potter 541 hits
4 porn 496 hits
5 flash 474 hits
6 boobs 382 hits
7 love 348 hits
8 php 303 hits
9 cat 270 hits
10 superman 172 hits

Looking at the frequency first, we see that this is one of the few phenomena in the natural or social world that doesn't show a power law distribution, as indicated in the plot on the right. Alert Per Bak! (Note: this is a feeble joke -- Per Bak is dead, and doesn't seem to have been very interested in contrary evidence while he was alive. So please don't send me lists of other examples, unless they're really interesting ones.)
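One rough way to examine a claim like this is to fit a straight line to the counts on log-log axes, since a power law would show up there as a line. The sketch below does that for the ten counts listed above. The function name `loglog_fit` is made up for illustration, and a least-squares fit on ten points is only a crude diagnostic, not a real test of the distribution.

```python
import math

# Top-ten request counts from the list above, in rank order.
counts = [2529, 902, 541, 496, 474, 382, 348, 303, 270, 172]

def loglog_fit(values):
    """Least-squares line through (log rank, log count).

    Returns (slope, intercept, r2). A pure power law gives a
    straight line on these axes, so r2 would be close to 1.
    """
    xs = [math.log(rank) for rank in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return slope, intercept, r2

slope, intercept, r2 = loglog_fit(counts)
print(f"slope={slope:.2f}  r2={r2:.3f}")
```

The residuals are more informative than the summary numbers: a systematic bend away from the fitted line (for instance a flat stretch in the middle ranks) is what would argue against a power law.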

The top-ten words themselves divide naturally into six groups: (1) sex, fuck, porn, boobs; (2) Harry Potter; (3) flash, php; (4) love; (5) cat; (6) Superman. The categories themselves are not surprising, but the choices within the groupings are not always what I would have guessed.

In category (1), where are all the bodily fluids, waste products and rude noises? Not many fans of Dave Barry here, apparently. There are some other features of this category that we'll pass over in silence.

In category (2), is Harry Potter really the only actual book title that users care enough about to spell out? (Dan Brown doesn't make the authors' list either...)

In category (3), what happened to python, perl, java, C++? Are the partisans of lisp too old to register with the zeitgeist anymore? What about OCaml? Is there no pocket of 300 rebel forthians, or hypercardites, still holding out on some far planet of the empire? I won't even ask about C#.

I'm happy to leave category (4) alone, and I guess that category (5) doesn't surprise me either -- dog would be next, but far behind these days, and hamster, ferret etc. are just not in the same class. But who would have guessed that Superman would make the cut, when Spiderman, Wonder Woman and even God are missing?

However, the most striking thing about this list is how small the current counts are. Look, people, more than 3,000 of you read this weblog every day*! If ten percent of you went to amaztype™ and asked for "language", it would rank higher in the amaztype™ zeitgeist than PHP does! If all of you did it, language would outrank sex...

That would probably be inappropriate. But somewhere north of Harry Potter would be nice. So get busy! Tell your friends! Zeit early and zeit often!
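The arithmetic behind that appeal can be checked against the list above. This is a small sketch, with a hypothetical helper `rank_with`, that asks where a new word would land for a given number of requests. It assumes the other counts stay frozen at the values quoted, which they obviously will not.

```python
# Request counts from the TITLE in ALL MEDIA list quoted above.
hits = {
    "sex": 2529, "fuck": 902, "harry potter": 541, "porn": 496,
    "flash": 474, "boobs": 382, "love": 348, "php": 303,
    "cat": 270, "superman": 172,
}

def rank_with(table, count):
    """1-based rank a new word would hold with `count` requests.

    Ties are resolved against the newcomer: it ranks below any
    existing entry with the same count.
    """
    return 1 + sum(1 for c in table.values() if c >= count)

for n in (300, 304, 542, 3000):
    print(n, "requests -> rank", rank_with(hits, n))
```

Note that exactly 300 requests would fall just short of php's 303, so the ten-percent claim depends on the readership really being "more than" 3,000; 304 requests is the smallest number that clears php on these figures.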

[Update: as of 4/21/2005 17:30 GMT, "language" is in third place, with 515 hits, behind only "sex" (2596 hits) and "fuck" (1078 hits), and ahead of "flash", "harry potter", "porn", "love", "boobs", "cat" and "php".]

*At least, sitemeter registers more than 3,000 visitors on an average day. As I understand it, they count visitors in terms of distinct IP addresses within certain time windows. This is an imperfect measure, since some ways of accessing the internet may channel many users through the same apparent IP address, while in other cases, a single user may show up from different IP addresses at different times.
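To make the footnote concrete, here is a minimal sketch of counting visits by IP address and time window. It is not sitemeter's actual method, which I don't know; the function name `count_visits`, the 30-minute window, and the (address, timestamp) input format are all assumptions made for illustration.

```python
def count_visits(requests, window_seconds=1800):
    """Count visits from an iterable of (ip_address, unix_timestamp) pairs.

    A request starts a new visit when its IP address has not been
    seen before, or has been silent for longer than window_seconds.
    """
    last_seen = {}
    visits = 0
    for ip, ts in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(ip)
        if previous is None or ts - previous > window_seconds:
            visits += 1
        last_seen[ip] = ts
    return visits

# Hypothetical log: one address active twice in quick succession and
# once much later, plus a second address with a single request.
log = [
    ("10.0.0.1", 0),
    ("10.0.0.2", 100),
    ("10.0.0.1", 600),
    ("10.0.0.1", 4000),
]
print(count_visits(log))
```

Both sources of error mentioned above are visible in a scheme like this: two readers behind one shared address within the window count as a single visit, and one reader whose address changes counts as two.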

Posted by Mark Liberman at 05:11 AM

April 18, 2005

Waiting for the punch line

A Finnish reader sent in a link to the web site of S.P.E.C.S., the "Society for the Preservation of English and Correct Speech", so (according to the sitemeter tag on the home page) I became its 12th visitor. The president is Albert Tudor-Smythe, there is a featured article by Robin Tyler-Wright, and "Society member Alice Sedgewicke-Browne alerts us to the tendency of Welsh BBC newsreader Huw [sic] Edwards to split infinitives". The society's financial manager, Eric Bowdler, "made the decision to step down after a series of public grammatical errors". Supportive blurbs come (allegedly) from Lynn Truss, "HRH Prince Charles", and "Michael Howard, Jew".

According to Uwhois.com, specs.org.uk was registered on April 16, 2005 by someone living in Whiston. So perhaps the punch line for this joke hasn't been posted yet. Or perhaps this is just another piece of evidence that there are whole geological strata of British humor that are inaccessible to me.

Posted by Mark Liberman at 09:09 AM

A new form of the Urim and Thummim?

Some previously-unreadable portions of the Oxyrhynchus Papyri, a collection of document-fragments found in 1897 in an ancient town dump about 300 km south of Alexandria, are now being read by newly-applied imaging technology. According to The Independent:

...in a breakthrough described as the classical equivalent of finding the holy grail, Oxford University scientists have employed infra-red technology to open up the hoard, known as the Oxyrhynchus Papyri, and with it the prospect that hundreds of lost Greek comedies, tragedies and epic poems will soon be revealed.

In the past four days alone, Oxford's classicists have used it to make a series of astonishing discoveries, including writing by Sophocles, Euripides, Hesiod and other literary giants of the ancient world, lost for millennia. They even believe they are likely to find lost Christian gospels, the originals of which were written around the time of the earliest books of the New Testament.

According to the story in The Scotsman, only about 2,000 of the 100,000 papyrus fragments excavated from Oxyrhynchus had previously been read. The new finds are said to include

parts of the Epigonoi, (Progeny), a long-lost tragedy by Sophocles, the 5th century BC Greek playwright, and part of a lost novel by Lucian, a 2nd century Greek writer. There is also an epic poem by Archilochos, a 7th century successor of Homer, which describes events leading up to the Trojan war.

The Independent writes that

Oxford's classicists... even believe they are likely to find lost Christian gospels, the originals of which were written around the time of the earliest books of the New Testament.

POxy ("Oxyrhynchus Online") tells us the place where the papyri were found was a "county town" whose residents "called it Oxyrhynchus, or Oxyrhynchon polis, ‘City of the Sharp-nosed Fish’". The great thing is that

The town dumps of ancient Oxyrhynchus remained intact right up to the late nineteenth century. They didn’t look exciting, just a series of mounds covered with drifting sand. But they offered ideal conditions for preservation. In this part of Egypt it never rains; perishables which are above the reach of ground water will survive. In the dumps was something which the famous sites of classical Greece and Italy could not preserve: papyrus, the ancient equivalent of paper.

Quite a lot of other good stuff is available on the POxy site, all of it (so far) about results prior to the recent breakthroughs.

A newly-decoded fragment of Sophocles' Epigonoi is offered in translation:

Speaker A: . . . gobbling the whole, sharpening the flashing iron.
Speaker B: And the helmets are shaking their purple-dyed crests, and for the wearers of breast-plates the weavers are striking up the wise shuttle's songs, that wakes up those who are asleep.
Speaker A: And he is gluing together the chariot's rail.

The project leader is Dr. Dirk Obbink, named a MacArthur fellow in 2001. The Independent says that "Oxford academics have been working alongside infra-red specialists from Brigham Young University, Utah", but doesn't identify the BYU people.

There's some resonance here with an older optical technology, the "seer stones" Urim and Thummim that Joseph Smith used in his translation of the Book of Mormon. According to The Cambridge History of English and American Literature (vol. XVIII, part III):

Joseph Smith, sprung of parents reported to be specially responsive to local conditions, said in 1838 that on the night of September 21, 1823, at his home in Manchester, near Canandaigua, New York, the angel Moroni three times appeared to him with a revelation of “Golden Plates” buried on Cumorah Hill, and that on September 22, 1827, in accordance with instructions, he dug up the same, and found them covered with small, mystic characters “of the Reformed Egyptian style”—as Professor Talmage hints. It was a time when people were still talking of the Rosetta Stone, when travelling showmen were exhibiting mummies, and when the Egyptian style was affecting the public taste, even in some housebuilding.

With the aid of a pair of crystal spectacles, his “Urim and Thummim,” which Smith said he found, and with the co-operation of certain kindred spirits, Martin Harris, Oliver Cowdery, and David Whitmer by name, whose services were the more valuable because Smith seemed expert neither in reading nor in writing, in 1830 the Book of Mormon was published, and the angel Moroni, according to the narrative, then took away the “Golden Plates.”

Many people's response to the announcement from Oxford is similar to this reaction from Niraj at Blogcritics.org:

Personally, I'm a big fan of Sophocles, and hope to read his newly discovered works as soon as it's [sic] translated.

However, the previously-read parts of the Oxyrhynchus collection are mostly in the form of scattered fragments, not whole works. I don't see any reason, so far, to think that the new stuff will be different.

[Update: Ray Girvan writes to suggest that the BYU group involved must be the Center for the Preservation of Ancient Religious Texts, which previously did multispectral imaging on the Herculaneum and Petra papyri. ]

[Update 4/25/2005: Note that this debunking comment at Ars Technica argues that there's nothing really new happening here, and probably not anything worth calling a breakthrough.

It was clear from the beginning that the technique of multispectral imaging is not at all new, that many Oxyrhynchus fragments have already been decoded over the years, and that the likely outcome would be a stream of new fragments rather than a flood of new texts. However, the Ars Technica comments (by "Hannibal") suggest that even this much may be going too far in support of what may be yet another credulous and under-researched piece of journalistic sensation-mongering.

The cited scholars are reputable, but of course the spin came from (the reporters) David Keyes and Nicholas Pyke at The Independent, Alastair Dalton at The Scotsman, etc., and thus is suspect. There is now a page on the POxy site discussing the developments, which gives a much more sober and balanced assessment:

The results provided many new readings and confirmation of uncertain readings in some problematic areas, none at all in others, depending on settings and surface type. A number of new identifications emerged of literary and documentary texts not previously made by the usual means, together with the isolation of four or five different types of surface and obscurity that respond well or not well to the BYU process.

More specifically

The process seemed to work best on darkened, charred, or stained surfaces, and can image through some surface materials, but sees nothing through mud, clay, or silt. It produced excellent results on palimpsests, cancellations, and erasures due to damnatio memoriae, and on disintegrating surfaces where the ink has settled deep into the fibres. It was least successful on surfaces that were partially or entirely washed out. On abraded and uneven surfaces the camera's long depth of field elides differences in levels and aids reading by eliminating all shadows and levelling so that all writing appears well-defined as though on a single layer.

We can't really tell whether the breathless "holy grail" stuff in the news reports was provoked by the scholars (scholars are not always innocent of hype, when given a shot at it) or entirely invented by the journalists. In this sort of case, my rule of thumb is to blame the journalists, who at a minimum failed to ask a few probing questions and to poke around for some relevant background on the web.

I suspect that the quality of MSM reporting has always been this bad. We just didn't notice it before, because there was no effective mechanism for knowledgeable people to circulate corrective information. However, it's possible that the pressure of competition for a dwindling pool of readers has recently been making things a bit worse. ]

Posted by Mark Liberman at 07:30 AM

April 17, 2005

And they're all unzipped

There's a great double pun in the latest PartiallyClips cartoon:

But shouldn't that be eight billion?

Posted by Mark Liberman at 02:53 PM

The virus is in the mail

Are the vials half missing or half found? Actually, it's not 1/2 vs. 1/2, but rather something like 1/9 vs. 8/9, or perhaps about 20/4,700 vs. 4,680/4,700, but the principle is the same.

Here are some of the headlines under which this story is now running [via Google News]:

WHO Update: 90% H2N2 Influenza Virus Destroyed
Most of dangerous flu virus destroyed, officials say
Two-Thirds of Deadly Virus Destroyed
Two-Thirds of Lethal Asian Flu Virus Destroyed - WHO
WHO: All virus samples to be destroyed
No need for panic
Samples of pandemic flu virus sent to Lebanon, Mexico and Chile missing: WHO
Most of dangerous flu virus destroyed, officials say
Flu samples destroyed after epidemic fears
Labs race to destroy flu virus after test kit mistake
U.S. Health Experts Say Mistakenly Distributed Flu Virus Being Destroyed
Virus kit destruction makes progress
Flu Strain Almost Destroyed - WHO
Deadly flu samples sent out by mistake nearly all destroyed

Vials of deadly flu virus still missing, WHO says
2 killer flu virus samples still missing
Vials of deadly flu virus still missing, WHO says
Location of Flu Strain Samples Still Unknown
Samples of pandemic flu virus sent to Lebanon, Mexico and Chile missing: WHO
WHO: Virus Sent to Mexico, Lebanon Missing
Deadly virus samples unaccounted for
Deadly flu virus unaccounted for in Lebanon
Mexico: Deadly flu lost in the mail?
Deadly influenza virus shipments missing: WHO
WHO: Deadly Viruses Gone Missing

We're talking about vials of H2N2 flu virus, the strain involved in the 1957 pandemic, which were included in a shipment of materials sent out starting last October by an Ohio bioscience company as part of an accreditation testing kit that labs use to demonstrate that they can correctly identify flu viruses. Under the instructions of the College of American Pathologists, which handles the accreditation, packages were apparently sent to 4,700 laboratories in 18 countries around the world. On March 26, a Canadian lab noticed that the testing kits included samples of the 1957 pandemic virus, which has not been seen in humans since 1968, and should not have been included. It's assumed that if this virus gets out, it will spread rapidly and kill many, since no one born since 1968 will have any resistance to it.

The Globe and Mail tells us that "three of the potentially deadly packages never reached their destinations and are still missing, a UN official says". Specifically, "vials of H2N2 virus shipped to certain labs in Lebanon, Mexico and Chile could not be accounted for". (Chile is now accounted for.) Other news reports suggest that it's not three packages that were missing, but rather all the packages sent to labs in three (now two) countries. According to Klaus Stohr of the World Health Organization, "the College of American Pathologists suggested it was possible the missing samples never were sent". Well, that's reassuring.

And so is this (not).

Posted by Mark Liberman at 11:42 AM

Laisser who?

There's more news from Jacques Chirac's fight against Anglo-Saxon liberalism, which I mentioned in an earlier post. And this time it's lexical.

Le Monde 4/15/2005, on Chirac's televised "debate" of 4/14/2005 about the European Constitution -- the French will vote "oui" or "non" on May 29, and Jacques is trying to reverse the rising tide of sentiment for "non":

Un coup d'oeil aux fiches surlignées de rose et de jaune qui s'étalent sur la table et M. Chirac se lance sur ses deux thèmes de prédilection. La mondialisation "portée par un courant ultralibéral au profit des plus forts, ce qui pose problème" et l'émergence de nouvelles grandes puissances. "L'Europe doit être forte et organisée pour s'opposer à cette évolution", explique longuement le président.

Il se répète encore pour fustiger l'Europe du "courant ultralibéral, anglo-saxon, atlantiste", qu'il qualifie de "solution du laisser-aller", version chiraquienne et ironique du "laisser faire, laisser passer" des libéraux du XVIIIe siècle.

A glance at the cards underlined in pink and yellow spread out on the table, and M. Chirac plunged into his two favorite themes. Globalization "carried by an ultraliberal current for the profit of the strongest, which poses a problem"; and the emergence of new great powers. "Europe must be strong and organized in order to oppose this development", explained the president at length.

He repeated himself again to scourge the Europe of the "ultraliberal, anglo-saxon, atlanticist current", which he described as a "laisser-aller ['let go'] solution", the Chiraquian and ironic version of the "laisser faire, laisser passer" of the 18th-century liberals.

I don't plan to start a Chiraquism-of-the-day feature. However, Chirac's apparent (ironic?) malapropism -- laisser aller for laisser faire -- made me wonder what the history of these phrases really is, in French as well as in English, and in terms of connotation as well as denotation. Here are some (long and chaotically ordered) notes from some time spent poking around this morning.

The OED defines laissez faire as

A phrase expressive of the principle that government should not interfere with the action of individuals, esp. in industrial affairs and in trade.

However, the earliest citations for this phrase in English date only to the 19th century, not the 18th. The citations given also suggest that this was largely a term of abuse in the beginning, even among Anglo-Saxons:

1825 [MARQ. NORMANBY] Eng. in Italy I. 296 The laissez faire system of apathy.
1848 Simmonds's Colon. Mag. Aug. 338 Mammonism, laissez-faireism, Chartism, currency-restriction [etc.].
1873 H. SPENCER Stud. Sociol. xiv. 352 Shall we not call that also a laissez-faire that is almost wicked in its indifference.
1887 Contemp. Rev. May 696 The ‘orthodox’ laissez-faire political economy.
1891 S. C. SCRIVENER Our Fields & Cities 168 Laissez-faire is the motto, the gospel, of the person who lives upon the work of another.

I wonder whether that is because of how the phrase was actually used, or because of the political opinions of the OED's editors? It wouldn't be surprising for the term to have had a negative connotation from the start, given the resonances in French of the construction from which it's derived. The Dictionnaire de l'Académie Française, 8th edition (1932-5), has no entry for laisser/laissez faire in anything like its current economic sense, but it does mention a couple of meanings that may be relevant to the political and cultural debate:

[under faire]:

Se laisser faire, se dit d'une personne qui ne se défend pas, qui n'oppose point de résistance. On se jeta sur lui pour le battre, et il se laissa faire. Son tuteur l'a mariée, elle s'est laissé faire.

Se laisser faire (lit. ="self let do"), said of someone who does not defend himself/herself, who offers no resistance. They jumped him to beat him up, and he let it happen. Her guardian married her, and she went along.

[under laisser]

Suivi d'un infinitif, signifie Permettre, souffrir, ne pas empêcher. Je l'ai laissé sortir. Je l'ai laissé reposer. Laissez-moi parler. Je les ai laissés aller. On a laissé échapper ce prisonnier. Laisser tomber ce qu'on a dans les mains. Se laisser tromper. Se laisser faire du tort. Se laisser dire des injures. Se laisser tomber. Se laisser aller à la douleur.

Followed by an infinitive, means Permit, allow, not prevent. I let him leave. I let him sleep. Let me talk. I let them go. They let the prisoner escape. To let fall what you're holding. To let oneself be deceived. To let oneself be done wrong. To let oneself say offensive things. To let oneself fall. To let oneself give in to sorrow.

[and again]

Laisser à d'autres la direction de soi-même. On dit aussi, figurément et familièrement, Se laisser mener par le bout du nez, Laisser prendre de l'empire sur soi et n'avoir pas la force de s'y opposer.
Fig. et fam., Se laisser faire, Ne pas opposer de résistance, ne pas se défendre, ne pas résister à des offres, à des avances.

To leave to others the control of oneself. It is also said, figuratively and familiarly, To allow oneself to be led around by the nose, = To let someone take control of you without having the energy to resist.
Fig. and fam. Se laisser faire (= lit. "self let do"), To fail to offer resistance, to fail to defend oneself, to fail to resist offers or advances.

It seems laisser with an infinitive can describe something praiseworthy about the subject of laisser ("I let him sleep") as well as something blameworthy ("They let the prisoner escape") -- though perhaps the sense of failing to act tends generally to connote weakness. However, when laisser is combined with the infinitive faire ("make, do, act"), the result seems always to be a Bad Thing for the subject, at least in all the examples that the dictionary gives. And perhaps we should also note the gender associations -- for males, the Académie Française finds that the prototype of se laisser faire is to get beaten up, while for females, it's a forced marriage. So in France, it seems that laisser faire evokes an effective frame for rallying all sectors of the population against les perfidies anglaises -- though for Chirac, it may be a problem that the normal way to avoid the humiliation of se laisser faire is to say "non"...

Anyhow, the use of the phrase laissez/laisser faire (and perhaps laissez/laisser passer) in economics may have begun with the physiocrats in 18th-century France, led by François Quesnay (1694-1774), who pioneered the idea that "leaving the economy alone" might be a good thing. I'm not certain of the lexicographical facts, however, because the phrase does not occur in any of the editions of Quesnay's Tableau Économique, nor in any of the (few) other works of his school that I've been able to find online and search. Unfortunately the BNF's Gallica site, which has some of the physiocrats' works in its index, doesn't provide the option of sorting its results by date.

The other obvious place to look is in the works of Adam Smith, "whose name more than any other is connected with British laissez-faire doctrines" (according to the Columbia Encyclopedia). However, the string laissez apparently does not occur in his Wealth of Nations, and laisser occurs only in this (irrelevant) footnote:

72. [Possibly the supposed authority for this statement is Montesquieu, Esprit des Lois, liv. xxi., ch. vi.: `L'Egypte éloignée par la religion et par les mœurs de toute communication avec les étrangers, ne faisait guère de commerce au-dehors.... Les Egyptiens furent si peu jaloux du commerce du dehors qu'ils laissèrent celui de la mer rouge à toutes les petites nations qui y eurent quelque port.']

Thus I've so far failed to find any 18th-century uses of laissez/laisser faire to denote an economic doctrine -- more later as it develops.

The OED also has an entry for laissez aller, defined as

Absence of restraint; unconstrained ease and freedom.

and also originating in the first half of the 19th century:

1842 THACKERAY Miss Löwe Misc. Ess. (1885) 310 As Wilder said with some justice, though with a good deal too much laisser-aller of tongue.
1862 ---- Philip II. xxi, Sir John..was constrained to confess that this young man's conduct showed a great deal too much laissez aller.
attrib. 1818 LADY MORGAN Flor. Macarthy II. iii. 178 He..found or fancied in her what he called the ‘delicious laissez aller ease of a charming French woman’.
1832 LD. LYTTON Godolphin xx, Those well-chosen laissez aller feasts.
1839 DICKENS Nich. Nick. Pref., A magnificent high-handed laissez-aller neglect.

Curiously, the "attributive" use by Lady Morgan in 1818 -- to describe the "delicious ease" of a "charming French woman" -- seems to be the earliest documented use in English of any laisser+infinitive phrase at all.

[John Kozak has pointed out by email that Collins-Robert gives "casualness, slovenliness, carelessness" as the gloss for laisser aller. John suggested that this might be exactly what Chirac meant. I'm a bit skeptical, since the topic is political economy, not personal hygiene; but in any case this draws attention again to the negative connotations of French laisser+infinitive.

However, there is some support for John's theory in an email from Jean Véronis:

I was puzzled when I read the quote from Le Monde, because in my memory I remember this maxim as "Laissez faire laisser aller", and Chirac's use didn't strike me as an ironic invention, but rather a shorter form of the maxim. I am convinced that I have heard the maxim cited in this form ("Laissez faire laisser aller") many times. There seems to be some hesitation about the original form. See:
http://www.econlib.org/library/Marshall/marPNotes5.html (note 37).
So, I wouldn't jump too quickly on the same interpretation as Le Monde and see irony in Chirac's use of the phrase. I'd rather think that Le Monde's writer hadn't heard the complete and/or alternate form(s).

The note that Jean cites is this:

Even the generous Vauban (writing in 1717) had to apologize for his interest in the wellbeing of the people, arguing that to enrich them was the only way to enrich the king--Pauvres paysans, pauvre Royaume, pauvre Royaume, pauvre Roi. On the other hand Locke, who exercised a great influence over Adam Smith, anticipated the ardent philanthropy of the Physiocrats as he did also some of their peculiar economic opinions. Their favourite phrase Laissez faire, laissez aller, is commonly misapplied now. Laissez faire means that anyone should be allowed to make what things he likes, and as he likes; that all trades should be open to everybody; that Government should not, as the Colbertists insisted, prescribe to manufacturers the fashions of their cloth. Laissez aller (or passer) means that persons and goods should be allowed to travel freely from one place to another, and especially from one district of France to another, without being subject to tolls and taxes and vexatious regulations. It may be noticed that laissez aller was the signal used in the Middle Ages by the Marshals to slip the leash from the combatants at a tournament.


According to the OED, laissez-passer in English dates only from the early 20th century, and is used only in the sense of "[a] pass, especially one used in lieu of a passport", not as a way to refer to doctrines of free trade or free emigration:

1914 T. A. BAGGS Back from Front xx. 94 You must first pass grim Charon and his watchdogs at the entrance, where your passports, laisser-passers, sauf-conduits, are inspected.
1928 Sunday Express 1 July 5 The Ballet was given a laissez-passer and were allowed to come to England through Paris.
1936 E. WAUGH Waugh in Abyssinia 77 Many writers have left accounts of the intricate system of tolls and hospitality by which the traveller was passed on from one chief to another and of the indifference with which the Emperor's laissez-passer was treated within a few miles of the capital.

[Update: I searched the ProQuest American Periodicals Series 1740-1900 (APS) database, and did find one use of laissez faire earlier than the OED's 1825 citation. The work is an anonymous review in The Literary and Scientific Repository, and Critical Review 4(7), January 1822, of M. le Chevalier Chaillou des Barres' "Essai, Historique et Critique, sur la Legislation des Grain, jusqu'a ce jour" (Didot, Paris, 1820). The citation is an untranslated quotation, discussing an act passed in 1736 "requiring particular societies and eleemosynary institutions... to keep on hand three years supply of provisions, and directing that a public grainery should be constructed to contain ten thousand muids of grain.."

The nation now possessed a number of enlightened statesmen, whose learning, good sense, and respectability, could not fail strongly to impress the court, licentious as it was, with the truth and justice of their views. These economists, for so they were called, at the head of whom was M. Turgot, warmly espoused the freedom of the corn trade, and put forth the following principle, which justly merits the title of an axiom in political economy:

" Laissez faire -- le commerce et l'intérêt personnel sont là qui veillent à votre conservation; si les blés deviennent rares en France, c'est en France aussi qu'on les apportera. "

This principle, so self evident, M. Chaillou denounces as replete with danger; and considers it amply refuted by the following weak observation.

" Mais quand y parviendront-ils avec des communications intérieures encore si imparfaites ? dites-moi, est-il bien certain que les bateaux ou les voitures transportant des blés arriveront dans les province réculées assez à temps pour prévenir les effect d'une cherté désastreuse ?"

The first quotation is apparently from Anne-Robert-Jacques Turgot (1727-1781), who would have been only 9 years old in 1736, and so must have written about the events in question from a historical perspective later on.

Anyhow, this passage suggests that laissez faire had become "an axiom in political economy" by 1822 in America, and was denounced as "replete with danger" in Paris. This biographical sketch of Turgot (by David Hart) identifies the original source of the phrase as Vincent de Gournay:

Also during the mid-1750s Turgot came into contact with members of the French free market school known as the Physiocrats. He met Dr. Quesnay and Dupont de Nemours and traveled extensively with Vincent de Gournay (who was the free market Intendant for Commerce) on his tours of inspection around the country during 1753-56. It was Gournay who is reputed to have coined the expression "laissez faire, laissez passer" when asked what government economic policy should be. When Gournay died in 1759 Turgot wrote a lengthy "Eloge de Gournay" in which he defended laissez-faire economic policies with an eloquence which other members of the Physiocratic school too often lacked.

However, this page by Karen De Coster says that

The physiocrat Marquis de Gournay is usually credited with originating the term "laissez-faire, laissez passer," but A.R.G. Turgot's Eloge de Gournay attributes that term to Le Gendre, an anti-Colbert merchant of France, who spoke the phrase "laissez-nous faire" while speaking out against the Colbertization of industry. Boisguillebert is also said to have used the term before Gournay.
[From The Physiocrats by Henry Higgs, 1897.]
Colbertism was an extreme form of mercantilism built around war financing schemes, high taxation, and central planning.

Unfortunately, access to the Eloge de Gournay from the BNF's Gallica site is limited to one page image at a time (with a significant wait for the preparation of each page), and I don't have time to read all 30 pages by this method to see what Turgot actually said. Maybe later.]

[Update on the Éloge de Gournay: Jean Véronis sent instructions about how to download the whole document at once; and also the information that Turgot cites Le Gendre's slogan as "laissez-nous faire", with nothing about either "passer" or "aller", while Dupont de Nemours, in the preamble to the Éloge, uses the "laissez passer" idiom. Here's the relevant passage from the Dupont de Nemours préambule:

M. de Gournay, fils de négociant, et ayant été longtemps négociant lui-même, avait reconnu que les fabriques et le commerce ne pouvaient fleurir que par la liberté et par la concurrence qui dégoûtent des entreprise inconsidérées, et mênent aux spéculations raisonnables; qui préviennent les monopoles, qui restreignent à l'avantage du commerce les gains particuliers des commerçants, qui aiguisent l'industrie, qui simplifient les machines, qui diminiuent les frais onéreux de transport et de magasinage, qui font baisser le taux de l'intérêt; et d'où il arrive que le productions de la terre sont à la première main achetées le plus cher qu'il soit possible au profit des consommateurs, pour leurs besoins et leurs jouissances.

Il en conclut qu'il ne fallait jamais rançonner ni réglementer le commerce. Il en tira cet axiome: Laissez faire et laissez passer.

M. de Gournay, son of a merchant, and having long been a merchant himself, recognized that manufacture and trade could only flourish by means of freedom and competition, which repels ill-considered enterprises, and encourages rational speculation; which prevents monopolies and restrains to the advantage of commerce the profits specific to traders, which sharpens industry, which simplifies machines, which diminishes the onerous costs of transport and storage, which lowers the rates of interest; and from which it develops that the fruits of the earth are bought at as high a price as is possible to the profit of consumers, for their needs and pleasures.

He concluded from this that commerce should never be extorted or regulated, and derived this axiom: Laissez faire et laissez passer [ = "let people work as they please, and go where they want"]

Here's Turgot's citation of Le Gendre's slogan, from the body of the Éloge:

La résistance que ces principes ont éprouvée a donné occasion à plusieurs personnes de représenter M. de Gournay comme un enthousiaste et un homme à système. Ce nom d'homme à système est devenu une espèce d'arme dans la bouche de toutes les personnes prévenues ou intéressées à maintenir quelques abus, et contre tous ceux qui proposent des changements dans quelque ordre que ce soit. [...]

Il faut dire encore que ce prétendu système de M. de Gournay a cela de particulier, que les principes généraux en sont à peu près adoptés par tout le monde; que de tout temps le vœu du commerce chez toutes les nations a été renfermé dans ces deux mots: liberté et protection, mais surtout liberté. On sait le mot de M. Le Gendre à M. Colbert: laissez-nous faire. M. de Gournay ne différait souvent des gens qui le traitaient d'homme à système, qu'en ce qu'il se refusait, avec la rigidité d'un esprit juste et d'un coeur droit, aux exceptions qu'ils admettaient en faveur de leur intérêt.

The resistance that these principles have met has given several people the occasion to represent M. de Gournay as an enthusiast and a systematizer. This name of systematizer has become a sort of weapon in the mouth of everyone concerned or interested in maintaining some abuses, and against all those who propose changes in any social structure at all. [...]

It must also be said of this supposed system of M. de Gournay, that its general principles have been mostly adopted by everyone; that all nations' laws of commerce have been restructured on these two words, "freedom and protection", but especially freedom. We know what M. Le Gendre said to M. Colbert: "let us work". M. de Gournay often did not disagree with those who called him a systematizer, except when he refused, with a just rigidity of spirit and an honest heart, the exceptions that they permitted in favor of their own self-interest.]


[Update #2: here's another citation from APS, in which the phrase is used in an English-language context, though still quoting Dupont de Nemours in French. And the vibe is a positive one, by contrast to the OED's early citations. The source is a review of Daniel Raymond's The Elements of Political Economy, 1823; published in The Southern Review, v. 5 n. 9, Feb.-May 1830.

The school of Adam Smith has adopted the broad and liberal principles of the Economists; and to that meddling spirit of rulers which has so often led them to make regulations for the industry of the governed, they reply, laissez faire et laissez passer: "for as the public interest consists in the union of all individual interests, individual interest will guide each man more surely to the public interest than any government can do."]


Posted by Mark Liberman at 10:23 AM

April 16, 2005

Word strength ethics

From the "[t]ranscript of an interview between editors and reporters from The Washington Times and House Majority Leader Tom DeLay, Texas Republican, [Apr. 12] at his Capitol office:"

Mr. Hurt: Have you ever crossed the line of ethical behavior in terms of dealing with lobbyists, your use of government authority or with fundraising?

Mr. DeLay: Ever is a very strong word.

Mr. DeLay continues:

Let me start out by saying, you can never find anything that I have done for personal gain. Period.

It seems to me that never is a pretty strong word, too -- at least as strong as ever, and it is only further strengthened in this context by the added Period.

Then Mr. DeLay adds:

What I'm doing is what I believe in, I'm doing it the way I believe in it.

What does this all mean? In the context of Mr. Hurt's question, ever is apparently too strong for Mr. DeLay, which appears to mean that Mr. DeLay is admitting to having "crossed the line of ethical behavior" once or twice. But Mr. DeLay would like to "start out by saying" (by which I assume he means "emphasize") that any ethical line-crossing that he's done has never been "for personal gain" (never ever even, given that added Period.), and that ethical line-crossing is one of the ways he believes in doing things for things he believes in.

So, what we are to learn from this is that Mr. DeLay is of the opinion that it's OK to (occasionally) cross the line of ethical behavior as long as it's for something you believe in, and not for personal gain. It's an admission of wrong-doing, but one that's safely couched in a complicated little ethics lesson for the kids watching at home.

For the record, the rest of Mr. DeLay's response to Mr. Hurt's question runs as follows. Or if you prefer, read the whole interview here.

Yes, I'm aggressive. I'm passionate about what I believe in, and I'm passionate about winning and accomplishing our agenda. I know since 1995 that everything that we have done has been checked by lawyers, double-checked by lawyers, triple-checked by lawyers, because I know I have been watched and investigated probably more than even Bill Clinton. They can't find anything, so they're going back to my childhood, going to my family, going to things that happened eight years ago. There's nothing there. And they can keep looking. There's nothing there. I have tried to act ethically, I have tried to act honestly. I have tried to keep my reputation - to fight for my reputation - while it's been besmirched, and I have tried to do it in a way that brings honor to the House.

[From my Cognitive Science colleague Seana Coulson, by way of my Linguistics colleague Robert Kluender.]

[ Comments? ]

Posted by Eric Bakovic at 05:19 PM

What is "sausage-eating bastard" in Latin?

Current events in Rome have inspired a wonderful series of posts by Angelo Mercado ("Caelestis") at Sauvage Noble, among which I'll single out his discussion of Clint Hagen's critique of the Latin "attack ad" presented on the Daily Show on 4/12/2005. That post includes his Latining of the Lex Hartmania McCeania Scittiaque de talione ex grammaticis ("Harman, McKean and Skitt's Law of Prescriptive Retaliation"), among many other delights. His post about the names of four new slime-mould beetles is also good fun.

Among other interesting links on current events in Rome, Bruce Schneier wrote in his blog on hacking the papal election, suggesting that the level of mutual trust in the conclave of cardinals is not terribly high, and a 4/14/2005 article by Natasha Bita in the Australian leads to a similar conclusion on the basis of different evidence.

Much more conclavity at Wim Wylin's weblog nieuws over de kardinalen en het conclaaf.

Posted by Mark Liberman at 12:38 PM

The future of the history of usage

The OED traces "could care less" back to 1966:

1966 Seattle Post-Intelligencer 1 Nov. 21/2 My husband is a lethargic, indecisive guy who drifts along from day to day. If a bill doesn't get paid he could care less.

A few days ago, Benjamin Zimmer supplied a citation from 1955, which he got from searching the ProQuest Historical Newspapers database:

This Morning . . . With Shirley Povich
Washington Post, Sep 25, 1955, p. C1
The National League clubs have always shied from pitching left-handers against the Dodgers, but Casey Stengel could care less about the Dodgers' reputation for beating southpaws.

The ProQuest Historical Newspapers and American Periodicals Series (APS) databases are the leading edge of a series of developments that will make it possible to study, in an entirely new way, the origin and progress of new idioms, constructions and word senses. All we can do so far is to search for words and word sequences, contingent on source and/or date, but this is already very useful.

When researchers have fuller access to the back-end corpora of OCR'ed text, or when outfits like ProQuest have access to modern NLP technology, it will be possible to search over corpora that have been automatically tagged for morphological and syntactic properties, word senses, discourse function and so on. An even more important innovation will be the ability to go beyond the search for the earliest citation, or for a representative series of historical citations, and instead to create richer compilations of information about changes in usage as a function of time, space, genre, personal identity and so on.

There are many legal, social and technical issues between us and that happy end, but the first step is to establish a digital archive of the historical texts, and that is already happening.

[Note that the various ProQuest databases are subscription services, which may be available to you through a library. If the University of Pennsylvania is typical, university libraries subscribe to some but not all of the relevant services -- through the Penn library I can access the APS sources, and the New York Times portion of the historical newspapers archive, but not the other papers. You may also be able to access such databases through some public libraries.]

Posted by Mark Liberman at 11:56 AM

April 15, 2005

Cybernetic text

As you've probably read by now, some grad students at MIT ginned up an "Automatic CS Paper Generator", using a "hand-written context-free grammar to form all elements of the papers". As the authors (Jeremy Stribling, Max Krohn and Dan Aguayo) explain,

One useful purpose for such a program is to auto-generate submissions to "fake" conferences; that is, conferences with no quality standards, which exist only to make money. A prime example, which you may recognize from spam in your inbox, is SCI/IIIS and its dozens of co-located conferences (for example, check out the gibberish on the WMSCI 2005 website). Using SCIgen to generate submissions for conferences like this gives us pleasure to no end. In fact, one of our papers was accepted to SCI 2005!
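For readers who haven't met the technique, here is a minimal sketch of how a hand-written context-free grammar can be used to generate text. Everything in it is an assumption made for illustration: the toy rules, the vocabulary, and the names `GRAMMAR`, `expand` and `generate_sentence` are hypothetical and are not taken from SCIgen, whose actual grammar is far larger.

```python
import random

# Hypothetical toy grammar, invented for illustration (not SCIgen's rules).
# Upper-case keys are nonterminals; each maps to a list of alternative
# productions. Any string that is not a key is a terminal, emitted as-is.
GRAMMAR = {
    "SENTENCE": [
        ["We", "VERB", "that", "NOUN_PHRASE", "can be made", "ADJ", "."],
        ["In this paper we", "VERB", "that", "NOUN_PHRASE", "is", "ADJ", "."],
    ],
    "VERB": [["argue"], ["confirm"], ["demonstrate"]],
    "NOUN_PHRASE": [["the", "ADJ", "NOUN"], ["the", "NOUN"]],
    "ADJ": [["scalable"], ["probabilistic"], ["cacheable"], ["robust"]],
    "NOUN": [["lookaside buffer"], ["lambda calculus"], ["partition table"]],
}


def expand(symbol, rng):
    """Recursively rewrite a symbol until only terminals remain."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: nothing left to rewrite
    production = rng.choice(GRAMMAR[symbol])  # pick one alternative at random
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate_sentence(seed=None):
    """Generate one sentence; a fixed seed makes the result repeatable."""
    rng = random.Random(seed)
    text = " ".join(expand("SENTENCE", rng))
    return text.replace(" .", ".")  # attach the final period to the last word


if __name__ == "__main__":
    for _ in range(3):
        print(generate_sentence())
```

Because every rule yields a grammatical fragment, the output is always well-formed English, and because the choices are random, nothing constrains it to mean anything.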

This exploit has made it into news outlets via Reuters, and no doubt soon other services. The Reuters story suggests that the students were "surprised" to have a paper accepted at SCI2005 ("The 9th World Multi-Conference on Systemics, Cybernetics and Informatics"). For my part, I was surprised that only one of their papers was accepted.

Like everyone else on whatever lists the SCI/IIIS spammers use, I regularly find my spam traps clogged with hundreds of transparently fraudulent invitations to submit abstracts to conferences to be held in somewhat attractive places, contingent of course on my willingness to pay a few hundred dollars to register. I've always assumed that these folks would be happy to take my $390 and accept a compilation of old shopping lists. Perhaps Stribling, Krohn and Aguayo submitted two papers under the same authors' names? That would explain why only one of their papers was accepted, since the other would bring in no additional revenue. And I really like the letter that they got when they asked to see the reviews on the basis of which the paper had been rejected...

Reuters makes the connection to Alan Sokal's Social Text hoax:

The prank recalled a 1996 hoax in which New York University physicist Alan Sokal succeeded in getting an entire paper with a mix of truths, falsehoods, non sequiturs and otherwise meaningless mumbo-jumbo published in the quarterly journal Social Text, published by Duke University Press.

But I wasn't aware that Social Text (which is still publishing) was (or is) generally regarded as a "fake journal" in the fields that it served, in the sense that SCI/IIIS has always been widely understood to be a "fake conference". And the editors' motivation for publishing Sokal's parody was not financial -- the hope of making money by inducing him to register for a phony conference -- but rather ideological: his submission (as he has explained it) "came from a 'conveniently credentialed ally' (as Social Text co-editor Bruce Robbins later candidly admitted), flattered the editors' ideological preconceptions, and attacked their 'enemies'."

Reuters quotes the infamous Nagib Callaos offering some feeble excuses. (I was shocked to learn that Callaos actually exists, since I had always assumed that he was one of those people like Serenity Q. Oxbow from whom I often receive other attractive offers by email.)

Nagib Callaos, a conference organizer, said the paper was one of a small number accepted on a "non-reviewed" basis -- meaning that reviewers had not yet given their feedback by the acceptance deadline.

"We thought that it might be unfair to refuse a paper that was not refused by any of its three selected reviewers," Callaos wrote in an e-mail. "The author of a non-reviewed paper has complete responsibility of the content of their paper."

However, Callaos said conference organizers were reviewing their acceptance procedures in light of the hoax.

Asked whether he would disinvite the MIT students, Callaos replied, "Bogus papers should not be included in the conference program."

It's normal (and not necessarily stupid) to see a random recombination of problems, models, algorithms and thematic settings in the papers accepted at serious conferences. There's a long tradition of poking fun at this process, exemplified by the call for papers for the "1st Workshop on Unnatural Language Processing". But at a serious scientific or engineering conference, even the worst of these memetic recombinations makes sense at a certain level, and sometimes the process creates a new conceptual species that even deserves to prosper. The results of the MIT students' interactions with Nagib Callaos provide evidence that the SCI/IIIS system, in contrast, really is the scam that it appears to be.

[Update 4/16/2005: I had completely forgotten an earlier prank originating at RMIT, only slightly less spectacular, that also demonstrated the fraudulent character of the SCI/IIIS process: Prof. Justin Zobel submitted three nonsensical papers, all of which were accepted. One was created by selecting alternating sentences from two existing papers; another explained that "we have implemented a[n] ... algorithm ... the computational cost is high, and the method does not work at all. We believe that this method is not capable of being improved", and included discussions of the consequences of the authors' inebriation and their decision to "invent more promising results, an approach that we report on in the last line of the table"; the third was "[a] surreal collection of remarks about information retrieval, by myself and a colleague. Aside from the first page, many of the paragraphs make no sense, and much of the content consists of jokes and nonsequiturs".

All three were accepted. I learned from Zobel that my theory about why the second artificially-created paper was rejected must be false:

The organisers of the conference invite the contributors to pay the registration fee, with a separate fee for each accepted paper, and state it is not necessary to attend so long as the publication fee is paid -- a highly unusual practice. I have repeatedly requested the referees' reports, but there has been no response. The organisers have however rapidly responded to queries about the financial arrangements.


[And now, an even better submission has been revealed! I'm seriously tempted to borrow it and submit a copy myself!!]

[Update: more on the work of Prof. Callaos himself here]

Posted by Mark Liberman at 07:00 AM

April 14, 2005

Linguistic sorcerers

Joshua Green's piece in the May 2005 issue of Atlantic Monthly highlights Democrats' recent interest in linguist George Lakoff's book, Don't Think of an Elephant!: Know Your Values and Frame the Debate, which advances the idea that Republicans have been doing a better job than Democrats at promoting their causes by using language to influence public perceptions. A good example is the term "tax relief", which frames the entire concept of taxation as a burden imposed from without, rather than as a reasonable expectation citizens have of themselves under a social contract. You'd think it would be unsurprising at this late date to point out how willing people are to buy into manipulations of language, and that both parties would be very practiced at this sort of thing by now. But on reflection, Democrats do seem remarkably ham-handed at it.

Contrast Reagan's "Mistakes were made", which deflected responsibility in a subtle enough way that his supporters could defend it, with Clinton's "it depends what 'is' is", which even supporters had to admit was an embarrassment. Another, lighter linguistic WTF moment, thanks to a Democrat: the recent statement by State Sen. Ellen Karcher (D, New Jersey), who said of the tomato: "Botanically it's a fruit, legally it's a vegetable".

Regardless, Green's view is that the real problem the Democrats have is with ideas, not language, and that's a question one could fairly debate. But Green doesn't. Rather than asking any substantive questions about the possible role of "framing" versus substance in forming public opinion, he himself plays the "framing" game. Republican Frank Luntz (who created terminology like "tax relief") is a pollster, strategist, wordsmith, "message-meister". In contrast, a "Masonic cabal" of "superstitious Democrats" imagines Republicans as having "linguistic sorcerers", and now seeks to employ similar "mysterious alchemical skills" in order to move the masses.

I think Lakoff's point is a bit overblown. But on the other hand, Green's piece is precisely the sort of crafted, carefully framed language that Lakoff is worried about. It's designed to reinforce a point of view in the reader's mind without making any real argument or presenting any real evidence. Is Lakoff's book the only game in town for drawing attention to this strategy and blunting its effect, or can we do better? Where do we linguists apply for our robes and magic wands? What spell or potion will get people to question rather than simply to accept and follow?

Posted by Philip Resnik at 11:32 AM

April 13, 2005

Pledge break psycholinguistics: production

The situation of the station staff and volunteers during a public radio station's pledge break — having to talk continuously for as long as necessary about how nice it would be if people would send in money despite the temporary loss of the very thing they tuned in to the station for — stimulates the production of truly loony and incoherent blather from the tense and inexperienced local station staff as they struggle to find new begging language without ever leaving even a second of dead air. My brother Richard carried on collecting instances of public radio babble during the rest of the week in which he collected the deathless line "This is the station that you really makes a difference to you", and he came up with some great stuff. There were lots of empty or meaningless and mildly amusing clichés about movement ("We're off to a running start! Here we are rolling along here!") as they struggled to keep up the sense of a train rolling down the track toward financial security, but also some truly weird and wonderful pieces of lack of forethought. You'd think people would scribble down some sentences they might use, in order to be sure that they ended as well as they began, but apparently not...

"We will exhort you... If there is such a word."

"KAZU continues to go where you... and we... cannot."

"Thank you, Joel, for coming on board. He's an existing member, so he's... coming on board again."

"You're catching up with what's been happening in the world, now."

"The phone numbers haven't changed but what has changed is that... it's a new day!"

And just once Richard caught the station manager going where you, and we, and certainly he, could not: he was on the edge of falling into the delusion that he was the radio, NPR was him, and he was more than just an interruption, he was the main event:

"Now is the time to keep this pro... er... pledge drive on track."

That's right! Your babbling is not the program. It will never be a program. You are just a break, that interminable, teeth-grating break in service that we NPR listeners endure twice a year, finally (if we can catch the damn phone numbers) sending cold cash to you so we can listen to a talk radio program that isn't interrupted by interminable, teeth-grating commercials every eight minutes.

Posted by Geoffrey K. Pullum at 07:46 PM

Pledge break psycholinguistics: perception

NPR pledge week is now (oh, praise God) behind us. I didn't send in my check. I will real soon. But I swear that one of the reasons I never did during those days of program interruptions and local-studio beseechment and blather was that I couldn't catch the phone numbers. They kept saying them, constantly; but they insisted on saying the local phone number and the 800 number all together in a big rush so that you heard eighteen digits all at once and couldn't remember any of them. Try the experiment: say to someone very loudly and suddenly, "Three seven five seven two seven five one eight hundred nine oh three six six two four call now the volunteers are waiting to take your calls we have two on the line right now we need one more call before we return to NPR's Morning Edition!", and then ask them to tell you either of the phone numbers. You just can't.

Why the people at the station babbling on the microphones don't realize this I just don't know. It's like the way highway authorities don't realize that when you put ONLY LANE BIKE on the road it doesn't look like "Bike lane only", it looks like "Only lane bike." Nobody seems to have any intuitive grasp of how other people's linguistic perception mechanisms work. The obvious argument would be that if you can't memorize either of two phone numbers when they are given to you at high speed in quick succession with a whole lot of words following them, then neither can they. But no one at my public radio station seems to be able to put themselves in the listener's place.

Posted by Geoffrey K. Pullum at 07:11 PM

Wrong for so long

The O-Zone Fan Forum is a site for fans of Ohio State University sports teams. It includes an Off-Topic Forum, where recent posts deal with things like Country & Western songs that mention Ohio, the inability of neighborhood "kindergarden age children" to spell God, and so forth. Every once in a while, forums.the-ozone.net shows up in our referrers' log, and this happened again over the past couple of days, because '86Buck remarked (in response to a post by B2 about the idiom "could care less") that

I suppose it's been wrong for so long (1946) that it's "accepted" now...like / gay couples or bastards.

with a link to a Language Log post. (Where the offending idiom is dated only to 1966...)

Additional contextual flavor was provided by Topspin, who in response to B2's question

Isn't there a term for the use of a phrase in direct opposition to its "face" logic?

replied:

Don't know the name of the term, but "yes dear" is used in that way around here.

Before everybody gets all worked up into a redneck vs. intellectual lather, let me get to my point, which is that '86Buck and Camille Paglia are siblings under the skin. They both look back to a cherished alignment of truth, morality, logic and tradition, now nearly lost to a sinful nation, a people laden with iniquity, a seed of evildoers, children that are corrupters. At the same time, each would likely see the other as a perfect symbol of everything that's gone wrong.

Some enterprising philosopher or social scientist ought to trace the relations between linguistic ideologies and other strands in the great tapestry of modern thought. Like metabolic alternatives among bacteria, attitudes towards language and language use seem to diffuse and recur through the space of human personality, culture and politics. The patterns are certainly not random, but they don't line up with social and political groups in any easy way either.

[Update: Benjamin Zimmer emails

Digital newspaper databases now allow us to date "could care less" back to 1955, 11 years before the earliest citation currently given by the OED.

This Morning . . . With Shirley Povich
Washington Post, Sep 25, 1955, p. C1
The National League clubs have always shied from pitching left-handers against the Dodgers, but Casey Stengel could care less about the Dodgers' reputation for beating southpaws.

And "couldn't care less" can now be dated to 1944 (the OED had 1946).

'Danger List' by Christianna Brand
Chicago Tribune, May 15, 1944, p. 18
"I couldn't care less, darling," said Frederica who, being on duty in the ward, could not go to the party.

Given the expected resistance of editors to "could care less", the fact that it appears in print 11 years after the first citation for "couldn't care less" suggests to me that the two expressions probably arose at essentially the same time, like quark-antiquark pairs in a high-energy collision.]

Posted by Mark Liberman at 01:37 PM

The RIAA tries a new direction

According to Rachel Feintzeig in The Daily Pennsylvanian (where April 12 is this year's April 1), the RIAA has decided to stop suing its customers for copyright violation, and instead to start using the justice system for aesthetic purposes:

The Recording Industry Association of America filed lawsuits yesterday against four Penn students who were found to have downloaded Sonic Youth songs onto their computers.

Citing "bad taste," officials said the individuals will be prosecuted to the fullest extent of the law. If convicted, the students face a minimum sentence of 10 months in an alternative music rehabilitation center.

Treatment could also include intensive listening sessions featuring musicians of the 21st century, or trips to spring concerts at other universities that plan to feature contemporary artists.

This came as welcome news to those Penn students who were unhappy about this year's choice of Sonic Youth as the featured band at Spring Fling. The article goes on to suggest that the RIAA's new strategy will win them greater cooperation from University authorities:

The fate of the Spring Fling organizers remains unclear, but it appears as though the University is unwilling to offer them the same protection they have given earlier RIAA targets.

"We have no obligation to these individuals," University President Amy Gutmann said of the three Social Planning and Events Committee directors. "We just don't support students who endanger the Penn community, and we certainly don't support students who like shitty music."

And even the targeted students are apparently grateful in the end:

Lawsuits such as the ones aimed at the four students are part of the RIAA's strategy of suing individual users for their personal music preferences. The trend began in September 2003, when the group sued two Princeton students for downloading entire Ace of Base albums.

"I just liked 'I Saw the Sign' and it got out of control," recovering bad-music addict Bridget Takacs said. Though her police record will forever be branded "stuck in the 1990s," Takacs was grateful for the intervention.

"I'm thankful that the RIAA stepped in and got me the help I needed," the Princeton senior said.

There are serious scientific questions here as well:

Harvard President Larry Summers blamed innate differences between the Spring Fling organizers and the undergraduate population for the lawsuits.

"Fling organizers just lack the intrinsic aptitude to bring in good bands," he said.

Rumor has it that Steve Pinker's forthcoming book, The Blank Tape, will explore these issues in greater depth.

Posted by Mark Liberman at 07:36 AM

Watch out Google

In our continuing coverage of the war against omnigooglisation, we present evidence that the French are deeply serious about protecting intellectual property, even when there are no Anglo-Saxons in sight.

The Société pour l'administration du droit de reproduction mécanique des auteurs compositeurs et éditeurs (SDRM) has assessed a fine of 1,000 euros ($1,293.50 at current exchange rates) -- because an actor whistled a few bars of The Internationale without permission, in a movie that closed after total box-office sales of 203 tickets.

According to Nicole Vulser in Le Monde on April 9:

Pendant sept secondes, dans son long métrage Insurrection résurrection, l'acteur et réalisateur Pierre Merejkowsky a siffloté L'Internationale. Comme ça, au débotté. Une improvisation. Une fantaisie qui pourrait coûter cher à son producteur, Les Films sauvages.

For seven seconds, in his feature film Insurrection Resurrection, the actor and filmmaker Pierre Merejkowsky whistled The Internationale. Just like that, off the cuff. An improvisation. A whim that could cost his producer, Savage Films, dearly.

Jean-Christophe Soulageon, le directeur, a reçu une lettre sèche, en recommandé avec accusé de réception, de la Société pour l'administration du droit de reproduction mécanique des auteurs compositeurs et éditeurs (SDRM), qui gère les droits d'auteur sur les supports cinématographiques. "Au cours d'un contrôle dans les salles de cinéma, nos inspecteurs musicaux ont constaté que l'œuvre L'Internationale avait été reproduite dans le film" sans autorisation. La SDRM demande donc 1 000 euros pour avoir omis de déclarer ce sifflotement, qui constitue une exploitation illégale d'une musique éditée par la société Le Chant du monde.

Jean-Christophe Soulageon, the director, has gotten a stiff note, return receipt requested, from the Company for the Administration of the Right of Mechanical Reproduction of Authors, Composers and Publishers (SDRM), which manages author's rights in film media. "In the course of an audit in the movie theaters, our musical inspectors have observed that the work The Internationale was reproduced in the film" without authorization. The SDRM therefore demands 1,000 euros for having failed to declare this whistling, which constitutes an illegal usage of a piece of music published by the company Le Chant du Monde (Song of the World).

That's harsh. Richard Posner, in a guest post on Lawrence Lessig's weblog, calls an analogous case "a reductio ad absurdum of folding in the face of copyright overclaiming". Judge Posner suggests that the problem is fundamentally a lexicographical one: "If only one could define 'glimpse'!" I believe that expert assistance in this task is available, and we here at Language Log Plaza certainly stand ready to offer our services. However, Savage Films' transgression was acoustic rather than visual:

M. Soulageon ignorait qu'un sifflotement valait chanson. Pis, il ne savait pas non plus que L'Internationale, dont la musique a été écrite par Pierre Degeyter (1848-1932) et les paroles par Eugène Pottier (1816-1887), n'était pas dans le domaine public. Membre du Parti ouvrier français, Pierre Degeyter a composé en 1888 ce qui est devenu par la suite l'hymne du mouvement ouvrier mondial. Le compositeur meurt en 1932 à Saint-Denis, "un peu dans la misère", malgré une petite pension de l'ambassade de l'URSS, précise Hervé Desarbre, le directeur du Chant du monde.

Mr. Soulageon didn't know that whistling is the same as singing. Worse, he also didn't know that The Internationale, whose music was written by Pierre Degeyter (1848-1932) and whose words were written by Eugene Pottier (1816-1887), is not in the public domain. A member of the French Worker's Party, Pierre Degeyter composed in 1888 what has since become the hymn of the worldwide workers' movement. The composer died in 1932 in Saint-Denis, "somewhat in misery", despite a small pension from the Soviet embassy, according to Hervé Desarbre, director of Song of the World.

You'd think that 70 years after the death of the author would mean that the work went out of copyright in 2002 -- but apparently (at least in France) they add on 12 years for "les années de guerre" ("the war years"), so whistlers are on the hook until 2014.
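The arithmetic behind those two dates can be sketched in a few lines of Python. This is only the naive "year of death plus term plus extension" model described above; the function name and its parameters are illustrative assumptions, and the 12-year wartime figure is simply the one quoted here, not a statement of how French law computes it.

```python
def public_domain_year(death_year: int, term_years: int = 70,
                       extension_years: int = 0) -> int:
    """Year protection lapses under a simple death + term + extension model."""
    return death_year + term_years + extension_years


# Pierre Degeyter, composer of the music, died in 1932.
degeyter_death = 1932

# Plain 70-year term.
print(public_domain_year(degeyter_death))

# With the 12 "war years" added on (figure taken from the post, not from the statute).
print(public_domain_year(degeyter_death, extension_years=12))
```

The two printed years are 2002 and 2014, matching the dates above.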

Le producteur a tenté, en vain, de négocier, en proposant 150 euros au Chant du monde. La société d'édition musicale des "grands Russes" (Chostakovitch, Prokofiev...) aurait préféré une demande préalable. L'épisode est d'autant plus rude que Les Films sauvages ne se sont guère enrichis avec le film de Pierre Merejkowsky. Sorti le 10 novembre 2004 dans une seule salle d'art et d'essai parisienne, ce long métrage a réalisé 203 entrées.

The producer tried in vain to negotiate, offering 150 euros to Song of the World. The publishing company of the "great Russians" (Shostakovitch, Prokofiev...) would have preferred a request prior to release of the film. The episode is even rougher because Savage Films hardly got rich from Pierre Merejkowsky's work. Released on November 10, 2004, in a single theater for art and experimental films in Paris, this feature film sold 203 tickets.

Here's hoping they make it back in the DVD market.

Pourquoi Pierre Degeyter n'est-il pas mort riche ? Chaque fois que L'Internationale était chantée en public, il aurait dû toucher des droits. "L'Union soviétique violait la loi en ne redistribuant rien aux ayants droit", déplore M. Desarbre.

Why didn't Pierre Degeyter die rich? Every time that The Internationale was sung in public, he should have gotten royalties. "The Soviet Union violated the law in not redistributing anything to the rights holders", complained Mr. Desarbre.

Re-distributing? Does he mean that in the 1914-1932 period, the USSR was collecting royalties on The Internationale but not giving them to Degeyter? Shocking, if true, but probably false.

[via BoingBoing]

Posted by Mark Liberman at 07:32 AM

Human-machine communication

Watch the movie first.

Then read about Blendie.

And learn about the wider world of Machine Therapy.

Exciting developments from the Media Lab. But I'm afraid that it's all to set you up for when the blender calls you on your cell phone.

[Work by Kelly Dobson.]

[Update: Kerim Friedman makes the connection to a Japanese innovation: the wired kettle, or "i-pot".]

Posted by Mark Liberman at 06:49 AM

April 12, 2005

Astounding Coordinations (continued)

Eric Bakovic, spurred by Neal Whitman and Mark Liberman, continues our conversation on Astounding Coordinations, with examples that seem to involve coordination of unlikes and/or remainders in coordination that are understood in different ways with different conjuncts.  I've been collecting various sorts of failure-of-parallelism examples for a while now; here are some further contributions to the conversation -- a couple of relatively routine examples, plus some apparently involving coordinations of a clause having a subject gap with a clause having an object gap.

1.  Ordinary failures of parallelism.  First, from MacNeil & Cran, Do You Speak American? (2004), p. 61:  "Kirk Arnott, assistant managing editor [of the Columbus Dispatch], is the language cop or watchdog of the Dispatch.  He believes in informal and conversational language, and that his paper should be as conversational as possible, to be accessible and clear to readers."

On the surface, this is:

... believes [ [in NP] and [that S] ]

with a PP and a complement clause treated as parallel.  Somewhat more ingeniously, you could claim that the structure is really:

... believes in [ NP and [that S] ]

(with automatic suppression of the "in" of "in that S" when the remainder is distributed over the conjuncts), though that still has an NP and a complement clause treated as parallel.  And there's the question of whether the two conjuncts are interpreted with the same verb believe.

Next, there's a Remington shaver commercial, heard on television 12/21/04:  "... designed for closeness, comfort, and to clean itself automatically".

There are two issues here.  One is what James Cochrane, A Little Book of Bad English, calls "not enough ands", as in his example (pp. 91-2): "The scandal was headline news, seriously damaging the credibility of the president, the Republican Party, and giving a considerable boost to the lagging Democrats."  (Cochrane, p. 93, maintains that "errors of this kind have begun to crop up regularly only in the last twenty or thirty years", a claim of recency that I doubt, though I'm not yet in a position to dispute it with data.)  In the Remington commercial, the and problem, if it is one, can be easily fixed: "... designed for closeness and comfort, and to clean itself automatically".

Then this example is like the Dispatch one.  On the surface, it's:

... designed [ [for NP] and [to VP] ]

with a PP and an infinitival VP treated as parallel.  Or, if the structure is

... designed for [ NP and [to VP] ]

(with automatic suppression of the "for" in "for to VP" when the remainder is distributed over the conjuncts), we have a NP and an infinitival VP treated as parallel.  In any case, I have no trouble interpreting the two conjuncts with the same verb design, but others might find even this problematic.

As a side note, I should point out that instances of "faulty parallelism" involving either... or are very common indeed, to the point where it seems absurd to treat all of them as ungrammatical.  Among the examples supplied by Merriam-Webster's Dictionary of English Usage (p. 434) is this fine specimen from Darwin's Origin of Species: "... the stripes are either plainer or appear more commonly in the young".  Here either is located "too low"; for perfect parallelism we'd need: "... the stripes either are plainer or appear more commonly in the young".

It turns out that there's a pretty considerable literature on the placement of either, which I was made aware of last Friday by Philip Hofmeister's public presentation of a Stanford qualifying paper.

2.  On to subject and object gaps.  Yesterday, Bruno Estigarribia wrote Ivan Sag and me with an example from the New York Times, April 8, 2005, "Maybe Less Use of the Prescription Pen" by Anahad O'Connor, Business section:

"When you compound these drugs, that means the heart won't see it, and the stomach won't see it," she said. "So for people who I'm not going to give a cox-2 and also have a history of ulcers, the way around it is to take the anti-inflammatory and make it into a cream."

Estigarribia commented: "It took me near 3 seconds to parse the sentence (or maybe more, I wasn't timing it, I sure was confused). What is going on here?"

I replied that coordination of a clause with an object gap ("I'm not going to give a cox-2") and a clause with a subject gap ("also have a history of ulcers") is usually judged ungrammatical, though there's some question about what condition bars it.  And I provided two further examples that were discussed on the newsgroup sci.lang back in November 2004:

(1) ... the "Control Panel" (which you presumably have to know is there and how to get to)...

(2) [...] New Mexico, which the president leads but was still uncalled as of noon Wednesday...

(In example (2), you have to accept that the writer intended "the president leads New Mexico" to mean something like 'the president leads in New Mexico'.)

Ivan added that, if he remembered right, Gerald Gazdar's 1981 paper ("Unbounded dependencies and coordinate structure", Linguistic Inquiry 12.155-84) treated these as ungrammatical, though that conclusion was challenged in a paper presented at a summer LSA meeting (maybe College Park in 1982).  I'm working on tracking that paper down.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:01 PM

What's real?

Newspapers around the U.S. -- among them, the Palo Alto Daily News -- carried an AP story over the weekend about proposed increases in the first class postage rates, with an accompanying graph created from data supplied by the Postal Rate Commission. The graph is labeled:

First class postage rates for regular mail since 1920, real and adjusted for inflation

"Real?" I asked. The Real part of the graph, in darker color, climbs steadily upwards, while the Adjusted (unreal?) part, in lighter color, looks like a city skyline, with peaks in the 30s and 70s above 40 cents, a trough in the 50s around 20 cents, and small-scale variation a bit below 40 cents for the past two decades. I would have thought that the Adjusted figures, which represent an approximation to the buying power of a first-class postage stamp for each year, were the real values; the Real figures are only apparent, or face, values. The graph is talking like an ordinary person -- who tends, rather literally, to take prices at their face value, especially for certain commodities (notably gasoline and postage stamps) -- rather than like an economist.

The choice of labels isn't without consequences, of course. Calling face values "real" encourages the perception that prices are rising out of control. While, really, they might be holding steady, or even falling. I do remember penny postcards and pay phones that charged a nickel for local calls. But I also remember earning 75 cents an hour as a newspaper reporter.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:43 AM

FLoP is not?

In response to my post about "FLoP and anti-FLoP", Lance Nathan sent email to Neal Whitman and to me, calling Neal's original terminology into question. And Lance has citations from the classics to support his point:

I realize I'm probably too late to change the terminology, but I was just reading Mark's recent "Flop and Anti-flop" LanguageLog post, and I realized I'm not convinced by the canonical example: "where the whiskey drowns and the beer chases my blues away."

I think I'm OK with the idea of drowning something away. Robert Johnson was OK with it, too; Google tells me his song "Preachin' Blues" contains the line "Started raining - drown my blues away".

Johnny Lang also found it acceptable enough for him to write "I keep drinking malted milk / Tryin' to drown my blues away." And the chorus of "Quicksand" by Travis is

Everyday sinking into quicksand
Follow me down the drain
Everyday drinking in the same bar
Drowning my sorrows away

(Google gives other results for "drown * * away", "drowned * away", and so forth.)

The "Flop" coordination is definitely quite real; I find the other examples of floppage convincing. Just not, unfortunately, Garth Brooks's.

Among the "other examples" (from various of Neal's posts):

There is little or no incentive for the contractor to reduce or keep the cost down.
After using dishes, please wash, dry, and put them away in the proper place.
A Monroe County man, convicted yesterday of raping, beating and stuffing a 7-year-old girl into an abandoned well, could be executed by lethal injection.
Please move from the exit rows if you are unwilling or unable to perform the necessary actions without injury.
Led by France and Canada, a majority of countries are asserting the right of governments to safeguard, promote and even protect their cultures from outside competition.

As Lance suggests, Robert Johnson never wrote a lyric about promoting his culture from outside competition.

Posted by Mark Liberman at 05:52 AM

April 11, 2005

Still more WTF coordinations

Mark's post on FLoP and anti-FLoP coordinations reminds me that I've been meaning to follow up on my own post from last month, in which I commented on my WTF reaction to an odd coordination that I read. I got several comments on it, some with more examples attached. And I have a couple more examples of my own.

As I noted in an update to my original post, Neal Whitman wrote to tell me about his recent article in Language 80.3 (pp. 403-434) on this topic, entitled "Semantics and Pragmatics of English Verbal Dependent Coordination" (sorry, access to Project Muse required for the link to work). (Mark Liberman also wrote to remind me of his original post on FLoP coordination, which is of course due to Neal's work.)

Neal also provided the following additional examples:

It makes it hard for him to get [his stuff done] and [to bed on time].
She wants [an engagement ring] and [her boyfriend to stop dragging his feet].
Don't eat [fast food], or [at restaurants, food-service companies, or caterers].

The last of these examples is perfectly fine for me, underscoring the apprehensiveness I had about saying that only phrases with the same syntactic category can be conjoined: [fast food] is a noun phrase, and [at restaurants ...] is a prepositional phrase. The other two examples are different, though; my knee-jerk WTF reaction is to give the first a question mark (by which I mean that it's somehow borderline between grammatical and ungrammatical) and the second a star (by which I mean that it's ungrammatical -- except that it improves somewhat if a for is added before the second conjunct).

Russell Lee-Goldman wrote and noted the similarity of my example to Right Node Raising constructions. (This is basically what Mark just wrote about.) Russell provided a couple examples:

I have a liking [ ] and want to eat [chocolate].

Here, 'of' or 'for' is missing ("I have a liking of/for chocolate"), which is similar to the anti-FLoP examples Mark talked about. Those are all bad for me, but Russell also provided this other type of curiously good example:

I like [to eat chocolate / eating chocolate] but rarely can [ ].

Mike Pope also wrote to comment:

Would you say that this is a form of zeugma? The small child of a friend of mine once said "The sun makes you hot and sneeze," which seems at least similar in spirit to what you've got here.

As explained here, zeugma is "A construction in which a single word, especially a verb or an adjective, is applied to two or more nouns when its sense is appropriate to only one of them or to both in different ways, as in He took my advice and my wallet." (I'll assume that the "two or more nouns" part is overly restrictive; a better definition might say "two or more complements".)

This [my advice] and [my wallet] (noun phrase and noun phrase) example is fine for me; Mike's [hot] and [sneeze] (adjective and verb) example is not (but it must have been fun to hear a kid say it). If they're both just examples of zeugma, why is that? WTF?

Paul Howard wrote in with this nice example from The Boston Globe (emphasis added):

Justin Sherrod, called up for the day from minor league camp, homered in the eighth, accounting for the decisive run. Sherrod was wearing No. 13, previously assigned to Roberto Petagine. The homer sent the scribes looking for a roster and the Sox home happy.

Finally, two additional examples I've come across. One was in a story on NPR's Weekend Edition Sunday this past weekend. (Here's a link to the audio for the full quote; here's a link to just the bolded part.)

Robinson is one of the top twenty schools in the state. It's more known for its students fighting to get into the best colleges than each other, but students in this class say even here the daily shuffle in their crowded hallways can lead to the occasional angry shove.

Two different kinds of fighting. Another example of zeugma?

Now consider the following example (from Life of Pi, pg. 37):

I nodded so hard I'm surprised my neck didn't snap and my head fall to the floor.

When I first read this a few months ago, I had an even bigger WTF reaction than for any of the others. But I immediately reasoned through it and now find it almost perfectly grammatical. All that it took was the recognition that the negation expressed by "didn't" in the first conjunct takes scope over both conjuncts ...

NOT [ [my neck snap] and [my head fall to the floor] ]

... and that this means something subtly different from having two negations, each taking scope over one of the conjuncts ...

[ NOT [my neck snap] ] and [ NOT [my head fall to the floor] ]

as in:

I nodded so hard I'm surprised my neck didn't snap and my head didn't fall to the floor.

Which is ambiguous (as my colleague Andy Kehler pointed out to me) between a reading in which the neck-snapping causes the head-falling and one in which there is no causation (as pragmatically odd as that might be); in other words, causation between the first and second conjuncts is not necessary in this second sentence while it is in the original.

Andy also reminded me of Arnold Zwicky's post from last August about grammatical and ungrammatical coordinations, itself sparked in part by a suggestion by Neal Whitman. Just another day in Language Log Plaza.

[ Comments? ]

Posted by Eric Bakovic at 11:48 PM

FLoP and anti-FLoP

Neal Whitman has given the name FLoP coordination to a certain kind of incompletely-parallel coordination. The canonical example is from Garth Brooks' "Friends in Low Places":

I've got friends in low places,
where the whiskey drowns and the beer chases
my blues away.

We start from a structure in which a final constituent is construed with both members of a preceding conjunction, say

Kim selected and Leslie packed the samples.

and then add something else that only makes sense with the second conjunct, e.g.

??Kim selected and Leslie packed the samples up.

It's not clear what the status of these structures is. Ordinary examples like the one that I just constructed seem pretty doubtful to me, but there are plenty of cases like the Garth Brooks lyric that go down fine, at least as long as I don't think about them too closely.

Meanwhile, I keep seeing things that might be called "anti-FLoP coordinations", where rather than the second half of such a conjunction having an extra bit, instead the first half is missing something. Here's one that I remembered to mail myself a link to -- it's from a 2/21/2005 Q&A in the "Online Only" section of the New Yorker, where Michael Specter explains about avian flu:

Second, it can kill and cause severe disease in humans—though, so far, for that to happen a person would have to have been exposed at great length, or have eaten raw, infected poultry. [emphasis added]

The point is that the "full" form of the first conjunct must be something like

...a person would have to have been exposed at great length to infected poultry.

with an extra "to" that's nowhere to be found in the original.

In both conjuncts, an object or indirect object has been placed after another constituent that it might well have preceded, in order to get "infected poultry" into final position to be shared:

...a person would have to have eaten [infected poultry] raw.
...a person would have to have eaten raw [infected poultry].

...a person would have to have been exposed [(to) infected poultry] at great length.
...a person would have to have been exposed at great length [(to) infected poultry].

It's clear from the rest of the transcript that Michael Specter likes this kind of structure, which has traditionally been called "right node raising", to express the sense that a shared final (i.e. "right") constituent has been "raised" so as to be shared by both members of a preceding conjunction:

[A B C] and [D E C] ⇒ [[[A B] and [D E]] C]
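As a rough illustration of that schema, here is a minimal Python sketch that treats each conjunct as a flat list of words and factors out the longest shared final run. That is a big simplification (real constituents are not just word strings, and the function name is invented for the example), but it reproduces the bracketing above.

```python
def right_node_raise(first: list[str], second: list[str]):
    """Factor the longest shared final run of words out of two conjuncts.

    Returns (first_remainder, second_remainder, shared).
    """
    shared_len = 0
    # Walk backwards from the right edge while the two conjuncts agree.
    while (shared_len < min(len(first), len(second))
           and first[-1 - shared_len] == second[-1 - shared_len]):
        shared_len += 1
    cut_first = len(first) - shared_len
    cut_second = len(second) - shared_len
    return first[:cut_first], second[:cut_second], first[cut_first:]


a, b, c = right_node_raise(["A", "B", "C"], ["D", "E", "C"])
print(f"[[{' '.join(a)}] and [{' '.join(b)}]] {' '.join(c)}")
```

Applied to "Kim selected the samples" and "Leslie packed the samples", the same function leaves "Kim selected" and "Leslie packed" as the conjuncts and "the samples" as the shared final piece.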

There are at least two other examples in the same interview. One comes earlier in the sentence already cited, and involves sharing a noun that is the object of a verb in the first conjunct and of a preposition in the second:

Second, it can kill and cause severe disease in humans—though, so far, for that to happen a person would have to have been exposed at great length, or have eaten raw, infected poultry.

In the third case, the shared constituent is a prepositional phrase, connected to noun phrases in both conjuncts:

But by closely monitoring the spread, and by examining the genetic structure of the virus, we can get a sense of how to develop a vaccine and how to make better drugs.

In the first case I cited, the noun phrase "infected poultry" is the object of a preposition in the first conjunct and of a verb in the second one. Michael Specter was apparently so confused by all the shifting around needed to get the constituents in the right order to allow "infected poultry" to be shared, that he didn't notice the little "to" that got lost in the shuffle. In other words, I don't think this is a dialect difference, or an informal construction, or a syntactic change in progress -- it's just a mistake, and I'd bet that Specter would think so too, if it were pointed out to him.

It's surprising that this got through the New Yorker's editorial process. Although the document is an interview transcript, it's surely been edited at least to the extent of removing filled pauses, false starts and so on, and you'd think that they'd fix this kind of verbal mistake as well.

Posted by Mark Liberman at 09:47 PM

The pointless game of grammar Gotcha

A letter recently published by the San Jose Mercury News read as follows (I'll quote in full):

Editor, edit thyself

The large and bold headline on the March 15 Editorial page, "New SAT writing section aims to better reflect needed skills," suggests that your editorial page editor may need to take an SAT prep course. The bane of all English teachers, the split infinitive (to better reflect) certainly caught my attention. I couldn't help thinking, "Could I really benefit by reading on?"

David Bour Sr.

Now this is what gives the whole subject of grammar a bad name: reducing it to a pointless, unthinking, anti-intellectual game of Gotcha. What's so pathetic in this particular case is not just that (does any Language Log reader have to be told this again?) the split infinitive construction is grammatical and has been attested in all forms of written English for at least seven hundred years, but that this particular example is one of those where "correcting" it would create ungrammaticality or ambiguity, not prevent it.

The point is that you can't move better to a better place. Shift it rightward and you get New SAT writing section aims to reflect better needed skills, where the sequence better needed suggests the wrong meaning (as if the skills were better needed than something else). Shift it leftward and you get New SAT writing section aims better to reflect needed skills, where the sequence aims better suggests a different wrong meaning (as if the new SAT aimed better than something else did). Putting better between the to and the verb it modifies is the right thing to do in this case. It makes a grammatical sentence that correctly expresses the intended meaning.

I suppose if all the usage books got this wrong one would have to admit that the people who follow them had some excuse. But the fact is that every decent guide to grammar and usage on the market agrees that the split infinitive is grammatical and often preferable to all other alternatives. Look it up! Don't take my word for it. Go to a library and take in your hand what appears to you to be a comprehensive, high-quality reference work on English usage. See what it says. There just aren't any that insist the split infinitive is always ungrammatical and should never appear in writing. Some of them even point out cases where (as Arnold Zwicky noted here on Language Log, and as actually recorded in a usage note in the American Heritage Dictionary by our own Geoff Nunberg) the split infinitive is grammatically obligatory.

The split infinitive is not the bane of English teachers. No sensible English teacher cares one whit about the split infinitive. Trust me: I teach courses on English grammar myself, and I've just published a textbook on the subject — I do have some credentials in this area. No, the bane of English teachers is pompous old fools like David Bour Sr. who attempt to carry on a tradition that values ignorant nitpicking more highly than sensible attention to style and richness of prose composition. People whose misguided pedantry undermines the very idea that the business of grammar might involve complex patterns of evidence, difficult investigations, subtle distinctions, intricate generalizations. People who contrive to ensure that the SAT test will for some decades into the future waste some of its effort on testing things that are irrelevant to scholarly aptitude. People who reduce a complex and rather interesting subject to a narrow, mechanical, empirically uninformed game of grammar Gotcha.

Posted by Geoffrey K. Pullum at 09:42 PM

Stupid redundant warning blather

Today I looked at the tiny print on an Aerocell super alkaline AA battery manufactured in the European Union and found that among other warnings it said (in English, French, and German): "Do not misuse."

You could put that on absolutely every product sold or manufactured anywhere, couldn't you?

[This message comes to you from Language Log. Do not misuse it.]

Posted by Geoffrey K. Pullum at 07:12 PM

April 10, 2005

The unwanted conversation of other people

The FAA is thinking about lifting the ban on cell phone use in flight, but passengers and air crews overwhelmingly dislike the idea, according to stories in the WaPo, Information Week, Newsday and many other sources. The stories agree in assuming that cell phone conversations are uniquely intrusive and annoying. Thus the WaPo:

"The airplane is one of the few places you can go to have some quiet time," said Susan Grant, vice president of public policy at the National Consumers League, which sponsored a poll released yesterday that said 63 percent of Americans don't want the federal government to lift its ban on cell phones in flight. "If we lose that, there will be no place to hide from the aggravation of having to listen to the unwanted conversation of other people."

As I've observed before, there's something funny about this. People do still have face-to-face conversations, and many people do this on airplanes, in my experience. So a phone-free airplane is not a place where "the unwanted conversation of other people" is absent. It could be that the survey respondents feel that cell phone availability in planes would cause the number of conversations to increase, and they might be right about that. But as Mark Twain was the first to point out, and as controlled experiments have since confirmed, listening to one side of a conversation is much more annoying than listening to a whole conversation at the same distance and volume level.

Posted by Mark Liberman at 08:36 AM

April 09, 2005

Enhance breast size by 80%

Now that spammers are being sentenced to jail terms in the U.S., it seems that some of them have decided to take up a new career as BBC science reporters. In a BBC News article recently discussed by Ray Girvan at the Apothecary's Drawer Weblog, some anonymous ex-spam-artist reveals that "Bust-Up gum, when chewed three or four times a day," can not only "enhance the size, shape and tone of the breasts", but also "improve circulation, reduce stress and fight ageing".

The BBC tells us that "The gum works by slowly releasing compounds contained in an extract from a plant called Pueraria mirifica", and that the gum's manufacturer

...cites tests carried out by Thailand's Chulalongkorn University which found Pueraria mirifica therapy was able to enhance breast size by 80%.

Further tests carried out in England found that the plant had a beneficial effect on the skin, and hair, as well as the breasts.

If you think about it, this approach makes a great deal more sense than spam. You can safely reach millions of readers by publishing ads for your breast and penis enlargement products on BBC News, whose production is actually subsidized by the British government. And rather than being angry, the audience will mostly be grateful to you for providing them with this marvelous opportunity for personal enhancement.

This doesn't have much to do with language, I admit, but it helps to explain why the BBC's science reporting in language-related areas is also so erratic: presumably the ex-spammers need to provide themselves with cover by occasionally writing something other than advertising copy for dubious products.

[Just to keep things clear, this post is a joke -- I don't really have any evidence that BBC health and science reporters are spammers on the lam. But how could you tell the difference?]


Posted by Mark Liberman at 02:06 PM

Brother Cattle Prod of Reasoned Discussion

There's a new threat on the horizon: the Unitarian Jihad.

Greetings to the Imprisoned Citizens of the United States. We are Unitarian Jihad. There is only God, unless there is more than one God. The vote of our God subcommittee is 10-8 in favor of one God, with two abstentions. Brother Flaming Sword of Moderation noted the possibility of there being no God at all, and his objection was noted with love by the secretary.

Close reading of the communiqué suggests that cult members take new "struggle names", consistent with a semantic grammar of the form "Brother|Sister <Weapon> of <Abstract-Noun-Connoting-Positively-Evaluated-And-Gentle-Property>". Other examples include "Sister Immaculate Dagger of Peace", "Brother Neutron Bomb of Serenity", and "Brother Gatling Gun of Patience". Demonstrating the role of the internet in fostering fundamentalist terrorism, Bill Humphries has set up a web page that assigns would-be UJ militants a suitable name. Accessing this page, I was assigned "Brother Cattle Prod of Reasoned Discussion", which I would object to, if I were a Unitarian Jihadist, which of course I am not. I appealed to the Name Assignment Committee anyhow, and got "Brother Pepper Spray of Desirable Mindfulness", which does not strike me as an improvement. I feel that shorter names involving less fussy weapons and greater alliteration might be more effective.

Anyhow, after a long list of blood-curdling threats

We will appear in public places and require people to shake hands with each other. (Sister Hand Grenade of Love suggested that we institute a terror regime of mandatory hugging, but her motion was not formally introduced because of lack of a quorum.) We will require all lobbyists, spokesmen and campaign managers to dress like trout in public. Televangelists will be forced to take jobs as Xerox repair specialists. Demagogues of all stripes will be required to read Proust out loud in prisons.

the UJ communiqué ends this way:

People of the United States! We are Unitarian Jihad! We can strike without warning. Pockets of reasonableness and harmony will appear as if from nowhere! Nice people will run the government again! There will be coffee and cookies in the Gandhi Room after the revolution.

I hope that Michael Chertoff can spare the time from announcing support for continuing rail hazmat placards to keep an eye on these dangerous extremists. One obvious step is to set technology against technology, countering Bill Humphries' web page assigning struggle names with a program to detect such names on weblogs and in web forums.
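
For what it's worth, here is a minimal sketch of such a detector in Python. It assumes the "semantic grammar" can be approximated by a regular expression built from two closed word lists. The lists (`WEAPONS`, `VIRTUES`) and the function names are hypothetical, and contain only the weapons and virtues quoted in this post. A serious detector would need open-ended lexicons, which this sketch does not attempt.

```python
import re

# Hypothetical word lists: only the weapon and virtue phrases quoted in the post.
WEAPONS = [
    "Flaming Sword", "Immaculate Dagger", "Neutron Bomb", "Gatling Gun",
    "Cattle Prod", "Pepper Spray", "Hand Grenade",
]
VIRTUES = [
    "Moderation", "Peace", "Serenity", "Patience", "Reasoned Discussion",
    "Desirable Mindfulness", "Love",
]


def build_pattern(weapons, virtues):
    """Compile 'Brother|Sister <Weapon> of <Virtue>' into one regular expression."""
    weapon_alt = "|".join(re.escape(w) for w in weapons)
    virtue_alt = "|".join(re.escape(v) for v in virtues)
    return re.compile(rf"\b(Brother|Sister) ({weapon_alt}) of ({virtue_alt})\b")


STRUGGLE_NAME = build_pattern(WEAPONS, VIRTUES)


def find_struggle_names(text):
    """Return every full struggle name found in text, in order of appearance."""
    return [m.group(0) for m in STRUGGLE_NAME.finditer(text)]


if __name__ == "__main__":
    sample = "Sister Hand Grenade of Love met Brother Gatling Gun of Patience."
    for name in find_struggle_names(sample):
        print(name)
```

Because the pattern is assembled from lists, extending coverage is a matter of adding entries rather than rewriting the expression.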

Posted by Mark Liberman at 01:33 PM

Language in the social and behavioral sciences

I'm going to start this post with two suggestive graphs, continue with some historical background, and end with a startling prediction.

The graphs come from a talk that Jamie Pennebaker gave at Penn a few weeks ago. They plot two simple time-functions derived from posts by Americans on LiveJournal between September 10 and November 5, 2001.

The first graph shows the frequency (in percent) of the words I, me, and my, plotted day by day from September 10 to September 24, and then week by week to November 5:

The second graph shows the frequency of the words we, us, our from the same sources over the same period:

Pennebaker and his co-workers calculated these counts because of their theory that "word choice can serve as a key to people's personality and social situations", and in particular that "pronouns, prepositions, conjunctions, articles, and auxiliary verbs" are especially "powerful indicators of people’s psychological state". Their work offers many other striking facts and interpretations, and raises all sorts of complex questions, all of which I'll ignore for now, because I want to talk about the history and future of a related family of ideas and techniques.
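
The counts behind graphs like these are easy to compute. Here is a minimal Python sketch, not Pennebaker's actual software: the tokenizer, the function names, and the two word sets are my own assumptions, and the word sets contain only the six pronouns named above.

```python
import re

# The two pronoun sets plotted in the graphs described above.
FIRST_SINGULAR = {"i", "me", "my"}
FIRST_PLURAL = {"we", "us", "our"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes.

    Assumption: contractions such as "I'm" stay whole, so they are NOT
    counted as "i". Lowercasing also means "US" (the country) counts as "us".
    """
    return re.findall(r"[a-z']+", text.lower())


def percent_of_words(text, word_set):
    """Percentage of the tokens in text that belong to word_set."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in word_set)
    return 100.0 * hits / len(tokens)


if __name__ == "__main__":
    entry = "I like my dog and we like our cat too"
    print(percent_of_words(entry, FIRST_SINGULAR))
    print(percent_of_words(entry, FIRST_PLURAL))
```

To get a time series, you would pool the journal entries for each day (or week) and apply `percent_of_words` to each pool.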

The early 1960s saw Gerard Salton's insight that the content of a document can be usefully approximated by nothing more than the frequency counts of the words it contains, and also the influential work by Frederick Mosteller and others on the use of simple linguistic statistics to make inferences about authorship. It was during this same period that Bill Labov showed how to use counts of simple things like word choice and pronunciation variation to investigate the social and temporal dimensions of language.
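
The "bag of words" idea can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a reconstruction of any particular historical system: documents are reduced to raw word counts, with none of the term weighting that practical retrieval systems add, and two documents are compared by the cosine of the angle between their count vectors. The function names are my own.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Reduce a document to word-frequency counts, discarding word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = bag_of_words("cardinal title usage")
    doc1 = bag_of_words("The title of cardinal has a strange pattern of usage")
    doc2 = bag_of_words("Monks flew in a satellite dish")
    print(cosine_similarity(query, doc1))
    print(cosine_similarity(query, doc2))
```

Documents that share many words with the query score closer to 1, and documents that share none score 0, which is already enough to rank search results crudely.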

About 20 years later, in the mid 1980s, Doug Biber and others explored the idea that notions like register and genre could emerge from an analysis of the distribution of simple linguistic measures across texts. Pennebaker's work adds "people's personality and social situations" to the list of things that can be studied this way.

Anyone who uses internet search is reaping the benefits of Salton's ideas, and of course there's a robust area of academic and industrial research on how to make textual information retrieval work better. As for models of authorship, Erica Klarreich wrote in a Science News article in 2003 that

Stylometry is now entering a golden era. In the past 15 years, researchers have developed an arsenal of mathematical tools, from statistical tests to artificial intelligence techniques, for use in determining authorship. They have started applying these tools to texts from a wide range of literary genres and time periods...

Quantitative sociolinguistics has become an established discipline, with its own journals and meetings. Computational linguists have been busily and successfully applying frequentistic methods to a wide range of problems, from parsing and semantic analysis to summarization, automatic translation and "text data mining". And psycholinguists, who have always had to control for frequentistic effects in their experimental design, are increasingly interested in studying such effects directly.

Viewed in this context, what's especially interesting to me about Pennebaker's work is how isolated it is. If we look across the social and behavioral sciences -- outside of sociolinguistics and psycholinguistics -- there are remarkably few cases where linguistic analysis plays any explicit role in research. (See Damon Mayaffre's "digital hermeneutics" for another example.)

I'm exempting sociolinguistics and psycholinguistics because the whole enterprise in these subdisciplines is focused on aspects of language or language use. And I'm using the term "explicit role in research" because many social and behavioral science researchers use linguistic analysis implicitly, for instance in interpreting survey or interview results, or in examining political rhetoric. What's missing is work that uses linguistic analysis -- even something as trivial as word counts -- as an explicit component of a research program that's not mainly about language.

I predict that this will change. Pennebaker's use of LiveJournal data to investigate the social and psychological effects of 9/11 suggests some of the reasons:

  • Enormous amounts of text are now being produced in digital form, explicitly situated in space, time and various sorts of social networks.
  • Much of this text is freely available to anyone who cares to download it from the web.
  • Even the most elementary forms of analysis (such as local word counts) can serve as effective indicator variables for content, individual and social identity, style, emotional state and so on.
  • Simple and accessible computer methods make it easy to generate and analyze such data on a large scale.

There are other reasons as well:

  • There are new techniques for automatic analysis of the form and content of text (parsing, tagging of "entity mentions", determination of reference and co-reference, etc.).
  • There are new statistical techniques for finding relevant patterns in very high-dimensional data.
  • In some cases, linguistic analysis could be used simply to enhance research productivity in existing paradigms (e.g. because many of the kinds of "coding" of survey and interview transcripts that already go on every day could be automated).

There's also an enormous educational opportunity here. At the high school level, you could use quantitative linguistic analysis to teach statistics, simple computer programming, and scientific methodology -- and even perhaps some linguistics! Simple techniques of this kind can be applied to many sorts of problems that most students will be interested in: information retrieval, analysis of individual and group identity, style, personality and mood, and so on. So I also predict that it will become routine to use this stuff to teach math and science in high school.

Some readers may be tempted to complain that these predictions are not at all "startling," despite what I wrote in the first sentence of this post. If you're one of them, I'm happy that you share my belief that the predicted changes are so easy and so beneficial that implementing them would be a no-brainer. But I'm afraid that I still find the predictions "startling", in the sense that I'll be pleasantly surprised if they come true in the near future.

[You can learn more about the 9/11 LiveJournal investigation in Michael A. Cohn, Matthias R. Mehl and James W. Pennebaker, Linguistic Markers of Psychological Change Surrounding September 11, 2001, Psychological Science, Volume 15, Issue 10, pages 687-693, October 2004.

The abstract:

The diaries of 1,084 U.S. users of an on-line journaling service were downloaded for a period of 4 months spanning the 2 months prior to and after the September 11 attacks. Linguistic analyses of the journal entries revealed pronounced psychological changes in response to the attacks. In the short term, participants expressed more negative emotions, were more cognitively and socially engaged, and wrote with greater psychological distance. After 2 weeks, their moods and social referencing returned to baseline, and their use of cognitive-analytic words dropped below baseline. Over the next 6 weeks, social referencing decreased, and psychological distancing remained elevated relative to baseline. Although the effects were generally stronger for individuals highly preoccupied with September 11, even participants who hardly wrote about the events showed comparable language changes. This study bypasses many of the methodological obstacles of trauma research and provides a fine-grained analysis of the time line of human coping with upheaval. ]


Posted by Mark Liberman at 11:01 AM

April 08, 2005

Can software tell a lie?

Jim McCloskey points out to me (citing the Linux Weekly News for March 30, 2005) that if you open the Preferences in a recent edition of Adobe Reader (versions 6 or 7), go to the Javascript submenu, and disable Javascript, a message will pop up telling you that the document you are looking at contains Javascript and may not display accurately if you disable it. Would you like to keep Javascript enabled? The highlighted button says Yes, you would. But when I tried this (with Acrobat version 6.0.3, Professional Edition, under Mac OS-X) it was on a document of my own, produced with LaTeX. There was absolutely no chance of there being Javascript in the document. The program was telling a flat lie (assuming this is metaphysically possible for a piece of software). Unless it's just an accidental bug. But there is independent reason to be suspicious about Javascript in PDFs. It's not pretty. Let me explain.

A company called Remote Approach is apparently in business with a service that exploits Javascript attached to PDF documents in order to provide publishers with information about the "reach and use" of the materials they make available. (Story here.) Apparently, when you use Adobe software to view a PDF file that has been uploaded and tagged by Remote Approach (and you won't know whether a given document has been tagged, since the publisher does it), the publisher will learn that you viewed it and will know your IP address and which viewer you use. This trick is accomplished by adding to the document some Javascript code that secretly sends information out via port 80.

I'm not really a privacy freak, but this seems a little creepy even to me. Be warned. But also be encouraged: you can disable Javascript despite the untrue claims in the warning message when you do so.

I must say, I didn't see this possibility coming at all. Sometimes I frighten myself with my inability to see into the future of technology.

Posted by Geoffrey K. Pullum at 07:44 PM

What's in a name (reduced media edition)?

The rapacious Microsoft corporation has of course mostly gotten its way with its passive-aggressive behavior over the name of the special European edition of Windows XP that (by European Commission regulators' insistence) does not have Windows Media Player bundled into it. The regulators have now acceded to the names "Windows XP Home Edition N" and "Windows XP Professional Edition N". The N is of course an abbreviation for "No fucking good, No one will want it, and Nobody will ever make Microsoft respect legal or ethical standards of business behavior." Neither side likes the names but they both say they are tolerating them so things can "move on". However, I am here not merely to cavil and carp: I have a proposal concerning what the name should have been. Read on.

For those recently returned from sailing around the world nonstop with no radio, let me remind you that Microsoft has been actively pursuing its policy of illicitly destroying other companies' ability to do business, this time in the media player software market. The idea is to kill companies like Real Audio. The strategy is familiar: if anything new or good comes out, plagiarize it or buy a mediocre competing product (remember, it doesn't have to be good, you're a monopolist); embed it as an integrated component of the Windows operating system in a way that makes it easy to access and very hard to remove; introduce a few covert and plausibly deniable difficulties for other vendors' media players and keep your code secret; wait for the other vendors to die; then increase the price of Windows to cover the costs.

In the USA this anti-competitive behavior was long ago found illegal in the courts (mainly with respect to browsers), but nothing serious was done about it. Europe got a bit more serious, and instructed Microsoft to market a version of Windows without Media Player in it if it wanted to go on doing business in the European Community. And the product Microsoft came back with to comply with this had the proposed name "Windows XP Reduced Media Edition".

What's in a name? Well, quite a lot. Imagine if you were permitted to name your competitors' products. Here Microsoft had been ordered by the court to come out with a product that would compete with its own, so naturally it proposed to give it a name suggesting it was shrunken, inadequate, and not as good. They don't actually want to sell any copies of it, after all.

Again and again I have found myself thinking "These Microsoft people are bandits"; and then I think again and realize I'm being unfair — to bandits. (Bandits who are caught and found to have violated anti-banditry law don't typically manage to plea-bargain robbery and murder down to a parking violation, appeal the parking ticket, and go right back into banditry while the appeal is being heard.)

Anyway, the European Commission has agreed to accept the least offensive and ridiculous of Microsoft's suggestions for the name of the new product.

So, what would my naming suggestion have been, had the regulators thought to ask me? Very simple. I think the regulators should have insisted that the new product without Windows Media Player be called Windows XP. I also have a suggestion about the other product, with Media Player embedded in it. It should be called "Windows XP With Media Player", and I think it should cost more, by roughly the cost of a media player program. That would be fair and reasonable commercial behavior by a virtual operating system monopolist. So don't expect it ever to happen. (For one thing, I guess for basic media players the cost is typically zero, which rather dents my argument.) Instead, expect Microsoft to start out after another portion of the software industry and try to destroy it.

Perhaps, I have been thinking, they will try to destroy the flourishing industry of marketing ready-to-hand-in term-papers to student plagiarists. The new bundled-in program and service could be called Microsoft Cheat™. It could be hooked right into Word as a plugin: the student just types a topic and some keywords, clicks on a "Termpaper" button on the button bar, and the operating system automatically starts the browser, goes to the Microsoft termpaper repository, runs a search, downloads a suitable paper, changes the by-line to the student's name, charges the student's credit card $39.99, prints the paper, and starts up the student's favorite video game... Shit! This could work! Sometimes I frighten myself with my ability to see into the future of technology.

Posted by Geoffrey K. Pullum at 07:23 PM

Normalization denormalization

If you go to Google Groups and search for {Z-normalization score}, Helpful Google asks you

Did you mean: Z-normalization denormalization score

If you then click on the helpfully provided link, Helpful Google next asks you

Did you mean: Z-normalization normalization denormalization score

If you click again, Helpful Google wonders

Did you mean: Z-normalization denormalization normalization denormalization score

and of course you click again, and so you are asked

Did you mean: Z-normalization normalization denormalization normalization denormalization score

At this point, we can invoke Stein's Law: "Things that can't go on forever, don't". On the other hand, there are also Davies' Corollaries:

1. Things that can’t go on forever, go on much longer than you think they will.
2. Corollorary 1 applies even after taking into account Corollorary 1.

The spelling "Corollorary" was that way in the original 11/4/2003 post at Crooked Timber, and prompted the comment: "Reading this almost gave me a coronorary", as well as a post hoc explanation on our site.

[The recursive correction at Google Groups was pointed out to me by Partha Pratim Talukdar.]

[Update: a similar problem seems to come up whenever you ask Google Groups about something with a hyphen in it, like {spark-plug cleaner}, or {watch-band replacement}.]

Posted by Mark Liberman at 04:56 PM

Monks in space

In this morning's news, under the headline "Island Monks Fly in Satellite to Watch Pope Funeral", Reuters reports:

Monks living on an island off the west coast of Wales have flown in a satellite dish to watch the Pope's funeral on Friday.

Caldey Abbey, on Caldey Island, is home to the monks of the contemplative Cistercian order which follows a strict routine of prayer and work throughout the day and observes vows of silence every evening.

They already own a television set, although it has not been used for some time, but the satellite is needed to pick up the signal from Rome.

The monks have promised to return the dish once the funeral is over.

[Link sent in by Liz Upton]

Posted by Mark Liberman at 10:04 AM

April 07, 2005

Vatican commentary by limerick

Today I read Mark's post about press speculation on the short and portly Cardinal Tettamanzi's strong support from Italian cardinals as a candidate to restore Italy's leadership of the Vatican in the upcoming papal conclave; and I heard from Sylvia Poggioli on NPR this morning that certain statues in Rome have traditionally been used to get around church censorship by acting as display sites for subversive limericks about pontifical authority (do they really write limericks in Italian?). Naturally, the two topics fell together in my mind immediately, in limerick form, as you might expect:

According to some in the Vatican,
Tettamanzi is confident that he can
Get the Roman coalition
Behind his ambition —
At least, if he isn't too fat he can.

All right, yes, it's rubbish, I grant you that; but better than what my mortgage company has sent me unbidden in the mail, for heaven's sake. The rhyme was the tricky bit. It's perfectly clean. I think it should count as legitimate comment on a topic of public concern. I am not free to get to Rome for tomorrow's funeral, as my class meets on Fridays, but if some Language Log reader in the city could kindly attach my poem to a suitable statue over the weekend, I would be most grateful.

Posted by Geoffrey K. Pullum at 02:24 PM

Defense Language Transformation Roadmap

Fred Kaplan in Slate tears into what he calls "one of the funniest and saddest government documents I've run across in years", the Defense Language Transformation Roadmap. Money quote:

In the three and a half years after the Japanese bombed Pearl Harbor in 1941, the United States built a massive arsenal, equipped an equally massive fighting force, and declared victory in a worldwide war over imperial Japan and Nazi Germany.

In the three and a half years after the Soviets launched the Sputnik satellite in 1957, the U.S. government funded dozens—if not hundreds—of Russian-language and Russian-studies departments not just within the military but in high schools and colleges all across America.

Now, three and a half years after Islamic fundamentalists flew airplanes into the World Trade Center and the Pentagon, the Department of Defense is three months away from publishing an official "instruction" providing "guidance for language program management."

Well, the DoD has been doing a few other things. And other relevant sectors of American society have not all been joining in enthusiastically. But still...

Kaplan's complaint reminds me of something that happened towards the end of my (generally positive) 15 years in the research lab of a large company:

Style 1 : 50-odd R&D department heads sat in meetings for several months to plan a weekend "technology portfolio exposition" for managers of line organizations.

Style 2 : the CEO of a small ($100M/year sales) technology company had an idea, called a friend who told him to talk to me, reached me by phone on the Thursday before the big weekend exposition, and then flew in from California overnight for a brief demo Friday morning at the computer in my office, and a conversation over lunch about implementation issues.

Style 1 results: After the weekend technology show, we set up a subcommittee to compile and evaluate the responses, and scheduled a further series of meetings to decide on the methodology for prioritizing product opportunities for further exploration.

Style 2 results: Monday morning, FedEx delivered a package from Friday's visitor, containing a draft licensing agreement and half a dozen bound copies of a business plan.

Of course the license agreement and business plan then languished for quite a long time in the committee charged with considering such proposals. In fact, I think they were still there when I left the company.

The U.S. Department of Defense now seems to exhibit instances of both styles, in general and in the specific case of its language needs. My impression is that the Defense Language Institute in Monterey, for example, has responded rapidly and effectively to changes that began fifteen years ago with the end of the Cold War, and have accelerated over the past few years since 9/11. However, the document that Kaplan links to certainly gives an impression of something less than urgency felt in some other quarters.


Posted by Mark Liberman at 02:18 PM

Dangling Milan

Keith Ivey alertly pointed out a very odd word order in the AP story that I quoted a couple of days ago:

Tresoldi, from northern Italy, appeared concerned that a remark Sunday by Milan Dionigi Cardinal Tettamanzi would put the cardinal in the proverb's risk category. Tettamanzi, 61, spoke of a "very affectionate caress" that John Paul gave him three years ago when tapped to lead the high-profile diocese.

The story is about someone named Dionigi Tettamanzi, who is the cardinal archbishop of Milan. So you might think he would be "Milan Cardinal Dionigi Tettamanzi" in journalese, like "Harvard logician Henry Sheffer", or "Philadelphia architect Louis Kahn".

And indeed some versions of the AP story have exactly that order:

Tresoldi, from northern Italy, appeared concerned that a remark Sunday by Milan Cardinal Dionigi Tettamanzi would put him in the proverb’s risk category. Tettamanzi, 61, spoke of a “very affectionate caress” that John Paul gave him three years ago when tapped to lead the high-profile diocese.

The thing is, the title of cardinal has a strange pattern of usage in English: it's always Archbishop Sean O'Malley, but (if the current archbishop of Boston had been elevated to cardinal) he'd traditionally have been called Sean Cardinal O'Malley. When I was a kid, I thought that Cardinal was Francis Spellman's middle name, because what I heard on the radio was always "Francis Cardinal Spellman".

This usage seems to be going out of favor, though I wasn't able to find any online usage manuals that specified the change. Google counts 3,940 for "Francis Cardinal Spellman", and 1,430 for "Cardinal Francis Spellman", a ratio of almost 3 to 1 for the medial placement of the title; but there are 5,030 for "Bernard Cardinal Law", and 32,300 for "Cardinal Bernard Law", a ratio of more than 6 to 1 in the other direction. The practice in news sources has swung even further: Google News gives only 9 for "Bernard Cardinal Law" against 353 for "Cardinal Bernard Law". Yahoo News gives only 3 for "Bernard Cardinal Law" and 224 for "Cardinal Bernard Law".

Likewise, Google News has 221 for "Dionigi Cardinal Tettamanzi" and 6,760 for "Cardinal Dionigi Tettamanzi". Among English-language news organizations, the more general order of Title Firstname Lastname now outnumbers the special treatment of Cardinal by 30 to 1 or more, and the AP's house style seems empirically to favor the regularization. This appears to follow the current usage of the Catholic Church itself, which puts the title first in pages on the Vatican's web site.
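
For readers who want to check the arithmetic, here is a small Python sketch that recomputes the ratios from the hit counts quoted above. The dictionary labels and the function name are my own; the numbers are exactly the ones reported in this post.

```python
# Hit counts as reported in the post: (medial title, title first).
COUNTS = {
    "Francis Spellman (Google)": (3940, 1430),
    "Bernard Law (Google)": (5030, 32300),
    "Bernard Law (Google News)": (9, 353),
    "Bernard Law (Yahoo News)": (3, 224),
    "Dionigi Tettamanzi (Google News)": (221, 6760),
}


def title_first_ratio(medial, title_first):
    """Number of title-first hits per medial-title hit."""
    return title_first / medial


if __name__ == "__main__":
    for label, (medial, first) in COUNTS.items():
        ratio = title_first_ratio(medial, first)
        print(f"{label}: {ratio:.2f} title-first hits per medial hit")
```

A ratio below 1 (as for Spellman) means the medial placement still dominates; a ratio above 1 means the title-first order has taken over.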

So Keith speculates that the AP story originally had "Milan Cardinal Dionigi Tettamanzi", and some old-fashioned editor at the Winnipeg Sun (on whose site I linked the AP story) corrected the text by moving Cardinal to its traditional place between the first and last names, without noticing that (s)he thereby created an ungrammatical phrase due to the modifier Milan left dangling there without anything to modify. I agree that this is the most likely explanation.

Ironically, there was already a syntactic oddity in the quoted paragraph. In the second sentence

Tettamanzi, 61, spoke of a “very affectionate caress” that John Paul gave him three years ago when tapped to lead the high-profile diocese.

there is a when-phrase without a subject that is obviously intended to apply to Tettamanzi, who was moved to Milan three years ago. Though this is too subtle to count as WTF grammar, I have a hard time not associating the participle tapped with John Paul instead. I don't have this problem to the same extent if an explicit pronoun is inserted:

Tettamanzi, 61, spoke of a “very affectionate caress” that John Paul gave him three years ago when he was tapped to lead the high-profile diocese.

[Update: Chris Waigl emailed

Same in German.

Google, German pages:

9 850 for "Kardinal Joseph Ratzinger" - 5 840 for "Joseph Kardinal Ratzinger"

So it looks as if the medial placement is still a bit more prevalent in German than it is in English.

On the Vatican site, it's all over the place, with a bit of an advantage for putting the title first:

.va domain, German documents:

22 "Kardinal Joseph Ratzinger" - 15 "Joseph Kardinal Ratzinger"
7 "Kardinal Joachim Meisner" - 3 "Joachim Kardinal Meisner"
8 "Kardinal Christoph Schönborn" - 4 "Christoph Kardinal Schönborn"
1 "Kardinal John Henry Newman" - 1 "John Henry Kardinal Newman"

The advantage in English seems to be somewhat more in the Cardinal-first direction -- in English-language documents in the .va domain, Google finds:

Name                  Cardinal First (Middle) Last    First (Middle) Cardinal Last
Bernard Law
Joseph Ratzinger
Joachim Meisner
Christoph Schönborn
John Henry Newman
Dionigi Tettamanzi
Francis Spellman

Some indication of unstable Vatican usage on this point can be found in the letters from popes to cardinals that are archived on the .va site. For instance a letter dated March 3, 2005, to Cardinal Francis Arinze, puts the "Cardinal" title first in all versions [English, French, German, Italian, Portuguese, Spanish] -- Latin isn't available. A letter dated December 7, 1978, to Cardinal Egidio Vagnozzi also puts the title first in the Italian version, but has a medial title in the Latin version. A letter dated May 10, 1982, to Cardinal Joseph Höffner, has a medial title in the German version

Meinem ehrwürdigen Bruder Joseph Kardinal Höffner
Erzbischof von Köln
und Vorsitzender der Deutschen Bischofskonferenz

and also in the Italian version:

Al mio venerabile Fratello Giuseppe cardinale Höffner,
Arcivescovo di Colonia e
Presidente della Conferenza Episcopale Tedesca

Usage in Latin seems more stable -- on a quick scan, I didn't see any Latin letters to cardinals with non-medial titles, right up to a letter from February 2005 (no translations available) that begins

Venerabili Fratri Nostro
Pontificii Consilii pro Valetudinis Administris Praesidi


[Update: Caelestis at sauvage noble cites further evidence of variation "in the valedictions of the Vatican Secretary of State's letters published on line". ]


Posted by Mark Liberman at 07:34 AM

April 06, 2005

"The Japanese are Japanese because they speak Japanese"

An article by Roger Pulvers in The Japan Times, dated 4/3/2005, discusses the widespread belief among Japanese people that their language is uniquely difficult. Pulvers describes a conversation with a cab driver -- an argument about the comparative difficulty of Japanese and Polish morphology -- and looks for a more general moral:

Is his quaint obstinacy an indication of a wished-for ethnic "exclusivity"?

I believe that this irrational belief in the difficulty of their language bestows upon Japanese people, willy nilly, a false mystique, as if through their language they were able to harbor secrets to which the outside world could never be privy. This false mystique allows them to entertain a feeling of national sharing without having to prove it explicitly. "We all think and feel the same way," it tells them, "and we can express this in a way that is only open to Japanese. The fact that non-Japanese cannot decipher this is proof of our ethnic cohesion." If they admit that the Japanese language is no harder than any other, and maybe even easier in some ways, their self-styled aura of exclusivity loses much of its shine.

Pulvers' article was cited by Bridget Samuels at ilani ilani, who saw it on the Language Feed; Language Hat picked it up from Bridget, and gave Pulvers an appropriately hard time about grading the difficulty of languages according to morphology:

...the idea that a simple morphology means a simple language is ridiculous. Complexity is to be found in many areas of a language, and if morphology is simple I guarantee you syntax and other aspects pick up the slack.

The rhetorical structure of Pulvers' article is familiar from deadline-haunted columns over the decades. An allegedly general characteristic of some group (New Yorkers, the French, the Japanese) is established by citing its display in the person of a cab driver, and then used by the writer as the basis for an even broader set of generalizations.

However, the idea that the Japanese language is somehow unique, and that this is causally connected with unique properties of Japanese people, does really seem to be widely held in Japan, and not just among cab drivers. One of the odder books on my shelves is a little volume entitled The Japanese Brain: Uniqueness and Universality, by Tadanobu Tsunoda, translated by Yoshinori Oiwa (Taishukan Publishing Company, 1985). From 1958 to 1970, Dr. Tsunoda was the chief of the department of Otology and Audiology at the National Center for Speech and Hearing Disorders, and at the time of the book's publication, he was a professor at the Department of Auditory Disorders at the Medical Research Institute of the Tokyo Medical and Dental University.

Here's a relevant passage from Tsunoda's Foreword:

...there is part of human speech which cannot be simulated by the computer and which might be called pre-verbal or semi-verbal sounds. I have investigated at depth the responses of the human brain to this little known type of sound, using normal subjects and a variety of sounds existing in nature and in our everyday environment. As a result, I have found that the normal human brain has an elaborate subconscious mechanism which discriminates sounds on the basis of their physical characteristics on the sub-cognitive level. [...]

My findings seem to provide an explanation of the unique and universal aspects of Japanese culture. Why do Japanese people behave in their characteristic manner? How has the Japanese culture developed its characteristic features? I believe the key to these questions lies in the Japanese language. That is, "the Japanese are Japanese because they speak Japanese." My investigations have suggested that the Japanese language shapes the Japanese brain function pattern, which in turn serves as a basis for the formation of Japanese culture.

Tsunoda claimed to find differences between Japanese and Westerners in cerebral lateralization for the processing of certain sorts of sounds. Although he attributes the difference to the effect of the Japanese language, he confuses things by also asserting an oddly-distributed racial or ethnic effect:

It has been found that there are essentially two brain function patterns -- one shown by Japanese and Polynesian people and the other by the rest of people.

As far as I know, other researchers have not been able to replicate his findings.

There is a long and distinguished tradition of speech science and neuroscience in Japan, and this sort of thing is by no means typical of it. But the Japanese version of Tsunoda's book, which was published in 1978, was enormously popular for a decade or so. Nihonjinron -- the discussion of Japanese identity, which often invokes popular feelings of Japanese essentialism -- seems to be part of the background here, for intellectuals as well as cab drivers.

[Update: Ray Girvan emailed:

A relevant cross-link: it was reading about Tsunoda that led me to that list of Japanese ideophonic terms you mentioned a while back ( http://itre.cis.upenn.edu/~myl/languagelog/archives/001238.html).

Ideophones, I recall, fitted into his conclusion that the Japanese process natural sounds in the language sphere (as if we heard ducks literally saying "quack quack" rather than making a noise conventionally written as "quack quack").

See "The Japanese Language Brain"

Ray has some excellent further discussion on his Apothecary's Drawer Weblog.]

Posted by Mark Liberman at 06:07 AM

I didn't know...

that the audio from some broadcasts and other sources can now be searched in transcripts produced automatically by speech recognition software, provided by an HP system called SpeechBot.

Here's the SpeechBot search page for WBUR's Here and Now, which still features the Compaq logo.

I tried for umami but got nothing -- it's probably an out of vocabulary ("OOV") word, though I guess it's also possible that today's broadcasts haven't been indexed yet.

A query for {"silver leaf gospel"} got me the April 1 story "A Joyful Noise" that I was searching for, though. The surrounding bit of transcript looked like this in the ASR output:

...this is a road it's english move the use of the only city to the phrase day that that I now for more information on the silver leaf gospel singers of roxbury massachusetts go to our with you that that that in the end of the man in the state hanging on and they go on and I can be a reduction in the view are obliged to me to ..

The words in the associated clip actually seem to be something like:

Deacon Randy Green: ...like I said, that the Lord has uh much more for me to do as I always say, he ain't through with me yet.

Singers: ... ((we ain't got no)) Three gates ((will)) open over here, I got my religion and I won't be late.

Robin Young: For more information on the Silver Leaf Gospel Singers of Roxbury, Massachusetts, go to our web site here dash now dot org.

Singers: ... gates to the city, hallelu, hallelu.

though it's hard to tell, in places, because the singers are always in the background.

This example shows off two of the worst aspects of current speech recognition technology: the lack of robust "diarization" (i.e. keeping track of who is talking when, rather than running everything together as if it were from a single source), and the poor handling of overlapping speech, speech over music, etc. Still, at least it accomplished the indexing that I asked for!

Looking for "pope john paul" found six extracts from the April 1 show (which therefore must indeed be the most recent day indexed), with the most relevant passage (or at least the one presented first) being given as:

...in the holy cross and mr. massey and senora furnaces says his sentencing and that is instances the pope's condition has worsened you're listening to here and now you're a growing young is here and now and if you just joined us the vatican has just released a statement saying that to a pope john paul the second's conditions has seriously worsened we are following the situation in rome where pope john paul the 2nd would seem to be close to death and we're also speaking here in united states today that o'brien professor of history at the college of holy cross in worcester massachusetts an expert on the american catholic church and david welcome back to his and joining us and now the studio to hear now ...

My transcript of the associated clip is

[Female speaker]: ... of the Holy Cross in Worcester, Mass; we're going to continue our conversation in a few seconds; again, the Vatican statement says the Pope's condition has worsened. You're listening to Here and Now; we'll be right back.

Robin Young: I'm Robin Young, it's Here and Now, and if you've just joined us, the Vatican has just released a statement saying that uh Pope John Paul the Second's conditioned [sic] {breath} has seriously worsened. Uh we are following the situation in Rome uh where Pope John Paul the Second seems to be close to death, and we're also speaking here in the United States to David O'Brian, professor of history at the College of Holy Cross in Worcester, Massachusetts, an expert on the American Catholic Church. Uh David, welcome back --

David O'Brian: Hi. ((Glad to be here))

Robin Young: And joining us- and joining us uh now in the studio here and now is uh ...

Again, pretty good indexing; semi-crappy transcript; lack of diarization and other punctuation-type formatting makes the ASR transcript pretty hard to read, even where it's mostly correct.

Though it's hard to tell from two short passages, the speech-recognition engine used in this system seems to be a generation or two behind the state of the art. These days, the best systems should be able to achieve an overall word error rate of about 10% on broadcast material. These two passages are not ideal because of background music (fairly loud in the first one, softer in the second), and so the expected performance would be somewhat worse.

However, the overall application design is impressive, and the indexing performance is decent. A sign of things to come, I think.
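The "word error rate" mentioned above is conventionally the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. Here is a minimal sketch of that calculation in Python, assuming simple whitespace tokenization and lower-casing (real scoring tools also normalize punctuation, numbers and so on). The function name and the two short strings, trimmed from the "silver leaf gospel" passage above, are just for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# Illustrative fragment only: punctuation and case removed by hand.
reference = ("for more information on the silver leaf gospel singers "
             "of roxbury massachusetts go to our web site")
hypothesis = ("for more information on the silver leaf gospel singers "
              "of roxbury massachusetts go to our with you")

print(f"WER = {word_error_rate(reference, hypothesis):.3f}")
```

On this fragment the two strings differ by two substituted words out of seventeen, which says nothing about the error rate over the whole clip; the rest of the ASR output above is clearly much worse.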

Posted by Mark Liberman at 04:57 AM

April 05, 2005


In response to a couple of posts by Heidi Harley (and a note by me, and a post by Bob Kennedy over at Phonoloblog), Q. Pheevr presents some actual data on the phonetics of "the sound Marge Simpson makes to express some combination of disapproval, annoyance, and frustration". Q found some audio clips on the web, and made (wideband) spectrograms, which Q admits "don't really tell me a whole lot".

But a "wide-band spectrogram" -- one made with narrow time resolution and thus broad frequency resolution -- is usually not a good way to see what's happening with voice quality. Here's a better display of Q's first example:

The top panel is a pitch track (not at all believable in this case); the middle panel is a "narrow-band spectrogram" (with broad time resolution and thus narrow frequency resolution -- the analysis bandwidth here is about 20 Hz., as opposed to the 200 Hz or so of Q's spectrograms); the bottom panel is the audio waveform.
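The trade-off between the two kinds of spectrogram comes down to the length of the analysis window: roughly speaking, the analysis bandwidth is the reciprocal of the window duration. Here is a back-of-the-envelope sketch in Python, assuming that simple reciprocal rule (the exact constant depends on the window shape) and an assumed sampling rate of 16,000 Hz, which is not taken from Q's clips.

```python
def window_duration_ms(bandwidth_hz: float) -> float:
    """Approximate analysis window length, in milliseconds, for a given bandwidth.

    Assumes duration ~ 1 / bandwidth; the true factor depends on the window shape.
    """
    return 1000.0 / bandwidth_hz


def window_length_samples(bandwidth_hz: float, sample_rate_hz: float) -> int:
    """The same window length expressed in samples at the given sampling rate."""
    return round(sample_rate_hz / bandwidth_hz)


ASSUMED_SAMPLE_RATE_HZ = 16000  # hypothetical, for illustration only

for label, bandwidth in [("narrow-band", 20.0), ("wide-band", 200.0)]:
    print(label,
          f"{window_duration_ms(bandwidth):.0f} ms,",
          window_length_samples(bandwidth, ASSUMED_SAMPLE_RATE_HZ), "samples")
```

Under that rule a 20 Hz analysis uses a window of about 50 msec, long enough to span several glottal cycles and so to resolve individual harmonics, while a 200 Hz analysis uses a window of about 5 msec.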

As you can see clearly in the waveform, there are three basic parts of the sound. The first part is short (about 100 msec), high-falling in pitch (about 350 Hz. to 275 Hz.), and relatively "pure" in voice quality. After a brief transitional segment of period doubling, the second part is longer (about 300 msec), lower in amplitude and slightly rising in pitch (about 70 Hz to 90 Hz.), with a fair amount of "shimmer" (period-to-period amplitude variation). The fundamental is completely missing -- I suspect that this is due to the recording or some other aspect of the audio processing, though there might really be a nasal or voice-quality-related zero canceling the fundamental. The third part is the longest (about 430 msec) and the loudest. The glottal oscillation has become extremely variable, both in amplitude and in period, verging on what would be called "vocal fry" if there were fewer short-period components. I'd guess that there are several different modes of glottal oscillation going on at once, and the whole system is on the edge of chaos (probably in the technical sense of the word). The transition from the second to the third segment of the groan certainly involves some increased subglottal pressure, but there is probably a laryngeal-pharyngeal gesture as well, such as constriction of the false vocal folds and/or vertical tension on the larynx implemented by the strap muscles.
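For anyone who wants to put numbers on "shimmer" (period-to-period amplitude variation) and its counterpart for period durations, usually called "jitter", one common measure is the mean absolute difference between consecutive cycles divided by the mean value. Here is a minimal sketch in Python. It assumes that the per-cycle peak amplitudes and period durations have already been extracted from the waveform, which is the hard part; the function name is hypothetical and the numbers below are invented for illustration, not measured from the Marge clip.

```python
def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, relative to the mean value.

    Applied to per-cycle peak amplitudes this is a "local shimmer" measure;
    applied to per-cycle period durations it is a "local jitter" measure.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Invented per-cycle measurements, for illustration only.
peak_amplitudes = [0.52, 0.47, 0.55, 0.44, 0.58, 0.41]
periods_ms = [12.1, 11.4, 13.0, 10.9, 12.6, 11.2]

print(f"shimmer: {100 * local_perturbation(peak_amplitudes):.1f}%")
print(f"jitter:  {100 * local_perturbation(periods_ms):.1f}%")
```

A perfectly regular voice would score zero on both measures; the third part of the groan described above would presumably score high on both.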

Q nevertheless gives what I think is a pretty good description:

Anyway, I'd describe the sound as a possibly creaky-voiced bilabial nasal with a very narrow somethingo-pharyngeal secondary articulation and falling tone. I might be able to figure it out better if I could make the sound reliably myself. Heidi can; she writes:

Heck, I can make that annoyed noise too, distinguishing it from the yummy mmm noise by doing some trick with my pharynx/tongue root/larynx, together with the other normal features associated with bilabial nasals, and I don't have anything at all like Kavner's distinctive vocal apparatus.

But when I try to do it, sometimes it comes out sounding like Marge, but sometimes it sounds more like a wounded muskrat or a sexually frustrated wookiee. I also don't have a readily available corpus of non-annoyed Marge sounds with which to compare the samples above. (Marge Simpson has a rough life, you know; there's a lot for her to be annoyed about.) But perhaps these notes will inspire someone else to improve upon my description.

I'll wait to say more until I've seen more data. My Simpsons corpus has just arrived from amazon.com -- now all I need is some free time to do the research!

(Of course, acoustic analysis can only tell us so much. We really need to study Julie Kavner's speech production -- but then Heidi says she is fluent in Marge-ese, so she would be just as good a subject. What Marge is doing in the third segment of the utterance analyzed above might be something that is called "Dysphonia plicae ventricularis" when someone can't help doing it all the time:

Typically, patients with dysphonia plicae ventricularis (also called false vocal fold phonation or ventricular dysphonia) demonstrate a low-pitched, coarse or rough, monotone voice. The voice may have a breathy quality. Usually, hyperadduction of both the true and false vocal folds is present. Because the ventricular folds have difficulty in making a good firm approximation along their entire length, severe hoarseness and breathiness often result. Vocal fold scarring may be mistaken for this disorder and must be ruled out.

This disorder is frequently responsive to voice therapy that focuses on gestures such as gargling and sighing, which relax supraglottic muscles and isolate true vocal fold adduction from false vocal fold adduction.

Whatever is going on can probably be seen using a fiberoptic laryngoscope.)

Posted by Mark Liberman at 05:05 PM


This morning on the radio show Here and Now I heard Robin Young interviewing John Villani, who was pitching his book "The 100 Best Art Towns in America." I hope that Villani's taste in art and real estate is better than his evaluation of, well, taste...

Here's a transcription of the segment that bothered me:

Robin Young: You write that another uh one of your criteria is a Japanese culinary term umami.
John Villani: Mm hm.
Robin Young: Te- tell us, you know, more about what that means and how you used it.
John Villani: Well, it's a gut feeling. Umami is a- is a sixth sense, if you will, that's applied to uh to tastings of food um and wine.
And what it is is there's a notion that you can sort of feel something happening in the air.
Uh there's something intangible that you really cannot put your finger on. But yet when you do get there and you do get- and you- you do get exposed to this umami feeling, you get a sense that there's a vibe, something happening.

Umami is a Japanese word for a kind of taste, that's true. But just about everything else about this passage is nonsense, as far as I can tell.

For an alternative perspective, here's a quote from I.E.T. de Araujo, M. L. Kringelbach, E. T. Rolls, and P. Hobden, Representation of Umami Taste in the Human Brain, J Neurophysiol 90: 313-319, 2003:

Recently, the taste referred to by the Japanese word umami has come to be recognized as a "fifth taste" ... (after sweet, salt, bitter, and sour; umami captures what is sometimes described as the taste of protein). In fact, multidimensional scaling methods in humans ... have shown that the taste of glutamate [as its sodium salt monosodium glutamate (MSG)] cannot be reduced to any of the other four basic tastes. Specific receptors for glutamate in lingual tissue with taste buds have been also recently found. Umami taste is found in a diversity of foods like fish, meats, milk, tomatoes, and some vegetables, and is produced by the glutamate ion and also by some ribonucleotides (including inosine and guanosine nucleotides), which are present in these foods.

So umami is a "fifth taste", not a "sixth sense"; and it's not "something intangible", but rather a response to certain specific molecules such as glutamates (e.g. MSG) and ribonucleotides (e.g. IMP and GMP).

You can get the same story, along with a little history and some Japanese characters, in the Wikipedia article on Basic Taste:

Savoriness or umami is the name for the taste sensation produced by the free glutamates commonly found in fermented and aged foods. The additive monosodium glutamate (MSG), which was developed as a food additive in 1907 by Kikunae Ikeda, produces a strong umami taste. Umami is also provided by the nucleotides IMP (disodium 5’-inosine monophosphate) and GMP (disodium 5’-guanosine monophosphate). These are naturally present in many protein-rich foods. IMP is present in high concentrations in many foods, including dried Bonito flakes (Used to make Dashi, a japanese broth). GMP is present in high concentration in dried Shiitake mushrooms, used in much of Asian cooking. There is a synergistic effect between MSG, IMP and GMP which together in certain ratios produce a strong umami taste.

Umami is considered basic in Japanese and Chinese cooking, but is not discussed as much in Western cuisine, where it is sometimes referred to as "savory" or "moreish."

The name comes from umami (旨味 or うまみ), the Japanese name for the taste sensation. The characters literally mean "delicious flavour."

In English, the name of the taste is sometimes spelled umame, but umami (which conforms to the romanization standards of Japanese) is much more common, as in Society for Research on Umami Taste (http://www.srut.org/index_e.html).

The same taste is referred to as xiānwèi (鮮味) in Chinese cooking.

Umami taste buds respond specifically to glutamate in the same way that sweet ones respond to sugar. Glutamate binds to a variant of G protein coupled glutamate receptors.

Beyond Young's complicity in Villani's cluelessness, there might be a point here about language and thought. As I understand the history, umami was a traditional Japanese term for a kind of taste that wasn't clearly named in European languages; Kikunae Ikeda figured out in 1907 that umami taste could be stimulated by MSG, just as others have worked out (some of the) chemical underpinnings for sweet, sour, bitter and salty; European languages have happily borrowed the word along with the concept; but most people still don't know what it means.

[Update: Benjamin Zimmer emailed:

Enjoyed the "Umami" post. I see that the Wikipedia article gives a Chinese equivalent for umami (xian1-wei4). I can supply an equivalent in Indonesian (bahasa Indonesia): "gurih". Not much online about the umami-gurih equivalence... I found some Indonesian discussion, and also this poster (in German) from the Centre for General Linguistics, Typology and Universals Research (ZAS) of Berlin: "Wörter des Geschmacks und Geruchs"

There's probably more in "Umami in Japan, Korea, and Southeast Asia" by S. Otsuka in _Food Reviews International_ Vol. 14 No.2/3 (1998) (Special Issue: Umami). Found that listed on the website for the Society for Research on Umami Taste.

There's also some discussion of umami and its equivalents in this Linguist List post: http://listserv.linguistlist.org/cgi-bin/wa?A2=ind9804D&L=linguist&P=R5573

And Ray Girvan has some excellent further discussion in his Apothecary's Drawer Weblog.]

Posted by Mark Liberman at 02:40 PM

The rhetoric of relevance

Today's NYT has an article by Lydia Polgreen and Larry Rohter under the headline Third World Represents a New Factor in Pope's Succession. Among CBS News' pages on possible new popes, the one on Dionigi Tettamanzi introduces the factor of geography in a different way:

The cardinal - considered a moderate - became the leading Italian candidate for the papacy with his July 2002 nomination to become the cardinal of Milan, Italy's richest, most powerful archdiocese. Tettamanzi's promotion from his post as the cardinal of Genoa marks the first time in recent history the pope has moved a cardinal from one Italian diocese to another. Pope John Paul was the first non-Italian to lead the church in 455 years, a fact that could help or hinder the cardinal's chances. [emphasis added]

By using this disjunctive phrase, CBS News introduces geography into the discussion in a way that makes almost no claims about it at all. (This article in The Australian explains at much greater length why John Paul II's non-Italianness is relevant to Tettamanzi's chances.)

An AP story uses a disjunction of relevance in a slightly different way, to weaken a topic sentence:

Being in a favoured position might or might not be an advantage. An Italian bishop, Libero Tresoldi, reminded reporters in Milan's Gothic cathedral about the oft-quoted proverb warning cardinals against overconfidence: "He who enters a conclave as pope leaves as a cardinal."

Tresoldi, from northern Italy, appeared concerned that a remark Sunday by Milan Dionigi Cardinal Tettamanzi would put the cardinal in the proverb's risk category. Tettamanzi, 61, spoke of a "very affectionate caress" that John Paul gave him three years ago when tapped to lead the high-profile diocese.

In this case, the negative version is basic: what follows elaborates on the idea that being favored in public speculation is not an advantage in the private decision-making process. But instead of just saying straightforwardly that "Being in a favoured position is not an advantage", the article uses "might or might not be an advantage" to put the question on the table as weakly as possible.

In an earlier post, I noted the use of such disjunctions as a way to put an issue out for discussion with minimal commitment to any content. We can expect an unusual number of similar rhetorical devices in the current coverage of events at the Vatican. With tens of thousands of stories being filed each day based on a very limited amount of definite information about the selection of the pope's successor, journalists are forced by circumstances to present an unusual quantity of rumor and speculation.

Posted by Mark Liberman at 11:52 AM

April 04, 2005

Higher scrabble scores lead to lower test scores

The abstract from a paper by David N. Figlio, "Names, Expectations and the Black-White Test Score Gap", NBER Working Paper No. 11195 (March 2005):

This paper investigates the question of whether teachers treat children differentially on the basis of factors other than observed ability, and whether this differential treatment in turn translates into differences in student outcomes. I suggest that teachers may use a child's name as a signal of unobserved parental contributions to that child's education, and expect less from children with names that "sound" like they were given by uneducated parents. These names, empirically, are given most frequently by Blacks, but they are also given by White and Hispanic parents as well. I utilize a detailed dataset from a large Florida school district to directly test the hypothesis that teachers and school administrators expect less on average of children with names associated with low socio-economic status, and these diminished expectations in turn lead to reduced student cognitive performance. Comparing pairs of siblings, I find that teachers tend to treat children differently depending on their names, and that these same patterns apparently translate into large differences in test scores. [emphasis added]

Figlio used

test score, gifted classification and transcript data for every student in this Florida school district from 1994-95 through 2000-01. Because of confidentiality restrictions, I cannot reveal the identity of the school district, but I can report that my dataset includes information on 55,046 children in 24,298 families with two or more children.

Most notable about my dataset is that I can compare the outcomes of sibling pairs, as proxied by children sharing the same home address and phone number.

He modeled the socio-economic status of names from an independent data set:

In order to measure the socio-economic status of a name, I use birth certificate data from all children born in Florida between 1989 and 1996 to predict the probability that a baby’s mother will be a high school dropout. I decomposed every observed name into a series of phonemic components—combinations of sounds, letter orders, and punctuation, and then regressed these combinations against maternal dropout status to construct predictions of socio-economic status implied by a name. Four frequent attributes of low socio-economic status names are particularly striking: (1) the name begins with one of a number of prefixes, such as "lo-", "ta-", and "qua-"; (2) the name ends with one of a number of suffixes, such as "-isha" and "-ious"; (3) the name includes an apostrophe; and (4) the name has is particularly long, with several low-frequency consonants. The easiest way to characterize this fourth characteristic is to count the number of "Scrabble" points of the name—I consider a name to have a high Scrabble score if its Scrabble value exceeds twenty points.

This measure identifies about 12% of the children in his school sample as having low socio-economic status names. Figlio found that "there is considerable within-family variation in naming patterns. Moreover, ... families, both Black and White, are equally likely to transition from a low socio-economic status name to one that has no identified characteristics as they are to transition away from a name with no identified characteristics".
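The "Scrabble score" criterion in the quoted passage is easy to make concrete. Here is a minimal sketch in Python, assuming the standard English-language Scrabble tile values and giving zero points to apostrophes and any other character without a tile. The function names and the handling of punctuation are assumptions made for illustration, not Figlio's code; the twenty-point threshold is the one he states.

```python
# Standard English-language Scrabble tile values.
TILE_VALUES = {
    **dict.fromkeys("aeilnorstu", 1),
    **dict.fromkeys("dg", 2),
    **dict.fromkeys("bcmp", 3),
    **dict.fromkeys("fhvwy", 4),
    "k": 5,
    **dict.fromkeys("jx", 8),
    **dict.fromkeys("qz", 10),
}


def scrabble_score(name: str) -> int:
    """Sum the tile values of the letters in a name (non-letters count as 0)."""
    return sum(TILE_VALUES.get(ch, 0) for ch in name.lower())


def has_high_scrabble_score(name: str, threshold: int = 20) -> bool:
    """True if the name's Scrabble value exceeds the threshold."""
    return scrabble_score(name) > threshold


for name in ["Drew", "Dwayne", "Damarcus", "Da'Quan", "Jazzmyn"]:
    print(name, scrabble_score(name), has_high_scrabble_score(name))
```

On these tile values "Jazzmyn" scores 37 and so clears the twenty-point threshold easily, while "Drew" scores 8. The Scrabble count is only one of the four attributes listed in the quoted passage, so it does not by itself reproduce the classification used in the paper.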

While confidentiality restrictions prevent me from describing the names that are extremely uncommon in the Florida data set, I can identify names given at least ten times in the data to describe a hierarchy of names’ expected socio-economic status, and present all regression results in terms of a range of observed names—first I compare two marginally common names, one given almost exclusively to White children ("Drew") and one given almost exclusively to Black children ("Dwayne"). Then I compare names along a hierarchy, from a name with one identified attribute ("Damarcus") to a name with two identified attributes ("Da'Quan") to a name with three or more identified attributes (none are observed with sufficient frequency to name here.) Almost no White children are given names with two or more observed attributes, but ten percent are given names with one of these attributes. Most are sufficiently uncommon to name here, but some names given to at least ten White children in my dataset include "Jazzmyn" and "Chlo'e" (not to be confused with "Chloë", which is associated with high socio-economic status.)

He uses national percentile rankings on nationally-norm-referenced tests, and regresses these against the equation. The results:

The upshot here is that while names associated with Black children tend to be associated with modestly lower test performance, the largest estimated negative relationships between names and test scores occur with regard to low socio-economic status. We observe virtually identical results regardless of whether I characterize names using a socio-economic status index or merely count the number of low socio-economic status attributes of the name.

In fact, none of the effects are enormous: the largest (statistically-significant) effects on test scores that Figlio cites seem to be about 1.5 in terms of "national percentile ranking" (which implies a scale of 100). However, he presents the quantitative results exclusively in terms of a hierarchy of paired name comparisons (e.g. "Drew" vs. "Dwayne" or "Damarcus" vs. "Da'Quan"), and it may be that there are larger effects across the whole spectrum of names.

[via Jeff Erickson at Ernie's 3D Pancakes, via Abiola Lapite at Foreign Dispatches. Also a WaPo article.]

Posted by Mark Liberman at 05:59 AM

Hartman's Law confirmed again

For April 1 this year, Paulo Ordoveza, the author of the excellent How Now, Brownpau? weblog, organized a March to End "Beg the Question" Abuse:

For too long, we linguistic pedants have cringed, watching this phrase used, misused, and abused, again, and again, and again. "This begs the question..." we read in the editorials, see on TV, hear on the radio, (perhaps even read in one of those newfangled "web blogs") and we must brace ourselves as the ignoramii of modern society literally ask a question after the phrase.

It was inevitable, though, that Paulo should fall victim to Hartman's Law of Prescriptivist Retaliation: "any article or statement about correct grammar, punctuation, or spelling is bound to contain at least one eror". [I'm extending Hartman's Law to cover "usage" as well.]

Paulo's downfall was pluralizing ignoramus as "ignoramii". The error was immediately pointed out by Termite in a comment on Metafilter, and it's such an obvious mistake that we can safely assume Paulo did it on purpose, as a joke. However, for those who might have been taken in by his gambit, I'll explain.

The fake plural "ignoramii" exhibits two mistakes at once. To begin with, ignoramus is not a Latin noun. It's the first person plural present indicative of the verb ignoro, and it means "we do not know" or "we take no notice of". Once it was borrowed into English and (later) made into a noun, its plural became simply "ignoramuses". And even if ignoramus had been a noun in Latin, its plural would have been something like "ignorami" or "ignoramūs", depending on its declension, but never "ignoramii".

As the OED explains, the English use of ignoramus originated as

The endorsement formerly made by a Grand Jury upon a bill or indictment presented to them, when they considered the evidence for the prosecution insufficient to warrant the case going to a petty jury. Hence quasi-n. or ellipt., esp. in the phrases to find, return, bring in (an) ignoramus [...] Also transf. an answer which admits ignorance of the point in question; fig. a state of ignorance. (The words now used in the finding of the Grand Jury are ‘not a true bill’, or ‘not found’ or ‘no bill’.)

It later came to be used to mean "an ignorant person". The OED says that

[In reference to the origin of this, cf. Ruggle's Ignoramus (acted 1615) ‘written to expose the ignorance and arrogance of the common lawyers’, in which ‘Ignoramus’ is the name of a lawyer. The word occurs also in the following title, evidently in legal connexion: ‘The Case and Arguments against Sir Ignoramus, of Cambridge, in his Readings at Staple's Inn’, by R. Callis, Serjeant at Law (1648). See also quot. 1634 below.]

a1616 BEAUMONT Vertue of Sack in Poems (1653) Nj, Give blockheads beere, And silly Ignoramus, such as think There's powder-treason in all Spanish drink.
1634 Grammar Warre Dvij, All students of Ignorance, with these bussards of Barbary, Ignoramus and Dulman his Clearke, were..exiled for euer out of all Grammar; and all false Latine was euer after confiscated to their vse.
1641 Vox Borealis in Harl. Misc. (Malh.) IV. 434 So many of their commanders are ignoramusses in the very vocables of art.
1675 COCKER Morals 8 By verbal sounds, who makes his small parts famous, But proves himself the greater Ignoramus.
1683 KENNETT tr. Erasm. on Folly 48 Who is so silly as to be Ignoramus to a Proverb?
1790 COWPER Lett. 10 May, So ignorant am I and by such ignoramuses surrounded.
1853 C. BRONTË Villette vi, I am quite an ignoramus, I know nothing--nothing in the world.

So watch those fake Latin plurals -- you too might be "exiled for euer out of all Grammar; and all false Latine ... euer after confiscated to [your] vse".

[Update: Benjamin Zimmer observes that this law was independently discovered by two other people at about the same time, and thus has two other names besides "Hartman's Law":

"McKean's Law" (after Verbatim editor Erin McKean):
Call it McKean's Law: Any correction of the speech or writing of others will contain at least one grammatical, spelling, or typographical error.

"Skitt's Law" (after alt.usage.english contributor "Skitt"):
Skitt's Law, a corollary of Murphy's Law, variously expressed as "any post correcting an error in another post will contain at least one error itself" or "the likelihood of an error in a post is directly proportional to the embarrassment it will cause the poster."]


Posted by Mark Liberman at 12:06 AM

April 03, 2005

Go Cards

It's Sunday, so my wife Karen is making her more-or-less-weekly telephone rounds with her family back in Louisville, KY. She was on the phone with her mother when I heard her say:

So were you guys in mourning last night?

So I'm thinking: who died yesterday? Oh, right, the Pope. Karen's family is Catholic, so that makes sense. But the next thing out of Karen's mouth is:

I know, they just couldn't hit a shot in the last 5 minutes!

Did I mention that Karen's family is from Louisville? So now I'm thinking: oh, those Cardinals. Shows you how little I know about what it means to come from basketball country.

Posted by Eric Bakovic at 10:11 PM

Mall semantics

Caught on the "international male" page (about shopping opportunities for gay men, all over the world) in the March 2005 issue of Instinct, p. 38:

Another recent addition to L.A. is the Grove, an outdoor mall, which has your basics (Banana Republic) as well as department stores and more boutiquey shops.

Surely "outdoor (shopping) mall" has come past my eyes thousands of times, but this was the first time I reflected on it.  Its meaning is (almost) transparent, so it's unlikely to find a place in dictionaries (and, indeed, it's not in the OED Online).  Still, it's not without some interest semantically.

This will be a little adventure in folk categorization.

First, "outdoor mall" has the form of a marked, special case: your classic (shopping) mall -- the Galleria, the Mall of America -- is indoors, under a roof.  Outdoor malls are, well, outdoors and open to the sky.

But an outdoor mall isn't just a place to shop that happens to be open to the elements.  It shares one crucial element with indoor malls: easy pedestrian access from one store to another, without interference from traffic.

So your ordinary "shopping street", like Fifth Avenue, doesn't count as a mall, because of the traffic on the avenue and the side streets.  More generally, city "shopping districts" don't count as malls.  If, however, the shopping street or district is closed to traffic, then we have a species of outdoor mall, sometimes described as an "outdoor pedestrian mall".  (I draw here from some of the 46,900 sites that Google provides for "outdoor mall".)

And your ordinary "shopping center", with clusters of stores sprinkled around a gigantic parking lot, doesn't count as a mall, because pedestrian access from one store to another is not, in general, easy.  At the San Antonio Center, a few miles south of me, it borders on the harrowing, in fact, and I don't recall anyone ever referring to the place as a "mall".  If, however, you clump all the stores together in a central core, with the parking all around it, then you have an outdoor mall.  So the Stanford Shopping Center, a mile north of me, which has this arrangement, is commonly referred to as a "mall".  In fact, the center's literature refers to it as a "mall", a "shopping mall", an "outdoor mall", and an "outdoor shopping mall".

A subtype of this sort of outdoor mall is the "shopping village", which resembles an apartment or condo complex (often on several levels), and indeed not infrequently has housing mixed in with the stores and restaurants and health clubs and barber shops and whatever.

(Note that all sorts of non-shopping establishments can be located in malls, but if there isn't a significant opportunity for shopping, it's not a mall, but merely some kind of "center".  It's like drugstores: you can sell all sorts of things in a drugstore that aren't in any way describable as "drugs", but there has to be a significant presence of things that are.)

Outdoor malls can be permanent fixtures or temporary events.  So, when the main street of Elmira (ON) is closed for the annual Maple Syrup Festival, with booths selling all sorts of things on the street, the event is described as an "outdoor mall".

A further subtlety: malls, both indoor and outdoor, are designed to foster not mere shopping, but a "shopping experience".  It's expected that visitors to the mall will window-shop, socialize in the common areas, and enter more than one establishment.  If a high percentage of visitors do business at just one establishment, you have something that is technically a mall, but not a very good example of one -- the mall equivalent of the penguin in the bird world.  Such malls are actually very common in the U.S.: this is the ubiquitous (outdoor) "strip mall", where the establishments are arrayed in a row, making access to any one of them easy from the parking area, without inviting walking from one to another (though this is possible).

Four footnotes.  (1) In addition to malls in the real world, there are virtual malls, "web malls" (53,200 raw Google web hits).  (2) The hits for "outdoor mall" take in not only uses of this sequence of words parsed as adjectival "outdoor" plus head noun "mall" (as above), but also some parsed as a noun-noun compound meaning 'mall related to the outdoors'; these are malls, real or virtual, devoted to outdoor equipment (for hiking, climbing, barbecuing, etc.) or activities.  (3) An earlier version of this posting appeared on ADS-L on 3/12/05.  (4) Since then Geoff Nunberg has written me about lots of mall-related vocabulary from the retail business: "anchor", "pad", "big box", "destination retail", "power center".  This is fascinating stuff, but not what I was going on about above, which is (mostly) about the categorizations that ordinary people make, and about the vocabulary that goes along with them, while Nunberg's expressions are technical terms, used mostly by specialists.  Admittedly, the line here is by no means clear, and there's plenty of traffic across it.  But "mall" and "outdoor mall" seem to be pretty clearly on the folk side of the line.

And now (4/3/05) come reports of variability in the reference of "mall", in particular uses for any street closed to traffic.  So, from Manhattanite John Cowan:

... the prototypical meaning of "mall" for me is "street closed to powered vehicular traffic".  The other, suburban, meaning is one I recognize and use in context, but out of context it's the above meaning that comes to mind first.

And Danielle McCredden reports from Australia:

... in Melbourne, we have Bourke Street mall, a section of Bourke street in the central business district which does not permit vehicular traffic (with the exception of trams).  Similarly in Adelaide, the Rundle Mall is nothing more than a section of street in the city which is paved and doesn't permit cars.  So "mall" for us describes the outdoor plan of the place but not necessarily the shops.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 08:42 PM

The sound of one hand waving*

On April Fool's Day, Terrence Deacon, Professor of Biological Anthropology and Linguistics at the University of California, Berkeley, gave a talk here at the University of Michigan on the evolution of language. If the talk was meant to be self-contained (as in, you needn't have read all his writings in order to follow the argument), it was remarkable for the complete lack of support offered for the main thesis.

Early in the talk, Deacon presented a handsome PowerPoint slide with pictures of various plants that display Fibonacci spirals -- daisy petals, a pine-cone, things like that. He said that, although there's a genetic component underlying these structures, the spirals themselves come about through self-organization: they are not directly encoded in the plant genomes, but arise in each plant because they're useful (for instance for ensuring that the maximum amount of sunlight will hit each leaf) and because their shape is mathematically determined. I haven't studied botany since I was an undergraduate, eons ago, so I will assume that his story about the Fibonacci spirals is right. In most of the rest of the talk he discussed other non-humans, especially finches, and the complex relationships between genes and behavioral patterns (like finch songs).

Finally he returned to people and argued that, although human language has a stage-setting genetic component [I can't guarantee that that's a precisely accurate paraphrase, but it's not too far off], innate universal grammar is nowhere near as rich as it's often claimed to be. Instead, like the plants with their self-organizing Fibonacci spirals, many or most of the universals in human language are to be attributed to -- and here I quote -- "social-semiotic self-organization". In one short sentence he mentioned a couple of examples that, he said, support this claim, but in the talk itself he gave no shred of evidence to justify the analogy to the mathematically elegant Fibonacci spirals. It wasn't even hand-waving -- at most one hand waving, or maybe just one appropriate finger. I wanted to ask what could possibly constitute non-circular evidence for such a claim, but I couldn't, because he announced at the beginning of the question period that he would recognize only in-group members in the discussion period. Well, O.K., he didn't put it that way: he said he'd call on "you guys at the back because I know you have to leave soon". So did the rest of us, unfortunately (or anyway I did; possibly others stayed and even got to ask questions after the favored few were finished with theirs).

A not totally unrelated thought: I'm beginning to wonder about biological anthropologists who talk about language. A year or two ago the local set invited a speaker who proposed that the Ur-human language probably had clicks because (a) humans originated in Africa, and (b) clicks occur in a few African languages, and (c) there's a lot of genetic distance between some groups who speak click languages, and (d) clicks can't arise spontaneously, and (e) the chances for borrowing are vanishingly small because the groups aren't all that close geographically. (The huge problem here is with premises d and especially e.) Most linguists would hesitate to make pronouncements about biological anthropology; too bad the reverse doesn't also hold.

A final thought: maybe the late-19th-century members of the Société linguistique de Paris got it right when they banned research and publication on the evolution of language: the ban was meant to suppress wild unfounded speculation about language origins. Now that the topic is popular again, there still seems to be a lot more chaff than wheat.

A post-final thought: No, I don't think Deacon's talk was an elaborate April Fool's Joke. I did consider that hypothesis, but on the evidence it had to be rejected.

*Acknowledgment: The title of this post was suggested by fellow Language Logger Philip Resnik, who was visiting the Ann Arbor corner of Language Log Plaza on Friday, and whose fascinating demo and excellent talk entirely did away with the bad mood that Deacon's talk left me with.
Disclaimer: Philip's title suggestion does not, of course, mean that it'd be fair to blame him for any infelicities, a.k.a. stupid mistakes, in this post.

Posted by Sally Thomason at 03:54 PM

Elephant talk

Some recent observations of African elephants apparently learning to imitate sounds were noted by Henry Fountain in the NYT 3/29/2005:

In Kenya, a 10-year-old elephant named Mlaika seems to think she's a truck. At least she has been heard imitating the low rumble that trucks make on a nearby highway.

Mlaika's mimicry is described in the journal Nature, along with a report of an African elephant that lived in a Swiss zoo with Asian elephants and learned to imitate the chirping that only the Asian species makes.

The two findings show for the first time that elephants - like primates, birds, bats and some marine mammals - are capable of vocal learning. The discovery has important implications for understanding how elephants communicate.

This is important for reasons that go far beyond communication among elephants. Vocal learning seems to be rare among animals, to a degree that is surprising to most people. Imitating a sound is easy and natural for us, and so it's natural to assume that any intelligent animal who can hear and can vocalize should also be able to do it. However, the fact seems to be that this ability is quite rare: Eric Jarvis at Duke University discovered not too long ago that hummingbirds have it, and that was big news at the time. This page on his lab's web site expresses the now-standard view that

Vocal learning, the substrate of human language, is a very rare trait. It is known to be present in only 6 groups of animals: 3 groups of birds (parrots, songbirds, and hummingbirds) and 3 groups of mammals (bats, cetaceans [whales/dolphins], and humans). All other groups of animals are thought to produce genetically innate vocalizations. To understand this concept, it is important to distinguish vocal learning from auditory learning. Auditory learning is the ability to make sound associations, such as a dog learning how to respond to the sound "sit". All vertebrates have auditory learning. Vocal learning is the ability to imitate sounds that you hear, such as a human or a parrot imitating the sound "sit". Currently only vocal learners have been found to have forebrain regions dedicated to vocal learning and production of these learned vocalizations. Vocal non-learners only have been found to have non-forebrain vocal regions responsible for the production of innate vocalizations. [emphasis added]

Thus the statement in the NYT article that "primates, birds, bats and some marine mammals" are capable of vocal learning has false implications. Since it says "some marine mammals", but leaves "primates, birds, bats" unmodified, most readers will think that all primates and birds have the ability, whereas just three groups of birds (parrots, songbirds and hummingbirds) and one species of primates (humans) were previously known to have it.

The source of the new information about elephants is a paper in last week's issue of Nature: Joyce H. Poole, Peter L. Tyack, Angela S. Stoeger-Horwath & Stephanie Watwood, "Animal behaviour: Elephants are capable of vocal learning". Nature 434, 455-456 (24 March 2005). Here's the abstract:

There are a few mammalian species that can modify their vocalizations in response to auditory experience— for example, some marine mammals use vocal imitation for reproductive advertisement, as birds sometimes do. Here we describe two examples of vocal imitation by African savannah elephants, Loxodonta africana, a terrestrial mammal that lives in a complex fission–fusion society. Our findings favour a role for vocal imitation that has already been proposed for primates, birds, bats and marine mammals: it is a useful form of acoustic communication that helps to maintain individual-specific bonds within changing social groupings.

On Nature's "Supplementary Information" site for this paper, you may be able to listen to .wav files of Mlaika imitating a truck, Calimero producing a chirp-like call, and an adult female Asian elephant producing a chirp call. (I say "may" because I'm not sure which parts of the site require a subscription -- Nature in general is not an Open Access publication.)

I suppose that this Asian-elephant "chirp call" is what Kipling called "the 'hoot-toot' of a wild elephant" in Toomai of the Elephants:

At last the elephants began to lie down one after another, as is their custom, till only Kala Nag at the right of the line was left standing up; and he rocked slowly from side to side, his ears put forward to listen to the night wind as it blew very slowly across the hills. The air was full of all the night noises that, taken together, make one big silence—the click of one bamboo-stem against the other, the rustle of something alive in the undergrowth, the scratch and squawk of a half-waked bird (birds are awake in the night much more often than we imagine), and the fall of water ever so far away. Little Toomai slept for some time, and when he waked it was brilliant moonlight, and Kala Nag was still standing up with his ears cocked. Little Toomai turned, rustling in the fodder, and watched the curve of his big back against half the stars in heaven; and while he watched he heard, so far away that it sounded no more than a pinhole of noise pricked through the stillness, the ‘hoot-toot’ of a wild elephant.

All the elephants in the lines jumped up as if they had been shot, and their grunts at last waked the sleeping mahouts, and they came out and drove in the picket-pegs with big mallets, and tightened this rope and knotted that till all was quiet. One new elephant had nearly grubbed up his picket, and Big Toomai took off Kala Nag’s leg-chain and shackled that elephant fore-foot to hind-foot, but slipped a loop of grass-string round Kala Nag’s leg, and told him to remember that he was tied fast. He knew that he and his father and his grandfather had done the very same thing hundreds of times before. Kala Nag did not answer to the order by gurgling, as he usually did. He stood still, looking out across the moonlight, his head a little raised, and his ears spread like fans, up to the great folds of the Garo hills.

Of course none of this is really "elephant talk", although Kipling assumes in his usual anthropomorphic way that elephants can communicate complex ideas:

Kala Nag, which means Black Snake, had served the Indian Government in every way that an elephant could serve it for forty-seven years, and as he was fully twenty years old when he was caught, that makes him nearly seventy—a ripe age for an elephant. He remembered pushing, with a big leather pad on his forehead, at a gun stuck in deep mud, and that was before the Afghan War of 1842, and he had not then come to his full strength. His mother, Radha Pyari,—Radha the darling,—who had been caught in the same drive with Kala Nag, told him, before his little milk-tusks had dropped out, that elephants who were afraid always got hurt; and Kala Nag knew that that advice was good, for the first time that he saw a shell burst he backed, screaming, into a stand of piled rifles, and the bayonets pricked him in all his softest places. [emphasis added]

It's very unlikely that elephants can communicate at anything like that level of complexity and abstraction. Still, vocal learning is felt to be one piece of the biological substrate needed for (spoken) language to develop.

I suspect that vocal learning is somewhat commoner among animals than scientists now recognize, so that hummingbirds and African elephants are not the last species who will be found to have it. I've seen someone who taught a Yorkshire terrier to imitate slowly rising pitch contours, and have myself sung along with a mutt who seemed to imitate motifs from George Jones and Mozart. It never occurred to me to submit a paper to Nature -- perhaps I should have done so!

And I wonder, could the famous hybrid whale song of the North Pacific be the result of confusing adult role models rather than cross-species breeding? As I understand it, the species apparently involved are not among those that have been thought to exhibit vocal learning, but I think that this has simply been assumed on the basis of the stereotyped nature of those of their vocalizations that have so far been identified and studied.

Anyhow, all this raises again the question that I asked in an earlier post: "The mechanical substrate for language seems to have been lying around, ready for use, for hundreds of millions of years. Why didn't evolution pick up on the possibilities in a serious way until so very recently?"

Posted by Mark Liberman at 08:26 AM

April 02, 2005


We haven't had a cartoon in a while.

Here's another one:

From the people who brought you Chicken, blogged here back in 12/2003:

Note that full documentation is available here.


Posted by Mark Liberman at 10:08 PM

When is subalternism conciliatory?

If you were puzzled by all the in-jokes in the April Fool's CFP for the 1st Workshop on Unnatural Language Processing, I feel your pain. To demonstrate my empathy, I'm going to display my own ignorance. In particular, I'll track my painful attempts to understand a simple 22-word sentence in a newspaper article written for a general audience.

The context is a 3/30/2005 article by Jai Kasturi in the Columbia Spectator, dealing with a controversy centered on Columbia's department of Middle East and Asian Languages and Cultures (MEALAC). In the ninth paragraph, Kasturi -- an 8th-year MEALAC grad student -- connects the department's current troubles to some earlier academic kerfuffles at Columbia:

[...] I would like to suggest that the situation in MEALAC is in fact an extension of the dual English and anthro crises that preceded it, and perhaps has as much or more to do with internal Columbia politics. To put it simply, there has always been an intense and sometimes hostile competition among (and within) these departments on the question of how to teach cultural studies and literary theory at Columbia, including the difficult legacies of post-colonial theory. [...] Both the English and anthro crises revolved around these issues. The English department, as described in the March 10 Spectator article, dealt with their stalemate in part by eliminating their most hostile players. Anthro under Dirks took the conciliatory approach of importing subalternist theory and burying questions of narrative representation under a flurry of microhistory. [link and emphasis added]

That last sentence is almost completely opaque to me. I understand all of the words, or at least the morphemes out of which they're composed; I can parse the sentence; I can even tell you who it says did what to whom, at least if I'm allowed to repeat phrases like "questions of narrative representation" whose intended meaning I suspect I don't grasp. But in the end, I just don't get it. What is subalternist theory and why was importing it conciliatory? What questions of narrative representation are (or were) at issue? What is microhistory? Why was it conciliatory to bury the former under a flurry of the latter? Was the flurry-burial a consequence of the subalternist importation, or an independent development?

Now, I took a few anthro courses in college, and more recently I've co-taught a course (entitled "Biology, language and culture") with a biological anthropologist (Alan Mann) and a cultural anthropologist (Greg Urban). I read books and papers by anthropologists from time to time, I've been to a couple of meetings of the American Anthropological Association, and I go to several talks a year sponsored by Penn's anthropology department. So I thought that I knew a reasonable amount about that field, for an outsider.

However, reading this sentence makes me feel stupid. Here's a sentence about the recent history of a department in a discipline I thought I knew a little bit about, a sentence whose content is apparently supposed to be plain to the entire readership of Columbia University's student newspaper, and I can't make head or tail of it. It seems that I'm seriously out of the intellectual loop.

I take some comfort in the fact that I'm not the only one.

The term subalternism is not found in the OED, or other dictionaries I've consulted, nor is it in The Johns Hopkins Guide to Literary Theory & Criticism. The phrase "narrative representation" is found, but seems just to mean, literally, "the representation of narrative", which does not help me to understand what questions about it might be "[buried] under a flurry of microhistory". Subalternism is also not found in the Wikipedia, but microhistory is: it's defined there as "the study of the past on a very small scale", and the OED calls it "Historical study which addresses a specific or localized subject". That's more or less what I guessed on the basis of the meaning of micro and history, and I can see how you might have a flurry of that, but I'm still puzzled about how to bury questions of narrative representation under it. Much less do I see why such burial was conciliatory.

I found a 2002 paper by H. Masuda on Narrative representation theory, whose abstract says that

The mission of Narrative Representation Theory is to provide insights into the general principles that operate in the formation of covert discourse structures in natural languages. Narrative representations, which function as part of the underlying language faculty, are direct projections of the discourse module in the mind/brain. They are realized as external levels of hierarchical units that include interpretation units, coherency units, episode units, juncture units, and apex units. Each of these levels may consist of a sequential unit of internal constituents that are realized by exposition, complication, and denouement.

I'm ashamed that I've never heard of this -- it's basically a form of discourse analysis, which is a branch of linguistics, and one that I'm interested in even if I'm far from being an expert. But over at Google Scholar, {"Discourse representation theory"} (which I do know about) gets 1,390 hits, while {"Narrative representation theory"} gets only 5, all to work by Masuda. So I guess that's a false trail -- it can't be the source of the "questions of narrative representation" that Dirks had to conciliate his anthropological colleagues by burying under a flurry of microhistory. I'll work with the hypothesis that "narrative representation" is not a term of art in the sentence under study, but instead has something like its normal English meaning -- though this could be "how things are represented in a story", or "how the structure of stories is represented", or several other things.

OK, what about "subalternist theory"? A modest amount of googling fails to turn up a definition. However, I did find a review by Horacio Legrás of The Latin American Subaltern Studies Reader (Ileana Rodríguez, ed. Durham: Duke University Press, 2001), which says that

The Latin American Subaltern Studies Reader is the only book published by the Latin American subaltern studies group. Immediately after its publication the group dissolved along lines marked by political as well as disciplinary disagreements. The project of subalternity began in India as a political and epistemological criticism of history. Historical knowledge, subalternists contended, organized the past in line with the governmental efforts of the modern state. Opposition to state policy was deemed logical and political if carried out in a language that the state could contest and eventually incorporate. When protests were incommensurable with that logic they were deemed archaic, aberrant, unintelligible.

So I gather that subalternism examines the point of view of the subaltern or subordinated groups whose protests were previously "deemed archaic, aberrant, unintelligible", and subalternist theory must be the theory of what such viewpoints are like, and how to study them. Fair enough -- but why was it "conciliatory" to import such theory?

Well, enough research, or self-directed socratic dialogue, or whatever is going on here. Let's take the plunge and guess what Kasturi probably meant by writing that

Anthro under Dirks took the conciliatory approach of importing subalternist theory and burying questions of narrative representation under a flurry of microhistory.

Apparently (Kasturi thinks that) the anthropologists at Columbia were split along political lines about how to tell the story of (current?) social and cultural development. These disagreements can be called "questions of narrative representation" (though surely the disagreement was really about substance and not about presentation?). In this context, importation of "subalternist theory" was conciliatory because its perspective is agreeable to those on the left, while its explicit self-identification as a study of the point of view of subordinated or suppressed groups avoids claims of universality or objectivity, and so doesn't force confrontation on those who disagree with the views it studies. And microhistory's obsession with uncontroversial detail was conciliatory, because it provided a useful distraction from the fraught political questions of how to tell the big-picture story (the "questions of narrative representation").

At least that construal makes sense of the words as written. If it's wrong, I'm sure that someone will correct me. My question next is, what fraction of the readership of the Columbia Spectator was able to puzzle out some explanation of this kind? I'd bet that 99% of the campus would have been entirely defeated by that sentence, if they had read it. Most readers or listeners pass over that sort of puzzle in silence, though, because such flourishes of trendy terminology dare the reader to display cluelessness by admitting failure to understand.

Perhaps Kasturi has been affected by 8 long years in MEALAC to the point of being unable to remember what it's like not to understand such stuff. Or maybe this is an example of the technique of assertion-by-presupposition that is often used to introduce concepts without critical examination. Either way, it's not a terrific advertisement for MEALAC's pedagogy.

[Update: a well-informed friend writes

I am not particularly knowledgeable about the history of Columbia's Anthropology program, but drawing on my knowledge of the discipline as a whole, I would read the sentence this way:

Anthro under Dirks took the conciliatory approach of shifting the department's emphasis from literary approaches which emphasized normative issues regarding the West's portrayal of "the other," to a more empirically grounded approach based on social history, thus emphasizing the historical agency of formerly colonized peoples.

This doesn't explain who this is supposed to "console" or why it might console them, but I hope it does clear up some of the other questions.

So this reading interprets "questions of narrative representation" in a different way. I took it to mean "disagreements about how to tell the story of social and cultural development". My well-informed friend sees it as "literary approaches which emphasized ... the West's portrayal of 'the other'". W.I.F. also combines microhistory and subalternist theory into the concept of "a more empirically grounded approach based on social history, thus emphasizing the historical agency of formerly colonized peoples".

This all rings true to me. But W.I.F. misremembers K.'s account of the social impact -- it's supposed to conciliate [someone], not console them. I don't think that clarifies who or why, though.

W.I.F. offers a link I wish I had found:

Subaltern.org has a nice definition of the term subaltern:

SUBALTERN---Originally a term for subordinates in military hierarchies, the term subaltern is elaborated in the work of Antonio Gramsci to refer to groups who are outside the established structures of political representation. In "Can the Subaltern Speak?" Gayatri Spivak suggests that the subaltern is denied access to both mimetic and political forms of representation.

Subaltern.org is not on the first three pages of Google hits for either {subalternism} or {subalternist}. But I should have known to try {subaltern}.

W.I.F. adds:

What this definition leaves out is that Spivak was part of a Calcutta based group of scholars who published a journal called "Subaltern Studies" and that her essay is in part a critique of their efforts to "give voice" to the Subaltern. I say this because it is otherwise confusing to read subaltern studies as a replacement for a more literary approach focused on normative issues of representation - as Spivak seems to be engaged in just such a normative critique. (It was Spivak who first translated Derrida into English.)

Ah. That clears up a lot. But it makes the "conciliatory" part all the more puzzling. W.I.F. provides a couple of other links:

http://www.english.emory.edu/Bahri/Glossary.html (Scroll down to "subaltern.")

and explains further:

I think it [K.'s sentence] only makes sense if you look at the broader work of the subaltern studies collective, and not just at this particular essay by Spivak (although it is this essay which is most responsible for how the word is usually used in contemporary theoretical contexts).

I think I'm following this, sort of. Given a few more threads to unravel, I was able to find some other helpful links, such as the JHU GLT&C entries for Antonio Gramsci and Gayatri Spivak.

W.I.F. ends:

I hope this provided some clarification, despite my own obvious confusion.

Absolutely. But you're not the locus of confusion, friend.

Looking back over this post, I can't believe I've devoted so many words to trying to understand this one sentence. I guess that any fragment of in-group talk needs quite a bit of explication for outsiders. But it'd be nice for academic groupings to be less convoluted and more accessible to the rest of contemporary intellectual culture. ]


Posted by Mark Liberman at 08:52 AM

Noisily channeling Claude Shannon

The following announcement landed in my email inbox at 11:48 last night. The author (or anyhow the sender) was Jason Eisner. It's really very funny, at least for those who are familiar with recent work in computational linguistics and machine learning. We usually try to avoid unexplained technical references, but in this case I'll make an exception.

          First and Last Call for Papers (April 1, 2005)

Frankly, NLP is just too hard, and unsupervised learning is getting
itself into all kinds of trouble now that it's in its teens.  Here in
the heart of the Silicon Swamp, we're alarmed to find ourselves
uttering random n-grams just for emphasis.  It's time to treat the world
to 99.9% accuracy.  It's time to redefine the task.  It's time for the

           1st Workshop on Unnatural Language Processing
                  Johns Hopkins University CLSP

TALK ABSTRACTS of up to 1 page due by APRIL 30, 2005 to xxx@xx.xxx.xxx.  
We will attempt to collect these in an online proceedings.  As this is
an electronic workshop, there is no time limit on the talks themselves, 
although there is also no guarantee that anyone will be within earshot.

Self-invited talks (highest bidder)
Question Evasion: Lessons from the Loebner Prize Competition
Understanding Abney's Exposition of Blum & Mitchell's Reinterpretation 
    of the Yarowsky Algorithm

Shared task

   Zero-Sum Corpora: Destructive Mining of the Web

       Twenty teams.  One Web.  Three days.
      Are you computational linguist enough? 

Government panel
Is Document Classification Easier on Classified Documents?
Information Extraction: A Government and Binding Approach 
Anti-Discriminative Training
A Sin Tax for Some Antics  
English Unzipfed: No Unigram Left Behind

Suggested paper topics
(We hasten to assure you that our purported theme on punitive
linguistics is merely a stratagem to extract abstracts from you.
You know that workshop organizers would never actually twist your arm in
a way that might keep you from typing something.  Thus, we concede that
we would grudgingly salivate over any overly original work at all: 
i.e., topics that have never been addressed before, and for good reason.)

* Scaling Down: From Universal Grammar to Galactic Grammar
* Corpse Linguistics (transducer decomposition, final states, 
                      the ultimate epsilon transition ...)
* Doonerism Spetection
* To Ken is at ion correct ion
* Self-Reference and its Implications for This Workshop
* Sentence Fragment Assembly and
* Cataphora Resolution (see below)
* Dynamic Time Warping (again)
* When Summarization Meets Wintarization
* Degenerative Grammar
* The Phrenology-Phrenetics Interface  
* Neuro-Linguistic Programming (a.k.a. Machine-Assisted Charisma)
* The New Irrationalist-Experientialist Debate

Machine miseducation track:
  + Overbearingly Supervised Techniques for Very Small Corpora
  + Mixtures of Pundits, Worldly Bayes Classifiers, & other Sadistical Models
  + The Information Turtleneck Algorithm
  + Aping Syntax: Monkey c-command, Monkey do command
  + Support Vector Hotlines and Other Monologue Systems
  + Bootstrapping Without the Boot

Program Committee (unconfirmed, indeed unwitting)
To preserve plausible deniability, we are adopting a triple-blind
procedure in which the reviewers will not be known even to themselves.
The most we can disclose here is that we will be noisily channeling
Claude Shannon and other emanations grises.

We are grateful for moral support from the Notional Science
Foundation, the Defense Advanced Delirium Agency (DADA), and 
the Linguistic Stipulation Consortium.

It would be an appropriate tribute to Jason to identify him as the bud of these jokes, but the permanent home of this call for papers is at the Journal of Machine Learning Gossip.


Posted by Mark Liberman at 08:07 AM

April 01, 2005

Odd landings

A couple of days ago, Eric Bakovic posted about an adverb that landed in the wrong phrasal slot: "I think that was clear from the day that I certainly met him." This morning, I heard another one in an NPR report on President Bush's Social Security roadshow in Iowa. Don Palmer, co-chair of the Linn County Republican Central Committee, expressing some skepticism about the plan, says "... there's a lot of still uh economic unrest here in Iowa..."

Palmer was not contrasting still economic unrest with sparkling economic unrest, obviously. He meant to say "there's still a lot of economic unrest here in Iowa", and the still just kind of slipped into the wrong slot.

If you look at a display showing how his words unfolded in time, and listen carefully to how he said them, you'll see and hear a little hesitation after "there's", where "still" didn't quite make it out in time, and then a longer pause after "still", when Palmer realized at some level that he'd messed up the order. Then there's an "uh" as he prepares to go on.

It's probably not an accident that the other plausible landing site for an adverb -- between "unrest" and "here in Iowa" -- also gets a pause: there's a definite correlation between the inter-phrase locations where adverbs tend to land, and the places where speakers tend to slow down or pause.

"There's a lot of still economic unrest here in Iowa" is not a reasonable order. But you often hear a pause in that sort of, uh, context. I wonder if that type of pause-for-reflection (or sometimes pause-for-emphasis) attracts adverbs, by performance analogy to the pauses-for-phrase-boundaries where adverbs rightfully land.


Posted by Mark Liberman at 10:16 PM

The station that you really makes a difference

That time of purgatory has arrived for us National Public Radio listeners: it is pledge time, one of the two periods each year when teams of enthusiastic volunteers and small-town radio station staff and management get up early and come into the studio to gather round the microphone to babble at us listeners to send in money. Don't procrastinate, they babble, this station means a lot to you so send in some money today. But the truth is that the average person, be they station manager or staff member or volunteer, simply cannot sit down at a microphone and talk coherently for five minutes over the airwaves several times an hour. It is an acquired skill. Most people don't have it. And so what you get is incoherent babbling and almost unbelievable logical and grammatical lapses. "On this final Friday of the week," said one staffer this morning on my NPR station. "These levels," said the station manager (he meant the various suggested levels for suggested gifts to the station), "really keep us towards the goal." And my brother Richard, visiting me from England and enthusiastically making notes on fine pieces of American English he encounters, wrote down the classic sentence-oid: "This is the station that you really makes a difference to you." It is the time of year that I really makes a difference to me, I know that. Pray for it to be over. I'd do anything for it to be over. Maybe even send them some money. I'll do that later. Ooh, right now the Capitol Steps are on. Love them.

Posted by Geoffrey K. Pullum at 11:29 AM

News of the day

I'm too busy for a real post, but here are a couple of language-related links...

That Media Lab researcher who invented a way for nearby merchandise to call you on your cell phone has also created a writer's assistant that translates English-language stories into computer programs. These programs don't actually run, but "they help writers to understand the implicit structure of their narratives", claimed the author, who is working on a C++ translation of The Mill on the Floss. Another Media Lab whiz has created Dynamix, a wearable digital assistant that turns data from body sensors into a heads-up display of coupled differential equations for athletes and dancers to use in planning their movements, and Mandelbroken, an image-processing program that calculates the Hausdorff dimension of a cookie in real time as it crumbles.

In breaking news from the Amazon basin, it turns out that the Pirahã and the Mundurucú actually have a complete number vocabulary, but regard numbers as deeply obscene, and will never say them in public. However, it has been confirmed that both groups definitely do lack a word for FCC. The IAWJ (International Association of Whorfian Journalists) has convened an emergency meeting to decide on a response.

Posted by Mark Liberman at 08:02 AM

No word for "lazy hack parroting drivel"?

Marc Ettlinger at Berkeley tells me he watched with mounting annoyance as 60 Minutes did its story on the Moken (the "sea gypsy" hunter-gatherer tribe living on islands in the Andaman Sea off Thailand). I can understand why he was irritated. You can read a transcript here. It doesn't content itself with the news story (covered elsewhere too) about how the Moken knew the tsunami was coming (just like the animals of the jungle) and fled to higher ground so that not a single one of them was taken by the wave; it wanders on into a whole slew of traveler's tales about how their language has no word for "when", no word for "want", no word for "take", no word for "hello", no word for "goodbye", no word for "worry", and of course if you have no word for worry you never worry...

Many will want to believe this drivel, notwithstanding the critique that Marc offers; but not me. Having seen how little work people are prepared to do to check claims about languages even when they are well known and readily accessible (remember, President Ronald Reagan once got away with claiming in a speech that Russian had no word for "freedom"!), I would not bet a cent on any of the claims about the Moken being true.

Ben Zimmer points out to me that while the statements made by anthropologist Jacques Ivanoff were bad enough ("risible pseudo-Whorfian arguments about the Moken language," says Zimmer), Bob Simon took the ball and ran with it. Note the illicit shift in the following sequence:

Ivanoff: "Time is not the same concept as we have. You can't say for instance, 'When.' It doesn't exist in Moken language."

Simon: "And since there is no notion of time, it doesn't matter if the last visit was a week ago or five years ago."

Simon takes the (utterly unsupported) anthropologist's claim that they don't have the same concept of time as us westerners and stretches it to get to the notion that they have no concept of time. That, of course, will link to why they have no word for "hello": they have no idea whether anyone has been away. No concept of time, so no way absence could make the heart grow fonder. Utter, self-refuting nonsense, of course. If the Moken had no concept of time, how would they have known to flee to higher ground when the tsunami was coming, rather than three hours later? And how would they know that time had passed so it was OK to come back to the beach? How can people believe these things?

I tell you honestly, I wish English had a word meaning "lazy journalist eagerly repeating hogwash about natural languages". Or a word for the state of not knowing whether to feel pity or simply barf when told stupid things about implications of lexical poverty. Or a lexical item with the sense "absurd and unsubstantiated thesis about some language allegedly lacking words for elementary concepts basic to all human life". Such words would be used so often here at Language Log. The corridors at Language Log Plaza would ring with them. (But you'll notice we manage to reflect upon these concepts anyway, despite not having the words.)

Posted by Geoffrey K. Pullum at 01:05 AM