November 30, 2004

Latte lingo: Raising a pint at Starbucks

Back in October, Dave Barry's periodic "Ask Mister Language Person" column addressed the naming of coffee cup sizes:

We begin today with a disturbing escalation in the trend of coffee retailers giving stupid names to cup sizes. As you know, this trend began several years ago when Starbucks (motto: ''There's one opening right now in your basement'') decided to call its cup sizes ''Tall'' (meaning ''not tall,'' or ''small''), ''Grande'' (meaning ''medium'') and ''Venti'' (meaning, for all we know, ''weasel snot''). Unfortunately, we consumers, like moron sheep, started actually USING these names. Why? If Starbucks decided to call its toilets ''AquaSwooshies,'' would we go along with THAT? Yes! Baaa!

This is exactly how I feel about such things, and so for years I've been quietly but firmly using the terms small, medium and large, at Starbucks as I do elsewhere.

My reasoning is essentially the same as Dave's, only not as funny:

Recently, at the Dallas-Fort Worth International Airport and Death March, Mister Language Person noticed that a Starbuck's competitor, Seattle's Best Coffee (which also uses ''Tall'' for small and ''Grande'' for medium) is calling ITS large cup size -- get ready -- ''Grande Supremo.'' Yes. And as Mister Language Person watched in horror, many customers -- seemingly intelligent, briefcase-toting adults -- actually used this term, as in, ''I'll take a Grande Supremo.''

Listen, people: You should never, ever have to utter the words ''Grande Supremo'' unless you are addressing a tribal warlord who is holding you captive and threatening to burn you at the stake. JUST SAY YOU WANT A LARGE COFFEE, PEOPLE. Because if we let the coffee people get away with this, they're not going to stop, and some day, just to get a lousy cup of coffee, you'll hear yourself saying, ''I'll have a Mega Grandissimaximo Giganto de Humongo-Rama-Lama-Ding-Dong decaf.'' And then you will ask for the key to the AquaSwooshie. And when THAT happens, people, the terrorists will have won.

Tell it, brother!

There are just two problems, though.

The first problem is that Starbucks is right, in a sense. I've established that asking for a "small coffee" gets you the 12-ounce size; "medium" or "medium-sized" gets you 16 ounces; and "large" gets you a 20-ounce cup. However, in absolute rather than relative terms, this is nuts. A "cup" is technically 8 ounces, and in the case of coffee, a nominal "cup" seems to be 6 ounces, as indicated by the calibrations on the water reservoirs of coffee makers, and implied by Starbucks' own brewing instructions: "We recommend two tablespoons of ground coffee for each six ounces of water." And 16 ounces is otherwise known as one pint -- so we seem to have established that a "medium-sized coffee" is a pint of coffee, a concept that might have given even Balzac pause. When you think of it, "grande" is a more descriptive term. The next time I visit Starbucks, if I really think I can cope with 16 ounces of coffee, I'll try just asking for "a pint, please".

The second problem is that (most?) Starbucks outlets actually have four sizes of cups, not three. There's an unadvertised 8-ounce size, called "short". Look up short in Starbucks' own Latte Lingo lexicon, if (like most people) you don't believe me. Or try asking the barista for a "short coffee" -- it's usually worked for me, though even some Starbucks employees seem to be unaware of this size. Since this is usually the size that I really want, I was very glad to learn about it.

By the way, venti may well mean "weasel snot" in some language, but the Starbucksian marketeer who came up with the name was probably thinking of the Italian for "twenty". Or "winds", take your pick. Of course, the Italians would use the metric system, and 20 fluid ounces in metric is approximately 591.47 cc, but I don't think that cinque nove uno virgola quattro sette is going to make it as a product name. I guess you could round up and call it seicento. For all I know, maybe they do, somewhere out there in Metricland.

[Update: Melissa K. Fox points out that "pint" is ambiguous:

I'm glad to hear that there is still a "short" size at Starbucks -- I remember when "short" used to be coffee for "small", and now it distresses me that "tall" is the smallest size, when it used to mean "large". Bah.

Incidentally, the "pint" thing wouldn't work worldwide -- here in Britain, a pint is 20 fluid ounces (and the fluid ounces themselves might not be the same size as US ones, but I'd have to check on that), so if you asked for a pint you'd get a "venti" instead of a "grande".

Indeed. The AHD sez (with some elisions):

1. a. A unit of volume or capacity in the U.S. Customary System, used in liquid measure, equal to 1/8 gallon or 16 ounces (0.473 liter). ... c. A unit of volume or capacity in the British Imperial System, used in dry and liquid measure, equal to 0.568 liter. See table at measurement.

I think the British pint is not quite as big as a "venti" -- though I've never measured the actual liquid contents of a Starbucks "venti" sized beverage (the barista leaves a variable amount of room for added dairy products, and I don't know if the cup sizing allows for this or not...). And is a British ("Imperial") pint really 20 fluid ounces? This table just says that it's half a quart (half an imperial quart, natch). But yes, this site on "Capacity" explains that indeed, there are 20 (imperial) fluid ounces in one (imperial) pint. And read the rest to learn about kilderkins, firkins, pins, gills, pipkins and nipperkins. You will also learn that in the U.S., a "barrel" of fruits and other dry commodities is equal to 7,056 cubic inches, except in the case of cranberries, where a barrel is 5,826 cubic inches...
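
For anyone who wants to check the pint arithmetic, here is a minimal sketch in Python. It assumes the standard definitions of the US fluid ounce (1/128 of a 231-cubic-inch gallon) and the imperial fluid ounce (1/160 of a 4.54609-liter gallon); the constant and function names are made up for illustration, and the "venti" figure is the nominal 20 US ounces, not a measured cup.

```python
# Millilitres per fluid ounce, from the standard gallon definitions.
# Names here are illustrative, not taken from any library.
US_FL_OZ_ML = 29.5735295625       # 1/128 of a US gallon (231 cubic inches)
IMPERIAL_FL_OZ_ML = 28.4130625    # 1/160 of an imperial gallon (4.54609 L)


def us_fl_oz_to_ml(ounces):
    """Convert US fluid ounces to millilitres."""
    return ounces * US_FL_OZ_ML


def imperial_fl_oz_to_ml(ounces):
    """Convert imperial fluid ounces to millilitres."""
    return ounces * IMPERIAL_FL_OZ_ML


us_pint_ml = us_fl_oz_to_ml(16)              # the "grande"
imperial_pint_ml = imperial_fl_oz_to_ml(20)  # a British pint
venti_ml = us_fl_oz_to_ml(20)                # nominal "venti"

print(f"US pint:       {us_pint_ml:.1f} ml")
print(f"Imperial pint: {imperial_pint_ml:.1f} ml")
print(f"Venti:         {venti_ml:.1f} ml")
```

On these definitions the imperial fluid ounce is slightly smaller than the US one, which is why 20 imperial ounces come out a little short of a nominal venti.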

Meanwhile, Kai von Fintel observes that "Seattle's Best" is hardly a Starbucks competitor:

Another problem with Dave Barry's very funny bit is this: "Starbuck's competitor, Seattle's Best Coffee".

SBC is actually owned by Starbucks. ]


[Update 12/1/2004: Stefano Taschini writes:

I believe there must have been a small mistype: technically, a warm liquid that you ingest in quantities exceeding half a liter is "stock" not "coffee".

-- Stefano

P.S. South of the Alps, asking for "un caffè" will bring you around 25 milliliters of liquid, corresponding to ca. 5 teaspoons, i.e., 7.8 millionths of a spherical fathom.

Or a bit less than one ounce. Starbucks could call it an uno! ]
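
Stefano's figures can be checked with a short Python sketch. Two assumptions are mine, not his: the teaspoon is the US one (one sixth of a US fluid ounce), and a "spherical fathom" is read as a sphere one fathom (6 feet, 1.8288 m) in diameter. The variable names are illustrative.

```python
import math

ML_PER_US_FL_OZ = 29.5735295625
ML_PER_US_TEASPOON = ML_PER_US_FL_OZ / 6   # assumption: US teaspoon
FATHOM_M = 1.8288                          # 6 feet in metres

caffe_ml = 25.0

teaspoons = caffe_ml / ML_PER_US_TEASPOON
ounces = caffe_ml / ML_PER_US_FL_OZ

# Assumption: the sphere has a DIAMETER of one fathom, so V = (pi / 6) * d**3.
spherical_fathom_m3 = math.pi / 6 * FATHOM_M ** 3
fraction = (caffe_ml * 1e-6) / spherical_fathom_m3   # 1 ml = 1e-6 cubic metres

print(f"{teaspoons:.2f} teaspoons")
print(f"{ounces:.3f} US fluid ounces")
print(f"{fraction * 1e6:.1f} millionths of a spherical fathom")
```

Reading the fathom as the radius instead would make the sphere eight times larger and the fraction eight times smaller, so the diameter reading is the one that reproduces the 7.8 figure.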


Posted by Mark Liberman at 09:22 AM

November 29, 2004

No time for an apostrophe

As of this date (my Unix system reports Mon Nov 29 11:12:45 PST 2004), the Time website has a box advertising the Verbatim (Quotes of the Week) feature with these words:

"Some of you people have been illegal for a long time." — Thomas Menino, Boston mayor, on repeal of a law that imprisoned Indian's inside the city

(The mayor was talking to a group of Indians at a ceremony in which he called for the repeal of the 1675 Indian Imprisonment Act.) Talk about a bad example to present to the young!

Time is highly regarded for its typographical accuracy; there are teams of people who work all day ensuring that no plural noun will sport an apostrophe. But perhaps the website is not so strictly policed. The passage in question also has an HTML error ("&quot" shows up because the data entry person failed to end the code for a double quotation mark with a semicolon); I didn't reproduce that above because it wasn't relevant to anything about English. Language Log will report on how long it takes the apostrophe-catchers, which-hunters, and semicolonoscopists at Time to catch both errors and fix them.

Added later (at Tue Nov 30 08:27:55 PST 2004): O.K., it's the next morning, and Time has now fixed the error (clearly they read Language Log, as they certainly should). But don't let it happen again, Time magazine. Use of apostrophes (unlike syntax) is not a domain in which you get any real dialect variation or judgment differences, and Language Log is not going to offer any defense for such errors. We are not anarchists when it comes to spelling or punctuation of written Standard English. You simply never use an apostrophe to form the plural of an ordinary dictionary word, like Indians. Not ever. Maybe with digits (the 1960's) or special symbols or letters of the alphabet (P's and Q's and @'s), and some acronyms (NGOs looks odd, so people write NGO's); but not with ordinary dictionary words spelled with lower-case letters in them. Yet there is someone on Time's web page staff who does not have this principle etched into their soul. Bad staff member!

Posted by Geoffrey K. Pullum at 02:21 PM

Politics and language in Ukraine

The Ukrainian electoral map is orange and blue instead of red and blue, and divided east/west instead of center/surround:

Another difference is the relation between the political map and the linguistic one:

The role of language affiliation seems to be complex here, in my limited understanding. As in the Baltic republics, there is the problem of a Russian-speaking minority, used to linguistic dominance from many years of Czarist as well as Soviet rule, and now concerned about the prospect of being linguistically disadvantaged. However, in a (long, interesting) guest commentary on John Quiggin's blog, Tarik Amar suggests that this division has been exaggerated:

It is a much abused quarter-truth that Ukraine divides between a “nationalist” West and a “Russophile” East. While there are important differences between regions in Ukraine, the West-East divide has been deliberately talked up by President Kuchma’s regime to enable it to pose as a neutral arbiter. Now his chosen successor Yanukovych is escalating this reckless rhetoric to threaten with “civil war” a recalcitrant society whose political ethics and maturity seem to simply elude him. It is true that the quintessential Eastern, Russified region of Donetsk has voted overwhelmingly for its ex-governor as well as ex-petty criminal Yanukovych. Yet, a local voter turn-out of about 95 per cent, at some precincts even of 104 per cent, should make everybody very careful about attributing these results to the way its inhabitants have really voted or would have liked to vote. Donetsk is firmly in the hand of Ukraine’s most ruthless oligarcho-mafiotic clan, of whom Yanukovych is a card-carrying member. As the ex-governor told President Kuchma long ago, the Donetsk clan could get an “Orang-Utan” elected.

He points to the results in places like Kiev as evidence that the cultural and linguistic divisions do not correspond to the election results:

Crucially, even in round one the opposition managed to win all Ukrainian regions in the West as well as the Centre of the country, including – by a large margin – the largely Russiophone capital city Kyiv. The government has always liked to pretend that the opposition’s base was restricted to the Ukrainophone West, implying that it was “nationalist”, even “separatist.” Some Western observers still cling to these facile stereotypes. It is Yanukovych who has been cornered in a minority of eastern oblasts. If anybody represents an above-regional Ukrainian solidarity, it is clearly Yushchenko. He speaks proper Russian as well as Ukrainian and his being a native of one of Ukraine’s most eastern oblasts and having spent his student and working life in western as well as central Ukraine cannot be matched by Yanukovych, whose biography is strictly mono-regional and whose Ukrainian is not perfect.

On the other hand, in a comment at Fistful of Euros, Michael S writes that

Kiev is not so much a russophone as a bilingual city. That makes a big difference from eastern regions, where most city dwellers have only studied Ukrainian as a foreign language in school.

It’s also not much of a news item that Yanukovych managed to scare some Russian speakers. There’s no language on earth whose speakers he wouldn’t scare. It’s much more remarkable how Yushchenko’s campaign let itself be turned into a handy scarecrow for the same constituency. Even its main slogans weren’t translated. When people gather at pro-opposition rallies in strictly russophone cities, they can only make speeches in Russian, but their chants are all in Ukrainian. In a way, it’s been remarkable to see Ukrainian make the transition (in Ukrainian-Russians’ eyes) from its traditional place of a bumpkin cousin of Russian to the language of civic courage, but from a political standpoint, I think it was a big tactical mistake.

There’s a letter from Lugansk waiting in Maidan’s translation pipeline that points this out from first-hand experience. A lot of people in the east were led to believe that a Yushchenko administration would make mastery of Ukrainian a prerequisite for making a living.

Several years ago, I had an interesting exchange about Russian and Ukrainian with an undergraduate, originally from Kiev, who was enrolled in my intro linguistics course. I'd given a lecture about language and thought, Whorf, and Sapir's notion of "formal completeness":

The outstanding fact about any language is its formal completeness ... To put this ... in somewhat different words, we may say that a language is so constructed that no matter what any speaker of it may desire to communicate ... the language is prepared to do his work ... The world of linguistic forms, held within the framework of a given language, is a complete system of reference ...

The student came up after the lecture, to explain carefully and emphatically that Sapir and I were wrong. She had spoken both Russian and Ukrainian all her life, and she was prepared to state without any uncertainty or qualification that Russian was simply a more complete and expressive language. "There are many things that I can say easily in Russian that I could not possibly say in Ukrainian." She was happy to provide examples -- as I recall, some of them involved particular word-sense distinctions present in Russian and lacking in Ukrainian, while others involved cultural resonances, especially literary ones.

Not being competent to argue the point on the facts, I was forced to fall back on generalities and platitudes about the possibilities of lexical borrowing, the availability of paraphrase, the difference between connotation and denotation, the valuation of elite and popular culture... The point here is just that there is apparently some linguistic prejudice among bilingual but Russian-dominant residents in Kiev (and perhaps other large cities). To the extent that this attitude is common, Yushchenko's success in Kiev supports the view that the campaign transcends cultural and linguistic identity.

However, it seems that in the current Ukrainian election campaign, the cultural stereotypes are inverted. It's the Russophone Yanukovych who is nekulturny. Tarik Amar again:

A big man of few words and two criminal convictions (strictly not political), Yanukovych confirmed his image as the candidate from the “zone” – i.e. jail – by publicly abusing his opponents and their voters as “goats”, prison slang for stool-pigeons and passive homosexuals. His penning his profession into official papers as the Ukrainian equivalent of “pryme-meenister” and his – very dubious – academic degree as “proffessor” were the comic relief of the campaign. Yanukovych’s tough order to his bodyguards to sort out a Second World War veteran, who dared challenge him, starkly contrasted with his already famous dying-swan performance when hit by a raw egg. Collapsing, he was hurried away in best line-of-fire fashion. Yanukovych's talking Tarantino and turning tail, capped by his initial refusal to join a legally obligatory TV debate with Yushchenko before round two, hugely popularized jokes about eggs and Yanukovych – with eggs, by a simile common to many languages, standing for male courage or its absence.

[Update 11/30/2004: More on this, along with a terrific combined map, from Tobias Schwartz at A Fistful of Euros. And DoDo proposes a three-way rather than two-way historical and cultural division in Ukraine.]


Posted by Mark Liberman at 10:05 AM

November 28, 2004

Quiz #3 Answer

Stefano Taschini, who sent in the material for our Language ID Quiz #3, has kindly provided a step-by-step account of how he would have gone about finding out the answer, if he hadn't already known it. I've copied his note below.

Let me first tell you that, although I obviously know what our "mystery" language is, I can neither speak it nor understand it, and therefore I set myself the exercise to try and determine the features that could lead to its identification. Here is how I proceeded.

Comparing the three clips allows us to understand a few structural details. In clips #1 and #2 there is a short introduction (almost the same) that ends with "kiirdiler". In clip #2 this is followed by a few syllables that are also present at the end of clip #1. These syllables form a Russian name, the singer's name, Pavel Semjonov, i.e., Павел Семёнов (usually spelled without diacritics as Павел Семенов). It ought to have helped to realize that the singer is the same in both clips. The presence of Russian names (there is one more later on) in an obviously non-Russian context suggests a language present in a territory of the former Soviet Union, probably written in Cyrillic.

The structure of the two clips is the following:

Clip #1: [intro #1] [something #1] [toloroočču] [Pavel Semjonov]
Clip #2: [intro #2] [Pavel Semjonov]

In clip #3, the speaker introduces a duet. The structure of the announcement is:

Clip #3: [something #2] [tolorooččular] [something in -ova] [uonna rustamxon]

The comparison between clip #1 and clip #3 leads to the hypotheses that [something #1] and [something #2] are the song titles, and that [tolorooččular] is the plural of [toloroočču] (possibly a verb meaning "to sing" or a noun meaning "performer"). Clip #3 ends with the names of the two singers in the duet. One is again a Russian name ending in -ova (the female singer), followed by a mysterious [uonna rustamxon].

The speaker clearly pauses between [uonna] and [rustamxon] and I think it's a fair assumption that they correspond to two separate words. My hypothesis is that [uonna] is the conjunction and [rustamxon] is the apparently non-Russian name of the male singer. If not a nom de plume, it should be split into first and last names. The general rhythm of the language clips seems to show a preference for oxytone words. If that is the case, the singer's name would be Rustam Xon, i.e., Рустам Хон.

Summing up these ideas, we have that

1. Two of the singers have Russian names. One of them is Pavel Semjonov (Павел Семёнов).
2. The language is written using the Cyrillic alphabet.
3. The third singer has a non-Russian name, Rustam Xon (Рустам Хон).
4. The suffix -lar (лар) is used to form plurals (maybe only for either verbs or nouns, we don't know that).
5. The word uonna (уонна) is the conjunction "and".

At this point, Google is enough to confirm these conjectures. Besides showing us indirectly what our mystery language is, it also tells us the name of the female singer, Люба Готовцева (Ljuba Gotovceva, using the UN romanization [1]), and the titles of the songs in the first and third clips, Миэхэ эн мэлдьитин (Miexe en meld'itin, using ad hoc romanization) and Дьоро киэһэ (D'oro kiehe). We note the presence of an additional character, not part of the Russian Cyrillic alphabet: һ, for the voiceless glottal fricative (which makes the UN romanization for Russian awkward to use). This won't be the only additional character, as the clips present at least three more additional sounds: a voiced velar fricative, a German-like "ü" and a lax "i" (at the end of the refrain in the first clip). These sounds are represented by ҕ (which I often found spelled with the digit 5 on the web), ү (not to be confused with the standard Cyrillic у), and ы.

Finally, I contacted a native speaker, Katerina Potapova [2], who kindly transcribed and translated the clips.

Clip #1: Ааспыт икки тыһыынча биир сыл түмүгүнэн “Сыл бастыҥ ырыата” номинацияҕа киирдилэр: “Миэхэ эн мэлдьитин”. Толорооччу: Павел Семёнов — Билэбин эн миэхэ мэлдьи чугаскын...
Aaspıt ikki tıhıınča biir sıl tümügünen “Sıl bastıŋ ırıata” nominacijaɣa kiirdiler: “Miexe en meld'itin”. Toloroočču: Pavel Semjonov — Bilebin en miexe meld'i čugaskın...

On the results of last year, the nominations for “The hit of the year”: “You, always for me.” Performer: Pavel Semjonov — I know, you are always close to me...

Clip #2:
Ааспыт икки тыһыынча биир сыл түмүгүнэн “Сыл бастыҥ ырыаһыта” номинацияҕа киирдилэр: Павел Семёнов — Сэгэттэйим оҕото сэмэй нарын Өрүүнэ ...

Aaspıt ikki tıhıınča biir sıl tümügünen “Sıl bastıŋ ırıahıta” nominacijaɣa kiirdiler: Pavel Semjonov — Segettejim oɣoto semej narın Örüüne...

On the results of last year, the nominations for “The best performer of the year”: Pavel Semjonov — My darling-baby, charming gentle Irina...

Clip #3:
“Дьоро киэһэ.” Толорооччулар: Люба Готовцева уонна Рустам Хон — Эн биһикки...

“D'oro kiehe”. Tolorooččular: Ljuba Gotovceva uonna Rustam Xon — En bihikki...

“Happy evening.” Performers: Ljuba Gotovceva and Rustam Xon — You and me...

In case it's not clear by now, the language is Yakut [3], spoken in the Republic of Yakutia in the Russian Federation. The clips were downloaded from the website for the 2002 edition of the "Etigen Xomus" music awards [4].
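
The ad hoc romanization used in the transcriptions can be sketched as a simple character table. The Python below is only an illustration: the table is inferred from the Cyrillic/Latin pairs in the clips, it covers just the letters that occur there, and it is not an official standard for Yakut. The names ROMANIZATION and romanize are hypothetical.

```python
import unicodedata

# Letter correspondences inferred from the clip transcriptions
# (an assumption, not an official romanization scheme).
ROMANIZATION = {
    "а": "a", "б": "b", "в": "v", "г": "g", "ҕ": "ɣ", "д": "d",
    "е": "e", "ё": "jo", "и": "i", "й": "j", "к": "k", "л": "l",
    "м": "m", "н": "n", "ҥ": "ŋ", "о": "o", "ө": "ö", "п": "p",
    "р": "r", "с": "s", "һ": "h", "т": "t", "у": "u", "ү": "ü",
    "х": "x", "ц": "c", "ч": "č", "ы": "ı", "ь": "'", "э": "e",
    "ю": "ju", "я": "ja",
}


def romanize(text):
    """Romanize Cyrillic text letter by letter, preserving capitalization."""
    out = []
    # NFC makes sure letters such as ё and й arrive as single code points.
    for ch in unicodedata.normalize("NFC", text):
        latin = ROMANIZATION.get(ch.lower())
        if latin is None:
            out.append(ch)                  # spaces, punctuation, unknown letters
        elif ch.isupper():
            out.append(latin.capitalize())  # Ю -> Ju, not JU
        else:
            out.append(latin)
    return "".join(out)


print(romanize("Толорооччулар: Люба Готовцева уонна Рустам Хон"))
```

Letters outside the table pass through unchanged, so anything not attested in the clips stays visible in the output.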



Posted by Mark Liberman at 08:49 AM

November 27, 2004

Twos and Threes

Geoff Pullum has just examined here a rule of thumb for practical reasoning, the OICTIQ:

What the Once-is-Cool-Twice-is-Queer (OICTIQ) principle is saying is that in the realm of human behavior a single event can be dismissed as sporadic, but you have to take it seriously when you find a pattern repeated twice or more, especially within a short space of time. I want to suggest that this is in fact a rather useful rule of thumb for linguists and philologists.

Or, to put it negatively, and with apologies to (the heirs and assigns of) Jacqueline Susann: Once is Not Enough (OINE).

Sometimes it's twice that's crucial, more often three times; once could be an accident, twice a coincidence, but three times is a trend. The Rule of Three (ROT).

Back in February, I began a thread in the American Dialect Society mailing list on a manifestation of the ROT, a formula -- a snowclone even -- that I'll call "X3":

X3: The three most important Xs in Y are: Z, Z, Z. (conveying something like 'the only really important X in Y is Z')

The ADS-L discussion quickly turned to other manifestations of the ROT, but eventually yielded some information on X3 in particular. Here's a summary of what I now know.

It started innocently enough. On 2/8 I posted (in lower-case mode):

a commentator on NPR's Sunday Morning Edition today (2/8/04) claimed that the three most important (electoral) issues in michigan are: "Jobs, jobs, jobs."
this is (i think) a play on the real estate cliche that the three most important considerations in buying a house are: "Location, location, location". location-location-location itself has been extended to a great many domains besides real estate; to appreciate this, google on "location location location" and sample some of the roughly 218,000 sites listed.
... has anyone assembled some collection of instances of the formula? (they are all over the place.) has anyone looked at the history? (is location-location-location in the real estate domain the earliest exemplar in english? in any case, what's the earliest citation for an exemplar?)

Since then, I've collected several more instances of X3:

David Pogue, "State of the Art" column in the NYT Circuits section, 3/18/04, p. E1, "A TV That Cuts All Cords": Every industry has its marketing buzzwords. In the food business, you've got your "all natural" and, lately, your "low-carb." In the auto market, it’s "G.P.S.," "ABS" and "AWD." But in the consumer-technology racket, the three hottest buzzwords are, in no particular order, "wireless," "wireless" and "wireless."
Steven Greenhouse, "It’s Not Just About Jobs, but Where the Jobs Are". NYT Week in Review, 9/5/04, p. 3: "Our longest-serving governor, Jim Rhodes, used to say there are three critical issues in the election: the first is jobs, the second is jobs, and the third is jobs," said Andrew Doehrl, president of the Ohio Chamber of Commerce.
Corey Taylor, “Field of Creams”, interview with porn model Gregg, Unzipped 11/04, p. 19: East coast stud Gregg is hotter than the Florida sun in summer. Originally from Dover, Del., Gregg moved to Tampa Bay because of "the weather, the weather, and when I really stop and think about it further, I would have to say the weather."
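
For anyone who wants to hunt for more instances, the bare triple at the heart of X3 can be approximated with a regular expression. This Python sketch is an assumption-laden first pass, not a tested corpus tool: the names SEP, TRIPLE and find_triples are hypothetical, and the pattern only catches the same word three times in a row with punctuation, quotation marks or "and" in between.

```python
import re

# Separator between repeats: closing punctuation or quotes, whitespace,
# an optional "and", then optional opening quotes.
SEP = r"""["'”!,.]*\s+(?:and\s+)?["'“]*"""

# The same word three times running, case-insensitively.
TRIPLE = re.compile(r"\b(\w+)\b" + SEP + r"\1\b" + SEP + r"\1\b", re.IGNORECASE)


def find_triples(text):
    """Return the repeated word (lower-cased) for each Z, Z, Z match."""
    return [m.group(1).lower() for m in TRIPLE.finditer(text)]


print(find_triples('the three hottest buzzwords are "wireless," "wireless" and "wireless."'))
print(find_triples("LOCATION! LOCATION! Location!"))
```

Spelled-out variants such as "the first is jobs, the second is jobs, and the third is jobs" are not matched, and accidental repetitions would be, so any hits still need reading in context.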

Unfortunately, back then I chose to call the formula "The Rule of Three", which invited everyone to veer away from the topic at hand. In my experience on discussion groups, such invitations are hardly ever resisted. We were off and running.

Alice Faber (2/8) introduced the joke formula "practice, practice, practice" (many, many instances), and Sam Clements (2/8) produced cites of this form from 1888 through 1906, having to do with piano-playing, learning to type, and learning to write. My intuition (2/8) was that this was a separate figure ("eat, eat, eat"), though there might have been some cross-fertilization. John Baker (2/9) noted that "location, location, location" was invariant in form, but that the Carnegie Hall punch line has a common variant with a vocative in the middle: "practice, X, practice", where X is "son", "boy", or whatever. I agreed (2/9), arguing that X3 wasn't merely emphatic repetition (as in the Carnegie Hall joke or in "eat, eat, eat"):

i think that emphatic repetition generally can involve doubling *or* tripling. i can say "eat, eat, eat" or "eat, eat" with pretty much the same effect, but if i said "the two most important things in the michigan election are jobs and jobs", you'd figure out what i was getting at, but it would take you more work than if i'd used the figure with tripling, because you'd recognize that figure automatically.

And Dennis Preston (2/9) offered "defense, defense, defense", noting that it was like "location, location, location" in conveying 'the most important thing in the game is...'

Larry Horn (2/9) broadened the scope:

I was collecting these for a while, in connection with my more systematic exploration of the Lexical Clone construction (a.k.a. Doubles, Contrastive Focus Reduplication), as in "No, what I wanted was a {dog dog/salad salad}" or "We're not LIVing together living together". My hypothesis was that the emphatic triple (= 'and nothing else matters') emerged for this function (and I did have a bunch of others, but they didn't reveal anything earthshattering) because the double was pre-empted for the modificational use. Of course, whenever I presented anything on those triples, someone would quote the line from the Lewis Carroll epic poem, The Hunting of the Snark: "What I tell you three times is true". A quick web search indicates that you wouldn't be the first to refer to this pattern as "The Rule of Three".

Oh, lordy no, not the first. And very much not the first to use "Rule of Three" for investing threeness with significance, as my web search showed. Meanwhile, back on ADS-L we'd moved to thrice-cursedness in Dostoevsky's Crime and Punishment (Gerald Cohen, 2/9), "third time lucky", Muslim divorce, and the ordination of Buddhist monks (Orin Hargraves, 2/9), and threeness in Karl Menninger's Number Words and Number Symbols (David Bergdahl, 2/9). [For a brief and, I think, entertaining inventory of some of the special meanings of 2 and 3, see my introduction to Studies Out in Left Field: Defamatory Essays Presented to James D. McCawley (On the Occasion of His 33rd or 34th Birthday) (1971, reprinted 1992).] At this point I tried, somewhat sternly, to wrest the topic (on 2/9) back to the X3 formula, whose most salient characteristic (for me) was that it was conventionalized:

this discussion, interesting though it is, has veered significantly from my original query. (i know, e-discussions are like that.) i have no doubt that a special regard for the number 3 influenced the way the originator(s) of "location location location" framed this emphatic utterance, but the fact is that it did become a formula, which was then extended to other contexts than real estate and to utterances using expressions other than the word "location". the formula/figure/trope has a life of its own, as a convention of language use, and that life was what i was inquiring about.
it's much the same with syntactic constructions: aspects of a construction often "make sense" from a semantic or pragmatic point of view (more and more sense as we get back to the historical origins of the construction), but from the point of view of the speaker of the language they are simply the conventional ingredients of the construction. it makes sense that the english passive uses the auxiliary verb BE in combination with a past participle, but now those are just aspects of form that are paired with a particular meaning (a meaning that is distinct from the meaning of the predicate adjectival construction that served as the historical source for the passive). similarly, it makes sense that some languages use the subjunctive mood for imperative sentences, but speakers of such languages aren't creatively using the subjunctive to convey a suggestion; they're just taking the subjunctive off the shelf, so to speak, for this purpose.
(the processes -- or, perhaps, process -- of grammaticalization of syntactic form and conventionalization of figures are certainly interesting in their own right, and in fact i am very much interested in both, but they're not what i was asking about.)

Incredibly enough, people were actually working on my original query. The ADS-L Antedating Pack -- antedating expressions is a sort of sport for those who are lexicographically inclined -- was on the job. Jesse Sheidlower (2/8) noted that Barry Popik had posted earlier to ADS-L with a 1960 cite for location x3, which immediately trumped John Baker's (2/9) bid of a 1984 quote about the young Donald Trump. Baker did venture to speculate that X3 was relatively recent ("perhaps as little as 25 to 40 years ago") and that the real estate joke with location was the original model. And so it seems to be.

Popik's 1960 cite, conveniently reproduced by Sam Clements on 2/9:

4 May 1960, IOWA CITY PRESS CITIZEN, pg. 23, col. 2: LOCATION! LOCATION! Location! A famous realtor once said the three most important features of a home are its location.

The vague reference to "a famous" person who "once said" so-and-so is just classic. The quotation or formula is out there, and no one really knows who said it first, but it just had to be someone famous, from some time ago. (The 1984 cite had "hackneyed" in it.) Neither of these things is necessarily so. (And The Donald is right out of it.)

In any case, Popik (2/9) managed to get things back to 1956 in Van Nuys, California (for you non-Angelenos, that's in "the Valley", land of serious real estate development in the postwar years and, more recently, the native territory of the Valley Girl):

Van Nuys News - 6/10/1956: Two 3 bedroom homes. Reseda VAN NUYS LOCATION. Fireplace, patios, BBQ, fenced.....Trees. Best LOCATION. THE REALTY HOUSE 5818 VAN NUYS Bl. ST 6-7360 Open weekdays 'til 9 p.....With Chavin 4415 Ventura ST 9-0331 11-VAN NUYS District 11-VAN NUYS Distric The BEST.....2-BEDROOM CARPETED We repeat-LOCATION LOCATION, LOCATION. charming home with big...
Valley News - 11/22/1956: ...Excellent LOCATION. Asking VACANT CLOSE IN VAN NUYS. only down. 2-bedroom and den Large.....CHOICE LOCATION NEAR Kester elementary and VAN NUYS junior high schools. Just 9 months.....lo volume business. Write r. S. Box 237 VAN NUYS Nfews, VAN NUYS. CASH FOR YOUR EQUITY.....Ave. ST 6-1860. Eves. ST 0-0053 LOCATION LOCATION 'LOCATION The 3 things to look for...

(Popik also unearthed an apparent 1930 real estate cite, but that turned out to be 1980. See below.)

And that's it, folks. Back to the middle '50s, in a real estate context. And then the cites pile up fast and high, and they're all about location, until some more recent time, when the X3 formula took off for other purposes.

A couple of final notes. First, I have great respect for the lexicographic types who do these searches. It takes an enormous attention to context: what's important is not just the first time some word or sequence of words is attested, but how it's used. And what's important (as Jesse Sheidlower has recently been stressing on ADS-L) is not really the first appearance of an expression in this use (there are often repeated but isolated inventions), but when it "took off", when it spread through significant parts of the speech community. And the relevant texts can be very hard indeed to access; a lot of truly tedious library work is involved. And the miracle of text scanning and automatic searching is a mixed blessing; most of the scanned text that Popik looks at is really crappy, full of flagrant mis-scans, which is why a 1980 cite could appear with a 1930 date ("8" vs. "3").

When I have accidentally fallen into work of this sort, I've found it grindingly difficult and often baffling. I hope to report soon on my adventures in tracing the AAVE lexical item asto(r)perious/asteperious/astiperious 'haughty, uppity' (if any of you reading this actually uses this word, tell me now, please). Suffice it to say that I found myself engaged with the work of Zora Neale Hurston, the naming of World War Two bombers, race horses, romantic pseudo-historical fiction about early Ireland, rap music, detective mysteries, and much else. And I still don't understand any of it, really. I'm amazed that people can, sometimes, succeed at this sort of enterprise.

Second -- remember that there was a "first" a while back -- my apologies for having wandered from Geoff Pullum's interest in twos (the OICTIQ) to my own interest in threes (the ROT, and, specifically, the X3 formula). And try not to read too much into the "twos" vs. "threes" thing. (Insert, sigh, obligatory reference to cigars here.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 03:12 PM

Once is cool, twice is queer

In an almost forgotten 1970 Sidney J. Furie movie about a pair of itinerant motorcycle racers, Little Fauss and Big Halsy, a character named Halsy Knox (Robert Redford) picks up not just one small-town girl but two, and spends a hot night with them both. In the morning his sidekick Little Fauss (Michael J. Pollard) is surprised to find him creeping away before the girls wake up, and preparing to leave town and move on. Fauss wonders why Halsy wouldn't want to stick around for more of the same. But Halsy's reply is negative: "Uh, uh! Once is cool; twice is queer."

Such is the harsh homophobic code by which the straight American male must live: you can engage in one threesome with a pair of girls who are happy to be naked in bed together, but hang around for a second and it is not just they who will be tagged with the savage judgment "queer", but you too.

I was surprised to find that such a striking line did not figure in any online databases of notable movie quotes. But as the eleventh of the 38 films Redford has so far made, and basically just one of several attempts to cash in on the success of Easy Rider, the film was pretty well forgotten along with the eminently forgettable 1970s.

This being Language Log, you will be wondering, I know, how I will now segue to a linguistic topic. I promise you, I will achieve this. You have only to read on.

What the Once-is-Cool-Twice-is-Queer (OICTIQ) principle is saying is that in the realm of human behavior a single event can be dismissed as sporadic, but you have to take it seriously when you find a pattern repeated twice or more, especially within a short space of time. I want to suggest that this is in fact a rather useful rule of thumb for linguists and philologists.

Philology first. Let's look at the text of the Second Amendment to the Constitution of the United States (are you following me? pay attention please):

A well regulated Militia, being necessary to the security of a free State, the right of the people to keep and bear Arms, shall not be infringed.

This is quite hard to parse for a literate modern reader, because it begins with a noun phrase, a well-regulated militia, which turns out not to be the subject of the main clause. The sentence makes sense only if shall not be infringed is taken to be the predicate of the main clause, and the right of the people to keep and bear Arms is the main clause subject. But in that case there is a comma between subject and predicate. This is an error under standard modern punctuation principles (common though the error is in undergraduate writing). Is it just an isolated slip? Well, it happens that we have another chance to find out, without even leaving this one sentence. The sentence begins with what is traditionally known as an absolutive clausal adjunct — a gerund-participial clause functioning as an adjunct in clause structure. It is understood as if it began with since or because or in view of the fact that (notice that Our situation being hopeless, we surrendered means "Since our situation was hopeless, we surrendered"). The subject is a well-regulated militia, and the predicate is being necessary to the security of a free State. But in that case there is, again, a comma between subject and predicate.

Well, once is cool, twice is queer. With two occurrences in a text this short, we are advised by the OICTIQ principle to assume that in the 18th century it was normal to place a comma between the subject and the predicate, a practice now regarded as ungrammatical. In translating this text into modern Standard English to divine its intent, we should therefore remove both the first comma and the third.

Now a topic in English syntax. There is plenty of evidence that even educated Americans often believe that there is something wrong with sentences ending in prepositions. Heaven knows why, after more than 700 years of such constructions, but they do. Suppose we were investigating the question of whether there was any support for such a view. How might we proceed?

Well, suppose we fix upon an author who is universally agreed to be a master of the craft, an admired author from at least a hundred years ago. Let's take Oscar Wilde, who died in 1900. And let's select a work of his that is above reproach as an instance of his finest work: "The Importance of Being Earnest", which has often been called the finest stage comedy in the English language. Now, who, of all the very upper-class characters in that play, has the most pompously and rigorously correct speech? There can only be one answer: Lady Bracknell. She really does speak like a book, and a tedious one. So, we start looking at preposition placements in the utterances of Lady Bracknell, and we rapidly find this:

LADY BRACKNELL: A very good age to be married at.

Could that conceivably be just an extraordinary slip-up on Wilde's part, a momentary lapse which, if someone had pointed it out to him, he would have immediately fixed to make sure Lady Bracknell always sounded correct? We might imagine that this was so; but surely, not after we spot this second case:

LADY BRACKNELL: What did he die of?

No, once is cool, twice is queer. The second one should settle it: Lady Bracknell uses prepositions at ends of sentences whenever she damn well pleases. So should you. The notion that there is something slightly ugly or disreputable about them is just a myth — a myth, moreover, that is only believed by people who do not belong to the upper classes, and who have not studied the English language or paid real care and attention to its use in literature.

Of course, the OICTIQ principle is not a law. It is conceivable that someone who is prone to some sporadic error might commit it twice, even twice in one short piece of writing. The principle is merely methodological, a rule of thumb. It cautions the philologist or linguist to remember that dismissing one isolated inexplicable feature of a text as just a speech or writing error may be reasonable, but that the dismissal immediately becomes much less plausible when a second instance turns up hard on the heels of the first. The credibility of the linguist arguing that a sporadic slip is involved goes down, and the likelihood that an actual regularity of grammar is involved (possibly one that diverges from the grammar of the standard dialect of the language at issue) goes up.
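For readers who like to see the arithmetic, the rule of thumb can be sketched as a small Bayesian update. This is only an illustration, not anything the principle itself specifies: the function name and every number below (the prior, the slip rate, the rate under a genuine rule) are assumptions chosen for the example, and the occurrences are treated as independent.

```python
def posterior_regularity(prior, p_if_slip, p_if_rule, occurrences):
    """Posterior probability that a feature reflects a rule of the writer's
    grammar, after it shows up `occurrences` times in as many opportunities.

    prior      -- assumed prior probability that the feature is a real rule
    p_if_slip  -- assumed chance per opportunity of a sporadic slip
    p_if_rule  -- assumed chance per opportunity if the feature is a rule
    """
    # Likelihood of the observations under each hypothesis (independence assumed).
    like_rule = p_if_rule ** occurrences
    like_slip = p_if_slip ** occurrences
    # Bayes' rule over the two competing hypotheses.
    return prior * like_rule / (prior * like_rule + (1 - prior) * like_slip)


# Hypothetical figures: a sceptical prior, rare slips, a mostly obeyed rule.
once = posterior_regularity(prior=0.02, p_if_slip=0.05, p_if_rule=0.9, occurrences=1)
twice = posterior_regularity(prior=0.02, p_if_slip=0.05, p_if_rule=0.9, occurrences=2)
print(f"after one occurrence:  {once:.2f}")
print(f"after two occurrences: {twice:.2f}")
```

With these made-up figures a single occurrence still leaves the sporadic-slip story more likely than not, while a second occurrence tips the balance firmly toward a regularity of grammar.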

The didactic Lady Bracknell basically states the principle herself. In a truly famous line, she says to the foundling Jack Worthing:

To lose one parent, Mr. Worthing, may be regarded as a misfortune; to lose both looks like carelessness.

What she means, of course, is that when it comes to children completely losing track of parents, once is cool but twice is queer.

Posted by Geoffrey K. Pullum at 12:04 AM

November 26, 2004

The Fall Eggcorn Crop

As fall (as we call autumn in North America) comes to an end -- in the U.S., the folk season of fall begins the day after Labor Day and ends the day after Thanksgiving, when millions stream to the stores to begin their orgy of Christmas shopping -- it's time for me to bring you all up to date on the eggcorn front; my last LL posting and Mark Liberman's were a while back (8/27 and 9/26, respectively). It's been a busy harvest season here by the computers in Palo Alto.

Two provisos: first, I can't guarantee that all 22 examples below are fresh ones. By this time, the Language Log eggcorn collection has become such a huge sprawling assortment of postings and links that I'm no longer sure what's in it (though I did try searching through the blog for most of the examples below).

And second, I can't always be sure that it's an eggcorn we're looking at. It might be a typo (teh), a spelling error (speach), a phonological reshaping (Poppa > Poppo, reported to me by Wilson Gray), a morphological innovation (see #12 below), or an ordinary classical malapropism, like palpated > palpitated, as in this letter to the New York Times 11/26, from Meredith Parsons McComb:

...and there's no other word than "groped" for having one's breast palpitated in public.

and in several web cites supplied by Google, for example:

All rams for sale are palpitated by my Vet for abnormalities in the winter after shearing. In young rams brucellosis is rare. (

One old favorite -- defuse > diffuse -- I will repeat here, however, because the latest report has the mark of the indubitable eggcorn, the eggcorner's defense of the innovative usage. The report came from George Markell on 11/20, about:

the phrase "diffuse the situation." As a newspaper copy editor, I saw that phrase many times, and of course I always changed the verb to "defuse." Once when the phrase was in the sports section, not my department, I walked over to the sports desk and pointed out the error, only to have an editor there defend the reporter's usage. His defense was rather vague and I don't really remember it, except that he seemed to equate diffusing with diluting, as if the implied metaphor were about a poison, not a bomb.

And now on to the fall inventory clearance, with the items roughly in chronological order of appearance in my world. I've been helped by e-mail correspondents (some of them anonymous here) and by posters to the American Dialect Society mailing list (ADS-L) and the Usenet newsgroup soc.motss; participants in both have been much taken by the eggcorn idea.

1. barbed wire > bobwire (via barb wire). On 9/6 a correspondent e-mailed to say:

When teaching about the settling of the American west, my high school history teacher made sure to to write the correct spelling for us of what a student had once referred to in an essay as "bobwire" [barbed wire].

On 9/11 I reported to ADS-L (in my quaint lower-cased fashion) on this one:

"bobwire" is an old familiar to me. i assume "barbed wire" > "barb wire", by the usual t/d participle deletion, and then "barb" > "bob" via a non-rhotic variety. but i was surprised not to see it in DARE. did i somehow miss it, or is it just too widespread to count as regional?

This would count as an eggcorn for anyone who thinks the verb bob (as in "bobbed hair") is part of bobwire. Or for anyone who's conjured up some story involving a guy named Bob. (Ordinary folks are fond of very specific stories about the origins of words and phrases, an inclination I labeled "narratophilia" on ADS-L a couple of years ago. A good story pretty much always trumps truth.)

2. know the score. Also on 9/6 another correspondent offered know the score as a possible hidden eggcorn, saying:

I'm not even sure if the original reference was to sports or music, although I think of it as the latter. In any case, I know I've heard the metaphor extended in ways which indicate that both meanings are assumed by different people.

My initial comment on ADS-L (9/11):

on "know the score", my correspondent noted that he'd seen it in contexts suggesting a sports origin and in contexts suggesting a musical origin (salman rushdie wrote a Guardian column in 1987 entitled "Songs Don't Know the Score"). so presumably one usage was the original and the other a reinterpretation (a "hidden eggcorn"). my correspondent favored the musical story, i'd always assumed it was a sports-based metaphor.

Many, many ADS-L postings followed, with passionate defenders on both sides. My belief in a sports origin hardened (and I have still more to say on the topic now), but Gerald Cohen's belief in a musical origin did too. If you want to know the score, you can follow the discussion in the ADS-L archives.

3. anecdotal > antidotal. On 9/7 a correspondent offered:

... I have an eggcorn for you which I ran across on a mailing list I subscribe to: "antidotal evidence" instead of "anecdotal". Google gives a ratio of 1:240 on that one, and even corrects the former with "did you mean..."

Also check out MWDEU under antidote.

4. ones > once and once > ones. First, on 9/8 Ken Rudolph sent me the following, from a poster to

Sometimes, i wonder if pete sampras would attend the us open or even watch it on tv. or like lendl , have nothing to do with it at all.
many former pro tennis player said goodbye to the sport ones they retire. I wonder why ?

And on the heels of this, that same day Larry Horn forwarded some spam to ADS-L with the reverse substitution:

Satisfy YOURSELF & LOVED once (Take Viagra)

Hard to know just what's going on in these substitutions, which could be phonologically based (final /z/ being lightly voiced in English) or mere misspellings, rather than semantically based reinterpretations.

5. one and the same > one in the same. On 9/11 John McChesney-Young e-mailed about this one, suggesting that since it was very common I probably had lots of examples, but here was one from a northern California musicians' site (9/8), on which whistles were being discussed:

A: I love my Syn. Actually, both my whistle contributions to Bended Knee are on Syns. One is the aluminum, the other is an Ironwood with the wood mouthpiece.
B: Is that I. Ron Wood from the tour that was given away in the Jerry Freeman raffle?
A: One in the same. I also recorded The Wexford Carol with that whistle. It was something else.

6. Tagalog language > tag-along language. From a correspondent who shall remain anonymous, on 9/14:

In a linguistics class (which shall remain anonymous, at a school which shall remain anonymous) the instructor had made a reference to Tagalog. Later, a student raised her hand and inquired about "The tag-along language".

7. copywriter > copyrighter. On 9/14 Mark Mandel noted that during the know the score discussion on ADS-L, Sam Clements had written:

Obviously this isn't a clearcut useage in a figurative sense, but it shows how an ad copyrighter used the phrase--keeping score, as if in a game.

Mandel corrected:

"copywriter". A very easy mistake to make, for fairly obvious reasons. Conversely, "copyright" is often misspelled as "copywrite" and its past participle written as "copywritten".

8. sealed > ceiled. On 9/15 a correspondent pointed me to the satire site Something Awful, where the following is quoted:

Chief Security Officer Davies believes there may be taco shells left over from the previous Ares Station crew in the Area 3 Dining Room Supply Hall, which was melted shut by the previously hostile Electron Catalyst Dephazer.
... DynaMars Corporation wishes to thank Chief Security Officer Davies and his officers for their valiant fight against a ceiled door.

9. underlying > underlining. On 9/17 Ron Hardin reported this one to the newsgroup sci.lang:

As we celebrate the 217th anniversary of the signing of the U.S. Constitution, it's easy to forget how revolutionary its underlining principles really were. (

10. last ditch > last stitch. On 9/21 Orin Hargraves noted (on ADS-L) the following item that he'd found in a blog:

I thought maybe it was the types of vibrators I was buying, so I decided as a last stitch effort to try something new.

Google provides a couple of hundred web hits for this one.

11. heap scorn > heave scorn. On 9/21 Michael Palmer offered the following eggcorn on soc.motss (suggesting that heap scorn might involve derision applied from above, while heave scorn would supply derision from below):

Whenever he [Stanley J. Kunitz, editor of the Wilson Bulletin for Librarians, 1928-1943] had an opportunity, he heaved scorn on the fascists in Germany, Italy, and Spain. (David A. Lincove, "Propaganda and the American public library from the 1930s to the eve of World War II," RQ 33, no. 4 (Summer 1994), 510ff.)

Google supplies at least one further web example:

.. adoring him as the hero who made the first trans-Atlantic flight; grieving for him when his son was kidnapped and found dead; and heaved scorn on him for his ... (

12. supposedly > supposably. On 9/23 Wilson Gray asked me on ADS-L if I had supposably, which he said was very common in Black English, in my files. My reply:

i didn't have it in my files, but it turns out to be in paul brians's inventory of common errors in english. many google hits, many of them pretty clearly not from BE. there's even a sort of famous cite from Friends:
Chandler: What if I never find someone? Or worse, what if I've found her, but I dumped her because she pronounced it 'supposably?'.

In the succeeding days our attention turned to attacks on and defenses of, first, supposably and, then, assumably (for presumably), with side excursions to, omigod, that old demon hopefully and also seasonable/seasonal and seasonably/seasonally. Check out the ADS-L archives if you're interested in the twists and turns of this discussion.

In any case, it's not clear that eggcorning, rather than innovative uses of English derivational morphology, is going on here.

13. in earnest > in Ernest. In the 9/27 Palo Alto Daily News, p. A5, reporter Jean Whitney tells us about how "Couple goes from renting to owning":

Eventually, after about three months of looking online at real estate, the pair began in Ernest to house hunt on weekends.

Google has a few hundred hits for began in ernest (lower case), but these could be simple misspellings, not an invocation of some fabled Ernest.

14. mother lode > motherload. On 10/1 Elizabeth Daingerfield Zwicky relayed the following, from e-mail to her from a friend:

I registered for a few things. (After talking to R... it seems like we will indeed recover the motherload of all baby objects from her garage, leaving us not needing much stuff.)

Google provides similar examples with load 'a lot':

The motherload of cb info,... (
I found the Moso motherload! ... I'ma happy man. ( load/bamboo/msg090300239648.html)

Plus a few cites with motherload reinterpreted as 'a mother of a load, a huge load':

Protect your computer from a motherload of viruses, spyware Web site, company offer free ways to protect PC... ( stories/20040711/localnews/821113.html)

15. smarmy > swarmy. Also on 10/1, Scott Safier posted to soc.motss:

Kerry won because the media disobeyed the rules and showed their reactions while the other person was talking. Bush's body language made him look swarmy. People will forget content -- they won't forget Bush looking hunched over and the scowling faces when Kerry criticized him.

He was corrected on the newsgroup, and when I queried him about the word he replied that he had "been mispronouncing the word for eons". For non-linguists, "mispronunciation" covers a lot of territory -- including, in this case, the possibility that Safier thought that swarms were somehow involved in the word.

16. fare > fair. From the cartoon "Moonbeam in the 21st Century", by David Sporrone, "Shoeless pay source", reprinted in the 10/04 Funny Times, p. 17:

Woman: Ya wonder how a guy like Shoeless Joe Jackson would fair in today's modern major leagues?

I'm assuming some notion of fairness is involved here.

The answer to the question, by the way, is: "not well... no shoe contracts."

17. conscious decision > conscience decision. On 11/8 this came by on a (closed) mailing list I'm on:

I was frightened at first due to the 2 trips to the emergency room in the last year. The last time for emergency surgery for an ectopic pregnancy. Everything is fine this time around, but I did have to make the conscience decision to let myself be happy about it and let go of the fear.

Phonological? Possibly, but back in my files there's a pile of examples of conscience raising where I, at least, would have used consciousness raising.

18. by-election > bi-election. On 11/12 Elizabeth Daingerfield Zwicky sent this one on from e-mail to her, about Australian electoral politics:

Note that when a politician leaves (or gets dismissed), there's a bi-election for that seat rather than having someone appoint a new person (or one of their friends/family).

Google shows ca. 4,160 web hits for bi-election. Well, it's an extra election.

19. infest > infect. On 11/18 Wilson Gray, his attention caught by a discussion of cat/pet/animal hoarding, reported this find to ADS-L:

A newsreader, "And, in local news, a house infected with rats!"

The story was about a lonely old lady who hoarded wild rats. Infect seems suitably vivid in this context.

20. over the bow > over the bough. Meanwhile, over on soc.motss Robert Coren offered this wonderful example on 11/20:

From Jayson Stark's column on (at ):
"This is, basically, a shot over the bough," the agent said, "which clearly indicates that the Players Association is onto MLB's game."
I don't know whether Stark or some editor is responsible for this, but I'm trying to imagine what on earth whoever it was thinks the phrase means.
Incidentally, although it would be way tedious to explain what it's about, it's my opinion that even if the agent said "bow" he was using the phrase incorrectly: the context shows that what's being referred to is essentially a defensive action by MLB, not an aggressive one by the Players Association.

21. imperial > empirical. On 11/20 John McChesney-Young passed on a reference on the Classics-L mailing list to a capsule review of the movie "Alexander" in the NYT:

The film follows the young king as he leads his forces on a bloody empirical conquest across the known world,

Morphological reshaping? Or just a spellchecker run amok?

22. ceanothus > cyanothus. If you look carefully at Geoff Pullum's LL posting on the disastrous lack of words for 'robin' in Arctic languages, you'll see that for the photo of the lovely ceanothus plant (common names wild lilac, California lilac -- it actually is a California native plant), the source file is labeled cyanothus. Google turns up five or so occurrences of this spelling. Hey, cyan is a blue hue, and ceanothus has beautiful blue flowers, so it makes sense. I'm only surprised there weren't more hits.

zwicky at-sign csli dot stanford dot edu

Posted by Arnold Zwicky at 07:06 PM

The Language Log Code

Much of the news these days seems to have been written by a disciple of Thomas Pynchon, working off a caffeine overdose under severe deadline pressure. Yesterday's harvest includes the announcement by Bletchley Park of a possible solution to "the coded message on a garden monument designed and installed at the Anson family’s Shugborough, Staffordshire estate between c1748-1758". The coded message reads "D.O.U.O.S.V.A.V.V.M." and is believed by some to provide a clue to the location of the Holy Grail.

WW2 codebreakers Oliver and Sheila Lawn have identified not one, but rather 48 "promising approaches to cracking the code", supplied by "communications received in emails, letters, books, telephone calls and face-to-face" from a volunteer army of cryptographical irregulars, whose contributions have been organized into a "Premier Codebreakers' League", with named teams. The top of the current league standings, arranged by number of solutions, goes like this:

                              No. of Solutions   Position in League
Knights Templar Academicals           9                1
Arcady Orient                         8                3
Numbers Argyle                        6                7
Batty Brothers Albion                 6                9
Rennes-le-Château Rovers              6                4
Nifty City Shifters                   5                8
Lorn Lovers United                    5                2
Runda Town                            2                6
Maritime Wanderers                    1                5


The Bletchley codebreakers eliminated solutions "whose contributions covered UFO’s and Nostradamus, among others, or whose material arrived surprisingly soon after the challenge had been announced", though the reasons for these prejudices are not explained.

The anonymous author of the most highly-favored analysis has "discovered that the key is actually visible, over and over again, on the monument itself. By careful inspection, he says, the key ‘1223’ is revealed. After anagramming, a plain text message emerges: JESUS H DEFY."

My thoughts almost exactly. I reckon that by choosing among a set of plausible codes and keys, and then anagramming the results, I could make a ten-letter message say something more impressive than this. In my sleep.
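To get a feel for how much room that leaves, here is a minimal sketch. The announcement does not say what cipher the anonymous analyst applied, so the repeating-key letter shift below is purely an assumption, as are the function names; only the inscription's letters and the key '1223' come from the report.

```python
from collections import Counter
from string import ascii_uppercase as ALPHABET

CIPHERTEXT = "DOUOSVAVVM"   # the inscription, without the full stops
TARGET = "JESUSHDEFY"       # the claimed plain text, without spaces


def shift_with_key(text, key, direction=1):
    """Shift each letter by the matching digit of a repeating numeric key.

    ASSUMPTION: a simple Caesar-style shift over the 26-letter alphabet;
    the source does not describe the scheme actually used.
    """
    out = []
    for i, ch in enumerate(text):
        step = int(key[i % len(key)]) * direction
        out.append(ALPHABET[(ALPHABET.index(ch) + step) % 26])
    return "".join(out)


def is_anagram(a, b):
    """True if the two strings contain exactly the same letters."""
    return Counter(a) == Counter(b)


def reachable_multisets(ciphertext, key_length=4):
    """Collect every distinct bag of letters that some digit key of the
    given length produces, shifting either forwards or backwards."""
    seen = set()
    for n in range(10 ** key_length):
        key = str(n).zfill(key_length)
        for direction in (1, -1):
            shifted = shift_with_key(ciphertext, key, direction)
            seen.add("".join(sorted(shifted)))
    return seen


if __name__ == "__main__":
    forward = shift_with_key(CIPHERTEXT, "1223")
    print(forward, is_anagram(forward, TARGET))
    print(len(reachable_multisets(CIPHERTEXT)), "distinct letter bags from 4-digit keys")
```

Each of those letter bags can then be rearranged in many different orders by the anagramming step, so the number of candidate "messages" is far larger than the number of keys, which is the worry.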

Reading this, a chill may run up your spine as you realize that Language Log is an (unencrypted!) anagram for Agelong Gaul, and you contemplate the possible involvement of the ancient and international Gallic conspiracy in suppressing the hidden wisdom of the Knights Templar. The legacy of Philip the Fair lives on. And then again, maybe not.


Posted by Mark Liberman at 09:13 AM

November 25, 2004

Thanks Giving

Revisionists and debunkers are fond of noting the Thanksgiving holiday's history of blood, failure and fakery. Despite this, it's my favorite holiday, one of the many valuable tributes that vice pays to virtue. In addition to whatever contemporary blessings of family, health and prosperity we may enjoy, we can be thankful for the lessons of the associated history, in its darker as well as its nobler aspects.

The history really starts in 1827, or perhaps in 1863, as much as in 1621:

The establishment of the day we now celebrate nationwide was largely the result of the diligent efforts of magazine editor Sarah Josepha Hale. Mrs. Hale started her one-woman crusade for a Thanksgiving celebration in 1827, while she was editor of the extremely popular Boston Ladies’ Magazine. Her hortatory editorials argued for the observance of a national Thanksgiving holiday, and she encouraged the public to write to their local politicians.

In addition to her magazine outlet, over a period of almost four decades she wrote hundreds of letters to governors, ministers, newspaper editors, and each incumbent President. She always made the same request: that the last Thursday in November be set aside to "offer to God our tribute of joy and gratitude for the blessings of the year."

Finally, national events converged to make Mrs. Hale’s request a reality. By 1863, the Civil War had bitterly divided the nation into two armed camps. Mrs. Hale’s final editorial, highly emotional and unflinchingly patriotic, appeared in September of that year, just weeks after the Battle of Gettysburg, in which hundreds of Union and Confederate soldiers lost their lives. In spite of the staggering toll of dead, Gettysburg was an important victory for the North, and a general feeling of elation, together with the clamor produced by Mrs. Hale’s widely circulated editorial, prompted President Abraham Lincoln to issue a proclamation on October 3, 1863, setting aside the last Thursday in November as a national Thanksgiving Day.

However, Mrs. Hale was by no means the first post-colonial American to suggest a national day of thanksgiving. In 1808, the Rev. Samuel Miller asked Thomas Jefferson to proclaim a national thanksgiving day devoted to fasting (!) and prayer. Whatever our opinion on the legal issues involved, we should be grateful that Jefferson refused:

I consider the government of the U S. as interdicted by the Constitution from intermeddling with religious institutions, their doctrines, discipline, or exercises. ... But it is only proposed that I should recommend, not prescribe a day of fasting & prayer. That is, that I should indirectly assume to the U.S. an authority over religious exercises which the Constitution has directly precluded them from. It must be meant too that this recommendation is to carry some authority, and to be sanctioned by some penalty on those who disregard it; not indeed of fine and imprisonment, but of some degree of proscription perhaps in public opinion. And does the change in the nature of the penalty make the recommendation the less a law of conduct for those to whom it is directed? I do not believe it is for the interest of religion to invite the civil magistrate to direct its exercises, its discipline, or its doctrines; nor of the religious societies that the general government should be invested with the power of effecting any uniformity of time or matter among them. Fasting & prayer are religious exercises. The enjoining them an act of discipline. Every religious society has a right to determine for itself the times for these exercises, & the objects proper for them, according to their own particular tenets; and this right can never be safer than in their own hands, where the constitution has deposited it.

Though doubtless beneficial to our spiritual and physical health, a Thanksgiving of fasting rather than feasting would emphasize the cramped and defensive tendencies in American culture, instead of the generous and optimistic side. (And I doubt that it would be a very popular holiday...)

Despite the Puritans' reputation for asceticism, when they invited Massasoit (Grand Sachem of the Wampanoag) to celebrate their first harvest, it was not for fasting. The feast lasted for three days, even though neither group had a lot to celebrate at that point other than lack of total extermination. The food was prepared by the four adult women among the colonists who had survived the previous year -- thirteen others had died of cold, starvation and disease. And by 1621 the mainland Wampanoag had been nearly wiped out by "three epidemics which swept across New England and the Canadian Maritimes between 1614 and 1620", reducing their population from 8,000 to about 2,000. The resulting conflicts made their situation even more precarious:

For the Wampanoag, the ten years previous to the arrival of the Pilgrims had been the worst of times beyond all imagination. Micmac war parties had swept down from the north after they had defeated the Penobscot during the Tarrateen War (1607-15), while at the same time the Pequot had invaded southern New England from the northwest and occupied eastern Connecticut. By far the worst event had been the three epidemics which killed 75% of the Wampanoag. In the aftermath of this disaster, the Narragansett, who had suffered relatively little because of their isolated villages on the islands of Narragansett Bay, had emerged as the most powerful tribe in the area and forced the weakened Wampanoag to pay them tribute.

Massasoit, therefore, had good reason to hope the English could benefit his people and help them end Narragansett domination.

The alliance with the English seems to have worked out for a decade or so:

To the Narragansett all of this friendship between the Wampanoag and English had the appearance of a military alliance directed against them, and in 1621 they sent a challenge of arrows wrapped in a snakeskin to Plymouth. Although they could barely feed themselves and were too few for any war, the English replaced the arrows with gunpowder and returned it. While the Narragansett pondered the meaning of this strange response, they were attacked by the Pequot, and Plymouth narrowly avoided another disaster. The war with the Pequot no sooner ended than the Narragansett were fighting the Mohawk. By the time this ended, Plymouth was firmly established. Meanwhile, the relationship between the Wampanoag and English grew stronger. When Massasoit became dangerously ill during the winter of 1623, he was nursed back to health by the English. By 1632 the Narragansett were finally free to reassert their authority over the Wampanoag. Massasoit's village at Montaup (Sowam) was attacked, but when the colonists supported the Wampanoag, the Narragansett finally were forced to abandon the effort.

However, the events subsequent to 1632 gave the Wampanoag little reason to celebrate:

As more British colonists arrived in Massachusetts, they began displacing Wampanoag people from their traditional lands, particularly by plying them with alcohol and obtaining their signatures on land sale documents while they were drunk. The Wampanoag leader Metacomet, known as "King Philip" to the English, tried to get this practice outlawed, and when the British refused, a war ensued. The British won decisively, sold many of the Wampanoag survivors into slavery, drove the rest into hiding, and forbade the use of the Massachusett language and Wampanoag tribal names.

As I understand this history, the English colonists were no more violent, aggressive or dishonest than the other groups competing for domination in the area; but they were more numerous and somewhat more advanced in the technologies of both war and peace, and their growth overwhelmed the nations who had been there before them.

The archives of the American Philosophical Society include a recording of a prayer in Wampanoag, made in 1961 by the last semi-speaker. No transcript is provided. According to the Ethnologue entry for Wampanoag, there was a Bible translation done in 1663-1685, which is discussed at greater length in this page about Harvard's "Indian College":

In 1655, Harvard completed building the Indian College — a two-story brick structure intended to house twenty scholars. The Indian College was the fourth building erected in the “Yard” and was situated near the southern end where Matthews Hall now stands. Within the new Indian College, Harvard installed a press that had previously been housed in the College president's house since 1638. From 1661 to 1663, John Eliot, "the Apostle to the Indians," printed a translation of the Bible into the Wampanoag language-- the first Bible printed in North America. John Eliot's translation remained in use some 200 years later. Between 1655 and 1672, printing presses at the Indian College produced books and pamphlets, along with primers, catechisms, grammars, and tracts— one-eighth of which were in Wampanoag. James Printer, a Nipmuc Indian who had attended Harvard several decades earlier, worked the presses.

Despite this apparently promising multicultural beginning, the "Indian College" seems to have matriculated only five native students, only one of whom graduated, and the building was torn down in 1693, 50 years before Thomas Jefferson was born.

This page gives some further information on John Eliot's Indian Bible, entitled Mamvsse wunneetupanatamwe up-biblum God naneeswe nukkone testament kah wonk wusku testament / ne quoshkinnumuk nashpe wuttinneumoh Christ noh asoowesit John Eliot, nahohtoeu ontchetoe printeuoomuk. Cambridge [Mass.] : Printeuoop nashpe Samuel Green, MDCLXXXV:

and The Massachuset Psalter, or, Psalms of David : with the Gospel according to John : in columns of Indian and English : being an introduction for training up the aboriginal natives in reading and understanding the Holy Scriptures. Boston, N.E. : Printed by B. Green and J. Printer for the honourable Company for the Propagation of the Gospel in New-England, 1709 (translated by Experience Mayhew):

The only federally recognized Wampanoag group today is the Wampanoag Tribe of Gay Head (Aquinnah), living on the western end of Martha's Vineyard, at the opposite end of the island from Chappaquiddick. Other organized bands include Assonet, Herring Pond, Mashpee, and Namasket, with a total membership of about 3,000.


Posted by Mark Liberman at 10:12 AM

November 24, 2004

A plain English reading level

I received two interesting responses to my post on jury instruction reform in California. They each made me believe even more strongly that there are some real alternative employment opportunities for linguists that are being missed here.

The first response was from Danielle McCredden, a Senior Associate at a law firm in Australia:

Interestingly, when I went through law school in Australia, the plain english movement was well in swing, but never in any sort of systematic sense. Legalese and technical language was considered a stylistic flaw and would be marked down.

The program in a broad sense was supported by our federal government, and all of our legislative drafting changed over in the early eighties sometime. The reason I found your post interesting is because of an anecdotal understanding that Australian law graduates are popular with US firms for the reason that they are better at plain english drafting. I have no reason to know why this might be the case, or whether in fact it is true.

Suppose this is in fact true. I'm sure that at least some US law firms are into (the idea of) plain English. So why don't they just hire some linguists as consultants, rather than (what amounts to) outsourcing from Oz? (Outback-sourcing?)

I'm also sure that at least some US law firms are decidedly not into any kind of plain English (as Kevin Russell cynically noted with respect to at least some members of the medical profession). They could also hire some linguists, to help further clarify their language for each other and to obfuscate it for the rest of us.

The second response came from my good friend Ed Keer, a Senior Copywriter for a US healthcare branding agency:

Reading level is big in healthcare too. Pfizer has a whole department devoted to Clear Health Communications. They use a bunch of consultants, none of which, as far as I can tell, are linguists. They fixate on using reading level tests like Fry or Flesch which basically boil down to short words and short sentences. They are satisfying because they give an objective measure of the reading level (ha!). So they get a number they can hang onto even if the number is meaningless. Some of the consultants I talked to are much less interested in the number and more concerned about common sense stuff. Still most of the consumer stuff we've written comes off sounding like Dick and Jane and is pretty bad.

I'm sure the CA law will incorporate a Fry test. Too bad.

Ed's been telling me for years about some of the nonsense that reading level tests can produce when followed blindly by folks who know next to nothing about language or linguistics. Fortunately, at least some plain English groups (as identified via a Google search for "plain english") appear to be well aware of the limitations of reading level tests. For example, the Plain English Campaign in the UK has the following on their FAQ page (emphasis added):

Do you recommend the FOG index or the 'Flesch test'?

The FOG index was a very rough measure of 'readability', created by a man named Robert Gunning in the 1940s, and used in Plain English Campaign's first report, 'Small Print', in the early 1980s. We do not recommend it, or any other mathematical formula for measuring readability. You cannot give a document a score for plain English - either it is crystal-clear or it isn't. There is no substitute for testing a document on real people.

If you use Microsoft Word, you may have seen the 'Flesch reading ease' score. This is based on a combination of sentence length and how many syllables are used. Rudolf Flesch, who created the system, warned that 'Some readers, I am afraid, will expect a magic formula for good writing and will be disappointed with my simple yardstick. Others, with a passion for accuracy, will wallow in the little rules and computations but lose sight of the principles of plain English. What I hope for are readers who won't take the formula too seriously and won't expect from it more than a rough estimate.'

Not too unreasonable-sounding, though I don't have time to read on and find out what they do recommend, or if they have any linguists on staff. I looked a little more closely at The Plain Language Action and Information Network, a US "government-wide group of volunteers working to improve communications from the federal government to the public." (The fact that their acronym spells PLAIN, like FAIR, feels like cheating to me for some reason, but that's a story for another post.) This group of volunteers has produced an interesting-looking document on "Writing User-Friendly Documents", but judging from the little I have browsed through so far, it looks like linguist volunteers are in short supply.
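For readers who haven't seen what's inside one of these formulas, here is a minimal sketch of the Flesch reading ease score in Python. The constants are the standard published ones; everything else (the function names, the sentence splitter, and especially the vowel-group syllable counter) is my own rough assumption for illustration, not what Microsoft Word or any consultant actually uses.

```python
import re


def count_syllables(word):
    """Crude syllable estimate: count runs of vowel letters.

    This is an illustrative heuristic only; it miscounts silent e's
    (it gives "Jane" two syllables) and many other cases.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(words, sentences, syllables):
    """Flesch reading ease from raw counts; higher means 'easier'."""
    if words == 0 or sentences == 0:
        raise ValueError("need at least one word and one sentence")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def score_text(text):
    """Score a text using naive sentence and word splitting (assumed rules)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), len(sentences), syllables)


if __name__ == "__main__":
    print(score_text("See Dick run. See Jane run."))
    print(score_text(
        "Notwithstanding the aforementioned provisions, the undersigned "
        "irrevocably indemnifies the counterparty."
    ))
```

The only inputs are words per sentence and syllables per word, so the score knows nothing about syntax, vocabulary familiarity, or whether the text makes sense. Shuffling the words within each sentence would leave it unchanged.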

If you Google for "fog index" or "flesch test", you get hit after hit of the kind of reading-level nonsense that Ed is talking about ("fry test" gives you similar results, among the obvious food-related hits).

Similar nonsense can be found in A Plain English Handbook: How to create clear SEC disclosure documents (yes, from the US Securities and Exchange Commission):

What Is a "Plain English" Document?

We'll start by dispelling a common misconception about plain English writing. It does not mean deleting complex information to make the document easier to understand. For investors to make informed decisions, disclosure documents must impart complex information. Using plain English assures the orderly and clear presentation of complex information so that investors have the best possible chance of understanding it.

Plain English means analyzing and deciding what information investors need to make informed decisions, before words, sentences, or paragraphs are considered. A plain English document uses words economically and at a level the audience can understand. Its design is visually appealing. A plain English document is easy to read and looks like it's meant to be read.

A few pages later, the handbook advises the reader to "[a]ssemble the team or move ahead on your own", the disjunction virtually guaranteeing that a linguist will never be hired as the "lead writer who ensures the document uses a logical structure and simple, clear language."

As with a lot of things in life, it's the preparation that often determines the success or failure of an effort to write documents in plain English. Many of you routinely select a team to think and talk about how to write a document from scratch or rewrite an existing document. Or you may do it on your own. In that case, rest assured that one person can do it alone.

Not too reassuring to linguists seeking to enter this job market. Unfortunately, it appears that the only way for linguists to infiltrate is to make connections the old-fashioned way: law school, business school, medical school, the Ivy League ... or, perhaps someday, major league sports.

Or, design school. The quote above from the SEC Plain English Handbook explicitly mentions design, as does the Plain English Campaign's FAQ: "Plain English takes into account design and layout as well as language." Naturally, the point of plain English is better communication through the better use of the tools of communication, including (but not limited to) language. This should appeal to the sensibilities of linguists keen to the fact that language is not limited to being a tool of communication.

UPDATE. Two more responses. Melissa Fox writes:

God, wouldn't it be great if law firms *did* keep linguists on staff, either to clear up or muddy up their language, as they preferred? I did three years hard time as a paralegal between my BS and MA, and only occasionally was I asked to read documents for clarity and suggest ways to make them make more sense (which is fair, since that wasn't actually my job). Mainly, when people heard my degree was in linguistics, their only response was -- well, you can guess. "Yeah? How many languages do you speak?"

When I asked Melissa if I could quote her, she replied: "Feel free. :-) (And will you let us know if there *are* law firms hiring linguists? I've got the experience, now. Heh.)" If only I had that kind of information at hand.

Mike Pope of Evolving English writes:

Enjoyed your post on the dubious value of reading scores.

Since MS Word actually offers this "feature," a while back we played around with it at work just for fun. (I'm a technical writer.) No surprises there, really. The exercise did, however, inspire me to write [this].

And someone who prefers to remain anonymous writes:

I've got a degree in linguistics and work for a European law firm. My job is to make sure our lawyers' English documents are well written. I was hired because they needed a native speaker of English to rework documents written by non-native speakers--not because they were concerned about writing more clearly in their own language(s). I've learned at this job that lawyers mostly have no idea what linguists do or what linguists know--and they don't often care to find out. Many lawyers are too proud to accept criticism of their writing or (God forbid) to sit down for a seminar on writing skills. I also get the sense that some lawyers think baffling legal jargon and tortured syntax will impress their clients.


Posted by Eric Bakovic at 02:44 PM

That the same kinds are not everywhere uniquely named

Geoff Pullum properly ridicules a Reuters story that tries to stir up sentiment about global warming by inciting sympathy for an alleged lexicographic challenge to Arctic indigenes:

What are the words used by indigenous peoples in the Arctic for "hornet," "robin," "elk," "barn owl" or "salmon?" If you don't know, you're not alone.
Many indigenous languages have no words for legions of new animals, insects and plants advancing north as global warming thaws the polar ice and lets forests creep over tundra.

This predicament has a classical precedent. Exactly the same problem confronted the ancient Romans, not because "new animals, insects and plants [advanced] north" but because the Romans themselves did.

Here's book ix, chapter 32 of Bostock & Riley's translation of Pliny the Elder's Naturalis Historia. I've left out all the footnotes except for [7], which is the one that's relevant:



There is this also in the nature of fish, that some are more highly esteemed in one place, and some in another; such, for instance, as the coracinus in Egypt, the zeus, also called the faber, at Gades, the salpa, in the vicinity of Ebusus, which is considered elsewhere an unclean fish, and can nowhere be thoroughly cooked, wherever found, without being first beaten with a stick: in Aquitania, again, the river salmon[7] is preferred to all the fish that swim in the sea.

[7] Hardouin remarks, that Pliny and Ausonius are the only Latin writers that mention this fish; while not one among the Greeks speaks of it. It was probably a native of regions too far to the north for them to know much about it. In this country it holds the same rank that the scarus and the mullet seem to have held at the Roman tables.

The OED's etymology for salmon suggests that the Romans just called this new fish "the leaper", more or less:

[a. AF. samoun, saumoun, salmun (OF. and mod.F. saumon): -- L. salmon-em, salmo (Pliny); the spelling with l is from the Latin form.
Cf. Pr. salmo, Sp. salmon, Pg. salmão, It. salmone, sermone. The Latin word is prob. a derivative of the root of salire to leap.]

and the AHD's entry for PIE sel- agrees:

... Probably Latin salmo (borrowed from Gaulish), salmon (< “the leaping fish”)...

As Geoff points out, the Inuit's polysynthetic language puts them in a strong position to make up descriptive words like this. They could no doubt assign to salmon a word meaning "the fish with red flesh that swims upstream leaping", or something similar. Or they could borrow the English word, borrowed in its turn from French and Latin, where it was invented as a description (a calque from Gaulish?) of the fish leaping.


Posted by Mark Liberman at 10:15 AM

Making up names

Geoff Pullum, citing his invention of the term "California bluebush" as a model, suggests that the Inuit have plenty of resources at their disposal for naming newly-encountered species like robins. Geoff mentions the "redbreast" part of "robin redbreast" in comparison, and in an earlier post he discussed the extension of the word robin to a new bird in the New World; but he might also have cited the development of the original term robin itself, which is apparently an innovation in English, certainly since the Norman conquest, and probably within the past 500 years, not long before the English first came to North America.

Robin was "a dim. or familiar form of the personal name Robert" in Old French, according to the OED, and came to be used "in more or less allusive or general application", something like Joe in today's America. We have "Joe Six-pack", "Joe College", "Joe Bloggs", "joe-pye weed", "a cup of joe", "sloppy joe" and the like; a few hundred years ago, the English had "poor Robin", "jolly Robin", "ragged robin" (= "herb robert"), "robin-in-the-hedge" (another plant), "round robin", "Robin the devil", "cock Robin", "Robin Round-cap", "Robin Run-rake" and so forth.

Robin Red-breast was just another of these coinages, used since about 1450 to name a commonplace bird. And as applied to birds, robin was such a generic term that (according to the OED) it has variously been used for the European redbreast Erithacus rubecula, for the linnet Carduelis cannabina (as in Frisian robyntsje), for the (American) red-breasted thrush Turdus migratorius, for birds of the genus Miro in New Zealand, for species of Petroica in Australia, for the green tody in Jamaica, and also applied "to the red-breasted snipe and merganser, and to the mouse-bird or coly".

Among other modified bird-names involving robin are the "blue robin, the bluebird, Sialia sialis; golden robin, the Baltimore oriole; Indian robin ...; magpie robin ...; yellow robin ...; etc." Abandoning its usual exhaustiveness, the OED suggests plaintively: "For an enumeration of the various Australian birds thus named see Morris Austral English 390-1."

Encountering new species in their travels, speakers of English have managed to cope without noticeable strain. Encountering new species brought to their territory by climate change, the Inuit will no doubt find words for them as well, as they have found words for guitars, rifles, snowmobiles and the like.


Posted by Mark Liberman at 09:38 AM

November 23, 2004

Arctic folk at loss for words again

The idiocy of journalists writing stories about people not having words for things continues. Robins, for instance. Ben Zimmer tells me that the stuff about not having words for robins in Arctic languages was being reported by the BBC several years ago; Senator McCain needed no researchers: he was merely picking up on an already established story. And David Chiang points out a Reuters article from yesterday, by environment correspondent Alister Doyle, headlined "As Ice Thaws, Arctic Peoples at Loss for Words". In it, the chair of the Inuit Circumpolar Conference (of Inuit descent, I presume) directly contradicts herself. Doyle begins thus:

What are the words used by indigenous peoples in the Arctic for "hornet," "robin," "elk," "barn owl" or "salmon?" If you don't know, you're not alone.

Many indigenous languages have no words for legions of new animals, insects and plants advancing north as global warming thaws the polar ice and lets forests creep over tundra.

"We can't even describe what we're seeing," said Sheila Watt-Cloutier, chair of the Inuit Circumpolar Conference which says it represents 155,000 people in Canada, Alaska, Greenland and Russia.

In the Inuit language Inuktitut, robins are known just as the "bird with the red breast," she said.

Let P = We can't describe robins. Let Q = We call robins "the bird with the red breast." Clearly, P flatly contradicts Q. Of course you can describe what you're seeing, certainly in Inuktitut, a language in which a single word meaning "little bird with red breast" could be constructed in an instant, but also in English, where bird names like "spoonbill", "redwing", and "woodpecker" were obviously invented that way (as Ray Girvan points out to me). Yet she utters her contradictory nonsense anyway. Why? Is she out of her mind?

Well, no more than the journalists who prompt for such drivel by asking questions designed to elicit it. When indigenous Arctic birdwatcher Roger Kuptana of Sachs Harbor told the BBC interviewer Bob Carty about having seen a robin, Carty immediately prompted, "What's the word in your language for 'robin'?", so he could get Kuptana to say that there wasn't one (listen to it here).

These linked ideas about language use being a matter of having appropriate words to name things, and seeing or experiencing being impossible without the words to act as mediators, add up to a claim about language that is just palpably insane. You can immediately refute it from your own experience. When my partner Barbara first moved from Ohio to California she noticed the beautiful dark green foliage and bright blue blooms of ceanothus by the side of Route 17, and asked me what it was. I am not good on plant names. "That, Barbara," I told her very solidly and authoritatively, "is California bluebush." The invented pseudo-name did fine until she had had time to visit a few nurseries and find out its real name (often misspelled "cyanothus", as Google, and the URL of the picture link above, will confirm). You don't need real names for new things if you have command of a human language.

Or take the point Mark recently made about the complete lack of dedicated vocabulary (so far) for "the processes, categories or roles involved in academic outsourcing." He seems to be right. But does that mean I cannot describe what goes on when students cheat in my Unix course? How about "student who paid someone to write a piece of code for him so he could pass his programming course" for student academic outsourcers? Or how about using "student academic outsourcers"? Or "snivelling little cheating weasels"?

You get the point. I won't go on about it. But I tell you, this continual harping on the "no word for it in their language" meme strikes me as one of the two most irrational features of everyday attitudes to language (the other, of course, being willingness to believe in rules of English grammar that simply never existed). Sheila Watt-Cloutier is probably a perfectly sensible person most of the time. But ask her about naming things, and suddenly she loses her wits, and so does the journalist who is scribbling down what she says. Oh! A bird for which I don't currently have an indigenous species name! Waaaah! I am struck dumb! Waaaahhh! I am mute! Glubble glubble glubble...

The late philosopher Jerry Katz maintained that natural languages were inherently without expressive limits: that because of their expressive power and the possibility of paraphrasing when the lexicon provided no short way of making reference to a concept, there were no limits at all on what could be said in a natural language: the set of propositions that could conceivably be expressed in some language or other and the set of English sentence meanings were the same set. It seems very likely to me that Katz was right. But this whole do-they-have-a-word-for-it thing seems to be tacitly predicated on the unargued assumption that he was wrong.

And as for bringing it into discussions of global warming: of all the stupid things to be worried about! If the more alarming statements of the global warming problem are right, the problem with the spread of warmer climate northward and the melting of the Arctic and Antarctic ice sheets is not that circumpolar nomads will not have distinct lexical roots for each new species of flora or fauna that makes its way to the shores of the Arctic Ocean. It's that whole nations like Kiribati and Tuvalu will be submerged and entire populations of hundreds of thousands of people will have to become immigrants to Australia which won't want them; it's that Holland and Bangladesh may be completely inundated, and Florida may shrink to a fraction of its present land area, and parts of the Middle East may become uninhabitable... This is big, people! Whole countries disappearing like Atlantis. Why in god's name do ecojournalists have this passion to write instead vapid stories about running short of words?

Still, I wish there was a lexeme in my language that meant "ridiculous prospect of scribbling morons coming up with pathetic trivia about lexical deficits instead of researching stories of genuine scientific interest." I could really use that.

Posted by Geoffrey K. Pullum at 12:18 PM

Google Scholar

After reading Daniel Akst's article on computer text generation in yesterday's NYT ("Computers as Authors? Literary Luddites Unite!"), I decided to use it to try out Google Scholar.

Akst describes his first example of computer-generated text like this:

That pregnant opening paragraph was written by a computer program known as Brutus.1 that was developed by Selmer Bringsjord, a computer scientist at Rensselaer Polytechnic Institute, and David A. Ferrucci, a researcher at I.B.M.

Probing Google Scholar with {Brutus Bringsjord Ferrucci}, the first hit is a .pdf for the 52-page preface of a 1999 book by Bringsjord and Ferrucci, "Artificial Intelligence and Literary Creativity: Inside the Mind of Brutus, a Storytelling Machine". There are 37 other hits, and more than half look pretty good. It's pretty clear that we're talking about five-year-old work with some more recent commentary, but that's the fact of the matter, not any flaw in Google Scholar's search and retrieval.

After the next computer-generated passage, Akst writes

What you just read is the work of StoryBook, "an end-to-end narrative prose generation system that utilizes narrative planning, sentence planning, a discourse history, lexical choice, revision, a full-scale lexicon and the well-known Fuf/Surge surface realizer." Believe it or not, that description was written not by a computer but by the humans who created StoryBook, Charles B. Callaway and James C. Lester, who are computer scientists.

Asking Google Scholar about {StoryBook Callaway Lester} gives four hits, of which the first leads to a .pdf of a 2001 book chapter by Charles Callaway, "A Computational Feature Analysis for Multilingual Character-to-Character Dialogue". The others are also relevant.

So even without actually reading any of the links, we've learned that Akst is not reporting breaking news here. He doesn't pretend to -- his article is billed as "an essay", not "a news flash" -- but the context will likely lead some readers to imagine that there's been some recent breakthrough. On the contrary, the most interesting aspect of this strand of computational linguistics, from my perspective at least, is how old-fashioned it is. For example, Callaway's work (which is more recent than Bringsjord's) is based on FUF, "functional unification grammar", which is High Classic AI. I'll postpone an explanation of the issues for another time, but let's say that a plausible analogy would be unification:computational linguistics::I.M. Pei:architecture. Or perhaps unification:computational linguistics::John Havlicek:basketball.

Anyhow, Google Scholar did just fine so far. How does it compare with regular Google? Well, if we probe regular Google with the two search strings that we tried, {Brutus Bringsjord Ferrucci} and {StoryBook Callaway Lester}, it turns up pretty much the same stuff, and more besides, without too much irrelevant junk. But these are pretty well specified search strings. However, "Brutus" and "Storybook" generate a big pile of irrelevant stuff on the top of the returns from both searches, while {Brutus Bringsjord} and {StoryBook Callaway} are specific enough to get useful information at the top in both cases. So in this test, we're not seeing any evidence of a real benefit due to limitation of the search to the scholarly literature.

In principle, Google Scholar ought to offer not just more focused search, but also links to some material not normally indexed, because it spiders some journals and other sources not generally accessible. That didn't turn out to matter in this case, but sometimes it ought to make a big difference. Just as important, Google Scholar sometimes offers "cited by N" links that let you see how many other indexed sources cited a given document. Even better, you can click the link to see the list, and iterate the process to explore the citation space. I didn't report on those links in this case, though a couple of minutes of poking around turned up some interesting things.

Anyhow, I'm adding Google Scholar to my Firefox toolbar, and will continue to try it out.


Posted by Mark Liberman at 10:23 AM

Academic outsourcing

Two email messages from faculty in computer science, recently passing through my inbox:

A: I wonder if I am hopelessly naive about these things, but has anyone else had any experience with ?
Apparently, this is a web site where students can hire other students to complete their programming assignments. It seems that my latest homework assignment in [Course X] was posted to the site. I'm not quite sure how to proceed.
B: A colleague of mine at another university discovered one of her assignments at and simply bid very low, won the auction, and was contacted by the "winning" student...

B's solution is clever, but it won't always work -- student buyers can be anonymous, can disguise the solution before submitting it, might choose bids on the basis of reputation as well as price, etc.

However, I suppose that students who are not smart or industrious enough to do their own work may also not be smart or industrious enough to avoid being caught in simple traps.

I should point out that bills itself as a service to honest businesses rather than to dishonest students: "an international marketplace where people who need custom software developed can find coders in a safe and business-friendly environment. Buyers can cherry pick from a pool of 90,408 coders...enabling them to hire a coder across the country or across the globe…from the comfort of their computers." I don't have any idea what proportion of their transactions involve outsourcing course assignments. However, I observe that one of their work categories is "homework helper", so they are doing business in this space without any pretense.

Similar markets exist for paper writing and other tasks, not to speak of the large market in pre-written papers. For someone paying full tuition at a private college or university in the U.S., each course costs about $3,000. If you think of this as a sort of social licensing fee, as opposed to an opportunity to learn, then it may look like a bargain to pay a few dollars more -- or even a few hundred dollars more -- to get someone to do an assignment for you, in order to get a better grade.

I don't know how reliable the numbers are, but I've read about several studies that say that most college students admit to one or another kind of academic cheating. Hiring proxies -- outsourcing college work -- strikes me as potentially the most troublesome kind of cheating. It's hard to detect by the usual methods, and it leads to wealthy students who are lazy or slow hiring smarter, poorer students to do their school work for them. This is the way of the world, I suppose. But if a generation of American problem sets and term papers are actually done by contractors in India or Romania...

Of course, there's nothing new about this mode of cheating: students were paying others to do their assignments when I was an undergraduate, back in neolithic times. Even earlier, in 1951, Ted Kennedy was kicked out of Harvard (for two years) for getting a friend to take a Spanish exam for him. I'm not convinced that the frequency of proxy work is much greater now than it was in Ted Kennedy's college days, though I admit that I don't have any evidence. The worldwide digitally-networked marketplace, with provisions for essentially anonymous transactions, certainly increases the opportunity. On the other hand, today's more egalitarian admissions policies may decrease the demand. In my own recent experiences with college students, outside the classroom as well as in it, they seem to be smart, earnest, hard-working and honest. But the best evidence is lexicographic: as far as I know, there is no new slang or jargon for the processes, categories or roles involved in academic outsourcing. (If you know any, let me know)

All the same, there are likely to be some consequences as the market in academic proxies grows, or at least becomes more blatant. Instructors will be forced to put more emphasis on in-class written tests, or on oral exams where students explain and defend their out-of-class work. Cooperative small-group projects may make it harder for individual students to outsource their research, writing and problem-solving without being detected. And some faculty or administration groups may set up systematic sting operations to try to poison the well at places like, along the lines suggested by B. There's a job category for you: special investigator for AHA, the Academic Honesty Agency.


Posted by Mark Liberman at 08:35 AM

November 22, 2004

Hey Judge, these instructions are, like, too long or, like, in Latin or something

Legal language, what a coincidence. Listen to this short piece on NPR's Morning Edition this morning, reporting that "judges and legal experts" in California "are revamping the instructions given to jurors for the state's criminal cases. Studies have shown that jurors are often confused by the very language that is supposed to help them determine a defendant's guilt or innocence." Carol Corrigan is an Associate Appeals Justice in San Francisco "who heads the task force rewriting the instructions for jurors", and she says:

All the studies seem to show that people's attention spans certainly aren't getting any longer. If you look at the way information is disseminated to people, everything has to be the length of a bumper sticker.

Judge Corrigan "says there is a reason why the average person cannot decipher legalese terms rooted in Latin and Norman French. 'Because it's written by lawyers for other lawyers, primarily.'"

For those keeping score at home, here is a summary of what has been asserted so far. Jury instructions are confusing because:

  1. they require longer attention spans than the average person has; if it's not bumper-sticker length, forget it.
  2. they require some knowledge of Latin and Norman French vocabulary that the average person doesn't have.

We could quibble over the veracity of these assertions, but here's the punchline: the one example of jury instruction simplification cited in the piece.

So, old phrases like "mitigating factors" would be changed to: "factors that make the crime less worthy of punishment".

In terms of the short attention span problem, I think the average person would be lost at around "less worthy". In terms of the vocabulary problem, this proposed rephrasing is flanked on both sides by Latin and French words: "factors" [ad. Fr. facteur, ad. L. factor, agent-n. f. facĕre to do, make. Some of the obs. senses are immediately from L.] and "punishment" [a. AF. punisement (13th c. in Britton) = OF. punissement, f. punir to PUNISH: see -MENT.]. (Etymologies courtesy of the OED Online.)

Don't get me wrong -- I do think that more average people will understand the new phrase than the old phrase. But at the very least a better example could have been chosen here, don't you think?

Responding to concerns about the possible legal consequences of these changes to jury instructions, Judge Corrigan is equally off the mark:

We aren't going to bring it down to the manner of speaking of Beavis and Butt-head, [laughs] you know? And the law is not a See Spot Run kind of enterprise, so it's not going to be reduced to that level.

Laurie Rozakis, an English professor and author of English Grammar for the Utterly Confused, can apparently be counted on "to help strike a balance between basic English and legalese":

I'm a very big proponent of clear, direct, simple prose. [...] Make it communicative; make it communicate quickly and easily -- especially when someone's life is at stake.

Movements to "simplify" legalese are popping up all over the place, and have already made inroads in some states (according to this NPR piece). Is there a linguist involved in any of these movements? I sure hope so.


Posted by Eric Bakovic at 12:38 PM

No lawmaker left behind

In the spirit of the NCLB testing program, I'll suggest a question for assessing the reading abilities of congressional candidates.

Read the following sentence carefully:

"Hereinafter, notwithstanding any other provision of law governing the disclosure of income tax returns or return information, upon written request of the Chairman of the House or Senate Committee on Appropriations, the Commissioner of the Internal Revenue Service shall allow agents designated by such Chairman access to Internal Revenue Service facilities and any tax returns or return information contained therein."

True or false: if enacted into law, this sentence would not allow any inspections of tax returns.

This is not a trivial question, apparently. Over the weekend, the quoted sentence was inserted into the big end-of-session spending bill, as the result of a request from Rep. Ernest Istook (R-OK).

Members of the Senate expressed various degrees of annoyance and embarrassment when this was noticed, and unanimously passed a resolution repudiating the provision. According to the AP, Rep. Istook has said that the provision has been misinterpreted, and is not his fault anyhow:

Istook, chairman of the House Appropriations transportation subcommittee, said in a statement Sunday that the Internal Revenue Service drafted the language, which would not have allowed any inspections of tax returns. ``Nobody's privacy was ever jeopardized,'' the statement said.

I haven't been able to find a copy of Rep. Istook's actual statement, to see if he really asserted that the quoted provision "would not have allowed any inspections of tax returns"; nor have I seen any evaluation by a spokesperson for the IRS about their role in drafting the provision, or their interpretation of its consequences.

So the possible relationships between reality and reporting are complicated here. However, after several minutes of pondering, I can't come up with a possible world, plausibly related to these reports, in which Rep. Istook is not stupid or dishonest or both. I await clarification, since I'd rather think that a congressman would be in favor of congressional access to tax returns than that a congressman can't understand the plain meaning of a provision that he pushes into a bill, or would be willing to tell a deliberate and obvious lie about what (he thought) the provision means.

Nevertheless, I take the whole story to be good news about the health of the republic. At least some Senate staffers actually read the text of bills before they become final, and are capable of understanding what the text means; and everyone in Washington agrees (at least in public) that it's a bad idea for politicians to go snooping into people's tax returns.

Whether (some) congresscritters need remedial English remains to be seen.

[ Update: Rep. Istook's website suggests that he (or his staff) has a larger problem with linguistic detail. To start with, he has a minority view about how representative is spelled:

From the bottom of my heart, I am humbled and honored to be re-elected to represent you in the House of Represenatives [emphasis added]

This might have been a simple typo -- the omission of a "t" -- but it might also reflect the influence of pronunciation. Most Americans, including me, flap and voice the "nt" sequence in this word, so that the pronunciation becomes something like [ˌrɛ.prɪˈzɛ.ɾ̃ə.ɾɪv], which is exactly how "represenative" would probably be pronounced, if it were a word. There may also be some submorphological resonance between represenative and senator lurking here.

Google currently finds 23,000 other pages with the spelling "represenatives", compared to 32,000,000 with the spelling "representatives". So this mistake happens, but the frequency is still less than one in a thousand.
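The "less than one in a thousand" figure can be checked with a line of arithmetic. Here is a minimal sketch in Python that uses only the two Google counts quoted above; the variable names are illustrative, and web hit counts are rough estimates at best, so the result is a ballpark rather than a measurement.

```python
# Google hit counts as quoted in the text (approximate by nature).
misspelled_hits = 23_000       # pages with "represenatives"
standard_hits = 32_000_000     # pages with "representatives"

# Share of all occurrences that use the misspelling.
error_rate = misspelled_hits / (misspelled_hits + standard_hits)

# Express the rate per thousand occurrences to compare with "one in a thousand".
per_thousand = error_rate * 1000

print(f"misspellings per thousand: {per_thousand:.2f}")
print("below one in a thousand:", error_rate < 1 / 1000)
```

On these counts the misspelling turns up roughly 0.7 times per thousand occurrences, which is consistent with the estimate in the text.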

There's certainly nothing either shameful or ignorant about the pronunciation that may underlie this spelling error -- it's the result of regular sound laws that are nearly universal in today's American English, and I speak the same way myself. To make the point that there is no regional or political prejudice here, let's note in passing that this flapping and voicing of post-stress coronal consonants is less culpable than Senator Kerry's pronunciation of paraplegic, which is arguably a mistake, though a common and natural one. And as regular readers know, I'm a sloppy typist and a bad proofreader, so I'm in no position to carp about typographical errors.

But however erratic and irrational English spelling might be, it's a matter of strict social convention rather than individual choice. And typographical errors should be caught and corrected, especially in prominent places. So for a U.S. representative to misspell the word representative in the first sentence of his home page is, let's say, not a sign of mindful communication.

The rest of the home page message is not much better:

Thank you to all of the countless hours that were spent for me and other candidates all across Oklahoma.  I am going to keep this site current and up to date and stay tuned for some new features.  Thank you again for your vote and support in this last election!

I suspect that Rep. Istook meant "thank you [to my supporters] for all of the countless hours...", not "thank you to all of the countless hours". I don't think there's any variety of English in which "thank you to X" means "thank you for X".

In the next sentence, "current and up to date" is redundant and says the same thing twice. Worse, it produces a confusing sentence with adjacent conjunctions on different levels, which may be why the writer didn't notice the questionable non-parallelism of "I am going to keep this site current" and "stay tuned for some new features".

All in all, this is not the home page of someone who chooses and arranges his words carefully.

To get back to the controversial spending-bill provision, let's compare it side-by-side to Istook's statement on his home page. On balance, I'm inclined to believe that the texts in the two panels below were not written by the same person. The right-hand text is in the first person and on Istook's home page, and if it was written by a professional speechwriter, Istook was badly cheated. So I conclude that someone other than Istook wrote the left-hand text, just as he claimed.

Left-hand text: Hereinafter, notwithstanding any other provision of law governing the disclosure of income tax returns or return information, upon written request of the Chairman of the House or Senate Committee on Appropriations, the Commissioner of the Internal Revenue Service shall allow agents designated by such Chairman access to Internal Revenue Service facilities and any tax returns or return information contained therein.

Right-hand text: From the bottom of my heart, I am humbled and honored to be re-elected to represent you in the House of Represenatives. Thank you to all of the countless hours that were spent for me and other candidates all across Oklahoma. I am going to keep this site current and up to date and stay tuned for some new features.

Questions still open: who wrote the left-hand sentence? why? why did Istook ask for it to be inserted in the spending bill? Did Istook understand what the passage said?

Here's a wild guess: Istook wanted legislators to have better access to income tax information, in order to be able to predict the fiscal effects of changes in tax law. (He cares about this stuff because he's the author of a balanced budget amendment). He asked someone to draft a provision that would eliminate certain roadblocks that privacy considerations now impose. They did so, carelessly and without thinking about the broader consequences, and Istook had his staff push it into the spending bill without reading it, or at least without understanding it.


[Update #2: Philip Brooks points out that the mysterious author(s) of the provision probably meant hereafter rather than hereinafter. He also provides a "plain English" translation:

From now on, no matter what the rest of the law says, the Chairmen of the House and Senate Appropriations Committees and their staff can get permission to go into IRS offices and look at anyone's tax returns.

I'd add "or any other return-related information".

Meanwhile, Senator Stevens showed reporters a hand-written form of the proposal, allegedly from an IRS employee, in an attempt to demonstrate "neither he nor any other Republican had crafted the potentially privacy-invading language".

Despite this handwritten evidence, the Washington Post reported on Wednesday 11/24 that "doubts remained yesterday over exactly how the controversial tax-return provision -- which allows Appropriations Committee chairmen or their "agents" access to Internal Revenue Service facilities or "any tax returns or return information contained therein" -- got into the omnibus spending bill late last week. House Republicans blamed committee staff aides and the IRS". As Joshua Marshall pointed out, it's hardly credible that after four days, the causal chain involved is still so unclear that the paper can only write about unidentified staffers for unidentified representatives or senators dealing with unidentified IRS employees. This is the best investigative reporting that the Washington press corps can come up with? ]


Posted by Mark Liberman at 10:43 AM

November 21, 2004

I said a hip hop = asereje ja

I'm not sure whether this means that I'm completely out of it, or just Euro out of it, but I'll confess that I hadn't heard the Ketchup Song before reading about it in this post by Ray Girvan at the Apothecary's Drawer Weblog.

This is an impressive example of cross-linguistic eggcornage -- I especially like the way in which the flapped /d/ in "I said a hip" turns into the tapped /r/ in "aserejé" -- but my favorite in the Spanish-English category is still bebop-aroo.

[Update: David Landfair emailed to point out that "I said a hip hop" → "aserejé ja" assimilates the English phrase to the inventory of Spanish sounds and Spanish syllable structure (what linguists call phonotactics), but results in spelled-out gibberish rather than real Spanish words; and so it shouldn't really be called a "cross-linguistic eggcorn". Fair enough -- though not all the lyrics to Rapper's Delight were in the dictionary either, at least in 1979. ]

Posted by Mark Liberman at 10:30 AM

November 20, 2004

English in Kurdistan

There's a letter in Friday's Financial Times from Brendan O'Leary and Khaled Salih, Constitutional Advisers to the Kurdistan National Assembly, arguing as follows:

That many Kurds have chosen English (or "American") as their primary second language is evidence of Kurdistan's progress, and should be welcomed. English is the lingua franca of advanced scientific and medical journals, and of international governmental and business organisations; and it is the emergent public language of the European Union that Kurdistan's neighbour, Turkey, may soon join.

It is equally in the interests of Arab Iraq to have English as its second language, not least to bridge the three deficits in the Arab-speaking world identified by the Arab Human Development Report of 2002; namely, the democracy deficit, the female equality deficit and the knowledge deficit. Kurdistan's comparative success in these three domains owes much to the prevalence of European second languages among its diaspora and residents. English, as a post-colonial and a world language, is the appropriate impartial link medium for a pluri-national, federal and democratic Iraq in which both Arabic and Kurdish will be official languages.

They are responding to an Op-Ed piece by Damjan De Krenjevic-Miskovic and Nikolas Gvosdev of the Nixon Center, published on 11/16 (accessible only with a subscription).

I recently heard a Kurdish official describing a meeting with an (Arab) representative of the current Iraqi governing council. According to the Kurd, he began the discussion in English, and the Arab objected: "Can't you speak Arabic?" The Kurd's response: "No, I can't." In fact, his command of Arabic, both literary and colloquial, is excellent -- he was making a true statement about his perceived moral and political obligations, rather than a false one about his linguistic abilities.

Recent history helps to explain this attitude.


Posted by Mark Liberman at 09:38 AM

A man stopped on I-74, someone rear-ended him, a semi saw the accident, and then...

A few days ago, Phil Dennison complained about some cases where "journalists go out of their way to avoid statements of fact". The example that got his attention was a WaPo story about teen drinking and traffic deaths in Montgomery County, which included the picture caption "Sarkis George Nazarian Jr., 16, was killed when the SUV he was driving hit a tree."

Phil objects that "the SUV didn't do anything independently of the person driving it", and argues that the caption should read "Sarkis George Nazarian Jr., 16, died [not 'was killed'] when he drove his SUV into a tree."

Phil cites some other examples of a similar sort from the same paper, and asks

Is this common in other news outlets when reporting on vehicle crashes? Why? Even if they feel the need to include an “apparently” or an “allegedly,” why not attribute the action to the driver rather than the vehicle?

There's a simple answer to the narrow version of this question: for a newspaper to say that "so-and-so drove his vehicle into a tree" would suggest a degree of intent that is lacking when a drunk teenager loses control of a vehicle on a slippery back road at night.

But Dennison is asking a larger and more interesting question about reference to people and their vehicles, as well as about attributions of agency in the case of "accidents". We often talk about people when we mean cars ("I'm parked around the corner") and cars when we mean people ("the BMW swerved to avoid the pothole"). And this is especially true in the case of traffic accidents, where the action is usually described in terms of the vehicles involved, and the people are mentioned mainly as victims acted on by the vehicles and other relevant objects.

There seem to be several reasons. One is that it is usually unclear what the causal sequence really was, and what role human intentions played in it. Consider this report from today's news in Australia:

The man received serious head and facial injuries when the monkey bike he was riding and a taxi collided at the corner of Chesterville and Wickham roads, Moorabbin, about 11.35pm (AEDT) yesterday.

There was a collision between a taxi and a miniature replica motorcycle (known as a "monkey bike"); the motorcyclist was seriously hurt. Did the cyclist steer his bike into the taxi on purpose? Probably not. Did he do something irresponsible that contributed to the accident? It's not clear. Was the taxi driver speeding? The article doesn't raise the issue. The available fact is just that a collision happened. If the police and the courts and the people involved are not saying anything about human agency in a routine traffic accident, there's not a lot of scope for a reporter to speculate, and an impersonal description of what happened to the vehicles may be the only reasonable option.

A second reason for attributing actions to vehicles rather than people, even when talking about perceptions and intentions, is that the vehicles may be more salient in our mental picture of the event than the people are. Consider this quoted description of a recent multi-vehicle accident in Indiana:

Hendricks County Sheriff Jim Quearry explains what happened, "A car was stopped on the westbound lane of I-74 just west of Brownsburg. Another car rear-ended that car. It sort of caused a chain reaction. A semi tractor trailer saw the accident and was slowing to avoid it and he was rear-ended by another semi trailer. The second semi trailer hit an SUV prior to hitting the first semi trailer. That SUV was knocked off the interstate and down the embankment and the driver of that vehicle was killed. The driver of the second semi was pinned in his vehicle. He was cut out of that by various fire departments."

Some of these events seem to have been more-or-less ballistic interactions among very large objects moving fast ("that SUV was knocked off the interstate and down the embankment") rather than anything controlled by humans. But even when human perception and action are at issue, Sheriff Quearry talks about the vehicles rather than the people ("a semi tractor trailer saw the accident and was slowing to avoid it"). This may be because at the time of the quote, Sheriff Quearry apparently didn't yet know the names of the people involved, but I suspect that even if he had known who they were, his description would have focused on the vehicles.

A third reason for metonymic confusions between people and their vehicles seems to be a preference for descriptive (and perhaps conceptual) simplicity. If person X is behind the wheel of vehicle Y, and vehicle Y comes to a halt, it's easier to say simply that X stopped or that Y did, rather than to say that X brought Y to a stop, even if that's what happened. And of course if the causal chain is murky (the car ran out of gas, or the driver fell asleep), then there's another reason to keep it simple.

Consider the descriptions in another story on the same Indiana accident. In the lede, it's the driver who stopped:

A 35-year-old Indianapolis man was arrested on drunken-driving charges today after authorities say he came to a virtual halt on I-74 in Hendricks County, causing a chain-reaction crashed that killed another motorist.

But a few paragraphs later, it's the car that stopped:

Sheriff's Capt. Brett Clark said the wreck apparently was caused by a car that stopped in the middle of the highway and was struck by another vehicle.

This contrast seems to involve both uncertain agency and relative salience or focus. The drunk driver may have passed out or fallen asleep. In this case, it seems appropriate for the lede to say that he "came to a virtual halt", since it's not so clear that he "brought his car to a virtual halt" or that he "almost stopped his car" or whatever, since such phrases suggest a degree of control or intentionality that may have been lacking. I guess the reporter could have written "after his car came to a virtual halt on I-74", but the focus is on the driver at that point.

When we get to the quote from Capt. Clark in the 6th graf, the focus is on the dynamics of the accident, and so the description shifts to the vehicles. It then shifts to the people again as the deaths and injuries come into the picture:

Sheriff's Capt. Brett Clark said the wreck apparently was caused by a car that stopped in the middle of the highway and was struck by another vehicle. Behind that collision, a semi slowed and was hit from the rear by Rhoades' sport utility vehicle, which was then struck by a second semi.

Rhoades was killed in that secondary collision when the vehicle left the road and struck a post, Clark said. [emphasis added]

The other most serious injury was to Robert N. Fox, 47, of Sumner, Iowa. He was driving one of the semi-tractor trailers. He suffered a broken leg, cuts and lacerations over the majority of his body.

The sentence in bold is quite similar to the picture caption that bothered Phil Dennison ("Sarkis George Nazarian Jr., 16, was killed when the SUV he was driving hit a tree"). I surmise that Phil was annoyed by the failure of the caption writer to assign responsibility, and it's easy to sympathize with his reaction.

In the case of the Indiana accident, the unfortunate SUV driver seems to have had no culpable role whatever in the causal chain that led to her death. She was just in the wrong place at the wrong time, driving along I-74 when a chain-reaction accident caused two large trucks to hit her vehicle and force it off the road into a post. So her vehicle "struck a post", she didn't drive it into a post; and she "was killed", though she also "died", right?

But in the case of the Montgomery County teen DUI death, even though it was a single-vehicle accident, the driver's immediate agency is not obvious. The WaPo story makes it clear that this wasn't vehicular suicide -- Nazarian didn't intend to steer his SUV into a tree. He was driving at 1:30 a.m. on "a rain-slicked two-lane road", after leaving a party where the police found "the remnants of 12-packs and 30-packs", and "the 1997 Jeep Grand Cherokee he was driving slid off Travilah Road and hit a tree." So he was apparently drunk, perhaps driving too fast, and lost control of the vehicle on a dark, slippery and winding road.

It's fair to feel that Nazarian bears more responsibility for what happened than Rhoades did. But the immediate sequences leading up to the two fatal impacts were similar: both drivers were behind the wheel of an SUV sliding out of control off a road and into an obstruction.


Posted by Mark Liberman at 09:17 AM

November 19, 2004

Canadian translation quiz prize awarded

Nassira Nicola and Mike Gillis are joint winners of the Canadian translation quiz prize. The mystery dialect, by the way, was Ontarianese. You'll recall that the Canadian text said "Cripes! Grade thirteen! Here's a loonie -- buy yourself a Coffee Crisp, eh?" The approved translation is as follows.

"Man, the SAT's! Here's eighty-four cents — buy yourself a Snickers, OK?"

Mike and Nassira between them add various helpful cultural notes (I paraphrase and add at certain points). Grade thirteen, in the province of Ontario, was the year after your senior year (grade twelve) in high school. It was phased out last year: school year 2002-2003 was the last in which there was a grade thirteen. It was also called 'OACs', as your courses in grade thirteen resulted in Ontario Achievement Courses. The courses were called 'OAC English', 'OAC Calculus', etc. The similarity to SAT's is that they're for getting you placed in a good university. (Bob Kennedy suggests translating "grade thirteen" as "fifth year of high school", and that might be a good idea. Certain schools in Britain, including the one I went to, had something known as the "third-year sixth", which was equivalent: an extra year to prepare for a serious attempt at getting into a good university with a scholarship.)

A Coffee Crisp is a sort of chocolate bar, coffee-flavoured and consisting of (crispy) wafers with a chocolate coating. Mike quite reasonably takes the Snickers bar to fill a similar ecological niche in American culture.

Cripes is just a general exclamation, with no more vulgarity than "Gee!" or "Golly!" (Mike says he had no idea it was Canadian. It originates, of course, as a corruption of Christ; so does crikey, which is heard a lot in Australia.)

Loonies are Canadian dollar coins, which feature a picture of a loon floating in water on the front. A loon is a species of bird of the family Gaviidae.

Mike adds that everyone should move to Canada. Well, maybe. I think it should be a matter for personal decisions. I myself have always been a little bit intimidated by some of the cruel jokes about the place, two of them recorded here. But you shouldn't let those remarks dissuade you. Head up there. Take a look around. Spend a loonie on a Coffee Crisp, eh?

Posted by Geoffrey K. Pullum at 02:11 PM

Inadvertent eponymy

Peter Fisk has pointed out that Arnold Zwicky's recent essay on Robert Hartwell Fiske's reaction to trepidatious is "a fisking of a Fiske".

I can't believe I missed this accidental (?) pun myself. I nominated fisk for ADS 2003 Word of the Year, a suggestion that got no traction at all at the meeting. In the end, the palm went to metrosexual. (These two words are still reasonably competitive in cyberspace, with 121,000 web hits on Google for fisking, and 183,000 for metrosexual. My money is still on fisking in the long haul, but we'll see.)

Meanwhile, Helpful Google still asks

Did you mean: fishing  


Posted by Mark Liberman at 07:29 AM

November 18, 2004

Translation quiz: Canadian

A language translation quiz for polyglot readers: in the regular Back Page feature in the latest New Yorker (11/22/04, p. 108 of the print edition) Bruce McCall has a very funny spoof application form for disillusioned Democrats to apply for permanent Canadian residence. Question 8 says:

Translate the following statement: "Cripes, grade thirteen! Here's a loonie — buy a Coffee Crisp, eh?"

A note on the form adds:

This is really just practice for when or if you come here to live; best to get used to the idea that even if Canadian language isn't really distinctive, like, say, Hungarian or Finnish, we do sometimes use funny-sounding words and phrases, so "forewarned is forearmed"!

Personally I can't make heads or tails of it. Give me Finnish any time. Translate yliopistokirjakauppa. Piece of cake. The answer is here. But the answer for the Canadian quiz above will not be given until tomorrow, when Bill Poser, long-time Canadian resident, may be able to deal with it. Or possibly not.

Posted by Geoffrey K. Pullum at 10:43 PM

Uplifting link for the day

I'm generally a happy and optimistic person anyway, but it helps a lot to come across something like this from time to time. Not to speak of this.

Posted by Mark Liberman at 04:47 PM

Sorry survey

In order to commemorate the 70th anniversary of the game Sorry!, Parker Brothers has proclaimed November 15-21 as National Sorry Week. They also hired the Opinion Research Corporation to do a survey, based on 1,020 telephone interviews, to see what Americans are sorry about and how often they say so.

This press release gives some details of the results. The featured point seems to be gender roles: men are 37% more likely to have apologized recently to their significant other than women are (56% vs. 41%). This appears to be based on self-reporting -- no figures are quoted for how often each sex reports themselves to have been apologized to. Neither are any figures given about how often respondents believe themselves to have been involved in behavior requiring an apology, in one direction or the other.
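The "37% more likely" figure is a relative difference, not the gap in percentage points, and the two are easy to confuse. A minimal sketch in Python, using only the two percentages quoted from the press release (the variable names are illustrative, and this assumes the release computed the figure the usual way):

```python
# Self-reported rates of recent apology to a significant other, from the press release.
men_pct = 56
women_pct = 41

# Absolute gap, in percentage points.
point_gap = men_pct - women_pct

# Relative difference: how much larger the men's rate is, as a fraction of the women's rate.
relative_increase = (men_pct - women_pct) / women_pct

print(f"gap: {point_gap} percentage points")
print(f"relative increase: {relative_increase:.0%}")
```

So the gap is 15 percentage points, and dividing that by the women's rate of 41% gives the roughly 37% relative figure that the release features.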

Apparently participants were asked about whether they had apologized for any of a list of specific acts. The examples given seem to be heavily weighted towards stereotypically male faults, and correspondingly got a higher percentage of male 'yes' responses:

  • Leaving a mess (44% compared to 37%, respectively)
  • Forgetting to take out the trash (29% compared to 19%, respectively)
  • Leaving dirty socks on the floor (30% compared to 16%, respectively)
  • Missing dinner (26% compared to 19%, respectively)
  • Not replacing the toilet paper roll (24% compared to 18%, respectively)
  • Drinking the last of the orange juice (23% compared to 14%, respectively)
  • Drinking from the milk carton and putting it back (22% compared to 10%, respectively)

I don't know whether there was a corresponding list of stereotypical female faults. My guess is not, but I haven't been able to find a complete list of the survey questions and the responses.

I also don't know how carefully respondents were questioned about what really constitutes an apology, an issue that Geoff Pullum has discussed several times.

At least a few journalists have taken the bait, called psychologists and other alleged experts, and banged out a story on this, and there are several polls in play asking people what they want apologies for (in Philadelphia, so far, it's the Schuylkill Expressway).

[Update: I wrote to the PR firm that put out the press release about the sorry survey, to ask for a copy of the survey questions and the answer distributions, but I haven't gotten any response so far. Meanwhile, more discussion of internet sorriness can be found here. ]


Posted by Mark Liberman at 11:34 AM

November 17, 2004

Not a word!

Language sticklers -- people who propose to regulate a language by stipulating which usages are acceptable and which are not -- tend to be hostile to innovations (as well as to informal style and non-standard vernaculars), which they view as a threat to the language and respond to with some combination of dyspepsia, disdain, ridicule, and invective. Toward the playful end of the stickler scale we have David Foster Wallace, recently discussed here in Language Log. Toward the raving end, with over-the-top contempt couched in icy high style, we have Robert Hartwell Fiske, editor of The Vocabula Review.

Fiske is so much on top of the English language that he can baldly assert to us that some things are words in the language and others aren't. It must be wonderful to have such sure knowledge. A case in point...

Part 1. The main story. I begin with a posting I made to the American Dialect Society mailing list on November 2, somewhat revised here.

While I was putting Robert Hartwell Fiske's The Dictionary of Disagreeable English: A Curmudgeon's Compendium of Excruciatingly Correct Grammar (2005 -- yes, 2005, this book is really on the cutting edge of the time line) onto the shelf, it fell open to a page with an entry for TREPIDACIOUS, which caught my eye because I am an occasional (and proud) user of the word TREPIDATIOUS 'tremblingly reluctant' and took TREPIDACIOUS to be a misspelling of this word, which should have a T because TREPIDATION does. (A quick web Google search showed ca. 2,150 hits for TREPIDATIOUS, to 658 for TREPIDACIOUS, and Google asked about the latter if I meant the former. The site notes the latter spelling and suggests that the word should be spelled with a T "if at all" -- on which, see below.) In any case, from here on I'm referring to the item in question as trepidatious; spelling isn't the issue.

Fiske's entry declares sternly that trepidatious is "solecistic for fearful (and similar words)"; he offers uneasy and anxious as well as fearful. A bit of thesaurisizing for the noun trepidation provided the following alternatives to trepidatious: agitated, alarmed, anxious, apprehensive, dismayed, fearful, frightened, hesitant, reluctant, timid, uneasy. But none of these expresses the shade of meaning I want when I use trepidatious; I want the sense of trembling reluctance that trepidation conveys. Trepidatious is simply a more vivid adjective than all the alternatives (though apprehensive comes closest to the effect I want), certainly a better choice than the three blander options that Fiske provides. On the general principle that you should use the best word for your purposes, I choose trepidatious.

Ah, but Fiske doesn't allow me this choice. He asserts, with utter self-assurance and no qualification:

Trepidacious is not a word.

adding that "Trepidation, meaning fear or apprehension, is a word, as is trepid (the antonym of the more familiar intrepid), meaning timid or fearful." (Yeah, like I'm going to use trepid. Even Fiske doesn't go so far as to advise that I use trepid instead of trepidatious.)

I've been hearing this "not a word" bullshit since I was a kid, usually applied to non-standard ain't and taboo fuck (neither of which Fiske bothers to inveigh against, undoubtedly because they're so far beyond the pale). It mystified me then, and it angers me now. It's (literally) superhyperbolic, two steps of exaggeration beyond reality, and it's insulting.

First, the reality (and the insult): The admonition that people of taste and refinement should not use X. This is an expression of the admonisher's judgment about linguistic usages, couched as an injunction. It's insulting because the admonisher takes himself to be the arbiter of other people's behavior and brooks no objection that people of taste and refinement do in fact use X. The admonisher knows what's right; it's not a matter for discussion. Well, I'm a person of some taste and refinement (in the appropriate circumstances), and I use trepidatious. Stop telling me I'm a clumsy ignoramus.

A side issue here. I assume that Fiske objects to trepidatious because it's a recent innovation: "Even though people use it (horrible to hear, ridiculous to read though it is), no major dictionary, remarkably, has yet included trepidacious in its listing." Give them time, Fiske, give them time. The word has a lot going for it, beyond the fact that some careful writers -- like me -- use it. It's an instance of a small but significant pattern in English derivational morphology: words in -atious meaning 'inclined to -ation '. Ostentatious, flirtatious, disputatious, vexatious. Trepidatious is transparent, easily understood. It's a good thing to have. (Trepid, in contrast, is a dead loser.)

But back to superhyperbole. We start with the admonition that people of taste and refinement should not use X. This is then exaggerated, elevated to the admonition that people, in general, should not use X; what should govern the behavior of the "best" of us (those are genuine sneer quotes) in certain circumstances should govern the behavior of all of us, all of the time, in all contexts, for all purposes. (What a remarkable lack of nuance! What a divorcement from the complex textures of social life!)

As if that weren't enough, it ratchets up, hysterically, one more notch, to the bald assertion that X simply isn't available for use; it's just not part of the social repertoire. My dear, it just isn't done.

But if it truly isn't done, then there's no need for the admonitions.

Don't tell me there's "no such word". Parade your idiosyncratic prejudices, if you wish, and if your mind is open enough we might be able to talk about the bases of your prejudices (and mine). But don't lie to me about the state of the language.

Part 2. The coda. It turns out, contrary to widespread belief in certain circles, that Fiske is not entirely a write-only subscriber to ADS-L. On November 15, Fiske posted a brief message entitled "Arnold Zwicky et al. aside ...", which suggests that he had noticed my posting. There are two parts to this message: first, an excerpt from a review of DDE:

However curmudgeonly, Mr. Fiske betrays a bluff humanitarian spirit. ... [Fiske] wants to save [the English language]. And he knows that he can count on little help. Dictionaries "have virtually no standards, offer scant guidance, and advance only misunderstanding." His own flogging of Merriam-Webster's is one of the many pleasures of this lovely, sour, virtuous book. -- Erich Eichman in Wall Street Journal (Nov. 12)

And, second, a blurb, presumably in his own voice:

The Dictionary of Disagreeable English -- it's an annoying, amusing book.

Cute. He cops to "sour" and "annoying", but it's all in the service of the very salvation of the English language. Who could argue with that? And it's humanitarian labor to boot; I mean, the Wall Street Journal says so! The man must be not only a savior, but a saint, working so hard for human welfare and social reform.

Me, I'd be more than a bit trepidatious about having him at the helm of the Ministry of Language.

zwicky at-sign csli dot stanford dot edu

Posted by Arnold Zwicky at 07:46 PM

Birlashdirilmish yangi Turk alifbesi banned again?

The Latin alphabet is unconstitutional in the Russian Federation. Well, not exactly -- the cited basis for the decision by the Russian Constitutional Court was that "regional authorities have no jurisdiction over the alphabets of ethnic groups and peoples", but the effect is to uphold a "2002 federal law requiring that Cyrillic be applied to all alphabets used by people living in Russia", overruling an attempt by Tatarstan to use a Latin alphabet.

This is the newest chapter in a long story. I wonder if the recently banned alphabet is the 1927 version? (More news links here, here, here).

Posted by Mark Liberman at 10:19 AM

Wallraff subverts English syntax

Barbara Wallraff's Word Court feature in the back of The Atlantic magazine is usually sensible and interesting. But the November edition suggests a principle for avoiding ambiguity that would reduce the English language to a pathetic remnant.

It starts with R. H. Fanders, of Council Bluffs, Iowa, who writes to complain about sentences like 'I was better than her' and 'I was wondering if this time my dog did better than me'. Fanders proposes banning such phrases, on the grounds that "Than is a conjunction, never a preposition". Wallraff agrees about "the traditional view of the grammar of such sentences", but sensibly calls it into question:

Consider "She's the one than whom I was better." That is to say, "I was better than she"—so what's "than whom" doing there? "Than whom I was better" is grammatically equivalent to "I was better than whom," which is grammatically equivalent to "I was better than her." If you insist that than is a conjunction, "than whom" would have to be "than who." But I don't think any of today's authorities on language would make that "correction," and very few from the past 200 years would either. Sometimes, even in formal English (than whom sure ain't colloquial), than functions as a preposition.

This is the same argument that Ken Wilson gave in the Columbia Guide to Standard American English, as we discussed, in illustrated form, back in June. As that post also observed, following CGEL, the issue may be better framed as the difference between a "reduced clause" and an "immediate complement" following than, rather than as a distinction between than as preposition and than as conjunction.

But Wallraff continues:

The main reason not to welcome all prepositional uses of than, in my opinion, has to do with sentences like this one: "I like her better than him." That's clear, no? It means I prefer her to him. If we start allowing than to be either a preposition or a conjunction catch-as-catch-can, soon that example will become ambiguous: do I prefer her to him, or do I like her better than he does?

Um, wait a minute. In the first place, the syntactic analysis here doesn't make any sense to me (though I speak under correction, being a mere phonetician). Wallraff is apparently saying that "I like her better than him" must be the reduced-clause ("conjunction") version, i.e. elliptical for "I like her better than I like him", while the immediate-complement ("preposition") version would have to mean "I like her better than he likes her". Why? Consider "I saw her instead of him" -- this means "instead of me seeing him", not "instead of him seeing her", but there's no way that it's elliptical for "I saw her instead of I saw him". Instead of is clearly a prepositional construction, and it seems to have pretty much the same structure as better than in this case, at least as far as relevance to this argument is concerned.

And in the second place, since when does what prescriptive grammar "start[s] allowing" really have much effect on what "become[s] ambiguous"? This is surely a case where Norma Loquendi is in charge, not The Atlantic magazine, as excellent a publication as it surely is.

But I'm much more troubled by the larger implications of Wallraff's argument, which seems to be of the form

"Word X normally occurs in structure B as well as structure A. However, sometimes using word X in structure B creates an ambiguity. Therefore, we should not allow word X to be used in structure B in general, in order to avoid this ambiguity."

To start with, it would be equally logical to forbid the other alternative -- in this case, to say that than should never be used with a reduced clause. But in the general case, accepting this form of argument would lay waste to English syntax. Consider the steps we would have to take in order to avoid the ambiguity of "flying planes can be dangerous", or "fruit flies like a banana".

More charitably, we might interpret Wallraff to be saying only that "we should avoid (genuinely confusing) ambiguity". This is harmless if somewhat banal. But I bet that R. H. Fanders et al. interpret her as confirming the 18th-century idea that than should not be allowed to be used with immediate complements, via an argument of the form sketched above. Luckily, no one is really going to try to implement her ambiguity-prevention proposal systematically.

[Note: in order to read Wallraff's Word Court piece on line, you need to be a subscriber to The Atlantic. But you should be, anyhow, so if you aren't, why not sign up?]


Posted by Mark Liberman at 07:52 AM

A riddle

Question: What do the 22 words food, gourmet, toilet, peace, hunger, family, development, ecotourism, information, tech, freedom, climate, comics, Buddhist, nanotechnology, economy, flu, finance, mediation, women, student and religious have in common?

Answer: If you ask Google about the pattern {"world * summit"}, your first 100 of 267,000 hits, in page rank order, include references to the World Food Summit, the World Gourmet Summit, the World Toilet Summit, the World Peace Summit, the World Hunger Summit, the World Family Summit, the World Development Summit, the World Ecotourism Summit, the World Information Summit, the World Tech Summit, the World Freedom Summit, the World Climate Summit, the World Comics Summit, the World Buddhist Summit, the World Nanotechnology Summit, the World Economy Summit, the World Flu Summit, the World Finance Summit, the World Mediation Summit, the World Women Summit, the World Student Summit, and the World Religious Summit.

One thing about this list: there's nothing about speech and language on it. Where's the world literacy summit, for example? That string gets one feeble hit, which I would call worse than nothing: apparently they held a World Literacy Summit and no one heard about it. Not me, anyhow. There have also apparently been two initiatives calling themselves the "world language summit", one of which gets three hits while the other gets one. I know pets with a better web presence than that. Let's compare the "world toilet summit", which gets 29,100 hits. The string "world * language summit" comes up empty, as do "world speech summit", "world speech * summit", "world ESL summit", "world EFL summit"; and don't even mention the idea of a "world X summit" for X=phonetics, morphology, syntax, semantics or pragmatics.

This is pathetic. I surmise that the whole "world thing summit" business is mainly a junketeering and networking opportunity, but if NGOs and others are going to be spending their (=our) money on these gabfests, why not focus a couple of them on some of the very real issues connected with the activities that most humans spend more time on than anything else -- talking, listening, reading and writing?


Posted by Mark Liberman at 05:55 AM

November 16, 2004

More split references

Eric's recent post about strange reference problems reminds me of another context which seems that it encourages mid-stream pronoun switches: cooking shows. The following example is typical:

When you're mixing in the liquid ingredients, I always try to stir from the bottom to make sure they're well incorporated.

These "when you [X], I [Y]" statements, where "you" and "I" refer to the same person, are extremely common in instructional contexts. I suspect that they are generally a result of politeness: you need to tell people what to do, but you don't want to come off as bossy—so instead, you helpfully share how you personally would do it. This is similar to an indirect command, in which the command is put into an impersonal form, to avoid making direct orders. This is one step more removed, though: here, no desire for future actions is expressed at all, just a seemingly innocent statement about what the speaker usually does. We need a term for this phenomenon, in which the agent is changed to avoid using an imperative. I suggest the "passive aggressive voice."

Not all such examples can be attributed to politeness, however. Sometimes it seems to be a more general interchangeability of "generic you" and "generic I", as seen in the following quote from an interview with Grant Aleksander:

When you're asked to submit something, I always try to find an episode where the writing is good. Because even if you don't do a particularly good job with it, the writing makes you look better. I'd always rather have a well-written episode than anything else...

(Maybe he finds it hard to imagine that he personally would not have done a good job with an episode—but he could easily see it happening to others.)

In the following quote about National Good Manners Day*, Bridget from the USA can't settle on who she's talking about, either:

When you're a nice restaurant, I always try to have better manners than I have at home. Everyone should have manners to some extent!

Examples of this are a bit hard to find on Google, but I hear them all the time. My very favorite example occurred in a commercial for one of those Fox "reality dating" shows (Survivor Island? Fantasy Millionaire?). One of the contestants confessed that

If you kiss Jason, it's a very intimate thing for me.

(meaning when she kisses Jason—not that she gets a kick out of watching him kiss others). In this case, it's not politeness, but modesty that prompts the switch. It would seem too kiss-and-tell to say "When I'm kissing Jason...", but of course there's a limit to how far you can continue in the second person (?? "If you kiss Jason, it's a very intimate thing for you"). I don't know why she didn't just stop after "thing", but there you have it.

* National Good Manners Day was celebrated in the UK on September 5, 2003. It was evidently not repeated. It should be noted, however, that September is now Children's Good Manner month in some parts of the U.S.
Posted by Adam Albright at 10:54 PM

No word for robins

William Hovingh points out to me that John McCain (quoted in The New York Times, though you'll need to register to read it), has added an interesting variation to the "what Eskimos have words for" universe:

Particularly disturbing, he went on, is the rapid pace of [global] warming.

'The Inuit language for 10,000 years never had a word for robin,' he said, 'and now there are robins all over their villages.'

Tragic. How will they get along, with no word for it? They'll be singing (in one of the eight Eskimo languages): "When the red, red [embarrassed silence] comes bob, bob, bobbin' along, along..."

Perhaps the first serious thing to say about this nonsense is that, as Cameron Majidi points out to me, English also lacked a word for the bird in question, which is a variety of thrush (it has the unpleasant-sounding Latin name Turdus migratorius). The North American bird now known as the robin has nothing in common with the very differently colored and sized bird called a robin in Britain. Settlers in North America had no word for the russet-chested thrush-sized bird they were faced with, so they ignored the issue of accurate species classification and just used the word robin for it. Senator McCain's researchers probably gave no thought to the possibility that the Inuit might also call this bird "robin", borrowing the word from English, just as English speakers call a house built of snow blocks an "igloo", borrowing from the Inuit.

People overlook such obvious possibilities because the bafflingly unstoppable human drive to be fascinated by what-words-do-they-have questions (regardless of any lack of data) tends to blind people to the simple fact that loanwords, coinages, and other additions to the language make that issue almost totally uninteresting.

Nor have the staffers who prepared this Eskimological shaft of wisdom for the Senator looked in an Eskimo dictionary. The Comparative Eskimo Dictionary lists two or three terms in several of the Eskimo languages that would cover small birds such as thrushes, with rather indeterminate species denotation (when you hunt in the Arctic you aren't necessarily all that interested in the exact species classification of an uneatable thrush or sparrow weighing about two ounces). Those words would do. And with the astounding word-compounding techniques that Eskimo languages have, they could build a word for "red-breasted small brown bird" in a second. That's the key thing about the Eskimo languages that laypeople don't grasp: they don't need a whole lot of basic unrelated roots for different sorts of thing (snow or robins or anything else), because they can manufacture them on the fly in an instant.

One more nerdy and dyspeptic linguistic note is that McCain's research staffers are considerably overestimating the age of the Eskimo language family, which contains the Yup'ik languages of Siberia and Alaska and the Inuit or Inuktitut languages of Canada and Greenland. The Yup'ik started moving down into southern Alaska, and the Inuit groups started moving into northern Alaska heading eastward, only about a thousand years ago (both groups have seen various multi-year warming and cooling trends during the intervening years). The whole Eskimo (Yup'ik-Inuit) family is only about two thousand years old. In fact it's only been about four thousand years since Proto-Eskimo-Aleut (the ancestor language for the Eskimo and Aleut language families) was spoken. Before that, the speakers of the even earlier ancestor languages were somewhere in eastern Asia, and ten thousand years ago, for all I know, there may have been robins bobbin' along all over the villages of the pre-Proto-Eskimo-Aleut peoples on the Asian continent, named with some robin-denoting word that the migrants to the Americas then forgot.

Remember to take all your words with you when you move to another continent; you never know.

[Drafted November 16; revised November 17 and 18, 2004.]

Posted by Geoffrey K. Pullum at 08:51 PM

I'm starting to get like "this is really interesting"

Neal Whitman at Literal Minded posted some interesting suggestions about possible future (?) developments in the seems, like, go, all saga (previous links here, here, here, here):

She's becoming like, "What's taking them so long?"
Whenever you get mad, you get like, "I don't care what anyone thinks!"
"I really have a case," is what he seems like.
"Get outta here!" That's what he was like.
"I don't care what anybody thinks," is what you get like.

I added the question mark because I suspect that many if not all of these are already out there, though I don't have time for a thorough search.

[Update: a quick internet search turned up these examples of quotative get:

He tricks me that way. I always fall for it too... I'm starting to get like 'yeah right' but I'm also too scared to say yeah right cos what if he did actually get the bad mark he's hinting at? O_O how am i ever to know?
savvy now i'm starting to get like $%&? ! there's no more! ⟨wich actually means i'm liking it⟩
But, she got like all, "I'm all grown up now... Hehe, I'm gonna go wear skin colored clothing."
Okay, so I thought he was kidding around, becasue, to me, that sounds like kidding around, and then he got all like 'why would i be kidding?'
K, before you get all like "how come Sarah never updates anymore?" Its because Sarah's internet is never on anymore. But it is on now.

I suspect that at least some of these uses are documented in the bibliography that Arnold Zwicky cited.]

And John Lawler emailed a long meditation on the subject, full of fascinating ideas that I haven't had time to think through:

Today's post on seem like, go, all reminded me that I'd been meaning to write you about your earlier post on seems like/that. You point out two aberrant constructions with seem; one you call

'a syntactic generalization of Pattern 5
[NounPhrase seems like Sentence[Pro]
e.g, He seems like I can trust him]
from seem like to seem that'.

That may be a good description (who knows, even an explanation), but there's more than one iron in the fire.

You also note that all examples of this construction have the coreferential pronoun in subject position; i.e, exactly where it would be if this were an infinitive complement instead of a that complement. As we all know, with an infinitive complement, seem obligatorily A-Raises, producing an unexceptionable sentence like

(1) Strangely enough, he seems to be much happier here.

instead of the quite exceptionable

(2) ?Strangely enough, he seems that he is much happier here.

which apparently means exactly the same thing as (1). So one can view this equally well as a switch from an infinitive to a that complement. Why would one do this? Well, two of the sentences you adduce from Google suggest a reason:

(3) ?I met with a new client who seems that he might be difficult.

(4) ?She seems that she can fit the part.

Unlike (2), these complements contain modals and therefore can't be infinitives. But it seems like they want to A-Raise anyway, thus producing the usual invited inference of Raising that the conclusion stated in the complement of seem is generated by observation of the subject, instead of the impersonal conclusion that would be produced by Extraposing them (which is supposed to be obligatory with that complements of seem):

(5) I met with a new client who it seems (*that) might be difficult.

(6) It seems (that) she can fit the part.

What I think we have here, in short, is the equivalent of Raising from a that complement, but with a resumptive pronoun in subject position. This produces a slightly bad sentence, but it's only a venial sin compared to the mess real copy+delete Raising would leave:

(7) *I met with a new client who seems that might be difficult.

(8) *She seems that can fit the part.

This may be affected by the seem like construction, as you suggest – indeed, the seem like construction itself may be an earlier case of the same thing – but it seems to me to be a nifty workaround to a bug in English syntax: extending a rule to another context, with a patch to improve its efficiency. Not unlike what we do when we see a violation of a Ross constraint looming at the end of the sentence, like:

(9) ?That's the book that Bill married the woman who illustrated it.

(10) *That's the book that Bill married the woman who illustrated.

Indeed, this may not be a novel construction at all. The OED gives, for instance, this bilingual citation of seem; note that the Latin translates as 'they are seen to be able':

1627 Hakewill Apol. i. ii. 17
Possunt, quia posse videntur. They can, because they seeme they can.

The other construction you characterized as 'a subtle semantic shift in the meaning of seem [like], from something like "to give the impression of being" to something like "to give the impression of believing"'. Examples:

(11) ... she seems like I'm stopping her from doing something.

(12) you seem like he doesn't please you.

Seem, it seems, has always had a sense equivalent to think or believe. This is the sense that appeared with clitic dative subject in meseems:

1876 Morris Sigurd iii. 182
But meseems that the earth is lovely and each day springeth anew.

which was effectively synonymous (and constructionally equivalent) to methinks:

1831 Lamb Elia Ser. ii. Shade of Elliston,
Methinks I hear the old boatman,..with raucid voice, bawling 'Sculls'.

In 20th-Century English, of course, think and seem have long gone their separate syntactic ways, think as an ordinary experiencer-subject transitive active verb, and seem as a rather odd A-Raising or Extraposing intransitive stative, with experiencer expressed, if at all, in a to phrase.

This construction seems to me to be a shift from ordinary modern seem back to the seem of meseems, except it's not impersonal third person; instead it seems to make use of the same invited inference that A-Raised seem subjects have – i.e, in (11) the speaker is inferring her beliefs from observing her, and in (12) your beliefs from observing you. Using like, it seems, makes it a whole new construction, with affordances and prohibitions not yet carved in stone.

God, syntax is fun!

Apologies to those of you to whom phrases like "A-Raised seem" are somewhat opaque. As a phonetician rather than a syntactician myself, I sympathize. If I can find a few minutes at some point over the next few days, I'll try to explain; or maybe someone more qualified than I am will step in. I hope at least that you see from the examples that something interesting (and yes, fun!) is going on here; and jargon aside, syntacticians do have quite a bit of insight to offer about the nature and relationships of the patterns involved.


Posted by Mark Liberman at 09:40 AM

Rumsfeld overnegates Powell, Powell uses "fulsome" correctly

As the cabinet is reconfigured for George Bush's second term, the Secretary of Defense (who is staying) and the Secretary of State (who is not) have each made an interestingly ambiguous remark, featured prominently in stories about the changes.

On Nov. 16, the American Armed Forces Information Service released an article by Donna Miles, under the headline "Rumsfeld Praises Powell, Expects No Major Policy Shifts", whose lede runs as follows:

Defense Secretary Donald H. Rumsfeld said today he has "thoroughly enjoyed" working with Secretary of State Colin Powell and "will miss not working closely with him" after Powell steps down from his post. [emphasis added]

As we've frequently documented, overnegations are easy to fail to miss. And this one -- "miss not VERBing ..." in the meaning "feel the lack or loss of VERBing ..." rather than "feel the lack or loss of not VERBing" -- has become a widespread (if not universally accepted) idiom. I checked 10 instances at random from the first 150 produced by a Google search for "miss not", and found that 8 of them were overnegations:

I will miss not being able to communicate with you, and I will miss not being Editor of The American Journal of Sports Medicine.
Well, again. it's not my favorite track, but I'll miss not being there for the fans. ... I'll miss not being there for sure.
I guess I miss not getting to throw my two cents in, so can we get some opinionated people to start writing again ??
And lots of times i find i miss not having them around to remenis about some of the fun times and such. but that's just how it is. .
So... I know you'll miss not having me around to post stories that you don't want to read and argue with you... but figured I'd let you know.
My dear Yoshi, I will miss not having you run up to me with your nose wiggling in the air.
But for now, let's just say that there are some things that I had, and remember, and miss not having now.
She will miss not having access to radio and TV during the week of RTÉ Charity 252.

versus 2 that were not:

As a native hillbilly, I miss it all too much! I miss not getting lost in the crowd. I miss not getting looked at as a dollar sign.
They never came home, but I didn't have to deal with the things that they did. I miss not caring at all. I miss not worrying.

So based on this linguistic a priori, there's roughly an 80% chance that Rumsfeld meant that he'll feel the lack or loss of working closely with Powell, and a 20% chance that he'll feel the lack or loss of not doing so. Unless, of course, he was intentionally communicating a mixed state.
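
The 80% figure rests on only ten sampled hits, so it helps to see how loose it is. Here is a minimal sketch in Python that computes a 95% Wilson score interval for 8 overnegations out of 10. The function name `wilson_interval` and the choice of the Wilson method are assumptions made for illustration; nothing of the sort was used in the original tally.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return an approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# 8 overnegations among the 10 sampled "miss not" hits
low, high = wilson_interval(8, 10)
print(f"{low:.2f} to {high:.2f}")
```

With a sample this small the interval runs from roughly 49% to 94%, so "roughly 80%" should be read loosely.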

According to the Nov. 15 NYT article on Colin Powell's resignation as U.S. Secretary of State (written by David Stout and Mark J. Prendergast):

"In recent weeks and months, President Bush and I have talked about foreign policy and we've talked about what to do at the end of the first term," Mr. Powell said. "After we had had a chance to have good and fulsome discussions on it, we came to mutual agreement that it would be appropriate for me to leave at this time."

Like many others, I was taken severely to task in secondary school for using fulsome to mean "copious or abundant" instead of "offensive to good taste", but apparently my teachers were wrong on the etymological facts. [Update: as is the all-too-predictable Language Log straight man William Safire, who upbraids Powell for straying from "the grammatical strait and narrow" on this point. This ain't grammar, Bill, it's lexicography; and you could look it up. Vide infra] So says the American Heritage Dictionary, and the OED agrees, giving the first sense as "Characterized by abundance, possessing or affording copious supply; abundant, plentiful, full", with citations back to 1250:

c1250 Gen. & Ex. 2153 Ðe .vii. fulsum ʒeres faren.
a1412 LYDG. Lyfe our Ladye (Caxton) Av, For alwey God gaf hyr to her presence So fulsom lyght of heuenly influence.

and continuing through more recent times (though with some self-consciousness):

1868 HELPS Realmah II. xi. 80 My complaint of the this -- that there is too much of everything..and so I could go on enumerating..all the things which are too full in this fulsome world. I use fulsome in the original sense.

Still, those who have been indoctrinated by the common prescriptive view may briefly wonder whether Powell might have meant to suggest that his discussions were fulsome in OED sense 3b "Having a sickly or sickening taste; tending to cause nausea", or 3c "Cloying, satiating, wearisome from excess or repetition", or sense 5 "Offensive to the senses generally...", or sense 6 "Offensive to normal tastes or sensibilities; exciting aversion or repugnance; disgusting, repulsive, odious.", or sense 7 "Of language, style, behaviour, etc.: Offensive to good taste; esp. offending from excess or want of measure or from being ‘over-done’. Now chiefly used in reference to gross or excessive flattery, over-demonstrative affection, or the like". I won't even mention sense 6b, "Morally foul, filthy, obscene", because I'm confident that Secretary Powell meant only that his discussions with President Bush were "copious and abundant" (though see this article by Christopher Hitchens for some evidence of a "sense of dankness and exhaustion" that might suggest the other interpretations to some people).

Both Rumsfeld and Powell chose words that will engender some amused speculation about alternative, subversive interpretations of their sentiments. These alternatives are eminently deniable. The intended primary meanings, though in both cases slightly non-standard, are so common that the alternatives don't even need to be denied explicitly. But another choice of words would have denied us our little joke -- and theirs?

[Rumsfeld quote tip from Nick Montfort.]

[Note: I originally misclassified one of the 10 "miss not" samples, as an alert reader (Skevos Mavros) pointed out to me. I don't have any real excuse -- I just blew it -- but the fact that this is so easy to do when you're in a rush helps to make the point that the "miss not VERBing" == "miss VERBing" equivalence has become really idiomatic.]


Posted by Mark Liberman at 06:55 AM

November 15, 2004

I'm like, all into this stuff

Mark Liberman's just posted on quotative (and other uses of) like, go, and all. As it happens, Stanford has a modest department project going on innovative uses of all, and I can offer some bibliography on quotatives.

The project, which got underway this fall, was organized by John Rickford in collaboration with Isa Buchstaller, Elizabeth Traugott, and Tom Wasow, and with the participation of a large group of undergraduates, graduate students, and faculty members. You can access Buchstaller's October 2004 bibliography on quotatives: social and linguistic factors and grammaticalization here.

Notable entries on this bibliography are Buchstaller's 2004 Edinburgh Ph.D. dissertation, The Sociolinguistic Constraints on the Quotative System -- US English and British English Compared, which treats like and go, in contrast to one another and to older quotatives like say; what we believe to be the only published treatment so far of innovative all (both "specifier" uses, as in the title of this posting, and quotative uses), Rachelle Waksler's "A new all in conversation" (American Speech 76.2.128-38 (2001)); and, remarkably, a 1990 (yes, 1990) Stanford undergraduate honors thesis by Ann Wimmer (directed by Rickford). We would of course welcome additions to this bibliography; send mail to Buchstaller (ibuch at-sign stanford dot edu) or Rickford (rickford at-sign csli dot stanford dot edu).

We are well situated for the study of quotative all, since the Bay Area seems to have been its birthplace. But it is spreading fast, and we also welcome sightings (or, more likely, hearings) from other places. As usual, we beg you to supply as much relevant context, linguistic and otherwise, as you can.

zwicky at-sign csli dot stanford dot edu

Posted by Arnold Zwicky at 09:19 PM

Another quiz!

This quiz business came in over the transom, as the saying goes, with an email from Stefano Taschini that resulted in the first Language Log "guess the language" challenge (answer here). A couple of weeks later, I made up another one myself (answer here). Both quizzes were surprisingly popular, perhaps tapping the psychic wellsprings that make "examination" one of the four universal dreams.

I had in mind to continue the series about once a month, but yesterday the indefatigable Stefano sent in another audio sample. So I'll put it out there, but this time, I'm not going to post the answer for a week or so. This delay is mainly for me -- I won't have time to make my own analysis and guess for several days, at best!

There are three mp3 clips, here, here and here.

[P.S. the other three universal dreams, I learned last night from Salman Akhtar, are falling, flying, and having your teeth fall out.]

[Note that these sound clips are significantly harder than the earlier ones, due to rapid speech and poor SNR. That's what speech is like, out there in the real world, but it's tough to transcribe such material in a language you don't know. I'll see if I can get a sample with higher sound quality (i.e. an easier sample) and post it later on.]

[Update #2: more clips are not available, but Stefano suggests using the "Amazing Slow Downer" (non-free software) or something similar to overcome the problem of rapid speech -- though of course rapid speech slowed down is by no means the same as slower speech, it's still cognitively less taxing to attend to phonetically. Slowed-down versions of the three clips are available here, here, here.

As before, you may find it helpful to use the free software programs audacity, wavesurfer, and/or praat for careful listening, format modification (including speed changes), transcription, analysis of various sorts, and so on.]

Posted by Mark Liberman at 10:48 AM

Seems like, go, all

Email arrived this morning from Maryellen MacDonald, responding to last week's post on seems like.

I was thinking that your novel seems like examples must be related to quotative like as I was reading your post (though I didn't know that this is what they were called until you told me). Then I got thinking about the other quotative terms go and all, as in:

And then I go, "Get out!"
And he was all, "Whoa!"

My brief thinking about this led me to believe that be+all and be+like are pretty interchangeable as quotatives, but not in the "seems" constructions. "All" admits a very narrow usage with seems (as in He seems all anxious and stuff.), basically meaning "completely," but it's never had any of the usages that you discuss for "seems like". Thus the new usages for "seems like" that you note seem to stem from the combination of a) the previous "seems like" usages and b) the quotative usage. The quotative usage by itself isn't enough, otherwise you'd see things like: He seems all I really have a case.
The closest I found (though not looking exhaustively) was this theory of mind piece, though not quite the right usage:

one minute he seems all I like you and the next I'm just out of the picture.

I realize you were suggesting that the quotative usage of "like" was allowing its extension out of preexisting "seems like" constructions. I'm like all agreeing and stuff, I guess.

Part of what's going on here, I think, is that there are two possible sources for "like" in such cases. We noted this ambiguity in connection with God's late 2003 revelation to Pat Robertson about the outcome of the recent election.

One kind of like is used to introduce clauses or noun phrases, roughly in the way that words like after do:

He ran like the devil was chasing him.
He ran like a deer.

while the other kind of like is a particle that can be inserted almost anywhere in any phrase, without modifying the syntactic relationships of the words around it:

"... her and her, like, five buddies did, like, paint their hair a really fake-looking, like, purple color."

[Example from Muffy Siegel's paper "Like: The Discourse Particle and Semantics" (J. of Semantics 19(1), Feb. 2002)].

I suspect that a careful study of the phenomena (which I have certainly not made) would reveal that these alternative construals are playing a role in the on-going changes in various uses of like, including quotative "be like". As Maryellen suggests, it's also likely that the same ambiguity is playing a role in changes in some related constructions in which like does not always appear. If you analyze "he seems like S" as a case of like-the-particle, then "he seems S" ought to be an alternative. Since this is a perfectly sensible English syntactic frame (e.g. "he said S"), why not go with it? And "he seems S" would logically be an alternative form of "he seems that S". As I explained in last Thursday's post, I haven't gone down this road myself, but I can see how someone might.

One of the interesting questions here is how these alternatives work psychologically. For someone (like me) who uses both kinds of like, does the planning and execution of a given example necessarily fall into one category or the other, or am I sometimes (or always) in a mixed state, kind of like electrons in the Copenhagen interpretation of quantum mechanics?

[Update: Maryellen responded:

The dominant view of language production is that lexical items can be in your "mixed" state, that is with several alternatives partially activated and competing for selection in the utterance, but possible syntactic structures are never in a state of partial activation--speakers are thought to develop only one as a function of the lexical items that have become activated (among other things). Thus on that view, the answer to your question depends on whether you're referring to like the word or the various syntactic structures it might appear in.

I don't happen to accept this syntax/lexical dichotomy and think that there's good evidence for partial activation of both alternative structures and words during production, but since you asked a psycholinguistic question, I wanted to be sure to include the mainstream view.

I agree with her -- and it seems to me that the "mainstream" view is not entirely coherent. In nearly all cases, different lexical items carry with them different syntactic as well as semantic structures. So how can alternative lexical options be simultaneously activated without alternative structural options also being in play? You could invent a model in which collateral inhibition operates strongly among structures but not among words, but this would imply that structures are never intrinsically lexicalized. Which seems wrong.

(Apologies to general readers for the allusive "inside baseball" musings here, and to psycholinguists for unlicensed and ill-informed encroachment on their territory... As always, the Language Log Marketing Department stands ready to refund your subscription fees, cheerfully and in full.)]


Posted by Mark Liberman at 08:27 AM

November 14, 2004

Localizing Computer Software

The availability of computer interfaces in local languages is, as Mark notes, hardly a major factor in language endangerment, but it arguably does affect people's use of computers. Sure, technical people throughout the world can generally deal with a computer interface in English, French, or another international language. However, in many countries there are large numbers of people who do not know international languages but would benefit from being able to use computers. These people may not have much education or technical skills, but they still want to write, to send email, to keep accounts and other records and to obtain information. Even if most of the information on the web is not in a language they know, computers can greatly facilitate dealing with a poorly known language. My life sure would have been easier when I was learning Japanese if things like this Japanese reading tool had been available.

Even if you can handle English or French, if you use a computer a lot, you may find it annoying to have to deal constantly with a foreign language. If you're trying to figure out how to do something, you may not really want to have to decipher error messages and documentation in another language. In countries that are trying to modernize business practices and build up their technological infrastructure, in which computers are still unfamiliar to many people, having computer interfaces in local languages makes things that much easier.

There's also a subjective factor here. For many people, being able to do their work in their own language is a matter of pride, especially when the alternative is a language that they associate with colonialism.

For all these reasons, there is a good deal of activity around the world in localizing operating systems and other major pieces of software. One such project is the Simputer, a low-cost computer with Indian language support, which I wrote about some time ago. There is a lot of activity in South Africa right now. In fact, on August 28th (Software Freedom Day), announced localizations of, the FLOSS office suite, for Zulu, Sepedi, and Afrikaans. Here's Duane Bailey's keynote speech. Linux is also being localized for Swahili by the Kilinux project. On October 18th, they announced the first build of a partially localized version of OpenOffice. There's also a project afoot to create a localization of Linux for Dzongkha, the main language of Bhutan. Microsoft, which is driven primarily by profit rather than local pride or geek interest, has often been criticized for not localizing their software when they didn't see enough profit in it. For instance, it is reported that Microsoft only produced Welsh versions of its software after Welsh versions of Linux and other FLOSS software appeared. They must think that there is a market for localization in some less-than-international languages. They have announced plans to provide versions of Microsoft Windows and Microsoft Office in Quechua, the language of the Incas.

Posted by Bill Poser at 01:40 AM

November 13, 2004

African language computer farrago

There's a curious article by Marc Lacey in the NYT today, under the headline "Using a New Language in Africa to Save Dying Ones". The article reads as if a few raw notes about computer technology in Africa had been mixed up together, dumped out in random order, and strung together as if they told a coherent story.

The article starts by asserting that there are problems using African languages on computers, though it never explains clearly what these problems are:

Swahili speakers wishing to use a "kompyuta" - as computer is rendered in Swahili - have been out of luck when it comes to communicating in their tongue. Computers, no matter how bulky their hard drives or sophisticated their software packages, have not yet mastered Swahili or hundreds of other indigenous African languages.

But that may soon change. Across the continent, linguists are working with experts in information technology to make computers more accessible to Africans who happen not to know English, French or the other major languages that have been programmed into the world's desktops.

The article goes on to tangle up the problem of preserving dying languages with the problem of facilitating computer use by the speakers of some very lively ones:

But the campaign to Africanize cyberspace is not all about the bottom line. There are hundreds of languages in Africa - some spoken only by a few dozen elders - and they are dying out at an alarming rate. The continent's linguists see the computer as one important way of saving them. Unesco estimates that 90 percent of the world's 6,000 languages are not represented on the Internet, and that one language is disappearing somewhere around the world every two weeks.

"Technology can overrun these languages and entrench Anglophone imperialism," said Tunde Adegbola, a Nigerian computer scientist and linguist who is working to preserve Yoruba, a West African language spoken by millions of people in western Nigeria as well as in Cameroon and Niger. "But if we act, we can use technology to preserve these so-called minority languages."

This is a bizarre transition. Yoruba is very much alive, with 25 to 30 million native speakers, almost as many as Polish and nearly four times as many as Swedish. There is a lively Yoruba-language publishing and broadcasting industry, and widespread use in schools. Within the Yoruba-speaking area, children normally grow up speaking Yoruba, and the same is true among hundreds of thousands if not millions of Yoruba speakers in other countries. So it's weird to juxtapose Yoruba with Unesco's (valid) concern about dying languages, as if it were an example. There are plenty of endangered languages among the 505 that Ethnologue lists in Nigeria, but Yoruba is not one of them.

Using Yoruba on the computer can certainly still be a problem. The orthography requires both accents above and dots below certain letters, and getting this rendered correctly on the web without special fonts remains a bit chancy. And because a variety of non-standard 8-bit fonts remain in use, dealing with Yoruba manuscripts (or even Yoruba examples in linguistics papers) remains annoyingly difficult. Adegbola's efforts are certainly needed. But the article doesn't mention these issues; instead it asserts (falsely) that "Different Yoruba words are written the same way using the Latin alphabet - the tones that differentiate them are indicated by extra punctuation". Actually, the tones are indicated by (acute and grave) accents (as in the name of the language, Yorùbá, whose tones are mid-low-high). I've gotten over being shocked when a major publication like the New York Times assigns a story to a reporter who lacks the most elementary linguistic knowledge relevant to it -- but really, would it be too much to ask to keep the difference between accents and marks of punctuation straight?

I guess it's possible that the reporter does know the difference, and is writing about the use of single quote and back quote as a method for keyboarding acute and grave accents; but if that's it, why not say so, and give an example? Like "In entering Yoruba on the computer today, people often hit an extra key to add a tone mark, for example typing a' to get á."
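
To make the keyboarding idea concrete, here is a minimal sketch in Python of turning keyed tone marks into accented letters. The keying convention (a vowel followed by ' for acute, or by ` for grave), the function name `compose_tones`, and the restriction to plain a/e/i/o/u are all assumptions made for illustration, not a description of any actual Yoruba input method.

```python
import unicodedata

# Hypothetical keying convention (assumed for illustration):
#   vowel followed by '  -> acute accent (high tone)
#   vowel followed by `  -> grave accent (low tone)
TONE_KEYS = {"'": "\u0301", "`": "\u0300"}
VOWELS = set("aeiouAEIOU")

def compose_tones(keyed: str) -> str:
    """Turn keyed text such as "a'" into accented text such as "á"."""
    out = []
    for ch in keyed:
        # Attach the mark only when the previous base letter is a plain vowel.
        if ch in TONE_KEYS and out and out[-1][0] in VOWELS:
            out[-1] += TONE_KEYS[ch]
        else:
            out.append(ch)
    # NFC merges base letter + combining mark into one code point where one exists.
    return unicodedata.normalize("NFC", "".join(out))

print(compose_tones("a'"))        # á
print(compose_tones("Yoru`ba'"))  # Yorùbá
```

A scheme this naive would also rewrite an ordinary apostrophe that happens to follow a vowel (as in "I'm"), and it ignores the dotted letters entirely, which is part of why real input methods need more care than an extra keystroke.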

Another possible issue that is implicit in the article but never brought out directly is the question of localizing help files, dialogue boxes, interface legends, and so on. This is the only thing that can possibly be at issue for (text-based) use of Swahili, which "computers... have not yet mastered", according to the article. While localization of prompts and such is certainly a good thing to do, I'm very skeptical that it is a major barrier to wider use of computer technology among Africans. At present most literate Africans can read English or French. Perhaps this should change -- though I believe that the people whose education would be affected by this choice would object very strongly, and I would agree with them, since literacy in one of the major international languages is an essential educational tool. In any case, at the moment, anyone in Africa who is likely to be using a computer to create a document or send email can almost certainly read interface text in English or French without much trouble.

Lacey (the article's author) does start out by talking about "[making] computers more accessible to Africans who happen not to know English, French or the other major languages that have been programmed into the world's desktops". So he may have in mind facilitating a new kind of computer-mediated literacy training among those who don't know English or French. Or maybe he's thinking about bringing interaction with networked computers to people who are not literate at all, using images and speech technology. Those are both interesting ideas, but it's odd to write as if the way to accomplish such things is to put African languages on an equal footing with English or French in the use of Microsoft Office. Mix in references to endangered languages, text messaging in Amharic, machine translation among English, Afrikaans and Sotho, problems of borrowed vs. created technical vocabulary; stir well; and bake till done.

The ingredients here include preservation and documentation of Africa's hundreds of endangered languages; full localization of software for Africa's dozens of large local languages; methods for input, display and editing of Africa's many orthographies that require (simple forms of) complex rendering; the role of computers in promoting literacy in local languages; language standardization and the development of technical vocabulary; linguistic nationalism among the languages within African countries, nearly all of which are multilingual; and the relationship of Africans to the major international languages, which in Africa mainly means English and French, though Arabic is also relevant in some areas. These are all important problems, with subtle and complicated relationships among themselves and with other economic, political and technical questions. There are analogous issues in most other areas of the world. I hope that this article means that the NYT editors have developed an interest in these questions, and will continue the discussion in a more careful way at some point in the future.


Posted by Mark Liberman at 06:42 PM

A journalist's perspective on (bias in) media citations

Last Thursday, just before Geoff Pullum checked up on Mark Bauerlein and Geoff Nunberg warned about biased studies of bias, I got an interesting email in reference to my 10/31 post on the rhetoric of citation. I had described the mathematical model used by Groseclose and Milyo in their widely-referenced study, "A Measure of Media Bias", and complained that "as I read G&M's model, it says that we derive maximum 'utility' from citing the most extreme sources whose political polarity is the same as ours". I used a (small, unscientific) sample of blog entries citing Karl Marx to suggest that this is not always an accurate way to predict who cites whom: forcing everyone artificially onto a one-dimensional political spectrum, I found that citations of Marx were roughly twice as likely to be from right-wing blogs as left-wing ones.

The note was from a writer at The Economist, and began like this:

Just discovered the wonderful Language Log blog. I thought that a lot of what you said in "Marx: Red or Blue" rings true--people do very often cite those they disagree with. But the study you mention deals only with Congressmen/women and the media, and the phenomenon being observed is a very specific one: reference to think-tanks. These are of course not representative of the speech community as a whole.

Members of Congress, it seems to me without doing any research, very rarely cite opposing think-tanks. They often try to sneak by with no attribution at all - e.g. a Republican criticizing Kerry's health-care plan may say "A recent study found that it would cost twice as much as he says..." when the data come from Heritage or the AEI. If they did cite, they would surely cite their own guys--liberals would use Brookings, Republicans Cato, AEI or Heritage. It's very hard for me to imagine Ted Kennedy citing AEI or Trent Lott.

As for media outlets (I'm a reporter), you're right that opinion writers (which bloggers more closely resemble) will cite their opponents to dismiss them. But "straight" reporters will cite evidence they believe to be credible. And what they believe to be credible will (it seems likely) subtly be affected by their own viewpoint.

A conservative friend points out that the oft-cited Brookings is rarely called "a liberal think-tank" by reporters--because they don't think it's liberal, though conservatives do. AEI and Heritage are, by contrast, almost always flagged with a cautionary label like "conservative". A quick-and-dirty google news search confirms: of 3350 pages mentioning Brookings, only 322 also contain the word "liberal" (under 10%). The comparative figures for Heritage are 1420 mentions, of which 810 include the word "conservative" (57%). For the AEI the numbers are 1350 and 509 (38%).

So, in sum, I think the Groseclose and Milyo methodology you criticize might actually hold up fairly well in the specific instance of congressmen, reporters and think-tank citations.
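
The percentages in the note above follow directly from the quoted hit counts. As a small check, here is a Python sketch that recomputes them. The dictionary layout and the function name `label_share` are hypothetical, the counts are simply the ones quoted in the email, and pairing the AEI figures with the word "conservative" is an assumption based on context.

```python
# (pages mentioning the think-tank, pages that also contain the label),
# as quoted in the email
COUNTS = {
    "Brookings + 'liberal'": (3350, 322),
    "Heritage + 'conservative'": (1420, 810),
    "AEI + 'conservative'": (1350, 509),
}

def label_share(mentions: int, labeled: int) -> float:
    """Percentage of pages mentioning the think-tank that also contain the label."""
    return 100 * labeled / mentions

for name, (mentions, labeled) in COUNTS.items():
    print(f"{name}: {label_share(mentions, labeled):.1f}%")
```

The results (about 9.6%, 57.0% and 37.7%) agree with the rounded figures given in the email.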

I responded (in part):

I admit that I don't know what the distribution of affinities was between reporters and sources in the cases they covered [in the data for the G&M study]. My point was just that it's far from obvious that the mathematical form of their model corresponds to the psychological and rhetorical reality of the citation process.

and sent him links to Geoff Nunberg's earlier writings on media bias. He responded:

On reflection, I'll simplify my response to your post: the citation habits of reporters are different from the citation habits of bloggers or academics, so I'm not sure your Technorati search invalidates G and M's methodology. Bloggers and academics argue, and explicitly cite opposing evidence to knock it down. Journalists, who if they're lucky get 1000 words to make a point (and TV gets far fewer), are more likely to cite a single "trustworthy" source, and I'd not be surprised if most of those aren't left-of-center.

I'll accept his expert opinion about journalistic constraints and their consequences for journalistic rhetoric -- though we still don't really know what G&M's data looked like, in this respect. For example, my point about the many citations of Geoff Pullum's Dan Brown critique by Catholic writers (here, scroll down to the bottom of the post) still stands, as an example that is consistent with Lane's picture of what journalists do, but which would tend by G&M's methodology to identify Geoff (inappropriately) as a Catholic writer.

My correspondent added:

thanks for the Nunberg links - am enjoying digesting with my breakfast

which echoes my own opinion that it's a special pleasure to be able to follow the back-and-forth on such issues through a linked set of magazine articles, blog entries and so on.


Posted by Mark Liberman at 09:43 AM

November 12, 2004

The "liberal professoriate" -- not so fast

Geoff Pullum is right to take Mark Bauerlein to task for claiming in an article in the Chronicle of Higher Education that the phrase right-wing think tanks is "always" qualified by well-funded, when a more accurate statement would have been "qualified by well-funded 0.3 percent of the time." But Geoff is much too credulous when it comes to evaluating Bauerlein's claim that studies have shown that "campuses are havens for left-leaning activists." When somebody on the right cites a study that "proves" the existence of liberal bias in the media or the academy, it's a good idea to keep your hand on your wallet.

The study that Bauerlein cites, for example, was supervised by the American Enterprise Institute's William Zinsmeister, who sent volunteers to boards of elections to search out the voter registrations of college faculty. The result, as reported by Bauerlein, was that "More than nine out of 10 professors belonged to the Democratic or Green party, an imbalance that contradicted many liberal academics' protestations that diversity and pluralism abound in higher education." But as Martin Plissner showed in an article in The American Prospect, the study's methods were highly questionable:

In the University of Texas sample, for example, 28 of the 94 teachers came from women's studies -- not exactly a highlight of any school's core curriculum or a likely cross section of its faculty. At the same time, none of the 94 was from the university's huge schools of engineering, business, law or medicine -- or from any of the sciences. At Cornell University, it's the same story: 166 L's [Democrats or Greens] by the AE bar graphs, and only 6 R's. But not one faculty member in the entire sample taught in the engineering, business, medicine or law schools, or in any of the sciences. Thirty-three, on the other hand, were in women's studies -- more than any subject, save for English.

It goes on: at UCLA, more than half of the faculty sample studied was drawn from just two departments, history and women's studies, and none of it was drawn from the faculties of business, law, engineering, medicine, or any of the sciences.

You could make the same point about a widely publicized 1998 study of voter affiliations at the University of Colorado done by the Rocky Mountain News' Bill Scanlon. It purported to show that the humanities and social-science departments at the university were "a one-party monopoly." But Scanlon's sample was also cherry-picked. He looked at the party affiliations of professors of political science but not of economics; he included the education school but not the business faculty (this at a university where business is a popular undergraduate major). And he looked at no one in the law school or in departments in the sciences or engineering. But the study was described by David Horowitz as showing that "93.6% of the faculty at Colorado University (Boulder)...who registered in political primaries were Democrats," with no qualification.

Is the American professoriate predominantly liberal in its political orientation? Very probably, like the professional class in general. And that point of view is particularly prevalent in certain humanities and social-science departments. But the overall disproportion is nowhere near as dramatic as these shoddy and dishonest studies make it out to be, particularly when you move away from the large research institutions -- usually in blue states -- where the right has concentrated its attentions, and where most of us LanguageLoggers are fortunate enough to be located.

The right can argue that the political orientation of professors in the humanities and social sciences is more important than that of other departments, since those fields often deal with political questions as subject matter. But it's hard to see why there's a greater threat in having mostly liberals teaching Hegel, Beowulf or Dante than in having mostly conservatives teaching economic theory. And you wonder whether Bauerlein, David Horowitz et al. would be willing to extend their calls for ideological balance to business faculties, say, who are shaping the way the next generation of people who hold real power in America will be thinking about matters like corporate accountability.

Then, too, the predominance of liberals in humanities and many social-science departments has a lot less effect on overall student attitudes than the right likes to pretend. As a recent study from the National Center for Education Statistics shows, more than 71 percent of American undergraduates choose "career majors" like computer science, accounting, and business, while only about a quarter choose "academic majors" in fields like humanities and social science. In 1999, just 5 percent of undergraduates were majoring in English literature or humanities, against 19 percent who chose business and marketing and almost 12 percent who chose computer science or engineering. If the small proportion of students who major in humanities tend to have a liberal point of view, that's chiefly a matter of self-selection, rather than indoctrination -- students with conservative views tend not to view medieval studies as a career track.

More to the point, these studies assume an inescapable connection between having a point of view and having a bias: a historian with politically liberal views can't possibly give an objective, even-handed account of free trade or the history of race relations in America. That's a convenient assumption for people like Bauerlein, particularly if they want to take it as a justification for trumping up the evidence for their own side.

Posted by Geoff Nunberg at 01:01 PM

November 11, 2004

Always accompanied by the qualifier

Professors on American campuses tend overwhelmingly to be liberals, says Mark Bauerlein (a not-so-liberal professor of English at Emory University), in an interesting critique of the politically monochromatic character of American campus opinion. And although liberal academics will acknowledge that some conservative intellectual work is done at think-tanks, he opines, they nonetheless believe that such work is inherently corrupt: "The Heritage Foundation, the American Enterprise Institute, the Manhattan Institute, and the Hoover Institution all have corporate sponsors, they note, and fellows in residence do their bidding." Well, maybe as we liberal professors struggle through our teaching week we do sometimes seek to assuage our jealousy of the cosseted opinion-mongers at the neocon policy zoos by dismissing them as mere corporate whores (though that view is not one that Language Log would endorse; notice that Language Log has a Senior Fellow at the Manhattan Institute, John McWhorter, on its roster of contributors). That's not really a testable claim.

But having made his point about what he thinks liberal academics think, suddenly, heedless of his own safety or credibility, Bauerlein steps off the edge of stereotype into empirically testable space with an unsupported but testable claim about language use:

Hence, references to "right-wing think tanks" are always accompanied by the qualifier "well-funded."

Always, that's what he says. So, since no one else ever seems to want to do any fact-checking on matters of language, the dedicated team of researchers here at the Language Log think-tank turned (as ever) to Google, and I present herewith a table of the relevant findings (uninfluenced by any corporate sponsor):

"right wing think tanks" 14,600
"well funded right wing think tanks" 50

Why would Bauerlein do this to himself? No one would say "Buicks always have pro-Republican bumper stickers", because everyone can just look at the traffic and see that it's not true. How could a professor of English not realize that we can do the same with linguistic material?

His piece is called "Liberal Groupthink Is Anti-Intellectual," and it's in the November 12 Chronicle of Higher Education. The irony is that I broadly agree with his drift. Academia is astonishingly devoid of conservative opinion, and this is by no means a good thing even for academics whose views are of the left, because debate between sharply differing political positions doesn't occur — it can't, because the differing opinions aren't there. I know exactly two serious intellectuals who hold right-wing views, neither anywhere near my campus. I am in regular touch with only one of them; he lives three thousand miles away. We fight a lot by email. He is utterly wrong, but not on everything, and I'm convinced that the extensive arguments we have are good for my intellectual life. I learn facts from him as well as learning to argue better about non-linguistic matters. I have no access to any such arguments on my own campus. Some years ago our campus newspaper (actually a radical leftist opinion weekly that is usually wrong about what little campus news it reports) did a survey from public records of the voter registration status of the faculty in the Department of Politics here: it came out to Republicans 0%, Democrats 100%. Long live intellectual diversity.

Academics should take Bauerlein's view more seriously. How can our universities be arenas of wide-ranging and untrammeled debate on political, economic, social, and cultural topics if our profession represents almost exclusively the range of opinion to the left of John Kerry?

But oh, I wish people like Bauerlein wouldn't spoil their presentations of critical views like this by including ridiculous claims about public use of language that can be falsified in seconds. My statistics actually understate the absurdity of what he says, since a large proportion of the 50 web-hits for "well funded right wing think tanks" are requotations of a single line from a document on the Communication Stream of Conspiracy Commerce prepared in or around July 1995 by the Clinton White House Counsel's Office and discussed here and there in the press around 1997 and subsequently:

The Communication Stream of Conspiracy Commerce refers to the mode of communication employed by the right wing to convey their fringe stories into legitimate subjects of coverage by the mainstream media. This is how the stream works. First, well funded right wing think tanks and individuals underwrite conservative newsletters and newspapers such as the Western Journalism Center, the American Spectator and the Pittsburgh Tribune Review. Next, the stories are reprinted on the internet where they are bounced all over the world . . .

Take away the hits that involve repetitions of that passage in the various periodicals and websites that have discussed it, and the imbalance is much greater. But even if we do no such correction, the fact is that the proportion of references to right-wing think tanks on the web that are accompanied by the attributive modifier "well-funded" is approximately 0.342%. Three or four out of each thousand.
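For anyone who wants to check the arithmetic, here is a minimal Python sketch. The two counts are the Google figures from the table above; the function and variable names are made up for illustration, and raw hit counts are of course only a rough proxy for actual usage.

```python
# Hit counts as reported in the table above (Google, November 2004).
TOTAL_HITS = 14_600      # "right wing think tanks"
QUALIFIED_HITS = 50      # "well funded right wing think tanks"


def share_percent(qualified: int, total: int) -> float:
    """Return qualified/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * qualified / total


pct = share_percent(QUALIFIED_HITS, TOTAL_HITS)
print(f"{pct:.3f}% of the hits carry the qualifier")
print(f"that is about {pct * 10:.1f} per thousand")
```

Run as is, this reports 0.342%, or roughly 3.4 per thousand, which is where "three or four out of each thousand" comes from.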

Claims as patently ridiculous as that liberal professors always qualify "right-wing think-tank" with "well-funded" discredit the people who make them. If that is what it means to listen to a more conservative cast of opinion, no wonder we don't waste our time doing so. Bauerlein's moment of careless hyperbole has doubtless encouraged thousands of academics to spend even less time listening to conservatives than they do right now. I'm convinced that they're making a mistake thereby, but Bauerlein has made it hard for me to defend him.

Of course, I could tell you, "Oh, don't take the hyperbole seriously, he doesn't mean his universal quantification literally." But then we might just as well say the same about the universal quantifiers used by leftists in universities who assert that whites are always blind to their inherent racism, or that American intervention in foreign countries is always motivated by rapacious greed for control of natural resources, or that analytical argumentation is always phallocentric and sexist; and in that case Bauerlein's whole discussion of such excesses of university left-wingery has gone up in smoke, because everyone has an intellectual alibi.

The bottom line: people who want to be taken seriously should exercise as much care with their linguistic evidence when making a point about the use of language as they do (or should) with their claims about financial evidence when talking about economics, or psephological evidence when talking about electoral politics, or seismological evidence when talking about earthquakes.

[Note added later: The very fair-minded researcher Maryellen MacDonald points out to me that there are other kinds of qualifier than attributive modifiers: phrases like "right-wing think tanks that are well funded", where the qualification is in a predicative complement, might be relevant, for example. Maryellen is right to raise this as a possibility, though in actual fact "right-wing think tanks that are well funded" gets no Google hits at all. I suppose if we were being really generous we might (despite Bauerlein's direct quotes around "right wing think tanks") allow other wordings, e.g. "conservative" for "right wing". But I hold out little hope for Bauerlein's defense team. I found one (1) hit for "conservative Think Tanks that are well funded", and 119 for "well funded conservative think tanks". Another correspondent, Russell Burdett, suggested that Bauerlein's remarks implied that he would only claim that professors always used the qualifier. I tried testing for this by doing a search limited entirely to the .edu domain, in which most web sites belong to academics, and I also limited the search to the past year. But I got 125 hits for "right wing think tanks" and only 2 for "well funded right wing think tanks" (both of them just quoting the above-mentioned report from the Clinton administration). No, I don't think fiddling with the search is going to yield numbers that will necessitate a retraction of my general point: Bauerlein did not do even the tiniest bit of fact-checking on his linguistic point, and his "always" is the wildest of exaggerations.]

Posted by Geoffrey K. Pullum at 03:03 PM

Looks like a reference problem

Reading Mark's post about seems that/like reminds me of two similar examples. Both are with looks instead of seems, meaning that *looks that isn't possible (though I won't bother with a Google search right now).

The first was one I overheard in Louisville, KY a few years ago. My wife's aunt was telling us a story about how her son came home from school one day, looking beaten up. So his mom says to him:

Somebody looks like they beat the hell out of you!

At first this sounded fine to me, but then I wondered why. The problem is with the reference of somebody.

I don't think that somebody co-refers with you. Co-reference in this case requires that the latter pronoun be 3rd person, and even then it sounds better, at least to me, if the pronoun is in subject position (compare somebody looks like he was beaten up (by some bullies) with ?somebody looks like some bullies beat him up). You is 2nd person, and in object position in the example quoted above.

So we have to conclude that somebody co-refers not with you but with they, which is consistent with the fully grammatical paraphrase without raising:

It looks like somebody beat the hell out of you!

There are several questions raised by this analysis, however.

  1. Somebody is singular, they is plural?
    This presupposes, I think erroneously, that they can't be singular (at least for purposes of gender neutrality). My wife's aunt's son is no wuss, but I think his mom knows that there are some big bad bullies out there with no need for a gang.

  2. Somebody looks like and "sensory range"?
    The somebody in somebody looks like leads one to believe that its referent is within the sensory range of the speaker - the speaker sees, hears, or otherwise senses the referent's presence. But the referent of they is not present - except in the form of bruises on the speaker's addressee, which I suppose may count as enough.

  3. More than one pronoun = ungrammaticality
    It seems to be the case that a second pronoun is simply not possible in the embedded clause of this construction, even if the intended co-reference of somebody is with a 3rd person singular pronoun (regardless of grammatical role): *somebody looks like he was beaten up by her, *somebody looks like she beat him up. But maybe the fact that the second pronoun is 2nd person you in the original quoted example alleviates this problem (I'm not yet willing to say it's not grammatical).

I suppose it's possible that my wife's aunt meant to say something like the following impeccably grammatical sentence, where both embedded pronouns co-refer with somebody, but maybe slipped up sometime around the application of downstairs passivization:

Somebody looks like they had the hell beaten out of them!

Update: John Cowan disagrees somewhat with my analysis; I reproduce his message to me in full here:

I think there is indeed a problem of reference, but what weirds the sentence is not so much the reference of "somebody" (which I do think is co-referential with "you"), but the fact that this represents a particular kind of language game in which "somebody", normally -specific, is being used as +specific. An example from my own home a few days ago:

Somebody better wash the dishes before she goes to bed!

Now examining this form, it seems a bit cleaned-up: a more natural form would probably have been (if I weren't a lingweenie):

Somebody better wash the dishes before they go to bed!

IOW, this +specific use of "somebody" normally carries the "they" singular pronoun that is preferred in its -specific use, as in:

Somebody has left their pager in the men's room.

So I suspect that your aunt-in-law's underlying form was:

Somebody[i] looks like they[j] beat the hell out of them[i].

with this anomalous use of somebody, but that this form got censored because of its semantic peculiarity, causing the "them[i]" to be replaced by "you[i]" at the last minute. If so, then "they[j]" is of course the vague They who do all bad things.

The other example is on the highly recommended website. Homestar Runner has just presented Marzipan with a veggieburger, decorated so that it "has a little face". Homestar says:

The olives, um, kinda look like he has eyes.

This also sounds eerily fine, although kinda make it look like would be a definite improvement. (Note that this example kinda looks like the he seems like I really have a case examples Mark cites.)

One more kinda related example, with a cute little co-reference problem:

And now, the man whose back I walked on and nearly killed ...

I heard Jim Packard say this while introducing Michael Feldman at the beginning of Michael Feldman's Whad'ya Know?. Obviously, the extracted object of killed is the man, but his back keeps getting in the way.

[ Comments? ]

Posted by Eric Bakovic at 02:44 PM

This construction seems that I would never use it.

Here are five of the ways that I use the verb seem, with some short examples (courtesy of Google, like all the examples in this post):

1 It seems that Sentence. It seems that relativism has killed ethics.
2 NounPhrase seems to VerbPhrase. Chaos seems to aid learning.
3 NounPhrase seems like NounPhrase. This seems like a terrific solution.
4 It seems like Sentence. It seems like the whole South is for sale.
5 NounPhrase seems like Sentence[Pro]. He seems like I can trust him.

Other English speakers agree, to the tune of millions of Google hits. Seem like, especially in patterns 4 and 5, is marked as informal for me, but I use it without embarrassment in conversation and informal writing.

However, I've recently seen two generalizations of these patterns that are natural, but definitely outside my comfort zone. One is a syntactic generalization of pattern 5 from seem like to seem that. The other is a subtle semantic shift in the meaning of seem, from something like "to give the impression of being" to something like "to give the impression of believing".

Here are some examples of the first innovation:

I met with a new client who seems that he might be difficult.
Strangely enough, he seems that he is much happier here.
She seems that she can fit the part.

These would be fine for me if that were replaced by like, creating instances of the commonest version of pattern 5, where the pronoun in the complement of seem like is the subject of its sentence:

She seems like she would be a great person to meet and know.
These posts seem like they are from another planet.
Bush seems like he's stating the facts, and Kerry seems like he's attacking the president.

The cases like the one in the original example ("he seems like I can trust him", where the pronoun is in another position in the complement sentence) are also possible though less common:

She seems like I would hate her in real life.
He seems like I've seen him before.

These seem fine to me, though informal in the way that seems like generally is. However, they are definitely rarer. Thus "she seems like she" gets 7,370 Google hits, while "she seems like I|you|he|we|they" gets only 171 -- and so on:

                     same pronoun    different pronoun
she seems like __    7,370           171
he seems like __
they seem like __

I haven't found any examples of the NounPhrase seems that Sentence[Pro] pattern where the pronoun is not in subject position, but that may only be because of compounding rarities.

In some examples of seem like, the expected pronoun connected to the subject of seem is missing:

My husband and myself meet with a lawyer this past Monday. He seems like I really have a case and should not have any problem.
But she did'nt, just kept on looking distracted and smiling, she told me she loves that priest. She seems like they have a friggin "connection". I don't care.

There seems also to be a shift of meaning here, with "(s)he seems like" apparently being used to mean "(s)he seems to think that" or "(s)he seems to say that", or perhaps "(s)he acts as if".

The same sort of shift often seems to have occurred even when the expected pronoun is present:

... he seems like i'm just a pain to him and i get in the way.
... she seems like I'm stopping her from doing something.
you seem like he doesn't please you.
... why do you seem like he gave you kooties instead of a kiss that you say you liked?

In such cases, seem like appears to mean something along the lines of "act as if" or "give the impression of believing that". Perhaps this is connected to the quotative use of like, so that seem like may be felt to mean "seem to be saying"?

Anyhow, it's not a surprise that so many of the examples of this kind are in confessional writing about relationship problems. Where else do you get so much informal, explicit discussion of "theory of mind" reasoning?

Posted by Mark Liberman at 08:05 AM

November 10, 2004

Linkfest, focusing in the end on sex

Looking randomly around the lingua-blogosphere, I find a troubling case of possible overnegation, a memorial to the subjunctive, Igor's Theory of Negativity, a comparison between making post and getting tenure, a couple of posts on gamers' English, hormonal palmistry of language aptitudes, and a discussion of what quibbling about the stars means to a (well, one) linguist.

What (little) I know about the stuff on sex hormones, finger lengths and cognitive profiles is from the work of Doreen Kimura (her home page is here). The serious part is the putative relation between sex hormones and cognitive skills; a good summary of her perspective is "Sex Hormones Influence Human Cognitive Pattern", Neuroendocrinology Letters 2002; 23(Suppl. 4):67-77. The finger length business is not entirely a curiosity, since to the extent that it really reflects prenatal sex hormone levels, it serves as a somatic marker that can interestingly be correlated with all sorts of things, including adult sexual orientation as well as various test results, life choices and so on.

The jargon for the difference is 2D:4D ratio ("second digit to fourth digit ratio"). It's easy to measure -- though there is surely a real possibility for observer bias to play a role in the measurements -- and the fact that collecting data is so simple strikes me as both an opportunity and a danger. It's easy to do little studies of this, that and the other, and for people to do self-evaluations from which they (or their schoolmates) may draw invidious conclusions. In the (few) cases where I've looked at the data in detail, I find results like those in Q. Rahman and G.D. Wilson, "Sexual orientation and the 2nd to 4th finger length ratio", Psychoneuroendocrinology 28:3, April 2003, 288-303, where the differences were highly significant in the statistical analysis:

For right-hand ratios, there was a significant effect of sexual orientation (F=24.237, df=1, 239 P=0.000); homosexuals having lower right-hand 2D:4D ratios than heterosexuals. There were no significant effects of gender (F=0.115, df=1, 239, P=0.735), no significant interaction (F=1.684, df=1, 239, P=0.196) and no significant effects of the covariates (all Ps>0.10). Overall, the difference between homosexuals and heterosexuals constituted a moderate to large effect.

However, these highly significant differences in the mean value were nevertheless rather small, as a proportion of the means, and also relative to the variance:

                        2D:4D ratio    Standard deviation
Heterosexual males
Heterosexual females
Homosexual males
Homosexual females

In other words, the mean values for homosexuals and heterosexuals differed in their sample of males by 1 part in 100, and in their sample of females by 3 parts in 100; while the standard deviations of the measurements within each subgroup were 2 to 3 parts in 100. It bothers me that the newspapers (and even science magazines) who report this kind of stuff never try to explain this aspect of the results, which could easily be gotten across with histograms, or with scatter plots when the dependent variable is something like programming skill. The excuse for not doing this, I've been told, is that readers would be confused by the details; but an equally strong reason, I suspect, is that such explanations would undermine the apparently spectacular results ("you can tell someone's sexual orientation from their finger lengths!") with a small dose of reality ("no you can't, not with any accuracy in individual cases; the experiment showed only that you can distinguish a set of 60 homosexuals from a set of 60 heterosexuals on the basis of this measurement").
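To make the gap between "highly significant" and "useful for individual cases" concrete, here is a rough Python sketch. It rests on assumptions that are mine, not the paper's: both groups are treated as normally distributed with the same standard deviation, the groups are equally common, and the numbers are just the round figures quoted above (differences of 1 and 3 parts in 100, a standard deviation of 2.5 parts in 100 as the midpoint of "2 to 3", and 60 people per group). All function names are made up for illustration.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def standardized_difference(mean_diff: float, sd: float) -> float:
    """Difference between group means in units of the within-group SD."""
    return mean_diff / sd


def best_individual_accuracy(mean_diff: float, sd: float) -> float:
    """Accuracy of the best single cutoff (midway between the two means)
    for classifying one person, under the assumptions stated above."""
    return normal_cdf(abs(mean_diff) / (2.0 * sd))


def group_z(mean_diff: float, sd: float, n_per_group: int) -> float:
    """z-score for the difference between two group means of equal size."""
    standard_error = sd * sqrt(2.0 / n_per_group)
    return mean_diff / standard_error


SD = 0.025  # assumed: midpoint of "2 to 3 parts in 100"
N = 60      # group size mentioned in the text

for label, diff in [("males", 0.01), ("females", 0.03)]:
    d = standardized_difference(diff, SD)
    acc = best_individual_accuracy(diff, SD)
    z = group_z(diff, SD, N)
    print(f"{label}: d = {d:.2f}, individual accuracy = {acc:.1%}, group z = {z:.1f}")
```

Under these assumptions, a difference of 1 part in 100 lets you guess an individual's group only a little better than a coin flip, even though the same difference shows up clearly when you compare the means of two groups of 60.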

One could (and should) take the same care in presenting the results about demonstrated sex differences in cognitive skills, which are often of a similar nature.

Though it has nothing at all to do with sex, language, or for that matter with other aspects of cognition, I was also interested in this post on Galileo's middle finger.


Posted by Mark Liberman at 11:51 AM

November 09, 2004

The Curious Grammar of Ohio: The Local Color Illusion

Writers are often told to Show, Not Tell, and while this is not universally the best of advice -- it can lead to a piling up of flat details that the reader has to struggle to interpret individually and as an ensemble -- sometimes you just want to send a writer a plaque with this message engraved on it. This was my reaction to David Blaustein's review of Keith Banner's The Smallest People Alive, in the Lambda Book Report, August/September 2004, p. 25:

Another unifying idea is simply the context of the book: The stories are all set in Ohio, where Banner lives. Banner uses the curious grammar of the region to great effect throughout his book, employing a series of voices that may not come in for much attention by the publishing centers of this country, making a lie of that often repeated idea that regional differences are being subsumed into a standard (and presumably bland) way of life in this country. Whether Banner is comfortable being labeled as a regional writer or not, he has produced a work that is wholly of a specific place and time.

What counts as "curious" grammar, exquisitely of Ohio, for Blaustein? He gives not one example. (Meanwhile, other reviews, in the New York Times and Publishers Weekly, don't even mention language.) Show us something, Blaustein!

Now that I've read the book, I can report that there are vanishingly few regional features in it; counting fairly generously, I found four tokens in the book's 260 pages. (Yes, tokens.) What there is instead is a pile of features from colloquial and/or working-class and/or innovative speech, features that are found all over the United States. Blaustein is suffering from what I'll call the Local Color Illusion, the impression that non-standard features, largely to be heard in the vernacular of the working class, in some area are what make the language of that area special, and colorful -- this despite the fact that the non-standard features that are most likely to be noticed are those that are not particularly regional.

When I brought the Blaustein review to the attention of the American Dialect Society mailing list, on October 27, I speculated that the part of Ohio in question is the east and south, where most of the stories are set, and ADS posters quickly nominated two South Midlands features that regularly strike outsiders as odd as the probable source of Blaustein's perception of wonderful local color: want/need + past participle (Your shirt needs washed) and positive anymore (Gas is really expensive anymore). A fair amount is known about both of these features. They are spread across a geographical area considerably larger than the east and south of Ohio, but they are certainly used there, and they can easily be found in writing as well as speech and from people of a wide range of social statuses.

Score: 1 token of the first ("The main reason Irene wanted divorced...") and 0 of the second.

Ok, let's try some notably Appalachian features: double modals (I might could do that) and a-prefixation (I was a-talkin' about that last night). Score: 0 and 0. Again, both of these features are spread across a geographical area considerably larger than the Appalachian portions of the state, but they're certainly found there.

Stretching things quite a bit, by interpreting "grammar" in its person-in-the-street sense (to include morphological forms, lexical choices, and pronunciations as well as syntactic constructions) and by interpreting "regional" very generously, I get only three more tokens: one pronunciation, dern for darn (the Dictionary of American Regional English identifies this one, in its entry for durn, as chiefly Southern and South Midlands); one non-reflexive ethical dative, in "I need me a gun" (I don't really know the geographical distribution of this one, but I can vouch for its use in the South and South Midlands); and, dubiously, one occurrence of sack 'bag' (here, DARE has the item all over the U.S., though less in the Northeast; but in Ohio, people tend to view sack as a southern Ohio thing and bag as the more general variant, so I'm counting it anyway).

So, what was Blaustein noticing? Well, one hell of a lot of general colloquial features: prospective gonna, obligative gotta; subject omission, of the Saw him yesterday sort; initially reduced questions, of the She okay? and When you gonna go? sorts; expletives like the fuck and other taboo vocabulary (oddly, Banner doesn't use taboo vocabulary for 'penis', instead uniformly employing thing, as in his thing and my thing); and the tag and shit. Now, these are found all over the U.S. in informal speech (I use most of them myself) and are not even slightly characteristic of one region as opposed to another.

(Another oddity: Banner is incredibly sparing of -in' for present participles, though surely his characters would have this variant frequently, and indicates Auxiliary Reduction much less than his characters would use these variants. Maybe he's just opposed to apostrophes.)

And the book is full of general working-class vernacular features, not particularly regionally restricted. Most of these features were already noted by H. L. Mencken in The American Language (1919-48), and a fair number of them appear in many parts of the English-speaking world. Here are some that occur again and again in Smallest People: ain't; anyways; past tense done; accusative coordinate subject pronouns (Him and me had movies); multiple negation; determiner them (them guys); past form for past participle (have ran); and invariant singular in existentials (There was a lot of people there). Not every common working-class vernacular feature appears in Smallest People; I didn't catch any occurrences of hisself or theirselves, for instance. And not every one of Banner's characters uses these features a lot; the narrator of the title story has very few of them. But most of his characters are working class and talk like it.

Oh, and throw in some innovative non-standard usages that aren't particularly working class, like of used with exceptional degree modifiers (too Adj of a N); independent reflexive myself, as in Edgar and myself go way back; and transparent type of (these type of things).

What all these features (colloquial, working class, or innovative) have in common is that they're officially non-standard: schoolteachers and advice books on language tell you not to use them, at least in formal writing (though the prohibition often extends to any use at all, even in informal speech). Readers are sensitive to these features, having been taught that those who use them are sloppy, lazy, ignorant, uneducated, illogical, or just erroneous. The sort of people who live in trailer parks in hardscrabble areas and are just barely getting along (as, indeed, many of Banner's characters do). But these features also signal "jes' folks", a kind of down-homey earthiness and toughness (which can be admirable or scary, depending on the circumstances). Real People.

But what gives rise to the Local Color Illusion? Why should officially non-standard features be interpreted as characteristic of some region? Especially when most of these features are found all over the place (in the speech of rural Maine, Northern cities factory districts, and California's Central Valley, as well as the South and South Midlands) and in all sorts of social groups (the Pennsylvania Dutch, African Americans, Polish Americans, and Chicanos, as well as the descendants of the Scots-Irish).

I think that the crucial link is based in fact, but fact twisted by a heavy infusion of language ideology. The fact is that the official standard -- established formal standard written English -- is pretty much the same from place to place; there's a good bit of variability, but for the most part it's not regional. Now for the ideology, which comes in two parts: the belief that the official standard is the same everywhere, for all people, in all contexts (denying the variability I just mentioned); and the companion belief that language that is not officially standard is a deviation from this standard, and so can be expected to differ from it in any number of ways, different ways in different places, for different groups of people. The picture is of a uniform official standard with a welter of "dialects", each with its own characteristic features -- quaint, colorful, grating, charming, or alarming, as the case may be. (Such pictures can occasionally be found in textbooks.) Variability in the official standard is underestimated, and diversity in the "dialects" seriously overestimated. As a result, divergences from the official standard "sound local".

The Local Color Illusion can be very strong. I've had people I grew up with in Pennsylvania Dutch country describe the local English for me (after all, I'm a Linguist), beginning with a few well-known localisms like doppich 'clumsy' and strivvely '[of hair] unruly' and quickly turning to a list of pan-U.S. non-standard features, like ain't, multiple negation, determiner them, and their sisters and their cousins and their aunts. Now, these people are aware that any particular feature might be used outside Pennsylvania Dutch country -- that multiple negation, say, is used by blacks and Southerners and lots of other people -- but they treat each item separately, so it's remarkable to them that all these features come together locally. Officially standard features hang together in a package, which is taught to you in school, but all the rest crop up one by one, on the street, so to speak, and no one expects them to make a package. But, actually, some of them do.

And so we see Blaustein, confronted with features that, with a tiny number of exceptions, could be found anywhere from rural Maine to California's Central Valley, hearing them as wonderfully evocative of Ohio, the place where Banner's stories happen to be set (reasonably enough, since this is the area Banner is most familiar with).

One more piece of ideology (already alluded to above): ordinary people tend to believe not only that the official standard is uniform, but that this uniformity is created and maintained by schooling; without this institutional support, the language would dissolve into the chaos of the "dialects". But in fact features spread in pretty much the same way in all social groups, working largely outside the strictures of the schoolroom and producing a certain degree of conformity within social classes, ethnic groups, regions, and so on. I picked up my variety of English on the street, too, by association with, and identification with, other people. So it's no great surprise that my English is a lot like that of academic colleagues who grew up in Los Angeles or Chicago or Dallas. No more surprise than that Banner's Ohio characters sound a lot like working-class people in Los Angeles or Chicago or Dallas.

zwicky at-sign csli dot stanford dot edu

Posted by Arnold Zwicky at 12:55 PM

Talking about whom you are and who you're seeking

James E. McGreevey said farewell yesterday, as he stepped down from the governorship of New Jersey. I haven't been able to find an audio version of his speech, but the published text includes some striking images. My favorite: "We smile in person and then throw each other under the bus when we leave the room". This is a case where a diagram might be helpful, as Geoff Pullum has suggested for Dan Brown's descriptions.

The speech also featured a grammatical innovation (emphasis added):

To be clear, I am not apologizing for being a gay American, but rather, for having let personal feelings impact my decision-making and for not having had the courage to be open about whom I was.

Here whom is being treated like the object of about. A web search shows that others feel the same way --

Or checking into the hotel that night and not having to say a word about whom I was or what I was doing - they knew already.
I thought a moment of every teaching I could think of about whom I was and then responded. "Some say I am Spirit or Soul. Is that who I am?".
I'll tell them about whom I was, what I wanted to be, and maybe even about the wife, mother, and woman I still hope to become.

though it is still a minority view:

"about __ I am and"
"about __ he|she is and"
"about __ we|you|they are and"

I'm glad I checked, since I happened on a review of Yahoo Personals, at, containing this lovely sentence

In your ad, you can talk about whom you are and who you're seeking.

which suggests that the active principle is stringwise adjacency to the preposition. The standard view would be that the whom and who in this sentence should be swapped, but it seems that as this bit of dead morphology rots, new hypotheses about its function germinate.
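The kind of count behind the search frames above can be roughed out on any collection of text. What follows is only a minimal sketch in Python, not the web search that was actually run: the function name, the regular expression, and the third sample sentence are assumptions made up for illustration, and the pattern covers just the narrow frame "about" + who/whom + subject pronoun + a form of be.

```python
import re
from collections import Counter

# Frame: "about" + who/whom + subject pronoun + a form of "be".
# The pattern and the function name are illustrative assumptions.
FRAME = re.compile(
    r"\babout\s+(who|whom)\s+(?:I|you|he|she|we|they)\s+(?:am|is|are|was|were)\b",
    re.IGNORECASE,
)

def count_who_whom(texts):
    """Count 'who' versus 'whom' inside the frame, over an iterable of strings."""
    counts = Counter()
    for text in texts:
        for match in FRAME.finditer(text):
            counts[match.group(1).lower()] += 1
    return counts

samples = [
    # Quoted in the post:
    "for not having had the courage to be open about whom I was.",
    "In your ad, you can talk about whom you are and who you're seeking.",
    # Invented for contrast (not from any source):
    "I'll tell them about who I was.",
]
print(count_who_whom(samples))
```

Note that the second "who" in the Yahoo Personals sentence is not counted, since it does not directly follow "about"; that is exactly the stringwise-adjacency distinction at issue.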

Though perhaps if we could search enough 18th century writing, as produced by the equivalents of Gov. McGreevey and, we might find examples of the same sort of thing, I don't know.

[Update: there's a video stream for McGreevey's farewell at CSPAN.]

[Update #2: Matt Weiner blogged about this, citing examples like "I lost the person whom I was and the more time goes by, the more I believe that the person I was is lost to me forever now".

Matt is absolutely right that some people use whom without any plausible grammatical pretext at all. That's the premise of James Thurber's joke ("'Whom' should be used in the nominative case only when a note of dignity or austerity is desired"), and the conclusion of Geoff Pullum's illustrated obituary. My point in reference to Gov. McGreevey's use was just that adjacency to a preposition seems to make the insertion of whom somewhat more likely, presumably because in many cases, the wh-word would really be the object of the preposition. ]


Posted by Mark Liberman at 07:02 AM

November 08, 2004

Oxen, sharks, and insects: we need pictures

A number of readers have written asking me not to quote any more of Dan Brown's prose. I'm sorry, I know how you feel, but I have some... uh... angels and demons to exorcize. I have a few more linguistic observations about his wildlife similes and other imagery in both The Da Vinci Code and Angels and Demons.

You may recall that, as Dan Brown tells it in The Da Vinci Code, after Jacques Saunière staggered through the vaulted archway (catch up here and here if this means nothing to you), "the seventy-six-year-old man heaved the masterpiece toward himself until it tore from the wall and Saunière collapsed backward in a heap beneath the canvas." (Among other things, this led me to muse on whether a 76-year-old curator, entirely on his own, could possibly constitute a heap.) Well, I see that a new treat is in store for Dan Brown fans. The current issue of The New Yorker carries an advertisement for a recently released illustrated edition of The Da Vinci Code. A full-color illustrated edition with more than 150 pictures! If a picture is worth a thousand words, that's 150,000 words of value right there. The picture in the ad shows the book open to the first page, with the words quoted above clearly legible, and on the left-hand page opposite is a full-color frontispiece showing the masterpiece under which Saunière collapsed: Caravaggio's The Death of the Virgin.

I don't know yet what else they decided to include pictures of. I'm rather hoping there is one for the beginning of Chapter 4 of The Da Vinci Code, where it says, "Captain Bezu Fache carried himself like an angry ox." Moo! Grrr! I really crave a picture of that.

Maybe one day the earlier book, Angels and Demons, will have, along with its unauthorized guidebook, an illustrated edition, so I can see pictures corresponding to some of its deeply weird descriptions. Dan writes at one point that the constantly angry Commander Olivetti of the Vatican Guard "entered the room like a rocket". (Would a rocket be better or worse to be in a room with than an angry ox? You can see how a picture might help.) At another point in the novel someone says something Olivetti doesn't like, and we read that "His eyes went white, like a shark about to attack." Now, as my friend Ari Kahan reminds me, this isn't quite right: a shark rolls its eyes up for protection not before blundering into its attack unable to see, but at the instant of the attack itself: if you see the eyes go white, it may be too late for you: count your arms and legs.

But actually the shark stuff may not matter very much, because it soon turns out that they're not shark eyes at all: we read that Olivetti says something with "his insect eyes flashing with rage."

So I'm trying to picture him mentally. A rocket? A blind shark? An enraged insect? When Dan Brown is doing the describing, you really need pictures.

Posted by Geoffrey K. Pullum at 07:04 PM

Snoot? Bluck.

David Foster Wallace is a snoot as well as an exceptionally long-winded novelist, "snoot" being his own made-up word for "a really extreme usage fanatic". For a thorough airing of Wallace's attitudes in this arena, see his 4/2001 review of Garner's Dictionary of Modern American Usage, and Language Hat's critique (scroll down for the DFW bits).

I've noticed over the years that snoots often like to make up words, and I've wondered why people who value traditional usage so highly are also so open to lexical innovation. The paradox evaporated when I realized that the snootish impulse is not a defense of the community's traditions, it's an assertion of linguistic ego. And what could be more egocentric than inventing new words?

This also explains why snoots are never scholars. At least, their snootish outpourings are never based on scholarly investigation and analysis, even if they have some scholarly credentials in other aspects of their intellectual life. The reason is simple: scholarship subordinates the self, at least temporarily, to an investigation of external fact, while the snootish posture immediately asserts the primacy of the self's linguistic judgments. Snoots routinely invoke both the authority of tradition and the dictates of logic, but these are ex post facto rhetorical justifications, not the conclusions of a dispassionate analysis.

This insight helps explain why snootish assertions are so often factually mistaken, even when they deal with matters that are easily checked. How could Cullen Murphy, John Powers and Sidney Goldberg, all intelligent and literate individuals, embarrass themselves by fulminating in national publications about lexical and grammatical issues where their claims are simply wrong? Why is DFW's usage rant so full of elementary usage mistakes? The usual explanation is that our society has failed to give its intellectuals an adequate linguistic education. That's true, but it's not enough. Murphy and Powers are surely capable of understanding the glosses and example sentences in the OED, and Goldberg could have had his assistant phone up an expert on English syntax, as he would probably have done if he were ranting about some medical or economic or legal matter. Wallace could have checked his dictionary citations, spent a few hours on the internet or in the library getting his terminology right, and so on. Snoots don't check lexicographic and grammatical facts because their complaints are about subjective pain, not about objective facts of usage. Though they masquerade as defense of social norms, such screeds are really the howls of a wounded self, demanding primacy.

I was reminded of all this when I read Wallace's review in yesterday's NYT of Edwin Williamson's "Borges: A Life", and came across this passage (emphasis added):

Williamson's claim ... that in 1934, ''after his definitive rejection by Norah Lange, Borges . . . came to the brink of killing himself'' is based entirely on two tiny pieces of contemporaneous fiction in which the protagonists struggle with suicide. Not only is this a bizarre way to read and reason ... but Williamson seems to believe that it licenses him to make all sorts of dubious, humiliating claims about Borges's interior life: ''A poem called 'The Cyclical Night' . . . which he published in La Nacion on October 6, reveals him to be in the throes of a personal crisis''; ''In the extracts from this unfinished poem . . . we can see that the reason for wishing to commit suicide was literary failure, stemming ultimately from sexual self-doubt.'' Bluck.


This is not a reference to the religious writer Bluck, or the psychologist Bluck, or any of the other Blucks known to others. The context makes it clear that this is an invented expression of disgust, a blend of yuck and blecch, evocative of vomit.

The traditional norms of formal writing, at least in English and other languages that I can read, frown on the use of orthographically-rendered expressive noises, even ones like yuck that have made it into the dictionary. If you check out Google News or Nexis search for yuck, you'll find it mainly in quotations, in sports stories, in columnists' informal rants, and in self-conscious meta-uses like this 11/7/2004 review by Elizabeth Egan of the new McLaughlin and Kraus novel, Citizen Girl:

She is known simply as Girl, a gimmicky moniker whose cringeworthiness is compounded by the appearances of characters named Buster and Guy. Yuck. (And since we're in the world of chick lit, it's okay to say that.)

Well, neither Borges nor Williamson is exactly chick lit, but Wallace doesn't feel the need to apologize for his Bluck, as Egan did for her Yuck. And if anyone has ever used "Bluck" before in print to spell out a disgust-sound, neither the OED nor Google nor A9 can find it for me. Invented expressive-noise spellings are common in cartoons the world over, but it's a recent stylistic innovation to extend this feature into formal essays and reviews. And this is an innovation that puts the individual's linguistic impulses ahead of the community's linguistic traditions, which is precisely what snoots purport to deplore.

Let me make it clear that I'm being descriptive here, not prescriptive. If writers want to introduce cartoon stylings into formal essays, that's fine with me. Nor do I have any quarrel with Wallace's evaluation of Williamson. My point is just the superficial incongruity -- and deep connection -- between Wallace the Snoot and the Wallace who Blucks.

[Update: Mike Pope wrote to claim documented lexicographic validity for bluck, on the basis of Barbara Park's immortal work "Junie B. Jones and the Yucky Blucky Fruitcake". Now, I might point out that this work follows in the pattern established by "Junie B. Jones and Some Sneaky Peeky Spying", not to mention "Junie B. Jones and the Mushy Gushy Valentime" [sic], and I might suggest that if Ms. Park should follow with "Junie B. Jones and the Language Banguage Maven Baven", we might wait a bit before putting banguage and baven into the dictionary.

But I won't. Instead, I'll admit that Mike is probably right: bluck is an obvious back-formation from blucky, which has 2,910 hits on Google, a few of which are not references to Ms. Park's work, but are real uses like "felt blucky today" or "dark, blucky and people stuck in traffic for miles." This kind of coinage and derivation is a typical process in the creative evolution of language, and is (therefore) exactly the sort of thing that snoots like to deprecate. ]

[Update #2: Neal Whitman came up with the same citation independently, and sent this link to a .jpg of the book cover. ]


Posted by Mark Liberman at 07:40 AM

November 07, 2004

Reintroducing Diagramming

Not long ago I commented on a nostalgic essay on diagramming sentences, an exercise that, along with grammar instruction in general, disappeared from most schools with the advent of the "whole language" trend and the idea that children learn grammar best without any explicit instruction in the context of reading and writing. Well, for some people it isn't just nostalgia. The Education Life section of today's New York Times contains an article about a move to reintroduce diagramming.

The idea of reintroducing grammar as a subject in schools is a good one. It's good for students to acquire insight into the structure of their own language and to be able to understand how it differs from others, both foreign languages, and the formal, written variety of English to the mastery of which so much of education is devoted. I fear, however, that reintroducing diagramming may not be the best way to do this. As we learn from Mark's discussion of the history of diagramming, it reflects a 19th century approach to syntax. A lot has been learned about syntax since then, especially in the last fifty years. One rather obvious defect of traditional diagrams is that they reflect only constituency; there are no category labels. The main problem, though, is not so much the diagrams themselves as the lack of any explicit procedure for constructing them or any principles for determining whether a diagram is a valid representation of a particular sentence. Even this explanation of how to diagram consists merely of a bunch of examples. Mark's description of his experience meshes with my own brief exposure to diagramming in grade school. The teacher didn't give any real explanation of what the principles were or how to choose between alternatives.
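To make the point about category labels concrete, here is a small sketch in Python. It does not model traditional diagrams themselves; it only contrasts a bare grouping of words with the same grouping plus labels. The sample sentence, the category names (S, NP, VP, Det, N, V), and the function names are all assumptions chosen for illustration.

```python
# A labeled constituent is (category, child, child, ...); a word is a plain string.
# Category names and the sample sentence are illustrative assumptions.
labeled = ("S",
           ("NP", ("Det", "the"), ("N", "teacher")),
           ("VP", ("V", "diagrammed"),
                  ("NP", ("Det", "the"), ("N", "sentence"))))

def strip_labels(node):
    """Drop category labels, keeping only the grouping of words."""
    if isinstance(node, str):
        return node
    children = [strip_labels(child) for child in node[1:]]
    # A constituent with a single child collapses to that child.
    return children[0] if len(children) == 1 else children

def bracket(node):
    """Render a labeled tree as a labeled bracketing string."""
    if isinstance(node, str):
        return node
    return "[" + node[0] + " " + " ".join(bracket(c) for c in node[1:]) + "]"

print(strip_labels(labeled))  # grouping only
print(bracket(labeled))       # grouping plus category labels
```

The unlabeled version shows which words group together but not that "the teacher" and "the sentence" are the same kind of unit; the labeled version says so explicitly.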

It is of course possible that the people reintroducing diagramming are using it as part of a more sophisticated grammar curriculum, but I am skeptical. In my experience, few English teachers know anything at all either about English grammar or about linguistics. This isn't really their fault - most of them never learned any grammar in school themselves and nothing in their training as teachers would have suggested to them that they ought to know either grammar or linguistics. Nonetheless, it means that they are ill-prepared to teach about grammar and even more so to develop a curriculum. I hope that the people working to reintroduce grammar will avoid clinging to tradition and instead introduce an approach that reflects current knowledge of syntax and presents the study of syntax as an empirical, scientific activity.

The late Ken Hale was a proponent of the idea that speakers of minority languages could benefit from studying the structure of their own language empirically. He actually wrote a highschool-level textbook of Navajo linguistics aimed at helping speakers of Navajo to discover the structure of their language. One reason for doing this is the hope that it will increase their appreciation for the richness of their language and engender pride in it. Another is that studying the structure of one's own language provides an introduction to the scientific method that requires no laboratory, equipment, or materials and that is perhaps less foreign and threatening than laboratory science is for many students. Speakers of standard English can derive many of the same benefits. Wayne O'Neil and Maya Honda have carried on with this idea. Wouldn't it be wonderful if instead of merely reintroducing a musty 19th century tool for teaching formal, written English, schools began to teach students what languages are really like and how to find out for themselves?

Posted by Bill Poser at 07:10 PM

Renowned author Dan Brown staggered through his formulaic opening sentence

I promised to supply here from time to time a few more of the points about Dan Brown's writing that I didn't have space to talk about in the article I contributed to Secrets of Angels and Demons. So here's one. It is truly strange that Dan Brown began his first novel with exactly the same construction that made the opening of his better-known The Da Vinci Code so weird.

I can be quite precise about the description of that construction: an occupational term is used with no determiner as a bare role NP premodifier of a proper name. (The name is borne, moreover, by an elderly Catholic man speaking a Romance language, who has just suffered an excruciatingly painful attack and will be dead within a quarter of an hour.) This odd formula makes the openings of Dan Brown's two novels about Catholic skullduggery eerily similar. Here's the first sentence of The Da Vinci Code (which I wrote about before):

Renowned curator Jacques Saunière staggered through the vaulted archway of the museum's Grand Gallery.

And here's the first sentence of Angels and Demons:

Physicist Leonardo Vetra smelled burning flesh, and he knew it was his own.

This use of a person's name preceded by the name of a job, without a preceding article (an anarthrous NP, as we grammarians say when chatting with our own kind in the secretive cabals that we sometimes hold) is odd because occupational descriptions like "fertilizer salesman" aren't normally used as titles. "Cardinal" is a title; selling fertilizer is merely a job. It is true that noun phrases like fertilizer salesman Scott Peterson are found in newspaper articles (in fact John Cowan points out to me that it is a well-known feature of the style associated with Time magazine), but I have never yet found anyone but Dan Brown using this construction to open a work of fiction. The construction sounds to me like the opening of an obituary rather than an action sequence. It's not ungrammatical; it just has the wrong feel and style for a novel.
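For readers who would like to hunt for this construction in opening sentences themselves, the description above can be turned into a crude check. This is only a heuristic sketch in Python: the short occupation list, the determiner list, and the function name are hypothetical, and "two capitalized words" is a rough stand-in for "proper name".

```python
# Hypothetical, tiny list of occupational nouns; a real search would need far more.
OCCUPATIONS = ["curator", "physicist", "geologist", "linguist", "fertilizer salesman"]
DETERMINERS = {"the", "a", "an"}

def opens_with_bare_role(sentence):
    """True if the sentence opens with (modifier +) occupation + two capitalized words,
    with no determiner in front of the occupational noun."""
    words = sentence.split()
    # start=0: occupation comes first; start=1: one leading modifier such as "Renowned".
    for start in (0, 1):
        if start == 1 and words and words[0].lower() in DETERMINERS:
            continue  # "The physicist ..." has an article, so it is not anarthrous
        for occupation in OCCUPATIONS:
            n = len(occupation.split())
            candidate = " ".join(words[start:start + n]).lower()
            following = words[start + n:start + n + 2]
            if (candidate == occupation and len(following) == 2
                    and all(w[0].isupper() for w in following)):
                return True
    return False

for opening in [
    "Renowned curator Jacques Saunière staggered through the vaulted archway.",
    "Physicist Leonardo Vetra smelled burning flesh, and he knew it was his own.",
    "The physicist Leonardo Vetra smelled burning flesh.",
]:
    print(opens_with_bare_role(opening), opening)
```

The third sentence is an invented contrast case with an article added. A check this blunt will miss plenty and will also flag newspaper-style prose, which is rather the point.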

I didn't really begin to worry about Dan being stuck in a rut, though, until I took a look at the first chapter of yet another of his novels, Deception Point, which is reproduced at the end of Angels and Demons as an advertisement, and found the same construction used yet again right at the beginning of the first chapter, albeit with one short sentence preceding it:

Death, in this forsaken place, could come in countless forms. Geologist Charles Brophy had endured the savage splendor of this terrain for years, and yet nothing could prepare him for a fate as barbarous and unnatural as the one about to befall him.

Once again the strange anarthrous use of an occupational noun. And of course, Geologist Charles Brophy is dead meat. The simple fact is that if you are ever mentioned on page 1 of a Dan Brown novel you will be mentioned with an anarthrous occupational nominal premodifier ("Renowned linguist Geoff Pullum staggered across the savage splendor of the forsaken Santa Cruz campus, struggling to remove the knife plunged unnaturally into his back by a barbarous millionaire novelist"), and you will have died a painful and horrible death by page 2, along with several curiously ill-chosen clichés and mangled idioms.

Posted by Geoffrey K. Pullum at 06:08 PM

Climb that iceberg

Last spring, I noted that Ben Roethlisberger described himself as "starting to hit the iceberg", meaning that his skills as a quarterback were just starting to develop. This afternoon, as the Pittsburgh Steelers' quarterback, Roethlisberger is dumping that iceberg right on top of the Philadelphia Eagles. At the end of the first half: Pittsburgh 21, Philadelphia 3. Roethlisberger 7/11, 102 yards, 2 TDs. I wonder if Justin Busch at Semantic Compositions has any second thoughts about his evaluation of San Diego's draft choice.

Posted by Mark Liberman at 02:22 PM

C-word fuss in Chi-town

Today's NPR On the Media story (streaming audio or transcript here) on the Chicago Tribune's 10/26 censorship of Lisa Bertagnoli's piece on semantic bleaching of the C-word:

It begins with C, rhymes with grunt, and refers to female anatomy. And its origins and usage were the subject of a freelancer’s story in the Chicago Tribune last week. But when editors decided that the story shouldn’t run after all, it was already too late, and staffers had to remove an entire section from hundreds of thousands of newspapers by hand. The word never actually appeared in the story, nor will you hear it in this one, from Chicago Public Radio's Diantha Parker.

Other links: Bertagnoli's blurb at her agent's site; Chicago Sun-Times story, Romenesko blog entry, Ms. Magazine blog entries (here, here), Michael Miner's description of his TV panel discussion of the controversy, in which he points out the very different reaction to Jon Stewart's famous take-down of Tucker Carlson: "You know what's interesting, though? You're as big a dick on your show as you are on any show."; the 355 cunts at the Guardian. A few more random blog entries: Hi Res : Wide Screen, The Lincoln Plawg.

I haven't been able to find a copy of Bertagnoli's original article, although apparently some copies leaked out, and the Ms. blog explicitly asks for one. My faith in the power of the blogosphere is shaken.


Posted by Mark Liberman at 12:42 PM

Theory of mind on Mars

In today's NYT, there's an article by Kenneth Chang on Honeybee Robotics, located in a Lower Manhattan office, which controls the rock-examination activities of the Spirit and Opportunity rovers on Mars.

The scientists name not only every rock, but also separate locations on each rock that they examine. Earlier in the mission, true to the New York twist on this mission, the Honeybee engineers were able to assign the name New York to a grinding site on a rock named Mazatzal.

When the scientists wanted a deeper look into the rock, they changed the orientation of the grinding tool and renamed the site Brooklyn. "Somebody said it was the same target as New York but with an attitude adjustment," Mr. Bartlett said. (This being a scientific joke, it required a footnote: " 'attitude,' " Mr. Bartlett said, "can mean angle or spacecraft pointing.")

This is a joke with a few lexicographical layers.

According to the OED, the astro/aero meaning for attitude seems to have arisen just about as soon as flying did:

1910 R. FERRIS How it Flies 455 Attitude, the position of a plane as related to the line of its travel.

The original English sense for attitude was apparently "the ‘disposition’ of a figure in statuary or painting", as in

1695 DRYDEN Dufresnoy's Art of Painting §4 The business of a painter in his choice of attitudes.

But artists, like other humans, interpret postures as outward and physical signs of inward and spiritual things, and so the meaning was extended to "[a] posture of the body proper to, or implying, some action or mental state assumed by human beings or animals", as in the common phrase "to strike an attitude : to assume it theatrically, and not as the unstudied expression of action or passion".

1775 HARRIS Philos. Arrangem. (1841) 346 These various positions peculiar to animal bodies, and to the human above the rest, (commonly known by the name of attitudes).
1883 J. GILMOUR Mongols xviii. 211 You will find him..striking pious attitudes at every new object of reverence.

and at the same time, people began to refer metaphorically to "attitudes of mind" lacking any necessary bodily expression at all:

1862 H. SPENCER First Princ. I. i. §1. 4 Much depends on the attitude of mind we preserve while listening to, or taking part in, the controversy.

and then to use the word attitude alone to mean "attitude of mind":

a1873 MILL Three Ess. Relig. (1874) 126 Along with this change in the moral attitude of thoughtful unbelievers towards the religious ideas of mankind, a corresponding difference has manifested itself in their intellectual attitude.

Ironically, it was soon possible to use attitude to refer to the mental state associated with a physical manifestation, as in this sentence about the "attitudes" behind pitch modulations in speaking:

1922 H. E. PALMER Eng. Intonation p. viii, We all recognize immediately..each of the attitudes associated with the tones.

By 1960 or so, attitude referred in some contexts to various negatively-evaluated mental states:

1962 MAURER & VOGEL Narcotics & Narcotic Addiction (ed. 2) 289/2 (Gloss.) Attitude, hostile or aloof and uncooperative.
1974 H. L. FOSTER Ribbin' iv. 169 Attitude, to get mad without a good reason.
1975 WENTWORTH & FLEXNER Dict. Amer. Slang Suppl. 673/2 Attitude,..a resentful, hostile manner, either toward people in general or toward a specific group. A person who ‘catches a quick attitude’ is one who is easily angered and ready to fight. Mostly black and prison use.

I suspect that this arose from the terminology of psychologists and social workers in contact with hospital patients, prisoners and so on. In such contexts, an attitude worthy of comment is probably an attitude that is causing problems. When I was in secondary school in the early 1960s, "bad attitude" was a common enough diagnosis of disciplinary problems that it was shortened among students to "bad ad".

I don't remember the word attitude being current among college students in the later 1960s, but when I was in the U.S. Army around 1970, attitude definitely meant "bad attitude", and attitude adjustment was an ironic way to refer to authoritative punishment or retaliation. I don't know whether the term was actually used by people in the chain of command, but I remember someone saying that so-and-so was "in need of a serious attitude adjustment", as a way of justifying a fight, and I believe that "attitude adjustment" was one of the names that I saw written on the side of a rocket launcher (though maybe this is one that someone else told me about).

I can imagine that "attitude adjustment" was a 1960s-era psycho-jargon euphemism for punishment, roughly in the mode of the famous line "what we have here is a failure to communicate", said by Strother Martin's character in the 1967 movie Cool Hand Luke. However, I haven't been able to find any concrete evidence for this, and it's possible that it was always a joke from the beginning.

The phrase "attitude adjustment" doesn't seem to occur in the text of the OED. The earliest citation that I can find in Nexis for the joke-psychological usage is from the Washington Post for January 4, 1980, in a note at the end of a piece in the financial section by Bill Gold:


Colleague Mike Kernan reports that Mr. Henry's on lower Wisconsin Avenue, does not have a Happy Hour.

From 4 p.m. until 8 p.m., it has an "Attitude Adjustment Period."

On the other hand, the earliest citation I found for the aero/astro usage was from May 5, 1975, in an article by Erwin J. Bulban, Aviation Week & Space Technology:

Actual docking requires only minor attitude adjustment maneuvers on the part of Soyuz, to align spacecraft antenna to insure optimum communications and telemetry data transmission during the exercise.

And I'm pretty confident that both usages date from the 1960s, though apparently not within the ken of Nexis.

Anyhow, the use of attitude for negatively-evaluated (i.e. aggressive, rebellious) attitude has meanwhile undergone the usual development into a positively-evaluated version of exactly the same behaviors (i.e. assertive, independent, irreverent). And the Honeybee Robotics joke (that Brooklyn is New York with an attitude adjustment) then depends on a compositional re-interpretation of this phrase, so that it's no longer a euphemism for punishment, and becomes instead a way of talking about personal growth or cultural variation.

The funny thing is, attitude comes from the same source as aptitude:

1668 J. E[VELYN] tr. Freart's Perf. Peinture Advt., Though we retain the words, Action and Posture..the tearm Aptitude [F. attitude] is more expressive. And it were better to say the Disposition of a Dead Corps than the Posture of it, which seems a Tearm too gross; nor were it to speak like a Painter, to say, this Figure is in an handsome Posture, but in a graceful Disposition and Aptitude [F. attitude]. The Italians say Attitudine.

and both come from Latin apto, frequentative form of apo, meaning "to fit, adapt, accommodate, apply, put on, adjust, etc." So in some sense an attitude adjustment is an adjustment adjustment. Of course, like disposition, the word adjustment also has psychological as well as physical interpretations.

Finally, here's a pretty good joke about attitude adjustment and animal communication.


Posted by Mark Liberman at 12:36 PM

November 06, 2004

Language Quiz II: The answer

See below. (Patrick Hall pointed out that for people who don't get to the quiz on the first day, it's unfair to put the answer on the blog's index page. So don't read the rest of this post unless you want the answer.)

The language is Somali.

The text is the title and the first two sentences from this page (in standard Somali orthography):

Colaadda gurigu waa mid waxyeelleynaysa qoyska oo dhan.
Haddaad u malaynaysid in dumarku badanaaba halis u yihiin colaad am waxyeelo uga imaaneysa banaanka ka baxsan ammaanka guriga, waad ku qaldantahay.
Waxay u badan tahay in dumarka lagu weeraro guryahooda ayna weeraraan dadka ay la nool yihiin.

whose English version is here:

Domestic violence hurts the whole family.
If you think women are most at risk from violence out on the streets away from the safety of home, you’d be mistaken.
Women are more likely to be attacked in their own houses by people they live with.

The rest of the Somali audio version can be found here in mp3 form.

The European relevance is to the recent killing of Theo van Gogh, in retaliation for his film on Muslim treatment of women, co-produced with Dutch MP Ayaan Hirsi Ali, who is Somali by origin. The five-page note tacked to van Gogh's body with a knife was addressed to her, though the killers were North Africans rather than Somalis.

In 1998, I taught a field methods course in which we worked on Somali, and I was somewhat taken aback by one of the proverbs that we learned:

 Naag   ha             kaga     jirto       guri  ama  god  
 woman [optative AUX] in-in  stay+FEM+OPT   home  or   grave

"A woman's place is in the home or in the grave".

I was never quite sure how to square that one with another Somali proverb: Kunka koodi kownaka guurso; "A thousand assignations, one marriage."

Interesting grammatical note: the wordform kaga is a preposition cluster, in this case (I believe) equivalent to ku+ku, that is, two copies of the preposition ku meaning "in, into, on, at, with (by means of)". As John Saeed explains in his Somali Reference Grammar (p. 206, 2nd edition): "Where there are several locative prepositions in a sentence they all occur before the verb and merge into a cluster, with some accompanying sound changes". More on this (from the same 1998 field methods course) is here.

I'll post something more about Somali phonology, phonetics and orthography at some point -- for now I should just say that the letter "c" is used for a voiced pharyngeal "fricative" (where the scare quotes mean that it is often frictionless).

[Update: several people got this one -- often by interesting techniques.

For example, Jonathan Mayhew at Bemsha Swing used broad phonetic transcription and Google -- as he wrote by email:

Olad de guruga, o a mitwa fui elenis´ hoiska odan.
Hadad uma laysit, in dumarka padanaava, haleese ulaheen ...

Is it Somali? A google search for the word "ulaheen" brings up texts in this language. Also, Somali has tones, double vowels, glottal stops, and complex dipthongs--all of which I imagine I'm hearing here.

I'm probably way off. It's fun to guess anyway. Mitwa could be a Hindi word, Elenisa the first name of a Brazilian woman.

This was an example of true poetic inspiration, since the actual sequence that Jonathan transcribes as "haleese ulaheen" is "halis u yihiin" ("they are most in danger") in standard Somali orthography, and in IPA roughly /halis u jihi:n/. The fact that "ulaheen" turns up pages in Somali is a happy accident -- it's a completely different word, in which the "ee" is pronounced more like French é.

Chris Waigl got the right answer too, by more conventionally linguistic means, and documented her efforts in interesting detail here.

Stefano Taschini, who contributed the audio for the first quiz, emailed without explanation that

My very wild guess is Somali.

and of course his guess was absolutely right. I'm impressed.]

[Update #2: Sauvage Noble has an excellent transcription and much interesting discussion, but was not able to come up with an ID for the language. ]

[Update #3: Stefano Taschini emailed his method:

First, I used the "Amazing Slow Downer" to listen to the audio clips at half speed. What struck me is the retroflexed 'd' at the end of the first clip. Since that sound is often transcribed in latin alphabet as 'dh' when occurring in Hindi, I thought to look up the word 'odhan' on Google, and that returned a number of pages in Somali. What I could find about that language afterwards seemed to be consistent with the clips.

As with Jonathan Mayhew's method, this one involved a happy accident as one step: "qoyska oo dhan" means "the whole family" -- literally "family-the REL complete" = "the family, all of it"; the word "odhan" is the verb "to say", which is completely unconnected, as far as I know, and doesn't occur in that phrase at all...

The key thing, though, was that once the hypothesis of Somali was on the table, Stefano was able to find evidence by Googling texts for the identity of the language! ]


Posted by Mark Liberman at 10:53 PM

November 05, 2004

Language Quiz II

Many people liked our last language quiz, so here's another one. Can you provide a broad phonetic transcription for these three sentences (#1, #2, #3)? Can you identify the language?

Answers tomorrow.

(This one is a bit hard, so here's a hint -- it's directly relevant to a recent European event.)

Posted by Mark Liberman at 03:01 PM

November 04, 2004

Dan Brown still moving very briskly about

Despite having endured the pain of reading The Da Vinci Code (I wrote about it here), I have to confess to the (doubtless very surprised) readers of Language Log that I have been reading another Dan Brown novel. Only this time, with a purpose: I was invited to contribute a short piece about the use of language in Angels and Demons for a forthcoming companion book, Secrets of Angels and Demons: The Unauthorized Guide to the Best-Selling Novel (ed. by Dan Burstein and Arne de Keijzer, CDS Books, due to be published on December 1, 2005).

Dan actually wrote and published Angels and Demons earlier than The Da Vinci Code, in 2000 — it was re-released to enjoy a new life after Code made him a literary superstar. (He surely is; he is selling so many books now that he need never work again.) Angels and Demons is by no means a disappointment for those seeking a feast of ill-chosen word combinations, unintendedly bizarre similes, unnoticed self-contradictions, and occasional good old-fashioned sentence-mangling. In fact my only disappointment has been that in the 2,000 words I'm allowed for my article I simply can't use all the choice examples that I amassed in my notes as I went through the book. But I could share a few with you here from time to time, if you'd like. Would you like that?

Most of the cases I dropped from the article are a bit more subtle than the ones I kept in. There are some crashing failures of ear, for example. When Vittoria Vetra learns of the death of the adoptive father who nurtured her interest in science, Dan Brown writes:

Genius, she thought. My father . . . Dad. Dead.

"Dad. Dead." Just as one should be feeling her pain, one winces instead at the ineptness of the jingle created by these two phonetically similar adjacent monosyllables. But it's hard to explain that to someone who just doesn't see it.

And it is harder still to explain briefly one utterly unintended literary allusion that made me smile. I didn't attempt to do it in the published piece for the book, but let me try and explain it to you. Early in the novel, Robert Langdon, the Harvard professor of religious symbology, has been whisked across the Atlantic to visit CERN, the great high-energy physics research laboratory near Geneva, and is shown into the main lobby. Dan Brown writes:

A handful of scientists moved briskly about, their footsteps echoing in the resonant space.

Now, you might see nothing wrong with that. But "moving briskly about" is a cliché, and it immediately put me in mind of another place I had seen it. In the 1950s, Stephen Potter produced a set of four or five inimitably British books on "gamesmanship" and its extension into everyday life, "lifemanship". They were mock-serious how-to books in rather academic style, purporting to tell you how to be one up on everyone else you came in contact with, whether in games and sports or in ordinary social interaction. A whole imaginary world was created: a headquarters and college at Yeovil in Somerset, and a slew of imaginary expert one-uppers: Gattling-Fenn, Pinson, Odoreida, Carraway, Offset, Brood (and sometimes one or two real people slipped in amongst them; for example, Potter's hardback publisher appears as "R. Hart Davis"). Potter catalogs minutely and hilariously their subtle techniques for making other people (including each other) feel socially at a disadvantage. Often those techniques are linguistic.

In Some Notes on Lifemanship (Rupert Hart Davis, 1950; Penguin paperback 1962) Potter speaks of "that great Lifeman Harry Gattling-Fenn, and his opening remarks": Gattling's remarks were designed to make people uneasy, to create "distrust, uncertainty, and broken flow" in conversations:

Gattling seemed permanently in the off-guard position. It was only by his opening remarks, his power of creating a sense of dis-ease, that one realized, as one used to say of him, that Gattling was always in play.

To a young person, for instance, who came to visit him he would say, genially of course, "Sit you down." Why was this putting off? Was it the tone? Then if the young man nervously took out a cigarette he would say, "Well, if you're smoking, I will."

He would say, "You want a wash, I expect," in a way which suggested that he had spotted two dirty fingernails. To people on the verge of middle age he would say, "You're looking very fit and young." To a definitely older man, of his still older wife he would comment that he was glad she "was still moving very briskly about."

The remark was of course intended to be deeply unsettling if not shattering: to say of someone that they are "moving very briskly about" implies that they are extraordinarily old and infirm, and it is a wonder they can even take a step without their walker. It simply isn't something you would normally say about ordinary people who have a spring in their step, or about scientists walking from one office to another in the foyer of a research center. It's a wonderful example of Dan Brown's knack for coming up with exactly the phrase not to use.

I'll post a few more of these little points that occurred to me from time to time. Or not: if you don't want to see any more, just think negative thoughts. I can't actually pick them up myself, but luckily I have a telepathic parrot who can.

Posted by Geoffrey K. Pullum at 05:35 PM

American Rhetoric

For those who haven't had enough recently... Michael Eidenmuller's American Rhetoric site has a list of the "top 100 speeches", according to "137 leading scholars of American public address", with transcripts for all but one, and audio for many. The main page has organized links to a "growing database of 5000+ full text, audio and video (streaming) versions of public speeches, sermons, legal proceedings, lectures, debates, interviews ... and a declaration or two".

If you're unhappy about how the election came out, you could listen to #33, William Faulkner on "the old verities and truths of the heart ... love and honor and pity and pride and compassion and sacrifice", or #73, Lou Gehrig's Farewell to Baseball ("... you have been reading about the bad break I got. Yet today I consider myself the luckiest man on the face of the earth"). You should avoid wandering over to the movie speeches section, where you might be tempted to listen to Aragorn's Battle Speech at the Black Gate. If you're pleased with the election results, you could listen to #25, Ronald Reagan's A Time for Choosing ("I'd like to suggest there is no such thing as a left or right. There's only an up or down."). And to the extent that your priorities are elsewhere, you could read #84, Ursula Le Guin's Left-Handed Commencement Address ("Our roots are in the dark; the earth is our country. Why did we look up for blessing -- instead of around, and down? What hope we have lies there. ... Not in the light that blinds, but in the dark that nourishes, where human beings grow human souls.")


Posted by Mark Liberman at 08:12 AM

November 02, 2004

A Cootchie-Cootchie-Coo Theory of Language Acquisition

In a continuing effort to avoid thinking about the election tonight, I've been reading this week's Chronicle of Higher Education (11/5/04). On p. A15 there's a `Verbatim' column by Daniel Engber that reports on an interview with one of the authors of a book on the evolution of language: The First Idea: How Symbols, Language, and Intelligence Evolved from Our Primate Ancestors to Modern Humans (Da Capo Press). Since the column consists of Engber's questions and the author's answers, it's probably safe to assume that the answers reflect the author's actual thoughts, not a reporter's distortion. Some of them seem a bit...odd.

The authors are Stanley I. Greenspan, a pediatric psychiatrist, and Stuart G. Shanker, a philosopher and psychologist. The interviewee is Shanker, and he's quite confident about their scenario for the origins of human language:

`Guys that looked at the question of the origins of language always sort of looked at a group of adults sitting around with a problem that they had to solve as a group, like capturing a bison.

`We have to look at this in a totally different way. We have to look at caregiver-infant interactions; that's where the origins of language lay.'

So now we know; what a relief! No more worries about a lack of evidence! This view echoes (in a way, but I don't think it's too big a stretch) the common view that language change is usually or always initiated and implemented by children during first-language acquisition. I certainly wouldn't deny that children are sometimes responsible for significant linguistic innovations -- the evidence from the emergence of the Nicaraguan Sign Language makes that quite clear, clearer (in my opinion) than the theory-internal evidence provided by some historical linguists. But there's also excellent evidence that adults are sometimes responsible for significant language changes, and therefore that it's not an all-or-nothing choice for the agents of change. Maybe both the cootchie-coo and the let's-go-get-that-mammoth scenario for the evolution of language are valid too; who knows? Pace Greenspan & Shanker, I don't see solid evidence either way. (O.K., I haven't read their book. But they're hardly the first people to speculate on how language evolved, or the first to compare human communication to communication systems of other primates. And direct evidence is not going to be available.)

In the column, Shanker goes on to say this about language acquisition:

`It turns out that the information [available to the child for acquisition] isn't chaotic at all. If you just focus on certain things it may seem that way. But an awful lot of preverbal emotional signaling and interactions are going on in the first two years of life, which lay the groundwork and the foundation for language.

`Language development is a fairly long process.'

Possibly there's an interpretation of this passage that makes sense. But it looks to me as if Shanker is suggesting, or even claiming, that kids wait a couple of years before doing much language learning, and that during those two years they're building a foundation for language learning in all that emotional signaling and interactions with caregivers. So what about the very good evidence that six-month-old infants have already learned the phonemic inventory of the language in their environment? And the evidence that, throughout their first two years, children do a lot of actual language learning?

Shanker then mentions the development of syntax during first-language acquisition:

`One of the puzzles we've had in psycholinguistics is, How does the kid get from being able to put two words together to being able to put together a simple sentence or even more complicated grammatical constructions?'

Shanker doesn't give a definition of `sentence' that would explain his apparent belief that two words can't make a sentence in Standard English (consider Catch it!). His definition would have to exclude child utterances like Give book!, which seems awfully sentence-like even if it's not grammatical in Standard English. Don't child-acquisition specialists analyze such utterances as sentences?

Posted by Sally Thomason at 09:35 PM

An Escape from Election News into Brahui

Lately I've been finding creative ways to take my mind off the political news, and one of them involved checking Dravidian references for a student. This took me to one of the books I inherited from my father, the atheist son of a Methodist missionary to British India -- specifically Baluchistan, which is now in Pakistan -- early in the 20th century. The book is Notes on the Study of the Brahui Language (Revised and Second Edition, Quetta, 1917), by Rai Bahadur Diwan Jamiat Rai, C.I.E., Extra Assistant Commissioner in Baluchistan. Almost half the book is devoted to lists of single sentences, in categories like `Colloquial Sentences' (e.g. `You shall suffer rigorous imprisonment for 8 months, pay a fine of Rs. 50 and in default suffer further imprisonment for 2 months') and `Riddles and proverbs' (e.g. `Do not trust a wife, a sword, a horse and a serpent' and `He will give it to you when wild sheep are shorn' [i.e. never]).

To someone interested in language contact, like me, the most interesting sentences are in a section called `Miscellaneous Sentences'. Here are some examples, with numbers as given in Rai's book:

272 There are slight differences in the language of each clan of the Brahuis.

274 The Brahui we Shahwanis speak is good one.

411 Do you know Brahui?

412 Certainly, I can understand both Brahui and Balochi.

413 Do you know Jagdali?

414 No, I do not know Jagdali at all.

415 That man talks good Brahui.

416 He talks corrupt Brahui.

482 That man talks Jagdali, I cannot understand him.

Baluchi is an Iranian language, and Jagdali is an Indic (a.k.a. Indo-Aryan) language. According to the 1992 edition of the Ethnologue, the three varieties of Baluchi are spoken by a total of 5,230,000 people in seven different countries (but mostly in Pakistan); as of 1987, Jagdali had `a few thousand' speakers, all in Baluchistan. Brahui, the northernmost Dravidian language, has 1,710,000 speakers in three countries, but mostly in Pakistan. Assuming that the relative sizes of the speech communities were roughly similar in 1917, Rai's slight emphasis on Jagdali seems a bit surprising: two of his other sentences concern a sick nephew who declines to accept his uncle's advice to go to the hospital because he (the nephew) doesn't speak Jagdali and therefore wouldn't expect to get adequate treatment there. Maybe Jagdali speakers were rich and powerful in 1917 and/or especially inclined to populate major medical establishments? I could check up on this. Maybe I will, if the results of today's election make me want to escape to a different century.

Posted by Sally Thomason at 08:04 PM

Pronouncing solemnly

Walking back from the market, late this afternoon, with stuff for the election-night party, I passed a group of students standing around the center of campus with Kerry-Edwards signs. Another group of students passed by in the other direction.

1st passing student:    Go Bush!
1st standing student:   F*** Bush!
2nd passing student:    Four more years!
2nd standing student:   Four more hours, dude!
(all laugh)

Well, we'll see.

Obligatory linguistic content: the word vote is from Latin vovere 'vow, solemnly promise', past participle votum 'vow, pledge'. In the 16th century, according to the OED, vote could be used Latin-wise to mean "a vow, a solemn promise or undertaking", and then the closely related idea "a prayer or intercession" or "a petition, a request":

1626 B. JONSON Fort. Isles, Song Wks. (Rtldg.) 651/1 All the heavens consent, With harmony to tune their notes, In answer to the public votes, That for it up were sent.  

For a little while in the middle of the 17th century it meant "an aspiration; an ardent wish or desire":

1667 Decay Chr. Piety v. p. 29 To breath out Moses's wish, O that men were wise; or if that be too hopeless a vote, O that men were not so destructively foolish.

However, the modern meaning "an indication, by some approved method, of one's opinion or choice on a matter under discussion; an intimation that one approves or disapproves, accepts or rejects, a proposal, motion, candidate for office, or the like", is apparently even older in English:

c1460 in Liber Pluscardensis (Skene) I. 394 Be eleccioune chosin men of gude,..Quhilkis has the votis of al the commonis hale.
1552 in Rec. Convent. Roy. Burghs (1870) I. 3 To woit about throw that haill nowmer,..and he that gettis monyest wottis to be chosin and sworn incontinent.
a1578 LINDESAY (Pitscottie) Chron. Scot. (S.T.S.) I. 18 Lyk as he haid beine suppreme magistratt apprivit be the vottis of this realme.

And where did Latin vovere come from? Apparently from Indo-European *Hw-eghw-, Hughw- "declare, pronounce solemnly".

Anyhow, in a few hours we'll know, I hope, who "haid beine suppreme magistratt apprivit be the vottis of this realme".


Posted by Mark Liberman at 07:13 PM

November 01, 2004

How to decide who to vote for

I've just added this public service announcement to a 10/12/2004 Language Log post written by Bill Poser:

If you're still not sure how to cast your votes on Nov. 2, some good places to start looking for information might be the website for the League of Women Voters, Public Agenda's First Choice 2004, or Project Vote Smart.

I did this because among the 118 internet pilgrims who have arrived at this weblog during the past hour, more than half a dozen have been sent here by Google after asking how to decide who to vote for, help me decide who to vote for, how to decide how to vote, and so on. It's pathetic enough that people are reduced to asking Google this question, but it's even more pathetic that the number one hit for these questions (a few minutes ago when I checked) was that Language Log post. With all due respect to Bill...

(If you have a better suggestion for a public service link, please let me know -- the ones I've given are earnest and serious, but for folks who are at this point reduced to asking Google for help, something punchier may be needed.)

The real surprise is probably that the number of people asking Google how to cast their ballots is apparently so low. Given our high ranking on some of the obvious ways to ask the question, our server might have been swamped by hundreds of thousands of desperate undecided voters. This hasn't happened, so either most people have already made up their minds, or else know better than to treat Google as an oracle for questions like this one.

This is not the only case where I suspect that some wanderers have stumbled on our blog in error during the past hour or so -- here are a few suspect search strings for which Google currently gives a high rank to Language Log posts that probably do not provide the information that is sought:

truth in politics #3
private sex #3
how to get a boyfriend #4
warning adult #2
russian tennis players #2
incall and outcall #1
log splitters #7
does size matter? #10

I don't mean to suggest that all or even most such hook-ups are futile: here are an equal number of (generally somewhat higher-ranked) cases where I suspect that the searchers found what they were looking for, or at least some information of value to them in the context of their search:

origin of cliché #1
"da vinci code" bad writing #1
perforating mexicans #1
parapalegic #1
inclimate weather #2
inuit words for snow #4
"sacha baron-cohen" simon #2
eggcorns #1


Posted by Mark Liberman at 10:40 AM

Keys and fetters

Edward Sapir's Language is available on line at Bartleby, and so I thought I'd quote a passage (from chapter 1) that is as relevant to the discussion of political framing as it is to the discussion of Amazonian arithmetic.

We see this complex process of the interaction of language and thought actually taking place under our eyes. The instrument makes possible the product, the product refines the instrument. The birth of a new concept is invariably foreshadowed by a more or less strained or extended use of old linguistic material; the concept does not attain to individual and independent life until it has found a distinctive linguistic embodiment. In most cases the new symbol is but a thing wrought from linguistic material already in existence in ways mapped out by crushingly despotic precedents. As soon as the word is at hand, we instinctively feel, with something of a sigh of relief, that the concept is ours for the handling. Not until we own the symbol do we feel that we hold a key to the immediate knowledge or understanding of the concept. Would we be so ready to die for “liberty,” to struggle for “ideals,” if the words themselves were not ringing within us? And the word, as we know, is not only a key; it may also be a fetter.

Posted by Mark Liberman at 07:11 AM

Pica on the Mundurucú

Pierre Pica, the lead author of the Science article on the Mundurucú that I discussed in a post on Saturday, sent this note:

Dear Mark,

I have sent you our article on Exact and Approximate Arithmetic in Mundurucú, and the press communique related to it.

The reason is that as, you will notice, our study which involved various groups -- illustrating the social diversity of the Mundurucú -- and a French control group -- clashes with Gordon's 'incommensurability' hypothesis.

More precisely :

We personally favour a performance limitation based on the lack of a routinized counting routine.

We find the term 'incommensurability' highly inappropriate, because even in the Pirahã, there is a linear relation (and thus a "common measure") between the true numerosity and the approximate response given.

It thus seems much more likely that similar concepts of number are present in all cultures --- but that cultures vary in the invention of tools to measure number.

This is a clear and simple statement of the difference between Peter Gordon's interpretation of his research among the Pirahã -- which, he believes, calls into question Sapir's hypothesis of "formal completeness" -- and Pica et al.'s point of view. (See here, here, and here for my original post on the Pirahã article, and notes from Dan Everett and Peter Gordon on the subject.)

Here is a .pdf of the CNRS press release on the research of Pica et al., and a .pdf of their Science paper (Pierre Pica, Cathy Lemer, Véronique Izard, and Stanislas Dehaene, "Exact and Approximate Arithmetic in an Amazonian Indigene Group", Science, Vol 306, Issue 5695, 499-503, 15 October 2004). (Peter Gordon's paper from the same issue, "Numerical Cognition without Words", is here, and the discussion by Gelman and Gallistel, "Language and the Origin of Numerical Concepts", is here.)

Here is an excellent page discussing research on arithmetic and the brain at Stan Dehaene's Unité de Neuroimagerie Cognitive (Cognitive Neuroimaging Unit) in Paris, with many interesting links.

In his email, Pierre went on to discuss my silly throwing-vocabulary example, treating it with more seriousness than it deserves:

With respect to your hypothesis about the Nerdahã, (a population that you imagine) who aren't interested in throwing things, I agree with your story. The Nerdahã have no words for pitch, fling, chuck, toss, sidearm, slider, curveball, bouncepass, and so on, for trivial reasons right ? It is not an interesting fact, and the scientist is misled.

As far as I can tell the fact that the lexicon is limited in this case is not linked to any interesting properties in the case of the Nerdahã. Their case seems rather involve their lack of interest for a kind of sport (an activity which is usually not related to deep innate properties – the opposite in fact).

I do not agree with you however, that your story has anything to do with the facts that we are talking about.

It is not even clear to me that a 'dormant knowledge' is involved in your hypothesis.

I believe that Pierre and I largely agree about the throwing business. In my imaginary example, what the Nerdahã lack is not primarily throwing vocabulary, but rather "a routinized throwing routine", i.e. a set of well-practiced motor skills and associated cognitive systems. The case is different in many ways from counting and arithmetic -- though surely it's plausible that throwing also has some "deep innate properties"? But the purpose of my example was to displace the discussion from an area that most people think of as happening mostly in the brain (counting) into an area that most people think of as happening mostly in the body (throwing). And everyone understands that the development of motor skills is mostly non-linguistic, even though socially-important skills generally have an associated vocabulary.

The example thus made the point that skills and vocabulary might co-vary, without vocabulary necessarily playing any crucial causal role in the development of the skills. The Mundurucú example demonstrates the same sort of dissociation, even more strongly, by showing that it's possible to have (some of) the vocabulary without the skills. It remains unclear whether some others could have the skills without the vocabulary -- but I would predict that the answer should be "yes". This certainly can be true for some complex skills, including some that are mainly "in the brain"; and it seems to me that it might be true for counting and associated exact-arithmetic skills, if the "routinized counting routine" were (for example) mediated entirely by finger-gestures, say of the kind used in chisenbop. Though on the other side, the author of this page on finger multiplication says that

Now that I have told you all that, I will also throw in my two cents' worth about number sense. This "trick" will not teach your students number sense. They would really be better off making arrays, and sets of numbers to "see" the different products. They can build upon those experiences later. The finger trick can be time-consuming and does not transfer to conceptual understanding.

As for the incommensurability issue, I perversely claim to agree with both Gordon and Pica, as I'll try to explain in a later post.


Posted by Mark Liberman at 06:36 AM