Language Log

April 08, 2008

New infrastructure at Language Log Plaza

[New Language Log posts here!]

Sometime early Sunday morning, the disk drive on the venerable Language Log server began having problems, and the process that hands out .html pages hung. I was able to repair the disk, as I have before, and things worked for a few hours, but then the same sorts of things began happening again, and fsck and I were unable to persuade the file system to return to normal.

Unfortunately, all this happened just as I was scheduled to leave for a conference -- I'm now in Florida, and won't be back until Thursday. So I'm taking the opportunity to do some things that I should have done long ago.

First, moving to a newer machine. The old server was a 6-year-old Dell desktop PC, running an antique version of Linux, and sitting in an unused corner of a group office at IRCS. Chad Jackson at the LDC was good enough to procure a new machine, install Ubuntu 7.10, and set it up in a more stable physical environment -- but also with a new IP address. If you're reading this, then the DNS changes have propagated to you, or you've linked to us as http://itre.cis.upenn.edu.

Second, upgrading the content management software. Since June of 2003, we've been running with Movable Type 2.64, using the first non-hideous default theme that I found out of the box. It's worked fine, though the posting interface is kind of clunky, and it can't cope with non-ASCII characters (unless they're encoded as &eacute; or &#22825; or whatever), and it doesn't offer full RSS feeds, and ...
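For the curious, the workaround mentioned above (writing non-ASCII characters as HTML character references) can be sketched in a few lines of Python. This is only an illustration built on the standard library's `xmlcharrefreplace` error handler, not a description of what Movable Type or our posting workflow actually did; the helper name `to_ascii_html` is made up for the example.

```python
def to_ascii_html(text: str) -> str:
    """Return text with every non-ASCII character replaced by a numeric
    HTML character reference (for example, U+00E9 becomes &#233;)."""
    # 'xmlcharrefreplace' is a standard error handler for str.encode():
    # any character the target encoding cannot represent is written
    # out as &#NNNN; instead of raising UnicodeEncodeError.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_html("café 天"))  # prints: caf&#233; &#22825;
```

Plain ASCII text passes through unchanged, so a filter like this could in principle be applied to a whole post before it reaches software that only handles ASCII.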

So for new posts, we'll move to something new, probably WordPress 2.5. I'll put a link to the new blog's index page here, as soon as it gets set up. (There are quite a few posts queued up and ready to go, so there will be some new content to read.) The new machine will be known to the world as itre.cis.upenn.edu (among other things), so all the URLs to old posts should work: "Incall and outcall", "Latte Lingo: Raising a pint at Starbucks", "Tighty-whities: the semantics", and all the 5,000-odd others.

More later.

Posted by Mark Liberman at 07:30 AM

April 05, 2008

Someone is wrong on the internet, again

Randall Munroe continues the fight against error on the web:

Posted by Mark Liberman at 05:18 PM

Phishy mail


After a short spurt of postings about phishing back in 2004 (here, here, and here), Geoff Pullum returned to the topic in January.  Once again, his interest was in detecting phishing by looking at the grammatical and orthographic errors in postings.  Occasionally, as with some phishing attempts sent from a Stanford address to computer users at Stanford over the past few months, the mail is scarcely detectable as bogus.  But most attempts at phishing are astonishingly incompetent; you wonder how people could be taken in.  An example, ostensibly from customers.unit@uga.edu (the University of Georgia), is below, with some (but not all) notable features boldfaced.  For some time, I've been interested in whether there's enough evidence in these really inept messages for us to make a reasonable guess at the native language of the writer(s) -- which we have to hope is not English -- but the errors suggest structures that are very common in the world's languages.

Continue reading "Phishy mail"
Posted by Arnold Zwicky at 01:50 PM

Trent Reznor Prize, RNR division

The Trent Reznor Prize for Tricky Embedding (Right-Node Raising division) goes to Andrew Ilachinsky, author of "Exploring self-organized emergence in an agent-based synthetic warfare lab", Kybernetes, 32(1/2): 38-76, 2003:

4.84 Universal grammar of combat. Finally, what lies at the heart of an artificial-life approach to simulating combat, is the hope of discovering a fundamental relationship between the set of higher-level emergent processes (penetration, flanking maneuvers, containment, etc.) and the set of low-level primitive actions (movement, communication, firing at an enemy, etc.).

Wolfram (1994) has conjectured that the macro-level emergent behavior of all cellular automata rules falls into one of only four universality classes, despite the huge number of possible local rules. While EINSTein's rules are obviously more complicated than those of their elementary cellular automata brethren, it is nonetheless tempting to speculate about whether there exists — and, if so, what the properties are, of — a universal grammar of combat (emphasis added)

Continue reading "Trent Reznor Prize, RNR division"
Posted by Mark Liberman at 07:38 AM

Mailbag: comparative communication efficiency

In yesterday's post on "Comparing communication efficiency across languages", I compared the sizes of the English and Chinese sides of parallel (i.e. translated) text corpora, and observed that English seems to require 20-40% more bits to express the same information, even after the application of compression techniques that ought to eliminate most of the superficial and local reasons for such a difference. Bob Moore sent an interesting comment:

I can think of at least a couple of reasons that might explain how there can appear to be a difference between the communication efficiency of two languages. One suggests that it might only be apparent; the other explains why it might be real.

Continue reading "Mailbag: comparative communication efficiency"
Posted by Mark Liberman at 06:57 AM

April 04, 2008

Yet another "yeah no" note

Following up on "Yeah no" and "'Yeah no' mailbag" (4/3/2008), Russell Lee-Goldman writes:

I was actually about to send a long email to you about yeah-no, but decided just to put it on my blog.

That's "Yeah-no and no-yeah again", Noncompositional (3/4/2008).

But one highlight that might interest you is that I found a token of "yeah no" while trolling through archived NPR transcripts on lexis-nexis: it's Geoff Nunberg in Talk of the Nation, April 2nd 2004. The lead-up to the "yeah no" starts at around 10m:30s into his segment of the show.

See Russell's blog post for a transcript and discussion. If I were breaking into the conversation at this point, I might observe that

Yeah, no, well, in fact "yeah no" is pretty much the thematic idiom of NPR.

That is, if I were the sort of person who makes such facile generalizations, which I'm not.

Continue reading "Yet another "yeah no" note"
Posted by Mark Liberman at 05:25 PM

Textbook ambiguities


Many -- indeed, most -- linguistic expressions have more than one meaning.  An apparently trivial observation, but one that leads to all sorts of puzzles in linguistic analysis and theorizing.  The central question is how meanings are associated with linguistic forms, and the answer cannot be that speakers have just memorized all these linkages (though they can have memorized some of them).  Instead, we need to look for some kind of compositional account, in which meanings of smaller expressions and meanings associated with syntactic constructions work together to predict meanings of larger expressions.  One crucial thing such an account has to manage is predicting, both accurately and completely, the range of ambiguities in complex expressions.

There's a huge literature on the subject, including textbook discussions of various ways in which ambiguities can arise.  As it happens, my recent mail has brought me in-the-wild examples of ambiguous sentences of just the sort in textbooks.

From a NYT Magazine piece "Students of Virginity" by Randall Patterson (3/30/08, p. 41):

(1) The Anscombe Society at Princeton went on to embrace positions not just against premarital sex but also against homosexual sex and marriage.

And the head on a piece on the Denver Post website (2/15/08) by T. J. Wihera:

(2) I love my dog more than you

Continue reading "Textbook ambiguities"
Posted by Arnold Zwicky at 01:29 PM

An infuriating Cupertino

Audrey Devine-Eller writes in with the latest entry for the Cupertino files. This spellchecker-induced gem is from the Student Personnel Services page on South Brunswick (NJ) High School's website:

In early August, all rising sophomore, junior and senior students will receive an unofficial copy of their transcripts. You should carefully review these for all infuriation: demographic data, courses taken, final grades earned, credits earned, participation in activities. If there are any errors, please follow directions for making corrections.

A likely suspect here is the misspelling "infurmation" getting miscorrected to "infuriation" rather than "information". But since the "edit distance" to "infurmation" from the two possible words is the same (a one-letter substitution in each case), I'm not sure why a spellchecker would rank "infuriation" more highly on its list of suggestions, especially considering "information" → "infurmation" involves a vowel-to-vowel substitution rather than a vowel-to-consonant substitution.
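The edit-distance claim is easy to check. Below is a minimal sketch of the textbook Levenshtein distance (unit cost for each insertion, deletion, or substitution); the function name `edit_distance` is just for illustration. Real spellcheckers may weight edits differently, for instance by keyboard layout or word frequency, which this sketch does not attempt to model.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    # prev[j] is the distance between the first i-1 characters of a
    # and the first j characters of b; the table is built row by row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning i characters of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute, or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    for candidate in ("information", "infuriation"):
        print(candidate, edit_distance("infurmation", candidate))
```

Both candidates come out at distance 1 from "infurmation", so an unweighted distance alone gives the spellchecker no reason to prefer one over the other.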

Continue reading "An infuriating Cupertino"
Posted by Benjamin Zimmer at 08:49 AM

Comparing communication efficiency across languages

In response to last week's post on comparative vocabulary size ("Ask Language Log: Comparing the vocabularies of different languages", 3/31/2008), a number of readers sent observations about a related but different topic, namely the comparative efficiency of communication. At least as measured by crude metrics such as bit counts, there are differences among languages that are not easy to explain.
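As a rough illustration of what a "bit count" comparison might look like, here is a small sketch that measures the compressed size of two sides of a parallel text. It assumes bzip2 (from Python's standard library) and UTF-8 encoding, which are not necessarily the choices behind any figures quoted in these posts, and the function names are hypothetical.

```python
import bz2


def compressed_bits(text: str, encoding: str = "utf-8") -> int:
    """Number of bits in the bzip2-compressed form of text."""
    return 8 * len(bz2.compress(text.encode(encoding)))


def size_ratio(side_a: str, side_b: str) -> float:
    """Compressed size of side_a relative to side_b.

    A value above 1.0 means side_a needs more bits than side_b
    under this particular compressor and encoding."""
    return compressed_bits(side_a) / compressed_bits(side_b)
```

A comparison like this only means something on large samples of genuinely parallel text: on short strings the compressor's fixed overhead dominates, and the result also depends on the compressor and the character encoding chosen.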

Continue reading "Comparing communication efficiency across languages"
Posted by Mark Liberman at 06:35 AM

April 03, 2008

"Yeah no" mailbag

I've gotten a number of interesting messages about this morning's "Yeah no" post, and I also found the time to transcribe and discuss one typically complex example that turned up among the 5,000-odd hits in the search I did on LDC Online. Details below...

Continue reading ""Yeah no" mailbag"
Posted by Mark Liberman at 04:29 PM

Saying it wrong on porpoise

Grant Barrett is now doing a weekly language column for the Malaysia Star, and this week he talks about saying things the wrong way on purpose — intentional errors like the Internets and coinkydink. The column got picked up by Jason Kottke's blog, where commenters are chiming in with their own examples.

Just in case anyone thought this was a new phenomenon (hello again, Recency Illusion), an article on "Intentional Mispronunciations" appeared in the journal American Speech way back in 1932. If you don't have access to JSTOR and you're not a member of the American Dialect Society, you'll have to make do with this recent summary by Larry Horn on the ADS mailing list:

Margaret Reed (1932), "Intentional Mispronunciations". American Speech 7: 192-99.

This covers what Reed took to be a fad among the "light-hearted youth" of Central Westerners (she's writing from Nebraska) to circulate...well, intentional mispronunciations. (She's following up on a paper by Louise Pound from 10 years earlier in Dialect Notes.) Her categories include everything from adding or subtracting syllables and restressing (antique as "an-tee-cue", "champeen", "the-'ater"), tensing lax vowels ("genu-wine"), borrowing of "vulgar" pronunciations ("agin", "extry", "who'd-a thunk it", "varmint"), "Al Smith" English [a.k.a. Brooklynese, not a moniker Reed herself applies] ("boid", "noives", "toity-toid street", "winegar woiks"), the "extremely annoying" affectation of children's speech ("sojer", "sword" [with /w/, as we've been discussing recently], "Injun", "ax" for 'ask' [!-- she does add 'also archaic' for this], "itty bitty"), Yiddishisms ("epple", "darlink", "dun't esk"), various other dialect borrowings ("enyhoo", "pitcher" [for 'picture'], "divil"), blends and folk etymological forms ("bumbershoot", "brass-ear", "animule", "absotively"), misdivisions ("a tall", "a norange", but not "a whole nother"), spelling pronunciations ("k-nife", "g-nat", "X-mas"), and so on. She ends with the wistful hope that while "human nature" may be responsible for perpetuating this fad (or these fads--unclear how many causal factors are involved), "surely, in its fullest and most extreme form, the phenomenon is now passing its peak".

And of course we could take things back a century before that, to the 1830s, when a fad in comical misspellings eventually led to the popularization of O.K. (as definitively proven by Allen Walker Read in a series of American Speech articles in 1963-64).

Posted by Benjamin Zimmer at 03:33 PM

Yeah no

Matt Hutson writes:

There's a phenomenon that has interested me for a while, and I noticed an extreme example last weekend. When people mean "yes" they sometimes say "no, yeah" or "yeah, no" and when they mean "no" they say "yeah, no" or "no, yeah" or even "no, yeah, no."

On Saturday I was sitting next to someone at a lunch, and I counted four consecutive times when she said "yeah no" in place of "yeah." For example: "Did you like Columbia?" "Yeah no I loved it." (In fact, once I started looking for it, I never heard her say a simple "yeah.") The "no" was almost imperceptible each time, as the meaning was clear, and adding it is a common practice in speech.

Do you have any insight into the practice? Patterns in gender or age or situation or setup?

Well, there's Vicky Pollard's catch phrase "yeah but no but yeah but ...". Unfortunately, I don't know how to evaluate British class caricatures.

So for this morning's Breakfast Experiment™, I'll take a look at the use of "yeah no" in the LDC-Online (mostly American) English conversational speech transcripts, previously described here. (I'll leave the sequences "no yeah" and "no yeah no", which are significantly less frequent, for another morning.)

Continue reading "Yeah no"
Posted by Mark Liberman at 07:30 AM

April 02, 2008

"Ampersand asterisk star lightning bolt, you percent sign spiral thingy ministers!"

That would be the comic strip version, anyhow, of the scene evoked by the headline of Augustine Anthony's Reuters story, "Musharraf swears in Pakistan cabinet full of foes", 3/31/2008.

[Hat tip to Andy Hollandbeck]

Posted by Mark Liberman at 08:27 PM

Comprehensibility and standardness


Step 1: A language maven M contrasts two (roughly) equivalent variants X and Y, labeling them standard and non-standard respectively (or, more starkly, "correct" and "incorrect") and proscribing Y.  This is the labeling phase.

Step 2: M attempts to justify the differential labeling (and the accompanying proscription) by claiming that X has intrinsic virtues -- it preserves a distinction that's important for communication, it avoids ambiguity, it's "logical", it's briefer, it's clearer, whatever -- that Y lacks; Y is intrinsically inferior.  This is the justification phase.

Step 3: A linguist L objects to the justification phase -- sometimes also to the labeling phase, but the central question here is the validity of the justifications.  L argues that the justifications offered in favor of X over Y are ill-founded.  In particular, L argues that in practice Y does not impede communication or introduce pernicious ambiguity.  This is a rejection of the justifications, not of the labeling.  (In some cases L wants to dispute the labeling as well, but L rejects the justifications for dispreferring Y in any case.)

Step 4: L will make a similar argument in case after case, concluding that the standard variety is as it is as a consequence of social, cultural, and historical forces, not because of some intrinsic superiority as a vehicle of communication.  Having examined case after case, L will note that each non-standard variant has its own intrinsic values -- it makes a distinction that's important in communication, it avoids ambiguity, it's more regular or is simpler in some other way, it's briefer, it's clearer, whatever -- so that the justifications are really beside the point.  (But this last step is important because it leads to the humane conclusion that users of the language are all concerned, tacitly of course, with communicative values; people who use non-standard variants are not just sloppy, lazy, cognitively impaired simpletons who have, moreover, perversely rejected the excellences of the standard.)

Step 5: Others now claim that L is maintaining (absurdly) that if people can understand something, it's therefore standard; call this "comprehensibility implies standardness".  This conclusion does not follow from what L says; anyone who draws this conclusion deserves to fail Logic 101.

I believe that no linguist has ever said that comprehensibility implies standardness, and also that no linguist has ever said that if a speaker of a language says something on some occasion it's therefore standard (in lay terms, "correct" or "grammatical"), an even more absurd claim that Geoff Pullum rants about occasionally (most recently, in passing, here).  Certainly I have never said either of these things.

Continue reading "Comprehensibility and standardness"
Posted by Arnold Zwicky at 07:37 PM

Ernie Banks gets apostrophized

When the Chicago Cubs unveiled a statue of beloved player Ernie Banks outside Wrigley Field earlier this week, there were murmurs of horror among the enemies of apostrophe abuse. The granite pedestal of the statue was inscribed with Banks' famous catchphrase, "Let's play two" — a shorter version of the saying usually attributed to him: "It's a beautiful day for a ballgame. Let's play two!" (As the Wikipedia page on Banks helpfully explains, this is "expressing his wish to play a doubleheader every day out of his pure love for the game of baseball, especially in his self-described 'friendly confines of Wrigley Field.'") But the carvers of the statue managed to leave out the apostrophe in "Let's". Local columnists and talk radio hosts had a field day with the goof.

This morning, the missing apostrophe took its rightful place on the pedestal. Lou Cella, the sculptor who made the statue, told the Sun-Times that it took about 30 minutes for carvers to etch the added punctuation. Below are before and after photos.

Continue reading "Ernie Banks gets apostrophized"
Posted by Benjamin Zimmer at 05:59 PM

Pennsylvania blather?

With the Democratic presidential primary in Pennsylvania still three weeks away, political reporters have a lot of column inches to fill and are no doubt looking for creative ways to combat the campaign trail's proverbial fear and loathing. Take Michael Powell's recent article for the New York Times about how Barack Obama is "grounding his lofty rhetoric in the more prosaic language of white-working-class discontent, adjusting it to the less welcoming terrain of Pennsylvania." Powell hauls out an unusual reference to support his essentialized depiction of Pennsylvanians (all of them?) as no-nonsense, salt-of-the-earth types:

Pennsylvania’s culture, as the historian David Hackett Fischer noted in his book “Albion’s Seed,” is rooted in the English midlands, where Scandinavian and English left a muscular and literal imprint. These are people distrustful of rank, and finery, and high-flown words. It should come as no surprise that the word “blather” originated here.

Kudos to Powell for making the attempt to provide background on "Pennsylvania's culture" from an academic source like Albion's Seed: Four British Folkways in America (1989). But he has misread Fischer, at least when it comes to the putative origins of the word blather. And the linguistic evidence presented in Albion's Seed is problematic enough without injecting further misinformation.

Continue reading "Pennsylvania blather?"
Posted by Benjamin Zimmer at 01:15 AM

April 01, 2008

Important safety information

If you have strong concerns about English usage, science reporting, language analysis, lexicography, or linguistic atrocities of any kind, you should use Language Log. It is well known for its delayed release. For best results daily use is recommended.

Although laboratory studies have shown the effectiveness of Language Log, it may not be for everyone. Federal regulations require Language Log to disclose possible adverse effects when used by children under twelve (12), by certain adults, or by small animals and birds that are said to evidence minimal language skills.

Continue reading "Important safety information"
Posted by Roger Shuy at 08:48 AM

Speculative semiotics of Northern European product names

Richard Morrison's 3/12/2008 column for The Times (London) ran under the title "The very Ikea: Denmark takes the floor in an entertaining feud", and began like this:

Not since Shakespeare declared that something was rotten in the state of Denmark have the inhabitants of that fair country been so disgruntled. A Copenhagen University academic has just produced some research that has shaken every Dane to his irreducible Viking core. He analysed all the products in an Ikea catalogue according to name. What he found was startling. It seems that Sweden's all-conquering furniture firm quite shamelessly names its fanciest futons, tables and chairs after Swedish, Finnish or Norwegian places, while reserving Danish place names for doormats, draught-excluders and cheap carpets.

Min gud, as they say in Danish. That has set the kat among the pigeons. The Danish press has accused Ikea of “symbolically portraying Denmark as the doormat of Sweden”. Ikea's response is that the Danes “appear to underestimate the importance of floor-coverings”. I can't work out whether that retort is a genuine attempt to smoothe ruffled feathers, or yet another sly Swedish dig at their neighbours. Either way, it hasn't helped to mollify the seething Danes.

Morrison doesn't tell us who the "Copenhagen University academic" in question was, but other coverage of the Great Danish Doormat Scandal identifies the professor as Klaus Kjøller, specialist in kommunikationsanalyse, massekommunikation, politisk kommunikation, interaktion, kulturanalyse, organisation, ledelse, ideologi, indoktrinering, indholdsanalyse, and sprogfilosofi.

Continue reading "Speculative semiotics of Northern European product names"
Posted by Mark Liberman at 07:12 AM

Liberman to move to BBC

In a major personnel shock, it was announced today that Mark Liberman is to leave Language Log to move to the Science News section of the BBC. Negotiations had apparently been under way for some time. Liberman's openly critical attitude toward the science reporting standards of the BBC (the organization that first brought the phenomenon of tricapital amphibia to the attention of the world's biologists) had suggested, to the few who knew of the ongoing discussions, that the BBC would fail in its bid to recruit him. But his critical stories were in fact a cover. Liberman said today, "I have a high regard for the BBC's upper-crust pomposity and tabloid-like credulity. And above all, I have a high regard for the ratio of its salary levels to those of Language Log. When plotted on a logarithmic scale, they absolutely go through the roof."

The salary Dr Liberman has been offered is reputed to exceed that of Natasha Kaplinsky, who was recruited away from the BBC last year by Five News. The BBC's move is widely regarded in the industry as the first step in a contest to fight back against Five. It is not clear whether Liberman will still be free to write anything for Language Log under the terms of his new contract. Language Log lawyers were reviewing his no-compete clause when this article was posted.

Continue reading "Liberman to move to BBC"
Posted by Geoffrey K. Pullum at 06:35 AM

March 31, 2008

Subjective tense


William Safire's most recent "On Language" column (NYT Magazine 3/30/08, p. 18) looks at the now-famous quote from Geraldine Ferraro, "If Obama was a white man, he would not be in this position."  Then comes a parenthetical digression on grammar:

"Get this," Sam Pakenham-Walsh, member of the Nitpickers League, said in an e-mail message, "we no longer use the subjective tense! Has all our education been for naught?"  Because Ferraro's statement posed a condition contrary to fact, her "if Obama was a white man" should have been were.

Yes, "subjective tense", in a grammar peeve.  Has all our education been for naught?

Continue reading "Subjective tense"
Posted by Arnold Zwicky at 01:39 PM

Ask Language Log: Comparing the vocabularies of different languages

Michael Honeycutt writes:

I emailed Steven Pinker with a question and he told me that I should contact you.

I am a college freshman who plans to study Modern Languages and I am fascinated with linguistics. The question that I had for Dr. Pinker was in regards to the active vocabularies of the major modern languages. This may be a novice question and I apologize ahead of time. I am curious if there are any studies on the subject of the percentage of a language's active vocabulary used on a daily basis. I have been looking into this for a couple of weeks in my spare time and the information I am finding is rarely in agreement.

For example, according to Oxford University Press, in the English language a vocabulary of 7000 lemmas would provide nearly 90% understanding of the English language in current use. Of these 7000 lemmas, what percentage will the average speaker or reader experience on a daily or weekly basis? Are there any particular languages or language families that have a significantly higher or lower percentage of words encountered on a day to day basis? Are there any studies on whether, for example, speakers of Latin used more or less vocabulary in their daily lives than speakers of a modern Romance language?

Those are interesting questions. The answers are also interesting -- at least I think so -- but they aren't simple. Let me try to explain why.

Continue reading "Ask Language Log: Comparing the vocabularies of different languages"
Posted by Mark Liberman at 08:59 AM

Motivated punctuational prescriptivism

Further to my remarks about colon rage, Stephen Jones has pointed out a very reasonable structural factor that might influence the use of post-colon capitalization, regardless of the putative dialect split (between a British no-caps policy and an American pro-caps policy): capitalization is strongly motivated, he suggests, when there is more than one sentence following the colon and dependent on what is before it. Jones offers these well-chosen examples to illustrate:

  1. In order to protect your computer, you should do the following: run a trustworthy anti-virus system such as AVG and keep it updated.
  2. Computers have become easier to use in various ways since the beginning of the decade: They no longer need periodic reboots almost daily. You can run multiple programs at the same time and never run out of system resources, since that bug disappeared with Win ME. The infrastructure of the telecommunications system is much more robust than before, and dropped connections are a rarity. And finally there has been a consolidation of software vendors, which means that software now is better tested and has more resources behind it.

He comments: "In the first example what comes after the colon remains part of the previous sentence. The punctuation hierarchy of period, colon, semi-colon remains in place. In the second what comes after the colon consists of several sentences, and thus the punctuation hierarchy is broken."

Continue reading "Motivated punctuational prescriptivism"
Posted by Geoffrey K. Pullum at 08:56 AM

March 30, 2008

Closure


In my last posting on open vs. closed, I looked at the question of why signs on shops and the like oppose these two words, and not opened vs. closed, or open vs. close (both of which would be morphologically parallel in a way that open vs. closed is not).  I assumed, but did not say explicitly, that what we want for the signs is two ADJECTIVES with appropriate meanings, and then explained that opened wouldn't do because it was pre-empted by open, and noted

the absence of an adjective close (pronounced /kloz/; there is an adjective close /klos/, the opposite of far, as in "Don't Stand So Close to Me", but it's not relevant here).

And people wrote to dispute, or at least query, my claim about close.  I will now try to fend off these criticisms.

Continue reading "Closure"
Posted by Arnold Zwicky at 02:44 PM

Well, maybe not the *first*, actually

Today's Dilbert explores the hidden weakness of the Turing Test.

Continue reading "Well, maybe not the *first*, actually"
Posted by Mark Liberman at 11:04 AM

Fourniret mailbag

A few days ago, I wrote about Michel Fourniret, the "Ogre of Ardennes", an accused serial killer known for what John Lichfield in the Independent called "complex, verbose but inaccurate French, with unnecessary subjunctive verbs and sub-clauses" ("Il fallut que j'accusasse: the morphology of serial murder", 3/27/2008).

Searching the web, I was able to find only one specific example of Fourniret's linguistic style, the phrase "Il fallut bien que je l'enterrasse" ("it was indeed needful that I should bury her"). The article in Le Monde remarked on the imperfect subjunctive, but called his language "suranné et ampoulé" ("outdated and turgid"), not inaccurate. So I wondered whether Fourniret is really given to hypercorrections and other mistakes in attempting to use a register above his station, or whether he's just obnoxiously pretentious and fussy.

Continue reading "Fourniret mailbag"
Posted by Mark Liberman at 08:21 AM

Hoping to be haunted by legitimacy

According to Perry Bacon Jr. and Anne E. Kornblut, "Clinton Vows To Stay in Race To Convention", Washington Post, 3/30/2008:

"We cannot go forward until Florida and Michigan are taken care of, otherwise the eventual nominee will not have the legitimacy that I think will haunt us," said the senator from New York.

I hate to go all Kilpatrick on this, but wouldn't it be a lack of legitimacy, or perhaps a failure to achieve legitimacy, that would haunt them? As quoted, the sentence seems to me to indicate that Senator Clinton hopes to be haunted by legitimacy, and for that reason plans to stay in the race until the nominating convention in August.

Continue reading "Hoping to be haunted by legitimacy"
Posted by Mark Liberman at 07:05 AM

Occupational eponymy

Gerry Mulhern of the Queen's University Belfast wrote a letter to Times Higher Education (2/28/08) after he looked at the list of the vice-chancellors of the Russell Group of the top (and hence most prosperous) UK research universities. He had noticed that there were two named Grant, and several other money-related names like Sterling (the honorific adjective used for the British pound), Thrift (the virtue of good budgeting), and Brink (the cash transport trucking company). He said it reminded him of the name of a director of human resources he once knew (back when Human Resources was still called Labor Relations, I expect), named Strike. The vice-chancellor of the University of Portsmouth later sent in a letter (3/13/08) saying simply that he had "never had the guts to study onomastics." His surname, the signature revealed, is Craven. There is a childish joy to these odd coincidences that have given us people apparently named for their jobs (or people who obediently selected the jobs their names foretold). Eric Bakovic nearly choked up his oatmeal last December when he noticed an item about a food company executive with a name suggestive of hurling. I noticed with delight and amazement today that the name of the public relations man cited on this Arts and Humanities Research Council page is Spinner. Honestly. I swear this is not one of my little deadpan jokes. Spinner really is working as a spinner.

Continue reading "Occupational eponymy"
Posted by Geoffrey K. Pullum at 05:18 AM

March 29, 2008

More WTF coordinate questions


Today's find in the world of WTF coordinate questions is
(1) Anyone Here Been Raped and Speaks English?
This is the title of a book by war reporter Edward Behr that was mentioned on NPR yesterday morning.  It's a quote from a British television reporter who Behr observed looking for interviewees in a Congo airport in the 1960s.  Appallingly callous, but my topic today isn't the morals of journalists, but the twists of coordination in English.

We've been here before, though with a slightly less complicated example:
(2) Are you like most Americans, and don't always eat as you should?
The slight complication in (1) is that it's missing an initial auxiliary verb (has) -- but casual variants of English yes-no questions lacking an initial auxiliary are common and have been much studied.  In fact, what the television reporter said is quoted in a number of places, including Brewer's Famous Quotations (Nigel Rees, 2006), as having been
(3) Has anyone here been raped and speaks English?

Examples like (2) and (3) aren't nearly as bad as the failures of superficial parallelism in them -- a clause with subject-auxiliary inversion conjoined with a finite VP -- might lead you to expect, and they present serious problems for a reduction analysis of coordination, in which shared material in parallel positions is "factored out".

Continue reading "More WTF coordinate questions"
Posted by Arnold Zwicky at 11:57 PM

Modesty, hod-carrying, everything but relevance

Interesting to see my friends Mark Liberman and Stephen Jones arguing about whether James Kilpatrick's recent article makes good points. I was already planning to comment on my own reaction to the article: I was astounded by its sheer rambling emptiness; it was far worse than I was expecting.

Kilpatrick had a very clear mandate: he had been asked Why do we study grammar? by a first-year high school student in Oregon named Kathryn. Her question does need an answer. Kilpatrick was apparently intending to provide one. But instead he just sort of staggers about for six hundred words and then falls over and stops. Neither Mark nor Stephen has given you a proper sense of how bad the article is.

Continue reading "Modesty, hod-carrying, everything but relevance"
Posted by Geoffrey K. Pullum at 01:45 PM

Mongers

We have a real cheesemonger near where we live in Edinburgh: a small shop entirely devoted to cheese, with great wheels of the stuff in the window and a huge array of cheesy comestibles on offer and a genuine cheese expert in a white coat in charge and long lines of prosperous Stockbridge residents waiting outside to get in and receive their cheese advice.

We also have a genuine fishmonger a little further down into Stockbridge village, with huge ugly monkfish looking vacantly out into the street amid fantastic piles of ice, mussels, oysters, prawns, lobsters, herring, and more other slimy denizens of the deep than I could name. And it had been my intention for a while to write a witty Language Log post about the strange fact that in contemporary English (ignoring all the obsolete formations the OED includes) the combining form -monger can only be used to form words in which the first part is one of three basic household needs (cheese, fish, and iron) or one of a longer list of unsavory and frightening abstract entities (fear, gossip, hate, rumor, scandal, war, etc.). Nothing much more. (The word whoremonger, denoting the sort of person Eliot Spitzer would contact before a trip out of town, isn't really in use any more; pimp and madam have replaced it.) The form -monger isn't productively usable any more for deriving new words: you simply can't refer to a timber store as a *woodmonger, or use *meatmonger for a butcher.

But then The Onion just stole the idea for this theme out of my head and published today a highly witty news brief about a war- and fear-mongering conference. Probably better than what I could have done. Damn The Onion. Damn them.

Continue reading "Mongers"
Posted by Geoffrey K. Pullum at 10:55 AM

The values of "correct grammar"

In response to yesterday's post "James Kilpatrick, linguistic socialist", Stephen Jones writes:

I hate to have to come to Kilpatrick's defense again but his article is actually rather good. He makes two excellent points; that 'correct grammar' allows communication between people who speak different dialects, and that there must be some kind of agreed set of grammatical rules if we are to be able to interpret written laws and regulations.

Many people believe that stipulation of shared linguistic norms is essential to communication, or at least improves the efficiency and accuracy of communication. But on examination, this idea is transparent nonsense. Let me illustrate.

Continue reading "The values of "correct grammar""
Posted by Mark Liberman at 09:22 AM

March 28, 2008

Open and closed


In an earlier posting, I asked when closing begins and when stopping starts.  There was, of course, mail on the topic.  I'll comment on three responses, in three separate postings, beginning with the morphological asymmetry between the opposites open and closed.  Fernando Colina asked on 19 March:

So, why is it that stores display signs with Open in one side and Closed in the other? Wouldn't it be more logical to say Opened / Closed or Open / Close?

Well, a language is a system of practices, not a designed system, so some things are as they are just because of the way they developed over time; there are plenty of anomalies and irregularities in every language.  On the other hand, a language is a SYSTEM of practices, including many regularities.  It turns out that almost everything about open and closed is a matter of regularities; the special facts are the presence of an adjective open in the language and the absence of an adjective close (pronounced /kloz/; there is an adjective close /klos/, the opposite of far, as in "Don't Stand So Close to Me", but it's not relevant here).

Continue reading "Open and closed"
Posted by Arnold Zwicky at 03:17 PM

Bureaucrats

It's tax season here in America and that usually leads to lots of mumbling under the breath about those "damn bureaucrats in Washington" who make up those unreadable tax forms. Several words in the English language rise to the level of making us mad and bureaucrat seems to be one of them. When our tax filing gets challenged, we blame those nasty bureaucrats at IRS. When we're bogged down with pages of needless forms to fill out, it's the fault of those anonymous servants of the government. When a statute is incomprehensible, it's the bureaucrat's fault, even though we might better place the blame on the legislators who wrote it in the first place.

I rise today to defend those bureaucrats. Please stop hissing and booing. Let me explain why.

Continue reading "Bureaucrats"
Posted by Roger Shuy at 09:37 AM

James Kilpatrick, linguistic socialist

Wikipedia describes James J. Kilpatrick as "a conservative columnist". There's good evidence for this. His syndicated column was called "A conservative view"; he was, according to Wikipedia, "a fervent segregationist" during the civil rights movement; for many years he was the conservative side of the Point-Counterpoint segment on 60 Minutes.

And yet, in his second career as "grammarian" -- by which he means "arbiter of English usage" -- Mr. Kilpatrick promotes the linguistic equivalent of a planned economy. Linguistic rules are to be invented by experts like him, on the basis of rational considerations of optimal communication, and imposed on the rest of us. For our own good, of course.

His most recent column ("Why do we study grammar?", 3/23/2008) offers a small but telling indication of this:

In speech or in writing, English is the greatest language ever devised for communicating thought.

Continue reading "James Kilpatrick, linguistic socialist"
Posted by Mark Liberman at 08:58 AM

Furth

The University of Glasgow's Faculty of Arts promulgated in 2002 a policy (see it here) that apparently relates to transfer of credit from foreign universities. But what it says, even in the main header to the page (and I thank Judith Blair for bringing this to my attention), is that it concerns "Grades received furth of Glasgow". What the hell is furth?

The answer is that it is yet another English preposition that I had never previously encountered in my entire life.

Continue reading "Furth"
Posted by Geoffrey K. Pullum at 04:46 AM

March 27, 2008

Il fallut que j'accusasse: the morphology of serial murder

According to John Lichfield ("'Ogre of Ardennes' stands trial for girls' murders", The Independent, 3/26/2008), Michel Fourniret, who "is accused of seven murders of girls and young women and seven sexual assaults in a 16-year reign of terror in France and Belgium between 1987 and 2003",

is a man who likes to play mind games with investigators and appear more cultured than he really is. He is a keen chess player, who talks, and writes, in complex, verbose but inaccurate French, with unnecessary subjunctive verbs and sub-clauses.

Continue reading "Il fallut que j'accusasse: the morphology of serial murder"
Posted by Mark Liberman at 07:06 AM

March 26, 2008

Using the IPA

Since we were recently on the subject of Entering Exotic Characters, I thought it would be good to mention again the International Phonetic Alphabet. A clickable IPA chart that will play examples of the sounds for you is located here at the web site of the University of Victoria. John Wells at University College London has a page on The International Phonetic Alphabet in Unicode. Of course, to get it to show up properly, you'll want a font that contains the IPA. Two fonts designed particularly for their IPA characters are Charis SIL and Doulos SIL.

You can enter IPA using any of the methods for entering non-ASCII characters in general but a clickable IPA chart may be particularly useful. You can use this IPA keyboard over the net or install CharEntry on your own system. The Yudit editor has an ASCII-IPA keyboard definition that makes typing in IPA straightforward. For example, you type t for "t", T for θ, s for "s", S for "ʃ", n for "n", N for "ŋ", i for "i", but I for "ɪ".
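The idea behind an ASCII-IPA keyboard can be sketched in a few lines of Python. This is only an illustration, not Yudit's actual keymap file or format: the table name ASCII_TO_IPA and the function ascii_to_ipa are made up for the example, and the table holds only the four upper-case pairs listed above, with every other key passed through unchanged.

```python
# Hypothetical table: only the upper-case pairs mentioned in the post.
# A real ASCII-IPA keyboard definition would cover many more symbols.
ASCII_TO_IPA = {
    "T": "\u03b8",  # θ, voiceless dental fricative
    "S": "\u0283",  # ʃ, voiceless postalveolar fricative
    "N": "\u014b",  # ŋ, velar nasal
    "I": "\u026a",  # ɪ, near-close near-front unrounded vowel
}


def ascii_to_ipa(typed: str) -> str:
    """Map each typed ASCII key to its IPA symbol; unlisted keys stay as they are."""
    return "".join(ASCII_TO_IPA.get(ch, ch) for ch in typed)


print(ascii_to_ipa("TIN"))  # the word "thing"
print(ascii_to_ipa("SIn"))  # the word "shin"
```

Lower-case t, s, n and i come out unchanged because they are absent from the table, which matches the behaviour described above.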

Posted by Bill Poser at 08:43 PM

The fractal theory of Canada

Ed Kupfer writes:

Your "X as the Y of Z" post reminded me of the semi-famous "Fractal Theory of Canada", posted to the Usenet group alt.religion.kibology by "Inflatable Space Bunny" many years ago.

Continue reading "The fractal theory of Canada"
Posted by Mark Liberman at 07:03 AM

Is autism the symptom of an "extreme white brain"?

In several previous posts, I've discussed Simon Baron-Cohen's theory of autism as a symptom of an "extreme male brain" (e.g. "Stereotypes and facts", 9/24/2006), and also Mary Bucholtz's hypothesis that nerdity is defined by "hyperwhite" behavior (e.g. "Language and identity", 7/29/2007).  I'm ashamed to say that it never seriously occurred to me to cross-pollinate these two theories, until (for serendipitous reasons) I recently read YW Wang et al. "The Scale of Ethnocultural Empathy: Development, validation, and reliability", Journal of Counseling Psychology, 50(2): 221-234, 2003.

Continue reading "Is autism the symptom of an "extreme white brain"?"
Posted by Mark Liberman at 06:51 AM

March 25, 2008

X as the Y of Z, again

In response to our recent MT funfest, Peter McBurney  wrote:

Your post reminded me of a funny experience from my management consulting days. In the early 1990s, we submitted a proposal to the Government of Uruguay to advise on reform of their telecommunications market. Our proposal included the sentence, "Uruguay has been called the Switzerland of South America".

Our proposal was unsuccessful, but shortly afterwards we were invited to make a similar proposal to the Greek Government. With a word processor, we were able to make a few edits to the text and submit it anew. Only after submission did we notice that we'd somehow written, "Greece has been called the Switzerland of South America".

To avoid these little cut-and-paste or search-and-replace embarrassments, it certainly would be more convenient if we could just say "X is the Y of its superordinate category", in instances of the phrasal template "X is the Y of Z" (previously discussed here).

Continue reading "X as the Y of Z, again"
Posted by Mark Liberman at 06:53 PM

March 24, 2008

The (probable) truth about Austria and Ireland

In a couple of earlier posts, I expressed puzzlement about what patterns in parallel or comparable text corpora could have persuaded Google's statistical MT algorithms to translate "Austria" as "Ireland", and so on. Several readers, and Melvyn Quince, had a bit of irreverent but irrelevant fun with the resulting silliness, of course. Anyhow, Bob Moore from Microsoft Research has sent in a very plausible explanation. Like many such theories, it's completely obvious in retrospect.

Continue reading "The (probable) truth about Austria and Ireland"
Posted by Mark Liberman at 08:02 PM

Colon rage

Says geography professor Ron Johnston of the University of Bristol, in a letter to Times Higher Education (March 6, 2008, p. 29):

I note that in his work on the use of colons ("Colonic information", 28 February) James Hartley has adopted the appalling American practice of following a colon by a capital letter. I note that you have not followed him in your leader in the same issue, and trust that you will continue to use English English.

Some people really do have the threshold on their appallingness meter set to the wrong value, don't they? If we are going to use up the word "appalling" on a tiny variation in orthographic conventions, what kind of adjective will be left to describe the taste of fermented soy beans in methylated spirits, or the sound of a cat being electrocuted during a child's violin lesson?

Continue reading "Colon rage"
Posted by Geoffrey K. Pullum at 04:20 PM

Why Austria is Ireland

There has been a lot of activity here today in the great research center at One Language Log Plaza. People are running up and down the corridors showing each other new examples of Google's purportedly eccentric translation behavior. The Google translation algorithms perform strange substitutions involving European country names and language names. Among these are replacements of Ireland for Austria, and also sometimes Canada for Austria. I am rather surprised that none of the excited people falling over themselves in the corridors have noticed the obvious generalization.

The Google translation engine is of course a brute-force statistical scheme based on massive amounts of compared bilingual text, and it is quite insensitive to actual meaning. Notice that in one case the algorithm produced a text asserting that the Parliament of Canada meets in Vienna and in another the output text said that Vienna is in Ireland, but only if there were three question marks after the word Austria in the input. The translation algorithms clearly know nothing of politics, geography, or sober punctuation.

In my opinion, what is being statistically detected by the pseudo-translation algorithms is the blindingly obvious relation that holds between the relevant pairs. Think about it: In what respect is it that Ireland is to the UK (for British English speakers) as Austria is to Germany (for Germans), and also as Canada is to the USA (for American English speakers)?

Continue reading "Why Austria is Ireland"
Posted by Melvyn Quince at 01:53 PM

Austria == Ireland?

In response to yesterday's post about odd transnational substitutions in Google's translations ("Made in USA == Made in Austria|France|Italy|... ?"), Martin Marks writes:

I'm afraid I don't have an answer for your crazy Google mistranslation question, but I do have some even crazier data for you to deal with. On a whim, I "translated" your entry from German to English in its entirety. Most of the entry remains unchanged, with a few weird exceptions. ("German-to-English" is unchanged, for example, but "German-to-French" becomes "English-to-French".) However, one sentence really jumped out at me. You wrote "Of course Austria is not a German word...", but Google translated that as "Of course Ireland is not a Spanish word..."

Madness! Madness! I don't know if I can deal with this.

Continue reading "Austria == Ireland?"
Posted by Mark Liberman at 06:20 AM

Outwith

Many people think that while new nouns are made up all the time, and new verbs and adjectives are occasionally coined, the prepositions form a small set that is fixed and unvarying over centuries of time and across the English-speaking world. It doesn't seem that way to me. I still remember with pleasure the day I discovered a new one in Australian English, one that other dialects do not have. I might tell the story here some time. (I already told it in a talk on Australia's ABC Radio National, in a program called Lingua Franca, in 1998. Note that the preposition involved in that case was an intransitive one, like away, not taking a noun phrase complement. That means the traditional view would treat it — wrongly, I claim — as an adverb. Someone wrote to Lingua Franca about that point, so I explained the details in a later talk, transcript here.) Anyway, it was not long after my move to Scotland last year that I encountered a preposition that I did not recollect ever having seen or heard before, either in my early decades of living in Britain, or my many years after that living in California, or my long visits to Australia: the preposition outwith. Mark Liberman discussed it in this post in 2006 (which I had forgotten about until Lindsay Marshall reminded me; thanks, Lindsay). It means, as Mark said, "outside of" (exactly what without meant a century or two ago, before its shift to the meaning "not having"). And Mark noted that it is recorded as largely limited to Scotland. But the new part of the story is that it is not entirely thus limited: the other day I saw it used in an English newspaper, which could mean that it is spreading rather than becoming extinct. We shall see.

Continue reading "Outwith"
Posted by Geoffrey K. Pullum at 04:19 AM

March 23, 2008

Think of the Children

Geoff's discussion of the ridiculous amount of attention paid to the "fleeting expletive" problem in the United States reminds me of a concern that some Carrier people have with dictionaries, namely that they should not contain naughty words for fear that the children will learn them, as if their little minds will somehow be warped by learning the words that describe the central activity of human beings.

Continue reading "Think of the Children"
Posted by Bill Poser at 02:03 PM

A little more on obscenicons


In today's mail: a wonderful billboard that uses Chinese characters and Spanish punctuation marks as obscenicons, and some speculation about why = and + aren't good obscenicons.  This is a follow-up to two earlier postings.

The billboard advertises Chino Latino, a Minneapolis restaurant (at Lake and Hennepin, which might not be clear from the photo) that offers "street food from the hot zones", so the mixture of characters from Chinese and Spanish has some motivation.  The source of this photo, correspondent SYZ, suggests that the Chinese is gibberish (but see below), and notes that the sentiment is "a reference to the unspeakable awfulness of the weather in my lovely hometown of Minneapolis, where it snowed on Friday." 

Continue reading "A little more on obscenicons"
Posted by Arnold Zwicky at 12:43 PM

Y is X plus something


Another abstract for a paper that grew in part from material on Language Log.  This time it's for a conference to honor Jerry Sadock, May 2-3 at the University of Chicago.

Continue reading "Y is X plus something"
Posted by Arnold Zwicky at 11:12 AM

Article-article article abstract


Below is a conference abstract for a paper that grew, in part, out of material I was preparing for Language Log.  The paper was scheduled to be given at the American Dialect Society meetings in January, but because of sickness I wasn't able to give it then.  I then started expanding the abstract into a posting for Language Log, but of course it's been ballooning.  So here's the abstract as a promissory note.

Continue reading "Article-article article abstract"
Posted by Arnold Zwicky at 10:26 AM

Made in USA == Made in Austria|France|Italy|... ?

Antonio Cangiano has noticed an odd thing about Google's statistical translation software.  As he puts it,

Google Translate sometimes changes the country mentioned within the source language to the main country of the translation language.

I've checked the examples that he cites, and they work exactly as he says.

Continue reading "Made in USA == Made in Austria|France|Italy|... ?"
Posted by Mark Liberman at 09:37 AM

March 22, 2008

Entering Exotic Characters

Yesterday for the umpteenth time I was asked for assistance in getting exotic characters into a blog post, so I thought I'd post a little information about this.

If you've already got the text that you want in Unicode, the problem with getting it into a blog post is probably that your blog software, like the Movable Type package that runs Language Log, gags on non-ASCII characters. To overcome this limitation, you need to replace your Unicode text with HTML numeric character references. For example, instead of directly entering the Unicode for "lower case e with acute accent" é, you enter &#x00E9;. This consists of the Unicode codepoint in hexadecimal 00E9, with the prefix &#x and the suffix ;. It is also possible to give the codepoint in decimal, should you be inclined to the vulgar idiom, in which case you omit the x: &#233;.
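For anyone who would rather not do the replacement by hand, here is a minimal Python sketch of it. The function names to_hex_ncr and to_decimal_ncr are invented for this example; the decimal version leans on Python's built-in "xmlcharrefreplace" error handler, while the hexadecimal version builds the reference directly from the codepoint, padded to at least four digits.

```python
def to_hex_ncr(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal reference: prefix &#x, codepoint, suffix ;"""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):04X};"
        for ch in text
    )


def to_decimal_ncr(text: str) -> str:
    """Replace each non-ASCII character with a decimal reference (no x after the &#)."""
    # Encoding to ASCII with this handler turns every unencodable character
    # into a decimal numeric character reference.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_hex_ncr("caf\u00e9"))      # caf&#x00E9;
print(to_decimal_ncr("caf\u00e9"))  # caf&#233;
```

Plain ASCII text passes through both functions untouched, so it should be safe to run a whole post through either one before pasting it into the blog software.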

Continue reading "Entering Exotic Characters"
Posted by Bill Poser at 02:56 PM

An annual appeal


Forwarded from the American Dialect Society mailing list yesterday, a message from Grant Barrett:

Linguist List is currently hosting its annual fund drive. The organization's needs are modest and the return on your money is significant.  (link)

If you have used its list archives, job postings, or other services over the last year, I would like to encourage you to contribute. I believe it's important that we recognize those good parts of the Internet and it is our personal duty to keep them alive.

If you work for an institution, please consider making a contribution of a size proportional to your organization's use of or appreciation for Linguist List.

Linguist List provides a (moderated) discussion forum, job listings, book announcements and reviews, calls for papers and conference programs, links to resources of many kinds on language and linguistics, and the Ask a Linguist service.  It also hosts over a hundred linguistics-related lists, and it archives many of them (including ADS-L).  All for free.  It's the sort of operation that actually requires a staff (students who receive fellowships), which means that it needs real money.

If you're unfamiliar with Linguist List, check out the site.  There's a lot there.

Continue reading "An annual appeal"
Posted by Arnold Zwicky at 12:50 PM

Something wiki this way comes


Geoff Pullum has now written a spirited defense of Wikipedia.  I applaud.  But on one point I have to issue a warning, having recently read Nicholson Baker's "The Charms of Wikipedia" (a review of John Broughton's Wikipedia: The Missing Manual, yet another splendid volume in the O'Reilly series of computer books) in The New York Review of Books (3/20/08, pp. 6-10).  What's at issue is Wikipedia as a boundless resource (unlike conventional print encyclopedias) -- this in face of an enormous number of entry deletions (Baker says about 1,500 a day), some of them removing clearly nuisance items, but some of them performed by "deletionist" editors bent on purging the site of entries they view as insufficiently important.

Continue reading "Something wiki this way comes"
Posted by Arnold Zwicky at 10:52 AM