According to the Houston Chronicle, "an 18-wheeler carrying 30,000 pounds of eggs overturned" today, "[sending] an avalanche of eggs sailing over the side of the overpass, crushing a state Department of Transportation truck at a construction site below". No one was seriously hurt, but the clean-up was apparently a messy, smelly business. The supervisor, Gary Babb, explained that "we were able to save a few cases of eggs, in case you need any." But when his co-workers brought lunch to the clean-up crew, said Babb, "They brought us scrambled eggs, you believe that? Sick sense of humor, these people."
We've now reached the essential "linguistic hook" that you've no doubt been waiting for. What's the structure of "Sick sense of humor, these people"?
It's related somehow to "These people have a sick sense of humor" -- but it's not quite right to take just any sentence of the form "These PluralNouns have a SingularNounPhrase" and transform it to "SingularNounPhrase, these PluralNouns". Or is it?
I queried Google with the pattern "these * have a", and tried transforming the first half a dozen examples with a suitable structure. Mostly not too bad, the results. Sounds kind of like one of those parodies of Bush 41 that used to be popular:
These cars have a lot of problems. ?Lot of problems, these cars.
These places have a presence of their own. ?Presence of their own, these places.
These eggs have a wonderful tale to tell. ?Wonderful tale to tell, these eggs.
These zills have a beautiful tone. ?Beautiful tone, these zills.
These comics have a lot of sex in them. ?Lot of sex in them, these comics.
These scenarios have a common theme. ?Common theme, these scenarios.
You definitely need some sort of modifier or quantifier, though:
These films have a plot. *Plot, these films.
These women have a dream. *Dream, these women.
And it's apparently not great for the complement of have to get too long:
These teams have a concentrated focus on the Xbox Full Spectrum Warrior gaming front!
???Concentrated focus on the Xbox Full Spectrum Warrior gaming front, these teams!
Definite subjects with determiners other than these are similar in quality:
The Amish have a distinctive culture. ?Distinctive culture, the Amish.
Those Germans have a word for everything. ?Word for everything, those Germans.
but plural indefinite subjects seem worse:
Stroke survivors have a high risk of dementia. ??High risk of dementia, stroke survivors.
Africa's economic problems have a medical solution. *Medical solution, Africa's economic problems.
though by now all the examples are starting to sound like the slurred dialogue of stereotypical drunks. And anyhow, surely there are some more general principles at work here...
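The rearrangement tried on the Google examples above is purely mechanical, and it can be sketched in a few lines of Python. This is only an illustration of the string transformation, restricted (as an assumption) to subjects beginning with these, those, or the; the function name `postpose_subject` is made up for the purpose, and the code says nothing about whether the output is acceptable English.

```python
import re

# Hypothetical helper: turns "These X have a Y." into "Y, these X."
# Only handles subjects starting with These/Those/The (an assumption
# made for this sketch, matching the definite-subject examples).
PATTERN = re.compile(r"^((?:These|Those|The)\s+.+?)\s+have\s+an?\s+(.+?)[.!]?$")

def postpose_subject(sentence):
    match = PATTERN.match(sentence)
    if match is None:
        return None  # not of the form "These PluralNouns have a NounPhrase"
    subject, complement = match.groups()
    # Capitalize the fronted complement and lowercase the determiner.
    fronted = complement[0].upper() + complement[1:]
    postposed = subject[0].lower() + subject[1:]
    return f"{fronted}, {postposed}."

for example in ["These cars have a lot of problems.",
                "Those Germans have a word for everything.",
                "These films have a plot."]:
    print(postpose_subject(example))
```

Note that the sketch happily produces "Plot, these films." as well; the judgments marked with ? and * above are exactly what a mechanical rule like this fails to capture.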
[Update: Haj Ross asks
Q: is this mebbe the same rule that does
Your cousin has been working here a long time. -> (*has) been working here a long time, your cousin.
I'm not sure about this. Even without the postposed subject, you get things like the punchline of the famous story about "Silent Cal" Coolidge returning to his home town in Vermont after the end of his presidency; going to the local store, selecting a few items, and bringing them to the cash register, without saying a word; the storekeeper ringing up the purchases, also in silence, taking payment and making change, and then closing the encounter by saying
"Been away."
to which President Coolidge responded
"Ayah."
and left the store.
Of course, you could also just comment "Sick sense of humor." without the postposed subject, if you were a stereotypically laconic New Englander instead of a stereotypically garrulous Texan.]
The abbreviation for preposition is PREP; and if you replace the first and last letters by the preceding letter of the alphabet you get the name of a kind of cookie: OREO!
Can you think of another common grammatical term which yields the name of a common snack food when you replace the first and last letters of its common abbreviation by the preceding letter of the alphabet, boys and girls?
No, of course you frigging can't. There isn't one. Listen, I'm going to tell you something I've never told anyone else before. I hate those stupid word puzzles that they have Will Shortz doing on National Public Radio every Sunday morning with a random listener over a bad phone line and Liane Hansen gets all nervous and giggly and sympathetic and tries to help out the listener if he turns out to be the kind of moron who is unable to achieve the marvellously useful and interesting feat of thinking up a name of a farm animal that begins with the same letter as a farm implement or something.
I suppose some people would imagine a grammarian is the sort of pointy-headed dweeb who would simply love to wake up on a Sunday morning to hear someone answer a series of questions about names of cities that sound like Latin names for ecclesiastical garments or two-word phrases for types of criminal activity where each word begins with the letter the other one ends with and then be told that since they got 3 out of 10 they will get an NPR lapel pin and a paperback college dictionary. Well those people would be dead wrong. I loathe word puzzles and when Liane Hansen introduces Will Shortz my arm twitches even if I'm asleep and my hand zaps over to the RADIO OFF button so fast that it makes a swooshing noise as it burns through the air. I couldn't give a monkey's fart about word puzzles. I couldn't...
The expressive power of human language is barely adequate to convey the profound level of apathy word puzzles provoke in me. I despise them. Actually Language Log is a bit too public a place for me to share the full visceral force of my reaction; ask me about them privately some time and I'll tell you how I really feel.
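For anyone who nevertheless wants to check the PREP-to-OREO step mechanically, here is a minimal Python sketch of the letter shift described in the puzzle. The function name `shift_ends_back` is invented for this illustration, and it assumes an all-capitals abbreviation of at least two letters; wrapping A back around to Z is also an assumption, since the puzzle never says what to do with A.

```python
def shift_ends_back(abbreviation):
    """Replace the first and last letters by the preceding letter of the alphabet."""
    def previous_letter(ch):
        # Assumption: 'A' wraps around to 'Z'.
        return chr((ord(ch) - ord("A") - 1) % 26 + ord("A"))
    return (previous_letter(abbreviation[0])
            + abbreviation[1:-1]
            + previous_letter(abbreviation[-1]))

print(shift_ends_back("PREP"))  # OREO
```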
My grade 8 English teacher insisted on something perhaps even stupider than the obligatory omission of omissible that (that) Geoff Pullum discusses. She was of the view that one cannot combine more with an adjective like perfect that describes an absolute state. Her reasoning was that if it is correct to describe something as perfect, then the absolute has already been reached and no greater degree of perfection can be attained. This assumes a particular semantics for words like perfect, one that is plausible at first glance, but easily falsified. For instance, we can say that something is absolutely perfect, which wouldn't make sense if perfect already described the absolute state. I stand to be corrected by the semanticists, but it seems likely that when we say that something is perfect we mean that its degree of perfection falls within a distance d of absolute perfection, where the value of d is contextually determined. This allows us on the one hand to cut d down to 0 by specifying that something is absolutely perfect and on the other hand to talk about things approaching more and less closely to perfection.
I don't know if the demigods of usage have pronounced on this issue or not since I pay no attention to them, so I don't know if she got this silly idea from them, but she didn't cite authority as the basis for her views; she cited her incorrect semantic theory. Evidently she was not familiar with the Constitution of the United States, whose Preamble reads:
We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.
This is not only grammatical; it is eloquent. I can only endorse Linda Monk's proposal that the Preamble would be a far more suitable recitation for schoolchildren than the Pledge of Allegiance. Unlike the Pledge of Allegiance, it is not offensive to atheists, cannot be considered idolatrous, and expounds the values on which the United States was founded.
I talked recently with an undergraduate who told me something about her grammar instruction in the Los Angeles public schools. And in addition to the usual nonsense about not ending sentences with prepositions and never using "contractions" and things of that sort, she told me a new one. She was told that sentences like the one you are now reading are ungrammatical.
The alleged fault I'm alluding to here does not have to do with the fact that the main clause is passive, though I have often encountered absurd over-applications of the notion that passives must be avoided, so that would probably have been considered a second strike against it. No, the sentence that closes the previous paragraph has another feature that is supposed to be a grammatical sin. Sit awhile and try to figure out what, before you read on.
What my undergraduate student's high school English teacher insisted on was that you should look at any sentence containing the subordinator that and see whether omitting it would leave the sentence still grammatical. If so, then you must omit it, this teacher said. She would grade you down if you ever used that where grammar did not absolutely require it.
Think about that. The teacher is saying that these famous lines by Joyce Kilmer are ungrammatical:
I think that I shall never see
A poem lovely as a tree
She is saying the same about the first sentence of Wuthering Heights. And so on and so on. This is worse than bad English teaching. This is raving, blithering nonsense.
But I think I know where it comes from. I think it originates in an elevation of a stupid mantra to the status of a holy edict. The mantra is "Omit needless words," stated on page 23 of Strunk and White's poisonous little collection of bad grammatical advice, The Elements of Style, and elaborated on by E. B. White in the reminiscences of his introduction. It could be interpreted in a sensible way as a piece of advice for those editing their own writing: make sure you're not being too wordy (e.g., why say on a daily basis if you're trying to keep to a length limit and the phrase every day is shorter). But the teacher must have decided that the Strunkian imperative had to be obeyed literally and without question at all times, and that punishment must be meted out to those who do not obey. Fascist grammar.
If I have one ambition for my professional life, it is to do something to drive back the dark forces of grammatical fascism of this kind, to help get English language teaching back into a state where the things that are taught about the grammar of the language are broadly the things that are true, rather than ridiculous invented nonsense like that all words are forbidden except where they are required.
Nicole at A Capital Idea links to my post on participial relative clauses with whom, calling it "interesting". She adds a cautionary coda:
Warning: not for the feint of heart.
Since Nicole identifies herself as "A newspaper copy editor [who] talks shop and invites you to do the same", I'm going with the hypothesis that the eggcorn was ironic.
Google reports 3,910 hits for "feint of heart", so it's a 3,910 whG pattern, weighing in at 912 whG/gp. By comparison, "faint of heart" comes in at 151,000 whG, or 35,238 whG/gp.
Score: correct spelling 39, wrong spelling (or irony) 1.
Proportionally speaking, that is.
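The 39-to-1 score follows directly from the two hit counts, as a quick Python check shows. The post does not define whG/gp; the sketch below assumes it means Google hits per billion indexed pages, in which case both pairs of figures imply the same index size of roughly 4.29 billion pages.

```python
feint_hits = 3_910      # "feint of heart", in whG
faint_hits = 151_000    # "faint of heart", in whG

# Proportion of correct to incorrect spellings: about 38.6, i.e. roughly 39 to 1.
ratio = faint_hits / feint_hits
print(round(ratio))

# Assumption: whG/gp = hits per billion ("giga") pages in the index.
# Dividing raw hits by the normalized figure recovers the implied index size.
implied_index_feint = feint_hits / 912      # billions of pages
implied_index_faint = faint_hits / 35_238   # billions of pages
print(round(implied_index_feint, 2), round(implied_index_faint, 2))
```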
We don't usually post things here without a language hook, so in honor of Memorial Day I'll just put up a link to a post that I wrote last fall for Veterans Day -- though the language hook was vestigial at best -- and another to a (more linguistic) post about military modal logic.
We've seen several examples of people who think that "passive" means "without an explicit agent". Here's another example, from Phil Dennison's weblog.
Dennison quotes the lede of an AP story:
Prosecutors dropped their case Friday against a security guard in the 2000 death of a man put in a choke hold during a shoplifting investigation — a case that took on racial overtones.
and complains that "[i]t just 'took them on,' out of the ether or the phlogiston, I guess. Just like that. Nobody’s fault, really". Dennison points out that you have to read to the end of the AP story to learn how the overtones arose, namely because of protests led by Al Sharpton. The linguistic criticism is fair enough. It's a political question whether raising the racial issue was to Sharpton's credit or due to his "fault", but either way, his agency deserves to be placed higher in the story.
However, Dennison starts his post by writing "Here is a great example of how to mislead readers by using the passive voice", and ends "Don’t use the passive voice in news stories, kids. Especially in news stories about people doing things to other people. It’s really, really dishonest."
Ironically, there's only one instance of the passive voice in the offending sentence, and it's not the one that Dennison complains about. He's annoyed about "a case that took on racial overtones", which involves an ordinary active use of the verb take, in the past tense. Removing the relative clause and making the subject definite for clarity, we get:
The case took on racial overtones.
A passive version -- at best marginally possible for me -- would be
?Racial overtones were taken on by the case.
The actual passive in the AP's lede is in the phrase
a man put in a choke hold during a shoplifting investigation
Again removing the (implicit) relative clause ("a man (who was) put in a...") and making the subject definite, we get
The man was put in a choke hold during a shoplifting investigation.
An active version would be
[Someone] put the man in a choke hold during a shoplifting investigation.
Ironically echoing Dennison's ironic complaint, we could say "he just was 'put in a choke hold' out of the ether? Just like that. Nobody's fault, really."
I don't know anything about the facts of this case, and I'm not trying to take sides for or against either Sharpton or Dennison. The AP story's lede chooses to be vague about two questions of agency -- who choked the alleged shoplifter to death? and who raised the issue of race in connection with the case? But the AP writer achieves this vagueness by using the passive voice in only one of the two cases -- and it's not the one that Phil Dennison complained about.
Kevin Drum at Washington Monthly posts a map of red vs. blue counties, and asks his readers "Can you guess what this map represents? Click the graphic for the answer if you give up. Hint: it's got nothing to do with politics." More discussion is here, here, here, here, and so on. A larger version of the map is here (created by Matthew T. Campbell at East Central University in Oklahoma, based on data from a survey by Alan McConchie located at this site). A larger (and more scientifically interesting) set of similar maps, presenting data gathered by Bert Vaux and others, can be found here.
[Kevin Drum link via Erika at KDT]
[Prior postings by Kerim Friedman (May 28) and Irish Eagle (May 25).]
Glen Whitman of agoraphilia emailed to ask
You ended your post on the poetry of Rumsfeld with the following: "No one, as far as we know, has yet set to music the press releases of the Plain English Campaign." Isn't that one of those "double negations" you and your co-bloggers have discussed in some recent posts?
After all, I for one have not set those press releases to music, which means you can't (on a literal interpretation) say that *no one* has yet to do so.
I make plenty of mistakes -- although the rumor that Geoff Pullum gets teaching relief from UCSC in return for editing my posts is not true -- but I'm innocent in this case. The cited sentence wasn't a case of overnegation, because there's a difference between "(not) have yet V+en" and "have yet to V", and I used the first of these rather than the second.
Here's (the relevant bit of) the AH Dictionary's entry for yet:
1. At this time; for the present: isn't ready yet. 2. Up to a specified time; thus far: The end had not yet come.
We can just substitute "at this time" or "for the present" into the cited sentence, to clarify the meaning at the expense of complicating the form. Maybe "up to the present time" would be even clearer:
"No one, as far as we know, has up to the present time set to music the press releases of the Plain English Campaign."
That's just what I meant, and it has just the right number of negatives in it.
GW was thinking of a different usage of yet, which the OED gives as sense 2.c.:
2.c. Followed by an infinitive referring to the future, and thus implying incompleteness (e.g. yet to be done, implying ‘not hitherto done’; I have yet to learn, implying ‘I have not hitherto learnt’). Cf. also 5.
This yet is not a polarity item, but it does imply a negative:
(a) Kim has yet to arrive ⇔ (b) Kim has not yet arrived
My sentence was of type (b), but GW interpreted it as being of type (a) with an extra negative. In this case, I wasn't guilty -- but because this misinterpretation is only one little "to" away, at least with a verb like set whose past participle is the same as its bare stem, I probably should have chosen another wording.
Certainly plenty of others have made the mistake that GW attributed to me. There are 3,820 Google hits for "no one has yet to" -- that's 3,820 whG, or 891 whG/gp -- and all of those that I checked are overnegations (except for one or two that I couldn't understand):
(link) While no one has yet to describe England as the anti-Christ they have come close.
(link) No one has yet to compare these findings to possible early symptoms in men.
(link) ..no one has yet to beat my $12,000 pc
(link) The property... has been advertised for sale for nearly a year and a half, and no one has yet to purchase it
(link) No one has yet to figure out why ... they got it in their heads to film a "real lemming migration" ...
(link) No one has yet to sign on to star in the film.
etc., etc.
So we can add "no one has yet to" to the case of "fail to miss", as an example of a phrase that is almost always used to mean the opposite of its compositional meaning.
I guess that "construction grammar" implies that this is possible and even normal, but it still seems like a mistake to me.
A couple of other "yet" notes in passing.... We ought to be able to unify the AHD's two senses of this yet with a somewhat more abstract definition: "up to an implicitly specified time", where the time can be past (sense 2) or present (sense 1). The OED does this with its sense 2.a.:
2. a. (a) Implying continuance from a previous time up to and at the present (or some stated) time: Now as until now (or then as until then): = STILL adv.
This yet has become a "polarity item" (though other senses have not): the AHD's examples are negative for a reason. We no longer say "It's ready yet" meaning "it's still ready" -- we only say "it's not ready yet" or "is it ready yet"? There are plenty of other interesting semantic issues associated with the word yet, not least the question of how far to go in unifying its protean spread of structures and senses. Not all of the examples below are currently colloquial, but it's just as important to explain what we don't (or no longer) say, as what we do:
He may yet change his mind.
The Sekhti came yet, and yet again.
My sandals were worse yet.
Averse alike to Flatter, or Offend,/Not free from Faults, nor yet too vain to mend.
The tracks include..‘To Know Him is to Love Him’ (with David Bowie on saxophone, yet!).
A yet-warm corpse, and yet unburiable.
The swampy patches of yet unreclaimed forest.
This is the queerest thing yet!
Are we there yet?
Even yet not quite finished.
O merchants, tarry yet a day Here in Bokhara.
But there were..extensions of this practice as yet but little noticed.
As yet the Duke professed himself a member of the Anglican Church.
He was one of the numerous party of yet walkers in the world.
In the yet non-existence of language.
The splendid yet useless imagery.
Though his belief be true, yet the very truth he holds, becomes his heresie.
Surely I could always be that way again, and yet/ I've grown accustomed to her looks...
(quotes mostly but not all from the OED's citations)
Well, O.K., not a full-scale riot, but for a few minutes there I wondered if I was about to be in the middle of one. This happened years ago, but I was reminded of it by the recent Language Log posts on Strict Transitivity. I was teaching Introduction to Linguistics at a local maximum-security state prison in Pittsburgh, for the University of Pittsburgh's earn-a-degree-in-prison program, and my co-teacher and I had arrived at the topic of transitivity.
We asked the students whether some verbs had to be transitive. Yeah, a couple of them said, some verbs are only transitive. Like what? Well, there's find, that's always transitive. No, said another student, you can use find intransitively. "NO you can't!", said the others. "YES YOU CAN" (he was shouting by this time), you can say "I looked all over the house for it, but I didn't find there, and finally I found in the yard." "NO YOU CAN'T!" (a growing number of the other twenty or so men in the class also began shouting) "THAT'S STUPID!" The holdout leapt to his feet and started waving his arms around: "YOU CAN TOO! YOU CAN FIND HERE, FIND THERE, IT'S INTRANSITIVE, IT'S FINE!" "OH NO IT ISN'T!" But at least the others stayed in their seats, so in the end Sasha and I did not have the opportunity to find out if the instructions we'd been given during the required orientation for working in the prison could actually be carried out ("In case of a threatening disturbance, show no fear, walk calmly to the wall and summon help by pressing the red button there"). (Yes, there was an actual red panic button attached to the wall of each classroom in the prison school.)
This incident left me with two main reactions. First, while it was in progress I envisioned the newspaper headlines -- "Volunteer Teachers Injured in Prison Riot over Transitivity" -- and thought that maybe that would finally convince the general public of the importance of training in linguistics. And second, I realized that I will never, never, never see a class of ordinary undergraduates getting so excited about a bit of language structure. It's not that I yearn for classroom riots, but I sure wouldn't mind transplanting some of the intellectual enthusiasm of my inmate students to my regular classrooms. (I've tried telling my classes that I wish they were more like prison inmates, but this doesn't seem to have the desired effect.)
It was the words "in Arabic" that truly shocked me. The source: Jeffrey Goldberg's long article "Among the Settlers" in the May 31 New Yorker (related slide show with audio here). The scene: outside Hadassah House, home to several families of Jewish settlers in Hebron, across the street from the Córdoba School for (Palestinian Arab) girls.
A group of yeshiva students appeared, walking in the direction of the Tomb of the Patriarchs. ... They had the wispy beards of young men who have never shaved.
Two Arab girls, their heads covered by scarves, books clutched to their chests, left the Córdoba School, and were walking toward the yeshiva boys.
"Cunts!" one of the boys yelled, in Arabic.
"Do you let your brothers fuck you?" another one yelled.
Raw, hostile, poisonous, overt, sexually-charged, ethnic hatred. And these young Hebrew-speaking men, isolated by soldiers from virtually all contact with Arabs, had taken the trouble to learn enough Arabic to be able to howl their filth in their victims' native language.
So many people talk about the need to speak a common language so that we can all get along -- the dream of Esperanto. Never think that sharing a language is either a necessary or a sufficient condition for being able to empathize with other human beings or to treat them with humanity. The key difference between human language and, say, the hyperspecialized dance "language" of honey bees is that a human language can be used for propositional communications of any sort. Human languages are infinitely adaptable. They are just as good for expressing hate as for anything else.
Tom Friedman starts his NYT column today with this: "The American public has been treated to such a festival of mea, wea and hea culpas on Iraq lately it could be forgiven for feeling utterly lost."
In these latter days, that's about as good as we're going to get. For a glimpse of Dog-Latin as it once was, see Stevens' definition of a kitchen, from the entry for Dog-Latin in E. Cobham Brewer's Dictionary of Phrase and Fable:
As the law classically expresses it, a kitchen is "camera necessaria pro usus cookare; cum saucepannis, stewpannis, scullero, dressero, coalholo, stovis, smoak-jacko; pro roastandum, boilandum, fryandum, et plum-pudding-mixandum ..." A Law Report (Daniel v. Dishclout).
[Update: John Kozak emails:
The UK satirical magazine "Private Eye" has a running Dog Latin feature called "That Honorary Degree Citation In Full". Not online, sadly (as far as I know), but here's the current issue's offering:
SALUTAMUS BEII GEII TRES FRATRI CANTORES IN VOCE FALSETTO NOMINE BARRIUS, ROBINUSQUE ET MAURICIUS EHEU NUNC MORTUUS (QUONDAM UXOR 'LULU', DIVA CELEBRATISSIMA GLASWEGIENSIS CUM CARMINE POPULARE 'CLAMATE!') TRANSFORMAVERUNT MUNDUM MUSICAE DISOTECHNIS CUM JOHANNES TRAVOLTUS HOMO IN TUNICO BLANCO IN ARTEFACTO CINEMATICO 'FEBRUM NOCTIS SATURNALIS'.
John adds that
Now I come to think of it, more contemporary Dog Latin can be found in the spells in the Harry Potter books, of course (the Latin translation of HP&TPS leaves the spells "untranslated": they should be in mangled Greek, it seems to me).
]
Americans' accents may be "flat", but at least they're not "plummy".
According to a 1999 BBC News article, Radio 4 is said to have dumped an announcer for excessive plumminess:
Outspoken journalist Boris Johnson claims he has been the victim of discrimination because his accent is too "plummy". ...
The Daily Telegraph columnist and newly-appointed editor of The Spectator magazine believes he has been the victim of what he calls "vocal correctness".
The article goes on to relay the suggestions of Gregory de Polnay, head of voice at the London Academy of Music and Dramatic Art, about how Johnson could "reinvent" his accent:
He said the journalist's voice was that of someone "used to commanding, used to being heard".
Standardising Johnson's vowel sounds would be the first hurdle.
"All those rather clipped vowel sounds that go with that accent we could iron out," said Polnay.
It's interesting that the BBC can write without irony about "standardising" someone's Received Pronunciation vowels. And to talk about "ironing out" vowels makes it sound like they are not "flat" enough -- is there a transatlantic flatness continuum here, with Americans having too much of it, and upper-class Brits not enough? De Polnay adds that
Johnson's "nasality" would also have to be addressed, ensuring he does not push the sound of his voice down through the nose.
"Somewhere along the line somebody has said that [nasal] sound can appear to be more authoritative," he said.
Perhaps it's pushing sounds "down through the nose" (from where?) that makes them "plummy"? But the OED is quite specific about what plummy means, and the nose is not mentioned:
1. Consisting of, abounding in, or like plums.
2. fig. a. Of the nature of a ‘plum’; rich, good, desirable.
2.b. Of the voice, then of sound gen.: thick-sounding, rich, ‘fruity’; indistinct; with bass predominating.
However, the citations for sense 2.b., which go back to 1881, seem to refer to personal or stylistic characteristics, or even the sound of certain kinds of amplifying circuits, rather than to social class. Class aside, it's clear that sometimes plumminess is a Good Thing and sometimes (despite sense 2.a.) not:
1881 Punch 23 July 25/2 The same aged lover was bidding, with rather a ‘plummy’ voice, the More-than-Middle-Aged Heroine ‘good bye for ever’.
1947 Jrnl. Inst. Electrical Engin. XCIV. IIIA 446/1 Such distortions can be tolerated..without serious loss of articulation, though the speech will usually sound rather ‘plummy’ and unnatural.
1951 K. HARRIS Innocents from Abroad 199 The rich, plummy voice of [actor] Edward Arnold.
1955 Times 3 May 14/4 A disc which sounds plummy and muffled in tone.
1965 G. MCINNES Road to Gundagai xi. 197 His voice..was wonderfully plummy and Edwardian.
1970 Daily Tel. 1 Sept. 9/5 All India Radio modelled..on the BBC, even down to the plummy accents of its announcers.
1975 City Press 1 May 16/5 Her duchess on the make is a finely pointed performance, the plummy vowels contrasting splendidly with consonants periodically marred by the lack of false teeth.
1977 Early Mus. Oct. 549/3 The plummy..tone [of Flemish virginals] is evidently more popular than the musically versatile but astringent Italian virginal.
1978 Gramophone Feb. 1439/1 His tone is mellow, but again, as in the Waltzes..the sound sometimes seems a bit plummy and close.
In contrast, the Boris Johnson story emphasizes class associations, and so do most of the comments in a forum devoted to Brian Sewell's candidacy for being "more annoying than Mick Hucknall". The nomination features the plumminess of his accent as a key source of annoyance:
First, his unfounded acid criticism of just about everything: "Oh, of course one cannot take Mozart seriously, since he didn't have an overblown plummy accent like one's own."
Second, his overblown plummy accent. Like Jeremy Spake, I suspect that this is a deliberately exaggerated affectation which makes up for his lack of other noteworthy features.
The other forum commenters echo the class associations:
...vain, self obsessed snobby little bastard...
posh talking bastard. thinks he is better than anyone else.
Sewell acts as if he's little lord Fontleroy.
Just because he went to Public school and practiced Received Pronunciation behind the bike sheds ... doesn't make his opinion any more valid than any other yobbo.
I've got Brian Sewell down as a massive wind-up merchant.
The plummy voice CAN'T be real [...] & his comment on the subject of common people, other cultures, non-arty types etc. are just inflammatory for their own sake... As for posh? Nah, not with a name like Brian...
For those who (like me) have never heard of Brian Sewell, here's an (allegedly typical alleged) transcript, showing the content as opposed to the accent:
'Brian Sewell': "So, how does one get to Gateshead?"
Gatesheed Coonsil: "Well, you can take the train out of King's Cross..."
'BS': "The train?! and travel with.... *the masses*....?!?!"
GC: "I think you'll find that's what everyone else does..."
'Brian Sewell': "I've heard that Gateshead is merely a small village, an insignificant backwater inhabited by uneducated, illiterate neanderthals who still live in caves? Is that true?"
Coonsil: "Naah, I think you'll find that's Sunderland."
You can check out the accent as well as the content in more detail on this Brian Sewell satire (?) site, where you'll find that his voice is not at all "plummy" in the OED's sense of "thick-sounding ... with bass predominating". Yet people today seem to accept "plummy" as a description of his way of talking, suggesting that it's the social class rather than the sound quality that has become primary.
[Note for Americans who (like me) don't follow British politics very closely: Boris Johnson has not suffered too much after losing his Radio 4 gig: Simon Hoggart wrote a few days ago in the Guardian that "in the fullness of time [Boris Johnson] will probably become prime minister". So apparently plumminess is not terminally out of fashion in the U.K. -- assuming, as I do, that Johnson did not take Gregory de Polnay's 1999 advice. I'm not sure whether Hoggart is serious, however, since most of his article deals with aspects of British culture that are opaque to me, such as what it means that Johnson "produced a used envelope and tossed it onto the table of the house", or why Mickey Fabb "will be polishing [Boris'] bat with linseed oil".]
[Update: Margaret Marks emailed a comment on Brian Sewell:
I absolutely agree that his voice is not what's usually called plummy.
Sewell was the boyfriend of Ant(h)ony Blunt, the Keeper of the Queen's Pictures who turned out to be Philby's third (or fourth?) man, i.e. a spy for the Soviet Union. I think he was about 80 when the news came out. His house was surrounded by journalists, who were dealt with by Sewell, unknown at that time, who obviously loved the limelight and had a variety of longwinded ways of saying 'No comment'. He does or did camp it up a lot, though. Deliberately exaggerated, as you say. A very upper-class accent.
A couple of relevant links are here and here.]
It sounds like Geoff's recent posts on transitive verbs have inspired a flood of activity in which people are seeking real linguistic examples related to a syntactic phenomenon, and that's all to the good. However, a quick perusal of the discussion suggests that people are taking an overly simplistic approach to the phenomenon in question.
There is a whole body of literature, dating back at least to Fillmore's work on definite and indefinite null complements, demonstrating that it is dangerous to think of the presence -- and particularly the absence -- of direct objects as a single phenomenon. For a start, "habitual" or "characteristic" uses are well known to permit even the most classically "rigidly transitive" verbs to appear without an object ("Pussycats eat, but tigers devour"); in addition, a verb's degree of flexibility with respect to omitting objects is influenced by aspectual considerations and the strength with which it selects the semantic category of its argument. Discourse factors can play a role as well.
All this is to say that "transitivity", in the sense of a dictionary's v.i. versus v.t., or in the sense of strict subcategorization frames, is not all it's cracked up to be, and it hasn't been for quite some time, at least to lexical semanticists who work at the interface with syntax. I haven't done a careful analysis, but the counterexamples being sent to Geoff seem like they fall into categories we already knew about.
I would say more, but theoretically this is a holiday weekend and I'd rather spend my time playing than discussing.
On May 14, when Bob Mondello reviewed Troy on NPR, he said:
"As with most sword-and-sandal epics, go indoors and everything's suddenly about statuary, and torches, and an international cast that's trying to reach common ground on accents; here the kings hail from Scotland and Ireland, and the followers from London's West End and Australia. Happily this makes Pitt's Achilles sound like the outsider he's supposed to be, even when he remembers to round his vowels..."
On the same date, when Carrie Rickey reviewed Troy in the Philadelphia Inquirer, she wrote
"Just because Pitt is a hair actor, tossing highlighted tresses for emphasis, doesn't mean he's a bad actor... But when Pitt opens his mouth, the voice that emerges is prairie-flat, lacking the thunder-on-the-palisades sweep and resonance of O'Toole and Bana. When Pitt speaks, you don't think Troy; you think, as a friend says, Troy Donahue."
There seems to be some sort of dialect=shape metaphor in the background here: American voices are flat, British (and Scottish and Australian) voices are not.
Although I suggested in an earlier post that Mondello might have been referring to Pitt's artificial r-lessness, that's probably not it. There is a whole complex of inter-related semi-synesthesias going on here, several different metaphorical extensions of "flatness" -- to sounds, to articulations, and especially to social evaluation.
An IMDB review of the 1955 movie Seven Cities of Gold complains about Richard Egan as "Jose Mendoza" that
Egan is about as Spanish as William Bendix!! His flat American accent and obviously non-Latin coloring create a sensory paradox when he is onscreen.
A bit of Googling will turn up hundreds of other examples of this sort of thing. But what is flat about American accents, exactly?
A flat voice might be one that is emotionless or uninflected, and American speech is stereotypically uninflected by comparison to British speech. It's easy to find lists of (empirically unsupported) national stereotypes that depict Americans (especially men) as using little intonational modulation, for example this one:
These are some of the more commonly-held ideas about different cultures: German men speak fairly slowly, with a deep voice; everything sounds serious! Scandinavians appear more quiet and modest: soft, gentle tones but with clearly varied intonation. Italians, Spanish and Greeks always seem to be excited about something. Americans “chew” their words and frequently have very deep voices with “flat” intonation.
However, often it's the vowels or consonants rather than the intonations that are perceived as "flat", as in Mondello's review, or this Wired article about call centers in Bangalore:
"You try to place the accent. Iowa maybe? No, the "a" sound is too flat. California? Maybe it's a crowded call center in some business park in Kansas City.
But Betty is actually calling from Bangalore, and her real name is Savita Balasubramanyam. ... And her perfect American accent is the result of rigorous training and an employer-encouraged addiction to Ally McBeal."
Then again, this interview with Tom Friedman about his visit to call centers in Bangalore features a clip of an "accent neutralization class" in which the instructor uses the phrase "flat the 'tuh' sound" to mean "flap and voice prevocalic non-pre-stress /t/":
INSTRUCTOR: All right, class. I want you to take out your books and I'm going to give you a passage. Remember, the first day I told you that the Americans flat the "tuh" sound. You know, it sounds like an almost "duh" sound, not keep it crisp and clear like the British. So I would not say "Betsy bought a bit of better butter" or "insert a quarter in a meter." But they would say "insert a quarder in the meder," or "Beddy bought a bit of bedder budder."
So I'm just going to read it out for you once, and then we'll read it together. All right? "Thirty little turtles in a bottle of bottled water. A bottle of bottled water held 30 little turtles. It didn't matter that each turtle had a round metal ladle in order to get a little bit of noodles." All right, who's going to read first?
Friedman explains that the same instructor
...also does British accents, American accents. That was actually for a Canadian call center. They were actually working on a sort of flat North American Canadian accent.
where he is not talking about T's but about some overall impression of flatness.
A discussion of accents in Hebrew suggests that American R's are "very flat":
"Don't worry about how you sound" was my mother's advice. She had heard a political talk in Hebrew on the radio, by someone whose masculine sounding voice spoke with a heavy American accent, including very flat "r"s. After the speech, the announcer said: "You have just heard Golda Meir speak." My mother suggested: "Listen to how she speaks. She's prime minister of Israel and her Hebrew sounds so American."
Flat vowels, flat T's, flat R's, flat accents. What are these people talking about?
As usual, reading the OED helps us to trace the metaphor back to its sources, which turn out to be multiple. Among the OED's senses for flat (the adjective) are:
4.b. Engraving. Wanting in sharpness...
4.c. Of paint, lacquer, or varnish: lustreless, dull.
4.d. Photogr. Wanting in contrast.
7. Wanting in points of attraction and interest; prosaic, dull, uninteresting, lifeless, monotonous, insipid. Sometimes with allusion to sense 10. a. of composition, discourse, a joke, etc. Also of a person with reference to his composition, conversation, etc.
9.a. Wanting in energy and spirit; lifeless, dull.
10. Of drink, etc.: That has lost its flavour or sharpness; dead, insipid, stale.
11.a. Of sound, a resonant instrument, a voice: Not clear and sharp; dead, dull. Also in Combs., as flat-sounding, -vowelled.
11.b. Music. Of a note or singer: Relatively low in pitch; below the regular or true pitch.
There are several different metaphors here: "flatness is lack of variation"; "flatness is lack of some (desired) feature"; "flatness is lack of attractiveness"; "flatness is lack of resonance"; "flatness is pitch lowered relative to a reference value". All of these can be applied to sound.
There is a tradition (now obsolete) in phonetics of using "flat" (usually as opposed to "sharp") to describe certain differences in sound quality. For example, the OED cites
1874 R. MORRIS Hist. Eng. Gram. §54 B and d, &c. are said to be soft or flat, while p and t, &c. are called hard or sharp consonants.
This is the sense of "flat" as "voiced" (i.e. accompanied by vocal-cord vibration) that the call-center instructor in Bangalore employed, carrying on the tradition of 19th-century British phonetics, which was developed largely in order to help teach "commercial millionaires" and colonial subjects to speak "properly."
The OED also references two other similarly-abandoned traditional uses of "flat" in phonetics. One, due to Henry Sweet, refers to vowels made with a level or "flat" tongue shape, not raised or lowered in either the back or front:
1934 H. C. WYLD in S.P.E. Tract XXXIX. 609 The tongue may be so used that neither back nor front predominates, but the whole tongue, which lies evenly in the mouth, is raised or lowered. Vowels so formed are called ‘mixed’ by Sweet, but I owe to him also the term ‘flat’ which I prefer as more descriptive. The vowel [ʌ] in bird is low-flat.
The other one is due to Jakobson, Fant and Halle. This one refers to sound rather than tongue shape, invoking the contrast with "sharp" again, but in terms of physical measurements of frequency rather than subjective impressions of sound quality:
1952 R. JAKOBSON et al. Prelim. Speech Analysis 31 Flat vs. Plain...Flattening manifests itself by a downward shift of a set of formants.
However, I think it's clear that the references to "flat American accents" don't describe anything at all about the intrinsic quality of the sounds, but are pure social evaluation: "wanting in points of attraction and interest; prosaic, dull, uninteresting, lifeless, monotonous, insipid". The American middle classes have always had self-esteem problems.
As we know,
There are known knowns.
There are things we know we know.
We also know
There are known unknowns.
That is to say
We know there are some things
We do not know.
But there are also unknown unknowns,
The ones we don't know
We don't know.
This and six other songs with lyrics by Rumsfeld are available on CD here. No one, as far as we know, has yet set to music the press releases of the Plain English Campaign. [via kaleboel].
I hesitate to say a good word for modern software (it might encourage the further production of the sort of bug-ridden word processors I use so much), but I think Craig Silverstein (Google's technology director), in the interview that Mark Liberman recently cited, might actually be underestimating recent successes at simulating aspects of common sense. Try typing "Simmons" and "Beautyrest" into the Google search box (notice, no use of the word mattress) and watch the mattress ads spring up in the margin of your display of search results. If that isn't effective computer simulation of common sense I don't know what would be: the name Simmons is common enough (one might imagine it thinking), but there is a Simmons firm that has trademarked the name Beautyrest, and it's the name of a line of mattress products, so...
The classic dreams of GOFAI (Good Old-Fashioned Artificial Intelligence) may have been quixotic. But the work on impossible projects about replicating common sense (like inferring in context that someone who goes into a restaurant probably intends to eat a meal there, and will be paying for it, and so forth) used programmers who stayed in the business and ended up working on more sensible things that turned out to be considerably more successful. Google's subject-linked advertisements really do quite often turn out to be relevant, even if you do get silly mistakes based on superficial word similarities sometimes. This is not GOFAI, but it is reasonably called AI, and it is not completely brain-dead. I'm fully aware that what's really going on may be just a matter of mattress-sellers including words like Simmons and Beautyrest on the lists of words that they pay to have their ads associated with, which renders it almost stupidly simple; but my point is not that wondrously clever techniques of reasoning are really being used, but rather that it takes so little of this sort of elementary rigging and accessing of files and lists to make it appear that the system is intelligently responsive to one's interests, and the folks at Google are doing it quite well.
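To make concrete how little machinery the "stupidly simple" account needs, here is a minimal sketch of keyword-list ad matching. It is only an illustration of the idea described above, not a description of how Google actually does it; the ad names, the keyword lists, and the `min_hits` threshold are all made up for the example.

```python
# Hypothetical keyword lists that advertisers might pay to be matched against.
# Nothing here reflects any real advertising system.
AD_KEYWORDS = {
    "hypothetical mattress store ad": {"simmons", "beautyrest", "mattress"},
    "hypothetical used car ad": {"car", "sedan", "hatchback"},
}


def matching_ads(query, min_hits=2):
    """Return the ads whose keyword list shares at least min_hits words with the query."""
    words = set(query.lower().split())
    return [
        ad
        for ad, keywords in AD_KEYWORDS.items()
        if len(words & keywords) >= min_hits
    ]


# "Simmons" alone is too common a name to trigger anything,
# but together with "Beautyrest" it picks out the mattress ad.
print(matching_ads("Simmons"))
print(matching_ads("Simmons Beautyrest"))
```

Requiring two overlapping words is one crude way to mimic the "Simmons is common enough, but Simmons plus Beautyrest means mattresses" reasoning, with no mention of the word mattress in the query.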
And it's not just Google. Yesterday I checked the details of a book on Amazon.com and I noticed that not only did it have some suggestions for me about the usual kinds of books I buy, it also asked me if I would like the book I had just looked up to be delivered to me the next day, and told me that if I placed my order within one hour and ten minutes that could be done. While I stared at this, the screen did a sudden update and "one hour and ten minutes" changed to "one hour and nine minutes". Now, the software accomplishment involved shouldn't be compared to proving Fermat's last theorem, but it looks intelligent. It's a closer approach to helpful, relevant, and timely information than I've had in most phone calls to retailers.
What is notable about these advances in advertising and retailing software, though, is that the symbolic computational linguistics of the 1980s has contributed nothing to them. None of the small wonders of convenience and user-friendliness found on the very best of the commercial websites involves anything you could reasonably call natural language processing.
(I know, I know, AskJeeves.com boasts of natural language processing, and says its product is "able to understand the context of what you are asking" and can offer "answers and search suggestions in the same human terms in which we all communicate". Puh-lease! can you say pa-thet-ic? Ask Jeeves "Show me some cars that are not Japanese." The results are all about Japanese cars. The NLP claims about AskJeeves appear to be a load of nonsense.)
Perhaps Silverstein is right to talk in terms of it being centuries before you can talk to a computer at the library reference desk and find it as intelligent as the human being who currently staffs it. But I'm not prepared to concede that yet. I think in due course we have to get back to real natural language front ends: it is possible to pull together (1) literal sentence understanding based on grammatical analysis, and (2) modern computational techniques of spotting likely relevance. But no one is trying to do it at the moment.
The people working on (2) are doing brilliantly. (Notice, the spotting of likely spelling mistakes in Google and advanced word processors is an aspect of (2), and deserves some real respect.) But work on (1) was all but abandoned in industry some ten years or more ago. It didn't have to be. Nothing was discovered that made people decide syntax or literal meaning were impossible to grapple with algorithmically. I am not ready to believe that centuries will have to elapse before I can send email to a shopping robot "Get me price information on Simmons Beautyrest mattresses from at least five stores in Northern California" and get some useful data back in response. Sure, there are some syntactically irresolvable ambiguities in there (is it mattresses from five stores, or information from five stores?). But Google's ad-relevance technology could pretty much figure out what I want even without analyzing the grammar. If we brought even just a little grammar together with some educated guesswork and smart search technology, we could have something really remarkable. Not intelligence and empathy -- we're going to be talking about something vastly less intelligent than a cockroach here, and an equivalent level of empathy, i.e., none -- but it could still be something very effective, something that you could easily mistake for rapid and effective assistance from an intelligent entity that had understood the gist of what you said.
Erika at Kittenishly Doomy Thoughts asks
What the hell was up with Brad Pitt's accent in Troy? It seemed to be entirely devoid of postvocalic /r/, but he didn't have any.. other.. features of r-less dialects. It sounded like some sort of affectation, but I had no idea what he was hoping to convey with it.
Could this have been what Bob Mondello was talking about on NPR when he said that "Pitt's Achilles [sounds] like the outsider he's supposed to be, even when he remembers to round his vowels"?
A lot of people are talking about how Brad Pitt talks, but so far, Erika is the only one (among those I've read or heard!) who's said anything specific and coherent about it. Here's a small fraction of the Brad Pitt accent discussion from (here and there in) FT forums:
Kithy: I'm totally going to see this. I don't care who's in it or how lame Brad's accent is...
Elle Driver: I am so gonna be there all those pretty men in skirts, I was laughing at Brad's accent during the trailer though...
Sharmila: What is UP with Brad Pitt's accent? It's not just me who starts cracking up during the trailer, right?
M One: And yes, even in the short previews, Brad's accent slippage is pretty bad. Skirts outweight accents for me though.
Talamasca: Oh good, I thought I was the only one who thought Brad's accent was lame in the commericals/trailer.
Satine O'Hara: It looks like an amazing film to me- besides Brad Pitt's accent which can be described as shaky at best...
Jcpdiesel21: I am very frightened of Brad Pitt's accent.
radguurl: ...Brad Pitt's terrible, terrible, terrible accent.
Knesaa: I rather liked Brad Pitt's Achilles also, shitty accent or no.
Cassandra423: Good thing he had those fabulous muscles to distract me from his horrible accent.
Sally Albright: From the trailer, I was concerned about his accent, but in the movie it wasn't that bad. Or maybe I become accustomed to it.
Elle Driver: I liked Brad Pitt as Achellies, but his accent was really, really bad
Binky: The accent didn't even bug all that much. I liked him. Not exactly a shining performance, but he wasn't the stand-out suckage of the film.
If I have a chance later on, I'll see if I can grab some Pitt accent audio from the online trailers -- so far all I can remember hearing, in a couple of TV ads, is the musical background for fleets of ships and flashes of battles.
[Update: Here are .wav files for Pitt's contributions to the scene in which Agamemnon confronts Achilles over possession of Briseis. Pitt is indeed consistently r-less in this scene, just as Erika says. So when Mondello said "round his vowels", did he mean "leave out syllable-final r's"?
(link) Perhaps the kings were too far behind to see.
(link) The soldiers won the battle.
(link) Be careful, king of kings.
(link) First you need the victory.
(link) You want gold? Take it.
(link) It's my gift. To honor your courage -- take what you wish.
(link) No argument with you, brothers, but if you don't release her, you'll never see home again.
(link) Decide!
There are some other interesting things about Pitt's speech in this scene, which I'll have to pick up later, as my youngest son and I are off to the zoo at the moment.]
Proposed strictly transitive verbs have half-lives as short as synthesized high-end transuranic elements. Lance Nathan at MIT squelches Adam Albright's suggestion squelch with a sentence off the web (http://thalo.net/freedom.html): "I have been a participant in other online forums, grew dissatisfied with the rampant tyranny of political correctness and impulse to squelch, and therefore acted to whack together my own modest forum site." So much for squelch. He also kills predecease: "Where the husband predeceases, neither widow nor children can claim a right in any part of the heirship moveables" (Erskine, Inst. Law Scot., 1765, via OED). And M. Crawford writes with this example of induce with implicit object: "I'm really hoping I go before they have to induce because I've heard so many bad things about induction and how much more painful it is." (That's about inducing labor to hasten childbirth, of course.) So, though one could quibble, that looks like three of Adam Albright's potential strictly transitive verbs gone. And more counterexamples are coming in over the transom all the time.
The traditional statement about imperative clauses is that they express commands: they are for bossing people around, telling people what to do. This is not so. Imperatives express a much wider range of meanings, referred to in The Cambridge Grammar as directives. In today's Garfield cartoon strip (May 28, 2004) the difference comes out clearly.
"Nobody tells me what to do!", Garfield is thinking.
Then his owner/servant Jon appears with a heaped platter and says, "Have something to eat."
And the gluttonous cat says to himself, "Well, this is a bit awkward."
Not at all, Garfield. Tuck in. Jon's utterance is (syntactically) an imperative, but it is not (semantically) a command. It's just a directive. Directives can be polite invitations ("Come in"), suggestions ("Have a seat"), or even just good wishes ("Have a nice day!").
In the May 28 Ecommerce Times, Naseem Javed has a "Viewpoint" article entitled "Six Questions to Spur Web Success". Question number 5 is:
5. What is linguistics, and why do they embarrass your international customers?
A site name in one country can mean something entirely different when it circumnavigates the globe. How do you tackle such language issues? The answer is to acquire skills and a deeper understanding of global communications. Even if you're a regional player, your sites are still visible and exposed to the entire world. Cyber branding is an extremely global phenomenon.
Note that "A site name in one country can mean something entirely different when it circumnavigates the globe" is, I think, an Escher sentence of a new type, though I can't quite get it to sit still long enough to be sure.
There are some other great lines in this article:
One must determine the desired size, personality and length of the name, plus the choice of alpha characters, as each emits its own unique signals.
Customers must allow a name brand to settle in their minds before they give you cash.
When trying to process millions of silly and randomly structured names, the mind becomes overly tired.
The future can be pretty clear if it is planned today.
[Thanks to David Donnell for emailing the article reference].
[Update: Margaret Marks emails a link to a critical discussion (at wordlab) of an earlier Javed column: "... we feel compelled, in the interest of our profession, to debunk this bunk point-by-point...". Read the whole thing.]
My recent suggestions for verbs so rigidly transitive that they always have an overt direct object (where it is permitted) were have and keep. There may be some others; but not the ones some people have been sending me.
Andy Durdin was immediately reminded (as I should have been) of the words from the old Church of England marriage service from the Book of Common Prayer:
"I _______ take thee _______ to my wedded wife, to have and to hold from this day forward, for better for worse, for richer for poorer, in sickness and in health, to love and to cherish, till death us do part, according to God's holy ordinance; and thereto I plight thee my troth."
However, I think this is one of the cases where the construction involved requires a missing object. There are lots of these constructions, and in every case it is fine for the object not to be there; but it is also fine for the object of a preposition like at not to be there:
I want a wife I can love ____ for the rest of my life
I want a copy of this that I can look at ____ whenever I want.
That diamond would really be something to have ____ if you wanted to attract thieves.
This would be a useful goal to aim at ____ .
Peace of mind is a wonderful thing to have ____.
That Rembrandt is a wonderful thing to gaze at ____.
What these examples show is that in some kinds of sentence you are required to have a missing noun phrase. We're talking about whether or not you can leave it out in a context where it would normally be permitted.
That doesn't mean I was right about have. I wasn't, as email correspondents and bloggers have been jumping all over me to show. Keith Ivey sent me by email a Googled sentence, The world we live in is a world of those who have and those who don't, which seems fine. And Jonathan Mayhew offers an attested example, from Billie Holiday's "God bless the child": Mama may have, papa may have / But God bless the child that's got his own. Douglas Davidson points out that the Bible also says For he that hath, to him shall be given: and he that hath not, from him shall be taken even that which he hath, and he is quite right, it doth. Language Hat emailed me to point out that there is a book by Ernest Hemingway entitled To Have and Have Not. All in all, that pretty much does it for have.
Sasha Albertini suggested give and take might be obligatorily transitive, but I don't think so: The trouble with you is that you know how to take but you have no idea how to give.
Still, there may be some verbs that are. Adam Albright proposes a list of verbs that are about as solid as I can imagine coming up with. He doubts that an implicit object would ever sound good with any of these verbs:
attain, attribute, cause, comport, delineate, depict, eclipse, impute, induce, portray, predecease, resemble, squelch, subsume, supercede, utter
The nice thing about this little puzzle is that it is highly empirical: you can always find one of your conjectures is blown away completely by a single counterexample. I once thought abandon was a solid case, and then I leafed through a copy of Atlantic Monthly at an airport bookstall one day in the 1990s when Newt Gingrich was threatening his Contract With America, and I read that the conservatives in the Congress were implacably devoted to the destruction of bloated Federal programs of expenditure: "Where necessary, we must not merely revise, we must abandon." I put the magazine back on the rack and got on the plane knowing that I couldn't cite abandon as strictly transitive any more. And as Adam says:
I also thought "await" might be one, but then I discovered this quote from T.H. White's "Once and Future King":
"There is nothing," said the monarch, "except the power which you pretend to seek: power to grind and power to digest, power to seek and power to find, power to await and power to claim, all power and pitilessness springing from the nape of the neck."
This is a lovely example of a genuine context in which plenty of transitive verbs could be coerced into occurring without their usual objects. And a similar case could readily be imagined that would remove several of the verbs on Adam's list above:
If you want to be an artist, it is not enough just to splosh paint around; before your abstractions can mean anything you must first study representational art -- you must learn how to delineate, to portray, to impute, to depict.
I think that is convincingly grammatical (not that it's a genuine citation: sometimes a syntactician cannot Google, but must invent). As yet I am not able to see it as likely that objectless occurrences will be found for attain, attribute, cause, comport, eclipse, induce, keep, predecease, resemble, squelch, subsume, supercede, or utter. But who knows? It may merely be that I lack the necessary imagination and haven't yet spotted cases that are out there somewhere in textland waiting to be Googled up.
For those of you who are planning to be in Philadelphia within the next few weeks, here's a short review (sent by Anna Papafragou to a local mailing list) of a play with a number of linguistic connections.
The Adrienne theater is now showing a play called 'Speech acts' written by
Claire Gleitman (Lila and Henry's daughter). This is a play about the
successful career and not-so-successful marriage of a female linguist. It
is a rare opportunity to see a play where the characters agonize over
giving LSA papers, academia, prescriptive grammar and the Great Vow
Shift (after all, the play is also about a troubled marriage). The play is
very funny - I saw it last night at the premiere and really enjoyed it.
For those of you who would like to see it, it is on Mondays,
Tuesdays, Wednesdays, Thursdays and Friday nights, through June 18:
Adrienne Theatre
2030 Sansom St.
Philadelphia
(ground floor, called "The Playroom").
Considering that 20-30 people a day visit our site to learn about wedding vowels, I'm surprised to learn that "great vow shift" is not in Google's index at the moment. Yes, I recognize that people who know what the Great Vowel Shift is also know how to spell it, but the same people tend to be fond of puns, and this is a good one.
This morning I checked the web for uptake on Geoff Pullum's proposal that the measure of (document) frequency on the web should be denominated whG/gp, i.e. "web hits on Google per gigapage". I didn't find evidence of mass buy-in, though it's far too early to tell. But once I got past all the hits associated with the Werner-Heisenberg-Gymnasium Göppingen, whose domain name is "www.whg-gp.de", I did find this post by Russ Barnes at apostropher.
Russ is also worried that the British may be catching up to the U.S. in wingnuttery. I would have guessed that our U.K. cousins were way ahead, but I've never seen a quantitative study.
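For anyone who wants to try the proposed unit, the arithmetic behind whG/gp is just a normalization: hit count divided by the size of the index measured in billions of pages. The sketch below assumes you supply your own estimate of the index size; the function name and the numbers in the usage lines are purely illustrative, not real counts.

```python
def whg_per_gp(hits, index_pages):
    """Web hits on Google per gigapage.

    hits: number of pages reported for a query.
    index_pages: assumed total number of pages in the index (caller's estimate).
    """
    if index_pages <= 0:
        raise ValueError("index_pages must be positive")
    gigapages = index_pages / 1e9  # index size in units of 10**9 pages
    return hits / gigapages


# Illustrative numbers only: 1,000 hits in an index assumed to hold 4 billion pages.
print(whg_per_gp(1000, 4e9))
```

The point of the normalization is that the same raw hit count means less as the index grows, so counts taken at different times can be compared on one scale.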
There's been a lot of feedback on the "partitive participial relatives" that I discussed a couple of days ago. These are examples like "At present, personal injury cases are heard by many different Judges, some of whom having no experience in this field." Steve at Language Hat wrote about the topic, and his (literate and erudite) commenters commented on it, and Neque Volvere Trochum at entangledbank provided more analytic depth (as did N.V.T.'s literate and erudite commenters), and I got a bunch of email from our (l. and e.) readers. Andrew Durdin wrote in with some examples from classic texts (given below). Leaving aesthetic judgments to the side, I'll summarize the results by saying that most people seem to agree with me that the construction is ungrammatical, but some agree with Haj Ross (and the writers of the googled examples) in thinking that it's OK.
However, I wrote "seem to agree with me" because there are several different constructions being discussed, and it's possible to accept some of them while rejecting others. That's the way my own reactions fall out, for example. In my original post, I mentioned one such distinction in passing, and passed over the other in silence -- the post was already too long -- but the result was that some people misunderstood what I meant, so I'll try to clarify it here.
As before, those who are not interested in English syntax will want to turn their attention to some of our other topics, say Arabic domain names or voles as game animals.
The key question is whether a supplementary clause containing "whom" is tensed or not. If the answer is "no", then we have an example of the structure that I originally discussed. If the answer is "yes", then there are a couple of different sub-cases, to which people's reactions may also differ.
In all the cases under discussion, we have "Q of whom V-ing ..." at the start of a non-restrictive (or "supplementary") relative clause, where Q is something like some, both, most, neither, many, a few, etc., V-ing is the present participle of a verb (usually having or being, though that is mainly because the result is an easy pattern to search for), and then ... matches the rest of the clause.
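The "Q of whom V-ing" pattern is simple enough to look for mechanically in a text file. Here is a rough sketch using a regular expression; the quantifier list is just the handful named above (numerals like "three" are not included), and the "-ing" test is a crude stand-in for "present participle", so it will overmatch on words like "bring". The names `QUANTIFIERS`, `PATTERN`, and `find_candidates` are made up for this example.

```python
import re

# Quantifiers taken from the list above; extend as needed (numerals are not included).
QUANTIFIERS = ["some", "both", "most", "neither", "many", "a few"]

# Q + "of whom" + a word ending in -ing (a crude proxy for a present participle).
PATTERN = re.compile(
    r"\b(" + "|".join(QUANTIFIERS) + r")\s+of\s+whom\s+(\w+ing)\b",
    re.IGNORECASE,
)


def find_candidates(text):
    """Return (quantifier, -ing word) pairs for each 'Q of whom V-ing' candidate in text."""
    return [(m.group(1).lower(), m.group(2).lower()) for m in PATTERN.finditer(text)]


sample = (
    "At present, personal injury cases are heard by many different Judges, "
    "some of whom having no experience in this field."
)
print(find_candidates(sample))
```

A search like this only finds candidates. It cannot tell the three structures apart, since that depends on whether a tensed verb follows later in the clause.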
In some cases, the rest of the supplementary clause is nothing but the remainder of the verb phrase headed by the participle V-ing. That's the case with the first three examples that I gave (modulo working out the ... in the second one, where I didn't quote the original sentence in full), and it's also the case with this simple example that I cited later:
"He was one of four brothers, three of whom having died or departed."
The point is that the relative clause -- the clause containing the whom -- has no tensed (or "finite") verb.
Then there is a second structure, in which the supplementary clause is itself a complex consisting of two clauses. The first one is a participial clause, which stands as an initial supplement to a second, finite-verb clause. I gave two examples of this construction, including:
"The next evening we spent with the Consul and his two pretty daughters, neither of whom being able to speak a word of English, the conversation was carried on in French."
Finally, there is a third structure, where V-ing heads a participial relative clause serving as a supplementary modifier of the partitive Q of whom phrase, which in turn is the subject of a tensed verb following the participial relative. I didn't provide any examples of this structure, since I thought it was obvious that it isn't relevant, but of course nothing in this area is obvious. So here's a classical example, courtesy of Andrew Durdin:
(link) "[U]nder which every one did sit in his order according to his dignity, to keep him from the heat of the sun; divers of whom being of good age and gravity, did make an ancient and fatherly show." (Francis Pretty, Sir Francis Drake's Famous Voyage Round the World, originally published in 1580)
You can see the difference more clearly if we take the three supplements by themselves, replacing "whom" with "them", fixing Pretty's punctuation for clarity, and replacing his "divers" with "some" for the same reason:
(a) Three of them having died or departed.
(b) Neither of them being able to speak a word of English, the conversation was carried on in French.
(c) Some of them, being of good age and gravity, did make an ancient and fatherly show.
Example (a) is not a complete sentence, since it lacks a tensed verb, but (b) has the tensed main verb "was carried (on)", and (c) has the tensed main verb "did make".
Using square brackets for clause boundaries, and putting the participial clauses in red, we can schematize these three structures as:
(a) [ (Q of them) V-ing ... ]
(b) [ [ (Q of them) V-ing ... ] [ Subj VerbPhrase ] ]
(c) [ (Q of them) [ V-ing ... ] VerbPhrase ]
Now, we can take each of these structures, replace (the word) them with whom, and embed the whole thing in a main clause in which some noun phrase is to be non-restrictively modified by the structure we've created.
In the case of structure (c), the result is a completely normal if somewhat complex supplementary relative clause. Put the participial clause in parentheses, and it won't even be too hard to read:
[link] ... I am talking about 54 million people in the U.S., some of whom (being wealthy) can afford a better life ... (re-punctuated for clarity)
In the case of structure (b), the result is a non-restrictive relative clause in which the relative pronoun whom is buried inside a recursively-embedded participial supplement. This is a dangerously complex structure, but I find it perfectly grammatical, once I figure out what the writer had in mind. Such patterns seem to have been fairly common in earlier centuries, when grammatical complexity was not only tolerated but even encouraged. Of the two examples that I originally cited, I find the first to be perfectly clear, though a bit archaic-seeming, while the second is very hard to construe. Here it is again, with a comma and some parentheses added in a feeble attempt to make the writer's intent easier to see:
"In this report I have dealt more in particulars, for the reason there are no reports from brigade commanders, ( [ all three of whom having been captured ], I reserve to myself the privilege of making such corrections as would appear right and proper when I subsequently have the opportunity to examine their reports. )"
Here is the promised list of examples from Project Gutenberg e-texts, emailed by Andrew Durdin -- with my classification of each into the categories (a), (b) and (c) as defined above. Note that only category (a) is really an example of the structure that I originally intended to comment on.
"under which every one did sit in his order according to his dignity, to keep him from the heat of the sun; divers of whom being of good age and gravity, did make an ancient and fatherly show." (Francis Pretty, Sir Francis Drake's Famous Voyage Round the World, 1910; http://www.gutenberg.net/etext01/fdvrw10.txt) {type (c)}
"Sir Henry Sidney had three children, one of whom being Sir Philip Sidney, the type of a most gallant knight and perfect gentleman." (GORDON HOME, WHAT TO SEE IN ENGLAND, 1908; http://www.gutenberg.net/1/1/6/4/11642/11642-8.txt) {type (a)}
"The duke left Filippo and Giovanmaria Angelo, the latter of whom being slain by the people of Milan, the state fell to Filippo" (NICCOLO MACHIAVELLI (anonymous translation), HISTORY OF FLORENCE, 1901; http://www.gutenberg.net/etext01/hflit10.txt) {type (b)}
"the marriage register contains an entry of the names of Thomas Tilsey and Ursula Russel, the first of whom being "deofe and also dombe," it was agreed by the bishop, mayor, and other gentlemen of the town, that certain signs and actions of the bridegroom should be admitted instead of the usual words enjoined by the Protestant marriage ceremony" (The Mirror of Literature, Vol. 20, Issue 572, October 20, 1832; http://www.gutenberg.net/1/1/8/6/11863/11863-h/11863-h.htm) {type (b)}
There is another dimension of variation -- is the embedded non-restrictive relative clause placed at the beginning of the main clause, in the middle of the main clause (following the modified noun) or at the end of the main clause (perhaps immediately after the modified noun, or perhaps following some other stuff)?
For some people, the biggest surprise is that these Q-of-whom V-ing clauses can sometimes occur sentence-initially, preceding the modified noun(s), as in the first example that I cited:
"Both of whom being influenced by Ellington, Rowles and Brown choose one Ellington tune for each of the two albums that comprise this two-CD set..."
You would think that movie directors would know what censorship is, but it seems that they don't. There's a new DVD player out, the RCA DRC232N, that is causing a fuss. This DVD player contains software called ClearPlay that cuts out words and scenes that would be offensive to the viewer. The user configures the DVD player according to his or her preferences in a number of categories: violence, nudity, blasphemy, and so forth. The ClearPlay company reviews films and produces an electronic annotation indicating the location of words and scenes that would offend a viewer with certain preferences. If the user of the DVD player wants to cut out certain kinds of material, he installs the annotation. The DVD player then cuts out whatever bits, according to the annotation, would not conform to the preferences set. Some people are using this system to control what their children watch. Others are using it for their own viewing, to avoid what they would find upsetting.
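The mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the annotation layout (start second, end second, category) and the function name `playable_segments` are hypothetical, and nothing here reflects ClearPlay's actual file format or software.

```python
# Hypothetical annotation: one (start_seconds, end_seconds, category) entry per flagged span.
def playable_segments(duration, annotation, blocked):
    """Return the (start, end) spans that remain after skipping blocked categories."""
    skips = sorted((start, end) for start, end, category in annotation
                   if category in blocked)
    segments, cursor = [], 0
    for start, end in skips:
        if start > cursor:
            segments.append((cursor, start))   # play up to the next skipped span
        cursor = max(cursor, end)              # jump past it (handles overlapping spans)
    if cursor < duration:
        segments.append((cursor, duration))    # play whatever is left
    return segments

annotation = [(120, 135, "violence"), (300, 310, "nudity"), (305, 320, "blasphemy")]

# A viewer who blocks two categories skips two spans.
print(playable_segments(600, annotation, {"violence", "blasphemy"}))

# A viewer who blocks nothing sees the whole film.
print(playable_segments(600, annotation, set()))
```

In this sketch the film data is never modified; only the list of spans to play changes with the viewer's settings.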
I would think that the movie industry would be pleased. Such a gadget provides people who are easily offended by current movies, or who are concerned with what their children watch, with an alternative to complaining about the movie industry and demanding censorship. It will probably increase sales, since people who might previously have avoided a film may now see it. In fact, the industry is outraged and is engaged in litigation over this. There is much talk in the media and on the web of "censorship". The Directors Guild of America says:
ClearPlay software edits movies to conform to ClearPlay's vision of a movie instead of letting audiences see, and judge for themselves, what writers wrote, what actors said and what directors envisioned.
There's some muddled thinking going on here. This isn't censorship. The movie industry still puts out exactly what it wants to, and audiences still see what they want to see. Everybody is free to use another DVD player, or to use this one with a configuration that suppresses nothing, or to use it without the annotation. All this software does is allow the user to choose to skip selected portions of the movie. Contrary to the DGA's claim, ClearPlay doesn't edit movies to conform to its vision - it simply tags them so that viewers can make their own decisions. Providing audiences with a choice is not censorship.
There is also an issue here of artistic integrity, though it is a bit of a stretch to characterize some movies as "art". The DGA seems to think that the director has the right to have his work viewed exactly as he intended it. There's some validity to this when a work is unique, which is why in some European countries there are now laws that prevent the alteration or wanton destruction of works of art, but here there's no question of the original being lost. The director's only right is to present his work to the viewer - he has no right to control what the viewer does with it. When you read a book, you read it as you wish. You can skip whatever you like, you can read it backwards, you can skip around, or look up selected bits in the index. You have no obligation to read the book as the author intended you to.
The DGA should be ashamed of itself for crying wolf about censorship. Allowing people to skip bits of movies that they find offensive isn't censorship. Censorship is a very serious matter, and in much of the world is severe, as documented by organizations like Human Rights Watch. It isn't absent in the United States, as the National Coalition Against Censorship and American Civil Liberties Union will attest. It isn't a word that should be taken in vain.
It is generally assumed by syntacticians that some verbs are obligatorily transitive. An example of one that isn't is eat. It can be used either with a direct object (I've already eaten lunch) or without (I've already eaten). I don't mean in constructions where the object is required to be eliminated, like passives (The food wasn't all eaten) or relative clauses (the things that they eat); I mean that in construction types that allow the object to be present, if the verb is eat the object can just be left implicit. Some verbs are standardly said to be much more rigid, insisting on an overt direct object noun phrase. A syntactician might well exemplify with verbs like, say, discard, or abandon, because it would be easy to assent to the notion that sentences such as these deserve their asterisk annotations for ungrammaticality:
*The company eventually decided to discard.
*I hope your brother doesn't just abandon.
But the syntactician would be wrong.
Look at this sentence, which I came across in this article about moves to improve the standard of computer programming in the future:
They require the courage to discard and abandon, to select simplicity and transparency as design goals rather than complexity and obscure sophistication.
Perfectly grammatical. So there goes the idea that discard and abandon illustrate obligatory transitivity. It is in fact extremely hard to come up with a list of even ten transitive verbs associated with a really hard requirement that the direct object be explicit rather than implicit. Sometimes I almost begin to think there aren't really any. I believe I can still be confident about have and keep. But finding eight more wouldn't be easy.
Craig Silverstein, technology director at Google, was recently interviewed by Stephanie Olsen at CNET. The discussion got around to Artificial Intelligence, or at least Artificial Reference Librarians:
You have portrayed the ideal search engine as one resembling the intelligence of the Starship Enterprise or a world populated with intelligent search pets. Can you talk a little bit about those ideas?
Well, the third idea is having the computer be as smart as a reference librarian. That's interesting, because reference librarians, of course, use computers, use Google to help them search, but they put some element of intelligence into it that the computer cannot do by itself.
So, part of the goal is to make computers smart enough so that when you interact with them, they can do something with that information to help you actually get better results. That is certainly something Google thinks about to improve quality.
When do you think that kind of artificially intelligent search will happen?
I think that understanding language is kind of the last frontier in artificial intelligence, and then talking to a computer will be just like talking to a reference librarian, because they will both be equally knowledgeable about the world and about you.
The big difference, and this is where the search pets come in, is that the reference librarian will understand emotions and other nonfactual information that even a fully intelligent computer may have trouble with.
In terms of timing, I typically say about 200 to 300 years. I think it is probably closer to the 300th year end of it. But if it ends up being closer to the 200th year, I would not be around in any case, and I will not be able to have anyone gainsay me.
Good thinking.
Going back further, even 30 years, the people who were working on artificial intelligence in the '60s thought all these problems would be solved by today--and we are basically not very much closer in terms of those overall high AI goals of understanding language.
So basically, Silverstein agrees with Marvin Minsky -- and just about everybody else -- that AI is brain dead. But further, he's making a quantitative estimate that it will take between 200 and 300 years to create an artificial reference librarian.
Silverstein is only 31, and he's not a scientist, so we can't apply Arthur C. Clarke's dictum that "When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong."
So we need another proverb, along the lines of "When a bright young technology hotshot predicts that something will be developed next year, (s)he's almost certainly ____. When (s)he predicts that something will not be developed for two or three centuries, (s)he's very probably ____." Fill in the blanks for yourself. There's more than one right answer, I'm sure.
In the absence of a compelling "young technologist" substitute for the "elderly scientist" role in Clarke's Law, we might instead turn to Kernighan's Law (at least I think it was Brian Kernighan from whom I first heard this): "When a programmer says that a piece of code will take X amount of time, double the estimate and raise it to the next higher unit." In other words, "half an hour" means "one day"; "one day" means "two weeks"; "two weeks" means "four months"; and so on.
According to Kernighan's Law, 200-300 years translates to 400-600 decades, i.e. four to six millennia. On that view, the time period between AI and the development of the computer would be about the same as the time period between the development of the computer and the invention of writing. That would have a nice symmetry, but I doubt that it's true, if only because cultural prediction on a time scale of millennia (or even centuries, at this point) seems absurd.
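For anyone who wants to check the arithmetic, here is a minimal Python sketch of the rule as quoted. The ladder of units is my assumption, chosen to match the examples given (hour, day, week, month, then year, decade); the law itself does not say which units count as "next higher".

```python
# Assumed ladder of time units, smallest to largest (not specified by the law itself).
UNITS = ["minute", "hour", "day", "week", "month", "year", "decade"]

def kernighan(amount, unit):
    """Double the estimate and raise it to the next higher unit."""
    i = UNITS.index(unit)
    if i + 1 >= len(UNITS):
        raise ValueError("no higher unit than " + unit + " in this ladder")
    return amount * 2, UNITS[i + 1]

for estimate in [(0.5, "hour"), (1, "day"), (2, "week"), (200, "year"), (300, "year")]:
    print(estimate, "->", kernighan(*estimate))

# Converting the adjusted range to millennia: 10 years per decade, 1000 per millennium.
low, _ = kernighan(200, "year")
high, _ = kernighan(300, "year")
print(low * 10 / 1000, "to", high * 10 / 1000, "millennia")
```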
[Silverstein interview via Blogos.]
Just now on Jeopardy one of the categories in Double Jeopardy was Latin. The questions required knowledge of some words, but also of an irregular verb form and some grammatical terminology. Contestant Anne got all five right! Maybe people aren't quite as ignorant about language as we thought. Or maybe it was just a fluke.
According to Jonathan Wright's Reuters wire story, Khaled Fattal at the Multilingual Internet Names Consortium (MINC) has the goal of "enabling Arabs unfamiliar with the Latin script to use the Internet in Arabic alone", and suggests that "given enough funding, say $6 million, his organisation could produce tangible results within nine months". Note that the issue is only the use of Arabic in domain names and other aspects of URLs -- web pages can now be displayed correctly in Arabic script in all the major browsers, and there are a large and growing number of Arabic-language web sites. I'm puzzled that the article doesn't mention the fact that ICANN announced a plan almost a year ago, to permit web addresses in any Unicode-supported language -- which certainly includes Arabic.
ICANN's announced plan was based on a reversible encoding from Unicode into ASCII (cleverly if undiplomatically called "punycode"), rather than on modifying the web's infrastructure to permit Unicode directly in domain names. It's clear that the proposed solution has some problems, since the domain-name proposal doesn't seem to have gone much of anywhere in the past year -- though Mozilla has supported "punycode" since rel. 1.4 -- but it would be nice to know what the problems are. The Reuters article hints that the difficulties might be more political than technical -- "[t]he Arab Internet community has ... wasted several years in disagreement over which characters are essential and how to map them into computer code". But maybe there are technical issues with "punycode" too. In any case, the reporter is pretty thoroughly clueless -- or has been made to seem so by his editor(s) -- since (for example) the word "Unicode" doesn't occur anywhere in the article.
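To see what "a reversible encoding from Unicode into ASCII" means in practice, here is a small sketch using the `punycode` and `idna` codecs that ship with Python's standard library. The sample labels are my own choices, and as far as I know the standard-library `idna` codec follows the older IDNA 2003 rules, so treat this as an illustration of the idea and not as a statement of what any registry accepts.

```python
def to_ascii_label(label):
    """Encode one Unicode label with the standard-library punycode codec."""
    return label.encode("punycode")

def from_ascii_label(data):
    """Reverse the encoding, recovering the original Unicode label."""
    return data.decode("punycode")

# A Latin-script label with one non-ASCII letter, and an Arabic word meaning "example".
for label in ["bücher", "مثال"]:
    encoded = to_ascii_label(label)
    print(label, "->", encoded, "->", from_ascii_label(encoded))

# The idna codec handles whole host names and adds the "xn--" prefix to encoded labels.
host = "مثال.example"
print(host.encode("idna"))
```

The round trip is the point: the ASCII form carries all of the information in the original label, so existing name servers never have to see anything but ASCII.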
What really puzzles me, though, is the quote from Paul Verhoef, identified as "a vice president at the International Corporation for Internet Names and Numbers (ICANN)". Verhoef is represented as saying that "What Khaled says is true, because if you only speak Arabic, why would you be interested in the Internet?"
Uh, because you can read thousands of newspapers and magazines, and millions of discussion groups and information and advocacy sites in Arabic? Like, the same reason anyone else would be interested in the internet?
When you read something this dumb in a news story, it's time for the old attributional abduction tango...
...we can't tell: was the journalist or news release writer misled by the source? did the journalist misremember, misunderstand or invent something independently? was the piece subverted by an editor, accidentally in the course of hasty re-writing, or on purpose due to conceptual confusion or some independent agenda?
I'll file this one tentatively under the general heading of "Reuters anti-globalization prejudice", along with the infamous Korean tongue-cutting story. After all, Paul Verhoef, as a former "Advisor to the Director-General" of the European Commission, and head of "a team with responsibility for international policies in telecommunications, Internet, e-commerce, and Information Society" for the "DG Information Society in Brussels", can't possibly be that badly informed and illogical, right?
I'm sorry that this seems to be "beat up on Reuters week" here at the Language Log. I really don't have any anti-Reuters animus. I just read the news via Google's news aggregator, and the Reuters wire offering is often among the top few stories in a given cluster, and beyond that, I just calls 'em as I sees 'em.
Reuters headline: "Small Changes Separate Man from Ape, Study Shows."
Headline of Science Update piece about the study in question: "Chimp chromosome creates puzzles: First sequence is unexpectedly different from human equivalent."
Science Update explains in more detail that
83% of the 231 genes compared had differences that affected the amino acid sequence of the protein they encoded. And 20% showed "significant structural changes". ... The researchers also carried out some experiments to look at when and how strongly the genes are switched on. 20% of the genes showed significant differences in their pattern of activity.
Note that this is true despite the fact that only 1.44% of the individual base pairs are different. If this seems puzzling to you, then you need to open up a calculator window and determine the value of .9856^N (the probability that all base pairs are the same for a gene with N base pairs, assuming independence) and .0144*N (the average number of base pairs that are different for a gene with N base pairs, assuming ditto), for values of N corresponding to the number of base pairs in a gene (say 1,000 to 100,000).
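If you'd rather not open a calculator window, the same two quantities can be computed with a few lines of Python. This is just the calculation described above, under the same independence assumption; the function names are mine.

```python
P_DIFF = 0.0144  # fraction of individual base pairs that differ

def prob_all_same(n):
    """Probability that all n base pairs match, assuming independence."""
    return (1 - P_DIFF) ** n

def expected_differences(n):
    """Average number of differing base pairs in a gene of n base pairs."""
    return P_DIFF * n

for n in (1_000, 10_000, 100_000):
    print(f"N={n:>7,}: P(all same)={prob_all_same(n):.3g}, "
          f"expected differences={expected_differences(n):,.0f}")
```

Even at N = 1,000 the chance of an identical sequence comes out below one in a million, so it is no surprise that most genes show at least some difference.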
Perhaps this is the line of reasoning that the Reuters headline writer had trouble with -- or maybe (s)he never got past the first three words of Maggie Fox's story ("Tiny genetic changes..."). Other headline writers did better, though: the Korea Times has "Big Genetic Gap between Chimpanzee, Human Being"; Xinhua has "Genetic study shows chimps are less human"; and so on.
The full description of the Nature article is: "DNA sequence and comparative analysis of chimpanzee chromosome 22", by The International Chimpanzee Chromosome 22 Consortium, Nature 429, 382 - 388 (27 May 2004).
Reuters correspondent Maggie Fox actually got the story right -- her first sentence continues "(Tiny genetic changes) add up to huge differences when human DNA is compared to that of chimpanzees" -- it's the headline writer who did her in.
[Thanks to Keith Ivey for correcting my math.]
According to this NYT story, Reuters and Robert Fisk are not the only ones whose preconceptions get in between their sensory inputs and their published descriptions.
Lara St. John is a classical violinist described as a "striking six-foot blonde". For the past few years, she's been trying to get past her first album picture, which showed her naked holding her instrument across her chest. So for a recital in Toronto last February, she "chose her best gown, a simple navy blue silk", explaining that "because the recital was so serious I didn't want trouble with the visuals". But John Terauds, apparently disappointed at the contrast with the album cover, wrote in the Toronto Star that
An almost matronly St. John shambled out onto the Jane Mallett Theatre stage in a wrinkled pigeon-colored number that had to be one of the ugliest frocks to see stage lights this season.
Terauds did like her playing, at least.
Journalists, like junior high school students, want to be able to tag everyone according to some simple and evocative mnemonic category. So-and-so is the class clown, so-and-so is a snooty brainiac, so-and-so is always daydreaming. Lara St. John is tagged as selling sex appeal, George W. Bush is tagged as linguistically inept.
This makes life easier for journalists with deadlines and without new ideas, I guess. But precisely because everyone's perceptions are necessarily influenced by expectations -- sometimes even determined by them -- you'd hope that journalists would take special care with the facts when they see or hear what they expect to.
This is good advice for scientists, too. Dick Hamming once warned me (as I gather he warned everyone) to "beware of finding what you're looking for." In John Terauds' case, I guess the advice applies in reverse; but then he had to come up with a lede for his review from somewhere. So maybe the advice should go to journalists' subjects and journalists' readers: recognize that journalists will misperceive or invent facts that correlate with their stereotypes and preconceptions, and adjust your actions and beliefs accordingly.
Those who have noted that Language Log doesn't have an open comments section have sometimes wondered why. Those who are aware that robots can be programmed to scour the blogosphere for open comments sections and spit spam into them may have some sense of why Language Log has been cautious so far. Here is just one small story about what's out there itching to get at you: recently posted statistics reveal that the percentage of material mailed to Debian Linux mailing lists that passes all ID checks and content X-raying and security screenings and is duly made available to the subscribers on one of the lists is just 3.5%. That's not a typo: three point five percent. Only 35 out of every thousand items sent to the list are genuine postings by human beings that pertain to Debian Linux. The rest, the spam, is caught by various filters, which some human has to constantly tune and maintain. Such is the flood of mass-mailed garbage travelling around the net looking for a way to get to your screen. The percentage of all email that is from spammers is said to be as high as 80% as of April 2004, and rising. These may be underestimates. Even running spamassassin in fairly aggressive mode does not prevent my address, which I do not advertise, from getting at least one Nigerian scam letter per day that skips past the filtering (plus a dozen pieces of other junk that get caught, which is a very light spam load by some people's standards). This is Geoff Pullum, looking forward to not hearing too much from anyone at ssshhhh!@censored.sorry.xyz.
It is ironic indeed that the Reuters characterization of President Bush's first pronunciation of "Abu Ghraib" (discussed by Mark here) as "abugah-rayp" roughly matches Professor Ahmed Ferhadi's model pronunciation: they were accusing the President of pronouncing the name in almost exactly the way that Ferhadi says is correct. But they were wrong about the pronunciation Bush employed.
Like Bill Poser, I am a Linux user, because I need to have a machine with an operating system fit for grownups to do serious work in a reliable and controllable computing environment. Unlike Bill, though, I am still forced to maintain a Windows machine so that I can use WordPerfect (my coauthor Rodney Huddleston in Australia has twelve years of file creation and macro-writing invested in WordPerfect, and we have to share lengthy and complex files because we're writing another book together). I used the Windows machine to go to the file Bill pointed out he was unable to play. Since I had not used Windows Media Player before, it immediately started up a configuration process that made Windows Media Player the default player for every single kind of sound or video file you could imagine, part of their campaign to destroy such firms as Real Audio, whose product I had been using before. I let it have its evil way (I can always change everything back, though doubtless they will have found ways to make that difficult). Then it let me play the file.
Interestingly, I found that in the variety of Arabic spoken by Professor Ferhadi, the pronunciation is clearly [abugrep], but it also sounds very much like [abugurep], because the r is an alveolar tap, not an approximant like the American English r, which means the g and the r are separated enough that there is a hint of a vowel between them. For the first sound in "Ghraib" I was expecting a voiced velar fricative (like the g in Castilian Spanish haga), but what I'm hearing from Ferhadi is [g]. (One should never underestimate how much the dialects of Arabic vary.) And the last sound in "Ghraib" is definitely [p] in Ferhadi's pronunciation, not [b], possibly because he says the name all on its own and it is uncommon to fully voice a phrase-final plosive.
So the Reuters allegation that Bush said "abugah-rayp" would mean that he got it almost exactly right (here I'm ignoring the details of the length and monophthongal quality of the [e] vowel; Bush basically got that right too). They were wrong about Bush's actual production, though: he pronounced it with a final [b], as Mark shows. (I suspect this is an acceptable Arabic pronunciation, but it does not match Professor Ferhadi's model exactly.) Mark's point is not impugned in any way, of course, and I agree with him: I'd fail the Reuters reporter in my phonetics class.
And Bush's grade? Mark says B. I say that people who insist Bush don't talk good should just try giving a broadcast speech to an audience of millions with Arabic place names in the script before they sit in judgment. He is probably doing just about as well as any of us could do. And on some occasions he hits on closer approximations to the right pronunciation than we heard from members of the Senate Committee on the Armed Services not long ago.
On reading Mark's discussion of President Bush's pronunciation of Abu Ghraib, I read the Slate article by Sam Schechner to which he referred. The Slate article provides a link to an audio file of Professor Ahmed Ferhadi of New York University saying Abu Ghraib in Arabic. I can't play it. The file is in a new multimedia format that Microsoft has created called Advanced Systems Format. Microsoft's Windows Media Player can play such files, but as far as I know, nothing else can. So if you use Microsoft software, you can play this file, but if like me you don't, you can't.
The ASF format is described on Microsoft's web site here. You can download a 98 page specification in Microsoft Word format. The specification proper is preceded by a three page End User License Agreement, in small type. The EULA begins with this:
IMPORTANT--READ CAREFULLY: This Microsoft Agreement ("Agreement") is a legal agreement between you (either an individual or a single entity) and Microsoft Corporation ("Microsoft") for the version of the Microsoft specification identified above which you are about to download ("Specification"). By downloading, copying, or otherwise using the Specification, you agree to be bound by the terms of this Agreement. If you do not agree to the terms of this Agreement, do not download, copy or otherwise use the Specification.
I am not a lawyer, and this is not legal advice, but even I can be confident that this is legal nonsense. Contrary to the statement in the above paragraph, I did not read the EULA prior to downloading the specification. I couldn't have, since it doesn't appear on the web page, only in the spec, which you have to download to read. And they're not entitled to assume that people read things in order. Anyone who looks at the table of contents and skips to the beginning of the substantive matter, or to a section of particular interest, won't even see the EULA. Such "agreements" have no legal force because they are not in fact agreements. The fundamental principle of contract law is that a valid contract is formed only by "a meeting of minds". The parties must both agree to the same thing. If the parties have different things in mind, no contract is formed, nor can one party impose a contract unilaterally, which is what Microsoft is trying to do here.
The invalidity of the EULA doesn't mean that you can do anything that you want with the specification document. Microsoft does have rights under copyright law. These rights are created by statute and do not depend on the existence of a contract between the company and the buyer. You can't, for instance, copy and distribute freely the specification document since that is governed by copyright law. But once they publish it, you're entitled to read it, and except as limited by any patents that may be relevant, you can use the information as you see fit. The particular words used in the document to describe the file format are subject to copyright, but data structures are ideas and are not protected by copyright.
Microsoft also has rights under trademark law. If they wish, they can trademark the name of their format. What that means is that they can prevent someone else from using the same name for a different format, or from falsely claiming to implement the specification.
The basic license is described thus in section 1(a):
(i) reproduce and internally use a reasonable number of copies of the Specification in its entirety as a reference for the sole purpose of implementing ASF in your hardware, application, or utilities (your "Solutions"); (ii) reproduce and internally use your implementations of ASF made pursuant to the terms of this Agreement (your "Implementations") in source code form solely for internal development and testing of your Solutions, and (iii) reproduce and have reproduced in object code form only, your Implementations and distribute, directly and indirectly, your Implementations (only in object code form) solely as part of and for use with your Solutions.
The first clause is reasonable and legal. It grants a license to reproduce the document for certain purposes. As copyright holder, they have the right to control copying beyond fair use. The next two clauses are the interesting ones. They attempt to control the distribution of information about the specification. This attempt continues in Section 2(c):
You may not provide, publish or otherwise distribute the Specification to any third party. Further, you shall use commercially reasonable efforts to ensure that the use or distribution of your Solutions, including your Implementations as incorporated into your Solutions, shall not in any way disclose or reveal the information contained in the Specification.
The net effect of these is that you can use the specification to write your own software for dealing with files in this format, and you can distribute the compiled versions of that software, but you cannot distribute the source code for that software, which would provide information about the specification to a programmer, or otherwise disseminate information about it. Now, this is curious. Since they have published the specification and explicitly allow people to produce software implementing it, they aren't, strictly speaking, trying to force people to use Microsoft products. What they are clearly trying to do is to discriminate against Free and Open Source Software. This is made explicit in section 2(g):
For a variety of reasons, including without limitation, because you do not have the right to sublicense the Necessary Claims, your license rights to the Specification are conditioned upon your not creating or distributing your Implementations in any manner that would cause ASF (whether embodied in your Implementation or otherwise) to become subject to any of the terms of an Excluded License. An "Excluded License" is any license that requires as a condition of use, modification and/or distribution of software subject to the Excluded License, that such software or other software combined and/or distributed with such software be (x) disclosed or distributed in source code form; (y) licensed for the purpose of making derivative works; or (z) redistributable at no charge;
An example of an Excluded License, and no doubt the one they have in mind, is the GNU General Public License, the license under which a great deal of free software is distributed, ranging from major projects such as the GNU Project and the Linux kernel down to my own software. Microsoft is so scared of the free software movement that they are trying to prevent their ASF format from being used in free or open source software.
Although they aren't, strictly speaking, preventing other people from implementing the ASF specification, the prohibition against releasing information about it, including source code, raises the barrier. I haven't studied this specification carefully, but it looks pretty complex. Writing code to implement it would probably be a fair bit of work. That will discourage people who don't have a fairly strong motivation to support this format. If, on the other hand, source code could be distributed, the work need only be done once. For instance, if you are a C programmer, you don't need to study the details of the various common audio file formats and write your own software for reading and writing them because there are freely distributed libraries for doing this available as source code. I've been using Erik de Castro Lopo's libsndfile library.
As I said above, in my opinion the restrictions that Microsoft is trying to impose have no legal force, but the use of secret and/or proprietary data formats is an increasingly widespread problem. It helps monopolies like Microsoft, creates economic inefficiency, discourages innovation, and, when these formats are used by governments, creates an improper linkage between government and private companies, often forcing people to use a particular company's products in order to obtain access to government services or information to which they are entitled. The Open Data Format Initiative is an organization created to combat this problem by encouraging companies to open up their data formats and lobbying governments to forbid the use of proprietary data formats in government operations.
Returning to Slate's example of how to pronounce Abu Ghraib, why did they provide the file in ASF format? There isn't any good reason to. There are a number of audio file formats that do the job perfectly well and are universally understood, such as WAV, probably the most common, and AU/SND. For plain sound files such as this, ASF isn't any sort of improvement. The only reason that I can see is that Slate is owned by Microsoft and that this is an effort to lock consumers in to Microsoft products and discourage the use of FLOS software.
A true statement, because I'm currently in the woods of northwestern Montana, but it's not the truth value that interests me here. I'm reading a fascinating (no kidding!) book called Wild Logging: A Guide to Environmentally and Economically Sustainable Forestry, by Bryan Foster, and at one point (p. 23, to be exact) he lists the wildlife found on a 160-acre tree farm in northeastern Oregon, including the following:
Mammal sightings: black bear, bobcat, cougar, white-tailed deer, mule deer, elk, chipmunk, ground squirrel, flying squirrel, snowshoe hare, mice, and vole
What's interesting is the zero-plural-for-game-animals usage in this list, except for those mice -- which aren't game animals, of course, but then, neither are voles and chipmunks. That is, as is typical in discussions of game animals, all the terms are treated as if they were structurally parallel to deer, with plural identical to singular. Surely those animal sightings are multiple, not a single member of each species (well, except maybe for rarely-seen animals like cougars).
My first thought was that the asymmetry between the plural form mice and the singular forms vole, etc., must have basically the same motivation as the parallel asymmetry in certain types of compounds, discussed, I think, by Peter Gordon: mice-eater but rat-eater, with *rats-eater impossible.
But maybe not, because as I recall (and I might be misremembering), Gordon's explanation for the asymmetry in compounds had to do with late vs. early plural formation, depending on whether it was the default regular plural (as in rats) or an irregular plural like mice; and that wouldn't be relevant for the list in the logging book. I also don't think the asymmetry is an idiosyncrasy of this author, because it sounds fine to me, and replacing mice with mouse doesn't. So what governs this pattern in the list of mammal sightings? I'm hoping that a blogger with more insight into English grammar than I have will provide The Answer.
(Notice that the singular in the compound noun mousetrap isn't a problem in this context, presumably because most mousetraps catch just one mouse at a time, whereas a creature that eats mice is likely to want more than one.)
A logician friend of mine mailed me yesterday with a linguistic question:
What does "apageslication" mean? None of my dictionaries at home has an entry for this word and an internet search did not help either.
He certainly came to the right linguistic detective. I solved this puzzle without reference works in 2.83 seconds. I wonder if you can do the same. You know my methods, Watson; apply them!
The occurrence of "apageslication" he was looking at, I immediately told him, resulted when the writer most unwisely did a global change to alter sequences like "pp 29-32" to "pages 29-32" throughout.
I was basically right, though in the first version of this post I set out a more complex and detailed hypothesis than the simpler and dumber truth. Someone had indeed done a careless global edit that said "replace pp by pages everywhere". The original word was "application".
At the time of writing this revised post (May 26, 8:35pm EST) you can see the evidence here: a page of details on nasty military books about killing people in which "88 pp" has been changed to "88 pages" by someone so careless with the editor that they also changed "an opponent's ability to fight" to "an opagesonent's ability to fight", and "Poisons & Application Devices" to "Poisons & Apageslication Devices", etc.
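The mechanism behind "apageslication" is easy to reproduce. The following is a minimal sketch, not a reconstruction of whatever editor the careless writer actually used: the function names `naive_replace` and `careful_replace` are hypothetical, and the "careful" patterns assume that page references always look like "88 pp" or "pp 29-32".

```python
import re

def naive_replace(text: str) -> str:
    """Blind global edit: every 'pp' becomes 'pages', even inside words."""
    return text.replace("pp", "pages")

def careful_replace(text: str) -> str:
    """Rewrite 'pp' only when it stands alone next to a page number."""
    # "88 pp" -> "88 pages"
    text = re.sub(r"\b(\d+)\s+pp\b", r"\1 pages", text)
    # "pp 29-32" -> "pages 29-32"
    text = re.sub(r"\bpp\s+(?=\d)", "pages ", text)
    return text

sample = "Poisons & Application Devices, 88 pp"
print(naive_replace(sample))    # mangles "Application" as well
print(careful_replace(sample))  # touches only the page count
```

The word-boundary anchors (`\b`) are what keep the "pp" inside "Application" and "opponent's" from matching.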
My professional combination of observation, deduction, and specialist linguistic knowledge enabled me to see immediately that something like this must have happened, before I even looked for "apageslication" on the web. Elementary, Watson. Normally customers have to pay for this kind of puzzle-solving service at my minimum rate of $150 for an hour or any part thereof, but my logician friend will merely be paying for our next lunch together.
When I listened to George W. Bush's speech about Iraq on Monday, I noticed that he handled the pronunciation of Izzadine Saleem and Lakhdar Brahimi just fine, but tripped over Abu Ghraib, which he said three times in three different ways.
At the time, I considered blogging about it. However, I decided not to do so. I felt that these pronunciation issues are a minor point, and shouldn't be emphasized over the content of the speech, and I generally dislike the fuss that's been made over Bush's linguistic problems. I also know from experience that memory for disfluency is often inaccurate, and so I would be very reluctant to write about something like that without finding a recording to check my memory for what was said.
In most news stories, Bush's disfluent rendition of Abu Ghraib was ignored (e.g. in the New York Times story), or mentioned in passing, without any details. That's pretty much as it should have been, in my opinion. The real story is the content, not the pronunciation.
However, the Reuters wire sent out a piece featuring Bush's mispronunciations, which were described as "abugah-rayp", "abu-garon", and "abu-garah". And in an op-ed attack on American policies, Robert Fisk specified Bush's "hesitant pronunciation of Abu Ghraib as 'Abu Grub'".
As I said, I don't think the whole mispronunciation business is very important; but given that people are going to talk about it, I think it's interesting that they don't take the trouble to describe what happened accurately. If we're going to criticize President Bush for not taking the trouble to learn how to say a currently-important word fluently, shouldn't we also take the trouble to observe carefully the facts of what he did say?
Here is the passage in Bush's speech where the three pronunciations of Abu Ghraib occurred:
A new Iraq will also need a humane, well-supervised prison system. Under the dictator, prisons like Abu Ghraib were symbols of death and torture. That same prison became a symbol of disgraceful conduct by a few American troops who dishonored our country and disregarded our values. America will fund the construction of a modern, maximum security prison. When that prison is completed, detainees at Abu Ghraib will be relocated. Then, with the approval of the Iraqi government, we will demolish the Abu Ghraib prison, as a fitting symbol of Iraq's new beginning. (Applause.)
As for the "correct" way to pronounce it, here's what Slate had to say a couple of weeks ago, including a link to a (non-Iraqi?) Arabic rendition. I put "correct" in scare quotes because there is always some uncertainty about how to anglicize (or americanize) the pronunciation of a foreign name that contains sounds without an English counterpart.
There are RealVideo versions from CSPAN and on the White House web site, and a RealAudio version is available from NPR. The section quoted above occurs at about 21.29 in the CSPAN version, and at about 21.12 in the White House version. After considerable trouble with non-standard downloading techniques and wrestling with conversion of proprietary audio formats -- I'll add to Bill's complaints about this stuff the observation that I did all this on a Windows box, and still had to struggle -- I was able to create .rm and .wav files of the paragraph in question. You can listen to the whole paragraph in RealAudio here, and .wav clips of the crucial segments are linked in below.
In the first rendition, Bush seems to hesitate disfluently at three points, indicated by hyphens in the pseudo-orthographic version below:
Under the dictator, prisons like - abu gar - reb - were symbols of ...
Aside from the hesitations, this pronunciation seems to be a pretty good rendition of what Sam Schechner at Slate magazine recommends. In particular, the vowel of the last syllable was a pretty good IPA [e], not diphthongized like the vowel in English babe but also not laxed and shortened like the vowel in English bed. The president divided "Ghraib" (Arabic for "raven") into two syllables, but this is a plausible anglicization of a word for which the preferred transliteration is actually Ghurayb. Contrary to my memory of the original speech, Bush did not stutter or repeat any syllables, he just hesitated three times -- once before "Abu Ghraib", once in the middle of it, and once at the end. You can listen to a .wav clip here of the words "...prisons like Abu Ghraib were symbols of death and torture".
I don't think that Reuters is accurate in transcribing this as "abugah-rayp". The president's vowel was not diphthongized, as the "ay" would suggest, and the final consonant, though clipped because of the final disfluent hesitation, seems definitely to be a [b], not a [p], as can clearly be seen by the "voice bar" after the closure in the spectrogram (of the "Abu Ghraib" part) below:
First item's grade: Bush gets A for phonetics, C for fluency; Reuters gets a C for transcription.
Bush's second rendition is fluent but puzzling: I'd render it in IPA as [gɑrɔm] or [gɑrɑm]. It's fluent, but the final vowel seems to be backed and somewhat rounded, and the final consonant is definitely nasalized. However, Reuters is also wrong to transcribe it as "garon". The final consonant sounds like an [m], as you can hear in this clip, and on the video, you can clearly see the labial closure.
Second item's grade: Bush gets D for phonetics, A for fluency; Reuters gets a C for transcription.
Bush's third rendition is also fluent, and gets the consonants right. The final vowel of "Ghraib" is back but not round -- I'd put the whole thing in IPA as [gərɑb] (or maybe [gurɑb] -- the first, unstressed vowel is rather short, and acquires some rounding from the following [r], so its quality is hard for me to decide on). (You can listen for yourself to a clip of the passage here.) The final-syllable vowel is not what Slate recommends, but I've heard several newscasters using it in "Abu Ghraib" over the past few weeks.
Reuters treats this pronunciation as "garah" -- that seems to be a complete invention, as both the sound and the video seem to me to indicate that there is a final labial consonant. Their only excuse is that a [p] (at the start of "prison") follows -- but the spectrogram below (of the words "...the Abu Ghraib prison") again clearly shows a voice bar for the [b], not the noise pattern expected for an [h]:
Third item's grade: Bush gets B for phonetics, A for fluency. Reuters gets D for transcription.
Summary: Bush gets a grade point average of 3 -- (4+2+1+4+3+4)/6 -- a solid B. Reuters gets a grade point average of 1.67 -- a weak C-.
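For anyone who wants to check the arithmetic, here is a small sketch of the grade point calculation. It assumes the usual US four-point scale (A=4, B=3, C=2, D=1), which matches the sum shown above; the F=0 entry and the names `POINTS` and `gpa` are illustrative additions.

```python
# Assumed four-point scale; A through D match the sum in the summary above.
POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def gpa(grades):
    """Average the point values of a list of letter grades."""
    return sum(POINTS[g] for g in grades) / len(grades)

# Phonetics and fluency grades for each of the three renditions.
bush = ["A", "C", "D", "A", "B", "A"]
# One transcription grade per rendition.
reuters = ["C", "C", "D"]

print(gpa(bush))
print(round(gpa(reuters), 2))
```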
We should expect better. As one commentator observed, by now Bush should be able to reel off the pronunciation of Abu Ghraib as confidently and correctly as the pronunciation of the Alamo. But I'm just as disappointed in Reuters. Having decided to devote a whole story to criticizing the pronunciation of a sitting president of the United States, couldn't they take the trouble to sit down with a recording (and the help of a phonetician, given that their reporter obviously had no relevant skills or knowledge) and get the facts right?
As for Robert Fisk, he describes Bush's pronunciation as "Abu Grub", exhibiting the breezy disregard for mere factual detail for which he has become famous. Fisk is wrong on every relevant count, for all three of the president's renditions. In all three renditions, the president divided "Ghraib" into two syllables -- Fisk missed this. The president's final vowels in his three versions of "Ghraib" were [e], [o] and [a] -- Fisk uses orthographic "u", which can't plausibly be a representation of any of these. The president mispronounced the final consonant as [m] in his second version -- Fisk missed it again. Phonetically, Fisk flunks.
[Update: I was able to download the .rm file from CSPAN, and with a bit of help from the RealEditor program, cut out the paragraph in question from Bush's speech. It's available here. I still haven't figured out how to strip out the audio track so as to make spectrograms. I'd also like to be able to get a copy of the version on the White House site, as the audio and video quality seems to be much better, but the site seems to be set up in such a way that no downloading is possible, only streaming.]
[Update 2: I was able to download an audio-only version from NPR, and extract the relevant paragraph, here. After an inordinate amount of (I hope legal) fuss, I was able to convert this to a .wav format, so that I could make spectrograms. I also extracted short clips of the crucial three pronunciations, linked in above.]
Geoff Pullum has straightened out the whole Google hit nomenclature thing -- except for one key point. How do you pronounce whG and whG/Gp? (Not to speak of whA and whA/Gp...)
"Double yuu aitch gee" and "double yuu aitch gee per gee pee" are not going to make it.
We can safely leave this to Norma Loquendi and the Law of Least Effort, but my guess is that they'll decide on "whig" and "whig-up" -- or in IPA [hwɪg] and [ˈhwɪˌgʌp]. For those like me who don't distinguish [hw] from [w], it'll just be [wɪg] and [ˈwɪˌgʌp]. Though these will work just fine, and are certainly much more SI-compliant, I have to confess that I don't like them as well as "ghits". I'll also confess that my reason for preferring the pronunciation [gɪts] was the same as Geoff's reason for avoiding it.
Most spelling reform proposals are totally unsuccessful, and some are disastrously so. I now want to withdraw mine, making it perhaps the most short-lived and unsuccessful in the history of the universe. Keith Ivey points out to me that under the standard conventions for physicists published here by the Physics Laboratory of the National Institute for Standards and Technology (NIST), unit names like the hertz, joule, pascal, and so on are always lower-case despite being named for persons, though with abbreviations "the symbol or the first letter of the symbol is an upper-case letter when the name of the unit is derived from the name of a person": it's Pa for pascals, Wb for webers, Hz for hertz, and so on. That would suggest "ghit" for web hits using the Google™ search engine, but "Gh" for the abbreviation. (The reason I suggested g-hit as the pronunciation is that "git", with a velar stop [g], is a word in British English meaning stupid or obnoxious person. But it won't matter any more; read on.)
I now realize that I am not content with any of this. It now occurs to me that the best analogy for Google hits as a measurement term is not hertz or joules or pascals, but degrees Celsius. Degrees are the units, Celsius is a specific scale of measurement. Words like Celsius are capitalized because they are names of people. Google should be capitalized because it is a corporation name. But hits, like degrees, are not named for a person or corporation. According to NIST the correct spelling of the name of the unit °C is "degree Celsius" (the unit "degree" begins with a lower case "d" and the modifier "Celsius" begins with an upper-case "C" because it is the name of a person). However, I'm not done yet. Read on.
Putting this insight (hits are analogous to degrees, Google is analogous to the Celsius scale) together with Phil Resnik's suggestion that we really want to count web hits relative to a given search engine, and Mark Liberman's scientific refinement pointing out that we really want to measure web hits per billion documents, we get to the following proposal. The basic unit should be the webhit, abbreviation wh. Webhits measured using Google are Google webhits, abbreviation whG; webhits measured using AltaVista are AltaVista webhits, abbreviation whA; and so on. Web hits per billion pages, i.e., per gigapage (Gp -- David Nash points out to me that giga- is conventionally abbreviated G-, not g-), will be a unit obtained by division, and the NIST standard would be to say webhits per gigapage when spelled out in full but wh/Gp when using the abbreviations. Measurements of webhits on Google per gigapage would be expressed in whG/Gp, and so on.
So that's my current proposal: measuring currency of (sets of) strings on the web by whG/Gp. Though we probably haven't heard the last of this thread. We may need a national standards committee to make recommendations to a central council standing executive committee which will then report to the membership at large through delegates to a national congress... yawn... I never wanted to become a national standards administrator. I'm a grammarian. Grammar is fun. Grammar is exciting.
If you're interested in English syntax, you'll interpret this post in one of two ways. You might find that it documents a curious construction that (like me) you've never noticed before, and in fact didn't believe to be possible. Then again, you might find that it documents the fact that an allegedly literate adult (like me) can remain completely ignorant of a commonplace and perfectly ordinary aspect of his native language. (If you're not interested in English syntax, you'll want to return to our discussions of eggcorns, ghits and coffee.)
An email query from Haj Ross led me to discover sentences like these:
"Both of whom being influenced by Ellington, Rowles and Brown choose one Ellington tune for each of the two albums that comprise this two-CD set..."
"Ireland and Denmark, both of whom being heavily reliant on British trade, decided they would go wherever Britain went..."
"At present, personal injury cases are heard by many different Judges, some of whom having no experience in this field."
These are supplementary (non-restrictive) relative clauses with a present participle in place of a finite verb, whose subject is a partitive structure involving a relative pronoun, like "both of whom", "most of whom", "few of whose parents", "part of which".
Frankly, every single one of these examples seems completely ungrammatical to me -- or at least they did at the start of the investigation. Of course I have no problem with participial relative clauses like "the boy sitting in the chair" -- but these normally lack a relative pronoun, and the examples above are more like "*the boy who sitting in the chair looked up"!
In some cases, I can fix such sentences up by deleting the prepositional phrase with the relative pronoun (e.g. deleting "of whom"), or by replacing the relative pronoun with a regular pronoun (e.g. replacing "whom" with "them"), or by replacing the participle with a finite verb form (e.g. replacing "having" with "have"):
At present, personal injury cases are heard by many different judges, some having no experience in this field.
At present, personal injury cases are heard by many different judges, some of them having no experience in this field.
At present, personal injury cases are heard by many different judges, some of whom have no experience in this field.
So my first thought was that the examples in the first set were mistakes of composition -- slips of the keyboard -- caused by a typing substitution, by a partial revision, or just by losing track of the structure under construction.
However, it seems from Haj's note that such sentences are fine for him. I can't assign him responsibility for the examples above, which I found on the web, but he cites invented examples like
These men, neither of whose ID's being valid, were jailed immediately.
These gophers, some of whom having mated already, command top dollar.
which (if I understand his note) he takes to be perfectly OK. And looking on the web, there are lots and lots of examples of this pattern. So either these preposterous imitations of English have been produced by infiltrators from some parallel universe, or this is one of those little corners of the language where idiolects differ.
Here are a few other examples, just to push the point for those whose initial reactions are like mine:
"The partnership between Mfume and Bond, both of whom having held elective offices for many years, pushed the NAACP aggressively back into national politics in the 2000 election."
"...it was fascinating to find the many fields of medicine entered and the many locations chosen by those of us who attended Duke Medical School together just after the war, most of whom having also been in the armed services."
"She seems to know and be known by a great many residents, many of whom I also know, several of whom being some of the finest elder members in my own congregation, Nashville’s Second Presbyterian Church."
"CFA has grown to represent many of the leading names in UK chilled food production, who employ more than 50,000 people around the UK, many of whom being in rural areas."
"He was one of four brothers, three of whom having died or departed."
"Two associate members and two alternates (none of whom having a prior or present employment relationship with the Fund) would be appointed by the Managing Director after appropriate consultation."
The design and manufacture of the prototype is predicted at a total of eight months, three months of which being design.
There are also plenty of examples with mass-noun partitive constructions, like:
If the branch is aboveground at a bank of meters, all of the service line can be a main, part of which being aboveground.
There are also a few examples that can be construed as relativization out of a supplement to the relative clause, which is a mere island violation:
"The next evening we spent with the Consul and his two pretty daughters, neither of whom being able to speak a word of English, the conversation was carried on in French."
"In this report I have dealt more in particulars for the reason there are no reports from brigade commanders, all three of whom having been captured, I reserve to myself the privilege of making such corrections as would appear right and proper when I subsequently have the opportunity to examine their reports."
These are easier for me to take, once I succeed in parsing them. And then there are some where the participial relative clause is restrictive, which seem even worse to me:
The 99ER Pairs is open to any two players neither of whom having more than 100 or more masterpoints as of November 1st of the year in which the event is held.
As in the case of "such the good son", I expect that now that I've learned that this construction exists, I'll gradually learn to accept and perhaps even use it. No, I probably won't use it, except in a sort of jokey semi-quotative manner.
[Note: in case it's not obvious, I should state explicitly that I don't see anything logically or conceptually impossible in these constructions. I've studied languages that use relative pronouns freely in analogous non-finite clauses (e.g. Latin ablative absolute constructions):
quibus | rebus | cognitis | Caesar | apud | milites | contionatur |
which | things | having been learned | Caesar | to | the soldiers | gave a speech | (B.C. 1.7)

qua | (regione) | subacta | licebit | decurrere | in | illud | mare |
which | area | subdued | it will be possible | to run down | into | that | sea | (Q.C. 9.3.13)
I just didn't realize that English was such a language -- for some people.]
I'll risk adding even more verbiage to this already over-long post, by observing again that this case provides another example of the unsolved problem of looking things up in grammar books.
CGEL devotes chapter 12 to "Relative constructions and unbounded dependencies", and chapter 14 to "Non-finite and verbless clauses". In chapter 11 ("Content clauses and reported speech"), it mentions (p. 990) non-finite clauses with question words, like
Whether hunting or being hunted, the red fox is renowned for its cunning.
Chapter 15 is "Coordination and supplementation", and it cites (p. 1359) supplements that are "comparable in function to a relative clause":
The tourists, most of them foreigners, had been hoarded onto a cattle truck.
[By the way, this must be one of the rare errors in CGEL -- "hoarded" seems to be a malapropism for "herded".]
Somewhere in the 350-odd pages of these four chapters, there is probably a discussion of participial relative clauses with subjects like "some of whom", but I couldn't find it. Nor could I figure out how to find it in the index.
Cory Doctorow has an excellent talk/essay, "Ebooks: Neither E, Nor Books", which I completely missed, until Blogalization posted on the problems of translating the terms "public domain" and "creative commons" into Spanish and other languages whose parent cultures are outside the tradition of English common law.
The American Heritage Dictionary has
IDIOM: stick in (one's) craw To cause one to feel abiding discontent and resentment.
A transcript entitled "Gen. Anthony Zinni, USMC (Ret.) Remarks at CDI Board of Directors Dinner, May 12, 2004" contains the following phrase:
They did not ever want to hear that we had a problem, something sticking our crawl, that we didn’t bring up to them, and we didn’t honestly express if we felt it had to be expressed.
(Note that this is a transcript of a recorded presentation and interchange -- specifically at this point one of Gen. Zinni's responses to questions -- and so this "eggcorn" is much more likely to have been created by the transcriber than by the speaker.)
This is the only Ghit for "sticking our crawl" (making it a .001 KGh expression, with a frequency of 0.23 GPB -- Ghits per Billion documents). There are quite a few eggcorns for this phrase with crawl in place of craw, however, as long as we retain the preposition "in":
(link) And to think that his hand picked protoge was beat out by a 7th rounder must stick in his crawl.
(link) “Win a Date with Tad Hamilton” is going to cause the girls to melt, but it’s really gonna stick in the crawl of the guys.
(link) It may stick in your crawl, but we're coming back yankee boys and girls.
(link) There are two things that realy stick in my crawl... People who aren't tolerant of other people, and that lying bastard Sayer.
There are 21 Ghits for "stick in * crawl", making it a .021 KGh pattern with a frequency of 4.9 GPB.
The pattern "sticks in * crawl" has 47 Ghits, making that a 10.97 GPB pattern.
(link) But what sticks in my crawl is the creative lethargy that takes a twenty year old song from one of the most innovative bands to emerge from the post punk, pre-techno era and just blithely gut it for fodder.
(link) This sticks in my crawl, maybe because, I refuse to hide behind anything or anyone false ...
(link) there's just something about shania and like, celine dion, that really sticks in my crawl i just can't put my finger on it
(link) Sorry for the rant, but it sticks in my crawl.
The pattern "sticking in * crawl" has 24 Ghits, making it a 5.60 GPB deal, and the pattern "stuck in * crawl" has 66, but only 23 of those are relevant -- the others are things like "stuck in the crawl space" and "stuck in the crawl of traffic".
Adding it all up, we have 21+47+24+23 = 115 Ghits, or 26.8 GPB.
(This is an upper bound, since some of the pages may have been returned for more than one of the searches).
By comparison, looking at the original patterns "stick in * craw" etc., we see that these are overall about 127 times as common:
Word | stick in * | sticks in * | sticking in * | stuck in * | Total Ghits | GPB |
craw | 2,930 | 6,110 | 653 | 4,860 | 14,553 | 3396.1 |
crawl | 21 | 47 | 24 | 23 | 115 | 26.8 |
Ratio | 140 | 130 | 27 | 211 | 127 | 127 |
The original idiom has a frequency of about 3,400 GPB, while the "eggcorn" idiom has a frequency of less than 30 GPB.
Why are 3396.1 and 26.8 GPB better numbers than 14,553 and 115 Ghits?
For the purposes of comparing the eggcorn to the idiom, it doesn't matter -- the ratio is the ratio, and all we need to do is to say that at the moment, the original idiom seems to be a bit more than 100 times more frequent than the eggcorn. However, if we care about the actual frequencies, then we want the normalized counts. It means something to say that as of today, the idiom "stick in one's craw" has a frequency of about 3,400 GPB.
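The arithmetic behind these figures is easy to re-check. Here is a minimal sketch in Python, assuming the index size of 4,285,199,774 documents that Google reported on its front page at the time; the function name `gpb` and the variable names are illustrative only, not part of any tool used for the original counts.

```python
# Index size Google reported on its front page at the time (an assumption
# carried over from the GPB proposal; it changes as the index grows).
INDEX_SIZE = 4_285_199_774

def gpb(ghits, index_size=INDEX_SIZE):
    """Ghits per billion documents indexed."""
    return ghits / index_size * 1e9

# Ghit counts for "stick in *", "sticks in *", "sticking in *", "stuck in *"
craw = [2930, 6110, 653, 4860]
crawl = [21, 47, 24, 23]

craw_total = sum(craw)
crawl_total = sum(crawl)

print(craw_total, round(gpb(craw_total), 1))
print(crawl_total, round(gpb(crawl_total), 1))

# The ratio is the same whether we divide raw counts or normalized
# frequencies, because the index size cancels out.
print(round(craw_total / crawl_total))
print(round(gpb(craw_total) / gpb(crawl_total)))
```

Both ratios come out the same, which is the point made above: normalization matters only when we care about the frequencies themselves, not when we compare two patterns at one moment.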
My initial instinct was just to propose AVhits as another measure, but perusing the CPAN site, I find that there are a ton of search modules, including not only search engines like Google, Altavista, and Lycos, but also news sources like the Washington Post and Reuters, specialized article searches like PubMed, job searches, etc.
What to do? Perhaps use Whits (for "Web hits") as the more general term, with usages like "1234 Whits (Altavista)", and then consider Ghits to be an abbreviation for "Whits (Google)"?
A nice advantage of this proposal is that you can advise researchers to keep their Whits...just in case anyone wants to see them. :-)
Geoff Pullum (GKP) suggests that we start using new units Gh, KGh, MGh and so on, as a way of formalizing the counting of Google hits, now traditional among web language folk, and dubbed "ghits" earlier this year by Trevor at kaleboel. This is a terrific idea, though I'm afraid that Geoff's suggested pronunciation "gee hits" has little chance of making headway against the competitor [gɪts] (hard g, rhymes with grits).
However, I want to propose a more substantive addition to Geoff's proposal, based on the fact that to get a measure of frequency (which is what we often but not always want), we need to normalize by the size of the set searched. In this case, the set is the documents in Google's index, and Google puts the size of this set right up on its front page. As I write this, the number is 4,285,199,774. Now if we take a modest number of Ghits -- a few hundred to a few thousand -- and divide by 4,285,199,774, we'll get an unpleasantly small number. For example, {Pullum} has a count of 23,800 Gh, or 23.8 KGh, or .0238 MGh, which is a pretty respectable number. However, 23,800 divided by 4,285,199,774 is 5.554e-06, or .000005554 Ghits per document indexed.
We can deal with this the way we deal with other uncomfortably large standard measures like Farads and Lenats, by using prefixes such as micro- and nano-. Thus the frequency of Pullum becomes 5.554 microGh/document, or 5,554 nanoGh/document. In general, I think that the nano-scale measure is the right one to use for term frequencies, since the plausible range of sensible and useful frequency counts then corresponds to a sensible and useful range of natural numbers: one nanoGh/document corresponds today to about 4 or 5 Ghits; ten nanoGh/document corresponds to about 40 or 50 Ghits; a thousand nanoGh/document corresponds to about 4,000-5,000 Ghits; a million nanoGh/document corresponds to about 4 or 5 million Ghits; and so on.
We need a shorter term for this measure than "nanoGh/document" -- so I suggest that the web frequency of terms should be measured in GPB, for "Ghits per billion documents". I'll illustrate the use of this measure in some subsequent posts.
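As a rough illustration of how the units relate, here is a small Python sketch that converts a raw Ghit count into Ghits per document and into GPB, and back again. The function names (`per_document`, `to_gpb`, `from_gpb`) are hypothetical, and the index size is simply the figure quoted above, which will go out of date as the index grows.

```python
# Documents in Google's index as quoted in this post (an assumption that
# must be updated whenever the reported index size changes).
INDEX_SIZE = 4_285_199_774

def per_document(ghits, index_size=INDEX_SIZE):
    """Raw frequency: Ghits per document indexed."""
    return ghits / index_size

def to_gpb(ghits, index_size=INDEX_SIZE):
    """1 nanoGh/document is the same thing as 1 Ghit per billion documents."""
    return per_document(ghits, index_size) * 1e9

def from_gpb(gpb, index_size=INDEX_SIZE):
    """Inverse conversion: how many Ghits a given GPB value corresponds to."""
    return gpb * index_size / 1e9

pullum = 23_800                          # Ghits for {Pullum}
print(f"{pullum / 1e3} KGh")             # the same count in KGh
print(f"{per_document(pullum):.3e}")     # Ghits per document
print(f"{to_gpb(pullum):.0f} GPB")       # nanoGh/document, i.e. GPB
print(f"{from_gpb(1):.1f} Ghits per 1 GPB")
```

With today's index size, one GPB works out to between 4 and 5 Ghits, which is why the nano-scale unit keeps typical counts in a comfortable range of whole numbers.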
The value added of a normalized measure is that it will continue to give comparable estimates of how frequent a pattern is, as the number of pages that Google indexes continues to grow. Here's a graph of (self-reported) search engine index size from 12/95 through 6/03, in terms of billions of textual documents indexed:
(KEY: GG=Google, ATW=AllTheWeb, INK=Inktomi, TMA=Teoma, AV=AltaVista).
So a bit less than a year ago, Google indexed 3.3 billion textual pages, vs. 4.3 billion now. That's about a 30% increase. At that rate of increase, the index will double in size in about 2.5 years, and will increase by a factor of 100 in about 17.5 years. In 2021, Google may or may not still be in business, and the web will certainly be organized in very different ways -- average document sizes may be quite different, to mention one trivial matter -- but to the extent that we want to make comparisons of frequencies over time -- even over a couple of years of time -- we'd better do our best to normalize counts somehow. And we linguists aspire to work on a time-scale of centuries, if not millennia.
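The growth estimates can be reproduced with a couple of logarithms. This is a minimal sketch, assuming that the change from 3.3 to 4.3 billion pages took exactly one year and that the same yearly growth ratio continues indefinitely; both are simplifying assumptions, and the function name `years_to_grow` is illustrative.

```python
import math

def years_to_grow(factor, yearly_ratio):
    """Years needed to grow by `factor` at a constant yearly growth ratio."""
    return math.log(factor) / math.log(yearly_ratio)

# Assumes the 3.3 -> 4.3 billion change took exactly one year.
yearly_ratio = 4.3 / 3.3

percent_increase = (yearly_ratio - 1) * 100
doubling_time = years_to_grow(2, yearly_ratio)
hundredfold_time = years_to_grow(100, yearly_ratio)

print(f"{percent_increase:.0f}% per year")
print(f"doubles in about {doubling_time:.1f} years")
print(f"grows 100-fold in about {hundredfold_time:.1f} years")
```

Under these assumptions the results land close to the rounded figures in the text: a little over two and a half years to double, and somewhat more than seventeen years for a hundredfold increase.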
Of course, if we are just looking at the ratios of counts -- or frequencies -- for different cases at a given time, it doesn't make any difference whether we use counts or frequencies, the results are exactly the same. In that case, it's clearer and simpler just to use counts -- and there Geoff's Gh, KGh and so on are just the right thing. An excellent case study using such comparisons of counts can be found here at Tenser, said the Tensor. We've posted a number of examples of the same sort of analysis, for example here.
Finally, I should mention that there's another issue about frequency -- document frequency and term frequency are not entirely interchangeable measures, and the cases in which they differ more or less than expected are sometimes especially interesting. For more on this, see e.g. this reference (or wait for another post on the subject). However, GPB remains a pretty decent proxy for a measure of the frequency of bits of text -- much better, and much more accessible, than anything we had just a few years ago.
[Update: Semantic Compositions suggests using capitalization to distinguish between "raw" ghits (e.g. kGh) and validated ghits (e.g. KGH). I guess one could similarly use Gpb and GPB, though I'm skeptical that folks will be able to keep the capitalization straight.]
On the other hand, even "doux fard" is more transparent than New York City's code for coffee-ordering, where "coffee, black" means coffee with two sugars and "coffee, regular" means it has milk and two sugars. (Yeah, buddy, dat's what we considuh REG-yoo-luh. Yugodda problum widdat?) Interestingly, the page I referenced does not contain "light and very sweet" -- my preferred combination, which, when done properly, has the taste of very hot melted coffee ice cream.
First let me say that in this post I want to make a spelling reform proposal. Previous spelling reform proposals for English have had a disastrously unsuccessful history, but I only want to respell one word, and only by a capitalization. It relates to the matter of getting a little more serious about the terminology for measurement units in practical everyday use of the web as a corpus.
The term "ghit" for "Google hit" is slowly beginning to get established, at least here on Language Log. It seems to me unfair to Google™ to use the lower-case "g"; it should be "Ghits". We should honor Google (a company that includes "You can make money without doing evil" as part of its corporate philosophy deserves our respect) the way Heinrich Hertz (1857-1894) is honored in the abbreviation "Hz", the basic unit of measurement for frequency of wave vibrations. The term for the unit should be the Ghit, with capital G, pronounced G hit ("jee hit"), and the abbreviation should be Gh.
A thousand Ghits will be a kiloGhit (under usual US capitalization conventions, KGh; compare KB for kilobytes, KHz for kiloHertz; outside the US we may expect the spelling kGh). A million Ghits will be a megaGhit (MGh); a billion (10^9) Ghits, which {the} would probably get, would be a gigaGhit (GGh). In due course, as the web gets bigger, we may come to need a term for a trillion Ghits: a teraGhit (TGh), though the web will have to get about 250 times larger before we need that term.
A measurement in Ghits will be by definition a count of the number of web pages returned by a search pattern. A pattern gets n Ghits if and only if searching the web using Google yields n distinct web pages that contain tokens of the pattern. I do think the pages should be distinct: it seems to me that duplicate pages should in principle be eliminated if the notion of a Ghit is to mean anything. Since it is perfectly possible for a page on the web to have an identical copy at a different URL (this probably happens quite a bit), it is clearly possible for copies of pages to come up as separate hits in the list when you run a Google search. That means that the number of items on the list returned by the Google search engine will only be a rough approximation to the actual Ghit count for your search string. It also will not be a measure of the number of occurrences of the string on the web: the number of occurrences will be higher than the Gh value because a page will often contain multiple occurrences.
Notice that a pattern is a set of strings, not a string. The pattern {ghit} gets 636 Gh, most of them spurious (as Mark pointed out here). But the set {ghit, "Google hit"} is also a pattern, and it gets only 7 Gh, the number of pages that contain BOTH "ghit" and "Google hit". Those are all genuine hits for the word "ghit" that we're talking about, the one that I say should be respelled "Ghit". Switching to plurals gives us {ghits, "Google hits"}, which gets 9 Gh.
Adding strings to a pattern set either keeps the Gh the same or decreases it. There may be quite a few people using Google who do not fully understand that. It would be reasonable to think that a search using the pattern {flowers tulips daffodils pansies dahlias roses} might do even better at getting pages about flowers than {flowers} would, but that is not true; it gets far fewer pages, three orders of magnitude different: 4.5 KGh for {flowers tulips daffodils pansies dahlias roses}, 12.5 MGh for {flowers}. That's because an otherwise relevant page missing just one of the words, say "dahlias", will be ruled out under Google's search principles if "dahlias" is included in the search pattern. Remember also that putting a string of words in quotes turns them into a single word-like unit (call it a pseudoword): searching with the 2-word pattern {chocolate cake} will give utterly different results (far higher: 2.16 MGh) than searching with the 1-pseudoword pattern {"chocolate cake"} (0.616 MGh = 616 KGh).
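The way a pattern set behaves can be seen on a toy example. The sketch below is not Google's actual algorithm; it only assumes, as described above, that a page counts as a hit when it contains every term in the pattern, and that a quoted phrase must appear as a contiguous run of words. The four "pages" and the helper names `contains` and `ghits` are invented for illustration.

```python
def contains(page, term):
    """True if the words of `term` occur contiguously, in order, in `page`."""
    words = page.lower().split()
    target = term.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def ghits(pattern, pages):
    """Count pages that contain every term in the pattern (toy AND-search)."""
    return sum(all(contains(page, term) for term in pattern) for page in pages)

# A made-up four-page "web".
pages = [
    "a rich chocolate cake recipe",
    "chocolate frosting for a vanilla cake",
    "hot chocolate",
    "carrot cake",
]

one_word = ghits(["chocolate"], pages)             # pattern {chocolate}
two_words = ghits(["chocolate", "cake"], pages)    # pattern {chocolate cake}
pseudoword = ghits(["chocolate cake"], pages)      # pattern {"chocolate cake"}

print(one_word, two_words, pseudoword)
```

Each added term can only remove pages from the result, and the quoted pseudoword is stricter still, since the two words must be adjacent.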
A couple of days ago, I failed to understand the Quebecois accent of the barista at a Montreal café ("honi soit qui joual y pense..."). So when I went back to the same place yesterday for a sandwich and a cup of coffee, I was entirely prepared to cope with the choice "doux ou fort?" -- "mild or strong" -- pronounced so that the last word sounded like standard European French "fard".
Well, I got a different server this time, and after she prepared my sandwich and picked up a coffee mug, she asked me "velouté ou corsé?". Now "velouté" means "velvety", and I know it as a term for creamy soups and smooth-tasting wines; and the only experience that I've had with the French word corsé is in the context of wine terminology, where it means "(made) high in alcohol content" or something of the sort. So this left me in a state of cross-linguistic and cross-cultural doubt.
I figured "velouté ou corsé?" was probably just another name for the mild/strong choice. But then again, the server at the same place used "doux ou fort?" for that choice just a day before, so maybe corsé meant "with a shot of brandy" or "with an extra shot of espresso" or something? Or maybe I had gotten the word entirely wrong again, due to pronunciation variation?
My uncertainty showed on my face, as usual, and so the barista tried to help me out by switching to English. "The coffee, do you want it smooth or coarse?" Now I was really confused. Being a decisive if random kind of guy, I said "corsé, s'il vous plait", as if I knew what that meant. The server didn't add anything -- brandy or otherwise -- to the coffee, and it tasted just the same as the "fort" variety had the previous day -- well-made brewed coffee, fairly strong, with the taste of a dark roast.
Pursuing the linguistic aspects later on, I learned that I had indeed apparently observed another example of the diffusion of winetalk into other areas. At least, that's what I concluded from perusing the Dictionnaire de l’Académie Française, which is now online. (More exactly, the full eighth edition is online, and the first two volumes of the ninth edition, A to mappemonde.) From this source I learned that the verb corser originated in the 16th century, as a derivative of cors, the old form of corps "body", and that it means to augment the alcohol content of a wine, to spice up a sauce, or to add complexity to a story, play or real-life situation.
The past participle corsé is from the 18th century, and the entry notes that the relevant sense of corps "body" is "the consistency of a thickening liquid" -- this is confusing, since higher alcoholic content in wine hardly thickens it, though there is a clear metaphorical sense in which a wine of higher alcoholic content has more "body". The gloss for corsé, translated, is: "having body, vigor. Of wine, high in alcohol. Of coffee, very strong. Of sauce, highly seasoned, spicy." There's no indication that a sauce is considered to be corsé merely by virtue of being thickened.
Here are the full entries:
CORSER v. tr. XVIe siècle. Dérivé de cors, forme ancienne de corps.
1. Renforcer, donner du corps à. Corser un vin, augmenter sa teneur en alcool. Corser une sauce, la relever, l'épicer. 2. Fig. Corser l'intrigue d'une pièce, en multiplier les péripéties. Corser un récit, le rendre plus captivant. Fam. L'affaire, la situation se corse, elle se complique, elle devient plus sérieuse. Péj. Corser une facture, la majorer abusivement.
CORSÉ, -ÉE adj. XVIIIe siècle. Dérivé de corps, au sens de « consistance que prend un liquide qui épaissit ».
1. Qui a du corps, de la vigueur. Un vin corsé, fort en alcool. Un café corsé, très fort. Une sauce corsée, relevée, épicée. 2. Fig. Une intrigue corsée, riche en incidents et péripéties dramatiques. Fam. et péj. Une addition corsée, exagérée, abusivement majorée. Des histoires corsées, osées, scabreuses.
From this entry, it's not clear what the order of the applications to wine, food and coffee was, but I'm guessing that coffee came later in the series, although apparently not very recently.
As for velouté, l’Académie is quite exact about what it means when applied to wine: "Vin velouté, Bon vin qui est d'un beau rouge un peu foncé et qui n'a aucune âcreté" -- "a good wine that has a beautiful red color, a bit dark, and that lacks any acridity". However, the extension to coffee is not mentioned at all. I'm not sure whether this is an oversight, or whether the usage with coffee is recent, but it seems likely that this is a case of diffusion of tasting vocabulary, whatever the timing. Here's the whole entry:
(1)VELOUTÉ, ÉE. adj. Il se dit des Étoffes dont le fond n'est point de velours et qui ont des fleurs, des ramages faits de velours. Satin velouté. Passement velouté. Étoffe veloutée.
Il se dit aussi de Certains papiers qui servent de tenture et dont les dessins, les ornements imitent le velours. Un rouleau de papier velouté.
Il signifie, par extension, Qui est doux au toucher comme du velours, ou Qui a l'apparence du velours; il se dit particulièrement de Certaines fleurs. Les pensées, les œillets d'Inde, les amarantes sont des fleurs veloutées. Peau veloutée. Teint velouté.
Vin velouté, Bon vin qui est d'un beau rouge un peu foncé et qui n'a aucune âcreté.
En termes de Cuisine, Potage velouté, Sorte de potage onctueux.
VELOUTÉ, en termes de Joaillerie, se dit des Pierres qui sont d'une couleur riche, foncée. Un saphir velouté.
VELOUTÉ s'emploie aussi comme nom masculin et signifie Douceur, caractère de ce qui est velouté. Le velouté d'une étoffe, d'une pêche.
English coarse is completely unconnected -- here's the OED's etymology for it:
[First found early in 15th c. No corresp. adj. in Teutonic, Romanic, or Celtic. The general spelling down to the 18th c. was identical with that of the n. COURSE; with that word it is still identical in pronunciation, both in standard English and in the dialects (e.g. Scotch kurs); the spelling coarse appears to have come in about the time when the pronunciation of course changed from (u), to (o). Hence the suggestion of Wedgwood that coarse is really an adj. use of course, with the sense ‘ordinary’, as in the expression of course, ‘of the usual order’. It appears to have been used first in reference to cloth, to distinguish that made or worn in ordinary course from fine cloth or clothes for special occasions or special persons; ‘course cloth’ would thus be ‘cloth of (ordinary) course’. Cf. the history of mean, and such expressions as ‘a very ordinary-looking woman’, a ‘plain person’.
Our first contemporary example of the spelling coarse is in Walton 1653 (where course however also occurs); it became frequent after 1700; course occurs occasionally down to 1800.]
The OED's etymology of English velvet also applies to the French cognate:
ad. med.L. velvetum (-ettum), also vel(l)uetum (-ettum), app. representing a Romanic type *villūtettum, dim. of *villūtum, whence med.L. vel(l)utum (velotum), It. velluto, OF. velut, -ute, Sp. and Pg. velludo, ultimately f. L. vill-us shaggy hair.
and the OED gives a gustatory sense for "velvety", which however focuses on touch metaphors and ignores color:
3.b. Smooth and soft to the taste.
The semantics of coffee is multidimensional. On the production side, there are the intrinsic qualities of the beans and their fermentation, the type and degree of roasting, the method of preparation and the ratio of water to beans, and so on. On the consumption side, there are many flavors, several textures, and these differ in degree as well as kind. But I have the impression that the pragmatic aspects are even more important -- what sort of cultural systems and social settings the speaker or writer wants to evoke...
In French as well as English, there's apparently a growing infusion of terms from wine talk into coffee lingo. This is partly because the oenophiles have plenty of terms to borrow, but I suspect that it's mostly a matter of borrowing prestige by using prestige-associated vocabulary.
[Update: There is a lovely story at Pedantry, evoked by this post, which confirms (in passing) that velouté or corsé are the traditional Montreal terms for two traditionally-available alternative forms (types? degrees?) of brewed coffee.]
Geoff Pullum is right to point out the absurdity of trademark attorneys' fetish about always using a trademark as an adjective ("Oreo cookies" rather than "Oreos"). I've run into this notion several times in the course of preparing expert opinions in trademark cases, when attorneys have questioned my descriptions of marks as nouns or noun phrases.
But it isn't quite accurate to say as Geoff does that "The enemy [the attorneys] are laying defenses against is the danger that a trademark might fall into the public domain." This has rather to do with the distinction between terms that apply to the attributes of a certain product or service, which are protectable, and "generic" terms that merely name the class of things of which a particular product or service is an instance, which are not protectable. So you can register the name "Tru-Fit sports shoes," for example, but not "sports shoes" itself.
The line between the two is not always clear, particularly when the mark is descriptive of some unique property of the goods or services it's associated with, which is why companies are careful in trademark applications to describe their brands under descriptions that make the generic term explicit, as in "DISTANCE-COMMANDER brand remote control devices," and the like. (The upper-case letters are used in something analogous to the way they were used by Katz and Fodor, by way of suggesting that an expression is somehow detached from its ordinary English meaning.)
Where the lawyers go wrong is in associating genericness with nominal meanings, and assuming that if you use the mark only as an adjective (or more accurately, as Geoff points out, as an attributive modifier), you will have secured yourself against a competitor's claim that your mark is generic, and hence not protectable.
But of course companies do in fact routinely use their marks as nouns, and indeed, sometimes as verbs. Scott Paper Company ran an ad for Viva paper towels some years ago that showed people using the product to clean various objects and persons against a jingle that ran, "Viva the this, Viva the that, Viva the Chris, Viva the cat…" and so on. Juniper Networks is currently running a television campaign with the slogan "Juniper your net." And "googling" and other verbal forms occur a number of times on Google's own Web site. Those would be rightly described as proper verbs, at least in the sense of "proper" that's relevant to describing nouns like Chevrolet, if not the semanticist's sense of having a unique denotation.
Arnold has explained lucidly why splitting an infinitive is sometimes obligatory, as in "to more than double." But in the interest of historical accuracy (vulgarly known as "claiming credit"), I should point out that the observation, if not its elucidation, is first recorded in the usage note for "split infinitive" that I wrote for the third edition of the American Heritage Dictionary, which appeared in 1993. It reads, in part:
In We expect our output to more than double in a year, the phrase more than is intrinsic to the sense of the infinitive phrase, though the split infinitive could be avoided by use of another phrase, such as to increase by more than 100 percent.
I'm less than happy with that phrase "intrinsic to the sense of the infinitive phrase," and to tell the truth I can't recall why I used it, though in writing these notes it's always difficult to come up with an explanation that's consistent with the limits of readers' grammatical sophistication and the requirements of brevity. In any case, when we polled the dictionary's usage panel on that example, 87 percent of them found it acceptable, though I suppose you could say that the fact that fully 13 percent demurred is evidence of just how strong a hold these superstitions have on some people.