March 28, 2004

And yet.

How many times does a word or phrase need to be repeated in order to seem characteristic of a speaker or author? I think that the answer is "not very many times, maybe only once or twice, if the use in context is salient enough".

If this is true, then the kind of statistical stylistics that David Lodge worried about will not be adequate to uncover these associations. Raw frequencies certainly will not work, since these words or phrases may only be used a couple of times, or at least will only have been used a couple of times at the point where we start to associate them with the writer or speaker. Simple ratios of observed frequencies to general expectations will not work either, because at counts of two or so, such tests will pick out far too many words and phrases whose expected frequency over the span of text in question is nearly zero. As readers and listeners, we mostly ignore these cases, attributing them to the influence of topic or to random noise in the process. To model human reactions in such cases, we need to be able to discount the effects of topic, and perhaps also to understand better, in some other ways, what makes the use of a word or phrase stylistically striking or salient.
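To make the trouble with simple ratios concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `overuse_ratios` is a hypothetical helper, and the counts and reference rates are invented toy numbers, not measurements from the novel or from any corpus.

```python
from collections import Counter


def overuse_ratios(counts, span_length, reference_rate, min_count=2, floor=1e-7):
    """Return {item: observed / expected} for items seen at least min_count times.

    reference_rate maps an item to its assumed per-token rate in general text.
    Items missing from the table fall back to `floor`, a tiny made-up rate.
    """
    ratios = {}
    for item, observed in counts.items():
        if observed < min_count:
            continue
        expected = span_length * reference_rate.get(item, floor)
        ratios[item] = observed / expected
    return ratios


# Invented toy counts over a 1000-token span (illustration only).
counts = Counter({"the": 60, "and yet": 2, "escalators": 2, "receptionist": 2})

# Invented general-text rates per token (illustration only).
reference_rate = {
    "the": 0.06,
    "and yet": 0.0001,
    "escalators": 0.000002,
    "receptionist": 0.000005,
}

ratios = overuse_ratios(counts, 1000, reference_rate)
ranked = sorted(ratios, key=ratios.get, reverse=True)
print(ranked)
```

With these toy numbers, the topic-bound words ("escalators", "receptionist") rank above the phrase "and yet", simply because their expected counts are closer to zero. The ratio rewards rarity, not stylistic salience, which is the difficulty described above.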

I recently came across an example of this phenomenon in the climactic scene of Jennifer Government, a satirical SF novel by Max Barry. This is the end of chapter 84, p. 313, where the book's eponymous heroine Jennifer Government arrests the arch-villain, her former lover John Nike:

... "John Nike, you are under arrest for the murder of Hayley McDonald's and up to fourteen other people."
"What? What?"
"You will be held by the Government until the victim's families can commence prosecution against you." She hauled him up and marched him towards the escalators. He was a pain to move. His legs kept slipping out from under him, as if he was drunk.
"You're arresting me? Are you serious? I don't belong in jail!"
"And yet," she said.

When I read this passage, I recognized that "and yet" -- as a phrase by itself, with the continuation left unspoken -- was an expression characteristic of the character "Jennifer". I couldn't remember any specific instances from earlier in the book where she had used the expression, though I did feel that one of them had been in a conversation with her four-year-old daughter Kate.

Courtesy of amazon.com's search function, I can easily find out how often the expression occurs elsewhere in the book. The answer is "twice". The first is indeed part of Jennifer's effort to get her daughter up in the morning, on p. 170:

Kate's eyes opened, then squeezed closed. "I'm tired..."
"It's time to get ready for school."
"I don't want to."
"And yet," she said.

The second instance is in the context of a government raid on General Motors' London headquarters (p. 195):

In a way, Jennifer felt bad, busting into such a nice place in full riot gear and scaring the crap out of everybody. But in another, more accurate way, she enjoyed it a lot. She collared a scared-looking receptionist and read out her list of target executives. "Where are they?"
"They're--different floors. Four, eight and nine."
"Three teams!" Jennifer said. "I'll take level nine. Meet back here."
"You can't go up there!" the receptionist said, horrified. "This is private property! You can't!"
"And yet," Jennifer said. She hit the stairs. She found her target by striding down the corridor and barking out his name: when a man popped his head out of an office, she cuffed him. It was much easier than she'd expected.

It seems fairly easy to explain post hoc why the phrase "and yet" as a sentence in itself should trigger our linguistic novelty detectors -- the words in this case are clearly free of topic-specific content, and the bigram "and yet" at the end of a sentence, written without continuation dots, is much rarer than would be predicted from its overall frequency and the frequency of sentence ends. However, I suspect that a scan for bigrams with quantitatively similar properties would turn up lots of unremarkable examples, and that other examples of passages evoking a similar psychological reaction might not yield as easily to simple frequentistic analysis, even post hoc.
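The comparison in the previous paragraph can be sketched roughly as follows. This is one possible way to set up the calculation, not a claim about how it ought to be done: it assumes that bigram identity and sentence-final position are independent, and the function name `sentence_final_stats` and the tiny corpus are made up for illustration.

```python
def sentence_final_stats(sentences, bigram):
    """Compare observed and expected sentence-final uses of a bigram.

    sentences: list of token lists, one list per sentence.
    bigram: a (word1, word2) tuple.
    Returns (observed_final, expected_final), where the expectation assumes
    that the bigram is equally likely at every bigram position.
    """
    total_positions = 0   # all bigram positions in the corpus
    final_positions = 0   # one per sentence with at least two tokens
    bigram_count = 0      # occurrences of the bigram anywhere
    observed_final = 0    # occurrences in sentence-final position

    for tokens in sentences:
        pairs = list(zip(tokens, tokens[1:]))
        if not pairs:
            continue
        total_positions += len(pairs)
        final_positions += 1
        bigram_count += pairs.count(bigram)
        if pairs[-1] == bigram:
            observed_final += 1

    if total_positions == 0:
        return 0, 0.0
    expected_final = bigram_count * final_positions / total_positions
    return observed_final, expected_final


# A made-up four-sentence corpus, just to show the mechanics.
toy = [
    "and yet she went".split(),
    "it rained and yet they played".split(),
    "he said no and yet he stayed".split(),
    "she left".split(),
]
observed, expected = sentence_final_stats(toy, ("and", "yet"))
print(observed, expected)
```

A corpus this small shows nothing about English; it only shows the bookkeeping. On a real corpus, one could rank all bigrams by the gap between the two numbers and see how many unremarkable cases come out alongside "and yet".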

This reminds me of Josh Tenenbaum's analysis of generalization from very small sets, down to sets of size one. It would be interesting to try an analogous approach here. A more strictly analogous problem would be inferring the 'sense' of a word or phrase from a single use in context. This is related to the point under discussion here, I think, since in many cases we seem to identify a speaker or writer's lexical habit from a couple of uses, in part by concluding that those uses constitute a novel (or at least unusual) sense.

I should point out that Max Barry (the author of Jennifer Government) tries to salt the mine, so to speak, by having the little girl in the first passage cited above respond "Mommy, I hate it when you say 'And yet.'", thus trying to clue us in overtly to his intentions. I don't think this is necessary or even effective -- I don't think it had any effect on my reactions in this case.

And yet.

No, the context isn't quite right for this to be a valid instance of Jennifer's little verbal tic, as established by the three examples in the novel. In fact, I think that any one of those examples would probably do as an adequate basis for lexicographic generalization, suggesting that my use in the preceding paragraph, though plausible enough, is not the same sense. In some sense.

Posted by Mark Liberman at March 28, 2004 02:55 PM