July 30, 2005

"Quotations" with a word error rate of 40-60% and more

In my last post, I cited an extreme example of editing Shakespeare for performance, and I mentioned in passing that journalistic quotation is also so selective -- and often so inaccurate -- as to become a form of creation. Since the cited examples were distributed over a number of posts covering other topics as well, I'm reproducing some of the material here for the convenience of interested readers.

The key step is choosing what to include and what to leave out. If you quote people verbatim but change the context, omit qualifications and so on, you can change their emphasis or even make them seem to say something that they never meant at all. However, I'm not dealing with that issue here. Instead, my point is that even after the crucial choice of material and its context of presentation has been made, journalists are remarkably careless about the accuracy of the words that they put in direct quotes in their stories.

Within the past month or so, I've examined the details of this practice in the case of remarks by Rasheed Wallace (here and here), George W. Bush (here, here, here), and Tim Duncan (here). My motivation was not to beat up on journalists, but to follow up on a remark attributed to Rasheed that caught my eye, and one attributed to W that similarly attracted Eric Bakovic's interest. In each case, a quick survey via Google News showed that there were essentially as many different versions of the quotes as there were journalists quoting them. Comparison with a careful transcription of recordings available on the web revealed that none of the journalists' versions were accurate. For convenient comparison, I then took a look at the reporting of Tim Duncan's remarks in the same post-game press conference in which I transcribed Rasheed Wallace's portion, and found that Tim was quoted even less accurately.

Looking at the New York Times' versions, with the original flush left and the journalistic approximation on the indented line beneath it, we have this for Rasheed's quote:

Uh, just- just went at it as- as another good game,
uh, even though I did a bonehead play the other night,
        I just made a bonehead play the other night,
had to put it behind me, it was over with,
        I had to put it behind me
and we just came out here and had to play tonight
        and I had to come to play tonight

Leaving out the first clause and the "uh", there are 6 insertions and 13 deletions relative to the 31 words of the original, which gives a Word Error Rate of (6+13)/31 = 61%, using (a version of) the metric employed in the speech recognition biz.
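To make the arithmetic concrete, here is a minimal Python sketch of one way to compute an insertion/deletion-only error rate like the one above. It is an illustration, not a standard scoring tool: it assumes words are compared after lowercasing and stripping punctuation, and it finds a minimal alignment through the longest common subsequence (LCS) of the two word lists. The function names `tokenize`, `ins_del_counts` and `wer` are hypothetical. Standard WER also counts substitutions, as (S + D + I) / N; this version instead treats a substituted word as one deletion plus one insertion.

```python
import re


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase, then keep runs of letters and
    # apostrophes; punctuation, hyphens and digits act as separators.
    return re.findall(r"[a-z']+", text.lower())


def ins_del_counts(reference: list[str], hypothesis: list[str]) -> tuple[int, int]:
    """Count insertions and deletions in a minimal insert/delete-only alignment.

    With no substitutions allowed, a minimal alignment keeps a longest common
    subsequence of the two lists; every reference word outside it is a
    deletion, and every hypothesis word outside it is an insertion.
    """
    n, m = len(reference), len(hypothesis)
    # lcs[i][j] = LCS length of reference[i:] and hypothesis[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if reference[i] == hypothesis[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    common = lcs[0][0]
    return m - common, n - common  # (insertions, deletions)


def wer(reference_text: str, hypothesis_text: str) -> tuple[int, int, int, float]:
    # Returns (insertions, deletions, reference length, error rate).
    ref, hyp = tokenize(reference_text), tokenize(hypothesis_text)
    ins, dels = ins_del_counts(ref, hyp)
    return ins, dels, len(ref), (ins + dels) / len(ref)


# Rasheed's words (minus the first clause and "uh") vs. the NYT version.
spoken = ("even though I did a bonehead play the other night, had to put it "
          "behind me, it was over with, and we just came out here and had to "
          "play tonight")
printed = ("I just made a bonehead play the other night, I had to put it "
           "behind me and I had to come to play tonight")

ins, dels, n, rate = wer(spoken, printed)
print(f"{ins} insertions, {dels} deletions in {n} words: WER = {rate:.0%}")
```

Run on Rasheed's passage, this prints `6 insertions, 13 deletions in 31 words: WER = 61%`, matching the hand count. Because the LCS gives a minimal alignment, a careless hand count can only come out equal to or higher than these figures.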

The same comparison for the quote from W gives

you 've got        people here who are  working to alleviate poverty
you         have   people               working to alleviate poverty
and help rid the world of the pandemic of AIDS
and      rid the world of the pandemic of AIDS
and they 're working on ways to have a clean environment
and                     ways to have a clean environment

If we split off 've and 're as separate words, we get a score of 1 insertion and 10 deletions in 32 words, which is a WER of (1+10)/32 = 34%. If we don't, we get 2 insertions and 10 deletions in 30 words, for a WER of 12/30 = 40%. The Chicago Tribune's performance on Tim Duncan's quotes is worse, with hundreds of words omitted between fragments that are presented as if spoken continuously, but I won't repeat the details here.

As I observed in earlier commentary, the poor quality of the quotations is mostly due to the practice of using handwritten notes that are not checked against recordings. This would have been a plausible excuse 50 years ago, but it's pretty pathetic now. And (as I also noted), the sense of the speakers is generally preserved in the cases cited above -- although the NYT represented President Bush as committing a grammatical solecism of which he was in fact innocent, while standardizing Rasheed Wallace's lexical choices -- but the same level of approximation could easily result in serious misrepresentation. This can happen because of simple carelessness, or because of prejudicial misperception or memory errors, or because of malice. It's hard to tell which, and in the end it doesn't matter.

The fact is, the standards for direct quotation in print media are scandalously low, and should be reformed. If a student took similar liberties with print quotations in a term paper, (s)he would be given a serious lecture on the responsibilities of scholarship. If an academic scholar did this, it would be grounds for rejection of publications submitted for review, and charges of culpable carelessness if not outright fraud. I'll bet that lawyers or judges who quote law, precedent or testimony this loosely don't retain the respect of their peers, if indeed they ever make it through law school. If a religious authority started "quoting" from scripture using these methods... well, you can go on in this vein yourself.

This doesn't mean that journalistic quotes have to reproduce every stutter and stammer, every self-correction, every um and uh. But there should be an explicit policy about what kind of editing is permitted (or even encouraged), and what is not. And in fact such policies do exist. For example, The New York Times Code of Ethics says that

Readers should be able to assume that every word between quotation marks is what the speaker or writer said. The Times does not "clean up" quotations. If a subject’s grammar or taste is unsuitable, quotation marks should be removed and the awkward passage paraphrased. Unless the writer has detailed notes or a recording, it is usually wise to paraphrase long comments, since they may turn up worded differently on television or in other publications. "Approximate" quotations can undermine readers’ trust in The Times.

The writer should, of course, omit extraneous syllables like "um" and may judiciously delete false starts. If any further omission is necessary, close the quotation, insert new attribution and begin another quotation. (The Times does adjust spelling, punctuation, capitalization and abbreviations within a quotation for consistent style.) Detailed guidance is in the stylebook entry headed "quotations." In every case, writer and editor must both be satisfied that the intent of the subject has been preserved.

This strikes me as just about right, except that it puts far too much trust in "detailed notes", which are hardly ever a reliable guide to speakers' actual words. And I think it should be clear from the examples cited above, which I believe to be typical of current practice at the NYT, that the spirit of this policy is not being followed in practice.

I don't believe that other papers are any better in this respect. Just for comparison, here's the Houston Chronicle's version of Rasheed's quote, which combines fragments from his answers to two different questions, and adds a phrase that (as far as I can tell) he never said at all during the post-game interview being reported on. As the quote appeared in the original article, it was

"I did a bonehead play the other night," he said. "I had to put it behind me. It was over with. It was no pressure. I don't feel pressure. I had to do the things I needed to do."

In the transcript below, Rasheed's actual statements are flush left, and the Chronicle's quoted version is on the indented lines beneath them; the final indented line has no counterpart in the interview.

Rachel Nichols:
How emotionally did you approach the game, and how do you feel you played?
Rasheed Wallace:
Uh, just- just went at it as- as another good game,
uh, even though I did a bonehead play the other night,
                I did a bonehead play the other night.
had to put it behind me, it was over with,
        I had to put it behind me, it was over with.
and we just came out here and had to play tonight
Rachel Nichols:
As a group, the Pistons all talk about how you guys are best when your backs 
are up against the wall.
How do you feel that you personally react when you're under pressure?
Rasheed Wallace:
Uh, I mean it's        no pressure, I don't- I don't feel pressure.
                it was no pressure.          I don't feel pressure.
Uh, no matter if it's the game winning shot, or I got the ball, you know, last possession,
I don't feel no pressure. 'Cause you still got to go out there and play.
        I had to do the things I needed to do.

And if you want to see something approximating what Colin Hurley did to Shakespeare, take a look at what the Chicago Tribune did to Tim Duncan.

Posted by Mark Liberman at July 30, 2005 11:49 AM