April 19, 2007

Truthiness in the funny papers

I guess that cartoonists don't get in trouble for making up numbers, but still:

Actually, you get 92,400 from a search on Google, 119,000 from a search on Yahoo, and 12,963 from a search on MSN. These happen to be similar to the numbers you get from a search for {Doonesbury bullshit} -- well, at least the 108,000 hits from Google are comparable, though the 56,600 from Yahoo and 7,191 from MSN are considerably lower. Not that I'm implying anything about Garry Trudeau's veracity in general, you understand.

I'm in Phoenix for the NSF/JISC Repositories Workshop. One of the meeting's themes is "data-driven science and scholarship", especially the impact of easily accessible digital data. As the workshop's website says:

Some academic leaders are beginning to recognize that data-driven science is becoming a new scientific paradigm – ranking with theory, experimentation, and computational science. Fewer people appreciate that the combination of large-scale digitization of books, scholarly journals online, and huge data sets provides opportunities for new methodologies for scholarship and research in all academic disciplines. Is this really a fourth paradigm of science or is it new wine in old bottles? Can we articulate the importance of this area, so that university presidents (in the US) and vice-chancellors (in Britain) understand the potential and challenges?

Easy access to published data makes replication and new research much more efficient for the pros, but it also opens things up to interested citizens in general. Since the barriers to communication are also lowered, this radically democratizes the whole social process of rational inquiry. One result is the WCFCYA effect. And if it's fair to fact-check scientists, journalists and politicians, I guess it's also fair to fact-check cartoonists. Especially when it's as easy as this.

In a panel discussion yesterday on the role of individuals in this new process, David Rosenthal brought up a much more interesting and consequential case: a recent post on The Oil Drum by Stuart Staniford, "The Status of North Ghawar", 4/7/2007. Staniford, who was originally trained as a physicist, spent most of his career working on computer security, but he's recently gotten interested in energy issues. In the blog post in question, he makes a complex argument based on a detailed cross-comparison of information from plots extracted from several different papers in the technical literature, pulling things out of them that the papers' authors never intended to reveal.

David cited in particular a comment by Staniford about halfway down the (long) list of comments on that post:

As I said in private email the other day, I have some experience of scientific publication in my own field, and I guesstimate that I am 2-3 times as productive doing this kind of "blog-science" as in the traditional mode. The ability for anyone in the world, with who knows what skill set and knowledge base, to suddenly show up and decide to be part of the collaboration in real time is just an amazing thing.

That said, it's scarier, because one is much more likely to make errors (in public) when operating at this tempo. The only cure for that is a willingness to admit them and move on. And a recognition on all our parts that this kind of work will have more errors in any given piece of writing, and it's the collaborative debate process that converges towards the truth. And of course there's a lot of noise along with the signal.

I'm not sure the problem of clashing egos is any less severe amongst Oil Drum editors, contributors and commenters than in traditional academia, though :-)

[Steve, aka Language Hat, writes:

Dude, it's a comic strip! It's humor! You're reminding me of Charles Babbage, who once wrote to Tennyson: "Sir: In your otherwise beautiful poem The Vision of Sin there is a verse which reads: 'Every moment dies a man / Every moment one is born.' It must be manifest that, if this were true, the population of the world would be at a standstill... I would suggest you have it read: 'Every moment dies a man / Every moment one and one sixteenth is born.' I am, Sir, yours etc, Charles Babbage."

I am, Sir, yours etc,

Steve

Well, I thought I was being humorous as well, but I guess I need to use a special font, or something.

More seriously, Doonesbury is a funny (both meanings) case -- the series on Romney's flip-flops was full of (specific and true) facts about the history of his position changes, so a naive reader might be pardoned for thinking that the web search number was also true, though it was so large that it was pretty clearly made up.

And generally, I do think that it's a Good Thing for people to get into the habit of thinking about the evidentiary basis of stuff they read, especially quantitative claims that conflict with common sense or with easily accessible evidence. Doing this to jokes and comic strips is a joke, sure enough, but the joke is a reflex application of a good habit, even if it verges on Babbagery.

In this particular case, I'd planned to blog about David Rosenthal's pointer to Stuart Staniford's post at The Oil Drum, and a joke about the made-up number in this morning's Doonesbury strip seemed like a good lead-in.]

[A comment from Andrew Clegg:

I read with interest your Language Log post on fact-checking.

Given your comments at the end, I can't help but wonder if Babbage was using a special font.

I am, Sir, yours etc,

Andrew.

Later on, Andrew sent this additional Babbage quotation:

"On two occasions I have been asked [by members of Parliament!], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

]

Posted by Mark Liberman at April 19, 2007 09:36 AM