As an MIT grad, I've been getting the MIT alumni magazine Technology Review in the mail for many years, and I generally read it with interest and pleasure. But Tech Review is now undergoing some changes to make it bloggier, or at least webbier: more "immediate", more "searchable" and more "interactive". Unfortunately, these changes are not all for the better, as far as I can tell so far, because the information also seems to be becoming less reliably true. At least, in a couple of recent cases, Tech Review presented false statements about simple matters of fact, where the truth could have been learned in a few seconds of Google searching.
Ironically, old-media pundits have been complaining for years that blogs and wikis and such, lacking editorial oversight, are not factually reliable. This was never true, in my experience -- bloggers who know their areas are more reliable, on average, than journalists are. But what seems to be happening now is that Tech Review is aiming for the immediacy of blogging and other new media, in a way that really does degrade factual reliability rather than improving it.
This is a shame, because the theory behind the changes seems otherwise to be a good one. The new editor, Jason Pontin, has smelled the same new-media coffee as everyone else in the industry, and writes:
Readers want information to be immediate, searchable, and easily customized, and advertisers are demanding accountability from the publishers who take their money. Put baldly, the era when publishers could rely on print magazines to satisfy their readers and build sustainable businesses is over.
In keeping with MIT's history of innovation and leadership, Technology Review has decided to invest more of its resources in interactive media.
Specifically, Pontin explains, they're going to:
• Decrease the frequency of the print magazine to bimonthly publication;
• Focus the print magazine on what print does best: present longer-format, investigative stories and colorful imagery;
• Dramatically increase the number of stories we publish on technologyreview.com every day;
• Expand the range of media we employ online to include podcasts, blogs, RSS feeds, and a variety of new technologies;
• Focus all our editorial content on the impact of emerging technologies and discontinue our coverage of the business models and financing of new technologies.
My first indication that the dramatic increase in online content might be sacrificing accuracy was a story by Kate Greene about machine translation, datelined Wednesday, January 18, 2006, under the headline "Repetez, en anglais, s'il vous plait". It contains this paragraph:
In 2005, DARPA also announced the Linguistic Data Consortium (LDC), a project aimed at acquiring huge amounts of translated documents, for distribution to Global Autonomous Language Exploitation, another DARPA-funded project in which computers will process the data. The intention of both of these initiative is to speed up the progress in machine translation. LDC is currently in the first year and will be transcribing speech from broadcast news sources and talk shows in Arabic, Chinese, and English, and also cataloguing text newswire feeds, Web news discussion groups, and blogs in those languages. For now, the project is focused mainly on data collection from these genres, with researchers in the computer and engineering science department at the University of Pennsylvania doing much of the work.
Now, the truth of the matter is that the LDC was founded in 1992, not 2005, and has been publishing materials for speech and language research since 1993. And the LDC's goals are quite a bit broader than collecting translated documents for MT research. And only a few of the LDC's staff members are associated with Penn's CIS department. And many LDC publications are authored by researchers from other institutions around the world. I know all this because I was the P.I. on the initial DARPA grant (which ended in 1995), and continue to direct the organization. Greene could have learned the facts about the LDC by asking Google for information on {linguistic data consortium history}, or by poking around on the LDC web site for a few minutes, or by contacting someone at the organization.
These are small points, which I wouldn't care much about if I didn't have a personal connection to the work. I mean, 1992, 2005, what's 13 years in the grand tapestry of human history? In some ways, Greene's story is a step up from the July 2003 NYT story on an earlier DARPA MT evaluation -- which didn't mention DARPA at all, or the LDC for that matter, though it did track statistical MT back to 1999 or so. And I'm impressed that Tech Review allows comments on its online articles, so that readers can offer corrections.
However, it bothers me to think that when I read an article in Tech Review, I have to allow for the possibility that its "facts" are plainly and simply false, in ways that anyone can discover in a few seconds of research on the web. I don't have the time to check all the facts in every article that I read, so I like to think that in a reputable and well-edited publication like Tech Review, someone will have done that for me, at least to a first order.
Is this an isolated case of an unchecked mistake of fact? Apparently not. When I took a look at the Technology Review front page this morning, one of the prominently displayed blog headlines was "Lifespan for CD-Rs Around Two Years". The blog post behind the headline, by Brad King, quotes as if it were fact a 1/10/2006 IDG News Service story, which in turn quotes Kurt Gerecke, identified as "a physicist and storage expert at IBM Deutschland":
"Unlike pressed original CDs, burned CDs have a relatively short life span of between two to five years, depending on the quality of the CD. There are a few things you can do to extend the life of a burned CD, like keeping the disc in a cool, dark space, but not a whole lot more."
That's scary stuff -- think of all the crucial stuff naively saved on CD-Rs! But is it really true?
I checked into it a bit, not to get on Tech Review's case, but because I was genuinely worried about all the crucial data that I have backed up on CD-Rs. And apparently, it ain't necessarily so. The Wikipedia article on CD-Rs says:
There are three basic formulations of dye used in CD-Rs:
- Cyanine dyes were the earliest ones developed, and their formulation is patented by Taiyo Yuden. Cyanine dyes are mostly green or light blue in color, and are chemically unstable. This makes cyanine discs unsuitable for archival use; they can fade and become unreadable in a few years. Many manufacturers use proprietary chemical additives to make more stable cyanine discs.
- Azo dye CD-Rs are dark blue in color, and their formulation is patented by Mitsubishi Chemicals. Unlike cyanine, azo dyes are chemically stable, and typically rated with a lifetime of decades.
- Phthalocyanine dye CD-Rs are usually silver, gold or light green. The patents on pthalocyanine CD-Rs are held by Mitsui and Ciba Specialty Chemicals. These are also chemically stable, and often given a rated lifetime of hundreds of years.
The same article says that
With proper care it is thought that CD-Rs should be readable one thousand times or more and have a shelf life of several hundred years. Unfortunately, some common practices can reduce shelf life to only one or two years. Therefore, it is important to handle and store CD-Rs properly if you wish to read them more than a year or so later.
And this 1995 paper "Lifetime of KODAK Writable CD and Photo CD Media" applies an Arrhenius model to the criterion of "maximum block error rate less than 50", and finds that
That model predicts (at the 95% confidence level) that 95% of properly recorded discs stored at the recommended dark storage condition (25°C, 40% RH) will have a lifetime of greater than 217 years.
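For readers who haven't met an Arrhenius model: the general idea is to age discs at elevated temperatures, measure how long they last there, and extrapolate back to room temperature on the assumption that the degradation rate follows exp(-Ea/kT). Here is a minimal Python sketch of that extrapolation. The activation energy, oven temperature and oven lifetime below are made-up illustrative values, not figures from the Kodak paper, and the sketch ignores humidity, which a real study at 40% RH would also have to account for.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV per kelvin


def acceleration_factor(activation_energy_ev, use_temp_c, stress_temp_c):
    """Arrhenius acceleration factor: how many times faster degradation
    proceeds at the stress temperature than at the use temperature."""
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / stress_k)
    return math.exp(exponent)


def extrapolated_lifetime_years(lifetime_at_stress_years, activation_energy_ev,
                                use_temp_c, stress_temp_c):
    """Scale a lifetime measured at the stress temperature to the use temperature."""
    factor = acceleration_factor(activation_energy_ev, use_temp_c, stress_temp_c)
    return lifetime_at_stress_years * factor


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the shape of the calculation.
    ea_ev = 0.9            # assumed activation energy (eV)
    oven_temp_c = 80.0     # assumed accelerated-aging temperature
    shelf_temp_c = 25.0    # storage temperature quoted in the paper's summary
    oven_lifetime = 0.5    # assumed lifetime observed in the oven (years)

    af = acceleration_factor(ea_ev, shelf_temp_c, oven_temp_c)
    years = extrapolated_lifetime_years(oven_lifetime, ea_ev, shelf_temp_c, oven_temp_c)
    print(f"acceleration factor: {af:.1f}")
    print(f"extrapolated shelf lifetime: {years:.1f} years")
```

The point of the sketch is only that a modest-looking oven test can extrapolate to a very long shelf life, and that the answer is quite sensitive to the assumed activation energy, which is why such studies report confidence levels along with the prediction.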
It wasn't hard to find this information: these pages were the first and third hits on a Google search for {CD lifetime}.
I'm glad to be warned that low-quality CD-Rs may lose data after a couple of years, and from now on I'll check to see what dyes are used in the CDs I buy. (I checked the ones I've been using, and I think I'm OK.) The E-MELD "School of Best Practices in Digital Language Documentation" mentions this problem in the general context of hardware and software obsolescence, but doesn't make any specific recommendations (that I could find in a quick search, anyhow), except the suggestion to
Place archival copies in a stable online linguistic archive that will:
- Maintain a constant URL.
- Migrate data to new formats
Good idea -- the LDC, among other outfits, stands ready to publish significant and well-prepared language documentation archives -- but E-MELD ought also to tell language documenters to use CD-Rs with phthalocyanine dyes. And Tech Review should have done so, too, rather than just repeating an apparently incorrect newswire story.
Posted by Mark Liberman at January 19, 2006 02:30 PM