Here's an idea whose time has come: scientific and technical papers should include an explicit, executable recipe for generating their numbers, tables and graphs from published data.
Traditional scientific and technical journals require authors to specify their materials, methods and analytic techniques precisely enough to permit replication, because replicability is the foundation of the scientific method and the engine of technological progress. Now that the scientific and technical literature has become a networked digital archive, we can do better. We can expect articles to include an executable -- and readable and modifiable -- procedure for turning published data into the numbers, tables and graphs that play a role in their argument.
In a sense, such executable articles are self-replicating. Of course, genuine replication requires application to new data, but executable articles lower the barriers to such generalization. There's also a benefit to independent re-implementation of complex algorithms, to guard against bugs or perniciously special cases. Executable articles make this kind of replication more likely as well, simply because they make it so much easier to get in the game at some level in the first place.
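To make the idea concrete, here is a minimal sketch of what the "executable recipe" for one small table might look like. Everything in it is an assumption made for illustration: the data is an invented placeholder standing in for a published data file, and the names `PUBLISHED_DATA`, `load_scores`, `summary_table` and `format_table` are hypothetical. A real article would read its actual published data and would probably use whatever statistical tools its field prefers.

```python
import csv
import io
import statistics

# Hypothetical stand-in for a published data file. The values are
# placeholders invented for illustration, not real measurements.
PUBLISHED_DATA = """condition,score
A,1.0
A,3.0
B,2.0
B,6.0
"""


def load_scores(text):
    """Parse CSV text into a dict mapping each condition to its scores."""
    groups = {}
    for row in csv.DictReader(io.StringIO(text)):
        groups.setdefault(row["condition"], []).append(float(row["score"]))
    return groups


def summary_table(groups):
    """Return one (condition, n, mean) row per condition, sorted by name."""
    return [
        (name, len(values), statistics.mean(values))
        for name, values in sorted(groups.items())
    ]


def format_table(rows):
    """Render the summary rows as a plain-text table for the article."""
    lines = [f"{'condition':<10}{'n':>3}{'mean':>8}"]
    for name, n, mean in rows:
        lines.append(f"{name:<10}{n:>3}{mean:>8.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Regenerate the table directly from the data.
    print(format_table(summary_table(load_scores(PUBLISHED_DATA))))
```

A reader who wants to try the same analysis on new data only has to swap in a different data source, and a reader who doubts the analysis can change `summary_table` and see what happens to the numbers.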
Among the many good consequences, I'd like to emphasize three:
There are many examples where this sort of thing is starting to happen. But it's far from being the norm, and there are plenty of problems in the way of making such practices more general. For example:
And the biggest problem, of course, is the cultural conservatism of the academy.
As we look across the disciplines of science and engineering, we can see the seeds of plausible solutions to each of the problems, as well as some subdisciplines where moves in this direction are already underway. But I don't know any subdisciplines where executable articles, in the full sense, have become the norm. And I don't know of any complete and general solutions to the problems -- indeed, I doubt that a single solution is appropriate across all the diverse types of data, algorithms and disciplines.
The way to make progress, in my opinion, is for people to start experimenting more widely. This could be done by encouraging experimental but regular publication of executable articles in existing journals, or by starting new journals that specialize in such papers. Scientific and technical societies in relevant areas could also play a useful role in encouraging this development.
And funding agencies could do a lot to foster the development of needed infrastructure, and to encourage its use.
In the language-related fields where I work, executable articles would (in my opinion) be an especially good thing. If you're interested in pursuing the idea, let me know.
Posted by Mark Liberman at January 3, 2007 09:03 AM