Steven Bird has drawn my attention to a method for "publicly registering data sets with a persistent identifier and structured basic description."
This might be more exciting than it sounds.
According to the press release:
"This use of DOI will provide for the effective publication of primary data using a persistent identifier for long-term data referencing, allowing scientists to cite and re-use valuable primary data. The DOI's persistent and globally resolvable identifier, associated to both a stable link to the data and also a standardised description of the identified data, offers the necessary functionality and also ready interoperability with other material such as scientific articles."
Steven points out that LDC might use this to go beyond LDC catalog numbers and ISBNs in providing durable references for published linguistic data.
It looks interesting, though I haven't had a chance to figure out how it really works in detail, and what people can really do with it.
One thing I'd like to understand better is the relationship to the Open Archives Initiative and the Open Language Archives Community. Steven?
It's been a long-time dream of mine to be able to read a scientific article, and to access the underlying data and analyses through a process as simple as clicking on a hyperlink. From the other side, I'd like to be able to give readers the same sort of access to my data and analyses. It looks like we're gradually moving in that direction (though persistent identifiers for data are only part of what is needed). When we get there, I believe that it will have a much more profound impact than most people realize, affecting science in something like the way that URLs and hypertext and browsers have affected mass communications.
Posted by Mark Liberman at October 30, 2003 08:16 AM