Language Log: Copyrighting everything

July 04, 2005

Copyrighting everything

The other day, while looking for something else, I chanced upon this suggestion in the archive of a discussion list about the work of the Union for the Public Domain, a non-profit organization whose purpose is "to protect and enhance the public domain in matters concerning intellectual property".

How hard would it be to program a computer to eventually write every possible sentence possible in every language, then patent AND copyright both the input AND the output of such a device?

Thus copyrighting everything possible in every language!!!

The context is a discussion of how it might be possible to use intellectual property law to subvert itself and reduce what many people consider to be an abusive privatization of what should be in the public domain.

From a linguistic perspective there are two striking things about this proposal. One is that it assumes that each language contains finitely many sentences. That most if not all human languages are infinite is one of the central observations of modern linguistics. It isn't possible to finish generating all of the sentences of a language, because from any sentence you can always construct a new, longer one.
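The argument about infinity can be made concrete with a small Python sketch. Everything in it is an illustrative assumption: the toy sentence family built by embedding under "Kim thinks that", and the function names, are invented for the example, and real languages have far richer sources of recursion. The sketch shows only that a program can keep producing sentences forever, and that any finite list someone claims to be complete can be beaten by a longer sentence.

```python
from itertools import count, islice


def sentence(depth: int) -> str:
    """Build a toy English sentence with `depth` levels of clausal embedding."""
    core = "Kim thinks that " * depth + "it is raining."
    # Capitalize the first letter so depth 0 also reads as a sentence.
    return core[0].upper() + core[1:]


def sentences():
    """Yield the toy sentences in order of embedding depth; never terminates."""
    for depth in count():
        yield sentence(depth)


def longer_than_all(finite_list: list[str]) -> str:
    """Return a toy sentence longer than every member of a finite list.

    Because it is strictly longer than every listed sentence, the result
    cannot already be in the list, so no finite list is complete.
    """
    longest = max((len(s) for s in finite_list), default=0)
    depth = 0
    while len(sentence(depth)) <= longest:
        depth += 1
    return sentence(depth)


# A finite prefix of the enumeration, and a sentence it necessarily misses.
claimed_complete = list(islice(sentences(), 3))
missing = longer_than_all(claimed_complete)
```

The generator can run for as long as you like, but at every moment it has produced only finitely many sentences, so `longer_than_all` always finds one that has not yet been written down.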

The other point is that it suggests that there aren't all that many languages and that they are sufficiently well understood that it would be possible to write computer programs enumerating all of their grammatical sentences. The task becomes quite implausible if one knows that there are around 7,000 oral languages plus an unknown number of signed languages.

The task becomes even harder if one realizes how few languages are sufficiently well documented. Here there are no reliable statistics. There is no comprehensive survey of the state of the documentation for the world's languages. In fact, even regional surveys are hard to come by. When I did one for the native languages of British Columbia five years ago, it was as far as I can tell the first one ever done. Even in the absence of reliable figures, it is clear that the state of documentation is not very good.

For the native languages of British Columbia, I found that half had a reasonably adequate grammar and that fewer than a third had a decent dictionary. A recent survey of 150 of the 340 languages with over 1,000,000 speakers found no dictionary for 17. Most of the world's languages are not adequately documented.

My point here is not to berate the author of this suggestion. Both the author and many of the others in the discussion are very distinguished people. The author of the suggestion is Michael Hart, the founder of Project Gutenberg. Another participant in the discussion is Richard Stallman, founder of the Free Software movement and the GNU Project. The message is cc-ed to Brewster Kahle, creator of the Internet Archive and the associated Wayback Machine. These are intelligent, well-informed people. Yet not only did Michael Hart make what from a linguistically informed point of view was a wild suggestion, but not one of the points I have mentioned was picked up on in the subsequent discussion. What this goes to show is how very little most people, even intelligent and knowledgeable people, know about language.

Posted by Bill Poser at July 4, 2005 01:50 PM