Via a recommendation at Infomusings, I've just read a paper by Marcia Bates that introduced me to the "Resnikoff-Dolby 30:1 Rule" (originally proposed in publications from 1971-72). Bates summarizes this idea as "suggest[ing] that human beings process information in such a way as to move through levels of access that operate in 30:1 ratios... Something about these size relationships is natural and comfortable for human beings to absorb and process information. Consequently, the pattern shows up over and over again."
Quoting (with ellipses) from Bates' paper:
Howard Resnikoff and James Dolby researched the statistical properties of information stores and access mechanisms to those stores... Again and again, they found values in the range of 28.5:1 to 30:1 as the ratio of the size of one access level to another...
• A book title is 1/30 the length of a table of contents in characters on average
• A table of contents is 1/30 the length of a back of the book index on average
• A back of the book index is 1/30 the length of the text of a book on average
• An abstract is 1/30 the length of the technical paper it represents on average
• Card catalogs had one guide card for every 30 cards on average. Average number of cards per tray was 30^2 or about 900.
• Based on a sample of over 3,000 four-year college classes, average class size was 29.3
• In a test computer programming language they studied, the number of assembly language instructions needed to implement higher-level generic instructions averaged 30.3.
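(An aside of mine, not part of Bates' text: the first three items in that list chain together, so the rule implies that a book's text is about 30^3 = 27,000 times the length of its title. Here is a minimal sketch of that arithmetic. The 40-character title is a made-up illustrative figure, not a number from Resnikoff and Dolby, and the function and variable names are my own.)

```python
# Sketch of the chained 30:1 ratios from the list above.
# ASSUMPTION: the 40-character title is hypothetical, chosen only for illustration.
RATIO = 30

# Access levels in order, each taken to be about 30 times the previous one.
LEVELS = ["title", "table of contents", "back-of-book index", "full text"]


def implied_sizes(title_chars, ratio=RATIO, names=LEVELS):
    """Return the size in characters that the rule implies for each level."""
    return {name: title_chars * ratio**i for i, name in enumerate(names)}


sizes = implied_sizes(40)
for name, chars in sizes.items():
    print(f"{name:>20}: {chars:>9,} characters")

# The card catalog item has the same shape: one guide card per 30 cards,
# and 30 such groups per tray.
cards_per_tray = RATIO**2
print(f"cards per tray: {cards_per_tray}")
```

With a 40-character title, this gives a table of contents of 1,200 characters, an index of 36,000 and a text of 1,080,000, with 900 cards per tray. Whether real books look like this is of course exactly the empirical question.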
Once you start looking for this kind of thing, you can find it all over the place. I conjecture that written English sentences probably average about 30 morphemes in length. I haven't ever measured this directly, nor seen any distributions, but mean sentence lengths in texts of various types tend to be about 15-25 words, and if we split compounds, regular inflections and compositional derivational morphemes this is likely to add about 5-10 tokens per sentence. You could get a wide range of numbers, depending on the mix of text types and writing styles, but the average is probably not far from 30 morphemes.
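For what it's worth, the back-of-the-envelope arithmetic behind that guess can be written out as follows. This is only a sketch: it takes the two ranges quoted above at face value and simply adds their endpoints, which assumes that the extra morphemes don't depend on how long the sentence is. The variable names are mine.

```python
# Rough bounds on morphemes per sentence, using only the ranges quoted above.
# ASSUMPTION: the two ranges are combined by adding endpoints, i.e. the
# number of extra morphemes is treated as independent of sentence length.
words_per_sentence = (15, 25)   # typical mean sentence length, in words
extra_morphemes = (5, 10)       # added by splitting compounds, inflections, etc.

low = words_per_sentence[0] + extra_morphemes[0]
high = words_per_sentence[1] + extra_morphemes[1]
midpoint = (low + high) / 2

print(f"estimated morphemes per sentence: {low} to {high}, midpoint {midpoint}")
```

That gives a range of 20 to 35 with a midpoint of 27.5, which is in the neighborhood of 30 but hardly pins it down.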
There's a considerable danger of confirmation bias in this sort of thing. We can find confirmation in the fact that military platoons average about 30 soldiers in size, but we could have picked on squads, companies or battalions instead. For sentence length, we could count syllables, morphemes or words; we can pick conversational transcriptions or text types of various kinds; we could have looked at clauses or paragraphs instead of sentences. For some combinations of choices, we're pretty sure to come out with a number close to 30.
Still, I'm prepared to believe that Resnikoff and Dolby are on to something. The main thing that makes me skeptical is precisely that I haven't heard of this idea before -- and that's a sad sort of argument.
This recent PowerPoint presentation by Ian Rowlands (with the snowclone title "30 is the new 42") closes with an appropriate set of questions:
how valid (or useful) is the 30:1 ‘rule’?
if it’s valid, what is the underlying explanation?
is it just a structural feature of print, or can it be extended to the e-world? (or HCI or map scales or visualisation compression ratios?)
is the report by Resnikoff-Dolby a citation sleeping giant or a dodo?
The title is an allusion to the passage in Douglas Adams' Hitchhiker's Guide to the Galaxy, in which Deep Thought gives the answer to the Ultimate Question of Life, the Universe, and Everything as "42".
Posted by Mark Liberman at March 23, 2004 12:05 AM