June 28, 2007

Proofreading entertainment

I posted a little while back on yellow star thistle counting as two words rather than three, entertained the possibility that star thistle had originally been starthistle but had been "corrected" by a proofreader to star thistle, and noted that I was inclined to misread starthistle as start-histle.  This elicited some mail about proofreading, all of it entertaining.

First, there's a Taylor Mali video entitled "The Impotence of Proofreading" (actually, Mali SAYS "The the Impotence of Proofreading"), available in several places on-line; here's the link to YouTube.  It's a comedy routine packed with (spoken versions of) typos of all kinds -- word confusions, letter substitutions, omitted material, extra material, transpositions -- many of them off-color (anal for any, Sale of Two Titties, and one of the lessons of the routine: "There is no prostitute for careful proofreading").

I got pointers to this video from Marilyn Martin and Chris Waterson.  Waterson wrote:

I love the fact that I can watch that and completely understand what he's saying?  Why is that?! :)

So there's actually a linguistic question here.  The short answer is that we use context and background knowledge to interpret what we hear -- to the extent that we fail to notice many speech errors that we encounter -- and that Mali has been careful to provide enough context to help us along.

Then Mae Sander picked up on the word division question and started an exchange with me about automated hyphenation programs and their discontents.  First she cited things like

Small boys in kneep-

Team leaders called co-

Then she told a story:

... in the very early days of text processing, before personal computers, writers typed on typewriters. Their copy went to data-entry clerks who knew a mark-up language and created computer files with teletype machines or DecWriters. Reviewers and copyreaders received output from huge line-printers, which produced formatted copy on wide fan-fold paper in a monospace typeface. Typeset output (including results of automatic hyphenation) was the very last step in the process. The galley proofs arrived from an offsite Linotron typesetting machine, driven by paper tape from the mainframe computer. This is true: the introduction to the manual for a commercial version of one such text processor and its complex procedures contained this sentence:

This product eliminates the need for pro-

This story was so wonderful that I was dubious about it, but she's now supplied a ton of convincing detail.  In any case, pro-ofreaders were clearly not obsolete then.  Nor are they now.  Still, brute-force methods -- really, really big dictionaries with the possible hyphenations specified -- can improve things considerably, and undoubtedly have.
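For the curious, here is a minimal sketch of the brute-force idea, assuming a hypothetical two-entry exception dictionary (the names `HYPHENATION_DICT`, `break_points`, and `hyphenate_to_fit` are mine, for illustration only; real dictionaries are vastly larger). Each word lists its permitted break points, and a word the dictionary doesn't know simply isn't broken, which is how such a system avoids producing pro-ofreaders.

```python
# Hypothetical hyphenation dictionary: permitted break points marked with "-".
# Illustrative entries only; a real system would have many thousands.
HYPHENATION_DICT = {
    "proofreading": "proof-read-ing",
    "starthistle": "star-this-tle",
}


def break_points(word: str) -> list[int]:
    """Return the character offsets where `word` may be split, or [] if unknown."""
    pattern = HYPHENATION_DICT.get(word.lower())
    if pattern is None:
        return []  # unknown word: refuse to guess rather than risk "pro-ofreading"
    points, offset = [], 0
    for part in pattern.split("-")[:-1]:  # every segment except the last ends at a break
        offset += len(part)
        points.append(offset)
    return points


def hyphenate_to_fit(word: str, room: int) -> tuple[str, str]:
    """Split `word` at the latest permitted point where head plus hyphen fits in `room` columns."""
    best = None
    for p in break_points(word):
        if p + 1 <= room:  # +1 for the hyphen itself
            best = p
    if best is None:
        return "", word  # no permitted break fits: move the whole word to the next line
    return word[:best] + "-", word[best:]
```

So `hyphenate_to_fit("proofreading", 6)` gives `("proof-", "reading")`, and `hyphenate_to_fit("starthistle", 8)` gives `("star-", "thistle")`, not start-histle.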

I pointed out a few years back on the ADS-L that even correct hyphenations at line end can be troublesome, and Geoff Pullum posted here about my example:

to obtain what he wanted amid the scar-
city of planned economic life...

Surely someone has made a collection of line-break hyphenations gone awry.  And no, I don't want to start one myself.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at June 28, 2007 05:14 PM