August 03, 2004

WYSIAANWTG: What You See Is Almost Always Not What They Get

There was just a single day of business involved in the two-week trip to England from which I just returned. For the most part Barbara and I tried to devote the time entirely to relaxation, but I just had to visit Cambridge University Press to check on how things were going with the editing of the book Rodney Huddleston and I had just submitted, A Student's Introduction to English Grammar. It's just as well I did stop by.

Anyone who does not want to read a rant about the state of word processing programs and the stupidity of the human beings using them and a tale of possibly the silliest electronic submission process in the history of computers should simply pass on at this point, and not read the rest. I'm sure Mark or Eric or Arnold or someone will have some nice material about words or pronunciation or grammar that you could read instead. I have only a tale like the one told by Coleridge's ancient mariner, who stoppeth one of three on their way to a wedding feast and grippeth him by the arm and will not stop telling the story until he has dealt with the last dead albatross and the last stony glare in a dead crewmate's eye. So you and you, go ahead, you don't need to hear this. But you, stop. I need to tell my tale, and I've decided that you're it. The moment that his face I see, I know the man that must hear me. Read on.

Cambridge University Press boasts of being "the oldest printing and publishing house in the world": it was founded on a royal charter granted to the University by Henry VIII in 1534 (you needed the permission of regal and religious authorities to publish in those days, it seems; England was rather like modern Iran). However, I note that the time taken to get its first book out was fifty years: the Press "has been operating continuously as a printer and publisher since the first Press book was printed in 1584." A press with such origins is, a priori, the least likely to move rapidly toward modern methods of book production. This is actually unfair to them: in many fields of science and mathematics they are now accepting LaTeX source. But from what happened in the third week of July 2004 one could certainly get the impression that they are not yet ready for modern methods of handling text, and will remain in the 19th century for some time.

Perhaps it was quixotic of Rodney Huddleston and me to hope that we might submit the typescript of A Student's Introduction to English Grammar electronically, direct from our own word processor files. Our previous work, The Cambridge Grammar of the English Language, was (fantastically) printed out as a double-spaced typescript on one side only of about 3,500 sheets of paper, and airmailed from Australia to England in a box the size and weight of a sewing machine cabinet (with sewing machine contained therein). We thought it was rather wasteful of jet fuel to send all that heavy paper in a box. We suggested simply emailing the word processor source files for this one, rather than sending hard copy plus diskette. The Press would still need a printout for the copy editor to write corrections on in the traditional manner, but we're on a tight schedule, and we thought we could gain a week by just flashing them the word processor files and having them make the printout. And CUP said they could handle that.

Now, I'm well aware that printing something out from a word processor document file generally demands that precisely the same software and hardware is in use at each end. Translation from one word processor format to another in a way that preserves what is important about a complicated text like a grammar book is possible in principle but hopeless in practice. Differences in letter width way below the millimeter level pile up and lead to problems with tabbing and tables. Linebreak differences pile up and lead to disastrous page break placements. Special characters disappear. Tables are mangled. Fonts are randomly replaced in utterly lunatic ways. If you've worked extensively with moving files between different word processors you'll know why I'm gripping your arm and warning you about this. If not, you won't listen, but you should.
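For anyone who doubts that sub-millimeter differences can matter, here is a minimal Python sketch of the effect. It is not how WordPerfect or Word actually lays out text: it assumes a crude greedy line filler in which every character has the same width, and the function name `break_lines`, the 160 mm column, and the two character widths are all made up for illustration.

```python
def break_lines(words, char_width_um, line_width_um):
    """Greedy line filling: keep adding words while they fit on the line.

    Widths are integers in micrometres; every character, including the
    space, is assumed to have the same width (a deliberate simplification).
    """
    lines, current, used = [], [], 0
    for word in words:
        word_width = len(word) * char_width_um
        # A word that joins an existing line also needs a space before it.
        needed = word_width + (char_width_um if current else 0)
        if current and used + needed > line_width_um:
            lines.append(" ".join(current))  # line is full: start a new one
            current, used = [word], word_width
        else:
            current.append(word)
            used += needed
    if current:
        lines.append(" ".join(current))
    return lines


if __name__ == "__main__":
    sentence = ("Differences in letter width way below the millimeter level "
                "pile up and lead to problems with tabbing and tables.")
    words = ((sentence + " ") * 40).split()
    # Hypothetical numbers: a 160 mm text column, and two fonts whose
    # average character widths differ by only 0.05 mm.
    for char_width_um in (2000, 2050):
        lines = break_lines(words, char_width_um, 160_000)
        print(char_width_um, "um per character:", len(lines), "lines")
```

Under these assumptions the 160 mm column holds 80 characters at 2.00 mm each but only 78 at 2.05 mm, so words near the end of a line get pushed down, and every later line break (and eventually every page break) can shift with them.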

I'm giving you the short version of the story. This is it. Rodney completed the final edit of the new book using WordPerfect 6 for DOS (because he standardized on that in the late 1980s and now has too big of an investment in macros and text files to switch). I was worried that the Press would never be able to find a machine with WordPerfect 6 for DOS on it, so I converted the whole book to WordPerfect 11 for Windows, checked all the pagebreaks, and emailed the file to Cambridge with instructions about how it MUST be printed using WordPerfect for Windows, version 6 or later. Then, about a week later, having travelled to England and relaxed a few days to recover from the jetlag and the final push of writing the book, on July 19 I stopped by in Cambridge on the way up to York and went into CUP's headquarters to take my first look at the typescript that I thought by now they would be copy-editing.

But no copy-editing had started. Our senior commissioning editor had flagged at least one place where a table had been botched in the printout. I rapidly saw that there were unpleasant page-break issues too. Barbara was in on the meeting too, and she glanced at a page and pointed out a tabbing error. Slowly it became clear to me that virtually all the tabbed displays had gone wrong. And suddenly I saw what should have made me jump as if stung by a bee, only sometimes you can't see unexpected things when they're really huge. The entire typescript was in a new font. It looked a lot like Apple-style Helvetica (though oddly the footnotes were in Times Roman, the font we had used). And the bullets had been replaced by decimal points. In fact hardly any of the special characters were right.

The file had been through some kind of conversion process, exactly what they promised would not happen! I pounded on the table. I shouted and hurled medium-sized objects around. Secretaries nervously checked to make sure they had the phone number of security in case things really got ugly. Our commissioning editor apologized profusely and repeatedly, and said she'd look into it. She took us to lunch in the private dining room to try and calm the situation. (This worked well; they had profiteroles on the dessert menu, which did have a calming effect.)

I learned a day or two later, by phone from York, what had happened. A young CUP intern on a short-term contract (very short, I hope) had been unable to find a machine with WordPerfect that was connected to the right printer, and it was raining so he didn't want to have to go across the road to a different building (did the little twit think he would melt?), so, without approval, he just converted from Corel's WordPerfect for Windows to Microsoft's Word for the Macintosh, and printed the result. It was about 550 pages, double-spaced. The font was too big. Nothing usable remained of most of the carefully measured tables and meticulously laid out example displays. There was nothing to be done with it but to throw it away.

And then later, when he had been told he must use WordPerfect, the same assistant botched it again. He opened our Windows files with WordPerfect for the Macintosh, a totally different program, fairly old, no longer marketed or updated, and never much good for anything. Even he was able to see that it was useless and couldn't be the basis for the copy editing. So that copy had to be thrown away too. (Yes, that's around 1100 wasted sheets of paper so far, and counting.)

He then sent the files off to the Printing Services division of the Press, and they were able to grasp the notion that for printing a WordPerfect for Windows file, it's a good idea to start with a machine running Windows, and that having WordPerfect on it would be an excellent additional feature. Even their version, though, was apparently not perfect: certain lines from tables appeared to be missing, and the pagination did not match what Rodney Huddleston had in the version he had printed out. I haven't got the details, because I'm back in California, Huddleston is in Australia, and the Press's third printout is in England. But there is a real problem about neither Rodney nor me being sure we have a version with the same page breaks as the CUP copy: we will never know what the copy editor means in her emailed queries ("On page 138, seven lines up, should which be changed to that?").

So what they eventually decided they had to do was to make a xerocopy of their third printout and airmail it to Australia. Rodney had printed it out originally to check it before doing the electronic mailing, but his copy didn't match the CUP copy with regard to page breaks. Putting Rodney's copy together with CUP's three attempts at printing what they received plus the xerocopy, the number of sheets of A4 paper used up so far is around 3,300. And jet fuel had to be used in the end to send the typescript back from the Press to the senior author. And I still don't have a copy that matches anyone else's (if I printed it out here, it would be on American letter-size paper, and wouldn't match any of the other copies even approximately).
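The paper-size point can be made concrete with a little arithmetic. The Python sketch below is only a rough illustration: the one-inch (72-point) margins and the 24-point line pitch (double-spaced 12-point type) are assumptions, not the book's actual layout, and it ignores the fact that letter paper is also slightly wider than A4, which would shift the line breaks as well.

```python
# Page heights in PostScript points. US letter is exactly 11 in = 792 pt;
# A4 is 297 mm, about 841.9 pt, rounded here to the customary 842.
PAGE_HEIGHT_PT = {"A4": 842, "Letter": 792}


def lines_per_page(paper, margin_pt=72, line_pitch_pt=24):
    """Whole text lines that fit between the top and bottom margins."""
    return (PAGE_HEIGHT_PT[paper] - 2 * margin_pt) // line_pitch_pt


def pages_needed(total_lines, paper):
    """Pages required to hold total_lines, rounding up (integer ceiling)."""
    per_page = lines_per_page(paper)
    return -(-total_lines // per_page)


if __name__ == "__main__":
    # Assume a typescript that fills exactly 550 A4 pages.
    total_lines = 550 * lines_per_page("A4")
    print("lines per page:", lines_per_page("A4"), "on A4,",
          lines_per_page("Letter"), "on letter")
    print("letter pages needed:", pages_needed(total_lines, "Letter"))
```

With these assumed margins and spacing, an A4 page holds 29 lines and a letter page 27, so a typescript that fills 550 A4 pages would run to 591 letter pages, and "page 138, seven lines up" would point at different text on each.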

So much for electronic transfer of documents in the modern world. So much for the vaunted paperless office which I remember being told would arrive in the second half of the 1980s.

What are the lessons learned? One is that word processor software is hopeless when judged by any kind of serious standard. Glitzy stupid features are constantly added (clip-art libraries, magnifying tools, different designs for font lists, dialog boxes, menus, status lines), but the basic formats and font handling mechanisms and printer interactions and so on just aren't fit to be used as a basis for electronic transfer of documents. Even keeping pagination control stable is out of the question. In fact the supervisor of the production department at Cambridge University Press told me recently that they are returning to a strict policy of requiring authors to submit hard copy as well as a computer file. Born in the galleon age, CUP is now deciding to stick with the steam age in this regard.

In part I blame myself. I should have stopped Rodney from mailing word processor files, seized control of the submission process, and done everything in a way that involved minimum trust. Minimum trust in document transfer means only letting people have things in a page description language like PostScript or PDF. It means they can't edit the file they receive, they can only print it, and if it prints, it prints exactly the way you want it to look. [Added later: OK, so Varbidian laughs his head off at this. I should have said, it is supposed to mean it prints exactly the way you want it to look. But PDF has its own horrible problems, as Mark hints.] You are [if it works] basically sending them (in the form of a compressed machine-readable description) a picture of each page. I should have made a PDF (WordPerfect 11 does a nice PDF conversion) and sent that, guaranteeing that the font sizes and page breaks would be as originally stipulated, and that anyone with Acrobat Reader and a laser printer could print the thing looking exactly the way we wanted it to look. Editable word processor document formats won't do that. WYSIWYG stands for What You See Is What You Get. It doesn't stand for What You See Is What They Get. What you see is almost always not what they get. I knew that already. I had a very bad feeling about the idea of attempting to submit a book with technical content, tables, diagrams, etc., in a word processor format. I knew in my gut that it wouldn't work but I tried it anyway. It was crazy; like shooting an albatross for no reason. It was my fault, all mine...

Since then, at an uncertain hour
That agony returns:
And till my ghastly tale is told
This heart within me burns.

[Endnote: Now that I've taken the advice of Varbidian and Liberman and read something about font embedding and PDF and the DMCA, I am appalled at the above hint of optimism that PDF might be the answer to typescript submission problems; there seem to be many forces arrayed against the very possibility of portability for electronic documents. Just as spammers are destroying the usefulness of the email medium, font foundries are intent on destroying document portability through absurd abuse of copyright laws and criminal prosecution of freeware font designers... Things are bad out there.]

Posted by Geoffrey K. Pullum at August 3, 2004 02:03 AM