September 13, 2004

Final periods and quotation marks: harder than you thought

There’s a punctuation rule that American publishers follow rather strictly though British publishers do not: when an expression contained in quotation marks falls at the end of a sentence, a following comma or period (though not a colon, semicolon, exclamation point, or question mark) should be moved leftward to fall inside the quoted string. You might have thought it was child’s play to enforce that by algorithm. It isn’t. We’ll consider just the issue of single quotation marks and periods. (Single quotation marks are less common in American printed sources than double quotation marks, but I’ll deal with that issue below.) Since it looks really confusing to try and mention punctuation marks in print so you can talk about them, I’ll refer to the right single quote character as <RSQUO> (after its HTML code &rsquo;), and I’ll call the period or full stop <PERIOD>. The rule for correcting to the American practice could be (you might think) simply this:

Change any occurrence of <RSQUO><PERIOD> to <PERIOD><RSQUO>.
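Taken at face value, that rule is a single string replacement. Here is a minimal Python sketch of it; the names RSQUO, PERIOD, and naive_swap and the sample sentence are illustrative assumptions, not anything from a real style-checking tool.

```python
RSQUO = "\u2019"   # right single quotation mark (the same character as the apostrophe)
PERIOD = "."

def naive_swap(text: str) -> str:
    """Apply the rule literally: every <RSQUO><PERIOD> becomes <PERIOD><RSQUO>."""
    return text.replace(RSQUO + PERIOD, PERIOD + RSQUO)

# An invented sentence that ends with a single-quoted word.
sample = "She called it \u2018hard\u2019."
print(naive_swap(sample))  # She called it ‘hard.’
```

The function looks at nothing but the two adjacent characters, so it fires on every such pair regardless of what the <RSQUO> is doing there.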

But a single sentence in the latest New Yorker caused me to realize that it isn’t that simple; it can never be simple; it is extremely hard, about as hard as the whole enterprise of accurately parsing arbitrary English syntactic structure.

The reason is simply that <RSQUO> is ambiguous in function: it serves both as our right single quotation mark (which must be matched with a left one that occurs earlier) and as the apostrophe (which is really a 27th letter of the alphabet that occurs in the spelling of certain words like won’t and children’s and has nothing to do with quotation). No font distinguishes these. What caused me to see that this matters a great deal was the end of the first sentence of the following (the context being a discussion of how everywhere Al Gore goes he has to put up with people expressing sympathy for him and also grief of their own over the Florida election in 2000):

He has to face not only his own regrets; he is forever the mirror of others’. A lesser man would have done far worse than grow a beard and put on a few pounds.

Here the <RSQUO> character is functioning as the apostrophe. It is part of the spelling of the regular genitive plural suffix, as in a phrase like several butchers’ aprons. Notice, the article is not saying that Al Gore is forever the mirror of others, i.e., other people; it is saying that he is forever the mirror of others’ regrets, i.e., other people’s regrets. But it would be perfectly possible to have a sentence like this (it doesn’t state a true claim, you understand, it’s just an example of a possible sentence; the bit inside the single quotes asserts, unlike the sentence quoted above, that he is the mirror of other people; and notice that I’m punctuating it wrongly according to the rule, to exhibit the contrast):

The New Yorker article said, ‘He has to face not only his own regrets; he is forever the mirror of others’.

That sentence would need to be changed under the American rule; it should be given like this:

The New Yorker article said, ‘He has to face not only his own regrets; he is forever the mirror of others.’
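The contrast can be checked mechanically. In the sketch below (the variable and function names are illustrative assumptions), the genitive sentence and the uncorrected quoted sentence share every character from the final period leftward; the first point of difference is the left single quote that opens the quotation, so a rule that looks only at the characters near the period sees the same thing in both cases.

```python
LSQUO = "\u2018"   # left single quotation mark
RSQUO = "\u2019"   # right single quotation mark / apostrophe

def common_suffix(a: str, b: str) -> str:
    """Return the longest string that both a and b end with."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]

# The sentence with the genitive plural: the period must stay put.
genitive = ("He has to face not only his own regrets; "
            "he is forever the mirror of others" + RSQUO + ".")

# The same words inside a single-quoted quotation: the period must move.
quoted = "The New Yorker article said, " + LSQUO + genitive

shared = common_suffix(genitive, quoted)
print(shared == genitive)                   # the whole genitive sentence is shared
print(quoted[: len(quoted) - len(shared)])  # only this prefix tells them apart
```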

In case you’re thinking that this won’t come up very much because usually we use double quotation marks for quotations, let me remind you first that this differs between publishers (the Linguistic Society of America style sheet requires single quotes), and second, more importantly, that single quotation marks are used for quotations within quotations enclosed in double quotation marks. Consider this example:

Geoff Pullum writes on Language Log: “The New Yorker article said, ‘He has to face not only his own regrets; he is forever the mirror of others’. A lesser man would have done far worse than grow a beard and put on a few pounds’. Here the <RSQUO> character is functioning as the apostrophe.”

Here the first period must not be moved, but under the American rule the second one must! [Nerd note: Sophisticated computational linguists will immediately see that there is an argument here, based on quote patterns alone, to the effect that no finite state device can ever successfully recognize all the contexts in which the order of <RSQUO> and <PERIOD> must be changed. I will not give the proof here, as the margin of this post is too small to contain it. End of nerd note.]

The bottom line: in order to tell whether you should change <RSQUO><PERIOD> to <PERIOD><RSQUO> you have to determine whether or not you’re inside a single-quoted sequence, and also determine whether the word before the period is a regular genitive plural. It’s non-trivial. There is no telling how long a passage in single quotes might be: the opening quote might be any number of sentences off to the left, and the closing quote might be any number of sentences off to the right, past any number of apostrophes. And the only way to tell whether you’re looking at a regular genitive plural is to grasp

  • the morphology (e.g.: does this noun take regular inflection?), and
  • the syntax (e.g.: is this noun in a structural position where genitive case is allowed?), and
  • the semantics (e.g.: is this sentence to be understood as making a reference to other people, or implicitly to other people’s regrets?),

all in full detail. Quite beyond the capacities of computational linguists at the moment.
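To make the two determinations concrete, here is a rough Python sketch of a scanner that tracks open single quotes and labels each <RSQUO><PERIOD> pair. Everything in it is an assumption made for illustration: the function name, the labels, and above all the crude test "the preceding letter is s" standing in for the morphological, syntactic, and semantic analysis just listed. It does not solve the problem; it only reports where the problem arises.

```python
LSQUO = "\u2018"   # left single quotation mark
RSQUO = "\u2019"   # right single quotation mark / apostrophe

def classify_rsquo_period(text: str) -> list[tuple[int, str]]:
    """Label each <RSQUO><PERIOD> pair as 'apostrophe', 'closing-quote', or 'ambiguous'."""
    results = []
    open_quotes = 0  # left single quotes seen and not yet (knowingly) closed
    for i, ch in enumerate(text):
        if ch == LSQUO:
            open_quotes += 1
        elif ch == RSQUO:
            prev_is_letter = i > 0 and text[i - 1].isalpha()
            next_is_letter = i + 1 < len(text) and text[i + 1].isalpha()
            if prev_is_letter and next_is_letter:
                continue  # word-internal apostrophe, as in won’t
            if open_quotes == 0:
                label = "apostrophe"       # nothing is open, so it cannot close a quote
            elif i > 0 and text[i - 1] in "sS":
                label = "ambiguous"        # genitive plural or closing quote: cannot tell
                # The count is left unchanged, so later labels are unreliable too.
            else:
                label = "closing-quote"
                open_quotes -= 1
            if text[i + 1 : i + 2] == ".":
                results.append((i, label))
    return results

tail = "he is forever the mirror of others" + RSQUO + "."
print(classify_rsquo_period("He has his regrets; " + tail))
print(classify_rsquo_period("The article said, " + LSQUO + "He has his regrets; " + tail))
```

Once a pair has been labeled ambiguous the scanner no longer knows whether the quotation is still open, which is how a single unresolved apostrophe can contaminate the analysis of everything to its right.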

Everything’s so much harder once it’s been given a simple explanation by a linguist, isn’t it? Sigh.


[Revised a little on September 14. Thanks to Glen Whitman for an interesting observation that contributed to this expanded version.]

Posted by Geoffrey K. Pullum at September 13, 2004 12:56 AM