September 17, 2007

(an)arthrous abbreviations

The Economist Style Guide (2005), p. 7, advises us:

the : not needed before pronounceable abbreviations like NATO, UNESCO

Rachel Cristy unearthed this in a search through usage manuals for instances of Omit Needless Words (ONW) and Include All Necessary Words (IANW) advice.  This one looks like an ONW case, but my first reaction to it was that no advice was necessary: things like "The NATO is an international organization" seemed to me to be just ungrammatical, and unlikely to occur with any frequency; for me, acronyms like NATO are obligatorily anarthrous (an-arthr-ous, lacking an article).  [Reminder: acronyms and initialisms are both abbreviations made up of initial letters of words in some expression.  But an acronym is pronounced like an ordinary word, while an initialism is pronounced as a sequence of letter names.  The Economist's advice is about acronyms.]  But, yes, there's variation out there.  As Geoff Pullum noted here recently in connection with another set of proper names, there are some generalizations about arthrousness, but also many exceptions, and there is variation from speaker to speaker (and, in fact, for a single speaker on different occasions).

[Terminological note: following the Cambridge Grammar of the English Language, Geoff uses the technical terms weak and strong rather than my arthrous and anarthrous, respectively.  The intended image is that arthrous proper names can't stand on their own; they're weak and need an article, while anarthrous names require no such support.  Unfortunately, I can see a rationale for using the terms weak and strong in exactly the reverse fashion: arthrous names come with an article and so have strength "built in", while anarthrous names are weak because they're missing an element.  So rather than trying to remember which metaphor CGEL had in mind, I've opted for technical terms that, it seems to me, can't be confused.  I'm also just fond of these terms.]

First, it's not hard to find examples where the definite article in a full proper name (like the North Atlantic Treaty Organization) is preserved in the corresponding acronym; very often a writer goes back and forth between the arthrous and anarthrous variants, as on this website, which begins:

Why is NATO wrong?

The NATO is treated as beyond moral judgment: for most politicians in Europe, it simply exists, like gravity. For them, the only issues are: who should join, and where should it intervene? Nevertheless, the NATO has no moral basis: its existence and its fundamental purpose are wrong - let alone its interventions

Perhaps the writer intended the arthrous variants to be read out in full and the anarthrous ones to be pronounced as single words, but that seems unlikely to be true of all the cases you can find.

Be that as it may, FOR ME the following principle is (I think) exceptionless:

The Acronym Principle: Acronyms are anarthrous (even when the full names they abbreviate are arthrous).

This covers NASA, FEMA, MOMA, Unicef, NOAA and other acronyms whose full forms are arthrous.  It covers at least some hybrid abbreviations, like SFMOMA (part initialism, part acronym), and in general it covers "coerced" acronyms, where vowels are inserted to make strings of letters (especially long strings of letters) pronounceable, like NOGLSTP, pronounced like "nogglestup" and standing for "The National Organization of Gay and Lesbian Scientists and Technical Professionals" (yes, I know, not a catchy name, but at least it's full of information).

[Style note: throughout this posting, I'll cite abbreviations without periods in them.  I understand that most style sheets call for periods in some of them, but my personal preference is for this very spare style.]

On to initialisms (abbreviations that are read as sequences of letter names).  Here the large generalization is just the opposite of the Acronym Principle:

The Initialism Principle: In general, initialisms are arthrous if their full forms are (and, of course, anarthrous otherwise).

Hang on: there are plenty of exceptions, but this is the overarching generalization.

Some examples: the FBI, the CIA, the NSA, the GAO, the SHC (the Stanford Humanities Center), the EU, the LSA, the ADS, the AAUP, the AARP, the NAACP, the NSF, the NIH, the NEH, the NEA (the National Endowment for the Arts, the National Education Association).  I've given a fair number of examples to convince you that there's a real phenomenon here, hoping that you can multiply the examples with others of your own.  (For the Acronym Principle, this is no problem.)

A digression.  Before I go on, I want to confront an idea that some people have advanced to me about the facts so far: that the anarthrous abbreviations lack an article because they're seen as holistic proper names (like John Smith), while the arthrous abbreviations are seen AS abbreviations, and so preserve aspects of the corresponding full forms.  (Of course, the expressions we're looking at are all both proper names and abbreviations.)

I'm inclined to think that the pure-proper-name idea is an illusion created by the forms: no the = pure proper name, the = mere abbreviation.

To put the question in a larger context, let's look at some expressions that aren't abbreviations.  Note the uniform arthrousness of proper names with the head common noun river (the Mississippi River, the River Nile) vs. the uniform anarthrousness of proper names with the head common noun lake (Searsville Lake, Lake Washington).  Similarly, arthrous building (the Brill Building) vs. anarthrous hall (Carnegie Hall), though there is some American/British variation in the second case (note: the Royal Albert Hall).

There's a real system here (though one exquisitely dependent on the particular common noun that serves as head in the proper-noun constructions).  I can't see any non-circular way of viewing this as a matter of conceptualizing things in different ways; it's just convention.

In fact, it makes sense for both versions to be possible.  Proper names have (contextually) unique reference, and uniqueness is one of the two circumstances in which referring expressions are semantically definite.  [Further digression: givenness (in context) is the other, and scholars differ as to whether there are two kinds of definiteness here, or only one (and if only one, whether one of the circumstances is fundamental, or whether they are both manifestations of a single more general meaning), and as to whether languages (or varieties) can differ as to the status of the two circumstances.  But in the case at hand, uniqueness is what's at issue, and we can put off these deeper questions.]

So much for SEMANTIC definiteness.  What we're looking at now is the question of how definiteness is marked syntactically and morphologically.  There are two schemes available, and each of them has a rationale:

Economy: If the referent is unique in context, use no syntactic or morphological mark of definiteness, because it's unnecessary.  (Omit Needless Words!)

Clarity: If the referent is unique in context, use a syntactic or morphological mark of definiteness to indicate this fact.

(These principles apply to languages in general.  English has only a syntactic marker of definiteness, the article the, though other languages (including a number in the Indo-European language family, as well as many outside it) have affixes marking definiteness, either instead of or in addition to a syntactic mark, and syntactic marking other than via an article -- by word order, for instance -- is also possible.)

The competition between economy and clarity, as abstract principles, comes up all the time.  See, for example, my discussion (in my posting on "at about") of economical (implicit) vs. clear (explicit) marking of relations, there with reference to bare NP adverbials vs. P-marked adverbials.  Both principles are valid, but they can't be satisfied simultaneously; instead, the competition is negotiated through a system of conventions for specific cases, with one principle holding sway in some cases and the other in others.

So it is with proper names in English.  For personal names, English almost entirely opts for economy: Arnold (Zwicky), not the Arnold (Zwicky); yes, I know about the Donald.  (Other languages insist on clarity -- definite marking across the board -- or have definite marking for personal names in some contexts and lack it in others.)  In other spheres, English is much more variable: there are conventions, of several different sorts, and exceptions to those, and variation, both within speakers and between speakers.  Recall the river/lake and building/hall cases, consider the examples that Geoff Pullum gave in his posting --

There are some generalizations, but also many exceptions. Cities, boroughs, and regions are usually strong (like Amsterdam or New York or North Africa or Antarctica) but a few are weak (like the Hague or the Bronx or the Maghreb or the Antarctic). And remarkably, to a rough approximation at least, numerical freeway names are weak proper names in Southern California ("Get on the 55") but strong proper names in Northern California ("Take 17 South").

and check out the somewhat longer treatment in CGEL (pp. 517-8); and be prepared for more variation in the material to come.  (But bear in mind that these discussions are only samplings of the phenomena, not complete inventories.  A full treatment of definite marking in English proper names, including a survey of the variation, would fill a book.)

Digression over.  We're now ready to get back to initialisms.  For initialisms, English generally goes for clarity: the Initialism Principle.

But there are exceptions, and there's variation.  Though it's almost invariably the BBC (and not just BBC), it's also, on this side of the Atlantic, almost invariably NBC, ABC, and CBS (not the NBC etc.).  For me, it's mostly the OED, but I've occasionally written OED instead; meanwhile, AHD is, for me, almost always anarthrous (possibly because it's so often paired with NOAD, which is anarthrous because I read it as an acronym).  You can also find anarthrous occurrences of some American government agency names (NSF, NIH, NEH, DOD), though these names are usually arthrous.

A striking GENERAL exception to the Initialism Principle is the naming of educational institutions:

The Educational Principle: In general, initialisms naming educational institutions are anarthrous.

So we get: MIT, OSU, UCLA, UCSD, RPI, etc.  I say that I have a Ph.D. from the Massachusetts Institute of Technology, not Massachusetts Institute of Technology (the full name is arthrous), but I say that I have a Ph.D. from MIT, not the MIT (the initialism is anarthrous).

But, as usual, there is variation.  The Educational Principle is pretty firm for me, but it's clear that it doesn't work for everybody:

The MIT is going to change its curriculum structure that was famous for teaching Scheme in introductory courses.  (link)

As for the large educational institution in Columbus, Ohio, usually known familiarly as OSU, it famously tries to insist on a the in its full name (The Ohio State University), a quirk that is sometimes carried over to its initialistic version, as the OSU (or The OSU):

Department of Electrical Engineering, The Ohio State University ... More recently, the Center of Intelligent Transportation Research (CITR) at the OSU is ...  (link)

NASA support draws upon the Byrd Polar Research Centre of the Ohio State University .... of the data acquisition plan while the OSU is responsible for, ...  (link)

(You can find similar instances of the OSU referring to Oregon State, and some other unexpected arthrousness for other institutions as well.)

And, in fact, the (no doubt originally snarky) orthographic variant tOSU (or tosu or Tosu) -- an acronym pronounced /tosu/ -- has grown up to represent the arthrous variant:

Michael Floyd commits to tOSU  (link)

Why is Ohio State referred to as tosu by some people, mostly Michigan fans and others who don't like Ohio St?  (link)

Note that because the Educational Principle applies specifically to names of educational institutions, there are minimal contrasts in arthrousness for initialisms: when such initialisms stand for other things that have arthrous full names, they are almost uniformly arthrous:

the MIT for the metal/insulator transition, the Millvale Industrial Theater, the Management Improvement Team (of the USA Freedom Corps), etc.

the OSU for the Overseas Singaporean Unit, the Operation Support Unit (Denton County, Texas, Sheriff's Office), the Oxygen Servicing Unit (McNaughton Dynamics UK), etc.

Syntactic footnote.  All of the preceding was about proper names standing on their own or serving as arguments; some of these names are normally anarthrous, some normally arthrous.  But other syntactic constructions can impose their own requirements.  As a result, it's easy to find instances of normally anarthrous names preceded by the, and also instances of normally arthrous names without preceding the, but these aren't relevant to the classification of proper names with respect to arthrousness.

So: acronyms (like NATO) are normally anarthrous, as are initialisms referring to educational institutions (like MIT).  Abbreviated proper names can serve as prenominal modifiers -- NATO support 'support from/by NATO', MIT buildings 'buildings at/of MIT' -- and the resulting expressions can have preceding determiners, which means we can get things like the NATO support and the MIT buildings, which have the structure
  [ the [ NAME HEAD ] ],
not the structure
  [ [ the NAME ] HEAD ].
That is, they do not contain the arthrous proper names the NATO and the MIT; NATO and MIT are anarthrous here as elsewhere.

In fact, if the head noun is a proper name, the resulting expression is a (more complex) proper name, which may itself require a definite article: the NATO Secretary General, the MIT Media Lab.  But once again, the article belongs to the outer layer of structure, and NATO and MIT are again anarthrous, despite the preceding the.

In the other direction, though initialisms are generally arthrous, an abbreviated proper name serving as a prenominal modifier is obligatorily bare, so we get things like A local startup has gotten CIA funding (with CIA funding 'funding by/from the CIA'), not A local startup has gotten the CIA funding.  The condition on prenominal modifiers trumps the arthrousness of initialisms.
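For readers who like their generalizations explicit, the default system described in this posting can be sketched as a small decision procedure. This is only a rough sketch under stated assumptions: the names Form, takes_the, and render are made up for illustration; hybrid and "coerced" abbreviations like SFMOMA and NOGLSTP are assumed to count as acronyms; and none of the exceptions or variation discussed above (the BBC vs. NBC, the OED vs. OED, the MIT, the OSU) is modeled.

```python
from enum import Enum


class Form(Enum):
    """How an abbreviation is pronounced (hypothetical labels for this sketch)."""
    ACRONYM = "acronym"        # read as a word: NATO, NASA, Unicef
    INITIALISM = "initialism"  # read as letter names: FBI, NSF, MIT


def takes_the(form: Form, full_name_arthrous: bool,
              educational: bool = False, prenominal: bool = False) -> bool:
    """Return True if the abbreviation itself is, by default, preceded by 'the'.

    Hypothetical sketch: models only the default principles, not the
    exceptions or the variation between and within speakers.
    """
    if prenominal:
        # Prenominal modifiers are bare: "CIA funding"; in "the NATO support"
        # the article belongs to the whole expression, not to NATO.
        return False
    if form is Form.ACRONYM:
        # Acronym Principle: acronyms are anarthrous.
        return False
    if educational:
        # Educational Principle: initialisms naming schools are anarthrous.
        return False
    # Initialism Principle: follow the arthrousness of the full name.
    return full_name_arthrous


def render(abbrev: str, form: Form, full_name_arthrous: bool,
           educational: bool = False) -> str:
    """Render a free-standing abbreviated name with or without 'the'."""
    article = "the " if takes_the(form, full_name_arthrous, educational) else ""
    return article + abbrev


if __name__ == "__main__":
    print(render("NATO", Form.ACRONYM, True))          # NATO
    print(render("FBI", Form.INITIALISM, True))        # the FBI
    print(render("MIT", Form.INITIALISM, True, True))  # MIT (the school)
    print(render("MIT", Form.INITIALISM, True))        # the MIT (metal/insulator transition)
```

The order of the checks encodes the claim that the condition on prenominal modifiers trumps everything else, and that the Educational Principle overrides the Initialism Principle but only for names of educational institutions, which is what yields minimal pairs like MIT (the school) vs. the MIT (the metal/insulator transition).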

