April 16, 2005

The future of the history of usage

The OED traces "could care less" back to 1966:

1966 Seattle Post-Intelligencer 1 Nov. 21/2 My husband is a lethargic, indecisive guy who drifts along from day to day. If a bill doesn't get paid he could care less.

A few days ago, Benjamin Zimmer supplied a citation from 1955, which he got from searching the ProQuest Historical Newspapers database:

This Morning . . . With Shirley Povich
Washington Post, Sep 25, 1955, p. C1
The National League clubs have always shied from pitching left-handers against the Dodgers, but Casey Stengel could care less about the Dodgers' reputation for beating southpaws.

The ProQuest Historical Newspapers and American Periodicals Series (APS) databases are the leading edge of a series of developments that will make it possible to study, in an entirely new way, the origin and progress of new idioms, constructions and word senses. All we can do so far is to search for words and word sequences, contingent on source and/or date, but this is already very useful.

When researchers have fuller access to the back-end corpora of OCR'ed text, or when outfits like ProQuest have access to modern NLP technology, it will be possible to search over corpora that have been automatically tagged for morphological and syntactic properties, word senses, discourse function and so on. An even more important innovation will be the ability to go beyond the search for the earliest citation, or for a representative series of historical citations, and instead to create richer compilations of information about changes in usage as a function of time, space, genre, personal identity and so on.

There are many legal, social and technical issues between us and that happy end, but the first step is to establish a digital archive of the historical texts, and that is already happening.

[Note that the various ProQuest databases are subscription services, which may be available to you through a library. If the University of Pennsylvania is typical, university libraries subscribe to some but not all of the relevant services: through the Penn library, I can access the APS sources and the New York Times portion of the historical newspapers archive, but not the other papers. You may also be able to access such databases through some public libraries.]

Posted by Mark Liberman at April 16, 2005 11:56 AM