November 04, 2007

Lexical repulsion

This is not word aversion. Nor is it word rage. In fact, no human emotions are involved, at least not in any obvious way.

The source is Antoinette Renouf and Jayeeta Banerjee, "Lexical Repulsion between sense-related pairs", International Journal of Corpus Linguistics 12(3): 415-443, 2007. From their abstract:

We have proposed that there is a hitherto unexplored textual feature, which we call 'repulsion', which operates on the construction of meaning in an opposing way to that of word collocation. ... We focus on 'lexical repulsion,' by which we mean the intuitively-observed tendency in conventional language use for certain pairs of words not to occur together, for no apparent reason other than convention.

For example, they suggest, merry tends to collocate with christmas, and happy with birthday; but also, merry actively resists combination with birthday. Thus they cite these counts from a corpus:

Word 1  Frequency  Word 2     Frequency  Collocates
merry        2326  christmas      90670         450
happy        8323  birthday        2416         526
merry        2326  birthday        2416           0
happy        8323  christmas      90670         299

Google counts show a similar pattern:

Word 1  Frequency  Word 2     Frequency  Collocates
merry       38.4M  christmas       288M       2.64M
happy        519M  birthday        185M       13.2M
merry       38.4M  birthday        185M       27.9K
happy        519M  christmas       288M       1.53M
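
One rough way to compare the rows is to ask what share of each first word's occurrences fall in the given pairing. The sketch below does only that arithmetic on the counts from the two tables above. The function names are my own, and this is not the measure Renouf and Banerjee use; a proper expected-versus-observed comparison would also need the total corpus size, which is not given here.

```python
# Counts copied from the two tables above:
# (word 1, frequency of word 1, word 2, frequency of word 2, co-occurrences).
CORPUS = [
    ("merry", 2326, "christmas", 90670, 450),
    ("happy", 8323, "birthday", 2416, 526),
    ("merry", 2326, "birthday", 2416, 0),
    ("happy", 8323, "christmas", 90670, 299),
]
GOOGLE = [
    ("merry", 38.4e6, "christmas", 288e6, 2.64e6),
    ("happy", 519e6, "birthday", 185e6, 13.2e6),
    ("merry", 38.4e6, "birthday", 185e6, 27.9e3),
    ("happy", 519e6, "christmas", 288e6, 1.53e6),
]


def pair_share(rows):
    """Map (word 1, word 2) to co-occurrences divided by frequency of word 1."""
    return {(w1, w2): both / f1 for w1, f1, w2, _f2, both in rows}


def report(name, rows):
    """Print each pairing's share as a percentage of word 1's occurrences."""
    print(name)
    for (w1, w2), share in pair_share(rows).items():
        print(f"  {w1:5} + {w2:9}  {100 * share:7.3f}% of '{w1}'")


report("Corpus", CORPUS)
report("Google", GOOGLE)
```

On the corpus counts, roughly 19% of the occurrences of merry are paired with christmas and none with birthday. On the Google counts, the merry + birthday share is under 0.1%, against about 7% for merry + christmas.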

This is an interesting paper. But even if you're a pro in the field of text analysis, you probably haven't come across it. The authors have not (as far as I can tell) posted a copy on their websites, or deposited one in a repository. (John Benjamins, the publisher, is missing from this list, suggesting that perhaps they don't allow such archiving.) And the International Journal of Corpus Linguistics is not very widely available. I happen to have an individual subscription, but my university's (excellent) library does not subscribe -- IJCL is available through ingentaconnect, but apparently only after a delay. To get a copy of this single article on line, I (or the library) would have to pay $41.88. That's a lot for 28 pages.

Although I'm in general in favor of open-access journals, I'm not an open-access absolutist. Someone has to pay, somehow, for the legitimate costs of running a journal. But for a journal with limited circulation, the IJCL pricing model could hardly have been better designed to minimize impact. As a result, I would think very hard before hiding my work under the proverbial bushel by publishing it in IJCL, and I'd certainly advise students and postdocs and junior faculty to think carefully through the issues as well.

[Update -- Tanja Säily writes:

Having just read your Language Log post on lexical repulsion, I'm happy to let you know that a related article by Renouf and Banerjee will be published by the end of the year in the second volume of a new e-series entitled Studies in Variation, Contacts and Change in English. This series will be freely available on the web at http://www.helsinki.fi/varieng/journal/.

]

[Graeme Hirst writes:

What Renouf & Banerjee call "lexical repulsion" seems to be pretty much the same as the idea of "anti-collocations" proposed by Darren Pearce in 2001, which was subsequently developed by my former student Diana Inkpen in her 2003 dissertation and published in papers by Inkpen & Hirst in 2002 and 2006.

Inkpen, Diana Zaiu and Hirst, Graeme. "Acquiring collocations for lexical choice between near-synonyms." SIGLEX Workshop on Unsupervised Lexical Acquisition, 40th meeting of the Association for Computational Linguistics, Philadelphia, 12 July 2002, 67-76.

Inkpen, Diana and Hirst, Graeme. "Building and using a lexical knowledge-base of near-synonym differences." Computational Linguistics, 32(2), June 2006, 223-262.

Pearce, Darren. 2001. "Synonymy in collocation extraction." In Proceedings of the Workshop on WordNet and Other Lexical Resources, Second meeting of the North American Chapter of the Association for Computational Linguistics, pages 41-46, Pittsburgh, USA.

]

Posted by Mark Liberman at November 4, 2007 07:43 AM