June 18, 2004

Predicting random eggcorns

Francis Heaney writes:

Morrissey drops an eggcorn on his new CD (as good as "Vauxhall and I", not quite as good as "Your Arsenal"), in the song "I Like You": "Something in you caused me to / take a new tact with you".

Some other common examples recently sent in by readers include "slight of hand" for "sleight of hand", and "for all intensive purposes" in place of "for all intents and purposes".

As usual, the phonetic difference between the original and the eggcorn ranges from nil ("slight of hand") to small ("for all intensive purposes").

I've pointed out in the past that one can use web search to compare rates of eggcorn usage in different contexts, for instance in news vs. the web at large. I'd like to reiterate here that web search makes it possible to predict eggcorns and investigate their occurrence experimentally -- and even to estimate their rates of occurrence.

Thus I can open a magazine on my desk to a random page, pick a random phrase (here, "marginal cost"), predict a likely eggcorn (say, "margin of cost"), and check the web to find it!

(link) What IT potentially offers, he says, is economies of scale, the possibility of enlarging the scope of educational activities at a relatively low margin of cost, and mass student customization of education.
(link) Leaving the law on one side and turning to economics, it is a well established principle that if one wants to maximise one's returns, one carries out an activity until the margin of cost is equal to the margin of revenue.
(link) Realize, of course, that if the report costs more to compile the margin of cost to the report to one more copy would go down, making one more copy even cheaper in comparison.

The raw hit counts (195,000 whG, i.e. web hits on Google, for "marginal cost"; 148 for "margin of cost") are not necessarily to be trusted. One needs to do some sampling to estimate the rate of valid hits, and to do that one really needs one additional feature from Google or other web search systems: the ability to get a random sample of N hits. With that proviso, one could really use such techniques to do Google psycholinguistics.
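
To make the sampling idea concrete, here is a minimal sketch in Python of how a raw count might be corrected once a random sample of hits has been checked by hand. The two raw counts are the ones quoted above. Everything else is an assumption for illustration: the sample size, the number of hits judged valid, and the function names are all hypothetical, and the Wilson score interval is just one reasonable choice for putting error bars on a proportion.

```python
import math


def wilson_interval(valid, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = valid / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_valid_hits(raw_hits, valid, n):
    """Scale a raw hit count by the fraction of sampled hits judged valid.

    Returns (point estimate, lower bound, upper bound).
    """
    low, high = wilson_interval(valid, n)
    return raw_hits * valid / n, raw_hits * low, raw_hits * high


# Raw counts quoted in the post.
RAW_STANDARD = 195_000  # "marginal cost"
RAW_EGGCORN = 148       # "margin of cost"

# HYPOTHETICAL sample: suppose 50 random eggcorn hits were read by hand
# and 30 of them turned out to be genuine substitutions. These numbers
# are invented for illustration, not measured.
point, low, high = estimate_valid_hits(RAW_EGGCORN, valid=30, n=50)

# Uncorrected share of eggcorn hits among all hits for the pair.
raw_rate = RAW_EGGCORN / (RAW_EGGCORN + RAW_STANDARD)

print(f"estimated valid eggcorn hits: {point:.1f} ({low:.1f} to {high:.1f})")
print(f"uncorrected eggcorn rate: {raw_rate:.5f}")
```

The same correction would have to be applied to the hits for the standard phrase before comparing the two, and none of it works unless the sample really is random, which is exactly the feature the search engines do not yet offer.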

Posted by Mark Liberman at June 18, 2004 10:42 PM