September 18, 2007

Lowest of any of the others

I'm used to seeing comparatives like "taller than anybody on his team" meaning taller than anybody *else* on his team, and have imagined (off the top of my cuff) that we automatically accommodate a restriction on the domain, though I don't know if there's been any real work on how this works, whether the same phenomenon is found cross-linguistically, etc. I probably commit this 'error' myself sometimes -- the logician in me hates it, but the non-prescriptivist in me reminds the logician-in-me that it's quite benign, since it would never be misunderstood, so why fuss? Just file it away as an interesting curiosity that I hope someone someday will work on (or already has? I'd be curious to know) and explain how/why it happens so naturally.

But now I've found a superlative construction with the 'opposite' 'error': "I really like the look (and feel!) of my ifrogz case. It has the lowest profile of any of the other cases I have used." -- from a testimonial in the metrobagz part of the ifrogz site, http://ifrogz.com/products/metro-bagz/.

I wouldn't know how to account for it unless it could be called hypercorrection (I somehow doubt it), and am wondering whether the different domain requirements of the comparative and superlative constructions are just very commonly confused (the way "3 times as big as" vs. "3 times bigger than" are distinguished only by pedants, though everyone distinguishes them with small fractions like "50% as big as" vs. "50% bigger than").

This time we have to EXPAND the domain to put the item being compared back IN, since the superlative construction needs to involve a set that includes the item in question. My inchoate hypothesis for the comparative examples was that when comparing an individual and a whole set, it's natural to invoke some sort of implicit disjoint reference idea, so that even when we say "[taller than] everybody in his class", we mean "everybody except him". But this example really messes up any such idea, since "any of the other cases I have used" requires MORE words, and we have to add the ifrogz case back in to provide a suitable domain for the superlative. So it can't be anything like an accommodation of 'disjoint reference'.
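To make the domain point concrete, here is a toy model. It is a minimal sketch of my own, not anything from the literature: the profile values, the case names and the function names `lower_than_any` and `lowest_of` are all hypothetical, and each function simply spells out a literal (non-accommodated) reading. Read literally, the comparative is false when the domain includes the ifrogz case itself, and the superlative fails when the domain leaves it out, which is exactly the mismatch between "taller than anybody in his class" and "the lowest profile of any of the other cases".

```python
from typing import Dict, Iterable

# Hypothetical profile heights for three cases; lower means slimmer.
# Illustrative numbers only, not data from the post.
PROFILES: Dict[str, float] = {
    "ifrogz": 1.0,
    "case_a": 2.5,
    "case_b": 3.0,
}


def lower_than_any(x: str, domain: Iterable[str], m: Dict[str, float]) -> bool:
    """Literal comparative: x has a strictly lower value than every member of domain.

    If x itself is in the domain, this is automatically false, since x < x fails.
    """
    return all(m[x] < m[y] for y in domain)


def lowest_of(x: str, domain: Iterable[str], m: Dict[str, float]) -> bool:
    """Literal superlative: x belongs to domain and nothing in domain is lower.

    If x is missing from the domain, the superlative's requirement is not met;
    this toy version just returns False in that case.
    """
    d = list(domain)
    return x in d and all(m[x] <= m[y] for y in d)


if __name__ == "__main__":
    all_cases = list(PROFILES)
    others = [c for c in all_cases if c != "ifrogz"]

    # Comparative: needs the "disjoint reference" domain (the others).
    print(lower_than_any("ifrogz", all_cases, PROFILES))  # False
    print(lower_than_any("ifrogz", others, PROFILES))     # True

    # Superlative: needs the domain that includes the ifrogz case.
    print(lowest_of("ifrogz", all_cases, PROFILES))  # True
    print(lowest_of("ifrogz", others, PROFILES))     # False
```

The fourth line corresponds to the literal reading of the testimonial's "lowest profile of any of the other cases", which is why the reader has to put the ifrogz case back into the domain.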

Although this construction seems unrelated to negation, it somehow feels a lot like all the "can hardly underestimate his importance" examples we've been spotting lately.

I was also unsure whether "any" is normal in superlatives, thinking it may usually be restricted to comparatives. But when I asked around, Larry Horn assured me that it occurs very readily in superlatives. Larry wrote:

-----

NPIs in general are fine in superlatives--"the lowest profile I've ever seen" is impeccable, as is "the toughest problem I have yet encountered", and "the lowest profile of any of them" is OK for me as well. And my intuitions aren't unique; just checking a few of the likely suspects, I find e.g.

"most of any": 404,000 google hits

"best of any": 242,000 google hits

"biggest of any": 9,900 google hits

-----

In any case, maybe "has the lowest profile of any of the other cases" could be analyzed as a blend of comparative and superlative: "has a lower profile than any (of the) other X" + "the lowest profile of any X" (Larry's formulation).

Larry agrees that it's an interesting phenomenon, and that it's more problematic than the restriction-accommodation case of "taller than anyone in his family", variants of which he sometimes uses in translation exercises in intro semantics.

Google peculiarities: When I tried to get a rough Google comparison of "biggest * of any of the other" vs. "biggest * of any of the", I actually seemed to get a much bigger number for the first, though its hits should be a subset of the second's. I got 106,000,000 for the first and just 12,800 for the second! But then, with some help from Kai von Fintel and David Beaver, it was discovered that Google behaves very strangely with some ungrammatical strings. Closer inspection of the search that seemed to give 106,000,000 hits showed that it returned only 3 pages of results: the figure 106,000,000 appeared at the top of pages 1 and 2, but page 3 showed 21, and in fact only 21 hits were returned.
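The subset reasoning can be spelled out mechanically: any stretch of text containing "of any of the other" also contains "of any of the", so no page can match the first query without matching the second, and an honest count for the narrower query can never exceed the count for the broader one. Below is a minimal sketch of that check. It assumes, purely for illustration, that `*` stands for exactly one word; as David notes below, Google's real `*` does something quite different. The helper names `phrase_to_regex` and `consistent` and the example sentences are hypothetical; the counts are the ones reported above.

```python
import re


def phrase_to_regex(phrase: str) -> re.Pattern:
    """Turn a quoted search phrase into a regex.

    Assumption: "*" matches exactly one word. This is NOT how Google's
    wildcard behaves; it only serves to illustrate the subset logic.
    """
    parts = [r"\w+" if w == "*" else re.escape(w) for w in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


NARROW = phrase_to_regex("biggest * of any of the other")
BROAD = phrase_to_regex("biggest * of any of the")


def consistent(narrow_count: int, broad_count: int) -> bool:
    """A narrower phrase can never legitimately get more hits than a broader one."""
    return narrow_count <= broad_count


if __name__ == "__main__":
    # Made-up example sentences: whatever NARROW matches, BROAD matches too.
    for s in [
        "It has the biggest screen of any of the other phones.",
        "It has the biggest screen of any of the phones.",
    ]:
        print(bool(NARROW.search(s)), bool(BROAD.search(s)), s)

    # Counts from the post: the initial estimate vs. the 21 hits actually returned.
    print(consistent(106_000_000, 12_800))  # False
    print(consistent(21, 12_800))           # True
```

The first estimate fails the check outright, which is what first signaled that the 106,000,000 figure could not be a real count.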

David sleuthed out the phenomenon; here's his report.

***********

Unfortunately, the numbers given as results of Google searches have become less meaningful over the last few years rather than improving in any sense relevant to us. The numbers Google gives in response to a query are not counts of the number of pages with the given string. Rather, they are estimates based on a formula that, so far as I know, is not public. For simple searches, the estimate is presumably computed from the probability of a page containing all the search terms, given the number of pages in the Google caches that contain each component term. But once you start doing string searches, this sort of approach becomes very unreliable.

I assume that the oddity of the result for "biggest * of any of the other" occurs because Google doesn't have any smart way to calculate the likelihood of strings for which the number of responses appears too large to simply count them. That is, I guess the algorithm works by first putting some bounds on the likely number of hits based on e.g. how rapidly various Google network nodes appear to be sending responses, and if that number is sufficiently small, then Google uses some fairly accurate algorithm for estimating the total, like counting every single response. But if there appear to be loads of responses, then the algorithm makes an estimate based on, well, who knows what. In the case at hand (and similarly for "smallest * of any of the other", "largest * of any of the other"), the estimate assumes some distributional properties that just don't hold for semantically or syntactically anomalous strings. Then, as you page through the hits, Google is forced to self-correct once it actually has to enumerate all the results.

Hmm. So, if I'm right, then Barbara has stumbled on a rather interesting test for grammatical anomaly (though only relative to Google's bizarre assumptions about normality). Let's try another case: "* who thinks that is happy". This one has a pretty damn ordinary set of words in it, but suffers from an unfortunate case of a missing subject. Here Google initially estimates 10,900 results. But then it rapidly revises down to 16. Try a that-trace violation: "who * said that is happy", first 1,130,000, then 22, none of which are actually that-trace violations (and all of which are only produced because Google's interpretation of the * operator is insane, a point we've made on Language Log before). Similarly, "* give a man a man a *", which has too many arguments for "give", starts off at 4,260, but then drops to 54.

So. Umm. There you go.

David

Posted by Barbara Partee at September 18, 2007 04:19 AM