May 24, 2004

Ghits -- and Whits?

I like Geoff's proposal for using Ghits as a measure for Web frequencies, despite some necessary imprecision owing to duplicates, etc. But I think it's premature to settle on Ghits as the only such measure. It works fine for people doing searches in browsers, but for those attempting to automate such searches (e.g., using specializations of the WWW::Search Perl module, such as WWW::Search::Google or WWW::Search::AltaVista), Google's Web API usage restriction (1000 queries per day) sometimes makes it difficult to run as many searches as a study needs. (In my experience Altavista is more forgiving, at least if you have your program pause for a second or two between HTTP requests.)
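For anyone who wants to try the pause-between-requests approach, here is a minimal sketch in Python rather than Perl. It assumes you supply your own `fetch_count` function that returns the hit count for a query from whatever engine you are using; that function, the name `count_hits`, and the two-second default are all my own placeholders, not part of WWW::Search or any engine's API.

```python
import time
from typing import Callable, Dict, Iterable


def count_hits(
    queries: Iterable[str],
    fetch_count: Callable[[str], int],  # hypothetical: you provide the actual lookup
    pause_seconds: float = 2.0,         # assumed default, "a second or two"
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Look up a hit count for each query, pausing between requests."""
    counts: Dict[str, int] = {}
    for i, query in enumerate(queries):
        if i > 0:
            # Pause before every request except the first.
            sleep(pause_seconds)
        counts[query] = fetch_count(query)
    return counts
```

Passing `sleep` as a parameter just makes the pausing easy to swap out when you want to try the loop without actually waiting.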

My initial instinct was just to propose AVhits as another measure, but perusing the CPAN site, I find that there are a ton of search modules, covering not only search engines like Google, Altavista, and Lycos, but also news sources like the Washington Post and Reuters, specialized article searches like PubMed, job searches, etc.

What to do? Perhaps use Whits (for "Web hits") as the more general term, with usages like "1234 Whits (Altavista)", and then consider Ghits to be an abbreviation for "Whits (Google)"?
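If you are generating these counts from a script, the convention is easy to apply mechanically. Here is a small sketch; the function name `format_hits` and the `abbreviate` flag are made up for illustration.

```python
def format_hits(count: int, engine: str, abbreviate: bool = True) -> str:
    """Render a hit count in the proposed notation.

    "Ghits" is treated as shorthand for "Whits (Google)"; every other
    source is written out as "Whits (<source>)".
    """
    if abbreviate and engine == "Google":
        return f"{count} Ghits"
    return f"{count} Whits ({engine})"


print(format_hits(1234, "Altavista"))                 # 1234 Whits (Altavista)
print(format_hits(1234, "Google"))                    # 1234 Ghits
print(format_hits(1234, "Google", abbreviate=False))  # 1234 Whits (Google)
```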

A nice advantage of this proposal is that you can advise researchers to keep their Whits... just in case anyone wants to see them. :-)

Posted by Philip Resnik at May 24, 2004 08:02 PM