Language Log: Google Faculty Summit

July 27, 2007

Google Faculty Summit

I've been in Mountain View yesterday and today for Google's Faculty Summit. It's been fun. I've had a chance to catch up with old friends -- both Googlers like David Talkin and Franz Och, and other academics like Ed Fox, Mary Harper, Bob Futrelle and Richard Sproat. Among the many interesting new people that I've met, I've especially enjoyed conversations with Graeme Bailey from Cornell, about musical search; Lillian Cassel from Villanova, about whether ontologies are discovered or invented; Munindar Singh from NC State, about the pragmatic web; and one of the Googlers who invented/implemented Google Scholar.

There are a few differences from the last Google Faculty Summit that I attended. This one is bigger -- big enough that I still haven't bumped into some of the attendees that I know, like Christiane Fellbaum, one of the authors of WordNet. And there was no NDA to sign, unlike the rather ferocious one we had to sign on the way into the Googleplex last time; on the other hand, there are a lot of hip but tough-looking security people around.

I'll mention just three of the technical highlights so far.

First, there was an announcement of two new ways for faculty to use Google's resources remotely for research purposes ("Drink from the firehose with University Research Programs", 7/26/2007). The University Research Program for Google Search "is designed to give university faculty and their research teams high-volume programmatic access to Google Search, whose huge repository of data constitutes a valuable resource for understanding the structure and contents of the web". And the University Research Program for Google Translate "will allow researchers programmatic access to Google's translation service", including "detailed word alignment information" and/or "a list of the n-best translations with detailed scoring information".

Second, there was an excellent talk by Mehran Sahami, "Text Mining in Information Retrieval: Theory and Practice". (The talk was videotaped, and perhaps it will show up on YouTube, as many of the open talks at Google do; I'm not sure.) Its technical content has been published as Mehran Sahami and Timothy Heilman, "A web-based kernel function for measuring the similarity of short text snippets", Proceedings of the 15th international conference on World Wide Web, 2006.

I don't have time to explain it now (I'll try to get to it later), but trust me, there are some very neat ideas in there. Mehran's presentation was also extremely well crafted, presenting the key issues clearly and accessibly, but without dumbing them down -- that's why I hope that the video ends up being posted on YouTube.

Finally, there were a couple of presentations on Google Code for Educators, which looks like it has some neat stuff in it, not only for use in courses but also for self-education. Want to learn Ajax programming, or how to use Hadoop and GFS? There are what look like accessible tutorials and course materials.

Posted by Mark Liberman at July 27, 2007 10:00 AM