December 18, 2006

Perforating database search?

Not long after Google made patent search available, Lubos Motl posted a list of "nine patents [that] depend on string theory". I guess that the idea is to show that string theory is not only not "not even wrong", but is actually providing the theoretical foundations for practical invention. (Or more likely, Motl's post is just a joke.) I'll leave it to others to evaluate the "Method for ameliorating the aging process and the effects thereof utilizing electromagnetic energy" and the "Space vehicle propelled by the pressure of inflationary vacuum state", but I was fascinated to find several language-related patents in the list. An echo, decades later, of Ed Witten's undergraduate minor in linguistics at Brandeis?

Not so, at least for the patent that I personally found most interesting, namely US Pat. 6862586, issued Mar. 1, 2005 to a group of researchers at IBM, describing a method for "Searching databases that identifying group documents forming high-dimensional torus geometric K-means clustering, ranking, summarizing based on vector triplets."

Seriously, that's the title. Talk about your bag of words. I can only imagine that a few phrases have been transposed and/or a few words left out -- anyone care to speculate about what the title was supposed to be? Anyhow, the "high-dimensional" part seems vaguely promising, so let's read on.

This patent's first claim is:

1. A method of perforating a database search comprising:

searching a database using a query, said searching identifying a group of hyperlinked documents;
forming a high-dimensional torus geometric representation of said hyperlinked documents, wherein each hyperlinked document is represented by a vector triplet comprising a normalized word frequency, a normalized out-link frequency and a normalized in-link frequency;
clustering said result items into clusters based on said high-dimensional torus geometric representation;
ranking items within each cluster of said clusters based on said high-dimensional torus geometric representation;
[etc. etc.]
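
(For the curious, here is a minimal Python sketch of what the "vector triplet" in that claim might look like for a toy group of hyperlinked documents. Everything in it is an assumption made for illustration: the `Doc` class and `vector_triplets` function are hypothetical names, "word frequency" is taken to mean the count of query terms in the document, and each component is normalized by dividing by its maximum over the group. The patent may well define these differently, and the torus representation, clustering and ranking steps are not attempted here.)

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Doc:
    """A toy hyperlinked document (hypothetical structure, not from the patent)."""
    name: str
    text: str
    out_links: List[str]


def vector_triplets(docs: List[Doc], query: str) -> Dict[str, Tuple[float, float, float]]:
    """Map each document to (word freq, out-link freq, in-link freq), each in [0, 1].

    Assumption: each component is divided by its maximum over the group.
    """
    terms = query.lower().split()
    names = {d.name for d in docs}

    # Count occurrences of the query terms in each document's text.
    word = {d.name: sum(d.text.lower().split().count(t) for t in terms) for d in docs}
    # Count links that point to other documents in the same result group.
    out = {d.name: sum(1 for target in d.out_links if target in names) for d in docs}
    # Count links arriving from other documents in the group.
    inn = {d.name: 0 for d in docs}
    for d in docs:
        for target in d.out_links:
            if target in inn:
                inn[target] += 1

    def normalize(counts: Dict[str, int]) -> Dict[str, float]:
        top = max(counts.values()) or 1  # avoid dividing by zero
        return {name: count / top for name, count in counts.items()}

    w, o, i = normalize(word), normalize(out), normalize(inn)
    return {d.name: (w[d.name], o[d.name], i[d.name]) for d in docs}


docs = [
    Doc("a", "string theory string", ["b", "c"]),
    Doc("b", "theory of strings", ["c"]),
    Doc("c", "unrelated", []),
]
triplets = vector_triplets(docs, "string theory")
```

(Each document ends up as a point with three coordinates between 0 and 1, which is the sort of input a clustering step such as K-means could then work on.)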

"Perforating a database search"? Could this be a radical new database operation for accessing the extra curled-up dimensions of (the meaning of linguistic) strings? No, after reading the rest of the document, I'm pretty sure that "perforating" is just a scribal error for "performing".

In fact, alas, this patent's only connection to string theory comes at the end, where "string theory" is used as one of several sample queries:

Results for one query are discussed below (e.g., the query "latent semantic indexing"). [...]

Similarly, in response to the query "string theory" the invention brings up "The Official String Theory home page" as S and in reponse to the query 'Information Retrieval" the invention brings up 'The SIGIR home page" as S.

What a disappointment. I was hoping to learn about the bulk of meaning beyond the brane of text, penetrated only by the force of insight (thus explaining why insight is so many orders of magnitude weaker than other interpretive forces).

But I'm left with a question. Aren't patent examiners supposed to, you know, like actually read the patents they approve, and determine that at least some of the text -- say, the title and the first claim -- makes some kind of sense as written? (I'm not objecting to the content of this patent, which at a quick read looks sensible and interesting once you get past the random errors in presentation.)

[Update -- Rob, a patent lawyer, explains:

You are right that the examiner (and even the administrative folks who are looking at these applications during the process) is supposed to review things like the title to make sure they make sense and comply with the rules (for instance, the rules require that the title "must be as short and specific as possible" - unlikely in this case as the title is nearly as long as the Abstract). Some typos in final documents (perhaps "perforating" in this case) come from the scanning and printing processes. This particular application was filed before the advent of the on-line electronic file histories, so it is not possible to see what it looked like as filed without ordering the file history from the PTO. If there was actually an error in the papers as filed, then it is all the more remarkable because not only did an examiner look at them, but the board of patent appeals did as well. Presumably, someone in that three judge panel should have noticed that the title and claim 1 were nonsensical.

]

Posted by Mark Liberman at December 18, 2006 10:01 AM