August 23, 2004

Parsing the Duchess

Senior research scientist Chris Culy at FXPAL ran the Brill part-of-speech tagger on the Duchess's sentence, fed the tagger's output to the parser written by Michael Collins, and fed the parser's output to a Perl script that translated it into XML. Those with a strong stomach may look at the XMLized parse output if they care to read on below. But contrary to what I said in the first version of this note posted last night, the fact that a parse was produced does not indicate that the Duchess's sentence is grammatical: Fernando Pereira at Penn informs me that the Collins parser will assign a parse to any string of words. The parser finds the structure that would be most probable for the string, be it ever so unlikely. So that still gives us no clue as to whether the structure below corresponds to a grammatical sentence or not, and we certainly know nothing about what it would mean.

<doc>
<TOP numDtrs="1" headDtr="1" headStr="imagine">
<SG numDtrs="2" headDtr="2" headStr="imagine">
<ADVP numDtrs="1" headDtr="1" headStr="Never">
<RB>Never</RB>
</ADVP>
<VP numDtrs="3" headDtr="1" headStr="imagine">
<VB>imagine</VB>
<NP numDtrs="1" headDtr="1" headStr="yourself" isArg="true">
<NPB numDtrs="1" headDtr="1" headStr="yourself">
<PRP>yourself</PRP>
</NPB>
</NP>
<SG numDtrs="2" headDtr="2" headStr="to">
<RB>not</RB>
<VP numDtrs="2" headDtr="1" headStr="to">
<TO>to</TO>
<VP numDtrs="3" headDtr="1" headStr="be" isArg="true">
<VB>be</VB>
<ADVP numDtrs="1" headDtr="1" headStr="otherwise">
<RB>otherwise</RB>
</ADVP>
<PP numDtrs="2" headDtr="1" headStr="than">
<IN>than</IN>
<SBAR numDtrs="2" headDtr="1" headStr="what" isArg="true">
<WHNP numDtrs="1" headDtr="1" headStr="what">
<WP>what</WP>
</WHNP>
<S numDtrs="2" headDtr="2" headStr="might" isArg="true">
<NP numDtrs="1" headDtr="1" headStr="it" isArg="true">
<NPB numDtrs="1" headDtr="1" headStr="it">
<PRP>it</PRP>
</NPB>
</NP>
<VP numDtrs="2" headDtr="1" headStr="might">
<MD>might</MD>
<VP numDtrs="3" headDtr="1" headStr="appear" isArg="true">
<VB>appear</VB>
<PP numDtrs="2" headDtr="1" headStr="to">
<TO>to</TO>
<NP numDtrs="1" headDtr="1" headStr="others" isArg="true">
<NPB numDtrs="1" headDtr="1" headStr="others">
<NNS>others</NNS>
</NPB>
</NP>
</PP>
<SBAR numDtrs="2" headDtr="1" headStr="that" isArg="true">
<IN>that</IN>
<S numDtrs="2" headDtr="2" headStr="would" isArg="true">
<SBAR numDtrs="2" headDtr="1" headStr="what" isArg="true">
<WHNP numDtrs="1" headDtr="1" headStr="what">
<WP>what</WP>
</WHNP>
<S numDtrs="2" headDtr="2" headStr="were" isArg="true">
<NP numDtrs="1" headDtr="1" headStr="you" isArg="true">
<NPB numDtrs="1" headDtr="1" headStr="you">
<PRP>you</PRP>
</NPB>
</NP>
<VP numDtrs="3" headDtr="1" headStr="were">
<VP numDtrs="1" headDtr="1" headStr="were">
<VBD>were</VBD>
</VP>
<CC>or</CC>
<VP numDtrs="2" headDtr="1" headStr="might">
<MD>might</MD>
<VP numDtrs="2" headDtr="1" headStr="have" isArg="true">
<VB>have</VB>
<VP numDtrs="2" headDtr="1" headStr="been" isArg="true">
<VBN>been</VBN>
<VP numDtrs="2" headDtr="1" headStr="was" isArg="true">
<VBD>was</VBD>
<ADVP numDtrs="2" headDtr="1" headStr="not">
<ADVP numDtrs="2" headDtr="1" headStr="not">
<RB>not</RB>
<ADVP numDtrs="1" headDtr="1" headStr="otherwise">
<RB>otherwise</RB>
</ADVP>
</ADVP>
<PP numDtrs="2" headDtr="1" headStr="than">
<IN>than</IN>
<SBAR numDtrs="2" headDtr="1" headStr="what" isArg="true">
<WHNP numDtrs="1" headDtr="1" headStr="what">
<WP>what</WP>
</WHNP>
<S numDtrs="2" headDtr="2" headStr="had" isArg="true">
<NP numDtrs="1" headDtr="1" headStr="you" isArg="true">
<NPB numDtrs="1" headDtr="1" headStr="you">
<PRP>you</PRP>
</NPB>
</NP>
<VP numDtrs="2" headDtr="1" headStr="had">
<VBD>had</VBD>
<VP numDtrs="1" headDtr="1" headStr="been" isArg="true">
<VBN>been</VBN>
</VP>
</VP>
</S>
</SBAR>
</PP>
</ADVP>
</VP>
</VP>
</VP>
</VP>
</VP>
</S>
</SBAR>
<VP numDtrs="2" headDtr="1" headStr="would">
<MD>would</MD>
<VP numDtrs="2" headDtr="1" headStr="have" isArg="true">
<VB>have</VB>
<VP numDtrs="3" headDtr="1" headStr="appeared" isArg="true">
<VBN>appeared</VBN>
<PP numDtrs="2" headDtr="1" headStr="to">
<TO>to</TO>
<NP numDtrs="1" headDtr="1" headStr="them" isArg="true">
<NPB numDtrs="1" headDtr="1" headStr="them">
<PRP>them</PRP>
</NPB>
</NP>
</PP>
<SG numDtrs="1" headDtr="1" headStr="to">
<VP numDtrs="2" headDtr="1" headStr="to">
<TO>to</TO>
<VP numDtrs="2" headDtr="1" headStr="be" isArg="true">
<VB>be</VB>
<ADVP numDtrs="1" headDtr="1" headStr="otherwise">
<RB>otherwise</RB>
<PUNC>.</PUNC>
</ADVP>
</VP>
</VP>
</SG>
</VP>
</VP>
</VP>
</S>
</SBAR>
</VP>
</VP>
</S>
</SBAR>
</PP>
</VP>
</VP>
</SG>
</VP>
</SG>
</TOP>
</doc>
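
For anyone who would rather not read the nesting by eye, here is a minimal sketch of how the XML above could be flattened into something more legible. It is not the Perl script that produced the output; it is an illustrative Python snippet that relies only on element names and text, since the meaning of attributes such as numDtrs and headDtr is not documented here. The sample string is a short excerpt copied from the end of the parse, and the function names `words` and `bracketed` are made up for this example.

```python
import xml.etree.ElementTree as ET

# A short excerpt copied from the parse above (the final "to be otherwise." clause).
SAMPLE = """
<SG numDtrs="1" headDtr="1" headStr="to">
<VP numDtrs="2" headDtr="1" headStr="to">
<TO>to</TO>
<VP numDtrs="2" headDtr="1" headStr="be" isArg="true">
<VB>be</VB>
<ADVP numDtrs="1" headDtr="1" headStr="otherwise">
<RB>otherwise</RB>
<PUNC>.</PUNC>
</ADVP>
</VP>
</VP>
</SG>
"""


def words(node):
    """Return the words in order: the text of every element with no children."""
    return [el.text.strip() for el in node.iter() if len(el) == 0 and el.text]


def bracketed(node):
    """Render a node as a labelled bracketing, e.g. (VB be)."""
    if len(node) == 0:
        # Leaf element: a part-of-speech tag wrapping a single word.
        return f"({node.tag} {node.text.strip()})"
    # Phrase element: the label followed by its children, recursively.
    return f"({node.tag} " + " ".join(bracketed(child) for child in node) + ")"


root = ET.fromstring(SAMPLE.strip())
print(" ".join(words(root)))
print(bracketed(root))
```

Run on the whole `<doc>` element instead of the excerpt, the same two functions would give the sentence's words in order and a one-line bracketing of the full structure. Neither says anything about whether that structure is a grammatical one.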

Posted by Geoffrey K. Pullum at August 23, 2004 09:09 PM