June 23, 2007

Word counts

    With California Invasive Weeds Awareness Week just around the corner (July 17-23), there are two words every Californian should know: yellow star thistle.

Yes, I know, how silly of the Chron (or its source on invasive weeds): yellow star thistle is obviously three words.  Or is it?

Counting "the number of words" in an expression is a tricky business.  The Chron staff is acting like the word-counting software that comes with your word processor: basically, it counts things separated by spaces.  That means the algorithm is sensitive to the arbitrariness of English orthography.
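To make the point concrete, here is a minimal sketch of that kind of counter in Python. The function name count_by_spaces is made up for illustration, and real word processors surely do more than this; the sketch only assumes that a "word" is whatever sits between stretches of whitespace.

```python
def count_by_spaces(text: str) -> int:
    """Count whitespace-separated chunks, as a naive word counter would."""
    # str.split() with no argument splits on any run of whitespace
    # and ignores leading and trailing whitespace.
    return len(text.split())


# The same plant name, in two spellings that are both in circulation.
for spelling in ("yellow star thistle", "yellow starthistle"):
    print(spelling, "->", count_by_spaces(spelling))
```

The count changes with the spelling (three for the separated version, two for the solid one), though nothing about the expression itself has changed.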

English noun-noun compounds, including those whose meanings are in part conventionalized, are written in three ways: solid (doghouse), hyphenated (dog-ear), separated (dog tag).  There are some generalizations about which spelling is used for which compounds, but there's a good bit of arbitrariness, and also significant variation.  In any case, as far as the system of English goes, for conventionalized compounds the three types are entirely parallel, and a dictionary of reasonable size will have entries for all three.  We're looking at "a word" in each case, regardless of how it's written -- granted, a word that has words as its parts, but still in some sense a word.

Dictionaries, AHD4 for instance, do have entries for star thistle (and star anise and star apple and star fruit).  And my Peterson Field Guide to Pacific States Wildflowers (Niehaus & Ripper 1976) has the yellow star thistle (Centaurea solstitialis) listed in its index under "star thistle, yellow" (also under "thistle, yellow star", using the head noun thistle of the compound star thistle).

So you could argue that yellow star thistle is in fact a two-word expression: yellow plus the compound noun star thistle.
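That argument can be sketched as a counter that consults a list of conventionalized compounds before it counts. Everything here is an assumption for illustration: the tiny SPACED_COMPOUNDS set stands in for a real dictionary, the function name count_lexical_words is invented, and the code looks only at two-part compounds and ignores punctuation.

```python
# Hypothetical mini-lexicon of compounds that are written with a space
# but have dictionary entries of their own.
SPACED_COMPOUNDS = {
    "dog tag",
    "star thistle",
    "star anise",
    "star apple",
    "star fruit",
}


def count_lexical_words(text: str, compounds=SPACED_COMPOUNDS) -> int:
    """Count words, treating each listed two-part compound as one word."""
    tokens = text.lower().split()
    count = 0
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and pair in compounds:
            i += 2  # a listed compound: consume both parts as one word
        else:
            i += 1  # an ordinary space-delimited word
        count += 1
    return count


for phrase in ("yellow star thistle", "doghouse", "dog-ear", "dog tag"):
    print(phrase, "->", count_lexical_words(phrase))
```

On this way of counting, yellow star thistle comes out as two words, and doghouse, dog-ear, and dog tag each come out as one, whatever the spelling.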

[Yes, solstitialis, suitable for this season of the year.  And the pernicious yellow star thistles are in fact blooming on the hillsides.]

[Addendum 6/26: Mae Sander has written with a plausible proposal about how the Chron ended up with "yellow star thistle": the piece originally had "yellow starthistle" -- this spelling can be found in many publications, for instance the University of California Cooperative Extension fact sheet on the plant -- but a proofreader "fixed" the spelling by separating the two parts of "starthistle" (I myself dislike this spelling, because I'm inclined to (mis)read it as "start-histle").  Now, this requires a proofreader who isn't really reading the text for content, but there are such people -- people who would change "an item of data is" to "an item of data are" because, sigh, they change ALL instances of "data is" to "data are".]

Posted by Arnold Zwicky at June 23, 2007 08:29 PM