April 19, 2006

MLA Language Map enters new territory

Back in June 2004, the MLA website rolled out an interactive language map of the United States, displaying the number of speakers per county or zip code for 37 languages, based on 2000 census data. The site originally used the misleading term "density" to refer to these statistics, even though the numbers given were for total speakers of a language, not the proportion of speakers to the local population. After this was pointed out on Language Log, the MLA changed the wording from "density" to "number of speakers." David Goldberg of the MLA's Foreign Language Programs further promised that "an anticipated expansion of the site will include a reflection of actual density of speakers." Well, as the Chronicle News Blog reports, the anticipated expansion has finally arrived. Not only does the new, improved site generate percentage-based maps for different languages, it also offers a whole host of enhancements, including a Data Center with statistics for more than 300 languages, searchable all the way down to the municipal level.

The percentage-based maps currently can be obtained only by county, not by zip code, but it's still enough to make a big difference compared to the previously available maps on the site. Compare these two maps for Spanish speakers in Texas, the first coded by number of speakers and the second by percentage of speakers:
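
[Maps: Spanish speakers in Texas by county, (1) by number of speakers, (2) by percentage of speakers]
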
One nice feature of the new Data Center is the ability to see local statistics for any recognized language or language group, combined with census data on speakers' age ranges and knowledge of English. So, for instance, if you look up the list of languages spoken in Jersey City, NJ, you can click on any language, say Gujarathi, and get this "language snapshot":

[Image: MLA Data Center "language snapshot" for Gujarathi speakers in Jersey City, NJ]

MLA vice president Michael Holquist was quoted by the Chronicle News Blog as saying that the project demonstrates how the United States, "with the exception of Papua New Guinea, is the country in the world with the greatest diversity of languages." I'm not sure how they're measuring linguistic diversity, but I doubt the U.S. comes in second on any reliable scale. According to tabulated data from Ethnologue, the U.S. ranks fifth in total number of languages, with 311, behind Papua New Guinea (820), Indonesia (742), Nigeria (516), and India (427). And the United States is not especially diverse according to another measure, Greenberg's diversity index, which gives the probability that any two randomly selected people have different native languages. Ethnologue gives that probability as 0.353 for the U.S., good enough for 124th place out of 218 countries, sandwiched between Serbia-Montenegro and Paraguay.
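Greenberg's index is commonly computed as one minus the sum of the squared population shares of each language, A = 1 − Σ pᵢ², which treats the two people as drawn independently (that is, with replacement). Here is a minimal Python sketch under that assumption; the function name and the toy inputs are hypothetical, and this is not necessarily the exact procedure Ethnologue used to arrive at its 0.353 figure for the U.S.:

```python
from collections.abc import Mapping


def greenberg_diversity(speakers: Mapping[str, int]) -> float:
    """Probability that two randomly chosen people have different native languages.

    Assumes the common formulation A = 1 - sum(p_i ** 2), where p_i is the
    share of the population whose native language is language i, and the
    two people are drawn independently (with replacement).
    """
    total = sum(speakers.values())
    if total <= 0:
        raise ValueError("need at least one speaker")
    # Chance that both people share language i is p_i ** 2; subtract the total.
    return 1.0 - sum((n / total) ** 2 for n in speakers.values())


# Toy populations (hypothetical numbers, for illustration only).
print(greenberg_diversity({"A": 100}))           # monolingual population
print(greenberg_diversity({"A": 50, "B": 50}))   # even two-way split
```

A monolingual population scores 0.0 and an even two-way split scores 0.5; the closer the value gets to 1, the more likely two random people are to have different first languages. Note that a country with many languages can still score low if one language dominates, which is how the U.S. can rank high on raw language counts yet land mid-table on this index.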

[Update: Ben Sadock points out that the Census Bureau's own mapping tool is "harder to use than the MLA's interface" but "infinitely more manipulable." He recommends: "go play around on the Census Bureau's website, and you'll never be satisfied with the MLA's mapping tools again, even if they do monopolize mappable language data."]

Posted by Benjamin Zimmer at April 19, 2006 07:52 AM