November 13, 2004

African language computer farrago

There's a curious article by Marc Lacey in the NYT today, under the headline "Using a New Language in Africa to Save Dying Ones". The article reads as if a few raw notes about computer technology in Africa had been mixed up together, dumped out in random order, and strung together as if they told a coherent story.

The article starts by asserting that there are problems using African languages on computers, though it never explains clearly what these problems are:

Swahili speakers wishing to use a "kompyuta" - as computer is rendered in Swahili - have been out of luck when it comes to communicating in their tongue. Computers, no matter how bulky their hard drives or sophisticated their software packages, have not yet mastered Swahili or hundreds of other indigenous African languages.

But that may soon change. Across the continent, linguists are working with experts in information technology to make computers more accessible to Africans who happen not to know English, French or the other major languages that have been programmed into the world's desktops.

The article goes on to tangle up the problem of preserving dying languages with the problem of facilitating computer use by the speakers of some very lively ones:

But the campaign to Africanize cyberspace is not all about the bottom line. There are hundreds of languages in Africa - some spoken only by a few dozen elders - and they are dying out at an alarming rate. The continent's linguists see the computer as one important way of saving them. Unesco estimates that 90 percent of the world's 6,000 languages are not represented on the Internet, and that one language is disappearing somewhere around the world every two weeks.

"Technology can overrun these languages and entrench Anglophone imperialism," said Tunde Adegbola, a Nigerian computer scientist and linguist who is working to preserve Yoruba, a West African language spoken by millions of people in western Nigeria as well as in Cameroon and Niger. "But if we act, we can use technology to preserve these so-called minority languages."

This is a bizarre transition. Yoruba is very much alive, with 25 to 30 million native speakers, almost as many as Polish and nearly four times as many as Swedish. There is a lively Yoruba-language publishing and broadcasting industry, and widespread use in schools. Within the Yoruba-speaking area, children normally grow up speaking Yoruba, and the same is true among hundreds of thousands if not millions of Yoruba speakers in other countries. So it's weird to juxtapose Yoruba with Unesco's (valid) concern about dying languages, as if it were an example. There are plenty of endangered languages among the 505 that Ethnologue lists in Nigeria, but Yoruba is not one of them.

Using Yoruba on the computer can certainly still be a problem. The orthography requires both accents above and dots below certain letters, and getting this rendered correctly on the web without special fonts remains a bit chancy. And because a variety of non-standard 8-bit fonts remain in use, dealing with Yoruba manuscripts (or even Yoruba examples in linguistics papers) remains annoyingly difficult. Adegbola's efforts are certainly needed. But the article doesn't mention these issues; instead it asserts (falsely) that "Different Yoruba words are written the same way using the Latin alphabet - the tones that differentiate them are indicated by extra punctuation". Actually, the tones are indicated by (acute and grave) accents (as in the name of the language, Yorùbá, whose tones are mid-low-high). I've gotten over being shocked when a major publication like the New York Times assigns a story to a reporter who lacks the most elementary linguistic knowledge relevant to it -- but really, would it be too much to ask to keep the difference between accents and marks of punctuation straight?
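The encoding side of the rendering problem can be shown with a minimal sketch, assuming the text is stored as Unicode and using only Python's standard `unicodedata` module (the helper name `compose` is purely illustrative). The accented vowels in Yorùbá have single precomposed code points, but a vowel carrying both a dot below and a tone accent does not, so it survives normalization as a base letter plus a combining mark that the font and renderer have to stack correctly.

```python
import unicodedata


def compose(text):
    """Return text in Unicode Normalization Form C (composed where possible)."""
    return unicodedata.normalize("NFC", text)


# "Yoruba" entered as plain letters followed by a combining grave accent
# (U+0300) on the u and a combining acute accent (U+0301) on the a.
name = compose("Yoru\u0300ba\u0301")

# An e with an acute accent (U+0301) above and a dot (U+0323) below.
e_dot_acute = compose("e\u0301\u0323")

print(name, len(name))                # 6 code points: u-grave and a-acute are precomposed
print(e_dot_acute, len(e_dot_acute))  # 2 code points: e-with-dot-below plus a combining acute
```

The second case is the chancy one: whether the acute lands neatly above the dotted vowel depends on the font and the rendering software, not on the text itself.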

I guess it's possible that the reporter does know the difference, and is writing about the use of single quote and back quote as a method for keyboarding acute and grave accents; but if that's it, why not say so, and give an example? Like "In entering Yoruba on the computer today, people often hit an extra key to add a tone mark, for example typing a' to get á."
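A keyboarding convention of that kind is easy to sketch. What follows is a hypothetical illustration, not a description of any actual Yoruba input method: the function name `apply_tone_keys`, the choice of keys, and the list of vowels are all assumptions made for the example. It turns a vowel followed by a single quote or back quote into the same vowel with an acute or grave accent.

```python
import re
import unicodedata

# Assumed key convention: single quote = acute (high tone),
# back quote = grave (low tone).
TONE_KEYS = {"'": "\u0301", "`": "\u0300"}

# Vowels that may take a tone mark here, including the dotted e and o
# (U+1EB9, U+1ECD and their capitals). Hypothetical and not exhaustive.
VOWELS = "aeiouAEIOU\u1eb9\u1ecd\u1eb8\u1ecc"
_PATTERN = re.compile("([" + VOWELS + "])(['`])")


def apply_tone_keys(text):
    """Replace vowel + quote key with vowel + combining accent, then compose."""
    def repl(match):
        return match.group(1) + TONE_KEYS[match.group(2)]
    return unicodedata.normalize("NFC", _PATTERN.sub(repl, text))


print(apply_tone_keys("a'"))        # a with acute accent
print(apply_tone_keys("Yoru`ba'"))  # the name of the language, with its tone marks
```

A real input method would also need some way to type an ordinary apostrophe after a vowel, which this sketch ignores.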

Another possible issue that is implicit in the article but never brought out directly is the question of localizing help files, dialogue boxes, interface legends, and so on. This is the only thing that can possibly be at issue for (text-based) use of Swahili, which "computers... have not yet mastered", according to the article. While localization of prompts and such is certainly a good thing to do, I'm very skeptical that it is a major barrier to wider use of computer technology among Africans. At present most literate Africans can read English or French. Perhaps this should change -- though I believe that the people whose education would be affected by this choice would object very strongly, and I would agree with them, since literacy in one of the major international languages is an essential educational tool. In any case, at the moment, anyone in Africa who is likely to be using a computer to create a document or send email can almost certainly read interface text in English or French without much trouble.

Lacey (the article's author) does start out by talking about "[making] computers more accessible to Africans who happen not to know English, French or the other major languages that have been programmed into the world's desktops". So he may have in mind facilitating a new kind of computer-mediated literacy training among those who don't know English or French. Or maybe he's thinking about bringing interaction with networked computers to people who are not literate at all, using images and speech technology. Those are both interesting ideas, but it's odd to write as if the way to accomplish such things is to put African languages on an equal footing with English or French in the use of Microsoft Office. Mix in references to endangered languages, text messaging in Amharic, machine translation among English, Afrikaans and Sotho, problems of borrowed vs. created technical vocabulary; stir well; and bake till done.

The ingredients here include preservation and documentation of Africa's hundreds of endangered languages; full localization of software for Africa's dozens of large local languages; methods for input, display and editing of Africa's many orthographies that require (simple forms of) complex rendering; the role of computers in promoting literacy in local languages; language standardization and the development of technical vocabulary; linguistic nationalism among the languages within African countries, nearly all of which are multilingual; and the relationship of Africans to the major international languages, which in Africa mainly means English and French, though Arabic is also relevant in some areas. These are all important problems, with subtle and complicated relationships among themselves and with other economic, political and technical questions. There are analogous issues in most other areas of the world. I hope that this article means that the NYT editors have developed an interest in these questions, and will continue the discussion in a more careful way at some point in the future.


Posted by Mark Liberman at November 13, 2004 06:42 PM