May 08, 2007

India begins monumental language documentation project

Later this month, linguists from across India will convene to begin work on a 10-year, US$100M project to survey 400+ Indian languages. The New Linguistic Survey of India will involve 44 academic institutions and some 10,000 linguists and language experts, making it the largest national language documentation effort to date. The project will describe each language and speech variety, compiling lexicons, grammar sketches, audiovisual documentation, and language maps, and will disseminate these materials over the web.

On a recent visit to the Indian Summer School on Natural Language Processing, I called in at the Central Institute of Indian Languages, Mysore, the coordinating site for the project, and met its deputy director, Professor Rajesh Sachdeva. He is busy with logistics for the six-week summer school starting later this month, which will bring 400 linguists to Mysore to take stock of the current state of knowledge about Indian languages and to provide advanced training in linguistic survey and analysis.

In a recent interview, Professor Udaya Narayana Singh, director of the institute, described plans to "develop a Linguistic Data Consortium for Indian Languages (LDCIL) on the lines of the Linguistic Data Consortium at the University of Pennsylvania — a hugely successful consortium of 100 companies, universities and government agencies that aids research in linguistic technologies."

Posted by Steven Bird at May 8, 2007 02:02 AM