This is embarrassing. Among other things, I've recently been working on a Bengali morphological analyzer, and so I've been doing a lot of looking at Bengali text online, like here, and a little bit of creating Bengali html text for display. Bengali has some complex rendering issues typical of South Asian scripts. Here's a sample, from the Bengali Wikipedia's "Bangla script display help" page:
The following image shows you how a correctly enabled computer will render the Bangla script:
The following line of text shows how your computer renders the above line:
ক + ি → কি
The Unicode Bengali code chart (or here, in html form) will tell you that this involves 0995 "Bengali letter KA" plus 09BF "Bengali vowel sign I (stands to the left of the consonant)". Put them together, in the logical order "KA I", and they spell the syllable /ki/ -- but the "vowel sign I" is supposed to be rendered first, even though it's second in the character sequence. When I look at the results in Internet Explorer 6, the sequence is rendered as it should be, but after a modest amount of trying, I can't get it to work in Firefox. (As complex rendering issues go, this one is pretty simple -- but that doesn't mean that you can count on it to be handled correctly.)
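To make the stored order concrete, here is a minimal Python sketch that builds the syllable from the two code points named above and prints what is actually in the string. The names `KA`, `VOWEL_SIGN_I`, and `describe` are my own illustrative choices, not anything from the Unicode charts; the character names come from Python's standard `unicodedata` module.

```python
import unicodedata

# The two code points discussed above.
KA = "\u0995"            # BENGALI LETTER KA
VOWEL_SIGN_I = "\u09BF"  # BENGALI VOWEL SIGN I

# Logical (stored) order: the consonant comes first, the vowel sign second.
syllable = KA + VOWEL_SIGN_I


def describe(text):
    """Return (code point, Unicode name) pairs in stored order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code, name in describe(syllable):
    print(code, name)

# The vowel sign is a spacing combining mark (general category "Mc").
# Drawing it to the left of KA is the renderer's job; the string itself
# always keeps KA first.
print(unicodedata.category(VOWEL_SIGN_I))
```

Whatever the browser draws, the string holds KA followed by the vowel sign. A page that displays the pair "correctly" only when the vowel sign is typed first is storing the wrong sequence.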
In current Microsoft software, most of this stuff mostly works, thanks to people like Michael Kaplan, who has an always-interesting blog called "Sorting It All Out", about internationalization issues construed broadly. My experience has been that Microsoft, though slower than I would like, has been generally better about dealing with such things than any of the other players whose software I've had occasion to use. Since this blog has sometimes complained about MS software for one reason or another, it's only fair to offer kudos where it is deserved, and I hereby do so, as I've meant to do for several months. Praise is due not only to the individuals who have made the software work, but also to the corporate commitment that gives them the mandate to do it.
So why is this embarrassing? Because I've finally gotten around to posting this praise for Microsoft just as I'm about to head to Seattle for the "Microsoft Research Faculty Summit 2006". I hope you believe that it takes more to buy my good opinion than two days of listening to inspirational talks and eating hotel food. Well, there's also the "dinner cruise from Lake Washington to Puget Sound"; maybe that's what tipped the scales :-).
[Update -- this morning Patrick Hall wrote:
I was interested to see your post on wandering Bengali vowel signs in Firefox (I've learned that the vowel signs are called "matras," or at least they are in Hindi). I've wondered about that same problem. I've got a rather amorphous, ranty blog post on the topic here:
But hey, it has illustrations. And it wouldn't be a blog post if it weren't amorphous & ranty. :-)
It seems that this particular bug is a long-standing problem in Firefox (there's a link to a Bugzilla thread in my post). A guy named Simos, who I believe is involved in Gnome internationalization, left a comment on the post as well, explaining that the Pango font rendering system still isn't fully integrated into Firefox. It's worth noting that on Linux at least, Pango-based applications like the gedit text editor handle the matras correctly.
Definitely complex stuff, and I couldn't agree more that the Microsoft i18n folks are to be congratulated.
For those readers who may not be clued in, "i18n" stands for "internationalization", the "18" representing the 18 letters left out between "i" and "n". This afternoon, Patrick added:
I ran across a couple more links that may be relevant to your Bangla woes:
Bengali character picker at w3c.org
I was playing around with this and came to the rather disturbing realization that if you enter the characters in the incorrect order ("i + ka" instead of "ka + i" in your example), they render "correctly." This strikes me as all kinds of bad.
Known Problems of FireFox in Bangla
This one explains how (on Linux at least), the Pango font rendering can be compiled into Firefox (but isn't). There are screenshots that seem to show that this resolves the matra problems you describe.
Yes, in a post more than two years ago ("Them old diacritical blues again", 3/21/2004) I expressed frustration that in order to get Unicode combining underdots to work right in Mozilla, I had to put them in the string in the wrong order. IE got it right then (at least with a suitable font), Mozilla got it wrong, and nothing has changed since. Hey guys, we're supposed to be accelerating towards a cultural singularity, not sitting in an i18n fixed point...
And Kerim Friedman writes:
It is worth noting that on a Mac, Safari handles Devanagari scripts just fine, even though (alas) Firefox has yet to solve these problems. Safari is based on KHTML (used in Konqueror on Linux), unlike Firefox, which is based on the Gecko engine. Devanagari scripts are also handled fine in all built-in OS X applications.
I filed a bug with Firefox about this over a year ago, and it is unfortunate that they have yet to fix the problem.
Interestingly, the new online "Writely" word processor (recently bought by Google) seems to be able to work with these scripts in Firefox, even though Firefox has the problems you mention.
While I've been impressed by Writely in general, this particular feature doesn't work for me in Writely in Firefox on Windows XP (although it works in Writely in Firefox on a Mac with OS X), even if I use a font that renders Devanagari-derived scripts correctly in MS Word or in the excellent (and free software) Abiword word processor. This kind of inconsistency across operating systems and platforms is puzzling and annoying. Abiword, by the way, does the right thing in all the cases that I've checked. Since it's happy to read and write UTF-8 text, I've been using it for general multilingual editing. Why Firefox still can't consistently handle the display side correctly is not clear to me. ]
[Several people have written to me with messages like this one from Chung-chieh Shan:
Perhaps because my Debian "unstable" system has Pango 1.12.3 installed, and Debian enables Pango support in Firefox, the Bengali text here does work for me. (Screenshot attached.)
I gather from this and other emails that part of the problem here is due to ambiguity about which parts of the rendering problem should be solved in an individual application, versus in a (shared library associated with the) windowing system, versus in the underlying operating system itself. Quite a few people wrote to me to complain that I shouldn't blame this rendering problem on Firefox, since the problem is really that Windows and Mac OS X are not behaving as Firefox expects them to (except when Firefox is compiled in a certain way, maybe). If I believed this, though, I'd have to believe Microsoft's argument that integrating applications such as browsers into the OS is a good thing for users. Firefox is my standard browser (except when I'm looking at pages in scripts with non-trivial rendering), but the LONG time it's taken the folks at mozilla.org to get their act together on this point does seem to ratify the arguments Microsoft used against Netscape back in the day.]
Posted by Mark Liberman at July 16, 2006 05:02 AM