August 30, 2005

An Encoding Puzzle

Recently I looked something up in the French GNU/Linux manual pages and couldn't get them to display correctly. Most of the text came out fine, but accented letters, and more generally anything outside the ASCII range, came out garbled. At first I thought that the browser might be displaying the page using the wrong encoding, but changing encodings didn't solve the problem. The Spanish manual pages exhibit the same problem.

[Note: when I checked these URLs just now, I got a server error. If it is still acting up, here's a link to the Google cache of the French page.]

Short of writing a little script to transform these pages before letting the browser at them, I couldn't get them to display correctly. After a few minutes, though, I figured out what had happened to them, and the explanation is perfectly straightforward. For now, I'm going to leave the solution as an exercise for the ling-technically inclined reader. I'll post it tomorrow.

Posted by Bill Poser at August 30, 2005 12:30 AM