June 04, 2006

ASCII, Mac OS X, and the 128 names of DormAid

A former Harvard dean I met recently at a party told the story of how, a couple of years ago, Harvard agreed to let a private company (started by a Harvard student) under the name DorMaid provide chambermaids to come in and clean and tidy dorm rooms for those students well-heeled enough to afford the fees. There was some discussion in March 2005 of whether this was politically correct, given the class implications (see the critical editorial in the Harvard Crimson, and the contemptuous dissenting view on this blog by a Harvard alum). The terms Harvard negotiated before the company was permitted to operate, I was told, included a change to the company's name: from DorMaid to DormAid. "Maid" carried connotations of low-status unmarried females doing menial work for wealthy sons of the elite. "Aid" did not. (The Harvard Crimson didn't get the new name right: their editorial spelled it Dormaid, which does not match the company web site and, as I understand it, was never the correct name.) As I listened to this story I realized it reminded me of something rather ghastly about the Mac OS X operating system. Let me explain it to you. And if you use Mac OS X, you should listen, because not knowing this could be hazardous to your health.

As you can easily verify, in OS X the distinction between the name Harvard didn't like and the name that it did is, to say the least, subtle. Try a few experiments (with great caution). Start the Terminal program, and create a file called DorMaid. Don't put anything valuable in it. (The command touch DorMaid will create an empty file for you to experiment with.) Now try removing the nonexistent file DormAid (you don't have one, but imagine that you thought you did and you wanted to get rid of it). (The command to do this is rm DormAid.) If you know Unix, you should expect to see an error message referring to a nonexistent file. You'd expect the screen to look like this:

% rm DormAid
rm: DormAid: No such file or directory

But you won't see that error message. You'll see nothing but the command you typed and the new prompt. That's the expected behavior when a Unix command worked perfectly. It's what you'd expect if rm really had found a file of that name to remove.

Now use the ls -l DorMaid command to list the details of the file you created, DorMaid, which you did not attempt to remove. You'll find it isn't there:

% ls -l DorMaid
ls: DorMaid: No such file or directory

The file you created has gone AWOL. Where did it go?

Here's what's going on. OS X appears to be working with no distinction between A-Z and a-z, but pretending to recognize the distinctions. It registers your file names with the capitalization you gave them, but then treats them without regard to the case distinction.
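If you would rather not experiment with rm, the same question can be asked more safely. What follows is only a minimal sketch in Python, not anything Apple supplies: the function name is_case_insensitive is made up for illustration. It creates a file called DorMaid in a scratch directory and then asks whether DormAid seems to exist too. Note that by default the scratch directory lives wherever your system keeps temporary files, which may be a different file system from the one holding your own files; pass a directory of your own to probe that one instead.

```python
import os
import tempfile


def is_case_insensitive(directory=None):
    """Probe whether the file system holding `directory` ignores case.

    Hypothetical helper for illustration. With directory=None the probe
    runs in the system's default temporary location.
    """
    # The scratch directory and everything in it is deleted on exit.
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        created = os.path.join(scratch, "DorMaid")
        open(created, "w").close()  # same effect as: touch DorMaid
        # Ask for the *other* capitalization, which was never created.
        return os.path.exists(os.path.join(scratch, "DormAid"))


if __name__ == "__main__":
    print(is_case_insensitive())
```

A result of True means the two spellings name the same file there, and the rm surprise described above can happen; False means the file system keeps them apart.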

So if you create a file called DorMaid, OS X will carefully repeat it back for you as if it were really called that (the ls program, for example, will show it as DorMaid). But it isn't really called that, and it is very dangerous to think otherwise. The truth is that all of the following are the same filename under OS X:

DORMAID DORMAId DORMAiD DORMAid DORMaID DORMaId DORMaiD DORMaid DORmAID DORmAId DORmAiD DORmAid DORmaID DORmaId DORmaiD DORmaid DOrMAID DOrMAId DOrMAiD DOrMAid DOrMaID DOrMaId DOrMaiD DOrMaid DOrmAID DOrmAId DOrmAiD DOrmAid DOrmaID DOrmaId DOrmaiD DOrmaid DoRMAID DoRMAId DoRMAiD DoRMAid DoRMaID DoRMaId DoRMaiD DoRMaid DoRmAID DoRmAId DoRmAiD DoRmAid DoRmaID DoRmaId DoRmaiD DoRmaid DorMAID DorMAId DorMAiD DorMAid DorMaID DorMaId DorMaiD DorMaid DormAID DormAId DormAiD DormAid DormaID DormaId DormaiD Dormaid dORMAID dORMAId dORMAiD dORMAid dORMaID dORMaId dORMaiD dORMaid dORmAID dORmAId dORmAiD dORmAid dORmaID dORmaId dORmaiD dORmaid dOrMAID dOrMAId dOrMAiD dOrMAid dOrMaID dOrMaId dOrMaiD dOrMaid dOrmAID dOrmAId dOrmAiD dOrmAid dOrmaID dOrmaId dOrmaiD dOrmaid doRMAID doRMAId doRMAiD doRMAid doRMaID doRMaId doRMaiD doRMaid doRmAID doRmAId doRmAiD doRmAid doRmaID doRmaId doRmaiD doRmaid dorMAID dorMAId dorMAiD dorMAid dorMaID dorMaId dorMaiD dorMaid dormAID dormAId dormAiD dormAid dormaID dormaId dormaiD dormaid
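The count of 128 is simple arithmetic: the name has seven letters, each of which can be upper or lower case, and 2^7 = 128. As a check, here is a short Python sketch that generates the list; the function name case_variants is my own invention for illustration.

```python
from itertools import product


def case_variants(name):
    """Return every upper/lower-case spelling of `name`.

    Hypothetical helper for illustration. Assumes every character in
    `name` is a letter with distinct upper- and lower-case forms.
    """
    # For each letter, offer both cases; sorting puts upper case first.
    choices = [sorted({ch.upper(), ch.lower()}) for ch in name]
    return ["".join(combo) for combo in product(*choices)]


variants = case_variants("DorMaid")
print(len(variants))  # 128, i.e. 2 ** 7
print(" ".join(variants))
```

Because the upper-case form is offered first at each position, the output runs from DORMAID to dormaid in the same order as the list above.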

This is a whole slew of accidents waiting to happen. It is not just a silly triviality. I nearly lost some important source files because of this appalling misfeature. (I am now having some difficulty reconstructing the exact scenario; what I posted here earlier was not correct. It may have been that I typed ‘cp ch1.tex Ch1.tex’ (trying to create a new file with a capital initial) and failed to notice that there was an error message (copying a file to itself is not a legal operation), and then later decided to remove ch1.tex on the grounds that I didn't need it, and thus almost lost the original completely. All I recall is that I had two near disasters several weeks ago in which I was only saved by lucky accidents of backup copies with different names, and after the second I realized what OS X was doing to me.)

The mv command, which alters the names or locations of files, allows name changes like mv DorMaid DormAid without complaint. The ls command then shows the name change as having taken place. But the rm command then ignores the change. This is very dangerous behavior. I had wondered why Apple was naming successive releases of OS X after dangerous beasts (Jaguar, Panther, Tiger, and so on). But in fact it's rather appropriate. Be warned. It's a jungle out there.

[Update: David Pesetsky writes from MIT with some helpful technical info that modifies what I say above — he claims it's the file system, not the OS itself. I quote:

Actually, from 10.3 on, you can specify that you want your hard drive to have "case-sensitive HFS+" when you set it up or if you repartition it. (There is such an option in "Disk Utility".) It's apparently not a property of OS X per se, but of the HFS+ filesystem that is the default for OS X installations.

I think it all has to do with compatibility with pre-OS X programs for the Mac, since the previous operating system was case-insensitive for real. If you Google for "case-sensitive HFS" or "Case-sensitivity Macintosh" and the like, you'll also see various warnings about problems with some OS X programs if you have turned "case sensitivity" on. But the option of case-sensitivity exists, at least at the time your disk is first formatted -- and apparently OS X can deal with it.

I followed David's advice and looked at this site and this one. On the latter I saw a comment asking why anyone would want case-sensitivity. I think I've answered that!

Don Porges tells me you get the same behavior doing DOS commands on a Windows system. I'm not really surprised. This is behavior that, unusually for Mac OS X, really sucks. I would have expected Windows to faithfully replicate every feature of other operating systems that really sucks, and then add stupid features of its own, plus bugs. And apparently that's right. Perhaps I should add this note before I close: Mac OS X is beautiful. I use it all the time. It's the best corporate operating system I've ever encountered, and I do all my serious work on it. (I keep a Windows system purely to be able to work on files with coauthors who like WordPerfect. No other reason. WordPerfect is the best of the WYSIWYG editors, but isn't available in a Mac version.) Don't read me as saying Mac OS X is no good. It's wonderful. The only better systems are carefully selected and well-configured Linux systems (Debian Linux is my pick). And they are occasionally a little heavy on the system maintenance time that you have to put in. No, I'm grateful for what David Pesetsky has told me, because I love Mac OS X, and only hate the failure to be case-sensitive.]

Posted by Geoffrey K. Pullum at June 4, 2006 07:10 AM