May 18, 2007

Accidental dropped keyboard command issuance probability

My wireless Macintosh keyboard, which was talking to my laptop, fell off my desk at home, and as it went down and bumped against the drawer knobs and I grabbed at it, several keys were accidentally pressed (and a Shift key actually came off). In the active window on the laptop at the time was a live SSH connection to the Linux machine on my desk up at the UCSC campus, so everything that was accidentally typed was interpreted as a sentence in the language of the tcsh Unix shell on my Linux box two miles away. And as fate would have it, the keys that were hit as the keyboard clattered to the floor spelled out a fully grammatical sentence of that language, meaning that an actual executable command was issued to the operating system of my desktop machine, and was executed. What are the chances of that?

Well, it would take some tedious but elementary work in combinatorics and Unix command file listing to figure it out exactly. Unix/Linux commands are spelled mostly in lower case letters with occasional upper-case letters and digits (I'll ignore one or two other legal characters like the underbar and the @-sign), so first we need the probability of all the keystrokes being contained among the 62 characters {0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z}. That is already low, but I won't bother to work it out; it would be different for different keyboard design features like function keys and numerical keypads and so on.

After that, given a combination of n characters in the right range for some positive integer n, the probability of their spelling a command name would be the number of available n-letter commands divided by 62^n. For example, there happen to be 55 accessible commands (that is, commands in the directories on my path) that are spelled with 2 letters, so if just two random characters from the correct range were typed, the probability of their forming a legal command would be 55/(62^2), roughly 0.0143, which is considerably less than a 2% chance.
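The arithmetic in that example can be sketched in a few lines of Python. The function name command_probability is just an illustrative label, the figure of 55 two-letter commands is the one quoted above for one particular machine, and the sketch assumes that each of the 62 characters is equally likely at every keystroke.

```python
ALPHABET_SIZE = 62  # 10 digits + 26 upper-case + 26 lower-case letters


def command_probability(command_count, length, alphabet_size=ALPHABET_SIZE):
    """Chance that `length` uniformly random characters spell one of
    `command_count` distinct command names of exactly that length."""
    return command_count / alphabet_size ** length


if __name__ == "__main__":
    # 55 two-letter commands out of 62 * 62 = 3844 possible two-character strings
    print(f"{command_probability(55, 2):.4f}")
```

The uniformity assumption is generous to the keyboard: a falling keyboard is unlikely to hit every key with equal probability, so this is only a first approximation.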

The chances of accidentally hitting commands with longer names get lower and lower, of course, because there are fewer and fewer command names as length goes up (Unix loves short, cryptic command names), and at the same time you keep getting a larger and larger divisor (62^n for names of length n).
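For anyone who wants to do the command file listing on their own machine, here is a rough sketch in Python. It walks the directories on PATH, keeps the executable files whose names use only the 62 characters considered here, and tabulates the count and the resulting probability for each name length. The helper names command_names and length_table are hypothetical, and the sketch ignores shell built-ins and aliases, which do not live in PATH directories.

```python
import os
import string
from collections import Counter

ALPHABET = set(string.ascii_letters + string.digits)  # the 62 characters


def command_names(path_dirs):
    """Collect the names of executable regular files in the given directories."""
    names = set()
    for directory in path_dirs:
        try:
            entries = os.listdir(directory)
        except OSError:
            continue  # skip missing or unreadable directories
        for name in entries:
            full = os.path.join(directory, name)
            if os.path.isfile(full) and os.access(full, os.X_OK):
                names.add(name)
    return names


def length_table(names):
    """Map each name length n to (count, count / 62**n), using only the
    names spelled entirely with the 62 letters and digits."""
    usable = [name for name in names if all(ch in ALPHABET for ch in name)]
    counts = Counter(len(name) for name in usable)
    return {n: (c, c / len(ALPHABET) ** n) for n, c in sorted(counts.items())}


if __name__ == "__main__":
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    for n, (count, prob) in length_table(command_names(dirs)).items():
        print(f"{n:2d} characters: {count:5d} commands, probability {prob:.3e}")
```

The counts will differ from machine to machine, so the figure of 55 two-letter commands should not be expected to reappear.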

But then we have to take account of the fact that some commands require additional words. The command ls means something on its own ("list the contents of the current working directory"), but rm ("remove") requires at least one filename (which can be any arbitrary string of letters and/or digits and/or certain other printable characters), and cp ("copy") requires at least two filenames. If you get a name of a command that needs one extra word on the command line, you have to get a space after the command name and then a sequence of characters; and if you happen to get a name of a command that needs two extra words on the command line, you have to get a space after the command name and then two sequences of characters separated by a space.
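The extra-words requirement can be pictured with a toy checker in Python. The table MIN_ARGS and the function is_well_formed are invented for illustration and cover only the three commands just mentioned; a real shell does not consult any such table, and quoting, globbing, and options are ignored entirely.

```python
# Minimum number of extra words each command needs. This is a toy table for
# the three commands discussed in the text, not something a shell provides.
MIN_ARGS = {"ls": 0, "rm": 1, "cp": 2}


def is_well_formed(line, min_args=MIN_ARGS):
    """True if `line` starts with a known command name and is followed by at
    least as many space-separated words as that command requires."""
    words = line.split()
    if not words or words[0] not in min_args:
        return False
    return len(words) - 1 >= min_args[words[0]]


if __name__ == "__main__":
    for line in ["ls", "rm", "rm notes", "cp notes", "cp notes backup"]:
        print(f"{line!r}: {is_well_formed(line)}")
```

Under this model a random string has to clear two hurdles: its first word must be a command name, and enough further words must follow it.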

You do the math; it can be your Breakfast Experiment™. As it happened, what my keyboard actually told my Linux system to do this morning was this:

bg OP+

(plus a Return on the end, which caused the actual execution attempt). I was lucky. This is not a very dangerous command. It means "Take the stopped job whose identifying number is OP+ and restart it running in the background." And it turned out to be semantically incoherent: OP+ is not a valid job number, so the result was just an error message saying that there was no such job.

It could have been an issue, though. The following string is of exactly the same length (seven keystrokes including the space and the final invisible Return):

rm ~/*

But that one means "Remove all plain files in the current user's home directory." And a Unix system will do just that if you (or your dropped keyboard) should happen to tell it to. It won't ask "Are you sure?" or "Delete all files?"; it will just swiftly and silently destroy the record of their former existence. And there is a finite, though very small, probability that it might happen simply by accident. There but for the grace of God and the low probability of random strings turning out to be grammatical in most kinds of language...

You may recall that I remarked on Language Log in another context that in English nearly all strings of words are ungrammatical, in the sense that the probability of a random string of English words being grammatical in Standard English heads down toward zero in the limit as string length goes up toward infinity. That's only a conjecture, but I think it's true. (By the way, it doesn't have to be true: Chris Barker devised, just for fun, a programming language called jot in which all programs are expressible and in which every string is grammatical; see this fascinating page. But in fact jot only uses the characters 0 and 1, so there is hardly any chance of a jot program being typed and run by accident even if he drops his keyboard while running a jot interpreter in the active window. Given about a 3% chance of the first keystroke being a 0 or a 1, the probability that both of the first two will be binary digits is only 9/10,000 = 0.0009, and for the first three the probability is only 0.000027, and we head downward toward zero land pretty rapidly.)
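The decay in that parenthesis is just the 3% figure raised to the number of keystrokes, which a short Python sketch makes explicit. The function name all_binary_probability is made up for this illustration, and the sketch assumes that successive keystrokes are independent, each with the same rough 3% chance of being a 0 or a 1.

```python
def all_binary_probability(keystrokes, p_binary=0.03):
    """Chance that every one of `keystrokes` independent keystrokes is a
    0 or a 1, given a per-keystroke chance of `p_binary`."""
    return p_binary ** keystrokes


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} keystrokes: {all_binary_probability(n):.2e}")
```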

Posted by Geoffrey K. Pullum at May 18, 2007 04:56 PM