October 29, 2003

What language are we in, mon ami?

A friend of mine recently learned that the Perl programming language permits system calls to run other programs, which means a Perl program can call up a simple Unix shell script, which means you could write a Perl program that does nothing but call up a shell script and run it. So he jokingly suggested to me that he might put all his shell scripts in Perl wrappers in this way, and thus become a bona fide Perl programmer instantly, with no learning curve (learn Perl in one minute flat! no studying! amaze your friends! enhance your job prospects!).

This led me to think that the notion of being a program in one particular language isn't really very well defined. Let me explain...

A shell script can read in arbitrary material in another language as a kind of quotation; consider this sequence of commands:


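    #!/bin/csh -f
    # Quote a C program into foo.c; the quoted 'QQ' stops csh from substituting inside the quotation.
    cat >! foo.c <<'QQ'
    #include <stdio.h>
    int main(void){printf("Hello world\n"); return 0;}
    QQ
    # Compile the quoted program, run it, then remove the evidence.
    cc foo.c
    ./a.out
    /bin/rm foo.c a.out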

This mumbo-jumbo has the top-level form of a C-shell script. It causes "Hello world" to appear on your screen. But the way it does so is by quotation: it quotes the code of a program in another language (the C language), gets the cat program to put the quoted material into a file, compiles the file with the C compiler, and then executes the compiler output and covers its tracks by removing the files. Is this a C-shell script? Is it a C program? If wrapped in a Perl shell would it be Perl? One could stipulate answers to these questions (it's a C-shell script because the first word is #!/bin/csh), but that seems to me to miss the point.
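The wrapper joke works in most scripting languages, not just Perl. Here is a minimal sketch in Python, assuming a POSIX sh is on the PATH; the names QUOTED_SCRIPT and run_quoted are made up for illustration. The host program never parses the quoted text. It simply hands the string to another interpreter.

```python
import subprocess

# Quoted material in another language (Bourne shell).
# Python treats it as an opaque string and never parses it.
QUOTED_SCRIPT = 'echo "Hello world"'


def run_quoted(script: str = QUOTED_SCRIPT) -> str:
    """Hand the quoted text to sh and return whatever it prints."""
    result = subprocess.run(
        ["sh", "-c", script],  # assumes a POSIX sh is available
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(run_quoted(), end="")
```

Whether that file counts as a Python program or a shell script is exactly the question at issue.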

If this holds for computer programming languages, it surely holds much more for natural languages. What language am I using if I say the following?

What I say is sauve qui peut, mon vieux!

(There are characters who talk this way, annoyingly, in some novels, as I recall. Hercule Poirot does, I think.)

There are two ways to go. One is to include the French bits and their structure and meaning in the structure of the English sentence, and thus for consistency include all French sentences in English, with all their structure, and thus completely blur the line between English and French, and between English and any other language (they would all have to be included). The other is to say that the example is English but it has exactly the same grammar and meaning as "What I say is aaaaahhhh!" This would amount to saying that in certain contexts random noises you make with your mouth are permitted to appear in English sentences as if they were words, with no length limits or structural constraints.

Either way, the set of strings of noises that get classified as English seems intuitively way too big. There is too much included that doesn't relate at all to what the rules say about the grammatical structure or pronunciation of sentences of English. And since any string of nonsense could pop up, most conclusions about what can appear in an English string are blown away (a point that has been made by Alexis Manaster-Ramer in the context of pointing out that one cannot really support Chomsky's claim from 1956 that English is not accepted by a finite state machine, because most or all of the drivel one needs to define as ungrammatical to argue for this result turns up in strings in the form of quotations and names).
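To make the second option concrete, here is a toy sketch in Python. The vocabulary, the frame string, and both function names are invented for illustration, and nothing here is a serious model of English. It only shows how a single quotation rule lets arbitrary strings into the set.

```python
# Toy vocabulary standing in for "English"; purely illustrative.
VOCAB = {"what", "i", "say", "is", "run", "for", "it"}

# A frame after which, on the second option, anything at all may follow.
FRAME = "what i say is "


def strict_english(sentence: str) -> bool:
    """Accept only strings built entirely from the toy vocabulary."""
    words = sentence.lower().split()
    return bool(words) and all(w in VOCAB for w in words)


def english_with_quotation(sentence: str) -> bool:
    """Also accept the quoting frame followed by any material whatsoever."""
    s = sentence.lower()
    if s.startswith(FRAME) and len(s) > len(FRAME):
        return True  # whatever follows the frame is treated as opaque noise
    return strict_english(sentence)


if __name__ == "__main__":
    for example in ("What I say is run for it",
                    "What I say is sauve qui peut, mon vieux!",
                    "What I say is aaaaahhhh!"):
        print(example, strict_english(example), english_with_quotation(example))
```

Once the frame is allowed, the checker can no longer rule out any string of symbols after it, so the vocabulary and rules tell you almost nothing about what a member of the set can contain.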

These ruminations are not quite the pointless speculation that they might appear to be. I think they carry the message that whatever we think a natural language is, we should not think of it as simply a collection of sentences. The computer science idea of defining a 'language' as a set of symbol strings does have mathematical advantages, and it meshes beautifully with work on generative grammars (which is where it came from). But as my initial examples suggest, it has counterintuitive aspects even in computer science. And it's certainly not the way to conceive of natural languages. But for the same reasons, natural languages also shouldn't be thought of as mentally inscribed generative grammars (Chomsky's "I-languages").

Naturally, I wouldn't be saying all this if I didn't have a better story about how to think about natural languages. I have a most wonderful story about that topic... but unfortunately this blog entry is too small to contain it.

Posted by Geoffrey K. Pullum at October 29, 2003 07:57 PM