Henning Mangled
Geoff Pullum wonders
why he and his wife find the name "Henning Mankell" so
much more confusable than the name of Henning's most famous
creation, "Kurt Wallander." Could be "Hanning Menkell." Could be
"Henkel Manking." Could be almost anything. Or, to restrict it a little, anything with an "M", an "H", an "en", an "an", a "k", an "ing" and an "el" or "ell."
Presumably, the reason is connected to the state of Geoff's mind. And
that of his wife, philosopher Barbara Scholz. And presumably the states
of their minds are related to what they have experienced. And
presumably what they have experienced relates to what is in their
environment. And I'm not in their environment very much, although I was
in Geoff's environment last week, and I had a great time. Thanks for
the curry, Geoff. But given that I'm not in their environment very
much, I can only guess at what has been in it. And using the argument
of the drunk who looks for his keys under the lamp post, what I guess
is that the Google database provides a good impression of Geoff and
Barbara's environment. Of course, this could be wrong.
First, the distribution of non-English words in Google is unlikely to be
similar to that in Geoff and Barbara's environment. I'll conveniently
ignore this. Second, the Google corpus, as Mark has impressed upon me
indelibly, is wildly full of porn and gambling sites. But
from what we all know of Geoff, the internet may
underrepresent his (scholarly) interest in porn and gambling to the
same extent that it overrepresents Barbara's. So let's not worry about
that either.
Then again, the porn and gambling sites are chock-a-block with
artificially created text - should we worry about that? Well, what I'm
going to do now is compare the rates at which various possible Swedish
mystery writer names arise. I suppose the porn and gambling sites have
an equal tendency to use "Hanning" as "Henkel", possibly close to
zero, so that although they skew any absolute frequency estimate, they
probably won't affect a relative comparison too much. So no, let's not
worry about the artificial text.
Let's get on with it!
Mystery Name | Ghits
mennkell | 106
mennkel | 1
mankel | 9660
menkel | 7710
mankell | 206000
mannkell | 17
manning | 2670000
menning | 66800
hanning | 75100
henning | 2080000
henkell | 11000
hankell | 160
hankel | 70500
henkel | 661000
hennkel | 18
hennkell | 9
kurt | 7350000
wallander | 92200
First observation: f(Henning)*f(Mankell) is about 4.3 x 10^11 and
f(Kurt)*f(Wallander) about 6.8 x 10^11, well within a factor of two of
each other. So the confusability of "Henning Mankell" is likely not just
a raw frequency issue. The problem, quite obviously, is that the "Henning
Mankell" morpheme space is full of similarly plausible combinations. A
full analysis would presumably involve looking at phonetic distance
between alternatives, but I haven't the time for that. I'm not even
going to consider orthographic distances, as could be measured by
counting the number of changes to one word's spelling you would need to
turn it into another. No, I'll assume that we are given that the first
name ends in "ing" and the second in "el" or "ell", and satisfy myself
just by looking at the two possible first-names/surname combinations
which use up all the relevant morphemes the right number of times, and
which are most popular in terms of the raw frequencies of the
individual names, i.e. "Henning Mankell", and "Manning Henkel."
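For readers who do want the orthographic measure, here is a minimal sketch, assuming it means Levenshtein edit distance (the count of single-letter insertions, deletions and substitutions needed to turn one spelling into another). The function name `edit_distance` is made up for illustration, and no such calculation went into this post.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two spellings.

    Counts the fewest single-letter insertions, deletions and
    substitutions needed to turn `a` into `b`.
    """
    # prev[j] holds the distance between the first i-1 letters of `a`
    # and the first j letters of `b`; row 0 is all insertions.
    prev = list(range(len(b) + 1))
    for i, letter_a in enumerate(a, start=1):
        curr = [i]  # turning i letters of `a` into nothing takes i deletions
        for j, letter_b in enumerate(b, start=1):
            substitution_cost = 0 if letter_a == letter_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete letter_a
                curr[j - 1] + 1,                  # insert letter_b
                prev[j - 1] + substitution_cost,  # substitute or keep
            ))
        prev = curr
    return prev[-1]


print(edit_distance("henning", "hanning"))  # 1
print(edit_distance("henning", "manning"))  # 2
print(edit_distance("mankell", "henkel"))   # 3
```

On this measure "Manning Henkel" sits five edits away from "Henning Mankell" in total, which is at least suggestive of a crowded neighbourhood.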
Doing the math, it turns out that, based naively on raw frequency of
the individual words, "Manning Henkel" is over 4 times as likely as
"Henning Mankell"! The fact that "manning" is a reasonably common
gerund has little to do with this, since "Henning" competes admirably
in frequency terms: the real problem, if Google is to be believed, is the
far higher frequency of "Henkel" than "Mankell". This is in spite of
the fact that half of Google's "Mankell" pages are "Henning Mankell"
pages, so that in a survey that threw out actual mentions of the
author, the odds would be stacked even more strongly against him. And
in the wild feedback loops of the Pullum/Scholz household, it would
take only one or two mentions of the wrong name for their linguistic
environment to become even more polluted. No wonder Geoff and Barbara
find it confusing.
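The arithmetic behind that "over 4 times" can be sketched in a few lines of Python, using the Ghits from the table above. The names `ghits` and `naive_score` are invented here, and multiplying the two counts treats first name and surname as independent, which is exactly the naive assumption at work.

```python
# Ghits copied from the table above (only the four names needed here).
ghits = {
    "henning": 2_080_000,
    "manning": 2_670_000,
    "mankell": 206_000,
    "henkel": 661_000,
}


def naive_score(first_name: str, surname: str) -> int:
    """Product of the two raw counts: a crude stand-in for how
    'expected' the full name is if its parts were independent."""
    return ghits[first_name] * ghits[surname]


real = naive_score("henning", "mankell")
rival = naive_score("manning", "henkel")

# How many times more "likely" the wrong name is than the right one.
ratio = rival / real
print(f"Manning Henkel / Henning Mankell = {ratio:.2f}")

# Break the ratio into its two factors to see which name does the damage.
first_name_factor = ghits["manning"] / ghits["henning"]
surname_factor = ghits["henkel"] / ghits["mankell"]
print(f"first names: {first_name_factor:.2f}, surnames: {surname_factor:.2f}")
```

The first-name factor comes out at roughly 1.3 and the surname factor at roughly 3.2, which is the sense in which "Henkel" rather than "manning" carries most of the blame.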
Suspiciously, I found only one instance of someone on the internet
actually misnaming "Henning Mankell" as "Manning Henkel." The culprit
appears to be Finnish - one Esa Tuomas Tikka. Having, as I do, a talent for making strong categorical claims
on the basis of weak statistical data, and being prepared to overlook the fact that Geoff mentioned "Henkell" but not "Henkel" in his post, I therefore propose that Geoff
is also Finnish. And if you know anything about Finnish orthography,
you'll know what that means. It means that "Geoff" cannot be his real
name. Too many "g"s, you see. Who is this blogger, linguist and distinguished university professor who conveniently claims heritage from elsewhere (English, of all
things - does he think it's classy?), yet is married to someone with a
passion for Swedish murder mysteries and has an unusually deep
knowledge of Eskimo snow vocabulary? "Geoff Pullum"? Pull the other
one, I should say. The game's up - reveal your true identity!