August 15, 2005

Missing decades in Dutch, French and German

In response to my post "What happened to the 1940s?", Michel Vuijlsteke sent in the results of some experiments of his own on the relative frequency of 20th-century decade names in Dutch, French and German. His conclusions (which he qualifies as "preliminary and unscientific"):

- Dutch, French or German: no one writes of the 1940s much on the web, as you pointed out.
- Dutch speakers don't care much for the 1990s
- German speakers on the other hand *love* the 1990s (fall of Berlin Wall / reunification perhaps?)
- There's something fishy going on with Google's French pages: it contradicts all of the other trends in all of the other languages

Ah, Google and the French: wheels within wheels.

Anyhow, here are Michel's graphs for Dutch:


and German:

and his counts for Dutch:

| | de jaren 10 | de jaren 20 | de jaren 30 | de jaren 40 | de jaren 50 | de jaren 60 | de jaren 70 | de jaren 80 | de jaren 90 |
|---|---|---|---|---|---|---|---|---|---|
| Google | 895 | 31900 | 56100 | 29200 | 122000 | 192000 | 228000 | 295000 | 162000 |
| Yahoo | 6430 | 103000 | 175000 | 81300 | 374000 | 538000 | 651000 | 750000 | 513000 |
| MSN | 2842 | 26255 | 41826 | 19876 | 74246 | 104467 | 136817 | 144827 | 119179 |

for French:

| | les années 10 | les années 20 | les années 30 | les années 40 | les années 50 | les années 60 | les années 70 | les années 80 | les années 90 |
|---|---|---|---|---|---|---|---|---|---|
| Google | 6330 | 143000 | 269000 | 124000 | 712000 | 948000 | 654000 | 730000 | 822000 |
| Yahoo | 19100 | 395000 | 684000 | 316000 | 1E+06 | 2E+06 | 2E+06 | 3E+06 | 2E+06 |
| MSN | 12445 | 76693 | 141856 | 61837 | 219592 | 326127 | 397380 | 516601 | 367362 |

and for German:

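| | 1910er | 1920er | 1930er | 1940er | 1950er | 1960er | 1970er | 1980er | 1990er |
|---|---|---|---|---|---|---|---|---|---|
| Google | 77900 | 219000 | 263000 | 139000 | 341000 | 454000 | 496000 | 510000 | 692000 |
| Yahoo | 92300 | 443000 | 437000 | 243000 | 701000 | 891000 | 980000 | 1E+06 | 1E+06 |
| MSN | 6075 | 51201 | 54000 | 19345 | 77429 | 101193 | 107841 | 112712 | 142733 |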

[Update: Trevor at Kalebeul emailed:

My lunch companion suspects that the 1940s score so low because people often refer to them as "the war years." The yuppie years were neither so dramatic nor so focussed, so "the 1980s" tends to be preferred.

Yes, this was also my hypothesis.

In addition, I believe that "the war years" (essentially the first half of the decade) are felt to be very different from "the postwar years" (the second half of the decade and onward), so that there is a smaller tendency to refer to the decade as a whole under any name.

When people talk about "the 1960s" they really mean "1965-1969" or so, in most cases, but the 1960-1965 period has no real identity of its own, so "the 1960s" is used.

Trevor offered another suggestion as well:

Another more creative excuse for under-represented 40s would be that, because of paper shortages, they were not very good at documenting themselves, thus depriving us of primary sources. My grandad wrote many of his letters from the front on toilet paper, and the censors fortunately understood the concept of added value.

Though I'm no archivist, my guess is that the WWII years are nevertheless pretty well documented, and they were certainly full of events for people to write about in retrospect.]

[Update #2: Andrew Gray emailed with the plausible suggestion that the smaller number of references to the teens is essentially morphological in character: in English, we don't really ever say "the tens" (though we will say "the nineteen-tens"), because it seems verbally clumsy, and it's quite possible that this will spill over into writing about the decade. If you never *say* a phrase, you're less likely to use it in writing, in my experience. As an aside, try saying it to yourself - does it evoke any mental images? It draws a blank for me - "the 1910s" is a phrase with very few connotations, and so is probably less likely to be used idiomatically.

I went and did a little searching. I've taken Google numbers, as you did, for every decade back to 1510, so we can compare five centuries, and in all cases the numbers drop off sharply in the --10s decade. I missed out the first decade of each century, "1700s" and the like, since these usually refer to the century as well as the decade and so are pretty skewed.

The numbers on the Y-axis are "percentage of hits for that century which were this decade"; I normalised it to this to allow comparison across centuries, as the absolute numbers steadily shrank over time.]
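For readers who want to try the same normalisation, here is a minimal sketch in Python. Andrew's own per-decade counts are not reproduced in this post, so as a stand-in it uses the German Google counts from the table above; the function name `percent_of_century` is my own invention, and this is only one plausible reading of "percentage of hits for that century which were this decade".

```python
# German Google hit counts, copied from the table earlier in this post.
# The first decade of the century is left out, as in Andrew's procedure.
german_google = {
    "1910er": 77900,
    "1920er": 219000,
    "1930er": 263000,
    "1940er": 139000,
    "1950er": 341000,
    "1960er": 454000,
    "1970er": 496000,
    "1980er": 510000,
    "1990er": 692000,
}


def percent_of_century(counts):
    """Return each decade's share of the century's total hits, in percent.

    Hypothetical helper: assumes `counts` maps decade labels to raw hit
    counts for a single century.
    """
    total = sum(counts.values())
    return {decade: 100.0 * hits / total for decade, hits in counts.items()}


shares = percent_of_century(german_google)
for decade, share in shares.items():
    print(f"{decade}: {share:.1f}%")
```

Because each century is divided by its own total, the shares within a century always sum to 100, which is what makes a sparsely attested century comparable with a heavily attested one.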


Posted by Mark Liberman at August 15, 2005 04:42 PM