December 22, 2005

Linguistics, politics, mathematics

In the end, this post is about the recently published study of media bias by Tim Groseclose and Jeff Milyo ("A Measure of Media Bias", The Quarterly Journal of Economics, Volume 120, Number 4, November 2005, pp. 1191-1237). Earlier versions of that study have been widely reported and discussed over the past year and a half, including here on Language Log, with a critique by Geoff Nunberg, a response by Groseclose and Milyo, and a few other comments (here, here, and here). I started to play around with their model on the computer, and at the first step, something about the structure of the model took me aback. But before I get to the point, let me set the stage.

Last week, Penn's president had a holiday party for the faculty, in a big tent behind her house. In the midst of the throng I was talking with Elihu Katz and some other people from the Annenberg School for Communication, when another colleague, on being introduced, asked how we happened to be acquainted. Interpreting this as a question about academic disciplines rather than personal histories -- what could a sociologist have in common with a linguist? -- someone said something about a shared interest in communication. Elihu, who knows a thing or two about social networks, waved his hand at the crowd and said "well, I bet that 60% of the people here work on something connected to communication".

In fact, there's a cultural gulf between people who study large-scale communication -- media, politics, advertising -- and people who study small-scale communication -- individual speakers and hearers. This is one of the many boundary lines in the intellectual Balkans of research on language, meaning and communicative interaction, but I've felt for a long time that it's one of the borders where freer trade is most needed.

Some of us at Penn have recently gotten an NSF IGERT ("Interdisciplinary Graduate Education and Research Training") grant on the theme of "Language and Communication Sciences". Starting in January, I'm co-teaching a "Mathematical Foundations" course for this program, aimed at introducing graduate students to a wide range of mathematical topics that are relevant to animal, human or machine communication. So I thought I'd take a small step in the direction of intellectual free trade by importing a problem or two from relevant areas of economics, political science or sociology.

One possibility is the model that Groseclose and Milyo have used to study media bias. This model is mathematically simple, and a version of it, I thought, could be applied to data from weblog links harvested (semi-)automatically. So I started playing around with an implementation of the model in R, initially using made-up data created by plugging appropriate sorts of random variables into the relevant places in the equations. And indeed this looks promising as a pedagogical exercise. But as I did this, I realized that the model starts with a very peculiar assumption about the relationship between political opinion and the choice of authorities to cite.

Let's pick up their "simple structural model" on p. 1208:

Define xi as the average adjusted ADA score of the ith member of Congress. Given that the member cites a think tank, we assume that the utility that he or she receives from citing the jth think tank is

(1)    aj + bjxi + eij

The parameter bj indicates the ideology of the think tank. Note that if xi is large (i.e., the legislator is liberal), then the legislator receives more utility from citing the think tank if bj is large. The parameter aj represents a sort of “valence” factor (as political scientists use the term) for the think tank.

They then go on to specify an equation for the probability that the ith congresscritter will cite the jth think tank, given some assumptions about the distribution of the error term eij. And then they add a similar equation for media citations of think tanks, and use the model to work backwards from media citation counts to estimates of media ADA scores. But never mind that for now. I'm interested in a weird assumption built into equation (1).

Let's take it apart term by term. As they explain, xi is the "average adjusted ADA score of the ith member of Congress". Americans for Democratic Action (ADA) "scores" are denominated in percent, representing the percent of the time that a given legislator voted the way the ADA thinks (s)he should have. These scores thus run from 0 to 100, with 0 being the most "conservative" possible voting record and 100 being the most "liberal" possible record, given that we let the ADA define the "liberal" position. (G&M's "adjusted score" is as "constructed by Groseclose, Levitt, and Snyder [1999]" in order to make the scores "comparable across time and chambers". The details of the adjustment don't matter here, and the results are still numbers between 0 and 100.)

As for eij, it's just an error term of the kind that you have in any statistical model.

That leaves aj and bj, both of which are parameters associated with the jth think tank. As they explain, bj "indicates the ideology of the think tank", because "if xi is large (i.e., the legislator is liberal), then the legislator receives more utility from citing the think tank if bj is large".

OK, that seems reasonable. What about aj? They tell us that it "represents a sort of “valence” factor (as political scientists use the term) for the think tank". I didn't know how political scientists use the term valence, so I checked, and found an (implicit) definition in T. Groseclose, "A model of candidate location when one candidate has a valence advantage", American Journal of Political Science, 45 (4): 862-886 Oct. 2001:

This article extends the Calvert-Wittman, candidate-location model by allowing one candidate to have a valence advantage over the other, due to, say, superior character, charisma, name recognition, or intelligence. Under some fairly weak assumptions, I show that when one candidate has a small advantage over the other, this alters equilibrium policy positions in two ways. First, it causes the disadvantaged candidate to move away from the center. Second, and perhaps more surprising, it causes the advantaged candidate to move toward the center. I also show that, under some fairly weak assumptions, for all levels of the valence advantage, the advantaged candidate chooses a more moderate position than the disadvantaged candidate. Empirical studies of congressional elections by Fiorina (1973) and Ansolabehere, Snyder, and Stewart (2001) support this result.

So a candidate's valence is an "advantage" that's not associated with location on the spectrum of political opinions, but has to do instead with things like "superior character, charisma, name recognition, or intelligence". In the case of a think tank, "name recognition" can be applied directly, and the other qualities translate to things like perceived quality and integrity, level of funding, degree of access to publicity networks, and so on. Again, this seems fair enough: regardless of my political position, it's reasonable that I derive more utility from citing a well-known and widely respected think tank than from citing some small-time K Street shell with a shady reputation, independent of my degree of agreement with the political slant of either outfit.

But if we think about the meaning of G&M's equation (1) as a whole, something very strange emerges.

Consider four (imaginary) think tanks, with these "valences" and "ideologies":

Think tank                          Valence aj   Ideology bj
Bleeding Heart Institution          100          0.9
Red Meat Foundation                 100          0.1
Americans For Conservative Things   10           0.1
People For Liberal Things           10           0.9

And consider two congresspersons with these "average adjusted ADA scores":

 
Congressperson    ADA score
Mildred Moonbat   100%
Walter Wingnut    0%

G&M's model now assigns the following "utilities" (ignoring the error term):

Think tank                          Mildred M             Walter W
Bleeding Heart Institution          100 + 0.9*100 = 190   100 + 0.9*0 = 100
Red Meat Foundation                 100 + 0.1*100 = 110   100 + 0.1*0 = 100
Americans For Conservative Things   10 + 0.1*100 = 20     10 + 0.1*0 = 10
People For Liberal Things           10 + 0.9*100 = 100    10 + 0.9*0 = 10
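For anyone who wants to check the arithmetic, here is a minimal sketch in Python that recomputes these utilities from equation (1) with the error term dropped. The variable and function names are my own illustrative choices, and the numbers are just the made-up valences, ideologies and ADA scores from the tables above; none of this comes from G&M's own code.

```python
# Made-up think tanks from the tables above: name -> (valence a_j, ideology b_j).
think_tanks = {
    "Bleeding Heart Institution": (100, 0.9),
    "Red Meat Foundation": (100, 0.1),
    "Americans For Conservative Things": (10, 0.1),
    "People For Liberal Things": (10, 0.9),
}

# Made-up congresspersons: name -> ADA score x_i on the 0-to-100 scale.
ada_scores = {"Mildred Moonbat": 100, "Walter Wingnut": 0}


def utility(valence, ideology, ada_score):
    """Equation (1) without the error term: a_j + b_j * x_i."""
    return valence + ideology * ada_score


# One utility per (think tank, congressperson) pair.
utilities = {
    (tank, member): utility(a, b, x)
    for tank, (a, b) in think_tanks.items()
    for member, x in ada_scores.items()
}

# Print one row per think tank, one column per congressperson.
for tank in think_tanks:
    row = "  ".join(f"{utilities[(tank, member)]:6.1f}" for member in ada_scores)
    print(f"{tank:35s}{row}")
```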

This model says that because of Walter Wingnut's 0% ADA rating, he doesn't care at all about the ideology of the think tanks he cites. That's because an institution's "ideology" is multiplied by his ADA factor of 0, and so his "utility" depends only on the ideology-free "valence" of the institution. Mildred Moonbat's 100% ADA rating, on the other hand, means that she pays a full measure of attention both to "valence" and to "ideology".

Walt Wingnut is just as happy to cite the Bleeding Heart Institution as the Red Meat Foundation, and likewise just as happy to cite People For Liberal Things as Americans For Conservative Things. Millie Moonbat, on the other hand, is 73% happier to cite Bleeding Heart than Red Meat (190 vs. 110), and fully five times as happy to cite People For Liberal Things as Americans For Conservative Things (100 vs. 20).

I submit that this is preposterous.

The model's prediction of the probability for a given congressperson to cite a given think tank is the exponential of the congressperson's utility for that think tank divided by the sum of the exponentials of their utilities for all think tanks. Because of the exponentials, the numbers assigned as valences -- and the resulting utilities -- will need to be much more tightly clustered if we want to get a reasonable amount of probability mass distributed over less-favored think tanks. But the basic mathematical fact remains the same: think tank ideology, according to this model, only matters to liberals. Or to put it another way, the more liberal the congressperson, the more weight they give to ideology; the more conservative they are, the closer they come to paying attention only to "valence", i.e. ideology-free quality.
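The probability rule just described can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the parameter values are hypothetical, chosen by dividing the made-up valences and ideologies above by 100 so that the exponentials don't pile all of the probability onto one think tank, and the function name is not taken from G&M.

```python
import math


def citation_probabilities(utilities):
    """Softmax over think tanks: exp(u_j) / sum_k exp(u_k)."""
    top = max(utilities.values())  # subtract the max so exp() cannot overflow
    weights = {tank: math.exp(u - top) for tank, u in utilities.items()}
    total = sum(weights.values())
    return {tank: w / total for tank, w in weights.items()}


# Hypothetical, tightly clustered parameters: name -> (valence a_j, ideology b_j).
think_tanks = {
    "Bleeding Heart Institution": (1.0, 0.009),
    "Red Meat Foundation": (1.0, 0.001),
    "Americans For Conservative Things": (0.0, 0.001),
    "People For Liberal Things": (0.0, 0.009),
}

for member, ada_score in {"Mildred Moonbat": 100, "Walter Wingnut": 0}.items():
    # Equation (1) without the error term, on the 0-to-100 ADA scale.
    utilities = {tank: a + b * ada_score for tank, (a, b) in think_tanks.items()}
    probabilities = citation_probabilities(utilities)
    print(member)
    for tank, p in probabilities.items():
        print(f"  {tank:35s}{p:.3f}")
```

With these assumed numbers, Walter's probabilities are identical for any two think tanks that share a valence, whatever their ideologies, while Mildred's are not.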

Now, I freely admit that I'm not a social scientist. I'm not used to thinking about this particular kind of model, and maybe there's something obvious that I'm not seeing here. If so, I'm sure that someone will point it out to me.

But if I'm understanding this equation right, I don't understand how we got to this point in the discussion of this interesting and important piece of work without addressing this fundamental assumption of its defining equations. And I'm as guilty as anyone else, since I read a version of the paper back in July of 2004.

[Update: several people have written to observe that the problem that I describe does not arise if the conservative-to-liberal spectrum is made symmetrical around 0, e.g. -1 for most conservative to +1 for most liberal. Indeed -- that's what I assumed they were doing, when I first wrote about the details of their model back in October of 2004, because (I thought) that's the only version of the approach that seems to make sense. But in fact G&M's recent QJE article is quite explicit that their xi values, representing the ith congressperson's political position, are exactly ADA scores, which run from 0 (most conservative) to 100 (most liberal). They list these numbers in Table II, from Maxine Waters at 99.6 down to Tom DeLay at 4.7. The same thing holds for their cm values, the parameters representing the bias of the mth media outlet, which are denominated on exactly the same 0-to-100 scale, and which play an identical role in the equation specifying the utility that the mth media outlet derives from citing the jth think tank. Their Table III gives estimated c values for (some of?) the outlets in their study, from the Washington Times at 35.4 to the Wall Street Journal at 85.1.
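To make the readers' observation concrete, here is a small Python sketch of the same utility calculation on a scale running from -1 to +1. The rescaling function is a hypothetical choice of mine, not something G&M use, and the valence and ideology numbers are simply reused from the made-up tables above.

```python
def to_symmetric(ada_score):
    """Map a 0-to-100 ADA score onto a -1-to-+1 scale (hypothetical rescaling)."""
    return (ada_score - 50) / 50


def utility(valence, ideology, position):
    """Equation (1) without the error term: a_j + b_j * x_i."""
    return valence + ideology * position


walter = to_symmetric(0)     # most conservative, -1.0
mildred = to_symmetric(100)  # most liberal, +1.0

# Same made-up valence (100) and ideologies (0.9 and 0.1) as in the tables.
walter_bleeding_heart = utility(100, 0.9, walter)
walter_red_meat = utility(100, 0.1, walter)
mildred_bleeding_heart = utility(100, 0.9, mildred)
mildred_red_meat = utility(100, 0.1, mildred)

# How much each legislator prefers the ideologically closer think tank.
print(walter_red_meat - walter_bleeding_heart)
print(mildred_bleeding_heart - mildred_red_meat)
```

On this scale the ideology term is no longer multiplied by zero for Walter, so his preference between the two equal-valence think tanks is as strong as Mildred's, just in the opposite direction.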

Other people wrote to ask what effect if any the odd structure of their equation (1) might have had on the outcome of the process of estimating the ADA ratings of media outlets by reasoning from the citational habits and ADA ratings of congresscritters. The answer is that I don't know, except to observe that from someone with an ADA rating of 0, their model gains information only about think tank valences and not at all about think tank ideologies. The estimate of think tank ideologies is influenced more and more strongly by the citational habits of more and more liberal congresspersons. I'm not sure what effect this had on the parameters they derived -- the result depends on the data as well as on the structure of the model -- but I'd be surprised if it didn't affect the outcome pretty strongly one way or another. It almost certainly means that the estimates of "valence" will in fact be some amalgam of valence-like factors and conservatism; and there will surely be some artefacts introduced into the estimates of think tank ideologies and media biases as well.]
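The point about information can be illustrated with a rough Python sketch. Everything in it is an assumption for illustration: the citation counts, valences and the two competing guesses about think tank ideologies are made up, and the log-likelihood is the standard multinomial-logit form implied by the probability rule described earlier, not G&M's estimation code.

```python
import math


def log_likelihood(counts, valences, ideologies, ada_score):
    """Log-likelihood of one member's citation counts (constant term omitted)."""
    utils = [a + b * ada_score for a, b in zip(valences, ideologies)]
    top = max(utils)  # log-sum-exp trick for numerical stability
    log_norm = top + math.log(sum(math.exp(u - top) for u in utils))
    return sum(n * (u - log_norm) for n, u in zip(counts, utils))


counts = [40, 20, 5, 5]          # made-up citation counts for four think tanks
valences = [1.0, 1.0, 0.0, 0.0]  # hypothetical a_j values

# Two very different hypothetical guesses about the ideologies b_j.
guess_one = [0.009, 0.001, 0.001, 0.009]
guess_two = [0.001, 0.009, 0.009, 0.001]

# How strongly the same counts discriminate between the two guesses,
# for a member with ADA score 0 and for a member with ADA score 100.
walter_gap = (log_likelihood(counts, valences, guess_one, 0)
              - log_likelihood(counts, valences, guess_two, 0))
mildred_gap = (log_likelihood(counts, valences, guess_one, 100)
               - log_likelihood(counts, valences, guess_two, 100))

print(walter_gap, mildred_gap)
```

For the member with an ADA score of 0, the two guesses about ideology give exactly the same log-likelihood, so those citations cannot favor one guess over the other; for the member with a score of 100, the same counts separate the guesses clearly.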

[More better stuff on this same topic is here.]

Posted by Mark Liberman at December 22, 2005 11:25 AM