October 31, 2004

Marx: red or blue?

Karl Marx is traditionally a red. But recently, things have gotten kind of swapped around in the U.S., so that the red states are the ones that vote Republican. Anyhow, this post is not really about Karl Marx at all; it's about a method for estimating media bias by modeling citation frequencies, featured in some recent work by Tim Groseclose and Jeff Milyo. There was a lot of discussion of this work back in August, and I've seen several mentions recently, as the question of election coverage is debated.

You can follow the links to get the details, but G&M's own description of their basic method is as follows:

To compute our measure, we count the times that a media outlet cites various think tanks. We compare this with the times that members of Congress cite the same think tanks in their speeches on the floor of the House and Senate. By comparing the citation patterns we can construct an ADA score for each media outlet.

As a simplified example, imagine that there were only two think tanks, one liberal and one conservative. Suppose that the New York Times cited the liberal think tank twice as often as the conservative one. Our method asks: What is the estimated ADA score of a member of Congress who exhibits the same frequency (2:1) in his or her speeches? This is the score that our method would assign to the New York Times. [924K .pdf here]

Their full mathematical model assumes that every citer and every citee can be assigned a numerical position on a single political dimension (which they interpret as left-right), and that the "utility" that a citer receives from making a citation is the product of the citer's and the citee's coefficients plus a measure of the citee's overall authoritativeness plus an error term. Then the probability of citation choices can be predicted from these political and authoritativeness coefficients by a "multinomial logit". G&M start from an estimate of citers' politics (in this case, the ADA rankings of members of Congress), use that to estimate the politics of citees (in this case, political think tanks), and then work backwards to estimate the politics of another class of citers (in this case, media outlets).
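To make the model concrete, here is a minimal Python sketch of the multinomial logit as described above, together with the two-think-tank inversion from G&M's simplified example. It is an illustration under stated assumptions, not G&M's estimation code (they fit all the coefficients by maximum likelihood). The function names and all numerical values are hypothetical: the two sources are assumed to be equally authoritative, with political coefficients of +1 and -1.

```python
import math

def citation_probabilities(c_m, a, b):
    """Probability that a citer at position c_m cites each source.

    a[j] is source j's authoritativeness, b[j] its political coefficient.
    The deterministic part of the utility is a[j] + b[j] * c_m.
    """
    utilities = [a_j + b_j * c_m for a_j, b_j in zip(a, b)]
    top = max(utilities)  # subtract the max before exponentiating, for stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def implied_position(ratio, a, b):
    """With exactly two sources, find the c_m for which P[0] / P[1] == ratio.

    Uses log(P0 / P1) = (a0 - a1) + (b0 - b1) * c_m.
    """
    return (math.log(ratio) - (a[0] - a[1])) / (b[0] - b[1])

# Hypothetical values: source 0 is the "liberal" think tank, source 1 the
# "conservative" one, and both are equally authoritative.
a = [0.0, 0.0]
b = [1.0, -1.0]

c = implied_position(2.0, a, b)          # position implied by a 2:1 citation ratio
probs = citation_probabilities(c, a, b)  # check by running the model forward
print(c, probs)
```

Under these made-up coefficients, a 2:1 citation ratio maps to a single position on the scale, and that position is what the method would assign to the outlet.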

I'm not a big fan of the idea that political opinions should be reduced to a coordinate on a single dimension, but even if we grant that point, the logic of their first step troubles me. I often cite people I don't agree with at all, and I think others do too. For example, in this post, I obviously felt that I gained "utility" by quoting Goebbels and Hitler, whose overall political opinions are very different from mine; on other occasions, Language Loggers have cited Stalin, Thomas Jefferson, Benjamin Franklin, Thomas Aquinas, Friedrich Hayek, and so on. But as I read G&M's model, it says that we derive maximum "utility" from citing the most extreme sources whose political polarity is the same as ours. This is hard for me to square, in common-sense terms, with my intuitions about my own behavior, or with (what I think are) the observed facts of human discourse.

In particular, I suspect that the one-dimensional politics of a given source can't as a rule be well estimated from the political distribution of its citers. In order to explore this point -- in an entirely unscientific but still empirical way -- I decided to look at blogospheric citation of Karl Marx. I asked Technorati this morning over breakfast, and found 345 mentions of "Marx" in blogs over the past seven days. Ignoring Groucho (and any other non-Karls), I took the first dozen English-language citing sites as my highly unscientific sample.

There was one clearly left-wing site:

PolemicBlog: "That's an egalitarian dream even Marx never contemplated."

one site that is mostly about things like music and software, but seems to express somewhat left-of-center politics:

eric's site: "capitalism the way marx and engels said it would be. the world turns a blind eye on genocide to keep the wheels of commerce greased with high quality petroleum products."

one anti-collectivist but apparently also anti-Bush site:

H. Duthel: The bond between all citizens of the state, their common political will, is the result of a forced act of volition on the part of each individual who, in order to reach his or her goal of private advantage, also participates in an abstract and general will. "The separation of bourgeois society and the political state necessarily appears as a separation of the political member of bourgeois society, the citizen, from bourgeois society, his own actual, empirical reality, because as an idealist of the state he is a being who is completely distinct, different from, and opposed to his own reality" (Marx, Critique of Hegel's `Philosophy of Right', Cambridge University Press 1970, p.79).

one hyper-individualistic site that explicitly refuses to be assigned a coordinate in the left-right dimension:

hobopoet: "Liberals say we should end employment discrimination. I say we should end employment. Conservatives support right-to-work laws. Following Karl Marx's wayward son-in-law Paul Lafargue I support the right to be lazy. Leftists favor full employment. Like the surrealists -- except that I'm not kidding -- I favor full unemployment."

and two others with vaguer politics yet:

short black: "Jake then passed off some of his communist hot air dressed as wisdom. 'Well, Marx wouldn't have existed if it wasn't for Engels. Engels paid for him to work. If it wasn't for that, Marx couldn't have done what he did. What you need is some rich guy who thinks you're really smart, some rich guy who thinks he's smart. Pay for you to do your writing.'"
kirei: "Ja, I’m off to read Marx."

two sites that define their politics mainly in religious terms, but would probably count as right-wing in most people's categorization:

poachedfrog.com (Christian, pro-Bush): "Rousseau, Byron, Shelley, Hugo, even Freud and Marx were the standard bearers of this new secular notion that man is fully capable of self-redemption, that man is his own master, and that man is perfectible given the appropriate political environment, an environment devoid of kings and religion."
Anti-MDP (Islamist -- "We respect the rule of law and Islam is what we believe superior."): "[Bismarck] said that the best strategy for the Bolsheviks or the Communists was to prevent the Tsar from modernising the country. In this he agreed with Karl Marx. The best way to defeat a government is to not give it the space to reform."

and four politically-oriented sites that seem clearly to the right of center:

Desert Rat Ramblings: John Kerry demands "tax fairness for Americans." This is his euphemism for fleecing Americans who pay the highest taxes. What he calls "tax fairness," Karl Marx called wealth redistribution.
The World Through Juan's Eyes: The communists are crawling out of the woodwork in support of sKerry. ... "......as critics from Marx to Chomsky have pointed out ..."
Indiana Observer: Karl Marx coined the word “capitalism” in the mid 1800s, though in his “Communist Manifesto” he never really defines it.
Stambord: That other great manifesto, the Marx and Engels one, is of course also on the reading list.


But here is (the right-hand side of) G&M's equation describing the probability that citer m selects source j:

exp(aj + bj*cm) / Σk exp(ak + bk*cm)

Only the numerator of this expression really matters to us here -- the sum in the denominator is the same for all sources, and is just a normalizing factor to ensure that the probabilities for each citer sum to 1.

In the numerator, aj is a measure of the authoritativeness or popularity of source j -- the bigger aj is, the more source j will be cited (by everyone). The terms bj and cm are the political coefficients for source j and citer m. (The subscripted a, b and c terms in the denominator have similar meanings -- the denominator means "sum over all sources for citer m".)

The G&M model assumes that the political center is 0 -- they happen to have "left" corresponding to positive values, as I understand their paper, so I'll do the same. Let's take Karl Marx's political coefficient bj to be (say) +10, and set his authoritativeness arbitrarily at 100. Then if citer m has political coefficient cm, his or her probability of citing Karl Marx will be proportional to exp(100 + 10*cm).

For a citer whose political coefficient cm is similar to Karl Marx's, this expression evaluates to exp(100 + 10*10) = exp(200) or about 7.2*10^86. For someone whose political coefficient is the opposite of Marx's (a political clone of Friedrich Hayek?), this expression evaluates to exp(100 + -10*10) = exp(0) = 1.
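The arithmetic here is easy to check. A small Python sketch, using the arbitrary values chosen above (authoritativeness 100, political coefficient +10); the names `a_marx`, `b_marx` and `numerator` are just labels for this illustration:

```python
import math

a_marx, b_marx = 100.0, 10.0  # arbitrary values chosen in the text

def numerator(c_m):
    """Unnormalized weight for citing Marx, for a citer at position c_m."""
    return math.exp(a_marx + b_marx * c_m)

left = numerator(10.0)    # citer with the same coefficient as Marx: exp(200)
right = numerator(-10.0)  # citer with the opposite coefficient: exp(0)

print(f"{left:.1e}")
print(right)
```

Before normalization, then, the far-left citer's weight for Marx is larger than the far-right citer's by a factor of exp(200).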

In each case, the resulting quantity will be normalized by the sum of the similar expressions for all the other possible citees, in order to get a predicted probability. But if the rest of the situation is reasonable, then as we move along the political spectrum from right to left, the probability of citing Karl Marx is predicted to increase more or less monotonically.
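Here is a rough sketch of that prediction in Python, with normalization included. Marx keeps the values used above (100 and +10); the other four sources are invented purely for illustration, equally authoritative and spread evenly across the spectrum. Since Marx has the largest political coefficient in this made-up slate, the model's probability of citing him can never decrease as the citer moves left: the slope of its logarithm is bj minus the probability-weighted average of all the b's.

```python
import math

def citation_probabilities(c_m, a, b):
    """Multinomial-logit citation probabilities for a citer at position c_m."""
    utilities = [a_j + b_j * c_m for a_j, b_j in zip(a, b)]
    top = max(utilities)  # subtract the max before exponentiating, for stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Index 0 is Marx (a = 100, b = +10, as in the text).
# The remaining sources are hypothetical, invented for this sketch.
a = [100.0, 100.0, 100.0, 100.0, 100.0]
b = [10.0, 5.0, 0.0, -5.0, -10.0]

positions = list(range(-10, 11))  # citers from far right (-10) to far left (+10)
p_marx = [citation_probabilities(c, a, b)[0] for c in positions]

for c, p in zip(positions, p_marx):
    print(f"c_m = {c:+3d}   P(cite Marx) = {p:.6f}")
```

With a different slate of sources the curve would have a different shape, but as long as Marx sits at the left extreme, the model cannot produce a bump in Marx citations on the right.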

The thing is, that's not my impression of how the world works; and I'll take the results of my little excursion into Technorati as a crude validation of my impression. These days, the people who are most likely to cite Karl Marx are right wingers. Or maybe there's a bimodal distribution, with the extreme left and right both more likely to cite him than people in the center are. Anyhow, I'm asserting here that the G&M model makes predictions in this case that are qualitatively wrong, not just quantitatively out of whack.

I'm fond, myself, of the scientific proverb (adapted from Picasso) that "a model is a lie that leads us to the truth." So the fact that G&M's model makes some qualitatively counterfactual predictions is not necessarily a reason to reject it. Depending on the real relation between the politics of citers and citees, and the empirical distribution of citers and citees in political space, their model might be leading us towards the truth, or it might not.

One difficult question is what the rhetorical content of "citing" a source is. The implication of G&M's model is that citing X is a sign of political agreement with X, and thus the rhetorical context would be something like "As X showed, it's true that P." But sometimes people go out of their way to find support from those whose views they don't share: "Even X admitted that P". And there are other rhetorical frames entirely: "The evil ones have no shame: X just proposed that P"; or "When my opponent suggests that P, she is echoing the ideas of X"; or just "Here's something new: X said that P".

G&M discuss the question of what counts as a citation, and demonstrate that they're aware of the issues:

We looked for instances where the legislator cited a view or a fact stated by a member of the think tank. We then counted the sentences in the citation. [...]
Along with direct quotes, we sometimes included sentences that were not direct quotes. For instance, many of the citations were cases where a member of Congress noted “This bill is supported by think tank X.” [... ]

Sometimes a legislator or a media outlet noted an action that a think tank had taken—e.g. that it raised a certain amount of money, initiated a boycott, filed a lawsuit, elected new officers, or held its annual convention. We did not record such cases in our data set. However, sometimes in the process of describing such actions, the reporter or member of Congress would quote a member of the think tank, and the quote revealed the think tank’s views on national policy, or the quote stated a fact that is relevant to national policy. If so, we would record that quote in our data set. For instance, suppose a reporter noted “The NAACP has asked its members to boycott businesses in the state of South Carolina. `We are initiating this boycott, because we believe that it is racist to fly the Confederate Flag on the state capitol,’ a leader of the group noted.” In this instance, we would count the second sentence that the reporter wrote, but not the first.

Also, we omitted the instances where the member of Congress or journalist only cited the think tank so he or she could criticize it or explain why it was wrong. About five percent of the congressional citations and about one percent of the media citations fell into this category.

In the same spirit, we omitted cases where a journalist or legislator gave an ideological label to a think tank (e.g. “Even the left-wing Urban Institute favors this bill.”). The idea is that we only wanted cases were the legislator or journalist cited the think tank as if it were a disinterested expert on the topic at hand. About two percent of the congressional citations and about five percent of the media citations fell into this category.

But by my reading of their rules, four of the six right-of-center mentions of Karl Marx still count as citations -- look at them yourself and see what you think. It's clear of course that those sources are not in accord with Marx's ideas, but you can't necessarily tell from the immediate context, at least not in any simple way that is not circular with respect to the goal of quantifying political stances. And in the case of media citations, it's often even more obscure whether the cited position is being presented in an approving, disapproving or neutral way.

Whatever the truth about blogs citing Karl Marx, I feel that it remains problematical to try to determine the politics of a source by looking at who cites it. One of the most widely cited Language Log posts was Geoff Pullum's takedown of Dan Brown's The Da Vinci Code. A majority of the links (says my impressionistic memory) were from Catholic sites. Those folks have historical and theological beefs with Dan Brown, and so they were happy to link to Geoff's piece, which was strongly negative on linguistic and literary grounds. We can't conclude anything from this about Geoff's attendance at Mass or his feelings about any theological points whatever.

[Note from Geoff Pullum: Holy smoke! You can say that again! Don't ever read off any opinions from my citation list. I cite people I utterly despise — Strunk and White, to name but two.]

[Further note from Mark Liberman: G&M wouldn't count your references to Strunk and White as "citations" in their sense, I think, because your criticisms of them are always front and center when you cite them. However, when a religious website references your Dan Brown post approvingly, this does seem to count, by the lights of G&M's model, as evidence for your position on (say) the continuum from secular to religious. And if we ran the numbers, I think it might turn out that your estimated position is rather far out on the religious end of the scale. Independent of any direct evidence on the subject, it seems imprudent to me to draw such an inference simply because you happen to criticize effectively on linguistic grounds a writer whom many committed Christians dislike on theological grounds.]

Posted by Mark Liberman at October 31, 2004 04:20 PM