June 07, 2004

People whose email reveals them to be gay

There must be something wrong with me. I just don't seem to be afraid of Gmail's supposed dire threat to my privacy. Google's plan for its Gmail service is that email content will be scanned automatically on the server so that possibly relevant ads can be placed in the margin when the messages are viewed. Imagine: you email me to say that surf's up and maybe we should wax up our boards and go catch some waves, and the Google text scanner does a keyword scan and decides to drop ads for O'Neill's Surf Shop and a wetsuit repair company in the margin so I see them while I'm reading your message. For my willingness to have these ads on my screen, I get a gigabyte of free storage. Seems like a great idea. But instead people are trying to bring lawsuits to have Gmail stopped, and I am supposed to be terrified of the dark threat of what Google, and maybe even The Government, might do to us. Have a look at the scenario (attributed to an anonymous hacker) that is described in a recent article by Annalee Newitz in Metro Santa Cruz, a free paper in the little paranoid town where I live. Imagine:

...an anti-gay group buys Gmail ads that are targeted at people whose email reveals them to be gay. When these gay people click through the targeted ads, they land on the anti-gay website, which allows the website owners to log their IP addresses — and since IP addresses are often traceable to real-world addresses, the anti-gay group could possibly use targeted Gmail ads to compile a hit list of gay people, complete with directions to their targets' homes.

It is particularly ludicrous for this alarmism to be published in a town as extraordinarily gay-friendly as Santa Cruz, but set that aside; perhaps in Alabama or east Texas gays would be tracked to their lairs via their IP addresses if only the pesky perverts could be identified. Also set aside the issue of Google's company policies (as Newitz goes on to say, Sergey Brin of Google points out that such targeting wouldn't be allowed by the company's policies anyway, even if it were feasible). Forget these issues. This is Language Log, and my concern is with the linguistic angle here: What the hell does Newitz mean by "people whose email reveals them to be gay"?

Remember, a machine is supposed to do the matching of ads with emails, and do it very fast. Is it really possible that Newitz, or the hacker she quotes, truly believes that an algorithm can determine from the text of an email whether or not the sender is gay?

Here's the text of two messages I've received in the past year, side by side, both from close personal friends of mine who are syntacticians. (Purely by coincidence, both use the smileyface ASCII emoticon in these random passages I scooped from my files.) The two are of different sexual orientations. Get to work on a Perl script that will figure out which one is the homosexual:

I finally did send in an abstract. My cousins in Philadelphia dropped it off at the main post office and they claimed there that it would be delivered on Tuesday. We shall see... (whether it gets there on time, and whether it is accepted...;-) I'd like to work with you on the problem of the limits between adjectives and prepositions. I've been collecting examples of the use of PPs as the predicative complements of "seem" type verbs and also modification of P(P) by "very" and things like that. The data are interesting I think. I also have a student who wants to do a qualifying paper on extraposition from object. So, she's reading Postal and Pullum 88, and I'll also get back to you on that problem.

You are truly a prince to be willing to do this on such short notice, Geoff! I checked with the powers that be, and they will cut you a check for giving the talk. Should be enough for a car rental, gas, parking, plus a decent meal somewhere. We will rearrange our schedule to make room for you on Monday. If you want to come for the full 2 hrs, I'm sure the students would enjoy interacting with you. If you prefer to do only one hour, you can pick whether to do the first or second. We meet in building 160, room 127. That is to the left of the main entrance to the university. It's on the first floor, and it's a showcase high-tech classroom, with all kinds of futuristic equipment.

Reading that phrase "people whose email reveals them to be gay" reminded me once again that even quite intelligent members of the general public have absolutely no idea of what is known about language, what is possible and what is not. They believe both too much and too little. They'll believe things that are wildly and absurdly false (like that parrots or monkeys can hold intelligent conversations); yet they won't believe things that are uncontroversially true (like that genitive noun phrases are allowed to be antecedents of pronouns in English, or that African American Vernacular English is more than just Standard English spoken badly).

Surprisingly (in view of the fact that men are reputed to be from Mars and women from Venus), computer analysis of text can't even reliably tell male from female authors. How likely is it that textual analysis will be able to tell which sex would be a given author's preference for someone to cuddle up in bed with? If this is the sort of thing that is supposed to make me terrified that Gmail will destroy my privacy, then I'm sorry, I just don't seem to be able to muster the expected level of terror.

Posted by Geoffrey K. Pullum at June 7, 2004 09:29 PM