August 19, 2007

I mean, you know

Matthew Hutson writes:

Sometimes I wonder if there are underlying personality differences between people who punctuate (litter?) their speech with "you know" versus those who use "I mean" more frequently. Any hunch on that?

I don't have any hunches, and I don't know any studies about correlations between personality dimensions and choice of fillers, though I'll ask around.

However, we might be able to infer something from demographic variables. Since LDC Online lets me do database queries over boolean combinations of text strings and demographic categories, this is a perfect topic for a Breakfast Experiment™.

In the 14,137 conversations (26,151,602 words) of the LDC conversational telephone speech corpus, the frequency of "you know" compared to "I mean" appears to increase with increasing age:

Age      "you know"   "I mean"   "you know"/"I mean" ratio
20-39        58,364     24,478   2.38
40-59       278,099     73,211   3.80
60+          33,477      7,518   4.45

On the other hand, the frequency of "you know" relative to "I mean" appears to decrease with more years of formal education:

Education       "you know"   "I mean"   "you know"/"I mean" ratio
High school          2,608        408   6.39
College            191,088     51,143   3.72
Post-graduate      167,893     51,389   3.27

And there's also a slight tendency for the relative frequency of "you know" to be higher among women than among men:

Sex      "you know"   "I mean"   "you know"/"I mean" ratio
Women       198,086     51,689   3.83
Men         173,321     53,892   3.22
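
The ratio column in the three tables above is just the "you know" count divided by the "I mean" count. Here is a minimal sketch of that arithmetic, using the counts as printed. The names COUNTS, ratio and ratio_table are made up for illustration and have nothing to do with LDC Online. Recomputed this way, the College row comes out at 3.74 rather than the 3.72 shown; the other rows match.

```python
# Counts copied from the tables above: (you_know, i_mean) for each group.
COUNTS = {
    "age": {
        "20-39": (58_364, 24_478),
        "40-59": (278_099, 73_211),
        "60+": (33_477, 7_518),
    },
    "education": {
        "High school": (2_608, 408),
        "College": (191_088, 51_143),
        "Post-graduate": (167_893, 51_389),
    },
    "sex": {
        "Women": (198_086, 51_689),
        "Men": (173_321, 53_892),
    },
}


def ratio(you_know, i_mean):
    """Return the "you know"/"I mean" ratio, rounded to two decimals."""
    return round(you_know / i_mean, 2)


def ratio_table(counts=COUNTS):
    """Map each demographic variable to {group: ratio}."""
    return {
        variable: {group: ratio(*pair) for group, pair in groups.items()}
        for variable, groups in counts.items()
    }


if __name__ == "__main__":
    for variable, groups in ratio_table().items():
        print(variable)
        for group, value in groups.items():
            print(f"  {group:<14} {value:.2f}")
```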

You could spin out a theory that greater use of "I mean" means greater involvement with self as opposed to others, and that age makes people less self-involved, but education makes them more self-involved, and men are somewhat more self-involved than women.

But this would be even more tenuous than such explanations generally are, since the demographic variables in this collection of conversations are not orthogonal -- in other words, the age categories are not balanced by education and sex, and the educational categories are not balanced by age and sex, and the sex categories are not balanced for age and education. So you'd at least want to do some sort of multiple regression, and I don't have time for that this morning (because I'd need to re-scan the underlying data to get the raw materials).
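
Short of a regression, one cheaper check would be to compute the ratio separately within each combination of age, education and sex, so that a comparison across age groups holds the other two variables fixed. The sketch below does that. It is stratification, not the multiple regression mentioned above, and it assumes a hypothetical per-speaker record format (age group, education, sex, "you know" count, "I mean" count); the function name stratified_ratios is likewise invented, and nothing here reflects what LDC Online actually returns.

```python
from collections import defaultdict


def stratified_ratios(records):
    """Compute the "you know"/"I mean" ratio within each demographic cell.

    `records` is an iterable of hypothetical per-speaker tuples:
    (age_group, education, sex, you_know_count, i_mean_count).
    Returns {(age_group, education, sex): ratio}; the ratio is None for
    a cell with no "I mean" tokens at all.
    """
    totals = defaultdict(lambda: [0, 0])
    for age_group, education, sex, you_know, i_mean in records:
        cell = totals[(age_group, education, sex)]
        cell[0] += you_know
        cell[1] += i_mean

    return {
        key: (you_know / i_mean if i_mean else None)
        for key, (you_know, i_mean) in totals.items()
    }
```

If the age trend survived within cells that share the same education and sex, that would be some evidence that it is not just a by-product of the unbalanced sample. Cells with few speakers would still give noisy ratios.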

Posted by Mark Liberman at August 19, 2007 07:53 AM