Wednesday, May 25, 2011

Refuting Satoshi Kanazawa's "Objective Attractiveness" Analysis

http://www.eurweb.com/wp-content/uploads/2011/05/satoshi_kanazawa2011-med.jpg

If you follow any of the psychology blogs, you have no doubt read recently about Satoshi Kanazawa's supposed "Objective Attractiveness" analysis at his Psychology Today blog, The Scientific Fundamentalist, in which he claimed that African-American women are "objectively" less attractive.

The article has been removed from the Psychology Today site, but you can still read it here.

The response to his post has been nothing short of incendiary - in an earlier century he might well have been forced to leave the village and live alone in the wilderness.

At Scientific American, guest blogger Khadijah M. Britton takes a hard look at the analysis:

The Data Are In Regarding Satoshi Kanazawa

May 23, 2011

A Hard Look at Last Week's "Objective Attractiveness" Analysis in Psychology Today

"If what I say is wrong (because it is illogical or lacks credible scientific evidence), then it is my problem. If what I say offends you, it is your problem." - Satoshi Kanazawa

Satoshi Kanazawa has a problem.

It is hard to believe that it was merely a week ago today that I first encountered Satoshi Kanazawa; given all that I have read, thought and talked about him this week, it feels like a year. For those of you who haven't been following this saga online, or aren't regular readers of Psychology Today: last Sunday, Satoshi Kanazawa, PhD, Evolutionary Biologist and professor at London School of Economics posed (and purported to answer) an incendiary question on his Psychology Today blog: "Why Are Black Women Less Physically Attractive Than Other Women?"

Though the post has been removed from the site, you can now see it here. In the post, Kanazawa promises his readers a scientific analysis of public data showing objective evidence of Black women's status as the least attractive group among all humans. In other words, he promises to wave a magic wand, say "Factor Analysis!" and make racist conclusions appear before your (bluest) eyes.

As it turns out, Kanazawa is a repeat offender, with years of roundly criticized and heartily debunked pseudoscience-based shock-jockery under his belt. Despite this, he is still posting on the blog of a reputable mainstream publication, still teaching at a respected university and still serving on the editorial board of one of his discipline's peer-reviewed research journals. Though, possibly not for long: this particular post's racist hypothesis offended many, unleashing serious righteous outrage across the internet: social media users raced to blog, tweet and even petition demanding that Psychology Today remove Kanazawa as a contributor to their Web site and magazine. Psychology Today removed the post late Sunday night, and Monday morning the largest student organization in London (representing 120,000 students) unanimously called for Kanazawa's dismissal.

Over the past week, a handful of Kanazawa's fellow bloggers at Psychology Today have posted insightful and at times scientifically-grounded critiques of his research question and methodology. Dr. Scott Barry Kaufman has even done an independent statistical analysis of the data set Kanazawa uses to "prove" his theory, beating me to publication by a couple of days but coming to the same conclusions I have derived from my own independent analysis.

Independent evaluation of an article's data analysis is a critical step in deconstructing scientific inquiry, and one the mainstream media rarely undertakes. As the founder of a science journalism nonprofit – and therefore an aspiring entrant into the mainstream media ranks – I am alarmed by this. Whether we agree with Kanazawa's assertion or are horrified by it, we cannot report on it without actually comparing his hypothesis to the evidence. Yet, as the London Guardian warned us back in 2005:

...[s]tatistics are what causes the most fear for reporters, and so they are usually just edited out, with interesting consequences. Because science isn't about something being true or not true: that's a humanities graduate parody. It's about the error bar, statistical significance, it's about how reliable and valid the experiment was, it's about coming to a verdict, about a hypothesis, on the back of lots of bits of evidence.

In his blog post, non-journalist Kaufman [and his co-author on the post, Jelte Wicherts, who also wrote up a much more complete, technical analysis of the dataset here] did a reporter's job, explaining why Kanazawa's statistical analysis was bunk, independently analyzing the Add Health data set (freely available here or here for anyone to analyze!) to find that Kanazawa's conclusion that Black women are the least attractive was incorrect, even if you buy into his idea that the Add Health data set was a reasonable sample from which to ground such an assessment. See Kanazawa's graph, which is magical thinking in the guise of factor analysis:

and Kaufman's graph, which makes sense:

Like Kaufman, I take great issue with Kanazawa's use of a study on adolescent health and behavior to explain human attractiveness or lack thereof. The Add Health Study begins tracking its study participants at the age of twelve, and Kaufman wisely limits his analysis to participants who could reasonably be considered adults.

I am disturbed by the fact that the Add Health study's adult researchers even answered the question of how attractive they found these youth. I am even more deeply disturbed by the idea that we are to extrapolate a general theory of desirability from these adult interviewers' subjective assessment of the children's attractiveness. Kaufman's analysis may be correct, but having run the analysis as well, I feel even more strongly that this data set is a completely inappropriate basis for Kanazawa's analysis.

Brian Hughes, of The Science Bit, agrees. Hughes' critique, which is a worthwhile read, focuses on the lack of race and sex data for the interviewers, as well as the ambiguity around the number of interviewers used. Because the Add Health data set reports no facts about the interviewers at all, there are no data to help us determine, for example, whether interviewers preferred interviewees of their own race.

As Robert Kurzban comments in his Psychology Today blog retort to Kanazawa, "Rhodes et al. (2005) argued that if people prefer faces that constitute an average of the faces that they experience, then, as they put it, faces 'should be more attractive when their component faces come from a familiar, own-race population.' They indeed showed some evidence for an 'own race' effect." Hence, in knowing the race of the interviewer and the interviewee, we might actually be able to learn whether this held true and add to the body of scholarly knowledge.

Kaufman and other bloggers also address Kanazawa's painful contortion of factor analysis, which I agree is laughable. He looks at three measurements of the same test taken at three different time points and creates a one-factor model, with the one factor being "objective attractiveness." This is, of course, founded on the principle that an attractiveness rating handed out by interviewers in a study on adolescent health and well-being is actually measuring something that we can agree is "objective attractiveness."

He then says that by merging these three measurements for each interviewee into one factor, he can use factor analysis to get at that "objective attractiveness" while minimizing any error. This is just plain false. Factor analysis cannot get rid of measurement error. If it could, we'd all be using it all the time, and we'd get rid of all measurement error, and scientific studies wouldn't need to be replicated.
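The point about measurement error can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a reconstruction of Kanazawa's procedure or of the Add Health data: the variable names, the noise level and the use of a simple average as a stand-in for a one-factor score are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical simulation: none of these numbers come from Add Health.
rng = np.random.default_rng(0)
n = 5000

# An unobservable "true" quantity for each simulated person.
trait = rng.normal(size=n)

# Three ratings of each person, each one equal to the trait plus
# independent noise of the same size as the trait itself.
ratings = trait[:, None] + rng.normal(size=(n, 3))

# Stand-in for a one-factor score: the plain average of the three ratings.
composite = ratings.mean(axis=1)

# How closely does each measure track the underlying trait?
r_single = np.corrcoef(trait, ratings[:, 0])[0, 1]
r_composite = np.corrcoef(trait, composite)[0, 1]

print(f"single rating vs. trait:   r = {r_single:.2f}")
print(f"three-rating composite:    r = {r_composite:.2f}")
```

Combining the three ratings tracks the simulated trait more closely than any single rating does, but the correlation stays well below 1: the error shrinks, it does not vanish. The sketch also assumes the raters' errors are independent. Any bias the raters share would pass straight through the averaging.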

What his factor analysis might be saying is that over time, individuals were rated relatively consistently by interviewers on what the study called attractiveness. Without knowing anything about the interviewers, we have no idea whether this is significant. The beauty – and danger – of factor analysis is that the statistician running the analysis gets to define the factors, and there are an infinite number of factor solutions to any given problem - or at least, no unique solutions.
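The lack of a unique solution can be shown numerically. The sketch below uses a made-up two-factor loading matrix (the non-uniqueness is easiest to see with two or more factors; with a single factor it reduces to a sign flip) and checks that rotating the loadings leaves the covariance matrix implied by the model unchanged. All numbers are hypothetical.

```python
import numpy as np

# Hypothetical loadings of four observed variables on two factors.
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.2, 0.9],
    [0.1, 0.6],
])

# Unique (error) variances chosen so each variable has total variance 1.
uniqueness = np.diag(1.0 - (loadings ** 2).sum(axis=1))

# Any orthogonal rotation of the factors; 30 degrees is an arbitrary choice.
theta = np.deg2rad(30)
rotation = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])
rotated = loadings @ rotation

# Covariance matrix implied by the factor model: L L^T + Psi.
implied_original = loadings @ loadings.T + uniqueness
implied_rotated = rotated @ rotated.T + uniqueness

same_fit = np.allclose(implied_original, implied_rotated)
different_loadings = not np.allclose(loadings, rotated)
print("identical implied covariance:", same_fit)
print("different loadings:", different_loadings)
```

Both sets of loadings fit the data equally well, so the data alone cannot say which one is "the" set of factors. The analyst's choice of rotation and labels does that work.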

Kanazawa continues by looking at the mean attractiveness ratings for women by racial group, again as assigned by the interviewers. Seeing that the overall ratings differ across these arbitrary racial groups (which somehow fail to include "Hispanic," despite all other study data including that category), he concludes that race is the reason for the difference in the interviewers' ratings over time.

But that is a logical fallacy. We have no idea why the interviewers felt differently about different youth in the study – correlation is not causation. In fact, according to Kaufman's reading of the data, correlation might not even really be correlation:

The low convergence of ratings finding suggests that in this very large and representative dataset, beauty is mostly in the eye of the beholder. What we are looking at here are simple ratings of attractiveness by interviewers whose tastes differ rather strongly. For instance, one interviewer (no. 153) rated 32 women as looking "about average," while another interviewer (no. 237) found almost all 18 women he rated to be "unattractive."
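Kaufman's observation about interviewers whose tastes differ strongly can be made concrete with a toy example. In the sketch below, which is entirely hypothetical and not drawn from the Add Health data, every person has the same underlying score, one rater is lenient, one is harsh, and the two groups happen to be seen by different mixes of raters.

```python
import numpy as np

def group_means(values, groups):
    """Mean of `values` within each label in `groups`."""
    return {str(g): float(values[groups == g].mean()) for g in np.unique(groups)}

# Hypothetical setup: everyone has the same underlying score.
true_score = 3.0
rater_bias = {"lenient": 1.0, "harsh": -1.0}

# Group X is mostly seen by the lenient rater, group Y by the harsh one.
groups = np.array(["X"] * 10 + ["Y"] * 10)
raters = np.array(["lenient"] * 8 + ["harsh"] * 2 + ["lenient"] * 2 + ["harsh"] * 8)
ratings = np.array([true_score + rater_bias[str(r)] for r in raters])

# Raw group means differ even though no one differs on the true score.
raw = group_means(ratings, groups)

# Subtracting each rater's own average removes the rater effect.
rater_mean = {str(r): ratings[raters == r].mean() for r in np.unique(raters)}
centered = ratings - np.array([rater_mean[str(r)] for r in raters])
adjusted = group_means(centered, groups)

print("raw group means:     ", raw)
print("rater-adjusted means:", adjusted)
```

The raw averages show a gap between the groups that is produced entirely by who did the rating. Without information about the interviewers, a gap like this cannot be told apart from a real difference.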

Kanazawa also reports that Black women's self-rated attractiveness is higher than their interviewer-rated attractiveness, despite there being no one-to-one relationship between self-identified race and perceived race. The two could be completely different: for example, I could self-identify as Hispanic but my interviewer, seeing my dark skin, might perceive me as Black. Hence, Kanazawa's conclusions are nonsensical.

Kanazawa surmises that Black women's lower attractiveness might be due to low estrogen and high testosterone. Yet high estrogen combined with low testosterone is a leading cause of fibroids, which significantly affect Black women, especially Black women who are overweight. Also, Black women have been found to have higher levels of estrogen in a study on breast cancer. Finally, Kanazawa offended his fellow Psychology Today bloggers in 2008 with his post, "The power of female choice: Fat chicks get laid more." The thesis there contradicts his supporting theory here. It leads me to wonder if this is all just some grand practical joke.

I see a more central flaw with Kanazawa's method beyond its creepiness, reliance on unscientific conjecture or abuse of factor analysis. Since the interviewers' assessment data was never intended to be used for an analysis such as Kanazawa's, the survey was not designed to capture that information. In fact, nowhere in the study monograph, nowhere on the website and nowhere in the study design materials is the interviewer's assessment of the interviewee's attractiveness mentioned. (I emailed the study designers to ask why they collected this information in the first place, and will update this post below if they answer.)

Why was the study undertaken? According to the study website, it was in response to a mandate by the US Congress in the NIH Revitalization Act of 1993, where Congress asked a division of the NIH to "provide information about the health and well-being of adolescents in our country and about the behaviors that promote adolescent health or that put health at risk" with "a focus on how communities influenced the health of adolescents."

The Add Health study measures hundreds of variables. One has to wonder: why pick only race? Especially when the results of your "study" are so unabashedly weak? Seeing that Kanazawa based his findings on such a tenuously related study, I wonder how many other studies he scoured for evidence to support his point. This sort of "fishing" for results to support your finding leads to bad science, period.

I agree with Psychology Today blogger, Sam Sommers, PhD, of Tufts University, when he concludes:

Like it or not, the burden is higher when you're a scientist blogging about science. And anyone who can only think of one explanation for an observed difference in a data set might simply be incapable of meeting that high burden.

To quote Kanazawa, a little bit of logic goes a long way. Seeing that his work is rife with logical errors, Kanazawa should be criticizing himself.

I drafted this post after spending a couple of days sorting through my emotions on Kanazawa's work. Seeing that the man clearly relishes his role as an agent provocateur, I knew I could not impact him or those who respond to his work from a place of emotion. He has made that much clear.

From my incessant reading of blog responses and comments, I have encountered the sentiment that because Kanazawa's question was immoral to ask, his results are invalid. I agree with my heart and soul that the way he framed his so-called "research question" is offensive, racist and harmful. As I tweeted after reading Kanazawa's post, "Imagine a little Black girl reading this filth. [Toni Morrison's novel] The Bluest Eye is not history to her. It's reality." I want to protect that little girl – and wish I could heal all the little girls that came before her and grew up into beautiful women like this one, made to feel ugly by a racist society. I stand in solidarity with Black women and hope you will heed this blog's cry to stand stronger than ever in self-love.

The intent behind a question can establish an immoral line of inquiry and instigate immoral research methods (see the Nazi doctors' experiments). But a question itself is not evil. Scandalous, offensive and sometimes frightening questions are often at the root of important scientific inquiry. When backed by data strong enough to support them, these questions drive us toward the truth (see, e.g., "the Earth is round").

I agree with Psychology Today blogger Mikhail Lyubansky, PhD, when he says, "[e]xtraordinary claims ... require extraordinary evidence and editorial oversight." This does not lead us to censorship; it means requiring that an inquiry bring us closer to – not farther from – the truth. Kanazawa does not earn censure with the political incorrectness of his question, but earns social and scientific irrelevance through the weakness of his research. This irrelevance earns Kanazawa a special place in hell in today's link-driven media economy – one where no one will hear him scream. One week later, neither Kanazawa nor Psychology Today's editors have published any official defense, apology or explanation. The silence is deafening.

About the Author: Khadijah M. Britton, JD, is founder of BetterBio, a Massachusetts-registered nonprofit and fiscally sponsored project of the 501(c)(3) Fractured Atlas whose mission is to empower journalism that reinforces the intimate connection between life and science. BetterBio provides a platform for comprehensive science reporting, challenging us to ask hard questions and debunk dangerous myths while addressing our collective social responsibility. Khadijah also serves as a post-graduate research fellow in antibiotic policy under Professor Kevin Outterson at Boston University School of Law while she completes her Master's in Public Health at Boston University School of Public Health and studies for the bar exam.

The views expressed are those of the author and are not necessarily those of Scientific American.


2 comments:

Caty Karther said...

The data Kanazawa used for his research were drawn from the National Longitudinal Study of Adolescent Health (Add Health), a congressionally-mandated study funded by the U.S. National Institutes of Health.

william harryman said...

Yes, that is mentioned in the article I shared. No one is questioning the data, from what I have seen; rather, the criticism is of his faulty (some might say, "willfully misinterpreted") analysis of the data.