A Preliminary Review of Jones and Yarhouse’s “Ex-Gays? A Longitudinal Study”
September 17th, 2007
Last Thursday, Stanton Jones and Mark Yarhouse announced the results of their new ex-gay study at a press conference in Nashville, where the American Association of Christian Counselors was holding its annual conference. The study, Ex-Gays? A Longitudinal Study of Religiously Mediated Change in Sexual Orientation, will be published by InterVarsity Press on October 10. This review is based on a thirteen-page synopsis that was provided for the AACC.
This study was funded by Exodus, with Jones and Yarhouse promising that “we would be reporting publicly the results of our outcome study regardless of how encouraging or embarrassing Exodus might find those results.” Based on Exodus’ press release and Alan Chambers’ presence at the press conference, it appears that Exodus is quite pleased with the study. Exodus, by the way, chose that same weekend to host their regional conference in Nashville, making for a very well coordinated event.
The Study’s Design
According to Jones and Yarhouse, their study was intended to answer two questions: Is change in sexual orientation possible, and are attempts to change harmful? And to answer those questions, they set out to do something that hadn’t been done before. They constructed what’s called a longitudinal or prospective (i.e. forward-looking) study, where they followed a population of study participants as they were beginning their experience with ex-gay ministries and continued to follow them over a period of four years.
This is an important feature of the study. One of the many criticisms of Robert Spitzer’s 2003 ex-gay study was that it was a retrospective (i.e., backwards-looking) study. In other words, study participants were asked to remember back to before they began their attempts to change and report from memory their sexual orientation and attractions. Jones and Yarhouse chose to conduct a prospective study instead:
In contrast to retrospective methods that ask participants to remember change experiences that happened in their pasts, a prospective methodology begins assessment when individuals are starting the change process and assesses them as the results unfold.
Because it’s longitudinal, beginning as the ex-gay participants begin their own journey, there’s no reliance on possibly faulty memories when asked, “What were your attractions several years ago when you started?”
When the study began, participants undertook a number of face-to-face interviews. This was Time 1. There were two more rounds of interviews, Time 2 and Time 3, with the span from Time 1 to Time 3 ranging from thirty months to four years. Most of the Time 2 interviews were conducted face to face (15% were over the phone) and all of the Time 3 interviews were done over the phone. For all three, Jones and Yarhouse used a number of recognized, standardized measures for sexuality and mental health, with the crucial self-reports of sexuality being conducted via mail-in questionnaires.
Another weakness with Robert Spitzer’s study was that he didn’t use any standard measures for sexual orientation. He didn’t use the Kinsey scale (where 0 = completely heterosexual and 6 = completely homosexual), nor did he use the Shively-DeCecco scales (which separate the intensity of homosexual and heterosexual attractions on two separate scales for independent measurement, with the zero axis representing perfect asexuality). Jones and Yarhouse used both sets of scales for their analysis, using commonly recognized standardized questionnaires to determine ratings at Time 1, Time 2 and Time 3.
Probably the weakest link in the design was in relying on self-reports for assessing sexual orientation instead of physiological measures of arousal. They addressed those criticisms this way:
Psychophysiological measures assess sexual arousal and orientation by attaching sensors to the genitals of subjects and measuring sexual arousal while the subjects watch pornography. We judged these methods as pragmatically impossible given the dispersed nature of our sample and the limitations of our funding, as morally unacceptable to the bulk of our research participants, and as not justified in light of current research challenging the reliability and validity of the methods themselves.
These are all legitimate objections as far as penile and vaginal plethysmography are concerned. There are, however, emerging technologies involving MRIs which may be useful for future studies.
Difficulty In Recruiting Participants
While Jones and Yarhouse’s study appears to be very well designed, it quickly falls apart on execution. The sample size was disappointingly small, too small for an effective prospective study. They told a reporter from Christianity Today that they had hoped to recruit some three hundred participants, but they found “many Exodus ministries mysteriously uncooperative.” They only wound up with 98 at the beginning of the study (72 men and 26 women), a population they describe as “respectably large.” Yet it is less than half the size of Spitzer’s 2003 study.
Jones and Yarhouse wanted to limit their study’s participants to those who were in their first year of ex-gay ministry. But when they found that they were having trouble getting enough people to participate (they only found 57 subjects who met this criterion), they expanded their study to include 41 subjects who had been involved in ex-gay ministries for between one and three years. The participants who had been in ex-gay ministries for less than a year are referred to as the “Phase 1” subpopulation, and the 41 who were added to increase the sample size were labeled the “Phase 2” subpopulation.
This poses two critically important problems. First, we just saw Jones and Yarhouse explain that the whole reason they did a prospective study was to reduce the faulty memories of “change experiences that happened in their pasts” — errors which can occur when asking people to go back as far as three years to assess their beginning points on the Kinsey and Shively-DeCecco scales. This was the very problem that Jones and Yarhouse hoped to avoid in designing a prospective longitudinal study, but in the end nearly half of their results ended up being based on retrospective responses.
This diluted the very purpose of doing a longitudinal study, and as Jones and Yarhouse describe it, this also clearly affected the results:
We expected that the results of change would be somewhat less positive in this group (phase 1), as individuals experiencing difficulty with change would be likely to get frustrated or discouraged early on and drop out of the change process. We were able to retain these Phase 1 subjects in our study at the same rate as the whole population, and indeed found that change results from them were a bit less positive.
Left unsaid but clearly implied is the second problem with adding Phase 2 participants. Since they had already hung in there for between one and three years, that subpopulation would not have included those who entered ex-gay ministries at the same time they did but who were discouraged early on and dropped out. It’s no wonder the change results for Phase 1 were less positive than Phase 2. There’s no indication how “less positive” those results were, not in this synopsis anyway. Hopefully the book will break these numbers out.
But in the synopsis at least, the study’s results appear to combine Phase 2 and Phase 1 participants, which represents an unacceptable mixing of prospective (Phase 1) and retrospective (Phase 2) participants. And since the Phase 2 participants make up nearly half the total sample, this ruins any chance of this being a truly prospective study.
Whenever a longitudinal study is being conducted over a period of several years, there are always dropouts along the way. This is common and to be expected. That makes it all the more important to begin the study with a large population. Unfortunately, this one wasn’t terribly large to begin with; it started out at less than half the size of Spitzer’s 2003 study. Jones and Yarhouse report that:
Over time, our sample eroded from 98 subjects at our initial Time 1 assessment to 85 at Time 2 and 73 at Time 3, which is a Time 1 to Time 3 retention rate of 74.5%. This retention rate compares favorably to that of the best “gold standard” longitudinal studies. For example, the widely respected and amply funded National Longitudinal Study of Adolescent Health (or Add Health study) reported a retention rate from Time 1 to Time 3 of 73% for their enormous sample.
The Add Health Study Jones and Yarhouse cite began with 20,745 in 1996, ending with 15,170 during Wave 3 in 2001-2002. But this retention rate of 73% was spread over some 5-6 years, not the three to four years of Jones and Yarhouse’s study.
What’s more, the Add Health study undertook a rigorous investigation of their dropouts and concluded that the dropouts affected their results by less than 1 percent. Jones and Yarhouse didn’t assess the impact of their dropouts, but they did say this:
We know from direct conversation that a few subjects decided to accept gay identity and did not believe that we would honestly report data on their experience. On the other hand, we know from direct conversations that we lost other subjects who believed themselves healed of all homosexual inclinations and who withdrew from the study because continued participation reminded them of the very negative experiences they had had as homosexuals. Generally speaking, as is typical, we lost subjects for unknown reasons.
Remember, Jones and Yarhouse said that those “experiencing difficulty with change would be likely to get frustrated or discouraged early on and drop out of the change process.” So assessing the dropouts becomes critically important because, unlike in the Add Health study, the very reason for dropping out of this study may bear directly on both questions the study was designed to address: Do people change, and are they harmed by the process? With as much as a quarter of the initial population dropping out, potentially for reasons directly related to the study’s questions, this missing analysis represents a likely critical failure, one which could potentially invalidate the study’s conclusions.
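For anyone who wants to check the retention arithmetic, here is a minimal sketch using only the sample sizes quoted above. The function name is my own, and the comparison says nothing about the different time spans of the two studies.

```python
def retention_rate(start: int, end: int) -> float:
    """Percentage of the starting sample still present at the end."""
    return 100.0 * end / start


# Sample sizes as quoted in this review.
jones_yarhouse = retention_rate(98, 73)      # Time 1 to Time 3
add_health = retention_rate(20745, 15170)    # Wave 1 to Wave 3

print(f"Jones and Yarhouse: {jones_yarhouse:.1f}% retained ({98 - 73} dropouts)")
print(f"Add Health:         {add_health:.1f}% retained")
```

The two rates are close, but in a sample of 98 the 25 dropouts are a large enough group to move the results noticeably, depending on why they left.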
Representativeness of Study Participants
Jones and Yarhouse describe their sample’s representativeness in contradictory and confusing terms:
Our study examines a representative sample of the population of those in Exodus seeking sexual orientation change. We cannot be absolutely certain of perfect representativeness, since no scientific evidence exists for describing the parameters of such representativeness. Still, we are confident that our participant pool is a good snapshot of those seeking help from Exodus.
Did you get that? It’s representative, but they can’t prove it. But they’re confident anyway.
When researchers make a sweeping statement, especially one as important as representativeness, they bear the burden of providing evidence to support their claim. If they can’t do that, then they must instead caution that their sample may not be representative and list the reasons why. I don’t think a respected peer-reviewed journal would let Jones and Yarhouse get by with claiming representativeness with nothing to substantiate that claim.
Their synopsis doesn’t describe how members were recruited into the study, so we can’t judge what selection biases may occur during recruitment. (I’m sure the book addresses some of this — I’d be shocked if it didn’t.) Nor do they discuss how their demographics might compare with other measures for ex-gay ministry participant populations. There are conferences, rosters, or simple surveys of ex-gay ministry leaders that they could have culled and compared their demographic data with.
But they didn’t appear to have done this, not according to the synopsis anyway.
I think at least one demographic variable they provided is ample evidence that their sample is not representative: they said that the average age of their sample was 37.50 years. Having asserted that their sample was “fairly representative,” they extrapolated that to the entire Exodus population this way:
The average age was older than we had expected, and its significance should be underscored. There is an unflattering caricature that Exodus groups appeal primarily to young, naïve, confused and sexually inexperienced individuals.
In this statement, Jones and Yarhouse appear to be more interested in defending Exodus’ reputation than in defending their own sample. But when I attended the Exodus Freedom Conference in Irvine, California, in June 2007, I got the distinct impression that the average age of the 800 participants was well under 37 years, perhaps even under thirty. The median age of “strugglers” was certainly close to thirty. The conference audience definitely skewed quite young.
We should keep in mind that Jones and Yarhouse limited their study sample to those over eighteen; their youngest participant ended up being twenty-one. Exodus, on the other hand, allowed registrants for their annual conference to be as young as thirteen, although I don’t think I saw anybody that young there. I did see a large number of teenagers, and an extraordinary number of young people, largely under thirty. Exodus had special programs set aside at that conference for younger people which old fogies like me weren’t allowed to attend. Exodus even operates an entire ministry called Exodus Youth, headed by Scott Davis, which specifically targets young people of high school and junior high ages. The Love Won Out ex-gay conferences also conduct several workshops for youth. And again, some of these workshops are closed to older adults.
Jones and Yarhouse may have had good ethical and methodological reasons for limiting their study to those above the age of eighteen. There are issues of informed consent, and questions would undoubtedly arise as to whether youth who are still under their parents’ direction would feel free to answer questions truthfully. But by limiting the study to those above the age of eighteen, Jones and Yarhouse guaranteed that their study would not be representative of Exodus participants overall.
Results – Is Change Possible?
As Timothy Kincaid already reported, the breakdown of the quantitative results went this way:
- 33 people reported change (moving from homosexual, bisexual or other at Time 1 to heterosexual at Time 3; or homosexual at Time 1 to bisexual or other at Time 3)
- 29 reported no change
- 8 reported “negative change” (moving from heterosexual, bisexual or other at Time 1 to homosexual at Time 3; or from heterosexual at Time 1 to bisexual or other at Time 3).
- 3 reported uncertain change (moving from bisexual to other, or the reverse)
Keep in mind, however, that these results mix the truly prospective participants (Phase 1 participants who began the study during their first year in Exodus ministries) and the retrospective participants (Phase 2 participants who had been in ministries for between one and three years). We don’t know what the mix of these two subpopulations is in the results. Since Jones and Yarhouse already stated that reported change from the prospective Phase 1 group was “a bit less positive,” we know the results aren’t the same. But unless we understand how Phase 1 fared, we don’t know how the results were affected by mixing in people who were asked retrospectively what their beginning orientation was.
These results were derived from standardized measures using the Kinsey and Shively-DeCecco scales. And the Shively-DeCecco scales (remember, these separate homosexual and heterosexual attractions on two separate axes) revealed something particularly interesting:
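As a quick sanity check on the breakdown above, here is a small sketch that tallies the four reported categories. The dictionary labels are my own shorthand; the counts are the ones listed above.

```python
# Counts for the four quantitative outcome categories listed above.
quantitative = {
    "change": 33,
    "no change": 29,
    "negative change": 8,
    "uncertain change": 3,
}

total = sum(quantitative.values())
shares = {label: 100.0 * n / total for label, n in quantitative.items()}

for label, n in quantitative.items():
    print(f"{label:>16}: {n:2d} ({shares[label]:.1f}%)")
print(f"{'total':>16}: {total}")
```

The four categories add up to 73, the Time 3 sample size in the synopsis, so fewer than half of those still in the study at Time 3 reported change in the desired direction. This tally cannot separate Phase 1 from Phase 2, which is exactly the breakdown the synopsis doesn’t provide.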
Changes on the Shively and DeCecco ratings for all three of our analysis followed a stable pattern… We see that change away from homosexual orientation are consistently about twice the magnitude of changes toward heterosexual orientation. It would appear, then, that while change away from homosexual orientation is related to change toward heterosexual orientation, the two are not identical processes. The subjects appear to more easily decrease homosexual attraction than they increase heterosexual attraction. [Emphasis in the original]
In many ways this confirms what many opponents of ex-gay therapy have noted, that attempting to change sexual orientation does not necessarily make someone straight. In fact, this particular finding makes it all the more unlikely, and puts into context Jones and Yarhouse’s characterization of success as “satisfactory, if not uncomplicated, heterosexual adjustment.”
This also, I think, goes a long way toward describing something else. It is often assumed that those who reported the most change were probably bisexual to some degree when starting the change process. To test that theory, Jones and Yarhouse created a subpopulation from their sample that, for want of a better term, they dubbed “The Truly Gay”:
… [T]o be classed as truly gay, subjects must have reported above average homosexual attraction and reported homosexual behavior and reported past embrace of a gay identity. We would emphasize that these were much more rigorous standards than are typically employed in empirical studies to classify research subjects as homosexual. Using this method, 45 out of our total 98 subjects were classed as “Truly Gay,” just less than half the population sample. We expected that the results of change for the Truly Gay subpopulation would be less positive, as these individuals would be those more set and stable in their sexual orientation. This is not what we found. Rather the change reported by the Truly Gay subpopulation was consistently stronger than that reported by others.
It’s unclear to me what they meant by “above average homosexual attraction” in their definition for the “Truly Gay.” Most researchers consider only Kinsey 5’s and 6’s to be “truly gay.” It’s not clear that this is what Jones and Yarhouse did here. By saying “above average homosexual attraction,” do they mean above average for this sample? If so, what was that cut-off? Maybe the book will clear things up. We’ll see.
But let’s assume for a moment that their criteria are valid, and let’s look at this in light of what they noticed about change to begin with: a change away from homosexual attraction at about twice the rate of change toward heterosexual attraction. Looked at this way, the “stronger change” for the “Truly Gay” subpopulation may have occurred simply because there was a greater potential travel along the Kinsey or Shively-DeCecco scales to begin with; many bisexuals would have begun their attempts to change already partway down those paths. And since overall, the best functioning was “satisfactory, if not uncomplicated, heterosexuality,” it appears that for both groups, there was a finite limit short of Kinsey 0 or 1 that few in either group approached.
Qualitative Analysis of Change
So far, we’ve talked about statistical measures of change based on Kinsey and Shively-DeCecco scales. Jones and Yarhouse also described some qualitative analysis, based on open-ended questions about participants’ attractions, experiences and identity. Those results were:
- “Success: Conversion”: These were subjects who reported that they felt their change to be successful and reported substantial reduction in homosexual desire and addition of heterosexual attraction and functioning. 15% (11 of 72) at Time 3 met this standard.
- “Success: Chastity”: These were subjects who reported that their change was successful, and who reported homosexual attraction to be present only incidentally or in a way that does not seem to bring about distress, allowing them to live happily without overt sexual activity. 23% (17 of 72) at Time 3 met this standard.
- “Continuing”: These persons may have experienced modest decreases in homosexual attraction, but were not satisfied with their degree of change and remained committed to the change process. 29% (21 of 72) at Time 3 met this standard.
- “Non-response”: These people experienced no significant change. They had not given up on the change process, but may be confused or conflicted about which direction to turn next. 15% (11 of 72) at Time 3 were in this category.
- “Failure: Confused”: These persons had experienced no significant change and had given up on the change process but without yet embracing gay identity. 4% (3 of 72) at Time 3 were in this category.
- “Failure: Gay identity”: These persons had clearly given up on the change process and embraced a gay identity. 8% (6 of 72) of the sample at Time 3 were in this category.
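Since these categories carry most of the weight in the study’s claims, it’s worth adding them up. The sketch below uses only the counts and the base of 72 given in the list above; the variable names are my own.

```python
# Counts for the six qualitative categories, as listed above.
counts = {
    "Success: Conversion": 11,
    "Success: Chastity": 17,
    "Continuing": 21,
    "Non-response": 11,
    "Failure: Confused": 3,
    "Failure: Gay identity": 6,
}
REPORTED_BASE = 72  # the "of 72" denominator used in the list above

classified = sum(counts.values())
shares = {label: 100.0 * n / REPORTED_BASE for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label:<22} {n:2d}  {shares[label]:.1f}%")
print(f"classified: {classified} of {REPORTED_BASE}")
print(f"not accounted for: {REPORTED_BASE - classified}")
```

The six categories add up to 69, so three of the 72 are not accounted for in the figures as listed here, and the listed percentages sum to 94 rather than 100. I can’t tell from the synopsis where the remaining three belong.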
To further understand what all this means, it would be important to know how the dropouts might have affected these results. As we mentioned earlier, the Adolescent Health study (which Jones and Yarhouse held up as a “gold standard”) made a concerted effort to understand how their dropouts might have affected the results. In doing so, they discovered that fewer than 5% dropped out because of refusal to continue. With that and other information at hand, they were able to determine that their dropouts affected the results by no more than a single percentage point.
Jones and Yarhouse appear to show no similar curiosity, and this represents a very significant failing of their study. In fact, the dropouts might have contributed very significantly towards higher “failure” numbers. But since Jones and Yarhouse appear to be incurious to find out more about this group, we are left in the dark.
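To get a feel for how much the dropouts could matter, here is a rough bracketing exercise. It is my own hypothetical illustration and not anything Jones and Yarhouse did: it counts the two “Success” categories together (11 + 17 of 72) and asks what the success rate among the original 98 would be if every dropout were a non-success, or if every dropout were a success.

```python
def success_bounds(started: int, classified: int, successes: int):
    """Return (worst, observed, best) success percentages.

    worst:    every dropout is counted as a non-success
    observed: dropouts are ignored, as in the reported figures
    best:     every dropout is counted as a success
    These extremes are hypothetical brackets, not estimates.
    """
    dropouts = started - classified
    worst = 100.0 * successes / started
    observed = 100.0 * successes / classified
    best = 100.0 * (successes + dropouts) / started
    return worst, observed, best


# 98 started; 72 is the base used for the qualitative categories;
# 11 "Conversion" plus 17 "Chastity" are counted as successes.
worst, observed, best = success_bounds(started=98, classified=72, successes=11 + 17)
print(f"worst case: {worst:.1f}%  observed: {observed:.1f}%  best case: {best:.1f}%")
```

Under these assumptions the combined success rate could fall anywhere from about 29% to about 55% of the original sample. Neither extreme is likely, but the spread shows why knowing the reasons for dropping out matters so much in a sample this small.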
Outcomes for Harm
Jones and Yarhouse administered the Symptom Checklist-90-Revised (SCL), which they describe as “a respected measure of psychological distress that is often used to measure the effects of psychotherapy.” They report no difference in the SCL scores from Time 1 to Time 3 when compared to others who are undergoing outpatient counseling.
But again, what about the dropouts? Did they report higher SCL scores at Time 1 or Time 2 before dropping out? We don’t know, not from the synopsis anyway. Again, maybe the full book will provide more details. But without this critical information to understand how the dropouts might have affected the results, Jones and Yarhouse cannot confidently conclude that attempting to change produced no harm. At best, they can only conclude that there was no greater degree of distress among those who continued ex-gay therapy when compared to mentally distressed persons undergoing psychological counseling for other issues — and by the way, is that really a legitimate comparison? I think it’s debatable. At any rate, where they chose to look, there was no problem. Where they chose not to look, who knows?
From Jones and Yarhouse’s synopsis of their study, I have a few more questions than answers. Hopefully I’ll get a copy of the full report in a few days. If so, I’ll post a more complete review as time permits. Until then, consider this review a preliminary one.
I’d have to say that I was very impressed with the study’s design, and very disappointed in its execution. Seventy-two participants out of Alan Chambers’ much-repeated “thousands” or “tens of thousands” doesn’t impress me much. I’m especially disappointed with these particular weaknesses:
- Jones and Yarhouse’s insistence that the study is representative of Exodus participants is completely without merit. If Jones and Yarhouse feel free to make such a sweeping claim with no data to support it, one wonders what other sweeping claims they may have made.
- Jones and Yarhouse’s apparent incuriosity towards those who dropped out borders on willful ignorance. Maybe the full book will provide better information in this area, but the synopsis leaves the impression that unlike the Add Health study that they admired, they didn’t try to learn what those dropouts might mean for their results.
- Jones and Yarhouse’s inclusion of those who had been in Exodus member ministries for between one and three years, with that group making up nearly half of the study, turns a significant chunk of what was supposed to be a prospective study into a retrospective one instead. It also misses those who “failed” out of that Phase 2 group before they had a chance to join the study. This is a particularly sloppy failing that most certainly biased the results in favor of more “successes” and fewer “failures.”
We’ve waited quite a long time for a better study than Robert Spitzer’s 2003 effort. This study held great promise based on its initial design, but its conduct left much to be desired. Its rigorous design was not matched by similar rigor in execution. And so we’re still left waiting for that definitive breakthrough ex-gay study. I don’t think this one is it.
Update: Stanton Jones Responds