Saturday, 16 January 2016

Science is 'Other-Correcting'

Les Autres (by Keith Laws)
 “Are you all sitty comftybold two-square on your botty? Then I'll begin.”

Some scientists seem to think that 'publication' represents the final chapter of their story...but it is not ...and has never been. Science is a process, not a thing - its nature is to evolve and some stories require postscripts. I have never been convinced by the ubiquitous phrase 'Science is self-correcting'. Much evidence points to science being conservative and looking less self-correcting and more ego-protecting. It is also not clear why 'self' is the correct description - most change occurs because of the 'other' - Science is other-correcting. What is different from the past, however, is that corrections or postscripts - often in the form of post-publication peer review - now have far greater immediacy, accessibility, exposure and impact than previously.

This post details a saga that transpired when some colleagues and I highlighted data errors in a recent paper by Turkington and colleagues (2014) examining the efficacy of Cognitive Behavioural Therapy (CBT) for psychosis. It is a tale in 6 parts - the original paper and 5 letters published by the Journal of Nervous and Mental Disease ...the latest being published this month.

Montague Terrace in Blue (Scott Walker)

Your eyes ignite like cold blue fire
The scent of secrets everywhere
A fist filled with illusions
Clutches all our cares

My curiosity about this paper was piqued by the impossibility of data reported in Table 2 of the paper (see my Tweet below). A quick glance at the QPR intrapersonal values displays a mean and lower end CIs both of which are -.45 ...something was clearly amiss as the mean effect size cannot have the same value as the lower end 95% confidence interval.
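To see why a point estimate cannot sit on its own confidence limit, here is a minimal Python sketch. It uses the common large-sample approximation for the standard error of Cohen's d in a two-group comparison, which is an assumption on my part and not necessarily the calculation used in the paper. The group sizes are hypothetical and the function name `cohens_d_ci` is purely illustrative.

```python
from math import sqrt
from statistics import NormalDist

def cohens_d_ci(d, n1, n2, level=0.95):
    """Approximate CI for Cohen's d (large-sample normal approximation)."""
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se

# Hypothetical group sizes, chosen only for illustration.
lower, upper = cohens_d_ci(-0.45, n1=20, n2=20)
print(f"d = -0.45, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Whatever sample sizes are plugged in, the interval is built symmetrically around d, so d always falls strictly between the two limits and can never equal the lower one.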

This led to a brief Twitter discussion between myself and three other psychologists (Tim Smits, Daniel Lakens, Stuart Ritchie) and further oddities in the paper quickly emerged. The authors assert that multiple outcome measures revealed a significant benefit of CBT
"Parametric and nonparametric tests showed significant results between baseline and follow-up for all primary and secondary outcomes except for social functioning, self-rated recovery, and delusions" Turkington et al 2014
Curiously however, the paper contains no inferential statistics to support the claims - readers are expected to accept such assertions at face value. Moreover, their claims about significant statistical tests are themselves contradicted by the confidence intervals that they present in their Table 2 - the 95% Confidence Intervals cross zero for every variable and so, would all be non-significant. Add to this some very poorly presented Figures (see Figure 5 below), lacking axis labels and claiming to display standard error bars, which are in fact standard deviation bars

...clearly a paper with basic errors, which escaped the seven authors, the reviewers and the editor! 
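Two of these points can be checked in a few lines of Python. The sketch below is only illustrative: the scores and the example interval are made up, and the helper names are my own. It shows that a standard error is the standard deviation shrunk by the square root of n (so the two kinds of error bar are not interchangeable), and that a 95% interval spanning zero corresponds to a non-significant result at the 5% level.

```python
from math import sqrt
from statistics import stdev

def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))

def ci_excludes_zero(lower, upper):
    """True when a confidence interval lies wholly on one side of zero."""
    return lower > 0 or upper < 0

# Made-up scores, purely for illustration.
scores = [12, 15, 9, 14, 11, 13, 10, 16]
print("SD:", round(stdev(scores), 2), "SE:", round(standard_error(scores), 2))

# A hypothetical interval that spans zero: not significant at the matching alpha.
print(ci_excludes_zero(-0.45, 0.30))
```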
These initial concerns were documented in blog posts by Daniel Lakens How a Twitter HIBAR ends up as a published letter to the editor and Tim Smits "Don't get all psychotic on this paper"
2.  Smits, T., Lakens, D., Ritchie, S. J., & Laws, K. R. (2014). Statistical errors and omissions in a trial of cognitive behavior techniques for psychosis: commentary on Turkington et al. The Journal of Nervous and Mental Disease, 202(7), 566.
Our Twitter discussion coalesced into the above letter, which was published by the Journal of Nervous and Mental Disease ...where we suggested that our concerns
"...mandate either an extensive correction, or perhaps a retraction, of the article by Turkington et al.(2014). At the very least, the authors should reanalyze their data and report the findings in a transparent and accurate manner"
We assumed that the authors might be pleased to correct such errors, especially since the findings might have implications for the psychological interventions offered to people diagnosed with schizophrenia  
3. Turkington, D. (2014). The reporting of confidence intervals in exploratory clinical trials and professional insecurity: a response to Ritchie et al. The Journal of Nervous and Mental Disease, 202(7), 567.
We were somewhat surprised when Professor Turkington alone replied. He incorrectly cites our letter (it was not Ritchie et al but Smits et al), suggesting perhaps that he may not have taken our points quite that seriously...a view underscored perhaps by both the title and the content of his letter. Without apparently reanalysing or checking the data, he asserts:
" I can confirm that the findings have been accurately reported in our published article... In Table 2, the confidence intervals are calculated around Cohen’s d as indicated...The labeling of the axes on Figure 5 is self-evident" Turkington 2014
At this point, Professor Turkington also decided to e-mail the first author Tim Smits with an invite:
“I wonder if you would attend a face to face debate at Newcastle University to debate the issues. Your colleagues McKenna and Laws have already been slaughtered in London.”
Professor Turkington refers here to the 50th Maudsley Debate (CBT for Psychosis has been oversold) - where you can view the video of our apparent 'slaughter'. Tim Smits politely declined, emphasising the importance of Turkington and colleagues actually addressing the errors we had highlighted in their paper. 
William Basinski DIP 4 from Disintegration Loops
4. As a result, we made a request to the authors and the Journal for access to Turkington et al's data, which was kindly provided. Re-analysing their data confirmed all of our suspicions about the errors. We sent our re-analyses to Professor Turkington and his colleagues.
We then sent a second letter to JNMD conveying our re-analyses of their data:
We recalculated the effect sizes, which were quite different from those reported by the authors - in some cases, strikingly so (if you compare our Table 2 below with the one in my Tweet above from Turkington et al). Our re-analysis confirmed that the confidence intervals were incorrect and that every effect size was inflated, by an average of 65%.

At this point we received an email from the JNMD Editor-in-Chief that was clearly not intended for us
"Shall we just publish a correction saying it doesn't alter the conclusions"
Another email from the editor rapidly followed ...asking us to ignore the previous email, which we duly did ...along with some White Bears

For comparison purposes of course, we included Turkington et al's original Table 2 and, in an ironic twist, the Journal of Nervous and Mental Disease invoiced us (106.29 Euros) for reproducing a Table presenting incorrect data. The charge was of course eventually waived following our protest.

Living my Life by Deerhunter
Will you tell me when you find out how to conquer all this fear
I've been spending too much time out on the fading frontier
Will you tell me when you find out how to recover the lost years
I’ve spent all of my time chasing a Fading Frontier
5. At this point, JNMD decided to ask their own biostatistician - rather than the authors - to respond to our letter:
In his letter, Dr Cicchetti attempted to rebuff our critique in the following manner
To be perfectly candid, the reader needs to be informed that the journal that published the Lakens (2013) article, Frontiers in Psychology, is one of an increasing number of journals that charge exorbitant publication fees in exchange for free open access to published articles. Some of the author costs are used to pay reviewers, causing one to question whether the process is always unbiased, as is the desideratum. For further information, the reader is referred to the following Web site:
Cicchetti is here focussing on a paper about effect sizes cited in our letter and written by one of our co-authors - Daniel Lakens. Cicchetti makes the false claim that Frontiers pays its reviewers. This blatant poison-the-well fallacy is an unfortunate attempt to tarnish the paper by Daniel and by implication, our critique, the journal Frontiers and quite possibly much of Open Access.

Unfortunately, Dr Cicchetti failed to respond to any correspondence from us about his letter and his false claim remains in print ...
Beyond this, Dr Cicchetti added little or nothing to the debate, saying that everything rests on
"...the assumption that the revised data analyses are indeed accurate because I [Cicchetti] was not privy to the original data. This seemed inappropriate, given the circumstances."
As remarked here by Tim Smits in his post entitled "How credibility spoiled this mini-lecture on statistics",  this is a little disingenuous. Cicchetti did have access to the data and so, why not comment upon the accuracy of what we said?

Cicchetti's response was further covered by Neuroskeptic in his blog 'Academic Journals in Glass Houses' and by Jim Coyne in his Sordid Tale post

Finally, just this week, JNMD published another letter
Sivec, H. J., Hewit, M., Jia, Z., Montesano, V., Munetz, M. R., & Kingdon, D. (2015). Reanalyses of Turkington et al.(2014): Correcting Errors and Clarifying Findings. The Journal of Nervous and Mental Disease, 203(12), 975-976.

Finally, this month, the original authors - with the notable absence of Turkington - but with 2 added statistical advisors...seem to accept the validity of the points that we made .... up to a point!
"In short, the original presentation of the effect size results were incomplete and inaccurate, and the original published data would not be appropriate for future power calculations or meta-analyses as noted by Smits et al. (2015). However, at the descriptive level the reported conclusions were, in the main, relatively unchanged (i.e., while being cautious to generalize, there were modest positive effects). " Sivec et al 2015
Are the conclusions "relatively unchanged"? I guess it depends whether you think - for example - a doubling of the effect size from .78 to 1.6 for CPRS total symptoms makes much difference? Whether an overall average effect size inflation of 65% by Turkington et al is important or not? Whether future power estimates should incorrectly suggest an average sample size of n=45 or more accurately n=120. The latter point is crucial given the endemic low levels of power in CBT for psychosis studies
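Why an inflated effect size matters for planning can be sketched in Python. The snippet below uses the textbook normal-approximation formula for the sample size per group in a two-sided, two-group comparison; the design, the 'true' effect size of 0.5 and the function name are my assumptions for illustration, not the calculation behind the n=45 and n=120 figures above.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

true_d = 0.5                # hypothetical effect size
inflated_d = true_d * 1.65  # the same effect inflated by 65%

print("n per group, accurate d:", n_per_group(true_d))
print("n per group, inflated d:", n_per_group(inflated_d))
```

Because the required n scales with 1/d², a 65% inflation shrinks the apparent requirement to about 1/1.65², or roughly 37%, of what is really needed. That is the same order as the contrast between n=45 and n=120, although those figures are averages across measures and need not match this simple ratio exactly.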
Regarding incorrect error bars, Sivec et al, now say:
"Consistent with the Smits et al. (2014) original critique, our statistical consultants also found that, for figures reported as histograms with standard error bars, the bars represented standard deviations and not standard errors. Because the figures are used for visual effect and because the bars are intended to communicate variation within the scores, figures were not reproduced with standard errors" Sivec et al 2015
and finally
"In the original article, we reported "Parametric and nonparametric tests showed significant results between baseline and follow-up for all primary and secondary outcomes except for social functioning (PSP), self-rated recovery (QPR), and delusions”  (p. 33). This statement is true only for the parametric tests...To be clear, most of the nonparametric tests were nonsignificant. This was not a critical part of the analyses" Sivec et al 2015
So, after 5 letters, the authors  - or most of them - acknowledge all of the errors - or most of them. Sivec et al obviously want to add a caveat to each 'admission', but the central fact is that the paper remains exactly as it did before we queried it. This means that only somebody who is motivated to plough through the subsequent 5 letters would discover the acknowledged errors and omissions and the implications

Sensitivities regarding post-publication peer review are quite understandable, and perhaps as with the 'replication' initiative, psychologists and journals need to evolve more acceptable protocols for handling these issues. Is post-publication peer review worth the effort - "Yes!" The alternative is a psychology that echoes a progressively distorting noise while the true signal is a fading frontier.


Friday, 6 November 2015

Song for the Siren

There is another future waiting there for you
I saw it different, I must admit
I caught a glimpse, I'm going after it
They say people never change, but that's bullshit, they do
Yes I'm changing, can't stop it now
And even if I wanted I wouldn't know how
Another version of myself I think I found, at last
Yes I'm Changing by Tame Impala

Some 'follow-up' observations to my earlier 'Thoughts about Holes' post on the PACE follow-up study of Chronic Fatigue Syndrome/ME by Sharpe et al 2015.

To recap, after completing their final outcome assessment, some trial participants were offered an additional PACE therapy..."If they were still unwell, they wanted more treatment, and their PACE trial doctor agreed this was appropriate. The choice of treatment offered (APT, CBT, or GET) was made by the patient’s doctor, taking into account both the patient’s preference and their own opinion of which would be most beneficial.” White et al 2011
I have already commented on some of the issues about how these decisions were made, but here I focus on the Supplementary Material for the paper (see particularly Table C at the bottom of this post) and - what I believe to be some unsupported ...or unsupportable inferences made about the PACE findings recently.

Song to the Siren by Tim Buckley
(from the Monkees TV show)
Did I dream you dreamed about me?
Were you here when I was flotsom?
Now my foolish boat is leaning
Broken lovelorn on your rocks

I will start with three recent quotes making key claims about the success of the PACE follow-up findings and discuss the evidence for each claim.

1) First, in a Mental Elf blog this week, (Sir) Professor Simon Wessely rightly details and praises the benefits of randomised controlled trials (RCT), concluding that PACE matches-up quite well. But to extend Prof Wessely's nautical motif, I'm more interested in how 'HMS PACE' was seduced by the song of the Sirens, forced to abandon methodological rigor on the shallow rocky shores of bias and confound.

Prof Wessely states
"There was no deterioration from the one year gains in patients originally allocated to CBT and GET. Meanwhile those originally allocated to SMC and APT improved so that their outcomes were now similar. What isn’t clear is why. It may be because many had CBT and GET after the trial, but it may not. Whatever the explanation for the convergence , it does seem that CBT and GET accelerate improvement, as the accompanying commentary pointed out (Moylan, 2015)."
It seems to me that specific claims for "no deterioration from the one year gains in patients originally allocated to CBT and GET" might be balanced by the equally valid statement that we also saw "no deterioration from the one year gains in patients originally allocated to SMC and APT".  Almost one-third of the CBT and GET groups did, however, receive additional treatments as did the SMC and APT groups.  As I mentioned in my previous post, the mean group scores at follow-up are now a smorgasbord of PACE interventions, meaning that the group means lack... meaning!

At best, the PACE follow-up data might show that deterioration did not occur in the GET and CBT groups to any greater or lesser extent than it did in the SMC and APT groups. The original groupings effectively no longer exist at follow-up and we should certainly not flit between explanations sometimes based on initial randomised groupings and sometimes based on additional nonrandomised therapies.

In terms of deterioration, we know only one thing - one group were close to showing a significantly greater number of patients reporting 'negative change' during follow-up and contrary to the claim, this was the CBT group (see Table D in supplementary materials)

Yes I'm Changing by Tame Impala

2) Second, in the abstract of the PACE paper, Sharpe et al draw conclusions about long-term benefits of the original therapy groups:
"The beneficial effects of CBT and GET seen at 1 year were maintained at long-term follow-up a median of 2·5 years after randomisation. Outcomes with SMC alone or APT improved from the 1 year outcome and were similar to CBT and GET at long-term follow-up, but these data should be interpreted in the context of additional therapies having being given according to physician choice and patient preference after the 1 year trial final assessment." (my italics)

Again, for the reasons just outlined, we cannot infer that CBT and GET maintained any benefits at follow-up any more than we could argue that APT or even the control condition (SMC) maintained their own benefits. The smorgasbord data dish prevents any meaningful inference.

3) Finally, in a Commentary that appeared in Lancet Psychiatry alongside the paper (mentioned by Prof Wessely above), Moylan and colleagues suggest that hypotheses about the benefits of CBT and GET remain 'unproven' but that CBT and GET may accelerate improvement:
"The authors hypothesise that the improvement in the APT and SMC only groups might be attributed to the effects of post-trial CBT or GET, because more people from these groups accessed these therapies during follow-up. However, improvement was observed in these groups irrespective of whether these treatments were received, and thus this hypothesis remains unproven. ....Overall, our interpretation of these results is that structured CBT and GET seems to accelerate improvement of self-rated symptoms of chronic fatigue syndrome compared with SMC or SMC augmented with APT..."

Round and Round by Ariel Pink's Haunted Graffiti
It's always the same, as always
Sad and tongue tied
It's got a memory and refrain
I'm afraid, you're afraid
And we die and we live and we're born again
Turn me inside out
What can I say...
Merry go 'round
We go up and around we go
Moylan et al rightly point out that improvement occurred "irrespective of whether these treatments were received, and thus this hypothesis remains unproven". Although not apparent from the main paper, the supplementary material throws light on this issue; however, Moylan et al are only half right!

We can see in Table C of the supplementary material (see below) that those in the CBT, APT and SMC groups showed significant improvements even when no additional therapies were provided - so, Moylan et al are correct on that score. By contrast, the same cannot be said of the GET group. At follow-up, GET shows no significant benefit on measures of fatigue (CFQ) or of physical function (SF-36PF) ...whether they received additional adequate therapy, partial therapy or indeed, no further therapy!

This is even more interesting when we consider that Table C reveals data for 20 GET patients (16%) who had subsequently received 'adequate CBT' - and it clearly produced no significant benefits on their fatigue or their physical function scores. So, what are we to conclude? That CBT is ineffective? CBT is ineffective following GET? That these patients are 'therapy resistant'? Therapy resistant because they received GET?

Whatever the explanation, GET is the only group to show no improvement during follow-up. Even with no additional therapy, both the SMC controls and the APT group improved, as indeed did CBT. The failure of GET patients to respond to adequate additional CBT therapy is curious, not consistent with the claims made and does not look 'promising' for either GET or CBT.

The inferences described above do appear to be holed below the water line.

1) CBT does not appear to accelerate improvement, at least in people who have previously received GET
2) People who received GET show no continuing improvement post-therapy
and 3) CBT may heighten the incidence of 'negative change' .



Sunday, 1 November 2015

PACE - Thoughts about Holes

This week Lancet Psychiatry published a long term follow-up study of the PACE trial assessing psychological interventions for Chronic Fatigue Syndrome/ME - it is available at the website following free registration

On reading it, I was struck by more questions than answers. It is clear that these follow-up data show that the interventions of Cognitive Behavioural Therapy (CBT), Graded Exercise Therapy (GET) and Adaptive Pacing Therapy (APT) fare no better than Standard Medical Care (SMC). While the lack of difference in key outcomes across conditions seems unquestionable, I am more interested in certain questions thrown up by the study concerning decisions that were made and how data were presented.

A few questions that I find hard to answer from the paper...

1) How is 'unwell' defined? 
The authors state that “After completing their final trial outcome assessment, trial participants were offered an additional PACE therapy if they were still unwell, they wanted more treatment, and their PACE trial doctor agreed this was appropriate. The choice of treatment offered (APT, CBT, or GET) was made by the patient’s doctor, taking into account both the patient’s preference and their own opinion of which would be most beneficial.” White et al 2011

But how was ‘unwell' defined in practice? Did the PACE doctors listen to patient descriptions about 'feeling unwell' at face-value or did they perhaps refer back to criteria from the previous PACE paper to define 'normal' as patient scores being “within normal ranges for both primary outcomes at 52 weeks” (CFS 18 or less and PF 60+) . Did the PACE Doctors exclude those who said they were still unwell but scored 'normally' or those who said they were well but scored poorly? None of this seems any clearer from the published protocol for the PACE trial.

Holes by Mercury Rev
Holes, dug by little moles, angry jealous
Spies, got telephones for eyes, come to you as
Friends, all those endless ends, that can't be
Tied, oh they make me laugh, an' always make me

2) How was additional treatment decided and was it biased?
With regard to the follow-up phase, the authors also state that “The choice of treatment offered (APT, CBT, or GET) was made by the patient’s doctor, taking into account both the patient’s preference and their own opinion of which would be most beneficial”.

But what precisely informed the PACE doctors’ choice and consideration of “what would be most beneficial”?

They say “These choices were made with knowledge of the individual patient’s treatment allocation and outcome, but before the overall trial findings were known”. This is intriguing … The doctors know the starting scores of their patients and the finishing scores at 52 weeks. In other words, the decision-making of PACE Doctors was non-blind, and thus informed by the consequences of the trial and by how they viewed their patients' progress in each of the four conditions.

3) The authors say “Participants originally allocated to SMC in the trial were the most likely to receive additional treatment followed by those who had APT; those originally allocated to the rehabilitative therapies (CBT and GET) were less likely to receive additional treatment. In so far as the need to seek additional treatment is a marker of continuing illness, these findings support the superiority of CBT and GET as treatments for chronic fatigue syndrome.”

The fact that more participants were assigned further treatments following some conditions (SMC, APT) rather than others (CBT, GET) doesn't necessarily imply "support for superiority of CBT and GET" at all. It all depends upon the decision-making process underpinning the choice made by PACE clinicians. The trial has not been clear on whether only those who met criteria for being 'unwell' were offered additional treatment...and what were the criteria? This is especially pertinent since we already know that 13% of patients who were entered into the original PACE trial already met criteria for being 'normal'.

Opus 40 by Mercury Rev
"Im alive she cried, but I don't know what that means"

We know that the decision making of PACE doctors was not blind to previous treatment and outcome.
It also seems quite possible that participants who had initially been randomly assigned to SMC wanted further treatment because they were so evidently dissatisfied with being assigned to SMC rather than an intervention arm of the trial - before treatment, half of the SMC participants thought that SMC was 'not a logical treatment' for them and only 41% were confident about being helped by receiving SMC.
Such dissatisfaction would presumably be compounded by receiving a mid-trial Newsletter saying how great CBT and GET participants were faring! It appears that mid-trial, the PACE team published a newsletter for participants, which included selected patient testimonials stating how much they had benefited from “therapy” and “treatment”. The newsletter also included an article telling participants that the two interventions pioneered by the investigators and being trialled in PACE (CBT and GET) had been recommended as treatments by a U.K. government committee “based on the best available evidence.” (see

So, we also cannot rule out the possibility that the SMC participants were also having to suffer the kind of frustration that regularly makes wait-list controls do worse than they would otherwise have done. They were presumably informed and 'consented' at the start of the trial vis-a-vis the possibility of further (different or same) therapy at the end of the trial if needed? This effectively makes SMC a wait-list control and the negative impact of such waiting in psychotherapy and CBT trials is well-documented (for a recent example

Let us return to the issue of how 'need' (to seek additional treatment) was defined. Undoubtedly the lack of PACE Doctor blinding and the mid-trial newsletters promoting CBT and GET, along with possible PACE Doctor research allegiance, would all accord with greater numbers of CBT (and GET) referrals ...and indeed, with CBT being the only therapy that was further offered to some participants (presumably after not being successful the first time!). The decisions appear to have little to do with patients showing a “need to seek additional treatment” and nothing at all to do with establishing “superiority of CBT and GET as treatments for chronic fatigue syndrome.”


4) Perhaps I have missed something, but group outcome scores at follow-up seem quite strange. To illustrate with an example, does the follow-up SMC mean CFQ = 20.2 (n=115) also include data from 6 participants who switched to APT, 23 to CBT and 14 to GET? If so, how is this any longer labelled as an SMC condition? The same goes for every other condition – they confound follow-up of intervention with change of intervention. What do such scores mean…? And how can we now draw any meaningful conclusions about any outcomes ...under the heading of the initial group to which they were assigned?
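The scale of the problem is easy to tally. A minimal Python sketch using only the figures quoted above, and assuming (which is exactly what I am asking) that the switchers are counted within the n=115:

```python
# Figures quoted above for the group originally allocated to SMC.
n_followed_up = 115
switched = {"APT": 6, "CBT": 23, "GET": 14}

n_switched = sum(switched.values())
share_switched = n_switched / n_followed_up

print(f"{n_switched} of {n_followed_up} ({share_switched:.0%}) received another PACE therapy")
```

On that assumption, 43 of the 115, a little over a third, would contribute scores to the 'SMC' mean after receiving a different therapy.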

Thursday, 28 May 2015

Science & Politics of CBT for Psychosis

"Let me tell you about scientific management
...And the theft of its concealment"
The Fall - Birmingham School of Business  
Recently the British Psychological Society invited me to give a public talk entitled CBT: The Science & Politics behind CBT for Psychosis. In this talk, which was filmed (see link at the bottom), I highlight the unquestionable bias shown by the National Institute of Clinical Excellence (NICE) committee  (CG178) in their advocacy of CBT for psychosis.
The bias is not concealed, but unashamedly served-up by NICE as a dish that is high in 'evidence-substitute', uses data that are past their sell-by-date and is topped-off with some nicely picked cherries. I raise the question of whether committees - with such obvious vested interests - should be advocating on mental health interventions.  
Tim Hecker - Live Room + Live Room Out
I present findings from our own recent meta-analysis (Jauhar et al 2014) showing that three-quarters of all RCTs have failed to find any reduction in the symptoms of psychosis following CBT. I also outline how trials which have used non-blind assessment of outcomes have inflated effect sizes by up to 600%. Finally, I give examples where CBT may have adverse consequences - both for the negative symptoms of psychosis and for relapse rates
Clicking on the image below takes you to the video 
The video is 1 hour in length & is linked to accompanying slides
(note the last 3 mins lack sound)

Friday, 6 June 2014

Meta-Matic: Meta-Analyses of CBT for Psychosis

Meta-analyses are not a 'ready-to-eat' dish that necessarily satisfies our desire for 'knowledge' - they require as much inspection as any primary data paper and indeed, afford closer inspection, as we have access to all of the data. Since the turn of the year, 5 meta-analyses have examined Cognitive Behavioural Therapy (CBT) for schizophrenia and psychosis. The new year started with the publication of our meta-analysis (Jauhar et al 2014) and it has received some comment on the BJP website, which I wholly encourage; however, the 4 further meta-analyses in the last 4 months have received little or no comment, so I will briefly offer my own.

Slow Motion (Ultravox)

1)      Turner, van der Gaag, Karyotaki & Cuijpers (2014) Psychological Interventions for Psychosis: A Meta-Analysis of Comparative Outcome Studies

Turner et al assessed 48 Randomised Controlled Trials (RCTs) involving 6 psychological interventions for psychosis (e.g. befriending, supportive counselling, cognitive remediation); and found CBT was significantly more efficacious than other interventions (pooled together) in reducing positive symptoms and overall symptoms (g= 0.16 [95%CI 0.04 to 0.28 for both]), but not for negative symptoms (g= 0.04 [95%CI -.09 to 0.16]) of psychosis

The one small effect described by Turner et al as robust - for positive symptoms - however became nonsignificant when researcher allegiance was assessed. Turner et al rated each study for allegiance bias along several dimensions, and essentially CBT only reduced symptoms when researchers had a clear allegiance bias in favour of CBT - and this bias occurred in over 75% of CBT studies.

One included study (Barretto et al) did not meet Turner et al's own inclusion criterion of random assignment. Barretto et al state "The main limitations of this study are ...this trial was not truly randomized" (p.867). Rather, patients were consecutively assigned to groups and differed on baseline background variables, such as age of onset being 5 years earlier in controls than in the CBT group (18 vs 23). Crucially, some effect sizes in the Barretto study were large (approx. 1.00 for PANSS total and for BPRS). Being non-random, it should be excluded and, with 95% Confidence Intervals hovering so close to zero, this makes a big difference - I shall return to this Barretto study again below.
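How close to zero those intervals sit can be made concrete by working backwards from the values Turner et al report. The Python sketch below assumes the intervals are symmetric, normal-theory 95% intervals (my assumption; the paper's exact method may differ) and recovers the implied z statistic and two-sided p value. The function name is illustrative only.

```python
from statistics import NormalDist

def z_and_p_from_ci(estimate, lower, upper, level=0.95):
    """Recover the z statistic and two-sided p value implied by a symmetric CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = (upper - lower) / (2 * z_crit)   # half-width divided by the critical value
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Values as reported by Turner et al for positive/overall and negative symptoms.
for label, g, lo, hi in [("positive/overall", 0.16, 0.04, 0.28),
                         ("negative", 0.04, -0.09, 0.16)]:
    z, p = z_and_p_from_ci(g, lo, hi)
    print(f"{label}: z = {z:.2f}, p = {p:.3f}")
```

With a lower limit of only 0.04, it would not take a large shift in the pooled estimate, such as that from removing one trial with a large effect, to bring the interval across zero.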

Translucence (Harold Budd & John Foxx)

2)  Burns, Erickson & Brenner (2014) Cognitive Behavioural Therapy for medication-resistant psychosis: a meta analytic review

Burns et al examined CBT’s effectiveness in outpatients with medication-resistant psychosis, both at treatment completion and at follow-up. They located 16 published articles describing 12 RCTs. Significant effects of CBT were found at post-treatment for positive symptoms (Hedges’ g=.47 [95%CI 0.27 to 0.67]) and for general symptoms (Hedges’ g=.52 [95%CI 0.35 to 0.70]). These effects were maintained at follow-up for both positive and general symptoms (Hedges’ g=.41 [95%CI 0.20 to 0.61] and .40 [95%CI 0.20 to 0.60], respectively).

Wait a moment.... what effect size is being calculated here? Unlike all other CBT for psychosis meta-analyses, these authors attempt to assess pre-test to post-test change rather than the usual end-point differences between groups. Crucially - though not stated in the paper - the change effect size was calculated by subtracting the baseline and endpoint symptom means and then dividing by ...the pooled *endpoint* standard deviation (and not, as we might expect, the pooled 'change SD'). It is difficult to know what such a metric means, but the effect sizes reported by Burns et al clearly cannot be referenced to any other meta-analyses or the usual metrics of small, medium and large effects (pace Cohen).
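The difference between the two denominators can be illustrated with a short Python sketch. It is a simplification with a single group and entirely hypothetical means, SDs and correlations, and it is not a reconstruction of Burns et al's calculation. The SD of change scores depends on the pre-post correlation r, whereas the endpoint SD ignores it.

```python
from math import sqrt

def change_sd(sd_pre, sd_post, r):
    """SD of pre-post change scores given the pre-post correlation r."""
    return sqrt(sd_pre ** 2 + sd_post ** 2 - 2 * r * sd_pre * sd_post)

def d_endpoint_sd(mean_pre, mean_post, sd_post):
    """Mean change divided by the endpoint SD."""
    return (mean_pre - mean_post) / sd_post

def d_change_sd(mean_pre, mean_post, sd_pre, sd_post, r):
    """Mean change divided by the SD of the change scores."""
    return (mean_pre - mean_post) / change_sd(sd_pre, sd_post, r)

# Hypothetical symptom scores: mean falls from 20 to 16, SD of 8 at both time points.
print("endpoint-SD metric:", d_endpoint_sd(20, 16, 8))
for r in (0.2, 0.5, 0.8):
    print(f"change-SD metric, r = {r}:", round(d_change_sd(20, 16, 8, 8, r), 2))
```

With equal SDs the two metrics coincide only when r is exactly 0.5; for any other correlation the same raw change yields a different 'effect size', which is why the two cannot be read against the same benchmarks.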

This meta analysis also included the non-random Barretto et al trial, which again is contrary to the inclusion criteria for this meta analysis; and crucially, Barretto produced - by far - the largest effect size for general psychotic symptoms in this unusual analysis (See forest plot below).



van der Gaag et al examined end-of-treatment effects of individually tailored case-formulation CBT on delusions and auditory hallucinations. They examined 18 studies with symptom specific outcome measures. Statistically significant effect sizes were 0.36 for delusions and 0.44 for hallucinations. When compared to active treatment, CBT for delusions lost statistical significance (0.33), though CBT for hallucinations remained significant (0.49). Blinded studies reduced the effect size in delusions by almost a third (0.24) but unexpectedly had no impact on effect size for hallucinations (0.46).

van der Gaag et al state they excluded studies that "...were not CBTp but other interventions (Chadwick et al., 2009; Shawyer et al., 2012; van der Gaag et al., 2012)". Shawyer et al is an interesting example, as Shawyer and colleagues recognize it as CBT, stating "The purpose of this trial was to evaluate...CBT augmented with acceptance-based strategies". The study also met the criterion of being individual and formulation based.

More importantly, a clear inconsistency emerges, as Shawyer et al was counted as CBT in two other 2014 meta-analyses where van der Gaag is one of the authors. One is the Turner et al meta analysis (described above), where they even classified it as having CBT allegiance bias (see below, far right classification in Turner et al).


And... Shawyer et al is further included in a 3rd meta-analysis of CBT for negative symptoms by Velthorst et al (described below), where van der Gaag & Smit (2 of the 3 co-authors of the present meta-analysis) are again among the authors.

So, some of the same authors considered a study to be CBT in two meta-analyses, but not in a third. Interestingly, the exclusion of Shawyer et al is important because they showed that befriending significantly outperformed CBT in its impact on hallucinations. The effect sizes reported by Shawyer et al themselves at end of treatment for blind assessment (PSYRATS) give advantages of befriending over CBT to the tune of 0.37 and 0.52; and also for distress for command hallucinations at 0.40.

While the exclusion of Shawyer et al seems inexplicable, the inclusion of Leff et al (2013) as an example of CBT is highly questionable. Leff et al refers to the recent 'Avatar therapy' study and nowhere does it even mention CBT. Indeed, in referring to Avatar therapy, Leff himself states that he "jettisoned some strategies borrowed from Cognitive Behaviour Therapy, and developed some new ones".

And Finally...the endpoint CBT advantage of 0.47 for hallucinations in the recent unmedicated psychosis study by Morrison et al (2014) overlooks the fact that precisely this magnitude of CBT advantage existed at baseline i.e. before the trial began...and so, does not represent any CBT group improvement, but a pre-existing group difference in favour of CBT!

Removing the large effect size of .99 for Leff and including Shawyer et al, with a negative effect size of over .5, would clearly alter the picture, as would recognition that the patients receiving CBT in Morrison et al showed no change compared to controls. It would be surprising if the effect then remained significant...

Hiroshima Mon Amour (Ultravox)

4. Velthorst, Koeter, van der Gaag, Nieman, Fett, Smit, Staring, Meijer & de Haan (2014) Adapted cognitive–behavioural therapy required for targeting negative symptoms in schizophrenia: meta-analysis and meta-regression

Velthorst and colleagues located 35 publications covering 30 trials. Their results showed the effect of CBT to be nonsignificant in alleviating negative symptoms, whether as a secondary outcome [Hedges’ g = 0.093, 95% confidence interval (CI) −0.028 to 0.214, p = 0.130] or as a primary outcome (Hedges’ g = 0.157, 95% CI −0.10 to 0.409, p = 0.225). Meta-regression revealed that stronger treatment effects were associated with earlier year of publication and lower study quality.

Aside from the lack of a significant effect, the main finding of this study was that the large effect size of early studies has shrunk massively, reflecting the increasing quality of later studies e.g. more blind assessments.

Finally, Velthorst et al note the presence of adverse effects of CBT - this is most clearly visible if we look at the forest plot below, where 13 of the last 21 studies (62%) show a greater reduction of negative symptoms in the Treatment as Usual group!


Friday, 21 February 2014

The Farcial Arts: Tales of Science, Boxing, & Decay

"Do you think a zebra is a white animal with black stripes, or a black animal with white stripes?"

Why do researchers squabble so much? Sarah Knowles recently posted this interesting question on her blog - it was entitled Find the Gap. The debate arose in the context of the new Lancet paper by Morrison et al looking at the efficacy of CBT in unmedicated psychosis. I would advise taking a look at the post; the comments also raise additional provocative ideas (some of which I disagree with) about how criticism in science should be conducted.

So, why do researchers squabble so much? First, I would replace the pejorative squabble with the less loaded argue. In my view, it is their job to argue, much as it is the job of politicians, husbands and wives, children, everyone - at least in a democracy! Our 'science of the mind', like politics, gives few definitive answers... and so we see a lot of argument and few knock-out blows.

'I am Catweazle' by Luke Haines
 What you see before you is a version of something that may be true...
I am Catweazle, who are you?

But we might ask - why do scientists - and psychologists in particular - rarely land knock-out blows? To carry the boxing analogy further, one reason is that opponents 'cover up', they naturally defend themselves; and to mix metaphors, some may even "park the bus in front of the goal".

And like boxing, science may sometimes seem to be about bravado. Some make claims outside of the peer-reviewed ring with such boldism that they are rarely questioned, while others make ad hominem attacks from the sidelines ...preferring to troll behind a mask of anonymity - more super-zero than super-hero.

A is for Angel Fish  

Some prefer shadow boxing - possibly like clinicians carrying on their practice paying little heed to the squabbles or possibly, even the science. For example, some clinicians claim that the evidence regarding whether CBT reduces the symptoms of psychosis is irrelevant since - in practice - they work on something they call distress (despite its being non-evidenced). Such shadow boxing helps keep you fit, but does not address your true strength - you can never know how strong your intervention is ...until it's pitted against a worthy opponent, as opposed to your own shadow!


Despite this, clashes do emerge between science and practice. Many fondly remember Muhammad Ali and his great fights against the likes of Joe Frazier (less attractive, didn't float like a butterfly). Fewer recall Ali's post-retirement battles, including one with the professional wrestler Inoki - not a fair fight, not a nice spectacle and not decisive about anything at all. This is like the arguments between scientists and practitioners - they have different paradigms, aims and languages, with probably modest overlap - often a no-contest.

Race for the Prize by Flaming Lips

Is 'Normal Science' all about fixed bouts?
We should acknowledge that some bouts are 'fixed', with some judges being biased toward one of the opponents. Again in science, this may happen in contests between an established intervention (e.g. CBT for psychosis, anti-psychotic medication etc) and those arguing that the intervention is not so convincing. Judges are quite likely to be advocates of the traditional therapy, or at least the status quo - this is a part of Kuhnian normal science - most people have trained and work within the paradigm, ignoring the problems until they escalate, finally leading to replacement (paradigm shift). These changes do not occur from knock-out blows but from a war of attrition, with researchers hunkered down in the trenches, possibly advancing and retreating yards over years. What this means is that it's hard to defeat an established opponent - unseating an aging champion requires much greater effort than simply protecting that champion.

This is Hardcore - Pulp
I've seen the storyline played out so many times before.
Oh that goes in there.
Then that goes in there.
Then that goes in there.
Then that goes in there. & then it's over.

Monster-Barring: Protective Ad Hoc Championship Belts
Returning to CBT for psychosis, nobody should expect advocates to throw in the towel - that is not how science progresses. Rather, as the philosopher of science Imre Lakatos argues, we would expect them to start barricading against attacks with their protective belt - adding new layers of ad hoc defence to the core ideas. Adjustments that simply maintain the 'hard core', however, will highlight the research programme as degenerative.
Not a leg to stand on

Of course, the nature of a paradigm in crisis is that ad hoc defences emerge, including examples of what Lakatos calls 'monster barring'. To take an example, CBT for psychosis advocates have seen it as applicable to all people with a schizophrenia diagnosis and, when this is found wanting, the new position becomes: schizophrenia is heterogeneous and we need to determine for whom it works - monster barring protects the hypothesis against counter-examples by making exceptions (not tested and evidenced, of course). This could go on indefinitely: CBT must be delivered by an expert with x years training; CBT works when used in the clinic; CBT works for those individuals rated suitable for CBT; ad infinitum... What happens ultimately is that people lose faith, break ranks, become quiet deserters, join new ascending faiths - nobody wants to stay on a losing team.

Although sometimes, like Sylvester Stallone, scientific ideas make a come-back...spirits rise and everyone gets hopeful again, but secretly we all know that comebacks follow a law of diminishing returns, and that holding on for too long brings increased potential for... harm. A degenerative research program may be harmful because it is a waste of time and resources, because it offers false hope, and because it diverts intelligent minds and funds away from the development of alternatives with greater potential.

"If even in science there is no other way of judging a theory but by assessing the number, faith and vocal energy of its supporters, then this must be even more so in the social sciences: truth lies in power." Imre Lakatos

Queensberry rules
All core beliefs have some acceptable protection, the equivalent of gum shields and a 'box' I suppose, but some want to enter the ring wearing a suit of armour - here I will briefly mention Richard Bentall's idea of rotten cherry picking, which emerged in the comments of the Find the Gap blog. Professor Bentall argues that as researchers can cherry pick analyses (if they don't register those analyses), critics can rotten cherry pick their criticisms, focusing on things that he declares... suit their negative agenda. In essence, he seems to suggest that we ought to define what is acceptable criticism on the basis of what the authors declare as admissible! I have already commented on this idea in the Find the Gap post. Needless to say, in science as in boxing, you cannot be both a participant and the referee!

Spectator sport
Some love nothing more than the Twitter/blog spectacle of two individuals intellectually thumping each other. But for others, just like boxing, science can seem unedifying (a point not lost on some service users). Not everybody likes boxing, and not everybody likes the way that science operates, but both are competitive and, unlike Alice in Wonderland, not everyone is a 'winner'; but then even the apparent losers often never disappear... thus is the farcial arts.

Thursday, 6 February 2014

My Bloody Valentine: CBT for unmedicated psychosis

When I critiqued Morrison et al's exploratory CBT trial with people who stop taking anti-psychotic medication, I promised to write a post on the final study.
Well, it appeared in the Lancet today and a free copy is here. I am not going to describe the study in detail as it is excellently covered in the Mental Elf blog today. Contrary to the fanfare of glowing comments by highly respected schizophrenia/psychosis researchers, I think the paper has so many issues that I may need to write a second post. But I'm keeping it simple here to concentrate on the primary outcome data - symptom change scores on the PANSS.

'Soon' by My Bloody Valentine (Andy Weatherall mix)

The study examines schizophrenia patients who have decided not to take anti-psychotic medications; 37 were randomly assigned to 9 months CBT and 37 assigned to - what the authors call TAU (but is obviously quite an important manner that will become clear below)

What do the primary outcome PANSS scores (total, positive and negative symptoms) reveal?

Table 1. PANSS scores during the intervention (up to 9 months) and follow ups to 18 months

The key questions are:
Do the CBT and TAU groups differ in PANSS scores at the end of the intervention (9 months) and at the end of the study (18 months)? One simple way to address both questions is to calculate the Effect Sizes at 9 months and at 18 months.

9 months
PANSS total = -0.37 (95% CI -0.96 to 0.22)
PANSS positive = -0.18 (95% CI -0.77 to 0.40)
PANSS negative = -0.45 (95% CI -1.04 to 0.14)
Examination of effect sizes at the end of the intervention (9 months) reveals that CBT and TAU groups do not differ significantly on any of the three primary outcome measures (i.e. all CIs cross zero).
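For readers who want to see how an effect size and its confidence interval of this kind can be obtained, here is a minimal Python sketch. It assumes a standardised mean difference (Cohen's d) with the common large-sample approximation for its standard error; the function name `cohens_d_with_ci` and the input numbers are hypothetical, since the group SDs and sample sizes at each time point are not reproduced in this post, and this is not necessarily the exact method used for the figures above.

```python
import math

def cohens_d_with_ci(mean_1, sd_1, n_1, mean_2, sd_2, n_2, z=1.96):
    """Standardised mean difference (group 1 minus group 2) with an
    approximate 95% CI, using the large-sample standard error of d."""
    pooled = math.sqrt(
        ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2)
    )
    d = (mean_1 - mean_2) / pooled
    se = math.sqrt((n_1 + n_2) / (n_1 * n_2) + d**2 / (2 * (n_1 + n_2)))
    return d, d - z * se, d + z * se

# Hypothetical inputs for illustration only (not the trial data).
d, lower, upper = cohens_d_with_ci(10.0, 4.0, 20, 12.0, 4.0, 20)

# An interval that spans zero means the groups do not differ significantly.
crosses_zero = lower < 0 < upper
print(round(d, 2), round(lower, 2), round(upper, 2), crosses_zero)
```

A negative d here simply means group 1 has the lower (better) symptom score, matching the sign convention of the values listed above.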

18 months
PANSS total = -0.75 (95% CI -1.44 to -0.05)
PANSS positive = -0.61 (95% CI -1.27 to 0.05)
PANSS negative = -0.45 (95% CI -1.47 to -0.08)
PANSS positive is nonsignificant; and while the PANSS total and PANSS negative effect sizes are moderately sized, the CI bounds nearest zero sit very close to it (at -0.05 and -0.08), suggesting marginal significance.
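The 'does the CI cross zero?' reading of the 9 and 18 month intervals can be sketched in a few lines of Python. The intervals are the ones quoted in this post; the helper name `excludes_zero` is just an illustrative choice.

```python
# 95% CIs quoted in the post, as (lower, upper) bounds.
intervals = {
    ("9 months", "total"): (-0.96, 0.22),
    ("9 months", "positive"): (-0.77, 0.40),
    ("9 months", "negative"): (-1.04, 0.14),
    ("18 months", "total"): (-1.44, -0.05),
    ("18 months", "positive"): (-1.27, 0.05),
    ("18 months", "negative"): (-1.47, -0.08),
}

def excludes_zero(lower, upper):
    """True when the whole interval lies on one side of zero."""
    return lower > 0 or upper < 0

significant = [key for key, (lo, hi) in intervals.items() if excludes_zero(lo, hi)]

for key in intervals:
    label = "excludes zero" if key in significant else "crosses zero"
    print(key[0], key[1], label)
```

Only the 18 month PANSS total and PANSS negative intervals exclude zero, and both only narrowly.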

A closer inspection of the means shows that the significant differences at 18 months almost certainly reflect an increase in symptom scores for the TAU group rather than a decrease for the CBT group (compare CBT at 9 and 18 months and TAU at 9 and 18 months).

My final and crucial point concerns within group symptom reduction.
Table 2 shows the baseline PANSS scores on primary outcome measures, and it's informative to compare change from baseline within each group (CBT and control).
Table 2. PANSS scores at baseline

If we compare baseline and the end of the intervention 9 months:

PANSS total
CBT group show a reduction from 70.24 to 57.95 = 12.29
TAU group show a reduction from 73.27 to 63.26 = 10.01

PANSS positive
CBT group show a reduction from 20.30 to 16.0 = 4.30
TAU group show a reduction from 21.65 to 17.0 = 4.65

PANSS negative
CBT group show a reduction from 13.54 to 12.50 = 1.04
TAU group show a reduction from 15.49 to 14.26 = 1.23

So, after 9 months of intensive CBT intervention, controls - who don't even receive a placebo - show a greater reduction in positive and negative symptoms!
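The within-group subtraction is easy to re-check. This short Python sketch uses only the baseline and 9 month means quoted above; the variable names are illustrative.

```python
# (baseline mean, 9 month mean) for each PANSS scale and group, as quoted above.
scores = {
    "total": {"CBT": (70.24, 57.95), "TAU": (73.27, 63.26)},
    "positive": {"CBT": (20.30, 16.0), "TAU": (21.65, 17.0)},
    "negative": {"CBT": (13.54, 12.50), "TAU": (15.49, 14.26)},
}

# Reduction = baseline minus 9 month score, rounded to 2 decimal places.
reductions = {
    scale: {group: round(base - end, 2) for group, (base, end) in groups.items()}
    for scale, groups in scores.items()
}

# Scales on which the TAU group improved more than the CBT group.
tau_larger = [scale for scale, r in reductions.items() if r["TAU"] > r["CBT"]]

for scale, r in reductions.items():
    print(scale, r)
print("TAU reduction larger on:", tau_larger)
```

On these figures the TAU reduction is the larger one for the positive and negative scales, while CBT shows the larger reduction on PANSS total.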

Moreover, the 'natural' reduction shown at 9 months by TAU is as large as the reduction shown by the CBT group at the very end of the trial (18 months: PANSS total = 13.77; PANSS positive = 5.67; PANSS negative = 1.01) - no significant difference exists between TAU reduction at 9 months and CBT reduction at 9 or 18 months.

What then have Morrison et al shown?
I would argue that their data show, for the first time, how patients who choose to be unmedicated display fluctuations in symptomatology (as we might expect given they are unmedicated) ...but crucially, these fluctuations are as large as the changes seen in the CBT group. Hence, it is reasonable to ask...have Morrison et al simply documented 'normal fluctuation' in the symptomatology of unmedicated patients ...which has nothing to do with CBT?