Saturday, 16 January 2016

Science is 'Other-Correcting'

Les Autres (by Keith Laws)
 “Are you all sitty comftybold two-square on your botty? Then I'll begin.”

Some scientists seem to think that 'publication' represents the final chapter of their story... but it is not, and never has been. Science is a process, not a thing: its nature is to evolve, and some stories require postscripts. I have never been convinced by the ubiquitous phrase 'Science is self-correcting'. Much evidence points to science being conservative, looking less self-correcting and more ego-protecting. Nor is it clear why 'self' is the right word - most change occurs because of the 'other'. Science is other-correcting. What is different from the past, however, is that corrections or postscripts - often in the form of post-publication peer review - now have far greater immediacy, accessibility, exposure and impact than before.

This post details a saga that transpired when some colleagues and I highlighted data errors in a recent paper by Turkington and colleagues (2014) examining the efficacy of Cognitive Behavioural Therapy (CBT) for psychosis. It is a tale in 6 parts - the original paper and 5 letters published by the Journal of Nervous and Mental Disease, the latest published this month.

Montague Terrace in Blue (Scott Walker)

Your eyes ignite like cold blue fire
The scent of secrets everywhere
A fist filled with illusions
Clutches all our cares

My curiosity about this paper was piqued by the impossibility of the data reported in Table 2 of the paper (see my Tweet below). A quick glance at the QPR intrapersonal values shows a mean effect size and a lower-end 95% CI that are both -.45. Something was clearly amiss: the mean effect size cannot take the same value as the lower bound of its own 95% confidence interval.
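To see why that is impossible: a 95% CI around Cohen's d is (approximately) d ± 1.96 × SE, and the SE of d is strictly positive, so the lower bound must always fall below d itself. A minimal sketch - the group sizes of n = 30 are assumed for illustration and are not taken from the paper:

```python
import math

# Approximate 95% CI around Cohen's d for two independent groups,
# using the standard large-sample formula for the SE of d.
def cohens_d_ci(d, n1, n2, z=1.96):
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

# Hypothetical group sizes; -.45 is the reported QPR intrapersonal d
lower, upper = cohens_d_ci(-0.45, 30, 30)
# Because se > 0, lower < d always holds: the lower bound of the
# interval can never equal the point estimate itself.
```

Whatever sample sizes are plugged in, the lower bound lands strictly below -.45, which is why a table cell reporting both as -.45 cannot be right.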

This led to a brief Twitter discussion between myself and three other psychologists (Tim Smits, Daniel Lakens and Stuart Ritchie), and further oddities in the paper quickly emerged. The authors assert that multiple outcome measures revealed a significant benefit of CBT:
"Parametric and nonparametric tests showed significant results between baseline and follow-up for all primary and secondary outcomes except for social functioning, self-rated recovery, and delusions" Turkington et al 2014
Curiously, however, the paper contains no inferential statistics to support these claims; readers are expected to accept such assertions at face value. Moreover, the claims about significant statistical tests are contradicted by the confidence intervals presented in their own Table 2: the 95% confidence intervals cross zero for every variable, so every comparison would be non-significant. Add to this some very poorly presented figures (see Figure 5 below), which lack axis labels and claim to display standard error bars that are in fact standard deviation bars

...clearly a paper with basic errors, which escaped the seven authors, the reviewers and the editor! 
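On the confidence-interval point, a quick sketch may help: for a two-tailed test at α = .05, a 95% CI for a (paired) difference spans zero exactly when |t| falls below the critical value - that is, exactly when the test is non-significant. The scores below are made up purely for illustration:

```python
import math
import statistics

# Hypothetical paired scores (baseline vs. follow-up) for illustration
# only - these are not Turkington et al.'s data.
baseline = [52, 48, 55, 50, 47, 53, 49, 51, 46, 54]
followup = [51, 49, 53, 50, 48, 52, 50, 50, 47, 52]

diffs = [b - f for b, f in zip(baseline, followup)]
mean_d = statistics.mean(diffs)
se = statistics.stdev(diffs) / math.sqrt(len(diffs))
t_stat = mean_d / se
t_crit = 2.262  # two-tailed .05 critical value of t for df = 9

ci = (mean_d - t_crit * se, mean_d + t_crit * se)
crosses_zero = ci[0] < 0 < ci[1]
nonsignificant = abs(t_stat) < t_crit
# These two booleans always agree: a 95% CI that spans zero IS a
# non-significant two-tailed test at alpha = .05, by construction.
```

So a table of CIs that all cross zero and a claim of significance for (almost) every outcome cannot both be true.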
These initial concerns were documented in blog posts by Daniel Lakens ('How a Twitter HIBAR ends up as a published letter to the editor') and Tim Smits ('Don't get all psychotic on this paper').
2.  Smits, T., Lakens, D., Ritchie, S. J., & Laws, K. R. (2014). Statistical errors and omissions in a trial of cognitive behavior techniques for psychosis: commentary on Turkington et al. The Journal of Nervous and Mental Disease, 202(7), 566.
Our Twitter discussion coalesced into the above letter, published by the Journal of Nervous and Mental Disease, in which we suggested that our concerns
"...mandate either an extensive correction, or perhaps a retraction, of the article by Turkington et al.(2014). At the very least, the authors should reanalyze their data and report the findings in a transparent and accurate manner"
We assumed that the authors might be pleased to correct such errors, especially since the findings might have implications for the psychological interventions offered to people diagnosed with schizophrenia.
3. Turkington, D. (2014). The reporting of confidence intervals in exploratory clinical trials and professional insecurity: a response to Ritchie et al. The Journal of Nervous and Mental Disease, 202(7), 567.
We were somewhat surprised when Professor Turkington alone replied. He cites our letter incorrectly (it was Smits et al, not Ritchie et al), suggesting perhaps that he had not taken our points quite that seriously - a view underscored by both the title and the content of his letter. Without apparently reanalysing or checking the data, he asserts:
" I can confirm that the findings have been accurately reported in our published article... In Table 2, the confidence intervals are calculated around Cohen’s d as indicated...The labeling of the axes on Figure 5 is self-evident" Turkington 2014
At this point, Professor Turkington also decided to e-mail the first author, Tim Smits, with an invitation:
“I wonder if you would attend a face to face debate at Newcastle University to debate the issues. Your colleagues McKenna and Laws have already been slaughtered in London.”
Professor Turkington refers here to the 50th Maudsley Debate (CBT for Psychosis has been oversold) - where you can view the video of our apparent 'slaughter'. Tim Smits politely declined, emphasising the importance of Turkington and colleagues actually addressing the errors we had highlighted in their paper. 
William Basinski DIP 4 from Disintegration Loops
4. As a result, we made a request to the authors and the Journal for access to Turkington et al.'s data, which was kindly provided. Re-analysing their data confirmed all of our suspicions about the errors. We sent our re-analyses to Professor Turkington and his colleagues.
We then sent a second letter to JNMD conveying our re-analyses of their data:
We recalculated the effect sizes, which were quite different from those reported by the authors - in some cases, strikingly so (compare our Table 2 below with the one in my Tweet above from Turkington et al). Our re-analysis confirmed that the confidence intervals were incorrect and that every effect size was inflated, by an average of 65%.

At this point we received an email from the JNMD Editor-in-Chief that was clearly not intended for us:
"Shall we just publish a correction saying it doesn't alter the conclusions"
Another email from the editor rapidly followed, asking us to ignore the previous email - which we duly did... along with some White Bears.

For comparison purposes, of course, we included Turkington et al.'s original Table 2 and, in an ironic twist, the Journal of Nervous and Mental Disease invoiced us (106.29 Euros) for reproducing a table presenting incorrect data. The invoice was, of course, eventually waived following our protest.

Living my Life by Deerhunter
Will you tell me when you find out how to conquer all this fear
I've been spending too much time out on the fading frontier
Will you tell me when you find out how to recover the lost years
I’ve spent all of my time chasing a Fading Frontier
5. At this point, JNMD decided to ask their own biostatistician - rather than the authors - to respond to our letter:
In his letter, Dr Cicchetti attempted to rebut our critique in the following manner:
To be perfectly candid, the reader needs to be informed that the journal that published the Lakens (2013) article, Frontiers in Psychology, is one of an increasing number of journals that charge exorbitant publication fees in exchange for free open access to published articles. Some of the author costs are used to pay reviewers, causing one to question whether the process is always unbiased, as is the desideratum. For further information, the reader is referred to the following Web site:
Cicchetti is here focussing on a paper about effect sizes cited in our letter and written by one of our co-authors, Daniel Lakens. Cicchetti makes the false claim that Frontiers pays its reviewers. This blatant poison-the-well fallacy is an unfortunate attempt to tarnish Daniel's paper and, by implication, our critique, the journal Frontiers and quite possibly much of Open Access.

Unfortunately, Dr Cicchetti failed to respond to any of our correspondence about his letter, and his false claim remains in print...
Beyond this, Dr Cicchetti added little or nothing to the debate, saying that everything rests on
"...the assumption that the revised data analyses are indeed accurate because I [Cicchetti] was not privy to the original data. This seemed inappropriate, given the circumstances."
As Tim Smits remarked in his post entitled "How credibility spoiled this mini-lecture on statistics", this is a little disingenuous: Cicchetti did have access to the data, so why not comment on the accuracy of what we said?

Cicchetti's response was further covered by Neuroskeptic in his blog 'Academic Journals in Glass Houses' and by Jim Coyne in his Sordid Tale post

Finally, just this week, JNMD published another letter:
Sivec, H. J., Hewit, M., Jia, Z., Montesano, V., Munetz, M. R., & Kingdon, D. (2015). Reanalyses of Turkington et al. (2014): Correcting errors and clarifying findings. The Journal of Nervous and Mental Disease, 203(12), 975-976.

This month, the original authors - with the notable absence of Turkington, but with two added statistical advisors - seem to accept the validity of the points that we made... up to a point!
"In short, the original presentation of the effect size results were incomplete and inaccurate, and the original published data would not be appropriate for future power calculations or meta-analyses as noted by Smits et al. (2015). However, at the descriptive level the reported conclusions were, in the main, relatively unchanged (i.e., while being cautious to generalize, there were modest positive effects). " Sivec et al 2015
Are the conclusions "relatively unchanged"? I guess it depends on whether you think - for example - that a doubling of the effect size for CPRS total symptoms, from .78 to 1.6, makes much difference; whether an overall average effect-size inflation of 65% by Turkington et al is important; or whether future power estimates should incorrectly suggest an average sample size of n=45 rather than, more accurately, n=120. The latter point is crucial given the endemic low levels of power in CBT for psychosis studies.
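The power point can be sketched with the textbook normal-approximation formula for a two-sample t-test (α = .05, power = .80). This is illustrative only - it is not the authors' actual calculation, and the n=45 vs n=120 figures above are averages across multiple outcome measures - but it shows how an inflated d shrinks the apparent required sample size quadratically:

```python
import math

# Per-group n for a two-sample t-test via the normal approximation:
# n = 2 * ((z_alpha + z_beta) / d)^2, with z values for a two-tailed
# alpha of .05 (1.96) and power of .80 (0.8416).
def n_per_group(d, z_alpha=1.96, z_beta=0.8416):
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

n_inflated = n_per_group(1.6)   # reported (inflated) CPRS effect size
n_correct = n_per_group(0.78)   # recalculated effect size
```

A trial powered on the inflated d would recruit only a fraction of the participants actually needed - which is exactly how overstated effect sizes feed the endemic underpowering of subsequent studies.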
Regarding the incorrect error bars, Sivec et al now say:
"Consistent with the Smits et al. (2014) original critique, our statistical consultants also found that, for figures reported as histograms with standard error bars, the bars represented standard deviations and not standard errors. Because the figures are used for visual effect and because the bars are intended to communicate variation within the scores, figures were not reproduced with standard errors" Sivec et al 2015
and finally
"In the original article, we reported "Parametric and nonparametric tests showed significant results between baseline and follow-up for all primary and secondary outcomes except for social functioning (PSP), self-rated recovery (QPR), and delusions”  (p. 33). This statement is true only for the parametric tests...To be clear, most of the nonparametric tests were nonsignificant. This was not a critical part of the analyses" Sivec et al 2015
So, after 5 letters, the authors - or most of them - acknowledge all of the errors - or most of them. Sivec et al obviously want to add a caveat to each 'admission', but the central fact is that the paper remains exactly as it was before we queried it. This means that only somebody motivated to plough through the subsequent 5 letters would discover the acknowledged errors and omissions, and their implications.

Sensitivities regarding post-publication peer review are quite understandable and, perhaps as with the 'replication' initiative, psychologists and journals need to evolve more acceptable protocols for handling these issues. Is post-publication peer review worth the effort? Yes! The alternative is a psychology that echoes a progressively distorting noise while the true signal is a fading frontier.



  1. Carolyn Wilshire, 16 January 2016 at 16:42

    Fascinating and shocking story, Keith. And a nice illustration of how the failure to respond appropriately to a problem can end up being more damaging for all concerned than the original problem itself.

    I notice the two primary authors are MDs, not PhDs, and it also appears from their letters that they rely on others to do their statistics for them, without fully understanding the results (even simple concepts like CIs). I wonder if this is part of the problem too? Perhaps non-research professionals wishing to publish need to ensure they are appropriately trained in research basics first.

  2. Sadly, this is pretty much the norm - a whistle-stop tour of PubPeer uncovers large numbers of queries that remain unanswered. While you are right that publication is the start of a process, journals, authors, institutions and evaluation systems (from job interview to university 'performance') consider publication to be the final step.

  3. It is an old model of how scholarship works, which is that each scholar builds a block and sets it down, and others add blocks to it until you have an edifice - vs. what you describe, a model of knowledge as a river, a process. Also the model of knowledge as being impacted by perspective (not controlled by, just impacted by) so that it's valuable to have others look at what you've been doing.

    In the first model, criticism is combative, because either my block is the one that belongs there, or yours is. If you win, I lose. You can BUILD on my block - you can complement me and say you're adding to what I've done - but you can't CONTEST that my block belongs there. Worse, the researcher may feel it necessary to fight all critics to keep control over their block of knowledge.

    In the second and third models (knowledge as a flow; knowledge as impacted by perspective), there's room to learn from each other, to change as we go along, without being "defeated."

    As in the current Lancet/PACE controversy, when you try to barricade your block of knowledge from all criticism, you only make matters worse. (Something that Nixon learned the hard way.)