The placebo response: Meaning or muddle? Phenomenon or phantasm?
The term ‘placebo response’ has long been, and generally still is, a poorly defined and imprecisely used concept. In so far as there are indeterminate influences on the perceived assessments of ‘outward responses’ to interventions and medications, these are multifarious, depending on: the experimental subjects, the nature of the condition in question, regression to the mean, meaning and expectation, circumstances, measurement methods (objective vs subjective), trial methodologies, and trial durations.
Recent research into such mechanisms includes: the statistical artefact of regression to the mean, the type of assessment (subjective rating vs objective measurement), and psychological factors such as expectation and meaning. It indicates that these mechanisms may each contribute, in characteristic ways, to apparent reductions in symptom severity during the course of a treatment trial.
The current undifferentiated conceptualisation of the placebo notion makes it little more than an incantation convincing people they have protection from unspecified scientific evils and transgressions. The uncritical employment of the placebo notion continues to complicate and damage both medical research and trials performed to establish the benefits of drugs, because it is in part a muddle and in part a phantasm.
“When I use a word,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.” “The question is,” said Alice, “whether you can make words mean so many different things.”
Lewis Carroll (Charles Dodgson, Oxford Don) Through the Looking-Glass
There could be no more endearing author or quotation with which to start this commentary.
This old Jefferson Airplane song, ‘White Rabbit’, is about Alice and is a must-listen — so apposite for a discussion about placebo (the lyrics are cleverly close to the wording in the book ‘Through the Looking-Glass’). It starts (after a nice little bass riff):
One pill makes you larger, one pill makes you small
And the ones mother gives you don’t do anything at all
The conjectured but ill-defined ‘placebo response’ is a central, if deictic, notion* underlying the ‘evidence-based medicine’ enterprise. Accordingly, one would presume that it was a well-defined, well-studied, and well-validated phenomenon — it is not. Far from it. It is a nebulous and ill-defined entity of little proven empirical use or demonstrated heuristic value. Alice might well have gone on to contend that if you do make words mean different things then they come to mean nothing at all. To which a contemporary wag who likes self-referential oxymorons might reply ‘Absolutely Alice’!
*I use the vague word ‘notion’ deliberately, because the ‘placebo response’ cannot be elevated to the status of either a ‘concept’, or a theory: it is not properly ‘conceptualised’, nor sufficiently coherently formulated to justify the epithet ‘theory’.
It is time to make the term ‘placebo response’ redundant.
The natural variation in severity of subjective symptoms over the short-term, seen in many illnesses, especially when of a lesser degree of pathological severity, and which underlies ‘regression to the mean’, is not the same as a response related to psychological meaning/expectation, nor to failed blinding resulting in distortion of subjective ratings — yet these three distinct factors, and others, are uncritically lumped together under the rubric ‘placebo response’. More importantly, they each require different clinical trial methods to disentangle their influence.
If you find it difficult to accept the points above, then just consider the negligible progress made by the thousands of clinical trials of new antidepressants in ‘placebo-controlled’ trials — this has prompted Professor Stahl recently to state that we need a new paradigm [1]. As the ‘Lancet 21 antidepressant meta-analysis’ recently revealed, when people torture these data by the dubious method of meta-analysis, no useful difference or superiority emerges for any of these new drugs — how could it? The situation now exists where virtually every new antidepressant has been ‘proven’ to be superior to placebo, and also to every other antidepressant, in what I have previously dubbed ‘Penrose stairs with drugs’ (i.e. A > B > C > A).
And that has taken approximately 40 years and many thousands of drug trials.
Not exactly a glowing testimony to the success of placebo controls! We need to get back to a broader-based clinical judgement based on more objective outcome measures in real-life clinical situations and in patients with severe biological depression.
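The ‘Penrose stairs’ pattern need not reflect any real differences between the drugs. A toy simulation (all numbers invented purely for illustration) shows that when three ‘drugs’ have identical true efficacy, a series of noisy trials cheerfully produces contradictory rankings:

```python
import random

random.seed(1)

def trial_mean(true_mean, n=100, sd=8.0):
    """Observed mean symptom-score change in one simulated trial arm."""
    return sum(random.gauss(true_mean, sd) for _ in range(n)) / n

# Three 'drugs' with IDENTICAL true efficacy (mean change of -10 points).
TRUE_EFFECT = -10.0
rankings = set()
for _ in range(200):
    observed = {drug: trial_mean(TRUE_EFFECT) for drug in "ABC"}
    # Rank the drugs by their observed (noisy) improvement in this replication.
    rankings.add(tuple(sorted("ABC", key=lambda d: observed[d])))

# Sampling noise alone yields several contradictory orderings of identical drugs.
print(len(rankings))
```

Run enough pairwise comparisons on such data and every drug can be ‘shown’ superior to every other, which is exactly the A > B > C > A absurdity described above.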
After decades of use, the term placebo gets the almost meaning-free medical dictionary definition of ‘a usually pharmacologically inert preparation prescribed more for the mental relief of the patient than for its actual effect on a disorder’. And the SOED definition of ‘placebo effect’ is (do not hold your breath): ‘A beneficial effect produced by a placebo drug or treatment, which cannot be attributed to the properties of the placebo itself, and must therefore be due to the patient’s belief in that treatment.’
The superficial, illogical (‘must therefore be due to the patient’s belief’ is an obvious logical non sequitur), and incorrect nature of both those ‘definitions’ makes my point for me: I requiem meam doleat, as Cicero might have said (‘I rest my case’).
Back to Jefferson Airplane and Alice again:
When logic and proportion have fallen sloppy dead
And the white knight has fallen backwards and the red queen’s off with her head
Remember what the dormouse said
Feed your head, feed your head
What a mess
The key phrase, as above, is ‘apparent reductions of symptom severity during a treatment trial’.
Symptoms do not an illness make*. Or, more precisely: symptoms do not accurately define the pathological severity or progression of an illness.
*Sounds familiar? Aristotle (in translation): ‘One swallow does not a summer make’.
Not only is there no agreement about what a placebo-effect is, but also, the imprecise use of the term has generally only served to muddy the waters of investigation.
I can hear the rising chorus of protest, chanting the mantra of the ‘clinical-trial-faithful’, that the placebo arm of a trial is essential to distinguishing whether a true drug-response is being demonstrated — no, it is not. It is neither essential to the task, nor adequate for the task. Like tits on a bull, in fact.
Remember that Sir Austin Bradford Hill famously stated [2, 3]:
Randomization and blinding [and statistical analysis] of studies is only necessary when treatment effects are small.
I remind readers that Hill’s historic report on the occupational diseases of workers in the cotton mills contained dozens of tables of data, but not one single statistical test.
He stated explicitly that the differences were so obvious that statistics were unnecessary — and the converse is usually true; if statistics are necessary then the differences are probably small and of little real-life consequence.
Or, more prosaically, you do not need an RCT to tell you that parachutes work [4, 5] (spoiler: ‘spoof’ references).
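Hill’s dictum can be put into numbers with the standard two-arm sample-size approximation, n per arm ≈ 2(z_alpha + z_beta)²/d², where d is the standardised effect size. A minimal sketch (the d = 0.3 figure is roughly the drug–placebo difference reported in antidepressant meta-analyses; the formula is the usual textbook approximation, not anything from this article):

```python
# Approximate patients-per-arm for a two-arm trial comparing means:
#   n ~= 2 * (z_alpha + z_beta)**2 / d**2, where d = (true difference) / SD
Z_ALPHA = 1.96  # two-sided significance level of 0.05
Z_BETA = 0.84   # statistical power of 0.80

def n_per_arm(d):
    """Approximate sample size per arm for standardised effect size d."""
    return 2 * (Z_ALPHA + Z_BETA) ** 2 / d ** 2

# A Hill-sized, unmissable effect needs almost no patients or statistics...
print(round(n_per_arm(1.0)))  # 16 per arm
# ...whereas a typical antidepressant-sized effect needs hundreds.
print(round(n_per_arm(0.3)))  # 174 per arm
```

The inverse-square dependence on d is the point: halve the effect and you need four times the patients, which is why trivially small differences require elaborate statistical machinery to be detected at all.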
Indeed, one might note that in 50 years, clinical observations have produced more insights about treatment effects than controlled trials have, something which should not be forgotten (see below for more about how many senior members of the profession, who remember history, have commented on these issues, especially on the website of the ‘International Network for the History of Neuropsychopharmacology’).
What the above illustrates is that trials, and the associated statistical analyses (valid or not), over the last five or more decades have simply established that there are only minor differences between these drugs (these may be relevant in some circumstances and cases). What a waste of half-a-century of research.
This is what the recent much-discussed ‘Lancet 21 antidepressant study’ concluded:
‘Our assessment overall found few differences between antidepressants when all data were considered’.
‘Our information unfortunately cannot guide next-step choices after failure of such a first step (i.e., they do not apply to treatment-resistant depression), for which well performed trials are scarce.’
This is mind-numbing stuff: a stream of words jostles for attention to describe it; inconsequential, trivial, trifling, insignificant, valueless …
If there were substantial and meaningful differences between treatments there would be no necessity* for randomisation, placebo controls, or statistics.
*NB. I am using language precisely: before anyone collapses in spluttering indignation, I am not stating that those procedures are of no use, just that they are not necessary except in special circumstances and for ‘fine-tuning’. But ‘fine-tuning’ is no use when you are on the wrong wavelength.
We are not looking for fine-tuning; rather, as Prof Stahl said, we are looking for a paradigm-shift in treatment effectiveness: a switch of wavebands is needed, not fine-tuning. A definite difference in degree of efficacy is going to be obvious in severely ill patients without all of this investigational paraphernalia — in other words, a return to Roland Kuhn and imipramine (see the ‘International Network for the History of Neuropsychopharmacology’).
So, the conclusions after that massive investment of time, energy, and money are uninformative; and not only are the results unhelpful concerning ‘first-time’ treatment, but also they cannot necessarily be generalised to apply to those who have failed to respond to whatever their primary care doctor gave them, nor to any patient with serious depression, nor to any patient referred to a specialist.
Efficacy and effectiveness are two quite different things; efficacy (in clinical trials) is poorly correlated with (real-world) effectiveness [6, 7].
Classes of errors
Why is it desirable to have a better understanding of what explains the ‘placebo response’?
One element is ‘wishful thinking’ by both doctors and patients; they want improvement to occur, and when the measurement instruments used are subjective, like the interpretation of a statement about the severity of a symptom, it is easy for minor influences on the way people think to alter the severity rating assigned. That need only affect a few questions on a typical rating scale to create the degree of difference indicated in the various meta-analyses, like the Lancet study mentioned above.
Second, there are measurement errors related to the inevitable imprecisions of assigning values to variables. These are necessarily greater for subjective measures. It has been demonstrated that the changes in placebo groups are greater for measures on subjective rating scales than they are for ‘objective’ changes (such as BP measurements, ECGs, etc.). That may not be surprising on reflection, yet it has rarely been considered or accounted for in discussions on this subject.
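The scale of distortion needed is small. A toy simulation of a hypothetical 17-item rating scale (all parameters invented for illustration) shows that a sub-half-point optimism bias on just a handful of subjective items produces a total-score difference of the size commonly reported between drug and placebo arms on depression scales:

```python
import random

random.seed(7)

N_ITEMS = 17       # items on a hypothetical symptom rating scale, scored 0-4
BIASED_ITEMS = 5   # subjective items nudged by a hopeful rater
BIAS = 0.4         # average downward nudge per biased item, in scale points

def total_score(hopeful_rater):
    """Total scale score for one simulated patient assessment."""
    total = 0.0
    for i in range(N_ITEMS):
        item = random.gauss(2.0, 0.8)      # true underlying item severity
        if hopeful_rater and i < BIASED_ITEMS:
            item -= BIAS                   # rater 'sees' slight improvement
        total += min(max(item, 0.0), 4.0)  # clamp to the 0-4 item range
    return total

strict = sum(total_score(False) for _ in range(2000)) / 2000
hopeful = sum(total_score(True) for _ in range(2000)) / 2000
# A fractional nudge on five items shifts the total by roughly 2 points,
# about the drug-placebo difference reported in the meta-analyses.
print(round(strict - hopeful, 1))
```

No deliberate dishonesty is required: a consistent fractional lean on a few ambiguous items is enough to manufacture the whole observed difference.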
Here is a brief list of a few useful and suggestive findings:
The placebo response to objective outcome measures is consistently less than it is for subjective outcome measures
The placebo response is greater when rating scales are assessed by non-medically qualified personnel
The placebo response is greater in milder relapsing-remitting conditions than it is in more severe conditions
The placebo response is significantly related to psychological factors such as the treatment expectations of patients and doctors, and personality
The placebo response is accounted for in part by ‘regression to the mean’, because patients usually present, or are enrolled, because their symptoms are worse than usual, and hence are likely to improve in the short to medium term
The degree of temporal variability in severity of subjective symptoms varies with the severity of the underlying pathology and must be accounted for when assessing the endurance of a ‘placebo’ response. That means the end-point assessment is significantly trial-duration-dependent
When the patient samples are heterogeneous, as they frequently are because of the imprecision of the definition of many conditions, the natural ‘improvement’ with no treatment at all (waiting-list control group) is significant (partly a consequence of regression to the mean, above)
Long-term follow-up tends to be associated with a lower placebo response, possibly both because the psychological optimism that colours initial assessment fades over time, and because regression to the mean is less significant over longer time-spans
A wide time-window over which symptoms are measured allows the inclusion of extraneous data, because pharmacological effects can be predicted to have a specific time course; it thus follows that including data gathered outside that window makes assessments less precise (see below)
The small degree of change in all these comparisons (because the treatments are only weakly effective) makes the above effects of greater relative magnitude
Conversely, if the effects were of a substantial clinical magnitude and relevance these confounding factors would be largely irrelevant
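The regression-to-the-mean point in the list above is easy to demonstrate. In this sketch (all numbers invented for illustration), patients with perfectly stable underlying severity are enrolled only when a noisy day-to-day symptom score crosses an entry threshold, and are then simply retested later with no treatment at all:

```python
import random

random.seed(3)

def observed_score(trait):
    """Observed severity = stable trait + day-to-day fluctuation."""
    return trait + random.gauss(0.0, 5.0)

ENROL_CUTOFF = 20.0  # patients enter the 'trial' only when bad enough today
traits = [random.gauss(18.0, 4.0) for _ in range(20000)]  # stable severities

baseline, follow_up = [], []
for trait in traits:
    entry = observed_score(trait)
    if entry >= ENROL_CUTOFF:                    # enrolled on a bad day
        baseline.append(entry)
        follow_up.append(observed_score(trait))  # retested later, NO treatment

improvement = (sum(baseline) / len(baseline)
               - sum(follow_up) / len(follow_up))
# Untreated patients 'improve' by several points purely because they were
# selected at a transient peak of their fluctuating symptoms.
print(round(improvement, 1))
```

Nothing was treated and nothing changed in any patient’s underlying state; the apparent improvement is manufactured entirely by the enrolment criterion.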
Nature and time-course
To what degree are ‘placebo’ responses enduring, or are they transient? How do they differ between ‘subjective’ and ‘objective’ outcome measures? How do they vary between chronic waxing-and-waning conditions and progressive ones (such as heart failure)? Do they vary according to personality type, social circumstances, and the like? Do they affect severe symptoms (and signs) as much as mild ones … and so on?
As an example of the time-window point made in the list above, consider this experiment: measuring the hypotensive effect of intravenous phentolamine. The time-course of this hypotensive response is very specific, of the order of 1-2 minutes; if the experiment were extended to include BP changes that took place 5-30 minutes after the infusion, the results would become less and less clear. This would be even more pronounced in anxious patients who had a high expectation of a treatment effect. Such results would not elucidate the pharmacological effect of phentolamine.
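The phentolamine example can be made concrete. In this sketch (effect size and timing invented for illustration), a drug whose entire effect occurs in minutes 1-2 all but vanishes when the measurement window is stretched to 30 minutes:

```python
# A drug whose entire hypotensive effect occurs 1-2 minutes post-infusion
# (loosely modelled on the phentolamine example; numbers are illustrative).

def bp_fall(minute):
    """True fall in systolic BP (mmHg) at a given minute after infusion."""
    return 30.0 if 1 <= minute <= 2 else 0.0

pharmacological_window = [bp_fall(m) for m in range(1, 3)]  # minutes 1-2
naive_window = [bp_fall(m) for m in range(0, 31)]           # minutes 0-30

print(sum(pharmacological_window) / len(pharmacological_window))  # 30.0 mmHg
print(sum(naive_window) / len(naive_window))  # ~1.9 mmHg: effect averaged away
```

The wider the measurement window relative to the drug’s true time course, the more the pharmacological signal is swamped by extraneous data, which is precisely the point made above.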
What bedevils much research is excessive reliance on subjective, interim (short-term), unvalidated, proxy measures of improvement. Medium to long-term outcome measures such as suicide, functional recovery, and return to work, are conspicuous by their absence. A ‘placebo’ may improve scores on an imperfectly designed and administered subjective rating-scale, but it probably does not affect the long-term suicide rate, or get patients functioning and back to work. No data. Forty years of trials, and almost no data. It is stretching a point to call that good science.
Add to that difficult-to-define heterogeneous conditions like depression and you have a perfect recipe for placebo confusion.
A little science
When investigating the effect of specific pharmacological agents on pathological processes, the methodologies used should assess those changes as directly as possible and eliminate changes affecting the outcome measures utilised that may be caused by other factors. A key part of this process, which is frequently ignored, is establishing the relationship of the dose, and the duration of its administration, with the outcome (however objectively or subjectively that may be measured).
If the time course of improvement following the administration of the drug varies between one day and six months, then it will be appropriate to question what possible mechanism could account for this variation. That is because we know that drugs have delineated effects which occur within specific timeframes. Changes occurring outside those timeframes are unlikely to be related to any specific mechanism.
Irrespective of how good, or bad, the measuring-instrument being used may be, the changes measured are expected to fall within a delineated time-range; otherwise, any proposed cause-and-effect relationship becomes dubious, to the point of implausibility.
Yet when it comes to antidepressant drugs this universal scientific principle seems to have been forgotten, or ignored.
Is the entity (the ‘diagnosed condition’) a heterogeneous one? Is its aetiology established?
Is the natural history known?
Is the diagnosis established by subjective means (rating scales) or by objective tests (e.g. blood tests)?
Investigations concerning the placebo effect
There are now some data on how the placebo response affects ‘subjective’ assessments relative to ‘objective’ ones [8, 9] — for science pedants out there, I should mention that I am not stating there is an absolute distinction between the categories of subjective and objective, but there is clearly a major relative difference between a rating scale and a measurement of the blood level of uric acid, or a BP measurement. That is reflected in work suggesting substantially greater apparent improvement in placebo groups when assessed by subjective ratings, compared to objective ones. It is hardly news that both doctors and patients have hopes and expectations of benefit from treatments, and that, despite attempts, completely successful ‘blinding’ of active treatments is hard to achieve. When the differences being found are of a minor degree (as they mostly are) such factors assume a relevance and importance they would not have if the treatment effect were large (cf. Hill).
Importantly, most trials have not included a ‘no treatment’ arm to control for the contributions of natural disease progression, regression to the mean, or other outside influences.
The outward appearance and subjective symptoms of many conditions wax and wane over time, even if the underlying pathology is inexorably progressive. In chronic conditions where this is more pronounced, like rheumatoid arthritis, multiple sclerosis, or anaemia (if ‘untreatable’), we do not expect a close and consistent short-term correlation between disease progression and symptoms (whether treated or untreated). Such conditions are more likely to present for treatment (or be recruited into trials) during a period of perceived symptom exacerbation, thus regression to the mean will inevitably occur. Indeed, when I used to lecture on the MRC course in London 40 years ago, one of my lectures was entitled ‘Baked bean therapy’. In it I explained how to ‘prove’ that baked beans cured any chronic relapsing-remitting condition.
‘The dose of baked beans required for benefit to occur varies quite a lot from patient to patient (“it is all to do with metabolism, you know”), so one has to start with a small dose for a couple of weeks, and then increase it in two or three stages if improvement does not occur after a couple of weeks at any particular stage.’
Inevitably, at some point in this process, a majority of patients will enjoy an improvement. At that point the essential step is to hold the dose the same for only a short time (but longer if necessary!), before decreasing it slightly when improvement is evident. Then, whatever happens next, you’re on a winner. If the patient gets worse, that proves they needed a slightly bigger dose; and if they don’t, it means the baked beans are working really well … and so on.
This ensures success in the majority of cases, and it is what explains the apparent improvement in many treated cases of depression, where the doctor often deceives himself before next deceiving the patient.
It would be absurd to expect, in cases of anaemia, a consistently good short-term correlation between haemoglobin and tiredness and weakness assessed on a self-rating scale. Yet this is the sort of lax methodology employed in depression trials.
Thus, we already have two substantive mechanisms by which an apparent improvement may appear to have occurred when no specific treatment has been given: 1) regression to the mean, and 2) misassessment of subjective symptoms whilst failing to measure the underlying pathology or functional outcome.
Kirsch [10] has discussed the subtle psychological aspects of this.
As Benedetti (2014) has reminded us, “there is not one single placebo effect, but many” (p. 623).
The distinction between response expectancies and stimulus expectancies is particularly important for understanding placebo effects. Response expectancies are stronger, more stable, and more resistant to extinction.
Additionally, techniques like 24-hr ambulatory blood pressure monitoring have been shown to increase reproducibility [13–15] and yield lower estimations of placebo response [16–19].
Example of sham ECT for the first 2 or 4 sessions and the effect on time to response.
1. Stahl, S.M. and G.D. Greenberg, Placebo response rate is ruining drug development in psychiatry: why is this happening and what can we do about it? Acta Psychiatr Scand, 2019. 139(2): p. 105-107.
2. Hill, A.B., The Environment and Disease: Association or Causation? Proc R Soc Med, 1965. 58: p. 295-300.
3. Worrall, J., Causality in medicine: getting back to the Hill top. Prev Med, 2011. 53(4-5): p. 235-8.
4. Smith, G.C. and J.P. Pell, Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials. BMJ, 2003. 327(7429): p. 1459-61.
5. Yeh, R.W., et al., Parachute use to prevent death and major trauma when jumping from aircraft: randomized controlled trial. BMJ, 2018. 363: p. k5094.
6. Singal, A.G., P.D. Higgins, and A.K. Waljee, A primer on effectiveness and efficacy trials. Clin Transl Gastroenterol, 2014. 5: p. e45.
7. Boissel, J.P., et al., From Clinical Trial Efficacy to Real-Life Effectiveness: Why Conventional Metrics do not Work. Drugs Real World Outcomes, 2019. 6(3): p. 125-132.
8. Wechsler, M.E., et al., Active albuterol or placebo, sham acupuncture, or no intervention in asthma. N Engl J Med, 2011. 365(2): p. 119-26.
9. Ramtvedt, B.E. and K. Sundet, Relationships Between Computer-Based Testing and Behavioral Ratings in the Assessment of Attention and Activity in a Pediatric ADHD Stimulant Crossover Trial. The Clinical Neuropsychologist, 2014. 28(7): p. 1146-1161.
10. Kirsch, I., Response expectancy and the placebo effect, in International review of neurobiology. 2018, Elsevier. p. 81-93.