Does your favourite policy work? Toss a coin to find out

My AFR oped today is on randomised policy trials, with a particular discussion of what I think is the most fascinating randomised trial now in place in Australia – the Head Injury Retrieval Trial. I’m grateful to commenter Mark, who first drew HIRT to my attention, to Nicholas Gruen for comments on an earlier draft, and to Alan Garner for taking the time to talk with me about it. Full text over the fold.

A Good Test of Public Policy, Australian Financial Review, 8 April 2008

To get a new drug approved in most developed countries, it is necessary to show that it works in a randomised trial. Yet to get a new policy approved, politicians need no evidence of efficacy. Consequently, while we can be confident that most pharmaceuticals work as intended, it is quite possible that some of our social policies do more harm than good.

To understand why medical scientists rely so heavily on randomised trials, we need to go back to the purpose of an evaluation. In judging the effectiveness of any intervention, we want to know the counterfactual: what would have happened if we had not intervened? In the case of a new pharmaceutical, those who choose to take a drug are probably different from those who choose not to take it. Perhaps pill-poppers worry more about their health, or maybe they live closer to the doctor. If so, then those who chose not to take the drug are a bad comparison group for those who actually took the drug.

Enter the randomised trial. By assigning participants to the treatment and control group with the toss of a coin, we can be sure that the characteristics of both groups are identical at the start of the trial. So at the end of the experiment, any differences in outcomes must be due to the intervention.
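The coin-toss logic above can be sketched in a few lines of code. This is purely an illustrative simulation (the "health worry" score and sample size are invented): it shows that random assignment leaves the two groups with near-identical characteristics, which is exactly why any later difference in outcomes can be pinned on the intervention.

```python
import random

random.seed(0)

# Hypothetical illustration: 10,000 participants, each with a "health
# worry" score that would bias a self-selected comparison group.
participants = [random.gauss(50, 10) for _ in range(10_000)]

treatment, control = [], []
for score in participants:
    # The coin toss: assignment is independent of every characteristic.
    (treatment if random.random() < 0.5 else control).append(score)

def mean(xs):
    return sum(xs) / len(xs)

# The two group means come out nearly identical, so the groups are
# comparable at the start of the trial.
print(round(mean(treatment), 1), round(mean(control), 1))
```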

What works in the laboratory can also work in many areas of policy. Here, the power of randomised trials lies in two things. From a statistical standpoint, they are regarded as the ‘gold standard’ of policy evaluation, beloved by policy wonks. And from a policymaking standpoint, randomised trials are the simplest form of evaluation, providing compelling results in a simple graph.

In the policy arena, the United States has conducted many more randomised trials than any other country. For example, one of the reasons that early childhood intervention is so high on the policy agenda is the results from the Perry Preschool program. For social researchers seeking to understand neighbourhood effects, there is no better source of evidence than the five-city Moving to Opportunity experiment. Many of the early insights about health insurance came from the RAND Health Insurance Experiment. And wage subsidy programs rapidly gained ground after the National Supported Work Demonstration was conducted.

Randomised policy trials can also show up policy failure. A randomised evaluation of the US Job Training Partnership Act found that job training for low-skilled youths did not make them more employable. Randomised evaluations of pre-licence driver education programs have found no evidence that such programs make youths into safer drivers. And DARE, a school-based anti-drugs program, was revised following randomised trials showing that the program did not deliver promised results.

One excuse that Australian policymakers sometimes give for failing to conduct randomised trials is that they cannot face the ethical dilemma of denying some people a potentially beneficial new program. But here again, the policymakers can learn from medical researchers.

For the past two years, an NRMA CareFlight team, led by Alan Garner, has been running the Head Injury Retrieval Trial, which aims to answer two important questions: Are victims of serious head injuries more likely to recover if we can get a trauma physician onto the scene instead of a paramedic? And can we justify the extra expense of sending out a physician, or would the money be better spent in other parts of the health system?

To answer these questions, Garner’s team is running a randomised trial. When a Sydney 000 operator receives a report of a serious head injury, a coin is tossed. Heads, you get an ambulance and a paramedic. Tails, you get a helicopter and a trauma physician. Once five hundred head injury patients have gone through the study, the experiment will cease and the results will be analysed.

Although he has spent over a decade working on the trial, even Garner himself admits that he doesn’t know what to expect from the results. “We think this will work”, he told me in a phone conversation last week, “but so far, we’ve only got data from cohort studies”. Indeed, he points out that “like any medical intervention, there is even a possibility that sending a doctor will make things worse. I don’t think that’s the case, but [until HIRT ends] I don’t have good evidence either way.”

For anyone who has heard policymakers confidently proclaim their favourite new idea, what is striking about Garner is his willingness to run a rigorous randomised trial, and listen to the evidence. Underlying the HIRT is a passionate desire to help head injury patients, a firm commitment to the data, and a modesty about the extent of our current knowledge. What area of Australian public policy could not benefit from a little more of this kind of thinking?

Andrew Leigh is an economist in the Research School of Social Sciences at the Australian National University.

This entry was posted in Economics Generally.

4 Responses to Does your favourite policy work? Toss a coin to find out

  1. conrad says:

    The obvious reason we aren’t all using randomized designs for policy/intervention programs etc. has nothing to do with restricting unevaluated programs — it’s because they’re extremely costly to do (especially if we’re talking about double-blind stuff and designs where the policy/experiment designer doesn’t run the program), which is one of the reasons drugs are so expensive.

    There are three other reasons:
    1) For smaller (i.e., most) programs, it’s too time-consuming, and if you’re just a small-fry type researcher, you’ll never get your program run/written up and published within your grant’s lifespan, so it’s simply impractical and you won’t get funded next time.
    2) Even if you do really good work in that area, it is unlikely to get cited much, because people won’t have the time/money to continue or replicate the work. This means you won’t get funding again because you are now a low-impact person.
    3) There’s too much risk involved. If your project happens not to work, you’ll never get it published, and hence you won’t get funded again.

  2. DLB says:

    I take Conrad’s points, but have to say they are a little pessimistic and not related to whether the question being asked is a good one.
    One of the difficulties with trials such as the one cited is that the numbers required, calculated from statistical tables for appropriate “powering” of the study, are often large, and it takes a long time to gather enough data where the outcome can be definitively known (i.e., subjects not lost to follow-up). The study is quoted as having been going for a decade. In that time, technology and treatments change. New acute interventions may have been introduced. Ambulances might reach their destinations quicker by use of GPS, for example. This means that there is a confounding variable in the data set. Does this apply equally to both arms of the trial?
    While scientifically admirable, trials are also sometimes used by bureaucrats to delay the roll-out of a new technique or treatment. In Australia, we are often asked to “re-prove” overseas results ‘in the Australian setting’. We get to the end of a long, drawn-out trial only to find we have replicated what we already knew at the start. We struggle to get the sort of numbers and throughput seen in countries such as the USA, the UK and, increasingly, China and India.
    There is nothing wrong with appropriate trials, but they should be timely to remain as unbiased and uncorrupted as possible.
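DLB’s point about “powering” a study can be made concrete with the standard sample-size calculation for comparing two proportions. The numbers below are purely hypothetical (they are not HIRT’s actual design parameters); the sketch just shows why trials like this need hundreds of patients per arm.

```python
from statistics import NormalDist

def sample_size_per_arm(p1, p2, alpha=0.05, power=0.8):
    """Approximate per-arm sample size to detect a difference between two
    proportions, using the normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

# Invented example: good-recovery rate rises from 40% with a paramedic
# to 55% with a trauma physician on scene.
n = sample_size_per_arm(0.40, 0.55)
print(round(n))  # roughly 170 patients in each arm
```

Note how the required sample grows as the expected effect shrinks: halving the difference between `p1` and `p2` roughly quadruples the number of patients needed, which is why small trials so often end inconclusively.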

  3. Andrew Leigh says:

    Conrad, if you compare randomised trials to consultants’ reports, I agree they look costly. But if you compare them to policy failure, they’re a bargain. Given that caring for just one severely head-injured youth for the rest of his life can cost millions, HIRT’s total price tag of $11 million can be justified if it helps policymakers merely avert a few severe injuries.

  4. conrad says:


    I guess it depends on what you want it for — I agree with you on the big policy ones. However, perhaps most things that are done are smallish projects (at least in education — like your Perry preschool program, if I remember correctly). Even big meaningful school-help type projects would cost at most 100-200K to run unless they were run by the Department of Education (with that amount, for example, you could probably test 200 kids across the year on a remediation program for some type of learning disorder and still have change left over to see if the effects lasted once they were off the program). In addition, whilst randomized trials may be best practice, I don’t see why you need such designs for most things — including your example. I’m sure a non-randomized design would have worked fine given that I imagine there are fairly large samples of people getting different types of treatments for reasons not related to their injury. You just need to be careful that your samples are really matched — this is going to be better in some cases than randomization if your sample sizes are not huge and if there is a large amount of variability in your population.

    My suggestion would be that non-randomized designs are fine for most things, but if people want to sell their “help” programs (common in many areas), better designs should be used, especially if the government is coughing up the bucks (this is a big problem in my area — there’s huge amounts of quackery where people try and sell cures that are no better than tiger oil).
