An Evidence Hierarchy for Social Policymakers

The Economic Roundup, Treasury’s in-house journal, has just released its first issue for 2009. Evidence is a bit of a theme for the issue, and among the articles, I have one that discusses the idea of a medical-style ‘evidence hierarchy’ for social policymakers.*

As one possible evidence hierarchy, I’ve suggested that the following might be used:

A possible evidence hierarchy for Australian policymakers  

1. Systematic reviews (meta-analyses) of multiple randomised trials

2. High quality randomised trials

3. Systematic reviews (meta-analyses) of natural experiments and before-after studies

4. Natural experiments (quasi-experiments) using techniques such as differences-in-differences, regression discontinuity, matching, or multiple regression

5. Before-after (pre-post) studies

6. Expert opinion and theoretical conjecture

All else equal, studies should also be preferred if they are published in high-quality journals, if they use Australian data, if they are published more recently, and if they are more similar to the policy under consideration.
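To make the ordering concrete, here is a minimal sketch of how the hierarchy could be applied mechanically to a pile of studies. It assumes, for illustration only, that each study has been hand-coded with a tier from the list above plus the four "all else equal" attributes. The class names, field names, and the 0-to-1 similarity score are all hypothetical. The relative order of the tie-breakers among themselves is also an assumption, since the hierarchy above does not rank them against each other.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Tiers of the suggested hierarchy; a lower value means stronger evidence."""
    REVIEW_OF_RCTS = 1
    RCT = 2
    REVIEW_OF_NATURAL_EXPERIMENTS = 3
    NATURAL_EXPERIMENT = 4
    BEFORE_AFTER = 5
    EXPERT_OPINION = 6


@dataclass
class Study:
    # Hypothetical fields chosen for illustration only.
    title: str
    tier: Tier
    high_quality_journal: bool
    australian_data: bool
    year: int
    similarity: float  # assumed score from 0.0 to 1.0: closeness to the policy under consideration


def rank_key(study: Study) -> tuple:
    # The tier dominates; the "all else equal" preferences only break ties.
    # The order of the tie-breakers relative to one another is an assumption.
    return (
        int(study.tier),
        not study.high_quality_journal,  # False sorts first, so high-quality journals come first
        not study.australian_data,       # Australian data comes first
        -study.year,                     # more recent studies come first
        -study.similarity,               # more similar studies come first
    )


def rank_studies(studies: list[Study]) -> list[Study]:
    """Return the studies ordered from strongest to weakest evidence."""
    return sorted(studies, key=rank_key)


if __name__ == "__main__":
    studies = [
        Study("Overseas before-after study", Tier.BEFORE_AFTER, True, False, 2008, 0.9),
        Study("Australian RCT", Tier.RCT, False, True, 2003, 0.5),
        Study("US RCT", Tier.RCT, True, False, 2006, 0.7),
    ]
    for s in rank_studies(studies):
        print(s.tier.value, s.title)
```

Under these assumptions, both randomised trials outrank the before-after study regardless of how recent or relevant it is, and the tie between the two trials is settled by journal quality before Australian data is considered. Reordering the tuple in `rank_key` is all it takes to express a different view of which tie-breaker matters most.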

Naturally, an evidence hierarchy will always be just a rule of thumb. If time is short or the issue is new, theory or before-after studies might be the only thing that’s available. But for decision makers who are choosing between a large number of studies, an evidence hierarchy might help separate the wheat from the chaff.

* By ‘social policymakers’, I mean people who formulate social policy, as distinct from those who make policy with their friends over a few beers.

This entry was posted in Economics Generally, Randomisation.

14 Responses to An Evidence Hierarchy for Social Policymakers

  1. Kevin Cox says:

    When we construct information systems – and social policy makers require information systems to implement their policies – we find that 60% of the costs are in the user interface and the biggest productivity improvements are found for the least cost by doing usability testing.

    I expect that policy makers would gain a lot from usability testing as a way of trying to work out the best policy for a particular problem. That is, you can have a great policy idea and approach, but if people cannot understand it, then it is unlikely to be effective. Usability is not about the user interface; it is about what goes on in people’s heads when they use the system.

    Perhaps a lot of variability in evidence can be explained by usability. That is, if you implement it well, it works well; if you implement it poorly, it does not work.

  2. Don Arthur says:

    There are two things I’d like to see more of in evaluations of social policy:

    1. More experiments; and

    2. More evaluations that unpack and test the program’s ‘theory of change’ (program logic).

    Unless you know why a program worked with a particular client group at a particular time and in a particular place, it’s difficult to generalise from the results.

    I’d like to see far more attention to developing and testing theories. Many impact evaluations are ‘black box’ evaluations. The report tells you that, for example, a training program did not improve employment outcomes. Was it because participants didn’t complete the course? Was it because those who did complete didn’t learn new skills? Was it because employers didn’t need workers with those skills? Was it because the credential was stigmatising? Too often you’ve got no idea.

    Drug treatments are far simpler to evaluate than social policies.

  3. invig says:

    How about computer simulation?

    How about all the ‘models’ that Treasury supposedly has developed?

    Or are they merely trends in observed data that, when a new circumstance arises (such as now), prove entirely useless as a means of prediction?

    Is it not true that economics has become merely the discernment of patterns in data without recourse to theory development? Since the supply-demand-equilibrium model is completely unrealistic? And no one is brave enough to make this call as your continued pay packets and status as a profession depend upon the subterfuge continuing?

    How many of the conversations around coffee at the Treasury venture into this territory, Andrew?

  4. conrad says:

    I might say that there is a big trade-off when you run huge medical-style randomised experiments: they’re expensive, and you often need far larger samples than with other types of designs. I imagine it is also very area-specific, but for the price of one medical-style experiment, you could run a whole series of smaller experiments on the same thing. It’s also the case that one good experiment can be better than any number of not-as-good experiments. It’s therefore not clear to me that one could ever have a general order of what type of evidence is best.

  5. Bruce Bradbury says:

    Randomised trials are good at answering simple questions (has a policy of type X been shown to work?). However, it is very rare to have randomised trial data on the policy you are considering. A more feasible evidence base is likely to be found when there are strong links in all parts of the following chain:

    Causal evidence => sound theory => policy implications

    Your hierarchy only relates to the first of these. If the evidence relates to the policy under question, then the theory link might be considered trivial. If not, we need to place a lot of weight on this part. Many of the models developed by economists are all about having a sound theory and understanding the implications of this.

    As for your causal evidence hierarchy, I think most evidence tends to fall under heading 4. But this covers a wide spectrum of quality. On the one hand, many natural experiments are about as good as we can get. On the other, many regression models are subject to obvious bias from unobserved variables. A method for formally categorising quality within this heading would be really useful.

  6. derrida derider says:

    Jeez, Andrew, I thought your spell in the bureaucracy would cure you of that yearning for academic purity. You would have seen some of the difficulties in evidence-based policy.

    You can never get the politics out, and it’s useless telling a Minister that his pet scheme – whispered into his ear by his favoured lobby – has no evidence base. Policy-based evidence (ie “evaluations” showing what a Good Thing that pet scheme was, despite the nasty things the Opposition said about it at the time) is more common than true evidence-based policy.

    Budget cycles mean you just don’t have time to run randomised trials, etc. You’re often doing really well if you can get your hands on (5) and (6) in your hierarchy. If you want better evidence, it has to come from better post-hoc evaluations.

    IMO we need a proper evaluation agency that reports to the parliament, not the government – maybe you could expand ANAO’s remit. That evaluation agency can then outsource or do work in-house as appropriate.

  7. I think DD is right on this. Even increasing the level of (6) would be a major achievement.

  8. Don Arthur says:

    Imagine you have a highly successful program. It resonates with the party faithful, the media love it, the opinion polls are favourable (especially in the marginals), the Department of Finance has given it the tick, and it’s easy for your department to implement.

    Do you really want to subject it to a rigorous evaluation?

  9. derrida derider says:

    Yep, Don, that’s exactly what the problem is; everyone involved in the program has a strong interest in finding it a success.

    This motivation, BTW, is sometimes strongest at the grassroots delivery level – few people can live with the prospect that they’ve busted a gut for a year or two delivering a program that was a waste of time. You have to pay close attention to protocols in randomised trials or the program deliverers can frustrate the randomisation.

    All of which is why you need an independent evaluation agency.

  10. Don Arthur says:

    DD – I like the idea of getting the ANAO to do more performance audits or creating a new evaluation agency that answers to the parliament.

    Another attractive idea is for philanthropists, academics and non-profits to get together to trial innovative programs.

    I’ve spoken to people in non-profits who’ve run randomised trials and you’re right — it’s difficult, and there’s a very real possibility that the results will be negative.

    But I still think it’s an idea worth exploring. If you can demonstrate that an idea works and generate favourable publicity it should be easier to get government support.

    Philanthropists (foundations, corporates or individuals) and non-profits could do the risky research and development work and governments could turn the most successful ideas into full-scale programs.

  11. The trouble with government performance audits is that they don’t necessarily have staff competent to do the job: I’ve had colleagues dealing with a ‘performance’ audit who had to spend countless hours tutoring the auditors. Two distinct possibilities emerge from this: that the auditors will be captured by the audited, who hold all the knowledge; or that they won’t be captured but will arrive at erroneous conclusions.

    Another difficulty is that many programs don’t have clear objectives. What is the purpose of the $4 billion we spend subsidising higher education tuition? Nobody seems sure, giving us no clear criteria against which to measure success or failure.

  12. derrida derider says:

    Andrew, I agree with you about the problems of an auditing approach – I’ve seen both possibilities you cite realised. If you did it by expanding ANAO they’d need both a shitload of specialists and plenty of interagency mobility to get some poachers to turn gamekeeper.

    But you’d still be miles ahead of the present situation.

  13. Sue Funnell says:

    The concept of a hierarchy of evidence within the context of evaluating programs (program evaluation is one type of social policy development tool) has been the subject of considerable debate in the USA. The US Office of Management and Budget included a hierarchy much like that proposed by Andrew Leigh in its guidelines for the Program Assessment Rating Tool (PART) and invited comment.

    The American Evaluation Association prepared a response that was quite critical of the hierarchical approach and in particular questioned the notion of the RCT as the gold standard. The flaws of the RCT (as well as its strengths in a fairly limited set of conditions) have long been recognised by the evaluation community both internationally and in Australia although from time to time they get a new lease of life through such papers as that produced for PART.

    Some of the AEA criticisms of RCT and the general approach proposed by OMB have included:

    RCTs are weak with respect to the goal of program improvement (this goes to the issue of the weaknesses of black box evaluation and the important role of program theory raised in previous comments).

    RCTs do not by themselves explicitly address construct validity.

    RCTs are weak with respect to generalizability or external validity.

    Addressing RCTs’ validity problems often entails investment in companion program evaluations that have methodological designs other than RCTs.

    RCTs often overlook the demonstrated importance of mixed methods.

    The need to address feasibility and resource issues realistically.

    The need to address equity and human subjects concerns realistically.

    The AEA paper discusses the benefits of many different approaches and emphasises the importance of choosing the types of evidence that are fit for purpose rather than establishing a hierarchy of evidence. It is only in very rare cases that the so-called gold standard of the RCT is both fit for purpose and feasible.

    The following links to the OMB paper and the AEA response may be of interest:

    2004_program_eval.pdf

    aea08.omb.guidance.responseF.pdf

  14. Sue Funnell says:

    Further to my last comment, I should have mentioned that the author of the AEA position paper, Professor William Trochim, Professor of Policy Analysis and Management at Cornell University, will be a keynote speaker at the Australasian Evaluation Society conference to be held 31 August to 4 September in Canberra. The theme of the conference is Evidence and Evaluation.

    Will Trochim is the immediate past president of the AEA. He wrote the paper with input from across the AEA, which has several thousand members. It is really the AEA’s paper, not just Will’s.

    For further information about the conference go to
