Randomised Controlled Trials and their limitations for use within educational research
by Nick Hassey, Teach First
The recent What Works Network report contained a section from the Education Endowment Foundation (EEF) purporting to set out clear evidence on what works in education. A key part of the EEF's evaluation approach is an increasing reliance on Randomised Controlled Trials (RCTs), both for evaluating new interventions and as the cornerstone of the meta-analyses it conducts. However, RCTs have several methodological problems, particularly within education, that mean they can sometimes give a less accurate indication of the impact of interventions than other methods, and may be giving us a misleading picture of 'what works' (for more detail on these methodological problems, and on the alternatives, see the work of Harvey Goldstein).
Something that sounds obvious but is often overlooked is that it is not just the random allocation of people to 'treatment' and 'control' groups that has made RCTs so powerful in medical research. The other key concept is 'blindness'. Medical RCTs go to great lengths to ensure that neither the patient nor the doctor knows who is receiving the treatment and who is in the control group. This process, known as imposing a double blind, controls both for the placebo effect and for any changes doctors might make in their behaviour if they knew which patients were in which group.
The issue of blinding is crucial because in education research it is often impossible to make RCTs double-blind, or even single-blind. Teachers will almost always know which children are in which group, and often the children will know as well. Where teachers and/or pupils already hold prior convictions about the issue under study, this inability to impose blinds poses a significant risk to the validity of the study, as both groups are likely to modify their behaviour in response to the imposition of a 'treatment'. In medical research such studies would "usually be regarded as difficult to justify because results may reflect expectations as much as real effects of treatment" (Goldstein & Blatchford, 1998). Indeed, because RCTs involve an active intervention they can have "lower validity" (ibid.) than an observational study, which does not impose an intervention that might artificially raise or lower expectations.
Research into class size is probably the best example. Many teachers hold very strong opinions on the matter and care about their pupils, so it is almost inconceivable that they would not modify their behaviour to try to help the children placed in the control group (usually a larger class). This introduces an uncontrolled variable into the experiment and undermines any attempt to isolate the 'effect' of smaller classes.
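The bias this creates can be illustrated with a purely hypothetical simulation (every number below is invented for illustration, not drawn from any real study): if non-blinded teachers give extra help to the larger control classes, the measured class-size 'effect' shrinks even though the true effect is unchanged.

```python
import random
import statistics

random.seed(42)

def simulate_trial(n=500, true_effect=2.0, compensation=0.0):
    """Mean score gain, treatment minus control, in made-up units.

    true_effect  -- genuine gain from being in a smaller class
    compensation -- extra gain in the control group because non-blinded
                    teachers work harder to help the children placed in
                    larger classes
    """
    treatment = [random.gauss(true_effect, 5.0) for _ in range(n)]
    control = [random.gauss(compensation, 5.0) for _ in range(n)]
    return statistics.mean(treatment) - statistics.mean(control)

# With no compensating behaviour the trial recovers the true effect;
# once teachers compensate, the same trial understates it.
print(round(simulate_trial(compensation=0.0), 2))
print(round(simulate_trial(compensation=1.5), 2))
```

The point of the sketch is that the compensating behaviour is indistinguishable, from the outside, from the intervention simply being weaker than it really is.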
Even if the problem of 'blinding' could be set aside (say it were possible to find teachers and pupils who held no prior opinions on the intervention), there would still be compositional effects and in-test contamination effects among the schools or groups of pupils selected for the study.
For randomisation to allow you to extrapolate from the sample to the wider population, the different elements of your study (e.g. schools) need to be similar both to each other and to that wider population. This allows you to be confident that the 'effects' you see are due only to the intervention, and that they would be replicated if the intervention were implemented across the wider population. This is problematic because schools are not all alike, especially in England, where they may vary considerably in pupil composition, internal organisation and curriculum. As a result, any intervention involving more than one school faces serious difficulties in controlling for school characteristics and composition while also ensuring the sample remains representative enough of the population to support claims about the general 'effectiveness' of an intervention.
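To see why representativeness matters, consider another hypothetical sketch (the school types and effect sizes are invented): if an intervention works better in one type of school than another, a trial sample that over-represents that type will over-estimate the effect the wider population would actually see.

```python
import random
import statistics

random.seed(7)

# Invented figures: the intervention adds 4 points in type-A schools
# but only 1 point in type-B schools.
EFFECTS = {"A": 4.0, "B": 1.0}

def sampled_effect(share_type_a, n=2000):
    """Average measured effect for a sample with the given mix of schools."""
    gains = []
    for _ in range(n):
        school = "A" if random.random() < share_type_a else "B"
        gains.append(random.gauss(EFFECTS[school], 2.0))
    return statistics.mean(gains)

# A balanced 50/50 sample recovers the population-average effect (~2.5);
# a sample dominated by type-A schools reports a much larger one.
print(round(sampled_effect(0.5), 2))
print(round(sampled_effect(0.9), 2))
```

Randomising pupils *within* the sampled schools does nothing to fix this: the bias comes from which schools were sampled in the first place.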
Even within a single school an RCT faces issues. For an RCT to test an intervention effectively, the objects of study (usually pupils and teachers) are also required to act independently of each other, and as anyone who has ever set foot in a school will know, that simply isn't possible in education. Pupils and teachers interact with each other, making it impossible to 'contain' the experiment. Pupils in an intervention group may pass on to their peers in the control group the skills they are being taught, which would make any effect of the 'treatment' impossible to observe relative to the control. In this case randomisation does not allow us to assume that the 'effects' observed in the sample are due only to the intervention, and so does not allow us to judge whether the intervention 'works'.
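The contamination argument can also be put numerically (again, all parameters are invented for illustration): if some fraction of control-group pupils pick up the skill from their treated peers, the measured difference between the groups shrinks even though the intervention itself works exactly as well as before.

```python
import random
import statistics

random.seed(1)

def estimated_effect(n=1000, true_effect=3.0, spillover=0.0):
    """Treatment-minus-control gain when a fraction ('spillover') of
    control-group pupils learn the taught skill from peers anyway."""
    treated = [random.gauss(true_effect, 4.0) for _ in range(n)]
    control = [
        random.gauss(true_effect if random.random() < spillover else 0.0, 4.0)
        for _ in range(n)
    ]
    return statistics.mean(treated) - statistics.mean(control)

# No spillover: the estimate sits near the true effect of 3 points.
# Half the control group learns the skill from peers: it drops towards 1.5.
print(round(estimated_effect(spillover=0.0), 2))
print(round(estimated_effect(spillover=0.5), 2))
```

An effective intervention with heavy spillover and an ineffective intervention thus produce similar trial results, which is precisely why the 'containment' assumption matters.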
The key summary point here is that RCTs can be a very powerful tool for measuring the effect of an intervention, but for the results to be valid, it must be possible to isolate the effect of the intervention confidently from other potential contaminating effects, and the system under investigation must be internally consistent and a valid representation of the whole population. Such an experimental approach to education is not impossible – it might be appropriate for large-scale studies where subjects (students or teachers) are likely to respond in the same way to the intervention under examination. However, such examples are rare, and rather than assuming an RCT is always the best form of evaluation, researchers need to think very carefully about whether an RCT is the most appropriate method for the particular intervention they are studying.
- Goldstein, H. & Blatchford, P. (1998). Class Size and Educational Achievement.