
December 06, 2009

Research shows that using blogs, texting and social networking sites improves children’s literacy skills - or does it?

A news item appeared on the BBC website this week with the headline: ‘Children who use technology are “better writers”’ (see: http://news.bbc.co.uk/1/hi/technology/8392653.stm). The claim is based on a survey of 3,001 children aged 9-16, commissioned by the National Literacy Trust, that explored their use of new communication technologies such as blogs, texting and social network sites. From the news item it appears that the main finding is that: ‘of the children who neither blogged nor used social network sites, 47% rated their writing as “good” or “very good”, while 61% of the bloggers and 56% of the social networkers said the same.’

In response to these findings, Jonathan Douglas, Director of the National Literacy Trust, was quoted as saying: ‘Our research suggests a strong correlation between kids using technology and wider patterns of reading and writing. […] Engagement with online technology drives their enthusiasm for writing short stories, letters, song lyrics or diaries.’ Moreover, and in response to the claim that the use of blogging, texting and social networking sites damages literacy, he went on to state that: ‘Our research results are conclusive - the more forms of communications children use the stronger their core literary skills.’

Now, I’ve not had time to read the full report as yet but there are at least two problems with the conclusions being drawn above:

1. From the news item, it appears that the survey did not actually measure children’s literacy skills. Rather it focused simply on their own self-perceptions of how good their writing is. At the very best, therefore, all that can be claimed here is that the more that children use such communication technologies, the more that they are likely to have positive self-perceptions of their literacy skills.

2. While there may well be a ‘strong correlation’ between these two things, it is impossible without further evidence to make any claims about what may be causing what. It is certainly premature for the Director of the National Literacy Trust to conclude that they have ‘conclusive results’ showing that it is children’s use of such technologies that increases their core literacy skills. While this may be the case, there is also an equally plausible explanation: that children with greater literacy skills (or, in their case, a greater perception of their literacy skills) are more likely to then use literacy-based technologies such as blogging, texting and social network sites more. Moreover, there may not be any direct relationship between the two at all. It may be, for example, that there is some other factor - for example a child’s socio-economic background - that has an influence on both literacy skills and use of technology. Thus, the more affluent a child’s background, the more likely they are to have higher literacy skills and also to have greater access to, and thus make greater use of, such technologies. It is not inconceivable, therefore, that literacy skills and technology use are completely unrelated.
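The third possibility, a common factor driving both measures, can be illustrated with a minimal simulation sketch. Everything in it is an assumption made purely for illustration: the variable names, the normally distributed scores and the effect sizes are hypothetical and are not taken from the National Literacy Trust survey. In the simulated data, technology use has no effect whatsoever on literacy, yet the two are clearly correlated until background is taken into account.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_children = 20000      # hypothetical sample size

# Hypothetical model: background influences both outcomes independently.
background = [rng.gauss(0, 1) for _ in range(n_children)]
tech_use = [b + rng.gauss(0, 1) for b in background]
literacy = [b + rng.gauss(0, 1) for b in background]

# Raw correlation between technology use and literacy.
raw_r = pearson(tech_use, literacy)

# Correlation once the shared influence of background is removed.
adjusted_r = pearson(
    [t - b for t, b in zip(tech_use, background)],
    [s - b for s, b in zip(literacy, background)],
)

print(f"raw correlation:      {raw_r:.2f}")
print(f"adjusted correlation: {adjusted_r:.2f}")
```

Under these assumptions the raw correlation should come out at roughly 0.5 and the adjusted one close to zero, which is the pattern a survey alone cannot distinguish from a genuine causal effect.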

This second point – that correlation does not equal causality – is a fundamental one that students should have learnt from any basic research methodology course. The fact that a well-respected organization such as the National Literacy Trust can be confusing the two in their own research findings is a poor reflection on the state of educational research in the UK. Moreover, the fact that the BBC’s own ‘technology reporter’ can simply report such claims uncritically, as in this news item, does nothing to help improve the situation.

September 26, 2009

Walden University’s College of Education produces teachers who are more effective in improving pupils’ reading fluency. Really?

A glossy advertisement on the back of the latest issue of Educational Researcher (the official journal of the American Educational Research Association, AERA, no less) grabbed my attention. Apparently, and as the headline exclaims: “New study shows that students of Walden teachers make greater gains in reading fluency.”

The claim is based upon research commissioned by Walden University’s Richard W. Riley College of Education and Leadership that compared the effectiveness of teachers who had graduated with its master's degree with that of teachers who had graduated with master's degrees elsewhere. As the glossy advert went on to explain:

“In a unique collaboration with Tacoma Public Schools in Tacoma, Washington, researchers compared the reading fluency of students taught by Walden Master’s-educated teachers with students taught by non-Walden Master’s-educated teachers. The study revealed that students of teachers who graduated from Walden’s Elementary Reading and Literacy programme had gains in reading fluency that were on average 4.8 words per minutes, or 14%, greater than students of non-Walden Master’s-educated teachers.”

This is a huge claim. It is not surprising that Walden's College of Education chose to buy a glossy advert on the back of the prestigious AERA magazine to publicise it. What College wouldn’t want to let the world know that their master's degree is proven to be more effective than others? Students will clearly want to graduate from Walden given that a Walden degree is evidence that you are a more effective teacher. The advert encourages readers to visit their website at http://www.WaldenU.edu/tacoma for more information on the research. Fortunately, the full report of the research is available from the website and can also be downloaded directly from here: http://bit.ly/2IZDLm

So, are the claims in the advertisement true? Well, the research that lies behind these findings is based on a relatively small sample (the main element of which compares the reading scores of children taught by just 35 graduates from Walden with those taught by 35 graduates of other programmes). However, the findings are statistically significant so we can be sufficiently confident that the differences between the two groups are unlikely to have occurred by chance. Moreover, the researchers use appropriate statistical techniques – hierarchical linear modelling – for analysing the data they have (nearly 4,000 pupils clustered in 70 classes).

Interestingly, the researchers are a little more cautious in their own interpretation of the findings. As they explain in the executive summary: “Limitations on the research design do not allow for a claim of causation between the completion of the Walden degree and teaching effectiveness. However, [the findings] ... provide suggestive evidence that the program may indeed improve the effectiveness of elementary literacy instruction” (p. 3).

Of course everything rests on these ‘limitations’ that, not surprisingly, fail to get a mention in the glossy advert and that the researchers do not seem to consider serious enough to stop them claiming that they have “suggestive evidence” that the Walden programme “is making teachers more effective at reading and language arts instruction” (p. 21). Well, here are the main limitations, taken directly from the research report (pp. 22-23):
 
  1. “While we were able to use matching to control for differences in teacher experience between the Walden and the control group samples, we did not have information on teachers’ credentials, prior education (i.e., bachelor’s degree institution and major field of study), or professional development/training experiences. It is plausible that any differences in student reading gains are not due to Walden’s M.S. in Education program, but due to systematic differences in these other factors between Walden teachers and the comparison group teachers.”
  2. “The inference from the estimated effect is the difference in earning a Walden M.S. in Education degree with a specialization in Elementary Reading and Literacy relative to earning any other type of master’s degree (as represented in the control group). It is plausible that teachers who seek out specialized degrees in elementary literacy instruction are more likely to be successful at reading instruction than those who seek out degrees in other areas. In fact, they may pursue the degree because they have higher self-efficacy as it relates to literacy instruction. Consequently, the estimated effect of the Walden program may stem from this self-selection and the unobserved differences in reading instruction effectiveness between those who sought out the ERL program and those who did not.”
  3. “The samples were too small to control for ‘school effects’ (i.e., the effects on student achievement that are common to all students within a given school). Therefore, it is possible that the difference in performance between Walden teachers and non-Walden teachers is due to the programs and policies used in the schools where they teach rather than to their own classroom instruction.”
  4. “While we were able to control for some student demographic characteristics, there were a number of unobserved factors that might also explain these differences, for example students’ socioeconomic status or home circumstances.”

Three of the four limitations (1, 3 and 4) are significant but are to be expected from a research design such as this, where it is simply not possible for students to be randomly assigned to the main and control groups. As the researchers quite rightly point out, the positive gains found among the pupils taught by Walden graduates could be due to a range of unidentified systematic differences between these graduates and their comparators. This is why the researchers quite rightly state that it is not possible to make “a claim of causation between the completion of the Walden degree and teaching effectiveness.” It is also why they present their research as “suggestive evidence”.

All of the above is quite reasonable and to be expected with a pragmatic evaluation of this type. However, it is the second limitation that is much more problematic and represents a fundamental flaw in the research design. Interestingly, it is hidden away in the body of the report and not mentioned at all either in the Executive Summary or the main Conclusions. Not surprisingly, it doesn’t feature at all in the glossy advertisement.

And yet, this second limitation completely undermines the validity of the claims being made. In essence, we’re not comparing “like with like” at all. Rather, we’re comparing teachers who have taken a master’s degree with a specialisation in elementary reading and literacy with teachers who have simply taken generic master’s degrees. There is thus no way of knowing whether the additional gains made in reading fluency among the pupils taught by the Walden graduates (which are actually fairly small by the way and not consistent across year groups) were due to the effectiveness of the Walden programme itself (i.e. compared to other specialist elementary reading and literacy master’s programmes) or simply to these teachers having had more specialist training in elementary reading and literacy.

This is a crucial point. Remember that the headline in the glossy advert claimed that: “New study shows that students of Walden teachers make greater gains in reading fluency.” This is clearly misleading as it encourages the reader to believe that there is evidence that the Walden programme is more effective than other comparable specialist programmes. As it is, the study provides no evidence at all that Walden teachers are any more effective in producing gains in reading fluency than teachers with equivalent specialist qualifications from any other College.

     

September 19, 2009

Why does the UK Government, with £6m at its disposal, also find it so difficult to do a simple evaluation?

This week, the Home Office published the findings of the first phase of its £6 million evaluation of Blueprint, a multi-component school-based drug education programme targeted at secondary school children in Years 7 and 8. The reports are available at: http://bit.ly/22SLI

With such resources at its disposal one would expect a rigorous evaluation with some clear evidence of whether the programme is effective or not (initially in relation to children’s levels of drug awareness and, in the longer-term, their attitudes and behaviour). After all, undertaking an evaluation isn’t rocket science. You invite a number of schools to take part, you randomly split them into two groups – one that will deliver the programme and one that will act as a control/comparison group – and then you just collect some data from all the children before the programme starts and then again at the end. If the children in the programme schools have shown progress (in terms of awareness, attitudes and/or behaviour) above and beyond those in the control group then you have strong evidence that the programme has been effective.
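The random split described above takes only a few lines to carry out. The following is a minimal sketch, not the procedure used in any of the studies discussed here: the school identifiers, the function name `allocate_schools` and the seed are hypothetical, and a real trial would normally also stratify or otherwise balance the groups.

```python
import random


def allocate_schools(school_ids, seed=None):
    """Randomly split schools into programme and control groups of near-equal size."""
    rng = random.Random(seed)  # a recorded seed makes the allocation auditable
    shuffled = list(school_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "programme": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


# Hypothetical identifiers for 50 participating schools.
schools = [f"school_{i:02d}" for i in range(1, 51)]
groups = allocate_schools(schools, seed=2009)

print(len(groups["programme"]), len(groups["control"]))  # prints "25 25"
```

Because chance alone decides which schools deliver the programme, any systematic difference in progress between the two groups can be attributed to the programme rather than to how the schools were chosen.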

Unfortunately, the research team responsible for the evaluation of the Blueprint programme failed to follow even this simple design. They were advised to use 50 schools in order to generate sufficient data to detect any effects that might be associated with the programme. However, they felt that the use of such a sample size was “a very large step for an improvement in the limited UK evidence based” (p. 32) and thus, presumably, a step too far. This is just nonsense. Only this summer we (the Centre for Effective Education) published the results of a randomised controlled trial of a pupil mentoring scheme involving 50 schools and over 800 children (the full report is available from our website at: http://www.qub.ac.uk/cee). Moreover, we’re just writing up another trial involving 80 preschool settings and 1,500 3-4 year old children and their parents.

Instead, the research team referred to guidance from the Medical Research Council that, in the evaluation of complex interventions, a “cumulative approach” is required “to understanding how outcomes are achieved, moving from theory, to modelling, to an exploratory trial to a definitive trial” (p. 32). This is indeed an eminently sensible and pragmatic approach to take and one we have also adopted. Most recently we have just completed an “efficacy test” of an early childhood programme in 10 preschool playgroups (5 delivering the pilot programme and 5 acting as a control group).

However, and curiously, the “exploratory trial” the research team chose to conduct for the Blueprint programme involved 30 schools. This is clearly too large for a proper exploratory trial and too small for a full-blown study. Unfortunately, the problems don’t just stop here. Inexplicably, the research team decided to select only six of the 30 schools to act as a comparison (control) group and then decided not to randomly select them but to hand-pick them. As it turned out, the characteristics of these six comparison schools proved to be significantly different to the remaining 23 schools (one dropped out) delivering the programme and so they cannot now be used for any meaningful comparisons at all.

The catalogue of errors involved in this trial is well outlined by Ben Goldacre in the latest entry in his commendable “Bad Science” column in The Guardian, see: http://bit.ly/ECcq5. It is just astounding that the Home Office could have ended up with such a half-baked evaluation, especially given the amount of funding they set aside for this and the clear advice they were given as well as the expertise at their disposal (see Goldacre’s column for more details).

I have previously asked the question “why some educational researchers find it so difficult to do a simple evaluation” (see: http://bit.ly/6tfJ). Then, I used an example of a small evaluation conducted by a couple of educational researchers that was reported at the BERA Conference. That was bad enough, reflecting, as I argued, a more general lack of competence among sections of the British educational research community in conducting simple evaluations of the effectiveness of educational programmes and interventions. However, this present example is simply in a different league. What hope can we have for the future when even the New Labour government – the self-styled proponents of evidence-based policy – can’t undertake a simple evaluation for themselves?

 

September 07, 2009

Why do some educational researchers find it so difficult to do a simple evaluation?

Here's an example of an evaluation of an educational programme taken from a paper presented last week at the British Educational Research Association Annual Conference at a session I attended. To maintain anonymity I will keep the description of the study fairly vague. The point is not to be critical of the specific authors of the paper, for they are far from the only ones to adopt this type of approach, but to raise a more general point about the nature of educational research.

The paper described what was actually a very interesting educational initiative that attempted to motivate children through the use of a particular strategy. The presenters clearly knew their subject area and provided a convincing case theoretically for why the use of that strategy may help to motivate children. They also described a pilot scheme where this approach was trialled for a short period of time. However, the evaluation that was undertaken of the effectiveness of the strategy, and that the presenters then went on to report, was unfortunately probably one of the worst examples of an evaluation I have seen.

Part of the evaluation involved the teachers rating the children's levels of motivation into four categories (‘very motivated’, ‘engaged’, ‘somewhat engaged’ and ‘negative’) for the 63 who participated in the pilot scheme. The results were presented in a table, reproduced below exactly as it appeared in the paper, with the children being broken down by their entering grade (year group):


Entering Grade | Very Motivated | Engaged | Somewhat Engaged | Negative
-----------------------------------------------------------------------
2              |              5 |       7 |                3 |        3
3 or 4         |              8 |      13 |                4 |        1
5 or 6         |              2 |       6 |                4 |        1
7+             |                |       1 |                1 |        4
-----------------------------------------------------------------------
Total          |             15 |      27 |               12 |        9
 

The presenters interpreted these data as follows: “The teachers’ descriptions indicated that 15 of them were very motivated by [using the strategy], 27 were somewhat motivated, 12 were not very engaged, and 9 found it to be a negative experience. In general, in this population, students aged 8-11 years-old [i.e. those in entry grades 3 or 4] were more likely to be motivated by [the strategy] than younger or older students.”

Now, there are three main problems with this interpretation of the data that should be apparent to anyone who has done even an elementary course in educational research methods:

  1. There are no pre-test scores. How, therefore, can we tell whether the children’s levels of motivation have actually changed at all during the course of the pilot scheme?
     
  2. There’s no comparison or control group. Even if we had pre-test scores and we could see that the children’s motivations had increased over the course of the pilot scheme, how do we know that this improvement was down to them participating in the pilot scheme and not due to something else?
     
  3. As regards the claim that the use of the strategy was more effective for the middle band of children (i.e. those with entering grades 3-4), how do we know that the differences between the differing bands of children were due to the programme rather than just down to random variation?

As it happens, the query raised in the last point can be answered very quickly with the use of a simple statistical test (a Fisher’s exact test in this instance). In this case, and by conflating the oldest two bands so that we are comparing the ‘3-4’ group with their younger counterparts (‘2’) and older counterparts (‘5+’), such a test gives us a significance level of p=0.275. What this tells us, in essence, is that if there were actually no underlying age differences, there would still be a fair chance (a 27.5% chance to be precise) of seeing differences at least as large as those in this present sample simply through random variation. With odds like this, how can we have any confidence in these claims?
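For readers who want to try this kind of test themselves, here is a minimal sketch of an exact test for a table with more than two rows and columns (the Fisher-Freeman-Halton generalisation of Fisher’s exact test), written in plain Python. The function names are hypothetical, and the table layout is an assumption: the ‘5 or 6’ and ‘7+’ rows are summed into a single ‘5+’ row and all four motivation categories are kept. Since the exact layout and software behind the figure quoted above are not stated, the result is not guaranteed to match p=0.275.

```python
from math import exp, lgamma


def compositions(total, caps):
    """Yield every tuple x with sum(x) == total and 0 <= x[i] <= caps[i]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in compositions(total - first, caps[1:]):
            yield (first,) + rest


def fisher_exact_rxc(table):
    """Exact p-value for an r x c table of counts (Fisher-Freeman-Halton)."""
    row_sums = [sum(row) for row in table]
    # Column order does not change the p-value; putting the largest column
    # last keeps the enumeration small because the last column is forced.
    col_sums = sorted(sum(col) for col in zip(*table))
    n = sum(row_sums)
    log_fact = [lgamma(k + 1) for k in range(n + 1)]
    const = sum(log_fact[s] for s in row_sums + col_sums) - log_fact[n]
    # Log-probability of the observed table given its margins.
    observed = const - sum(log_fact[x] for row in table for x in row)

    def walk(col, remaining, cells):
        # Sum the probabilities of all tables no more likely than the observed one.
        if col == len(col_sums) - 1:
            log_p = const - cells - sum(log_fact[x] for x in remaining)
            return exp(log_p) if log_p <= observed + 1e-9 else 0.0
        total = 0.0
        for column in compositions(col_sums[col], remaining):
            left = [r - x for r, x in zip(remaining, column)]
            total += walk(col + 1, left, cells + sum(log_fact[x] for x in column))
        return total

    return min(1.0, walk(0, row_sums, 0.0))


# Rows: entering grade 2, 3 or 4, and 5+ (assumed merge of '5 or 6' and '7+').
# Columns: very motivated, engaged, somewhat engaged, negative.
motivation = [
    [5, 7, 3, 3],
    [8, 13, 4, 1],
    [2, 7, 5, 5],
]
p_value = fisher_exact_rxc(motivation)
print(f"p = {p_value:.3f}")
```

The sketch enumerates every table with the same row and column totals, so it is only practical for small samples such as this one; for anything larger a statistics package is the sensible choice.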

The presenters attempted to justify their approach by arguing that it is difficult to isolate the effects of the strategy used and that it was not possible to organize and conduct a randomized controlled trial. However, such arguments are difficult to defend. In fact the present pilot scheme, which ran for just a few weeks, was ideally placed to have been evaluated using a small, pragmatic trial. For example, the children taking part could have been randomly organized into two groups, with one group participating in the scheme initially and the other group acting as a control but possibly getting to participate in the scheme at a later stage (i.e. being a ‘delayed control group’). This way, nobody loses out in the long run. Then, with the children organized into two groups, they just needed to have their motivations tested at the beginning of the pilot scheme and then again at the end. Et voilà: a pragmatic randomized trial that would provide strong evidence of whether this pilot scheme was being effective in increasing the motivation of the children taking part.
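Once both groups have been tested before and after the pilot, the core comparison is simply the difference in average gains. The sketch below shows that calculation only; the motivation scores are invented purely for illustration, the function names are hypothetical, and a real analysis would also test whether the difference is larger than chance would produce.

```python
from statistics import mean


def mean_gain(pre, post):
    """Average change from pre-test to post-test for one group."""
    return mean(after - before for before, after in zip(pre, post))


def programme_effect(prog_pre, prog_post, ctrl_pre, ctrl_post):
    """Gain in the programme group over and above the gain in the control group."""
    return mean_gain(prog_pre, prog_post) - mean_gain(ctrl_pre, ctrl_post)


# Invented motivation scores for four children per group (illustration only).
programme_pre = [10, 12, 11, 9]
programme_post = [14, 15, 13, 12]
control_pre = [11, 10, 12, 9]
control_post = [12, 11, 12, 10]

effect = programme_effect(programme_pre, programme_post, control_pre, control_post)
print(effect)  # prints 2.25
```

Subtracting the control group's gain is what removes changes that would have happened anyway, which is exactly what the evaluation described above had no way of doing.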

So if randomized trials are so simple to organize and run then why do researchers still opt, with depressing frequency, for flawed evaluative designs like this? I have offered some possible answers to this question in my editorial for the first issue of the new journal Effective Education, which can be accessed free online at: http://www.informaworld.com/effectiveeducation. Whatever the reason, it is surely a telling indictment that studies like the one described here are still being produced when so much commitment has been expressed, and efforts made, to building research capacity in education. Teaching the basics of evaluative research designs should be a core element of all undergraduate and postgraduate research training. After all, doesn’t the question of whether an educational programme is effective or not represent one of the basic and fundamental questions that educational research should be seeking to answer? The fact that educational researchers are routinely failing to receive basic training in simple evaluative techniques is therefore indefensible.