Wednesday, 27 August 2014

Attack of the Clones: are data people trying to replace levels with levels?

A couple of days ago the opening salvos of a war between good and evil were fired across the vast expanses of the Edutwitterverse. From their distant quantitative system of darkness, the number crunching legions of the Evil Empire of Analysis were positioning their Data Death Star in orbit around the peaceful and progressive Moon of Awol (that's assessment without levels, in case you didn't know); and much was at stake. 

Well, OK, there was a minor skirmish involving some words and some huffing, and some good points were made. Mainly, I have to confess, by Michael Tidd (light side), and not so much by me (dark side). Michael (follow him on twitter @MichaelT1979 - he knows stuff) has already convincingly blogged his side of the argument here. I also hurriedly wrote a piece detailing my thoughts but managed to lose the whole thing after 2.5 hours of effort. Typical. Try again!

So what's the big issue?

Well, to put it simply, the question is this: do we still need a numerical system for tracking now that levels have gone?

Which caused one person to ask: is this 'a case of the data bods driving the agenda'?

Whilst someone else worried that 'it's beginning to sound a lot like levels', which I'm fairly certain was a Christmas hit by Michael Buble.

They have a point. So, before I go any further I'd like to state the following:

1) I'm no fan of levels. They are too broad, sublevels are meaningless, and they have resulted in the most dangerous/pointless progress measure ever devised.

2) I don't believe a numerical system is required for reporting to parents and pupils. As a parent I am far more interested in what my daughter can do, and what she needs more support with, than some arbitrary number. 

3) I understand that assessment and tracking are different things.

So, do we still need a numerical system for tracking purposes? Well, I think we do. I fully support objective-based assessment models - they make perfect sense - but I also believe that conversion to a standardised numerical system will allow for more refined measures of progress, particularly at cohort or group level, and over periods longer than a year. To reiterate, these do not need to be used in the classroom or reported to parents; they would simply provide the numbers for analysis. They would be kept under the bonnet, fuelling the tracking system's engine; and this is the approach that most tracking systems have adopted. It remains to be seen, of course, how well these work in practice and whether schools start reporting these figures to parents and pupils. I hope not.

Ofsted and RAISE

So, this is where I have to make a confession: Ofsted concerns me, and many of my opinions about tracking and analysis relate to inspection, which is rather depressing, I know, but I think it's necessary. No, I don't think we should build tracking systems solely to satisfy Ofsted, but I think it's foolhardy not to consider them. Having been involved in a number of difficult inspections in the past year, I know that data presentation (particularly fine analyses of progress) can often make the difference between RI and Good, which again is depressing, but it's a reality. If we want to get an Ofsted-eye view of school data, we need only look at RAISE. If you want to counter the arguments stemming from RAISE then it pays to be able to present data in a similar format, in a way that an Ofsted inspector will find familiar. And let's face it: inspectors aren't going to find many things familiar this year. 

The measure that concerns me most is VA - a measure of long-term progress, comparing actual against expected outcomes. Without resorting to any particular metric, we can address the proposed new floor measure of 85% reaching the expected standard by the end of KS2 by showing the percentage of the cohort/group that are at or above the school's defined expectations linked to the new national curriculum. Mind you, to digress for a bit, I have a couple of issues here, too. Being at or above the expected level is not necessarily the same as being on track: the pupil may have made no progress or gone backwards. Also, if the school only defines the expected level at the end of the year, will this mean that all pupils are below expectations until the end of the year is reached, like a switch being turned on? Where will this leave the school? Would it not make more sense to have a moving definition of expected, to allow for meaningful analysis at any point in the year? Just a thought. 

Back to the issue of measuring progress: under various proposed assessment models we can easily analyse in-year progress by counting the steps pupils make, and we can also measure progress by monitoring the shifts in the percentages of pupils in a cohort that are below, at or above expectations. But long-term, VA-style progress measures are trickier. If no numerical system exists, how does the school effectively measure progress to counter any negative VA data in RAISE? I'm really struggling with this, and I suspect that many if not most headteachers would like their assessment system underpinned by a numerical scale which allows progress to be quantified. We know that a floor standard, measuring progress from the beginning of EYFS to the end of KS2, will be implemented and will be of huge relevance to schools, the majority of which will (initially at least) fall below the 85% expected standard threshold mentioned above. I'm assuming that schools will want to emulate this VA measure in some way in their tracking by quantifying progress from EYFS to the latest assessment point, and perhaps project that progress to make predictions for the end of the key stage.
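To make the cohort-shift idea concrete, here's a minimal sketch in Python (the band labels, cohort size and data are all invented for illustration; this isn't any particular tracking system's method):

```python
# Track the share of a cohort below, at or above expectations at two
# assessment points, and the shift between them. Data is invented.
from collections import Counter

def banding(cohort):
    """Return the percentage of the cohort in each band."""
    counts = Counter(cohort)
    return {band: 100 * counts[band] / len(cohort)
            for band in ("below", "at", "above")}

autumn = ["below", "below", "at", "at", "at", "above", "below", "at"]
summer = ["below", "at", "at", "above", "at", "above", "at", "at"]

start, end = banding(autumn), banding(summer)
shift = {band: end[band] - start[band] for band in start}
print(shift)  # {'below': -25.0, 'at': 12.5, 'above': 12.5}
```

A falling 'below' percentage and rising 'at'/'above' percentages is the in-year progress story; the point is that this needs no levels, just band counts.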

Another confession: I made the assumption that these assessment models rely on sequential attainment of objectives. If this were the case then a decimalised, curriculum-year-based model would be useful and neat. For example, categorising a pupil as a 4.5 because they are working within the year 4 curriculum and have achieved 50% of its objectives. Simple. And of course it would allow meaningful comparison between pupils within a cohort and even between schools. However, as was pointed out to me, this is not how pupils learn, and it doesn't tell us which 50% they've achieved (it's not necessarily the first 50%). This was what we were debating yesterday when the 'data bods driving the agenda' accusation was fired at us. The author of that comment has a good point. 
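For what it's worth, the decimalised score itself is trivial arithmetic. A quick sketch (the objective counts are invented):

```python
def decimal_score(curriculum_year, achieved, total_objectives):
    """Curriculum year plus the proportion of that year's objectives
    achieved, e.g. year 4 with 12 of 24 objectives -> 4.5. Note that
    it says nothing about WHICH objectives were achieved."""
    return curriculum_year + achieved / total_objectives

print(decimal_score(4, 12, 24))  # 4.5
```

Which is exactly the weakness: two pupils can both be a 4.5 yet have achieved entirely different halves of the year 4 curriculum.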

However, in my defence - and I'm sure it's the same for most other data people - I don't want to drive the agenda. I spend most of my time in schools, working with headteachers, senior leaders, teachers and governors, and I'm constantly learning. I change my mind pretty much every time I look at twitter. My opinion is like Scottish weather: if you don't like it, just wait 20 minutes. I simply want to ensure that schools have the best tools to do their job and to defend themselves against Ofsted. That's it. I'm not interested in unwieldy, burdensome, time-consuming systems; data systems should simplify processes, save time and improve efficiency. They should be servants, not masters. And yes, their primary function is to inform teaching and learning.

So, to summarise a rather rambling blog, I'm excited about the removal of levels and see it as an opportunity to innovate. As a parent I am more interested in knowing what my daughter can and can't do than in her being assigned a meaningless level. I just think that tracking works best when data is converted to a standardised numerical system. This numerical scale should be used for strategic analysis, to help senior leaders compare current school performance against that outlined in RAISE. I don't think that new numerical systems should replace levels and be used for reporting purposes. Any such systems must be kept guarded within the mainframe of the Data Death Star at all times.

And we'll leave those cute little Awols alone. 


Data Vader
Level 5 (Sublevel C)
Data Death Star

Wednesday, 20 August 2014

Using on entry CAT tests in junior schools (and how I intend to buy new climbing shoes)

Some things in life are certain: death, taxes, getting a 'sorry I missed you' card from the postman when you've just nipped to the loo for 2 minutes. Oh, and having the conversation about the accuracy of infant schools' KS1 results whenever you find yourself in the same room as a junior school headteacher. This is a conversation I have regularly. If I had a pound for each time I've had this conversation, I reckon I'd have about £87 by now, which is nearly enough for a new pair of climbing shoes. I always need new climbing shoes.

I'm going off topic.

Sometime ago, a junior school head came to visit me in my office. She wanted to discuss the issue of KS1 data accuracy (obviously). I pushed my jar of pound coins towards her, strategically placed a climbing gear catalogue within line of sight, and prepared myself for some proper headteacher ranting. But this head didn't want to rant; she wanted to take some action. She wanted to do stuff. She wanted data. Which is always nice.

So, after some discussion, we hatched a plan: to carry out CAT tests on entry in as many junior schools as possible. We had no idea if this project would be of any use or what we would do with the data when we got it, but it sounded like positive action and we thought it would be pretty neat, too. In the end, after numerous meetings and emails, 13 out of the 20 junior schools in Gloucestershire got involved and a date in early October was set for their new Year 3 intakes to do the tests. Exciting!

The test itself is known as a PreA test and is specifically designed to be carried out early in year 3. If you'd like to learn more about these and other CAT tests, please contact GL Assessment.

I said above that we didn't know what we would do with the data, which is not entirely true. I had a sort of, kind of idea. A CAT test provides scores for the pupil's verbal, non-verbal and quantitative reasoning; it does not generate a level or sublevel that can be directly compared with the pupil's KS1 results. However, like other CAT tests, the PreA test would provide an English and a Maths estimate for the end of KS2 in the form of a sublevel. I thought it would be interesting to compare these estimates with those generated using RAISE VA methodology. Not exactly a perfect solution, but compelling, in a data-ery sort of way.

So, once the junior schools had carried out the PreA tests in October last year, they sent me the data. I then converted each pupil's KS2 sublevel estimates generated by the tests into point scores (by the way, I don't like using the term 'APS' here because they're not averages. I'm pedantic like that). Next I put each pupil's KS1 results into my VA calculator (more information on that here) to generate end of KS2 estimates using RAISE VA methodology, and took estimated point scores for each pupil. I now had two point score estimates for the end of KS2 for each Y3 pupil in the 13 junior schools taking part: one based on the CAT PreA test; the other based on their KS1 results. Neat! Now all I had to do was subtract the CAT estimate from the RAISE VA estimate (the former from the latter) to find which was higher. Positive figures would indicate that the estimate derived from KS1 results was in advance of that derived from the CAT test; negative figures would indicate the opposite. 'So what?' I hear you shout. Fair question, but bear in mind that it's the RAISE VA estimate that the pupil's progress is measured against (well, sort of, because, actually, their estimates won't really be calculated until they've done their KS2 SATS, but we're trying here, OK?). And if the RAISE VA estimate (i.e. that based on KS1) is always higher than the CAT estimate then this could be rather worrying, as it may indicate that the future VA bar will be set unrealistically high for those pupils.
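For the record, the pupil-level comparison boils down to a single subtraction. A hypothetical sketch (names and point scores invented; this isn't the real dataset):

```python
# RAISE VA estimate (from KS1) minus CAT PreA estimate, per pupil.
# Positive = the KS1-based bar is set higher than the CAT test suggests.
pupils = {
    "Pupil A": {"va_estimate": 29.1, "cat_estimate": 27.0},
    "Pupil B": {"va_estimate": 27.5, "cat_estimate": 27.8},
    "Pupil C": {"va_estimate": 30.0, "cat_estimate": 25.9},
}

differences = {name: round(p["va_estimate"] - p["cat_estimate"], 1)
               for name, p in pupils.items()}
print(differences)  # {'Pupil A': 2.1, 'Pupil B': -0.3, 'Pupil C': 4.1}
```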

So what was the outcome?

Well, the estimates based on KS1 results were higher than those based on the CAT test in pretty much every case. I'm writing this at home without the full dataset in front of me, but we're talking about approximately 600 pupils here. It was quite startling. Wanna see some data? Course you do.

School             English   Maths
Junior School 1      2.3      1.9
Junior School 2      1.6      1.9
Junior School 3      4.3      4.0
Junior School 4      2.7      2.4
Junior School 5      3.3      1.8
Junior School 6      2.7      3.2
Junior School 7      2.6      3.2
Junior School 8      3.3      2.3
Junior School 9      6.0      6.9
Junior School 10     2.5      2.1
Junior School 11     4.3      4.9
Junior School 12     2.3      1.6
Junior School 13     1.5      1.1
Average              3.0      2.9

The table and chart above (it's always nice to have the same data presented in different ways - I learnt a lot from RAISE) show the average differences (this actually is APS!) between the end of KS2 estimates derived from the CAT PreA tests and those generated using RAISE VA methodology, for both English and Maths. I used 2012 methodology, by the way, as it produced English estimates, rather than the separate reading and writing estimates of 2013, and so matched the CAT test data. As you can see, the average difference for the group of schools is 3 points for both English and Maths, i.e. VA estimates based on KS1 outcomes are 3 points (1.5 sublevels) higher than those based on the CAT tests. Some schools' differences are very small (e.g. schools 2 and 13), so their estimates based on KS1 and CAT tests are similar, and this could be taken as evidence that their KS1 results are accurate. And maybe differences of 2 APS or less are within the limits of tolerance, but three of the above schools (3, 9 and 11) have very big differences and these perhaps are the most concerning. Schools 3 and 11 have differences of 4-5 APS (2-2.5 sublevels) and school 9 has a difference of 6 APS in English and 7 APS in Maths (an entire level).

Obviously I'm making the assumption that CAT tests are reliable and accurate predictors of end of key stage outcomes, but if this is the case (and many evidently think they are), and if the estimate differences detailed above can be taken as a proxy for the gap between KS1 results and pupils' actual ability, then the children in these three schools in particular have some serious ground to make up just to break even in terms of VA. Considering that, on average, cohorts need to make around 13 points to get a VA score of 100 (it's actually around 13.4 but let's not split hairs), the pupils in schools 3 and 11 would, in reality, need to make around 17 points to make expected progress (in terms of VA). Meanwhile pupils in school 9 would need to make 19-20 points to reach the VA 100 line. Somewhat unlikely, and blue boxes in RAISE may be hard to avoid. Interestingly, my friendly junior school headteacher, mentioned above, maintains that pupils in her school need to make 16 points of progress in reality (i.e. from the school's own baseline assessment) to get a positive VA score. The CAT vs VA experiment backed up her assertion.
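The back-of-envelope arithmetic behind those figures, for anyone who wants to check it (using the rounded 13-point figure, and treating the estimate gap as ground to be made up):

```python
# Points of progress needed to reach the VA 100 line if the gap between
# KS1-based and CAT-based estimates is treated as a deficit to recover.
VA_NEUTRAL_PROGRESS = 13  # roughly; nearer 13.4 in practice

def points_needed(estimate_gap):
    return VA_NEUTRAL_PROGRESS + estimate_gap

print(points_needed(4))    # schools 3 and 11 (gap of ~4 APS): 17
print(points_needed(6.5))  # school 9 (gap of 6-7 APS): 19.5
```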

So, that's it really. Deeply flawed, I know, but interesting and a worthwhile exercise (the data was used by one school as part of their evidence base for inspection and proved very useful). The lack of a control group is an obvious issue here and needs to be addressed in future; ideally we'd like to get 10 primary schools to take part at some point. Traditionally schools have carried out CAT testing in Year 5, but more schools are considering alternatives. I actually think it's worth doing the tests earlier, as you have more time to act on the data, so perhaps more primary schools would be interested in testing in year 3. Many of the junior school heads involved in this project intend to continue using the tests, as they gave them an alternative and rich source of information on pupils' strengths and weaknesses, which they didn't have previously. This is a positive thing.

And finally, please can I state that this is not intended to be an exercise in infant school bashing. I'm very fond of infant schools, some of my best friends are infant schools, but this issue always crops up when talking to junior schools so I thought it would be interesting to test their claims. I suspect that similar issues occur in primary schools and that's why we need a primary control group for this research to have any real validity.

Anyway, that's the end of this blog. Hope it was useful, or at least interesting.

Oh, and by the way, I am now a governor of a junior school and the owner of a new pair of climbing shoes.