Wednesday, 14 December 2016

10 things I hate about data

I seem to spend a lot of time ranting these days. Recently I've been trying to rein it in a bit, be less preachy. It's counterproductive to wind people up - I need to get them on side - but the problem is there are just so many opportunities to get annoyed these days. I'm turning into a data analyst who hates data. Well, a data analyst who hates bad data (as any decent data analyst should). And let's face it, there's a lot of bad data out there to get annoyed about. So, a few weeks ago I gave a conference talk entitled '10 things I hate about data' (it could have been much longer, believe me, but 10 is a good number).

Here's a summary of that talk.

1) Primary floor standards
We are now in a crazy world where the attainment floor standard is way above the national 'average'. England fell below its own minimum expectation. How can that happen? On the 1st September 2016, the floor standard ceased to be a floor standard and became an aspirational target. But the DfE had already committed to having no more than 6% of schools below floor, which meant they had to set the progress thresholds so low that they captured just a handful of schools. I find it hard to apply the phrase 'sufficient progress' to scores of -5 and -7 and keep a straight face. So primary schools have four floor standards: one linked to attainment, which is way too high, and three relating to progress, which are way too low. If the school is below 65% EXS in reading, writing and maths, and below just one of the progress measures, it is below floor. Unless that one happens to be writing, in which case chances are it'll be overlooked because writing data is junk. Oh, and if you are below just one progress floor then it has to be significantly below to be deemed below floor, which is ridiculous because it's not actually possible to have scores that low and for them not to be significantly below. Meanwhile, secondary schools, with all the complexity of GCSE and equivalent data, have one single measure, progress 8, which captures the average progress made by pupils across up to 8 subjects. The floor standard at KS4 is half a grade below average. Simple. Why can't primary schools have a similar single, combined-subject, progress-based floor measure?
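The floor logic described above can be sketched in a few lines. A minimal version, using -5/-7/-5 as illustrative progress thresholds (close to the published values) and a hypothetical `sig_below` argument to flag statistical significance:

```python
# A sketch of the 2016 primary floor-standard logic as described above.
# Thresholds are illustrative; 'sig_below' lists subjects whose progress
# score is statistically significantly below average.

READING_FLOOR, WRITING_FLOOR, MATHS_FLOOR = -5.0, -7.0, -5.0

def below_floor(pct_exs_rwm, prog_read, prog_write, prog_maths,
                sig_below=()):
    """Return True if the school is below the primary floor standard."""
    # Above the 65% attainment threshold? Above floor outright.
    if pct_exs_rwm >= 65:
        return False
    below = []
    if prog_read < READING_FLOOR:
        below.append('reading')
    if prog_write < WRITING_FLOOR:
        below.append('writing')
    if prog_maths < MATHS_FLOOR:
        below.append('maths')
    if not below:
        return False
    # If only one progress measure is below threshold, it must also be
    # significantly below average to count (and in practice always is).
    if len(below) == 1:
        return below[0] in sig_below
    return True
```

Quite a lot of machinery, compared with a single Progress 8-style check of one number against one threshold.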

2) Coasting
I hate this measure. I get what they're trying to do - identify schools with high attainment and low progress - but this has been so badly executed. Why 85%? What does that mean? How does 85% level 4 in previous years link to 85% achieving expected standards this year? Why are they using levels of progress medians for 2014 and 2015 when they could have used VA, which would make the progress broadly comparable with 2016? And why have they just halved the progress floor measures? (smacks of what my Dad would describe as a 'Friday afternoon job'). Remember those quadrant plots in RAISE? The ones that plotted relative attainment (which compared the school's average score against the national average score) against VA? Schools that plot significantly in the bottom right hand quadrant 3 years running - that would be a better definition of coasting. Unless they are junior schools, in which case forget it. Actually, until we have some robust data with accurate baselines, perhaps forget the whole thing.

3) The use of teacher assessment in high stakes accountability measures
The issue of KS2 writing has been discussed plenty already. We know it's inconsistent, we know it's unreliable, we know it's probably junk. Will it improve? No. Not until teacher assessment is removed from the floor standards at least. I'm not saying that writing shouldn't be teacher assessed, and that teacher assessment shouldn't be collected, but we can't be surprised that data becomes corrupted when the stakes are so high. The DfE evidently already understands this - they decided a year ago not to use writing teacher assessment in the progress 8 baseline from 2017 onward (the first cohort to have writing teacher assessed at KS2). It's not just a KS2 issue either. KS1 assessments form the baseline for progress measures so primary schools have a vested interest in erring on the side of caution there; and now that the DfE are using EYFSP outcomes to devise prior attainment groups for KS1, who knows what the impact will be on the quality of that data. All this gaming is undermining the status of teacher assessment. It needs a rethink.

4) The writing progress measure
Oh boy! This is a whopper. If you were doubting my assertion above that writing teacher assessment should be removed from floor standards, this should change your mind. Probably best to read this but I'll attempt to summarise here. Essentially, VA involves comparing a pupil's test score against the national average score for pupils with the same start point. A pupil might score 97 in the test when the national average score for their prior attainment group is 94, so that pupil has a progress score of +3. This is fine in reading and maths (and FFT have calculated VA for SPaG) but it doesn't work for writing because there are no test scores. Instead, pupils are assigned a 'nominal score' according to their teacher assessment: WTS = 91, EXS = 103, GDS = 113, which is then compared against an unachievable fine-graded benchmark. So, a pupil in prior attainment group 12 (KS1 APS of 15, i.e. 2b in reading, writing and maths) has to achieve 100.75 in writing, which they can't. If they are assessed as meeting the expected standard (nominal score of 103) their progress score will be +2.25; if they are assessed as working towards (nominal score of 91) their progress score will be -9.75. Huge swings in progress scores are therefore common because most pupils can't get close to their benchmarks due to the limitations of the scoring system. And I haven't got space here to discuss the heinousness of the nominal scoring system applied to pre-key stage pupils, except to say that it is pretty much impossible for pupils below the level of the test to achieve a positive progress score. So much for the claim in the primary accountability document that the progress measures would reflect the progress made by ALL pupils. Hmmm.
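To make the arithmetic concrete, here is a minimal sketch of the writing calculation. The nominal scores (91/103/113) and the group 12 benchmark of 100.75 are the figures quoted above; the function name is mine:

```python
# Writing progress as described above: the nominal score assigned to the
# teacher assessment, minus the (fine-graded, unachievable) national
# benchmark for the pupil's prior attainment group.

NOMINAL = {'WTS': 91, 'EXS': 103, 'GDS': 113}

def writing_progress(teacher_assessment, benchmark):
    """Pupil's writing progress score for a given TA outcome."""
    return NOMINAL[teacher_assessment] - benchmark

# Prior attainment group 12 (KS1 APS of 15, i.e. 2b across the board):
print(writing_progress('EXS', 100.75))  # 2.25
print(writing_progress('WTS', 100.75))  # -9.75
```

Because the only possible scores are 91, 103 and 113, a pupil's progress can only ever land at one of three points, 12 and 10 points apart, hence the huge swings.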

5) The death of CVA
In 2011, the DfE stated that 'Contextual Value Added (CVA) goes further than simply measuring progress based on prior attainment [i.e. VA] by making adjustments to account for the impact of other factors outside of the school’s control which are known to have had an impact on the progress of individual pupils e.g. levels of deprivation. This means that CVA gives a much fairer statistical measure of the effectiveness of a school and provides a solid basis for comparisons.' Within a year, they'd scrapped it. But some form of CVA is needed now more than ever. Currently, pupils are grouped and compared on the basis of their prior attainment, without any account taken of special needs, length of time in school, number of school moves or deprivation. This is a particular issue for low prior attainment groups, which commonly comprise two distinct types of pupils: SEN and EAL. Currently, no distinction is made and these pupils are therefore treated the same in the progress measures, which means they are compared against the same end of key stage benchmarks. These benchmarks represent national average scores for all pupils in the particular prior attainment group, and are heavily influenced by the high attainment of the EAL pupils in that group, rendering them out of reach for many SEN pupils. Schools with high percentages of SEN are therefore disadvantaged by the current VA measure and are likely to end up with negative progress scores. The opposite is the case for schools with a large proportion of EAL pupils. This could be solved either by introducing some form of CVA or by removing SEN pupils from headline measures. This of course could lead to more gaming of the system in terms of registering pupils as SEN or not registering as EAL, but the current system is unfair and needs some serious consideration.

6) The progress loophole of despair
This is nuts! Basically, pupils that are assessed as pre-key stage are included in progress measures (they are assigned a nominal score as mentioned above), whereas those assessed as HNM (in reading and maths) that fail to achieve a scaled score (i.e. do not achieve enough raw marks on the test) are excluded from progress measures, which avoids huge negative progress scores. I have seen a number of cases this year of HNM pupils scoring just 3 raw marks on the test and being awarded a scaled score of 80. Typically they end up with progress deficits of -12 or worse (sometimes much worse), which has a huge impact on overall school progress. Removing such pupils often makes the difference between being significantly below and in line with average. And the really mad thing is that if those pupils had achieved one less mark on the test, they wouldn't have achieved a scaled score and therefore would not have been included in the progress measures (unlike the pre-key stage pupils). Recipe for tactical assessment if ever I saw one.
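A sketch of the inclusion rule that creates the loophole. The scaled-score floor of 80 is real; the benchmark of 92 is illustrative:

```python
# The loophole described above: a pupil with any scaled score (minimum
# 80) is included in the progress measure; a pupil with too few raw
# marks gets no scaled score and is excluded entirely.

def progress_contribution(scaled_score, benchmark):
    """Return the pupil's progress score, or None if the pupil is
    excluded from the measure (no scaled score awarded)."""
    if scaled_score is None:
        return None  # too few raw marks for a scaled score: excluded
    return scaled_score - benchmark

# One extra raw mark can mean a scaled score of 80 and a big deficit;
# one mark fewer and the pupil vanishes from the measure altogether.
print(progress_contribution(80, 92))    # -12
print(progress_contribution(None, 92))  # None (excluded)
```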

7) The one about getting rid of expected progress measures
The primary accountability document states that 'the [expected progress] measure has been replaced by a value-added measure. There is no "target" for the amount of progress an individual pupil is expected to make.' Yeah, pull the other one. Have you seen those transition matrices in RAISE (for low/middle/high start points) and in the RAISE library (for the 21 prior attainment groups)? How many people would really like to see those broken down into KS1 sublevel start points? Be careful what you wish for. Before we know it, crude expectations will be put in place, which will be at odds with value added, and we're back to square one. Most worrying are the measures at KS1 involving specific early learning goals linked to end of KS1 outcomes, and the plethora of associated weaknesses splashed all over page 1 of the dashboard. Teachers are already referring to pupils not making 'expected progress' from EYFS to KS1 on the basis of this data. Expected progress and VA are also commonly conflated, with estimates viewed as minimum targets. In every training session I've run recently, a headteacher has recounted a visit by some school improvement type who has shown up brandishing a copy of the table from the accountability guidance, and told them what scores each pupil is expected to get this year. 'Expected' implies that it is prescribed in advance, and yet VA involves comparison against the current year's averages for each prior attainment group, and we don't know what these are yet. Furthermore, because it is based on the current year's averages, half the pupils nationally will fall below the estimates and half will hit or exceed them. That's just how it is. Expected progress is the opposite of VA and my response to anyone confusing the two is: tell me what the 2017 averages are for each of the 21 prior attainment groups, and I'll see what I can do. I spoofed this subject here, by the way.

8) Progress measures in 2020
Again, this. Put simply, the basis of the current VA measure is a pupil's APS at KS1. How are we going to do this for the current Y3? How do I work out the average for EXS, WTS, EXS? Will the teacher assessments be assigned a nominal value? How many prior attainment groups will we have in 2020 when this cohort reach the end of KS2? Currently we have 21 but surely we'll have fewer considering there are now fewer possible outcomes at KS1, which means we'll have more pupils crammed into a smaller number of broader groups. Such a lack of refinement doesn't exactly bode well for future progress measures. Remember that all pupils in a particular prior attainment group will have the same estimates at the end of KS2, so all your EXS pupils will be lumped into a group with all other EXS pupils nationally and given the same line to cross. This could have been avoided if the KS1 test scores had been collected and used as part of the baseline, but they weren't, so here we are. 2020 is going to be interesting.

9) Colour coding used in RAISE
Here is a scene from the script I'm working on for my new play, 'RAISE (a tragedy)'.

HT: "blue is significantly below, green is significantly above, right?"
DfE: "No. It's red and green now"
HT: "right, so red is significantly below, and green is significantly above. Got it"
DfE: "well, unless it's prior attainment"
HT: "sorry?"
DfE: "blue is significantly below if we're dealing with prior attainment"
HT: "so blue is significantly below for prior attainment but red is significantly below for other stuff, and green is significantly above regardless. And that's it?"
DfE: "Yes"
HT: "You sure? You don't look sure."
DfE: "Well...."
HT: "well what?"
DfE: "well, it depends on the shade?"
HT: "what shade? what do you mean, shade?"
DfE: "shade of green"
HT: "shade of green?"
DfE: "or shade of red"
HT: "Is there a camera hidden here somewhere?"
DfE: "No. Look, it's perfectly simple really. Dark red means significantly below and in the bottom 10% nationally, light red means significantly below but not in the bottom 10%; dark green is significantly above and in the top 10%, light green is significantly above but not in the top 10%. See?"
HT: "Erm....right so shades of red and green indicating how significant my data is. Got it."
DfE: "Oh no. We never say 'how significant'. That's not appropriate, statistically speaking"
HT: "but, the shades...."
DfE: "well, yes"
HT: *sighs* "OK, shades of red and green that show data is significantly below or above and possibly in the bottom or top 10%. Right, got it"
DfE: "but only for progress"
HT: "Sorry, what?"
DfE: "we only do that for progress"
HT: "but I have dark and light green and red boxes for attainment, too. Look, here on pages 9 and 11 and 12. See?"
DfE: "Yes, but that's different"
HT: "How is it different? HOW?"
DfE: "for a start, it's not a solid box, it's an outline"
HT: "Is this a joke?"
DfE: "No"
HT: "So, what the hell do these mean then?"
DfE: "well those show the size of the gap as a number of pupils"
HT: "are you serious?"
DfE: "Yes. So work out the gap from national average, then work out the percentage value of a pupil by dividing 100 by the number of pupils in that group. Then see how many pupils you can shoehorn into the gap"
HT: "and the colours?"
DfE: "well, if you are 2 or more pupils below that's a dark red box, and one pupil below is a light red box, and 1 pupil above that's a light green box, and you get a dark green box if you are 2 or more pupils above national average"
HT: "and what does that tell us?"
DfE: "I'm not sure, but -2 or lower is well below, and +2 or higher is well above. You may have seen the weaknesses on your dashboard"
HT: "So let me get this straight. We have dark and light shades of red and green to indicate data that is either statistically below or above, and in or not in the top or bottom 10%, or gaps that equate to 1 or 2 or more pupils below or above national average. Am I there now?"
DfE: "Yes, well unless we're talking about prior attainment"
HT: "Oh, **** off!"

Green and red lights flash on and off. Sound of rain. A dog barks.
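Satire aside, the pupil-gap arithmetic the DfE character describes is real. A minimal sketch, with the colour thresholds as given in the dialogue and an illustrative cohort size:

```python
# The 'gap as a number of pupils' arithmetic from the scene above: take
# the gap from national average in percentage points, work out what one
# pupil is worth (100 / group size), and see how many pupils fit into
# the gap. Colour thresholds follow the dialogue.

def gap_in_pupils(school_pct, national_pct, group_size):
    pupil_value = 100 / group_size          # one pupil, in % points
    return (school_pct - national_pct) / pupil_value

def box_colour(pupils):
    if pupils <= -2:
        return 'dark red'    # 'well below'
    if pupils <= -1:
        return 'light red'
    if pupils >= 2:
        return 'dark green'  # 'well above'
    if pupils >= 1:
        return 'light green'
    return 'no colour'

# 20-pupil cohort, 10 percentage points below national average:
gap = gap_in_pupils(43, 53, 20)   # each pupil worth 5%, so -2 pupils
print(box_colour(gap))            # dark red
```

Note that the same dark red box means something entirely different in the progress tables, which is rather the point of the scene.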

10) Recreating levels
We've been talking about this for nearly 2 years now and yet I'm still trying to convince people that those steps and bands commonly used in tracking systems - usually emerging, developing, secure - are essentially levels by another name. Instead of describing the pupil's competence in what has been taught so far - in which case a pupil could be 'secure' all year - they instead relate to how much of the year's curriculum has been achieved, and so 'secure' is something that happens after Easter. Despite finishing the previous year as 'secure', pupils start the next year as 'emerging' again (as does everyone else). Pupils that have achieved between, say, 34% and 66% of the year's curriculum objectives are developing, yet a pupil that has achieved 67% or more is secure. Remember those reasons for getting rid of levels? How they were best-fit and told us nothing about what pupils could or couldn't do; how pupils at either side of a level boundary could have more in common than those placed within a level; how pupils could be placed within a level despite having serious gaps in their learning. Consider these reasons. Now look at the example above, consider your own approach, and ask yourself: is it really any different? And why have we done this? We've done it so we can have a neat approximation of learning; arbitrary steps we can fix a point score to so we can count progress, even if it's at odds with a curriculum 'where depth and breadth of understanding are of equal value to linear progression'. Then we discover that once pupils have caught up, they can only make 'expected progress' because they don't move on to the next year's content. So we shoehorn in an extra band called mastery or exceeding or above, with a nominal bonus point, so we can show better than expected progress for the most able. These approaches have nothing to do with real learning; they've got everything to do with having a progress measure to keep certain visitors happy.
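For the avoidance of doubt, here is the banding logic described above in code form, using the 34%/67% boundaries from the example; any resemblance to levels is entirely non-coincidental:

```python
# Bands defined by the share of the year's curriculum covered, as in
# the example above. Boundaries (34% / 67%) are the ones quoted; they
# behave exactly like level thresholds.

def band(pct_of_year_objectives):
    if pct_of_year_objectives >= 67:
        return 'secure'
    if pct_of_year_objectives >= 34:
        return 'developing'
    return 'emerging'

# A pupil 'secure' in July is 'emerging' again in September, because
# the denominator resets to the new year's objectives:
print(band(80))  # secure (July, against this year's objectives)
print(band(10))  # emerging (September, against next year's)
```

A pupil on 66% and a pupil on 67% sit either side of a boundary despite near-identical coverage, which is precisely the best-fit problem levels were scrapped for.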
It's all nonsense and we need to stop it.

Merry Christmas!

Friday, 2 December 2016

Example report on primary school performance

This is an anonymised version of a summary report I've just written for a primary school. They kindly allowed me to post it on my blog (with school name removed, obviously). It's not been properly edited yet but hopefully it'll give you some ideas if you are in the middle of writing similar reports at the moment.

Beacon Primary Academy
Summary of performance 2016

Beacon Primary Academy is a larger than average primary school in an area of high deprivation in central Springfield. Most pupils are from ethnic minority backgrounds – the majority are Pakistani or Bangladeshi - and English is an additional language for the majority of pupils in the school. Almost half of pupils are eligible for free school meals, which is considerably higher than national average, and the school ranks amongst the 20% most deprived nationally.

Historically, prior attainment at key stage 1 has been significantly below average but this has improved and prior attainment of current years 4 and 5 is broadly in line with national average.

Floor Standards
[Table: the school's results against the four floor measures - attainment, and progress in reading, writing and maths - and whether the overall floor standards were met]

The school fell below the attainment floor measure of 65% achieving the expected standard in reading, writing and maths combined but it is close to the national average of 53%. It should be noted that the majority of schools nationally fell below the 65% threshold. The school is above all 3 progress floor measures (it was significantly above average for progress in writing and maths) and is therefore above floor. Furthermore, the school is not considered to be ‘coasting’ on the basis of its 2016 progress results.

Key stage 2

FFT Analysis of main key stage 2 results
FFT data shows that the average score in reading and maths tests combined in 2016 was 100.5, which is significantly below average, but is 1.6 points higher, and significantly above expected, when pupil start points are taken into account. Furthermore, the result of 49% achieving the expected standard in reading, writing and maths, despite being below national average of 53%, is significantly higher than expected by 12% points, considering the start points of pupils. This indicates that, under normal circumstances, just 37% of pupils would achieve the expected standard in the three subjects, which translates into 10 more pupils achieving the overall expected standard than would be expected to do so in a school where ‘average’ progress is made. FFT ranks the school in the top 25% for progress, maintaining its 2015 position and an improvement on 2014 when it was ranked at the 33rd percentile for progress.

Overall progress in maths is +2.4, which is significantly above average and ranks the school at the 17th percentile. This contrasts with reading, the progress score for which is 0.0, indicating that progress in that subject was average. This is perhaps to be expected in a cohort comprising such a high proportion of EAL pupils, and should be considered alongside pupils’ progress in grammar, punctuation and spelling (GPS). Here the progress score is +4.1, which is significantly above average, demonstrating that pupils made excellent progress in this subject in comparison to pupils with similar start points nationally. The school is ranked at the 6th percentile (top 6% of schools) for progress in GPS.

FFT analysis shows that all but two pupil groups made more than average progress, with many making significantly more than average progress. The ‘any other’ ethnic minority group of pupils (8 pupils) made average progress, while the SEN support group made less than average progress. However, it should be noted that SEN pupils are often shown to make less than average progress in a VA model where they are compared against pupils nationally with similarly low start points – a group that includes EAL pupils.

RAISE Summary report

Progress at Key Stage 2
Overall progress in writing and maths was significantly above average and was positive for all pupils and disadvantaged pupils in each prior attainment band. Notably, progress in maths was significantly above average for disadvantaged pupils overall, and significantly above average in writing and maths for disadvantaged pupils in the low prior attainment group. As in FFT data, progress in reading was 0, indicating pupils making average progress.
Further analysis of pupil group progress data reveals that no group’s progress was significantly below average in any subject and many groups were significantly above in writing and maths. A notable area for further investigation is the progress of high prior attainers, which was broadly in line with average in maths and below (but not significantly below) average in writing.
Please note: overall low/middle/high is defined by KS1 APS. Subject low/middle/high refers to pupil’s level in that particular subject at KS1.

Attainment at Key Stage 2
49% of pupils achieved the expected standard in reading, writing and maths at KS2 which is just below the national average. No pupils achieved the higher standard in the three subjects combined. Percentages meeting the expected standard in writing and maths are above national average overall and for each prior attainment band; and disadvantaged pupils’ attainment of expected standards was generally in line with the national averages for non-disadvantaged pupils in these two subjects, and above in the case of low prior attaining disadvantaged pupils. No gaps are therefore identified in writing and maths in terms of percentages of disadvantaged pupils meeting expected standards.
The key issues are twofold: attainment of expected standards in reading and greater depth in writing. Here, large gaps from national average are identified particularly for the middle prior attaining group, and the gap between the middle disadvantaged group and non-disadvantaged pupils nationally in reading is notably wide (-4). Gaps of -2 pupils or lower are classified as ‘well below’ average. There is also a gap identified in terms of high prior attainers achieving the high standard in reading, writing and maths combined. This is due to the lack of pupils achieving greater depth in writing.

Attainment at Key Stage 1
Attainment of expected standards at KS1 was above and well above average in all subjects overall and for each prior attainment group. Disadvantaged pupils’ attainment of expected standards was above average in writing and well above in reading and maths. All but one of the non-emerging disadvantaged pupil group (i.e. those disadvantaged pupils that had met the early learning goals) achieved the expected standard at KS1.

FFT data shows that 63% of pupils in this cohort achieved expected standards in reading, writing and maths combined, which is 12% points higher than estimated when pupils’ EYFS outcomes are taken into account. As at KS2, this equates to around 10 more pupils achieving expected standards in the three subjects than perhaps would be expected in a school where average progress is made. FFT data shows that attainment of expected standards at KS1 was above expected for nearly all groups, and significantly above in many cases, most notably Bangladeshi, FSM pupils and lower prior attainers. Only SEN pupils’ attainment fell short of estimated outcomes – the gap equating to one pupil not meeting expected standards.

As at KS2, it is achievement of greater depth (high standard) that is the key issue, particularly for ‘middle’ prior attainment pupils - in this case, those that had met the early learning goals in that specific subject at EYFS (EY expected). FFT shows that the percentage of high prior attainers achieving greater depth in reading, writing and maths combined (25%) was in line with ‘expectations’. However, the percentage of middle prior attainers doing the same was below: no pupils managed the higher standard in all three subjects, which is deemed to be 5% (or 2 pupils) below estimated. 

RAISE shows gaps ranging from 1 pupil below average (EY ‘expected’ disadvantaged pupils achieving greater depth in writing at KS1) to 6 pupils below (EY ‘expected’ pupils (all pupils) achieving greater depth in reading at KS1). A further investigation into middle prior attaining pupils achieving high standards is recommended. It is, for example, likely that many pupils placed in the EY ‘expected’ prior attainment group, based on EYFS outcome in that particular subject area, did not achieve GLD.

Percentages of EY ‘exceeding’ pupils (the high prior attainment group) achieving greater depth at KS1 are also low but numbers of pupils are small and gaps do not generally equate to 1 pupil, except in the case of writing, where the school figure for all pupils is flagged as -2 (i.e. 2 pupils below average). It should be noted that there are no pupils in the EY ‘exceeding’ group for maths and only 4 pupils are in this group for reading, in contrast with writing where 10 pupils are identified as exceeding based on EY outcomes. This is unusual as writing is usually the lower of the 3 EYFS areas.

Phonics continues to improve overall and for all key groups. Notably, the percentage of disadvantaged pupils achieving the expected standard in phonics is above that of non-disadvantaged pupils nationally, and has been for the past 3 years. 95% of disadvantaged pupils achieved the expected standard in Y1 in 2016.

Overall progress at KS2 is high particularly in writing and maths where scores are above and significantly above average for certain groups. Progress in reading is in line with national average but this is perhaps to be expected for a cohort comprising mostly EAL pupils. Progress in writing for high prior attainers is below average but there are serious concerns over the accuracy and reliability of teacher assessment in writing nationally. Three more pupils assessed at greater depth instead of expected standard in writing would have turned the negative progress score into a positive.

Overall attainment at KS2 is broadly in line with national averages. RAISE highlights relatively low attainment for middle prior attainment group in reading and the middle and high prior attainment groups in writing. Attainment of lower prior attainers tends to be in line with or above average. This is reflected in the progress measures (see above).

Attainment at KS1 is similar to KS2 with relatively low percentages of middle prior attaining pupils achieving higher standards. However, it is likely that strong improvements in phonics alongside the further embedding of the new curriculum will see this situation change in future years.

The key issue arising from data is the relatively low progress made by middle prior attaining pupils and this should be a key focus in future.

Sunday, 13 November 2016

RAISE case study summary sheet for EY-KS1

As tweeted yesterday, the whole EY-KS1 attainment table thing in RAISE/dashboard really bothers me. I intend to write a blog or TES article about the issue this week but in the meantime, I wanted to share this tool.

Many schools have many 'weaknesses' stated on p1 of their dashboards, which arise from these tables. These relate to negative gaps from national average that equate to 2 or more pupils. In such cases, I've recommended schools produce case studies on pupils that didn't achieve the expected standard or greater depth at KS1 from their emerging, expected, or exceeding start point (in the specific early learning goal!) at EYFS.

Someone pointed out that there may not be time during an inspection to go through these studies (I hope that's not the case and that the inspector will make time) but I thought it would be worth summarising the written case studies in a table, which is what I've attempted to produce here.

Have a look and let me know what you think.

And feel free to adapt it. Just download it first and leave the online version as it is.



Friday, 11 November 2016

RAISE summary template

Not blogged for a month, which is disgraceful, but I've been stupid busy. While I'm getting round to writing something meaningful and current and bang on the data nail, here is the link to my RAISE summary template:

Please avoid editing the online version. It should open in word view - download it (click on the 3 little dots, top right, in Chrome), then save it and fill it in.

Hope it's useful. 

Saturday, 15 October 2016

The Zero Game

Until this year, VA has played second fiddle to the levels of progress measure. This was mainly because there were no floor standards linked to VA. But it was also because - let's be honest here - not many people really understood it. Everyone understood pupils needing to make two levels of progress (and hopefully three levels of progress) but we struggled with the world of prior attainment groups, estimates, and confidence intervals. But now that levels are gone and all we're left with is a value added progress measure, we have no choice but to get our heads round it. So, we've read the primary accountability document and seen the lookup table on p16-17; we understand that there are 21 prior attainment groups (a reduction on previous years due to a change in methodology); that each of these prior attainment groups has an estimate in reading, writing and maths, which represents the national average score for pupils in that group; that these estimates form the benchmark for each pupil; and that exceeding these scores ensures a positive progress score for each child, which will aggregate to a positive progress score overall. We get this now.
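For anyone still getting their head round it, here is a simplified sketch of the mechanics: each pupil's score is compared to the estimate for their prior attainment group, the differences are averaged, and significance is judged against a confidence interval. The estimates dict and the CI formula here are illustrative approximations, not the published methodology:

```python
# A simplified sketch of how pupil-level VA aggregates to a school
# score. Estimates are illustrative; the real ones come from the
# accountability lookup table. The confidence interval here is a
# textbook approximation, not the DfE's exact calculation.

import math

def school_progress(pupils, estimates):
    """pupils: list of (prior_attainment_group, ks2_score) tuples.
    Returns (school progress score, 95% CI half-width)."""
    diffs = [score - estimates[pag] for pag, score in pupils]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    ci = 1.96 * sd / math.sqrt(n)
    return mean, ci

estimates = {11: 98.5, 12: 100.75, 13: 102.5}   # illustrative only
pupils = [(12, 103), (12, 99), (11, 101), (13, 104)]
score, ci = school_progress(pupils, estimates)
# 'Significantly above average' only if score - ci > 0
print(round(score, 2), round(ci, 2))
```

Note how small cohorts produce wide confidence intervals, which is why one or two pupils can flip a school's significance flag.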

And that's where the trouble started. 

Up until recently, schools were flying blind. With a new curriculum and new tests, unsure of what constituted expected standards, and no idea of 'expectations' of progress, schools just concentrated on teaching and tracking the gaps in pupils' learning. We even started to question the methods of our tracking systems, with their pseudo-levels and points-based progress measures. Things were looking positive. The future was bright.
But then we saw the checking data, and that lookup table appeared, and I produced my VA calculator, and FFT published their 2016 benchmarked estimates.  Now it seems that many schools are playing a VA game, working out where each pupil needs to get to in order to ensure a positive progress score; comparing benchmarked estimates (that are no doubt too low for next year) against predicted results to model VA in advance, to provide figures to settle nerves and satisfy those scrutinising schools' performance.

I understand that schools want a positive VA score when the stakes are so high but we have to consider the potential risks to pupils' learning by focussing on minimum 'expected' outcomes. I am particularly concerned to hear that schools are building systems that track towards these estimated outcomes, using teacher assessment or optional tests as a proxy for expected standards, as a predictor of outcome that can then be compared against the end of key stage progress estimate. I think of the ideals of 'learning without limits' and the sound principles for the removal of levels, and wonder if anything has really changed. I also wonder if it was wise to publish my VA calculator. All those schools inevitably using it to generate estimates for current cohorts; estimates that are being entered into systems and somehow tracked towards. Am I part of the problem? 

Has a knowledge of progress measures become a risk to children's learning? 

How about we just put the blinkers on and concentrate on teaching? Look after the pennies and the pounds will take care of themselves. 

Just a thought. 

Tuesday, 11 October 2016

A clear and present danger: the problem with writing VA

Back in February I wrote this post, speculating on the potential issues of the proposed writing progress measure. That measure is now a reality and the issues predicted in the blog post have taken root. I don't have a problem with VA measures per se - although I would prefer them to be contextualised - but I do have an issue with VA when it's based on teacher assessment. The problem is that teacher assessments are too broad, too inconsistent, and too prone to bias when high stakes are involved. I certainly echo the proposals of the Headteachers' roundtable alternative green paper, which calls for the removal of teacher assessment from high stakes accountability measures. In my blog I discussed the problems of the writing VA measure under levels and suggested that it would only get worse in the post-levels world. To understand this we need to look at how VA works and why writing differs from reading and maths.

VA involves comparing a pupil's attainment score at KS2 against the national average score for pupils with the same KS1 prior attainment. Many of us will now be familiar with the lookup table on p16-17 of the primary accountability document, which shows the average scores in reading, writing and maths for each of the 21 prior attainment groups. These are the benchmarks against which each pupil's actual KS2 scores in the three subjects are compared this year (please note: they'll almost certainly increase next year). I don't have a big problem with the reading and maths benchmarks - they are fine graded, and pupils, having sat tests in those subjects, will have a fine grade to compare against them. In short, the reading and maths benchmarks are achievable.
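To make the mechanics concrete, here's a minimal sketch in Python. The function name and data layout are mine; the group 12 ('2B average at KS1') estimates are the ones quoted elsewhere in these posts, and the full published lookup table would be needed for the other 20 groups:

```python
# Sketch of the VA progress calculation: actual KS2 score minus the
# national average (estimate) for the pupil's prior attainment group.
# Only group 12 is populated here, for illustration.

ESTIMATES = {
    12: {"reading": 100.6, "writing": 100.7, "maths": 101.5},
}

def progress_score(group: int, subject: str, actual: float) -> float:
    """Positive = above the benchmark for that start point; negative = below."""
    return actual - ESTIMATES[group][subject]

# A group 12 pupil scoring 104 in reading beats the benchmark by 3.4:
print(round(progress_score(12, "reading", 104), 1))  # 3.4
```

These individual scores are then averaged across the cohort to give the school's overall progress figure.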

That is not the case for writing. Again the benchmarks are fine graded but unlike reading and maths, in writing they are unachievable. Because there is no test score in writing the decision was taken to apply a nominal score to the teacher assessment. In my blog, I speculated that 90, 100, and 110 would be possible nominal scores assigned to the WTS, EXS and GDS assessments. We now know that these scores are 91, 103 and 113. This doesn't really change the problem; if anything it makes it worse. To try to understand this a bit better, I put together the following table, which shows every possible progress score in writing this year for every possible TA outcome from every start point:

The first thing to note - and this is something that confuses quite a few people - is that writing benchmarks in column 3 are fine graded. How can this be when they haven't sat a test? Well, consider all those scores of 91, 103 and 113 (and the nominal scores attached to pre-key stage judgements - another issue). If we add up all the various nominal scores for all pupils in a particular prior attainment group (thousands of pupils) and divide by the number of pupils in that group, we will of course end up with a fine grade. But as mentioned above, it's a fine grade that in many cases, due to the scoring system, pupils can't get close to.
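The fine-graded benchmark is simply a weighted average of the three whole-number nominal scores. A quick sketch, with invented pupil counts, shows how averaging thousands of such scores produces a decimal benchmark that no individual pupil can actually score:

```python
# Why writing benchmarks are fine graded despite only three nominal
# scores (91 = WTS, 103 = EXS, 113 = GDS): the national average for a
# prior attainment group is a weighted mean. Counts are invented
# purely for illustration.

group_counts = {91: 1200, 103: 4100, 113: 700}  # nominal score -> pupils

total = sum(score * n for score, n in group_counts.items())
pupils = sum(group_counts.values())
benchmark = total / pupils
print(round(benchmark, 2))  # 101.77 - a score no pupil can be awarded
```

Every pupil in this hypothetical group is either 10.77 points below the benchmark or at least 1.23 above it; nobody can land near it.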

Now have a look at the next few columns. Negative numbers indicate pupils making supposedly less than average progress. It is therefore not possible for a pupil assessed as BLW, PKF or PKE to make more than average progress, and the only PKG pupils that make more than average progress are those that are in prior attainment group 1, i.e. pupils that were below P7 average at KS1. Now take a look at pupils with a KS1 average point score of 10 or more - they need to be at least EXS to make more than average progress. In reading and maths, they just need a few more marks on the test to shift from negative to positive but in writing they need to make the leap from WTS to EXS. For prior attainment group 7 (KS1 APS 10 to <12) this leap changes their progress score from -1.6 to +10.4; and for prior attainment group 9 pupils (2C average at KS1) the progress swings from -5.69 to +6.31 if they are awarded EXS instead of WTS. Meanwhile pupils in prior attainment group 12 (2B average at KS1) assessed as WTS at KS2 end up with a deficit of -9.75. And EXS isn't good enough for prior attainment group 16 (2A average at KS1). Pupils in this group need to achieve GDS in order to make more than average progress. 

This is a measure based on an illogical scoring system applied to unreliable data. The methodology results in wild swings into positive or negative territory, which have a profound impact on overall progress scores, especially in small schools. Consequently, many schools are likely to have one eye on the progress lookup table when making their writing assessments next year. High stakes influencing outcome. Yes, there have been reassuring statements from Ofsted and the DfE regarding this year's writing teacher assessments, but I can't see how the writing progress measure is going to improve much next year. This measure is so flawed that one has to ask 'why bother?' Is it really worth it? Is a writing progress measure really so important that we are willing to sacrifice accuracy and fairness? Not really - it makes no sense - so let's ditch it.

It's time to admit that we can't measure progress in this way.

Thursday, 15 September 2016

The progress loophole of despair

Is it possible that there are certain situations where a school would benefit more from a pupil actually scoring less on a test? Having spent probably too much time playing with the KS2 pupil ready reckoner tool (click here and click 'how Ofsted and DfE analyse your data' to download), it would appear so. But this flies in the face of the core purpose of value added measures - that all progress counts - and the key statement in the DfE's Primary Accountability guidance: 'It is important that schools are held to account and given recognition for the progress made by all of their pupils'. So, how can we have a loophole where certain pupils are excluded, even if they've taken the test? And what are the implications for future assessment arrangements if this loophole remains open?

Having played with the pupil ready reckoner tool, I've found a number of interesting scenarios and they involve pupils assessed as HNM or PKS (pre-key stage, which includes BLW, PKF, PKE and PKG):

1) Pupils assessed as PKS who do not sit the test
These pupils obviously have no test score and are instead assigned a nominal score as defined in the accountability guidance: BLW = 70 points; PKF = 73 points; PKE = 76 points; PKG = 79 points. These points are for progress measures only and are not used in the average scores measure. These pupils will be counted in the denominator for the percentage threshold measures.

2) Pupils assessed as PKS who do sit the test but do not gain enough marks to achieve a scale score
Naively, I assumed that any pupil assessed as PKS would not sit the test because they were working below the level of the test, but that is certainly not the case. There are enough examples in the datasets I've seen to suggest it's actually quite common. In situations where 'PKS' pupils sat the test but did not achieve enough marks to be awarded a scale score, they are instead assigned the same nominal scores according to their PKS code as detailed above. In the ready reckoner tool, if 'no scale score awarded' is selected in the first box, users are prompted to 'Please enter a KS2 Teacher Assessment if the test outcome is B (working below the standard of the test) or if the pupil entered the test but did not achieve a scaled score'. PKS codes are acceptable values and the nominal score (i.e. 70-79) is awarded in place of the scale score. Again, these points are for progress measures only and are not used in the average scores measure. These pupils will also be counted in the denominator for the percentage threshold measures.

3) Pupils assessed as PKS who sit the test and gain enough marks to achieve a scale score
This scenario is quite simple: the pupil achieves a scale score, which supplants the nominal score assigned to the PKS code; and because the lowest scale score is 80, the pupil makes at least a single point gain in their individual progress score (the highest they can get if they don't score on the test is 79). However, unlike the nominal scores assigned to PKS codes, this score is also included in the average score, so whilst the school makes an almost negligible gain in terms of progress, its average scores are likely to take a big hit. These pupils are therefore included in progress measures, average scores and percentage threshold measures.

4) Pupils assessed as HNM who achieve a scale score
These pupils are not working below the level of test but are assessed as not having met the expected standard. These pupils take the test and the vast majority achieve a scale score - there is a wide range of scores for HNM pupils with many achieving low scores (i.e. below 90). These scores are used in the progress measures and average scores. This is all fairly normal.

5) Pupils assessed as HNM who do not gain enough marks to achieve a scale score
Here's the loophole. Unlike PKS pupils, this group are not assigned a nominal score of any sort. When 'no scale score awarded' is selected in the pupil ready reckoner tool, and HNM is then selected in the teacher assessment box, as prompted, instead of the nominal scores awarded to PKS codes, we are instead confronted with a red box stating 'pupil excluded'. So a PKS pupil that sits the test but does not achieve enough marks to get a scale score is assigned a nominal score from 70-79, but HNM pupils receive no such score and are excluded. The impact of this, particularly on a small school's data, can be enormous. Those nominal scores for PKS codes can have a major negative impact on progress measures, but if pupils were instead assessed as HNM and had not scored on the test, they'd have been excluded and the negative impact would disappear.
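The five scenarios reduce to a simple rule, sketched below (the function name is mine): a scaled score always counts if one exists; otherwise a PKS code gets its nominal score, and HNM gets nothing at all.

```python
# Sketch of the scoring rules in scenarios 1-5 above. Returns the
# score used in progress measures, or None if the pupil is excluded
# (the HNM loophole).

NOMINAL = {"BLW": 70, "PKF": 73, "PKE": 76, "PKG": 79}

def progress_input(teacher_assessment, scale_score):
    """scale_score is None when no scaled score was awarded
    (test not sat, or too few marks gained)."""
    if scale_score is not None:        # scenarios 3 & 4: a real score wins
        return scale_score
    if teacher_assessment in NOMINAL:  # scenarios 1 & 2: nominal score
        return NOMINAL[teacher_assessment]
    return None                        # scenario 5: HNM, excluded

print(progress_input("PKE", None))  # 76 -> counted, big negative deficit
print(progress_input("HNM", None))  # None -> excluded from progress
print(progress_input("HNM", 80))    # 80 -> counted, likely around -12
```

Put side by side like this, the asymmetry is stark: the same failure to score on the test either drags the school's progress measure down or vanishes entirely, depending solely on the teacher assessment code entered.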

What concerns me is that schools may start to be strategic in their assessments if this loophole remains open. Yesterday I saw an example of a pupil that had been assessed as HNM, who gained just enough marks to achieve the lowest scale score of 80. The pupil ended up with individual progress scores of around -12, which has a profound impact on the school's overall progress measures. If the pupil had achieved just a few marks fewer, then no scale score or nominal score would have been awarded and the pupil would have been excluded from the progress measures. The -12 would disappear.

In another school, a PKE pupil had been entered for the test but failed to achieve a scaled score. In this case the nominal score of 76 is awarded, and again there is a big impact on VA due to the large negative deficit. If this pupil had been assessed as HNM then no nominal score would have been awarded, the pupil would be excluded from progress measures and there would be no major, negative impact on the school's progress measures.

The worst case scenario is that schools assess pupils as HNM rather than PKS and enter them for the test knowing/hoping they don't score, thus ensuring there is no negative impact. If they do score on the test then at least the score will be higher (i.e. 80+) than the nominal score they would have been awarded for the PKS code (79 maximum). So, either they score and there is some benefit, or they don't score and there is an even greater benefit. I'm sure (I hope) this wouldn't happen but we currently have a ridiculous situation where schools could gain from entering pupils for a test they may not be able to access, award them an incorrect teacher assessment, and hope they don't score.

The simplest solution is to award a nominal score of 79 points to those 'HNM' pupils who take the test but do not achieve enough marks to be awarded a scale score. That would solve this particular problem but considering the big negative deficits associated with PKS nominal scores and the considerable impact these have on schools' progress measures, there are still serious question marks over the whole methodology.

Monday, 12 September 2016

VA Calculator and Floor Standards tool: links to download

Warning: this blog post contains no opinions or rants whatsoever.

I recently shared my VA calculator and floor standard tool on Twitter and thought it'd be useful to put the links into a blog so people can find them easily.

PLEASE NOTE: the tools are stored in my OneDrive so don't edit the online versions (I think I set them to view only but just in case). Please download and save them to a local drive before getting to work, otherwise everyone will be able to see your data and I'll have to delete it. Also, it's likely that the links will be blocked on school PCs so you might need to access them at home.

So, here are the links:


Click here to download the VA calculator

I've updated the VA calculator to include an Actual vs Estimated results page. You can now use this to get an idea of percentages 'expected' to achieve EXS and high standard for current cohorts. It will also calculate actual results once you have entered Y6 test scores and TAs; and compare actual to estimates.

Many people have asked me for the password for this spreadsheet, which is understandable. The spreadsheet is still locked but to save you the hassle of contacting me, the password is 'primary'. PLEASE don't unlock and edit the online version.

Note: the new version calculates low/middle/high bands


Floor standards tool can be downloaded here

Please read the notes on the first tab of both tools for instructions and get in touch (@jpembroke) if you have any questions or comments.

Have fun!

Thursday, 1 September 2016

Update on primary floor standards and progress measures

To coincide with the performance tables checking exercise, the DfE have released an updated Primary Accountability guidance document and additional note on the new progress measures. These documents contain useful information on how progress measures are calculated, and provide more detail on floor standards.

Schools that have downloaded their checking exercise data will find their progress measures in the summary sheet and will note that there are three figures for each of reading, writing and maths, e.g. 2.5 (-4 to 5.6). The first figure (out of brackets) is the progress score: negative indicates that the cohort made less than average progress, 0 indicates average progress, and a positive score indicates more than average progress. The other two figures (in brackets) form the confidence interval, which dictates whether progress is significantly above or below, or in line with average.

How to tell if progress is significant or not
Take note of the confidence interval in brackets beside your progress score:
  • If the first figure in brackets (the lower part of the confidence interval) is positive then your progress is significantly above average. For example: 4.6 (1.7 to 7.4)
  • If the second figure in brackets (the upper part of the confidence interval) is negative then your progress is significantly below average. For example: -3.2 (-5.9 to -0.5)
  • If the figures range from negative to positive (i.e. they straddle 0) then the data is not statistically significant. Your progress score is either positive but not significantly so (e.g. 0.8 (-1.5 to 3.2)) or negative but not significantly so (e.g. -1.3 (-2.8 to 0.3)).
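That rule can be expressed in a few lines of Python (a sketch; the function name is mine). The whole confidence interval has to sit on one side of zero for the result to be significant:

```python
# Classify a progress score from its confidence interval, as per the
# three bullet points above.

def significance(lower: float, upper: float) -> str:
    if lower > 0:
        return "significantly above average"
    if upper < 0:
        return "significantly below average"
    return "not significantly different from average"

print(significance(1.7, 7.4))    # significantly above average
print(significance(-5.9, -0.5))  # significantly below average
print(significance(-1.5, 3.2))   # not significantly different from average
```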
Floor standards
The updated accountability guidance defines the much anticipated thresholds for 'sufficient progress' as follows:

Reading: -5
Writing: -7
Maths: -5

If 65% or more pupils achieved the expected standard in reading, writing and maths then these sufficient progress thresholds do not come into play. That school is above floor. If fewer than 65% of pupils achieved the expected standard in reading, writing and maths then the school's progress scores will be compared against these thresholds. In this situation, the school needs to match or exceed all thresholds to be in the safe zone. However, there are certain circumstances where floor standards do not apply:
  • there are fewer than 11 eligible pupils at KS2
  • fewer than 50% of pupils have KS1 assessments that can be used to establish prior attainment groupings
  • there is insufficient KS2 attainment information because there are fewer than 6 pupils with results in a particular subject.
In addition, please note that if your school falls below the 65% EXS attainment threshold and falls below only one of the three 'sufficient progress' thresholds, then you would need to be significantly below average progress in that subject, not just below, which provides an extra cushion (see footnote on p6 of main guidance). That said, it is hard to see how any school in scope for floor standards could be anything but significantly below if it falls below the above thresholds. All this tweaking is no doubt intended to ensure that we end up with the promised 6% of schools below floor.
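Setting aside the exemptions and the 'significantly below' cushion just described, the basic floor standard logic can be sketched as follows (function and variable names are mine):

```python
# A school is above floor if 65%+ achieved the expected standard in
# reading, writing and maths combined; otherwise its progress scores
# must match or exceed every 'sufficient progress' threshold.

PROGRESS_FLOOR = {"reading": -5, "writing": -7, "maths": -5}

def above_floor(pct_rwm_expected: float, progress: dict) -> bool:
    if pct_rwm_expected >= 65:  # attainment route: safe regardless of progress
        return True
    # progress route: every subject must clear its threshold
    return all(progress[s] >= t for s, t in PROGRESS_FLOOR.items())

print(above_floor(67, {"reading": -6, "writing": -8, "maths": -6}))  # True
print(above_floor(60, {"reading": -4, "writing": -2, "maths": 0}))   # True
print(above_floor(60, {"reading": -6, "writing": -2, "maths": 0}))   # False
```

The first example shows the oddity noted earlier: a school above the attainment threshold is above floor even with dreadful progress scores.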

As for coasting, we have no new information. The guidance states that the DfE 'plan to announce the 2016 progress thresholds, which will be key in determining whether a school meets the 2016 part of the coasting definition, in the autumn when we lay the coasting Regulations in Parliament. We will add this information into this guidance once the Regulations have been laid.' So, it looks like progress thresholds for coasting will be different to those used in the floor measures and will probably be tougher (higher).

Nominal scores for teacher assessments

The guidance provides detail of the nominal scores, which will be assigned to teacher assessments for purposes of progress measures. Nominal scores for writing are as follows:
  • Working towards the expected standard: 91
  • Working at the expected standard: 103
  • Working at greater depth: 113
and for those working below the standard of the test (pre-key stage):
  • Below standard of interim pre-key stage standards (BLW): 70
  • Foundations for the expected standard (PKF): 73
  • Early development of the expected standard: 76
  • Growing development of the expected standard: 79
There is no mention of those that sat the tests but failed to score (i.e. HNM). I assume there are very few such children, but it is an anomaly that needs to be cleared up. Let me know if I've missed something.

It is important to note that the above scores are for purposes of progress measures, to ensure that progress measures reflect the progress made by all pupils. The scores will not be reported or used elsewhere, including in the average scaled scores that will feature in RAISE and performance tables.

Prior Attainment Groups (PAGs)
For future reference, the DfE provide a useful lookup table, which provides KS2 estimates for each KS1 prior attainment group. There are 21 groups in total, ranging from pupils with KS1 APS between 0 and 2.5, up to the highest prior attainment group comprising pupils with KS1 APS of 21.5 or higher (i.e. at least one subject at KS1 was Level 4). Prior attainment group 12 (those with KS1 APS between 15 and 15.5 - the 2B'ers) have KS2 estimates for reading, writing and maths of 100.6, 100.7, and 101.5 respectively. These scores represent the 2016 average scores for pupils with the same start point nationally (i.e. the average scores for pupils in the same prior attainment group).

It is encouraging to see that the methodology now takes note of p-scales at KS1, rather than lumping all 'below level 1' pupils into one group and coding them as 'W' (working below). This welcome development results in much greater differentiation of prior attainment groups and will consequently lead to a fairer, more refined progress measure. From now on, pupils that were P6 at KS1 will not have the same 'expectations' for KS2 as a pupil that was P8, and that's a good thing.

Oh, and one more thing: national 'average' progress is always 0. Adding up all the +/- differences between actual and average score comparators will always result in 0. There is no longer any such thing as expected progress.
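This zero-sum property follows directly from the maths: each group's benchmark is the group's own national mean, so the deviations from it must cancel. A toy example (scores invented):

```python
# Progress scores sum to zero by construction: the benchmark is the
# mean of the very scores being compared against it.

scores = [96, 100, 103, 104, 107]      # one prior attainment group
benchmark = sum(scores) / len(scores)  # the group's 'national average'
progress = [s - benchmark for s in scores]
print(sum(progress))  # 0.0
```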

Pupils working at a higher standard
Quick mention of this measure, which we've been aware of for some time now (it was originally announced on a government legislation website, and more details came out in the first version of the Primary Accountability document, published in January). We now know that the high standard threshold is 110 for reading, maths and EGPS; whilst in writing it is obviously 'working at greater depth' (GDS). The DfE Statement of Intent shows that the performance tables (and RAISE) will show percentages achieving the high standard in each individual subject, but the headline measure is the combined one. This will involve those pupils that achieved a score of 110+ on the reading and maths tests and achieved GDS in writing (i.e. combined). EGPS scores are not part of the headline measure (this year at least). National and LA data can be found in the recently published SFR, by the way.
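As a sketch (pupil records invented), the combined measure counts only those pupils clearing all three hurdles at once:

```python
# The headline 'higher standard' measure: 110+ in the reading and
# maths tests plus GDS in writing. EGPS is excluded from the combined
# measure this year.

pupils = [
    {"reading": 112, "maths": 115, "writing": "GDS"},  # counts
    {"reading": 110, "maths": 108, "writing": "GDS"},  # maths too low
    {"reading": 114, "maths": 111, "writing": "EXS"},  # not GDS in writing
]

def high_standard(p):
    return p["reading"] >= 110 and p["maths"] >= 110 and p["writing"] == "GDS"

pct = 100 * sum(map(high_standard, pupils)) / len(pupils)
print(round(pct, 1))  # 33.3
```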

That's enough for now. Still lots to get through so no doubt I'll be blogging again in the near future.

Wednesday, 31 August 2016

A brief word about the checking exercise

Tomorrow (or today depending on when you are reading this) primary schools receive data as part of the DfE's checking exercise. The aim is for schools to check the results and submit any changes by 16th September. These changes, along with any review outcomes, will be reflected in the performance tables in December and will not be taken account of in any earlier data releases. This obviously includes the unvalidated RAISE reports and Ofsted Inspection dashboards, which should be published in October. This has always been the case: first versions of RAISE and dashboards are based on unvalidated, potentially inaccurate data; later versions are based on the validated data, corrected via the checking exercise. 

In addition to pupil level results for checking, schools will also receive a summary sheet of headline results, including progress scores. Depending on how accurate your pupil data is, these headline data may or may not be reliable. Again, please bear in mind that if you make changes to your data tomorrow then this will not take effect until the validated RAISE report is published, probably in December. 

This is why RAISEonline provides users with the facility to edit pupil data, so you can ensure that RAISE is as fair and accurate a portrayal of standards in your school as possible prior to the release of validated data. Note that any changes made do not alter the main report but can be viewed via the interactive reports, printed out and slotted into your main report to show the alternative, true picture. It is well worth familiarising yourself with these functions in the data management section of the website so you can get to work when version 1 of RAISE is released. Get in touch if you need any further information on this.

Best wishes for tomorrow

Sunday, 28 August 2016

Guidance on primary data and accountability: useful links

As there are so many data-related documents out there at the moment, and many people are struggling to find all the guidance they need, I thought it'd be helpful to put all the main links in one place. Please let me know if I've forgotten anything important:

General guidance:

Primary School Accountability technical guide (includes floor standards & detail on progress measures):

Rochford Review recommendations for pupils working below standard of tests:

Report of Commission on Assessment without Levels (and other docs):

Data Management Review Group report (reducing teacher workload):

School performance tables 2016: statement of intent (content of performance tables - see Annex A):

The National Curriculum Programmes of Study (Primary):

Tests and teacher Assessment administration:

Teacher assessment frameworks, moderation guidance and ARAs for KS1 & KS2:

Assessment & Reporting Arrangements for EYFS:

Test administration guidance for Key Stage 1:

Test Administration guidance for Key Stage 2:

2016 Test frameworks for KS1 & KS2:

Data & Statistics

DfE Data checking exercise (for period 1st to 16th Sept 2016):

New performance tables website:

RAISEonline news - keep an eye on this particularly scheduled downtime (means new data coming):

DfE Statistics (main site):

Ofsted Stuff

Ofsted handbook:

School Inspection Updates from Ofsted (quarterly newsletter):

Ofsted Inspection Reports:

Ofsted monthly management information:


Hope that's useful

Wednesday, 20 July 2016

Content of the new RAISE report: The Good, the Bad and the Ugly

The DfE have now released details of the content and format of the new RAISE summary reports, to be published in the Autumn term. As expected, they are going to look considerably different to previous versions; and it's not just the obvious stuff that's changed. Of course, there will no longer be pages filled with percentages achieving level 4 and level 5, and progress matrices are consigned to history, but there are also big changes in the way data is put into context, most notably with new national comparators and prior attainment groups for both KS1 and KS2. The new KS2 progress measure uses a similar methodology to existing VA but scores will be in a new format; and colour coding - perhaps the thing we are most fixated on - has changed and now comes with a subtle twist. Some of these changes I like, some I'm not so sure about and some really bother me. Here's a rundown of the good, the bad and the ugly.

The Good

There appears to be a big change to the way performance of groups is compared. Previously, this has been a mess, with results of key groups compared against national 'averages' for all pupils, for the same group, or for the 'opposite' group, with no indication as to which is most useful or relevant. The analysis of the performance of disadvantaged pupils was particularly confused, with comparisons made against national comparators for disadvantaged pupils and all pupils in some parts of the report but, most critically, against non-disadvantaged pupils in the closing the gap section; a section that was - somewhat bizarrely considering its importance - tacked on the end of the report. This mess seems to have been addressed in the new reports so we should now be more aware of the relevant benchmarks for each group. For example: boys, girls, no SEN, non-disadvantaged and EAL groups will be compared against the same group nationally; disadvantaged, FSM and CLA pupils will be compared against the national figures for non-disadvantaged pupils (as per closing the gap); and the SEN support and SEN with statement or EHC plan groups will be compared against national figures for all pupils. OK, I don't quite get the rationale behind the last bit. Either SEN pupils should be compared against SEN pupils nationally, or not compared against anything. At least in the interactive reports users will be able to switch the comparator to 'same' for all groups, allowing a like-for-like comparison.

Prior attainment
Big changes here. One of my main criticisms of RAISE is the lack of progress data for KS1. Previously, schools were judged on attainment alone. The new RAISE reports, whilst not providing VA scores for KS1 (one of the many things I like about FFT dashboards), will put the KS1 results into context by splitting cohorts into prior attainment groups based on EYFS outcomes. So, the KS1 results for those pupils that achieved a good level of development in the foundation stage will be shown, presumably alongside a national comparator. There will also be a further breakdown of KS1 results for those pupils that were emerging, at expected or exceeding in the early learning goals for reading, writing, and maths. I know there are many who will disagree with this approach but too many schools are forced into producing this data themselves when KS1 results are low. The lack of KS1 progress data in RAISE has also resulted in a major dilemma for primary schools: do they go high to ensure 'good' KS1 results, or go low and gain more progress at KS2? Hopefully, with KS1 results now placed in the context of prior attainment, this pressure will ease somewhat.

We will also see new subject specific prior attainment groups for progress measures at KS2. For example, the progress in maths will be shown for pupils with low prior attainment in reading, or middle prior attainment in writing. I assume the definitions of these bands are simply low = W or L1, middle = L2, high = L3, which differs from the point thresholds used for overall prior attainment groups based on APS at KS1. This new approach is welcome as it goes a long way to addressing concerns about the new VA measure outlined here. Whilst the main VA model is based on KS1 APS and will therefore result in pupils with contrasting prior attainment in English and maths being grouped together, these new prior attainment groups will allow us to unpick the progress data and isolate the issues.

Subject data on a page
All information about each subject (i.e. progress score, average score, %EXS+, % high score, for the cohort and key groups) will be shown in one table, which is great news, because up until now it's been all over the place. Previously, we've had to scroll up and down through about 30 pages of attainment and progress data to get the headlines for KS2, forcing us to create our own templates to compile the key figures. Hopefully now we'll just need to refer to a handful of key pages, which will be very welcome.

A shorter report?
Reading between the lines here, I'm hoping we'll have a slimmed down RAISE report this autumn. 60 pages was too much. How about 20 pages? How about just 10 and ditch the dashboard? Please let that happen.

The Bad

Comparing results of SEN pupils against those of all pupils nationally is certainly not great. That should be changed to a like-for-like comparison by default, rather than the onus being on the school to do this via the interactive reports in RAISEonline. Also, writing VA continues to worry me, but that doesn't look like it'll be changing anytime soon. I look forward to seeing the methodology but I'd rather it was ditched from progress measures. My other bugbear is percentages for small groups and it looks like that farce is set to continue. I don't think there are many primary schools where percentages make any sense when you drill down to group level, even when the gap from national 'average' is expressed as a number of pupils. I would prefer analysis of group data to focus on average scores, but even that is flawed in that it can be easily skewed by anomalies. The presentation of data for small cohorts and groups needs some serious thought.

The Ugly

Sometimes we should be careful what we wish for. I have major concerns with the application and interpretation of significance indicators in RAISE and have called for a more nuanced approach. And now we've got one. First thing to note is that red replaces blue as the colour of 'bad'. Many evidently aren't happy about this but the writing was on the wall once red dots were used in the Inspection dashboard and closing the gap section of the RAISE report. Red is also used to indicate data that is significantly below average in FFT reports. The second thing to note is that the double meaning of the colour coding, introduced in the inspection dashboard, continues. Red can either mean data that is significantly below average or signify a gap from national average that equates to one or more pupils in percentage terms. The third thing to note is that we now have shades of red and green defined as follows:

Pale red: indicates that data is significantly below national average but not in bottom 10%; or denotes negative % point difference equivalent to a 'small' number of pupils.

Bright red: indicates that data is significantly below national average and in bottom 10%, or denotes negative % point difference equivalent to a 'larger' number of pupils.

Pale green: indicates that data is significantly above national average but not in top 10%; or denotes positive % point difference equivalent to a 'small' number of pupils.

Bright green: indicates that data is significantly above national average and in top 10%; or denotes positive % point difference equivalent to a 'larger' number of pupils.
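The four shades therefore encode two different things at once. Reduced to code (my paraphrase of the definitions above, for the significance case only), the rule looks something like this:

```python
# Paraphrase of the RAISE shading rules above (significance meaning only).
# The overload is the problem: the same shades are reused elsewhere for
# pupil-equivalent gap sizes, where no significance test is involved at all.
def raise_shade(sig_below, sig_above, in_bottom_decile, in_top_decile):
    if sig_below and in_bottom_decile:
        return 'bright red'
    if sig_below:
        return 'pale red'
    if sig_above and in_top_decile:
        return 'bright green'
    if sig_above:
        return 'pale green'
    return 'no shading'
```

Note that the bright/pale split is a decile threshold, not a second significance test - which is exactly why 'very significant' is the wrong inference to draw from a bright box.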

There is some serious blurring of the rules going on here. A significance test is a threshold and a school's results are either significant or they're not. Yet this approach will no doubt result in language such as 'very significant' and 'quite significant' entering the 'school improvement' lexicon, despite the bright red or bright green boxes actually being defined by a decile threshold rather than being the result of an additional significance test (e.g. a 99% confidence interval).  It's bad enough that people might talk in terms of degrees of significance; it's even worse that people will apply the term to data on which no significance test has been applied. Inevitably, we will hear disadvantaged gaps being described as 'very' or 'quite' significant because they are assigned a bright or pale red or green box, which in other parts of the report indicates statistical significance. Here, however, they relate to a gap equivalent to a certain number of pupils, and the thresholds used are entirely arbitrary; they are not the result of a statistical test. So we have colours meaning different things in different sections of the report - some denoting significance and others not - and shades of those colours defined by arbitrary thresholds. There is too much scope for confusion and misinterpretation, and schools will be forced to waste precious time compiling evidence to counter a narrative based on dodgy data.

No change there then.

Thursday, 7 July 2016

The progress measures you shouldn't attempt (but you're going to anyway)

The SATs results are out and many schools are feeling utterly dejected. Our tracking systems told us everything would be OK and we confidently gave predictions to governors, staff, LAs and others, of results that would at least be above floor. But then Tuesday happened and not even the country as a whole came close to the floor standard. Almost half of pupils nationally did not meet the expected standard and schools are left reeling. Almost as soon as the scores were downloaded from NCA Tools, teachers were on Twitter discussing progress measures:

1) How will progress be measured this year?

2) What will the floor standards for progress be?

3) How can we measure progress now?

The answer to the first question is: it will be a VA measure, pretty much identical in concept to the VA calculations of previous years.
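In outline, the calculation works like this: each pupil's KS2 score is compared with the national average score for pupils with similar KS1 prior attainment, and the school's score is the average of those differences. A minimal sketch - the benchmark figures below are invented placeholders, since the real ones aren't published yet:

```python
# Sketch of the VA concept. National averages by prior attainment group
# are INVENTED here - the real figures won't exist until the autumn.
NATIONAL_AVG_BY_PAG = {12.0: 99.5, 15.0: 103.0, 17.0: 106.5}  # hypothetical

def pupil_va(ks1_aps, ks2_scaled_score):
    """Pupil's KS2 score minus the national average for their start point."""
    return ks2_scaled_score - NATIONAL_AVG_BY_PAG[ks1_aps]

# (KS1 APS, KS2 scaled score) - invented pupils
pupils = [(15.0, 106), (12.0, 100), (17.0, 106)]
school_va = sum(pupil_va(a, s) for a, s in pupils) / len(pupils)
print(school_va)
```

The real calculation also applies shrinkage and uses far more prior attainment groups, but the core idea - difference from a like-for-like national benchmark, averaged across the cohort - is the same.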

The answer to the second question is: we don't know. They will certainly be negative; and now it seems they'll be very negative if the DfE really only wants a 1 percentage point increase in the number of schools below floor.

The answer to the third question is: we can't. We have to wait until the autumn term for the VA scores to come out in RAISE. But that's not going to stop schools and others (I'm looking at you, LAs and academy chains) from having a go now.

So, here are the three things that schools are going to attempt in advance of the real data being published. Three things that are pointless, almost certain to be at odds with the official VA data, and likely to cause further confusion and pain down the line - but that schools will do anyway.

1) Attempt to calculate VA

Surely we all understand how VA works by now. It involves comparing each pupil's KS2 score against the national average score for similar pupils (similar in terms of prior attainment based on KS1 APS). So, if you wanted to have a decent stab at it, you would first need to know the national average scaled score for each of these start points:

3, 4.5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11, 11.5, 12, 12.5, 13, 13.5, 14, 14.5, 15, 15.5, 16, 16.5, 17, 17.5, 18, 18.5, 19, 19.5, 20, 20.5, 21

According to my calculations, using the new KS1 APS formula, those are the discrete APS outcomes derived from all possible combinations of W, L1, 2c, 2b, 2a, L3 for reading, writing and maths at KS1. We need to know the national average KS2 scaled score outcomes for each of those 34 prior attainment groups (more if we throw L4 into the KS1 mix). No one knows these figures at the moment. And even if we did, there are shrinkage factors to take account of school size, and no doubt other coefficients to consider as well, none of which we know right now. So what might some try instead?
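The enumeration itself is easy to reproduce. A sketch, assuming the 2016 weighting in which maths counts for half and reading and writing a quarter each - that weighting is my assumption, so check it against the DfE technical guidance, and don't be surprised if the resulting list differs slightly from the one above:

```python
from itertools import product

# Standard KS1 point scores for W, L1, 2c, 2b, 2a, L3 (L4 excluded, as above)
POINTS = {'W': 3, 'L1': 9, '2c': 13, '2b': 15, '2a': 17, 'L3': 21}

def ks1_aps(reading, writing, maths):
    """Assumed 2016 formula: maths double-weighted, i.e. (R + W + 2M) / 4."""
    return (POINTS[reading] + POINTS[writing] + 2 * POINTS[maths]) / 4

# Discrete APS outcomes from all combinations of the six KS1 bands
aps_values = sorted({ks1_aps(r, w, m) for r, w, m in product(POINTS, repeat=3)})
print(len(aps_values), aps_values)
```

Even with the list in hand, you still don't have the national average KS2 score for each group, which is the half of the calculation that matters.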

First, you might just compare each pupil's KS2 score to the overall national average score of 103. That is not VA; that is relative attainment. It is not a progress measure. You are not comparing pupils against an appropriate benchmark linked to start point. It is meaningless. A non-starter. Don't do this.

Second, you might work out the 'in-school' average score for each start point and compare each pupil to that. Well, nice try, and on the right track (it at least demonstrates awareness of VA methodology), but there is one major flaw: you will always get the same outcome: 0. I know people have tried this in the past (I've had that conversation) and a headteacher actually emailed me some LA guidance this week that suggested schools do this. Seriously, don't.
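The zero result isn't bad luck; differences from a group's own mean always sum to zero, so a school benchmarked against itself can only ever score 0. A quick demonstration with invented pupils:

```python
from collections import defaultdict

# (KS1 APS start point, KS2 scaled score) - figures invented
pupils = [(15, 104), (15, 99), (12, 101), (12, 96), (17, 110)]

# In-school average score for each start point
by_start = defaultdict(list)
for start, score in pupils:
    by_start[start].append(score)
means = {s: sum(v) / len(v) for s, v in by_start.items()}

# 'Progress' measured against the school's own averages...
school_score = sum(score - means[start] for start, score in pupils) / len(pupils)
print(school_score)  # always 0 - the school is its own benchmark
```

Swap in any pupils you like; the positive and negative differences within each start point cancel exactly, every time.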

2) Subtract KS1 APS from KS2 score
As certain as death and taxes, give people a start figure and an end figure and they'll subtract the former from the latter. I bet people are doing this right now: subtracting KS1 APS from KS2 scaled score, and then inventing arbitrary thresholds to define 'expected' and 'more than expected' progress. Do you have a 2b pupil who achieved 107 in their maths test? Simply subtract 15 from 107. They've made 92 points of progress. That's excellent! Another only made 88 and fell short of the 90 point good progress threshold. Hmm, less than expected.
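To spell out why the subtraction is nonsense: KS1 APS (roughly 3 to 21) and KS2 scaled scores (80 to 120) sit on unrelated scales, so the difference automatically rewards low prior attainment. Two invented pupils with identical KS2 outcomes make the point:

```python
# Two pupils with the SAME KS2 outcome but different KS1 start points
ks2_score = 100
low_start, high_start = 9, 21  # KS1 APS for L1 vs L3

print(ks2_score - low_start)   # 91 'points of progress'
print(ks2_score - high_start)  # 79 'points of progress'
# Identical outcomes, yet a 12-point 'progress' gap - driven entirely by
# the start point on an unrelated scale, not by anything done at KS2.
```

Any threshold you then draw through those numbers is just an arbitrary line through noise.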

That's Numberwang! (credit: @simonraz :-)

3) Invent a progress matrix
We all love a progress matrix. I know I do (seriously, I do). So let's invent one now. First we start with the assumption that all 2b pupils should achieve scores of 100+. But what about the others? The L1, 2c, 2a and L3 pupils? Time for some arbitrary thresholds I reckon. To save you all the hassle of creating a progress matrix yourself, I've made one up for you:
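Any such matrix boils down to a lookup from KS1 band to an arbitrary 'expected' scaled score. A sketch along those lines, with thresholds every bit as invented as the point being made requires:

```python
# 'Expected' KS2 scaled scores by KS1 band - thresholds ENTIRELY INVENTED
EXPECTED_SCORE = {'L1': 94, '2c': 97, '2b': 100, '2a': 103, 'L3': 106}

def progress_band(ks1_band, ks2_scaled_score):
    """Classify a pupil against an arbitrary expected-score threshold."""
    expected = EXPECTED_SCORE[ks1_band]
    if ks2_scaled_score >= expected + 3:   # the '+3' is arbitrary too
        return 'more than expected'
    if ks2_scaled_score >= expected:
        return 'expected'
    return 'less than expected'

print(progress_band('2b', 104))
```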

It's so simple to use and easy to understand. What can possibly go wrong? Feel free to copy and fill out to present to SLT, Governors, SIPs, LA officers, Ofsted inspectors and that chap from academy head office. I guarantee they'll love it.

But I guarantee one other thing, too: none of this will bear any relation to the real VA data when it's published.

So why bother?