Monday, 22 February 2016

The Writing's on the Wall

Last summer, when the news trickled out that the floor standard would be kept at 65%, rather than being hiked up to a mountainous 85% as originally proposed, the sense of relief was palpable. We already knew that the expected standard was going to be tougher to achieve than a level 4, and to raise not only the academic expectation on individual pupils but also the accountability threshold for schools seemed monumentally unjust. So, the DfE did the right thing and left the floor standard threshold where it was. After all, they wouldn't want the majority of schools to be below floor, would they? We all cheered.

But now that sense of relief is ebbing away. First came the interim teacher assessment frameworks, then the primary school accountability technical document, and most recently the key stage 2 exemplification guides. Suddenly, for many schools, that 65% threshold looks like another country. And for some it's simply off the map.

And so, many schools will be pinning their hopes on the progress measures, and it's these that worry me. The first major distinction between the new and existing progress floor standards is that the new ones are VA-based, whereas the existing ones relate to the percentage of pupils making two levels of progress - so-called expected progress. But unlike in previous years, where a school that fell below the 65% L4 threshold only had to match one of the expected progress medians to be above floor, in 2016 any school that falls below the 65% expected standard threshold will have to match or exceed all three measures of 'sufficient progress' to be above floor. Now, it's important to point out that because it's a VA measure we currently have no idea what constitutes 'sufficient progress' (the DfE can't calculate VA until all results are in) but we know the thresholds are likely to be negative. Note that, in future, a VA score of 0 will indicate average progress - replacing 100, which indicates average progress now - and so approximately half of schools will have negative VA scores, just as around half of schools have VA scores below 100 now. This is why they can't set the 'sufficient progress' threshold at 0: there would be too many schools with negative VA scores in all three subjects. So, it'll be negative - how negative we don't yet know - and maybe statistical significance will have a part to play too. Furthermore, the DfE have suggested that in future they will set the sufficient progress threshold in advance, which will be helpful (sort of).

I don't have a problem with VA per se. It's about as fair as progress measures get because it inherently recognises that pupils make different amounts of progress from different start points in different subjects, and is therefore a lot more realistic than the ridiculously simplistic expected and better than expected progress measures that it's replacing. VA, by its nature, does not involve a universal expected rate of progress.

I do however have a problem with the way VA is calculated for writing and have had since its inception. Writing VA is clumsy, and clunky and blunt. Unlike in reading and maths, where pupils achieve fine grades from their tests, in writing pupils are awarded a broad teacher assessment. This teacher assessment (usually L3, L4 or L5) is then converted to a point score (21, 27, 33). The weird thing about writing VA though is that pupils' point scores from their teacher assessments are compared against an unachievable fine grade estimate, which represents the average KS2 result for pupils with the same KS1 prior attainment. The result is that there are wild swings in writing VA that you don't see in the other subjects. This is demonstrated by the following table:

The table shows the actual results (whole levels) for pupils in writing, and the fine grade estimates against which their results are compared. The estimate represents the average result for pupils nationally with the same prior attainment at KS1. Pupils are almost without exception a long way above or below their individual estimates, resulting in big positive or negative VA scores that are nowhere near as common in reading and maths where pupils' fine graded test scores ensure they can get close to their estimates. There are, for example, many pupils in the above table with estimates of 28-29 points (4a) that are awarded a L4 teacher assessment, which attracts 27 points, and who therefore have VA scores of -1 to -2. The only way these pupils can achieve a positive VA score is if they are awarded a L5. Their VA then shoots up to +4 or +5. Because the estimates are based on national average outcomes and more and more Level 5s are awarded each year, there has been a corresponding increase in the estimates to the point where L4 in most cases is worthless. Consequently, schools that err on the side of caution and award Level 4s are finding themselves with very low - sometimes significantly low - writing VA scores. I often see RAISE reports with positive VA in reading and maths and negative VA in writing, and others that have negative VA in reading and maths and positive VA in writing. No prizes for guessing which one makes me frown the most.
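To make the arithmetic concrete, here's a rough Python sketch of the pupil-level calculation described above. The point scores (21, 27, 33) and the 28-29 point estimates come from this post; the function name is just mine for illustration, and this is not the DfE's actual code.

```python
# Point scores attached to whole-level writing teacher assessments (from the post).
LEVEL_POINTS = {"L3": 21, "L4": 27, "L5": 33}


def writing_va(level, estimate):
    """Pupil VA: actual point score minus the fine grade estimate."""
    return LEVEL_POINTS[level] - estimate


# Estimates of 28-29 points (4a) sit between the only two scores a pupil can realistically get.
for estimate in (28.0, 28.5, 29.0):
    print(estimate, writing_va("L4", estimate), writing_va("L5", estimate))
```

Whatever the estimate, the gap between the L4 and L5 outcomes is always 6 points, because no score between 27 and 33 exists for a pupil to land on.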

So, currently, writing VA is prone to big shifts into negative and positive territory as a result of how many L5s the school gives out and the 6 point swings this generates. Often, one more L5 can make the difference between a blue box and safety. And, having modelled various scenarios, it's feasible (although somewhat unlikely) that a few more L5s can even turn a blue box green.
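Here's a quick sketch of how a single extra L5 moves a school's overall writing VA, assuming the school figure is simply the mean of the pupil-level differences. The cohort of 30 pupils and the flat 28.5 estimate are made up for illustration, and the sketch says nothing about statistical significance (which is what decides the blue and green boxes).

```python
def cohort_va(points, estimates):
    """Mean of (actual points - estimate) across the cohort."""
    diffs = [p - e for p, e in zip(points, estimates)]
    return sum(diffs) / len(diffs)


# Hypothetical cohort: 30 pupils, all with an estimate of 28.5 points.
estimates = [28.5] * 30

all_l4 = [27] * 30            # everyone awarded L4 (27 points)
one_l5 = [33] + [27] * 29     # one pupil moved up to L5 (33 points)

before = cohort_va(all_l4, estimates)
after = cohort_va(one_l5, estimates)
print(before, after, after - before)
```

Each extra L5 shifts the cohort figure by 6 divided by the number of pupils, so the smaller the cohort, the bigger the lurch.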

And what about this year? Will there be a change for the better? Probably not. We don't know the details yet, but p15-16 of the primary accountability guidance provides the following with regard to writing at key stage 2:

"Pupils will be allocated nominal points, based on their key stage 2 writing teacher assessment, which will be used in the progress calculation. This is done only in order to calculate a school’s progress scores. Pupils will still receive their teacher assessment as their key stage 2 outcome and no pupil will receive our nominal point score as their key stage 2 outcome. We will confirm the exact numbers that will be assigned to teacher assessment categories after the first set of assessments have been completed in the 2016 summer term."

We have no idea what 'nominal' scores will be allocated to each of the 3 main teacher assessments or to the pre-key stage assessments, for that matter, but let's assume the nominal scoring is something like this:

working towards the expected standard = 90
working at the expected standard = 100
working at greater depth within the expected standard = 110

If this were the case - yes, I've made these up but it doesn't seem too far-fetched - then I can't see how it will result in anything other than a continuation of the exaggerated gaps and skewed results we have now. Pupils with a particular start point, say a KS1 APS of 13, may have a KS2 writing estimate of 93. This represents the national average score for pupils with the same prior attainment and is calculated - if the nominal scores listed above were adopted - by adding up all the 90, 100, 110 scores for all pupils that have a KS1 APS of 13 and dividing by the number of pupils (there might be 50,000 of them). This unachievable estimate of 93 is below the expected standard but is above 90 so, in my example, if a pupil is assessed as working towards the expected standard they'd have a VA score of -3; if they are assessed as working at the expected standard they'd have a VA score of +7. Then imagine that happening across an entire cohort. Even with the new moderation arrangements there is clearly a temptation here to game it, to try to beat the system. If this happens then the following year the situation will get worse as averages, and the corresponding estimates, increase. Same as it ever was. Obviously, they could opt for scores on a separate scale, e.g. 1, 2, 3 for each of the three main assessments, but a) such a compressed scale would not adequately differentiate between pupils or do justice to the progress they make, and b) it's still arbitrary.
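The example above can be sketched in a few lines of Python. To be clear, the nominal scores of 90, 100 and 110 are my made-up values from this post, and the tiny 'national' group of ten pupils is a stand-in for the tens of thousands who would share a start point; nothing here is confirmed by the DfE.

```python
# Made-up nominal scores from the post (the real values have not been published).
NOMINAL = {"WTS": 90, "EXS": 100, "GDS": 110}


def estimate(assessments):
    """National average nominal score for one prior attainment group."""
    return sum(NOMINAL[a] for a in assessments) / len(assessments)


def va(assessment, est):
    """Pupil VA: nominal score minus the group's estimate."""
    return NOMINAL[assessment] - est


# Hypothetical national group with a KS1 APS of 13: 7 in 10 pupils are WTS, 3 in 10 are EXS.
this_year = ["WTS"] * 7 + ["EXS"] * 3
est = estimate(this_year)
print(est, va("WTS", est), va("EXS", est))

# If more pupils are pushed up to EXS the following year, the estimate rises
# and a WTS assessment costs even more.
next_year = ["WTS"] * 5 + ["EXS"] * 5
est_next = estimate(next_year)
print(est_next, va("WTS", est_next))
```

The estimate can only ever land between the available scores, so nearly every pupil ends up well above or well below it, and any upward drift in assessments drags the estimate up for everyone the next year.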

So, writing, which many schools already fear will be the subject that pulls them below the 65% threshold, will continue to have a flawed VA methodology, which benefits those that push pupils up and punishes those that are perhaps more realistic or cautious. Considering how high the stakes are, I believe it's time progress measures in writing were scrapped - VA should only be calculated using fine graded test data. The removal of writing from the baseline for the secondary Progress 8 measure next year speaks volumes about how much the DfE values teacher assessment, so why are they continuing to use it in this critical accountability measure?

Either we make do with VA in just reading and maths or, if we have to have three progress measures, then perhaps it's time the DfE did what they probably intend to do anyway: use the EGPS test scores instead.

Until that happens, writing VA is junk and the entire system of floor standards is a sham.

Wednesday, 3 February 2016

This much we know (and some stuff we don't)

Last week the DfE published its document on Primary School Accountability in 2016, a technical guide to the new key performance measures that will form the basis of the performance tables and Ofsted data this year. With this key document finally out in the public domain, it seemed like an ideal time to summarise the key data changes heading our way, and speculate on some known unknowns. Let's start at the beginning.

We now have a reception baseline (you may have noticed). This will eventually provide the start point for VA measures but not until 2022 when the first of the baseline cohorts reach the end of KS2. This cohort will have VA measured from baseline and from KS1 and the school will benefit from the better of the two. From 2023, VA will only be measured from baseline and schools choosing not to use an approved baseline provider will be measured on attainment alone. Any takers?

The DfE could of course also use the baseline to measure VA between reception and KS1 (something they have previously alluded to) and even from reception to the phonics test. As with all other VA measures, pupils are essentially grouped by prior attainment (baseline scores), which indicate their ability on entry. A pupil's attainment (at KS1 or KS2) is then compared against the national average attainment for pupils with the same start point. It is important to understand that pupils are compared against the national average for pupils with the same start point, not the overall national average for all pupils, or the school average (if you did that you'd always end up with a VA score of zero).
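For anyone who prefers code to words, here's a minimal sketch of that grouping logic in Python. All of the pupils and scores are invented, the function names are mine, and the real methodology will have more to it (confidence intervals, for one), so treat this as an illustration of the principle only.

```python
from collections import defaultdict


def national_estimates(national_pupils):
    """Average outcome for each prior attainment group, from (prior, outcome) pairs."""
    groups = defaultdict(list)
    for prior, outcome in national_pupils:
        groups[prior].append(outcome)
    return {prior: sum(scores) / len(scores) for prior, scores in groups.items()}


def school_va(school_pupils, estimates):
    """Mean difference between each pupil's outcome and the estimate for their start point."""
    diffs = [outcome - estimates[prior] for prior, outcome in school_pupils]
    return sum(diffs) / len(diffs)


# Invented data: (prior attainment, outcome) for a toy 'national' cohort and one school.
national = [(13, 96), (13, 100), (13, 98),
            (15, 100), (15, 104), (15, 102),
            (17, 106), (17, 110)]
school = [(13, 100), (15, 101), (17, 110)]

estimates = national_estimates(national)
print(estimates)
print(school_va(school, estimates))

# Comparing pupils against their own school's average instead always nets out
# to zero (give or take floating point rounding), which is why it tells you nothing.
school_mean = sum(outcome for _, outcome in school) / len(school)
naive = sum(outcome - school_mean for _, outcome in school) / len(school)
print(naive)
```

Note that nothing in the sketch needs the start and end data to be on the same scale: the prior attainment value is only used to decide which group a pupil belongs to.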

In an attempt to provide a simple analogy of VA I recently tweeted this:

I think some people thought I was making a serious suggestion. I was actually just trying to show how VA worked, how it does not require data in the same format at either end, and that each pupil is compared against pupils with similar prior attainment nationally. 

Anyway, back to the baseline. We have three baseline providers: Centre for Evaluation & Monitoring (CEM), NFER, and, of course, Early Excellence, who have a whopping 75% market share. Those that lost out were GL Assessment, Hodder Education and Speech Link. I'm intrigued to see how this pans out over the next year or so: will there be much movement between providers? And will an observation-based, 'non-invasive' approach provide accurate and consistent baseline data? The SQA are currently carrying out a comparability study of baseline providers, which is due to report back soon. This may result in a further shake-up of the market and the word on the street is that the DfE really just wants one provider, which of course makes sense. Why they didn't do this in the first place is beyond me.

Not going anywhere!

Key Stage 1
This year we have the introduction of externally set, internally marked tests at KS1 to 'inform' teacher assessments. This will include a grammar, punctuation and spelling test (which everyone seems to be thrilled about) to 'inform' the teacher assessment of writing. The tests will be marked internally and schools will be provided with a conversion table to convert the raw score to a scaled score, where 100 indicates that the pupil has met the expected standard in the test. The reason for the inverted commas is that I'm not sure how rigidly the test score will inform the teacher assessment. Can a pupil who achieves 92 in the SPaG test still be assessed as having met the expected standard, for example?

Most pupils will be assessed as 'working towards the expected standard' (WTS), 'working at the expected standard' (EXS) or 'working at greater depth within the expected standard' (GDS), the latter being the new catch phrase for 'mastery'. Since the publication of the Rochford Review we now know that pupils who are unable to access the tests or working below the level of the tests can be assessed as either 'foundations for the expected standard' (PKF) or below (BLW) and that p-scales are also still in that mix. 

Bizarrely (well, it seems bizarre to me), the DfE will not be collecting the KS1 test scores this year, which means that the VA for this cohort (who do not have reception baseline scores) will be based on these broad assessments when they reach the end of KS2 in 2020. It's less differentiated (although arguably more robust) than the outgoing system of sublevels at KS1, which makes me concerned about the validity of VA measures for this cohort. Essentially, a pupil that was EXS at KS1 will have their KS2 scores compared against the national average KS2 score for pupils that were EXS at KS1. What exactly does this tell us? If the DfE collected the test scores the measures could be a lot more refined, with pupils placed into smaller, more meaningful prior attainment groups.  

Of course, these are all interim arrangements, which begs the question: what will happen next year? I bet the DfE start collecting KS1 scores and it wouldn't surprise me if we end up with externally marked KS1 tests, which makes me wonder about the future of teacher assessment.

And I'm willing to wager a Mars Bar that we end up with floor standards for KS1 within 3 years. Anyone willing to bet against it?

Key Stage 2
Not much change here.


First up, the floor standards and an admission. Michael Tidd was right: there will be three separate progress measures, not the single measure I'd hoped for. The DfE had alluded to a single VA measure in the Education Regulations 2015, which suggested that there would be one 'column' for progress. This would fit with the single Progress 8 measure used at KS4 and be less harsh on schools, but that is evidently not going to be the case here. So, a school is above floor if 65% or more of its pupils achieve the expected standard in reading, writing and maths combined. If a school falls below this 65% threshold then the progress measures kick in. Unlike now, when a school will be above floor if they are above just one of the medians for expected progress, in 2016 they will need to be above all three 'sufficient' progress measures. We don't know what these thresholds are at the moment but they'll almost certainly be negative, much as the Progress 8 floor standard is at KS4, or we'll end up with a lot of schools below floor.

The DfE suggests that they will fix the progress thresholds for future years so schools will know in advance the line they need to cross, which will be a step in the right direction. But then comes the fun bit: trying to predict VA so schools have a reasonable idea of the position they're in. I am hoping that I'll be able to continue to produce my VA calculator to help schools with this but it will rely on schools' ability to predict pupils' KS2 scaled scores with some degree of accuracy.

And just to reiterate, if a school is above the 65% expected standard threshold, then they are above floor even if VA is low in all subjects.
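The difference between the outgoing and incoming floor rules boils down to 'any' versus 'all', which a short Python sketch shows quite starkly. The 65% threshold and the 2015 expected progress medians (94%, 97%, 93%) are from this post; the 2016 progress thresholds of -2.0 are pure invention on my part, since the DfE hasn't published them, and the function names and the example school are hypothetical.

```python
SUBJECTS = ("reading", "writing", "maths")


def above_floor_old(pct_l4_rwm, pct_expected_progress, medians):
    """Outgoing rule: 65% at L4+, or match the median in at least ONE subject."""
    if pct_l4_rwm >= 65:
        return True
    return any(pct_expected_progress[s] >= medians[s] for s in SUBJECTS)


def above_floor_2016(pct_expected_standard, va, thresholds):
    """2016 rule: 65% at the expected standard, or sufficient progress in ALL THREE subjects."""
    if pct_expected_standard >= 65:
        return True
    return all(va[s] >= thresholds[s] for s in SUBJECTS)


medians_2015 = {"reading": 94, "writing": 97, "maths": 93}
# Invented thresholds: the real 'sufficient progress' values are not yet known.
thresholds_2016 = {"reading": -2.0, "writing": -2.0, "maths": -2.0}

# A hypothetical school at 60% attainment, strong in reading only.
old_result = above_floor_old(
    60, {"reading": 95, "writing": 90, "maths": 90}, medians_2015)
new_result = above_floor_2016(
    60, {"reading": 0.5, "writing": -3.0, "maths": -1.0}, thresholds_2016)
print(old_result, new_result)
```

Under the old rule one good subject is enough to rescue the school; under the 2016 rule one weak subject is enough to sink it.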

So, according to the Primary Accountability document (see link at top of this post) the key performance measures for 2016 are as follows:

1) the percentage of pupils achieving the ‘expected standard’ in English reading, English writing and mathematics at the end of key stage 2 

2) the pupils’ average scaled score:
  • in English reading at the end of key stage 2 
  • in mathematics at the end of key stage 2
3) the percentage of pupils who achieve a high standard in English reading, English writing and mathematics

4) the pupils’ average progress:
  • in English reading
  • in English writing
  • in mathematics
A few observations: 1) SPaG still does not form part of these key performance measures (bet that changes). 2) It doesn't look like there will be a combined average scaled score, just average scores for reading and maths separately. Writing teacher assessments will be assigned a nominal value for the purposes of VA but these scores will not be published. 3) We have no definition yet of what constitutes a pupil achieving a high standard. I assume that this will require some norm-referencing once all the results are in and could be based on standard deviations. Question is: will it be fixed at a specific score in future or will it change every year?

Testing arrangements continue as now: pupils will sit externally set, externally marked tests in reading, maths and SPaG, and results will be reported as scaled scores. Schools will receive raw scores, scaled scores and an indication of whether the pupil has met the expected standard (i.e. achieved a score of 100 or more in their tests). Scores will be reported to parents.

Teacher assessments also continue but are far more limited for reading, maths and science, in which pupils are simply assessed as having met the expected standard or not. Writing has the same assessments as at KS1: working towards the expected standard, working at the expected standard, and working at greater depth. It's where pupils are working below the expected standard or unable to access the tests that things get rather bizarre (and reduce people to tears and hysterics). In science it's all very straightforward: if they have not met the expected standard then they are HNM - there is no further differentiation - but in reading and maths HNM is a specific assessment that appears to align with 'working towards' in writing, and below that we have:
  • Growing development of the expected standard
  • Early development of the expected standard
  • Foundations for the expected standard 
And below that is below (BLW) and below that are the p-scales. 

Well, that's my interpretation but I recommend you read the Rochford Review and make up your own mind. Call me a conspiracy theorist but I do wonder if they have deliberately made this so painful that no one will complain when they ditch statutory teacher assessment. 

Oh, and I suppose I'd better mention coasting schools. These are schools that fall below attainment and progress measures across 3 years. The first coasting schools will be identified this autumn and these will be schools that have fewer than 85% (yes, 85%!) achieving L4+ in reading, writing and maths in 2014 and 2015, and fewer than 85% achieving the expected standard in 2016. In addition they will need to be below the 2014 and 2015 medians for expected progress in reading, writing and maths (2014: 94%, 96%, 93%; 2015: 94%, 97%, 93%), and below just one of the three, yet to be defined, progress thresholds for 2016. Based on data for the last 3 years, Education DataLab suggested around 5% of primary schools will find themselves caught in this net but considering the expected standard is tougher than L4+RWM, it could end up being higher than that. And it remains to be seen if the DfE employ the same progress thresholds for coasting as they do for the in-year floor standards. They'll probably be tougher just to add a further layer of excitement and jeopardy.
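Here's my reading of the coasting definition as a Python sketch. The 85% attainment bar and the 2014 and 2015 medians are from the paragraph above. Two things are assumptions: I've taken 'below the medians' to mean below all three subjects' medians in each of 2014 and 2015, and the 2016 progress thresholds are invented placeholders because they haven't been defined yet. The function name and data layout are mine.

```python
SUBJECTS = ("reading", "writing", "maths")

# Medians for expected progress, as quoted in the post.
MEDIANS = {
    2014: {"reading": 94, "writing": 96, "maths": 93},
    2015: {"reading": 94, "writing": 97, "maths": 93},
}


def is_coasting(attainment, expected_progress, va_2016, thresholds_2016):
    """attainment: {year: % at L4+ RWM (2014, 2015) or % at expected standard (2016)}."""
    low_attainment = all(attainment[year] < 85 for year in (2014, 2015, 2016))
    # Assumption: below the median in all three subjects in both 2014 and 2015.
    low_old_progress = all(
        expected_progress[year][s] < MEDIANS[year][s]
        for year in (2014, 2015)
        for s in SUBJECTS
    )
    # 2016: below just one of the three progress thresholds is enough.
    low_2016_progress = any(va_2016[s] < thresholds_2016[s] for s in SUBJECTS)
    return low_attainment and low_old_progress and low_2016_progress


# Invented thresholds and an invented school, purely for illustration.
thresholds = {"reading": -2.0, "writing": -2.0, "maths": -2.0}
school_attainment = {2014: 80, 2015: 82, 2016: 55}
school_progress = {
    2014: {"reading": 90, "writing": 92, "maths": 88},
    2015: {"reading": 91, "writing": 93, "maths": 90},
}
school_va = {"reading": 0.5, "writing": -2.5, "maths": 1.0}

print(is_coasting(school_attainment, school_progress, school_va, thresholds))
```

Note the asymmetry with the floor standard: for coasting, a single subject below the 2016 threshold is enough to tick the progress box for that year.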

Right, that's enough (until I remember something important and edit this post).

I doubt that anyone has read this far anyway.

Certainly not without the aid of gin.