Monday, 12 October 2015

What is sufficient progress?

In the crazy world of competitive shin kicking, combatants apparently shout "Sufficient!" when they've had enough. I know how they feel. This whole issue of defining 'sufficient progress' feels like being kicked in the shins repeatedly. So, I thought I'd try to explain what it is and why attempting to set end of key stage targets right now is a fairly futile process.

The Key Stage 2 Assessment & Reporting Arrangements for 2016 were published last week and the key changes section contained this brief explanation of the new progress measures:

2.9 Progress Measures

Progress measures in 2016 will work in a similar way to current primary value-added measures or Progress 8 in secondary schools. A school’s score will be calculated by comparing their pupils’ KS2 results against those of all pupils nationally who had similar starting points.

Pupils will be assigned to prior attainment groups based on their KS1 results.

The department will confirm what score a school would need to get to have made ‘sufficient progress’ after the tests have been sat next summer.

More detailed guidance on how the new measures will be constructed is expected to be published early in 2016.

After years of 'expected' and 'better than expected' progress measures, this seems new and daunting, but it is exactly the same method used in the VA measures of RAISE and FFT reports for years. Essentially, it involves comparing a pupil's KS2 attainment against the national average attainment for pupils in the same cohort with the same starting point (this is known as an estimate or benchmark). So, for example, we compare the KS2 scaled score of a pupil who was 2c, L1, 2c at KS1 against the national average scaled score for pupils who were 2c, L1, 2c at KS1. I produced this hypothetical example to illustrate this:

In this example, a pupil has fallen short of the expected standard but has made 'sufficient' progress and achieves a positive VA score because their scaled score is higher than the national average result for pupils with the same prior attainment. Conversely, it is possible for a pupil to achieve the expected standard but not make 'sufficient' progress because nationally pupils with the same prior attainment achieved a higher score on average. The differences between pupils' actual and estimated results are then averaged for the whole cohort to arrive at a school VA score. It is most likely that sufficient progress will be a negative threshold and perhaps based on percentile ranking so that we don't end up with 50% of schools below floor. 
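The VA calculation described above can be sketched in a few lines. This is a minimal illustration, not the DfE's actual methodology: the KS1 profiles, scaled scores and benchmarks below are all invented, and the real benchmarks will only exist once the 2016 national results are in.

```python
# Hypothetical sketch of a value-added (VA) calculation.
# All figures are invented for illustration; real benchmarks are the
# national average KS2 scaled scores for each prior attainment group,
# calculated retrospectively once all results are known.

# Assumed national average KS2 scaled score for each KS1 prior
# attainment group (keys are illustrative KS1 reading, writing, maths)
benchmarks = {
    ("2c", "L1", "2c"): 98.5,
    ("2b", "2b", "2b"): 102.0,
    ("3",  "2a", "3"):  107.5,
}

# One entry per pupil: (KS1 profile, actual KS2 scaled score)
pupils = [
    (("2c", "L1", "2c"), 99),   # below the expected standard, positive VA
    (("2b", "2b", "2b"), 101),  # at the expected standard, negative VA
    (("3",  "2a", "3"),  110),
]

# Each pupil's VA score is their actual score minus their benchmark;
# the school's VA score is the mean of those differences.
diffs = [score - benchmarks[ks1] for ks1, score in pupils]
school_va = sum(diffs) / len(diffs)
print(round(school_va, 2))  # 0.67
```

Note the first two pupils: one falls short of the expected standard yet has positive VA, the other reaches it yet has negative VA, exactly the scenario described above.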

The key thing here, with regard to our attempts to second-guess what constitutes sufficient progress, is that pupils' individual benchmarks are calculated retrospectively. In other words, because a pupil is compared against the average attainment of pupils nationally with the same starting points in the same year, we have to wait until all the 2016 results are in before we know the line they have to cross. This means we are stumbling around in the dark right now and any attempt to set targets is like shooting into the night whilst wearing a blindfold. At best it's distracting; at worst it's a danger to pupils' learning. Even FFT - and those guys know a thing or two about target setting - are being cautious this year by providing broad estimates in the form of ARE bands. But that's not stopping some people from having a crack at it.

Things will improve in 2017 once we have some actual data in the bank, but considering more and more pupils will reach the expected standard each year, and that average scores will increase, any estimate derived from the previous year's data is likely to be too low. Right now though we have absolutely no idea what a likely outcome will be from any particular start point because no one has sat the tests. We certainly shouldn't be applying some spurious methodology like adding two whole levels of progress to the KS1 result and then attempting to convert the outcome to a scaled score. That fails to grasp the difference between VA and expected progress. It is a fact that many pupils in primary schools made so-called expected progress but fell short of their VA estimate (the opposite was common in secondary schools), and we could unwittingly repeat this through an ill-conceived approach. And, whilst the DfE claim that the new expected standard is equivalent to a 4b, we know this is a very broad approximation and performing a conversion to scaled scores on that basis is likely to be inaccurate and misleading.

The concerns are twofold: 1) that schools will attempt to teach to the test, and 2) that schools will be held to account for targets set on flawed methodology. My fear is that schools, having set scaled score targets for pupils based on a 'sufficient progress' model, will then test pupils to see how close to these targets the pupils are. The sample tests don't have a raw score to scaled score conversion so they might attempt to do it themselves using the rough criteria for meeting the expected standard contained in the frameworks. Highly dubious. Alternatively they might use a commercial standardised test, which produces scores in a similar format. Again, this is very risky. Schools must understand that a score of 100 in these tests indicates the average level for pupils taking that particular test, and therefore cannot be linked to the key stage 2 expected standard. It might be that pupils find that particular test hard, so 100 will be linked to a low raw score. Or they might find it easier, so 100 will be linked to a higher raw score. No matter the difficulty of the test, around half of pupils will be below 100 and the other half will be above. The expected standard, on the other hand, will always be around the same level of difficulty and the DfE want to move towards 85% of pupils achieving it (the floor standard being kept at 65% in 2016). This means that 100 is not the average and is therefore a different entity to a score of 100 in a commercial test.
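A toy example makes the point about commercial standardised scores being norm-referenced. This assumes the common mean-100, SD-15 convention, though publishers vary in their exact methods, and all the raw scores are invented:

```python
# Toy illustration: standardised scores are norm-referenced, so the raw
# score that maps to 100 depends entirely on how the cohort performed
# on that particular test. Figures invented; real tests use large
# norming samples and publisher-specific methods.
from statistics import mean, pstdev

def standardise(raw_scores):
    """Convert raw scores to standardised scores with mean 100, SD 15."""
    m, sd = mean(raw_scores), pstdev(raw_scores)
    return [100 + 15 * (x - m) / sd for x in raw_scores]

hard_test = [10, 12, 14, 16, 18]   # pupils found this test hard
easy_test = [30, 32, 34, 36, 38]   # same pupils on an easier test

# The middle pupil scores 100 in both cases, even though raw scores of
# 14 and 34 represent very different levels of performance.
print(standardise(hard_test)[2])  # 100.0
print(standardise(easy_test)[2])  # 100.0
```

In other words, 100 floats with the cohort and the test, whereas the KS2 expected standard is pegged to a fixed level of difficulty, which is why the two cannot be equated.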

So, take note when trying to set targets for 2016. It's a pointless exercise. The data will have no validity - it's a stab in the dark - and could even be dangerous. The best advice, for this year at least, is to block out the noise and concentrate on the teaching. Then the results will hopefully take care of themselves.

Tuesday, 6 October 2015

The Grim Fairy Tale of Teacher Assessment

It occurred to me recently that if statutory teacher assessment were a fairy tale it would be Cinderella. Loved and cherished in the Early Years, its future has become ever more uncertain, its status steadily undermined. Soothed by masters who, publicly at least, pay lip service to its value, teacher assessment is becoming increasingly sidelined, abused, neglected and ignored. One wonders if ultimately it will be ditched altogether. These are dark days indeed for teacher assessment.

Early Years: the warning signs
The foundation stage profile as a statutory entity has a year to go and the government has introduced a baseline assessment for future progress measures, which many schools have already carried out this term. A choice was offered and schools voted overwhelmingly for Early Excellence because it fitted the ethos of the EYFS: teacher assessment based purely on observation. But many schools opted for NFER or CEM (having possibly opted for one of the others before they were removed from the list) so we have a fragmented approach, and I question if this is the DfE's ideal situation. On the one hand, they want to offer schools a choice (and allow the market to decide the best approach) but deep down would they have preferred a single test that provides a standardised baseline? I'm intrigued to see what happens when the first cohort of children get to the end of key stage 1. What if analysis shows that there's no apparent relationship between the baseline assessment and key stage 1 results? This would surely mean that the baseline is a poor predictor of outcome, a critical stipulation of the consultation (see p7). What then? Could they be ditched and replaced? I guess we'll find out in three years' time.

Key Stage 1: The abuse begins 
Remember the performance descriptors? Below national standard, working towards national standard, working at national standard, and mastery. There was a consultation, there was opposition, there was silence for months, and then there were three - working towards, working within, and working at greater depth within the expected standard - contained in a sparse 11 page document. All that time to come up with those? Is it an improvement? OK, there was an apology, but there is also the intriguing use of the word 'interim'. These teacher assessments are interim and they are for 2016 only. What happens after that? And what happened to 'below'? What of those pupils who do not meet the criteria of 'working towards the expected standard'? How do we assess them?

If that isn't worrying enough, the DfE's response to the consultation on primary school assessment and accountability states that 'at the end of key stage 1, teacher assessment in mathematics and reading will be informed by externally-set, internally-marked tests. There will also be an externally-set test in grammar, punctuation and spelling which will help to inform the teacher assessment of writing.' Informed. Is this a polite way of saying validated? Or straitjacketed? Does this mean that the teacher assessment can only be X if the test score is Y? 

I read Daisy Christodoulou's blog in support of tests recently. It contained the following interesting point:

'Similarly, one way we could ensure greater equity in the early years is to introduce exams at KS1, rather than teacher assessments, since we have some evidence that teacher assessments at this age are biased against pupils from a low-income background – but again, if you suggest replacing teacher assessments with tests, you generally do not get a great response.'

Fair point. And she's right of course, it wouldn't get a great response but one wonders how much support such a position gets behind closed doors. And come to think of it, isn't that what's actually happening anyway? Next year's key stage 1 tests will provide scaled scores linked to an expected standard; and these scaled scores will 'inform' the relevant teacher assessment of which there will be one of just three possible outcomes. It sounds like the tests are winning to me.

Key Stage 2: Willful neglect

Now things get worse for dear teacher assessment. Reading, maths and science are reduced to simple binary outcomes. Are they working at the expected standard? Yes or no. Writing has been reduced from five possible outcomes (same as the originally proposed key stage 1 performance descriptors with 'above' shoehorned in between 'meeting the expected standard' and 'mastery') down to three as per key stage 1. Just like at key stage 1 there is no teacher assessment for GPS, but unlike at key stage 1, the result cannot be used to 'inform' the teacher assessment because the tests are externally marked, which probably explains why, for all subjects other than writing, the teacher assessment is binary: a simple yes or no. In what way is this useful to anyone?

The STA's timetable of progress measures states that in academic years 2019/20 and 2020/21, progress will be measured from 'new' KS1 teacher assessment to 'new' KS2 test and teacher assessment outcomes. Call me a cynic but I don't believe it. I don't believe anyone would choose to use the vague key stage 1 teacher assessments as a baseline when a scaled score is available. And while we're on the subject, I wouldn't be at all surprised if the GPS test score usurps the writing teacher assessment at the key stage 2 end of the measure. For VA you want data to be as fine-grained as possible, and a three-tier teacher assessment hardly cuts it. At key stage 2, teacher assessment is looking seriously endangered.

Key Stage 3/4: Locked in the attic

Baselines for the new Progress 8 measure at key stage 4 will involve a decimal level derived from pupils' English and maths results at key stage 2, where English is a combination of the reading test result and the writing teacher assessment. From 2017 onwards, once the last cohort with overall key stage 2 English levels have left, only reading and mathematics test results will be used in calculating key stage 2 prior attainment fine levels for use in Progress 8. Writing will not feature. For those pupils missing key stage 2 test results, the teacher assessment will only be used in certain circumstances. In most cases where a pupil is missing one result, the teacher assessment will not replace it. Instead, the pupil's baseline will involve the one test result that is present. Here the importance of the key stage 2 teacher assessment has not so much been undermined as completely demolished.
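The missing-result rule described above can be sketched as follows. This is an illustrative reading of the rule, not the DfE's published calculation: the function name, the fine levels and the handling of the no-results case are all invented for the example.

```python
# Hypothetical sketch of the Progress 8 baseline rule described above:
# average the KS2 reading and maths fine levels; if one is missing,
# use the one that is present (teacher assessment does not step in).
# Function name and figures are invented for illustration.

def ks2_baseline(reading, maths):
    """Return the KS2 prior attainment fine level, or None if both missing."""
    results = [r for r in (reading, maths) if r is not None]
    if not results:
        return None  # no test results available for this pupil
    return sum(results) / len(results)

print(ks2_baseline(4.8, 5.2))   # 5.0
print(ks2_baseline(None, 4.5))  # 4.5 - maths alone forms the baseline
```

The second call is the telling one: the missing reading result is simply dropped rather than substituted with the teacher assessment, which is the demolition the paragraph above describes.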

The End

I think this is as far as the Cinderella analogy goes. I do try to be optimistic but I can't see any cause to be so here. From the potential mess of the fragmented reception baseline to the near total exclusion of teacher assessment from progress 8 baselines, and the 'interim' frameworks in between, the future looks bleak. For all the positive noises about the importance of professional judgement, teacher assessment at all key stages has been progressively marginalised to the point it is a shadow of its former self, and I'm not sure this particular fairy tale will have a happy ending.