Thursday, 6 December 2018

Slave to the algorithm

I once sat in on a series of demos of primary school tracking systems for a large MAT. At lunchtime the CEO - a secondary head by trade - sat with his head in his hands and asked: "how have we reached the point where we have so little faith in teachers' judgement that we need software to give us the answers?". He was talking about 'assessment by algorithm' approaches employed by many systems (both commercial and home-grown) and schools, whereby teachers tick off, RAG-rate and score numerous objectives and a formula calculates an overall grade or level for the pupil.

And yet here we are - thousands of schools use such systems. Teachers dutifully spend hours ticking off the learning objectives, not just for core subjects but often for other subjects as well, and we leave it to the system to decide whether pupils are below, at or above 'age-related expectations'. The fact that teachers may well disagree with the computer is all too often overlooked, because senior leaders want the security of supposed consistency. And this speaks volumes. Senior leaders in such schools evidently do not have faith in teachers' judgement, and perhaps teachers themselves are in a comfortable place here: having been absolved of responsibility for assessment, they can point at the computer and say "it wasn't me, it was the software".

But the whole thing is a fallacy. There is no consistency - it’s not standardised, it just gives the illusion of standardisation. First, it relies on multiple, subjective micro-assessments, and the sum of multiple, subjective micro-assessments is never going to be anything approaching reliable. Second, teachers, when confronted with the result of the formula, may tweak the underlying detail, ticking and unticking objectives to get the desired outcome. Make sure it's low at the start of the year and high at the end because, you know, progress. Yes, there are still so many who think this process is actual evidence of progress.

The system itself may take various routes to arrive at its grade. The simplest is based on the percentage of objectives achieved at a particular point in time compared to some arbitrary threshold, e.g. 70%. But which 70%? What if one pupil is missing objectives relating to shape, another is struggling with fractions, whilst another can't get their head round Roman numerals or telling the time? Each has achieved 70% of objectives but are they all secure? Clearly not. And some of our 70% group are less secure than others. Such simplistic approaches don't work.
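
To see why, here is a minimal sketch (in Python) of the kind of logic such systems apply. The objectives, the pupils and the 70% threshold are all hypothetical, but the point stands: three pupils with very different gaps all come out with exactly the same judgement.

# A minimal sketch of the 'percentage of objectives' approach described above.
# The objectives, pupils and the 70% threshold are all hypothetical.

def grade(achieved, all_objectives, threshold=0.70):
    """Return a crude judgement based purely on the proportion of objectives ticked."""
    proportion = len(achieved) / len(all_objectives)
    return "expected" if proportion >= threshold else "below"

objectives = {f"obj_{i}" for i in range(1, 11)}  # ten objectives, for simplicity

# Three pupils, each with 7 of 10 objectives ticked, but missing different things
pupil_a = objectives - {"obj_1", "obj_2", "obj_3"}  # gaps in shape
pupil_b = objectives - {"obj_4", "obj_5", "obj_6"}  # gaps in fractions
pupil_c = objectives - {"obj_7", "obj_8", "obj_9"}  # gaps in Roman numerals and time

for name, ticked in [("A", pupil_a), ("B", pupil_b), ("C", pupil_c)]:
    print(name, grade(ticked, objectives))  # all three come out as 'expected'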

This leads us down the road of weighting objectives: a murky world of scoring objectives according to perceived difficulty or value. But who decides on the weighting? Can everyone agree? Are we going to spend the next few years constantly tweaking the weighting until we get something we accept - a reasonable fix? And based on the examples above, what is difficult for one child is easier for another, so weighting is utterly subjective for both teachers and pupils. In short, you'll never get it right. And based on our scoring system, what if a pupil scores high on some objectives and low on others, and ends up with an average score overall? Does that mean they are 'on track' or 'expected'? No!
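
For illustration, here is a similarly hypothetical sketch of a weighted-objectives approach. The objectives, weights and scores are invented, but they show how a pupil with a very spiky profile can end up with much the same overall score as a pupil who is middling at everything.

# A hypothetical sketch of a weighted-objectives approach. The objectives,
# weights and scores are invented - which is rather the point.

weights = {"place_value": 3, "fractions": 5, "shape": 2, "time": 1}

def weighted_score(scores, weights):
    """scores maps each objective to 0-2 (not met / partly met / met);
    returns the weighted total as a proportion of the maximum possible."""
    total = sum(weights[obj] * score for obj, score in scores.items())
    return total / (2 * sum(weights.values()))

spiky = {"place_value": 2, "fractions": 0, "shape": 2, "time": 0}     # strong and weak
middling = {"place_value": 1, "fractions": 1, "shape": 1, "time": 1}  # average everywhere

print(round(weighted_score(spiky, weights), 2))     # 0.45
print(round(weighted_score(middling, weights), 2))  # 0.5 - nearly indistinguishable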

These approaches are massively flawed, yet they are still very much relied upon in so many schools. Senior leaders persist in the belief that their systems are oracles that will provide accurate and consistent data, but they are underpinned by subjective assessments (often made begrudgingly and with one eye on performance management), convoluted weighting systems, and arbitrary thresholds. Putting aside the workload issues - which can be enormous - these systems essentially undermine trust in teachers' judgement. And to those who state "teachers need this because they're not experienced enough to make that decision themselves": how can we ever expect them to get the experience if we continue to rely on the assessment equivalent of stabilisers? Teachers are professionals - they should be diagnosticians - and they should be perfectly capable of stating whether pupils are where they expect them to be based on all the evidence at their disposal, without having to resort to endless RAG-rated tick lists.

Let's return to that CEO and his vital question: "how have we reached the point where we have so little faith in teachers' judgement that we need software to give us the answers?"

It's worth thinking about. 



Tuesday, 6 November 2018

Converting 2018 KS2 scaled scores to standardised scores

Many schools are using standardised tests from the likes of NFER, GL and Rising Stars to monitor attainment and progress of pupils, and to predict outcomes; and yet there is a lot of confusion about how standardised scores relate to scaled scores. The common assumption is that 100 on a standardised test (e.g. from NFER) is the same as 100 in a KS2 test, but it's not. Only 50% achieve 100 or more in a standardised test (100 represents the average, or the 50th percentile); yet 75% achieved 100+ in the KS2 reading test in 2018 (the average score in the 2018 KS2 reading test was 105). If we want a standardised score that better represents expected standards then we need one that captures the top 75%, i.e. around 90. However, to be on the safe side, I recommend going for 94 (top 66%), or maybe even 95 (top 63%) if you want to be really robust. Whatever you do, please bear in mind that standardised test scores are not a prophecy of future results; they are simply an indicator. Michael Tidd (@MichaelT1979) has written an excellent blog post on this subject, which I recommend you read if you are using standardised scores for tracking.

The purpose of this post is to share a conversion table that will give you a rough idea of how scaled scores convert to standardised scores. It is based on the distribution of 2018 KS2 scores in reading and maths, taken from the national tables. Download the national, local and regional tables (3rd link down) and click on table N2b. The cumulative percentages in table N2b are converted to standardised scores via this lookup table.
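
If you want to see the principle behind the lookup table: standardised scores are conventionally normally distributed with a mean of 100 and a standard deviation of 15, so a cumulative percentage can be turned into a standardised score with an inverse-normal calculation. A rough sketch in Python follows (the percentages used are illustrative, not the actual N2b figures).

# Rough sketch of the principle behind the lookup table. Standardised scores are
# conventionally normally distributed with mean 100 and standard deviation 15,
# so a cumulative percentage maps to a score via the inverse normal distribution.
# The example percentages are illustrative, not the actual N2b figures.
from scipy.stats import norm

def standardised_from_cumulative(pct_at_or_above, mean=100, sd=15):
    """Convert 'percentage of pupils at this scaled score or higher' into the
    standardised score with the same percentile rank."""
    percentile = 100 - pct_at_or_above  # e.g. top 75% -> 25th percentile
    return mean + sd * norm.ppf(percentile / 100)

print(round(standardised_from_cumulative(75)))  # ~90: the score capturing the top 75%
print(round(standardised_from_cumulative(66)))  # ~94
print(round(standardised_from_cumulative(50)))  # 100: the average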

The scaled score to standardised score conversion table can be downloaded here.

Please note: this is not definitive; it is a guide. It will also change next year, when 2019 national data is released, but hopefully it will demonstrate that one score does not directly convert into another.

Enjoy!

Wednesday, 24 October 2018

2018 ASP summary template for primary schools

The 2018 version of the ASP summary template is free to download here.

Some tweaks to last year's template - it now takes account of the time series and the 3-year average.

Feel free to modify, copy and share. Just credit the source and please download it first before attempting to complete it (it will open in Word Online; to download, click on the 3 dots at the top right of the browser window).

If you are confused by the 'Impact scores' concept (and who can blame you? I made up that term, by the way), the idea is to find the minimum score required to improve an overall progress score from below average (orange or red) to average (yellow); or from average (yellow) to above average (green). The former is most critical, and often it is a case of just removing one pupil from the data.

Schools that are below average (orange) will have a negative progress score (e.g. -1.9) and a confidence interval that is entirely negative (e.g. -3.6 to -0.2). If the confidence interval does not include the national average of zero - i.e. it does not cross the zero line - then it is deemed to be significantly below average (as in the example given above).

It would be neat to find out if removing one pupil would improve our data from below average (orange) to average (yellow). Let's return to our example above. We take the upper limit of the confidence interval (the right-hand number, i.e. -0.2). This tells us how far the confidence interval is away from the zero line, i.e. from 'safety'. Essentially, if every pupil's progress score increased by 0.2, the overall score would be in line with average, but that doesn't really help.

A better approach is to take that figure of -0.2 and multiply by the number of pupils included in progress measures (clearly stated in ASP). Let's say that's 30 pupils:

-0.2 x 30 pupils = -6.

This means that by removing just one pupil with an individual progress score below -6, the 'below average' (orange) indicator should change to 'average' (yellow).

Note: if your progress scores are average (yellow) and you want to determine what it would take to make them above average (green), use the lower limit of the confidence interval (the left-hand figure) instead. The same applies: multiply it by the number of pupils, and removing a pupil with an individual progress score lower than the result should change the overall score from average to above average.
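
If you prefer to see the arithmetic written out, here is a minimal sketch in Python. The confidence interval limit and cohort size are the illustrative figures above; the individual pupil scores are made up.

# A minimal sketch of the 'impact score' arithmetic. The confidence interval
# limit and cohort size are the illustrative figures from the post; the
# individual pupil progress scores are made up.

def impact_threshold(ci_limit, num_pupils):
    """Multiply the relevant confidence interval limit by the cohort size.
    Use the upper limit to move from below average to average, and the
    lower limit to move from average to above average."""
    return ci_limit * num_pupils

threshold = impact_threshold(-0.2, 30)
print(threshold)  # -6.0

# Any pupil with an individual progress score below -6 is a candidate:
pupil_scores = [-8.4, -2.1, 0.5, 3.2]
candidates = [score for score in pupil_scores if score < threshold]
print(candidates)  # [-8.4]: removing this pupil should lift the overall score to 'average'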

Hope that makes some kind of sense. If it doesn't, tweet me and I'll do my best to explain it again.

Saturday, 20 October 2018

Making expected progress?

"'Expected progress' was a DfE accountability measure until 2015. Inspectors must not use this term when referring to progress for 2016 or current pupils." - Ofsted Inspection Update, March 2017

It’s an odd phrase, expected progress, as it seems to have two meanings. First, there is the expectation of the teacher, which is based on an in-depth knowledge of the pupil; not only their start point but everything else: their specific needs, attitude to learning, the support and engagement of parents, and whether or not the pupil has breakfast. And then there is the definition we build into our systems, which is essentially a straight line drawn from pupils’ prior attainment at EYFS, KS1 or KS2. A one-size-fits-all approach, all for the sake of convenience - a simplistic algorithm and a neat metric. Needless to say, the two usually do not match, but all too often we wilfully ignore the former in favour of the latter, and plough on with blind faith in the straight line.

The problem is the assumption of ‘linearity’ - that all pupils learn in the same way, at the same rate, and follow the magic gradient. We know it’s not true but we go along with it because we have to make pupils fit the system, even if it means shooting ourselves in the foot in the process.
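
To make the point concrete, here is a tiny, purely illustrative sketch of that straight-line model in Python. The start point, end target and number of terms are invented.

# A purely illustrative sketch of the straight-line model of expected progress.
# The start point, end target and number of terms are invented.

def linear_expectation(start_score, end_target, terms=12):
    """Interpolate an 'expected' score for each term between the start point and
    the end-of-key-stage target, assuming every pupil follows the same gradient."""
    step = (end_target - start_score) / terms
    return [round(start_score + step * t, 1) for t in range(terms + 1)]

# Two pupils with the same start point get an identical expected trajectory,
# regardless of SEND, EAL, attendance, or anything else the teacher knows.
print(linear_expectation(88, 100))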

The other problem with ‘expected progress’ - other than it not existing - is that it sounds, well, mediocre. Language is important, and if we choose to adopt the phrase ‘expected progress’ then we also need a definition for ‘above expected progress’ as well. And this is where things start to get messy. It wasn’t that long ago that I saw an Ofsted report state that ‘according to the school’s own tracking data, not enough pupils are making more than expected progress’. The school was hamstrung by the points system it used, which only really allowed those who were behind at the start of the year, and who had apparently caught up, to make more than expected progress. Everyone else had to settle for expected.

But putting aside the still-popular levels and points-style methods, we have a problem in those schools that are taking a ‘point in time’/age-related approach.

Why?

It’s quite simple really, and perfectly illustrated by a recent conversation in which I asked a headteacher, who was talking about percentages of pupils making expected progress, to define it. They gave me a puzzled look, as if it were a bizarre question:

“Well, that’s staying at expected. If they were expected before and still at expected now, then they’ve made expected progress, surely?”

Sounds logical. 

“And what about those at greater depth?”

“That’s sticking at greater depth of course.”

“So, how do ‘greater depth’ pupils make above expected progress?”

“They can’t.”

Problem number 1: in this system, pupils with high start points cannot be shown to have made 'above expected' progress. I asked another question: “what about those pupils that were working towards? What’s expected progress for them?”

“To stay at working towards,” was the reply.

Is it? Is that really our expectation for those pupils? To remain below? Obviously there are those that were working towards that probably will remain so; but there are also those pupils, such as EAL pupils, who accelerate through curriculum content. And then there is another group of low prior-attaining pupils, who do not have SEND and are not EAL, but who often do not catch up. These may well be disadvantaged pupils for whom the pupil premium is intended to help close the gap. Our expectations for all these pupils may be different. They do not fit on a nice neat line.

Expected progress is many things. It is catching up, closing gaps, overcoming barriers and deepening understanding. It is anything but simple and linear. What we’re really trying to convey is whether or not pupils are making good progress from their particular start points, taking their specific needs into account.

That may not roll off the tongue quite as easily, but surely it’s more meaningful than ‘expected progress’.

Isn’t it?

Further reading: Why measuring pupil progress involves more than taking a straight line. Education Datalab, March 2015 https://ffteducationdatalab.org.uk/2015/03/why-measuring-pupil-progress-involves-more-than-taking-a-straight-line/

*credit to Daisy Christodoulou, whose book title I've blatantly copied.

Thursday, 18 October 2018

Trust

Here's a thing. In conversations with senior leaders, both online and in the real world, I often get asked about restricting access to data for teaching staff or even locking down tracking systems entirely. This seems to take two broad forms:

1) Limiting a teacher's access to data that relates only to those pupils for whom they are responsible.

2) Locking down the system after the 'data drop' or 'assessment window'.

Let's have a think about this for a minute. Why are some senior leaders wanting to do this? What are their concerns? Essentially it boils down to mistrust of teachers and fear that data will be manipulated. But what sort of culture exists in a school where such levels of mistrust have taken root? How did they get to this point? It's possible that such concerns are well founded, that manipulation of data has occurred; and I have certainly heard some horror stories, one of which came to light during inspection. That didn't end well, believe me. But often it's just suspicion: suspicion that teachers will change the data of another class to make their class look better, or will alter the previous year's end-of-year assessments for their current class to make the baseline lower, or will tweak data to ensure it fits the desired school narrative, or, most commonly, to ensure it matches their target.

Suspicion and mistrust. How desperately sad is that?

Golden Rule #1: separate teacher assessment from performance management. But how common is it for teachers to be set targets that are reviewed in the light of assessment data that the teacher is responsible for generating? I regularly hear of teachers being told that 'all pupils must make 3.5 points' progress per year' or that '85% must be at age-related expectations by the end of the year' and the final judgement is based on the data that teachers enter onto the system; on how many learning objectives they've ticked. It is a fallacy to think you can achieve high quality, accurate data under such a regime.

Teacher assessment should be focused on supporting children's learning, not on monitoring teacher performance. You cannot hope to have insightful data if teachers have one eye over their shoulder when assessing pupils, and are tempted to change data in order to make things look better than they really are. Perverse incentives are counterproductive and a risk to system integrity. They will cause data to be skewed to such an extent that it ceases to have any meaning or value, thus rendering it useless. Senior leaders need a warts and all picture of learning, not some rose-tinted, target-biased view that gets exposed when the SATs results turn up. Teachers need to be able to assess without fear, and that evidently requires a big culture shift in many schools.

The desire to lock down systems and restrict teacher access is indicative of how assessment data is viewed in many schools: as an instrument of accountability, rather than a tool for teaching and learning. If teachers are manipulating data, or are suspected of doing so, then senior leaders should take a long hard look at the regime and culture in their school rather than resorting to such drastic measures.

It is symptomatic of a much wider problem.

Friday, 21 September 2018

The Progress Delusion

I recently spoke to the headteacher of a primary school judged by Ofsted to be 'requiring improvement'. The school has been on an assessment journey over the last couple of years, ditching their old tracking system with its 'emerging-developing-secure' steps and expected progress of three points per year (i.e. levels), in favour of a simpler system and 'point in time assessment', which reflects pupils' security within the year's curriculum based on what has been taught so far. With their new approach, pupils may be assessed as 'secure' all year if they are keeping pace with the curriculum, and this is seen as making good progress. No levels, no points; just a straightforward assessment presented in progress matrices, which show those pupils that are where you expect them to be from particular start points, and those that aren't.

And then the inspection happened and the screw began to turn. Despite all the reassuring statements from the upper echelons of Ofsted, the decision to ditch the old system is evidently not popular with those now 'supporting' the school. Having pupils categorised as secure all year does not 'prove' progress, apparently; points prove progress. In order to 'prove' progress, the head has been told they need more categories so they can show more movement over shorter timescales. Rather than have a broad 'secure' band, which essentially identifies those pupils that are on track - and in which most pupils will sit all year - the school has been told to subdivide each band into three in order to demonstrate progress. This means having something along the lines of:

BLW- BLW= BLW+
WTS- WTS= WTS+
SEC- SEC= SEC+
GDS- GDS= GDS+

The utter wrongness of this is staggering for so many reasons:

1) Having more categories does not prove anything other than someone invented more categories. The amount of progress pupils make is not proportionate to the number of categories a school has in its tracking system. That's just stupid. It's like halving the length of an hour in order to get twice as much done.

2) It is made-up nonsense. It is unlikely there will be a strict definition of these categories, so teachers will be guessing where to place pupils. Unless, of course, they link it to the number of objectives achieved, and that way lies an even deeper, darker hell.

3) Teacher assessment will be compromised. The main purpose of teacher assessment is to support pupils' learning and yet here we risk teachers making judgements with one eye over their shoulder. The temptation to start pupils low and move them through as many sub-bands as possible is huge. The data will then have no relation to reality.

4) It increases workload for no reason other than to satisfy the demands of external agencies. The sole reason for doing this is to keep the wolf from the door; it will in no way improve anything for any pupil in that school, and the teachers know it. Those teachers now have to track more, and more often, and make frequent decisions as to which category they are going to place each pupil into. How? Why? It's the assessment equivalent of pin the tail on the donkey.

5) It is contrary to recent Ofsted guidance. Amanda Spielman, in a recent speech, stated "We do not expect to see 6 week tracking of pupil progress and vast elaborate spreadsheets. What I want school leaders to discuss with our inspectors is what they expect pupils to know by certain points in their life, and how they know they know it. And crucially, what the school does when it finds out they don’t! These conversations are much more constructive than inventing byzantine number systems which, let’s be honest, can often be meaningless." Evidently there are many out there who are unaware of, or wilfully ignoring, this.

The primary purpose of tracking is to support pupils' learning, and any data provided to external agencies should be a by-product of that classroom-focussed approach. If your system works, it's right, and no one should be trying to cut it up into tiny pieces because they're still in denial over the death of levels. Everyone needs to understand that the 'measure more, more often' mantra is resulting in a toxic culture in schools. It is increasing workload, destroying morale and even affecting the curriculum that pupils experience. It is a massive irony, lost on the people responsible, that many of their so-called school improvement practices are having precisely the opposite effect; and I've spoken to several teachers in the past year or so who have changed jobs or quit entirely because of the burden of accountability-driven assessment. Schools should not be wasting their time inventing data to keep people happy; they should not be wasting time training teachers in the complexities of 'byzantine number systems'; they should be using that time for CPD, for advancing teachers' curriculum knowledge, and for improving and embedding effective assessment strategies. That way improvement lies.

In short, we have to find a way to challenge undue demands for meaningless numbers, and resist those that seek to drive a wrecking ball through principled approaches to assessment.

It is reaching crisis point in too many schools.



Tuesday, 4 September 2018

2018 KS2 VA Calculator free to download

I've updated the VA calculator to take account of changes to the methodology this year. These include new standard deviations and estimated outcomes, and the capping of extreme negative progress scores. I have referred to this as adjusted and unadjusted progress, and the tool shows both for individual pupils and for the whole cohort. Note that extreme positive progress scores are not affected.
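
For anyone curious what 'adjusted' means in practice, here is a rough sketch in Python of the kind of calculation involved. The cap of -15 and the pupil scores are purely illustrative; in the actual methodology the minimum score depends on the pupil's prior attainment group.

# A rough sketch of adjusted vs unadjusted progress. The cap of -15 and the
# pupil scores are purely illustrative; in the actual methodology the minimum
# score depends on the pupil's prior attainment group.

def adjust(progress_score, cap=-15.0):
    """Cap extreme negative progress scores; positive scores are unaffected."""
    return max(progress_score, cap)

pupil_progress = [-22.5, -3.0, 1.5, 6.0]

unadjusted = sum(pupil_progress) / len(pupil_progress)
adjusted = sum(adjust(p) for p in pupil_progress) / len(pupil_progress)

print(round(unadjusted, 2))  # -4.5
print(round(adjusted, 2))    # -2.63: the extreme score no longer drags the cohort down as far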

You can use the tool to get up-to-date, accurate progress scores by removing pupils that will be discounted, and adding on points for special consideration (this should already be accounted for in the tables checking data) and successful review outcomes due back via NCA Tools on 12th Sept.

You can also use it to get an idea of estimated outcomes for the current Year 6, but please be aware of the usual warnings, namely that estimates change every year.

The tool can be downloaded here.

It will open in Excel Online. Please download it to your PC before using it, by clicking on the 3 dots at the top right. Do not attempt to complete it online as it is locked for editing. Please let me know ASAP if you have any issues or find any discrepancies.

Enjoy!