## Tuesday, 7 November 2017

### Analyse School Performance summary template (primary)

Many of you will have downloaded this already but I thought it'd be useful to put it on my blog. For those who don't already have it, it's a rather low-tech and unexciting Word document designed to guide you through ASP and pull out the useful data. The aim is to summarise the system down to a few pages.

You can download it here

The file should open in Word Online. Please click on the 3 dots top right to access the download option. Please don't attempt to edit online (it should be view only anyway). Also, chances are it will be blocked by school computers (schools always block my stuff).

A couple of points about the template:

1) Making sense of confidence intervals
Only worry about this if progress is significantly below average, or if data is in line but high and close to being significantly above.

If your data is significantly below average, take the upper limit of the confidence interval (it will be negative, e.g. -0.25). This shows how much each pupil's score needs to increase by for your data to be in line (0.25 points per pupil, or 1 point for every 4th pupil). Tip: multiply this figure by the number of pupils in the cohort (e.g. -0.25 x 20 pupils = -5). If you have a pupil - one for whom you have a solid case study - with a score at least as negative as the result (i.e. -5 or lower in this case), removing that pupil from the data should make your data in line with national average.

If your data is in line and you are interested to know how far it would need to shift to be significantly above, note the lower limit of the confidence interval (it will be negative, e.g. -0.19). This again shows how much your data needs to shift up by, but in this case to be significantly above. Here, each child's score needs to increase by 0.2 points for the overall progress to be significantly above (we need to get the lower limit of the confidence interval above 0, so it needs to rise by slightly more than the lower confidence limit). Obviously pupils cannot increase their scores by 0.2, so best to think of it as 1 point for every 5th child. Or, as above, multiply the lower confidence limit by the number of pupils in the cohort (e.g. -0.2 x 30 pupils = -6). If you have a pupil with a score at least as negative as this result (i.e. -6 or lower) then removing them from the data should make the data significantly above average.
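The arithmetic above can be sketched in a few lines (a hypothetical helper of my own, not part of the VA calculator; the figures are the examples from the text):

```python
def points_to_shift(confidence_limit, cohort_size):
    """Total score points the cohort needs to gain for the overall
    progress score to cross the relevant significance threshold."""
    return abs(confidence_limit) * cohort_size

# Significantly below, upper limit -0.25, 20 pupils: 5 points needed
print(round(points_to_shift(-0.25, 20), 2))  # 5.0

# In line, lower limit -0.2, 30 pupils: 6 points to be significantly above
print(round(points_to_shift(-0.2, 30), 2))  # 6.0
```

As in the text, dividing the total by the cohort size gives the per-pupil shift, and a single pupil with a score at least as negative as the total accounts for the whole gap on their own.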

Easiest thing to do is model it using the VA calculator, which you can download from my blog (see August) or use the online version www.insighttracking.com/va-calculator

2) Difference no. pupils
This has caused some confusion. It's the same concept as applied in last year's RAISE and dashboards. Simply take the percentage gap between your result and national average (e.g. -12%), turn it into a decimal (e.g. -0.12) and multiply that by the number of pupils in the cohort (e.g. 30). In this case we work out 'diff no. pupils' as follows: -0.12 x 30 = -3.6. This means the school's result equates to roughly 3 pupils below average. If the school's result is above national then it works in the same way; it's just that the decimal multiplier is positive.

If you are calculating this for key groups, then multiply by the number in the group, not the cohort. For example, say 80% of the group achieved the result against a national group result of 62%, which means the group's result is 18% above national. There are 15 pupils in the group so we calculate 'diff no. pupils' as follows: 0.18 x 15 = 2.7. The group's result therefore equates to roughly 2 pupils above national.
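Both worked examples boil down to the same calculation (a sketch; the function name is mine):

```python
def diff_no_pupils(school_pct, national_pct, group_size):
    """'Difference in number of pupils': the gap between the school's
    percentage and the national percentage, expressed as pupils."""
    gap = (school_pct - national_pct) / 100  # e.g. -12% -> -0.12
    return gap * group_size

# Whole cohort: 12% below national, 30 pupils -> roughly 3 pupils below
print(round(diff_no_pupils(50, 62, 30), 1))  # -3.6

# Key group: 80% vs 62% national, 15 pupils -> roughly 2 pupils above
print(round(diff_no_pupils(80, 62, 15), 1))  # 2.7
```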

I hope that all makes sense.

Happy analysing.

## Wednesday, 25 October 2017

### MATs: monitoring standards and comparing schools

A primary school I work with has been on the same journey through assessment land as many other schools up and down the country. Around two years ago they began to have doubts about the tracking system they were using - it was complex and inflexible, and the data it generated had little or no impact on learning. After much deliberation, they ditched it and bought in a more simple, customisable tool that could be set up and adapted to suit their needs. A year later and they have an effective system that teachers value, that provides all staff with useful information, and is set up to reflect their curriculum. A step forward.

Then they joined a MAT.

The organisation they are now part of is leaning on them heavily to scrap what they are doing and adopt a new system that will put them back at square one. It's one of those best-fit systems in which all pupils are 'emerging' (or 'beginning') in autumn, mastery is a thing that magically happens after Easter, and everyone is 'expected' to make one point per term. In other words, it's going back to levels with all their inherent flaws, risks and illusions. The school tries to resist the change in a bid to keep their system but the MAT sends data requests in their desired format, and it is only a matter of time before the school gives in.

It is, of course, important to point out that not all MATs are taking such a remote, top-down, accountability-driven approach, but some are still stuck in a world of (pseudo-) levels and are labouring under the illusion that you can use teacher assessment to monitor standards and compare schools, which is why I recently tweeted the following:

This resulted in a lengthy discussion about the reliability of various tests, and the intentions driving data collection in MATs. Many stated that assessment should only be used to identify areas of need in schools, in order to direct support to the pupils that need it; data should not be used to rank and punish. Of course I completely agree, and this should be a strength of the MAT system - they can share and target resources. But whatever the reasons for collecting data - and let's hope that it's done for positive rather than punitive reasons - let's face it: MATs are going to monitor and compare schools, and usually this involves data. This brings me back to the tweet: if you want to compare schools, don't use teacher assessment, use standardised tests. Yes, there may be concerns about the validity of some tests on the market - and it is vital that schools thoroughly investigate the various products on offer and choose the one that is most robust, best aligned with their curriculum, and will provide them with the most useful information - but surely a standardised test will afford greater comparability than teacher assessment.

I am not saying that teacher assessment is always unreliable; I am saying that teacher assessment can be seriously distorted when it is used for multiple purposes (as stated in the final report of the Commission on Assessment without Levels). We need only look at the issues with writing at key stage 2, and the use of key stage 1 assessments in the baseline for progress measures to understand how warped things can get. And the distortion effect of high stakes accountability on teacher assessment is not restricted to statutory assessment; it is clearly an issue in schools' tracking systems when that data is not only used for formative purposes, but also to report to governors, LAs, Ofsted, RSCs, and senior managers in MATs. Teacher assessment is even used to set and monitor teachers' performance management targets, which is not only worrying but utterly bizarre.

Essentially, using teacher assessment to monitor standards is counterproductive. It is likely to result in unreliable data, which then hides the very things that these procedures were put in place to reveal. And even if no one is deliberately massaging the numbers, there is still the issue of subjectivity: one teacher's 'secure' is another teacher's 'greater depth'. We could have two schools with very different in-year data: school A has 53% of pupils working 'at expected' whereas school B has 73%. Is this because school B has higher attaining pupils than school A? Or is it because school A has a far more rigorous definition of 'expected'?

MATs - and other organisations - have a choice: either use standardised assessment to compare schools or don't compare schools. In short, if you really want to compare things, make sure the things you're comparing are comparable.

## Tuesday, 3 October 2017

### Thoughts on new Ofsted inspection data summary report (primary)

Yesterday Ofsted released a 'prototype' of its new Inspection Data Summary Report and it's a major departure from the Ofsted Inspection Dashboard that we've become accustomed to over the past two years. On the whole it's a step in the right direction, with more positives than negatives, and it's good to see that Ofsted have listened to feedback and acted upon it. Here's a rundown of changes.

Positives
Areas for investigation. This is a welcome change. The new areas for investigation are clearer - and therefore more informative - than the 'written by robot' strengths and weaknesses that preceded them, many of which were indecipherable. They read more like the start point for a conversation and hopefully this will result in a more productive, equitable relationship between inspectors and senior leaders.

Context has moved to the front. Good. That's where it should be. It was worrying when context was shoved to the back in RAISE reports. This is hopefully a sign that school context will be taken into account when considering standards. As it should be.

Sorted out the prior attainment confusion at KS2. Previous versions of the dashboard were confusing: progress measures based prior attainment on KS1 APS thresholds (low: <12, mid: 12-17.99, high: 18+ (note: maths is double weighted)); attainment measures based prior attainment on the pupil's level in the specific subject (low: L1 or below, mid: L2, high: L3). This has now been sorted out and prior attainment now refers to pupils' KS1 APS in all cases. Unfortunately this is not the case for prior attainment of KS1 pupils - more on that below.

Toning down the colour palette. Previous versions were getting out of hand with a riot of colour. The page of data for boys and girls at KS2 looked like a carnival. Thankfully, we now just have simple shades of blue so sunglasses are no longer required; and nowhere in the new report is % expected standard and % greater depth merged into a single bar with darker portions indicating the higher standard. These are now always presented in separate bars, thankfully. That page was always an issue when it came to governor training.

Progress in percentiles. Progress over time is now shown using percentiles, which makes a lot of sense and is easy to understand. Furthermore, the percentiles are linked to progress scores, so they show improvement in terms of progress not attainment. Percentiles show small steps of improvement over time, which means that schools can now put changes in progress scores into context, rather than guessing what changes mean until they move up a quintile. In addition, an indicator of statistical significance is provided, which may show that progress is in the bottom 20% but not significantly below, or in the top 20% but not significantly above, which adds some clarity. And finally, the percentiles for 2015 are based on VA data, rather than levels. Those responsible for the 'coasting' measure take note.

Scatter plots. Whilst an interactive scatter plot (i.e. an online clickable version) is preferable, these are still welcome because they instantly identify those outliers that have had a significant impact on data. In primary schools, these are often pupils with SEND who are assessed as pre-key stage, and who end up with huge negative scores that in no way reflect the true progress they made. One quick glance at a scatter plot reveals that all pupils are clustered around the average, with the exception of those two low prior attaining pupils that have progress scores of -18.

Confidence intervals are shown. I was concerned that they'd stop doing this - showing the confidence interval as a line through the progress score - but thankfully this aspect has been retained. It's useful because schools can show how close they are to not being significantly below, or to being significantly above. Inspectors will be able to see that if that pre-key stage pupil with an individual progress score of -18 were removed from the data, it would shift the overall score enough to remove that red box. Statistical significance is, after all, just a threshold.

Negatives
Prior attainment of KS1 pupils. I'm not against the idea of giving some indication of prior attainment - it provides useful context after all - but I have a bit of a problem here. Unlike at KS2, where prior attainment bands are based on the pupil's APS at KS1, at KS1 prior attainment is based on the pupils' development in specific early learning goals (ELGs) at EYFS. Pupils are defined as emerging, expected or exceeding on the basis of their development in reading, or writing, or maths (for the latter they take the lower of the two maths ELGs to define the pupil's prior attainment band). This approach to prior attainment therefore takes no account of pupils' development in other areas, just the one that links to that specific subject. The problem with this approach is that you can have a wide variety of pupils in a single band. For example, the middle band (those categorised as expected) will contain pupils that have met all ELGs (i.e. made a good level of development) alongside pupils that have met the ELG in reading but are emerging in other areas, and pupils that have met the ELG in reading and exceeded others. These are very different pupils. Data in RAISE showed us that pupils that made a good level of development are twice as likely to achieve expected standards at KS1 as those that didn't, so it seems sensible that any attempt to define prior attainment should take account of wider development across the EYFSP, and not just take subjects in isolation. Perhaps consider using an average score across the EYFS prime and specific ELGs to define prior attainment instead.

Prior attainment of Y1-2 in the context page. Currently this is based on how much the percentage achieving specific ELGs differs from national average, whilst prior attainment for years 3-6 involves APS. As above, perhaps Ofsted should consider using an EYFS average score across the prime and specific ELGs instead.

I am, by the way, rather intrigued by mention of APS for current years 3 and 4. Does this mean Ofsted have developed some kind of scoring system for the new KS1 assessments? This surely has to happen at some point anyway, in order to place pupils into prior attainment groups for future progress measures.

Lack of tables. There's nothing wrong with a table; you can show a lot in a table. In the absence of tables to show information for key groups, the scatter plots are perhaps trying to do too much. Squares for boys, triangles for girls, pink for disadvantaged, grey for non-disadvantaged, and a bold border to indicate SEN. It's just a bit busy. But then again, we can see those pupils that are disadvantaged and SEN, so it can be useful. It's not a major gripe and time will tell if it works, but sometimes a good old table is just fine.

And finally a few minor niggles:

There is no such thing as greater depth in Grammar, Punctuation and Spelling at KS2. Mind you, yesterday the report had greater depth for all subjects at KS2 and that's changed already, so it's obviously just a typo.

And many of the national comparator indicators on the bar graphs are wonky and don't line up. They look more like backslashes.

But overall this is big improvement on the previous versions and will no doubt be welcomed by head teachers, senior leaders, governors and anyone else involved in school improvement. This, alongside ASP and the Compare Schools website, shows the direction of travel of school data: that it's becoming more simplified and accessible.

And that's a good thing.

## Thursday, 7 September 2017

### KS2 progress measures 2017: a guide to what has and hasn't changed

At the end of last term I wrote this blog post. It was my attempt to a) predict what changes the DfE would make to the KS2 progress methodology this year, and b) get my excuses in early about why my 2016 VA Calculator could not be relied upon for predicting VA for 2017. For what it's worth, I reckon the 2017 Calculator will be better for predicting 2018 VA, but 2016 data was all over the shop and provided no basis for predicting anything.

Anyway, no doubt you've all now downloaded your data from the tables checking website (and if you haven't, please do so now. Guidance is here) and have spent the last week trying to make sense of it, getting round what -1.8 means and how those confidence intervals work. Perhaps you've used my latest VA calculator to recalculate data with certain pupils removed, or updating results in light of review outcomes, or maybe changing results to those 'what if' outcomes.

This is all good fun (or not depending on your data) and a useful exercise, especially if you are expecting a visit, but it's important to understand that the DfE has made changes to the methodology this year - some of which I predicted and some of which I didn't - and, of course, the better we understand how VA works, the better we can fight our corner.

So what's changed?

Actually let's start with what hasn't changed:

1) National average is still 0
VA is a relative measure. It involves comparing a pupil's attainment score to the national average score for all pupils with the same start point (i.e. the average KS2 score for the prior attainment group (PAG)). The difference between the actual and the estimated score is the pupil's VA score. Adding up all the differences and dividing by the number of pupils included in the progress measure gives us the school's VA score. If you calculate the national average difference the result will be 0. Always.

School VA scores can be interpreted as follows:
• Negative: progress is below average
• Positive: progress is above average
• Zero: progress is average
Note that a positive score does not necessarily mean all pupils made above average progress, and a negative score does not indicate that all pupils made below average progress. It's worth investigating the impact that individual pupils have on overall progress scores and take them out if necessary (I don't mean in a mafia way, obviously).
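As a minimal sketch of the calculation just described (the estimates below are invented for illustration; the real ones come from the DfE's published tables for each PAG):

```python
def school_va(pupils):
    """Mean difference between each pupil's actual KS2 score and the
    national estimate for pupils with the same start point (PAG)."""
    diffs = [actual - estimate for actual, estimate in pupils]
    return sum(diffs) / len(diffs)

# (actual KS2 scaled score, estimated score for the pupil's PAG)
cohort = [(104, 103.2), (99, 101.5), (110, 106.0)]
print(round(school_va(cohort), 2))  # 0.77 -> progress above average
```

Note how the third pupil's +4 difference pulls the whole school score positive despite the second pupil being below estimate - exactly the individual-pupil impact worth investigating.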

2) The latest year's data is used to generate estimates
Pupils are compared against the average score for pupils with same start point in the same year. This is why estimates based on the previous year's methodology should be treated with caution and used for guidance only. So, the latest VA calculator is fine for analysing 2017 data, but is not going to provide you with bombproof estimates for 2018. Same goes for FFT.

3) KS1 prior attainment still involves double weighting maths
KS1 APS is used to define prior attainment groups (PAGs) for the KS2 progress measure. It used to be a straight up mean average, but since 2016 has involved double weighting maths, and is calculated as follows:

(R+W+M+M)/4

If that fills you with rage and despair, try this:

(((R+W)/2)+M)/2

Bands are as follows:

Low PA: KS1 APS <12
Mid PA: KS1 APS 12-17.99
High PA: KS1 APS 18+

4) Writing nominal scores stay the same
The crazy world of writing progress continues. I thought the nominal scores for writing assessments might change but that's not the case, i.e.

WTS: 91
EXS: 103
GDS: 113

This means that we'll continue to see wild swings in progress scores as pupils lurch 10 points in either direction depending on the assessment they get, and any pupil with a KS1 APS of 16.5 or higher has to get GDS to get a positive score, but GDS assessments are kept in a remote castle under armed guard. I love this measure.

5) As do pre-key stage nominal scores
No change here either, which means the problems continue. Scores assigned to pre-key stage pupils in reading, writing and maths are as follows:

PKF: 73
PKE: 76
PKG: 79

Despite reforms (see changes below) these generally result in negative scores (definitely if the pupil was P8 or above at KS1). It's little wonder so many schools are hedging their bets and entering pre-key stage pupils for tests in the hope they score the minimum of 80.

6) Confidence intervals still define those red and green boxes
These can go on both the changed and not changed piles. Confidence intervals change each year due to annual changes in standard deviations and numbers of pupils in the cohort, but the way in which they are used to define statistical significance doesn't. Schools have confidence intervals constructed around their progress scores, which involves an upper and a lower limit. These indicate statistical significance as follows:

Both upper and lower limit are positive (e.g. 0.7 to 3.9): progress is significantly above average
Both upper and lower limit are negative (e.g. -4.6 to -1.1): progress is significantly below average
Confidence interval straddles 0 (e.g. -1.6 to 2.2): progress is in line with average
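The three cases above reduce to a simple check on the signs of the two limits:

```python
def significance(lower, upper):
    """Classify a progress score from its confidence interval limits."""
    if lower > 0 and upper > 0:
        return "significantly above average"
    if lower < 0 and upper < 0:
        return "significantly below average"
    return "in line with average"  # interval straddles 0

print(significance(0.7, 3.9))    # significantly above average
print(significance(-4.6, -1.1))  # significantly below average
print(significance(-1.6, 2.2))   # in line with average
```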

7) Floor standards don't move
This shocked me. If I had to pick one data thing that I thought was certain to change it would be the floor standard thresholds. But no, they remain as follows:

Reading: -5
Writing: -7
Maths: -5

Schools are below floor if they fall below 65% achieving the expected standard in reading, writing and maths combined, and fall below any one of the above progress thresholds (caveat: if only just below one progress measure, it also needs to be significantly below average. Hint: it will be). Oh, and floor standards only apply to cohorts of 11 or more pupils.
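Put as a simplified check (my own sketch, ignoring the 'only just below' significance caveat; the function name is invented):

```python
def below_floor(pct_rwm_expected, reading_prog, writing_prog, maths_prog,
                cohort_size):
    """Simplified 2017 primary floor standard check.

    Below floor = under 65% achieving the expected standard in reading,
    writing and maths combined, AND below any one progress threshold.
    Floor standards only apply to cohorts of 11 or more pupils.
    """
    if cohort_size < 11:
        return False  # floor standards don't apply to small cohorts
    below_attainment = pct_rwm_expected < 65
    below_progress = (reading_prog < -5 or writing_prog < -7
                      or maths_prog < -5)
    return below_attainment and below_progress

print(below_floor(60, -5.5, -2.0, -1.0, 30))  # True
print(below_floor(70, -6.0, -8.0, -6.0, 30))  # False (attainment is OK)
```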

And now for what has changed

1) Estimates - most go up but some go down
The estimates - those benchmarks representing average attainment for each PAG against which each pupil's KS2 score is compared - change every year. This year most have gone up (as expected) but some, for lower PAGs, have gone down. This is due to the inclusion of data from special schools, which was introduced to mitigate the issue of whopping negative scores for pre-key stage pupils.

Click here to view how the estimates have changed for each comparable PAG. Note that due to new, lower PAGs introduced for 2017, not all are comparable with 2016.

2) Four new KS1 PAGs
The lowest PAG in 2016 (PAG1) spanned the KS1 APS range from 0 to <2.5, which includes pupils that were P1 up to P6 at KS1. Introducing data from special schools in 2017 has enabled this to be split into 4 new PAGs, which better differentiates these pupils. The use of special school data has also had the effect of lowering progress estimates for low prior attainment pupils, which goes some way to mitigating the issue described here. However, despite these reforms, if the pupil has a KS1 APS of 2.75 or above (P8 upwards) a pre-key stage assessment at KS2 is going to result in a negative score.

3) New nominal scores for lowest attaining pupils at KS2
In 2016, all pupils that were below the standards of the pre-key stage at KS2 were assigned a blanket score of 70. This has changed this year, with a new series of nominal scores assigned to individual p-scales at KS2, i.e.:

P1-3: 59 points
P4: 61 points
P5: 63 points
P6: 65 points
P7: 67 points
P8: 69 points
BLW but no p-scale: 71 points

I'm not sure how much this helps mainstream primary schools. If you have a pupil that was assessed in p-scales they would have been better off under the 2016 scoring regime (they would have received 70 points); as it stands they can get a maximum of 69. Great.

Please note: these nominal scores are used for progress measures only. They are not included in average scaled scores.

4) Closing the progress loophole of despair
Remember this? In 2016, if a pupil was entered for KS2 tests and did not achieve enough marks to gain a scaled score, then they were excluded from progress measures, which was a bonus (unless they also had a PKS assessment, in which case they ended up with a nominal score that put a huge dent in the school's progress score). This year the DfE have closed this particular issue by assigning these pupils a nominal score of 79, which puts them on a par with PKG pupils (no surprise there). In the VA calculator, such pupils should be coded as N.

A loophole is still open, by the way: pupils with missing results, or who were absent from tests, are not included in progress measures, and I find that rather worrying.
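Gathering the nominal scores from points 3 to 5 above into one lookup (a sketch; codes as used in my VA calculator and the accountability guidance, and remember these feed progress measures only, not average scaled scores):

```python
# Nominal scores assigned for the 2017 KS2 progress measure
NOMINAL_SCORES = {
    "P1-3": 59, "P4": 61, "P5": 63, "P6": 65, "P7": 67, "P8": 69,
    "BLW": 71,  # below standard, no p-scale reported
    "PKF": 73, "PKE": 76, "PKG": 79,
    "N": 79,    # took the test but didn't score (2017 change)
}

print(NOMINAL_SCORES["PKE"])  # 76
print(NOMINAL_SCORES["N"] == NOMINAL_SCORES["PKG"])  # True
```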

5) Standard deviations change
These show how much, on average, pupils' scores deviate from the national average score; and they are used to construct the confidence intervals, which dictate statistical significance. This is another reason why we can't accurately predict progress in advance.

-----

So, there you go: quite a lot of change to get your head round. It has to be said that unless the DfE recalculate 2016 progress scores using this updated methodology (which they won't), I really can't see how last year's data can be compared to this year's.

But it will be, obviously.

## Thursday, 31 August 2017

### KS2 VA Calculator 2017

Updated basic version of KS2 VA calculator can be downloaded here.

and new version with pupil groups tables (progress and average scores) can be downloaded here

Please download before use (i.e. don't attempt to edit the online version). You should see 3 dots top right of screen - click on these to download to your desktop. I recommend reading the notes page first, and it's also worth reading the primary accountability guidance for more information about nominal scores, p-scales and pre-key stage pupils. It's quite a bit more complicated this year with more prior attainment groups (24 instead of 21), closing of the loophole (pupils not scoring on the test receive a score of 79), and nominal scores assigned to individual p-scales (rather than a blanket score of 70 assigned to BLW).

Which, in my opinion, means this year's progress data is not comparable with last year's, but hey ho.....

Enjoy!

And let me know ASAP if you find any errors.

## Friday, 18 August 2017

### The Progress Bank

For primary schools, there are two main issues with the current system of measuring progress: 1) it is high stakes, and 2) it involves teacher assessment. Whether we are talking about our own internal tracking systems, or those official end of key stage DfE measures, the former clearly influences the latter. Imagine you are told that you have to run a marathon in under 3 hours (yes, I do like a running analogy) and that if you fail to do so, the consequences will be dire. Of course, there are some good marathon runners out there for whom this is feasible, and a handful of really good ones for whom this is no problem at all, but for the majority of us this is near impossible. Then you are told that no one will be monitoring your efforts; they will just be basing their judgement on your time, which you alone are responsible for keeping. Now how many sub-3-hour marathon times will we see?

This is the weird situation we find ourselves in: an incompatible mix of high stakes and self-evaluation. Like fire and duvets, they make for strange and dangerous bedfellows, and clearly it's not working. We all know that really.

Consider that weird, near vertical cliff that occurs at 32 marks every year in the national phonics screening check results; or the impact that current KS1 measures will have on the EYFSP now that the profile is used to establish prior attainment groups for those pseudo-progress measures in the dashboard. Then there are those well documented problems with using KS1 results as the baseline for progress measures. The DfE are attempting to solve this by implementing a reception baseline, but this is highly contentious and unpopular; and even if it's implemented in 2019 as planned, we won't see the results until 2026. Then there is the thorny Infant/junior/middle school problem that the reception baseline won't adequately solve; it will just blur the issue so no one really knows where the problems lie, or if there are any problems anyway.

And then, of course, there are those highly variable and suspiciously high KS2 writing results, and associated issues around moderation. Why do you think that KS2 writing was ditched from the baseline for the Progress 8 measure the minute we ran out of secondary pupils with KS2 writing test results? (That was last year by the way; the 2017 GCSE cohort were the first to have writing TA at KS2 and that won't be used in their progress 8 measures).

These are some of the well known issues relating to statutory teacher assessment, assessments that are done almost entirely for the purposes of accountability. It is little wonder that no one seems to have much faith in progress measures that rely on such data. But these issues are not restricted to statutory assessment; they also exist in the various tracking systems schools use, tracking systems which still rely on teacher assessment for measuring progress.

The final report of the Commission on Assessment without Levels warns about the risks of using teacher assessment for multiple purposes, yet this is still the norm in many schools. Teacher assessment is commonly used not only for formative purposes, but also for measuring pupil progress, monitoring standards, reporting to governors, evaluating teacher performance, and even comparing schools. These multiple purposes exert conflicting pressures on the data that can and will lead to its distortion; and let's face it, teacher assessment is too subjective anyway. One teacher's 'secure' is another teacher's 'greater depth', so even if there were no high stakes attached, we still wouldn't have an accurate picture. This is made even more complicated by the various methods of the tracking systems themselves: different steps, bands and point scores; varying lists of key objectives; and contrasting definitions of 'age related expectations' based on spurious algorithms and arbitrary thresholds. And let's be honest, progress measures based on teacher assessment pretty much always involve reinventing levels anyway. No one is talking the same language; no one knows what's going on. No one can measure and compare progress.

That's the point of Progress Bank.

This is something I've been thinking about for a long time: a system that wouldn't rely on teacher assessment to measure progress and wouldn't rely on end of key stage results either. A system that is powered by whatever standardised tests the school chooses to use, and can measure progress from any point to any point, benchmarked against other pupils with similar start points.

This is obviously a rather ambitious project and not something I'm capable of building, which is why I approached the people at Insight Tracking. I like their current system - it's intuitive and highly customisable - and they tend to build things quickly. I needed their expertise and thankfully they agreed.

The system will work like this: schools upload standardised scores from the various tests they use, pick a start point (i.e. previously uploaded data, say at the start of Y1 or Y3) and an end point (probably the most recent upload), and they receive zero-centred VA scores for cohorts, key groups and individual pupils in whatever subjects are tested. The methodology is essentially the same as that used by the DfE to measure progress, so the data is in a common format, but the system is far more flexible in terms of start and end points, and is based on more regular, lower stakes testing. Schools will be able to interrogate their data using simple, interactive reports, which will focus not only on progress, but on attainment gaps, too.

The neat thing is it doesn't matter what tests you use, as long as they're standardised; and if results aren't standardised, the Progress Bank can standardise them if enough data is uploaded by enough schools to provide a suitable sample. If you use tests from multiple providers, that's fine; and if you change test provider, your old data will be stored in Progress Bank, so you won't lose it and can continue to use it. We can, if permission is given, even transfer test results when a pupil changes school. And if you decide to leave, the data is deleted. You own it.

The Progress Bank will be especially useful to junior and middle schools, which have a particular issue when it comes to progress measures, and this project has been expedited by the JUSCO Conference and conversations with members including Chris McDonald, the chair of the group. Obviously, the ideal solution for junior schools is to enable them to measure progress from a Year 3 on-entry baseline, not from KS1 results or from a reception baseline as proposed. Progress Bank will allow them to do that.

But it's not just for junior schools; it is aimed at any school that is interested in alternative, benchmarked progress measures. The system can even measure progress from KS1 scaled scores instead of the KS1 teacher assessment used in official measures, or from any current standardised reception baseline assessment. We hope that the data will help schools challenge the flawed measures that are currently used to hold them to account. And by using standardised scores to measure progress, hopefully we can protect the integrity of teacher assessment by ensuring it is used solely for formative purposes, and perhaps reduce workload in terms of tracking and analysis too.

Now we just need lots of schools to get on board.

Find out more about the project and register your interest here:

And follow us on twitter @theprogressbank

## Thursday, 20 July 2017

### Making the cut

Assessment. It should be simple really: just checking what pupils do and don't know. But assessment appears to have turned into some kind of war, with the legions of accountability amassed on one side, and the special forces of teaching and learning besieged upon a slippery slope with nowhere to go. We become so focussed on outcomes - on floor standards, and coasting thresholds, and league tables - that we risk losing sight of what's important: the here and now. But it is focussing on the here and now that makes all the difference. The irony is that concentrating on accountability - on those distant and unpredictable performance measures - can jeopardise the very results you are striving for. In short, focus on teaching and learning, and the results take care of themselves.

This is, of course, easier said than done but something has to change. In too many schools assessment has become a burden: a top down directive disconnected from learning; an interminable, box-ticking, data-collecting, drain on teachers' time. The risks are clear: morale nose dives and pupils' learning is put at risk. We therefore need to ditch some of our assessment baggage - aim to do more with less - and this requires some serious rationalisation of our processes. It all comes down to one simple question:

Does this have a positive impact on learning?

We need to go through everything we do in the name of assessment and school improvement and ask that question, and we need to be ruthless and honest. What is the benefit and what is the cost? How long does this take? Does it tell us anything we don't already know? Is it having a negative impact? Is it taking teachers' time away from teaching? Ultimately, the only way to improve a school is to teach children well, and anything that distracts from that purpose is a risk.

So let's deconstruct our entire approach to assessment and lay it all out on the hall floor: the various tests you use, your marking policy, target setting (both for pupils and performance management), those lists of learning objectives stapled into pupils' books, and the component parts of your tracking system (yes! every single measure, category, grid, table, graph, chart and report). We now separate these into two piles: those that have a demonstrable, positive impact on teaching and learning, and those that are purely done for the purposes of accountability.

We keep the first pile and ditch the rest.

We now have a stripped down system that is fit for purpose, that is focussed on the right things. From now on, the information we provide to governors and external agencies is a byproduct of our assessment system, which exists to serve teaching and learning alone. If it works, it's right, no matter what others may say. Many will try to convince you that you're mad but deep down they probably just wish they could do the same. If you think this is all too radical, it's really not. There are many schools with extremely minimalist approaches to assessment that have had very successful inspections. Just as long as your approach is informative and has impact, then it's fine. If anything, the simpler the better. And Ofsted are not asking you to generate data purely for their benefit anyway. The Handbook states:

Ofsted does not expect performance- and pupil-tracking information to be presented in a particular format. Such information should be provided to inspectors in the format that the school would ordinarily use to track and monitor the progress of pupils in that school

And the workload review group report on data management had this to say:

Be ruthless: only collect what is needed to support outcomes for children. The amount of data collected should be proportionate to its usefulness. Always ask why the data is needed.

In alpine climbing there are two popular adages: 'if in doubt, leave it out', and 'if you haven't got it, you can't use it'. The first one is obvious, and it's what I'm trying to get schools to think about when they go about rationalising what they do. The second one links to it and recognises that if we do decide to carry something we'll most likely try to use it, in which case it becomes a potential distraction that can slow us down. It is common to hear headteachers and senior leaders say "we don't use all those bits of our system, we just use this grid". But the problem is that whilst all those other bits exist there is a temptation to use them, to waste your evenings and weekends wading through various reports and charts, and for governors to ask for them. Even worse, there is the potential for a 'visitor' to say "Oh, you use that system! Can you run this report for me please?"

Ditch it. If you haven't got it, you can't use it, and so it ceases to be an issue.

And when inevitably you do come up against someone asking for something they shouldn't be asking for, this should be your response:

"We don't do that in this school. It has no impact on learning"

Have a great summer.

## Thursday, 6 July 2017

### Predicting progress using the VA calculator: some things to bear in mind

It was great to read in Ofsted's March update that "Ofsted does not expect any prediction by schools of a progress score, as they are aware that this information will not be possible to produce due to the way progress measures at both KS2 and KS4 are calculated." Sean Harford went even further in his blog, describing the process of predicting progress as a 'mug's game'. This is welcome guidance from Ofsted and shows that they understand the complexities of value added measures in comparison to the old levels of progress measure.

However, whilst Ofsted won't be asking for such data, I recognise that schools still like to have an idea of progress scores before they pack up for the summer, especially with floor standards linked to these measures, and that's why I produce the VA calculator. It's a free tool and I'm happy for all primary schools to use and share it. It's available in two formats: old school Excel, and new-fangled web tool. Feel free to have a play around with both.
But, if you do use it, it's important that you understand its limitations, which I've outlined below in order of impact and likelihood of change.

1) Estimates
Value Added measures involve comparing each pupil's actual result against an estimated result. The whole school progress score is the average of the differences between the actual results and the estimates. An estimate is the national average score for pupils with similar prior attainment in that particular year. For example, let's take a year 6 pupil that was 2c in reading, writing and maths at KS1. They have a KS1 APS of 13 and are in prior attainment group 9. The DfE identify all pupils nationally with the same KS1 APS score, and calculate the average scaled score for this prior attainment group in order to generate the estimates. In 2016, pupils in this prior attainment group on average scored 97.26, 96.69 and 98.33 in reading, writing and maths respectively (if you want to know more about the crazy world of writing progress, read this). Our 2c pupil's KS2 scores are therefore compared against these benchmarks - if they score higher then they get a positive score; if they score lower they get a negative score. The issue with trying to work out progress for the current year 6 cohort is that we are comparing them against last year's estimates, whereas we should be comparing them against the national average score for pupils with similar prior attainment in the same cohort. We don't have this data yet and won't have it until September. Judging by the overall improvement in results nationally, we can safely assume that the estimates for each prior attainment group will change and will no doubt rise in most if not all cases (they may drop for the lowest PA groups because the DfE intend to introduce data from special schools to mitigate this problem).
Verdict: definite change
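The calculation described above can be sketched in a few lines. The 2016 estimates for prior attainment group 9 are the ones quoted in the post; the two pupils' KS2 scores are invented for illustration.

```python
# Sketch of the KS1-2 value added calculation: average the gaps between
# each pupil's actual score and the estimate for their prior attainment group.
estimates = {"reading": 97.26, "writing": 96.69, "maths": 98.33}  # PAG 9, 2016

pupils = [  # invented KS2 scaled scores for two PAG 9 pupils
    {"reading": 99, "writing": 95, "maths": 101},
    {"reading": 96, "writing": 98, "maths": 97},
]

for subject, estimate in estimates.items():
    diffs = [p[subject] - estimate for p in pupils]
    progress = sum(diffs) / len(diffs)
    print(f"{subject}: {progress:+.2f}")
```

Swap in this year's estimates when they arrive in September and the same sums give you the real figures, which is precisely why any score produced now is provisional.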

2) Standard deviations
Standard deviations change every year, and these form part of the calculation of confidence intervals on which statistical significance depends. This means your data might not be significantly above or below average on the VA calculator but might be when we get the proper data in September. Or vice versa.
Verdict: definite change
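For those curious how the significance test works, here's a hedged sketch. The general form - a 95% confidence interval of roughly 1.96 standard deviations divided by the square root of the cohort size - follows the DfE's published approach, but the SD value of 5.0 below is a placeholder, not a real published figure.

```python
import math

def confidence_interval(progress_score, sd, n, z=1.96):
    """95% confidence interval around a school's progress score.
    sd is the national pupil-level standard deviation for the subject
    (published by the DfE each year; 5.0 here is a placeholder)."""
    margin = z * sd / math.sqrt(n)
    return progress_score - margin, progress_score + margin

lower, upper = confidence_interval(progress_score=-1.2, sd=5.0, n=30)
# Significantly below average only if the entire interval sits below zero.
print(f"CI: ({lower:.2f}, {upper:.2f})  sig below: {upper < 0}")
```

Because the SD changes each year and the margin shrinks as cohort size grows, a score that looks safely "in line" on the calculator can tip into significance once the real figures land.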

3) Pupils that are entered for tests but do not score will be assigned a nominal score
This was covered in the progress loophole of despair post here: pupils assessed as HNM or EXS in reading and maths that sat the test but did not score enough marks to achieve a scaled score were excluded from progress measures last year. It looks like this particular loophole will close this year, which is a good thing, but we have no idea what nominal score these pupils will be assigned. I'm voting for 79 (1 point below the lowest scaled score of 80). The loophole was a particular issue in reading last year, when over 3000 pupils were entered for tests but did not achieve enough marks to gain a scaled score (compared with around 300 in maths). It is likely to be a much smaller number this year, but it will still affect quite a few schools.

Please note: this specifically relates to pupils assessed as HNM or EXS that did not manage to achieve a scaled score. Pre-key stage pupils that were entered for tests (for whatever reason) and did not manage to achieve a scaled score had a fall back, nominal score and therefore were included in progress measures (see No. 4 below).
Verdict: definite change

4) Floor standards & Coasting
Not really related to the VA calculator but something schools will have their eye on. Last year the progress floor thresholds were set at -5, -7 and -5 for reading, writing and maths respectively. These were incredibly low and reflected the fact that attainment was low (they just want the right number of schools below floor after all). These thresholds will go up this year, but by how much is anyone's guess. Same applies to coasting - they just halved the floor standards, remember? - unless they scrap the coasting measure. Please let them scrap the coasting measure. I assume we'll get the new floor and coasting thresholds at some point in the autumn term but for some reason they don't apply them in the Ofsted dashboard until validated data is released.
Verdict: almost certain change

5) Nominal scores for writing
Currently, pupils are assigned the following nominal scores according to their writing teacher assessment:

WTS: 91
EXS: 103
GDS: 113

It is likely that these will change, which will have an impact on progress scores. If they go up, then that will mitigate the inevitable increase in the estimates; if they stay the same then there will be a negative impact on overall progress scores. For what it's worth, I'd like to see the writing progress measure scrapped because it's frankly ridiculous.
Verdict: change likely

6) Nominal scores for pre-key stage pupils
Last year, pre-key stage pupils were assigned nominal scores according to their specific teacher assessment, which caused a huge amount of damage to the progress scores of those schools that had pre-key stage pupils. These scores, used for progress measures only, are as follows:

BLW: 70
PKF: 73
PKE: 76
PKG: 79

We have no idea if these nominal scores will change but I suspect they won't, because increasing them would encroach upon the actual scaled score range, which doesn't make sense, and decreasing them would further penalise schools with low attaining, SEN pupils. Obviously, any change to these scores will have an impact on progress calculations.
Verdict: change unlikely

NB: If you want to recalculate progress with pre-key stage pupils removed, this is something the VA calculator can help with.
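That recalculation is simple enough to sketch. This is a hypothetical illustration of the idea, not the VA calculator's actual code; the pupils and their progress scores are invented.

```python
# Recalculating a cohort's average progress score with pre-key stage
# pupils removed (hypothetical sketch; pupil data is invented).
PKS_CODES = {"BLW", "PKF", "PKE", "PKG"}

cohort = [
    {"ta": "EXS", "progress": 1.5},
    {"ta": "WTS", "progress": -0.8},
    {"ta": "PKF", "progress": -9.2},  # low nominal score drags this down
]

def avg_progress(pupils):
    return sum(p["progress"] for p in pupils) / len(pupils)

with_pks = avg_progress(cohort)
without_pks = avg_progress([p for p in cohort if p["ta"] not in PKS_CODES])
print(f"all pupils: {with_pks:+.2f}  excluding pre-key stage: {without_pks:+.2f}")
```

Even one pre-key stage pupil in a small cohort can swing the average dramatically, which is exactly the problem described above.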

7) Prior Attainment Groups
Currently there are 21 prior attainment groups, as detailed on p17-18 of the Primary Accountability Guidance. These prior attainment groups (PAGs) are integral to the progress measures and are, like the other factors detailed above, a vital part of progress calculations and the VA calculator. They are the bolts onto which the estimate nuts fix. If these change then it will fundamentally alter the structure of the progress measures, and make the latest data incomparable with last year's. I suspect they will remain the same.

Verdict: change unlikely

We will have to wait for the release of progress data in September to get answers to these questions. In the interim, the VA calculator can be used as a guide to indicate whether progress is well above or below average, or broadly in line. We will of course update the VA calculator with 2017 estimates in September so you can use it to accurately recalculate progress with specific pupils removed. I will have more confidence in the forecasts next year, when the estimates are based on this year's more reliable results, and many of the issues listed above have been resolved.

So use it and share it; just be aware that data is likely to change.

## Wednesday, 28 June 2017

### Primary assessment for non-primary people: a very quick guide

The DfE collects a bewildering array of data from primary schools. This quick guide to statutory assessment is here to help.

Early Years Foundation Stage Profile (EYFSP)
This assessment takes place at the end of the reception year when pupils are 5 years old. It
comprises 17 early learning goals (ELGs) against which pupils are assessed as emerging, expected, or exceeding. If pupils meet the expected level of development in the 12 prime and specific ELGs then they are deemed to have reached a ‘good level of development’ (GLD). The percentage reaching GLD is the key performance measure, which is shown in the Ofsted Inspection dashboard but is not in the publicly available performance tables.

Phonics Screening Check (PSC)
Carried out at the end of Year 1, pupils attempt to decode 40 words, half of which are real and the other half made-up. Pupils managing to decode 32 or more out of 40 have achieved the expected standard, and the percentage doing so is another key measure. As for EYFSP, school results are presented in the Inspection dashboard but are not available in the public domain.

Key Stage 1 (KS1)
At the end of year 2 pupils receive a teacher assessment in reading, writing, maths, and science. In science pupils are simply deemed to have met or not met expected standards (EXS or HNM). In other subjects the majority of pupils are assessed as either working towards (WTS), working at the expected standard (EXS) or working at greater depth (GDS). Pre-key stage assessment frameworks are available for those pupils that are working below the standard of the curriculum. Pupils take tests to inform the overall teacher assessment in reading and maths; there are no tests for writing and science. A grammar, punctuation and spelling test is provided but it is non-statutory and no data is collected. The DfE collect pupils’ overall teacher assessment in each subject – they do not collect the test scores – and the percentage achieving expected standards and greater depth in reading, writing and maths are the key measures. Pupils’ KS1 results also act as the baseline for primary school progress measures but this may change in future if the DfE implement a baseline at the start of reception. A school’s KS1 results are shown in the inspection dashboard but are not available in the public domain.

Key Stage 2 (KS2)
At the end of Year 6, pupils sit tests in reading, maths, and grammar, punctuation and spelling.
Achieving a score of 100 or more indicates that the pupil has met the expected standard and a score of 110 is deemed to be a high score. There are no tests for writing and science so pupils receive a teacher assessment based on the KS2 teacher assessment frameworks*, and these mirror the format of KS1 with a binary result for science (HNM or EXS), and more differentiated outcomes in writing (WTS, EXS, GDS). Pre-key stage frameworks are used to assess pupils working below the curriculum standard, with more pre-key stage categories than at KS1. The DfE collect test scores and teacher assessments but headline measures are mainly based on scaled scores in reading and maths, and teacher assessment in writing.  Headline measures include:
• % achieving expected standards in reading, writing and maths combined (1 measure)
• % achieving high standards in reading and maths, and greater depth in writing (1 measure)
• Average scores in reading and maths (2 separate measures)
• Average progress in reading, writing and maths (3 separate measures)
There are floor standards linked to these results with an attainment floor set at 65% achieving expected standards in reading, writing and maths combined, and progress floor thresholds that change each year. Schools' results are available in the public domain as well as in the inspection dashboard.

Note that only reading and maths scores are used for Progress 8 baselines.

*Teachers also make an assessment of reading and maths. This data is collected but is not used in headline measures.

Links:
Performance tables: https://www.compare-school-performance.service.gov.uk/
DfE Statistics: https://www.gov.uk/government/organisations/department-for-education/about/statistics
Key guidance: https://www.gov.uk/education/school-curriculum

## Tuesday, 6 June 2017

### The Data Burden

In implementing a new approach to assessment and tracking we must first weigh up our desire for data against any impact on workload. Sometimes those seemingly minor tweaks can produce a butterfly effect, placing huge demands on teachers' time that are most likely disproportionate to any benefit gained from the data. Before embarking on a new assessment journey, always start by asking: who is this data for, and what impact will it have on learning?

Many schools have implemented systems with a small number of key learning objectives in each subject, encouraging teachers to have the confidence to make judgements on pupils' competence within a broad framework. Some schools however have broken these key objectives down into many smaller steps - perhaps to enable progress to be measured over shorter periods - and this is having an adverse impact both on teacher workload and possibly on learning itself as we seek to assess and record everything a pupil does. This may be done to improve accuracy but such micromanagement of assessment is undermining professional judgement and is unnecessary, demoralising and counterproductive. There is a great irony in procedures put in place supposedly for the purposes of school improvement that actually take teachers' time away from teaching. It is hardly surprising that some headteachers are experiencing a backlash. An anti-tracking rebellion is underway in many schools.

I recently visited a school to help them set up tracking for the reception year. Initially the school wanted to track against development matters statements in each of the areas of the early years foundation stage profile, making an assessment every half term. After discussion, they decided that they would just focus on reading, writing and maths. I took a look at the list of statements - there were around 60 for each subject across the various month bands. I got out my calculator:

60 statements

3 subjects

6 terms

45 pupils

That comes to 48,600 assessments that the Early Years teacher would have to make across the course of the year.

Let's say each assessment takes 5 seconds to enter onto the system. This works out as 9 days per year for one teacher.
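The back-of-the-envelope sums above, with the 7.5-hour working day assumed for the conversion to days:

```python
# The workload calculation from the example above.
statements, subjects, terms, pupils = 60, 3, 6, 45
assessments = statements * subjects * terms * pupils
print(f"{assessments:,} assessments per year")

seconds_each = 5
hours = assessments * seconds_each / 3600
working_days = hours / 7.5  # assuming a 7.5-hour working day
print(f"{hours:.1f} hours, or about {working_days:.0f} working days")
```

Run your own numbers through it before signing off any tracking policy.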

Perhaps the teacher wouldn't have to update each assessment each term but you take the point. It's a potentially huge burden. Thankfully the headteacher and deputy went a bit pale and decided to go back to the drawing board. Their tracking system now requires the teacher to record an assessment against each of the 17 areas of the foundation stage three times per year. Somewhat more manageable and the early years teacher was certainly happier. They can spend more time doing more important stuff, like teaching.

But yesterday I had a lengthy twitter chat with awesome @misterunwin who challenged my thinking further. In his school they do no objective level tracking at all in their system. No box ticking, no rag rating statements, no assessment by numbers. Teachers make an overall judgement of where each pupil is at each half term and this produces all the data the school needs for analysis and reporting purposes. Teachers therefore spend very little time on data entry, freeing them up to do the stuff that has real impact: planning lessons and teaching children. Detailed pupil progress reviews are used to reveal gaps, inform next steps and identify where further support is needed. Staff are very happy with this arrangement, and improved staff morale will obviously benefit the school in many ways.

I have always tried to encourage schools to keep lists of objectives to an absolute minimum; to strike that balance between impact on learning and impact on workload. However, I completely understand why schools will want to reject any approach to tracking that takes teachers' time away from teaching, which is why some schools are choosing not to track in this way at all. I doubt that we are about to see schools ditching tracking systems en masse - I'm certainly not advocating that, by the way - but we do need to be mindful of how these approaches can eat into teachers' precious time and the adverse impact this can have.

Start by getting your calculator out.

### The beginning of the end

The DfE's primary assessment consultation, which closes on the 22nd June, focuses on a number of aspects of statutory assessment but it really centres on the baseline, the point from which pupils' progress to the end of KS2 will be measured. The DfE's preferred option is to set the baseline at the beginning of the reception year, and this has been supported by the likes of the NAHT (with caveats), the Headteachers Roundtable, as well as numerous headteachers and senior leaders that I've spoken to over the past year or so. I do, however, recognise that this is a contentious and emotive subject with many in the profession being understandably concerned and opposed to the idea of assessing children at such a young age. Is it possible to assess 4 year olds with any degree of accuracy? What about the pupils' month of birth, which can make such a big difference at that age? And can assessment be disruptive and stressful for children if they are not fully settled in to school life?

These are valid concerns but most primary schools already assess children on entry into reception so this is not new. No doubt the primary reason for carrying out such assessments is to support children in their learning but all too often schools are seeking to establish a baseline from which to measure progress. Pupils are therefore shoehorned into various bands with associated point scores in order to count steps of learning, or plotted onto RAISE-style progress matrices, which show them moving from one band to another. Those that progress, for example, from the 30-50 low band on entry to meet expected standards at KS1 are deemed to have made 'above expected progress', and are colour coded green or purple for good measure. Simple stuff that usually satisfies the demands of governors, the LA advisor and even an Ofsted Inspector.

The problem is that this data is fairly meaningless. On-entry assessments, carried out for the purpose of teaching and learning, are being commandeered for progress measures in order to respond to the increasing pressures of accountability. Such conflicting aims result in perverse incentives, which inevitably skew the data.  It is therefore no surprise that most pupils, according to schools' own data, are below average on entry and appear to make good progress across key stage 1. The reality, if the data remained true to its intended purpose, may look somewhat different and would probably be more informative, too.

The other issue is that many of these assessment practices are immensely time consuming, and it is often in the reception year that tracking is at its most excessive. Teachers dutifully tick off numerous development matters statements, in order to 'level' a child and supposedly measure their progress, despite this being contrary to the purpose of the assessment. It is, in short, a colossal waste of time. Would it not, therefore, be preferable to have a dedicated and universal, standardised baseline assessment, which would afford more robust comparisons of pupils, cohorts and schools, instil greater confidence in progress measures, and free other forms of assessment from the damaging influence of perverse incentives?

However, this is clearly an unpopular opinion. Someone recently suggested that I was confusing assessment with accountability; that I was losing sight of, or perhaps never really understood, the true purpose of assessment. As a data analyst I admit I am more focussed on accountability and performance measures - that's my job - but I do understand that the main purpose of assessment is to understand what pupils have learnt, to identify gaps and barriers, and inform next steps; and that these principles are put at risk by accountability. However, doesn't accountability in education require some form of assessment? Aren't they inextricably linked? Or am I being naïve or narrow minded? Perhaps this is more about confusion over the purpose of assessment: formative or summative, low or high stakes, for teaching and learning or monitoring school standards. Can assessment be all these things without getting wrenched apart in a tug-of-war between such opposing forces? It would appear not. We only have to look at how the Foundation Stage Profile is being put at risk as pupils' development in specific early learning goals is used to establish prior attainment groups for key stage 1 measures in the Inspection dashboard. Schools are, for the first time, concerned about having too many 'exceeding' pupils, fearing the impact on future headline measures. And concerns about the validity of key stage 1 assessment, used as a baseline for key stage 2 progress measures, are nothing new.

A baseline therefore, whether taken in the reception year or at the end of key stage 1, is most robust if it has a single purpose: to act as a start point for future progress measures. There may be some formative by-product but that's not the main reason for carrying it out. Perhaps the reason why one particular assessment - one rooted in the principles of the foundation stage profile - became so dominant the first time round was because we lost sight of the main purpose of a baseline assessment, or never truly understood it in the first place. Either that or it was a protest vote from a profession concerned about yet another accountability measure. But let's face it, the purpose was never very well explained: that the baseline was required to produce a standardised score, which would be used to construct prior attainment groups for a future VA measure. Pupils' scores at key stage 2 would then be compared against the average score of pupils nationally with the same baseline score. That's pretty much it.

Accountability measures are not going anywhere soon, so we have to consider whether we want an accountability system based on attainment or progress. Most would probably go for the latter and so we need to work out how this is best achieved. It makes sense to measure progress from the earliest point possible but this doesn't necessarily have to be the beginning of reception year. It could be from the end of reception year, which would mean modifying the EYFSP, or from a separate assessment at the beginning of year 1. Whatever happens, it is obviously fairer to judge school performance on the basis of the progress pupils make and we need to recognise that the current process of measuring progress from key stage 1 to 2 is flawed, inaccurate and not fit for purpose. Future measures need to be far more robust, more standardised, and take account of as much of pupils' journey through school as possible.

Either that or scrap the entire system and start again.

## Sunday, 21 May 2017

### This is a low

I often use a race analogy to explain value added, to help people understand how we can measure progress without levels and why we don't need to have data in the same format at either end. Here's a simple example:

Imagine you enter a 10k race, which is part of a series of 10k races being held across the country on the same day. When you register you are asked what pace group you’d like to run in: slow, medium, fast. A keen runner, you choose to go in the fast group and you're handed a green vest to wear. Obviously, the medium pace runners get orange vests, and the slower group wear red (everyone loves a RAG rating system). You feel good that day, having trained hard, and run your race in 41 minutes. You’re thrilled because you’ve run a PB and you’re 10 minutes faster than the average time for your race that day. Even better, you find out you are 12 minutes faster than the national average time for the whole series. Unfortunately that’s not what the race organisers are interested in; they’re interested in how your time compares against the national average time for the green vest group, which happens to be 37 minutes. Despite being way faster than the overall average time, you are 4 minutes down on the average time for your group. Your value added score is therefore -4.

This is how VA works: it involves comparing one result against the average result of those in the same start group nationally. Here we have a start defined by a colour and a result in a time format; for KS1-2 measures we currently have a start defined by a sub level and a result in scaled score format. Same thing.
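The analogy can be written out in a few lines. The group average times are invented for illustration; note that because lower race times are better, the score here is the group average minus your time, so beating your group's average gives a positive score.

```python
# The race analogy in code: value added compares your result against the
# national average for your start group, not the overall average.
group_avg_times = {"red": 58.0, "orange": 49.0, "green": 37.0}  # minutes (invented)

def value_added(vest, finish_time):
    # Lower times are better, so positive = faster than your group's average.
    return group_avg_times[vest] - finish_time

print(value_added("green", 41))   # the runner from the story: -4
print(value_added("orange", 41))  # the same time in a slower start group: +8
```

The second line makes the point starkly: the same finish time produces opposite scores depending on the start group, which is the whole principle of VA - and, as the next section argues, its weakness too.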

Whilst I much prefer value added to the old levels of progress measure - it's rooted in some form of reality after all - it does have one serious flaw: SEN and EAL pupils are often expected to cross the same line in the same time.

The issue is that many EAL and SEN pupils have comparably low start points, and are therefore placed into the same prior attainment groups, effectively treating them as similar pupils. And this means they will be compared against the same benchmarks at KS2. As we know, many EAL pupils make rapid progress and score well in their KS2 tests, whereas SEN pupils do less well. The end of KS2 estimates against which each pupil in the prior attainment group is compared, being an average of the performance of SEN and EAL pupils, tend to be too high for the former and easily attainable for the latter. The issue is exacerbated by the current system of low nominal scores assigned to the pre-key stage assessments, which almost guarantees that SEN pupils can only obtain negative progress scores whilst EAL pupils excel against their benchmarks.

We can return to our race analogy to illustrate this issue further. Imagine our pace groups are defined by how many steps runners could take when they were 18 months old. The runners wearing red vests were those that couldn't walk at that age. But perhaps some of those have gone on to be fast runners whilst others have continued to have difficulty walking. On race day they are in the same group, in the same vest, and each of their times will be compared against the overall average time for the group. Hardly fair.

This issue needs resolving somehow. Introducing some form of CVA is an obvious answer - a measure that recognises the difference between SEN and EAL pupils - but is likely to lead to a proliferation of SEN pupils and a corresponding decline in those registered as EAL. Removal of pre-key stage pupils from progress measures is also a possibility but that may result in a big increase in pre-key stage pupils as schools seek to get certain pupils discounted.

I'm not sure what the answer is but it needs serious thought because as it stands, schools are hammered if they have SEN pupils, especially if they are pre-key stage.

The DfE stated that they wanted measures to reflect the progress made by all pupils.

Time to make good on that.

## Friday, 12 May 2017

### Pupils included and not included in KS2 measures

The issue of who is and who isn't included in KS2 measures is still causing major headaches for many, which is understandable because it's a bloody minefield. As we have just got through SATS week, and no doubt many senior leaders are now turning their attention to those not so distant reports, I thought I'd attempt to provide some clarity. There is, of course, a chance I've got some of this wrong, but it's worth a try.

Attainment

Attainment can be broken down into two main measures: 1) threshold measures (% attaining expected and high standards) and 2) average scaled scores.

1) Pupils included in and excluded from threshold measures

All pupils are included in this measure initially. Pupils can be discounted if they are recent arrivals from overseas who have EAL and come from a non-English speaking country. Such pupils are identified during the checking exercise in September, using the results list sent via NCA Tools. Consequently, due to timing, discounted pupils are included in unvalidated data but are removed from later, validated data releases, including the performance tables. All other pupils are included in the measure, including pupils that were absent, below the standard of the tests, or disapplied. Pupils that achieve a scaled score of 100+ (or who have a TA of EXS in writing) are deemed to have met the expected standard; those that achieve a score of 110+ (or have a TA of GDS in writing) are deemed to have met the high standard. They are the only pupils in the numerator. All other pupils are in the denominator, with the exception of any discounted pupils. Again, discounting does not take effect until the validated data release.

2) Average scaled scores

Only pupils with a scaled score of 80+ are included in this measure. Nominal scores assigned to pre-key stage assessments (70-79) are only used in the progress measure; they are not used in the average scaled score calculation.

Progress

This is the real minefield and I created the following diagram to help navigate it:

Essentially to be included in the progress measure a pupil needs a start point (KS1 result) and an end point (KS2 score). The score can be a scaled score from a test or a nominal score assigned to a pre-key stage assessment or teacher assessment in the case of writing.

Scaled scores range from 80 to 120.

Writing scores as follows:
WTS = 90
EXS = 103
GDS = 113
(Note: these may change this year)

Nominal scores are as follows:
BLW = 70
PKF = 73
PKE = 76
PKG = 79
(Note: these may also change this year)

So, if a pupil has a) a KS1 start point, and b) a KS2 score as detailed above, they will be included in progress measures.
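The inclusion logic can be sketched as follows. This is a hedged illustration, not official DfE code: the score tables mirror the values listed in this post, and the function names are my own.

```python
# Sketch of the inclusion rule described above: a pupil enters the progress
# measure if they have a KS1 start point and a usable KS2 score. A usable
# score is a test scaled score (80-120), a writing TA score, or a nominal
# pre-key stage score. Values mirror the lists in the post.

WRITING_SCORES = {"WTS": 90, "EXS": 103, "GDS": 113}   # writing TA only
NOMINAL_SCORES = {"BLW": 70, "PKF": 73, "PKE": 76, "PKG": 79}

def ks2_score(subject, test_score=None, ta=None):
    """Return the score used in progress measures, or None if the pupil
    has no usable score (e.g. absent, disapplied, or a reading/maths TA
    with no associated nominal score -- the 'loophole' pupils)."""
    if test_score is not None and 80 <= test_score <= 120:
        return test_score
    if subject == "writing" and ta in WRITING_SCORES:
        return WRITING_SCORES[ta]
    if ta in NOMINAL_SCORES:
        return NOMINAL_SCORES[ta]
    return None

def in_progress_measure(has_ks1_result, subject, test_score=None, ta=None):
    return has_ks1_result and ks2_score(subject, test_score, ta) is not None

print(in_progress_measure(True, "reading", test_score=105))   # True
print(in_progress_measure(True, "maths", ta="PKF"))           # True: nominal 73
print(in_progress_measure(False, "reading", test_score=105))  # False: no KS1 baseline
print(in_progress_measure(True, "reading", ta="EXS"))         # False: no score awarded
```

Note how an EXS teacher assessment counts in writing (where it carries a score of 103) but not in reading or maths, where a pupil who fails to score on the test ends up with no score at all.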

Those that are excluded from progress measures (and this is where it gets complicated and may well change this year) are as follows:

No KS1 result
Pupils without a start point are not included. They are not assigned a nominal baseline.

Absent (A code)
Pupils that are absent from tests are excluded from progress measures, even if they also have a pre-key stage assessment.

Disapplied (D code)
Pupils that are disapplied are also excluded from progress measures. This is a commonly misunderstood term, often confused with 'below the standard of the test'. It should only be used in cases where a pupil has been disapplied from the national curriculum and it is therefore not possible to make a teacher assessment. A disapplied pupil therefore cannot have a pre-key stage assessment. I have seen numerous examples of pupils coded as D that actually should have had a B code and an accompanying PKS assessment. Disapplied should be a rare occurrence in mainstream settings.

No scaled score awarded and has HNM or EXS teacher assessment
These are the progress loophole pupils. They sat the test but failed to achieve enough marks to get the lowest scaled score of 80. There were around 350 in maths last year but around 3500 in reading. Because HNM and EXS do not have an associated nominal score, if a pupil with one of those teacher assessments fails to achieve a scaled score, then they end up with no score at all, and no score means they cannot be included in progress measures. The STA have stated that they intend to close the loophole this year, which probably means assigning a nominal score to HNM and EXS for such instances. We don't know what the nominal score would be but it seems logical to assume it would be capped at 79.

Missing result (M code)
If the result is missing, the pupil is excluded from progress measures.

Unable to access test (T or U code)
Again, pupils with these codes are excluded from progress measures.

Think that's pretty much everything.

Hope it's useful.

## Friday, 7 April 2017

### Defence against perverse incentives

I recently attended the JUSCO (junior school collaboration) conference in Birmingham organised by Chris McDonald (@chrismcd53). It was a great day packed with interesting talks and heated debate; and if you had to use one word to sum up the feelings in the room it would have to be 'frustration'. This feeling was perhaps best encapsulated in Dr Rebecca Allen's talk (@drbeckyallen) in which she showed the stark contrast in progress measures between all-through primary schools and junior schools and postulated that "either there is stuff that's going on in your schools that really isn't as helpful as it could be [...] or there's something that's gone wrong with the way the government is measuring school performance". Becky then went on to show the contrast in inspection outcomes between infant and junior schools, where the former are 2.8 times more likely to be judged outstanding than the latter, and perhaps unsurprisingly there is a far greater prevalence of RI and inadequate judgements amongst junior schools than amongst infant schools. Inevitably much of the discussion that followed concentrated on the direct impact of over-inflation of KS1 results by infant schools, but an arguably bigger impact results from the depression of KS1 results by primary schools, where perverse incentives exist to make results as low as possible. Junior schools, with no control over their pupils' start points, end up unfairly compared to a national baseline that is engineered to maximise progress. In an attempt to illustrate the issue I created the following diagram. I call it the swirling vortex of despair.

It shows how junior school pupils are at a huge disadvantage in the progress race because the school does not have control over the baseline, and how pupils that make good progress in reality end up with negative scores when compared against supposedly similar pupils nationally. It's like entering a fun run only to discover that the other competitors are elite athletes in disguise.

But this is not all about junior schools. The current system of measuring progress from KS1 to KS2 is hugely flawed and it is deeply concerning that such high stakes are linked to such bad data. The combination of ill-defined, crudely scored, best-fit sublevels at one end and a mix of test results and weird, clunky nominal scores at the other hardly makes for an accurate measure of progress. Add in those perverse incentives to keep the baseline as low as possible whilst inflating KS2 writing teacher assessments and finding ways to exclude less able pupils from measures and we have a mess of a system that favours the most creative (or the least honest). And it's set to get worse in 2020 when the current year 3 with their new format KS1 results get to the end of KS2. The decision not to collect KS1 test scores seems a missed opportunity when we consider what we will probably end up with. Instead of a refined series of start points based on scaled scores, we will have a handful of prior attainment groups, each containing tens of thousands of pupils, all of whom will have the same KS2 benchmarks. An avoidable disaster waiting to happen.

And so we need a better baseline and this is the hot topic in the recently launched consultation on the future of primary assessment. Most seem to favour a baseline taken early in the reception year and this is most likely the direction of travel. After all, surely it makes sense to measure progress from when pupils start primary school rather than from a point 3/7ths of the way through. Whatever the start point, any future baseline assessment needs to be principled, robust, and should be refined enough to provide a suitable number of prior attainment groups. Unfortunately, and inevitably, those perverse incentives to ensure a low start point will still exist so how do we avoid them?

Moderation
One option is to continue with the current arrangement of moderating a sample of schools each year, but I would argue that this has not proved particularly effective. If it had been, we wouldn't have all these issues and I wouldn't be writing this blog post, so it's probably time to consider other options. Alternatively, moderation could be carried out after submission of data, which might help ensure schools err more on the side of caution. More likely, though, it would just create resentment.

School-to-school support
This could take a number of forms: schools moderating each other's baseline assessments (this already happens a lot anyway), teachers from a neighbouring school invigilating the assessment in the classroom (think national lottery independent adjudicator with a clipboard), or actively administering the assessment. I'm not sure how popular the latter would be either with staff or with children.

Use of technology
If pupils were to do the assessment via an iPad app there are benefits in terms of instant data collection and feedback, which is useful for the user. Plus - and here's the sinister bit - algorithms can spot unusual patterns (think betting apps), which can help discourage gaming. However, there are no doubt access issues for some pupils and what if they struggle to complete tasks at the first attempt? Do they get another go? Plus it means the purchase of a lot of iPads. I recall that one of the six providers of the last attempt at a baseline assessment had such a solution and evidently it wasn't particularly popular - it didn't make it to the final 3 - but that doesn't mean it's not worth another look.

Random checks
This would probably only work if the assessment was carried out in all schools on the same day. I'm assuming this won't happen. It is more likely that assessment will be carried out over a number of days, which would mean schools submitting the dates of assessment in advance like an athlete declaring their whereabouts. Also, who would carry out random checks? This is probably a non-starter. It would be massively unpopular.

Data analysis
Unlike levels, which were broad, vague and non-standardised, and therefore lacked an accurate reference point (yes, 2b was the 'expected' outcome but no one could really decide what a 2b was), a standardised assessment based on sample testing will provide a more reliable measure. Schools or areas with consistently low baseline scores, where all or nearly all pupils are below average, may warrant further investigation.

I understand that all of this sounds rather big brother but the alternative is we carry on as we are with unreliable progress measures against which critical judgements of school performance are made. If we are going to have progress measures - and who wants to have their performance based on attainment alone - then it absolutely has to be based on credible data. That means having an awkward conversation about gaming arising from perverse incentives and what steps can be taken to avoid it, because the current situation of high stakes performance measures, floor standards and coasting thresholds based on utterly unreliable data is unsustainable.

## Thursday, 16 March 2017

### Stuck in the middle with you (the problem with prior attainment bands)

This has actually been a good month in the fight against data nonsense. First, we are hearing news from STA briefings that they are aware of the progress loophole of despair and intend to do something about it. What exactly they'll do is anyone's guess but I'm assuming a nominal score for HNM in cases where pupils fail to score on tests. Whether they'll actually address the main issue of the unfairness of nominal scores for pre-key stage pupils (don't you just love those whopping negative scores for SEND pupils?) remains to be seen. But at least there is some movement there. Next we have the Ofsted March update, which informs us that inspectors will no longer be asking for predicted results ("it's a mugs game" - Sean Harford). It also hammers home the point that there is no such thing as expected progress. And finally, relating to the above point about nominal scores, it urges caution when interpreting progress data; that inspectors must consider the effect of outliers and specifically mentions the issue of negative scores for pre-key stage pupils. This is all good stuff.

So, with good progress being made on these issues (pun intended) I thought I'd turn my attention to something else that winds me up: prior attainment bands. Not so much their existence but the varied and inconsistent ways in which they are defined. With RAISE on the way out, this is an opportunity to get things right next year. Well, we have to try.

Prior attainment bands - I'm talking about low, middle, high bands here; not the numerous prior attainment groups used to calculate VA - fall into two broad types: those based on average point scores at the previous key stage, and those based on prior attainment in the specific subject. VA uses APS whereas the old RAISE progress matrices were based on the prior level in the specific subject. Right now we have 3 main sources of school performance data (RAISE, Ofsted dashboard, and FFT) and we have 3 different definitions of low, middle, high prior attainment.

Ofsted Inspection dashboard

Things get confusing right away. Here we have two different definitions on the same page. For progress (the top half of the page), low, middle and high bands are based on KS1 APS, whilst for attainment they are based on pupils' KS1 level in the specific subject. This means we have different numbers in, for example, the low group for progress than we do in the low group for attainment.

To clarify, the progress PA bands, based on KS1 APS, are calculated as follows (and remember that maths is double weighted at KS1 so the formula is (R+W+M+M)/4):

Low: KS1 APS <12
Middle: KS1 APS 12-17.99
High: KS1 APS 18+

Note that pupils who were 2c in reading, writing and maths at KS1 will have an APS of 13 and will therefore be in the middle band alongside pupils that were 2A across the board (APS 17). Furthermore, a solid 2b pupil (APS 15) will obviously fit in the middle band as will a pupil that was L1, L1, L3 in reading, writing and maths at KS1 (also APS 15).

Meanwhile, below in the attainment section we have the other low, middle, high definition based on the pupil's level in the specific subject at KS1. Here, a pupil that was L3 in reading and maths, and 2a in writing, will appear in the middle band for writing attainment measures due to their L2 in writing, but will appear in the high progress band due to their high KS1 APS of 20. Bizarrely, a pupil that is L1 in reading and maths, and 2c in writing, will also appear in the middle band for writing attainment due to their L2 in writing, whereas for progress they will fit into the low band due to their low KS1 APS of 10. This is why it's so important for schools to know who is in those bands. If you have bright red boxes around your attainment measures (gaps equating to 2 or more pupils) this may be difficult to explain if all your pupils were 2a, but if they were 2c and L1 in other subjects, then it's somewhat more justifiable.
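For illustration, the APS banding can be sketched in code. The level point scores used here (L1 = 9, 2c = 13, 2b = 15, 2a = 17, L3 = 21) are the standard KS1 values implied by the worked examples above.

```python
# Sketch of the KS1 APS banding used for the progress section of the
# dashboard. Maths is double weighted, and the band boundaries are those
# listed in the post: <12 low, 12-17.99 middle, 18+ high.

POINTS = {"L1": 9, "2C": 13, "2B": 15, "2A": 17, "L3": 21}

def ks1_aps(reading, writing, maths):
    """(R + W + M + M) / 4, as described above."""
    return (POINTS[reading] + POINTS[writing] + 2 * POINTS[maths]) / 4

def band(aps):
    if aps < 12:
        return "low"
    if aps < 18:
        return "middle"
    return "high"

# The worked examples from the text:
print(ks1_aps("2C", "2C", "2C"))  # 13.0 -> middle, same band as 2a across the board
print(ks1_aps("L1", "L1", "L3"))  # 15.0 -> middle, same APS as a solid 2b pupil
print(ks1_aps("L3", "2A", "L3"))  # 20.0 -> high for progress, middle for writing attainment
print(ks1_aps("L1", "2C", "L1"))  # 10.0 -> low for progress, middle for writing attainment
```

Running the examples makes the mismatch between the two definitions easy to see: the last two pupils sit in the same attainment band for writing despite landing in opposite progress bands.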

Oh, and for KS1 of course, prior attainment is based on the pupil's development in the specific early learning goal. One early learning goal out of 17 used to band pupils by. That can't be right, surely? Nice to see this get a mention in Ofsted's March update, too.

And whilst we're on the subject of banding pupils for attainment measures, once we introduce an element of prior attainment, doesn't it cease to be about attainment and become a sort of pseudo progress measure anyway? Surely that's just like the old progress matrices, isn't it?

RAISE

Now things get even more odd. In RAISE, they take the same approach as the dashboard when it comes to progress with low, middle, high bands based on KS1 APS (see above). This means your progress data in RAISE looks the same as the progress data in the dashboard (just presented in a more incomprehensible format). However, when it comes to attainment, instead of adopting the subject specific method used in the dashboard, they stick with the progress approach based on KS1 APS. RAISE therefore presents attainment and progress in a consistent way with the same numbers of low, middle, high pupils in both parts, but this has caused a lot of confusion because the data differs between the two reports.

Elsewhere in the report we do have subject specific banding (based on the pupil's level in that subject at KS1) and, to really ramp up the confusion, we have results in, say, maths, presented for pupils that were low in reading or high in writing at KS1. I'm yet to meet a headteacher or senior leader who gets the point of this. I'm not entirely sure I do either.

FFT
And finally we come to FFT. They also split pupils into low, middle, and high bands based on prior attainment but have come up with a third way. Like the Ofsted dashboard approach (well the progress one anyway) this starts with KS1 APS, calculated in the same way as the DfE, but then they do something different: they rank all pupils nationally by KS1 APS (600,000 of them) and split the pile into thirds. Those in the lower third are the lower attainers, those in the middle third are the middle attainers, and (yes, you've guessed it) those in the upper third are the higher attainers. It's actually not quite thirds because if the 33rd percentile is smack bang in a stack of hundreds of pupils with the same APS, then they have to adjust up or down a bit I assume. This is why we don't get 33% in each band nationally.

I rather like this approach because it means the 2c pupils end up in the low group and the 2a pupils move into the high group. In fact you even find pupils with the odd 2b lurking in the lower group. You will certainly have more lower attainers in an FFT report than you do in the Ofsted dashboard and RAISE, and you tend to see fewer middle attainers and a few more higher attainers too. Pupils just get distributed across the bands a bit more and this tends to make sense to teachers (once they have got over their exasperation at having to get their heads round another methodology).
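FFT's ranking approach can be sketched roughly as follows. The data here is invented, and in reality ties at the boundaries mean the national split is not exactly thirds, as noted above.

```python
# Rough sketch of FFT-style banding: rank all pupils nationally by KS1 APS
# and split the pile into (approximately) thirds. The cut points here are
# a simplification; FFT's actual handling of ties will differ.

def fft_bands(national_aps):
    ranked = sorted(national_aps)
    n = len(ranked)
    low_cut = ranked[n // 3]        # APS at roughly the 33rd percentile
    high_cut = ranked[2 * n // 3]   # APS at roughly the 67th percentile

    def band(aps):
        if aps < low_cut:
            return "low"
        if aps < high_cut:
            return "middle"
        return "high"

    return band

# Toy national distribution of KS1 APS values
national = [10, 11, 12, 13, 13, 14, 15, 15, 15, 16, 17, 18, 19, 20, 21]
band = fft_bands(national)
print(band(13))  # "low" in this toy data, unlike the DfE bands where APS 13 is middle
```

The contrast with the fixed DfE boundaries is the point: under a relative split, where a 2c pupil (APS 13) lands depends on the national distribution rather than a fixed threshold.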

One of the things that springs to mind is that term 'most able'. Most able based on whose definition? The school's? The DfE's APS approach? Or perhaps their subject specific approach? And what about FFT's top third nationally? Anyone have the answer?

This fragmented, confused and confusing approach can't continue, and with the end of RAISE we have an opportunity to come up with a straightforward and logical approach to establishing these prior attainment bands. I prefer FFT's approach but whatever we end up with, could we have some consistency please? At the very least, let's not have contrasting methods on the first page of a key report.

And we haven't even touched on current Y3. Anyone know the average of EXS+WTS+GDS+GDS?

Over to you, people in charge of such things.

## Monday, 20 February 2017

### Blueprint for a tracking system

In almost every meeting I have in schools someone will at some point say one of the following:

"Our tracking system isn't really working out for us anymore"

or

"Our tracking system doesn't do what we need it to do"

or

"It's just levels really, isn't it?"

And this is often the main reason for my visit: schools want to explore other options. They want a simple, flexible system that works for them, grows with them, adapts to fit their needs, is easy to use, and reduces workload. That's exactly what all systems should do, but sadly this is quite far from the reality. It's that whole master and servant thing; and sometimes, when it comes to the school-system relationship, it's quite difficult to tell which one's which. Schools are evidently getting fed up with systems that are bloated, confusing and inflexible. They want a change and who can blame them? Too much precious time is wasted navigating systems, finding workarounds, trying to map the curriculum to the objectives, and finding the data you need from numerous reports to populate the spreadsheet that someone has sent you because the system doesn't provide everything in one place. Systems need to be more like Connect 4 and less like chess. More of an asset and less of, well, an ass.

Almost a year ago I wrote this blog post on the golden rules of tracking. It outlined a philosophy on tracking systems that I presented at the original Learning First conference in Sheffield. Those golden rules are:
1. Separate assessment and tracking from performance management
2. Ensure your system is a tool for teaching and learning rather than accountability
3. Reduce the complexity
4. Do not recreate levels
5. Stop obsessing about measuring progress
Since then I've added to that another point: do not compromise your approach to assessment to fit the rules of a system. I've also changed point 3 to 'Reduce the complexity and the workload'. The post seemed to strike a chord and tied in with what many were thinking anyway. But in order to adopt these principles we need a complete overhaul to the way we track pupil progress in schools. With this in mind, I thought I'd write a post on what I think tracking systems should offer, which is well overdue considering how much time I spend ranting about what they get wrong.

So here are my golden practical rules of tracking. In my opinion, tracking systems should allow schools to:

1) Define and input their own objectives
Yes, all schools are guided by the programmes of study of the national curriculum, but they don't all teach the same things at the same time in the same order in the same way. Many schools have gone to great lengths to design a curriculum that ensures the right level of challenge for its diverse range of pupils, taking considerable care over the wording of key mastery indicators and 'non-negotiables'. Some schools are happy tracking against a broad framework of a few key objectives whilst others want tighter criteria and therefore more statements. And many schools are using assessment materials from various third parties such as Ros Wilson, Assertive Mentoring, Rising Stars, Kangaroo Maths, or from the LA. Often these assessment materials are in paper form, stapled into books and highlighted to show pupils' security and progression. The problem is that many systems do not allow schools to input their own objectives or edit the existing ones, and so the system is not aligned to what is being taught. Teachers are then confronted with a time consuming and frustrating mapping exercise to attempt to link assessment materials to whatever national curriculum objectives are in the system. Yes, pupils need to secure the national curriculum objectives but schools are finding their own route towards those and would welcome the flexibility to set up their systems to mirror what happens in the classroom. Essentially, the core purpose of a tracking system is, in my view, to work as an electronic markbook. That's the important bit - that's the bit that has impact. Get that right and the rest will follow.

Oh, and please keep those lists of objectives to a minimum. Too much tracking is unnecessary, demoralising and counterproductive.

2) Define their own summative assessment descriptors
Schools use different terms to describe where pupils are in their learning: below, meeting, above; emerging, developing, secure; beginning, advancing, deepening; supported, independent, greater depth. Yes, we are often talking about the same thing but schools should be able to adapt their system to reflect the language used by teachers across the school. Also, many systems use terms such as emerging, developing, secure to reflect how much of the curriculum the pupil has covered and achieved. Increasingly schools are turning away from this approach and instead using these terms to denote the pupil's competence in what has been taught so far. So rather than secure being something that happens after Easter when they've covered a certain percentage of the curriculum, it is instead used to indicate that the pupil is working securely within the curriculum at that point in time. If schools choose to apply terms in this way then the system should be flexible enough to accommodate that approach. Furthermore, some schools are starting to use teacher assessments of progress as well as attainment to get around the inadequacy of linear metrics commonly used in systems. If schools need more than one set of descriptors to account for the achievement of all pupils, they should have that option.

3) Record depth of understanding
In a curriculum where pupils do not move on to next year's content early, how do we show 'above expected progress' for pupils that start the year 'at age related expectations'?* All fine for pupils that are below and catch up - they make progress in the traditional sense, through rapid and extensive coverage of the curriculum - but those secure learners present a problem. This is why it's useful to be able to differentiate on the basis of depth at the objective level. It means that a) teachers can record and identify pupils on the basis of their competency in key areas, and b) schools can have some measure of progress for those pupils constrained within the parameters of their age-appropriate curriculum, should they need it. Most systems do this to a degree, using a RAG rating system to show the pupil's security within each objective, but an associated numerical system is useful for aggregated cohort and group reporting, and possibly for individual pupil progress. Admittedly, this is probably more about external scrutiny than classroom practice but it's an easy win and does not add to workload.

* I used inverted commas because I dislike both those terms but recognise they are commonly used.

4) Enter whatever data they like
A system should be like a scrapbook. It should allow schools to collect whatever assessment data they need, in whatever format it's in, and analyse it in any way they see fit. If they want to track the changes in standardised scores and percentile ranks, let them. If they want to track progress from two separate baselines, they should be able to do that too. I guess this means that they should also be allowed to define their own pseudo-sublevels and points-based progress measures. I'd really rather they didn't, and I hope they change their mind in time, but if that's what they feel is needed right now, then it's better the system does it for them rather than forcing them down the route of building their own spreadsheet and adding to workload.

5) Track pupils out of year group
Not all pupils are accessing the curriculum for their year group and this causes problems in many systems. Either the systems don't allow tracking outside of year group and simply classify such pupils as 'emerging' along with everyone else; or they do allow it but always highlight the pupils with a red flag, define them as well below, and categorise them as making below expected progress. It would be welcome if schools could track the progress of pupils working below the curriculum for their year group without being penalised by the algorithms of the system. Why can't SEN pupils be shown to have made great progress?

6) Design their own reports
The holy grail of tracking: data on a page. A single report that shows all key data for each subject broken down by cohort and key group. Schools need the facility to be able to design and build their own reports to satisfy the needs of governors, LA monitoring visits and Ofsted. These might show the percentage of objectives secured and the percentage of pupils on track alongside the average test score for each key group, at the start of the year, the mid point and the end. If the school then decides to implement a new form of assessment, they should be able to input that data into the system and add a column to their report. There are too many teachers and senior leaders working around their systems, exporting data into Excel, or running reports in order to complete a form or table to meet a particular requirement, perhaps sent by the LA or MAT. I've even seen headteachers reduced to running countless reports and writing key bits of data down on a pad of paper in order to transfer to the table they've been asked to fill in. Now, much of this data may be pointless and increased workload should be resisted, but it is a fact of school life that people ask for numbers, and systems that allow report templates to be designed to fulfil these needs would be one less headache. And on the subject of reports, we really don't need that many. In addition to a custom built table, probably something like a pie chart that shows the percentage of pupils that are below, at and above where you expect them to be at any point in time. Maybe a progress matrix. I'm struggling to think of anything else that is actually useful.

7) Input and access their data on any device
It is 2017 after all!
The ultimate aim is to have neat, intuitive, flexible systems that save time and stress. Systems that are useful tools for teaching and learning, that do not influence a school's approach to assessment in any way. They should be easy to use with minimal training; and whilst they shouldn't require much support, if a school does need help, someone on the end of a phone is always very welcome.

So, there's my pie-in-the-sky utopian daydream.

Any takers?