Thursday, 16 March 2017

Stuck in the middle with you (the problem with prior attainment bands)

This has actually been a good month in the fight against data nonsense. First, we are hearing news from STA briefings that they are aware of the progress loophole of despair and intend to do something about it. What exactly they'll do is anyone's guess but I'm assuming a nominal score for HNM in cases where pupils fail to score on tests. Whether they'll actually address the main issue of the unfairness of nominal scores for pre-key stage pupils (don't you just love those whopping negative scores for SEND pupils?) remains to be seen. But at least there is some movement there. Next we have the Ofsted March update, which informs us that inspectors will no longer be asking for predicted results ("it's a mug's game" - Sean Harford). It also hammers home the point that there is no such thing as expected progress. And finally, relating to the above point about nominal scores, it urges caution when interpreting progress data: inspectors must consider the effect of outliers, and it specifically mentions the issue of negative scores for pre-key stage pupils. This is all good stuff.

So, with good progress being made on these issues (pun intended) I thought I'd turn my attention to something else that winds me up: prior attainment bands. Not so much their existence but the varied and inconsistent ways in which they are defined. With RAISE on the way out, this is an opportunity to get things right next year. Well, we have to try.

Prior attainment bands - I'm talking about low, middle, high bands here; not the numerous prior attainment groups used to calculate VA - fall into two broad types: those based on average point scores at the previous key stage, and those based on prior attainment in the specific subject. VA uses APS whereas the old RAISE progress matrices were based on the prior level in the specific subject. Right now we have 3 main sources of school performance data (RAISE, Ofsted dashboard, and FFT) and we have 3 different definitions of low, middle, high prior attainment. 

Ofsted Inspection Dashboard

Things get confusing right away. Here we have two different definitions on the same page. For progress (the top half of the page), low, middle and high bands are based on KS1 APS, whilst for attainment they are based on pupils' KS1 level in the specific subject. This means we have different numbers in, for example, the low group for progress than we do in the low group for attainment.

To clarify, the progress PA bands, based on KS1 APS, are calculated as follows (and remember that maths is double weighted at KS1, so the formula is (R+W+M+M)/4):

Low: KS1 APS <12
Middle: KS1 APS 12-17.99
High: KS1 APS 18+

Note that pupils who were 2c in reading, writing and maths at KS1 will have an APS of 13 and will therefore be in the middle band alongside pupils that were 2a across the board (APS 17). Furthermore, a solid 2b pupil (APS 15) will obviously fit in the middle band, as will a pupil that was L1, L1, L3 in reading, writing and maths at KS1 (also APS 15).
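
If you want to check which band your pupils fall into, the sums are easy to script. Here's a quick Python sketch using the standard KS1 point values (W = 3, L1 = 9, 2c = 13, 2b = 15, 2a = 17, L3 = 21); the function names are mine:

# KS1 APS banding as used for the progress measures.
KS1_POINTS = {"W": 3, "L1": 9, "2c": 13, "2b": 15, "2a": 17, "L3": 21}

def ks1_aps(reading, writing, maths):
    """KS1 APS with maths double weighted: (R + W + M + M) / 4."""
    r, w, m = KS1_POINTS[reading], KS1_POINTS[writing], KS1_POINTS[maths]
    return (r + w + m + m) / 4

def pa_band(aps):
    """Low / middle / high prior attainment band from KS1 APS."""
    if aps < 12:
        return "low"
    if aps < 18:          # 12 to 17.99
        return "middle"
    return "high"

# The examples from the text all land in the middle band:
for levels in [("2c", "2c", "2c"), ("2a", "2a", "2a"), ("L1", "L1", "L3")]:
    aps = ks1_aps(*levels)
    print(levels, aps, pa_band(aps))   # 13.0, 17.0 and 15.0 -> all 'middle'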

Meanwhile, below in the attainment section we have the other low, middle, high definition, based on the pupil's level in the specific subject at KS1. Here, a pupil that was L3 in reading and maths, and 2a in writing, will appear in the middle band for writing attainment measures due to their L2 in writing, but will appear in the high progress band due to their high KS1 APS of 20. Bizarrely, a pupil that was L1 in reading and maths, and 2c in writing, will also appear in the middle band for writing attainment due to their L2 in writing, whereas for progress they will fit into the low band due to their low KS1 APS of 10. This is why it's so important for schools to know who is in those bands. If you have bright red boxes around your attainment measures (gaps equating to 2 or more pupils) this may be difficult to explain if all your pupils were 2a, but if they were 2c and L1 in the other subjects, then it's somewhat more justifiable.

Oh, and for KS1 of course, prior attainment is based on the pupil's development in the specific early learning goal. One early learning goal out of 17 used to band pupils by. That can't be right, surely? Nice to see this get a mention in Ofsted's March update, too.

And whilst we're on the subject of banding pupils for attainment measures, once we introduce an element of prior attainment, doesn't it cease to be about attainment and become a sort of pseudo progress measure anyway? Surely that's just like the old progress matrices, isn't it?

RAISE

Now things get even more odd. In RAISE, they take the same approach as the dashboard when it comes to progress, with low, middle and high bands based on KS1 APS (see above). This means your progress data in RAISE looks the same as the progress data in the dashboard (just presented in a more incomprehensible format). However, when it comes to attainment, instead of adopting the subject-specific method used in the dashboard, they stick with the progress approach based on KS1 APS. RAISE therefore presents attainment and progress in a consistent way, with the same numbers of low, middle and high pupils in both parts, but this has caused a lot of confusion because the attainment breakdowns differ between RAISE and the dashboard.

Elsewhere in the report we do have subject-specific banding (based on the pupil's level in that subject at KS1) and, to really ramp up the confusion, we have results in, say, maths presented for pupils that were low in reading or high in writing at KS1. I'm yet to meet a headteacher or senior leader who gets the point of this. I'm not entirely sure I do either.

FFT

And finally we come to FFT. They also split pupils into low, middle and high bands based on prior attainment, but have come up with a third way. Like the Ofsted dashboard approach (well, the progress one anyway) this starts with KS1 APS, calculated in the same way as the DfE calculates it, but then they do something different: they rank all pupils nationally by KS1 APS (around 600,000 of them) and split the pile into thirds. Those in the lower third are the lower attainers, those in the middle third are the middle attainers, and (yes, you've guessed it) those in the upper third are the higher attainers. It's actually not quite thirds because if the 33rd percentile falls smack bang in a stack of hundreds of pupils with the same APS, then they have to adjust up or down a bit, I assume. This is why we don't get exactly 33% in each band nationally.
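
Out of interest, here's a rough Python sketch of that tercile idea. FFT don't publish their exact tie-handling, so treat the boundary behaviour here as my guess:

import pandas as pd

def fft_style_bands(aps: pd.Series) -> pd.Series:
    """Low/middle/high by national KS1 APS terciles (approximate)."""
    lower, upper = aps.quantile([1/3, 2/3])

    def band(score):
        if score < lower:
            return "low"
        if score < upper:
            return "middle"
        return "high"

    # Pupils sitting exactly on a boundary value all land in the same band,
    # which is why the national split is never exactly 33/33/33.
    return aps.apply(band)

# e.g. national = pd.Series of ~600,000 KS1 APS values
# bands = fft_style_bands(national)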

I rather like this approach because it means the 2c pupils end up in the low group and the 2a pupils move into the high group. In fact you even find pupils with the odd 2b lurking in the lower group. You will certainly have more lower attainers in an FFT report than you do in the Ofsted dashboard and RAISE, and you tend to see fewer middle attainers and a few more higher attainers too. Pupils just get distributed across the bands a bit more and this tends to make sense to teachers (once they have got over their exasperation at having to get their heads round yet another methodology).

One thing that springs to mind is the term 'most able'. Most able based on whose definition? The school's? The DfE's APS approach? Or perhaps their subject-specific approach? And what about FFT's top third nationally? Anyone have the answer?

This fragmented, confused and confusing approach can't continue, and with the end of RAISE we have an opportunity to come up with a straightforward and logical approach to establishing these prior attainment bands. I prefer FFT's approach but, whatever we end up with, could we have some consistency please? At the very least, let's not have contrasting methods on the first page of a key report.

And we haven't even touched on current Y3. Anyone know the average of EXS+WTS+GDS+GDS? 

Over to you, people in charge of such things. 

Sunday, 26 February 2017

The Data Burden

In implementing a new approach to assessment and tracking we must first weigh up our desire for data against any impact on workload. Sometimes seemingly minor tweaks can produce a butterfly effect, placing huge demands on teachers' time that are most likely disproportionate to any benefit gained from the data. Before embarking on a new assessment journey, always start by asking: who is this data for, and what impact will it have on learning?

Many schools have implemented systems with a small number of key learning objectives in each subject, encouraging teachers to have the confidence to make judgements on pupils' competence within a broad framework. Some schools, however, have broken these key objectives down into many smaller steps - perhaps to enable progress to be measured over shorter periods - and this is having an adverse impact both on teacher workload and possibly on learning itself, as we seek to assess and record everything a pupil does. This may be done to improve accuracy, but such micromanagement of assessment undermines professional judgement and is unnecessary, demoralising and counterproductive. There is a great irony in procedures put in place supposedly for the purposes of school improvement that actually take teachers' time away from teaching. It is hardly surprising that some headteachers are experiencing a backlash. An anti-tracking rebellion is underway in many schools.

I recently visited a school to help them set up tracking for the reception year. Initially the school wanted to track against Development Matters statements in each of the areas of the early years foundation stage profile, making an assessment every half term. After discussion, they decided that they would just focus on reading, writing and maths. I took a look at the list of statements - there were around 60 for each subject across the various month bands. I got out my calculator:

60 statements
x 3 subjects
x 6 terms
x 45 pupils

That comes to 48,600 assessments that the early years teacher would have to make over the course of the year.

Let's say each assessment takes 5 seconds to enter onto the system. That's 243,000 seconds - around 67.5 hours - which works out at roughly 9 working days per year for one teacher.
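
If you want to check that figure, here's the back-of-envelope sum in Python (the 7.5-hour working day is my assumption):

# Back-of-envelope check of the figure above.
statements, subjects, terms, pupils = 60, 3, 6, 45
seconds_per_entry = 5

assessments = statements * subjects * terms * pupils   # 48,600
hours = assessments * seconds_per_entry / 3600         # 67.5 hours
days = hours / 7.5                                     # 9 working days (7.5-hour day assumed)
print(assessments, hours, days)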

Perhaps the teacher wouldn't have to update each assessment each term but you take the point. It's a potentially huge burden. Thankfully the headteacher and deputy went a bit pale and decided to go back to the drawing board. Their tracking system now requires the teacher to record an assessment against each of the 17 areas of the foundation stage three times per year. Somewhat more manageable and the early years teacher was certainly happier. They can spend more time doing more important stuff, like teaching.

But yesterday I had a lengthy Twitter chat with the rather awesome @misterunwin. In his school they do no objective-level tracking at all in their system. No box ticking, no rag rating statements, no assessment by numbers. Teachers make an overall judgement of where each pupil is at each half term and this produces all the data the school needs for analysis and reporting purposes. Teachers therefore spend very little time on data entry, freeing them up to do the stuff that has real impact: planning lessons and teaching children. Detailed pupil progress reviews are used to reveal gaps, inform next steps and identify where further support is needed. Staff are very happy with this arrangement, and improved staff morale will obviously benefit the school in many ways.

I have always tried to encourage schools to keep lists of objectives to an absolute minimum; to strike that balance between impact on learning and impact on workload. However, I completely understand why schools will want to reject any approach to tracking that takes teachers' time away from teaching, which is why some schools are choosing not to track in this way at all. I doubt that we are about to see schools ditching tracking systems en masse - I'm certainly not advocating that, by the way - but we do need to be mindful of how these approaches can eat into teachers' precious time and the adverse impact this can have.

Start by getting your calculator out.

Monday, 20 February 2017

Blueprint for a tracking system

In almost every meeting I have in schools someone will at some point say one of the following:

"Our tracking system isn't really working out for us anymore"

or

"Our tracking system doesn't do what we need it to do"

or

"It's just levels really, isn't it?"

And this is often the main reason for my visit: schools want to explore other options. They want a simple, flexible system that works for them, grows with them, adapts to fit their needs, is easy to use, and reduces workload. That's exactly what all systems should do, but sadly this is quite far from the reality. It's that whole master and servant thing; and sometimes, when it comes to the school-system relationship, it's quite difficult to tell which one's which. Schools are evidently getting fed up with systems that are bloated, confusing and inflexible. They want a change and who can blame them? Too much precious time is wasted navigating systems, finding workarounds, trying to map the curriculum to the objectives, and hunting through numerous reports for the data you need to populate the spreadsheet that someone has sent you because the system doesn't provide everything in one place. Systems need to be more like Connect 4 and less like chess. More of an asset and less of, well, an ass.

Almost a year ago I wrote this blog post on the golden rules of tracking. It outlined a philosophy on tracking systems that I presented at the original Learning First conference in Sheffield. Those golden rules are:
  1. Separate assessment and tracking from performance management
  2. Ensure your system is a tool for teaching and learning rather than accountability
  3. Reduce the complexity
  4. Do not recreate levels
  5. Stop obsessing about measuring progress
Since then I've added another point: do not compromise your approach to assessment to fit the rules of a system. I've also changed point 3 to 'Reduce the complexity and the workload'. The post seemed to strike a chord and tied in with what many were thinking anyway. But in order to adopt these principles we need a complete overhaul of the way we track pupil progress in schools. With this in mind, I thought I'd write a post on what I think tracking systems should offer, which is well overdue considering how much time I spend ranting about what they get wrong.

So here are my golden practical rules of tracking. In my opinion, tracking systems should allow schools to:

1) Define and input their own objectives
Yes, all schools are guided by the programmes of study of the national curriculum, but they don't all teach the same things at the same time in the same order in the same way. Many schools have gone to great lengths to design a curriculum that ensures the right level of challenge for their diverse range of pupils, taking considerable care over the wording of key mastery indicators and 'non-negotiables'. Some schools are happy tracking against a broad framework of a few key objectives whilst others want tighter criteria and therefore more statements. And many schools are using assessment materials from various third parties such as Ros Wilson, Assertive Mentoring, Rising Stars, Kangaroo Maths, or from the LA. Often these assessment materials are in paper form, stapled into books and highlighted to show pupils' security and progression. The problem is that many systems do not allow schools to input their own objectives or edit the existing ones, and so the system is not aligned to what is being taught. Teachers are then confronted with a time-consuming and frustrating mapping exercise to attempt to link assessment materials to whatever national curriculum objectives are in the system. Yes, pupils need to secure the national curriculum objectives, but schools are finding their own route towards those and would welcome the flexibility to set up their systems to mirror what happens in the classroom. Essentially, the core purpose of a tracking system is, in my view, to work as an electronic markbook. That's the important bit - that's the bit that has impact. Get that right and the rest will follow.

Oh, and please keep those lists of objectives to a minimum. Too much tracking is unnecessary, demoralising and counterproductive. 

2) Define their own summative assessment descriptors
Schools use different terms to describe where pupils are in their learning: below, meeting, above; emerging, developing, secure; beginning, advancing, deepening; supported, independent, greater depth. Yes, we are often talking about the same thing but schools should be able to adapt their system to reflect the language used by teachers across the school. Also, many systems use terms such as emerging, developing, secure to reflect how much of the curriculum the pupil has covered and achieved. Increasingly schools are turning away from this approach and instead using these terms to denote the pupil's competence in what has been taught so far. So rather than secure being something that happens after Easter when they've covered a certain percentage of the curriculum, it is instead used to indicate that the pupil is working securely within the curriculum at that point in time. If schools choose to apply terms in this way then the system should be flexible enough to accommodate that approach. Furthermore, some schools are starting to use teacher assessments of progress as well as attainment to get around the inadequacy of linear metrics commonly used in systems. If schools need more than one set of descriptors to account for the achievement of all pupils, they should have that option.

3) Record depth of understanding
In a curriculum where pupils do not move on to next year's content early, how do we show 'above expected progress' for pupils that start the year 'at age related expectations'?* All fine for pupils that are below and catch up - they make progress in the traditional sense, through rapid and extensive coverage of the curriculum - but those secure learners present a problem. This is why it's useful to be able to differentiate on the basis of depth at the objective level. It means that a) teachers can record and identify pupils on the basis of their competency in key areas, and b) schools can have some measure of progress for those pupils constrained within the parameters of their age-appropriate curriculum, should they need it. Most systems do this to a degree, using a rag rating system to show the pupil's security within each objective, but an associated numerical system is useful for aggregated cohort and group reporting, and possibly for individual pupil progress. Admittedly, this is probably more about external scrutiny than classroom practice, but it's an easy win and does not add to workload.

* I used inverted commas because I dislike both those terms but recognise they are commonly used.
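
To show what I mean by an associated numerical system, here's a purely illustrative Python sketch - the labels, the 0-3 scale and the simple averaging are mine, not a prescription:

from statistics import mean

# Illustrative only: one way a system might record depth per objective
# and aggregate it for cohort reporting.
DEPTH = {"not yet": 0, "working towards": 1, "secure": 2, "greater depth": 3}

# pupil -> {objective: judgement}
cohort = {
    "pupil_a": {"obj_1": "secure", "obj_2": "greater depth"},
    "pupil_b": {"obj_1": "working towards", "obj_2": "secure"},
}

def pupil_depth(judgements):
    """Average numeric depth across the objectives assessed so far."""
    return mean(DEPTH[j] for j in judgements.values())

print({p: pupil_depth(j) for p, j in cohort.items()})    # 2.5 and 1.5
print(mean(pupil_depth(j) for j in cohort.values()))     # cohort average: 2.0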

4) Enter whatever data they like
A system should be like a scrapbook. It should allow schools to collect whatever assessment data they need, in whatever format it's in, and analyse it in any way they see fit. If they want to track the changes in standardised scores and percentile ranks, let them. If they want to track progress from two separate baselines, they should be able to do that too. I guess this means that they should also be allowed to define their own pseudo-sublevels and points-based progress measures. I'd really rather they didn't, and I hope they change their mind in time, but if that's what they feel is needed right now, then it's better the system does it for them rather than forcing them down the route of building their own spreadsheet and adding to workload.

5) Track pupils out of year group
Not all pupils are accessing the curriculum for their year group and this causes problems in many systems. Either the systems don't allow tracking outside of year group and simply classify such pupils as 'emerging' along with everyone else; or they do allow it but always highlight the pupils with a red flag, define them as well below, and categorise them as making below expected progress. It would be welcome if schools could track the progress of pupils working below the curriculum for their year group without being penalised by the algorithms of the system. Why can't SEN pupils be shown to have made great progress?

6) Design their own reports
The holy grail of tracking: data on a page. A single report that shows all key data for each subject, broken down by cohort and key group. Schools need the facility to design and build their own reports to satisfy the needs of governors, LA monitoring visits and Ofsted. These might show the percentage of objectives secured and the percentage of pupils on track, alongside the average test score for each key group, at the start of the year, the mid-point and the end. If the school then decides to implement a new form of assessment, they should be able to input that data into the system and add a column to their report. There are too many teachers and senior leaders working around their systems, exporting data into Excel, or running reports in order to complete a form or table to meet a particular requirement, perhaps sent by the LA or MAT. I've even seen headteachers reduced to running countless reports and writing key bits of data down on a pad of paper in order to transfer them to the table they've been asked to fill in. Now, much of this data may be pointless and increased workload should be resisted, but it is a fact of school life that people ask for numbers, and systems that allow report templates to be designed to fulfil these needs would be one less headache. And on the subject of reports, we really don't need that many. In addition to a custom-built table, probably something like a pie chart that shows the percentage of pupils that are below, at and above where you expect them to be at any point in time. Maybe a progress matrix. I'm struggling to think of anything else that is actually useful.

7) Input and access their data on any device
It is 2017 after all!
The ultimate aim is to have neat, intuitive, flexible systems that save time and stress. Systems that are useful tools for teaching and learning, that do not influence a school's approach to assessment in any way. They should be easy to use with minimal training; and whilst they shouldn't require much support, if a school does need help, someone on the end of a phone is always very welcome.

So, there's my pie-in-the-sky utopian daydream.

Any takers?

Monday, 6 February 2017

Mitigation in writing

The purpose of this post is not to explain the mechanics and issues of the KS2 writing progress measure - I've already covered that in enough detail here - but I do want to offer a method to help schools counter the nonsense. Suffice to say, the measure is badly designed, imprecise, and has a habit of making things look a lot worse than they are. In the absence of fine graded test scores, we only have the teacher assessments to work with. Rather than decide that this is insufficient data for an accurate progress measure (is there any such thing?), the decision was taken to assign nominal scores to the teacher assessments as follows:

Working towards the expected standard:  91
Working at the expected standard: 103
Working at greater depth: 113

These nominal scores for writing are then compared against fine graded, scaled score estimates - e.g. 96.69 or 105.77 - just as they are in reading and maths. Unfortunately, unlike reading and maths where pupils actually have test scores, in writing there is no such thing. The benchmarks that pupils' results are compared against are therefore unachievable, and the big leaps between nominal scores result in some huge negative and positive differences. 

So if schools have lots of pupils assessed as 'working towards' they tend to have big negative progress scores; if they have lots assessed as 'working at the expected standard', progress scores tend to be positive. The odd thing is that many, if not most, pupils assessed as working towards, who have negative progress scores, have benchmarks of greater than 91 but less than 103. This means that if they achieved the expected standard (and were assigned a nominal score of 103) their progress would be positive. And this got me thinking: surely all such pupils are actually in line, aren't they? They achieve working towards, they have a score of 91, their benchmark is 97.26 - which they can't achieve - and the next available score is 103. That, in my mind, means their progress is broadly in line with average. They've done what you'd expect them to do from their particular start point. They can't hit the benchmark, they can only fall short or exceed it. 

To counter the negative (and significantly negative) scores many schools have found themselves saddled with for KS2 writing, I propose the following approach for individual pupils: 

Above: positive progress score
In line: negative progress score but estimate is lower than next nominal score threshold
Below: negative progress score and estimate is higher than next nominal score threshold

For pupils assessed as working towards, this works out as:

Above: positive progress score
In line: negative progress score but estimate between 91-103
Below: negative progress score and estimate above 103

In one school recently I suggested a slight variation, which acts as a compromise with the DfE methodology:

Above: positive progress score
In line: negative progress score and estimate 91-100
Below: negative progress score and estimate 100-103
Well below: negative progress score and estimate >103

Pupils were then colour coded dark green, light green, orange, and red respectively. They only had one 'red' pupil.
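
If you'd rather not do the sorting by hand, here's a quick Python sketch of that four-band variation for pupils assessed as working towards (the function name and the handling of the exact boundaries are mine):

def writing_progress_band(progress_score, estimate):
    """Four-band variation for pupils assessed as working towards (nominal 91)."""
    if progress_score >= 0:
        return "above"        # dark green
    if estimate <= 100:
        return "in line"      # light green
    if estimate <= 103:
        return "below"        # orange
    return "well below"       # red: even a nominal 103 (EXS) would have fallen short

print(writing_progress_band(-9.75, 100.75))   # 'below'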

Once you've chosen your approach, simply state the percentage of pupils that fall into each category. If you like you could present it in a progress matrix from KS1 writing level start point. 

I may add this approach into my VA calculator at some point, but in the meantime you can do it manually from the KS2 pupil list, which you can download from RAISE. I definitely think it's worth exploring. Chances are it'll make your writing data look better.

It'll be fairer too. 

Tuesday, 17 January 2017

Confidence is not a preference

"Confidence is a preference....."

Nice sentiment, Blur, but let's be realistic: confidence is a threshold. 

I get asked about confidence intervals a lot now. They're everywhere - in the dashboard, RAISE, and FFT reports - and they are more in your face than ever before. We are now aware that these are the things that dictate statistical significance, that they define those red and green boxes and dots in our reports, and it's therefore no surprise that teachers, particularly heads and senior leaders, want to understand them. So, I get asked about them almost daily and I'm aware that I don't always do a very good job of  explaining them. One effort last week: "well, it's nearly 2 standard deviations from the mean and it involves group size". Poor. C grade answer at best. Blank looks. Someone coughs. In the distance a dog barks. Must try harder.

This post is my attempt to redeem myself. This is my resit. I'm going for the B grade answer. 

First, what is a confidence interval? The DfE and FFT use a 95% confidence interval, which means that 95% of confidence intervals constructed around sample (i.e. schools') mean scores will contain the population mean score. Those 5% of cases where the confidence interval does not contain the population mean are deemed to be statistically significant.

If you look at a normal distribution you will note that roughly 68% of pupils' scores are within 1 standard deviation of the national average score, and roughly 95% of scores are within 2 standard deviations. Only around 5% of cases are more than 2 standard deviations from the mean: about 2.5% more than 2 standard deviations below and about 2.5% more than 2 standard deviations above. To be precise, 95% of scores are within 1.96 standard deviations of the mean. That's where the 95% comes from and explains the 1.96 in the calculation (see below).

But if we just looked at those schools whose average scores were beyond 1.96 standard deviations of the mean then it wouldn't be very useful, because it would only identify a handful of schools with very high or very low scores. Also, it wouldn't be very fair, because a pupil in a small school has a far greater impact on results than a pupil in a large school. This is why the size of the cohort (or group) needs to be taken into account. Schools with cohorts (or pupil groups) of the same size have the same confidence interval, and we are ascertaining whether a school's data is significant compared against the results of same-sized cohorts nationally. The calculation of the confidence interval is therefore as follows:

1.96 x national standard deviation/ square root of number of pupils in cohort (or group)

(Apologies for lack of mathematical symbols but I'm writing this on my phone whilst waiting for a train). 

If you want the national standard deviations, you can find them in DfE guidance, in the ready reckoner tools, and in my VA calculator. Off the top of my head they are between 5 and 6 (to several decimal places) depending on subject, which means that 68% of pupils' scores fall within the range of 97-109 where the national average score is 103 and the standard deviation is 6.

For the sake of simplicity, let's assume the national standard deviation is 6 and the cohort is 16 (square root = 4). The confidence interval is therefore:

1.96 x 6/4 = 2.94

And for a cohort of 64 (square root = 8) the confidence interval is:

1.96 x 6/8 = 1.47

This effectively means that the results of the smaller cohort need to shift further from the national average to be statistically significant than those of the larger cohort; but a pupil in the smaller cohort has a bigger impact on overall results, so that's fair.
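
The formula itself is easy to script. Here's a quick Python sketch reproducing the two worked examples above:

from math import sqrt

def confidence_interval(national_sd, cohort_size):
    """95% confidence interval half-width: 1.96 x national SD / sqrt(n)."""
    return 1.96 * national_sd / sqrt(cohort_size)

print(confidence_interval(6, 16))   # 2.94
print(confidence_interval(6, 64))   # 1.47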

Right, so we have calculated our confidence interval. Now what? 

Next, let's look at how confidence intervals are presented in various reports and data sets. There are four main sources containing confidence intervals that we'll be familiar with and which all present confidence intervals in a different way. 

1) RAISE
Let's start with the obvious one. RAISE presents the confidence interval as a +/- figure alongside your progress score, for example 2.76 +/- 1.7. Imagine this as a range around your progress score, with your progress score in the middle of the range. In this case the range is 1.06 to 4.46 (i.e. add and subtract 1.7 to/from your progress score to get the range). In this example the progress score is positive (pupils made more progress than average) but, in the absence of a coloured indicator, how could you tell if it's significantly above? Quite simple really: if the entire range is above zero then progress is significantly above; but you don't need to calculate the full range to do this. The rules are:
  • if you have a positive progress score, subtract the confidence interval. If the result is still a positive number then progress is significantly above. 
  • If you have a negative progress score, add the confidence interval. If the result is still negative then progress is significantly below.
  • In all other cases, progress is in line with average (the confidence interval contains the national mean of 0). 
Essentially if your confidence interval straddles 0 (i.e. it contains the national mean) then it is not significant. If it does not straddle 0 (i.e. it does not contain the national mean) then it is significant (one of the 5% of cases).
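
And here's the same logic in a few lines of Python (the second and third examples are back-calculated from the ranges quoted in the checking data section below):

def significance(progress, ci):
    # Significant only if the interval around the progress score
    # does not straddle zero.
    if progress - ci > 0:
        return "significantly above"
    if progress + ci < 0:
        return "significantly below"
    return "in line with average"

print(significance(2.76, 1.7))     # significantly above (range 1.06 to 4.46)
print(significance(-2.45, 1.21))   # significantly below (range -3.66 to -1.24)
print(significance(0.93, 1.65))    # in line with average (range -0.72 to 2.58)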

2) Checking data
Here the range is given. So, in the above example, progress is stated as 2.76 and the range given as 1.06 to 4.46. The confidence interval is not given as a +/- figure as in RAISE, so that first step is done for you. Basically, if the lower limit is above 0 then data is significantly above (e.g. 1.06 to 4.46); if the upper limit is below 0 then data is significantly below (e.g. -3.66 to -1.24). If the confidence interval straddles 0 (contains the national mean) then it is not significant (e.g. -2.52 to 0.56, or -0.72 to 2.58) regardless of whether the progress score is positive or negative.

3) Inspection Dashboard
This is how to present a confidence interval! A simple graphical format with a dot representing the progress score and a vertical line through it to show the extent of the confidence interval. Quite simply, if the entire confidence interval is above the national 0 line then data is significantly above; if the entire confidence interval is below the 0 line then data is significantly below; and if the confidence interval straddles the line (i.e. it contains the mean) then it is statistically in line with average, regardless of the position of the dot.

4) FFT
There is a subtle twist here. The overview page of the FFT dashboard (I love those dials) shows the confidence interval as the space between the red and green zones. Here the confidence interval is calculated as above but is constructed around the national average score (e.g. 103, as indicated by the triangle symbol) rather than around the school average score. If the school's score (indicated by the needle on the dial) points to the red zone, it is significantly below; if it points to the green zone it is significantly above, and if it points to the space between it is statistically in line with the average. The dials only work if the confidence interval is constructed in this way. 

And most importantly, please remember that statistically significant does not mean educationally significant. No cause can be inferred and it is not necessarily indicative of good or bad teaching, or strong or weak leadership.

I hope that helps. 

And if it does, please feel free to give me a grade. And feedback in any colour pen you like. 

As long as it's green. 

For more information on confidence intervals read Annex A here 

Thursday, 12 January 2017

Similar schools (my a***!)

An ex-colleague called me yesterday with a question about the similar schools measure in the performance tables. As we spoke I could feel that creeping uneasiness you experience when confronted with something you really should know about but don't. Cue delaying tactics (how's the family? Good Christmas?) whilst frantically searching for the guidance on the internet. Then it transpired I had no internet connection because the builders had accidentally tripped the switch at the consumer unit. And then, thankfully, we were cut off. Phew!

To be fair, I had read the guidance when it was published last month; I just didn't really pay much attention and evidently the information hadn't sunk in. Now was an opportunity to correct that. Besides, I was writing a report and could do with the distraction.

So what is the similar schools measure? How does it work? Essentially it borrows from VA methodology in that it involves calculation of end of key stage 2 estimates based on key stage 1 start points, and is similar to FFT reports in that they calculate an estimated percentage 'likely' to achieve expected standards in reading, writing and maths. Unlike FFT, however, they do not then compare that estimated percentage to the actual result. Here's the process:

1) for each pupil in the previous Year 6 cohort, the probability of that pupil achieving expected standards, based on their prior attainment at key stage 1, is calculated. For example, say that in 85% of cases nationally a pupil with a KS1 APS of 17 achieved the expected standard, and a pupil with a KS1 APS of 15.5 achieved the expected standard in 62% of cases. These pupils are therefore deemed likely to achieve expected standards. However, a pupil with a KS1 APS of 12 has only a 38% chance of achieving expected standards (i.e. nationally, a pupil with this prior attainment achieved expected standards in only 38 out of 100 cases). This pupil is therefore deemed unlikely to achieve expected standards.

I made all those probabilities up by the way. They are for illustration purposes. I could have done some proper research - there is a graph in the guidance - but I'm just lazy.

So now we know, based on pupils' start points and national outcomes, whether each pupil is deemed likely to achieve the expected standard. Once this is done for individual pupils, we can aggregate it to calculate an estimate for the whole school cohort: simply count the pupils deemed likely to achieve expected standards and divide by the total number of pupils in the cohort.
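
In code, that aggregation step is trivial. A quick Python sketch with made-up probabilities (as above), reading 'a likelihood of achieving expected standards' as a better-than-even chance:

pupil_probabilities = [0.85, 0.62, 0.38, 0.71, 0.45]   # one per Year 6 pupil

likely = sum(1 for p in pupil_probabilities if p > 0.5)
school_estimate = likely / len(pupil_probabilities)
print(school_estimate)   # 0.6, i.e. 60% 'likely' to achieve expected standards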

Note that this process has been done for use in the performance tables. These probabilities are not calculated in advance of pupils sitting SATs; they are done after the event. We already know what pupils' results are and whether or not they have met expected standards. Here we are calculating the probability of them doing so based on what pupils with the same prior attainment achieved nationally. It's retrospective.

In FFT reports, they take this estimated outcome and compare it to the actual result, which gives us the +/- percentage figures seen on the right-hand side of the overview page in the dashboard (those nice dials). Essentially this is FFT telling us the difference between the likely outcome for such a cohort and the actual outcome. This is a form of VA.

That is not what the DfE have done.

2) now each school has an estimate: a probable outcome. This is the percentage of pupils likely to achieve expected standards based on the achievement of pupils with similar start points nationally. Schools are ranked on the basis of this estimated outcome. We now have a big pile of 16,500 primary schools ranked in order of likely result.

3) each school is placed in a group with 124 other schools. The groups are established by selecting the 62 schools above and below your school in the rankings. These are your 'similar schools', schools that have similar estimated outcomes on the basis of pupils' prior attainment. Size of cohort and contextual factors are not taken into account.

4) then - and this is where it gets a bit odd - they take each school's actual results (the percentage of pupils in each school that achieved the expected standards) and rank the schools in the group on that basis. Schools are then numbered from 1 to 125 to reflect their position in the group. Now, in theory, this should work because they all have similar prior attainment and therefore ranking them by actual results should reflect distance travelled (sort of). Except it doesn't. Not really. Looking at the data yesterday, I could see schools ranked higher despite having much lower VA scores than schools below them. The similar schools measure therefore conflicts with the progress measure, which raises the question: why not just rank schools in the group on the basis of progress scores rather than attainment? Of course a combined progress measure, like FFT's Reading and Maths VA score, would help. Or, at the very least, calculate the difference between the actual result and the estimate and rank on that basis. The fact that the school estimates are not published bugs me, too. These should be presented alongside the number of pupils in the cohort and some contextual data - % SEN, EAL, FSM, deprivation indicators and the like. If part of the reason for doing this is to help schools identify potential support partners (that's what the guidance says), then surely this data is vital.
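
To make steps 2 to 4 concrete, here's a rough Python sketch. The column names are mine and the handling of schools near the top or bottom of the national pile is simplified; the point is simply that the final ranking uses actual results, not progress:

import pandas as pd

def similar_schools_rank(schools: pd.DataFrame, urn) -> pd.DataFrame:
    """schools needs columns 'urn', 'estimate' (step 2) and 'actual' (step 4)."""
    ranked = schools.sort_values("estimate").reset_index(drop=True)
    pos = ranked.index[ranked["urn"] == urn][0]
    # step 3: the 62 schools either side in the estimate ranking
    group = ranked.iloc[max(0, pos - 62): pos + 63].copy()
    # step 4: numbered 1-125 by actual results - not by progress
    group["similar_schools_rank"] = (group["actual"]
                                     .rank(ascending=False, method="min")
                                     .astype(int))
    return group.sort_values("similar_schools_rank")

# e.g. group = similar_schools_rank(all_schools, urn=123456)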

Not factoring in cohort size is a particular issue. A school with 15 pupils, of which 60% achieved the expected standard, will be ranked higher in the group than a school with 100 children of which 58% achieved the expected standard. In the former school a pupil accounts for 7%; in the latter it's less than 1%. It's hardly a fair comparison. 

And of course no adjustment is made to account for that high percentage of SEN pupils you had in year 6, or all those pupils that joined you during years 5 and 6, but that's an issue with VA in general.

I get the idea of placing schools into groups of similar schools, but to blow all that by then ranking schools in the group on the basis of results, without factoring in cohort size or contextual factors, seems wrong. And to overlook the fact that schools can be ranked higher despite having poorer VA scores is a huge oversight. Surely this hints at a system that is flawed.

So, there you go. That's the similar schools measure. Go take a look at the performance tables and see where you rank and which schools you're apparently similar to.

And then join me in channeling Jim Royle:

Similar schools, my arse!

Wednesday, 14 December 2016

10 things I hate about data

I seem to spend a lot of time ranting these days. Recently I've been trying to rein it in a bit, be less preachy. It's counterproductive to wind people up - I need to get them on side - but the problem is that there are just so many opportunities to get annoyed these days. I'm turning into a data analyst who hates data. Well, a data analyst who hates bad data (as any decent data analyst should). And let's face it, there's a lot of bad data out there to get annoyed about. So, a few weeks ago I gave a conference talk entitled '10 things I hate about data' (it could have been much longer, believe me, but 10 is a good number).

Here's a summary of that talk.

1) Primary floor standards
We are now in a crazy world where the attainment floor standard is way above the national 'average'. England fell below its own minimum expectation. How can that happen? On 1 September 2016, the floor standard ceased to be a floor standard and became an aspirational target. But the DfE had already committed to having no more than 6% of schools below floor, which meant they had to set the progress thresholds so low that they captured just a handful of schools. I find it hard to apply the phrase 'sufficient progress' to scores of -5 and -7 and keep a straight face. So primary schools have four floor standards: one linked to attainment, which is way too high, and three relating to progress, which are way too low. If the school is below 65% EXS in reading, writing and maths combined, and below just one of the progress measures, it is below floor. Unless that one happens to be writing, in which case chances are it'll be overlooked because writing data is junk. Oh, and if you are below just one progress floor then it has to be significantly below to be deemed below floor, which is ridiculous because it's not actually possible to have scores that low and for them not to be significantly below. Meanwhile, secondary schools, with all the complexity of GCSE and equivalent data, have one single measure, progress 8, which captures the average progress made by pupils in up to 8 subjects. The floor standard at KS4 is half a grade below average. Simple. Why can't primary schools have a similar single, combined-subject, progress-based floor measure?

2) Coasting
I hate this measure. I get what they're trying to do - identify schools with high attainment and low progress - but this has been so badly executed. Why 85%? What does that mean? How does 85% level 4 in previous years link to 85% achieving expected standards this year? Why are they using levels of progress medians for 2014 and 2015 when they could have used VA, which would make the progress broadly comparable with 2016? And why have they just halved the progress floor measures? (smacks of what my Dad would describe as a 'Friday afternoon job'). Remember those quadrant plots in RAISE? The ones that plotted relative attainment (which compared the school's average score against the national average score) against VA? Schools that plot significantly in the bottom right hand quadrant 3 years running - that would be a better definition of coasting. Unless they are junior schools, in which case forget it. Actually, until we have some robust data with accurate baselines, perhaps forget the whole thing.

3) The use of teacher assessment in high stakes accountability measures
The issue of KS2 writing has been discussed plenty already. We know it's inconsistent, we know it's unreliable, we know it's probably junk. Will it improve? No. Not until teacher assessment is removed from the floor standards at least. I'm not saying that writing shouldn't be teacher assessed, and that teacher assessment shouldn't be collected, but we can't be surprised that data becomes corrupted when the stakes are so high. The DfE evidently already understands this - they decided a year ago not to use writing teacher assessment in the progress 8 baseline from 2017 onward (the first cohort to have writing teacher assessed at KS2). It's not just a KS2 issue either. KS1 assessments form the baseline for progress measures so primary schools have a vested interest in erring on the side of caution there; and now that the DfE are using EYFSP outcomes to devise prior attainment groups for KS1, who knows what the impact will be on the quality of that data. All this gaming is undermining the status of teacher assessment. It needs a rethink.

4) The writing progress measure
Oh boy! This is a whopper. If you were doubting my assertion above that writing teacher assessment should be removed from floor standards, this should change your mind. Probably best to read this but I'll attempt to summarise here. Essentially, VA involves comparing a pupil's test score against the national average score for pupils with the same start point. A pupil might score 97 in the test when the national average score for their prior attainment group is 94, so that pupil has a progress score of +3. This is fine in reading and maths (and FFT have calculated VA for SPaG) but it doesn't work for writing because there are no test scores. Instead, pupils are assigned a 'nominal score' according to their teacher assessment - WTS = 91, EXS = 103, GDS = 113 - which is then compared against an unachievable fine graded benchmark. So, a pupil in prior attainment group 12 (KS1 APS of 15, i.e. 2b in reading, writing and maths) has to achieve 100.75 in writing, which they can't. If they are assessed as meeting the expected standard (nominal score of 103) their progress score will be +2.25; if they are assessed as working towards (nominal score of 91) their progress score will be -9.75. Huge swings in progress scores are therefore common because most pupils can't get close to their benchmarks due to the limitations of the scoring system. And I haven't got space here to discuss the heinousness of the nominal scoring system applied to pre-key stage pupils, except to say that it is pretty much impossible for pupils below the level of the test to achieve a positive progress score. So much for the claim in the primary accountability document that the progress measures would reflect the progress made by ALL pupils. Hmmm.
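
Put the writing measure into code and the problem is obvious. A quick sketch reproducing the example above:

NOMINAL = {"WTS": 91, "EXS": 103, "GDS": 113}

def writing_progress(teacher_assessment, estimate):
    """Progress score = nominal score minus the (unachievable) fine graded estimate."""
    return NOMINAL[teacher_assessment] - estimate

# Prior attainment group 12: KS1 APS of 15 (2b across the board), estimate 100.75
print(writing_progress("EXS", 100.75))   # +2.25
print(writing_progress("WTS", 100.75))   # -9.75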

5) The death of CVA
In 2011, the DfE stated that 'Contextual Value Added (CVA) goes further than simply measuring progress based on prior attainment [i.e. VA] by making adjustments to account for the impact of other factors outside of the school’s control which are known to have had an impact on the progress of individual pupils e.g. levels of deprivation. This means that CVA gives a much fairer statistical measure of the effectiveness of a school and provides a solid basis for comparisons.' Within a year, they'd scrapped it. But some form of CVA is needed now more than ever. Currently, pupils are grouped and compared on the basis of their prior attainment, without any account taken of special needs, length of time in school, number of school moves or deprivation. This is a particular issue for low prior attainment groups, which commonly comprise two distinct types of pupils: SEN and EAL. Currently, no distinction is made and these pupils are therefore treated the same in the progress measures, which means they are compared against the same end of key stage benchmarks. These benchmarks represent national average scores for all pupils in the particular prior attainment group, and are heavily influenced by the high attainment of the EAL pupils in that group, rendering them out of reach for many SEN pupils. Schools with high percentages of SEN pupils are therefore disadvantaged by the current VA measure and are likely to end up with negative progress scores. The opposite is the case for schools with a large proportion of EAL pupils. This could be solved either by introducing some form of CVA or by removing SEN pupils from headline measures. That could of course lead to more gaming of the system in terms of registering pupils as SEN or not registering them as EAL, but the current system is unfair and needs some serious consideration.

6) The progress loophole of despair
This is nuts! Basically, pupils that are assessed as pre-key stage are included in progress measures (they are assigned a nominal score as mentioned above), whereas those assessed as HNM (in reading and maths) that fail to achieve a scaled score (i.e. do not achieve enough raw marks on the test) are excluded from progress measures, which avoids huge negative progress scores. I have seen a number of cases this year of HNM pupils achieving 3 marks on the test and achieving a scaled score of 80. Typically they end up with progress deficits of -12 or worse (sometimes much worse), which has a huge impact on overall school progress. Removing such pupils often makes the difference between being significantly below and in line with average. And the really mad thing is that if those pupils had achieved one less mark on the test, they wouldn't have achieved a scaled score and therefore would not have been included in the progress measures (unlike the pre-key stage pupils). Recipe for tactical assessment if ever I saw one.

7) The one about getting rid of expected progress measures
The primary accountability document states that 'the [expected progress] measure has been replaced by a value-added measure. There is no ‘target’ for the amount of progress an individual pupil is expected to make.' Yeah, pull the other one. Have you seen those transition matrices in RAISE (for low/middle/high start points) and in the RAISE library (for the 21 prior attainment groups)? How many would really like to see those broken down into KS1 sublevel start points? Be careful what you wish for. Before we know it, crude expectations will be put in place, which will be at odds with value added and we're back to square one. Most worrying are the measures at KS1 involving specific early learning goals linked to end of KS1 outcomes, and the plethora of associated weaknesses splashed all over page 1 of the dashboard. Teachers are already referring to pupils not making 'expected progress' from EYFS to KS1 on the basis of this data. Expected progress and VA are also commonly conflated, with estimates viewed as minimum targets. In every training session I've run recently, a headteacher has recounted a visit by some school improvement type who has shown up brandishing a copy of the table from the accountability guidance, and told them what scores each pupil is expected to get this year. Expected implies that it is prescribed in advance, and yet VA involves comparison against the current year's averages for each prior attainment group and we don't know what these are yet. Furthermore, because it is based on the current year's averages, roughly half the pupils nationally will fall below the estimates and half will hit or exceed them. That's just how it is. Expected progress is the opposite of VA and my response to anyone confusing the two is: tell me what the 2017 averages are for each of the 21 prior attainment groups, and I'll see what I can do. I spoofed this subject here, by the way.

8) Progress measures in 2020
Again, this. Put simply, the basis of the current VA measure is a pupil's APS at KS1. How are we going to do this for the current Y3? How do I work out the average for EXS, WTS, EXS? Will the teacher assessments be assigned a nominal value? How many prior attainment groups will we have in 2020 when this cohort reach the end of KS2? Currently we have 21, but surely we'll have fewer considering there are now fewer possible outcomes at KS1, which means we'll have more pupils crammed into a smaller number of broader groups. Such a lack of refinement doesn't exactly bode well for future progress measures. Remember that all pupils in a particular prior attainment group will have the same estimates at the end of KS2, so all your EXS pupils will be lumped into a group with all other EXS pupils nationally and given the same line to cross. This could have been avoided if the KS1 test scores were collected and used as part of the baseline, but they weren't, so here we are. 2020 is going to be interesting.

9) Colour coding used in RAISE
Here is a scene from the script I'm working on for my new play, 'RAISE (a tragedy)'.

HT: "blue is significantly below, green is sigificantly above, right?"
DfE: "No. It's red and green now"
HT: "right, so red is significantly below, and green is significantly above. Got it"
DfE: "well, unless it's prior attainment"
HT: "sorry?"
DfE: "blue is significantly below if we're dealing with prior attainment"
HT: "so blue is significantly below for prior attainment but red is significantly below for other stuff, and green is significantly above regardless. And that's it?"
DfE: "Yes"
HT: "You sure? You don't look sure."
DfE: "Well...."
HT: "well what?"
DfE: "well, it depends on the shade?"
HT: "what shade? what do you mean, shade?"
DfE: "shade of green"
HT: "shade of green?"
DfE: "or shade of red"
HT: "Is there a camera hidden here somewhere?"
DfE: "No. Look, it's perfectly simple really. Dark red means significantly below and in the bottom 10% nationally, light red means significantly below but not in the bottom 10%; dark green is significantly above and in the top 10%, light green is significantly above but not in the top 10%. See?"
HT: "Erm....right so shades of red and green indicating how significant my data is. Got it."
DfE: "Oh no. We never say 'how significant'. That's not appropriate, statistically speaking"
HT: "but, the shades...."
DfE: "well, yes"
HT: *sighs* "OK, shades of red and green that show data is significantly below or above and possibly in the bottom or top 10%. Right, got it"
DfE: "but only for progress"
HT: "Sorry, what?"
DfE: "we only do that for progress"
HT: "but I have dark and light green and red boxes for attainment, too. Look, here on pages 9 and 11 and 12. See?"
DfE: "Yes, but that's different"
HT: "How is it different? HOW?"
DfE: "for a start, it's not a solid box, it's an outline"
HT: "Is this a joke?"
DfE: "No"
HT: "So, what the hell do these mean then?"
DfE: "well those show the size of the gap as a number of pupils"
HT: "are you serious?"
DfE: "Yes. So work out the gap from national average, then work out the percentage value of a pupil by dividing 100 by the number of pupils in that group. Then see how many pupils you can shoehorn into the gap"
HT: "and the colours?"
DfE: "well, if you are 2 or more pupils below that's a dark red box, and one pupil below is a light red box, and 1 pupil above that's a light green box, and you get a dark green box if you are 2 or more pupils above national average"
HT: "and what does that tell us?"
DfE: "I'm not sure, but -2 or lower is well below, and +2 or higher is well above. You may have seen the weaknesses on your dashboard"
HT: "So let me get this straight. We have dark and light shades of red and green to indicate data that is either statistically below or above, and in or not in the top or bottom 10%, or gaps that equate to 1 or 2 or more pupils below or above national average. Am I there now?"
DfE: "Yes, well unless we're talking about prior attainment"
HT: "Oh, **** off!"

Green and red lights flash on and off. Sound of rain. A dog barks.

10) Recreating levels
We've been talking about this for nearly 2 years now and yet I'm still trying to convince people that those steps and bands commonly used in tracking systems - usually emerging, developing, secure - are essentially levels by another name. Instead of describing the pupil's competence in what has been taught so far - in which case a pupil could be 'secure' all year - they relate to how much of the year's curriculum has been achieved, and so 'secure' is something that happens after Easter. Despite finishing the previous year as 'secure', a pupil starts the next year as 'emerging' again (as does everyone else). Pupils that have achieved between, say, 34% and 66% of the year's curriculum objectives are developing, yet a pupil that has achieved 67% or more is secure. Remember those reasons for getting rid of levels? How they were best-fit and told us nothing about what pupils could or couldn't do; how pupils at either side of a level boundary could have more in common than those placed within a level; how pupils could be placed within a level despite having serious gaps in their learning. Consider these reasons. Now look at the example above, consider your own approach, and ask yourself: is it really any different? And why have we done this? We've done it so we can have a neat approximation of learning; arbitrary steps we can fix a point score to so we can count progress, even if it's at odds with a curriculum 'where depth and breadth of understanding are of equal value to linear progression'. Then we discover that once pupils have caught up, they can only make 'expected progress' because they don't move on to the next year's content. So we shoehorn in an extra band called mastery or exceeding or above, with a nominal bonus point, so we can show better than expected progress for the most able. These approaches have nothing to do with real learning; they've got everything to do with having a progress measure to keep certain visitors happy. It's all nonsense and we need to stop it.
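
Spell that banding out in code and the resemblance to levels is hard to miss (the thresholds are the illustrative ones above):

def band_from_coverage(pct_objectives_achieved):
    if pct_objectives_achieved >= 67:
        return "secure"
    if pct_objectives_achieved >= 34:
        return "developing"
    return "emerging"

# Two pupils either side of the boundary get different labels despite a
# one-point difference in coverage - the old level-boundary problem again.
print(band_from_coverage(66), band_from_coverage(67))   # developing secure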

Merry Christmas!