Wednesday, 25 October 2017

MATs: monitoring standards and comparing schools

A primary school I work with has been on the same journey through assessment land as many other schools up and down the country. Around two years ago they began to have doubts about the tracking system they were using - it was complex and inflexible, and the data it generated had little or no impact on learning. After much deliberation, they ditched it and bought in a simpler, customisable tool that could be set up and adapted to suit their needs. A year later and they have an effective system that teachers value, that provides all staff with useful information, and is set up to reflect their curriculum. A step forward.

Then they joined a MAT.

The organisation they are now part of is leaning on them heavily to scrap what they are doing and adopt a new system that will put them back at square one. It's one of those best-fit systems in which all pupils are 'emerging' (or 'beginning') in autumn, mastery is a thing that magically happens after Easter, and everyone is 'expected' to make one point per term. In other words, it's going back to levels with all their inherent flaws, risks and illusions. The school tries to resist the change in a bid to keep their system but the MAT sends data requests in their desired format, and it is only a matter of time before the school gives in.

It is, of course, important to point out that not all MATs are taking such a remote, top-down, accountability-driven approach, but some are still stuck in a world of (pseudo-) levels and are labouring under the illusion that you can use teacher assessment to monitor standards and compare schools, which is why I recently tweeted the following:


This resulted in a lengthy discussion about the reliability of various tests, and the intentions driving data collection in MATs. Many stated that assessment should only be used to identify areas of need in schools, in order to direct support to the pupils that need it; data should not be used to rank and punish. Of course I completely agree, and this should be a strength of the MAT system - they can share and target resources. But whatever the reasons for collecting data - and let's hope that it's done for positive rather than punitive reasons - let's face it: MATs are going to monitor and compare schools and usually this involves data. This brings me back to the tweet: if you want to compare schools, don't use teacher assessment, use standardised tests. Yes, there may be concerns about the validity of some tests on the market - and it is vital that schools thoroughly investigate the various products on offer and choose the one that is most robust, best aligned with their curriculum, and will provide them with the most useful information - but surely a standardised test will afford greater comparability than teacher assessment.

I am not saying that teacher assessment is always unreliable; I am saying that teacher assessment can be seriously distorted when it is used for multiple purposes (as stated in the final report of the Commission on Assessment without Levels). We need only look at the issues with writing at key stage 2, and the use of key stage 1 assessments in the baseline for progress measures to understand how warped things can get. And the distortion effect of high stakes accountability on teacher assessment is not restricted to statutory assessment; it is clearly an issue in schools' tracking systems when that data is not only used for formative purposes, but also to report to governors, LAs, Ofsted, RSCs, and senior managers in MATs. Teacher assessment is even used to set and monitor teachers' performance management targets, which is not only worrying but utterly bizarre.

Essentially, using teacher assessment to monitor standards is counterproductive. It is likely to result in unreliable data, which then hides the very things that these procedures were put in place to reveal. And even if no one is deliberately massaging the numbers, there is still this issue of subjectivity: one teacher's 'secure' is another teacher's 'greater depth'. We could have two schools with very different in-year data: school A has 53% of pupils working 'at expected' whereas school B has 73%. Is this because school B has higher attaining pupils than school A? Or is it because school A has a far more rigorous definition of 'expected'?
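To see how far the definition of 'expected' alone can move the headline figure, here is a minimal sketch in Python. Everything in it is invented for illustration: the underlying scores, the two cut-offs and the function name are assumptions, not real school data or any tracking system's method.

```python
# Hypothetical underlying attainment for ONE cohort of 15 pupils (invented numbers).
scores = [88, 92, 95, 97, 99, 100, 101, 103, 104, 106, 108, 110, 112, 115, 120]


def percent_at_expected(scores, threshold):
    """Percentage of pupils whose score is at or above the given threshold."""
    at_expected = sum(1 for s in scores if s >= threshold)
    return round(100 * at_expected / len(scores))


# The same pupils judged against two different definitions of 'expected'.
strict = percent_at_expected(scores, threshold=103)   # a more rigorous definition
lenient = percent_at_expected(scores, threshold=99)   # a more generous definition

print(strict, lenient)  # prints: 53 73
```

Identical pupils, a 20 percentage point gap. Without a common yardstick there is no way of telling this apart from a genuine difference in attainment.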

MATs - and other organisations - have a choice: either use standardised assessment to compare schools or don't compare schools. In short, if you really want to compare things, make sure the things you're comparing are comparable.


Tuesday, 3 October 2017

Thoughts on new Ofsted inspection data summary report (primary)

Yesterday Ofsted released a 'prototype' of its new Inspection Data Summary Report and it's a major departure from the Ofsted Inspection Dashboard that we've become accustomed to over the past two years. On the whole it's a step in the right direction, with more positives than negatives, and it's good to see that Ofsted have listened to feedback and acted upon it. Here's a rundown of changes.

Positives
Areas for investigation. This is a welcome change. The new areas for investigation are clearer - and therefore more informative - than the 'written by robot' strengths and weaknesses that preceded them, many of which were indecipherable. They read more like the start point for a conversation and hopefully this will result in a more productive, equitable relationship between inspectors and senior leaders.

Context has moved to the front. Good. That's where it should be. It was worrying when context was shoved to the back in RAISE reports. This is hopefully a sign that school context will be taken into account when considering standards. As it should be. 

Sorted out the prior attainment confusion at KS2. Previous versions of the dashboard were confusing: progress measures based prior attainment on KS1 APS thresholds (low: <12, mid: 12-17.5, high: 18+; note that maths is double weighted); attainment measures based prior attainment on the pupil's level in the specific subject (low: L1 or below, mid: L2, high: L3). This has now been sorted out and prior attainment now refers to pupils' KS1 APS in all cases. Unfortunately this is not the case for prior attainment of KS1 pupils - more on that below.
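For anyone who wants to check which band a pupil falls into, the APS thresholds can be sketched in a few lines of Python. This is a rough illustration rather than Ofsted's or the DfE's actual code: the function names are mine, the double weighting is assumed to mean (reading + writing + 2 x maths) / 4, anything below 18 is treated as the middle band, and the example inputs assume the usual KS1 point scores (L2C = 13, L2B = 15, L2A = 17, L3 = 21).

```python
def ks1_aps(reading, writing, maths):
    """KS1 average point score with maths double weighted (assumed formula)."""
    return (reading + writing + 2 * maths) / 4


def prior_attainment_band(aps):
    """Place a KS1 APS into the low / middle / high band using the thresholds above."""
    if aps < 12:
        return "low"
    if aps < 18:
        return "middle"
    return "high"


# A pupil with L3 in reading (21) but L2B in writing and maths (15 each).
aps = ks1_aps(reading=21, writing=15, maths=15)
print(aps, prior_attainment_band(aps))
```

The example pupil would have been 'high' for reading under the old subject-specific approach, but sits in the middle band once prior attainment is based on APS, which is exactly the inconsistency that has now been removed.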

Toning down the colour palette. Previous versions were getting out of hand with a riot of colour. The page of data for boys and girls at KS2 looked like a carnival. Thankfully, we now just have simple shades of blue so sunglasses are no longer required; and nowhere in the new report are % expected standard and % greater depth merged into a single bar with darker portions indicating the higher standard. These are now always presented in separate bars, thankfully. That page was always an issue when it came to governor training.

Progress in percentiles. Progress over time is now shown using percentiles, which makes a lot of sense and is easy to understand. Furthermore, the percentiles are linked to progress scores, so it shows improvement in terms of progress not attainment. Percentiles show small steps of improvement over time, which means that schools can now put changes in progress scores into context, rather than guessing what changes mean until they move up a quintile. Furthermore, an indicator of statistical significance is provided, which may show that progress is in the bottom 20% but is not significantly below, or perhaps is in the top 20% but is not significantly above, which adds some clarity. And finally, the percentiles for 2015 are based on VA data, rather than levels. Those responsible for the 'coasting' measure take note.

Scatter plots. Whilst an interactive scatter plot (i.e. an online clickable version) is preferable, these are still welcome because they instantly identify those outliers that have had a significant impact on data. In primary schools, these are often pupils with SEND that are assessed as pre-key stage, and who end up with huge negative scores that in no way reflect the true progress they made. One quick glance at a scatter plot reveals that all pupils are clustered around the average, with the exception of those two low prior attaining pupils that have progress scores of -18.

Confidence intervals are shown. I was concerned that they'd stop doing this - showing the confidence interval as a line through the progress score - but thankfully this aspect has been retained. It's useful because schools can show how close they are to not being significantly below, or being significantly above. Inspectors will be able to see that if that pre-key stage pupil with an individual progress score of -18 was removed from the data, that would shift the overall score enough to remove that red box. Statistical significance is, after all, just a threshold.
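The effect of a couple of outliers on that threshold is easy to demonstrate. The sketch below is illustrative only: the cohort is invented, the national standard deviation of 6.0 is a placeholder rather than a published figure, and the interval is assumed to be the school's mean progress score plus or minus 1.96 x national SD / sqrt(number of pupils).

```python
import math


def progress_summary(scores, national_sd, z=1.96):
    """Return (mean, lower, upper, flag) for a cohort's progress scores.

    Assumes the interval is mean +/- z * national_sd / sqrt(n) and that the
    school is flagged only when the whole interval sits on one side of zero.
    """
    n = len(scores)
    mean = sum(scores) / n
    half_width = z * national_sd / math.sqrt(n)
    lower, upper = mean - half_width, mean + half_width
    if upper < 0:
        flag = "significantly below"
    elif lower > 0:
        flag = "significantly above"
    else:
        flag = "not significant"
    return round(mean, 2), round(lower, 2), round(upper, 2), flag


# Hypothetical cohort: 28 pupils close to average plus two pupils on -18.
typical = [-3.0] * 14 + [0.5] * 14
outliers = [-18.0, -18.0]
NATIONAL_SD = 6.0  # placeholder value, not the real national figure

with_outliers = progress_summary(typical + outliers, NATIONAL_SD)
without_outliers = progress_summary(typical, NATIONAL_SD)

print(with_outliers)     # flag: significantly below
print(without_outliers)  # flag: not significant
```

In this made-up cohort, two pupils out of thirty are the difference between a red box and no box at all.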

Negatives
Prior attainment of KS1 pupils. I'm not against the idea of giving some indication of prior attainment - it provides useful context after all - but I have a bit of a problem here. Unlike at KS2 where prior attainment bands are based on the pupils' APS at KS1, at KS1 prior attainment is based on the pupils' development in specific early learning goals (ELG) at EYFS. Pupils are defined as emerging, expected or exceeding on the basis of their development in reading, or writing, or maths (for the latter they take the lower of the two maths ELGs to define the pupil's prior attainment band). This approach to prior attainment therefore takes no account of pupils' development in other areas, just the one that links to that specific subject. The problem with this approach is that you can have a wide variety of pupils in a single band. For example, the middle band (those categorised as expected) will contain pupils that have met all ELGs (i.e. made a good level of development) alongside pupils that have met the ELG in reading but are emerging in other areas, and pupils that have met the ELG in reading and exceeded others. These are very different pupils. Data in RAISE showed us that pupils that made a good level of development are twice as likely to achieve expected standards at KS1 as those that didn't, so it seems sensible that any attempt to define prior attainment should take account of wider development across the EYFSP, and not just take subjects in isolation. Perhaps consider using an average score for EYFS prime and specific ELGs to define prior attainment instead.
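A small Python sketch shows how different the pupils in one band can be. It is only an illustration: the three pupils are invented, just four ELGs are included to keep it short, the outcomes are coded 1 (emerging), 2 (expected) and 3 (exceeding), and the average score is my suggested alternative rather than anything Ofsted currently calculates.

```python
LABELS = {1: "emerging", 2: "expected", 3: "exceeding"}


def reading_band(elgs):
    """Prior attainment band for reading: the reading ELG on its own."""
    return LABELS[elgs["reading"]]


def maths_band(elgs):
    """Prior attainment band for maths: the lower of the two maths ELGs."""
    return LABELS[min(elgs["numbers"], elgs["shape_space_measures"])]


def average_elg_score(elgs):
    """Suggested alternative: the mean outcome across the ELGs supplied."""
    return sum(elgs.values()) / len(elgs)


# Three hypothetical pupils, all 'expected' in reading.
pupils = {
    "A": {"reading": 2, "writing": 1, "numbers": 1, "shape_space_measures": 1},
    "B": {"reading": 2, "writing": 2, "numbers": 2, "shape_space_measures": 2},
    "C": {"reading": 2, "writing": 3, "numbers": 3, "shape_space_measures": 2},
}

for name, elgs in pupils.items():
    print(name, reading_band(elgs), maths_band(elgs), average_elg_score(elgs))
```

All three land in the same reading band, yet their average scores range from 1.25 to 2.5. Pupil C also shows the effect of taking the lower maths ELG: exceeding in numbers, but banded as expected for maths.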

Prior attainment of Y1-2 in the context page. Currently this is based on how much the percentage achieving specific ELGs differs from national average, whilst prior attainment for years 3-6 involves APS. As above, perhaps Ofsted should consider using an EYFS average score across the prime and specific ELGs instead.

I am, by the way, rather intrigued by mention of APS for current years 3 and 4. Does this mean Ofsted have developed some kind of scoring system for new KS1 assessments? This surely has to happen at some point anyway, in order to place pupils into prior attainment groups for future progress measures.

Lack of tables. There's nothing wrong with a table; you can show a lot in a table. In the absence of tables to show information for key groups, the scatter plots are perhaps trying to do too much. Squares for boys, triangles for girls, pink for disadvantaged, grey for non-disadvantaged, and a bold border to indicate SEN. It's just a bit busy. But then again, we can see those pupils that are disadvantaged and SEN, so it can be useful. It's not a major gripe and time will tell if it works, but sometimes a good old table is just fine.

And finally a few minor niggles:

There is no such thing as greater depth in Grammar, Punctuation and Spelling at KS2. Mind you, yesterday it had greater depth for all subjects at KS2 and that's changed already so it's obviously just a typo.

And many of the national comparator indicators on the bar graphs are wonky and don't line up. They look more like backslashes. 

But overall this is a big improvement on the previous versions and will no doubt be welcomed by head teachers, senior leaders, governors and anyone else involved in school improvement. This, alongside ASP and the Compare Schools website, shows the direction of travel of school data: that it's becoming more simplified and accessible.

And that's a good thing.