Thursday, 19 May 2016

I Predict a Riot

I've come to the conclusion that I'm easily misunderstood. I rock up somewhere to give a talk, start ranting about how we have to wean ourselves off our number addiction - stop trying to measure and quantify everything that moves (and if it doesn't move, measure it until it does) - and people are left thinking "who the hell is this guy? I thought he was a data analyst but he's telling us to quit the data". Well, that's not quite true. I'm telling people to quit the dodgy data, to get beyond this 'bad data is better than no data at all' mindset. But much of what I see in terms of tracking of pupil progress falls well and truly into the bad data camp.

Essentially, there are three main things I take issue with when it comes to tracking systems:

1) The recreation of levels - most systems still use best fit bands linked to pupils' coverage of the curriculum. *coughs* "sublevels!"

2) The simplistic, linear points-based progress measures and associated 'expected' rates of progress *coughs* "APS!"

3) The attempts to use tracking data, based on ongoing teacher assessment against key objectives, to predict end of key stage test outcomes *coughs* "astrology!"

So, earlier this week, I was attempting to explain my thoughts on points 1 and 2 when someone stated - and I'm paraphrasing here - that 'the DfE quantify everything so we need to do the same in order to predict outcomes'. First, let me be clear here, I am not suggesting schools give up on data - I think that would be foolhardy - I just think we need to be smarter and only collect data that has a genuine and demonstrable impact on pupils' learning. We just have to accept that we can't quantify everything - as much as some may want to - and admit that much of the data we generate is for purposes of accountability and performance management, not learning. Second, I do not believe we should use tracking data to predict end of key stage outcomes. It's a bad idea for a number of reasons.

First, such predictions hinge on a school's (or teacher's) definition of 'secure'/'on track'/'at age related expectations'. What constitutes so-called 'secure' quite clearly differs from one school to another. Many schools are using systems that provide a convenient answer for them and this is often based on the simplistic expectation that pupils will achieve a certain percentage of objectives per term, so a pupil is expected to achieve, say, a third of the objectives in the first term, two thirds in the second term, and so on. I recall an amusing yet worrying Twitter conversation where teachers offered up their definitions of end of year expectations, all based on a percentage of objectives achieved. Various numbers were thrown into the ring: 51%, 67%, 70%, 75%, 80%. Interestingly no one suggested 100%, so quite clearly there are many pupils out there tagged as 'secure' despite having considerable gaps in their learning, gaps that may well widen over time. If we make the assumption that these pupils are on track to meet expected standards then we may well be in for a shock.
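
To see how much hangs on that choice, here is a minimal Python sketch. The percentages are the ones thrown into the ring in that conversation; the pupil (36 of 50 key objectives achieved) and the function name are hypothetical, purely for illustration.

```python
# Percentage thresholds for 'secure' quoted in the Twitter conversation above
THRESHOLDS = [51, 67, 70, 75, 80]

def is_secure(achieved: int, total: int, threshold_pct: float) -> bool:
    """True if the pupil meets a percentage-of-objectives definition of 'secure'."""
    return 100 * achieved / total >= threshold_pct

# Hypothetical pupil: 36 of 50 key objectives achieved (72%), so 14 objectives missing
achieved, total = 36, 50
for t in THRESHOLDS:
    label = "secure" if is_secure(achieved, total, t) else "not secure"
    print(f"{t}% threshold: {label}")
```

The same pupil, with the same 14 gaps, is 'secure' under three of those definitions and not under the other two. Nothing about the pupil has changed, only the school's choice of number.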

Next come those key accountability thresholds: 65% for the floor standards, 85% for coasting. So, based on our definition of 'secure' or 'on track', which are quite possibly wide of the mark, we attempt to estimate the number of pupils that will meet the expected standard to satisfy ourselves (and our governors) that we'll be safe this year. Breathe a sigh of relief. All this inferred from a somewhat spurious definition of 'secure' that varies from school to school. Worse still are those predictions based on a linear extrapolation of a pupil's current progress gradient. I remember tracking systems doing this with point scores and the predictions were off the map. A pupil has made 4 points/steps/bands this year so we assume they will do the same over the next two years and will therefore easily exceed the expected standard (seriously, this is going on).
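
For illustration only, this is roughly all that such a straight-line projection amounts to, sketched in Python. The function name and the pupil's starting score are hypothetical; the 4 points per year and two years remaining come from the example above.

```python
def linear_projection(current: float, gain_this_year: float, years_remaining: int) -> float:
    """Naive straight-line projection: assumes this year's gain repeats every year.
    This is the flawed practice described above, not a recommended method."""
    return current + gain_this_year * years_remaining

# Hypothetical pupil on an arbitrary points scale: 4 points gained this year, two years to go
print(linear_projection(current=12, gain_this_year=4, years_remaining=2))
```

Nothing in that calculation refers to what the pupil can or cannot do. It simply assumes the gradient holds, which is exactly the assumption that sends these predictions off the map.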

Next, we fall deeper down the rabbit hole and find schools that are converting teacher assessment into a sort of pseudo-scaled score. So, a pupil that is currently 'secure' or 'at ARE' will have a score of 100, whilst those pupils that are 'above' have higher scores, and those that are 'below' have lower scores. This is achieved by scoring and weighting each objective, totalling each pupil's score and standardising it. Horrible. Don't do this.
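
Purely to show what this practice involves, and not as a recommendation, here is one plausible version of it in Python. The objective weights, the pupils and the mean-100/SD-15 scale are all assumptions made for the sketch; schools and systems will differ in the detail.

```python
from statistics import mean, pstdev

# Hypothetical weights for four key objectives; any real weighting scheme is an assumption
WEIGHTS = [1.0, 2.0, 1.5, 0.5]

def weighted_total(achieved: list[bool]) -> float:
    """Sum the weights of the objectives this pupil has been assessed as achieving."""
    return sum(w for w, ok in zip(WEIGHTS, achieved) if ok)

def pseudo_scaled_scores(cohort: dict[str, list[bool]]) -> dict[str, float]:
    """Standardise weighted totals within the cohort to mean 100, SD 15 (assumed scale)."""
    totals = {name: weighted_total(objs) for name, objs in cohort.items()}
    values = list(totals.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # every pupil identical: no spread to standardise
        return {name: 100.0 for name in totals}
    return {name: 100 + 15 * (t - mu) / sigma for name, t in totals.items()}

# Hypothetical cohort: True = objective assessed as achieved
cohort = {
    "A": [True, True, True, True],
    "B": [True, True, False, False],
    "C": [True, False, True, False],
    "D": [False, True, False, True],
}
for name, score in pseudo_scaled_scores(cohort).items():
    print(name, round(score))
```

Standardised this way, 100 simply marks this cohort's average weighted total. It has no connection to the national expected standard, however much it looks like a KS2 scaled score.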

My overall concern is the impact these practices have on the expectations and aspirations for pupils. Will there be a concentration of resources on 'borderline' pupils and perhaps less opportunity for pupils to deepen their learning? Can a school really promote a learning without limits culture if it is distracted by distant thresholds? Will such approaches create a false sense of security that could easily backfire? 

And what are the consequences if those predictions are wrong, as they are so likely to be?

Obviously schools will want to have some idea of likely outcomes, and no doubt governors (and others) will request such information, but really this should only be done for end of key stage cohorts, and any predictions should be informed by the standards, test frameworks and optional testing. It is extremely risky to try to make the leap from a broad teacher assessment, at the end of year 4, say, to an end of key stage outcome, especially now when the curriculum is so new. Essentially we are attempting to link observations to outcomes on the basis of a huge amount of supposition.

My firm belief is that tracking systems need to be untangled from accountability and performance management if they are to be truly fit for purpose. They should not be used to set performance targets and they should not be used for making predictions. If they are used in this way then there is always the risk that the data will be manipulated to provide the rose-tinted view rather than the warts-and-all picture that we really need. Instead, tracking systems should be very simple tools for recording and monitoring pupils' achievement of key objectives, tools that allow teachers to quickly identify gaps in pupils' learning and respond accordingly.

And if they do that then the final outcomes will take care of themselves.

Friday, 13 May 2016

The Future of Base Exploration

So, as we all know, the findings of the STA's comparability study have been reported and the three remaining baseline assessments (EEx, NFER, CEM) have been deemed to be incomparable with one another. Wow! Really? The DfE attempted to convince us that they had intended to carry out this study all along, whereas I'm fairly sure they stated they'd wait until the 2015 cohort got to the end of key stage 1 in 2018 before testing the reliability and validity of the data. I guess they just realised the mess that had been created and wanted a quick way out of the mire. It's not going away of course. SchoolsWeek, who ran a front page story on 26th February under the headline 'future of baseline tests in doubt', noted that 'a mooted alternative to tests is the introduction of "school readiness" checks - an option said to be preferred by number 10'. But it does mean that all current primary school cohorts will have their progress measured from key stage 1 to key stage 2, and there probably won't be an entry to key stage 2 value added measure until 2024, when the 2017/18 cohort reach the end of year 6.

And today SchoolsWeek have run another story: 'Numbers for baseline tests begin to thin out'. Now, there doesn't appear to have been the collapse that many anticipated because some schools evidently still see value in having a baseline assessment, but in the absence of an official accountability measure, i.e. VA, it remains to be seen what schools do with this data. Many will run further standardised tests from other providers and hope that the numbers go up. This will probably mean subtracting the baseline score from a later standardised score in the hope that results will be positive. Some may assume this to be a form of value added (which it isn't). Many schools will seek to group pupils based on past and present standardised scores in order to construct progress matrices. The score thresholds used to define these groups will, in most cases, be entirely arbitrary rather than based on any statistical process. Over the next few years we will no doubt see some rather dubious and unscientific practices going on, which will be used to great effect in improvement conversations and inspections, and in many cases these will go unchallenged because they are based on standardised tests rather than teachers' assessment.

It is entirely understandable why schools will adopt the practices they are about to adopt: they are desperate for a supposedly robust and reliable progress measure, and if it's based on tests it must be right, right? Well, anyone who has read any of my blogs in the past year or so will know my thoughts on progress measures offered by most tracking systems. We have become so desperate for a numerical scale of progress that we are willing to overlook any deficiency in terms of accuracy and meaning. We will readily sacrifice our principles at this crossroads of assessment for a simple number because there are those out there - from various external bodies - who demand it. We are driven by number lust even though the numbers - the pseudo-APS - we generate tell us little or nothing about what pupils can and cannot do, and are therefore of no use to teachers trying to do the job of teaching. I wish we'd just give up trying to quantify distance travelled by pinning a point score onto teacher assessments.

So, yes, if we want to measure progress properly, and get away from the pseudo levels and points of our tracking systems, then tests most certainly have a part to play. But assigning pupils into groups based on arbitrary test score thresholds or subtracting one test score from another isn't the way forward. What I'd like to see is the baseline test providers developing robust interim VA measures. Pupils do a baseline assessment, the scores from which are collected by the provider. Then, perhaps two or three years later, the same pupils take another test (it doesn't need to be from the same provider but the provider would need to collect these scores as well). Assuming a large enough population of pupils has taken the tests (i.e. thousands of pupils), VA analysis can then be carried out. Each pupil's score in the later test is compared against the average test score for pupils with the same baseline score, and the average difference is calculated to show the cohort's overall VA score. This would be a robust, valid and meaningful progress measure where each pupil's attainment is compared against potentially thousands of pupils with similar start points across hundreds of schools. Again, the baseline providers would not necessarily have to design and administer the later tests; they could instead offer to collect and analyse data from a number of different providers. They could even calculate VA from their own baseline assessment to key stage 1 tests.
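
The VA calculation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each record is simply a (baseline score, later score) pair; the function names and all the scores are hypothetical, and a real analysis would need thousands of pupils and careful handling of baseline scores that few pupils share.

```python
from collections import defaultdict
from statistics import mean

def expected_by_baseline(national: list[tuple[int, float]]) -> dict[int, float]:
    """Average later-test score for every baseline score in the (large) national dataset."""
    groups: dict[int, list[float]] = defaultdict(list)
    for baseline, later in national:
        groups[baseline].append(later)
    return {baseline: mean(scores) for baseline, scores in groups.items()}

def pupil_va(baseline: int, later: float, expected: dict[int, float]) -> float:
    """A pupil's later score minus the average for pupils with the same baseline score."""
    return later - expected[baseline]  # KeyError if no national pupil shares this baseline

def cohort_va(cohort: list[tuple[int, float]], expected: dict[int, float]) -> float:
    """The cohort's VA score: the average of its pupils' differences."""
    return mean(pupil_va(b, later, expected) for b, later in cohort)

# Hypothetical data; a real analysis would need thousands of pupils across many schools
national = [(90, 95), (90, 99), (100, 100), (100, 104), (110, 108), (110, 112)]
school = [(90, 99), (100, 100), (110, 113)]
expected = expected_by_baseline(national)
print(expected)
print(cohort_va(school, expected))
```

Unlike subtracting one score from another, each pupil here is measured only against pupils who started from the same point, so a positive cohort figure means the school's pupils did better than similar pupils elsewhere.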

And if they offered this then I suspect numbers taking baseline tests, rather than thinning out, would actually increase substantially, such is the need for robust progress measures.

I hope the baseline providers are giving this some thought.

Sunday, 1 May 2016

The Floor Show

'This is the floorshow, the last ideal' - The Sisters of Mercy

Yesterday, Twitter lit up with debate and disbelief over Nicky Morgan's speech to NAHT. Much of the consternation related to her assertion that "no more than 1% more schools will be below the floor standard than last year". How could she guarantee this? How can she possibly know this in advance with such confidence? With new tests and assessments based on a new curriculum, how can anyone predict how many schools will be below floor standard?

The reason for the confusion is that everyone is focussed on attainment, and for the past year there has been a creeping panic that many pupils will fail to meet the expected standard, especially in writing. And it even seemed that the DfE had come round to this way of thinking with its statements on standards this year in its clarification document on KS2 writing:

'As this is the first year of schools working with the new interim assessment frameworks, the Minister for Schools has written to the Chief Inspector asking him to ensure that Ofsted inspectors take into account national performance and contextual factors when considering a school's performance in writing at KS2, which is used as part of the floor standard.'

'The Minister has also asked RSCs to be mindful of the impact of these new arrangements in making decisions about issuing warning notices and tackling underperformance following this year’s results.'

So it looked like the ground was being prepared for a large number of schools to be below floor and that some degree of lenience would be shown.

But then Nicky Morgan announces, with head-scratching certainty, that there would be no more than 1% more schools below floor than last year. This means no more than 6% of schools.


Was working towards the expected standard the new expected standard?

Well, as mentioned above, we were all focussing on attainment: the percentage of pupils passing the expected standard thresholds in reading and maths tests (scaled score of 100+), and assessed as meeting the expected standard or working at greater depth in writing. We all assumed there would be a high percentage of schools where fewer than 65% of pupils meet this standard in the three subjects combined, and this could well be the case. But as stated in the DfE's primary accountability technical document, that is only part of the floor standard. Schools that fall below the 65% threshold will need to meet or exceed each of the VA thresholds for reading, writing and maths. And these haven't been set yet. We know these thresholds will be negative, we just don't know how negative, and we are in no position to estimate VA right now, or to predict what these thresholds might be.

So, let's put this in the context of old money. Imagine if, last year, the DfE announced that it had changed the attainment floor standard from 65% L4+ RWM to 65% L5+ RWM. Suddenly the majority of schools are below floor. Then it states that any schools falling below this new, tougher standard must be above each of the VA thresholds for reading, writing and maths. 

The panic grows.

Then it announces that the VA thresholds are 97.2, 96.8, and 97.1.

At a guess, we'd probably only have around 3% of schools below floor.
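
The floor standard logic can be sketched in Python. This is a minimal illustration assuming the rule as described above: a school is below floor only if it misses the 65% attainment threshold and also falls below the VA threshold in at least one of reading, writing and maths. The function name and the two schools are hypothetical, and the VA thresholds are the imaginary 'old money' figures, since the real 2016 thresholds have not been set.

```python
def below_floor(pct_meeting_standard: float,
                va: dict[str, float],
                va_thresholds: dict[str, float],
                attainment_floor: float = 65.0) -> bool:
    """Below floor only if the school misses the attainment threshold AND
    falls below the VA threshold in at least one subject."""
    if pct_meeting_standard >= attainment_floor:
        return False  # meeting the attainment threshold is enough on its own
    # Under the attainment threshold: must meet or exceed every subject's VA threshold
    return any(va[subject] < threshold for subject, threshold in va_thresholds.items())

# Hypothetical VA thresholds from the 'old money' example
VA_THRESHOLDS = {"reading": 97.2, "writing": 96.8, "maths": 97.1}

# Two hypothetical schools, both well under 65% on the tougher attainment measure
school_a = {"reading": 99.0, "writing": 98.5, "maths": 100.2}
school_b = {"reading": 96.9, "writing": 98.5, "maths": 100.2}
print(below_floor(40.0, school_a, VA_THRESHOLDS))  # False: above every VA threshold
print(below_floor(40.0, school_b, VA_THRESHOLDS))  # True: reading VA below 97.2
```

However tough the attainment measure, it is the VA thresholds that decide how many schools end up below floor.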

Now, I was under the impression that the DfE would set the KS2 VA thresholds based on 2016 data and maintain them for future years, much like they have with the Progress 8 threshold for secondary schools. This would give schools some idea of what they're up against. But I can't see how this will work now. With standards improving each year - as they no doubt will with pupils having increasing exposure to this new curriculum - I suspect VA thresholds will change to reflect this. As more schools meet the 65% floor standard, the DfE will need to tighten the VA thresholds to ensure they catch the right number of schools. You can make data do anything you like, of course.

And, no doubt, we'll all be kept guessing.