Wednesday, 16 December 2015

How I track Ofsted

This is something I used to do for the LA and often end up doing for HTs when they ask me "what are Ofsted up to? Will we be next?". Obviously we don't really know who will 'be next' but this method has proved to be fairly accurate so I thought it was worth sharing here. It's straightforward - it requires some basic Excel skills - and it uses a data source that you may not be aware of, so hopefully it will be of interest. Let's get started!

1) Google 'Ofsted Management Information' and click on the relevant link or, better still, click here. This will take you to the following page:

Scroll down until you get to the latest data:

Right now, the last update was the 30th November 2015, so it's a bit out of date and will require supplementing with an additional source later on. Anyway, download the file and it will open in Excel. Make sure you click 'enable editing' and click on the 'Provider Level Data' tab to access the school inspection data (see screenshot below). Then, right click on the 'Provider Level Data' tab to check if the sheet is protected. If it is, click 'Unprotect Sheet', which will enable you to carry out the following actions.

You should now be looking at a spreadsheet that contains the current and previous inspection outcomes and dates of every school in England. Here I have anonymised the school names but your download will contain the real deal. The spreadsheet may already have filters in place, in which case you can get started. If there are no filter arrows in the column headings, then enable them by selecting the row containing the column headings (usually row 2) and choose to apply filters (top right of 'home' ribbon in Excel 2013, next to 'Find & Replace' binoculars).

The first thing I've done is filter for Gloucestershire schools. Click on the Local Authority column filter arrow, unselect all and then select the desired LA:

Next I want to select the inspection outcome that I'm interested in. Here, I've selected 'good' schools from the 'overall effectiveness' column:

Then I've filtered for primary schools in the 'Ofsted Phase' column:

I now want to find schools that were inspected before a particular date (e.g. the latest inspection date of a school I'm working with) and so I need to filter by inspection date. Hopefully 'date filters' will be present when you click on the filter arrow but occasionally it's not, in which case you will need to do something a bit clever. Highlight the 'latest inspection date' column by clicking on its column heading letter. Then, in the data ribbon, click on 'text to columns', click next twice and then select the 'Date: DMY' button. This will convert the dates in the column into a proper recognised date format and the filters will work.

Note: this step is only necessary if 'date filters' is not visible when you click on the filter arrow. If it is visible then you can miss this step.

Now we can click on the filter arrow at the top of the 'Latest Inspection Date' column and apply our date filter. Here, I am interested in those primary schools that have not had an inspection since 1st January 2012:

So, now I have a list of 'Good' Gloucestershire primary schools that were last inspected before 1st January 2012. I can click on the 'Latest Inspection Date' column again to put these into date order, either oldest to newest, or vice versa:
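For anyone who would rather script these steps than click through Excel, the whole filter-and-sort routine can be reproduced in pandas. This is a minimal sketch using a made-up five-row stand-in for the 'Provider Level Data' sheet; the column names here are my assumptions, so check them against the headings in your actual download.

```python
import pandas as pd
from io import StringIO

# A tiny stand-in for the 'Provider Level Data' sheet. With the real file you
# would use something like:
#   pd.read_excel("Management_information.xlsx",
#                 sheet_name="Provider Level Data", header=1)
csv = StringIO("""School,Local Authority,Ofsted Phase,Overall Effectiveness,Latest Inspection Date
School A,Gloucestershire,Primary,Good,14/06/2011
School B,Gloucestershire,Primary,Good,03/05/2013
School C,Gloucestershire,Secondary,Good,21/09/2010
School D,Somerset,Primary,Good,10/10/2011
School E,Gloucestershire,Primary,Outstanding,01/02/2009
""")
df = pd.read_csv(csv)

# The equivalent of Excel's 'text to columns' date fix: parse day-first dates
df["Latest Inspection Date"] = pd.to_datetime(df["Latest Inspection Date"], dayfirst=True)

# Apply the LA, phase and outcome filters, then the date cut-off, then sort oldest first
result = (
    df[(df["Local Authority"] == "Gloucestershire")
       & (df["Ofsted Phase"] == "Primary")
       & (df["Overall Effectiveness"] == "Good")
       & (df["Latest Inspection Date"] < "2012-01-01")]
    .sort_values("Latest Inspection Date")
)
print(result["School"].tolist())  # ['School A']
```

The same four filters, applied in one expression rather than four clicks, with the date parsing handled up front so no 'text to columns' workaround is needed.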

Finally, because this is from a slightly out of date data source you should check to see if any of these schools have been inspected recently. For this it's worth going to Watchsted. Start with the 'latest 100' map and zoom into your area of interest:

I can see there are 3 schools in Gloucestershire that have been inspected recently and I'll want to cross-reference those with my list.

So, that's it. Hope that's useful.

And if the lovely Watchsted people are reading this, here's something for the wishlist: date filters on the map please.

Happy New Year!

Wednesday, 25 November 2015

A red RAG to a bull

Last month the schools causing concern guidance was updated. This guidance sets out local authorities' statutory responsibilities for tackling underperformance, and includes details on the issuing of warning notices and their powers of intervention. My fear is that it's more about stick than carrot and effectively turns LAs into boot boys, going out to round up as many schools as possible on the basis of flimsy evidence and their own interpretation of vague guidance. Ironically, this round up will result in more schools being taken out of local authority control thus making LAs the architects of their own downfall. I was going to blog about it at the time but life got in the way and it faded from view. But my concerns have been reignited by one particular local authority's approach to rating its schools, and I feel compelled to discuss their methodology in the public domain.

So, let's return to the schools causing concern guidance, and to the definition of low standards of performance contained therein. Page 11 contains the following definition:

The definition of what constitutes “low standards of performance” is set out in section 60(3) of the 2006 Act. This is where they are low by reference to any one or more of the following:

I. the standards that the pupils might in all the circumstances reasonably be expected to attain; or,

II. where relevant, the standards previously attained by them; or

III. the standards attained by pupils at comparable schools.

For the purpose of this guidance, “unacceptably low standards of performance” includes: standards below the floor, on either attainment or progress of pupils; low standards achieved by disadvantaged pupils; a sudden drop in performance; sustained historical underperformance, performance of pupils (including disadvantaged pupils) unacceptably low in relation to expected achievement or prior attainment, or performance of a school not meeting the expected standards of comparable schools.

This is so vague and open to interpretation that I reckon I could make half the schools I work with fit the criteria. Take 'below floor' for example. Officially, to be below floor the school has to be below the attainment threshold and progress medians, but here it's a case of either/or. A sudden drop in performance can of course be down to a drop in prior attainment and one would hope that LAs would always take that into account but this depends on the quality and depth of their analysis. Another thing that bothers me is 'comparable schools'. What are comparable schools? The DfE and Ofsted have their own definitions of similar schools based on prior attainment (see Ofsted dashboard and DfE performance tables); the Perspective system, used by LAs, also had a similar schools measure but one based on contextual factors, whilst FFT use a complex range of factors in their CVA analysis in order to compare like with like. There is no single definition of 'comparable schools'.

And all this means that LAs can devise their own methods of identifying 'schools causing concern' based on their own interpretation of the guidance. The model I'm going to deal with here is one such example of an oversimplified and flawed approach to risk rating that definitely warrants further scrutiny.

When I wrote the first draft of this blog I named the LA but realised that this is likely to cause problems for certain schools, so I've decided against it. However, I still think it's important to share the method because stuff like this should be exposed and challenged. Hopefully it’ll encourage others to do likewise.

This particular LA – let’s call them LAx - attempted to resist FOI requests made by the local media to publicise their risk rating methodology on the grounds that it was not in the public interest. This has been rejected and so the method has now been shared and will be made public in the next week or so. The LA are obviously not too happy about this decision, and neither are many schools, particularly those that will have to share their 'red' RAG rating with parents.

And so we get to the important bit: the methodology that LAx are using to quantify, categorise and RAG rate the performance of its schools. Here we go!

The Model
The following table shows an example of the method. There are 11 indicators and a score is given depending on how the school's results compare against each indicator. Generally speaking this means -1 if below the national average and +1 if equal to or above it. For L4+ RWM, schools below floor are -1 and those equal to or above the national average are +1, whilst those that are above floor but below the national average get 0. With regard to 2 levels of progress, a school will be -2 if below the national average, and +2 if at or above the floor standard median. Those schools between the national average and the floor standard median get 0. Hope that all makes sense.
Finally, the school's latest result is compared to the previous year and a trend indicator is assigned in order to show apparent improvement or a decline in standards.

*note: 2LP/3LP averages in RAISE aren't really averages, they are national proportions (as if all pupils are in one school). The floor is the only real average (i.e. a median), but that's beside the point.

The schools' scores and trend indicators are then collated and assigned a RAG rating thus:
Schools with scores of +4 or more are 'green', those with scores between -4 and +3 are 'amber', and those below -4 are 'red'. I'm not entirely sure if the trend influences the RAG rating in any way but considering School H has seen a decline in standards 2 years running and still gets a 'green' I assume not.
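To make the scoring mechanics concrete, here's a minimal sketch of the banded scoring and RAG thresholds as I understand them from the table above. The function names and the specific comparator figures in the example are my own inventions for illustration, not LAx's actual implementation.

```python
def indicator_score(value, lower, upper=None, weight=1):
    """Banded score: below 'lower' -> -weight, at or above 'upper' -> +weight,
    in between -> 0. With no 'upper' band it's a straight above/below test."""
    if upper is None:
        upper = lower
    if value < lower:
        return -weight
    if value >= upper:
        return weight
    return 0

def rag(total):
    """RAG thresholds as stated: +4 or more green, -4 to +3 amber, below -4 red."""
    if total >= 4:
        return "green"
    if total >= -4:
        return "amber"
    return "red"

# Illustrative figures only (not the actual 2015 comparators):
# L4+ RWM: floor 65%, national 79%; 2LP reading: national 91%, floor median 94%
score = (indicator_score(80, 65, 79)        # L4+ RWM at/above national: +1
         + indicator_score(92, 91, 94, 2))  # 2LP reading between national and floor median: 0
print(score, rag(score))
```

Note how the L4+ RWM case (floor below national) and the 2LP case (floor median above national) both fall out of the same two-threshold band, just with the thresholds in a different order.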
Of this methodology, the LA says this:

In the recent 2015 Ofsted inspection, Ofsted said the risk tool “places the local authority in a stronger position to identify schools at risk of deteriorating early enough to prevent their decline.” (Ofsted, May 2015).
It is important to note that the risk tool is intended to be the start of a process that is used to inform discussion with Headteachers. The core function of the risk tool is to indicate where schools are at risk of not being judged as Good or Outstanding at their next Ofsted inspection.
So, what's wrong with this model then? This leads neatly on to the next section entitled.....
So, what's wrong with this model then?
Well, a few things really:

1) It takes no account of cohort size.
This is obvious. Pupils in smaller schools have a bigger % impact on results than those in larger schools. RAISE deals with this in 2 ways: the confidence intervals used in statistical significance testing, and the (rather flawed) methodology in the closing the gap section that produces those lovely red boxes. However, if we borrow the latter for this process then any gap that is smaller than the percentage value of one child should be ignored. Actually, any data that is not significantly above or below should be treated as statistically in line, and also be ignored.
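The 'value of one child' check is easy to formalise. A rough sketch (the function names are mine):

```python
def one_child_value(cohort_size):
    """The percentage-point swing a single pupil represents in a cohort."""
    return 100 / cohort_size

def gap_worth_noting(gap_pp, cohort_size):
    """Ignore any gap smaller than the value of one child, per the
    closing-the-gap convention borrowed above."""
    return abs(gap_pp) >= one_child_value(cohort_size)

# In a cohort of 10, one child is worth 10 percentage points, so a 7pp gap
# below national is within one-child noise; in a cohort of 100 the same
# 7pp gap spans seven children and is worth a look.
print(gap_worth_noting(7, 10), gap_worth_noting(7, 100))
```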

2) It takes no account of pupils' start points

Schools with high proportions of pupils that were L2C and L3 at KS1 are far less likely to have high or even average percentages making 3 levels of progress. We all know this.

3) It takes no account of context

Schools with high percentages of SEN and lower attaining pupils are seriously disadvantaged by this model. For schools with high percentages of SEN, FFT's CVA analysis is much fairer.

4) It takes no account of VA

Why? Next year this will be the only progress measure so may as well build it in now to ensure the whole thing is future proofed. VA is fairer for schools that have high percentages of lower attainers; CVA is even fairer.

5) It takes no account of progress across KS1

The model just compares L2B+ against national averages. What about those schools with low ability intakes, where pupils make good progress but %L2B is still below average? FFT KS1 VA/CVA percentile ranking would be a much fairer measure that would recognise the good progress made by pupils in schools where KS1 attainment is relatively low.

6) It favours coasting schools

Schools that are above the attainment comparators and above all the national averages and floor standards for 2LP in reading, writing and maths, but below the 3LP measures, will get a score of +8. Schools with smaller cohorts, and high percentages of SEN and low attaining pupils, will get lower scores despite pupils making good progress in real terms. Incorporating VA and carrying out a comparison of FFT attainment and progress ranking at both KS1 and KS2 would help to even things out.

7) It doesn't take account of phonics

This surprises me. It's the only test carried out during KS1 and judging by the pages devoted to it in RAISE, evidently the DfE and Ofsted take it seriously. Also, it is quite common to see low phonics results sandwiched between high attainment at EYFS and KS1.

8) It doesn't take account of gaps

Considering the importance of the closing the gap agenda, this seems like a glaring omission. There are plenty of relatively low attaining schools where FSM pupils make fantastic progress, and there are plenty of high attaining schools where FSM pupils make relatively poor progress. This should be recognised.

9) The thresholds are arbitrary

Why is -4 amber and -5 red? Is it statistically relevant? I'm interested.

10) "You're red!" doesn't seem like the ideal start point for a conversation

Just sayin'.

A little more information, a little less action

I once attempted to create a risk rating tool for an LA. It took account of cohort sizes and percentages of SEN, FSM and EAL, alongside at least 12 other key performance measures including VA. Schools were ranked on the various key measures and those with the highest average rank scores were supposedly the most at risk. It was an interesting academic exercise but I became increasingly uneasy with it because I only had to make a few minor tweaks and the top 10 most 'at risk' schools could completely change. In the end I realised that the only effective way to evaluate school performance was to go through every single RAISE report and FFT dashboard for every school, and, most importantly, combine that with our own intelligence about those schools from recent visits (e.g. new Headteacher, changes in demographics, addition of new SEN resource, increase in EAL pupils etc) to arrive at a qualified judgement on that school's position. And that's what we did. Very, very hard work during the autumn term when the data came out but it is the best way to fully understand the subtleties of school performance. No shortcuts.

As mentioned above, the Schools Causing Concern guidance concerns me because it gives LAs the green light to develop flimsy, oversimplified methods for identifying supposedly weak and vulnerable schools, such as the one presented here. We therefore end up with a model that takes no account of context and is biased in favour of higher attaining schools, rewarding those that are above average even when the difference is not significant in any way. It certainly does no favours for those schools with lower ability intakes. So, I do not agree with Ofsted's assertion that the 'tool' 'places the local authority in a stronger position to identify schools at risk of deteriorating early enough to prevent their decline'. It's way too basic and retrospective for such grand statements.

This probably makes me a marked man in LAx territory now, and I may have to work under a pseudonym in future, but if it causes LAs to reconsider their approaches then it's been worth it.

And until it changes, my RAG rating for this particular methodology is obviously...


Sunday, 15 November 2015

The Value Added Tax

Remember the community charge, or the poll tax as it became better known? The supposed merit of the system was that it was simple, easy to understand and straightforward to apply. But like many things that are simple it was flawed. It was also extremely unfair. Under the poll tax, everyone would pay the same and so the poorest would suffer most.

I spend a lot of my time pulling VA apart. It's a bit of an obsession. Every year I produce a VA calculator, which schools can download here, and in December last year I wrote this blog to explain how VA is likely to work next year once levels have gone (hint: VA was never really anything to do with levels anyway). I assumed that it was business as usual, that a pupil's attainment at KS2 would be compared against the average attainment for pupils nationally with the same KS1 start point. Well, it appears I was almost right. Almost, but not quite.

Last week Michael Tidd released this rather excellent video (not that I'm jealous or anything) and accompanying blog. He'd already run the video past me a couple of weeks back and I stated that I didn't think it was quite right, that the proposed methodology had been oversimplified. The big difference between the method described in the video and that which I had assumed from previous RAISE methodology, was that the video (and other guidance doing the rounds) suggested that pupils' KS2 estimates would be based on overall KS1 APS, whilst I assumed it would be based on the precise KS1 profile (i.e. KS2 attainment of 2B, 2B, 2B pupils would only be compared against other 2B, 2B, 2B pupils). Michael sought confirmation from the DfE and it turns out I may have been wrong: from 2016, KS2 VA estimates will be based on overall KS1 APS, and this bothers me. 

Up to and including this year, VA estimates for KS2 (the line pupils need to cross to achieve a positive VA score) were calculated from the KS1 APS but the pupil's relative differences in maths and reading at KS1 were taken into account. This meant that pupils with the same KS1 APS but with different KS1 profiles had different - sometimes very different - estimates for KS2. The following table, taken from my VA calculator, shows these differences for pupils who all had a KS1 APS of 15 but had different KS1 attainment profiles. 

The estimates range from 24.3 points for the admittedly unlikely pupil who was L3 in reading and writing and W in maths, up to 30.5 points for the L1, 2B, L3 pupil. There is a 2 point difference in estimates between pupils with more likely KS1 profiles.

So, that's how it works this year. But next year, under the proposed new model, all these pupils will have the same scaled score estimates. Regardless of differences in attainment at KS1, they will have the same expectations for KS2 as long as the KS1 APS is the same. They will all have to cross the same line.
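The point is easy to demonstrate with the standard KS1 point scores (W=3, L1=9, 2C=13, 2B=15, 2A=17, L3=21). A quick sketch showing how three very different attainment profiles collapse to the same APS of 15, and therefore to the same estimate under the proposed model:

```python
# Standard KS1 point scores for each teacher assessment level
POINTS = {"W": 3, "L1": 9, "2C": 13, "2B": 15, "2A": 17, "L3": 21}

def ks1_aps(reading, writing, maths):
    """KS1 average point score: the mean of the three subject point scores."""
    return (POINTS[reading] + POINTS[writing] + POINTS[maths]) / 3

# Three very different profiles, one APS -- so one shared KS2 estimate
# under the proposed 2016 model, despite very different subject strengths
profiles = [("2B", "2B", "2B"), ("L1", "2B", "L3"), ("L3", "L3", "W")]
print([ks1_aps(*p) for p in profiles])  # [15.0, 15.0, 15.0]
```

Under the current profile-based method those three pupils get three different estimates; under an APS-only method they are indistinguishable.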

Simple, easy to understand and easy to administer.

But is it fair?

Monday, 12 October 2015

What is sufficient progress?

In the crazy world of competitive shin kicking, combatants apparently shout "Sufficient!" when they've had enough. I know how they feel. This whole issue of defining 'sufficient progress' feels like being kicked in the shins repeatedly. So, I thought I'd try to explain what it is and why attempting to set end of key stage targets right now is a fairly futile process.

The Key Stage 2 Assessment & Reporting Arrangements for 2016 were published last week and the key changes section contained this brief explanation of the new progress measures:

2.9 Progress Measures

Progress measures in 2016 will work in a similar way to current primary value-added measures or Progress 8 in secondary schools. A school’s score will be calculated by comparing their pupils’ KS2 results against those of all pupils nationally who had similar starting points.

Pupils will be assigned to prior attainment groups based on their KS1 results.

The department will confirm what score a school would need to get to have made ‘sufficient progress’ after the tests have been sat next summer.

More detailed guidance on how the new measures will be constructed is expected to be published early in 2016.

After years of expected and better than expected levels of progress measures, this seems new and daunting, but it is exactly the same method used in the VA measures of RAISE and FFT reports for years. Essentially, it involves comparing a pupil's KS2 attainment against the national average attainment for pupils in the same cohort with the same start point (this is known as an estimate or benchmark). So, for example, we compare the KS2 scaled score of a pupil that was 2c, L1, 2c at KS1 against the national average scaled score for pupils that were 2c, L1, 2c at KS1. I produced this hypothetical example to illustrate this:

In this example, a pupil has fallen short of the expected standard but has made 'sufficient' progress and achieves a positive VA score because their scaled score is higher than the national average result for pupils with the same prior attainment. Conversely, it is possible for a pupil to achieve the expected standard but not make 'sufficient' progress because nationally pupils with the same prior attainment achieved a higher score on average. The differences between pupils' actual and estimated results are then averaged for the whole cohort to arrive at a school VA score. It is most likely that sufficient progress will be a negative threshold and perhaps based on percentile ranking so that we don't end up with 50% of schools below floor. 
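Here's a minimal sketch of that calculation with invented numbers. The prior attainment groupings and every score below are hypothetical; the mechanics (group nationally, average, compare each pupil, average the differences) are the point.

```python
from statistics import mean

# Hypothetical national results: prior attainment group -> KS2 scaled scores
# (all values invented for illustration)
national = {
    "low":    [92, 95, 96, 97],
    "middle": [99, 100, 101, 104],
    "high":   [105, 107, 108, 112],
}
# The estimate for each group is the national average score for that group
estimates = {group: mean(scores) for group, scores in national.items()}

# One school's pupils: (prior attainment group, actual scaled score).
# The 'low' pupil is below the expected standard of 100 but beats the
# estimate (positive VA); the 'high' pupil is above 100 but falls short.
pupils = [("low", 98), ("middle", 103), ("high", 107)]

pupil_va = [actual - estimates[group] for group, actual in pupils]
school_va = mean(pupil_va)
print(estimates, school_va)
```

Note that none of the estimates can be known until all the national results are in, which is exactly why the 'sufficient progress' line cannot be drawn in advance.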

The key thing here, with regard to our attempts to second guess what constitutes sufficient progress, is that pupils' individual benchmarks are calculated retrospectively. In other words, because a pupil is compared against the average attainment of pupils nationally with the same start points in the same year, we have to wait until all the 2016 results are in before we know the line they have to cross. This means we are stumbling around in the dark right now and any attempt to set targets is like shooting into the night whilst wearing a blindfold. At best it's distracting; at worst it's a danger to pupils' learning. Even FFT - and those guys know a thing or two about target setting - are being cautious this year by providing broad estimates in the form of ARE bands. But that's not stopping some people from having a crack at it.

Things will improve in 2017 once we have some actual data in the bank, but considering more and more pupils will reach the expected standard each year, and that average scores will increase, any estimate derived from the previous year's data is likely to be too low. Right now though we have absolutely no idea what a likely outcome will be from any particular start point because no one has sat the tests. We certainly shouldn't be applying some spurious methodology like adding two whole levels of progress to the KS1 result and then attempting to convert the outcome to a scaled score. This is failing to understand the difference between VA and expected progress. It is a fact that many pupils in primary schools made so-called expected progress but fell short of their VA estimate (the opposite was common in secondary schools), and we could unwittingly repeat this through an ill-conceived approach. And, whilst the DfE claim that the new expected standard is equivalent to a 4b, we know this is a very broad approximation and performing a conversion to scaled scores on that basis is likely to be inaccurate and misleading. 

The concerns are twofold: 1) that schools will attempt to teach to the test, and 2) that schools will be held to account for targets set on flawed methodology. My fear is that schools, having set scaled score targets for pupils based on a 'sufficient progress' model, will then test pupils to see how close to these targets the pupils are. The sample tests don't have a raw score to scaled score conversion so they might attempt to do it themselves using the rough criteria for meeting the expected standard contained in the frameworks. Highly dubious. Alternatively they might use a commercial standardised test, which produces scores in a similar format. Again, this is very risky. Schools must understand that a score of 100 in these tests indicates the average level for pupils taking that particular test, and therefore cannot be linked to the key stage 2 expected standard. It might be that pupils find that particular test hard, so 100 will be linked to a low raw score. Or they might find it easier, so 100 will be linked to a higher raw score. No matter the difficulty of the test, around half of pupils will be below 100 and the other half will be above. The expected standard, on the other hand, will always be around the same level of difficulty and the DfE want to move towards 85% of pupils achieving it (the floor standard being kept at 65% in 2016). This means that 100 is not the average and is therefore a different entity to a score of 100 in a commercial test.

So, take note when trying to set targets for 2016. It's a pointless exercise. The data will have no validity - it's a stab in the dark - and could even be dangerous. The best advice, for this year at least, is to block out the noise and concentrate on the teaching. Then the results will hopefully take care of themselves.

Tuesday, 6 October 2015

The Grim Fairy Tale of Teacher Assessment

It occurred to me recently that if statutory teacher assessment were a fairy tale it would be Cinderella. Loved and cherished in the Early Years, its future has become ever more uncertain, its status steadily undermined. Soothed by masters who, publicly at least, pay lip service to its value, teacher assessment is becoming increasingly sidelined, abused, neglected and ignored. One wonders if ultimately it will be ditched altogether. These are dark days indeed for teacher assessment.

Early Years: the warning signs
The foundation stage profile as a statutory entity has a year to go and the government has introduced a baseline assessment for future progress measures, which many schools have already carried out this term. A choice was offered and schools voted overwhelmingly for Early Excellence because it fitted the ethos of the EYFS: teacher assessment based purely on observation. But many schools opted for NFER or CEM (having possibly opted for one of the others before they were removed from the list) so we have a fragmented approach, and I question if this is the DfE's ideal situation. On the one hand, they want to offer schools a choice (and allow the market to decide the best approach) but deep down would they have preferred a single test that provides a standardised baseline? I'm intrigued to see what happens when the first cohort of children get to the end of key stage 1. What if analysis shows that there's no apparent relationship between the baseline assessment and key stage 1 results? This would surely mean that the baseline is a poor predictor of outcome, a critical stipulation of the consultation (see p7). What then? Could they be ditched and replaced? I guess we'll find out in 3 years' time.

Key Stage 1: The abuse begins 
Remember the performance descriptors? Below national standard, working towards national standard, working at national standard, and mastery. There was a consultation, there was opposition, there was silence for months, and then there were three - working towards, working within, and working at greater depth within the expected standard - contained in a sparse 11 page document. All that time to come up with those? Is it an improvement? OK, there was an apology, but there is also the intriguing use of the word 'interim'. These teacher assessments are interim and they are for 2016 only. What happens after that? And what happened to 'below'? What of those pupils that do not meet the criteria of 'working towards the expected standard'? How do we assess them?

If that isn't worrying enough, the DfE's response to the consultation on primary school assessment and accountability states that 'at the end of key stage 1, teacher assessment in mathematics and reading will be informed by externally-set, internally-marked tests. There will also be an externally-set test in grammar, punctuation and spelling which will help to inform the teacher assessment of writing.' Informed. Is this a polite way of saying validated? Or straitjacketed? Does this mean that the teacher assessment can only be X if the test score is Y? 

I read Daisy Christodoulou's blog in support of tests recently. It contained the following interesting point:

'Similarly, one way we could ensure greater equity in the early years is to introduce exams at KS1, rather than teacher assessments, since we have some evidence that teacher assessments at this age are biased against pupils from a low-income background – but again, if you suggest replacing teacher assessments with tests, you generally do not get a great response.'

Fair point. And she's right of course, it wouldn't get a great response but one wonders how much support such a position gets behind closed doors. And come to think of it, isn't that what's actually happening anyway? Next year's key stage 1 tests will provide scaled scores linked to an expected standard; and these scaled scores will 'inform' the relevant teacher assessment of which there will be one of just three possible outcomes. It sounds like the tests are winning to me.

Key Stage 2: Willful neglect

Now things get worse for dear teacher assessment. Reading, maths and science are reduced to simple binary outcomes. Are they working at the expected standard? Yes or no. Writing has been reduced from five possible outcomes (same as the originally proposed key stage 1 performance descriptors with 'above' shoehorned in between 'meeting the expected standard' and 'mastery') down to three as per key stage 1. Just like key stage 1 there is no teacher assessment for GPS, but unlike at key stage 1, the result cannot be used to 'inform' the teacher assessment because the tests are externally marked, which probably explains why for all subjects other than writing, the teacher assessment is binary. The yes or no response. In what way is this useful to anyone?

The STA's timetable of progress measures states that in academic years 2019/20 and 2020/21, progress will be measured from 'new' KS1 teacher assessment to 'new' KS2 test and teacher assessment outcomes. Call me a cynic but I don't believe it. I don't believe anyone would choose to use the vague key stage 1 teacher assessments as a baseline when a scaled score is available. And while we're on the subject, I wouldn't be at all surprised if the GPS test score usurps the writing teacher assessment at the key stage 2 end of the measure. For VA you want data to be as fine as possible, and a 3 tier teacher assessment hardly cuts it. At key stage 2, teacher assessment is looking seriously endangered.

Key Stage 3/4: Locked in the attic

Baselines for the new Progress 8 measure at key stage 4 will involve a decimal level derived from pupils' English and maths results at key stage 2, where English is a combination of reading test result and writing teacher assessment. From 2017 onwards, once the last cohort with overall key stage 2 English levels has left, only reading and mathematics test results will be used in calculating key stage 2 prior attainment fine levels for use in Progress 8. Writing will not feature. For those pupils missing key stage 2 test results, the teacher assessment will only be used in certain circumstances. In most cases where a pupil is missing one result, the teacher assessment will not replace it. Instead, the pupil's baseline will involve the one test result that is present. Here the importance of the key stage 2 teacher assessment has not so much been undermined as completely demolished.
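As a sketch of that baseline rule (this is my own reading of the guidance, not official code): take the mean of the reading and maths fine levels, fall back to whichever one is present, and never substitute the teacher assessment.

```python
def ks2_baseline(reading_fine_level, maths_fine_level):
    """From 2017: the Progress 8 baseline is the mean of the reading and
    maths test fine levels; writing TA does not feature. Where one result
    is missing, the one present is used on its own (the teacher assessment
    does not replace the missing test result)."""
    results = [x for x in (reading_fine_level, maths_fine_level) if x is not None]
    if not results:
        return None  # no test results at all: no baseline from this rule
    return sum(results) / len(results)

print(ks2_baseline(4.5, 5.1), ks2_baseline(None, 5.0))
```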

The End

I think this is as far as the Cinderella analogy goes. I do try to be optimistic but I can't see any cause to be so here. From the potential mess of the fragmented reception baseline to the near total exclusion of teacher assessment from progress 8 baselines, and the 'interim' frameworks in between, the future looks bleak. For all the positive noises about the importance of professional judgement, teacher assessment at all key stages has been progressively marginalised to the point it is a shadow of its former self, and I'm not sure this particular fairy tale will have a happy ending.

Wednesday, 9 September 2015

The house that moved: my meeting with Ofsted.

I had some weird dreams last week. One involved a cottage that slipped its anchor and rolled down a hill, crashing into a farmyard. I emerged shaken but unscathed. The other involved a dystopian, Running Man-style game staged in a dimly lit branch of B&Q. Cornered and frightened, I thankfully awoke before my inevitable, grisly demise. I never got the flexible tap connector I went in for.

And for all this mental anguish I blame Ofsted. Jack Marwood had invited me along to Ofsted Towers to meet with Sean Harford and others to discuss data and assessment and tracking and stuff, and I was rather anxious. Turns out I really didn't need to be. Sometimes you pump yourself up for an argument only to discover that the assumed adversary agrees with much of what you have to say. Rather deflating really. All that time spent rehearsing in front of the mirror...

I'm kidding.

So, last Friday (4th September) I met with Jack Marwood (@Jack_Marwood), Steve Wren (@Yorkshire_Steve) and Peter Atherton (@DataEducator) in London for our date with destiny. Unfortunately Destiny couldn't make it so...

Rubbish joke. I apologise. Back to the meeting. I went with two main points to discuss:

1) The Ofsted view of assessment and tracking systems

2) The misinterpretation of statistical significance indicators

I want to deal with point 1 here because a) it's probably the area that schools are most concerned about, and b) opinions voiced in the room on the subject of statistical significance indicators in RAISE, whilst extremely encouraging, were simply that: opinions. We have some way to go on that one.

Now, anyone who has read my blog will know my opinions on the various new and popular approaches to assessment without levels. I am really concerned that schools are replacing levels with systems that are simply levels by another name; systems that place pupils into best-fit bands; that involve point scores and expected rates of progress built upon assumptions of linear progress. In short, they are flawed and risk repeating many of the mistakes that led to the removal of levels in the first place. A key issue is that many of these systems are not designed to show progress during periods of consolidation and deepening of learning. They therefore have the potential to cause pupils to be moved on before they are ready in order to show progress. But schools want these systems because a) they provide a security blanket by offering the comfort of the familiar, and b) they believe that this is what Ofsted want.

"We have to measure progress, right?"

But Ofsted have been at pains to bust these myths. The handbook now states:

Ofsted does not expect performance and pupil-tracking information to be presented in a particular format. Such information should be provided to inspectors in the format that the school would ordinarily use to monitor the progress of pupils in that school (Ofsted Handbook, p12). 

Sean Harford went further by saying:

"what we want to see is information, not data. I don't care if this involves numbers or not."

This is a welcome and radical departure from what most of us assume is required for inspection, i.e. that progress and gaps must be quantified in some way using a points-based system. Essentially, what Ofsted are expecting to see is useful and meaningful assessment of pupils' learning; information that helps teachers identify strengths and weaknesses, gaps and next steps; information that can be understood by pupils and parents as well as staff. Formative assessment. Assessment for learning. This may or may not involve numbers, and if learning is represented by numbers then those numbers must mean something. Data must be an aid to learning.

We talked about target setting. Sean hoped that schools would no longer be setting targets via thresholds, e.g. "this term we have X% at 3B. Next term we expect that to increase to Y%".

I think I got a bit animated at this point. 'But that's exactly what is happening' I cried, tears of frustration falling onto my Pukka Pad, 'except schools are now saying "this term we have X% at Secure. Next term we expect that to increase to Y%"'.

Sean responded: "if that's what schools think we want to see then we have a problem."

I just about resisted leaping into the air, shouting "BACK OF THE NET!" and doing airplanes round the room with my shirt pulled up over my head.

Perceptions of Ofsted have changed a lot in the past couple of years and Sean Harford (and his predecessor Mike Cladingbowl) must take much of the credit for this by providing a public face; by listening and responding; by agreeing to hold meetings such as this. Consequently, headteachers have fewer concerns about Ofsted as an organisation; instead they now fear the 'rogue inspector'. The new Ofsted Handbook along with the included myth-busting statements have gone a long way to relieve some of the anxiety but still teachers will be heard to utter "well, they say that but...". There is evidently some cynicism about what they say and what they do actually matching. And this is why so many schools are implementing systems that carry on tracking in the same old way, just in case Ofsted want points and bands and thresholds. In case they still want levels. I am now convinced that they don't but it's up to Sean and his team to ensure that this message gets across to all inspectors.

I have faith but ultimately, the proof will be in the pudding.

After all, Ofsted is as Ofsted does. 

Friday, 31 July 2015

Assessment Commission report: my top 5 points

The cat is out of the bag. This week the frustratingly overdue report from the Assessment Commission into assessment without levels was leaked via Warwick Mansell and it nearly broke Twitter, such was the excitement over its contents. Keen to lead the charge, I skimmed through it and tweeted some key points. But Harry Fletcher-Wood was already on the case. He carried out a more thorough dissection and hit us with a barrage of tweets followed by a ridiculously quickly written and excellent summary. He's already covered all the key aspects, which just leaves me to count down my top 5 points from the report and enjoy the warm glow of satisfaction derived from the knowledge that many of the Commission's recommendations match what I've been tweeting, blogging and banging on about for the past year.

So, here are my top 5:

5) "Levels also used a ‘best fit’ model, which meant that a pupil could have serious gaps in their knowledge and understanding, but still be placed within the level." (p8)

Yet many if not most schools are implementing systems that place pupils into best-fit bands, which have little to do with teaching and learning and everything to do with accountability. Yeah, I'm looking at you, Emerging, Developing, Secure. It's time to take an honest, objective look at these systems and ask the question: "Is this really assessment without levels?"

4) "The word mastery is increasingly appearing in assessment systems and in discussions about assessment. Unfortunately, it is used in a number of different ways and there is a risk of confusion if it is not clear which meaning is intended." (p11).

Call me old fashioned but I reckon it probably is best to work out what mastery means before we attempt to assess it.

3) "Progress became synonymous with moving on to the next level, but progress can involve developing deeper or wider understanding, not just moving on to work of greater difficulty. Sometimes progress is simply about consolidation." (p7).

Just that: sometimes progress is simply about consolidation. Progress is neither a race nor is it linear, and we need to stop devising systems that treat it as such. 

2)" The starting point of any assessment policy should be the school’s principles of assessment." (p20)

It does not start with the tracking system!

1) "More frequent collection of assessment data may not only be a waste of time, but could actually be damaging if actions are taken based on spurious or unreliable interpretations. It could also encourage a rapid-but-superficial approach to learning." (p26).

Yes! We need assessment for learning, not assessment of learning. If we adopt systems of assessment that involve the collection of data every few weeks, we'll continue to repeat the mistakes of the past, whereby a) teachers may be tempted to fabricate data in order to 'prove' progress, and b) pupils may be pushed on before consolidating their knowledge. Ultimately no one wins. Maybe, just maybe, progress measures themselves are at the heart of the problem.

So, those are the key points I've taken from the report. I really recommend you read it, digest it, and look at your own systems through the prism of its guidance. Hopefully by this time next year we'll actually start assessing without levels.

Happy holidays!

Friday, 24 July 2015

The Progress Paradox

There is a radical concept in urban design known as shared space. It involves the removal of kerbs, street furniture, and painted lines in order to blur the boundaries between traffic and pedestrians. The idea is that if you merge the various zones of use in the urban environment - pavements, cycle lanes and roads - people become more aware of other users and more conscientious towards their fellow citizens as a result. And it works! Removing all the features that are designed to keep us safe actually makes us safer.

I promise there is a point to this and I'll get back to it later.

I have blogged before about the highly dubious and misguided approaches we take to measuring progress. That we seek to distill learning down to a baseless numerical value not for the benefit of teaching and learning - for teachers and pupils - but for the purposes of accountability and performance management. Levels - perhaps once fit for purpose - were hacked up into an ill-defined system of sublevels and points, and bundled into neat packages of 'expected' progress in order to quantify learning and satisfy the demands of external agencies.

Points became the currency of scrutiny.

And so, these measures are now part of the common language of assessment and are now so integral to the daily running of a school that it is hard to imagine a world without them. They have come to define the contours of learning. It is perhaps inevitable that when levels were removed we set about recreating them. We needed the numbers to 'prove' the progress even though we knew deep down that the numbers meant nothing. The cage was opened but we quietly closed the door and stayed put.

But we have to measure progress, right? Surely we need to quantify it in some way?

Don't we?

One of the key reasons for the removal of levels was that they often caused pupils to be rushed through content before they were ready. Pupils who were deemed to be 'broadly level 4' therefore reached the end of the key stage with significant gaps in their learning.

But if that was a key issue with levels, isn't it a problem with any progress measure? If we are driven by steps, bands and points, then isn't there a big temptation to tick the box and move the pupil on? Aren't we just chasing meaningless numbers? Has anything really changed?

This brings me back to the concept of shared space. Perhaps if we removed all the points and expected rates of progress - the street furniture of assessment - we would concentrate more on the learning; on identifying pupils' weaknesses and addressing the gaps. Assessment would then be returned to its proper state: about what is right for the child, not what is right for the bottom line; and ultimately both the child and the school would benefit.

So, maybe progress measures are a distraction and if we concentrate on embedding learning - on consolidation, cognition, gaps, and next steps - then the progress will take care of itself. Perhaps, ironically, pupils would make better progress in a world without progress measures, where teachers are not chained to expected rates linked to linear scales that tempt them to push pupils on before they are ready. We must avoid repeating past mistakes, shoehorning pupils into 'best-fit' bands and expecting uniform progression through the curriculum. Instead let's focus on the detail - track the objectives that the pupil has achieved and assess their depth of understanding. The progress will be evident in pupils' work and we don't need arbitrary numbers to tell us that.
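One possible shape for that kind of objective-level tracking - purely illustrative, with invented objectives and depth labels - is simply a record, per pupil, of which objectives have been met and at what depth:

```python
# A purely illustrative record of one pupil's objectives (names and depth
# labels invented): what has been achieved, what has been deepened, and -
# crucially - what the gaps are. No point scores in sight.

pupil_record = {
    "fractions: compare and order":         "deepening",
    "fractions: add with same denominator": "achieved",
    "fractions: recognise equivalents":     "not yet",
}

def gaps(record):
    """Objectives still to be secured - the next steps for teaching."""
    return [obj for obj, depth in record.items() if depth == "not yet"]

print(gaps(pupil_record))  # the teaching priorities, not a number
```

The output is a list of things to teach next, which is rather more use to a teacher than a decimal.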

Essentially it all comes down to one irrefutable truth:

If you teach them, they will learn.

Thursday, 16 July 2015

The Gift

We're all knackered. You've all been teaching forever and I've visited approximately 1000 schools a week since I became self-employed last November. What I want to do right now is talk to my family, watch The Big Bang Theory, drink some beer and then sod off to France in a couple of weeks to go climbing. The last thing I wanted to do this evening was write a blog.

But then the DfE published this research into the reception baseline.

I skipped the first document (55 pages), speed read the next one, and wasn't going to bother with the third. It basically sounded like one of those police officers at the scene of an accident: "nothing to see here. Move along." But I thought I should make the effort. It's only 12 pages long after all. 

And I'm very glad I did. In amongst the flannel and whitewash was this:

The research noted the difference between the scores of the two groups - the teaching & learning group and the accountability group - with the latter having lower scores, suggesting that perhaps when tests are administered for the purposes of establishing a baseline for measuring progress (i.e. for accountability reasons), lower scores are given.

Then they appear to have let their guard down.

Read paragraph 3 in the screenshot above:

"The overall result would be statistically significant at the 95% level if the data were from an independent random sample."

Hang on! What?

Is the data significant? Or isn't it?

It would appear that the use of a 95% confidence interval is not appropriate in this case because the data are not from an independent random sample. The result would be significant at the 95% level, but that test cannot be used due to the nature of the sample. Quite rightly, they employ a more appropriate test.

But significance tests in RAISE are carried out using a 95% confidence interval. Either this means that cohorts of pupils are independent random samples or the wrong test is used in RAISE.
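For illustration, the kind of test in question looks something like this - a standard one-sample z-test with invented numbers, not the actual RAISE methodology:

```python
# A minimal sketch of the kind of significance test being discussed:
# is a cohort's mean 'significantly' different from the national mean at
# the 95% level? The 1.96 critical value is only valid if the cohort is
# an independent random sample from the national population - which, as
# argued above, no school cohort is. All numbers are invented.

import math

def z_test(cohort_scores, national_mean, national_sd):
    n = len(cohort_scores)
    cohort_mean = sum(cohort_scores) / n
    standard_error = national_sd / math.sqrt(n)
    z = (cohort_mean - national_mean) / standard_error
    return z, abs(z) > 1.96  # 'significant' at 95%, *if* the sampling assumption holds

scores = [29, 31, 26, 33, 30, 28, 34, 27, 32, 30]  # one invented cohort
z, flagged = z_test(scores, national_mean=28.0, national_sd=4.0)
print(round(z, 2), flagged)
```

The mechanics are trivial; the problem is the assumption baked into that 1.96, which is precisely the assumption the DfE's own research declined to make.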

This is something that Jack Marwood, myself and others have been trying to get across for a while - that there isn't a cohort of pupils in England (or maybe anywhere for that matter) that can be considered to be an independent random sample.

Not one.

So if the DfE decides to use a different test for significance in this research on the grounds that the samples are not independent and random, then shouldn't they do the same in RAISE?

Until cohorts of children are truly independent, random samples, does this mean we can discount every blue (and green) box in our RAISE reports?

Well, perhaps not - that would be rather foolhardy. In an email exchange with Dave Thomson of FFT today, he stated that the tests used in RAISE are useful in that they indicate where there is a large deviation from the national mean, and that significant data should be treated as the starting point for a conversation. He did then point out that no cause can be inferred; that statistical significance is not evidence of 'school effects'; and that it should not be treated as a judgement.

So, there is some disagreement over the significance of the sentence (pun intended), but I'm still left wondering why a test that is not appropriate here is deemed appropriate for other data that are neither random nor independent.

That sentence may not change everything as I rather excitedly claimed last night, but it does pose big questions about the validity of the tests used in RAISE. This reads like an admission that statistical significance tests applied to groups of pupils are flawed and should be treated with extreme caution. Considering how much faith and importance is invested in the results of these tests by those that use them to judge school performance, perhaps we need to have a proper conversation about their use and appropriateness. It is certainly imperative that users understand the limitations of these data.

So, thank you DfE, in one sentence you've helped vindicate my concerns about the application of statistical significance tests in RAISEonline. An unexpected end of year gift. 

Have a great summer!

Saturday, 27 June 2015

Running to stand still

Yesterday I re-read this from @edudatalab and, following an enlightening discussion with @meenaparam, I took the red pill and discovered that the VA rabbit hole goes deeper than I previously thought. 

Much is made of the issue of progress in junior schools and their correspondingly poor Ofsted outcomes. I've tweeted about the problem numerous times and have written a blog post about it, comparing estimates derived from CATS against VA estimates based on the KS1 results. The differences can be enormous with far higher expectations for KS2 attainment when plotted from KS1 - the gap between the CATS and VA estimates in junior schools is around 3 points on average, with the former being the more accurate predictor. 

Inevitably the finger of blame points squarely at the infant school, and in some cases this may be justified. I've worked with a number of junior schools where the large proportion of supposedly high-ability pupils is completely at odds with both the school's own assessment of pupils on entry and the context of the area. However, as the Education Datalab article points out, it may not be as simple as this. Is the issue of poor progress in junior schools really about over-inflation of results in the infant school? Or is the cause more complicated and less direct than that?

Could it be that the issue of poor progress in junior schools actually relates to the depression of KS1 results in primary schools?


To get your head round this you need to understand how VA works.

VA involves the comparison of a pupil's attainment against the national average outcome for pupils with the same start point. 

Now, what happens if primary schools drop their KS1 assessments by a sublevel, so that, for example, 2As become 2Bs? If, on average, all those pupils go on to get a 5C, then a 5C appears to be the national average outcome for a 2B pupil when in actual fact it's the national average outcome for a 2A. The benchmark for a 2B pupil therefore becomes a 5C.

The implications of this for a junior school are huge. It is of course highly unlikely that the infant school would depress their results, so even without any grade inflation the junior school is in a tricky position. The benchmark for their 2B pupils is a 5C because that is apparently what is happening nationally. Unfortunately for the junior school, their 2B pupils are real 2B pupils, not bumped-down 2As.

If we add in any grade inflation by the infant school then the problem is exacerbated even further. The wholesale depression of baselines by primary schools results in unrealistic expectations for schools whose KS1 data are accurate, and any inflation of results at KS1 pushes the expectation still further out of reach. These direct and indirect factors explain why so many junior schools' RAISE reports have a green half (attainment) and a blue half (progress). Essentially, pupils in junior schools have to make an extra 2 points of progress to make up for the depression of KS1 results by primary schools nationally, and possibly a further 2 points to account for any grade inflation in the infant school. That's 4 extra points of progress just to break even.
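The arithmetic can be sketched with a toy example. I'm using the old QCA point scores here (KS1 2B = 15 points, 2A = 17; KS2 4A = 29, 5C = 31) and a deliberately extreme, invented national picture:

```python
# Toy illustration of how depressed KS1 baselines shift the VA benchmark.
# VA compares each pupil's KS2 outcome with the national average outcome
# for pupils with the same recorded KS1 starting point. Old QCA point
# scores used (2B = 15, 2A = 17, 4A = 29, 5C = 31); all data invented.

from collections import defaultdict

def build_benchmarks(national_pupils):
    """National average KS2 points for each recorded KS1 starting point."""
    outcomes = defaultdict(list)
    for ks1_points, ks2_points in national_pupils:
        outcomes[ks1_points].append(ks2_points)
    return {ks1: sum(v) / len(v) for ks1, v in outcomes.items()}

# Nationally, genuine 2A pupils (17 pts) average a 5C (31 pts) at KS2,
# but suppose their KS1 results were recorded a sublevel lower, as 2B.
national = [(15, 31)] * 100
benchmarks = build_benchmarks(national)  # a 2B now 'expects' a 5C

# A junior school's genuine 2B pupil reaching 4A (29 pts) - solid real
# progress - still shows negative VA against that inflated benchmark.
va = 29 - benchmarks[15]
print(va)  # negative: running to stand still
```

In reality the national data is a mix of genuine and depressed baselines, so the effect is diluted, but the direction of travel is the same: the benchmark moves against the school whose data is honest.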

Running to stand still.

Unfortunately, the only way to solve this problem is to have a universally administered baseline test.

Watch this space.

Wednesday, 17 June 2015

Tracked by the Insecurity Services

Last night @LizzieP131 tweeted this:

Which was followed by this:

In the past week I've been told by headteachers using one particular system that their pupils need to achieve 70% of the objectives to be classified as 'secure', whilst another tracking system defines secure as having achieved 67% of the objectives (two thirds). The person who informed us of this was critical of schools choosing to adjust this up to 90%, and I'm thinking "hang on! Surely 90% is more logical than 67%".

And then this comes in from @RAS1975:



Achieving half the objectives makes you secure? 

It's like a race to the bottom.

So, secure can be anything from 51% upwards. And mastery starts at 81%.
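To see how arbitrary this is, here's a sketch of the best-fit banding logic these systems use. The thresholds are illustrative, not taken from any particular product:

```python
# Sketch of the best-fit banding described above. The thresholds are
# illustrative - as noted, real systems put 'secure' anywhere from 51%
# to 70% - and the label says nothing about WHICH objectives are missing.

def band(achieved, total, secure_threshold=0.67, mastery_threshold=0.81):
    proportion = achieved / total
    if proportion >= mastery_threshold:
        return "mastery"
    if proportion >= secure_threshold:
        return "secure"
    return "developing"

# The same pupil - 27 of 40 objectives achieved - lands in different
# bands depending on which system the school happened to buy:
print(band(27, 40))                          # 67.5%: 'secure' here...
print(band(27, 40, secure_threshold=0.70))   # ...but 'developing' here
```

Same pupil, same gaps in learning, different label. The label is a property of the product, not the child.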

I'm sorry, but how the hell can a pupil be deemed secure with huge gaps in their learning? And how can a pupil have achieved 'mastery' (whatever that means) when they have only achieved four fifths of the key objectives for the year?

It makes no sense at all.

This is what happens when we insist on shoehorning pupils into best-fit categories based on arbitrary thresholds: it's meaningless, it doesn't work and it's not even necessary. 

It's also potentially detrimental to a pupil's learning. Just imagine what could happen if we persist in categorising pupils as secure when they have yet to achieve a third of the year's objectives.

Ensuring that pupils are not moved on with gaps in their learning is central to the ethos of this new curriculum. Depth beats pace; learning must be embedded, broadened and consolidated. How does this ethos fit with systems that award the label 'secure' despite large gaps in pupils' knowledge and skills?

The more I look at current approaches to assessment without levels, the more frustrated and disillusioned I become. System after system is recreating levels, and we have to watch it happen. They may call them steps or bands, but they are levels by another name, repeating the mistakes of the past. Pupils are being assigned to best-fit categories that tell us nothing about what they can and cannot do, and they risk being moved on once deemed secure despite gaps in their learning. This is one of the key reasons for getting rid of levels in the first place.

So, take a good look at your system. Look beyond all the bells and whistles, the gizmos and gloss, and ask yourself this: does it really work?

And please, please, please, whatever you do, make sure you....