
Measuring what matters: Redefining data’s role in schools

AMARBEER SINGH GILL, TEACHER EDUCATOR, AMBITION INSTITUTE, UK
JENNIFER CURRAN, RESEARCH SCIENTIST, AMBITION INSTITUTE, UK

We [often] question the judgement of experts whenever we seek out a second opinion on a medical diagnosis. Unfortunately, when it comes to our own knowledge and opinions, we often favor feeling right over being right… We need to develop the habit of forming our own second opinions.

Grant, 2021, p. 18

We distinctly remember telling parents/carers, ‘Your child has been working hard this year; it shows because their end-of-term scores have gone up.’ We also remember the guilt at realising that the quantitative measure we were sharing didn’t fully represent the progress and development that the pupil had made across the year, because it couldn’t. It’s something we still sit with today, and it is the reason we felt compelled to write this article. Not because we believe that schools are doing something ‘wrong’, but because the great responsibility that we have as teachers means that we should constantly question our existing practices. We must ensure that we favour being right over feeling right.

In 2018 alone, the Department for Education published its report on tackling onerous and inappropriate data requirements (Teacher Workload Advisory Group, 2018); Professor Becky Allen wrote about issues surrounding ‘progress’ measures (Allen, 2018); and the National Director of Education wrote for Ofsted about how the time invested in marking and data practices wasn’t being reflected in benefits to learning (Harford, 2018). We acknowledge that this article may make for uneasy reading as it attempts to scratch the surface of ideas that are at times unwieldy, often abstract, but always important.

How do we know whether students are getting better?

Let’s return to the statement we shared: ‘Their scores have gone up.’

We wanted to communicate whether students were ‘getting better’: if the numbers are increasing, then surely students are improving? We’d assumed that scores alone could pinpoint how students were doing and whether they were getting better. When we interrogated this, the walls started crumbling.

Consider how the tests were administered: we teach some topics across a term before assessing in exam conditions. If learning is knowledge that has been permanently changed (Bjork and Bjork, 2011), then would we feel confident that students would score similarly if they repeated that test a week/term later (i.e. that their knowledge had been permanently changed)? We know that forgetting is inevitable (Murre and Dros, 2015) and we’ve all experienced students being able to do something well only to struggle after even a small delay. What this means is that end-of-unit/term tests are too short-term a measure to draw valid inferences about learning (Christodoulou, 2016).
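To make this concrete, here is a toy sketch (our illustration, not a model taken from Murre and Dros) of an exponential forgetting curve; the 14-day half-life is invented purely for demonstration:

```python
def retention(days, half_life=14):
    """Toy forgetting curve: fraction of tested material still
    recallable after `days`. The 14-day half-life is invented."""
    return 0.5 ** (days / half_life)

# Test day, one week, half a term, a full term (illustrative gaps):
for days in (0, 7, 42, 84):
    print(f"after {days:3d} days: {retention(days):.0%} retained")
```

Whatever the true rate of forgetting for a given pupil and topic, the shape is the point: a strong score on test day says little about what will remain a term later.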

What about scores/grades from longer-term measures? Grades describe a range of scores, so are inherently imprecise, and they are also arbitrary: cut-off points placed along the normal distribution of student performance (Christodoulou, 2016). What about scores? Let’s again consider how tests are administered: students sit an assessment; it gets marked and a score is produced. A question to ask here is whether a single score accurately reflects ability. Would a student get the same score in a test if they took it in the morning vs the afternoon (Christodoulou, 2016)? Normal variations in mental performance mean that we can’t treat a given score as precise; instead, it will reflect a point within a range that a student would likely achieve under any given circumstances (Allen, 2018).
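A minimal simulation makes the point. Everything here is an assumption for illustration: a stable underlying ability, normally distributed day-to-day variation, and invented grade cut-offs:

```python
import random

random.seed(1)  # reproducible illustration

TRUE_ABILITY = 62   # hypothetical stable attainment, out of 100
DAY_TO_DAY_SD = 5   # assumed spread of normal day-to-day variation

def sit_test(true_ability, sd=DAY_TO_DAY_SD):
    """One observed score: underlying ability plus normal noise."""
    return round(random.gauss(true_ability, sd))

def grade(score):
    """Map a score to a grade using invented cut-off points."""
    for cut_off, letter in ((70, "A"), (60, "B"), (50, "C")):
        if score >= cut_off:
            return letter
    return "D"

morning, afternoon = sit_test(TRUE_ABILITY), sit_test(TRUE_ABILITY)
print(f"morning: {morning} (grade {grade(morning)}), "
      f"afternoon: {afternoon} (grade {grade(afternoon)})")

# A rough 95% range for any single sitting by this pupil:
low, high = TRUE_ABILITY - 2 * DAY_TO_DAY_SD, TRUE_ABILITY + 2 * DAY_TO_DAY_SD
print(f"likely score range: {low}-{high}, spanning two grade boundaries")
```

Because the likely range spans more than one grade boundary, the same pupil can plausibly come away with different grades from two sittings of the same test.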

A grade/score alone can’t reflect progress because:

  • Termly/unit tests are too short-term a measure
  • They will only test a small sample of the domain 
  • Grades communicate broadly how well students might be doing in a subject, but they are imprecise by design
  • Scores are unreliable because they represent a point within a probable range.


So, grades are too imprecise to be compared, and a comparison of scores is really a comparison of overlapping ranges (which can’t meaningfully be done), which means we can’t measure progress this way (Allen, 2018). So how can we know that students are getting better? Because our curriculums are designed to ensure this. We sequence and teach the curriculum so that knowledge is built, returned to and embedded over time (e.g. by regularly providing low-stakes retrieval opportunities), and we check whether students are ready to move on by using frequent, targeted formative checks for understanding.
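Returning to the claim above that comparing two scores is comparing ranges, a short worked example shows why. If a single score carries a standard error of measurement (SEM), the difference between two scores carries a larger one, because the errors compound; the SEM below is an assumed figure, not one taken from the article or its sources:

```python
import math

SEM = 5.0  # assumed standard error of measurement for one sitting

# Errors in two independent scores compound when we subtract them:
sem_of_difference = math.sqrt(SEM**2 + SEM**2)  # ~7.1 marks

observed_gain = 6  # e.g. 58 in the autumn test, 64 in the summer test
print(f"SE of the difference: {sem_of_difference:.1f} marks")
print("gain within one SE of zero:", abs(observed_gain) < sem_of_difference)
```

An apparent gain of six marks sits comfortably within the noise, which is exactly the sense in which comparing scores is comparing ranges.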

What can grades or scores tell us?

While grades/scores aren’t precise enough to tell us what students know, they can communicate proficiency within a subject. Getting an 8 at GCSE maths tells us that a student is highly competent at maths and will likely be successful in future maths-based study/employment. Outside of national exams, things get a little trickier.

The whole purpose of grades is to communicate a shared meaning (Wiliam and Black, 1996): GCSE grades, for example, are designed to evaluate student performance against an entire national cohort, to give an idea of relative proficiency. However, schools sometimes apply them as a ‘point-in-time’ measure, which can be problematic when it comes to shared meaning. If we award a student a grade 3 for their end-of-year-7 exam, does it mean that we expect them to get a 3 at GCSE? Or that if they took their GCSE today, they’d get a 3? Or something else entirely?

An alternative might be to provide scores, but again we run into problems. As discussed, scores are highly variable and, in more subjective disciplines (e.g. English and art), can also be influenced by the marking. In our experience, teachers will often agree qualitatively about which work is stronger; the challenge comes in trying to assign a score based on criteria that require interpretation, with the result often varying depending on the marker – it’s why moderation exists. An alternative might be to use comparative judgement, which instead requires assessors to compare pupil work and simply decide which is better (Wheadon et al., 2020; Jones et al., 2016).
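Comparative judgement platforms typically fit a statistical model of relative quality to many such pairwise decisions. As a rough sketch of the idea (not the algorithm of any particular tool), here is a Bradley–Terry-style fit over a handful of invented judgements:

```python
from collections import defaultdict

# Invented judgements: (winner, loser) pairs from repeated
# "which of these two pieces of work is better?" decisions.
judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"),
              ("B", "D"), ("C", "D"), ("B", "A"), ("D", "C")]

scripts = sorted({s for pair in judgements for s in pair})
wins = defaultdict(int)
comparisons = defaultdict(int)
for winner, loser in judgements:
    wins[winner] += 1
    comparisons[frozenset((winner, loser))] += 1

# Fit Bradley-Terry strengths with the classic MM iteration:
# p_i <- wins_i / sum_j( n_ij / (p_i + p_j) ), then rescale.
strength = {s: 1.0 for s in scripts}
for _ in range(100):
    updated = {}
    for i in scripts:
        denom = sum(comparisons[frozenset((i, j))] / (strength[i] + strength[j])
                    for j in scripts if j != i)
        updated[i] = wins[i] / denom
    total = sum(updated.values())
    strength = {s: p * len(scripts) / total for s, p in updated.items()}

for s in sorted(scripts, key=strength.get, reverse=True):
    print(f"script {s}: strength {strength[s]:.2f}")
```

The fitted strengths place the scripts on a common scale without anyone ever assigning an absolute mark, which is what sidesteps the interpretation problem described above.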

As the above perhaps demonstrates, there are no easy solutions to this incredibly complex problem:

  • Scores give a false sense of precision
  • Grades avoid this but can create confusion if they overlap with national exam grades
  • Both can be threatened by subjective marking, which might be eased by comparative judgement, but its use must be driven by the purpose that having those comparisons would serve.


As former teachers, we appreciate the pressures of the system and know that you can’t just stop considering grades/scores. However, we hope that this article supports you to reflect upon the meaning attributed to them, the sources used to select them and the way in which you communicate them to stakeholders.

What are the leadership implications?

Leaders could consider the evidence behind what they are requesting and challenge ‘how we’ve always done it’. Too often, teachers feel that they are being asked for data that doesn’t seem to serve a purpose. Sometimes, leaders believe that specific data (e.g. target grades) can impact wider school improvement, but this purpose doesn’t get shared with staff. Sometimes, these clear reasons are missing altogether. When one of us was a primary school assessment lead, it was common practice to collect data about pupils in all subjects, regardless of whether those particular subjects had been taught since the previous assessment. There was a box on the software programme that seemed to need filling. Where use of data is effective, leaders already ensure that teachers understand the rationale. To communicate the rationale clearly to staff, leaders need to ensure that they themselves are aware of why the data is being collected.

Alongside this, leaders could be highly selective in the data that they ask colleagues to collect. Because of the time and accountability pressures, ‘we sometimes use data that we have to hand rather than what we need’ (EEF, 2019, p. 15). So, we might end up with data that was collected to serve a different purpose from the one for which we’re using it, which can result in misplaced justifications under the guise of ‘evidence’, e.g. a failure to meet the grade/progress target is seen as a lack of teacher/curriculum effectiveness. We can mitigate this by:

  1. using multiple sources to triangulate and draw more confident conclusions (e.g. accompanying end-of-year tests with teacher reflections when understanding attainment for individual students)
  2. acknowledging the original sources of data (e.g. ‘The single grade “FFT estimate” actually summarises a distribution of grades. A particular pupil is estimated to have an x% chance of achieving a grade 9, a y% chance of achieving a grade 8, and so on. The single grade estimate is the mid-point of that distribution.’ (FFT Education Datalab, 2022, para. 5) – see the sketch after this list)
  3. acknowledging the strengths/limitations of data and selecting other sources that balance those out (e.g. lesson observations carried out regularly and by different people to mitigate against their subjective nature).
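Point 2’s FFT example can be made concrete in a few lines. The grade probabilities below are invented; the logic simply finds the mid-point, i.e. the grade at which the cumulative probability first reaches 50 per cent, counting down from the top:

```python
# Hypothetical grade distribution of the kind FFT's note describes:
# each grade carries an estimated probability for one pupil (invented numbers).
distribution = {9: 0.05, 8: 0.10, 7: 0.15, 6: 0.25, 5: 0.20, 4: 0.15, 3: 0.10}

def midpoint_grade(dist):
    """Grade at which cumulative probability first reaches 50%,
    working down from the top grade."""
    cumulative = 0.0
    for grade in sorted(dist, reverse=True):
        cumulative += dist[grade]
        if cumulative >= 0.5:
            return grade

print(f"single-grade 'estimate': {midpoint_grade(distribution)}")
# ...but that one number hides a 5% chance of a 9 and a 10% chance of a 3.
```

Reporting only the single figure discards the spread, which is precisely the limitation that acknowledging the original source keeps in view.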


While we’ve discussed how leaders might select data effectively, the questions preceding this are: What are we hoping to achieve by collecting this data? How will it improve teaching and learning? Is it more important that teachers can accurately assess in the moment, making decisions about how to support pupils during a lesson? Many schools have made shifts towards this approach already. What would happen if leaders in even more schools focused more on what inferences teachers are making with pupils day to day, considering the suggestions above, and less on each pupil’s row in their master spreadsheet?

What are the systemic implications?

We’ve considered teachers and leaders. However, a common and valid pushback is that all teachers and leaders work within a system of accountability. The 2016 report of the Independent Teacher Workload Review Group stated that ‘Although the Ofsted framework has changed, there is evidence to suggest that workload pressures associated with inspection have not been eased.’ (p. 6)

One challenge here is the constantly moving goalposts. Daniel Kebede made this case when he responded to the idea that the Ofsted framework has had positive changes: ‘the Ofsted framework has changed five times in nine years. That is incredibly difficult for a profession to keep up with… What sort of evidence do they collect in that Ofsted framework window? It can be really problematic, creating its own internal pressures.’ (Education Committee, 2023, p. 3) While the changes might be positive, the pace of change means that school leaders share expectations with staff, only to find that those no longer match the inspection framework, and that sometimes labour-intensive data-gathering is no longer fit for purpose. Some of these changes reflect the fact that the focus is now on curriculum (intent, implementation and impact) rather than on quality assurance through assessing outcomes. However, some schools are still collecting more data than others to evidence impact, or are using it to drive their curriculum work; ‘“data drops” are more frequent in schools currently judged by Ofsted as requires improvement (RI) or inadequate’ (Fischer Family Trust, 2019, p. 4).

Conclusion

Despite changes in policies and frameworks, practices have lagged behind. One suggestion is that this is because assessment data provides an efficient, low-cost and seemingly objective way in which to measure the impact of teaching via student progress (Bitler et al., 2021). Our hope is that this article prompts you to consider the following questions:

  • Are teachers managing the tension between moving through the curriculum and responding to the needs of the students (identified through regular checks for understanding)?
  • What inferences are we making from the data?
  • How do we know that those inferences are valid/accurate? 
  • Are leaders communicating well-informed rationales for data collection?


Confronting these questions is no small ask and may mean difficult conversations with parents/carers, colleagues and perhaps ourselves. Yet it is something that we must ask of ourselves, our schools and our systems: to be brave in our leadership and give ourselves permission to be led by our curriculum and our teaching – to make the difficult choice of prioritising being right over feeling right.
