Grading examinations using expert judgements from a diverse pool of judges

  • Create Date August 2, 2018
  • Last Updated August 2, 2018

In normal procedures for grading GCE Advanced level and GCSE examinations, an Awarding Committee of senior examiners recommends grade boundary marks based on their judgement of the quality of scripts, informed by technical and statistical evidence. The aim of our research was to investigate whether an adapted Thurstone Pairs methodology (see Bramley and Black, 2008; Bramley, Gill and Black, 2008) could enable a more diverse range of judges to take part. The key advantage of the Thurstone method for our purposes is that it enables two examinations to be equated via judges making direct comparisons of scripts from both examinations, and does not depend on the judges’ internal conceptions of the standard required for any grade.

A General Certificate of Education (GCE) Advanced Subsidiary (AS) unit in biology provided the context for the study reported here. The June 2007 and January 2008 examinations from this unit were equated using paired comparison data from the following four groups of judges: members of the existing Awarding Committee, other examiners who had marked the scripts operationally, teachers who had taught candidates for the examinations but not marked them, and university lecturers who teach biology to first-year undergraduates.

We found very high levels of intra-group and inter-group reliability for the scales and measures estimated from all four groups’ judgements. When boundary marks for January 2008 were estimated from the equated June 2007 boundaries, there was considerable agreement between the estimates made from each group’s data. Indeed, for four of the boundaries (grades B, C, D and E), the estimates from the Awarders’, examiners’ and lecturers’ data were no more than 1 mark apart, and none of the estimates were more than 3 marks apart. We concluded that the examiners, teachers, lecturers and members of the current Awarding Committee made very similar judgements, and that members of all four groups could take part in a paired comparison exercise for setting grade boundaries without compromising reliability.
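
To make the equating idea concrete, the following is a minimal sketch of how paired comparison judgements might be turned into a common quality scale and then used to carry a boundary mark from one examination to another. It is not the procedure used in the paper: the abstract does not state the statistical model, so the sketch assumes a Bradley-Terry (logistic) formulation fitted by a simple iterative update, and a straight-line relationship between mark and measure within each examination. The function names (`fit_bradley_terry`, `fit_line`, `equate_boundary`) and all of the data are hypothetical and chosen purely for illustration.

```python
import math


def fit_bradley_terry(n_items, comparisons, iterations=500):
    """Estimate a quality measure (log-strength) for each script.

    comparisons is a list of (winner, loser) index pairs.
    Assumption: every script wins and loses at least once and the
    comparison graph is connected; otherwise the estimates diverge.
    """
    wins = [0] * n_items
    pair_counts = {}
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strength = [1.0] * n_items
    for _ in range(iterations):
        denom = [0.0] * n_items
        for (i, j), n in pair_counts.items():
            share = n / (strength[i] + strength[j])
            denom[i] += share
            denom[j] += share
        strength = [wins[k] / denom[k] for k in range(n_items)]
        # Fix the origin of the scale: geometric mean of strengths = 1.
        log_mean = sum(math.log(s) for s in strength) / n_items
        strength = [s / math.exp(log_mean) for s in strength]
    return [math.log(s) for s in strength]


def fit_line(marks, measures):
    """Least-squares line measure = intercept + slope * mark."""
    n = len(marks)
    mean_x = sum(marks) / n
    mean_y = sum(measures) / n
    sxx = sum((x - mean_x) ** 2 for x in marks)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(marks, measures))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def equate_boundary(boundary_mark, reference_line, target_line):
    """Map a boundary mark on the reference exam to the target exam
    by finding the target mark with the same estimated measure."""
    ref_slope, ref_intercept = reference_line
    tgt_slope, tgt_intercept = target_line
    measure = ref_intercept + ref_slope * boundary_mark
    return (measure - tgt_intercept) / tgt_slope


# Hypothetical toy data: scripts 0 and 2 are from a "June" paper,
# scripts 1 and 3 from a "January" paper. Each tuple is (winner, loser).
comparisons = [
    (1, 0), (1, 0), (0, 1),
    (2, 1), (2, 1), (1, 2),
    (3, 2), (3, 2), (2, 3),
]
measures = fit_bradley_terry(4, comparisons)

june_line = fit_line([30, 50], [measures[0], measures[2]])
january_line = fit_line([32, 52], [measures[1], measures[3]])

# Carry a made-up June boundary of 40 marks over to the January paper.
january_boundary = equate_boundary(40, june_line, january_line)
print(round(january_boundary, 2))
```

In this toy example the June boundary of 40 marks maps to about 32 marks on the January paper, because the judges' comparisons place a January script with 32 marks at the same point on the common scale. Repeating the fit separately for each group of judges and comparing the resulting boundaries would mirror, in outline, the kind of between-group comparison the abstract reports.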
