CLER Seminar: The criterial features of the Cambridge Assessment English reading bank

Cambridge Assessment English reading tasks are currently written by skilled external experts (Item Writers) who take authentic texts from sources such as newspapers and magazines.

The texts are then edited to suit the particular difficulty level required. One limitation of this approach is the depth and breadth of the source texts: Item Writers tend to draw only on sources they already read for pleasure, e.g. The Guardian or National Geographic. In addition, the majority of Item Writers are based in the UK and belong to a particular demographic: 50-70 year-old ex-teachers. To demonstrate that our tests have excellent content validation coverage and are fit for purpose, we need to provide evidence that we are testing a wide and varied range of English lexicogrammatical structures. At present we do not have this evidence, even though we have over 20 years’ worth of past and present exam material stored on our servers.

The question of what we are, and are not, testing is central to my research project. I will analyse the data we hold and produce lists of lexicogrammatical structures in the form of n-grams (chunks of language), at both word and part-of-speech level, organised by CEFR (Common European Framework of Reference) difficulty. The project will also examine the predictive qualities of these n-grams using a classification machine learning (ML) approach: I will investigate which n-gram length (1-10) and which ML model give the most precise results when trained on texts of known difficulty (past and present test texts). The best-performing model will then be used to label the CEFR difficulty of unknown n-grams.

By comparing the n-grams found in Cambridge Assessment English texts with those appearing in large web-based corpora, I will produce lists of previously unseen n-grams together with a predicted CEFR level. These lists, both known and predicted, can then feed an automatic difficulty checker for Item Writers and other interested parties, and will help us identify source texts containing previously untested n-grams which we would like Item Writers to start using. Knowing what language our tests do and do not cover provides evidence to support our content validation argument, and therefore increases our reliability as a high-stakes language test provider.
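
As a rough illustration of this classification step, the sketch below (in Python, using scikit-learn purely as an assumed stand-in for whatever toolkit the project actually uses, with invented texts and hypothetical CEFR labels) trains one candidate model per n-gram length; part-of-speech n-grams would be built in the same way after replacing each word with its tag from a tagger such as NLTK or spaCy.

    # Minimal sketch only: invented texts, hypothetical CEFR labels, and
    # scikit-learn as an assumed stand-in; the real project compares n-gram
    # lengths 1-10 and several ML models on texts of known difficulty.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy stand-ins for past and present exam texts of known difficulty.
    texts = [
        "The cat sat on the mat and looked out of the window.",
        "Despite the committee's reservations, the proposal was ratified.",
    ]
    levels = ["A2", "C1"]  # hypothetical labels

    def candidate_model(n):
        """One candidate classifier: a bag of word n-grams of length n
        feeding a logistic regression over CEFR levels."""
        return make_pipeline(
            CountVectorizer(ngram_range=(n, n)),
            LogisticRegression(max_iter=1000),
        )

    # Compare n-gram lengths (1-3 here for brevity). With a real corpus this
    # would be a cross-validated comparison, and the best model would then
    # label n-grams from large web corpora that have not yet appeared in
    # exam material.
    for n in (1, 2, 3):
        model = candidate_model(n).fit(texts, levels)
        print(n, model.predict(["She finished her homework before dinner."]))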

Darren Perrett

Speaker: Darren Perrett holds a Master’s degree (MA) in Language Testing from Lancaster University and is currently studying for a PhD in Education at the University of Leeds. He is an Assessment Manager at Cambridge Assessment, a non-teaching department of the University of Cambridge, where he is responsible for content production and process improvements for adaptive English language tests. His main interests include automatic item generation (AIG), machine learning and the general automation of exam procedures. Before joining Cambridge he spent ten years working in Ukraine and the Czech Republic as an EFL teacher and examiner.