Semantic Error Prediction: Estimating Lexical Complexity in Production

14 November 2024, Version 1

Abstract

Estimating word complexity is essential for many computer-assisted language learning technologies. We introduce semantic error prediction (SEP), a novel task that assesses the production complexity of content words. In SEP, a system has to predict which word tokens are replacements for tokens from the original text. We apply large language models (LLMs) to this task and establish its practical relevance for predicting the vocabulary scores of learner essays, which provides a finer-grained assessment of learner skills.

Content