Random forests as an exploratory tool for eye-movement control in reading, Victor Kuperman, PhD
Abstract: Decades of research have amassed an impressive body of knowledge on sources of variability in eye-movement control in reading. Major sources include text characteristics (i.e. properties of letters, morphemes, words, sentences, or passages) and participant characteristics (clinical status, age, reading experience, IQ, working memory, etc.). As a result, word length, frequency of occurrence and predictability in context, and more recently component skills of reading (Reichle et al., 2013), are routinely used as benchmark predictors of eye-movements and as core parameters of computational models of eye-movement control (Reichle et al., 2006; Engbert, 2005). However, little effort has been devoted to establishing how important individual predictors (or sets of predictors) of eye-movements are relative to other predictors (or other sets). Yet such information is crucial for identifying which aspects of linguistic complexity and of individual ability and skill are central to efficient reading, and when in the time-course of reading they are engaged.
I will present a study in which random forests, a non-parametric machine-learning technique, are used to evaluate the relative importance of a large set of text-related and participant-related variables as predictors of eye-movements and comprehension scores observed during text reading. I will demonstrate the utility of this method both for the comprehensive description of individual differences and language-driven variability in reading behavior as it unfolds over time, and for the generation of specific hypotheses that can be pursued in confirmatory analyses.
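To make the idea of ranking predictors by relative importance concrete, the following is a minimal sketch in Python. It is not the study's analysis: the data are synthetic, the variable names (word_length, log_frequency, reading_skill, unrelated) and the generating formula are hypothetical, and the use of scikit-learn with permutation importance is an assumption made for illustration rather than the software or importance measure used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic, purely illustrative predictors (not data from the study).
word_length = rng.integers(2, 13, size=n)      # text-related: letters per word
log_frequency = rng.normal(0.0, 1.0, size=n)   # text-related: standardized log frequency
reading_skill = rng.normal(0.0, 1.0, size=n)   # participant-related: skill score
unrelated = rng.normal(0.0, 1.0, size=n)       # predictor with no true effect

# Hypothetical generating process for a fixation-duration-like outcome (ms).
duration = (
    200.0
    + 12.0 * word_length
    - 15.0 * log_frequency
    - 8.0 * reading_skill
    + rng.normal(0.0, 20.0, size=n)
)

names = ["word_length", "log_frequency", "reading_skill", "unrelated"]
X = np.column_stack([word_length, log_frequency, reading_skill, unrelated])

# Hold out data so that importance reflects predictive value on unseen cases.
X_train, X_test, y_train, y_test = train_test_split(
    X, duration, test_size=0.25, random_state=0
)

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Permutation importance: the drop in held-out R^2 when one predictor is shuffled.
result = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=0
)

# Rank predictors from most to least important.
ranking = sorted(
    zip(names, result.importances_mean), key=lambda pair: pair[1], reverse=True
)
for name, score in ranking:
    print(f"{name:>15s}  {score:.3f}")
```

In this toy setup the ranking is expected to mirror the effect sizes built into the simulated outcome, with the unrelated predictor near zero. With real reading data, correlated predictors (for example, word length and frequency) can distort simple permutation importance, so a ranking of this kind is best treated as exploratory and as a source of hypotheses for confirmatory analysis.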