Supporting data for "Understanding eye movements during reading using hidden Markov models"
To better understand readers’ eye movement behavior during reading, we used the Eye Movement analysis with hidden Markov models (EMHMM) approach, which takes both spatial and temporal dimensions of eye movement behavior into account to quantify participants’ eye movement patterns and consistency.
Study 1 examined how enlarging the font size beyond the regular size that young children are typically exposed to affected their reading performance and eye movement behavior. We showed that this manipulation impaired word pronunciation accuracy. This impairment was associated with non-verbal IQ but not with changes in eye movement behavior, suggesting that it may be more related to the development of font-size-dependent perceptual representations of words than to information extraction strategies. The font size change also decreased the consistency of children’s gaze transitions among words, suggesting that it interfered with eye movement planning for implementing and developing visual routines for reading. These results suggest that young children may initially develop font-size-dependent perceptual representations for words and for eye movement planning during sentence reading, and only later, through increasing reading experience with different font sizes, form the font-size-independent reading behavior observed in adults.
Study 2 examined whether word boundary information, provided by using either spaces or alternating colors as word boundary markers, could facilitate children’s Chinese sentence reading performance. EMHMM revealed two representative gaze transition patterns through clustering: a more sequential pattern and a less sequential pattern. When alternating colors were used as word boundary markers, a larger shift towards the more sequential gaze transition pattern, relative to regular sentences, was correlated with a larger increase in comprehension accuracy. This effect was not observed when spaces were used as word boundary markers. Thus, an instructional design should consider its potential costs to the cognitive processes established during prior learning, as well as potential individual differences in the ability to adjust an existing processing strategy to new instructions.
Study 3 compared large language models (LLMs) and humans in resolving semantic ambiguities by examining the segmentation of overlapping ambiguous strings in Chinese sentence reading, where three characters “ABC” could be segmented as either “AB/C” or “A/BC” depending on the context. We showed that although LLMs performed worse than humans, they demonstrated a similar interaction effect between segmentation structure and word frequency order, suggesting that this effect could be accounted for by statistical learning of word/token occurrence regularities without assuming an explicit semantic word representation. Nevertheless, across stimuli, LLMs’ responses were not correlated with any human performance or eye movement measures, suggesting differences in the underlying processing mechanisms.
In conclusion, we used EMHMM to examine whether manipulating features of reading materials, including font size and word boundary information, would facilitate children’s reading. The results showed that children have an initially established visual routine for reading, and that only individuals with a more flexible visual routine benefit from the instruction. We also examined how LLMs and adults resolved ambiguities arising from the lack of word boundary information. The results suggested that it is essential to understand the differences between LLM and human processing through explainable AI (XAI) methods to facilitate LLM adoption.