Rapid repetitive syllable sounds associate with episodic memory, executive function, and working memory in cognitively healthy and subjectively impaired older adults

Participant recruitment and eligibility

A cross-sectional community sample of older adults was recruited from the Island Study Linking Aging and Neurodegenerative Disease (ISLAND), for which a full protocol has been previously published [27]. ISLAND was launched in Tasmania, Australia, in 2019 as a 10-year public health initiative to educate residents aged 50 years and over on reducing their modifiable dementia risk factors [27].

ISLAND participants were invited to complete the TAS Test protocol, an online battery of speech and motor-cognitive tests (full protocol previously published) [2]. In brief, TAS Test comprises five sections, each focusing on different tests and abilities: (1) video hand movement and FT tests, (2) keyboard tapping tests, (3) visuomotor tests for visuoperceptual deficits and reaction times, (4) visuospatial tests of ability and working memory, and (5) tests of motor speech (DDK) and language abilities (verbal picture description).

The participants who elected to be involved self-administered TAS Test remotely and without researcher supervision at their convenience between October and December 2022 by logging into the TAS Test website through a personalized link, providing consent, and following the on-screen text and audiovisual instructions. If participants needed to resume testing after an accidental exit or interruption, they could log in again through their personal link and resume from their most recent incomplete test. Participants were not able to navigate backwards to retake completed test items.

HC and SCI group criteria

Because this study targeted adults without overt symptoms, we excluded any participants who reported conditions associated with cognitive or movement difficulties. Those who responded “yes” to any of the following ISLAND background survey questions were excluded: “Have you been told by a doctor that you have dementia?”; “Have you been told by your doctor that you have a memory impairment, but they were uncertain if you have dementia?”; “Have you been diagnosed with delirium?”; and “Have you been diagnosed with a central nervous system degenerative disease, e.g. Parkinson’s, Huntington’s, Multiple Sclerosis?” TAS Test survey questions were also considered, and participants were excluded if they reported any prior diagnoses of “Parkinson’s disease,” “Multiple Sclerosis,” or “Mild Cognitive Impairment” in these items.

Following this screening, participants were assigned to our HC or SCI groups depending on their response to the ISLAND background survey question, “Have you noticed a substantial change in your memory and mental function in recent years?” Participants who responded “no” were assigned to the HC group; those who responded “yes” were assigned to the SCI group. Figure 1 illustrates the participant recruitment and classification process.
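
The screening and grouping rules above can be expressed as a short decision function. The following is a minimal sketch only: the field names are hypothetical (the surveys' data schema is not described here), and treating a missing answer to the grouping question as unclassifiable is an assumption.

```python
from typing import Optional

# Hypothetical field names standing in for the ISLAND and TAS Test survey items.
EXCLUSION_ITEMS = (
    "told_dementia",
    "told_memory_impairment",
    "diagnosed_delirium",
    "diagnosed_cns_degenerative_disease",
    "tas_parkinsons_disease",
    "tas_multiple_sclerosis",
    "tas_mild_cognitive_impairment",
)


def assign_group(responses: dict) -> Optional[str]:
    """Return "HC", "SCI", or None when the participant is excluded."""
    # A "yes" to any screening item excludes the participant.
    if any(responses.get(item) == "yes" for item in EXCLUSION_ITEMS):
        return None
    # Grouping item: noticed a substantial change in memory and mental function.
    answer = responses.get("noticed_substantial_change")
    if answer == "yes":
        return "SCI"
    if answer == "no":
        return "HC"
    return None  # Assumption: a missing answer cannot be classified.
```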

Fig. 1 Flow diagram of participant recruitment, cognitive group classification criteria, and motor speech samples included in the final analysis dataset

Demographic and clinical information

Participants’ demographic and clinical details were surveyed at the time of their recruitment to ISLAND (between 2019 and 2020). From these baseline data, we included the following details in our models: age in years (at the median TAS Test login date), sex (self-report response options: male, female, other, prefer not to say), highest level of education attained, anxiety and depression scores as reported using the Hospital Anxiety and Depression Scale (HADS), and “yes” or “no” responses to the question “Have you been diagnosed with a psychiatric disorder e.g. depression, psychosis, bipolar disorder, anxiety disorder?” The HADS and psychiatric items were included to adjust for effects of anxiety and depression on test performance and were not applied as exclusion criteria. Further characteristics of the ISLAND cohort are available in the protocol and interim results [27, 28].

Cognitive assessment

ISLAND participants remotely completed two tests selected from the Cambridge Neuropsychological Test Automated Battery (CANTAB) [29] in August 2021.

Paired associates learning (PAL)

The PAL test assesses visual episodic memory through the memorization and location of increasing numbers of hidden patterns concealed in boxes on-screen. The test typically takes 8 min to administer. The score used in this study is the PALTEA6 score, which represents participants’ total errors on the six-pattern sequence, adjusted for incomplete or failed trials.

Spatial working memory (SWM)

The SWM test suite assesses working memory and executive function (planning and decision-making). Participants memorize and identify the locations of hidden pattern sequences and search for predetermined numbers of hidden tokens. The test typically takes 15 min to administer.

The scores used in this study are SWMBE6 for spatial working memory and SWMS for executive function. SWMBE6 measures the number of times during a six-token search trial that a participant revisits a box in which they had previously found a token. SWMS measures search strategy through the number of unique boxes a participant uses to commence their search processes over the course of all their trials.

Motor speech data collection

Within the motor speech section of the TAS Test protocol, participants completed three fast, 10-s monosyllabic DDK tests (“pa,” “ta,” and “ka”) in fixed order. Participants were presented with the following instructions in text and video, with the video including a demonstration by a speech pathologist (“pa” example described): “Please repeat the sound ‘pa’ as fast as you can, like this: ‘pa-pa-pa…’. Keep going until you see the ‘Well done’ sign.” Once ready, participants commenced a 5-s countdown to each test by clicking a button labelled “Start the Test.” Fig. 1 in the Online Resource shows the steps of the “pa” test.

No standardized equipment or physical configuration of recording space was used, as participants were instructed to use their personal desktop or laptop computers to access the web-based TAS Test battery at home.

Acoustic feature extraction pipeline

The fast “pa,” “ta,” and “ka” speech recordings were stored in single-channel WAV format at a variety of sample rates from 8 to 48 kHz since capture parameters varied depending on participants’ computer hardware and settings. All recordings with 8 kHz sample rates were excluded upon review due to distorted quality (three total; one “pa,” one “ta,” and one “ka”). Recordings of negligible size or length (likely due to issues during capture or upload) and those containing no utterances were also excluded. We applied non-stationary noise reduction using the Python package “noisereduce” (v3.0.2) [30] to reduce the influence of ambient noises.
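
The screening and noise-reduction steps might look like the sketch below. It is illustrative, not the pipeline's actual code: `MIN_DURATION_S` is a hypothetical cut-off (no threshold for "negligible" length is stated), the check for recordings containing no utterances is omitted, and the "noisereduce" call uses only the `stationary=False` option because no other settings are reported.

```python
import wave

EXCLUDED_RATES_HZ = {8000}  # 8 kHz recordings were excluded for distortion
MIN_DURATION_S = 1.0        # hypothetical cut-off, not taken from the source


def wav_info(path):
    """Return (sample rate in Hz, duration in seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


def keep_recording(sample_rate_hz, duration_s):
    """Apply the sample-rate and length exclusions described above."""
    if sample_rate_hz in EXCLUDED_RATES_HZ:
        return False
    return duration_s >= MIN_DURATION_S


def denoise(signal, sample_rate_hz):
    """Non-stationary noise reduction; requires the noisereduce package."""
    import noisereduce as nr  # imported lazily so the screening runs without it

    return nr.reduce_noise(y=signal, sr=sample_rate_hz, stationary=False)
```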

For feature extraction, we used two methods to create a combined set of complementary features from the same set of preprocessed recordings. The first method was our custom Python script, which used the package “librosa” (v0.10.1) [31] to load recordings, normalize signals, evaluate relative changes in energy over time, and extract onset event times, and the “signal” submodule of “SciPy” (v1.13.0) [32] to locate local waveform peaks. Individual features were then computed using our directly specified formulae. The second method was developed by Redenlab® and has been previously published [33, 34, 35]. To facilitate the description of our full feature set, Fig. 2 annotates key events and classifications on a waveform excerpt adapted from a DDK recording.

Fig. 2 Annotated amplitude view adapted from a DDK recording showing seven syllable repetitions. The onset events mark the commencement of the syllable in each repetition; the inter-onset intervals (IOIs) are the time between sequential syllable onsets (i.e., time to complete each repetition cycle). The “pa,” “ta,” and “ka” syllables all begin with voiceless plosives (“p,” “t,” and “k”) and the voice onset events mark when the production of the voiced vowel (“a”) begins in each repetition. In this study, voice onset time (VOT) is measured as the duration between the speech onset event and the voice onset event. Each repetition comprises two zones: “syllable” during active speech production and “pause” for the segment between the end of the utterance and the start of the next repetition

As Fig. 2 shows, the main events and zones in the DDK tests are the onset of speech in each repetition, voice onset (vibration of the vocal folds) at the commencement of the vowel sound, and the distinction between the spoken “syllable” and silent “pause” zones within each repetition. Table 1 presents our full list of features, grouped by their underlying qualities to reflect the key events or zones used in their calculation. Features derived from the qualities of “onset,” “IOI,” and “energy” were extracted using our custom script, and those derived from the qualities “syllable,” “pause,” and “voice onset” were extracted using the methods developed by Redenlab®.
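
To make the IOI and VOT definitions concrete, the sketch below computes both from lists of event times. It assumes the onset times have already been detected and paired per repetition; the timestamps are invented for illustration, and the functions are not the authors' or Redenlab's formulae.

```python
def inter_onset_intervals(speech_onsets_s):
    """Time between sequential syllable onsets (one value per repetition cycle)."""
    return [
        later - earlier
        for earlier, later in zip(speech_onsets_s, speech_onsets_s[1:])
    ]


def voice_onset_times(speech_onsets_s, voice_onsets_s):
    """Duration from each speech onset to the voice onset in the same repetition."""
    return [
        voice - speech
        for speech, voice in zip(speech_onsets_s, voice_onsets_s)
    ]


# Invented example timestamps (seconds) for four repetitions.
speech_onsets = [0.10, 0.30, 0.52, 0.71]
voice_onsets = [0.12, 0.33, 0.54, 0.74]

iois = inter_onset_intervals(speech_onsets)  # three intervals for four onsets
vots = voice_onset_times(speech_onsets, voice_onsets)  # one VOT per repetition
```

A recording with n detected onsets yields n − 1 IOIs, so a very short recording gives few intervals from which to summarize timing.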

Table 1 Motor speech feature definitions grouped by underlying quality

Statistical analysis

The goal of this analysis was to determine whether any of the DDK features improved cognitive score model fit over demographic and clinical variables, rather than exploring individual feature significance and relationships. To this end, regression models were fitted to 18 combinations of factors from the two cognitive groups (HC or SCI), three cognitive score outcomes (PALTEA6 [episodic memory], SWMS [executive function], and SWMBE6 [working memory]), and three DDK tests. Null model variables were participant age, sex, highest level of education, presence of a psychiatric disorder diagnosis, and baseline HADS anxiety and depression scores. In R, cognitive scores were modelled using generalized linear models with outcome distributions suitable for count data: Poisson for executive function (“glm” function, base R v4.4.0), and negative binomial for episodic and working memory to account for overdispersion (“glm.nb” function, “MASS” v7.3–60.2) [36]. The “dredge” function of the “MuMIn” package (v1.47.5) [37] was used to evaluate whether DDK features made significant contributions to model fit over the null variables in any of the 18 test cases. This was achieved by using “dredge” to fit, evaluate, and rank models in each case, starting with the fixed set of null variables, followed by the null variables plus every linear combination of at least one feature. No interaction terms were included. To reduce computational cost and pairwise collinearity between DDK features, we prohibited the simultaneous inclusion of DDK features in any single model if their pairwise correlation coefficient exceeded 0.6 in all three DDK tests.
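
The structure of this search can be sketched as follows. The code enumerates the 18 test cases and the candidate feature sets permitted under the collinearity rule; it does not fit any models and is not a substitute for “dredge”. The feature names and correlation values are hypothetical, and comparing the absolute value of the correlation against 0.6 is an assumption, since the sign convention is not stated.

```python
from itertools import combinations, product

GROUPS = ("HC", "SCI")
OUTCOMES = ("PALTEA6", "SWMS", "SWMBE6")
DDK_TESTS = ("pa", "ta", "ka")
test_cases = list(product(GROUPS, OUTCOMES, DDK_TESTS))  # 2 x 3 x 3 cases


def blocked_pairs(corr_by_test, threshold=0.6):
    """Pairs whose |correlation| exceeds the threshold in every DDK test."""
    shared = set.intersection(*(set(corr) for corr in corr_by_test.values()))
    return {
        pair
        for pair in shared
        if all(abs(corr[pair]) > threshold for corr in corr_by_test.values())
    }


def candidate_feature_sets(features, blocked):
    """Yield every non-empty feature subset containing no blocked pair."""
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            pairs = (frozenset(p) for p in combinations(subset, 2))
            if not any(pair in blocked for pair in pairs):
                yield subset


# Hypothetical features and correlations, for illustration only.
features = ("rate", "ioi_mean", "vot_mean")
rate_ioi = frozenset({"rate", "ioi_mean"})
rate_vot = frozenset({"rate", "vot_mean"})
ioi_vot = frozenset({"ioi_mean", "vot_mean"})
corr_by_test = {
    "pa": {rate_ioi: -0.90, rate_vot: 0.70, ioi_vot: 0.10},
    "ta": {rate_ioi: -0.85, rate_vot: 0.50, ioi_vot: 0.20},
    "ka": {rate_ioi: -0.80, rate_vot: 0.65, ioi_vot: 0.15},
}
blocked = blocked_pairs(corr_by_test)
candidates = list(candidate_feature_sets(features, blocked))
```

In this toy example only the rate and mean-IOI pair is blocked, because it is the only pair above the threshold in all three tests; the rate and mean-VOT pair falls below 0.6 for “ta” and therefore remains eligible.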

All model selection was performed using the corrected Akaike Information Criterion (AICc) [38]. Models containing DDK features were deemed to explain significantly more variance than null models if ΔAICc (computed as AICc_null − AICc_speech) exceeded the rule-of-thumb threshold of two [39]. In each test case, if at least one model containing at least one feature was found to be superior to the null model, this was interpreted as supporting evidence that the DDK test improved estimation of the applicable cognitive outcome. Nagelkerke’s pseudo-R² (R²_N) is reported as an estimate of variance explained.
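
For reference, the comparison rule can be written out using the standard small-sample correction, AICc = −2 log L + 2k + 2k(k + 1)/(n − k − 1), where k is the number of estimated parameters and n the number of observations. The sketch below is illustrative; the log-likelihoods, parameter counts, and sample size are invented and are not results from this study.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Corrected Akaike Information Criterion for a fitted model."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n_obs <= n_params + 1")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    correction = (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    return aic + correction


def delta_aicc(aicc_null, aicc_speech):
    """Positive values favour the model containing DDK features."""
    return aicc_null - aicc_speech


def speech_model_preferred(aicc_null, aicc_speech, threshold=2.0):
    """Apply the rule-of-thumb threshold described above."""
    return delta_aicc(aicc_null, aicc_speech) > threshold


# Invented values: the speech model adds one parameter to the null model.
null_score = aicc(log_likelihood=-500.0, n_params=8, n_obs=200)
speech_score = aicc(log_likelihood=-496.0, n_params=9, n_obs=200)
preferred = speech_model_preferred(null_score, speech_score)
```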

As a supplemental analysis into whether motor speech features improved prediction of HC vs. SCI, we fitted logistic regression models to each syllable-based subset of our data (i.e., “pa,” “ta,” and “ka”). Using the R package “pROC” [40], we compared receiver operating characteristic (ROC) curves for models fitted with null variables only to models fitted with all null model variables plus all motor speech features.
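
As an illustration of what is being compared, the area under the ROC curve (AUC) for each model can be computed directly from predicted probabilities. The sketch below uses the rank-based definition of AUC with invented labels and scores; it does not reproduce the statistical comparison performed in “pROC”, whose settings are not described here.

```python
def auc(labels, scores):
    """Probability that a random positive case outscores a random negative one."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    # Count pairwise wins; ties between a positive and a negative count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Invented data: 1 = SCI, 0 = HC; scores are predicted probabilities of SCI.
labels = [0, 0, 1, 1, 0, 1]
null_scores = [0.2, 0.6, 0.5, 0.7, 0.4, 0.3]  # null variables only
full_scores = [0.1, 0.4, 0.6, 0.8, 0.3, 0.5]  # null variables plus speech features

auc_null = auc(labels, null_scores)
auc_full = auc(labels, full_scores)
```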
