On 2019-04-23 12:47:59, user Brian Levine wrote:
In this study, the researchers assessed concurrent validity of questionnaires against established measures in a sample of 217 participants. There is a strong motivation for this kind of study, which provides useful information for researchers assessing memory, imagery/scene construction, navigation, and future thinking. The researchers are commended for a comprehensive study reflecting many hours of effort in order to execute these measures. My comments will be largely focused on the measures of autobiographical memory (AM), some of which were developed by my group. This comment grew out of a discussion with my trainees who also read the article, including Nick Diamond, Carina Fan, Raluca Petrican, Stephanie Simpson, and Lynn Zhu. I thank the authors for posting this preprint, open to community commentary.
A major contribution of this paper is an emphasis on subjective experience, which, although impossible to assess directly, is important to the consideration of episodic memory. This paper supports the view that subjective and objective instruments do not assess the same thing. As stated by the authors, the use of these instruments depends on the goals of the study. Where we disagree is the premise that seems to be implied in the title, which is that questionnaires (and to some extent, the objective tests) are measuring something different than what they purport to measure.
My main critique of the approach is that it lacks nuance in terms of levels of analysis within AM, which is itself a multifaceted construct. The authors took a strictly univariate approach in which each criterion measure is treated as a unitary measure of a latent construct. Normally, multiple measures would be deployed in a latent construct approach because no single measure is process-pure.
A main finding of the present study is that overall, subjective ratings (either on questionnaires or self-/other ratings of laboratory test performance) correlate with each other to a greater degree than the subjective/objective comparison. This is interesting though not surprising given that subjective measures do not measure the same thing as objective measures, and that they share measurement error bias. This is also the case for the scene construction measure which is held as objective, but in fact takes subjective ratings into consideration in the scoring.
In the Autobiographical Interview (AI), internal details are treated as a measure of a person’s capacity to recover contextual information from past events; external details reflect content not specifically related to the defined event and are therefore considered to be inversely related to cognitive control over memory retrieval. A recovered detail is neutral with respect to subjective/conscious experience. Patient M.L., who had a specific impairment in conscious re-experiencing of the past due to frontotemporal disconnection, showed only marginal reductions in internal detail production, even though his “remember” ratings for the same events suggested a profoundly reduced conscious experience (Levine, Svoboda, Turner, Mandic, & Mackey, 2009). He also showed reduced activation of the AM network when presented with rich retrieval cues for these events. Even more to the point, patients with severe medial temporal lobe amnesia, including H.M. (Steinvorth, Levine, & Corkin, 2005) have produced events with substantial internal details (see also Cermak & O'Connor, 1983).
The SAM episodic subscale, on the other hand, was developed specifically to probe the subjective experience of recollection at the trait level. As noted by the authors, we found that the two measures (AI internal details and the SAM episodic subscale) were unrelated in healthy young adults in our original SAM paper (Palombo, Williams, Abdi, & Levine, 2013; see also Hebscher, Levine, & Gilboa, 2018 for a similar finding), nor were people with Severely Deficient Autobiographical Memory (SDAM) impaired on AM for recent events using the AI. Considering these findings, the above-described patient findings, and the more general findings of dissociation between subjective recollection and recognition performance, as illustrated in the Remember/Know technique, a strong relationship between these two measures should not be expected.
Nonetheless, some relationship between recovered details and self-reported episodic autobiographical re-experiencing at the trait level could be expected. I believe the lack of relationship is owing to the fact that the AI was designed to elicit the richest possible event descriptions from participants. As the authors note, internal details are scored liberally for the sake of reliability (i.e., the “benefit of the doubt” rule where any detail that could reasonably be considered internal was classified as such). However, there was another purpose in eliciting rich episodic autobiographical memories, which was to avoid a false positive classification of memory impairment based on incidental factors, such as misunderstanding instructions, which is of particular importance in studies of aging and clinical samples. Accordingly, under the most commonly used administration method, the subject selects an event for each time period that is highly accessible and likely well-rehearsed. The resulting score therefore reflects the participant’s best possible narrative production. This is why M.L. and H.M. could produce seemingly normal autobiographical narratives.
The SAM, on the other hand, is explicitly designed as a measure of trait mnemonics, not cognitive function as assessed by performance on a given test. The instructions for the episodic questions are “When answering, don’t think about just one event; rather, think about your general ability to remember specific events.” Even assuming that the SAM and the AI are designed to assess the same construct (which as I argue above is not the case), there is a difference between asking how one performs in general and assessing how one performs when asked by the examiner to give one’s best possible narrative. By analogy, an introverted person may appear extroverted if required in certain social situations. There is no requirement to cue five lifetime period events as originally specified in our 2002 aging study. The AI scoring system has been applied to memories cued in different ways. Harvesting unrehearsed events from significant others may be a more effective way to estimate one’s typical retrieval abilities as opposed to one’s best possible performance.
The present paper used a sample of young adults. The AI as implemented in our 2002 study was developed for use in older adults and in patients. The internal detail measure is very sensitive to medial temporal lobe integrity. While this has been demonstrated in neuroimaging studies of healthy young adult samples (Hebscher et al., 2018; Palombo et al., 2018), its sensitivity to individual differences in a homogeneous sample of young adults is limited relative to individuals with compromised medial temporal lobe function, especially at the behavioral level. Nonetheless, the proportion of internal/total details or internal details/word count should be examined rather than the raw count of internal details, as the latter is confounded with verbosity. A comprehensive test of this relationship should also examine detail subcategories and time period effects. Given the foregoing I do not expect that this would change the results substantially, but it should be done for completeness.
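As a minimal sketch of the verbosity adjustment suggested above: the function below computes the two ratios for a single narrative. It assumes that "total details" means internal plus external details; the function name and the example counts are hypothetical and are not taken from the preprint or from the AI scoring manual.

```python
def verbosity_adjusted_scores(internal, external, word_count):
    """Return verbosity-adjusted alternatives to the raw internal detail count.

    Assumption: total details = internal + external details.
    """
    total = internal + external
    if total <= 0 or word_count <= 0:
        raise ValueError("total details and word count must be positive")
    return {
        "internal_to_total": internal / total,      # proportion of internal details
        "internal_per_word": internal / word_count,  # internal details per word spoken
    }


# Hypothetical participants: A is verbose, B is concise.
a = verbosity_adjusted_scores(internal=40, external=40, word_count=800)
b = verbosity_adjusted_scores(internal=20, external=5, word_count=300)
```

In this made-up example, participant A has the higher raw internal count (40 vs. 20), but participant B has the higher value on both adjusted measures, which is the kind of reversal that verbosity can mask.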
It is intriguing that the parallel analysis on subjective vs. objective measures of spatial memory yielded significant relationships. This speaks to the complexity of AM relative to spatial memory. In navigation, the criteria for success are clearer than for AM. If someone arrives at the correct location (or gets lost), their subjective and objective experience are consonant. But if someone recalls an episode, it is unclear if the correct criterion is subjective experience or imagery or quantity of detail. As noted above, I agree with the authors that there is a distinction between subjective and objective measures, and that one’s selection of measures should be governed by the goals of the study. I would not agree that the present findings call into question whether or not internal details “is actually a good measure of recall ability” given that this measure (or its variants) has been used in over 170 studies (for a table of studies, see AutobiographicalInterview.com), with good evidence for the validity of the internal/external distinction, including associations with brain structure and function. I also disagree that the findings of this study justify the use of vividness ratings alone as proxies for memory recall ability, especially in patients, who may show greater variability and less reliability in their introspective ratings than healthy adults. In any case, generalization to aging or clinical samples from a homogeneous sample of younger adults is not justified.
There is great richness to these data that could be exploited in a multivariate data-driven approach. I recognize that this was not the goal of this study, but a multivariate approach such as Canonical Correlation Analysis (CCA) would allow the researchers to detect latent variables and patterns of association across these measures opaque to a series of bivariate correlations and linear regressions. This feels like a lost opportunity in favor of an assumption-laden approach that results in a flat, protracted series of individual analyses that is difficult to follow. In fact, many of the analyses here are already exploratory in that they assess the ability of questionnaires to predict performance on constructs other than the one they were hypothesized to measure. Data-driven multivariate approaches are well-suited for such goals.
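To illustrate what such an analysis involves, here is a minimal sketch of how canonical correlations between a block of questionnaire scores and a block of task scores could be computed with NumPy. It uses the standard QR-plus-SVD formulation and assumes complete data, more participants than variables, and no perfectly collinear columns. The function name and the synthetic data are hypothetical; this is not the analysis used in the preprint.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and the columns of Y.

    X, Y: arrays of shape (n_participants, n_variables) with matching rows.
    Assumes full column rank in both blocks.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Center each variable so that inner products correspond to covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two centered blocks.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx' Qy are the canonical correlations (descending).
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


# Synthetic illustration: one shared latent factor drives both blocks.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
questionnaires = latent + 0.5 * rng.normal(size=(200, 3))  # hypothetical block
tasks = latent + 0.5 * rng.normal(size=(200, 4))           # hypothetical block
r = canonical_correlations(questionnaires, tasks)
```

With real data, the first canonical correlation would ordinarily be evaluated with permutation testing or cross-validation, since CCA can overfit when the number of variables is large relative to the sample.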
Finally, I had difficulty understanding the justification for proposing a single-sentence test of any psychological construct. Classical test theory dictates that the reliability of a composite is better than the reliability of a single item. While single items may be useful as a screening technique, for pathognomonic signs, or when doing mass testing, they should not be used for assessment of complex traits, where interpretations of individual items may vary across individuals. A brief questionnaire for each construct would be more stable and does not pose an undue burden on participants. There are no psychometric data presented here to support the use of a single-item measure aside from the fact that the items showed sensitivity in this sample of healthy adults. These overfitted coefficients will shrink if tested in a separate sample. The composite test of all 15 single items could be subjected to psychometric analysis, but it is unclear if this is of interest.
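The classical test theory point can be made concrete with the Spearman-Brown prophecy formula, which predicts the reliability of a composite of k parallel items from the reliability of one item. The sketch below assumes strictly parallel items (equal reliabilities and equal inter-item correlations). The reliability value of 0.40 is hypothetical and is not an estimate from the preprint.

```python
def spearman_brown(single_item_reliability, n_items):
    """Predicted reliability of a composite of n_items parallel items."""
    r = single_item_reliability
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return n_items * r / (1 + (n_items - 1) * r)


# Hypothetical single-item reliability of 0.40.
one_item = spearman_brown(0.40, 1)    # a single item: unchanged
five_items = spearman_brown(0.40, 5)  # a brief five-item scale
```

Under these assumptions, predicted reliability rises from 0.40 for one item to about 0.77 for a five-item scale, which is why a brief questionnaire would be expected to be more stable than a single sentence.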
Cermak, L. S., & O'Connor, M. (1983). The anterograde and retrograde retrieval ability of a patient with amnesia due to encephalitis. Neuropsychologia, 21(3), 213-234.
Hebscher, M., Levine, B., & Gilboa, A. (2018). The precuneus and hippocampus contribute to individual differences in the unfolding of spatial representations during episodic autobiographical memory. Neuropsychologia, 110, 123-133. doi:10.1016/j.neuropsychologia.2017.03.029
Levine, B., Svoboda, E., Turner, G. R., Mandic, M., & Mackey, A. (2009). Behavioral and functional neuroanatomical correlates of anterograde autobiographical memory in isolated retrograde amnesic patient M.L. Neuropsychologia, 47(11), 2188-2196.
Palombo, D. J., Bacopulos, A., Amaral, R. S. C., Olsen, R. K., Todd, R. M., Anderson, A. K., & Levine, B. (2018). Episodic autobiographical memory is associated with variation in the size of hippocampal subregions. Hippocampus, 28(2), 69-75. doi:10.1002/hipo.22818
Palombo, D. J., Williams, L. J., Abdi, H., & Levine, B. (2013). The survey of autobiographical memory (SAM): a novel measure of trait mnemonics in everyday life. Cortex, 49(6), 1526-1540. doi:10.1016/j.cortex.2012.08.023
Steinvorth, S., Levine, B., & Corkin, S. (2005). Medial temporal lobe structures are needed to re-experience remote autobiographical memories: evidence from H.M. and W.R. Neuropsychologia, 43(4), 479-496.