APAN Session Browser 2025

Daniel Polley
Topic areas: brain processing of speech and language, hierarchical sensory organization, neural coding
Fri, 11/14 9:10AM - 10:00AM | S1
Abstract
Jennifer Linden
Topic areas: brain processing of speech and language, cross-species comparisons, hierarchical sensory organization, neural coding
Fri, 11/14 3:25PM - 4:15PM | S2
Abstract
Melissa Caras
Topic areas: brain processing of speech and language, hierarchical sensory organization
Fri, 11/14 2:00PM - 2:30PM | S3
Abstract
Xinyi Zou, Zhuoran Lyu, Xiaona Fan, Xuanlin Lyu and Joji Tsunada
Topic areas: auditory memory and cognition, correlates of auditory behavior/perception, neural coding
Fri, 11/14 12:00PM - 12:45PM | T1
Abstract
Parental care is essential for the offspring’s survival and well-being. A unique feature of human infant care is the involvement of both mothers and fathers, with the latter referred to as paternal care. However, the neural mechanisms underlying paternal care remain poorly understood. This knowledge gap persists because cooperative breeding, where both parents contribute to offspring care, occurs in only ~5% of animal species, limiting the availability of suitable animal models. Marmoset monkeys, a non-human primate species that naturally exhibits paternal care in both wild and laboratory settings, provide a rare opportunity to study this behavior. Leveraging this advantage, our study aimed to uncover the neural mechanisms underlying paternal behavior in marmosets. Building on insights from studies of maternal behavior in rodents and infant-retrieving behaviors in marmosets, we hypothesized that sensitivity to infant calls in the auditory cortex of marmoset fathers increases after birth, and that this change promotes paternal behavior evoked by these calls. To test this hypothesis, we longitudinally tracked changes in the behavioral and neural sensitivity of marmoset fathers to infant calls. Specifically, we continuously monitored caregiving behaviors within family groups in the housing colony and developed a novel behavioral paradigm to assess responses to infant distress calls in the laboratory. As expected, immediately after birth, marmoset fathers exhibited significantly stronger behavioral responses to infant distress calls compared to other auditory stimuli, such as frequency-matched band-pass noise. Interestingly, these responses gradually attenuated as infants began to spend more time independently in the housing environment, suggesting that paternal behaviors observed in the laboratory reflect naturalistic caregiving dynamics. Consistent with the behavioral findings, auditory cortical responses to infant distress calls during passive listening increased after birth, particularly in the theta and high-gamma frequency bands of the local field potentials. Importantly, these changes were not observed in neural responses to adult vocalizations or other non-infant-related stimuli, demonstrating a selective enhancement of neural sensitivity to infant-specific stimuli. Together with our preliminary analyses of neural data collected during active engagement in the behavioral paradigm and infant-retrieving tasks, our findings suggest that enhanced neural oscillations in specific frequency bands of the auditory cortex underlie call-induced, infant-directed paternal caregiving behaviors in marmoset fathers.
Rajvi Agravat, Maansi Desai, Gabrielle Foox, Alyssa Field, Sandra Georges, Jacob Leisawitz, Anne Anderson, Dave Clarke, Elizabeth Tyler-Kabara, Howard Weiner, Andrew Watrous and Liberty Hamilton
Topic areas: auditory memory and cognition, correlates of auditory behavior/perception
Fri, 11/14 12:00PM - 12:45PM | T2
Abstract
Our brains are constantly filtering incoming sounds to understand our environment. Are some temporal lobe areas more selective to certain sound streams? While extensively studied in adults, understanding how auditory stream selectivity develops and matures throughout childhood remains a critical gap in our knowledge. Using intracranial stereo-electroencephalography (sEEG), we investigated this question by presenting 51 participants (ages 4-22, 29M/23F, over 8000 electrodes total, with over 1000 temporal lobe electrodes total) with audiovisual movie trailers that contained both speech and music. We extracted and analyzed high gamma band activity (70-150 Hz) to index local neural firing. Data were analyzed using a computational source-separated STRF analysis. We used deep neural networks to decompose the mixed speech-music stimuli into isolated speech and music components, then built separate encoding models for each condition (speech only, music only, and original speech-music mixture). Although the participants heard only the original speech-music mixture and were not directed to attend to any particular stream, STG activity was significantly better modeled by speech only compared to the original mixed and music only models (p
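For readers unfamiliar with the encoding-model comparison described above, the following minimal Python sketch illustrates the general logic on toy data: time-lagged spectrogram features are regressed against a high-gamma response with ridge regression, and cross-validated prediction accuracy is compared for speech-only, music-only, and mixture stimuli. All array sizes, signals, and regularization values are placeholders, not the authors' parameters or pipeline.

```python
# Illustrative sketch (not the authors' code): comparing ridge-regression
# encoding models fit to speech-only, music-only, and mixed spectrograms,
# in the spirit of the source-separated analysis described above.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_t, n_freq, n_lags = 5000, 32, 10          # time bins, spectrogram channels, model lags

def lagged(spec, n_lags):
    """Stack time-lagged copies of a (time x freq) spectrogram into a design matrix."""
    X = np.zeros((spec.shape[0], spec.shape[1] * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * spec.shape[1]:(lag + 1) * spec.shape[1]] = spec[:spec.shape[0] - lag]
    return X

# Toy stand-ins for the DNN-separated stimulus representations.
speech = rng.normal(size=(n_t, n_freq))
music = rng.normal(size=(n_t, n_freq))
mixture = speech + music

# Toy "high-gamma" response driven mostly by the speech stream (plus noise).
true_strf = rng.normal(size=n_freq * n_lags)
hg = lagged(speech, n_lags) @ true_strf + rng.normal(scale=5.0, size=n_t)

for name, stim in [("speech only", speech), ("music only", music), ("mixture", mixture)]:
    r2 = cross_val_score(Ridge(alpha=1e3), lagged(stim, n_lags), hg, cv=5).mean()
    print(f"{name:12s} cross-validated R^2 = {r2:.3f}")
```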
Aysegul Gungor Aydin, Elizabeth B Torres, Mark D Tambini, Elias Youssef and Kasia Bieszczad
Topic areas: correlates of auditory behavior/perception, neural coding, neuroethology and communication
Fri, 11/14 12:00PM - 12:45PM | T3
Abstract
Hearing loss in mid-life has been identified as the largest modifiable risk factor for Alzheimer’s Disease (AD) (Livingston et al., 2024), yet the mechanistic link between auditory function and dementia remains unclear. If a common biological mechanism of neurodegeneration underlies both, then a very early biomarker of AD risk may be tractable in an auditory neural signal. Sound-evoked auditory brain signals may thus serve as a window into early biobehavioral changes that precede AD progression. We therefore determined whether an auditory neural signature of genetic AD risk could be detected in animal models of AD as a candidate prodromal diagnostic for early intervention strategies, using the non-invasive, rapidly acquired auditory brainstem response (ABR). Although ABR morphology has long been proposed as a marker for AD (Tarawneh et al., 2021), its clinical utility has been limited by insufficient sensitivity and specificity. We applied a novel multidimensional parametric feature extraction method, developed for neurodevelopmental disorders (Torres et al., 2013, 2023), to hundreds of single-trial ABRs in genetically modified rats to show that we can accurately predict AD genetic risk, stratify AD by root cause, and predict outcomes of novel interventions. The AD models used were CRISPR/Cas9 genetically modified knock-in rats (N=15; male & female) that harbor familial early-onset AD risk mutations to amyloid precursor protein (AppSwedish) and/or Psen1 (Psen1LF) vs. wildtype humanized genes (Tambini et al., 2019; Tambini & D’Adamio, 2020). This powerful knock-in AD model makes no a priori assumptions about pathogenic mechanisms or their behavioral consequences; the only assumption is the unbiased genetic one, made at the APP locus, whose proteolysis has a central role in both familial and sporadic AD pathogenesis. We found that normal-hearing AppSwe (Swedish familial AD risk variant) rats separate clearly from controls and from Psen1LF rats in multidimensional parameter space, with separation that increases with age and is sex-dependent. Notably, auditory training normalizes the ABR of AppSwe rats into the same region of parameter space as trained controls, which supports a central rather than peripheral (e.g., cochleopathic) source of dysfunction. These preclinical data are the first to show how the auditory system can provide a biomarker for early-life detection of AD and lay the groundwork for testing the synergy of auditory and cognitive functions in human dementia.
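As an illustration of the general idea of classifying genotype from single-trial ABR features, here is a toy Python sketch. It uses generic peak-amplitude and peak-latency features with a random-forest classifier; it is not the multidimensional parametric method of Torres et al. cited above, and all signal parameters are invented.

```python
# Illustrative sketch only: extracting simple per-trial ABR features (peak
# amplitude, peak latency) and testing whether they separate two genotypes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
fs = 25000                       # Hz, assumed ABR sampling rate
t = np.arange(0, 0.010, 1 / fs)  # 10-ms single-trial window

def simulate_trials(n, latency_ms, amp):
    """Toy single-trial ABRs: one Gaussian-shaped wave plus noise."""
    lat = latency_ms / 1000 + rng.normal(scale=2e-4, size=(n, 1))
    return amp * np.exp(-((t - lat) ** 2) / (2 * (3e-4) ** 2)) + rng.normal(scale=0.3, size=(n, t.size))

wt = simulate_trials(300, latency_ms=5.0, amp=1.0)   # "wildtype-like" trials
app = simulate_trials(300, latency_ms=5.4, amp=0.8)  # "risk-variant-like" trials

def features(trials):
    peaks = trials.max(axis=1)
    latencies = t[trials.argmax(axis=1)] * 1000      # ms
    return np.column_stack([peaks, latencies])

X = np.vstack([features(wt), features(app)])
y = np.r_[np.zeros(len(wt)), np.ones(len(app))]
acc = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0), X, y, cv=5).mean()
print(f"single-trial genotype decoding accuracy ~ {acc:.2f}")
```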
Hemant Kumar Srivastava, Katharina Bochtler, Jordan Drew, Nikolas Scarcelli, Hong Jiang and Matthew McGinley
Topic areas: neural coding
Fri, 11/14 2:30PM - 3:15PM | T4
Abstract
Activity in neuromodulatory systems fluctuates continuously during wakefulness, influencing neural processing broadly across the brain. While state-dependent modulation has been extensively characterized in the auditory cortex and thalamus (e.g. McGinley et al., 2015), its effects on earlier stages of auditory processing remain poorly understood. To address this, we recorded single-unit activity from the dorsal cochlear nucleus (DCN) of awake, head-fixed mice using high-density Neuropixels probes, while simultaneously tracking pupil diameter and locomotion as proxies for neuromodulatory brain state. The DCN is an early brainstem structure that integrates both auditory and non-auditory inputs, representing a critical site for investigating how internal brain state shapes sensory encoding. DCN neurons were classified into putative cell types: fusiform (excitatory projection neurons), or cartwheel or vertical cell (two classes of interneurons) based on their responses to tones and broadband noise. Spontaneous firing rates exhibited robust, cell-type specific modulation by pupil-indexed arousal level. Moreover, neuronal synchrony within the DCN varied systematically with pupil size, in a cell-pair type-specific manner, suggesting that internal state not only modulates individual neuron excitability but also affects circuit-level interactions. The effects of pupil size on sound-evoked responses in DCN were comparatively small. In similar recordings in downstream inferior colliculus (IC), a distinct pattern was observed. Here, tone-evoked responses showed an inverted-U dependence on pre-stimulus pupil size, consistent with earlier reports from auditory cortex and thalamus. Furthermore, unsupervised clustering revealed subpopulations of IC neurons with distinct state-dependent patterns and unique response properties. The effects of pupil size on spontaneous activity in IC were comparatively small. These findings demonstrate that early auditory brainstem circuits are dynamically modulated by internal brain state, but with a strikingly different pattern than in IC, thalamus, and cortex. In particular, the DCN shows strong state influence on spontaneous activity and its synchrony, highlighting its role as a critical node in state-dependent auditory processing. Ongoing experiments are assessing the impact of other state measures, such as ear movements, on state-dependent DCN sound processing. McGinley, M. J., David, S. V., & McCormick, D. A. (2015). Cortical membrane potential signature of optimal states for sensory signal detection. Neuron, 87(1), 179-192.
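The following minimal sketch (toy data, assumed variable names) shows one common way to relate spontaneous firing rate to pupil-indexed arousal, as analyzed above: bin the rate by pupil-size quantile and test for a monotonic relationship.

```python
# Minimal sketch: spontaneous rate as a function of normalized pupil diameter.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_samples = 10000                                        # 1-s bins of simultaneous data
pupil = np.clip(rng.normal(0.5, 0.2, n_samples), 0, 1)   # normalized pupil diameter
# Toy neuron whose spontaneous rate increases with arousal (Poisson spiking).
rate = rng.poisson(2 + 6 * pupil)

bins = np.quantile(pupil, np.linspace(0, 1, 6))          # quintiles of pupil size
which = np.digitize(pupil, bins[1:-1])
tuning = [rate[which == b].mean() for b in range(5)]
r, p = stats.spearmanr(pupil, rate)
print("mean rate per pupil quintile:", np.round(tuning, 2))
print(f"Spearman r = {r:.2f}, p = {p:.1e}")
```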
Satyabrata Parida, Jereme Wingert, Jonah Stickney, Sam Norman-Haignere and Stephen David
Topic areas: correlates of auditory behavior/perception, neural coding, subcortical auditory processing
Fri, 11/14 2:30PM - 3:15PM | T5
Abstract
Neural manifolds (NMs) capture the geometry of population activity in sensory and motor systems, offering insight into how neural dynamics support cognitive function. Manifolds typically generalize across individuals and behavioral states, providing a robust characterization of neural representations independent of the specific neurons recorded. However, NM properties can vary across systems. For example, the visual cortex exhibits high-dimensional NMs, while motor cortex exhibits low-dimensional ones, reflecting differences in representational demands. Despite progress in other systems, the structure and generalizability of the NM in auditory cortex remain poorly understood. Here, we characterized the NM underlying natural sound representation by single neurons in the ACx. Using multichannel recordings from over 3,000 neurons in primary (A1) and secondary (PEG) fields, we collected over 40 hours of responses to natural sounds from awake, passively listening ferrets. Sound-evoked activity was high-dimensional, with population correlations decaying as a power law (exponent ≈ 1), closely resembling observations in the visual system. This suggests a common property across sensory systems that balances efficiency and robustness. To further probe NM structure, we trained a large deep learning-based encoding model to predict sound-evoked activity of the entire neural population. While the model accurately predicted the high-dimensional geometry of the recorded neural activity, it accomplished this with a relatively low-dimensional (≈ 100) encoding manifold, as revealed by model bottleneck embeddings. This encoding manifold was highly consistent across animals, underscoring its generality. The encoding NM also recapitulated key features of cortical sound representations. Simulated responses to natural sounds exhibited heterogeneous rate-level tuning and contrast gain control. The NM also captured diverse rate and temporal coding of click-train frequency. This analysis revealed a previously underappreciated subpopulation of neurons exhibiting a dual (rate and temporal) coding strategy. Together, these findings demonstrate that a low-dimensional encoding manifold can capture the diversity of a high-dimensional neural response space in the auditory cortex.
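The power-law eigenspectrum result can be made concrete with a short sketch: compute the eigenvalues of the stimulus-evoked covariance and fit a line to log(variance) versus log(rank). The toy data below are constructed to have an exponent near 1; nothing here reproduces the ferret recordings.

```python
# Illustrative sketch: estimating the power-law exponent of a population
# eigenspectrum (variance of successive principal components).
import numpy as np

rng = np.random.default_rng(3)
n_neurons, n_stimuli = 500, 10000

# Build toy responses whose covariance spectrum follows a 1/rank power law.
target_var = 1.0 / np.arange(1, n_neurons + 1)
basis, _ = np.linalg.qr(rng.normal(size=(n_neurons, n_neurons)))
latent = rng.normal(size=(n_stimuli, n_neurons)) * np.sqrt(target_var)
responses = latent @ basis.T                                    # stimuli x neurons

# Eigenspectrum of the stimulus-evoked covariance, largest first.
eigvals = np.sort(np.linalg.eigvalsh(np.cov(responses, rowvar=False)))[::-1]

# Fit log(variance) vs. log(rank) over an intermediate range of ranks.
ranks = np.arange(1, eigvals.size + 1)
keep = (ranks >= 10) & (ranks <= 300)
slope, _ = np.polyfit(np.log(ranks[keep]), np.log(eigvals[keep]), 1)
print(f"estimated power-law exponent ~ {-slope:.2f}")           # expect roughly 1
```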
Allan Muller, Sophie Bagur and Brice Bathellier
Topic areas: auditory memory and cognition, multisensory processes, neural coding
Fri, 11/14 2:30PM - 3:15PM | T6
Abstract
During wake, sound-evoked and spontaneous neural activity of the auditory cortex evolve in distinct subspaces, whereas anesthesia disrupts sound responses and merges these spaces. To evaluate whether similar modifications of the sound representation geometry explain sensory disconnection during sleep, we followed large neural populations of the mouse auditory cortex across slow wave sleep and wakefulness. We observed that sleep dampens sound responses but preserves the geometry of sound representations, which remain separate from spontaneous activity. Moreover, response dampening was strongly coordinated across neurons and varied throughout sleep, spanning from fully preserved response patterns to population response failures on a fraction of sound presentations. These failures are more common during high spindle-band activity and only rarely observed during wakefulness. Therefore, in sleep, the auditory system preserves sound feature selectivity up to the cortex for detailed acoustic surveillance, but concurrently implements an intermittent gating mechanism leading to local sensory disconnections.
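One standard way to quantify the separation between evoked and spontaneous subspaces is via principal angles between their leading principal components, sketched below on toy data. This is illustrative only, not the authors' analysis; population sizes and dimensionalities are assumed.

```python
# Minimal sketch: principal angles between evoked and spontaneous PC subspaces.
import numpy as np
from scipy.linalg import subspace_angles
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
n_neurons, n_dim = 200, 10

def top_pcs(data, k):
    """Return the top-k principal axes (neurons x k) of a (samples x neurons) matrix."""
    return PCA(n_components=k).fit(data).components_.T

# Toy populations: evoked and spontaneous activity confined to different subspaces.
evoked_axes = np.linalg.qr(rng.normal(size=(n_neurons, n_dim)))[0]
spont_axes = np.linalg.qr(rng.normal(size=(n_neurons, n_dim)))[0]
evoked = rng.normal(size=(3000, n_dim)) @ evoked_axes.T + 0.1 * rng.normal(size=(3000, n_neurons))
spont = rng.normal(size=(3000, n_dim)) @ spont_axes.T + 0.1 * rng.normal(size=(3000, n_neurons))

angles = np.degrees(subspace_angles(top_pcs(evoked, n_dim), top_pcs(spont, n_dim)))
print("principal angles (deg), largest first:", np.round(angles[:5], 1))
```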
Malinda McPherson-McNato, Eduardo Undurraga, Aidan Seidle, Olivia Honeycutt and Josh McDermott
Topic areas: hierarchical sensory organization, neural coding
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Pitch is a building block of speech and music, but the extent to which pitch perception is shared across cultures is unclear. Evidence from Western participants suggests that pitch perception relies on multiple representations. For instance, harmonic tones are easier to discriminate in noise than inharmonic tones despite comparable discrimination in quiet, suggesting that different representations are used in noise and quiet. We tested whether these effects are present cross-culturally, comparing participants from the USA and a Bolivian Amazonian indigenous community (the Tsimane’). Participants heard two-note melodies and reproduced the melody by singing. Tones were either harmonic or inharmonic and were presented in noise or quiet. Both groups exhibited two characteristics of pitch perception previously seen in US listeners: the direction of pitch changes could be reproduced with equal accuracy for harmonic and inharmonic tones in quiet but was better for harmonic than inharmonic tones in noise. However, replicating previous work, Tsimane’ vocal reproductions were unrelated to the absolute pitch or chroma of the stimulus notes, differing from the tendency seen in Western participants to match pitch and/or chroma. These findings indicate that the basic structure of pitch perception is shared across cultures despite other differences in pitch-related behavior.
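A rough sketch of the stimulus manipulation contrasted above: harmonic complex tones versus inharmonic tones with jittered partials, optionally embedded in noise at a chosen SNR. Sampling rate, harmonic count, jitter range, and SNR are assumed values, not those used in the study.

```python
# Illustrative stimulus sketch: harmonic vs. jittered-inharmonic complex tones in noise.
import numpy as np

fs = 44100
dur = 0.4
t = np.arange(int(fs * dur)) / fs

def complex_tone(f0, n_harm=10, jitter=0.0, rng=None):
    """Sum of partials at harmonic (jitter=0) or jittered-inharmonic frequencies of f0."""
    if rng is None:
        rng = np.random.default_rng(0)
    tone = np.zeros_like(t)
    for k in range(1, n_harm + 1):
        fk = k * f0 * (1 + jitter * rng.uniform(-0.5, 0.5))
        tone += np.cos(2 * np.pi * fk * t)
    return tone / n_harm

def add_noise(tone, snr_db):
    """Embed the tone in Gaussian noise at the requested signal-to-noise ratio."""
    noise = np.random.default_rng(1).normal(size=t.size)
    noise *= np.sqrt(np.mean(tone ** 2) / np.mean(noise ** 2)) * 10 ** (-snr_db / 20)
    return tone + noise

harmonic_in_noise = add_noise(complex_tone(200.0, jitter=0.0), snr_db=-5)
inharmonic_in_noise = add_noise(complex_tone(200.0, jitter=0.3), snr_db=-5)
print(harmonic_in_noise.shape, inharmonic_in_noise.shape)
```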
Zyan Wang, Sharlen Moore, Joy Wang, Yeonjae A. Lee, Ziyi Zhu, Ruolan Sun, Adam Charles and Kishore Kuchibhotla
Topic areas: brain processing of speech and language, correlates of auditory behavior/perception, neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
The speed of goal-directed to habit transitions has been debated since Clark Hull first asked in 1943: is habit formation slow or sudden? Using a ‘volitional engagement’ paradigm, we gave mice free access to citric-acid water that reduced—but did not eliminate—reward-seeking for plain water in an auditory go/no-go task. Animals learned to discriminate quickly but exhibited state-like fluctuations in engagement for thousands of trials. Strikingly, these fluctuations abruptly ceased (‘transition’) well after discrimination stabilized. Pre-transition behavior was sensitive to sensory-specific outcome devaluation, while post-transition behavior was not, indicating a transition from goal-directed to habitual behavior. HMM-GLM modeling pinpointed the transition to ~3 trials. Post-transition behavior showed orthogonal evidence for habitual control: increased motor stereotypy, reduced latencies, and altered phasic pupillary responses. Although the transition appears abrupt behaviorally, it could arise from gradual neural changes crossing a threshold, or from a switch-like process that engages a readily available habit circuit. To adjudicate between these models, we focused on the dorsal striatum, since bilateral lesions of the dorsolateral striatum (DLS) blocked the transition. Fiber photometry revealed a sharp drop in outcome-related activity and a sharpening of stimulus-response activity in the DLS at the moment of transition, supporting a switch-like mechanism. Thus, habits can emerge suddenly, mediated by an abrupt dorsal striatal shift from outcome- to stimulus-driven processing. Ongoing work uses optogenetic manipulation to examine the causal relation between dorsal striatal activity and transitions to habit. In addition, we are exploring cell-type (i.e., D1/D2 MSNs) and projection-specific pathways to define the precise circuit-level logic of decision control in the dorsal striatum.
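For intuition about detecting an abrupt behavioral transition, the sketch below scans candidate changepoints for the split that maximizes a two-regime Bernoulli likelihood of trial-by-trial engagement. This is a simplified stand-in for the HMM-GLM analysis described above, run on simulated trials.

```python
# Minimal sketch (not the authors' HMM-GLM): single-changepoint detection on
# trial-by-trial engagement via a two-regime Bernoulli likelihood.
import numpy as np

rng = np.random.default_rng(5)
# Toy session: fluctuating engagement for 4000 trials, then stable engagement.
engaged = np.r_[rng.random(4000) < 0.6, rng.random(2000) < 0.97].astype(float)

def bernoulli_ll(x):
    """Log-likelihood of a binary sequence under its own maximum-likelihood rate."""
    if x.size == 0:
        return 0.0
    p = np.clip(x.mean(), 1e-6, 1 - 1e-6)
    return x.size * (p * np.log(p) + (1 - p) * np.log(1 - p))

candidates = np.arange(100, engaged.size - 100)
ll = [bernoulli_ll(engaged[:c]) + bernoulli_ll(engaged[c:]) for c in candidates]
print("estimated transition trial:", candidates[int(np.argmax(ll))])
```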
Yaser Merrikhi, Carina Sabourin and Stephen Lomber
Topic areas: brain processing of speech and language, correlates of auditory behavior/perception
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
One of the fundamental roles of the cerebral cortex is to assimilate and synthesize information from various sensory modalities, creating an integrated perception of the environment. While it is well established that auditory areas of cats develop multisensory capabilities within the first six months, the progression of these multisensory functions into adulthood, beyond six months, remains less characterized. This study investigated age-related changes in visually-evoked local field potentials (LFPs) in the primary auditory cortex (A1) and the higher order auditory area, the dorsal zone (DZ). Using laminar arrays of electrodes, we recorded LFP responses in A1 (n=741 recording sites) and DZ (n=525 recording sites) of 6 cats aged 9-36 months under light ketamine anesthesia in response to visual stimulation (80 lux, 500 ms). Our analysis focused on the power and phase consistency of LFP responses, critical indicators previously shown to be affected by cross-modal visual inputs. Our results indicate non-significant and inconsistent correlations between the power of LFP signals and age in DZ across various frequency bands. In contrast, A1 showed consistently positive, though non-significant, correlations across all frequency bands, with notable strength in the theta (θ; r=0.7, p=0.211) and alpha (α; r=0.726, p=0.102) bands. Significantly, the power of visually-evoked LFP responses in A1 increased relative to DZ with age, particularly in the θ (r=0.822, p=0.045) and α (r=0.902, p=0.012) bands. Similarly, phase consistency of LFP responses in DZ displayed inconsistent, non-significant correlations across frequencies. Conversely, A1 exhibited consistently positive, non-significant correlations, with stronger effects in θ (r=0.635, p=0.175) and α (r=0.643, p=0.169) bands. Notably, as cats aged, the phase consistency of LFP responses in A1 significantly outpaced DZ in the θ (r=0.872, p=0.023) and α (r=0.934, p=0.006) bands. These observations suggest a progressive enhancement in A1's ability to integrate visual stimuli with age, indicative of augmented multisensory processing capabilities in older cats. The increase in LFP response properties (i.e. LFP power and phase consistency) mainly reflects greater synaptic input from visual pathways in A1 as cats age. Our data suggest that this trend does not occur in DZ, highlighting potential regional variations in how the aging auditory cortex adapts to multisensory inputs. This work was supported by grants from the Canadian Institutes of Health Research and the Natural Sciences and Engineering Research Council of Canada.
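The two LFP measures used above, band-limited power and phase consistency (inter-trial phase coherence), can be computed as sketched below with a band-pass filter and Hilbert transform. The data are simulated and the alpha-band limits and sampling rate are assumed.

```python
# Illustrative sketch (toy LFP): band-limited power and inter-trial phase consistency.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000
rng = np.random.default_rng(6)
n_trials, n_samples = 200, 1000
t = np.arange(n_samples) / fs

# Toy visually-evoked LFP: phase-locked 8-Hz (alpha) component plus noise.
trials = np.sin(2 * np.pi * 8 * t) + rng.normal(scale=1.5, size=(n_trials, n_samples))

def band_analytic(trials, lo, hi):
    """Band-pass filter each trial, then return the analytic (Hilbert) signal."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return hilbert(filtfilt(b, a, trials, axis=1), axis=1)

analytic = band_analytic(trials, 7, 13)                    # alpha band
power = np.mean(np.abs(analytic) ** 2, axis=0)             # trial-averaged power over time
itpc = np.abs(np.mean(np.exp(1j * np.angle(analytic)), axis=0))  # phase consistency
print(f"mean alpha power = {power.mean():.2f}, mean ITPC = {itpc.mean():.2f}")
```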
Jing Wang and Nai Ding
Topic areas: cross-species comparisons, neural coding, neuroethology and communication
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Speech and music are commonly distinguished by their characteristic temporal modulation spectra, typically peaking at ~5 Hz for speech and ~2 Hz for music. While this distinction has been robust across languages and musical genres, it remains unclear whether the observed rhythmic timescales merely reflect acoustic properties or also encode communicative functions and cognitive development. In this study, we examined how temporal modulations vary across vocal expressions and over development, revealing a functional continuum between speech and music shaped by both expressive context and maturational changes. To examine this, we analyzed a broad range of vocal productions, including canonical speech, sports command, city cries, expressive utterances (e.g., crying, sobbing, laughter), character-by-character reading, single-character interjections reading, and song. Modulation spectrum analysis revealed that these vocal forms occupy intermediate positions between prototypical speech and music, forming a speech–music continuum. For instance, sports command, city cries, and crying exhibited strong energy below 4 Hz, overlapping with music-like patterns. In contrast, character-by-character reading consistently peaked near 5 Hz. This suggests that modulation spectra vary systematically with communicative function and vocal style, rather than binary category membership. Crucially, a cross-sectional analysis of speech and singing from children aged 2 to 12 revealed that slow modulations (~0.5–1 Hz)—a hallmark of adult music and expressive prosody—only emerge robustly after age 10. This pattern suggests that the ability to generate or process extended rhythmic structures is not innate but emerges gradually with neurocognitive and motor development. Together, these findings challenge the strict dichotomy between speech and music and highlight a shared temporal architecture that is graded across contexts and emerges with development. We propose that slow temporal modulations reflect not only acoustic features but also evolving capacities for affective, artistic, and rhythmic communication.
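A minimal sketch of a temporal modulation spectrum, the measure used above: take the Hilbert envelope of the waveform, downsample it, and examine the power spectrum of the envelope. The toy signal below is noise modulated at 5 Hz, so the spectrum peaks near the speech-like rate; all parameters are placeholders.

```python
# Minimal sketch: temporal modulation spectrum from the Hilbert envelope.
import numpy as np
from scipy.signal import hilbert

fs = 16000
dur = 10.0
rng = np.random.default_rng(7)
t = np.arange(int(fs * dur)) / fs
# Toy "speech-like" signal: a noise carrier modulated at ~5 Hz.
wave = rng.normal(size=t.size) * (1 + np.sin(2 * np.pi * 5 * t))

envelope = np.abs(hilbert(wave))
env_ds = envelope[: (envelope.size // 160) * 160].reshape(-1, 160).mean(axis=1)  # ~100 Hz
env_ds -= env_ds.mean()
spectrum = np.abs(np.fft.rfft(env_ds)) ** 2
freqs = np.fft.rfftfreq(env_ds.size, d=1 / 100)
band = (freqs > 0.5) & (freqs < 20)
peak = freqs[band][np.argmax(spectrum[band])]
print(f"modulation spectrum peaks near {peak:.1f} Hz")
```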
Joel Ward, Daniel Smith, Alexander Feldman, James Ramsden, Andrew J King and Kerry Walker
Topic areas: brain processing of speech and language, cross-species comparisons, hierarchical sensory organization, neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Pitch is our perception of the tonal quality of sounds, and it plays a key role in our ability to selectively attend to a single voice in a crowded room. Multiple acoustical cues, including temporal periodicity and harmonicity, are known to contribute to our perception of pitch, but it is not well understood which of these pitch cues are most important in guiding attentive listening or how this may change with ageing. We investigated pitch-based selective attention by testing 75 normal-hearing (based on a pure tone audiogram) participants on a psychoacoustical task. Participants were asked to report the final two words of a target sentence and ignore a competing talker with the same voice at a different pitch. We manipulated potential pitch cues in separate testing blocks by either randomly jittering harmonic frequencies, replacing harmonics with noise (simulated whispering), including only “resolved” (2-5th) harmonics, including only “unresolved” (8-20th) harmonics, randomizing harmonic phase alignment, or adding reverberation. We found that older healthy listeners (>65y) performed worse than younger adults (18-40y) in all stimulus conditions (2-way ANOVA; p
Rhea Choi and Steven Eliades
Topic areas: auditory memory and cognition, correlates of auditory behavior/perception, subcortical auditory processing
Fri, 11/14 4:30PM - 6:00PM | poster + podium teaser
Abstract
Vocalization is a sensory-motor process requiring auditory self-monitoring to detect and correct errors in vocal production. During vocal production there is a well-described suppression of neural activity in the auditory cortex. Simultaneously, many neurons exhibit an increase in their sensitivity to experimental perturbations in sensory feedback. The neural computations underlying these vocalization-related responses, however, remain unknown. In part, investigation has been limited by the inherent variability in natural vocal production, which makes direct comparison with sensory responses during non-vocal conditions challenging. In this study, we investigated neural activity in the auditory cortex of vocalizing marmoset monkeys, presenting parametrically-varied tone stimuli during vocal production and comparing responses to similar sounds during passive listening. We found that responses to tone stimuli were reduced during vocalization when compared to tones presented when an animal was not vocalizing, including tones presented with playback of vocal sounds. The effects of vocal production on sensory responses often varied systematically across different tone frequencies, in a fashion suggestive of a divisive gain modulation rather than a subtractive effect of vocal suppression. We did not find any changes in frequency tuning attributable to vocalization. We further evaluated the relationship between responses to individual tone frequencies and the acoustics of the overlapping vocalizations, as well as differences due to neurons’ passive receptive fields. These results suggest possible computational mechanisms underlying vocalization-induced suppression and point to targets for investigating local cellular and circuit mechanisms, with implications for our broader understanding of predictive and sensory-motor processes and their effects on sensory coding. This work was supported by NIH/NIDCD Grant R01-DC018525.
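The divisive-versus-subtractive distinction discussed above can be illustrated with a toy tuning curve: fit the best multiplicative gain and the best subtractive offset relating vocal-condition responses to passive responses, and compare residual error. The numbers below are invented and purely illustrative.

```python
# Illustrative sketch: divisive gain vs. subtractive suppression of a tuning curve.
import numpy as np

rng = np.random.default_rng(8)
freqs = np.logspace(np.log10(1), np.log10(32), 15)             # kHz, assumed tone set
passive = 20 * np.exp(-(np.log2(freqs / 8)) ** 2) + 2          # passive tuning curve (sp/s)
vocal = 0.4 * passive + rng.normal(scale=0.5, size=freqs.size) # toy ground truth: divisive

g = np.sum(vocal * passive) / np.sum(passive ** 2)             # least-squares divisive gain
c = np.mean(passive - vocal)                                   # least-squares subtractive offset
err_div = np.mean((vocal - g * passive) ** 2)
err_sub = np.mean((vocal - (passive - c)) ** 2)
print(f"gain g = {g:.2f}; divisive MSE = {err_div:.2f}, subtractive MSE = {err_sub:.2f}")
```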
Lorenzo Mazzaschi, Andrew J. King, Ben D. B. Willmore and Nicol S. Harper
Topic areas: correlates of auditory behavior/perception, hierarchical sensory organization, neural coding, thalamocortical circuitry and function
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
To make sense of dynamic natural sounds, cortical neurons must be sensitive to sequential dependencies in sounds at various timescales. This suggests that the mammalian auditory cortex uses memory processes that retain information over seconds and fractions of seconds. To understand this neural system, we built computational models to predict responses of auditory cortical neurons. Existing models use long delay lines as memory, which is biologically unrealistic. We therefore developed new models with memory mechanisms arguably more consistent with biology. Specifically, we incorporated gated recurrency, a powerful concept from machine learning. This involves units with switchable memory components (gates) that enable relevant information to be retained whilst forgetting irrelevant information. Our aim was to improve the performance and biological plausibility of auditory models by capturing neural sensitivity to sequential stimulus dependencies. We fitted gated recurrent models to extracellularly recorded spiking activity of neurons in primary auditory cortex (A1) and higher auditory cortex (posterior ectosylvian gyrus, PEG) of awake ferrets in response to natural sounds. We compared the capacity of these models to predict neuronal responses, relative to standard delay-line models. Notably, we developed a gated recurrent model, the corticofugal long short-term memory (f-LSTM) model, which incorporated input gating resembling that described for corticofugal projections to subcortical regions. The f-LSTM performed the best of all the models for both A1 and PEG responses. We further improved the biological plausibility and performance of the f-LSTM model in A1 by adding an initial feedforward expansion layer. In A1, the deep cortical layers 5-6, the source of descending corticofugal projections, receive extensive input from the superficial layers 2-4, which have a higher neuronal density than layers 5-6. This added expansion therefore mirrors high-dimensional processing in superficial cortical layers. Analyzing our models, we also found substantially longer memory retention in PEG than A1, suggesting a central role for PEG in memory-mediated processing. We also identified two kinds of stimulus features that evoked responses for which gated recurrency may be particularly relevant: sharp changes in sound intensity and periods of quiet. These results suggest that the auditory cortex may use a form of gated recurrency and that corticofugal projections and superficial-layer dimensionality expansions are important in understanding auditory cortical memory systems.
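As a schematic of a gated-recurrent encoding model (not the authors' f-LSTM architecture), the PyTorch sketch below maps a spectrogram to a firing rate through a single LSTM layer and a linear readout, trained on toy data with assumed dimensions.

```python
# Minimal PyTorch sketch: an LSTM-based encoding model from spectrogram to rate.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_time, n_freq, hidden = 2000, 32, 16

spec = torch.randn(1, n_time, n_freq)                 # (batch, time, frequency)
rate = torch.relu(spec[..., 10:14].mean(dim=-1))      # toy target firing rate

class LSTMEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(n_freq, hidden, batch_first=True)  # gated recurrent memory
        self.readout = nn.Linear(hidden, 1)

    def forward(self, x):
        h, _ = self.lstm(x)
        return torch.relu(self.readout(h)).squeeze(-1)         # non-negative rate

model = LSTMEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(spec), rate)
    loss.backward()
    opt.step()
print(f"final MSE = {loss.item():.4f}")
```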
Tomas Suarez Omedas and Ross S. Williamson
Topic areas: auditory memory and cognition, correlates of auditory behavior/perception, subcortical auditory processing
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
A key challenge for the auditory system is to maintain reliable representations of sound despite interference from background noise (BN). Although the ascending auditory pathway contributes to constructing noise-invariant representations, the underlying mechanisms and neural subpopulations involved remain poorly understood. In the primary auditory cortex (ACtx), the canonical cortical microcircuit, thalamus-> layer (L)4 -> L2/3 -> L5, transmits sensory information across cortical layers. Here, we examined how neural representations of sound evolve through this microcircuit to achieve noise invariance. Using cell-type-specific two-photon calcium imaging in mouse ACtx, we recorded sound-evoked activity from axonal inputs of the auditory thalamus (MGBv) and somata of L2/3, L5 intratelencephalic (IT), and L5 extratelencephalic (ET) neurons. Mice were presented with pure tones either in silence or embedded in white BN. To assess noise-invariance, we applied information-theoretic metrics, population decoding, manifold geometry, and pairwise correlation analyses across single-cell and population levels. Tone-evoked responses in MGBv and L2/3 were significantly reduced in the presence of BN, leading to degraded stimulus encoding. In contrast, L5 IT and ET neurons maintained robust single-cell and population-level responses, suggesting the emergence of a noise-invariant output code in the deep layers of ACtx. To investigate mechanisms supporting this transformation, we developed a holographic optogenetics paradigm to probe functional connectivity within L2/3. We transiently activated small groups of neurons that were either co-tuned to the same frequency or co-modulated by BN. Co-modulated ensembles more effectively drove similarly co-modulated neurons than non-modulated or differently modulated ones. These findings suggest that local connectivity in L2/3 is structured to selectively reinforce noise-invariant subnetworks. Together, our results demonstrate that noise-dependent thalamic inputs are progressively transformed into noise-invariant auditory representations through canonical cortical circuits, with L5 projection neurons serving as a noise-invariant broadcast output.
Nathan A. Schneider, Michael Malina and Ross S. Williamson
Topic areas: auditory memory and cognition, correlates of auditory behavior/perception
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Auditory categorization, the transformation of continuous acoustic features into discrete perceptual categories, is critical for guiding decisions and actions in daily life. While the auditory cortex (ACtx) is necessary for categorical behavior, the contributions of specific excitatory cell types remain unclear. Nestled amongst several populations of cells, layer 5 (L5) extratelencephalic (ET) neurons are uniquely positioned to influence behavior, as they project to subcortical structures involved in decision-making, motor control, and reward. To investigate their role in perceptual categorization, we trained head-fixed water-restricted mice to categorize the rate of sinusoidal amplitude-modulated (sAM) noise bursts as “fast” or “slow” by licking left or right to receive a water reward. Using cell-type-specific GCaMP8s expression and two-photon calcium imaging, we then recorded from L5 ET, as well as L2/3 and L5 intratelencephalic (IT) neurons. In expert mice, L5 ET neurons exhibited robust categorical preferences for fast or slow sAM stimuli during active task engagement. These categorical representations were absent in untrained mice and emerged gradually with learning, as revealed by longitudinal imaging. Notably, the same neurons did not exhibit categorical selectivity during passive listening to identical stimuli, suggesting top-down modulatory input. In contrast, L2/3 and L5 IT neurons lacked categorical selectivity under all conditions. To ensure these effects were not confounded by movement, we fit a generalized linear model to dissociate sound-driven and movement-related activity. Categorical tuning in L5 ET neurons persisted even after excluding movement-related neurons, confirming that these signals reflect auditory processing rather than motor output. Finally, we investigated how population activity encoded behavioral choices. On ambiguous trials at the category boundary, principal component analysis revealed choice-predictive trajectories in all three populations, with the strongest and most distinct activity in L5 ET neurons. Together, these results suggest that L5 ET neurons in ACtx acquire task-dependent representations of both perceptual categories and behavioral choices, positioning this projection system as a key conduit for relaying behaviorally relevant auditory information to downstream targets.
Madan Ghimire and Ross S. Williamson
Topic areas: brain processing of speech and language, hierarchical sensory organization, neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Extratelencephalic (ET) neurons in layers (L)5 and 6b of the primary auditory cortex (ACtx) project to subcortical targets and contribute to auditory-guided behavior and learning-related plasticity. Although both subtypes share major projection targets, including the inferior colliculus, thalamus, striatum, and amygdala, they exhibit striking differences in morphology and intrinsic physiology. L5 ET neurons are large pyramidal cells with a prominent apical dendrite extending toward L1, and many are burst-spiking. In contrast, L6b ET neurons often have non-pyramidal, radially oriented somata with widely branching dendrites that span more than a millimeter and are predominantly regular-spiking. Despite these differences, the in vivo functional properties of these neurons remain poorly understood. To address this, we used an intersectional viral strategy to express GCaMP8s in both L5 and L6b ET neurons within the same animal. Two-photon calcium imaging was used to record activity during the presentation of pure tones, sinusoidally amplitude modulated (sAM) noise, and ripples. L5 ET neurons exhibited predominantly excitatory responses, with higher response sparsity and trial-to-trial reliability, especially for pure tones and sAM noise. In contrast, L6b ET neurons showed a greater proportion of suppressive responses, particularly to complex stimuli, and displayed enhanced functional connectivity within their population. Unsupervised clustering revealed distinct response motifs and tuning properties across laminar subtypes. L5 ET neurons were more likely to exhibit monotonic, single-peaked tuning, whereas L6b ET neurons displayed non-monotonic and complex tuning profiles. Together, these findings suggest that L5 and L6b ET neurons support complementary coding strategies, with L5 neurons optimized for precise auditory relay and L6b neurons for integrating and broadcasting higher-order features.
Keith J. Kaufman, Rebecca F. Krall and Ross S. Williamson
Topic areas: correlates of auditory behavior/perception, hierarchical sensory organization
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Arousal state, indexed by pupil diameter, dynamically shapes cortical activity and sensory processing. While prior studies have documented arousal-dependent changes in neural responsiveness, reliability, and tuning in the auditory cortex (ACtx), most have treated the excitatory population as functionally uniform. In reality, ACtx contains diverse excitatory cell types, including intratelencephalic (IT), extratelencephalic (ET), and corticothalamic (CT) neurons, with distinct anatomical and physiological properties. These differences suggest that arousal may influence sensory coding in a cell-type-specific manner. To test this, we combined pupillometry with two-photon calcium imaging in awake mice, targeting L2/3 (n = 3,023), L5 IT (n = 2,447), L5 ET (n = 2,876), and L6 CT (n = 3,311) neurons. A multivariate regression model revealed that sound-evoked responses were modulated by arousal in a linear or non-linear fashion. The shape and magnitude of this modulation varied by cell type: L2/3 and CT neurons showed peak responsiveness at intermediate arousal levels, whereas L5 ET neuron responses increased monotonically and IT neuron responses remained largely unaffected. Hierarchical clustering identified four distinct arousal-modulation motifs (linear increasing, linear decreasing, U, and inverted-U) that were present across all cell types but differed significantly in prevalence. Tuning width analyses revealed that increased arousal led to pronounced multiplicative and additive gain changes in L5 ET neurons, indicating sharper and enhanced tuning compared to other populations. To assess how arousal shapes sensory encoding, we trained a neural network decoder to classify stimulus identity from population activity across pupil states. Decoding performance followed an inverted-U profile for L2/3 and L6 CT neurons, increased monotonically for L5 ET neurons, and remained unchanged for L5 IT neurons. Decorrelating population activity had no impact on decoding accuracy, but response reliability strongly predicted decoding performance and mirrored the same arousal-dependent trend. Together, these findings demonstrate that arousal exerts distinct effects across excitatory subtypes, dynamically reconfiguring auditory cortical representations in a cell-type specific manner.
Jereme Wingert, Jonah Stickney, Brad Buran and Stephen David
Topic areas: auditory memory and cognition, subcortical auditory processing
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Auditory encoding is almost exclusively studied in head-fixed preparations, out of necessity to control acoustic conditions. In contrast, natural listening is dynamic, as both listener and sound sources can move with respect to one another. A broad range of non-sensory information, including orofacial movements, locomotion, location, and task-related variables, have been shown to be encoded in auditory cortex (AC). Excluding a few studies on simplified cases of reafference, the impact of these non-sensory signals during naturalistic listening has not been investigated. To investigate the impact of both explicit acoustic stimuli and non-sensory signals on AC activity during movement, we performed semi-chronic Neuropixel recordings in the AC of four ferrets and used multi-camera tracking while they navigated an arena and performed a sensory detection task. During these tasks, animals had to identify and localize targets in continuous backgrounds of natural sounds presented concurrently from two different locations. Video tracking of head position and angle relative to the sound sources allowed us to virtually head-fix animals and estimate the stimuli reaching the ears via a head-related transfer function. Multimodal convolutional neural network models were fit to predict spiking activity based on aggregate auditory and spatial-motor signals. The multimodal models predicted activity better than models based only on auditory signals. Analysis of residual non-sensory activity showed modulation strongly resembling hippocampal place tuning, with sensitivity to both position and velocity. Place encoding by single units is stable across changing behavioral contexts, as shown by reversing trial initiation and reward locations. Allowing the head-related transfer function to be fit to maximize neural predictions does not remove the presence of positional coding. Also, place coding is still present in animals using headphones for sound presentation and during silent non-auditory periods. Finally, the interaction between sensory and non-sensory encoding can be described by a simplified model where a gain/offset term is applied to the model's sensory prediction, suggesting a high degree of separability between the signals. Place coding at the level of AC may support the transformation from egocentric to allocentric sensory representation in downstream areas.
Matteo Pisoni, Yannick Goulam Houssen, Benjamin Mathieu, Stephane Dieudonne and Brice Bathellier
Topic areas: correlates of auditory behavior/perception, neural coding, neuroethology and communication, thalamocortical circuitry and function
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Manipulating the activity of neural circuits with single-cell resolution is necessary for in vivo dissection of the mechanisms underlying many brain functions. This is currently achieved with parallel illumination techniques, based on digital holography, which spread the excitation light across multiple neurons. However, this approach requires accumulating large amounts of light energy to reach the action potential threshold. This limits the number of action potentials that can be elicited per unit of time across neurons, setting strong bounds on the size of the neural populations that can be efficiently activated. We reasoned that an ultrafast sequential approach provides higher peak energy levels to each neuron and would thereby improve activation efficiency, due to the quadratic relationship between peak energy and opsin excitation probability with two-photon excitation. To validate this idea, we used ULoVE, an acousto-optic technique previously developed for high-efficiency two-photon voltage indicator imaging (Villette et al. 2019). ULoVE generates local illumination volumes, matching the dimensions of a neuron, which can be moved from neuron to neuron every 70 µs. When combining this method with the high-conductance opsin ChRmine, bicistronically expressed with GCaMP6m, we could trigger action potentials in a single neuron with a 70 µs illumination duration using as little as 7 µJ of total excitation light in optimized conditions (0.2 µJ laser pulses repeated at 500 kHz). This represents a ~7-fold reduction in the amount of energy per spike compared to the holographic approach with the same opsin (Marshel et al. 2019). Our ULoVE-based approach therefore opens the possibility of stimulating several hundred to a thousand neurons within tens of milliseconds with no more than 100 mW of average laser power, whereas current techniques can only activate ~100 neurons with similar illumination power over the same time window. We are currently optimizing our new approach by testing further calcium indicator / opsin combinations and expression methods to reduce the crosstalk effects observed with the bicistronic ChRmine-GCaMP6m construct.
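A quick arithmetic check of the energy budget quoted above (0.2 µJ pulses at 500 kHz over a 70 µs dwell per neuron), and of the number of neurons addressable in 50 ms at ~100 mW average power, assuming stimulation is limited only by dwell time and average power.

```python
# Worked arithmetic for the per-neuron energy and population-size figures above.
pulse_energy_uJ = 0.2
rep_rate_hz = 500e3
dwell_s = 70e-6

pulses_per_neuron = rep_rate_hz * dwell_s                      # 35 pulses per neuron
energy_per_neuron_uJ = pulses_per_neuron * pulse_energy_uJ     # 7 uJ per neuron
print(f"{pulses_per_neuron:.0f} pulses -> {energy_per_neuron_uJ:.1f} uJ per neuron")

# With ~100 mW average power, neurons addressable in a 50 ms window
# (limited by either the energy budget or the sequential dwell time):
avg_power_W = 0.1
window_s = 0.05
by_energy = avg_power_W * window_s / (energy_per_neuron_uJ * 1e-6)
by_dwell = window_s / dwell_s
print(f"~{min(by_energy, by_dwell):.0f} neurons addressable in 50 ms")
```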
Charly Lamothe, Antonin Verdier, Corentin Caris, Sophie Bagur and Brice Bathellier
Topic areas: auditory memory and cognition, correlates of auditory behavior/perception
Fri, 11/14 4:30PM - 6:00PM | poster + podium teaser
Abstract
Cochlear implants (CIs) are the most successful sensory neuroprosthesis and help almost a million patients to recover functional hearing. However, their relatively poor spectral and temporal accuracy limits speech intelligibility in noise and music perception. Moreover, CIs do not address pathologies involving auditory nerve loss or cochlear malformation. Because auditory cortex stimulation leads to sound perception, a possible approach to circumvent these issues could be to implant the neuroprosthesis at this level. However, cortical implants would require high-level encoding models to transfer information in a format similar to sound representations in the auditory cortex. To address this challenge, we have developed an AI-based encoder that allows us to recreate and map key features of the biological auditory code observed in the cortex while ensuring a high information throughput, which we quantify by reconstructing the waveforms of the encoded sounds. Our approach uses an autoencoder whose latent space matches the number of electrodes on the stimulator and the temporal resolution of auditory cortex activity. The autoencoder is combined with a classifier of human-labelled sound categories that enables the generation of structured pre-semantic representations. The encoder and decoder are based on modern 1D convolution layers explicitly developed for sound processing, yielding efficient training and excellent performance. We demonstrate that this novel hybrid architecture, trained on millions of natural sounds, can perform sound encoding on hundreds of channels with almost no information loss while generating stimulation patterns matching multiple neural code properties of the auditory cortex. Preservation of information is established by the high perceptual similarity of reconstructed sounds to the original sounds, for speech, music and natural sounds. The similarity with natural auditory cortex representations is established using representational similarity analysis on cortical data and on the representations of our encoder. Specific constraints in the training process also allow us to reproduce tonotopy. In addition, we demonstrate that single-channel activities for well-known sounds have temporal profiles similar to cortical neurons, with a variety of onset, offset and sustained responses, as well as tuning properties commonly observed in vivo. Finally, we show that mice can perceive and use the representations generated by our models with good precision. Hence, our approach allows for precisely delineating the design of a potentially successful auditory cortical implant for auditory restoration.
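A schematic PyTorch sketch of the basic layout described above: a 1D convolutional autoencoder whose bottleneck has one channel per stimulating electrode at a coarser temporal resolution than the audio. Channel counts, kernel sizes, and rates are assumptions, and the sound-category classifier branch is omitted.

```python
# Schematic sketch (not the authors' model): waveform -> electrode-like latent -> waveform.
import torch
import torch.nn as nn

n_electrodes = 128               # assumed electrode count
fs_audio, fs_latent = 16000, 100 # assumed audio and latent sampling rates (Hz)

class SoundAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        stride = fs_audio // fs_latent                  # 160x temporal downsampling
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=400, stride=stride, padding=200),
            nn.ReLU(),
            nn.Conv1d(64, n_electrodes, kernel_size=9, padding=4),
            nn.ReLU(),                                  # non-negative "stimulation" pattern
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(n_electrodes, 64, kernel_size=400, stride=stride, padding=200),
            nn.ReLU(),
            nn.Conv1d(64, 1, kernel_size=9, padding=4),
        )

    def forward(self, wave):
        latent = self.encoder(wave)                     # (batch, electrodes, time @ 100 Hz)
        return self.decoder(latent), latent

model = SoundAutoencoder()
wave = torch.randn(2, 1, fs_audio)                      # two 1-s waveforms
recon, latent = model(wave)
print(latent.shape, recon.shape)
```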
James Bigelow, Toshiaki Suzuki, Ying Hu, Yulang Wu, Christoph Schreiner and Andrea Hasenstaub
Topic areas: correlates of auditory behavior/perception
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Neural information processing has traditionally been studied by recording single neuron spike trains and quantifying changes in firing associated with specific sensorimotor events and other processes. Modern tools make it possible to simultaneously record many neurons and detect subsets of coincidentally active neurons. Using these methods, recent studies suggest that information processing by individual neurons differs from moment to moment depending on which other neurons are active at the same time. In our own work, we find many neurons in anesthetized rat auditory cortex (ACtx) are coactivated within multiple, independent coordinated neuronal ensembles (cNEs), which comprise subsets of neurons with reliably synchronous activity. By separating single neuron spike trains into subsets of spikes that are tightly synchronized with members of one or another cNE, we recently found individual neurons are sensitive to different sound features in dynamic moving ripples depending on which cNE they were synchronized with at a given time. This result suggests classic auditory receptive fields derived from single neuron spike trains may in fact reflect an amalgam of multiple underlying stimulus subspaces to which a neuron may respond depending on coactivation with other cNE members. This form of cNE-dependent multiplexing at the level of individual spikes could increase both the encoding capacity of individual neurons and the efficiency with which neural networks represent sensory and other information spaces. Considering the potential broad relevance of this finding for understanding how neural networks process information, the aim of the current study was to determine whether and how cNE-dependent coding is observed in neurons across diverse brain regions, processing a wide range of sensory features and other events. We conducted a series of experiments using Neuropixels probes to record from awake mouse ACtx, thalamus, and hippocampus. We replicated our earlier finding that single unit responses to dynamic moving ripples depend on cNE synchrony, show that this phenomenon generalizes to other sound types, and that it also occurs in auditory thalamus. We next demonstrate that cNE synchrony similarly influences how single units in ACtx respond to non-auditory events such as visual stimuli and locomotor feedback. Finally, we show that hippocampal units similarly respond to sound in ways that critically depend on cNE synchrony. Together, our findings suggest cNE-dependent coding – in which individual neurons play “different roles on different teams” – is a ubiquitous strategy for processing diverse types of information across brain regions.
Jose Martinez, Victoria Wedgewood and Mishaela DiNino
Topic areas: correlates of auditory behavior/perception, neural coding
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Auditory selective attention, which allows individuals to focus on a source while suppressing competing sound, is critical for hearing in background noise but depends on accurate neural encoding. Previous research shows that major depressive disorder (MDD) impairs domain-general attentional ability, but the relationship between MDD symptoms and auditory selective attention has not yet been examined. In addition, although the impact of MDD on auditory cortical activity has been thoroughly investigated, the relationship between symptom severity history and auditory brainstem responses (ABRs) remains under-explored. Therefore, we aimed to quantify ABR differences in individuals with MDD and measure how MDD severity and subcortical measures relate to performance on auditory selective attention tasks. We recruited young adults with verified normal hearing thresholds who each completed the Patient Health Questionnaire-8 (PHQ-8) to assess symptom severity, a measurement of persistent MDD symptoms to track symptom longevity, ABR recordings, and an auditory selective attention task requiring subjects to attend to a target stream of syllables while ignoring two distracting streams. Correlation analysis revealed a significant relationship between PHQ-8 responses and persistent depression scores, indicating that individuals with greater depression symptom severity had experienced those symptoms longer. Regression analyses were conducted to examine the relationships between symptoms of depression, ABR metrics, and performance on the auditory selective attention test. Results showed that participants with greater symptom severity and lifetime history of MDD had significantly reduced wave V amplitudes and longer ABR wave V latencies, indicating poorer late subcortical processing relative to individuals with less MDD history and symptom severity. Individuals with a longer history of depressive symptoms were also those who tended to perform worse on the auditory selective attention test. However, ABR metrics alone did not predict auditory attention scores, suggesting that disrupted cortical processing and/or other cognitive factors in individuals with MDD contribute to poorer performance on the task. Our findings suggest that ABR measures may be potential biomarkers of MDD, particularly because symptom persistence significantly increased these effects. This is key for mapping neural pathways that link brainstem processing and cortical control in MDD. Additionally, these findings suggest that future research aimed at strengthening attention in individuals with MDD should focus on higher order cortical and cognitive impairments.
Ahyeon Choi, Inyong Choi and Kyogu Lee
Topic areas: auditory disorders, neuroethology and communication
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Musical illusions offer a unique lens into the constructive nature of auditory perception. Among them, the scale illusion demonstrates how the auditory system can reorganize dichotically presented pitch sequences into a unified melodic contour that differs from the physical input. While prior neuroscience studies have explored such illusions using event-related potentials (ERPs) or spatial activation patterns, the temporally evolving neural representation of these perceptual reorganizations remains underexplored. In this study, we aimed to demonstrate the neural dynamics of scale illusion via linear decoding of unified melodic contour from listeners’ continuous EEG. We recorded 64-channel EEG data from 27 participants who listened to four variations of the scale illusion: (1) monaural presentation of the original illusion stimulus, (2) monaural presentation of the perceived illusory pitch contour, (3) the original scale illusion presented binaurally, and (4) a timbrally mismatched version of the illusion (piano vs. violin). Temporal response function (TRF) models were trained on the two monaural conditions (Stimuli 1 and 2) and tested on the dichotic conditions (Stimuli 3 and 4). After each trial, participants reported whether they perceived a continuous or fragmented contour. Pitch interval and envelope features were extracted and used as targets for decoding from EEG. In backward decoding, pitch-based models outperformed envelope-based models for Stimulus 3 (illusory, timbrally coherent), with the highest decoding correlation (pitch r = 0.098 vs. envelope r = 0.021). Decoding accuracy was consistently higher for Stimulus 3 than for Stimulus 4 (timbrally mismatched) across both features (p < .001), suggesting that timbral coherence supports neural recoverability of perceptual structure. Moreover, models trained on Stimulus 2 (reorganized percept) predicted Stimulus 3 significantly better than those trained on Stimulus 1 (original input) (p < .001), indicating that neural activity aligns more with perceived, rather than physical, pitch contours. These results demonstrate that pitch features more effectively capture the perceptual organization of illusory melodies in backward decoding. The enhanced decoding performance for the perceptually coherent and timbrally matched condition supports the view that the brain tracks internally reconstructed pitch streams over raw acoustic input. This suggests that illusory percepts are not only consciously experienced but also robustly encoded in neural dynamics, offering insights into the cortical representation of auditory illusions.
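The backward-decoding logic described above can be sketched as follows: build a time-lagged design matrix from multichannel EEG, fit a ridge regression to reconstruct the stimulus feature, and score the reconstruction by correlation on held-out data. The EEG and pitch contour below are simulated, and this is not the authors' TRF pipeline.

```python
# Illustrative sketch: backward (stimulus-reconstruction) decoding from toy EEG.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(9)
fs, n_ch, n_t = 64, 64, 6400                      # 100 s of 64-channel EEG at 64 Hz
n_lags = 16                                       # ~250 ms of decoder lags

feature = np.sin(2 * np.pi * 0.5 * np.arange(n_t) / fs)         # toy pitch contour
mixing = rng.normal(size=(1, n_ch))
eeg = feature[:, None] @ mixing + rng.normal(scale=2.0, size=(n_t, n_ch))

def lag_design(eeg, n_lags):
    """Stack time-lagged copies of the EEG channels into a design matrix."""
    X = np.zeros((eeg.shape[0], eeg.shape[1] * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * eeg.shape[1]:(lag + 1) * eeg.shape[1]] = eeg[:eeg.shape[0] - lag]
    return X

X = lag_design(eeg, n_lags)
half = n_t // 2
decoder = Ridge(alpha=1e3).fit(X[:half], feature[:half])        # train on one half
pred = decoder.predict(X[half:])                                # test on held-out half
print(f"decoding r = {np.corrcoef(pred, feature[half:])[0, 1]:.3f}")
```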
Wenhui Sun and Nai Ding
Topic areas: auditory memory and cognition
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Reliably recognizing speech despite various forms of acoustic degradation is a primary challenge for automatic speech recognition (ASR). It is commonly believed that the human speech recognition system is highly robust to acoustic degradation, and that ASR systems should also have this property. Here, we challenge this view by showing that humans are highly sensitive to unfamiliar acoustic degradation, but that they can achieve robust speech recognition through rapid adaptation. When first encountering a sentence that is subject to unfamiliar acoustic degradation, human speech recognition tends to be less robust than current ASR systems. Nevertheless, humans can quickly learn to recognize degraded speech, even without any supervision signal. After exposure to 10 degraded sentences, the human speech recognition rate can improve by 50% in some conditions. When ASR systems are adapted to degraded speech through simple supervised and self-supervised procedures, they can achieve a human-level speed of learning in most conditions. These results show that humans achieve robust speech recognition through quick adaptation, and that continual adaptation is a plausible strategy for building more robust ASR systems.
Zimo Li, Chen Lu, Maneesh Sahani and Jennifer Linden
Topic areas: brain processing of speech and language correlates of auditory behavior/perception
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Humans and mice can detect brief silent intervals ("gaps") lasting only a few milliseconds within continuous sound. Minimum duration thresholds for gap detection are commonly used as an index of auditory temporal processing ability, but the underlying neural mechanisms are not understood. Previous studies have argued that onset responses to the post-gap sound are most critical for gap detection, while offset responses to the pre-gap sound might amplify cortical sensitivity to the shortest gaps [1,2]. However, in short-gap conditions, neural responses to sound offsets and onsets overlap in time, making it difficult to distinguish their contributions to the brain's representation of the interruption of a continuous sound. Here, we sought to isolate and track onset-specific and offset-specific population dynamics in auditory cortex of awake mice listening passively to gap-in-noise stimuli with varying gap durations, in order to delineate the relative contributions of sound onsets and offsets to auditory cortical representations of brief gaps in noise. We recorded auditory cortical population activity with single-unit resolution and high temporal precision using Neuropixels probes. Examining well-separated onsets and offsets, we found that population responses evoked by the different events explored different, although non-orthogonal, activity subspaces. We then used normalized-covariance-based techniques to ask how population activity evoked by brief gaps in noise aligned with these subspaces. These methods revealed strong contributions of both onset-specific and offset-specific neural population dynamics to cortical representations of gaps in noise, even when the gap duration was so short (e.g., 2-8ms) that offset and onset responses would have been indistinguishable at a single-neuron level. Further application of the method to data from mouse groups varying in genotype (wild-type or 22q11.2 deletion model) and hearing condition (with or without conductive hearing loss) revealed that peripheral hearing loss impaired both offset-specific and onset-specific neural population dynamics evoked by gaps in noise. These results show that covariance-based population analysis can be used to resolve population coding of closely timed events that evoke overlapping patterns of neural activity. We conclude that both offset and onset responses contribute to auditory cortical population dynamics evoked by very brief silent gaps in noise. [1] Weible, Moore, Liu, DeBlander, Wu, Kentros & Wehr M (2014). Curr Biol 24:1447-1455. [2] Weible, Yavorska and Wehr (2020). Cereb Cortex 30:3590-3607.
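As a toy illustration of a subspace-style population analysis in the spirit of the approach described above (not the authors' normalized-covariance method), the sketch below estimates onset- and offset-evoked subspaces with PCA and asks how much gap-evoked population variance falls within each; all response matrices are placeholders.

```python
# Toy subspace-alignment analysis: estimate onset- and offset-evoked subspaces
# with PCA, then measure how much gap-evoked variance lies within each subspace.
# Illustrative placeholders, not the authors' normalized-covariance method.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_units = 100
onset_resp = rng.standard_normal((200, n_units))        # trials/timepoints x units
offset_resp = rng.standard_normal((200, n_units))       # placeholders for real data
gap_resp = onset_resp[:150] + 0.5 * offset_resp[:150]   # fake mixed gap response

def subspace(data, n_dims=10):
    """Return the top principal axes (n_dims x n_units) of a response matrix."""
    return PCA(n_components=n_dims).fit(data).components_

def alignment(activity, axes):
    """Fraction of total variance of `activity` captured within `axes`."""
    centered = activity - activity.mean(axis=0)
    projected = centered @ axes.T
    return projected.var(axis=0).sum() / centered.var(axis=0).sum()

on_axes, off_axes = subspace(onset_resp), subspace(offset_resp)
print("gap variance in onset subspace :", round(alignment(gap_resp, on_axes), 3))
print("gap variance in offset subspace:", round(alignment(gap_resp, off_axes), 3))
```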
Liam Moore, Mishaela DiNino, Elizabeth Fish and Melissa Polonenko
Topic areas: hierarchical sensory organization neural coding subcortical auditory processing thalamocortical circuitry and function
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Electrophysiological measures, such as auditory brainstem responses (ABRs) and cortical auditory evoked potentials (CAEPs), are widely used by both researchers and hearing healthcare professionals to examine auditory neural function. These evoked potentials are traditionally recorded in response to a brief click stimulus, which does not mimic the speech that we hear in everyday listening. Polonenko & Maddox (2021) created a paradigm in which ABRs and CAEPs can be collected in response to continuous speech, which is thought to better assess the neural function that occurs during everyday listening. Spatialized speech, in which speech is presented to the front and to the side, even more accurately represents real-world listening scenarios. However, neural responses to sound are typically examined from electrodes near the top center of the head (Cz or Fz), and responses from other electrodes may be better suited to assess cortical responses to spatialized speech stimuli. This study aimed to examine cortical responses at different electrode locations in response to both spatialized and non-spatialized speech. Thirty young adults (aged 18-30 years) with verified normal hearing thresholds listened to spatialized and non-spatialized continuous speech while brainstem and cortical responses were recorded. In the non-spatialized condition, two stories spoken by different talkers were played at simulated center. In spatialized conditions, one story was played at simulated center and another was played to the left or right by applying a +/- 600 µs interaural time difference cue. The late latency response, which contains the P1 and N1 event-related potentials, was compared between conditions at different electrode locations, which may reflect different neural generators based on perceived spatial location. Electrode locations were binned as “frontal” (F3, Fz, and F4) or “central” (C3, Cz, and C4). Area-under-the-curve calculations for P1 and N1 responses at the different electrode locations revealed that both P1 and N1 responses differed depending on the speech condition. Interestingly, P1 activity changed across speech conditions in a similar way for the different electrode configurations, but N1 activity was dependent on both speech condition and electrode location. These results suggest that neural activity evoked by spatialized speech conditions is better represented by multiple electrodes than by the single centralized electrode that is typically used in clinical testing. The findings from this study also provide greater insight into speech processing in the central auditory system.
Sudan Duwadi, De'Ja Rogers, Alex Boyd, Laura Carlton, Yiwen Zhang, Anna Gaona, Bernhard Zimmermann, Joe O'Brien, Alexander Von Luhmann, David Boas, Meryem Yucel and Kamal Sen
Topic areas: correlates of auditory behavior/perception neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Complex Scene Analysis (CSA) enables the brain to focus on a single auditory or visual object in crowded environments. While this occurs effortlessly in a healthy brain, many hearing-impaired individuals and those with neurodivergent conditions such as ADHD and autism experience difficulty in CSA, impacting speech intelligibility (Dunlop et al., 2016). We propose using high-density (HD) functional Near-Infrared Spectroscopy (fNIRS) for whole-head brain imaging during CSA in naturalistic settings. This approach allows analysis of cortical activity patterns, with potential applications in enhancing brain-computer interface technologies. Our experimental design mimics an ecologically valid cocktail party scenario in both overt and covert contexts. In the overt scenario, 3-second audiovisual movie clips are presented simultaneously at 30 degrees to the left and right. Prior to each clip, a 2-second spatialized white noise cue is paired with a white crosshair on the corresponding screen, guiding subjects on which direction to focus, with eye movements allowed. In the covert scenario, subjects are exposed solely to spatialized audio from the same set of movies. Here, the 2-second spatialized white noise serves as the cue, directing their attention, while they maintain gaze on a central screen displaying a static white crosshair. For eye control tasks, the subjects move their eyes 30 degrees to the left and right. fNIRS data were collected from 15 subjects with a whole-head, high-density cap layout. Our results reveal brain areas that show evoked responses in both overt and covert conditions.
Toshiaki Suzuki, Timothy Olsen and Andrea Hasenstaub
Topic areas: hierarchical sensory organization neural coding neuroethology and communication
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Recent studies have revealed that the auditory cortex (AC) not only processes auditory information but is also modulated by other sensory information. We have previously shown that the mouse AC contains a group of cells that respond to or are otherwise influenced by visual information (1). These neurons are mainly found in the deep layers, but their inputs remain poorly characterized. Thus, the present study aimed to identify primary sources of visual information in AC using retrograde viral tracing experiments. We first performed localized microinjection of retrograde tracers using iontophoresis. C57BL/6 mice were injected with a viral mixture (AAV2retro-hsyn-EGFP and AAV9-hsyn-ChR2-mCherry) in the right AC (for deep layer: n = 8, depth -0.75mm; for shallow layer: n = 6, depth -0.25mm) and after 3-4 weeks, we perfused and sliced their brains. EGFP-positive cells were analyzed using Aligning Big Brains & Atlases (BioImaging And Optics Platform), with reference to the Allen Brain Atlas (2017 CCF v3). For each injection, most input came from the ipsilateral cortex, especially from the temporal association areas, somatosensory cortex, and visual cortex. EGFP-positive cells were distributed across visual cortices, with a greater number observed in mice that had deep-layer AC injections - notable areas include the primary visual cortex, lateromedial, lateral intermediate, and anterolateral visual areas. These EGFP-labeled cells were predominantly located in the deep layers of the visual cortex. Because AAV2retro seemed to spread less toward the subcortical region than we expected, we conducted additional experiments using G-deleted Rabies viral tracing. C57BL/6 mice were injected with a viral mixture (AAV8-hsyn-DIO-TVA(E66T)-EGFP-oG and AAV1-hsyn-Cre) in the right AC (n = 5, depth -0.75mm) by iontophoresis injection. Three weeks later, EnvA-RVdG-mCherry was injected in the same place. We perfused the brain 10 days after Rabies injection. mCherry-positive cells, which indicate cells presynaptic to the EGFP-positive starter cells, were observed in the temporal association areas, somatosensory cortex, and visual cortex, but also in thalamic regions, including the medial geniculate complex and visual/multimodal thalamic nuclei: the lateral posterior and suprageniculate nuclei. Together, our findings suggest that deep layers of the mouse AC receive convergent inputs from multiple visual cortical areas and thalamic regions, expanding our understanding of multisensory integration in the AC. 1. Morrill and Hasenstaub (2018). Visual Information Present in Infragranular Layers of Mouse Auditory Cortex. J. Neurosci. 38, 2854-2862.
Sarah Tune, Judith Kunze, Justus Student, Carla Nau and Jonas Obleser
Topic areas: brain processing of speech and language correlates of auditory behavior/perception neural coding subcortical auditory processing
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Perceptual decisions and metacognitive judgements emerge from a dynamic interplay of external sensory evidence and internal psychological states. How fluctuations in neurophysiological excitation-inhibition (E-I) balance mediate their interaction in auditory decision-making remains poorly understood. We tested this brain-behaviour link in a single-blind crossover intervention EEG study where healthy human adults (N=17) performed an adapted version of the ‘Bernoulli clicks’ auditory decision-making task [Keung et al. 2019]. Probabilistic cues and variable sensory evidence strength modulated prior expectation and perceptual uncertainty, respectively. In three separate sessions, participants received a target-controlled low-dose infusion of ketamine (a glutamatergic N-methyl-D-aspartate receptor (NMDAR) antagonist; expected to increase excitation), propofol (a GABA-A agonist; expected to increase inhibition), or placebo. We expected impulsive decision-making under increased excitation, and indecisive decision-making under increased inhibition. We aimed to understand how both drug-specific alterations and moment-to-moment fluctuations in cortical E-I balance shape sensory encoding, evidence accumulation, and ultimately perceptual choices, thereby giving rise to distinct behavioural phenotypes. To this end, we combined a multi-faceted behavioural characterization based on continuous joystick traces with psychophysical and neural sensory encoding modelling. Comparison of pre- to post-task resting-state EEG spectra revealed the predicted modulation of cortical E-I balance: ketamine flattened the 1/f slope, indicating relative excitation, whereas propofol showed a trend toward steepening. In line with these E-I alterations, we observed dissociable sources of suboptimality in behaviour: under ketamine, behaviour was characterized by an increase in premature responses and a tendency to respond in line with an emerging overall bias to the right side. In contrast, propofol induced a more uniform decrement in behavioural performance with reduced sensitivity and slower reaction times, consistent with decision hesitancy under enhanced inhibition. Our results on how pharmacologically induced alterations of cortical E-I balance are linked to dissociable behavioural phenotypes underscore the critical role of E-I balance for adaptive behaviour, and directly inform computational theories linking cortical gain control to perceptual and metacognitive fidelity.
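As a rough sketch of the 1/f-slope measure referred to above, the snippet below fits a line to the log-log resting power spectrum; dedicated spectral-parameterization tools are normally used for this, and the signal, sampling rate, and fit range here are assumptions.

```python
# Rough sketch of estimating the 1/f spectral exponent (slope of the log-log
# power spectrum) used as a noninvasive proxy for cortical E-I balance.
import numpy as np
from scipy.signal import welch

fs = 250                                        # assumed EEG sampling rate (Hz)
rng = np.random.default_rng(0)
eeg = np.cumsum(rng.standard_normal(fs * 120))  # placeholder 1/f-like signal

freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)
band = (freqs >= 2) & (freqs <= 40)             # fit range, excluding DC
slope, intercept = np.polyfit(np.log10(freqs[band]), np.log10(psd[band]), 1)
print(f"spectral exponent (1/f slope): {slope:.2f}")
# A flatter (less negative) slope after ketamine would indicate a shift toward
# excitation; a steeper slope under propofol, a shift toward inhibition.
```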
Jonas Obleser, Leonhard Waschke, Björn Herrmann, Martin Orf and Sarah Tune
Topic areas: brain processing of speech and language
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Assessing internal noise by probing a system with varying external noise is a well-established method in systems identification and psychophysics. It aligns with item-response theory, where internal observer uncertainty (θ) and stimulus-induced uncertainty (δ) are estimated on a shared latent scale. Yet, in neuroscience, sensory uncertainty (e.g., stochastic afferent encoding) and perceptual uncertainty (e.g., fluctuations beyond encoding) remain conceptually separable but are rarely dissociated empirically. In this project, we aim to quantify sensory, perceptual, and environmental (stimulus) uncertainty using extant neurophysiological data: LFP recordings from N=29 rodent A1 neurons (Deweese & Zador, 2004) for sensory-level analysis, and EEG from N=25 humans performing pitch discrimination (Waschke et al., eLife 2019) for perceptual-level and behavioural analysis. In rodent A1, internal noise should ideally sit at an intermediate level balancing excitability and sensitivity. In line with this, we observed the largest single-trial evoked responses (±25 ms, and late negative LFP at ±200 ms) at intermediate levels of internal noise, here expressed as pre-stimulus baseline variability σ (LMM on >30,000 trials, controlling for average baseline activity). This variability σ appeared embedded in a slow delta-range (1-4 Hz) rhythm. At the human perceptual level, pre-stimulus perceptual noise should ‘gate’ the impact of incoming sensory evidence, and vice versa. When stimuli are clear and low in uncertainty, baseline noise should have little effect; but when stimuli approach the perceptual threshold, internal noise should play a greater role in shaping auditory encoding. In human EEG, we thus modelled single-trial N1 amplitude (auditory-filtered) as a joint function of pre-stimulus σ and stimulus uncertainty δ (ranging from no pitch difference to several semitones). N1 amplitude showed a significant σ × δ interaction (LMM, >10,000 trials): when pitch deviations were small or absent (i.e., high perceptual uncertainty), pre-stimulus σ predicted N1 amplitude; this effect diminished linearly with increasing stimulus clarity. Finally, treating N1 amplitude as an aggregate proxy of internal noise up to a perceptual stage, we modelled pitch decisions as a function of stimulus evidence (relative to an implicit 1 kHz standard) and encoding quality. A stronger N1 and, less robustly, lower pre-stimulus σ both enhanced pitch sensitivity. These findings highlight how internal uncertainty and external evidence jointly shape auditory perception and motivate a formalised, multilevel model of auditory uncertainty.
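The sketch below illustrates the kind of single-trial mixed model described above (N1 amplitude modelled as a σ × δ interaction with subject-level random intercepts), using statsmodels on simulated data; all column names, effect sizes, and the random-effects structure are assumptions for illustration only.

```python
# Sketch of a single-trial linear mixed model: N1 amplitude ~ pre-stimulus
# variability (sigma) x stimulus uncertainty (delta), random intercept per
# subject. Simulated data; column names are placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000
trials = pd.DataFrame({
    "subject": rng.integers(0, 25, n),
    "sigma": rng.standard_normal(n),     # pre-stimulus baseline variability
    "delta": rng.integers(0, 4, n),      # stimulus uncertainty level
})
# Simulated N1: sigma matters most when stimulus clarity (delta) is low.
trials["n1"] = (-1.0 - 0.4 * trials.sigma
                + 0.1 * trials.sigma * trials.delta
                + 0.2 * rng.standard_normal(n))

model = smf.mixedlm("n1 ~ sigma * delta", trials, groups=trials["subject"])
result = model.fit()
print(result.summary())   # the sigma:delta term tests the reported interaction
```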
Tinghan Chen, Hemant Kumar Srivastava, Melissa Ryan, Hong Jiang and Matthew McGinley
Topic areas: auditory memory and cognition brain processing of speech and language correlates of auditory behavior/perception
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Rapid acoustic onsets are a salient feature of many natural sounds, including speech and environmental noises like rustling. These abrupt sound onsets cue sound objects and are processed by specialized auditory mechanisms tuned for temporal precision. In particular, octopus cells (OCs) in the cochlear nucleus are especially sensitive to these sharp onsets and provide precisely timed inhibitory input to the inferior colliculus (IC) via the ventral nucleus of the lateral lemniscus (Oertel et al., 2000). Despite this well-defined pathway and the functional importance of rapid onsets in object segmentation and streaming, how these rapid onsets interact with ongoing stimuli to shape sound processing in the IC remains unclear. To investigate the nature of this interaction, we presented brief, click-like stimuli with bandwidths varying from 1 to 5 octaves and a range of center frequencies, called ‘clickets’. We examined their impact on tone-evoked responses in the IC across varying temporal and frequency offsets in awake head-fixed mice. A simple PSTH-based linear model revealed sublinear integration of tone and click responses, suggesting a suppressive interaction. Modulation of tone responses was strongest when clicks and tones were presented simultaneously, consistent with rapid, precisely timed inhibition. Our findings suggest that rapid acoustic onsets engage precisely timed inhibition in the IC, possibly driven by OCs, leading to sublinear integration and suppression of ongoing tone responses. This interaction is strongest at minimal delays, suggesting a possible feedforward inhibitory mechanism for creating ‘pop-out’ of distinct sound objects. Oertel, D., Bal, R., Gardner, S. M., Smith, P. H. & Joris, P. X. Detection of synchrony in the activity of auditory nerve fibers by octopus cells of the mammalian cochlear nucleus. Proc. Natl. Acad. Sci. 97, 11773–11779 (2000).
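The snippet below sketches the logic of a PSTH-based linearity test like the one described above: compare the measured response to the combined tone+clicket stimulus against the sum of the responses to each stimulus alone, with ratios below 1 indicating sublinear (suppressive) integration. The PSTHs here are placeholders, not recorded data.

```python
# Sketch of a PSTH-based linearity test: measured combined response vs. the
# linear prediction (sum of individual PSTHs). Ratios < 1 indicate sublinear
# (suppressive) integration. Placeholder data only.
import numpy as np

rng = np.random.default_rng(2)
time_bins = 200                                          # e.g., 1-ms bins
psth_tone = rng.poisson(5, time_bins).astype(float)      # placeholder PSTHs
psth_clicket = rng.poisson(3, time_bins).astype(float)
psth_both = 0.7 * (psth_tone + psth_clicket)             # fake sublinear response

linear_prediction = psth_tone + psth_clicket
sublinearity_index = psth_both.sum() / linear_prediction.sum()
print(f"measured / linear prediction = {sublinearity_index:.2f}")
```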
Chen Lu and Jennifer Linden
Topic areas: brain processing of speech and language correlates of auditory behavior/perception
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Central auditory processing deficits, such as abnormally elevated duration thresholds for detection of a brief gap in noise, are common in schizophrenia patients (Moschopoulos et al., 2020). However, schizophrenia patients tend to have hearing impairment (Saperstein et al., 2023), and hearing impairment is itself associated with elevated gap-detection thresholds (Fitzgibbons and Wightman, 1982). Do temporal processing deficits associated with hearing impairment and genetic risk for schizophrenia arise from the same mechanisms? Here, we used the Df1/+ mouse model of the 22q11.2 chromosomal microdeletion to address this question. In humans, the 22q11.2 deletion confers ~30% risk of developing schizophrenia and ~60% risk of hearing impairment (Bassett and Chow, 2008; Verheij et al., 2018). The Df1/+ mouse has a homologous chromosomal microdeletion and recapitulates many features of the human deletion syndrome, including susceptibility to developmental middle ear problems (Fuchs et al., 2013; Lu & Linden, 2025). We sought to disentangle the effects of hearing impairment and 22q11.2 deletion on auditory temporal processing by comparing cortical responses to gap-in-noise stimuli between Df1/+ mice with or without naturally occurring hearing impairment and WT mice with or without induced hearing impairment (by removing the malleus bone on P11). We recorded spiking activity of auditory cortical neurons using Neuropixels in awake, head-fixed adult mice passively listening to gap-in-noise stimuli with variable gap durations. We found distinct effects of hearing impairment and genetic risk for schizophrenia on different measures of temporal processing in the auditory cortex. In mice of either genotype with hearing impairment, both single-unit and population-level measures of neural sensitivity to brief gaps in noise revealed poorer temporal acuity, confirming results from human studies. Meanwhile, in Df1/+ mice with or without hearing impairment, fast-spiking units (putative parvalbumin-positive interneurons) exhibited increased sensitivity to noise onsets and offsets. Our findings demonstrate that hearing impairment and the 22q11.2 deletion generate distinct effects on auditory cortical processing: hearing impairment broadly disrupts temporal acuity, while the Df1/+ deletion specifically alters fast-spiking interneuron dynamics. These results highlight separable mechanisms by which hearing impairment and genetic risk for schizophrenia may alter auditory temporal processing.
Judith Kunze, Justus Student, Sarah Tune, Anne Herrmann, Martin Göttlich, Henrik Oster, Alexander Tzabazis, Benedikt Lorenz, Carla Nau and Jonas Obleser
Topic areas: auditory disorders correlates of auditory behavior/perception
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Ketamine, a glutamatergic N-methyl-D-aspartate receptor (NMDAR) antagonist, models schizophrenia-related hypofunction by shifting neural excitation-inhibition (E-I) balance towards excitation, potentially increasing sensory and cognitive noise. How such NMDAR-driven excitability changes manifest themselves in noninvasive proxies of E-I balance in human EEG remains to be clarified. In a single-blind, crossover pharmaco-challenge study participants (N=17; currently 74 EEG sessions) performed an auditory decision-making task while receiving, in separate sessions, a low-dose infusion of ketamine (expected to increase E-I), propofol (a GABA-A agonist, expected to decrease E-I), or placebo. GABA and glutamate/glutamine (Glx) levels in the left auditory and anterior cingulate cortex (ACC) were assessed 24 h later using magnetic resonance spectroscopy (MRS). Target-controlled infusions based on the Domino and Schnider model were applied to achieve and maintain predicted plasma concentrations of 0.15 µg/ml and 0.5 µg/ml for ketamine and propofol, respectively. Drug effects on vital signs were as follows: ketamine increased systolic blood pressure (BP) on average by 13 mmHg and heart rate (HR) by 8 bpm; propofol reduced BP by 5 mmHg and HR by 4 bpm. As expected, ketamine reliably increased dissociative symptoms. Notably, however, higher systolic BP predicted elevated dissociation scores independent of drug condition, suggesting a link between cardiovascular arousal and dissociative experience. In resting-state EEG, ketamine significantly decreased the 1/f slope (spectral exponent), indicating a shift toward excitation. Propofol showed a trend in the opposite direction. Both drugs reduced alpha power, consistent with their neuromodulatory profiles. In MRS, both drugs increased interregional Glx correlation across Heschl’s gyrus and ACC. These parallel changes suggest a shared systems-level effect on cortical neurochemistry. Our findings demonstrate that low doses of ketamine and propofol modulate EEG- and MRS-based markers of E-I balance and highlight cardiovascular changes as predictors of dissociative experience.
Derek Nguyen and Yi Zhou
Topic areas: hierarchical sensory organization neural coding subcortical auditory processing
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Local field potentials (LFPs) are low-frequency signals reflecting the summed extracellular activity of local neuronal populations. While LFP response strength reflects the magnitude of these currents, the spatial spread and waveform shape are influenced by the geometry of dendrites and the curvature of cortical layers, particularly in regions with pronounced folding such as sulci and gyri. The common marmoset (Callithrix jacchus) has a lissencephalic (smooth) brain with only two prominent sulci—the lateral sulcus and the superior temporal sulcus—both located in the temporal lobe, with the auditory cortex situated between them. Although the cortical anatomy is simplified, difficulties with tonotopic mapping using LFPs arise due to high individual variability and uncertain signal origins. In this study, we examined LFP propagation across cortical depth and surface area surrounding the primary auditory cortex (A1) in two marmoset monkeys. Using high-density linear silicon probes, we recorded LFPs and single-unit activity (SUA) along both the laminar depth and rostral-caudal-lateral axes. Auditory stimuli included pure tones (2–32 kHz) and broadband noise (BBN) presented at multiple sound levels. We observed persistent LFP responses to pure tones around 18 kHz that propagated at least 2 mm caudally and 5 mm laterally from A1. These responses were strongest in superficial layers and diminished in deeper layers at distal sites. BBN elicited similar propagation patterns but only at higher intensities (10–30 dB SPL). In contrast, within A1, LFP strength increased with depth, spanned a broader frequency range, and was detectable at lower sound intensities. These findings suggest that LFP propagation in the marmoset auditory cortex is frequency- and layer-dependent, with distinct activation patterns within and beyond A1.
Alina Schulte, Cora Caron, Matthew Apps, Dorothea Wendt and Hamish Innes-Brown
Topic areas: brain processing of speech and language correlates of auditory behavior/perception hierarchical sensory organization subcortical auditory processing
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Understanding speech in noise requires sustained cognitive effort and often leads to fatigue, especially in individuals with hearing loss. This fatigue can reduce the willingness to use hearing aids or engage in conversations. To address this challenge, it is crucial to understand how fatigue accumulates and affects listening decisions. However, self-reports offer only partial insight due to limited conscious awareness of momentary fatigue states. Building on prior work, we applied an existing neurocomputational framework to a listening effort-based decision-making task in order to investigate trial-by-trial listening-related fatigue and its cortical correlates using fNIRS. Data from 5 out of 30 hearing aid users have been collected and analyzed so far (all male, mean age 74 years). Participants were asked to maximize points, which they could collect by choosing to either listen to and repeat back a sentence (high point value) or by resting (low point value). They also rated fatigue on each trial. Task difficulty (SNR) and rewards (point values) varied across trials. We fitted five effort-discounting models to participants’ trial-by-trial decisions and fatigue ratings and related model-derived fatigue states to cortical oxygenation changes. Our preliminary data show that decisions to listen and subjective fatigue ratings are best explained by a model that includes a recoverable fatigue component (rising with effort, falling with rest), an unrecoverable component (monotonically increasing), and the correctness of the response. All 5 participants showed significant activation in prefrontal areas, with increasing response amplitudes over time-on-task. In two participants, a left prefrontal subregion was associated with unrecoverable fatigue, consistent with earlier findings of physically induced fatigue (Müller et al., Nat Commun, 2021). These preliminary results suggest that listening-related unrecoverable fatigue shares cortical correlates with physically induced unrecoverable fatigue, validating the effort-discounting framework. Future work may enable monitoring and prediction of fatigue states to inform adaptive adjustments of hearing aid settings or behavioral strategies, helping to prevent mental exhaustion when the subjective value of listening is low.
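The toy model below sketches the general shape of a fatigue-weighted effort-discounting account like the one described above: recoverable fatigue rises with effortful (listen) trials and falls with rest, unrecoverable fatigue only accumulates, and total fatigue discounts the value of choosing to listen. Parameter names, values, and the exact value function are illustrative assumptions, not the fitted model.

```python
# Toy fatigue-weighted effort-discounting sketch: recoverable fatigue rises
# with effort and recovers with rest, unrecoverable fatigue only accumulates,
# and total fatigue discounts the value of choosing to listen. Illustrative
# parameters only.
import numpy as np

alpha, delta, kappa = 0.6, 0.3, 0.05   # rise, recovery, unrecoverable rates
theta = 0.8                            # fatigue sensitivity of the effort cost

recoverable, unrecoverable = 0.0, 0.0
choices = []
for trial in range(20):
    effort = 1.0                       # could scale with SNR difficulty
    reward_listen, reward_rest = 10.0, 2.0
    fatigue = recoverable + unrecoverable
    value_listen = reward_listen - theta * fatigue * effort
    choose_listen = value_listen > reward_rest
    choices.append(choose_listen)
    if choose_listen:                  # effortful trial: fatigue builds up
        recoverable += alpha * effort
        unrecoverable += kappa * effort
    else:                              # rest trial: recoverable fatigue decays
        recoverable = max(0.0, recoverable - delta)

print("listen choices:", choices)
```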
Alexandra Martin, Chloe Huetz and Jean-Marc Edeline
Topic areas: brain processing of speech and language correlates of auditory behavior/perception neural coding
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Over the last decades, a large number of studies have described the robustness of auditory cortex responses when target stimuli are presented under acoustic degradations such as added noise, reverberation, and vocoding. Very few experiments have compared this with the resistance to noise observed in the responses of subcortical auditory neurons. In previous experiments performed in anaesthetized guinea pigs (Souffi et al., 2020, 2023), we reported that inferior colliculus neurons showed the highest resistance to noise among the auditory structures studied. To assess whether this resistance to noise is correlated with the animal’s behavioral performance, we trained awake, water-deprived CBA/J mice in a discrimination task between two guinea pig vocalizations (two whistles) in a Go/No-Go protocol: one vocalization was used as the CS+, the other as the CS-. The CS+ allowed the mouse to obtain a drop of water if the animal licked a spout during the 5 seconds after the CS+ presentation. Initially, the mice were trained in quiet, and when they reached 80% correct performance, they were trained in increasing levels of stationary noise (at SNRs of +10 dB, 0 dB and -10 dB), or of cocktail party noise made of the simultaneous vocalizations of a group of guinea pigs. All mice performed above 80% in quiet and at the +10 dB SNR, but their performance decreased at 0 dB SNR and even more at -10 dB SNR. Neuronal recordings were collected in these mice in awake, passively listening conditions. Neuronal recordings were also obtained while the animals were engaged in the task. As controls, neuronal recordings were obtained from untrained, exposed mice, which were presented with exactly the same number of vocalizations as the trained mice. These results should allow us to determine to what extent the activity of inferior colliculus neurons correlates with behavioral performance in challenging conditions.
Farhin Ahmed and Bonnie Lau
Topic areas: correlates of auditory behavior/perception hierarchical sensory organization neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Given the central role of spoken language in human life, difficulties with speech understanding can have profound negative consequences for daily functioning. Therapeutic interventions can mitigate speech and language problems in many clinical groups, but their success depends on timely and accurate diagnosis. However, identifying these difficulties is especially challenging in pre-verbal infants, who cannot participate in traditional behavioral assessments—resulting in a lack of reliable tools for evaluating early speech and language development. To address this gap, we are working toward developing objective, data-driven measures by analyzing the brain’s responses to speech. Recent advances in machine learning applied to neural data reveal that the human brain tracks the dynamic patterns of incoming speech sounds—a phenomenon well-documented in neurotypical adults. Yet, this line of research has rarely been extended to preverbal infants, particularly those with neurodevelopmental conditions. Our study addresses this gap by measuring brain responses to naturally produced speech in infants from three groups: those with Down syndrome, those with a high likelihood of developing autism, and those with a low likelihood of developing autism. Infant-directed speech materials were presented to these infants at 6 months and 12 months of age as we recorded their neural responses using electroencephalography (EEG), a tool that has excellent temporal resolution and is relatively affordable and scalable to clinical settings. Applying machine learning techniques to the EEG data, we aim to obtain objective brain-based measures of speech processing and language understanding. The ultimate goal of this project is to develop a brain-based predictive diagnostic to identify infants who are at increased likelihood of language and learning delays. Such early identification is critical for maximizing the effectiveness of treatment options and for assessing the effectiveness of novel therapeutics.
Julia M. Leeman, Cynthia D. King and Jennifer M. Groh
Topic areas: brain processing of speech and language correlates of auditory behavior/perception hierarchical sensory organization neural coding
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Perceptually, we group sounds from the same source and segregate sounds from different sources. This study characterizes the effects of consonance and dissonance on this process. We used an auditory streaming paradigm (C-AB-C-AB-C-AB, etc.), with the higher-pitched C presented alone and the lower-pitched A and B presented simultaneously. The frequency separation between C and B governs whether B groups with A or C (Bregman & Pinker, 1978). Here, we investigated whether the harmonic relationship between A and B affects this grouping by testing two conditions: A and B are two octaves apart (consonant), or A and B are an octave and a major 7th apart (dissonant). Human participants classified sound patterns according to whether B grouped with A or with C, as a function of C’s frequency. We compared the midpoints of the psychometric curves. The consonant condition had a smaller midpoint, suggesting that consonant intervals may be more likely to fuse. This result will help design stimuli to investigate the neural patterns underlying auditory grouping and segregation. With these stimuli, we can compare neural responses to sounds that differ in their perception as grouped or segregated.
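The sketch below illustrates the psychometric-midpoint comparison described above: fit a logistic function to the proportion of "B grouped with C" reports as a function of C's frequency in each condition and compare the fitted midpoints. The response proportions and frequency axis are placeholders.

```python
# Sketch of the psychometric analysis: fit a logistic function per condition
# and compare the fitted midpoints. Placeholder data only.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, midpoint, slope):
    return 1.0 / (1.0 + np.exp(-slope * (x - midpoint)))

c_freqs = np.linspace(-6, 6, 9)                   # C frequency re: B (semitones)
p_consonant = logistic(c_freqs, -0.5, 1.2)        # placeholder group proportions
p_dissonant = logistic(c_freqs, 0.8, 1.2)

params_con, _ = curve_fit(logistic, c_freqs, p_consonant, p0=[0.0, 1.0])
params_dis, _ = curve_fit(logistic, c_freqs, p_dissonant, p0=[0.0, 1.0])
print(f"consonant midpoint: {params_con[0]:.2f} semitones")
print(f"dissonant midpoint: {params_dis[0]:.2f} semitones")
```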
Jinhee Kim, Wenkang An, Abigail Noyce and Barbara Shinn-Cunningham
Topic areas: neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Auditory selective attention requires coordination of multiple neural processes that operate at different timescales, each implemented by different neural architectures. Listeners can achieve the same outcome by attending to a spatial location or a specific talker, but the underlying mechanisms remain only partially understood. We applied representational similarity analysis (RSA) to EEG data collected during auditory attention to examine the time courses and topographies of these different attention processes. Each trial began with a visual cue of attention type (spatial, talker, or no-attention), followed by an auditory cue specifying the exact target (spatial: location; talker: voice). Four overlapping syllables (/ba/, /da/, or /ga/) then played, one of which matched the cued target, and participants reported its identity. The 21 conditions included spatial attention (left, right), talker attention (male, female), and passive listening. EEG power spectra were computed and averaged within five frequency bands (delta: 2–4 Hz, theta: 4–8 Hz, alpha: 8–14 Hz, beta: 14–20 Hz, gamma: 20–50 Hz) to assess oscillatory activity. Two epochs were extracted from each trial: "cue" for preparatory attention, and "stimulus" for attention during sensory input. For each frequency band and the time-domain EEG signal, a cross-validated linear support vector machine (SVM) trained for each subject and time point classified between each pair of conditions. SVM accuracies yielded a time series of representational dissimilarity matrices for each frequency band and the time-domain signal. The channel-space weighting of each classifier allowed us to probe the spatial structure of neural information. We observed clear effects of attention vs. passive listening in all frequency bands in both the cue and stimulus epochs. In the time-domain signal, cue effects were transient (possibly sensory-driven responses), while oscillatory effects arose gradually and persisted across the trial. The alpha band showed the strongest effect during stimulus presentation, with the time-domain signal, theta, beta, and gamma showing moderate differences. Topographies of the most informative channels show posterior contributions of alpha power, with two distinct time dynamics: one reflecting trial-long attention and another that faded after the target onset. The information encoded by alpha power and the time-domain signal appears largely independent of one another, suggesting that this approach captures multiple neural mechanisms. This work shows that RSA is a powerful analytical approach for revealing the dynamics of endogenous attention, advancing our understanding of cognitive control.
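The sketch below shows the general shape of the classification-based RSA step described above: cross-validated pairwise SVM decoding of conditions from band-limited power at one time point, with accuracies filling a representational dissimilarity matrix. The data, number of conditions, and classifier settings are placeholders.

```python
# Sketch of classification-based RSA: pairwise cross-validated SVM accuracies
# form a representational dissimilarity matrix (RDM) for one frequency band
# and time point. Placeholder data only.
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_conditions, n_trials, n_channels = 5, 40, 64
power = rng.standard_normal((n_conditions, n_trials, n_channels))  # e.g., alpha

rdm = np.zeros((n_conditions, n_conditions))
for i, j in combinations(range(n_conditions), 2):
    X = np.vstack([power[i], power[j]])
    y = np.r_[np.zeros(n_trials), np.ones(n_trials)]
    acc = cross_val_score(LinearSVC(max_iter=10000), X, y, cv=5).mean()
    rdm[i, j] = rdm[j, i] = acc        # higher accuracy = more dissimilar
print(np.round(rdm, 2))
```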
Christopher Conroy, Yale E. Cohen and Robert M. McPeek
Topic areas: brain processing of speech and language correlates of auditory behavior/perception neural coding
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Signal detection theory (SDT) posits a role for both sensory and decision factors in the discrimination of sensory stimuli. According to SDT, discriminating between two categories of sensory stimuli depends on both the observer’s internal sensitivity to those stimuli and the decision criterion that is used for the discrimination boundary. The receiver operating characteristic (ROC) curve is therefore a central construct in SDT, because it shows how performance in a discrimination task changes with the decision criterion. It can also be used to index sensitivity and sheds light on the form of the internal distributions of sensory events that are associated with the two stimulus categories. Neurophysiological investigations of sensory and decision factors in auditory perception may therefore benefit from data obtained in behavioral ROC paradigms. Towards this end, we trained a male rhesus macaque monkey on an auditory spatial discrimination task that was designed to yield a behavioral ROC curve. On each trial, the monkey fixated a central point while two visual-saccade targets – one positioned to the left, the other to the right of fixation – were presented. An auditory stimulus (noise) was then presented to either the left or right of the midline. The monkey’s task was to decide if the noise was to the left or right of the midline and to report that decision by making a saccade to the left or right visual-saccade target, respectively. Trial-by-trial shifts in the monkey’s criterion were induced by manipulating the relative reward size associated with correct-left versus correct-right decisions. The relative reward size was cued to the monkey by the shapes of the visual-saccade targets: they either had different shapes and therefore signaled asymmetric reward potential (and thus were intended to induce relatively liberal or conservative criterion placements) or had the same shapes and signaled even reward potential (and thus were intended to yield a relatively neutral criterion placement). This procedure successfully induced trial-by-trial shifts in the monkey’s criterion and yielded a behavioral ROC curve: for example, when the visual cues indicated the potential for a relatively large reward associated with a correct-right decision, the monkey adopted a liberal criterion with respect to right spatial judgements, whereas it adopted a more conservative criterion when correct-left decisions were associated with a larger reward. Neurophysiological data obtained using this behavioral ROC paradigm could be useful for understanding the neural bases of sensory and decision processes in auditory spatial discrimination.
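As a worked illustration of the SDT logic above, the snippet below assumes equal-variance Gaussian internal distributions with a fixed sensitivity (d′) and shows how shifting the decision criterion changes hit and false-alarm rates while tracing out a single ROC curve; the numerical values are illustrative, not the monkey's data.

```python
# Worked signal-detection sketch: with fixed sensitivity (d') and equal-variance
# Gaussian internal distributions, shifting the decision criterion moves hit and
# false-alarm rates along one ROC curve. Illustrative values only.
import numpy as np
from scipy.stats import norm

d_prime = 1.5                          # assumed sensitivity
criteria = {"liberal": -0.5, "neutral": 0.0, "conservative": 0.5}

for label, c in criteria.items():
    hit_rate = norm.sf(c - d_prime / 2)    # P("right" report | sound right)
    fa_rate = norm.sf(c + d_prime / 2)     # P("right" report | sound left)
    recovered_d = norm.ppf(hit_rate) - norm.ppf(fa_rate)
    print(f"{label:>12}: hit = {hit_rate:.2f}, false alarm = {fa_rate:.2f}, "
          f"d' = {recovered_d:.2f}")
```

Note that the recovered d′ stays constant across criterion settings, which is exactly the separation of sensitivity from decision criterion that the behavioral ROC paradigm is designed to exploit.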
Jusung Ham, Jinhee Kim, Miranda Becker, Vibha Viswanathan, Hyo Sung Joo, Do Anh Quan Luong, Jihwan Woo, Barbara Shinn-Cunningham and Inyong Choi
Topic areas: brain processing of speech and language correlates of auditory behavior/perception
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Background: Understanding speech in noise is a cognitively challenging task, especially for individuals with hearing loss. Conventional hearing aids, which focus on amplifying sound to ensure audibility, often fail to aid listening in noisy settings, underscoring the role of central auditory processing. Auditory selective attention is a fundamental aspect of central, non-linguistic processing that modulates the neural representation to favor a target compared to background noise. Prior studies have found that individual differences in attentional modulation of cortical auditory evoked potentials (CAEP) are associated with speech-in-noise performance. Our previous neurofeedback training (NFT), designed to strengthen this attentional modulation, improved word-in-noise perception in normal-hearing English speakers. Here, we aim to investigate whether such training can generalize to speakers of different languages and to a sentence-in-noise (SiN) task. Methods: Normal-hearing English and Korean native speakers underwent three NFT sessions with a gamified audiovisual brain-computer interface. They were told to attend to one of two speech streams—a female saying "Up" five times and a male saying "Down" four times—presented simultaneously. The target speech was marked by a series of coins placed above or below a center line. According to their attentional state, decoded from single-trial EEG, a rocket icon moved up or down, either gaining or missing the coins. Before and after training, we evaluated speech-in-noise perception via SiN tasks, in which participants repeated sentences presented in multi-talker babble at various signal-to-noise ratios (SNRs). The speech corpora from QuickSIN were used and adapted for Korean speakers. Results: After training, attentional modulation was strengthened, as indicated by an increased attentional modulation index—the ratio of CAEP amplitudes to the onsets of attended vs. ignored speech. SiN performance, measured by the number of accurately repeated keywords, also improved in challenging SNR conditions. Conclusion: These results support the conclusion that NFT can enhance sentence-based speech-in-noise perception in both English and Korean speakers. Continued research with larger populations, including those with hearing loss, will contribute to the development of evidence-based rehabilitation of speech-in-noise perception, which could improve the quality of daily communication.
Christa Michel and Ross S. Williamson
Topic areas: cross-species comparisons subcortical auditory processing
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
For all mammals, survival depends on the ability to learn which environmental features predict rewards and to use that information to guide behavior. The caudal “tail” of the dorsal striatum (Ts) plays a central role in this process by integrating inputs from sensory cortices and midbrain dopaminergic neurons to link sensory cues with motivated actions. In the auditory system, the tail of the striatum receives dense excitatory input from both layer (L)5 intratelencephalic (IT) and extratelencephalic (ET) neurons in primary auditory cortex (ACtx), each of which adaptively signals different reward-related features of sounds during learning [Schneider et al., in prep.]. Here, we examined how cell type-specific activity in ACtx and dopaminergic signaling in Ts jointly support auditory-guided learning. Mice were trained on a head-fixed associative learning task in which one of two sinusoidal amplitude-modulated (sAM) noise bursts predicted a water reward after a short (0.5 s) delay. While mice learned to discriminate between the two stimuli, we simultaneously recorded calcium activity from L5 IT neurons and dopamine transients in Ts using fiber photometry. Our preliminary results (n=4) show that sound-evoked activity in L5 IT neurons is prolonged and reliably followed by rapid, robust dopamine transients in Ts. These dopamine responses are time-locked to stimulus onset, occur independently of movement, and consistently precede licking behavior. Ongoing experiments aim to characterize how L5 IT, L5 ET, and dopamine signals evolve with learning and whether they encode trial-by-trial reward predictions or errors. We hypothesize that L5 IT neurons facilitate early sound-reward associations, while L5 ET neurons contribute to the execution of learned behaviors in later stages.
Wenyue Xue, Nolan Sun, Jason Xie and Jun Yan
Topic areas: auditory disorders correlates of auditory behavior/perception neural coding
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Background: Exposure to loud noise results in irreversible outer hair cell damage and a marked elevation of the hearing threshold, while prolonged exposure to moderate noise (MN) typically does not produce significant alterations in hearing sensitivity. Recent studies indicate that MN exposure can result in the impairment of ribbon synapses and in neural plasticity of the central auditory system, i.e., tonotopic map reorganization in the auditory cortex. To date, little is known about functional alterations in the lower auditory brainstem. This study investigated the immediate changes in neuronal responses along the mouse auditory brainstem following prolonged exposure to a pure tone. Methods: Under anesthesia (ketamine and xylazine), twenty young adult C57BL/6 mice, aged 4-7 weeks, were unilaterally exposed to 1 hour of a continuous pure tone (at the characteristic frequency, CF, based on the mice's hearing) at 65 dB SPL. Before and after exposure, the auditory brainstem response (ABR) was used to assess changes in neuronal auditory responses in the brainstem. A paired t-test was used to compare ABR recordings before and after exposure. Results: Among the 20 mice, the mean ABR threshold was significantly elevated at CF (p = 0.008) and 0.5 octaves above (p = 0.012). Nine of the experimental mice showed a threshold shift after exposure, averaging 6.875 ± 3.48 dB. In the mice with no threshold shift (n = 11), the Wave I amplitude at the threshold level significantly decreased (p = 0.008). The dynamic range measured at CF showed that responses between 40 and 60 dB SPL were affected most. At CF, the amplitudes of Wave I and Wave II at 60 dB SPL significantly decreased, while Wave V showed no significant change. The individual changes in Wave I were consistent, showing a decrease in amplitude and an increase in latency, while variability increased at higher auditory brainstem stages. Conclusions: Our results suggest that exposing mice to a 1-hour pure tone at 65 dB SPL impacts hearing sensitivity in a pattern related to the exposure tone. Signal processing in the auditory nerve and cochlear nucleus is impaired, while the impact may be compensated in the inferior colliculus.
Zehua Kcriss Li and Samuel Norman-Haignere
Topic areas: brain processing of speech and language correlates of auditory behavior/perception
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Integrating information across the complex temporal structure that defines music and speech is one of the central challenges of human hearing. Many theories propose distinct neural integration timescales for music and speech, reflecting their differing acoustic and temporal properties. However, no studies have directly estimated integration windows for speech and music in the human auditory cortex, in part due to the challenge of measuring integration windows from nonlinear systems like the brain using coarse, non-invasive neuroimaging methods. To overcome this challenge, we measured integration windows from the human auditory cortex using spatiotemporally precise intracranial recordings from neurosurgical patients coupled with a recently developed method for estimating integration windows from nonlinear (and linear) systems (the Temporal Context Invariance (TCI) paradigm). The TCI paradigm identifies the shortest segment duration for which neural responses remain invariant across varying contexts and does not make any assumptions about stimulus features that underlie the response or the stimulus-response mapping (e.g., linear or nonlinear). Consistent with prior findings, we observed that neural integration windows substantially increase as one ascends the cortical hierarchy from primary to non-primary auditory cortex bilaterally. However, we find that integration windows were very similar for speech and music stimuli across both primary and non-primary auditory cortex in both the left and right hemisphere. These findings suggest that neural integration windows do not change substantially with the category of sound, and thus that information in music and speech is integrated using similar temporal windows.
Jin Dou and Edmund Lalor
Topic areas: auditory memory and cognition
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
How the human brain so effortlessly processes natural speech remains incompletely understood. One approach to studying this issue that has proven useful in recent years centers on estimating linear encoding models – sometimes known as temporal response functions (TRFs) – that allow one to predict brain responses to speech based on the various acoustic and linguistic features of that speech. However, interpreting the predictive accuracy of such models is complicated by large inter-individual differences in the signal-to-noise of various brain measurements and an inability to know what might constitute optimal model performance. Both of these issues would be addressed if it were possible to estimate the maximum explainable variance in each individual’s brain responses to natural speech. A common approach to estimating the explainable variance in brain responses to stimuli involves computing correlations between responses to repeated presentations of the same stimulus. However, this approach is not best suited for natural speech, where repeated presentation of the same stimulus is sure to lead to differences in brain activity across repetitions due to changes in attention and reductions in the encoding of linguistic information. An alternative strategy is to assume that the brains of different people who speak the same language should produce similar responses to the same stimuli. Indeed, using this strategy, previous studies estimated the explainable variance in ECoG and fMRI responses to speech by calculating how accurately a selected subject’s brain signal (target) can be decoded from the signals of other (source) subjects. Specifically, these studies: 1) fit decoders for a target’s brain signals using different numbers of source subjects; 2) fit curves to model the relationship between the number of source subjects and the target subject’s decoding accuracy. Here, we tailored this framework for EEG by 1) first applying multi-way canonical correlation analysis (MCCA) on source subjects’ EEG to reduce the data dimensionality, 2) time-lagging the transformed data before ridge regression to incorporate the temporal differences between source and target EEG, 3) adopting a hierarchical version of bootstrapping to account for the inter-subject differences in EEG SNR. We tested this method on an EEG dataset that was collected from 19 subjects who each listened to about 1 hour of an audiobook. The obtained estimate of explainable variance was then compared with the prediction accuracy of a TRF model that was fit on different acoustic and linguistic speech features. The results showed that the explainable variance at centroparietal EEG channels was not fully explained by the TRF models. This established an important benchmark for future models of EEG responses to natural speech and provides a useful framework for controlling for inter-individual differences when using such models.
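The sketch below illustrates a stripped-down version of the noise-ceiling logic described above: predict one target subject's response from the average of N source subjects, then fit a saturating curve to accuracy versus N and extrapolate. It deliberately omits the MCCA, time-lagging, and hierarchical bootstrap steps, and all data are simulated placeholders.

```python
# Simplified noise-ceiling sketch: predict a target subject's response from the
# average of N source subjects, fit a saturating accuracy-vs-N curve, and
# extrapolate the explainable correlation. Omits MCCA, lagging, and the
# hierarchical bootstrap described in the abstract. Simulated data only.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_subjects, n_samples = 19, 5000
shared = rng.standard_normal(n_samples)                              # shared response
data = shared + 2.0 * rng.standard_normal((n_subjects, n_samples))   # + subject noise

target, sources = data[0], data[1:]
ns, accs = [], []
for n in range(1, len(sources) + 1):
    prediction = sources[:n].mean(axis=0)
    ns.append(n)
    accs.append(pearsonr(prediction, target)[0])

def ceiling_curve(n, r_max, k):          # saturating accuracy-vs-N model
    return r_max * np.sqrt(n / (n + k))

(r_max, k), _ = curve_fit(ceiling_curve, np.array(ns), np.array(accs),
                          p0=[0.5, 1.0], bounds=(0, np.inf))
print(f"estimated explainable correlation (N -> infinity): {r_max:.2f}")
```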
Wei-Ching Lin, Anastasiia Rudaeva, Jin Dou, Zehua Li, Aaron Nidiffer, Samuel Norman-Haignere and Edmund Lalor
Topic areas: auditory disorders brain processing of speech and language multisensory processes
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
The idea that top-down, knowledge-based information affects the processing of bottom-up sensory signals has become central to thinking about perception in recent years. And speech perception, in particular, has increasingly been acknowledged to involve the integration of sensory input with expectations based on the context of that speech. However, debate still surrounds the issue of whether or not prior knowledge feeds back to affect early auditory encoding in the lower levels of the speech processing hierarchy, or whether perception can best be explained as a purely feedforward process. Some recent work has provided support for the idea that predictive feedback influences the acoustic-phonetic encoding of speech. Specifically, several scalp EEG studies have reported that the cortical tracking of various acoustic and phonetic speech features appears to be affected by the context-based predictability of words when participants listen to naturalistic (podcast) speech. In the present study, we aim to seek further evidence for this effect – and to see how the effect might vary across different stages of the cortical speech processing hierarchy – by examining neural responses to natural speech using intracranial EEG (iEEG) recordings in neurosurgical patients. Specifically, we analyzed iEEG from patients as they listened to ~12-24 minutes of a podcast. We fitted temporal response functions (TRFs) to model the relationship between the acoustic envelope of the speech and the neural data on each iEEG channel (both low-frequency activity and high-gamma power). We used these TRFs to reconstruct the acoustic envelope on new trials from the recorded iEEG. The reconstruction accuracy served as an index of the low-level acoustic-phonetic encoding of the speech. We then assessed how this index varied for each word as a function of how predictable that word was in the context of the podcast (using an estimate of lexical surprisal computed from a large language model). This analysis revealed that – on many of the iEEG channels – the speech envelope tracking was stronger for words that were more surprising. This finding supports the notion that low-level EEG responses to natural speech (at least partially) reflect prediction error. As such, it contributes to our understanding of the neuroscience of perception. And it also provides insights that will be useful in future clinical research, including research involving groups in whom predictive perception is thought to differ.
Andre Palacios Duran, Aaron Nidiffer, Wei-Ching Lin, Zehua Kcriss Li, Samuel Norman-Haignere and Edmund Lalor
Topic areas: brain processing of speech and language hierarchical sensory organization neural coding subcortical auditory processing
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Speech is a highly dynamic signal with rich acoustic structure. What operations the brain uses to convert this acoustic structure into robust semantic and syntactic representations remains unclear. The fact that brain activity reflects the acoustic dynamics of speech (speech “tracking”) has been widely used to better understand speech processing, attention, and multisensory integration – including via the use of Temporal Response Function (TRF) encoding models. The transient characteristics of TRFs suggest that speech tracking primarily derives from the summation of transient evoked responses. On the other hand, it has been argued that an optimal way for the brain to parse and track speech into linguistic units is through the entrainment of ongoing intrinsic oscillations. While both frameworks have led to insights into speech neurophysiology, the field lacks clarity on the degree to which each mechanism contributes to speech comprehension. Here, we aim to clarify this by directly testing the ability of both frameworks to account for brain responses to natural speech. In a first set of experiments, EEG was recorded from 18 participants as they listened to audiobooks whose speech rate was manipulated. We then compared the dynamics of TRFs computed from the EEG recorded under different rate conditions. In a second set of experiments, we manipulated the temporal regularity of syllables in the same stimuli, producing speech with the same median speech rate but different overall temporal statistics. One condition had nearly isochronous syllables (regular speech), while the other had more fast and slow syllables (irregular speech). Unaltered speech was used as a control condition (natural speech). Both regular and irregular speech were intelligible and resulted in robust cortical tracking. We examined whether models of oscillatory entrainment – optimized to predict the brain data from each condition – could explain cortical tracking. TRFs estimated from simulated EEG generated by these oscillatory models differed from those estimated from real EEG, suggesting that the oscillatory entrainment models we tested do not represent real neural activity. To test whether oscillatory entrainment may play a role in more spatially confined regions of auditory cortex less visible to scalp EEG, we repeated this analysis using intracranial EEG from neurosurgical patients. These data show qualitatively similar results. Finally, acknowledging that both mechanisms may play a role in speech processing, we model brain responses using TRFs and entrained oscillators together to see whether accounting for oscillatory entrainment improves predictive accuracy.
Isaac Boyd, Zhili Qu, Howard Gritton and Kamal Sen
Topic areas: auditory memory and cognition correlates of auditory behavior/perception neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Mechanisms for solving the cocktail party problem are thought to rely on complex cortical circuitry involving excitatory and inhibitory neurons. Naturalistic stimuli in cocktail party settings contain rich temporal dynamics. Recent work proposed a cortical circuit model that can explain experimentally observed neural discrimination of temporally dynamic stimuli in mouse auditory cortex (ACx). The model also captures the effects of optogenetic suppression of parvalbumin (PV) neurons, revealing the specific contribution of PV neurons to improving cortical temporal coding. This model architecture consists of a three-layer network with (i) two input excitatory (E) neurons that produce signals corresponding to stimulus onset and offset, respectively, (ii) an intermediate pair of E neurons plus one inhibitory PV interneuron, and (iii) an output PV neuron and read-out E cell. Thus far, the task of finding the best synaptic weights for a given network architecture has been accomplished manually. Here, we describe two approaches for finding the synaptic weights in the network automatically: backpropagation and a genetic algorithm (GA). We compare and contrast these approaches and describe the family of network architectures consistent with experimental recordings of cortical responses to temporally dynamic stimuli. The model provides a more holistic understanding of the functional contribution of different neuronal types, specifically PV neurons, in solving the cocktail party problem.
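As a rough illustration of the second approach, the sketch below shows a generic genetic algorithm over a vector of synaptic weights; the population size, mutation scale, and fitness function are placeholders rather than the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve_weights(fitness, n_weights, pop_size=60, n_gen=200,
                   mut_sigma=0.1, elite_frac=0.2):
    """Simple genetic algorithm over a vector of synaptic weights.

    `fitness` maps a weight vector to a scalar score (higher is better),
    e.g., agreement between simulated and recorded neural discrimination.
    """
    pop = rng.normal(0.0, 1.0, size=(pop_size, n_weights))
    n_elite = max(1, int(elite_frac * pop_size))
    for _ in range(n_gen):
        scores = np.array([fitness(w) for w in pop])
        elite = pop[np.argsort(scores)[::-1][:n_elite]]
        # refill the population by recombining and mutating elite parents
        parents = elite[rng.integers(0, n_elite, size=(pop_size, 2))]
        mask = rng.random((pop_size, n_weights)) < 0.5
        children = np.where(mask, parents[:, 0], parents[:, 1])
        pop = children + rng.normal(0.0, mut_sigma, size=children.shape)
        pop[:n_elite] = elite  # elitism: keep the best candidates unchanged
    scores = np.array([fitness(w) for w in pop])
    return pop[np.argmax(scores)]
```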
Donghyeok Lee, Ashley Qin and Matthew Leonard
Topic areas: correlates of auditory behavior/perception hierarchical sensory organization neural coding
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Understanding the neural underpinnings of individual variability in learning requires examining brain states beyond simple task-evoked responses. The pre-stimulus period, rather than being a neutral baseline, may harbor critical neural dynamics influencing subsequent cognition. We hypothesized that network-level spatiotemporal patterns like cortical traveling waves (Muller et al. 2018; Zhang et al. 2018) may reflect dynamic brain states linked to evolving internal belief states that predict upcoming perceptual encoding and learning outcomes. Here, we analyzed high-density electrocorticography (ECoG) data from English-speaking human participants performing a challenging perceptual category learning task in which they learn to identify Mandarin tone categories (Yi et al. 2021). Behaviorally, participants exhibited highly variable, non-monotonic learning of the tone categories. To understand the factors that underlie these learning dynamics, we developed a belief updating model and quantified trial-by-trial behavioral uncertainty as Shannon entropy. We hypothesized that behavioral uncertainty on each trial would be related to pre-stimulus (500ms before sound onset) brain states, which we characterized as theta band (4-7Hz) traveling waves. We found distinct patterns of theta traveling waves propagating across widespread cortical regions in the pre-stimulus period. Importantly, these wave patterns changed dynamically on a trial-by-trial basis. Critically, specific pre-stimulus traveling wave patterns significantly correlated with, and predicted, concurrently modeled trial-specific behavioral uncertainty (entropy). Specifically, one wave pattern across the lateral sensorimotor and auditory cortex consistently preceded high-entropy (low certainty) trials, while a distinct pattern preceded low-entropy (high certainty) trials. Furthermore, modeled belief states evolved dynamically as learning progressed. Our findings suggest the pre-stimulus period contains rich, dynamic network-level neural activity, which manifests as theta traveling waves and reflects trial-by-trial fluctuations in an individual's internal belief state during learning. This notion is strengthened by a complementary analysis using Hidden Markov Models to identify discrete pre-stimulus states. Critically, while these states significantly predicted the subject’s upcoming behavioral choice, they did not predict trial-by-trial accuracy. This dissociation underscores that pre-stimulus activity reflects internal biases more than it dictates perceptual success, moving beyond traditional accuracy metrics. Together, these pre-stimulus neural patterns and their link to subjective uncertainty provide a novel avenue for investigating neural mechanisms of individual learning variability, perceptual encoding, and cognitive performance.
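The trial-by-trial uncertainty measure described here is the Shannon entropy of the modeled belief distribution over categories. A minimal version (the example values are illustrative, not from the study):

```python
import numpy as np

def belief_entropy(belief, eps=1e-12):
    """Shannon entropy (bits) of a trial-wise belief distribution over categories."""
    p = np.asarray(belief, dtype=float)
    p = p / p.sum()
    return float(-(p * np.log2(p + eps)).sum())

# A nearly uniform belief over four tone categories is high entropy (low certainty);
# a concentrated belief is low entropy (high certainty).
print(belief_entropy([0.25, 0.25, 0.3, 0.2]))   # ~1.99 bits
print(belief_entropy([0.9, 0.05, 0.03, 0.02]))  # ~0.62 bits
```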
Calli Smith, Aaron Nidiffer, Edmund Lalor and Elise Piazza
Topic areas: multisensory processes
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
How does interaction impact language processing? Prior research has investigated neural markers of language processing either during interaction or passive listening. However, no study has directly tested how neural processing might differ between the two situations in order to identify distinct neurophysiological mechanisms of interpersonal spoken interaction. To do so, we asked pairs of “interactors” to engage in a series of prompted (but unconstrained) conversations while we recorded EEG and audio. In separate sessions, we then recorded EEG from “observers” who listened to conversations from a previous pair of interactors. Cortical tracking of acoustic (spectrogram) and linguistic (lexical surprisal) features of the conversations was estimated using forward encoding models that predicted the EEG responses of each participant from the speech features. In preliminary data, interactors tracked linguistic features more closely than observers, while observers tracked acoustic features more closely than interactors. Across groups, acoustic features contributed more to models than linguistic features. Participants rated their own overall attentiveness similarly across the two groups, although interactors rated the conversations as higher quality (enjoyment, naturalness, connection, feeling “on the same page”). Early trends also suggest a complex relationship between quality ratings and neural measures of conversation tracking. Better tracking of linguistic features was associated with higher quality ratings in both groups, but for observers, better tracking of acoustic features predicted lower quality ratings. Future analyses will compare additional aspects of language processing across participant roles.
Kunpeng Yu, Hemant Srivastava, Justin Fine, Ben Hayden, Kit Jaspe, Nikolas A. Scarcelli, Hong Jiang, Jay Hennig and Matthew J. McGinley
Topic areas: auditory memory and cognition correlates of auditory behavior/perception hierarchical sensory organization neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Temporal coherence (TC) between frequency channels is crucial for differentiating a target sound from its background. Attention plays an important role in TC encoding, as seen in 'cocktail party' scenarios, where individuals focus on a specific sound source to suppress background noise and improve perception. In a mouse behavioral task model, we previously found that task utility modulates attention-like mechanisms, enhancing detection of TC in tone clouds (see de Gee et al., 2022). However, how TC is encoded at the population level in auditory cortex, and how attentional state modulates this encoding, remain open questions. According to Krishnan et al., TC is encoded by integrating spectro-temporal features like continuity, pitch, and spectral grouping. Spectro-temporal receptive field (STRF) models have been widely used to describe such encoding in primary auditory cortex (A1). Building on this framework, we recorded A1 activity in mice performing a sustained attention task with fluctuating task utility. Using receptive field estimation tools developed by Berens’ lab, we identified neurons whose responses to tone clouds were well-predicted by STRFs. While many neurons were well-predicted by STRFs during tone cloud trials, their predictability decreased during TC stimuli, especially under high utility conditions. Linear discriminant analysis (LDA) showed that TC decoding accuracy increased with task utility but was primarily driven by a distinct population of non-STRF neurons. To explore this computationally, we trained a reinforcement learning RNN model on the TC task using spectrotemporal inputs. The model successfully distinguished tone clouds from coherent signals, and its units developed functional preferences for encoding either spectrotemporal features or coherence-related information, with the latter emerging predominantly in the deeper layer. These findings suggest that STRF-explainable neurons encode low-level features, whereas TC representation under attentional modulation involves neurons encoding more abstract properties. Ongoing work uses machine learning to characterize the feature space of non-STRF neurons and examine how their representations vary with task utility. de Gee, Jan Willem et al. "Mice regulate their attentional intensity and arousal to exploit increases in task utility." bioRxiv (2022). Krishnan, L., Elhilali, M., & Shamma, S. (2014). Segregating complex sound sources through temporal coherence. PLoS Computational Biology, 10(12), e1003985. Huang, Z., Ran, Y., Oesterle, J., Euler, T., & Berens, P. (2021). Estimating smooth and sparse neural receptive fields with a flexible spline basis. bioRxiv.
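For context, a bare-bones STRF estimator is sketched below using ridge regression on a lagged spectrogram; this is not the spline-basis method of Huang et al. (2021), just a generic illustration with assumed lag counts and regularization.

```python
import numpy as np

def fit_strf(spec, spikes, n_lags=30, lam=1e2):
    """Estimate a linear STRF by ridge regression.

    spec:   (n_time, n_freq) stimulus spectrogram
    spikes: (n_time,) binned spike counts from one neuron
    Returns an (n_lags, n_freq) STRF: the weight of each frequency band
    at each time lag preceding the response.
    """
    n_t, n_f = spec.shape
    X = np.zeros((n_t, n_lags * n_f))
    for lag in range(n_lags):
        X[lag:, lag * n_f:(lag + 1) * n_f] = spec[:n_t - lag]
    XtX = X.T @ X + lam * np.eye(X.shape[1])
    w = np.linalg.solve(XtX, X.T @ spikes)
    return w.reshape(n_lags, n_f)
```

Prediction quality of such a model on held-out trials is the kind of criterion used to separate "STRF" from "non-STRF" neurons.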
Lorenz Fiedler and Dorothea Wendt
Topic areas: brain processing of speech and language neural coding novel neurotechnologies
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Auditory attentional control, including sustaining and switching attention to relevant sounds while ignoring irrelevant ones, is crucial for navigating busy auditory scenes. This ability can be compromised by hearing loss, which increases cognitive load under a limited cognitive capacity. We investigated whether manipulating the loudness of both relevant and irrelevant background sounds reveals a sweet spot for attentional control, both in behavior and pupil responses. This sweet spot would manifest as optimal behavioral efficiency and maximal pupil selectivity, defined as the difference between responses to relevant versus irrelevant background sounds. We recorded pupil size from 44 participants (ages 44-83 years; PTA: 7.5 – 64 dB HL) while they performed a dual task. The primary task involved sustained attention to an audiobook, and the secondary task required memorizing spoken numbers (preceded by a cueing sound) from a relevant location, with numbers from an irrelevant location also presented. The levels of the spoken numbers and the cueing sound were manipulated relative to the primary target (-27, -19, -11, -3 dB). Preliminary analysis reveals level-dependent pupil selectivity, peaking at the intermediate background-sound level of -11 dB, suggesting a sweet spot at the group level. Further analysis will explore the relationship between this sweet spot and behavioral performance, as well as cognitive tests of working memory, processing speed, and inhibitory control.
Riccardo Catena, Mohsen Alavash, Björn Herrmann and Jonas Obleser
Topic areas: neural coding thalamocortical circuitry and function
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Theoretical models and experimental evidence argue that the balance between excitatory and inhibitory activity is crucial for optimal functioning of the brain. Many studies have attempted to measure the “excitation-inhibition ratio” (E-I) non-invasively using EEG or fMRI. However, the expression of E-I across the human cortex has remained understudied. Addressing the cortical heterogeneity of the E-I is important in studying whether and where the balance in neural dynamics is altered in healthy aging, under pharmacological interventions, or during tasks. Here, we address this question using resting state data from the Human Connectome Project and two previously established measures of E-I. Specifically, we ask (1) how estimates of E-I map across the entire cortex and (2) how these maps relate to receptor-density maps of the neurotransmitters glutamate (NMDA receptor) and GABA. Neuroimaging data were parcellated according to the Desikan-Killiany atlas with 68 cortical parcels. The first measure we used was a metric from complex systems dynamics, the Hurst exponent (H). A recent study predicted a negative correlation between H and excitatory activity in fMRI BOLD signals (Trakoshis et al. 2020). More recently, Fotiadis et al. (2023) showed that during resting state primary sensory-motor areas have higher H than other areas, which is associated with more inhibition. The second measure is derived from the parametric feedback-inhibition control model (pFIC) as described in Zhang et al. (2024). We mapped the cortical distribution of E-I derived from pFIC at rest and observed higher E-I in primary sensory-motor areas, a mirror image of the pattern shown by Fotiadis et al. (2023). We additionally calculated the H of the same fMRI time series and found a conflicting positive correlation with the E-I from the pFIC model. Further, we consulted PET-derived NMDA and GABA neurotransmitter maps (Hansen et al. 2022, Dukart et al. 2018) and correlated the E-I from the pFIC model with the NMDA and GABA receptor densities. Across cortical parcels, we observed only a moderate negative correlation between the cortical E-I and NMDA density, which is non-trivial considering the excitatory role of glutamate and its important role in driving GABAergic interneurons. These seeming inconsistencies deserve further attention but provide partial support for the results of Fotiadis et al. (2023). In sum, several E-I measurement methods (data- or model-driven) from non-invasive hemodynamic recordings are now available. However, they currently provide partly inconsistent cortical E-I distributions, which calls for cross-validation across different E-I measures.
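As an illustration of the first measure, the sketch below estimates a Hurst-type scaling exponent from a parcel's BOLD time series using detrended fluctuation analysis; the cited studies use their own estimators, so treat the scale grid and fitting choices here as assumptions.

```python
import numpy as np

def hurst_dfa(x, scales=None):
    """Estimate a Hurst-type scaling exponent of a BOLD time series via
    detrended fluctuation analysis (the DFA exponent approximates H for
    stationary signals)."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())  # integrated (profile) signal
    if scales is None:
        scales = np.unique(np.logspace(np.log10(8), np.log10(len(x) // 4), 12).astype(int))
    fluct = []
    for s in scales:
        n_seg = len(y) // s
        segs = y[:n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        rms = []
        for seg in segs:
            coef = np.polyfit(t, seg, 1)  # local linear detrend
            rms.append(np.sqrt(np.mean((seg - np.polyval(coef, t)) ** 2)))
        fluct.append(np.mean(rms))
    slope, _ = np.polyfit(np.log(scales), np.log(fluct), 1)
    return slope
```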
Lizabeth Romanski, Matthieu Fuchs, Yunshan Cai, Mark Diltz and Keshov Sharma
Topic areas: multisensory processes
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
When we watch actors in movies or individuals around us, we interpret actions and develop opinions of the character’s personality traits and social status. Brain networks that are activated during social communication and interaction include the ventral frontal lobe. In nonhuman primates, the ventrolateral prefrontal cortex (VLPFC), a proposed homologue of human inferior frontal gyrus, is a locus for the convergence of face and vocal social communication information. Recent studies showed that macaque observers attribute social status roles to conspecifics seen in videos, and that social status can be decoded from amygdala neuronal responses to videos of simulated hierarchical social interactions. In the current study we examined the encoding of social status (dominant, subordinate), identity and vocalization-expression in VLPFC neurons of observer macaques while they viewed audiovisual social interaction videos. The stimuli were created by positioning two separate, conspecific videos side-by-side, depicting two “actor” macaques interacting in a single location, with a dominant making aggressive vocalizations, while the adjacent video showed a subordinate monkey making submissive vocalizations, resulting in a single video of two monkeys “conversing” with one another. We performed neurophysiological recordings using microelectrode arrays (FMAs, Microprobes) implanted in the VLPFC of the observer macaques and tracked eye-movements while they attended audiovisual movies. First, separate videos of 3 conspecific actors making neutral vocalizations and gestures were shown for 4 consecutive days, followed by 4 days of the social interaction videos featuring pairs of the same “actor” monkeys from the neutral videos, then the neutral movies were repeated. Preliminary data includes ~280 single units recorded during 5 sessions of social interaction videos. Our data indicates that observer macaques look at the first vocalization movie (aggressive display) then the second vocalization movie (submissive display) with eye-movements targeting the eyes and the mouths of the “actor” monkeys when vocalizing, with VLPFC neural activity time-locked to the onset of the first video then the second video. We will ascertain if looking time is based on perceived social status and if social status can be extracted from VLPFC single-unit and population data. In addition, we will determine if social status, or perceived identity, is modified by experience, by comparing neural responses to the neutral videos before and after exposure to social interaction videos. Our experiments will provide novel information about the role of VLPFC in social communication.
Emily Dappen, Alexander Billig, Ariane Rhone, Mitchell Steinschneider, Matthew Banks and Kirill Nourski
Topic areas: auditory disorders correlates of auditory behavior/perception
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Speech processing involves multiple hierarchically organized regions. Higher-order cortical areas involved in speech comprehension are sensitive to changes in level of consciousness. Delirium, an acute disorder of consciousness, is characterized by fluctuating symptoms that include inattention, disorganized thinking, confusion, emotional changes, hallucinations, hypo- or hyper-active behaviors, and impaired speech communication. This study examined cortical responses to conversational speech during delirium. Participants were adult neurosurgical patients who were diagnosed with delirium while undergoing intracranial electroencephalography (iEEG) monitoring for medically refractory epilepsy. Delirium assessments were verbally administered twice daily and following seizures. Audio and iEEG data were collected simultaneously during assessments. Suppression indices (SI) were calculated as the difference between the average broadband gamma (30-150 Hz) neural activity during the interviewer’s speech and the participant’s own speech, divided by their sum. The SI at each recording site was compared between delirium-negative and -positive conditions. Cortical processing was examined using a linear modeling approach (temporal response function, TRF). TRF output (model prediction accuracy) was compared between delirium-positive and -negative conditions for acoustic, sub-lexical, lexical, and semantic features of speech. Sites with an SI near zero in the delirium-negative condition often exhibited significantly higher or lower SI during delirium, reflecting a global imbalance in the processing of self-generated speech. Auditory areas within the superior temporal cortex, as well as middle temporal and inferior frontal cortices, showed greater SI during delirium. Auditory core cortex, middle frontal gyrus, and sensorimotor cortex showed similar SI in delirium-positive and -negative conditions. TRF analysis revealed lower prediction accuracy in delirium for lexical and semantic speech features in frontal and parietal regions. Changes in SI and TRF model prediction accuracy may reflect delirium-related impairments impacting speech comprehension. SI changes during delirium may reflect impaired executive control and contribute to broader deficits in the processing of self-related stimuli. Reduced TRF prediction accuracy suggests disruptions in speech processing beyond acoustic attributes. This study expands our knowledge of the impact of delirium on cortical auditory processing.
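The suppression index defined here reduces to a simple normalized contrast; a one-line version for a single recording site (variable names are illustrative):

```python
def suppression_index(gamma_interviewer, gamma_self):
    """SI = (A_interviewer - A_self) / (A_interviewer + A_self), where A is the
    average broadband gamma (30-150 Hz) activity at one recording site."""
    return (gamma_interviewer - gamma_self) / (gamma_interviewer + gamma_self)
```

Positive values indicate relative suppression during the participant's own speech; values near zero indicate comparable activity in the two conditions.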
John Magnotti, Yue Zhang, Xiang Zhang, Zhengjia Wang, Yingjia Yu, Kathryn Davis, Sameer Sheth, Isaac Chen, Daniel Yoshor and Michael Beauchamp
Topic areas: hierarchical sensory organization neural coding thalamocortical circuitry and function
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Human speech perception is multisensory, integrating auditory information from the talker's voice with visual information from the talker's face. BOLD fMRI studies have implicated the superior temporal gyrus (STG) in processing auditory speech and the superior temporal sulcus (STS) in integrating auditory and visual speech, but as an indirect hemodynamic measure, fMRI is limited in its ability to track the rapid neural computations underlying speech perception. We directly recorded neural activity in the STG and STS in 42 epilepsy patients implanted with stereoelectroencephalography (sEEG) electrodes. Patients identified single words presented in auditory, visual and audiovisual formats with and without added auditory noise. Seeing the talker's face provided a strong perceptual benefit, improving perception of noisy speech in every participant (42% average improvement). Neurally, a subpopulation of bimodal electrodes concentrated in mid-posterior STG and STS showed short-latency responses to auditory speech (71 ms) and visual speech (109 ms). Significant multisensory enhancement was observed in STS bimodal electrodes: compared with auditory-only speech, the response latency for audiovisual speech was 40% faster and the response amplitude was 18% larger. In contrast, STG bimodal electrodes showed neither faster multisensory latencies nor significant response enhancement. Surprisingly, STS response latencies for audiovisual speech were significantly faster than those in the STG (50 ms vs. 64 ms). This suggests the possibility of a parallel pathway model in which the STG plays the primary role in auditory-only speech perception, while the STS takes the lead in audiovisual speech perception.
Lakshmi Narasimhan Govindarajan, Sagarika Alavilli and Josh McDermott
Topic areas: brain processing of speech and language subcortical auditory processing
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Background: Sensory inferences about the state of the world are made from ambiguous observations and are thus inevitably uncertain. Accurate estimation of uncertainty is likely critical to decisions about where and when to act, but little is known about how uncertainty is estimated for real-world perceptual problems. To investigate this issue, we developed a new class of stimulus-computable models to enable the representation of uncertainty. We optimized these models for sound localization and pitch estimation, and compared their uncertainty estimates to human confidence judgments obtained from experiments in which participants reported confidence on a trial-by-trial basis. Methods: The model estimated parameters of a probability distribution over the variable to be estimated (location or fundamental frequency for the present work). The models were optimized to maximize the likelihood of ground truth labels for large datasets of natural sounds embedded in noise, utilizing binaural spatial renderings for localization and short speech or music excerpts for pitch estimation. When stimuli are ambiguous, the models should learn to produce wider or potentially even multi-modal distributions. To measure human confidence, we asked participants to make localization or pitch judgments and place monetary bets (1–5 cents) to indicate confidence. To simulate the same experiments on the models, we took the model’s confidence to be the entropy of the posterior distribution it estimated. Results: Human confidence judgments varied with stimulus properties: lower bets were placed for peripheral locations and pure tones in localization, and for complex tones with higher-numbered harmonics or lower SNR in pitch. These variations in confidence across conditions and domains were largely mirrored by the model. Conclusions: Humans have internal estimates of the uncertainty of their auditory percepts—both for sound location and pitch. These confidence judgments closely match the uncertainty estimates of models optimized for accurate performance in each domain, suggesting that human confidence is normatively appropriate. This modeling framework offers a general approach for investigating confidence across diverse perceptual domains.
Walker Gauthier, Ethan Hong, Noelle James and Benjamin Auerbach
Topic areas: correlates of auditory behavior/perception
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Optimizing sensory decision-making requires the integration of uncertain sensory signals with prior expectations. Disambiguating these bottom-up and top-down contributions to sensory perception, however, remains a difficult challenge in neuroscience. Here we show that a probabilistic oddball detection task provides a useful framework for differentiating these perceptual components. By leveraging an inferential sensory task design, we demonstrate that rats can integrate information that is not directly accessible from the physical sensory stream to predict when a deviant stimulus will occur. Importantly, we show that the current state of a subject’s internal model can be observed via covert probe trials and that these models can be manipulated in a manner that is independent from overt task rules and incoming sensory stimuli. This paradigm thus compels rats to form an experimentally specified internal model that can be quantitatively derived from behavior. We further validate these findings by employing a hierarchical drift diffusion model, which reveals parameter changes that specifically correspond to both bottom-up and top-down processes. Modeling this behavior through reinforcement learning and active inference schemes, we further elucidate the relationship of reward and information optimization in inferential auditory decision making. These results align with current theoretical predictions from active inference frameworks, illustrating the utility of this task in parsing the principles of perception.
Thomas Harmon, Andrew Kim, Evelyn Hardin and Richard Mooney
Topic areas: cross-species comparisons neural coding subcortical auditory processing
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Cochlear implants and AAV-based gene therapy have made it possible to restore hearing in congenitally deaf humans at different stages of life. However, the total absence of auditory experience prior to these interventions is likely to alter the connectivity of the auditory system in a manner that may impede hearing restoration. In fact, prior studies using conventional tracing methods in surgically-deafened rodents and congenitally deaf cats have revealed expanded connections to the auditory cortex from adjacent non-auditory cortical regions. However, whether these approaches can provide a complete brain-wide account of altered neuroanatomical connections to the auditory cortex is uncertain. Therefore, we used an intersectional viral and genetic approach to map the brainwide neuroanatomical connections to the primary auditory cortex of adult mice that were either congenitally deaf or equipped with normal hearing. Specifically, we injected retrograde viral vectors into the primary auditory cortex of adult Ai14 reporter mice that were either homozygous for a nullifying mutation to the TMC1 locus, rendering them congenitally deaf, or hemizygous for the mutation and had normal hearing throughout postnatal development. We used generalized multivariate linear regression to estimate the influence of hearing on presynaptic neuron counts, while accounting for other factors (e.g. transfection efficiency, age, sex) that could influence neuron labeling. In deaf mice, this analysis revealed expanded neuroanatomical connections from subregions of the orbitofrontal, ectorhinal, perirhinal, insular, somatosensory, and visual cortex and reduced connections from the basomedial amygdala and most thalamic nuclei. However, neuroanatomical connections from most other regions to the auditory cortex were indistinguishable between hearing and deaf mice. In summary, we found evidence of previously unknown changes to neuroanatomical connections to the auditory cortex of congenitally deaf mice, raising new considerations for efforts to restore full auditory function to congenitally deaf patients.
Ryan Calmus, Zsuzsanna Kocsis, Joel Berger, Hiroto Kawasaki, Timothy D. Griffiths, Matthew A. Howard and Christopher I. Petkov
Topic areas: cross-species comparisons multisensory processes subcortical auditory processing
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
There is substantial interest in identifying neuronal signals that carry traces of the auditory sensory past and predictions about the future. However, outside of animal models, we lack insights into how site-specific neuronal activity within the human mnemonic system carries information reflecting the relational coding of learned auditory sequences over temporal delays. We conducted an auditory statistical learning task with a cohort of 11 neurosurgery patients during intracranial monitoring of refractory epilepsy. Patients listened to perceptual sequences of 3 nonsense words containing a dependency between two sounds in each sequence. Words were drawn from sets (A, B and X), with regularities between pairs of relevant (A and B) sounds often separated in time by uninformative (X) sounds. Each A-B sound pairing thus formed either an adjacent or non-adjacent dependency. We first analyzed site-specific single-unit activity and local field potentials (LFPs) from auditory cortex, hippocampus, and frontal cortex using traditional methods, demonstrating engagement of fronto-temporal sites including the hippocampus in the processing of the auditory sequences. In addition to univariate analyses, novel multivariate decoding analyses applied to both single-unit and LFP responses revealed evidence of distinct sequence item representations: hippocampal high-frequency oscillations and single-unit activity suggested time-compressed replay occurs transiently before and after key sounds in the sequence, while frontal cortex results provided evidence of a stimulus-driven relational code that incorporates ordinally coded position and item identity. Building on these findings, we characterize a variety of single-unit responses that, in concert with our decoding analyses, provide evidence for a distributed relational code underlying prospective and retrospective auditory sequence item representation in the human hippocampus and frontal cortex. Our results reveal critical roles for the mnemonic system in transforming auditory events into mental structures, providing insights into the single-neuron and mass-neural contributions to the maintenance and replay of sensory sequences in the human brain.
Michael Malina and Ross S. Williamson
Topic areas: brain processing of speech and language thalamocortical circuitry and function
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Perception is an active process in which primary sensory cortices are dynamically modulated by behavioral goals, environmental context, and internal state. Top-down corticocortical feedback is a key mechanism underlying this modulation, and prior studies have shown that individual feedback projections can shape sensory processing and influence behavior. Primary auditory cortex (ACtx) receives feedback from multiple higher-order cortical areas, including prefrontal, motor, and association cortices, and its responses vary markedly with behavioral context. To dissect the functional contributions of distinct top-down sources, we developed a clear-skull preparation enabling temporally precise, unilateral optogenetic inactivation of nine cortical areas within a single recording session. In awake mice, we used high-channel-count extracellular electrophysiology to record single-unit activity across the ACtx laminae during passive presentation of amplitude-modulated pure tones. On randomly interleaved trials, we inactivated each cortical target individually to assess the contribution of distinct feedback sources. This within-session design allowed for direct, trial-wise comparisons of how top-down inputs modulate both spontaneous and evoked activity at the single-neuron level. In the absence of sound, inactivation of prefrontal regions consistently reduced spontaneous firing rates in ACtx, whereas inactivation of motor and association areas had minimal effect, suggesting a dominant role for prefrontal feedback in setting baseline excitability. In contrast, during sound presentation, inactivation of posterior prefrontal, motor, and association regions enhanced tone-evoked responses, revealing a context-dependent shift in top-down influence. Standardized major axis regression showed that inactivation of higher-order regions broadly increased response magnitude, and cosine-angle differences revealed substantial changes in tuning shape. Notably, anterior prefrontal inactivation selectively suppressed responses at each neuron’s best frequency, indicating a specialized role in frequency-specific gain control. These findings demonstrate that higher-order cortical inputs modulate ACtx activity in functionally distinct ways, differentially influencing spontaneous and evoked activity in a context-dependent, region-specific manner.
Korey Sudana, Kameron Clayton and Daniel Polley
Topic areas: auditory memory and cognition brain processing of speech and language correlates of auditory behavior/perception neural coding
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Sound features are encoded by the central auditory system through a combination of maps, metronomes, and meters—that is, which neurons are active, when spikes occur, and the overall rate of population activity, respectively. While most sound features are encoded via maps and metronomes, we recently showed that loudness perception is encoded in the auditory cortex using a simplified meter-based code that depends only on the population spiking rate, irrespective of spike timing or neuron identity (Clayton et al., 2024). We found that parvalbumin-expressing inhibitory neurons (PVNs) in the auditory cortex act as a volume knob on this meter, shifting the neural and behavioral boundary between “soft” and “loud” by up to 20 dB. Here, we used 2-photon calcium imaging to record from layer 2/3 excitatory neurons in the primary auditory cortex (A1) of five well-trained mice (N=16 sessions; 4,514 neurons). Imaging was conducted during a two-alternative forced-choice (2AFC) loudness classification task and again during a passive listening session immediately afterward. Passive recordings replicated our earlier electrophysiology findings, revealing heterogeneous sound level tuning across neurons that summed to a linearly increasing population code. Mean population activity from a single trial could classify stimulus sound level within ~3 dB. However, recordings from the same neurons during the behavioral task showed that in ~50% of sessions, this relationship inverted: population activity decreased with increasing sound level. Focusing on stimuli near the perceptual boundary, we found that behavioral choices could also be decoded from the population activity rate, but the direction of this relationship (i.e., more or less activity when mice reported “loud”) was evenly split across sessions. The key variable that explained this inversion was the spatial mapping between loud vs. soft and left vs. right lick spouts, with stimuli mapped to the contralateral spout eliciting the greater population responses. These findings caution against assuming that neural correlates of perception are best—or only—revealed during active behavior. They also underscore the critical importance of spatial counterbalancing in AFC designs. A1 qualifies as a higher-order cortical area in that its activity reflects sound features as well as cognitive and motor variables. Disentangling cognitive and motor influences to isolate the neural encoding of low-level perceptual features is essential but is not trivial, particularly when using slower calcium-based signals to monitor neural activity.
Sagarika Alavilli, Lakshmi Narasimhan Govindarajan and Josh McDermott
Topic areas: correlates of auditory behavior/perception hierarchical sensory organization neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Background: Where was my phone ringing from? Humans have some ability to discern both what sounds are present in a scene and where sounds are located. However, little is known about how identity and location are jointly represented (“bound” together), especially in multi-source scenes. In particular, it remains unclear whether humans make auditory binding errors in which source identities and locations are individually represented accurately but then mis-associated. Methods: We probed for binding errors using an auditory search task. Participants listened to auditory scenes composed of between 1 and 5 natural sounds (e.g., applause, fireworks, laughter). Sounds were played from a speaker array, with sources being presented a minimum of 20 degrees apart. Participants were cued with a target sound either before or after the scene and were instructed to report the location of the target. We measured binding errors as errors in which the reported source location was close to the location of one of the other sources in the scene. To assess whether these errors could be explained without binding-specific failures, we simulated a null model that made errors due to guessing (as if the target source was not heard) or noise about the true source location. Results: On average, localization error increased with the number of sources in the scene, and was higher in the post-cue compared to the pre-cue condition. In the post-cue conditions with 4 or 5 sources, participants made substantially more binding errors than the null model, suggesting that some such errors are caused by a failure to correctly associate sound identity with spatial location. By contrast, binding errors in the pre-cue conditions were no greater than predicted by the null model. Conclusions: Humans maintain a joint representation of sound identity and spatial location for auditory scenes, but this representation becomes susceptible to errors in large scenes. In conditions without directed attention, some such errors appear to be due to binding failures, suggesting a limitation on auditory feature integration.
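A minimal version of the kind of null model described, with guessing and Gaussian localization noise but no binding failures, might look like the following; the guess rate, noise SD, and 10-degree error criterion are placeholder values, not the study's fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_null_reports(target_loc, other_locs, n_trials=10000,
                          p_guess=0.2, loc_sd=10.0, speaker_range=(-90, 90)):
    """Null model without binding failures: on each trial the listener either
    guesses a random location (target not heard) or reports the true target
    location corrupted by Gaussian localization noise. Returns the fraction of
    reports that land near a non-target source ("binding-like" errors)."""
    guesses = rng.random(n_trials) < p_guess
    noisy = target_loc + rng.normal(0.0, loc_sd, n_trials)
    random_loc = rng.uniform(*speaker_range, n_trials)
    reports = np.where(guesses, random_loc, noisy)
    other = np.asarray(other_locs)[None, :]
    near_other = np.any(np.abs(reports[:, None] - other) < 10.0, axis=1)
    return near_other.mean()

# Hypothetical layout: target at 0 deg with competing sources at -40 and +40 deg.
print(simulate_null_reports(0.0, [-40.0, 40.0]))
```

Observed binding-error rates exceeding this baseline are the signature attributed to genuine binding failures.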
Diya Yi, Misako Komatsu and Joji Tsunada
Topic areas: neuroethology and communication
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Vocal communication involves a series of cognitive processes, which can be broadly categorized into three components: perceiving others’ vocalizations; deciding whether and how to respond, encompassing decision-making and vocal planning; and producing vocal motor outputs. These processes must work harmoniously, with integration and bridging between components being crucial for effective communication. Previous research has typically focused on specific brain regions or isolated cognitive functions, often lacking a holistic perspective on macro-scale, whole-cortical dynamics and their role in the complete communication process. Therefore, although the cortical areas associated with each cognitive component have been localized in humans, the macro-scale cortical dynamics underlying the integration of these cognitive processes remain unknown. Building on recent findings linking macro-scale cortical dynamics to behavioral performance, we hypothesized that traveling-wave-like cross-areal interactions play a role in integrating the three communicative components. To test this hypothesis, we recorded whole-cortical activity using epidural electrocorticography (ECoG) while subject marmosets vocally interacted with partners. We found theta-band activation in several cortical areas, including the parietal and auditory cortices, while the subjects listened to a partner’s calls. This activity was further modulated depending on whether the subjects engaged in vocal interactions, potentially representing the integration of sensory processing with decision-making and vocal motor preparation. Given the widespread nature of this modulation, we next characterized whole-brain activity patterns by employing a novel analytical method, Weakly Orthogonal Conjugate Contrast Analysis (WOCCA). This analysis revealed that cortical activity could be decomposed into two distinct traveling-wave-like propagation patterns: a rotational wave corresponding to vocal motor preparation and a translational wave corresponding to decision-making. Notably, the energy of the translational wave while listening to a partner’s calls correlated with the vocal production-induced suppression of high-gamma-band activity, particularly in the prefrontal and auditory cortices. As vocalization-induced suppression is believed to reflect sensory prediction, the translational wave may propagate specific decision-related or acoustic information associated with subsequent vocal production to local cortical areas. These findings suggest that the brain orchestrates the sequential cognitive processes underlying vocal communication through macro-scale traveling waves.
Anthony Kim, Abhinav Uppal and Gert Cauwenberghs
Topic areas: neural coding neuroethology and communication subcortical auditory processing
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Pitch errors offer a controlled window into auditory surprise, revealing neural mechanisms underlying music perception. Although not central to everyday listening—which often involves the buildup and resolution of musical expectations—pitch errors serve as a useful paradigm for studying the brain’s real-time response to auditory irregularities. Prior EEG studies, such as Maidhof et al. (2013) and Ruiz et al. (2011), have demonstrated that beta-band oscillations in musicians reflect the detection and correction of self-generated pitch deviations during piano performance. However, the neural dynamics involved in processing pitch errors embedded in external stimuli during passive listening, particularly with complex musical pitch errors, are less characterized. This study investigated brain responses to pitch deviations in monophonic violin excerpts of nursery rhymes (e.g., "Mary Had a Little Lamb") using a portable dry-electrode EEG system (CGX Quick-32r). A trained musician (N=1) passively listened to seven 10-second excerpts with eyes closed. Pitch errors (0-6 semitones, sharp and flat) were varied across three conditions: no errors (true control), medium errors (2-4 semitones), and high errors (5-6 semitones), each repeated three times per excerpt (21 trials/condition). EEG data from 32 channels were analyzed at T7, T8, and Cz for beta-band (15-30 Hz) activity after preprocessing (1 Hz high-pass filter, 60 Hz noise removal, artifact correction). Power Spectral Density (PSD) analysis showed a 2.4 dB beta power increase at T8 from no-error to high-error conditions. Event-Related Desynchronization/Synchronization (ERDS) revealed desynchronization at T7 (~250 ms post-error), synchronization at T8, and synchronization at Cz (~400 ms post-error), suggesting three neural processes: (i) error detection (left temporal), (ii) severity evaluation (right temporal), and (iii) predictive adjustment (central).
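For reference, the beta-band PSD contrast reported here can be computed along the following lines; the sampling rate and window length are assumptions, not the recording parameters used.

```python
import numpy as np
from scipy.signal import welch

def beta_band_power_db(eeg_channel, fs=500.0, band=(15.0, 30.0)):
    """Beta-band (15-30 Hz) power of one EEG channel in dB, from a Welch PSD."""
    freqs, psd = welch(eeg_channel, fs=fs, nperseg=int(2 * fs))
    mask = (freqs >= band[0]) & (freqs <= band[1])
    band_power = np.trapz(psd[mask], freqs[mask])
    return 10.0 * np.log10(band_power)

# The condition contrast in the abstract compares this quantity, e.g.
# beta_band_power_db(t8_high_error) - beta_band_power_db(t8_no_error)
```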
Nicholas Murphy and Stephen Lomber
Topic areas: auditory disorders subcortical auditory processing
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
The brain exhibits a remarkable ability to reorganize following abnormal sensory experience. It remains unclear how intact visual areas may be modified to compensate for the loss of auditory input following auditory deprivation. Specifically, it is unknown how the lateral geniculate nucleus (LGN) develops in the absence of competing auditory input. In the absence of auditory input, exuberant projections to the LGN which are typically pruned across development are thought to be preserved to allow for enhanced visual processing. Y cells in the LGN relay motion information and are a promising substrate for investigating compensatory plasticity in the thalamus. Immunohistochemical labelling of Y cells was performed using SMI-32, an antibody targeting the heavy chain of non-phosphorylated neurofilament protein. Unbiased systematic random sampling was employed to quantify Y cell morphology in the LGN of hearing (N = 2 hemispheres) and perinatally ototoxically deafened (N = 2 hemispheres) cats (F = 1, M = 1). Branched structure analysis was performed using Neurolucida Explorer (Williston, VT, USA) to assess LGN layer surface area, soma volume, and dendrite number, length, and branch order. Differences between hearing and deaf subjects were assessed using Mann-Whitney U tests. Within conditions, differences across layers were assessed using Kruskal-Wallis tests. No differences in soma volume were observed in any layer between hearing and deaf subjects; however, C-Complex soma volumes were significantly lower than those in layers A and A1 (p < 0.001). The number of dendrites in layers A and the C-Complex was equal between hearing and deaf subjects, but deaf subjects had fewer dendrites in layer A1 compared to hearing subjects (p = 0.0026). In both conditions, neurons in layers A and A1 had significantly more dendrites per neuron than those in the C-Complex (p < 0.0001). Dendrites were significantly longer in deaf subjects in layer A (p = 0.0031) and layer A1 (p = 0.00043), but not in the C-Complex. In all cases, dendrites were longer in layers A and A1 compared to the C-Complex (p < 0.001). Deaf subjects showed an increase in higher-order branches in layers A and A1 (p < 0.0001), but not in the C-Complex. In both conditions, neurons were more branched in layers A and A1 compared to the C-Complex (p < 0.001). Overall, Y cells in layers A and A1 of deaf subjects were significantly longer and more branched compared to those in hearing subjects. No morphological differences were observed in the C-Complex. These preliminary data suggest that small-scale morphological changes occur in the early visual system as a consequence of auditory deprivation. Data collection continues for 8 subjects.
Haotian Ma, Zhengjia Wang, Xiang Zhang, John Magnotti and Michael Beauchamp
Topic areas: neural coding novel neurotechnologies
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
The McGurk effect is a multisensory illusion induced by pairing certain incongruent auditory and visual syllables, such as auditory "ba" with visual "ga" (AbaVga). A deeper understanding of the McGurk effect has been hampered by a paucity of computational models. Recent advances in artificial neural networks (ANNs) have produced ANNs with human-like language abilities, such as AVHuBERT (Audiovisual Hidden-unit Bidirectional Encoder Representations from Transformers), which transcribes speech into text using both heard (auditory) speech and visual information from the face of the talker. AVHuBERT was trained on congruent audiovisual speech, so we set out to examine whether AVHuBERT also experienced the McGurk effect when presented with incongruent audiovisual speech. When humans are presented with AbaVga, there is substantial variability in the reported percepts. The most frequent responses are "da" (the illusory fusion percept) and "ba" (the auditory component of the stimulus). Perceptual reports vary both across individuals presented with the same McGurk stimulus and across different McGurk stimuli presented to the same individual. Averaged across 12 different McGurk stimuli, 165 humans reported an average of 37% McGurk percepts, with a range from 0% to 100%. To model human variability, we created 165 variants of the pre-trained AVHuBERT ANN by adding Gaussian noise to the learned weights in the transformer layers. Then, we presented each variant with the same 12 McGurk stimuli presented to human observers. For each stimulus, a variant provided 20 possible transcriptions, ranked by their likelihood, and the initial syllable of each transcription was used to calculate the percentage of "ba" or "da" responses for that stimulus for that variant. Across variants, the mean McGurk percentage was 39%. This was not significantly different from the human percentage (p = 0.3) using an unpaired t-test. AVHuBERT variants, like human participants, showed high levels of variability in McGurk responses, although the range was compressed (15% to 70%, vs. 0% to 100% for humans). There was variability across the 12 McGurk stimuli for both human participants (17% for the weakest stimulus to 58% for the strongest stimulus, averaged across participants) and AVHuBERT variants (19% for the weakest stimulus to 64% for the strongest stimulus, averaged across variants), but the correlation in stimulus strength between participants and variants was not strong (r = 0.34, p = 0.28).
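The variant-generation step described here amounts to adding Gaussian noise to a pretrained network's parameters. A generic PyTorch sketch is below; the noise scale and the "encoder" name filter are assumptions and do not reflect AVHuBERT's actual module names.

```python
import copy
import torch

def perturb_weights(model, sigma=0.01, seed=0):
    """Return a copy of a pretrained model whose selected parameters are
    perturbed by i.i.d. Gaussian noise, one way to generate model 'variants'."""
    torch.manual_seed(seed)
    variant = copy.deepcopy(model)
    with torch.no_grad():
        for name, param in variant.named_parameters():
            if "encoder" in name:  # hypothetical filter for transformer layers
                param.add_(sigma * torch.randn_like(param))
    return variant

# e.g., a population of perturbed models, one per simulated "participant":
# variants = [perturb_weights(pretrained_model, sigma=0.01, seed=s) for s in range(165)]
```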
Adele Simon, Maria Chait and Jennifer Linden
Topic areas: brain processing of speech and language correlates of auditory behavior/perception
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
The amplitude envelope of speech is crucial for comprehension, and cortical activity in the theta-delta bands has been shown to track this envelope. The hypothesis underlying this tracking is that it reflects the cumulative response to transient events in the speech signal, such as sound onsets (rapid increases in sound amplitude) and offsets (rapid decreases in sound amplitude). While onset tracking has been widely studied, offset tracking remains under-explored. Here we analyzed the roles of onsets and offsets in cortical speech tracking using two datasets from different laboratories, consisting of continuous EEG recordings from British (n=18) [1] and Danish (n=22) [2] participants listening to audiobooks. We used an onset/offset model based on thalamic responses to sound transients in mice [3] to extract separately the onsets and offsets present in continuous speech. To assess the contributions of these transient sound events to the cortical response, we estimated Temporal Response Functions, which are weights for a linear model that maps continuous stimuli to neural activity. We trained models using the representations of onsets, offsets, or a combination of both, and compared the performance of the models by correlating the measured EEG with the EEG predicted by the linear models. All models -- those trained on onsets, offsets, and the combined onset-offset representation -- demonstrated significant performance above chance levels (with p
Jimmy Dion, Ian Stevenson and Monty Escabí
Topic areas: correlates of auditory behavior/perception multisensory processes neuroethology and communication subcortical auditory processing
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Separating target sounds from background noise is challenging for people with normal hearing, and especially challenging for people with hearing impairments as well as artificial speech recognition systems. Why different natural backgrounds mask the perception of speech to varying degrees, and how the brain separates speech from background noise is not yet fully understood. Here, we recorded multi-unit activity from the inferior colliculus of head-fixed unanesthetized rabbits passively listening to speech mixed with eleven natural backgrounds (speech babble, bird babble, water, fire, city noise, etc.). Speech was mixed with each original (OR) background at multiple acoustic signal-to-noise ratios (aSNRs). We also presented speech mixed with spectrum-matched (SM) backgrounds that retain the original spectra but have whitened modulations, or modulation-matched (MM) variants that retain the original modulations but have whitened spectra. Using a cross-spectrum shuffling procedure, we separated the foreground- from the background-driven neural responses. The neural signal-to-noise ratio (nSNR) was then estimated and used as a metric of speech coding fidelity. We find that nSNR depends on background category and statistics. Compared to OR backgrounds, nSNR can increase or decrease for SM and MM backgrounds in a category-dependent manner. This suggests that the original spectra (OR vs. MM) or modulations (OR vs. SM) improve or worsen the neural representation of speech. Interestingly, the impact of background category and statistics was largely independent of aSNR, since increasing aSNR amplified nSNR approximately linearly. We also demonstrate how the background spectrum and modulation statistics affect neural encoding of speech at temporal modulations below 10 Hz, which overlap and potentially mask temporal fluctuations for phonemes, words, and syllables. Spectrum and modulation masking effects were segregated by background category, and a model of the cochlea and auditory midbrain showed similar segregation. Finally, human perceptual tasks using the same backgrounds suggest that nSNR correlates with speech recognition accuracy. Collectively, the results posit that spectrum and modulation statistics critical to speech masking are reflected in the population activity of the inferior colliculus. These neural response statistics reflect the coding fidelity for speech and likely contribute to downstream signaling underlying speech perception in real-world noise. Funding: R01DC020097
Jenna Blain, Ian Stevenson and Monty Escabi
Topic areas: auditory memory and cognition brain processing of speech and language correlates of auditory behavior/perception
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Spectro-temporal receptive fields (STRFs) are used in auditory neuroscience to model the time-frequency sensitivity of auditory neurons. Often, STRFs are derived using unbiased synthetic stimuli, such as dynamic ripples, and can easily be estimated using spike-triggered averaging. However, with natural sounds, decorrelation and regularization techniques are needed to remove embedded stimulus correlations that distort the estimated STRFs. Furthermore, nonlinearities and non-stationarities make it difficult to predict neural responses to natural sounds. Neural recordings were obtained from the inferior colliculus of unanesthetized rabbits in response to natural sounds and dynamic moving ripple (DMR) stimuli. We developed a model-based approach for deriving auditory STRFs and predicting single-trial spike trains to these sounds. The model consists of a first- and second-order, nine-parameter Gabor STRF (gSTRF; Qiu et al. 2003), accounting for the neuron’s linear and non-linear spectro-temporal integration. A four-parameter nonlinear integrate-and-fire (IF) compartment incorporates intrinsic noise, cell membrane integration, and nonlinear thresholding to generate simulated spikes, while a six-parameter adaptive STRF dynamically adjusts the intracellular current to account for slow fluctuations in firing rate. Bayesian optimization was used to fit neural data and derive optimal model parameters. We compared optimal gSTRFs to other approaches such as a generalized linear model. STRFs derived via regression were spectrally smeared, indicating that stimulus correlations were not effectively removed. In comparison, the gSTRF was compact and provided biologically feasible estimates of parameters such as the neuron’s best frequency and temporal modulation. The adaptive STRF improves predictions of firing changes across sounds not captured by the gSTRF. Furthermore, unlike conventional receptive field models, which generate continuous outputs, the model produces and predicts spiking activity with spike timing precision comparable to that of IC neurons (~0.5–5 ms). Simulations were also performed in which the “ground truth” STRF and spiking activity were known a priori. We demonstrate that the gSTRF converges to the original simulation parameters and replicates the spiking activity from the original simulations. This approach allows one to derive auditory STRFs and predict neural spiking activity to natural sounds using functionally interpretable basis functions and spiking model computations. The small number of parameters makes exploration of nonlinear and nonstationary effects due to natural sound statistics more feasible.
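For intuition, a first-order Gabor STRF kernel of the general kind referenced (Qiu et al. 2003) can be written as a Gaussian envelope in time and frequency multiplied by a spectrotemporal carrier. The sketch below uses eight illustrative parameters and is not the authors' nine-parameter implementation.

```python
import numpy as np

def gabor_strf(t, f, t0=0.02, f0=4.0, sigma_t=0.01, sigma_f=0.5,
               w_t=30.0, w_f=0.5, phase=0.0):
    """Gabor-style STRF: Gaussian envelope over time (s) and log-frequency
    (octaves) times a cosine carrier with temporal (Hz) and spectral
    (cycles/octave) modulation rates. Parameter values are illustrative."""
    tt, ff = np.meshgrid(t - t0, f - f0, indexing="ij")
    envelope = np.exp(-0.5 * (tt / sigma_t) ** 2 - 0.5 * (ff / sigma_f) ** 2)
    carrier = np.cos(2 * np.pi * (w_t * tt + w_f * ff) + phase)
    return envelope * carrier

# Evaluate on a 100 ms x 5 octave grid.
t = np.arange(0.0, 0.1, 0.001)
f = np.linspace(1.0, 6.0, 50)
strf = gabor_strf(t, f)
```

Because the kernel is described by a handful of interpretable parameters (best frequency, latency, bandwidth, modulation rates), it can be fit with Bayesian optimization rather than unconstrained regression.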
Alex Clonan, Ian Stevenson and Monty Escabi
Topic areas: auditory memory and cognition correlates of auditory behavior/perception hierarchical sensory organization neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Humans excel at recognizing speech in natural auditory scenes despite large variation in speakers’ voice quality, intonation, and pitch. Previous work has found that different types of background sounds and their spectrotemporal statistics can improve or worsen speech recognition accuracy. However, how foreground speech features, such as fundamental frequency or formants, influence speech recognition accuracy is not well understood. Here we explore how different speech articulatory features interact with natural and perturbed background sounds to influence speech recognition abilities. Participants (n=10) were tasked with recognizing digit sequences (e.g., 0-9-1, 2-5-7) generated by adult and adolescent male and female talkers in various natural background sounds (e.g., water, babble, construction noise; SNR = -9 dB) and perturbed background variants with distorted spectrum or modulation statistics. We found that different spoken digits had different accuracies depending on both the background sound statistics and the speaker’s voice. To explore these speech-driven effects, we compared recognition abilities across the fundamental (F0), first formant (F1), and second formant (F2) frequencies of the foreground. Using a generalized linear mixed-effects model, we found substantial, statistically significant effects of digit identity, F0, F1, and F2 on recognition accuracy, in addition to background-driven effects. To characterize the unique interactions between the foreground and background sound statistical features, we also used a hierarchical model of the auditory periphery and auditory midbrain, paired with generalized perceptual regression (GPR). GPR models the perceptual judgments of single listeners with the neural-acoustic model and allows perceptual outcomes for arbitrary speech-foreground combinations to be predicted using auditory features. In our data, we find that this sound-computable model captures nearly 76% of the perceptual variance across background-by-digit interactions, and nearly 88% across background, F0, F1, and F2 interactions. The results identify foreground features that influence speech recognition in natural environmental noise and identify auditory model computations that account for, and may underlie, perceptual outcomes.
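A simplified version of the kind of mixed-effects analysis described might look like the following statsmodels sketch; the column names and data file are hypothetical, and a linear mixed model with a random intercept per listener stands in here for the generalized (logistic) model used in the study.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical trial table: one row per presented digit with columns
# correct (0/1), digit, F0, F1, F2, background, subject.
df = pd.read_csv("digit_recognition_trials.csv")

# Fixed effects of digit identity, voice features, and background category;
# random intercept for each listener.
model = smf.mixedlm("correct ~ C(digit) + F0 + F1 + F2 + C(background)",
                    data=df, groups=df["subject"])
result = model.fit()
print(result.summary())
```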
Xindong Song, Dan Wang and He Wang
Topic areas: novel neurotechnologies
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
The classic model of the primate auditory cortex (Hackett & Kaas, 2000; Kaas, 2011) proposed a three-tier functional hierarchy along the medial-lateral (or dorsal-ventral) axis, consisting of the core, belt, and parabelt areas, with tonotopic gradients only observed in a few subregions such as A1, R, RT, AL, and ML. Using our recently developed through-skull imaging technique, we performed functional mapping across the entire superior temporal gyrus (STG) in awake marmosets, including the classic auditory cortex and its adjacent regions extending rostrally to the temporal pole. The results reveal the following: (1) Besides the major tonotopic gradient in A1, which spans from the very low to the very high frequency ends of the species’ hearing range, there are two additional gradients that also represent the entire frequency range: one extends from RT (low) to RPB (high), and the other lies rostrally, beyond the classical auditory cortex and near the temporal polar area; (2) Comparative responses across these three gradients resemble the medial-to-lateral three-tier functional hierarchy, supporting a progressive functional hierarchy from caudal to rostral regions; (3) Measurements of pitch-sensitive areas also support a hierarchical structure composed of at least two discrete functional modules along the caudal-to-rostral direction; (4) Measurements of conspecific vocalization-sensitive areas similarly support a vocal processing system consisting of at least three discrete functional modules from caudal to rostral; (5) Additional tests using natural sounds show that areas closer to the temporal pole tend to extract higher-order auditory information. Together, these findings support an updated model of the functional organization of the primate auditory areas along the STG, incorporating a progressive functional hierarchy with at least three levels along the caudal-to-rostral axis.
Akhil Bandreddi, Dana Boebinger, David Skrill, Kirill Nourski, Matthew Howard, Christopher Garcia, Thomas Wychowski, Webster Pilcher and Sam Norman-Haignere
Topic areas: neuroethology and communication
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Complex natural sounds such as speech contain many different sources of information, but recognizing these distinct information sources is computationally challenging because they are highly entangled in the acoustic waveform that reaches the ear. For example, variation in the acoustic attributes of different talkers makes it challenging to recognize the identity of a word, while variation in the acoustics of different words makes it challenging to recognize talker identity. How does the auditory system disentangle word identity from talker identity in auditory cortex? While disentanglement has been extensively studied in the visual cortex, less is known about how it is accomplished in the human auditory cortex. One hypothesis is that specific brain regions specialize for coding either word identity or talker identity. Alternatively, word identity and talker identity may be represented by distinct dimensions of the neural code at the population level rather than by specific regions. Distinguishing between these two hypotheses has been challenging in part due to the coarse resolution of non-invasive neuroimaging methods such as fMRI. To address this question, we measured neural responses to a diverse set of 338 words spoken by 32 different talkers using spatiotemporally precise intracranial recordings from the auditory cortices of neurosurgical patients undergoing chronic invasive monitoring for medically intractable epilepsy. We developed a simple set of model-free experimental metrics for quantifying representational disentangling of word and talker identity, both within individual electrodes and brain regions and across different dimensions of the neural population response. We observed individual electrodes in speech-selective regions of non-primary auditory cortex that showed a representation of words that is partially robust to acoustic variation in talker identity, but no electrodes or brain regions showed a robust representation of talker identity. However, at the population level, we observed distinct dimensions of the neural response that nearly exclusively coded either words or talker identity. These results suggest that while there is partial specialization for talker-robust word identity in localized brain regions, robust disentangling is accomplished at the population level, with distinct representations of words and talker identity mapped to distinct dimensions of the neural code for speech.
Kirill Nourski, Mitchell Steinschneider, Ariane Rhone, Alexander Billig, Emily Dappen, Ingrid Johnsrude and Matthew Howard
Topic areas: brain processing of speech and language
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Cochlear implants (CIs) are the treatment of choice for severe to profound hearing loss. Despite progress in CI technology, there is considerable variability in outcomes. The function and plasticity of central auditory pathways contribute to this variability. While assessing cortical processing in CI users is methodologically difficult, spectrally degraded sounds presented to normal-hearing listeners can approximate the CI’s input to the central auditory system. A previous intracranial electroencephalography (iEEG) study used spectrally degraded (noise-vocoded) speech tokens in a phonetic behavioral task (Nourski et al., 2024, Front Hum Neurosci 17:1334742). Differential responses to clear and vocoded stimuli could present as a “clear-preferred” or a less common “vocoded-preferred” pattern. This study sought to further characterize the vocoded-preferred pattern using clear and vocoded sentences. Participants were adult neurosurgical epilepsy patients undergoing chronic iEEG monitoring. Speech sentences (1.3-3.7 s duration) were degraded using a 3-band noise vocoder (Billig, Herrmann et al., 2019, J Neurosci 39:8679-89). Cortical activity was recorded using depth and subdural electrodes. Electrode coverage included superior temporal plane, superior temporal gyrus, dorsal and ventral auditory-related areas, prefrontal, and sensorimotor cortex. Analysis of iEEG data focused on event-related band power (ERBP) in canonical bands (theta through high gamma). Differences between responses to clear and vocoded sentences were established by cluster-based permutation tests. Differential responses to clear and vocoded sentences typically manifested as ERBP increases in high and low gamma bands and decreases in beta, alpha and theta bands. These responses could persist throughout the sentence duration. The vocoded-preferred pattern was less common than clear-preferred. Vocoded-preferred responses occurred bilaterally in the superior temporal plane, to a lesser extent in posterior regions of the superior and middle temporal gyri and inferior parietal cortex and were virtually absent in prefrontal cortex. The results emphasize the role of auditory cortex and the dorsal stream in processing degraded speech. Clear-preferred responses are likely driven by spectral complexity and intelligibility. Vocoded-preferred responses may reflect processing demands associated with challenging listening conditions. Cortical regions that are differentially activated by clear and vocoded speech may have diagnostic and prognostic utility and present potential targets for neuromodulation-based CI rehabilitation strategies.
Sunreeta Bhattacharya and Ross S. Williamson
Topic areas: cross-species comparisons novel neurotechnologies
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
While model-free associative learning allows animals to rapidly link stimuli with outcomes, flexible goal-directed behavior in dynamic environments requires processing uncertainty and the ability to generalize associations. These capabilities emerge from inferring a model-based causal representation of task states (belief states). Belief states are represented and updated in the orbitofrontal cortex (OFC), which encodes value-tagged cognitive maps. However, how these representations are updated through learning from ambiguous auditory cues remains poorly understood. To investigate this, we developed a novel closed-loop, auditory-guided dynamic foraging task in mice, designed to probe interactions between OFC and auditory cortex (ACtx) during belief updates in inference-based decision making. The task includes two phases: in the inference phase, mice learn solely from reward outcomes; in the auditory phase, they learn a cue-target mapping. In a linear track with speakers and lick spouts at left (L) and right (R) ports, mice initiate trials by maintaining position in a central location to trigger onset of an auditory cue (identical from both speakers), followed by a fixed delay after which reward is available at a target with a high probability. In the inference phase, the auditory cue (8 Hz, sAM noise) is non-informative, and the hidden target (L/R) reverses probabilistically after a correct lick. This probabilistic structure is formalized as a partially observable Markov decision process (POMDP), and prior work shows it engages the OFC during optimal inference. In the auditory-guided phase, the cue identity (2 Hz and 32 Hz, sAM noise) is perfectly informative about the target, rendering the reward state fully observable given the cue. Mice trained on the inference task show choice behavior consistent with belief states computed by an ideal observer, consistent with a model-based strategy that updates using state prediction errors. In the auditory-guided phase, trained mice perform with high accuracy and movement trajectories reveal signatures of evidence accumulation and decision confidence. Notably, 20-30% of trials are “early trials”, where mice violate the post-cue delay and receive no reward. Accuracy on early trials improves with training but plateaus below completed trials, suggesting a dissociation between task acquisition and expression. Using Bayesian Q-learning, we show that uncertainty-guided exploration accounts for these learning dynamics. Ongoing experiments aim to directly compare model-based and model-free learning driven by auditory cues, enabling mechanistic dissection of belief-state updating in OFC-AC circuits.
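For intuition about the belief-state computation referenced above, the sketch below implements a one-trial Bayesian update for a hidden left/right target in a reversal task. The reward probability and a fixed per-trial reversal hazard are illustrative assumptions; in the actual task the target reverses probabilistically after a correct lick, so this is a simplification rather than the study's model.

```python
import numpy as np

def update_belief(b_left, choice, rewarded, p_rew=0.8, hazard=0.3):
    """One-trial Bayesian belief update for a hidden left/right target.

    b_left : prior probability that the left port is currently the target.
    Assumes reward probability p_rew at the target and 1 - p_rew elsewhere,
    plus a fixed per-trial reversal hazard (a simplification of the task rule).
    """
    if choice == "L":
        like_left, like_right = (p_rew, 1 - p_rew) if rewarded else (1 - p_rew, p_rew)
    else:
        like_left, like_right = (1 - p_rew, p_rew) if rewarded else (p_rew, 1 - p_rew)

    # Posterior over the current hidden state given the outcome
    post_left = like_left * b_left / (like_left * b_left + like_right * (1 - b_left))

    # Propagate through the possible reversal before the next trial
    return (1 - hazard) * post_left + hazard * (1 - post_left)

b = 0.5
for choice, rewarded in [("L", True), ("L", True), ("L", False)]:
    b = update_belief(b, choice, rewarded)
    print(f"belief(left is target) = {b:.2f}")
```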
Baishen Liang, Aaron Earle-Richardson, Gerald Grant, Muhammad Zafar, Birgit Frauscher, Derek Southwell, Gregory Hickok and Gregory Cogan
Topic areas: auditory memory and cognition correlates of auditory behavior/perception hierarchical sensory organization
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Conversational language relies on the transient maintenance of verbal working memory (vWM), yet the precise neural mechanisms supporting vWM remain elusive. To address this gap, in this study, we recorded and analyzed neural signals from patients implanted with stereoelectroencephalography (sEEG) electrodes performing a lexical repetition task with a vWM delay. 37 patients (mean age = 31, 21 female) with electrodes implanted as part of their surgical epilepsy evaluation were recruited in this study. In each trial, patients heard a word or nonword (e.g., “bacon”, “valuk”), and repeated it aloud after a visual Go cue following a randomized delay. As a control analysis, 18 of the recruited patients completed another lexical repetition task with the same stimuli but without delay. Local neural engagement was identified as significant increases in high-gamma power (HG, 70–150 Hz) relative to a pre-trial baseline (permutation t-test, p < .05; time cluster-based correction, p < .05). We found Delay electrodes (N = 1009) with significant HG responses during vWM delay. Within the Delay electrodes, 33.11% (N = 402) also had auditory responses (Auditory delay electrodes), 20.51% (N = 249) had motor responses (Motor delay electrodes), and 38.06% (N = 462) had both responses (Sensory-motor delay electrodes). We also found electrodes with only delay activity (Delay-only electrodes, 7.33%, N = 89). Moreover, we analyzed the subset of patients with the control task and found that both Delay electrodes with auditory (79.22%) and motor (67.67%) responses remained active even without delay, but the Delay-only electrodes mostly became silent (78.57%). These results indicate that vWM activity largely stems from general sensory-motor speech processing, even without an explicit vWM task, but vWM-specific activity also exists. Motor and Auditory delay electrodes were distributed in the dorsal (frontoparietal) and ventral (temporal) streams of speech processing, respectively, while the Sensory-motor delay electrodes were spatially adjacent to both electrode types, spanning both processing streams. Delay-only electrodes were distributed in the frontoparietal network subserving executive functions. These results suggest that the general sensory-motor neural activity supports vWM maintenance by linking receptive and generative speech processing, along with a vWM-specific executive module. Taken together, we found that vWM is supported by general sensory-motor speech processing neural mechanisms with potential vWM-specific control processes.
Molly Shallow, Aldis Weible and Michael Wehr
Topic areas:
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Producing adaptive natural behavior requires the integration of information from multiple sensory systems in the brain along with information about the organism’s current motivational state; yet it remains unclear how this integration of information occurs between regions. Traditionally, studies of sensory encoding have focused on a single brain region or sensory system in isolation, using artificial and reduced behavioral tasks. Understanding how the brain accomplishes this during real-world operation requires natural behavioral paradigms and investigation of key brain regions that might be involved. Prey capture in mice is one such complex natural behavior well-suited to reveal the dynamic interactions of motivation and sensory information. Here we studied the interactions between the motivational region zona incerta (ZI) and the sensory region auditory cortex (AC). Previous studies have suggested that AC is critical for auditory scene analysis, which is likely necessary for auditory prey capture. The subthalamic structure ZI is not directly involved in auditory processing, but is involved in motivation and reinforcement, and has been shown to promote prey capture. In addition, ZI sends direct inhibitory projections to much of neocortex, in particular to several areas of AC. We therefore wondered if motivational signals provided by the ZI-AC projections might enhance the processing of salient sounds, such as the prey’s movement, during nocturnal hunting. To test the impact of these ZI-AC projections, we first used a head-fixed prep with high-density silicon probes to gain insight into the effects of activation of the projections across cortical layers. For these experiments, mice expressing ChR2 in GAD2+ projections were presented with pure tones and white noise while superficial optic fibers were used for photoactivation. These recordings showed that activation of ZI projections modestly enhanced sound responses of a subset of cells in a frequency-specific manner. This enhancement suggests that integration of motivation and sensory information may involve the operation of a disinhibitory circuit in auditory cortex. To understand the role of these projections during prey capture, we used tetrode arrays to record from AC while mice performed auditory prey capture and, on some trials, optogenetically activated the terminals. Neuronal responses to pure tones and white noise showed modest enhancement of sound responses, similar to that seen with silicon probes in head-fixed mice. We are currently investigating the effect of optogenetic stimulation on the behavior during prey capture and on spiking of cells in auditory cortex.
Yaneri Ayala, Hugo Caffaratti, Ryan Calmus, Joel Berger, David Christianson, Ariane Rhone, Bob McMurray, Federico De Martino, Essa Yacoub, Vincent Magnotta, Lucia Melloni, Taufik Valiante, Kamil Uludag, Timothy Griffiths, Mario Zanaty, Hiroto Kawasaki, Snehajyoti Chatterjee, Ted Abel, Matthew Howard III, Jeremy Greenlee and Christopher Petkov
Topic areas: auditory memory and cognition brain processing of speech and language hierarchical sensory organization neural coding
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Cells distributed across the six laminae in the mammalian neocortex underpin its information flow and circuit functions. However, it is unknown how single neuron function and laminar information flow in the human brain varies across cortical layers during complex auditory behavior. We tested the hypothesis of a neuronal entanglement for auditory working memory (WM) and language across cortical layers in human prefrontal and temporal cortex. We obtained laminar array recordings from dorsolateral prefrontal cortex (DLPFC) or superior temporal gyrus during 3 neurosurgery patient treatment procedures. During the awake intracranial procedures in the operating room, patients conducted a combined language and WM task requiring them to alphabetize 2-3 heard words during a WM delay. Trial-by-trial, the patients alphabetized the words into or out of a grammatical construction. Cortical depth-dependent laminar fMRI (0.8 mm isotropic voxel size), capable of resolving sets of layers and broader system interactions, was also obtained with the patients preoperatively as they conducted the same task. In characterizing single neuron responses across the cortical layers, we observed that most DLPFC neuronal firing rates decreased as the words were presented to the patient, increased during the WM delay period, and increased yet again during the patient’s vocal production response. Neurons with WM-specific effects were distributed in superficial and deep layers, and we studied the state-space trajectories of the neuronal populations. Language grammatical effects engaged, albeit weakly, both superficial and deeper layers of DLPFC. Information flow via local field potentials identified laminar circuit motifs engaged during the listening, WM and verbal response phases of the task. The laminar fMRI results recapitulated several of the laminar array recording findings, and showed overlapping language and WM clusters in a network of frontotemporal brain regions including DLPFC. Finally, in a separate cohort of epilepsy and tumor patients, samples from tissue that required clinical resection were obtained prior to and after task performance by the patients during their operating room procedure. The samples were analyzed for single-nuclear multi-omics (RNA+ATAC sequencing), and we studied differences in early gene expression across the pre- and post-task samples to identify the cellular elements influenced by the behavioral experience. The results provide multi-level insights from cells to systems into working memory and language interactions across cortical layers in regions, including DLPFC, implicated in general cognition but rarely in audition or language functions. * YA, HC & RC joint first; JG & CP joint senior authors.
Nanlin Shi, Gregory Cogan, Suseendrakumar Duraivel, Aaron Earle-Richardson, Derek Southwell, Matthew Vestal, Gerald Arthur Grant, Muhammad Shahzad Zafar and Birgit Frauscher
Topic areas: auditory memory and cognition correlates of auditory behavior/perception neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Beyond primary auditory regions, non-primary brain areas like the hippocampus and insula are increasingly recognized as contributors to speech perception. However, their specific roles in the hierarchical processing of speech remain poorly characterized. To address this gap, we recorded intracranial neural signals from 24 epilepsy patients implanted with stereoelectroencephalography (sEEG) electrodes (n = 4594). During the experiment, patients passively listened to naturalistic sentences from the TIMIT dataset. We extracted high-gamma (HG) power (70–150 Hz) from the recordings and applied linear models to examine how these signals encode acoustic features. Using spectrotemporal receptive fields (STRFs), we identified 1,252 electrodes with significant responses to the acoustic spectrogram (p < 0.001, compared to permuted conditions). While most were located in classical auditory areas such as the superior temporal gyrus (STG; N = 232, 18.5%) and middle temporal gyrus (MTG; N = 193, 15.4%), a substantial number were also found in non-primary regions, including the fusiform gyrus (FuG; N = 85, 6.8%), insula (INS; N = 79, 6.3%), inferior parietal lobule (IPL; N = 66, 5.3%), precentral gyrus (PrG; N = 50, 4.0%), hippocampus (Hipp; N = 48, 3.8%), and inferior temporal gyrus (ITG; N = 40, 3.2%). Notably, electrodes in IPL and Hipp exhibited significant temporal delays in encoding compared to STG, suggesting a distinct role in the auditory processing hierarchy. To explore the encoding of higher-order linguistic features, we further mapped HG features to embeddings from all 12 layers of a large language model (GPT-2 Small). The final layer (layer 11) showed the strongest encoding performance, with 238 electrodes reaching significance (p < 0.001). In addition to STG (N = 82, 34.5%) and MTG (N = 35, 14.7%), these representations extended to IPL (N = 27, 11.3%), FuG (N = 18, 7.6%), and INS (N = 15, 6.3%). Together, these findings demonstrate that non-primary regions systematically encode both low- and high-level speech features, with distinct spectrotemporal dynamics from those in classical auditory cortex. This motivates further investigation into their roles in processing syntactic and lexical information, expanding our understanding of the distributed neural architecture supporting natural speech perception.
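The sketch below shows the general shape of such a lagged linear (STRF-style) encoding model, predicting one electrode's high-gamma power from a time-lagged spectrogram with ridge regression. File names, lag range, sampling rate, and the regularization strength are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lagged_design(X, n_lags):
    """Stack time-lagged copies of stimulus features X (time x features)."""
    T, F = X.shape
    D = np.zeros((T, F * n_lags))
    for k in range(n_lags):
        D[k:, k * F:(k + 1) * F] = X[:T - k]
    return D

# Assumed inputs, resampled to a common rate (e.g., 100 Hz): a mel spectrogram
# (time x bands) and high-gamma power for one electrode (time,).
spec = np.load("mel_spectrogram.npy")
hg = np.load("electrode_highgamma.npy")

n_lags, n_test = 40, 2000               # 0-400 ms of stimulus history; held-out tail
D = lagged_design(spec, n_lags)

strf = Ridge(alpha=1e3).fit(D[:-n_test], hg[:-n_test])
r = np.corrcoef(strf.predict(D[-n_test:]), hg[-n_test:])[0, 1]
print(f"held-out prediction correlation: r = {r:.2f}")
# strf.coef_.reshape(n_lags, spec.shape[1]) is the estimated STRF (lags x bands)
```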
Gasser Elbanna and Josh McDermott
Topic areas: auditory memory and cognition brain processing of speech and language neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Humans excel at transforming acoustic waveforms into meaningful linguistic representations, despite the inherent variability in speech signals. However, the mechanisms that enable such robust perception remain unclear. One bottleneck is the absence of models that replicate human performance and that could be used to probe for mechanistic hypotheses. To address this bottleneck, we developed PARROT, an artificial neural network model of continuous speech perception. PARROT combines a simulation of the human ear with convolutional and recurrent neural network modules. Together, these model stages map variable acoustic signals onto linguistic representations (e.g., phonemes and characters). The model was trained on 7.5 million utterances of varying lengths, totaling 15,000 hours of speech. To evaluate human-model alignment, we designed a novel behavioral experiment in which participants transcribed spoken nonwords. This experiment allowed us to compute the first full phoneme confusion matrix in humans, enabling a systematic comparison of human–model phoneme confusions. PARROT exhibited similar patterns of phoneme confusions as humans (r=0.95) as well as similar patterns of phoneme accuracy (r=0.98). To assess the importance of recurrence in human speech perception, we trained a parameter-matched variant in which the recurrent layers were replaced with depth-matched feed-forward layers. Eliminating recurrence noticeably reduced the model’s correspondence to human phoneme confusion patterns (r=0.86), highlighting the critical role of temporal integration in human-like speech perception. To further understand the role of contextual cues in human speech perception, we manipulated the model’s access to surrounding context. Models with access to both future and past context aligned more with human phonemic judgments than those using past or future alone. This result suggests that humans integrate across a local time window extending into the future to disambiguate speech sounds. Overall, the results suggest that aspects of human-like speech perception emerge by optimizing for sub-lexical recognition from cochlear representations, and show how the resulting models can clarify the mechanisms contributing to robust perception.
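To show how the reported human-model comparison can be computed, the sketch below tallies row-normalized phoneme confusion matrices for humans and a model and correlates their off-diagonal (confusion) and diagonal (accuracy) entries. The file names, response format, and the 39-phoneme inventory size are illustrative assumptions.

```python
import numpy as np
from scipy.stats import pearsonr

def confusion_matrix(presented, reported, n_phonemes):
    """Row-normalized confusion matrix: P(reported j | presented i)."""
    C = np.zeros((n_phonemes, n_phonemes))
    for i, j in zip(presented, reported):
        C[i, j] += 1
    return C / np.maximum(C.sum(axis=1, keepdims=True), 1)

# Assumed inputs: (n_responses, 2) integer arrays of presented and reported
# phoneme indices, pooled across nonword transcriptions, for humans and model.
human = np.load("human_phoneme_responses.npy")
model = np.load("model_phoneme_responses.npy")
n_ph = 39                                   # illustrative phoneme inventory size

C_h = confusion_matrix(human[:, 0], human[:, 1], n_ph)
C_m = confusion_matrix(model[:, 0], model[:, 1], n_ph)

off = ~np.eye(n_ph, dtype=bool)             # compare confusion patterns off-diagonal
r_conf, _ = pearsonr(C_h[off], C_m[off])
r_acc, _ = pearsonr(np.diag(C_h), np.diag(C_m))
print(f"confusion r = {r_conf:.2f}, per-phoneme accuracy r = {r_acc:.2f}")
```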
Lauren Ralston, Elise Rickert, Anne Anderson, Dave Clarke, Nancy Nussbaum, Elizabeth Tyler-Kabara, Howard Weiner and Liberty Hamilton
Topic areas: correlates of auditory behavior/perception neural coding
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
To effectively communicate, humans must distinguish speech in the presence of competing noise. This process is not fully understood but involves both auditory cortex and frontal regions to filter incoming information. Understanding neural processing of speech in noise in children has clinical implications for development in typical hearing populations as well as those with processing disorders. Thus, the overarching goal of this study is to investigate speech in noise processing in pediatric patients. Historically, studies examining speech in noise processing use either behavioral, neuroimaging, or scalp electrophysiology (EEG) techniques. Here, we used intracranial recordings in nine participants with drug resistant epilepsy, which allows for higher spatiotemporal resolution than other techniques. All procedures were approved by the UT Austin Institutional Review Board. We recorded neural and behavioral responses while the Bamford-Kowal-Bench Speech in Noise (BKB-SIN) test was administered. The BKB-SIN test presents lists of sentences in noise and the subject is asked to repeat the sentences, resulting in a score reflecting how much louder the speech needed to be than the background noise for the subject to correctly identify it. Behavioral responses were grouped into high, mid, and low signal to noise ratio (SNR) categories. The neural data were preprocessed to remove epileptiform activity and artifacts, and event related potentials were analyzed across frequency bands from 4 to 200 Hz. These frequency bands were extracted using a Hilbert transform in theta (4-8 Hz), alpha (8-13 Hz), beta (13-30 Hz), gamma (30-70 Hz), and high gamma (70-150 Hz). Though high gamma power has been shown to track speech features during perception, the ability to process speech in noise also relies on inhibitory networks, which are associated with other frequency bands. Specifically, alpha power (8-13 Hz) in the frontal cortex may reflect inhibitory processes. Therefore, we hypothesized that there would be increased activity in the frontal region of the cortex at the alpha frequency, reflecting inhibition for background speech that must be ignored. Using a linear mixed effects model, we show a significant interaction between SNR level and correct versus incorrect behavior on alpha power. Specifically, increased alpha in the frontal regions of the cortex is associated with better behavioral performance at mid SNR levels (p=0.004), which could reflect listening effort. This result suggests that increased alpha power in the frontal cortex is associated with the ability to correctly identify speech in background noise when the task is of moderate difficulty.
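For readers unfamiliar with the band-power extraction step, the sketch below computes band-limited analytic amplitudes with a Butterworth filter and the Hilbert transform for the bands named above. The filter order, sampling rate, and file name are assumptions; only the band edges come from the abstract.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30),
         "gamma": (30, 70), "high_gamma": (70, 150)}

def band_envelope(x, fs, lo, hi, order=4):
    """Analytic amplitude of x in the [lo, hi] Hz band (zero-phase filtering)."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="bandpass")
    return np.abs(hilbert(filtfilt(b, a, x)))

fs = 1000                                   # assumed sampling rate after preprocessing
x = np.load("ieeg_channel.npy")             # one artifact-cleaned channel
envelopes = {name: band_envelope(x, fs, lo, hi) for name, (lo, hi) in BANDS.items()}
# Per-trial alpha power in frontal electrodes would then be entered into the
# mixed-effects model alongside SNR level and behavioral accuracy.
```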
Ashley Qin, Donghyeok Lee and Matthew Leonard
Topic areas: auditory disorders brain processing of speech and language correlates of auditory behavior/perception neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
While the brain is highly adept at learning new skills implicitly, challenging tasks like adult language acquisition benefit from feedback that provides explicit guidance on performance (McCandliss et al., 2002). The ways in which feedback from previous trials is related to changes in neural encoding during learning are unclear. Here, we asked how specific trial-by-trial corrective and reinforcing feedback predicts subsequent behavior and neural encoding using high-density direct brain recordings in English-speaking neurosurgical participants while they learned to identify Mandarin Chinese tones. Participants listened to Mandarin syllables that varied in lexical tone (T1: high-flat; T2: high-falling; T3: low-dipping; and T4: low-rising), pressed a button to report which tone they heard, and received visual feedback to indicate accuracy. No additional rewards or penalties were used to signal performance. We found three types of neural populations: (1) exclusively feedback-responsive, (2) exclusively stimulus-responsive, and (3) feedback- and stimulus- responsive. All populations were widely distributed across the cortex, with stimulus-responsive and stimulus-and-feedback-responsive populations located primarily in superior temporal gyrus, and feedback-responsive populations concentrated in orbitofrontal and cingulate cortices. The co-localization of stimulus and feedback encoding suggests an important role for integrated coding of both sensory input and trial-by-trial behavioral performance during learning. To understand how feedback influences encoding in these populations, we compared the effects of receiving corrective versus reinforcing feedback on the previous trial. Preliminary results suggest that corrective feedback changes the encoding of multiple tones, potentially reorganizing the representation of the entire stimulus space, whereas reinforcing feedback primarily affects individual tones, likely strengthening the association between each tone and its correct label. Together, these results show that by linking past performance to present sensory encoding, feedback-driven modulation offers a possible mechanism through which feedback promotes plasticity in the adult brain during non-native speech sound learning.
Emina Alickovic, Payam Shahsavari Baboukani, Kasper Eskelund, Cajsa Thulin, Josefine Muller and Bo Bernhardsson
Topic areas: correlates of auditory behavior/perception multisensory processes
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
This study focused on how changes in signal-to-noise ratio (SNR) affect the brain’s neural tracking of attended and ignored speech in people with hearing impairment. Neural responses were modeled starting from the speech envelope and extended by adding phonetic information and features from each layer of OpenAI’s Whisper model to better represent speech processing under challenging listening conditions similar to the Cocktail Party Problem. Using a Temporal Response Function (TRF) approach, adding these speech features increased the correlation between predicted and measured EEG signals, improving from the envelope alone to models that included phonemes and Whisper’s layer-wise outputs. However, incorporating these richer features did not lead to a significant improvement in distinguishing attended from ignored speech compared to models using only acoustic features. These findings contribute to understanding how the brain processes speech in noisy environments for hearing-impaired listeners. The results may guide the development of objective measures to assess hearing aid function and user-specific hearing difficulties. Moreover, this approach could support the design of signal processing strategies in hearing devices by enhancing neural tracking of speech and auditory attention, ultimately improving speech comprehension in real-world listening situations.
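The nested-model comparison described above can be sketched as follows: fit a lagged ridge (TRF-like) model for each feature set and compare held-out prediction correlations. The file names, lag count, sampling rate, Whisper layer, and regularization are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

def trf_score(features, eeg, n_lags=32, n_test=5000, alpha=1e2):
    """Held-out correlation between ridge-predicted and measured EEG (one channel).

    features : (time x dims) stimulus representation; eeg : (time,) channel.
    """
    T, F = features.shape
    D = np.zeros((T, F * n_lags))
    for k in range(n_lags):                  # ~0-500 ms of lags at 64 Hz
        D[k:, k * F:(k + 1) * F] = features[:T - k]
    model = Ridge(alpha=alpha).fit(D[:-n_test], eeg[:-n_test])
    return np.corrcoef(model.predict(D[-n_test:]), eeg[-n_test:])[0, 1]

# Assumed inputs, time-aligned and resampled to a common rate:
envelope = np.load("speech_envelope.npy")[:, None]   # (time, 1)
phonemes = np.load("phoneme_features.npy")           # (time, n_phoneme_features)
whisper = np.load("whisper_layer_acts.npy")          # (time, d) one Whisper layer
eeg_chan = np.load("eeg_channel.npy")                # (time,)

r_env = trf_score(envelope, eeg_chan)
r_phn = trf_score(np.hstack([envelope, phonemes]), eeg_chan)
r_all = trf_score(np.hstack([envelope, phonemes, whisper]), eeg_chan)
print(r_env, r_phn, r_all)   # richer feature sets should raise the prediction correlation
```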
Yukai Xu and Joji Tsunada
Topic areas: brain processing of speech and language
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Human speech exhibits remarkable flexibility, allowing control over not only content but also fine acoustic structures. Two brain pathways have been proposed to govern fine vocal control in humans and non-human primates: 1) one descending from the orofacial motor cortex to motoneuron pools, and 2) another descending from the anterior cingulate and prefrontal cortices via the periaqueductal gray (PAG) to motoneuron pools. While the role of the former pathway in fine vocal motor control has been studied—neural activity in the speech motor cortex can be explained by linear combinations of speech apparatus engagements—the latter pathway’s role and the relative contributions of the two pathways remain unknown. To address this question, we investigated the neural mechanisms of vocal acoustic control in marmoset monkeys, a New World primate species capable of sophisticated regulation of vocal timing, sound intensity, and acoustic frequencies. Specifically, by using pharmacological inactivation and electrical stimulation, we perturbed activity in the orofacial motor cortex, prefrontal cortex, and PAG, and analyzed the effects on vocal acoustics. Consistent with previous lesion studies, pharmacological inactivation of the orofacial motor cortex and prefrontal cortex using the GABAA receptor agonist muscimol did not alter acoustic structure or call rates. However, orofacial motor cortex inactivation biased call type usage. Additionally, electrical stimulation of the orofacial motor cortex modulated the fundamental frequency (F0) of vocalizations in a current- and timing-dependent manner, although stimulation alone did not evoke vocalizations. In contrast, PAG stimulation reliably evoked vocalizations with short latencies (700 ± 160 ms), consistent with its established role in vocal triggering. We also confirmed that different PAG subregions were preferentially associated with specific call types. Importantly, evoked calls exhibited significantly elevated F0 compared to spontaneous calls, and F0 and sound intensity positively correlated with stimulation power, revealing a novel mechanism for parametrically controlled vocal acoustics. These findings suggest 1) a differential role of the orofacial motor cortex in fine acoustic control between humans and monkeys, and 2) the existence of a midbrain circuit in the PAG that preprograms fine acoustic parameters prior to vocal production.
Po-Ting Bertram Liu
Topic areas: auditory memory and cognition correlates of auditory behavior/perception
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Several studies have developed deep neural network (DNN) models to mimic individual functions of human auditory perception; however, the ultimate goal is a unified model that simulates human auditory perception under various hearing conditions, including high-level functions such as naturalistic speech perception. To this end, I propose a speech perception simulator: an encoding-decoding framework that combines auditory modeling with DNN-based decoding models. The decoding models should be able to reconstruct perceived sound from the activities of simulated auditory nerve fibers. Sound reconstruction from auditory spike trains remained unsolved until my recent decoding work; the problem dates back to a scientific abstract from the 2012 Bernstein conference. In this sound reconstruction task, the decoding model should use only the spiking activities of simulated auditory nerves, without additional auxiliary input such as signals from the efferent pathways. In 2015 and 2024, research teams in the U.S. proposed decoding models based on signal-processing algorithms, but their results were not realistic enough to support further experiments or simulations, or to draw significant conclusions. My reconstruction models address this long-standing problem in auditory computational neuroscience. The sound reconstruction models were trained on a dataset with auditory neurograms as input, without any auxiliary input, and sound waveforms as output. Auditory neurograms were computed using Raymond Meddis’ MATLAB Periphery model on the LJSpeech dataset. My decoding models are neural vocoder-based. To evaluate the decoding models, the first 20 sound files were excluded from training and used to compute objective metrics such as PESQ for speech quality. My models achieve an average PESQ score above 4.096, whereas baseline models on the same reconstruction task achieved average PESQ scores of around 1.5 or, at best, below 3.398. Thus, my decoding models can reconstruct high-fidelity sound, making this simulation framework practical to use. This speech perception simulator will enable researchers to answer hypothetical research questions, especially those that are difficult to manipulate in animals or humans. Using this simulation framework, researchers can manipulate auditory nerve properties or adjust the parameters of auditory models to investigate their effects on auditory processing. For example, my previous study examined the impact of eliminating auditory nerve phase-locking on naturalistic speech perception. More experiments using this framework are underway and may be presented at the conference.
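The PESQ evaluation step could be reproduced roughly as sketched below, using the PyPI pesq package (assumed interface) together with soundfile and scipy for resampling 22.05 kHz LJSpeech audio to the 16 kHz expected by wide-band PESQ. File names are hypothetical placeholders for the 20 held-out items.

```python
import numpy as np
import soundfile as sf
from scipy.signal import resample_poly
from pesq import pesq   # PyPI "pesq" package, ITU-T P.862

def pesq_wb(ref_path, deg_path, target_fs=16000):
    """Wide-band PESQ between a reference waveform and a reconstruction."""
    ref, fs_ref = sf.read(ref_path)
    deg, fs_deg = sf.read(deg_path)
    # Wide-band PESQ expects 16 kHz mono; LJSpeech is 22.05 kHz, so resample.
    ref = resample_poly(ref, target_fs, fs_ref)
    deg = resample_poly(deg, target_fs, fs_deg)
    n = min(len(ref), len(deg))
    return pesq(target_fs, ref[:n], deg[:n], "wb")

# Hypothetical held-out files: the first 20 LJSpeech items and their reconstructions.
scores = [pesq_wb(f"ref_{i:02d}.wav", f"recon_{i:02d}.wav") for i in range(20)]
print("mean PESQ:", np.mean(scores))
```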
Alessandro La Chioma and David Schneider
Topic areas: auditory memory and cognition brain processing of speech and language correlates of auditory behavior/perception neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Auditory perception relies on predicting the acoustic consequences of our actions. The auditory cortex (AC) responds differently to expected versus unexpected self-generated sounds. In the real world, the same motor action can produce different sounds depending on the environment in which the behavior is produced – e.g. footsteps sound different when walking on concrete compared to fallen leaves. Yet it remains untested whether AC dynamically updates predictions about self-generated sounds in a context-dependent manner. We developed a naturalistic audio-visual-motor virtual reality (VR) for head-fixed mice. Real-time locomotion tracking was performed to provide artificial footstep sounds that were yoked to a precise phase of the step cycle, creating an ethological and experimentally manipulable form of auditory reafference. While running on the treadmill, mice traversed two different contextual environments, each consisting of a distinct visual corridor accompanied by distinct footstep sounds. Using this system, we asked whether AC neural activity reflects predictions about the sound that footsteps are expected to produce in a given context, and to what extent contextual and motor signals integrate with auditory information. Following behavioral acclimation, we made high-density neuronal recordings from primary AC as mice traversed the two VR contexts and experienced expected or deviant footsteps. We observed overall suppression of neural responses to self-generated sounds compared to the same sounds heard passively. Subsets of neurons responded differently to the same sound heard in the expected versus the unexpected context. Population-level analysis indicates that context information is embedded in AC population activity. In addition to sound and context, neuronal responses were also largely affected by locomotion speed, with subsets of neurons tuned to a specific phase of stride cycle. We used generalized linear models to help isolate the contribution of auditory, context, and motor signals to the activity of individual neurons. Our results suggest that AC combines auditory and motor signals with visual cues for context-dependent processing of self-generated sounds.
Kameron Clayton, Korey Sudana, Joanne Li, Ethan Lawler, Jennifer Zhu, Elizabeth Norris, Paul Gratias, Evan Hale, Myunghoon Yoo, Artur Indzhykulian and Daniel Polley
Topic areas: correlates of auditory behavior/perception neural coding neuroethology and communication
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
The development of sensory systems proceeds sequentially from activity-independent genetic programs which wire up afferent circuits to a series of activity-dependent critical periods where neural tuning is shaped by sensory experience. While altering sensory input during these critical periods can dramatically reshape cortical maps and behavior, it remains unclear whether initiating sensory transduction in adulthood—after lifelong deprivation—can result in normal cortical coding and perception. One possibility is that lifelong deprivation leads to irreversible deficits due to cross-modal plasticity or degeneration of afferent pathways. Alternatively, the basic cortical architecture established prenatally may provide a scaffold for functional sensory processing, even without developmental experience. In the auditory system, mutations in the Otof gene prevent synaptic transmission from inner hair cells to auditory nerve fibers and thus completely block afferent sound transmission, while crucially, all other components of the sensory transduction apparatus remain intact. Here, we asked whether initiating afferent sensory transduction in a deaf adult Otof-deficient mouse using a well-established cochlear gene therapy (Chung et al. 2023 ASGCT) would be sufficient for the emergence of normal cortical sound encoding and perception despite a total lack of auditory experience, as measured through videographic and operant behavioral assays, in tandem with high-density single unit recordings and chronic calcium imaging in auditory cortex (ACtx). Three weeks following gene therapy, treated mice could perform basic sound detection, complex sound discrimination, and even sound detection in noise (N = 6 treated mice, N = 6 normal hearing controls). Sound-evoked facial movements were present, albeit with lower amplitudes (N = 6 treated mice, N = 8 WT). Cortical single unit recordings (N = 4 mice, n = 549 units) revealed strikingly normal receptive fields and tonotopic organization. Further, single units showed normal sound level encoding and even faithfully represented temporally modulated stimuli like click trains. Using a triple transgenic mouse model (VGlut1-Cre:jGCaMP8s:Otof-/-), we have tracked the emergence of sound responses in populations of L2/3 pyramidal neurons after gene therapy (n = 7,786 neurons). Well-organized tone-evoked receptive fields emerge within five days of gene therapy, suggesting that pre-existing circuit architectures, not experience, drive the rapid emergence of sensory coding and perception following adult-onset sensory transduction.
Magdalena Solyga and Georg Keller
Topic areas: correlates of auditory behavior/perception multisensory processes neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
A major challenge in systems neuroscience is translating insights from animal models to humans, especially in the study of predictive processing. While animal studies have uncovered detailed circuit mechanisms, on the basis of which predictive processing has become a well-established framework for understanding cortical function, direct translation to human research has been limited - partly due to differences in experimental paradigms. In prior work, we used virtual reality and two-photon imaging to show that audiomotor mismatch responses in mouse auditory cortex are enhanced by concurrent visuomotor mismatches, suggesting that multimodal, non-hierarchical interactions shape prediction error signals in cortical layer 2/3. We now extend this approach to humans. We developed a virtual reality paradigm for freely moving participants, combining EEG recordings with a closed-loop virtual reality system in which participants can explore a virtual environment by walking around. For audiomotor coupling, we used continuous white noise whose volume scaled with walking speed. First, we tested how movement modulates auditory responses - a phenomenon well documented in animals but less explored in humans. Our preliminary data showed slight modulation of auditory responses during movement compared to the stationary condition. To study prediction error responses, we also briefly interrupted the sensorimotor coupling in this environment at random times. We found robust visuomotor and weaker audiomotor mismatch responses, suggesting that the computational mechanisms of prediction error signaling may be shared between mice and humans.
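The audiomotor coupling can be pictured with the toy sketch below, which maps the latest tracked walking speed to a white-noise gain applied to each audio buffer. The abstract states only that noise volume scaled with walking speed; the linear dB mapping, speed range, and buffer settings are assumptions purely for illustration.

```python
import numpy as np

def footstep_noise_gain_db(speed, speed_max=2.0, db_range=(-40.0, 0.0)):
    """Map walking speed (m/s) to a white-noise gain in dB, clipped to db_range.

    The linear speed-to-dB mapping and these parameter values are illustrative
    assumptions, not the coupling function used in the study.
    """
    lo, hi = db_range
    frac = np.clip(speed / speed_max, 0.0, 1.0)
    return lo + frac * (hi - lo)

# In the closed loop, each audio buffer of white noise is scaled by the linear
# gain corresponding to the most recent speed estimate:
fs, buf_len = 44100, 512
speed = 0.8                                              # latest tracked speed (m/s)
gain = 10.0 ** (footstep_noise_gain_db(speed) / 20.0)
buffer = (gain * np.random.randn(buf_len)).astype(np.float32)
```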
Alex Clonan, Ian Stevenson and Monty Escabi
Topic areas: brain processing of speech and language correlates of auditory behavior/perception neural coding
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Human hearing is critical to everyday communication and the perception of natural auditory scenes. For individuals with misophonia, sounds commonly experienced in daily life can evoke severe discomfort and distress. Aversion is often described in terms of broad sound categories, such as bodily sounds, but what acoustic features cause specific sounds to be aversive or not, within the same category or across different individuals, remains unclear. Here, we explore whether bottom-up statistical sound features processed in the auditory periphery and midbrain can explain aversion to sounds. Using the Free Open-Access Misophonia Stimuli (FOAMS) dataset and a hierarchical model of the auditory system, we find that sound summary statistics can predict discomfort ratings in participants with misophonia. For each listener, the model produces individualized transfer functions that pinpoint specific spectrotemporal modulations that contribute towards sound aversion. Overall, the model explains 76% of the variance in discomfort ratings, and we find substantial differences across participants in which sound features drive aversion. An advantage of the approach here is that it is sound-computable – perceptual ratings can be fit from or predicted for any sound. To illustrate applications of sound-computable models, we consider 1) extrapolation to a large set of untested environmental sounds and 2) personalized trigger detection in continuous audio. Model predictions identify untested sound categories, not in the original FOAMS set, that may also be aversive and suggest that there may be substantial heterogeneity in how aversive specific sounds are within some sound categories. In continuous audio, we show how sound-computable models can identify the timing of potential triggers from sound mixtures. Altogether, our results suggest that acoustic features – spectrotemporal modulations, in particular – can practically be used to characterize the individualized patterns of aversion in participants with misophonia. Future perceptual studies using synthetic sounds and sound sets with more diverse acoustics will allow model predictions to be tested more broadly; however, sound-computable models may already have applications in precision diagnosis and management of misophonia.
Sharlen Moore, Travis You, Ziyi Zhu and Kishore Kuchibhotla
Topic areas: brain processing of speech and language correlates of auditory behavior/perception hierarchical sensory organization
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Astrocytes, the predominant glial cell type in the brain, have emerged as active participants in neural information processing and plasticity. However, their functional dynamics across learning remain poorly understood. Here, we used chronic fiber photometry and two-photon calcium imaging to longitudinally track population-level or individual astrocyte Ca2+ dynamics in the auditory cortex of awake mice (expressing GCaMP6s in an inducible, Aldh1l1-dependent manner) across the acquisition of an auditory discrimination task. We trained mice to lick to a tone for a water reward (S+) and withhold from licking to another tone (S−) to avoid a timeout. We added catch trials in which the reward contingency was changed in a block-based manner to understand how these signals change with contextual cues. Over several days of training, astrocytes exhibited learning-related modulation of their Ca2+ dynamics, with cells showing enhancement of their evoked responses during rewarded trials. This increased activity arose within the first 100 trials of reward-based training and was not due to licking, as the same astrocytes exhibited suppressed activity on errors of action (incorrect licking to the S-, i.e., false alarms). We observed two timescales of astrocyte activation: one ‘early-in-trial’ component that aligned with the tone and initial lick, and another that appeared later in the trial, was broader and significantly larger, and was potentially associated with motor, reward, and post-reward processing. Omitting reward on correct trials (hits, S+) led to biphasic responses in which a transient increase in activity was followed by a profound suppression, suggesting that reward consumption may drive extended increases in astrocyte activity. Receiving an extra reward amount (2x) led to a potentiation of these responses. Similar patterns of activity arose in correct trials with no licking (correct rejects to the S-) in the same reward-shift block. Finally, baseline activity in astrocytes efficiently tracked the reward context. Together, these data suggest that astrocytes track a more abstract context variable irrespective of movement. Interestingly, baseline reward-related signals extended beyond individual trials, potentially playing a role in maintaining a signature of reward and trial history for upcoming choices. Our data suggest that coordinated astrocyte ensembles may provide a scaffold for integrating reward signals with sensory processing to facilitate learning, potentially bridging trial-level and inter-trial computations. This study expands our understanding of astrocyte contributions to neural circuit dynamics underlying adaptive behavior.
Jennifer Lawlor, Sarah Elnozahy, Fangchen Zhu, Ziyi Zhu, Aaron Wang, Fengtong Du, Tara Raam and Kishore Kuchibhotla
Topic areas: brain processing of speech and language subcortical auditory processing
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
During sensorimotor learning, animals link a sensory cue with actions that are separated in time, using circuits distributed throughout the brain. Learning thus requires neural mechanisms that can operate across a wide range of spatiotemporal scales and promote learning-related plasticity. Neuromodulatory systems, with their broad projections and diverse timescales of activity, meet these criteria and have the potential to link various sensory and motor components. Yet the extent to which this proposed model of plasticity operates in real time during behavioral learning remains unknown. The acquisition of sensorimotor learning in a go/no-go task has been found to be faster and more stereotyped than previously thought (Kuchibhotla et al., 2019). We trained mice to respond to one tone for a water reward (S+) and withhold from responding to another (S-). We interleaved reinforced trials with those where reinforcement was absent (“probe”). Early in learning, animals discriminated between S+ and S- in probe but not reinforced trials. This unmasked a rapid acquisition phase of learning followed by a slower phase for reinforcement, termed ‘expression’. What role does cholinergic neuromodulation play in task acquisition? Here, we test the hypothesis that cholinergic neuromodulation provides a ‘teaching signal’ that drives primary auditory cortex (A1) and links stimuli with reinforcement. We exploit our behavioral approach and combine this with longitudinal two-photon calcium imaging of cholinergic axons in A1 during discrimination learning. We report robust stimulus-evoked cholinergic activity to both S+ and S-, as well as stable licking-related activity, throughout learning. While this activity mildly habituates in a passive control, in behaving animals the S+ and S- stimulus-evoked activity is enhanced (S+: duration, S-: amplitude and duration) on the timescale of acquisition. Additionally, we test the hypothesis that cholinergic neuromodulation impacts the rate of task acquisition. We expressed ChR2 bilaterally in cholinergic neurons within the basal forebrain of ChAT-Cre mice and activated these neurons on both S+ and S– trials throughout learning. Test animals acquired the task faster than control groups. These results suggest that phasic bursts of acetylcholine directly impact the rate of discrimination learning. In two additional cohorts, we paired the activation with either the S+ or the S– stimulus. In each case, we observed a refinement of the action associated with the paired stimulus: increased licking for S+ and suppression of licking for S–. Together, these results suggest that cholinergic basal forebrain (CBF) activation differentially modulates contingency-related actions.
Alexandria Lesicko, Erin Michel, Autumn Soots, Kehinde Adeaga and Maria Geffen
Topic areas: auditory memory and cognition brain processing of speech and language
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Real-world auditory behaviors, such as vocalization, sound-driven defense behavior, and orienting movements, often extend beyond passive listening and involve complex audio-motor and multisensory integration. Within the central auditory system, the inferior colliculus (IC) serves as an obligatory relay station and massive convergence center for auditory information. In addition to its role in sound processing, the IC receives input from diverse multisensory and neuromodulatory structures and is implicated in a variety of such acoustico-motor behaviors. However, little is known about the representation of somato-motor signals within the IC and their functional role in auditory behavior. In this study, we performed two-photon imaging in the IC while recording the spontaneous movements of head-fixed mice and presenting a variety of sound stimuli. Video recordings were analyzed using FaceMap and DeepLabCut and neurons that were responsive to either sound, movement, or both were parsed using a generalized linear model. We found that movement was robustly encoded across the dorsal surface of the IC, with movement-responsive neurons surprisingly outnumbering sound-responsive neurons. Neurons that encoded facial or ear movements were less common than neurons that encoded movements of the limbs or trunk, and many neurons encoded movement from multiple body regions. Movement led to a decrease in sound-driven responses in IC neurons and activation of somato-motor inputs to the IC led to a decrease in performance accuracy in mice trained on a sound detection task, suggesting a net suppressive effect of somato-motor inputs on auditory processing in the IC.
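One way the sound/movement parsing could be set up is sketched below: nested Poisson GLMs for each neuron, compared with likelihood-ratio tests to decide whether sound regressors, movement regressors, or both significantly improve the fit. The nested-model procedure, regressor definitions, and threshold are assumptions standing in for the GLM analysis described above.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def classify_neuron(counts, sound_preds, move_preds, alpha=0.05):
    """Label a neuron as sound-, movement-, both-, or non-responsive.

    counts      : (time,) binned event counts for one neuron.
    sound_preds : (time x k) sound regressors (e.g., stimulus onsets by type).
    move_preds  : (time x m) movement regressors (e.g., FaceMap/DeepLabCut energies).
    Nested Poisson GLMs compared with likelihood-ratio tests; this particular
    procedure is an assumed stand-in for the GLM analysis in the abstract.
    """
    def fit(X):
        return sm.GLM(counts, sm.add_constant(X), family=sm.families.Poisson()).fit()

    full = fit(np.hstack([sound_preds, move_preds]))
    no_sound, no_move = fit(move_preds), fit(sound_preds)

    def lr_pval(reduced, df_dropped):
        return chi2.sf(2 * (full.llf - reduced.llf), df=df_dropped)

    return {
        "sound": lr_pval(no_sound, sound_preds.shape[1]) < alpha,
        "movement": lr_pval(no_move, move_preds.shape[1]) < alpha,
    }
```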
Amber Kline, Brooke Holey and David Schneider
Topic areas: subcortical auditory processing
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Many sensations we perceive are caused by our own actions, which we distinguish from externally generated stimuli. In the auditory system, the ability to differentiate between external and self-generated sounds is crucial for vocal communication, musical training, and general auditory perception. The auditory system leverages the tight correlation between movements and the timing of incoming sensory information to discern whether a sound is self-generated, and through experience, animals form expectations for the sensory consequences of their movements. The secondary motor cortex (M2) sends movement-related signals to auditory cortex and is a potential source for establishing specific associations between sounds and their corresponding movements, yet it remains unknown how M2 activity changes with experience as mice learn and update auditory-motor expectations. Using two-photon calcium imaging in awake behaving mice, we find a subset of M2 cells sensitive to deviations from the expected sensory outcomes of movements. Further analyses aim to uncover the extent to which M2 neuronal ensembles represent changing sensory-motor associations as animals form new expectations over time.
Annesya Banerjee, Ian Griffith and Josh McDermott
Topic areas: auditory memory and cognition correlates of auditory behavior/perception neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Background: Humans with normal hearing abilities are able to attend to target sources in the presence of concurrent sounds, allowing them to communicate in noisy environments. Such abilities are limited for individuals with hearing loss and users of cochlear implants. Attentional deficits could reflect degraded peripheral information, for instance, if attentional cues are not encoded with sufficient fidelity. Alternatively, deficits could result from suboptimal decoding of the altered peripheral representations that follow hearing loss and cochlear implantation (as if the central auditory system cannot fully adapt to the altered periphery). To study this issue, we optimized artificial neural network models to recognize speech from a cued talker in multi-talker settings, and asked whether the models could perform the task using simulated cochlear input stimulation. Method: We optimized deep neural networks to report words spoken by a cued talker in a multi-source mixture. Models were trained using simulated binaural auditory nerve input obtained from either a normal cochlea, a cochlea with degraded temporal coding (simulated via lowering the nerve phase-locking cutoff to 50 Hz), or a simulated cochlear implant. Attentional selection was enabled by stimulus-computable feature-based gains, implemented with learnable logistic functions operating on the time-averaged model activations of a cued talker. Gains could be high for features of the cue, and low for uncued features, as determined by parameters optimized to maximize task performance. Results: Models with normal nerve input successfully learned to use both spatial and vocal timbre cues to solve the word recognition task. In the presence of competing talkers, these models correctly reported the words of the cued talker and ignored the distractor talker(s), similar to humans with normal hearing abilities. Models with degraded temporal coding performed worse than the normal hearing model, but showed some benefit of target-distractor spatial separation and sex differences. Models with simulated cochlear implant stimulation performed notably worse, showing only modest benefits from target-distractor sex differences, and showing spatial benefits only for very large spatial separation. Conclusion: Our results suggest that auditory attention deficits in cochlear implant users reflect limitations of peripheral information available from current electrical stimulation strategies.
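The feature-based gain mechanism can be pictured with the schematic below: a logistic function of the cue's time-averaged activations sets a per-channel gain, which then scales the mixture's activations. The shared (w, b) parameterization, channel count, and similarity-free form are assumptions; the abstract specifies only learnable logistic functions of the cued talker's time-averaged activations.

```python
import numpy as np

def attentional_gains(cue_act, w=4.0, b=-2.0):
    """Per-channel gains from a logistic function of the cue's time-averaged activations.

    cue_act : (channels,) time-averaged activations evoked by the cue utterance.
    Channels strongly driven by the cue receive gains near 1; weakly driven
    channels are attenuated. A single shared (w, b) stands in for the learnable
    per-feature parameters described in the abstract, so this is only a schematic.
    """
    return 1.0 / (1.0 + np.exp(-(w * cue_act + b)))

# The gains multiply the same channels' activations for the mixture before the
# rest of the network reports the cued talker's words:
rng = np.random.default_rng(0)
cue_act = rng.random(128)            # hypothetical 128-channel intermediate layer
mix_act = rng.random((128, 200))     # channels x time activations for the mixture
gated = attentional_gains(cue_act)[:, None] * mix_act
```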
Wynne M Stagnaro, Thomas Zhihao Luo, Adrian G Bondy, Julie A Charlton, Charles D Kopec, Sarah Jo C Venditto and Carlos D Brody
Topic areas: auditory memory and cognition brain processing of speech and language correlates of auditory behavior/perception
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Perception is shaped by stimulus history and behavioral context. Forward suppression, where a stimulus reduces the response to subsequent stimuli for hundreds of milliseconds, is a prevalent form of auditory adaptation. In decision making tasks, whether the subject receives a reward on a given trial influences the subsequent decision. Whether, and if so, where, these distinct timescales and types of contextual modulation interact remains unknown. We measured forward suppression behaviorally and neurally during an auditory decision-making task in which rats accumulated pulsatile click stimuli (“Poisson Clicks task”), as well as in a different task where the stimuli were identical but no longer coupled to reward. Adaptation profiles were inferred from both choice behavior and neural responses recorded in the auditory thalamus (MGB; medial geniculate body) and tail of the striatum (TS). In line with previous work, both neural responses and behavioral sensitivity were suppressed immediately following a click, characteristic of forward suppression. Yet neural adaptation in these regions recovered more quickly and had a smaller dynamic range than behavioral adaptation. Further, reward on a previous trial increased the dynamic range of adaptation behaviorally but had no impact on adaptation in MGB or TS, suggesting that this trial-level influence emerges in downstream regions as sensory adaptation continues to evolve. In contrast, when stimuli were decoupled from reward, the dynamic range of neural adaptation decreased in TS. In sum, these results show that trial-level context shapes forward suppression behaviorally but not in the auditory thalamus or striatum, whereas broader task context can modulate neural adaptation, particularly in striatum. Different types of behavioral context may influence adaptation at different levels of the sensory processing hierarchy.
Grant Zempolich and David Schneider
Topic areas:
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Making mistakes is an important part of learning and performing skilled behaviors like speech or music. Equally important is knowing how to react to mistakes in order to improve with practice. However, the neural circuits that process mistake-related feedback and drive behavioral corrections in mammals remain poorly understood. Here, we identify how the mouse brain encodes error-related signals during skilled acoustic behavior, and how these signals drive adaptive motor cortical changes to improve performance. We developed a skilled, sound-guided behavior that requires mice to use real-time acoustic feedback to guide their ongoing forelimb movements and is auditory cortex dependent. Auditory cortex neurons encode error signals associated with distinct types of behavioral mistakes and the activity of error-sensitive neurons predicts both across-trial and within-trial changes in behavior. Skilled acoustic behavior also requires the secondary motor cortex (M2), a motor planning and execution region that receives acoustic input from auditory cortex. Following behavioral mistakes, persistent activity in M2 encodes specific acoustic errors, and these error signals modulate M2 dynamics within a low-dimensional space that governs motor planning. These data reveal a learned coordinate transformation that converts specific acoustic errors into adaptive adjustments in motor planning and behavior. Collectively, these results uncover a cortical circuit that detects errors and facilitates learning from mistakes during skilled behavior in mammals.
Nilay Atesyakar, Sarah Rajan, Justin Yao and Kasia Bieszczad
Topic areas: hierarchical sensory organization neural coding
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Auditory associative learning induces sound-specific changes in the auditory cortex (ACx) (Schreiner and Polley 2014; Weinberger 2015). Does neurophysiological plasticity offer a signal-processing advantage for learned signals under subsequently challenging listening conditions? Learning mechanisms may selectively promote listening for a remembered sound signal in noisy backgrounds via signal-specific ACx plasticity to facilitate signal-cued behaviors. To test whether learning-induced ACx changes improve signal processing in noisy backgrounds, adult male rats (n = 11) were shaped to press a lever to receive a water reward. Training followed in “Quiet” on a simple tone-reward associative task (5.0 kHz, 60 dB SPL). The number of daily sessions required to reach >80% correct presses to tones was used to identify rapid (<10 days) vs. slower (>10 days) learners. After both groups reached the same high levels of asymptotic performance, a behavioral memory test confirmed response specificity for acoustic frequency (5.0 kHz trained tone vs. 4.2 kHz and 5.9 kHz) in “Quiet” vs. “Noise” under different signal-to-noise ratios (SNRs). In vivo ACx multiunit recordings followed all behavioral assessments in a single acute anesthetized recording session with two parts: (1) pure tones (25 ms, 0.525–47.7 kHz, 10–60 dB SPL) were presented in “Quiet” to identify ACx multiunit activity based on short-latency, frequency-tuned, tonotopic evoked responses; (2) the same tones (25 ms) were embedded in Gaussian white noise trials (275–415 ms, e.g., 275 ms after noise onset) under varying SNRs. Remarkably, responses to the remembered tone frequency in noise differed in rapid- (n = 6) vs. slower-learning (n = 5) rats, even though testing was done when all rats performed the task (in Quiet) equally well. Only rapid learning changed sound-evoked ACx activity (n_rapid = 99 vs. n_slower = 67 sites, relative to naive: n = 7 rats, 94 sites): (i) a signal-specific, significant increase in evoked activity for the behaviorally relevant tone and (ii) a significant decrease in evoked activity during steady-state noise. Statistically significant differences emerged at higher noise levels and were most pronounced in rapid vs. slower learners. Surprisingly, behavioral responses generalized across acoustic frequency in both groups during quiet and the lowest levels of background noise—signal-specificity emerged only for rapid learners at higher levels of background noise before failing again under the noisiest conditions. This work highlights the impact of learning experiences on enhancing signal detection via a non-linear auditory cortical decoding mechanism that is established early during the acquisition of novel auditory associations.
Rachel Cassidy and Ross S. Williamson
Topic areas: auditory memory and cognition correlates of auditory behavior/perception
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
In natural environments, animals must react appropriately to behaviorally relevant sounds, such as those signaling predators or food sources, while ignoring competing auditory signals. Effective responses require identifying both the identity and location of sounds. The psychophysics of sound localization have been well-characterized by presenting stimuli at varying positions relative to an animal’s head, which is kept stationary for precise spatial control. However, such approaches do not capture sensorimotor strategies animals use during active navigation. In natural settings, environmental noise, behavioral context, and self-motion all shape auditory spatial perception. Animals frequently orient their heads and bodies to reduce perceptual ambiguity and improve localization accuracy. To investigate the behavioral strategies that support naturalistic sound localization, we developed a freely-moving auditory discrimination task. C57BL/6J mice were trained to localize a target sound along the perimeter of an open circular arena while ignoring a distractor presented simultaneously from another azimuthal position. Stimuli were bandpassed white noise of varying center frequencies. After trial initiation, both stimuli played continuously until mice approached and licked a water spout beneath the perceived target. Since mice were free to explore before making a decision, this design enabled observation of how animals integrate dynamic sensory input with self-motion. Over several weeks of training, mice adopted characteristic sensorimotor strategies that facilitated task learning, including rapid head and body “scanning” movements at stimulus onset and 180° turns to correct errant approaches towards the distractor. These behaviors became more frequent with learning and coincided with increased accuracy, shorter reaction times, and more stereotyped trajectories. These results indicate that mice resolve spatial ambiguity with dynamic movement strategies and update ongoing motor plans in response to changing acoustic input. Our ongoing work investigates the neural circuits underlying this dynamic sensorimotor loop. While auditory cortex encodes spectral features and learned associations, it lacks a map of egocentric space. In contrast, its projection targets in the midbrain, such as the superior and inferior colliculi, are spatially tuned and mediate orienting behaviors. We are currently using pharmacological and optogenetic perturbations to dissect how corticocollicular circuits contribute to spatial decision-making, offering new insight into how animals transform auditory input into adaptive action.
Ariadna Corredera, Deanna Garcia, Alessandro La Chioma and David Schneider
Topic areas: auditory memory and cognition correlates of auditory behavior/perception
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
In ethologically relevant contexts, distinguishing self-generated from externally generated sensory input is critical for survival. External cues can range from harmless (e.g., ambient noise or social partners) to threatening (e.g., predator sounds). We used wireless electrophysiology in freely moving mice to study auditory processing of self- versus externally generated sounds as the animals walked either across surfaces that naturally produced sound (e.g., rustling leaves) or across a silent surface paired with locomotion-triggered sounds in virtual reality. Using wireless miniature microphones positioned at ear level, we matched the intensity of playback sounds to the real self-generated sounds from leaves. Although auditory neurons were broadly tuned, responses were stronger to externally generated playback sounds, and we could accurately decode sound identity from single-neuron activity. When mice were placed with a cage-mate or a non–cage-mate, neural responses varied with partner proximity and were slightly suppressed for self-generated sounds compared to those from the partner. Finally, visual predator cues led mice to alter their behavior in ways that reduced sound production, with auditory cortical dynamics reflecting internal-state changes.
Jordan Fox, Brian Fischer, William DeBello, Diasynou Fioravante, Mark Ellisman and Jose Peña
Topic areas: auditory disorders multisensory processes
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
The auditory pathways of the barn owl (Tyto alba) provide a unique system for studying the neural coding and computation underlying hearing and sound localization. In this work, we develop high-resolution, morphologically accurate computational models of neurons in the avian midbrain for which we have broad imaging and functional data, but where models combining the two have yet to be implemented and studied. This project contributes to the broader research endeavor, informing our understanding of basic levels of neural computation and the potential prevention and treatment of auditory and communicative disorders in humans. Recent anatomical studies of space-specific neurons (SSNs) in the inferior colliculus of the barn owl have used high-resolution electron microscopy (EM) and stimulated emission depletion (STED) microscopy to reveal dense arborization and complex dendritic structures called toric spines. While the complex morphology of these structures is hypothesized to support frequency and binaural cue integration, the exact nature of this integration is unknown. We use this imaging data to develop morphologically accurate compartmental models using the Arbor simulation codebase, then optimize the models to be consistent with patch-clamp experiments using simulation-based inference, a combination of Bayesian statistical modeling and machine learning. In this way, we explore and constrain the electrophysiological parameterizations that are consistent with in vitro spiking data. The resulting model is used to infer the integration properties of SSNs, such as linearity or nonlinearity, and thus answer key questions about their nature and role in models of sound localization.
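As a rough illustration of the simulation-based inference step, the hedged sketch below assumes the Python `sbi` package and a placeholder simulator standing in for the Arbor compartmental model; the parameter names, prior bounds, and summary statistics are invented for illustration and do not reflect the authors' actual pipeline.

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

def simulate_ssn(theta):
    """Placeholder for the Arbor compartmental model: maps biophysical parameters
    (e.g., channel densities) to summary statistics of a simulated voltage trace."""
    g_na, g_k, g_leak = theta
    # ...run the compartmental simulation here and summarize the response...
    return torch.tensor([float(g_na - g_k), float(g_leak)])       # toy summary statistics

prior = BoxUniform(low=torch.zeros(3), high=torch.ones(3))        # illustrative bounds
theta = prior.sample((1000,))
x = torch.stack([simulate_ssn(t) for t in theta])

inference = SNPE(prior=prior)
density_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(density_estimator)
# Conditioning the posterior on observed patch-clamp summaries constrains the
# electrophysiological parameterizations consistent with the in vitro data.
```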
Megan Kirchgessner, Mihir Vaze and Robert Froemke
Topic areas: correlates of auditory behavior/perception neural coding neuroethology and communication
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
The postnatal brain undergoes substantial plasticity to represent features and statistics of the sensory world. For the auditory system, studies in rodents (Zhang et al. 2001, Villers-Sidani et al. 2007, Dorrn et al. 2010) and humans (Näätänen et al. 1997) indicate that the postnatal primary auditory cortex changes in response to early acoustic experience. Much work on developmental plasticity in rodents has focused on responses to pure tones and representations of sound frequency, whereas less is known about the development of vocalization processing. Ultrasonic vocalizations are spectrotemporally complex sounds used by rodents in varied social contexts, e.g., distress calls from pups soliciting parental caregiving (Ehret 2005). How and when neuronal representations of auditory cues (from lower-dimensional features like spectral frequency to higher-dimensional auditory objects like vocalizations) first emerge in the postnatal auditory cortex and change with experience is unknown. To date, examining sensory processing across postnatal experience has been technically challenging, thus limiting our understanding of how changes unfold across individual neurons and neuronal cell-types. To address these questions, we perform longitudinal two-photon calcium imaging of hundreds of excitatory and/or inhibitory neurons in the auditory cortex of young mice (N=20), from postnatal day (P) 12 into adulthood. We found that both excitatory and inhibitory neurons in the auditory cortex started responding to tonal stimuli by P13-14 with an initial tonotopic organization that expanded over the next few days. A fraction of neurons gradually changed frequency tuning across days, although overall there was a stable tonotopic representation even at these early ages. Repeated tone exposure during the auditory critical period (P13-16) shifted single-neuron tuning towards the exposed frequency. Compared to lower-frequency tones and down-shifted vocalizations, responses to ultrasonic stimuli (both pure tones and natural vocalization playback) emerged at later ages (>P28) but were quite transient, with many responses disappearing after several days. Ultrasonic vocalization responses initially were observed in neurons tuned to ultrasonic frequencies, but over later development they became independent from ultrasonic frequency tuning. Our results show how varied sensory representations at the single-cell and population levels in the postnatal auditory cortex emerge and change over the course of early-life development.
Ziyi Zhu, Adam Charles and Kishore Kuchibhotla
Topic areas: brain processing of speech and language correlates of auditory behavior/perception multisensory processes
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Humans and other animals can learn and execute many different tasks throughout their lifespan, a process known as continual learning. However, this biological ability challenges many artificial neural networks that suffer catastrophic forgetting, unless these networks are regularized or expanded. Specifically, unique information about new tasks can be encoded through expansion (adding ‘neurons’ into the network for a new task), while shared information between old and new tasks can be integrated into shared representations. Here, we aimed to test how the biological brain naturally solves this problem. We trained mice on a class of tasks involving the learning of multiple, related, sensorimotor associations, specifically multiple distinct auditory two-choice tasks using a moveable wheel. We exploited a sequential training curriculum where mice expertly performed both tasks in a block-based manner at the final stage of training. We tracked neural activity of L2/3 pyramidal cells in the auditory cortex (AC) and the posterior parietal cortex (PPC) using multi-area two-photon mesoscopic calcium imaging, which allowed us to longitudinally track expansion and integration of neural representation at single-cell resolution throughout different stages of multi-task learning. Surprisingly, a sub-area in PPC showed both reliable auditory responses even in naive animals and dynamic response patterns during learning, indicating its importance in learning multiple auditory tasks. Together, our behavioral and neural approach promises to help us better understand the precise computations used by biological neural networks for continual learning and how this depends on the learning curriculum.
Roland Ferger, Andrea Bae and José Luis Peña
Topic areas:
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Natural environments challenge the brain to prioritize the processing of salient stimuli. This is especially difficult for concurrent auditory stimuli because sound waves interact such that the phase of each frequency can be altered prior to arrival at the eardrum. This interaction depends on the relative phase of the two signals and differs at each ear for spatially separated sound sources, leading to binaural decorrelation. The barn owl, a sound localization specialist, uses interaural time difference (ITD) as the primary cue for sound localization in azimuth. ITD detection relies on binaural correlation and, thus, suffers from binaural decorrelation. However, the owl’s midbrain stimulus selection network (MSSN) is dedicated to representing the location of the most salient stimulus among concurrent stimuli. Previous competition studies using unimodal (visual) and bimodal (visual and auditory) stimuli have shown that relative strength is encoded in spike response rates. Questions remained concerning competition between concurrent auditory signals. To this end, we presented diverse auditory competitors (concurrent flat noise or amplitude-modulated noise) and recorded neural responses of awake barn owls in successive midbrain space maps, the external nucleus of the inferior colliculus (ICx) and the optic tectum (OT, homologue of the mammalian superior colliculus). Other work showed that binaural decorrelation can explain decreased spike response rates in ICx. We expanded the above experiments to use competing stimuli that were spectrally non-overlapping, ruling out binaural decorrelation, but that contained enough frequencies across the owl’s hearing range to be unambiguously localized. While both nuclei contain topographic maps of auditory space, OT also integrates visual input and is part of the global-inhibitory MSSN. Through comparative investigation, we show that while increasing the strength of a competitor sound decreases spike response rates of spatially distant neurons in both regions, relative strength determines spike train synchrony of nearby units only in OT. Furthermore, changes in synchrony in OT are correlated with gamma-range oscillations of local field potentials (LFPs), associated with input from the MSSN. Our results suggest that modulations in spiking synchrony between units are an emergent coding scheme for the relative strength of concurrent stimuli, which may have implications for downstream readout. We compare results in both midbrain maps according to the effect of spectrally overlapping and non-overlapping stimuli on spike responses and LFPs. This further elucidates the role of the MSSN in selecting the most salient stimulus.
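To make the synchrony measure concrete, here is a hypothetical sketch of a zero-lag cross-correlation metric for two binned spike trains; the study's actual synchrony and LFP analyses may differ, and the function and variable names are illustrative.

```python
import numpy as np

def zero_lag_synchrony(counts_a, counts_b, max_lag_bins=2):
    """Cross-correlate mean-subtracted binned spike counts of two nearby units and
    sum the correlation near zero lag, normalized by the geometric mean spike count."""
    counts_a = np.asarray(counts_a, dtype=float)
    counts_b = np.asarray(counts_b, dtype=float)
    xcorr = np.correlate(counts_a - counts_a.mean(), counts_b - counts_b.mean(), mode="full")
    center = len(xcorr) // 2
    near_zero = xcorr[center - max_lag_bins:center + max_lag_bins + 1].sum()
    norm = np.sqrt(counts_a.sum() * counts_b.sum())
    return near_zero / norm if norm > 0 else np.nan
```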
Bonnie Lau, Tanya St. John, Annette Estes and Stephen Dager
Topic areas: brain processing of speech and language neural coding novel neurotechnologies
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Histological studies conducted in postmortem human tissue samples from autistic children show dysmorphology and a decrease in neuronal number in auditory brainstem nuclei important for binaural coding. Difficulty with auditory tasks that rely on binaural hearing, such as understanding speech in noisy real-world environments, is also frequently reported by neurodiverse individuals. In this study, we obtained neural and behavioral measures of binaural hearing to test the hypothesis that deficits in binaural coding underlie the auditory processing difficulty experienced by neurodiverse children. To assess binaural processing, we measured the neural encoding of timing differences between the ears using electroencephalography (EEG) as well as spatial release from masking (SRM), a binaural hearing phenomenon that improves speech perception when distracting talkers are spatially segregated from the talker of interest. To prioritize real-world relevance, we simulated an everyday classroom scenario for our speech perception measure: listening to a teacher read a book in a sea of background noise. We recorded EEG while participants listened to an audiobook under three conditions: 1) Quiet, 2) Co-located Noise, and 3) Segregated Noise. To quantify how well speech is encoded in the neural response, we employed a linear modelling approach in which a ridge regression model was fit from acoustic speech features, including the envelope, envelope derivative, word onsets, and phoneme onsets, to the EEG signal to produce a multivariate temporal response function (mTRF). Our results show reduced neural encoding of timing differences between the ears in neurodiverse children in comparison to age- and sex-matched neurotypical children. Furthermore, the mTRF analysis revealed the surprising finding that the neural encoding of speech is worse for neurodiverse children when distracting talkers are moved away from the target speaker. This is the opposite of the expected pattern. Together, these findings suggest binaural processing differences in neurodiverse children, warranting further investigation in larger cohorts and a wider age range of neurodiverse individuals.
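For readers unfamiliar with the mTRF approach, the sketch below shows one minimal way to fit a lagged ridge regression from acoustic features to a single EEG channel; it is illustrative only (the study's preprocessing, regularization, and toolbox choices may differ), and all names are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_mtrf(features, eeg, sfreq, tmin=-0.1, tmax=0.4, alpha=1.0):
    """Fit a multivariate temporal response function: regress time-lagged acoustic
    features (envelope, derivative, onsets) onto one EEG channel with ridge regression."""
    lags = np.arange(int(tmin * sfreq), int(tmax * sfreq) + 1)
    n_samples, n_feats = features.shape
    X = np.zeros((n_samples, n_feats * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.roll(features, lag, axis=0)
        if lag > 0:          # zero out samples that wrapped around the edges
            shifted[:lag] = 0
        elif lag < 0:
            shifted[lag:] = 0
        X[:, i * n_feats:(i + 1) * n_feats] = shifted
    model = Ridge(alpha=alpha).fit(X, eeg)
    return model.coef_.reshape(len(lags), n_feats)   # TRF weights: lags x features
```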
Andrea Santi, Sharlen Moore, Kelly Fogelson, Aaron Wang, Jennifer Lawlor, Kali Burke, Amanda M Lauer and Kishore Kuchibhotla
Topic areas: correlates of auditory behavior/perception
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Alzheimer’s disease is a form of dementia in which memory and cognitive decline is thought to arise from underlying neurodegeneration. These cognitive impairments are transient when they first appear and can fluctuate across disease progression. Here, we investigate the neural mechanisms underlying fluctuations of performance in amnestic mice. We trained APP/PS1+ mice on an auditory go/no-go task that dissociated learning of task contingencies (knowledge) from its more variable expression under reinforcement (performance), while monitoring the activity of 6,216 excitatory neurons in 8 mice using large-scale two-photon imaging in behaving mice. We found that auditory cortical networks were more suppressed, less selective to the sensory cues, and exhibited aberrant higher-order encoding of reward prediction when APP/PS1+ mice exhibited significant performance deficits compared to control mice. A small sub-population of neurons, however, displayed the opposite phenotype, reflecting a potential compensatory mechanism. Volumetric analysis demonstrated that deficits were concentrated near Aβ plaques. Strikingly, these cortical deficits were reversed almost instantaneously on probe (non-reinforced) trials, when APP/PS1+ mice performed as well as controls, providing neural evidence for intact stimulus-action knowledge despite variable ongoing performance. Our results suggest that the amnestic phenotype is transient, contextual, and endogenously reversible, with the underlying neural circuits retaining the stimulus-action associations. Thus, memory deficits commonly observed in amnestic mouse models, and potentially at early stages of dementia in humans, relate more to contextual drivers of performance than to degeneration of the underlying memory traces. Future investigation will test whether amyloid disrupts the integration of long-range inputs, disturbing the delicate balance between inhibition (via PV+ interneurons) and disinhibition (via VIP+ interneurons) and thereby impairing the activation of the memory trace.
Justine Shih, Surya Tokdar and Jennifer M. Groh
Topic areas: correlates of auditory behavior/perception neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
How the brain encodes multiple simultaneous stimuli is not fully understood. When presented with multiple sensory stimuli, single neurons can fluctuate or ‘multiplex’ over time between encoding each individual stimulus (Caruso et al., 2018). In visual areas, the prevalence of neurons that are found to show this fluctuating behavior, termed mixtures, has been shown to differ based on the perceptual binding of the stimuli into separate vs. fused objects (Jun et al., 2022, Schmehl et al., 2024). In the auditory system, how the brain binds sounds into “objects” is not entirely understood, but involves a temporal component (Deike et al., 2012). Here, we investigated this binding using harmonically related sounds. We sought to determine how individual neurons in the auditory cortex encode different combinations of harmonic stimuli and how fluctuating response patterns may relate to perceptual binding. To test this, we collected 60 single-neuron recordings from the auditory cortex of a rhesus macaque as she passively listened to combinations of harmonics that formed congruent and incongruent stimuli. Congruent stimuli contained tones that were integer multiples of the same fundamental frequency, while incongruent stimuli contained lower and upper tones from two different fundamental frequencies, respectively. We found clear evidence of multiplexing in the auditory cortex in response to both types of stimuli; however, patterns changed as the sounds progressed in time. During the initial 0-100 ms response period, neurons most commonly responded to either the lower or the upper harmonics. During the later 100-500 ms period, neurons instead showed fluctuating activity between encoding the lower vs. higher harmonics. These general patterns occurred regardless of the congruence of the harmonic complex. Together, these results suggest that processing of complex sounds with multiple related harmonic components evolves over time in the auditory cortex. This finding may provide insight into how the segregation of auditory “objects” unfolds across time.
Aneesh Bal, Andrea Santi, Samantha Soto, Patricia Janak and Kishore Kuchibhotla
Topic areas: correlates of auditory behavior/perception multisensory processes thalamocortical circuitry and function
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Humans exhibit the remarkable ability to continuously learn new things over their lives without forgetting previous information, an ability known as continual learning. Currently, the neural mechanisms underlying this ability remain unknown. Studying this phenomenon in biological organisms remains challenging, as the nature of the problem requires training experimental subjects on numerous tasks over time while simultaneously obtaining neural recordings. Given this constraint, we use mice as a model organism, given their amenability to longitudinal neural recording procedures and their trainability on complex cognitive tasks. However, many assume that mice can only handle simple learning paradigms, successfully learning at most a few tasks. We contend that this limit is not a result of reduced cognitive capacity in mice, but rather of training procedures that ignore naturalistic behaviors. To that end, we built the Continual Learning Mouse Playground, a fully automated home-cage training system in which mice continuously complete cognitive tasks in a naturalistic setting. Using the Mouse Playground, we trained mice (n=12 across 3 cohorts) to sequentially learn 8 tasks that varied across learned task structures (Go-NoGo, 2AFC) and perceptual dimensions (pure tone, sound duration, click rate, AM modulation, sweep direction, and sound intensity). Importantly, we defined a task as any unique combination of task structure and perceptual dimension. Mice exhibited expert performance on a sequence of 8 tasks, rapidly reaching at least 80% accuracy on each task. Throughout training, we probed earlier tasks between new tasks and found strong retention with minimal relearning. So far, these data show that mice can learn many tasks in sequence without forgetting, but an important aspect of continual learning is being able to leverage prior knowledge in new contexts, known as compositional inference. To assess this, the final two tasks combined a familiar task structure (2AFC) with previously learned auditory dimensions in a novel, previously unencountered pairing. Strikingly, we observed that 6 out of 12 trained mice exhibited extremely rapid learning in Tasks 7 and 8, with some mice exceeding 80% performance in the first 50 trials of Task 8. A generalized linear model revealed that rapidly learning mice utilized stimulus information earlier in Tasks 7 and 8 and relied less on a biased-response strategy compared to gradually learning mice. Collectively, these results establish the Mouse Playground as a viable platform to study continual and compositional learning, providing rapid and scalable behavioral results that were previously difficult to obtain.
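The trial-level generalized linear model mentioned above can be sketched as a logistic regression with one stimulus regressor and one response-bias regressor; this is a hedged illustration with invented variable names, not the authors' model specification.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_choice_glm(stimulus, prev_choice, choice):
    """Predict each trial's choice from the current stimulus and the previous choice
    (a simple bias regressor); compare the fitted weights to gauge strategy."""
    X = np.column_stack([stimulus, prev_choice])     # trials x 2 regressors
    glm = LogisticRegression().fit(X, choice)
    stim_w, bias_w = glm.coef_[0]
    return stim_w, bias_w    # larger |stim_w| relative to |bias_w| -> stimulus-driven strategy
```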
Anna Chambers, Carter Tims, Ken Hancock, Ethan Lawler and Daniel Polley
Topic areas: brain processing of speech and language cross-species comparisons neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
During sleep, sounds retain their ability to potently and rapidly induce arousal. Sleep is thus vulnerable to disruption in cases of central auditory dysfunction. Poor sleep is a primary complaint of patients with hearing disorders such as tinnitus, and is common among people with sensorineural hearing loss. Uncovering a mechanistic link between hearing loss and sleep loss, e.g. via central hyperactivity and hyperarousal, could identify new ways to treat hearing disorders and insomnia, and help to explain the risk for dementia that is strongly associated with hearing loss. Here, we tested whether noise-induced hearing loss (NIHL) is associated with disrupted sleep in a mouse model. We used a noise exposure paradigm to produce a permanent threshold shift at high frequencies while leaving lower frequency hearing intact, and measured electrocorticography (ECoG) and electromyography (EMG) signals during wake and sleep. Mice were housed in custom-built, acoustically transparent enclosures while ECoG, EMG and video were recorded during either silence or in the presence of octave-band noise stimuli at varying intensities. We analyzed sleep architecture in silence by scoring periods of wakefulness, rapid-eye movement (REM) and non-REM sleep and defining sleep bout frequency and duration. Further, we analyzed sound-induced awakening thresholds to track the propensity of high and low frequency sounds to wake animals from sleep. We compared these metrics before and after NIHL or sham exposure. Mice exhibited fragmented sleep patterns after NIHL characterized by significantly shorter bouts of NREM sleep. Moderate intensity stimuli were more effective at waking mice after noise exposure. Meanwhile, no changes in sound-induced waking behavior were observed in control mice that underwent sham noise exposure. These results suggest that hyperarousal accompanies NIHL in mice, raising the possibility that excess central auditory pathway gain after NIHL extends to auditory arousal networks, disrupting sleep architecture and rousing the sleeping brain in response to inconsequential environmental sounds. As many aspects of sleep are readily quantifiable, the topic is well suited for parallel investigations in mice and humans.
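As a concrete illustration of the bout-level sleep architecture metrics, a hypothetical hypnogram summary is sketched below; it assumes fixed-length epochs scored as 'Wake', 'NREM', or 'REM', and the study's actual scoring parameters may differ.

```python
import numpy as np

def bout_stats(hypnogram, epoch_s=4.0):
    """Collapse a scored hypnogram (one state label per epoch) into bouts of
    consecutive identical states, then report bout counts and mean durations."""
    bouts, start = [], 0
    for i in range(1, len(hypnogram) + 1):
        if i == len(hypnogram) or hypnogram[i] != hypnogram[start]:
            bouts.append((hypnogram[start], (i - start) * epoch_s))
            start = i
    stats = {}
    for state in set(label for label, _ in bouts):
        durations = [d for label, d in bouts if label == state]
        stats[state] = {"n_bouts": len(durations), "mean_duration_s": float(np.mean(durations))}
    return stats

# Example: bout_stats(["Wake", "NREM", "NREM", "REM", "NREM"]) reports shorter NREM
# bouts for fragmented sleep, the signature observed after noise-induced hearing loss.
```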
Elise Rickert, Lauren Ralston, Anne Anderson, Dave Clarke, Nancy Nussbaum, Elizabeth Tyler-Kabara, Howard Weiner and Liberty Hamilton
Topic areas: auditory disorders multisensory processes
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Everyday life consists of complex auditory environments that require the listener to filter out noise and process a target signal. While behavioral performance in children has been well documented, more research is necessary to understand the development of the neural correlates of understanding speech in noise. Most investigations of how children’s brains process speech in noise are limited to behavioral or noninvasive measures such as electroencephalography (EEG) or the auditory brainstem response (ABR), which have little spatial specificity. Intracranial recordings, on the other hand, allow for direct recording of neural activity with high spatiotemporal resolution. In this study, we administered the Bamford-Kowal-Bench (BKB-SIN) test to children with intractable epilepsy while recording neural data using stereoelectroencephalography (sEEG). This test increases the volume of multi-talker background noise with each trial while the listener attempts to accurately repeat the sentence. All procedures were approved by the UT Austin Institutional Review Board. Data were collected from 16 participants (7 females, 9 males; ages 5-22 years old). Patients were recruited from Dell Children’s Medical Center in Austin, Texas, and Texas Children’s Hospital in Houston, Texas. Neural data were aligned to sentence timings and preprocessed to remove epileptiform activity. Behavioral data and high gamma amplitudes (70-150 Hz) from primary and secondary auditory cortex and related areas were extracted and compared across different signal-to-noise ratio (SNR) conditions. Behaviorally, the age-corrected average score for lists 1-8 was 2.58 dB SNR. There were 10 participants with an SNR score in the normal range, 4 with a score in the mild SNR loss range, and 1 with a moderate SNR loss score. Analysis of neural data indicated that, similar to adults, the superior temporal gyrus (STG; p=0.0275) and planum temporale (PT; p=0.0283) had a stronger response in high SNR conditions than in low and mid SNR conditions. The STG response also varied based on listener accuracy (p=0.0422), with higher amplitude for incorrect than correct trials. Interestingly, the amplitude response in the middle temporal gyrus (MTG) was stronger for mid SNR conditions than for high (p=0.0151) and low (p=0.0082) SNR conditions. Additionally, the insula showed an effect of age on amplitude for correct versus incorrect response trials, with stronger responses in older compared to younger participants (p=0.0483). The results of this study provide anatomically precise information regarding speech-in-noise processing in children and add to the body of literature concerning typical development in the auditory cortex and related areas.
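As an illustration of the high-gamma amplitude extraction, a minimal band-pass-plus-Hilbert sketch is shown below; filter order, referencing, and artifact rejection in the actual sEEG pipeline may differ, and the function name is an assumption.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def high_gamma_amplitude(seeg, sfreq, band=(70.0, 150.0)):
    """Band-pass each channel in the high-gamma range, take the Hilbert envelope,
    and z-score over time. `seeg` is a channels x samples array."""
    nyq = sfreq / 2.0
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="band")
    filtered = filtfilt(b, a, seeg, axis=-1)
    envelope = np.abs(hilbert(filtered, axis=-1))
    return (envelope - envelope.mean(axis=-1, keepdims=True)) / envelope.std(axis=-1, keepdims=True)
```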
Mackenna Wollet, Chloe J. Bair-Marshall, Kathleen A. Martin and Robert C. Froemke
Topic areas: brain processing of speech and language correlates of auditory behavior/perception neural coding
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Learning at the behavioral level can sometimes be ‘one-shot’ or almost instantaneous, but most studies of long-term plasticity such as long-term potentiation (LTP) seem to require dozens or hundreds of induction events yet measure outcomes minutes afterwards. However, recent studies of behavioral timescale synaptic plasticity (BTSP) show that enduring hippocampal plasticity can be induced in just a few trials in vivo and in vitro (Bittner et al. Nat Neurosci 2015, Bittner et al. Science 2017, Liao & Losonczy Annu Rev Neurosci 2024). Here we examine the potential for BTSP to occur in the mouse auditory cortex. For in vivo experiments, adult mice were head-fixed and trained on a 2-alternative forced-choice task to distinguish pure tones of one frequency (e.g., 13 kHz) from tones of other nearby frequencies (Martin et al. Nat Neurosci 2024). We performed 2-photon imaging during behavior and observed sudden, sometimes trial-to-trial changes in frequency tuning curves. For some responsive cells, new peaks to behaviorally relevant stimuli could emerge or be suppressed, shifting overall tuning curve structure. In some cases, unresponsive cells suddenly became responsive to tones during behavioral training. Many of these sudden and enduring changes happened during the earliest phases of training. We also examined BTSP in brain slices of adult mouse auditory cortex. We made whole-cell recordings from layer 5 pyramidal neurons in current-clamp mode, monitoring EPSP amplitudes before and after pairing events with strong postsynaptic depolarization (~500 ms) on 2-10 pairing trials. ~25% of recordings showed BTSP post-pairing when inhibition was intact; this increased to ~50% of recordings with inhibition blocked. Pairing also changed spike output probability: 3.8% of trials evoked one or more postsynaptic spikes before pairing, increasing to 27.3% of trials after pairing (p=0.02, paired t-test, n=6 cells), without significant changes to resting membrane potential or input resistance. Finally, we pharmacologically blocked L-type calcium channels (LTCCs) and observed no increase in EPSP magnitude or spike probability following the pairing protocol (p=0.18, n=2 cells). This is also observed with hippocampal BTSP (Bittner et al. Science 2017), and we have determined that LTCCs are required for cortical BTSP induction.
Yingjia Yu, Kyle Rupp, Jasmine Hect and Taylor Abel
Topic areas: brain processing of speech and language cross-species comparisons neural coding
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Voice is a uniquely rich and important sensory cue whose distinct acoustic features allow us to recognize and differentiate conspecifics. Previous neuroimaging and electrophysiology research reveals a hierarchical voice processing network within the human auditory cortex. However, the cortical representation of higher-order voice information, such as emotion and identity, is intensely debated. To address this gap, we presented speech-style stimuli that varied systematically in both speaker identity and emotional content to patient-participants undergoing invasive monitoring for epilepsy surgery evaluation, and recorded their neural responses via intracerebral electrodes. The stimuli featured six speakers (three female, three male) with matched fundamental frequencies. Patients performed a same-different judgment task on speaker identity, where each trial consisted of two sequential sentences from speakers of the same sex. Within one patient, broadband high-gamma activity (HGA, 70–150 Hz) was extracted and used to fit sex-specific regression models relating the spatiotemporal dynamics of the neural data to identity-related acoustic and category-level information. Sex-specific decoding models demonstrated above-chance classification accuracy in discriminating between different identities (male: 42.8%, female: 38.9%, chance = 33.3%). To investigate what identity-related information was represented in the neural data, sex-specific linear regression encoding models were built using time-varying acoustic features extracted from the experimental stimuli with the addition of binary speaker identity labels. We quantified the temporal dynamics of categorical identity representation by comparing model performance and model fit in a sliding-window analysis: a nested model using only the acoustic features versus a full model using identity labels alongside the same acoustic predictors. We found that the full model for male speakers was better able to predict neural responses around the superior temporal gyrus/sulcus than the nested model, with effects emerging beyond 400 ms after stimulus onset. Likelihood ratio test statistics comparing nested to full models showed statistically significant improvements in model fit for sites within STG, potentially suggesting category-level encoding of speaker identity localized to secondary auditory cortex. These findings provide evidence that secondary auditory cortex hierarchically encodes speaker identity, potentially represented as complex voice features beyond simple acoustics. Our results extend previous research by demonstrating that categorical identity information is represented at an abstract level within the ventral auditory processing stream.
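The nested-versus-full encoding model comparison can be illustrated with a likelihood ratio test between two linear models, as in the hedged sketch below (statsmodels OLS; variable names are illustrative and the study's actual regression framework may differ).

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def identity_lr_test(hga, acoustic_features, identity_labels):
    """Compare a nested model (acoustic features only) against a full model that adds
    binary speaker-identity labels, using a likelihood ratio test on one channel's HGA."""
    nested = sm.OLS(hga, sm.add_constant(acoustic_features)).fit()
    full = sm.OLS(hga, sm.add_constant(np.column_stack([acoustic_features, identity_labels]))).fit()
    lr_stat = 2 * (full.llf - nested.llf)
    df_diff = full.df_model - nested.df_model
    return lr_stat, chi2.sf(lr_stat, df_diff)   # significant p-value -> identity improves the fit
```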
Abigail McElroy, Kai Park, Jessica Mai and Christopher Rodgers
Topic areas: brain processing of speech and language correlates of auditory behavior/perception hierarchical sensory organization
Fri, 11/14 10:30AM - 12:00PM | poster
Abstract
Hearing loss reduces the ability to communicate with others and navigate the world. Spatial hearing, which is typically supported by cues from both ears, is especially impaired by hearing loss. Humans and animals can recover some spatial hearing ability after hearing loss, though outcomes are heterogeneous and the compensatory mechanisms involved are not fully understood. To study changes in spatial hearing after hearing loss, our lab has created a sound seeking task in which freely moving mice must navigate to a sound source. We previously demonstrated that conductive hearing loss impairs performance at this task, but mice with unilateral hearing loss are able to recover their performance over a period of days, while mice with bilateral hearing loss do not recover. I have also observed that lesion of auditory cortex impairs task performance in mice both with and without hearing loss. Auditory cortex has been shown to be necessary for recovery of spatial hearing performance in other studies, and recordings of auditory cortical neurons show they are tuned to sound location. I hypothesized that representations of sound location in auditory cortex were necessary to perform sound seeking and that changes in these spatial representations enabled recovery after hearing loss. To investigate these hypotheses, I performed electrophysiological recording using tetrode wires in right auditory cortex of an awake, freely moving mouse. Sounds were presented passively as the mouse moved about the behavior arena. I recorded in three sequential conditions: prior to hearing loss, after unilateral hearing loss, and after bilateral hearing loss. Prior to hearing loss, sound-evoked responses were robust, and on average, neurons were more activated by sounds from contralateral space as expected. After unilateral hearing loss, evoked responses decreased, though preference for contralateral sounds was preserved. Strikingly, evoked responses increased after bilateral hearing loss, returning to the magnitude observed prior to hearing loss. There was also an increased preference for sounds at the midline, rather than in contralateral space. I will follow up on this work by recording from more mice during both passive sound presentation and task performance across hearing loss conditions to determine whether these changes in sound evoked response and spatial representation have behavioral effects. By doing so, we will understand what effect these changes have on recovery after hearing loss and whether they are adaptive for spatial hearing.
Jasmine Hect, Kyle Rupp and Taylor Abel
Topic areas: correlates of auditory behavior/perception neural coding
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Auditory-responsive cortex extends beyond primary and belt regions to include the inferior frontal gyrus (IFG), yet the representational role of these frontal responses remains poorly understood. One possibility is that IFG encodes distinct acoustic features in parallel with canonical auditory areas; alternatively, its responses may reflect downstream processing of abstracted representations formed earlier in the auditory hierarchy. Clarifying this distinction is critical for understanding the functional architecture of auditory cognition. To investigate, we recorded intracranial stereo-EEG from 31 neurosurgical patients listening to a diverse set of naturalistic sounds spanning speech, music, and environmental categories. A subset of IFG channels (primarily localized to pars triangularis) exhibited early response latencies comparable to those in superior temporal gyrus (STG), motivating a targeted analysis of their tuning properties. We estimated spectrotemporal receptive fields (STRFs) using the maximally informative dimensions (MID) method, identifying the acoustic features that best predicted high-gamma activity. Data-driven clustering revealed recurring frequency and temporal tuning motifs across the auditory-responsive network. While STRFs were less reliably modeled in IFG than in lower-order areas, they exhibited tuning resembling canonical motifs found in temporal cortex. Silhouette analyses confirmed that IFG channels did not form a distinct tuning profile, suggesting that their auditory responses reflect inherited encoding of complex acoustics. These findings provide evidence that early IFG responses may participate in the auditory ventral stream by preserving canonical feature tuning in service of higher-order categorization.
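The clustering and silhouette analysis can be roughly illustrated as follows; this is not the authors' pipeline, and the clustering method, feature representation, and names are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples

def strf_cluster_silhouette(strfs, is_ifg, n_clusters=5):
    """Cluster flattened STRFs across channels and compare per-channel silhouette
    values for IFG vs. non-IFG channels; similar values suggest IFG does not form
    a distinct tuning profile. `strfs` is channels x freqs x lags, `is_ifg` a boolean mask."""
    X = strfs.reshape(strfs.shape[0], -1)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    sil = silhouette_samples(X, labels)
    is_ifg = np.asarray(is_ifg, dtype=bool)
    return sil[is_ifg].mean(), sil[~is_ifg].mean()
```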
Elaida Dimwamwa, Per-Niklas Barth and David Schneider
Topic areas: thalamocortical circuitry and function
Fri, 11/14 4:30PM - 6:00PM | poster
Abstract
Whether it is the intended chime of a doorbell that we press, the unintended thud of our foot on the floor as we walk, or the trial-and-error process of learning to play an instrument, we constantly hear and learn to expect sounds caused by our actions. Such learned expectations for the sensory consequences of our actions require an internal model that integrates motor and sensory information. It has previously been shown that deep layer neurons of the primary auditory cortex (A1) integrate auditory information with motor information from areas such as secondary motor cortex (M2). Further, with learning of a motor-sensory association, movement-related activity in A1 remaps to represent the identity and timing of an expected, self-generated sound. However, the cellular and synaptic basis for the formation and expression of such an internal model remains unknown. Here, we hypothesize that the internal model for self-generated sounds lies within the plastic synapses of deep layer A1 neurons, specifically the synapse between M2 neurons and layer 6 corticothalamic (L6CT) neurons in A1. To test this hypothesis, we trained mice to produce a simple forelimb lever press coupled to a predictable sound and made translaminar recordings from A1, including optogenetically identified L6CT neurons. In ongoing experiments, we are testing the extent to which L6CT neuron activity represents the expected acoustic outcome of the movement. Additionally, we are manipulating synaptic plasticity within A1 and are measuring the functional synaptic strength of M2 inputs onto L6CT neurons to determine the role of M2-to-A1 synapses in the formation and expression of an internal model. Overall, these experiments will provide valuable insights into the identity of the motor and sensory neurons that learn internal models as well as the mechanism underlying motor-sensory transformations.