APAN Session Browser 2024
Dan Sanes
Fri, 10/4 9:05AM - 10:00AM | S1
Abstract
Liberty Hamilton
Fri, 10/4 2:05PM - 2:30PM | S2
Abstract
David Schneider
Fri, 10/4 3:35PM - 4:00PM | S3
Abstract
Jennifer Lawlor, Melville Wohlgemuth, Cynthia Moss and Kishore Kuchibhotla
Fri, 10/4 12:00PM - 12:45PM | T1
Abstract
Categorical perception of sensory inputs, including human speech, enables adaptive behavior and is thought to emerge in the sensory cortex. There would be significant computational advantages, however, to functional specialization before cortical processing. To what extent do categorical representations emerge early in the auditory hierarchy? To address this, we conducted two-photon imaging experiments in an animal species that depends heavily on acoustic signal processing and exhibits a rich repertoire of communication calls, akin to human listening and speech. The big brown bat, Eptesicus fuscus, stands out in the animal kingdom for its reliance on sound processing for navigation (through echolocation) and for social interactions (through acoustic communication with conspecifics). Eptesicus fuscus moves in three dimensions through its environment and must rapidly distinguish between acoustic signals used for navigation and those used for social interaction. Here, we used two-photon calcium imaging in the awake big brown bat to enable large-scale (9,446 neurons in five bats), spatially resolved recordings in the inferior colliculus (IC) of animals listening to auditory playbacks. We discovered a novel, superficial tonotopy in the IC that was orthogonal to spatially clustered representations of social and navigation vocalizations. Population decoding revealed sharp boundaries across, but not within, these categories. Temporally reversing the vocalizations preferentially impacted the encoding of conspecific vocalizations, suggesting that IC neurons are sensitive to the ethological relevance of the stimuli. To examine the categorical nature of single neurons, we created four acoustic morphing continua that transition between social and navigation exemplars. We found that a substantial fraction of recorded neurons responded in a categorical manner to one of those continua. Auditory models for perceptual categorization rely on the idea that the periphery and midbrain serve a mostly feedforward, filter-bank role. Our data support a revised view of neural categorical representation in which ethologically relevant sensory streams are spatially segregated early in the auditory hierarchy and provide parallel channels of categorical ‘primitives’ to downstream regions.
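For readers who want a concrete sense of the population-decoding step described above, the following minimal sketch shows a cross-validated linear decoder of call category applied to a pseudo-population response matrix. The data are synthetic and the logistic-regression decoder is an assumption for illustration; the abstract does not specify the classifier used.

# Hedged sketch: cross-validated decoding of call category from
# pseudo-population responses. All data here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_neurons, n_trials = 200, 120          # hypothetical sizes
labels = rng.integers(0, 2, n_trials)   # 0 = social call, 1 = navigation call

# Synthetic responses: a small category-dependent mean shift plus noise.
category_axis = rng.normal(size=n_neurons)
X = rng.normal(size=(n_trials, n_neurons)) + 0.5 * np.outer(labels, category_axis)

decoder = LogisticRegression(max_iter=1000)
acc = cross_val_score(decoder, X, labels, cv=5)
print(f"cross-validated category decoding accuracy: {acc.mean():.2f}")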
Lucas Vattino, Maryse Thomas, Cathryn MacGregor, Christine Junhui Liu, Carolyn Sweeney and Anne Takesian
Fri, 10/4 12:00PM - 12:45PM | T2
Abstract
Inhibitory interneurons in neocortical layer 1 (L1-INs) regulate cortical plasticity and learning, but the long-range and local cortical inputs that control the activity of these INs have not been characterized. L1-INs can be subdivided into two major classes by the expression of either neuron-derived neurotrophic factor (NDNF) or vasoactive intestinal peptide (VIP). These INs are thought to integrate sensory and neuromodulatory signals from the diverse long-range axons that populate L1. L1-INs also make inhibitory synaptic contacts with their neighbors and are connected electrically through gap junctions, suggesting that they may form complex networks. Here, we combined anatomical tracing, in vitro electrophysiology, and in vivo two-photon imaging experiments in the mouse auditory cortex to understand how specific sensory inputs recruit the L1-INs. We used monosynaptic retrograde tracing and whole-cell electrophysiology to characterize the thalamic inputs onto VIP and NDNF L1-INs within the auditory cortex. We find that the vast majority of auditory thalamic inputs to these L1-INs unexpectedly arise from the ventral subdivision of the medial geniculate body (MGBv), the tonotopically-organized primary auditory thalamus. These L1-INs receive robust functional monosynaptic MGBv inputs that are comparable in strength and short-term plasticity to those in the L4 excitatory pyramidal neurons. Activation of these thalamic axons drives robust feed-forward inhibition onto both L1-IN subtypes, but differences in feed-forward inhibition between these subtypes suggest that they receive distinct sources of inhibitory inputs. To interrogate the synaptic connectivity between the L1-IN subtypes, we performed fluorescence-guided whole-cell electrophysiology while optogenetically activating VIP or NDNF L1-INs. We found that GABAA-mediated synaptic connections between NDNF neurons were significantly stronger than those between VIP neurons or other L1-INs, suggesting a robust recurrent inhibitory network within the NDNF IN subpopulation. To understand how the connectivity between neighboring NDNF L1-INs impacts in vivo sound processing, we activated ensembles of NDNF L1-INs in awake mice while recording the activity of the NDNF L1-IN network using two-photon calcium imaging. Strikingly, activation of small NDNF L1-IN ensembles (
Alexandria Lesicko, Erin Michel, Autumn Soots, Kehinde Adeaga and Maria Geffen
Fri, 10/4 12:00PM - 12:45PM | T3
Abstract
Real-world auditory behaviors, such as vocalization, sound-driven defense behavior, and orienting movements, often extend beyond passive listening and involve complex audio-motor and multisensory integration. Within the central auditory system, the inferior colliculus (IC) serves as an obligatory relay station and massive convergence center for auditory information. In addition to its role in sound processing, the IC receives input from diverse multisensory and neuromodulatory structures and is implicated in a variety of such acoustico-motor behaviors. However, little is known about the representation of somato-motor signals within the IC and their functional role in auditory behavior. In this study, we performed two-photon imaging in the IC while recording the spontaneous movements of head-fixed mice and presenting a variety of sound stimuli. Video recordings were analyzed using FaceMap and DeepLabCut and neurons that were responsive to either sound, movement, or both were parsed using a generalized linear model. We found that movement was robustly encoded in the IC, with movement-responsive neurons surprisingly outnumbering sound-responsive neurons. Most movement-responsive neurons responded to facial or ear movements, with fewer responding to movements of the limbs or trunk, and many neurons encoded movement from multiple body regions. To further investigate how somato-motor inputs affect auditory behavior, we trained mice to perform a go/no-go sound detection task in which they lick for a water reward during the presentation of a noise target stimulus. Once mice reached a performance criterion of 80% correct on behavioral training, the amplitude of the noise target was systematically varied to determine their detection thresholds and they were moved into a testing phase in which somato-motor inputs to the IC were optogenetically activated on a subset of trials. Preliminary data showed that activation of somato-motor inputs to the IC led to a decrease in performance accuracy on the sound detection task. Finally, we performed anterograde trans-synaptic labeling of neurons in the IC that receive somato-motor input. We found axon fibers from trans-synaptically labeled IC neurons in the medial geniculate body, laterodorsal tegmental nucleus, contralateral lateral cortex of the IC, and the superior colliculus, suggesting that IC neurons that receive somato-motor input project to regions involved in auditory and motor processing. Together, these experiments demonstrate robust somato-motor encoding and modulation of auditory information in the IC.
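The movement/sound parsing described above relies on a generalized linear model; the sketch below illustrates the general idea with a Poisson GLM relating a single neuron's frame-by-frame activity to a sound-onset regressor and a facial-motion regressor. All data and regressor names are synthetic placeholders, not the study's actual predictors.

# Hedged sketch: a Poisson GLM relating a neuron's activity to sound and
# movement regressors, in the spirit of the model-based parsing above.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_frames = 3000
sound = (rng.random(n_frames) < 0.1).astype(float)     # sound-on indicator
face_motion = rng.gamma(2.0, 1.0, n_frames)            # stand-in for FaceMap motion energy

X = sm.add_constant(np.column_stack([sound, face_motion]))
rate = np.exp(-1.0 + 0.8 * sound + 0.2 * face_motion)  # simulated ground-truth rate
counts = rng.poisson(rate)

glm = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
print(glm.params)     # weights on [const, sound, movement]
print(glm.pvalues)    # significance of each predictor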
Nathan A. Schneider, Michael Malina and Ross S. Williamson
Fri, 10/4 2:30PM - 3:15PM | T4
Abstract
Auditory-guided behavior is a fundamental aspect of our daily lives, whenever auditory information guides our decisions and actions. Nestled amongst several populations, extratelencephalic (ET) neurons reside in the deep layers of auditory cortex (ACtx) and provide a primary means of routing auditory information to sub-cortical targets associated with decision-making and action. To investigate the behavioral role of L5 ET neurons, we trained head-fixed mice to categorize the rate of sinusoidal amplitude-modulated (sAM) noise bursts as either fast or slow to receive a water reward. We then used two-photon calcium imaging alongside selective GCaMP8s expression to monitor the activity of L5 ET, as well as layer (L)2/3 and L5 intratelencephalic (IT) populations. L5 ET neurons significantly changed their stimulus tuning across learning. Longitudinal recordings revealed that these neurons dynamically shifted their responses to selectively encode the slow and fast categories of the trained stimuli. This categorical selectivity correlated with performance and was completely absent in untrained mice. In trained mice, these L5 ET neurons showed notably weaker selectivity during passive listening, suggestive of top-down modulatory input. Furthermore, decoding analyses revealed a robust representation of category identity in the L5 ET population which grew with learning. Categorical information was weaker and stayed relatively constant in both L2/3 and L5 IT populations, implicating L5 ET cells specifically in the acquisition of auditory categories. Behavioral choice could also be robustly predicted from L5 ET activity. Choice activity grew with learning and preceded motion onset, emphasizing that these signals were separate from motor activity. Choice signals were only weakly present in L2/3 and L5 IT populations and did not change across learning. However, while L5 IT neurons did not show categorical selectivity at stimulus onset, they did display categorical selectivity following the animal’s choice. This effect was only present in the earliest days of learning, hinting at a role for these neurons in early association learning of auditory stimuli. Together, these results suggest that ACtx projection neuron sub-types differentially encode behaviorally relevant stimuli throughout learning, emphasizing the divergent pathways from ACtx and their contributions to auditory-guided behavior.
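As one way to picture the categorical selectivity analysis mentioned above, the sketch below computes a simple d'-style category-selectivity index per neuron from trial-averaged responses to the fast and slow sAM categories. The data are simulated, and the exact metric used in the study may differ.

# Hedged sketch: a per-neuron category-selectivity index (d'-like)
# comparing responses to "fast" vs "slow" sAM categories. Synthetic data.
import numpy as np

rng = np.random.default_rng(2)
n_neurons, n_trials = 150, 80
fast = rng.normal(1.0, 1.0, (n_trials, n_neurons))   # responses on fast-category trials
slow = rng.normal(0.6, 1.0, (n_trials, n_neurons))   # responses on slow-category trials

pooled_sd = np.sqrt(0.5 * (fast.var(0, ddof=1) + slow.var(0, ddof=1)))
selectivity = (fast.mean(0) - slow.mean(0)) / pooled_sd
print("mean |selectivity|:", np.abs(selectivity).mean())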
Megan Kirchgessner, Mihir Vaze and Robert Froemke
Fri, 10/4 2:30PM - 3:15PM | T5
Abstract
The postnatally developing brain undergoes tremendous structural as well as functional changes, including in how individual neurons and populations of neurons in the cortex respond to environmental stimuli (Katz and Shatz Science 1996; Froemke Annu Rev Neurosci 2015). Such developmental changes in sensory coding have been difficult to measure and quantify due to challenges in applying in vivo recording methods to small and physically growing brains. Here, we present results from longitudinal two-photon calcium imaging of hundreds of excitatory layer 2-4 neurons in the developing primary auditory cortex of mouse pups (N=9, mean=351 neurons per animal), starting from just after ear-opening at postnatal day 12 (P12) into adulthood (up to P60) in the same animals. Auditory cortical excitatory neurons transition from highly-correlated, sound-independent activity patterns to decorrelated, sound-evoked activity in response to pure sine-wave tones, as well as frequency-modulated sweeps, at P13-14. We tracked individual neurons over days (minimum: 9 days) to weeks (maximum: 7 weeks) of postnatal development. We found that superficial cortical neurons initially have a clear tonotopic organization that expands to encompass a broader range of frequencies at the expense of fine-scale precision within our imaging fields of view. Individual neurons’ best frequencies tended to remain stable on a day-to-day basis, while some neurons exhibited gradual drift towards nearby, predominantly higher-frequency representations. Additionally, we observed that neuronal responses to playbacks of ultrasonic pup vocalizations (USVs) typically emerged a few days after auditory response onset, at P16-18. These responses were initially correlated with high-frequency spectral tuning, which contrasts with the organization of vocalization responses in the adult mouse auditory cortex (Galindo-Leon et al. Neuron 2009; Marlin et al. Nature 2015). USV responses then seemed to disperse across the whole of the imaging field, so that local best frequencies were uncorrelated with vocalization responses in older animals. Altogether, these data reveal how single-cell sensory-evoked activity in the auditory cortex – including to a spectrotemporally complex and naturalistic stimulus, USVs – emerges and changes across postnatal development.
Joel I Berger, Alexander J Billig, Phillip E Gander, Sukhbinder Kumar, Kirill V Nourski, Christopher K Kovach, Ariane E Rhone, Christopher M Garcia, Hiroto Kawasaki, Matthew A Howard and Timothy D Griffiths
Fri, 10/4 2:30PM - 3:15PM | T6
Abstract
In order to perceive the world around us, our brains have to keep scenes in memory and detect when scenes change. Recent intracranial work utilizing visual paradigms has begun to elucidate the specific neural mechanisms involved in these processes, highlighting contributions of structures such as the hippocampus and cingulate cortex. However, it is not yet clear whether the same mechanisms are involved in the processing of acoustic stimuli. Here, we report human intracranial recordings of single neurons from various brain structures while participants performed separate tasks involving either auditory working memory or auditory boundary detection. We intentionally used non-verbal stimuli to isolate these processes and avoid potential confounds of including semantic information. For the working memory task, participants were required to keep in mind a target tone on each trial and then – following a delay period – match a repeated tone to the frequency of the target. For the boundary detection task, participants listened to concatenated texture sections, each lasting several seconds and consisting of tone glides randomly overlapping in time and frequency. Glide direction and frequency excursion were fixed within a section, but one or both parameters changed from section to section – thus creating acoustic boundaries that did not vary in their overall energy or spectrum. Participants listened passively to the stimuli and were then required to listen again while detecting boundaries between acoustic events. Single neurons were isolated offline using an automated procedure with manual curation. We found neurons within the hippocampus and cingulate whose firing rates were modulated at various phases of the working memory task, including throughout the delay period and during active tone adjustment. Often, these neurons showed suppressed rather than increased activity, though there was heterogeneity in response types across the population. Additionally, for the first time, we demonstrate cells in the hippocampus that respond to acoustic boundaries only when participants are actively engaged in detecting these events. Overall, these data implicate the hippocampus and cingulate cortex in auditory event segregation and working memory. Studies are ongoing to determine how widespread these processes are and how they relate to activity within auditory cortex.
Chen Lu and Jennifer Linden
Fri, 10/4 10:15AM - 12:00PM | A01
Abstract
Introduction: People with 22q11.2 Deletion Syndrome (22q11.2DS) have a high risk of psychiatric disorders such as schizophrenia and of hearing impairment from middle ear problems. Both hearing impairment and genetic risk for schizophrenia are thought to shift "cortical excitation-inhibition balance" towards excitation, but the definition, nature and impact of this shift are debated. Here, we used the Df1/+ mouse model of 22q11.2DS to ask whether hearing impairment and the 22q11.2 deletion have similar or different effects on activity of neuronal populations in the auditory cortex. Like human 22q11.2 deletion carriers, Df1/+ mice have high rates of hearing impairment from middle ear problems (Fuchs et al., 2013). Previous work has revealed abnormalities in cortical auditory evoked potentials in Df1/+ mice with and without hearing impairment (Lu and Linden, 2023). However, no studies have compared the effects of the 22q11.2 deletion and hearing impairment on neuronal population activity in the auditory cortex.
Methods: To mimic the middle ear problems affecting a subset of Df1/+ mice (n=6 hearing impaired, n=4 normal hearing), we performed ear surgery on WT mice at P11, removing the malleus bone (n=5) or creating sham controls (n=6). Auditory brainstem response threshold measurements at ages 4, 6 and 8 weeks confirmed that malleus removal produced hearing impairment in WT mice comparable to that observed in affected Df1/+ mice. Then, we recorded spiking activity of auditory cortical neurons (Df1/+: 1122 neurons; WT: 1772 neurons) using Neuropixels probes in awake, head-fixed mice listening passively to 16 kHz tones at sound levels adjusted relative to hearing threshold. To test cortical adaptation and excitability, we measured changes in firing rates of neurons that were significantly responsive to 16 kHz tones (Df1/+: 842 neurons; WT: 927 neurons) as we increased inter-tone interval (from 200 to 1000 ms) and relative sound level (from threshold to threshold + 30 dB).
Results: Analysis of single-unit neuronal population activity revealed that growth of tone-evoked firing rates with increasing inter-tone interval was abnormally high in Df1/+ mice compared to WT mice with similar hearing thresholds, and decreased with the severity of hearing loss (two-way ANOVA, p(genotype) = 0.003, p(hearing) < 0.001, p(genotype × hearing) = 0.79). In contrast, growth of tone-evoked firing rates with increasing sound level (relative to threshold) was abnormally low in mice with poor hearing sensitivity, but was not affected by genotype (p(genotype) = 0.81, p(hearing) = 0.002, p(genotype × hearing) = 0.79). Phase coherence of the tone-evoked LFP was also robustly elevated in mice with hearing impairment regardless of their genetic status. Surprisingly, spontaneous firing rate was not affected by hearing impairment, and was lower in Df1/+ than WT mice (p(genotype) = 0.006, p(hearing) = 0.65, p(genotype × hearing) = 0.88). These results were robust across different time intervals for measuring evoked or spontaneous firing rate; different normalizations of evoked activity; and different significance levels for tone responsiveness of neurons included in analysis.
Conclusions: We conclude that hearing impairment and the 22q11.2 deletion have distinctive and only partly overlapping effects on neuronal population activity in auditory cortex, suggesting that shifts in cortical excitation-inhibition balance towards excitation can manifest in multiple ways at the level of neuronal population activity.
References:
Fuchs JC, Zinnamon FA, Taylor RR, Ivins S, Scambler PJ, Forge A, Tucker AS and Linden JF (2013). Hearing loss in a mouse model of 22q11.2 Deletion Syndrome. PLOS ONE 8(11): e80104. doi:10.1371/journal.pone.0080104.
Lu C and Linden JF (2023). Auditory evoked-potential abnormalities in a mouse model of 22q11.2 Deletion Syndrome and their interactions with hearing impairment. bioRxiv, doi:10.1101/2023.10.04.560916.
Acknowledgements: This research was funded by the UCL Institute for Mental Health (Small Grant 2022 to JFL) and the UK Medical Research Council (MR/P006221/1 to JFL).
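The genotype × hearing statistics reported in the abstract above come from a two-way ANOVA; the sketch below shows how such an analysis can be run on a per-animal outcome measure. The data frame here is simulated and the variable names are illustrative only.

# Hedged sketch: a two-way ANOVA with genotype and hearing-status factors,
# mirroring the genotype x hearing analysis described above. Simulated data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(3)
n = 40
df = pd.DataFrame({
    "genotype": rng.choice(["Df1", "WT"], n),
    "hearing": rng.choice(["impaired", "normal"], n),
})
# Simulated outcome, e.g., growth of evoked rate with inter-tone interval.
df["rate_growth"] = (
    rng.normal(1.0, 0.3, n)
    + 0.4 * (df["genotype"] == "Df1")
    - 0.5 * (df["hearing"] == "impaired")
)

model = ols("rate_growth ~ C(genotype) * C(hearing)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))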
Zhe-Chen Guo, Jacie McHaney, Zilong Xie and Bharath Chandrasekaran
Fri, 10/4 10:15AM - 12:00PM | A02
Abstract
Understanding speech is a complex task involving the rapid mapping of acoustic signals to stored phoneme representations in the brain. Middle age marks a critical period in life when individuals begin to experience various changes including self-reported hearing difficulties, even though they often still exhibit clinically-normal hearing thresholds and cognition. Much prior work focuses on identifying age-related differences in cochlear synaptopathy and low-level auditory encoding, while it remains unclear how middle age impacts more linguistically relevant aspects of speech perception. By modeling and decoding electroencephalographic (EEG) responses to continuous acoustic properties and discrete phonemes in naturalistic speech, we identified a range of phoneme encoding changes in middle-aged listeners. Twenty-four younger (18–25 years old, M = 21.42, SD = 2.02) and 20 middle-aged (40–54 years old, M = 46.05, SD = 4.35) adults with normal cognition and hearing thresholds listened to continuous speech in quiet while EEG was recorded. Multivariate temporal response function (TRF) encoding models predicting neural responses from acoustic representations of the stimuli revealed no significant age group difference in the tracking of spectro-temporal acoustic features. To further examine categorical phoneme representations, we averaged EEG responses to each phoneme to derive “phoneme-related potentials” (PRP) and trained a deep neural network classifier (EEGNet) to decode PRPs. In contrast to the TRF analysis, phoneme labels of PRPs were predicted less accurately and with greater uncertainty for middle-aged adults than younger adults, suggesting less precise phoneme encoding. Interpreting features learned by the PRP classifier revealed that phoneme processing in middle-aged listeners is delayed and recruits a broader network of neural resources. Finally, through representational similarity analysis, we found that middle-aged adults’ PRPs aligned less well with a phonological featural description of phonemes, indicating less robust phoneme encoding at the feature level. Together, the findings suggest that despite not impacting neural tracking of lower-level acoustics, middle age is associated with significant declines in maintaining phoneme representations. The declines may be linked to fuzzier representations of important phonemic features in the auditory cortex, which could cascade to downstream processes and contribute to greater listening effort.
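The representational similarity analysis mentioned above can be sketched as a rank correlation between a neural dissimilarity matrix built from phoneme-related potentials and a dissimilarity matrix built from phonological features. Everything below is synthetic and intended only to illustrate the structure of the comparison.

# Hedged sketch: representational similarity analysis comparing a neural
# RDM (from phoneme-related potentials) with a phonological-feature RDM.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(4)
n_phonemes, n_channels, n_features = 30, 64, 12

prp = rng.normal(size=(n_phonemes, n_channels))           # phoneme x EEG-channel responses
features = rng.integers(0, 2, (n_phonemes, n_features))   # binary phonological features

neural_rdm = pdist(prp, metric="correlation")
feature_rdm = pdist(features, metric="hamming")

rho, p = spearmanr(neural_rdm, feature_rdm)
print(f"neural-feature RDM alignment: rho={rho:.2f}, p={p:.2f}")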
Grant Zempolich and David Schneider
Fri, 10/4 10:15AM - 12:00PM | A03
Abstract
Identifying mistakes is important for improving performance during acoustic behaviors like speech and musicianship. Although hearing is instrumental for monitoring and adapting these behaviors, the neural circuits that integrate motor, acoustic, and goal-related signals to detect errors and guide ongoing sensorimotor adaptation in mammals remain unidentified. Here, we develop a novel closed-loop, sound-guided behavior that requires mice to use real-time acoustic feedback to guide skilled ongoing forelimb movements. Large-scale electrophysiology recordings reveal that the mouse auditory cortex integrates information about sound and movement and encodes error- and learning-related signals during this sound-generating behavior. Distinct groups of auditory cortex neurons signal different error types, and the activity of these neurons predicts both within-trial and across-trial behavioral adaptations. Brief, behavior-triggered optogenetic suppression of auditory cortex during error signaling hinders behavioral corrections on both rapid and long time scales, indicating that cortical error signals are necessary for skilled acoustic behaviors. Together, these experiments identify a cortical role for detecting errors and learning from mistakes and suggest that the auditory cortex plays a critical role in skilled, sound-generating behavior in mammals.
Prajna Bk, Natalie Gustafson and Justin M Aronoff
Fri, 10/4 10:15AM - 12:00PM | A04
Abstract
Binaural fusion is a perceptual phenomenon where the auditory signals received by the two ears are integrated to form a single auditory percept. In both typical hearing (TH) and bilateral cochlear implant (CI) user populations, fusion is potentially affected by two parameters: interaural correlation (IC; measured as the normalized cross-correlation, i.e., the extent to which signals are correlated across the ears) and interaural cochleotopic asymmetry (ICA; i.e., the extent to which the stimulation delivered is offset from the matched cochleotopic locations across ears). An increase in ICA or a decrease in IC negatively impacts fusion; however, it is not clear whether, when both occur concurrently, they have a combinatorial effect on binaural fusion. Thus, further investigation is warranted to gain a better understanding of the binaural fusion phenomenon. Such an investigation requires independent manipulation of the IC of the binaural signal and the asymmetry of stimulation across the ears, which is achievable only in CI users. The goal of this study was to investigate whether, for bilateral CI users, the combination of decreased IC and increased ICA has a greater effect on fusion than the effect of either factor alone. Twelve Cochlear-brand bilateral CI users participated in a binaural fusion task where the IC and ICA of the stimulation were manipulated independently. The envelope IC of the signal was varied between 0.89 and 1.00. Further, one electrode near the apical end and one near the basal end of the cochlea were each used as a reference, and ICA was varied by separately pairing the reference electrode with 5 electrodes spread across the electrode array in the other ear. The choice of apical and basal ends as reference points allows for a greater maximum magnitude of asymmetry compared to using the middle of the cochlea. Participants indicated their auditory percepts by adjusting a dial to control the intracranial widths, locations, and number of images appearing on a visual display of a schematized head. A measure for fusion was derived directly from the reported image. Fusion-dispersion scores range from 0 to 17, where a score of 0 represents a punctate fused auditory percept, a score of 9 represents a single diffused auditory percept, and a score of 17 represents two separate punctate percepts, one at each ear, indicating a complete lack of auditory fusion. A weak but significant negative correlation was found between the IC of the envelopes and binaural fusion for symmetric stimulation (i.e., when either apical electrodes in both ears or basal electrodes in both ears were used). Similarly, a moderate and significant negative effect of increasing asymmetry was seen on perceived fusion for diotic envelopes (IC = 1). Further, comparing the symmetric and asymmetric conditions (an interaural asymmetry of approximately 2.5 octaves) for diotic envelopes showed that asymmetry led to a significant decrease in binaural fusion (median fusion-dispersion score increased from 4.5 to 10.25). Decorrelating the signal when the stimulation was symmetric across the ears (ICA = 0) also reduced the degree of fusion perceived (median fusion-dispersion score increased from 4.5 to 9.5). Examining the combined effect of IC and ICA, it was observed that decorrelation in the presence of asymmetric stimulation did not produce a significant effect (median fusion-dispersion score increased from 10.25 to 12.25) and the effect size was negligible.
Lastly, while increasing ICA for a decorrelated signal did not produce a significant effect (median fusion-dispersion score increased from 9.5 to 12.25), its effect size was large. In conclusion, a decrease in IC and an increase in ICA both have a negative influence on perceived fusion. However, the results did not yield a further significant decrease in reported fusion when the ICA and IC manipulations were combined. This suggests that the combinatorial effect of these manipulations, if it exists, is inconsistent.
Sunreeta Bhattacharya and Ross Williamson
Fri, 10/4 10:15AM - 12:00PM | A05
Abstract
Foraging in a new environment under constraints of time and energy requires an animal to rapidly learn relevant parameters such as the distribution of resources and travel times between patches. Animals can also use model-free, value-based strategies for foraging, driven by the recent history of successful foraging attempts, but these may prove suboptimal with respect to reward. To probe behavioral and neural correlates of flexible inference and adaptive decision making, a hidden-state foraging task is a commonly used experimental setup for discerning the specific strategies and decision variables used by a forager. Past studies (Vertechi et al 2020, Cazzettes et al 2023, 2024) have shown that mice transition from a value-based or stimulus-bound strategy (early in learning) to an inference-based strategy upon learning task-relevant parameters (later in learning). However, it remains unclear whether the decision behavior is a result of a slow learning process manifesting as inference-based optimal behavior later in learning, or if inference is more rapid and latent in early learning, only to be expressed gradually. The latter is possible because low-confidence estimates of environmental variables (Gershman 2019), misaligned utility functions, and incomplete knowledge of task-relevant and task-irrelevant stochasticity in early stages of learning could cause inference-based decision variables to be suppressed in a competition between multiple strategies. Studies in sensory cortex (Kuchibhotla et al 2019, Drieu et al 2024) suggest that associations are learned more rapidly than performance metrics indicate. To address these questions, we have adopted an ideal observer’s perspective to examine varying levels of decision and inference noise that lead to suboptimal behavior despite the underlying strategy being inference-based. Furthermore, we describe the effect of a predictive auditory cue on an agent’s reliance on previously learned task statistics. Our current efforts are focused on probing the representations of environmental and task-specific parameters in frontal (orbitofrontal and anterior cingulate) and primary auditory cortex, and their dynamic interactions, to understand how animals integrate past knowledge with new sensory information for optimal foraging decisions, with the aim of elucidating the mechanisms of adaptive decision-making and the neural substrates that support flexible inference in complex environments.
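As a minimal illustration of the ideal-observer perspective described above, the sketch below updates a belief about a hidden patch state after each foraging attempt and switches when the belief crosses a threshold. The probabilities and threshold are arbitrary assumptions, not parameters from the task.

# Hedged sketch: an ideal-observer belief update for a hidden-state foraging
# task. The agent tracks P(site is active) after each failed attempt and
# switches when the belief drops below a threshold. Illustrative values only.
p_reward_if_active = 0.8     # assumed reward probability when the site is active
p_switch_threshold = 0.2     # leave when belief falls below this

belief = 0.9                 # prior that the current site is active
for attempt, rewarded in enumerate([False, False, False], start=1):
    likelihood = p_reward_if_active if rewarded else (1 - p_reward_if_active)
    # Bayes rule over two hidden states (rewards only occur at active sites).
    belief = belief * likelihood / (belief * likelihood + (1 - belief) * (0.0 if rewarded else 1.0))
    print(f"attempt {attempt}: belief site is active = {belief:.2f}")
    if belief < p_switch_threshold:
        print("inference-based policy: switch sites")
        break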
Shailee Jain, Rujul Gandhi, Matthew K. Leonard and Edward F. Chang
Fri, 10/4 10:15AM - 12:00PM | A06
Abstract
Speech is a dynamic acoustic signal that requires listeners to continuously extract and integrate information at multiple timescales. Prior studies have shown that local neural populations (Bhaya-Grossman & Chang, 2022; Yi, Leonard, & Chang, 2019) and single neurons in human superior temporal gyrus (STG) (Leonard, Gwilliams et al., 2023) encode many different phonological and linguistic speech features, including acoustic-phonetic content, prosodic cues like pitch and intensity, and sequence information like phoneme and word-level surprisal. However, given that speech understanding involves more than simply identifying individual speech sounds, a key question is how these sounds are represented in the context of the words and phrases listeners perceive. Here, we used high-density Neuropixels probes to address how single neurons in STG encode speech features in a context-dependent manner. We built context-sensitive encoding models that learned to predict the spiking activity of each neuron using hidden states of a deep neural network (DNN) pretrained on speech. We chose speech DNNs as they can extract continuous, contextual features of their inputs across multiple timescales. To investigate the degree of context-sensitivity across neurons, we varied both the amount of prior context available to the DNN (20-1000ms) and the layer from which states were extracted. Overall, the DNNs better predicted neuronal firing than context-independent, linguistically-motivated speech features like consonants, vowels, pitch, and intensity. These improvements were independent of the speech feature a given neuron tuned to. The DNNs also outperformed both nonlinear spectrotemporal receptive field models (Keshishian et al., 2020) and randomly initialised speech DNNs, suggesting that their performance was not simply driven by their nonlinear computations, high dimensional representations, or model architecture. These effects were observed across all Neuropixels recording sites, showing that neurons throughout STG and across cortical layers can be well modelled by continuous, contextual DNN features. To better understand the nature of this encoding, we next investigated how each neuron’s prediction performance varied with the amount of prior context provided to the DNN and the layer from which hidden states were extracted. We applied linear dimensionality reduction methods to the encoding performance of neurons across the different DNN feature spaces. The primary source of variation across neurons was in their sensitivity to the amount of context, with some neurons showing an improvement in performance with more context, while others showed a decrease or no change at all. This suggests that neurons across STG capture contextual information at many different temporal scales. Selectivity for a specific DNN layer was a secondary source of variation across neurons, with some neurons preferring deeper DNN layers that have been shown to encode higher-order features like acoustic-phonetic content while others showed no strong selectivity or a preference for shallower layers that encode spectral modulations and envelope magnitude (Vaidya, Jain, & Huth, 2022; Pasad, Chien, Settle, & Livescu, 2024). This suggests that STG neurons also have substantial variability in feature tuning that may be orthogonal to the diversity in their context sensitivity. 
Lastly, given the heterogeneity in both tuning and context-sensitivity across neurons, we hypothesised that population-level neural activity could capture an integrated representation of the speech input at the timescale of perceptually-meaningful units like words and phrases. Using a decoding analysis, we found that populations with long context-sensitivity could faithfully represent the spectral content of speech over timescales consistent with higher-order word and phrase-level information (∼1sec). Overall, our results suggest that single neurons in human STG encode speech in a context-dependent manner. The substantial heterogeneity of neurons in both feature tuning and context-sensitivity likely enables local populations in this brain region to track multiple levels of speech content rapidly and in parallel.
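The encoding-model approach described above can be sketched as a cross-validated ridge regression from DNN hidden states to a neuron's spike counts. The hidden states below are random stand-ins for features from a pretrained speech model, and the ridge-regression choice is an assumption for illustration rather than the study's exact method.

# Hedged sketch: predicting a neuron's firing from DNN hidden states with
# cross-validated ridge regression. All inputs are synthetic placeholders.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_timebins, n_dnn_units = 5000, 256

H = rng.normal(size=(n_timebins, n_dnn_units))        # DNN hidden states per time bin
w_true = rng.normal(size=n_dnn_units) * (rng.random(n_dnn_units) < 0.05)
spikes = rng.poisson(np.exp(0.2 * H @ w_true - 1.0))  # synthetic single-neuron counts

model = RidgeCV(alphas=np.logspace(-2, 4, 13))
r2 = cross_val_score(model, H, spikes, cv=5, scoring="r2")
print(f"held-out prediction R^2: {r2.mean():.2f}")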
Alejandra M. Husser, Besim Prenaj, Christopher J. Ritter, András Jakab, Huw Swanborough and Alexis Hervais-Adelman
Fri, 10/4 10:15AM - 12:00PM | A07
Abstract
Around 24 weeks gestational age, the foetal auditory system is responsive to vibration, allowing perception of the acoustical environment outside the womb, a capability that substantially increases in sophistication during the third trimester of pregnancy. Numerous studies have demonstrated that the prenatal auditory and linguistic environment influences postnatal capacities, such as the discrimination of speech in the mother tongue from that in a foreign language, speech signals from non-speech tonal segments, or prosodic variations. Recent findings reveal that infants' earliest cries carry a pitch accent reflecting the language of their gestational environment. This suggests that articulatory patterns influenced by the auditory environment start to develop in utero, before vocal production begins. We aim to investigate the neuronal basis of the audiomotor network supporting those early linguistic expressions and their early development. Laryngeal control is part of orofacial coordination and key for early speech articulation, mainly characterised by prosodic variations. We thus focus on the identification and functional integration of the laryngeal motor cortex (LMC) in the early cerebral language network. Orofacial coordination, including laryngeal control, is mainly represented in the lower part of the motor cortex. The LMC consists of a dorsal and a ventral part, the dorsal LMC being particularly relevant for prosodic aspects of speech, while the ventral LMC is commonly related to speech fluency. The Developing Human Connectome Project (dHCP) is a unique database of foetal and neonatal brain images acquired with magnetic resonance imaging (MRI). We used the functional resting-state data from 567 neonates to conduct functional connectivity analysis and derive network properties of language related brain regions. Newborns’ MRI data were acquired between 26 and 44 gestational weeks. The dataset includes preterm and term-born infants, defined by their gestational age at birth and at scan time being before or after 37 weeks of gestation, respectively. Region-to-region connectivity analysis was applied and guided the subsequent selection of seeds for the seed-to-voxel analysis. Linear regression analysis then aimed to identify the developmental trajectory of the network organisation. Connectivity analysis with semipartial standardised correlations (p-FDR < 0.05) shows specific associations between ventral motor areas and primary auditory areas, such as the medial superior temporal gyrus (mSTG). Frontal language areas, such as the inferior frontal gyrus (IFG), demonstrate more extensive ties with the ventral motor cortex, including more dorsal sections of orofacial areas. Refined seed-to-voxel analyses with the IFG, mSTG and parts of the motor cortex as seeds replicate the previous findings and further reveal specific associations between individual parts of the motor cortex and language areas, and with subcortical brain regions such as the insula, the cerebellum, the basal ganglia, the thalamus and the brainstem (cluster corrected p-FDR > 0.05). Linear regression analysis indicates a significant increase of overall connectivity strength between motor and language areas with gestational age. We identify a significant gender effect where female newborns have a significantly higher increase in inter-hemispheric connections with age compared to their male peers. 
Functional connectivity patterns computed from neonatal resting-state data reveal specific associations between ventral motor areas and speech-related brain regions. This suggests a similar location of the ventral LMC in the neonatal brain to that in adults. Connectivity patterns of individual motor regions provide some indices for the localisation of the dorsal LMC in dorsal parts of the ventral orofacial cortex, though results are less strong. Connectivity patterns of areas in between the assumed locations of the ventral and dorsal LMC suggest representations of other orofacial movements such as jaw coordination, supporting the dual LMC representation. Additionally, the association of dorsal LMC areas with supplementary motor areas, subcortical structures and the cerebellum overlaps with research in adults. Although our findings reveal a certain differentiation between ventral and dorsal LMC in neonates, further investigations are needed to support the clear location and differentiation of the ventral and dorsal LMC in the neonatal brain. The changes in connectivity strength between motor and language areas during gestation point to a risk of immature vocal control in children born prematurely. This is in line with numerous findings of cerebral and cognitive deviations in this population. Investigations in foetal brains could further illuminate the differences between in utero and ex utero brain development and their implications for early vocal control. Overall, our findings support the involvement of the laryngeal motor cortex in the neonatal language network, allowing an early development of vocal control. Future investigations will help refine the identification of the laryngeal motor cortex in young brains and extend our understanding of its developmental trajectory.
Corey Roach, Lalitta Suriya-Arunroj, Sophia Fu, Harry Shirley, Joshua Gold and Yale Cohen
Fri, 10/4 10:15AM - 12:00PM | A08
Abstract
Perceptual decisions can be influenced strongly by learned cued expectations, but exactly how and where in the brain this influence is exerted is not well understood. One possibility is that ‘higher-order’ brain areas like the prefrontal cortex, which encode learned rules and expectations, apply top-down modulation to sensory areas like the primary auditory cortex (AC) to shape perceptual decision-making. To test this idea, we trained two animals to perform an auditory discrimination task in which they reported (via joystick) whether a target tone was a ‘high’ or ‘low’ frequency. We titrated task difficulty by embedding the target in various levels of background noise. We preceded target onset by either an informative visual prior or a non-informative acoustic prior. While the monkeys performed the task, we simultaneously recorded from the ventrolateral prefrontal cortex (vlPFC) and AC using 24-channel linear arrays positioned perpendicularly across cortical layers. We analyzed the spectral properties of the local field potential (LFP) in both areas and quantified interareal communication (LFP-LFP coherence, Granger causality, and cross correlation) across different task epochs. We hypothesized that if a top-down signal from vlPFC was modulating AC responses, an evoked spectral event, likely in lower-frequency bands (theta-beta), would reliably occur between the onset of the LED and presentation of the target tone. Moreover, this event would be correlated with: a) a change in AC responsiveness to the test tone, b) increased coherence, and c) elevated Granger signaling in the PFC-to-AC direction. We found that correct trials in which the informative visual prior was presented were characterized by greater theta and beta coherence in the superficial and upper-middle layers of AC. Directional metrics indicate that this change in coherence was driven by information arriving from vlPFC during the presentation of priors. These preliminary findings support the idea that top-down influences on auditory perceptual decision-making involve context-dependent communication from vlPFC to AC.
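For the interareal LFP analyses described above, the sketch below computes spectral coherence between two simultaneously recorded LFP channels and averages it within theta and beta bands. The signals are synthetic and the band edges are illustrative choices.

# Hedged sketch: spectral coherence between two LFP channels (e.g., vlPFC
# and AC), averaged within theta and beta bands. Synthetic signals.
import numpy as np
from scipy.signal import coherence

fs = 1000.0                       # sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(6)

shared = np.sin(2 * np.pi * 6 * t)                    # shared 6 Hz (theta) component
lfp_pfc = shared + rng.normal(scale=1.0, size=t.size)
lfp_ac = 0.8 * shared + rng.normal(scale=1.0, size=t.size)

f, coh = coherence(lfp_pfc, lfp_ac, fs=fs, nperseg=1024)
theta = coh[(f >= 4) & (f <= 8)].mean()
beta = coh[(f >= 13) & (f <= 30)].mean()
print(f"theta coherence: {theta:.2f}, beta coherence: {beta:.2f}")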
Simon Baumann, Samuel Oguntayo, Richard Saunders, Patrik Wikman and Josef Rauschecker
Fri, 10/4 10:15AM - 12:00PM | A09
Abstract
The primate auditory-motor system is a network of auditory, motor, parietal and subcortical regions that form a sensory-motor feedback loop. The network is crucial for the development and maintenance of human speech and music performance, but the system is also relevant for the control of nonhuman primate vocalizations and auditory-guided actions (Rauschecker, 2011). Data from neuroimaging studies have provided evidence for interaction in a number of brain regions during speech and music performance in humans, and we previously highlighted the same network in rhesus macaques during a task requiring the reproduction of tone sequences using levers (‘monkey piano’; Archakov et al. 2020). However, neuroimaging data do not have the temporal and spatial resolution to follow the neuronal interaction patterns between the involved brain areas. Here we present neurophysiological data that have been recorded using multiple, high-density Neuropixels probes across the auditory-motor network in rhesus macaques. The probes allow the monitoring of sensory-motor responses in a large number of neurons (typically 50-150 per probe) across the auditory-motor network. Three rhesus macaques were trained to listen to and reproduce seven tone sequences on a monkey piano (see above). Specific catch trials, such as key presses that result in no sound or the wrong sound, were infrequently interspersed in the tasks in order to provoke error responses. Recording chambers were implanted over the left hemisphere, giving access to the premotor cortex (PMC), the auditory cortex (AC), the posterior parietal cortex (PPC), and the putamen of the basal ganglia (BG) for up to four probes at a time. We are recording from these areas, targeted based on our fMRI data (Archakov et al. 2020), while the animals perform the above auditory-motor tasks, using nonhuman-primate versions of semiconductor (CMOS)-based Neuropixels probes with a 45-mm shank length that provide 384 channels per probe, selectable from 4000 recording sites arranged in double rows along the length of the probe. For each probe, the recording sites are selected from portions of the probe located in one or several auditory-motor areas (e.g., PMC and putamen), based on prior MRI data and pilot probe mapping. From the probes, we are obtaining single-cell data (spike-sorted with Kilosort3), with a particular focus on sensory-motor and error responses, as well as local field potentials (LFPs) for network interaction analysis based on Granger causality, in order to generate a network interaction model.
Wuzhou Yang and Daniel B. Polley
Fri, 10/4 10:15AM - 12:00PM | A10
Abstract
Neurons in the central auditory pathway offer temporal processing par excellence in the brain. The intrinsic, synaptic, and circuit specializations that support high-fidelity, high-speed feature extraction in subcortical auditory centers have been extensively characterized. The other end of the temporal processing spectrum – encoding slowly varying temporal features – is an essential building block for the perception of important features in speech (e.g., prosody), music (e.g., rhythm), and segregation of auditory foreground objects from background sounds but has received far less attention. A recent study from our lab demonstrated an inverse hierarchy for specialized processing of rapid and slowly modulated sound features (Asokan et al., Curr. Biol 2021). This study identified a tradeoff between the inferior colliculus (IC) and primary auditory cortex (A1), with the IC offering excellent resolution of local features and poor sensitivity to slowly emerging features and A1 offering the opposite specialization. Recordings from the ventral subdivision of the medial geniculate body (MGBv) were somewhere in between, offering weak sensitivity to slow rhythmic patterns and middling resolution of rapidly modulated features. Here, we have revisited temporal processing in the auditory thalamus with an expanded emphasis on higher-order thalamic subdivisions, which are known to be among the first sites of time-to-rate conversion for encoding envelope modulation rate and could therefore feature sensitivity to slowly emerging temporal patterns that matches or even exceeds that of the auditory cortex (Bartlett and Wang, J. Neurophys 2011). We performed simultaneous recordings from single units in the MGBv and higher-order auditory thalamus in unanesthetized head-fixed mice while presenting noise bursts separated by inter-burst intervals (IBIs) of different lengths to generate burst trains. We found that thalamic neurons exploited different encoding strategies to represent burst trains with short or long IBIs. Moreover, by arranging the IBIs separating consecutive noise bursts randomly or in a pattern (i.e., a rhythm), we revealed thalamic activity that emerged during the establishment of temporal patterns. Our findings open a door for further investigation of the function of the auditory thalamus in processing complex temporal patterns.
Aneesh Bal, Andrea Santi, Samantha Soto, Patricia Janak and Kishore Kuchibhotla
Fri, 10/4 10:15AM - 12:00PM | A11
Abstract
Humans exhibit remarkable abilities in multi-task learning, utilizing compositionality to rapidly excel at novel tasks by building upon simpler primitives. However, the behavioral and neural mechanisms underlying compositional learning remain unknown. Studying the neurobiological basis of such learning presents unique challenges, including the need to obtain large-scale neural recordings across many unique tasks. Mice emerge as a promising candidate organism, given their amenability to longitudinal neural recordings. However, there is a common belief that mice may have limited cognitive capacity for more complex learning paradigms. To overcome these obstacles, we created the Multi-Task Mouse Playground, an innovative fully-automated behavioral training system that provides mice with a naturalistic environment and full volitional control over their learning process. Using the Mouse Playground, we first trained mice (n=7) on two primitive Go-NoGo tasks sequentially. Mice learned an auditory sweep-direction task (T1), where the S+ was a frequency upsweep and the S- was a frequency downsweep. We subsequently trained these mice on a second Go-NoGo task (T2), in which a short pure tone served as the S+ (40 ms, 6 kHz) and a long pure tone was the S- (160 ms, 6 kHz). Finally, in the compositional task (T3), mice were required to discriminate between four stimuli: short upsweep (coherent, S+), short downsweep (S-), long upsweep (S-), and long downsweep (coherent, S-). Across these task phases, we demonstrate that mice a) exhibit the ability to learn two primitive tasks (T1, T2), b) exhibit high proficiency in learning a compositional task (T3), and c) display remarkably rapid learning in the compositional task on coherent stimulus features derived from the primitive tasks. Furthermore, mice retained high performance on both primitive sub-tasks following exposure to the compositional task, indicating that the compositional task does not interfere with prior learning. These results provide evidence that mice may combine knowledge from primitive tasks to solve a compositional task. Additionally, these results provide evidence that the Mouse Playground is a valuable platform for the automated training of multiple tasks, which we can use for future studies into multi-task learning more broadly.
Alex Clonan, Xiu Zhai, Ian Stevenson and Monty Escabi
Fri, 10/4 10:15AM - 12:00PM | A12
Abstract
Recognizing speech in noise, such as in a busy street or restaurant, is an essential cognitive skill where the task difficulty varies across acoustic environments and noise levels. Although humans use a variety of acoustic cues for sound segregation, the specific natural sound cues and the cognitive strategies for segregating speech in natural auditory scenes are poorly understood. To investigate human speech recognition, we assessed how the spectrum and modulation statistics of natural sounds mask the recognition of spoken digits (0 to 9). We enrolled participants in a psychoacoustic study where digits were presented in various natural background sounds (e.g., water, construction noise, speaker babble; tested at SNRs of -18 to 0 dB) and perturbed variants. We perturbed the backgrounds by either 1) phase randomizing (PR) the sound spectrum or 2) spectrum equalizing (SE). PR retains the power spectrum but distorts the modulation statistics, while SE distorts the power spectrum and retains modulation statistics. At a fixed noise level, accuracy can range from near 0% to 100% depending on the background, and perturbations of summary statistics can mask or unmask speech, thus hindering or improving speech recognition. To further quantify this interference between foreground and background spectrum/modulation content, we used texture synthesis (McDermott & Simoncelli 2011) to manipulate individual modulation statistics from the backgrounds. We provide evidence that background sounds can mask and unmask speech, driven by either the natural sound spectrum or its modulation statistics. To identify the perceptual strategy and specific acoustic cues that influence accuracy, we developed generalized perceptual regression (GPR), a framework which links summary statistics from a hierarchical auditory model to word recognition accuracy. GPR is motivated by conventional regression methods used to estimate neural receptive fields, which relate sensory features to neural spikes (0s or 1s indicating presence or absence of a neural response) for natural sensory stimuli. Here, in lieu of neural spike trains, our approach uses single trial perceptual judgments (0s or 1s, indicating correct and incorrect decisions). The approach is highly modularized and transfers readily to machine perception, animal behavior, and task-specific analyses. Implementing GPR with a midbrain-inspired front end, we found that acoustic summary statistics accurately predict single trial perceptual judgments, accounting for more than 90% of the perceptual variance across backgrounds and noise levels. Furthermore, interpretable perceptual transfer functions from the regression framework identify which natural sound features are influential and how they impact recognition, providing a schema for interpreting perception of diverse natural stimuli that extends to various modalities and behavioral questions. The results indicate that human perception of speech in natural backgrounds involves interference from specific, identifiable summary statistics that are represented in the auditory midbrain (Zhai et al 2020) and account for spectral and modulation masking of speech.
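A minimal sketch of the regression logic behind generalized perceptual regression (GPR) is shown below: a logistic model maps per-trial background summary statistics to single-trial correct/incorrect judgments, and its fitted weights play the role of a perceptual transfer function. The features and data are synthetic placeholders, not the auditory-model statistics used in the study.

# Hedged sketch: a logistic model from background summary statistics to
# single-trial perceptual judgments, in the spirit of GPR. Synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n_trials, n_stats = 2000, 20

S = rng.normal(size=(n_trials, n_stats))      # summary statistics per trial
w = rng.normal(size=n_stats)
p_correct = 1 / (1 + np.exp(-(0.5 + 0.3 * (S @ w))))
correct = rng.random(n_trials) < p_correct    # single-trial judgments (1 = correct)

gpr = LogisticRegression(max_iter=1000)
acc = cross_val_score(gpr, S, correct, cv=5)
print(f"predicting single-trial judgments: accuracy {acc.mean():.2f}")
# The fitted weights (gpr.fit(S, correct).coef_) act as a perceptual
# transfer function over the background statistics.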
Kaley Graves and Daniel Llano
Fri, 10/4 10:15AM - 12:00PM | A13
Abstract
GABAergic neurons in the inferior colliculus (IC) play a crucial role in auditory processing by extracting specific features of sounds (Ono et al., 2005). We have used a Gad67-GFP mouse model, in which green fluorescent protein is expressed endogenously under the GAD67 promoter, to better study the function of these neurons. Unfortunately, this line is bred on a background strain that is known to lose hearing at a very young age. The aim of this study is to develop a mouse model that allows for the functional examination of GABAergic neurons while retaining largely normal hearing throughout the lifespan, resembling that of the gold-standard mouse strain most used in hearing research. This study additionally aims to understand the mechanisms that underlie hearing loss in this animal model so that this hearing loss can be corrected. Therefore, this study will focus specifically on the hair cells (HCs) and ribbon synapses of the cochlea, which could be responsible for hearing loss in the mouse model, as well as in clinical disorders such as presbycusis. Hearing loss is easily assessed in humans via audiometric evaluation, with auditory brainstem response (ABR) testing as a supplementary hearing threshold or neurodiagnostic test. Unfortunately, cochlear morphology in humans is not as easily assessed as it is in mice that have been genetically manipulated to have hearing loss. Additionally, pure-tone audiometric evaluation and speech testing are impossible to conduct in mice, preventing direct comparisons with hearing thresholds in humans, which is why ABR testing is necessary. In lieu of human subjects, four different mouse models were used due to known similarities between the mouse and human genome in addition to parallels between structural components of the cochlea. Through investigation of these questions, this study has successfully developed a mouse model that allows the study of GABAergic neuron function in the IC while maintaining good hearing, in order to better understand their contribution to auditory processing and to connect to other potential underlying causes of presbycusis in humans. Subsequent ABR testing at 6- and 12-month timepoints confirmed that good hearing was maintained in this new mouse model. Morphological analysis of the cochlea in each of the four strains further supported the ABR threshold findings, in addition to brain histology, which confirmed continued expression of GFP under control of the endogenous GAD67 promoter.
Ellie Ambridge and Ediz Sohoglu
Fri, 10/4 10:15AM - 12:00PM | A14
Abstract
Human speech perception has a remarkable capacity to cope with sub-optimal auditory stimuli. This ability to cope depends, in part, on perceptual learning, which is a relatively long-lasting improvement in understanding degraded speech as a result of past experience or training (Davis et al., 2005). Two divergent theoretical frameworks of perceptual learning have been proposed. Transformation mechanisms suggest listeners learn to reverse the effect of degradation by a process of compensation or inverse transformation (see Cooke et al., 2022). This contrasts with a cue-reweighting mechanism which suggests a reweighting of acoustic-phonetic cues, for example, upweighting of intact cues and downweighting of degraded cues (Goldstone, 1998; Sohoglu & Davis, 2016). In the present study, 30 normal-hearing listeners (25 female; mean age 21.47 years) were trained over three days to understand spoken sentences in which fine spectral modulations or fast temporal modulations were filtered out (see Flinker et al., 2019; Elliott & Theunissen, 2009; Webb & Sohoglu, 2023). Participants also listened to a clear speech control condition in which the acoustic signal was unchanged. From day one to three, participants’ word report accuracy increased from ≈ 20% to ≈ 60%, showing robust perceptual learning. On days one and three, we used EEG and temporal response function (TRF) analysis (Crosse et al., 2016) to assess neural tracking of intact modulations, i.e., those present in both clear and filtered speech. We also assessed neural tracking of degraded speech modulations, i.e., those present in clear speech but attenuated in filtered speech. We found a significant speech type × day interaction effect on neural tracking (TRF model accuracies for intact modulations: F(1, 29) = 4.490, p = 0.04; for degraded modulations: F(1, 29) = 5.352, p = 0.03). Comparing day one to day three, for filtered speech, we found a significant increase in tracking of intact modulations (F(1,29) = 6.549, p = 0.02), and a significant increase in tracking of degraded modulations (F(1,29) = 6.183, p = 0.02). No significant change for either modulation type was apparent in the clear speech control condition (p > .05). These results provide evidence in favour of a transformation-based account of perceptual learning. Our findings provide important insights into how the brain adapts to perceptually challenging stimuli, with possible future clinical implications for cochlear implant users (Niparko et al., 2010).
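The TRF analysis referenced above boils down to a regularized regression of the EEG on time-lagged stimulus features; the sketch below shows that structure using a synthetic envelope and a single synthetic EEG channel. The lag range and regularization are illustrative assumptions, not the parameters used in the study.

# Hedged sketch: a temporal response function (TRF) estimated by ridge
# regression on time-lagged stimulus features. Synthetic single-channel data.
import numpy as np
from sklearn.linear_model import Ridge

fs = 128                              # EEG sampling rate (Hz)
rng = np.random.default_rng(8)
n_samples = fs * 120                  # two minutes of data
envelope = np.abs(rng.normal(size=n_samples))          # speech-envelope stand-in

lags = np.arange(0, int(0.4 * fs))    # 0-400 ms lags
X = np.column_stack([np.roll(envelope, lag) for lag in lags])
true_trf = np.exp(-lags / 10.0)
eeg = X @ true_trf + rng.normal(scale=1.0, size=n_samples)

split = n_samples // 2
model = Ridge(alpha=1.0).fit(X[:split], eeg[:split])
r = np.corrcoef(model.predict(X[split:]), eeg[split:])[0, 1]
print(f"held-out neural tracking (prediction r): {r:.2f}")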
Kamini Sehrawat and Israel Nelken
Fri, 10/4 10:15AM - 12:00PM | A15
Abstract
Sound preferences may be innate but are often learned. In humans, sound preferences are often studied using music, and preference for music has been related to the release of dopamine in the striatal reward system. Moreover, functional connectivity between the auditory cortex and the reward system increases during pleasurable music listening. We studied sound preferences in mice. Studies have shown that mice can be trained to prefer human music. We exposed mice during critical periods of development (P7-P40) to musical excerpts from the first movement of Beethoven’s Symphony No. 9, with free access to food and water. As controls, we used ‘sham’-exposed mice (exposed to silence), mice exposed to chirp sounds instead of music, and naive mice, which remained in the animal house and did not undergo any exposure. We measured their preference for the exposed music in adulthood using a two-choice preference test. The test lasted 3 hours, and mice could choose between the music and silence zones without reinforcement. We found that the preference for the exposed music was sex-dependent. Male mice exposed to non-music environments avoided the music zone, whereas both music-exposed and naive male mice spent more time in the music zone. Female mice did not show a clear preference, owing to frequent switching between the music and silence zones. We then measured the neural responses of the same mice under anesthesia using wide-field calcium imaging. We observed an overall suppression of neural activity in the exposed mice (to music and to silence) compared to the naive mice. There was a robust negative correlation between sound preference and auditory cortex activity in female mice. In contrast, auditory cortex activity did not correlate with preference among males.
Stephen David, Jereme Wingert and Satyabrata Parida
Fri, 10/4 10:15AM - 12:00PM | A16
Abstract
Understanding how the brain encodes and extracts information from dynamic natural sounds is a long-standing problem in sensory neuroscience. The classic linear-nonlinear spectro-temporal receptive field (LN STRF) describes encoding as convolution of the sound spectrogram with a linear spectro-temporal filter, followed by a static rectifying nonlinearity. Subspace encoding models have been proposed as a generalization of the LN STRF, in which the stimulus is convolved with two or more filters and the response is then a nonlinear combination of the projection into this tuning subspace. Subspace models provide a logical, interpretable expansion of the LN STRF but have proven difficult to estimate accurately, especially for dynamic natural sounds. Recently, an alternative modeling framework using convolutional neural networks (CNNs) has proven effective at accounting for encoding properties in auditory cortex substantially better than the LN model. However, CNNs are complex, and the functional properties that underlie their improved performance can be obscure. The current study sought to measure the spectro-temporal tuning subspace from CNN model fits, thus providing insight into their functional properties. Single-unit data were recorded using high channel-count microelectrode arrays from primary auditory cortex (A1) of awake, passively listening ferrets during presentation of a large natural sound set (45-90 min of unique sounds). A CNN was fit to the data, replicating approaches from previous work. To measure the tuning subspace, the dynamic STRF was measured as the locally linear filter approximating the input-output relationship of the CNN at each timepoint in the stimulus. Principal component analysis was then used to reduce this very large number of filters to a smaller subspace. Typically, 2-10 filters accounted for 95% of variance in the dynamic STRFs. The stimulus was projected into the subspace for each neuron, and a new model was fit using only the projected values. On average, the subspace model was able to predict time-varying spike rate nearly as accurately as the full CNN. Sensory responses could also be plotted in this relatively compact subspace, providing a description of nonlinear tuning. This result indicates that the nonlinear encoding properties captured by the CNN can be described as a subspace encoding model, providing a conceptual link between these two modeling frameworks.
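The dynamic-STRF step, taking a locally linear filter of the fitted nonlinear model at each stimulus timepoint and then summarizing the collection with PCA, can be sketched roughly as follows; the toy nonlinear function stands in for the fitted CNN, and the finite-difference linearization and all sizes are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

# toy stand-in for a fitted CNN: maps a spectrogram patch (freq x lag) to a firing rate
rng = np.random.default_rng(1)
n_freq, n_lag = 18, 15
w1, w2 = rng.standard_normal((2, n_freq, n_lag))

def model(patch):
    # two rectified filters combined nonlinearly (placeholder for the real CNN)
    a = max(np.sum(w1 * patch), 0.0)
    b = max(np.sum(w2 * patch), 0.0)
    return a * b + a

def local_strf(patch, eps=1e-3):
    """Finite-difference estimate of the locally linear filter (dynamic STRF)."""
    base = model(patch)
    grad = np.zeros_like(patch)
    for i in range(n_freq):
        for j in range(n_lag):
            d = patch.copy()
            d[i, j] += eps
            grad[i, j] = (model(d) - base) / eps
    return grad

# collect dynamic STRFs at many stimulus timepoints
spectrogram = rng.standard_normal((n_freq, 2000))
filters = []
for t in range(n_lag, spectrogram.shape[1], 10):
    patch = spectrogram[:, t - n_lag:t]
    filters.append(local_strf(patch).ravel())
filters = np.array(filters)

# PCA across timepoints yields the spectro-temporal tuning subspace
pca = PCA().fit(filters)
cum = np.cumsum(pca.explained_variance_ratio_)
n_dim = int(np.searchsorted(cum, 0.95) + 1)
print(f"{n_dim} filters capture 95% of dynamic-STRF variance")
```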
John Magnotti, Yue Zhang, Xiang Zhang, Daniel Yoshor, Sameer Sheth, Isaac Chen, Kathryn Davis, Yingjia Yu, Zhengjia Wang and Michael S. Beauchamp
Fri, 10/4 10:15AM - 12:00PM | A17
Abstract
A key function of human superior temporal cortex is to decode the rapid stream of language elements that constitute speech. Speech perception is especially difficult under noisy listening conditions, but seeing the face of the talker improves intelligibility. To study the neural mechanisms of audiovisual speech perception, we measured z-scored broadband high-frequency activity (BHA, 70-150 Hz) in epilepsy patients implanted with stereotactic EEG electrodes. Each patient was presented with 50 different words with pink noise added at a signal-to-noise ratio of -4 dB. Words were presented in noisy auditory-only (An), noisy audiovisual (AnV) and visual-only (V) formats. Participants repeated each word after presentation. An accuracy score was calculated based on the phoneme correspondence between the stimulus and response words. Accuracy was entered into a linear mixed effects model (LME) as the dependent measure; stimulus format (An, AnV) was a fixed factor; word and participant were random factors. As expected, accuracy was significantly higher for AnV than An words, 86% vs. 71%, p < 0.001. The RAVE iEEG software toolbox was used to preprocess and analyze all data. Across 15 participants, 196 electrodes were identified in the posterior superior temporal gyrus and sulcus (pSTG/S) that showed a significant response to speech. The BHA response to each word (800 ms before clip onset vs. 800 ms after auditory onset) was calculated and entered into an LME with fixed effects of accuracy and format and a random effect of electrode. There was a significant main effect of format (p < 0.001) and an interaction between accuracy and format (p < 0.001), caused by a negative correlation between perceptual accuracy and neural response for AnV words but a positive correlation for An words. Across electrodes, the BHA response to V-only words moderated the correlation for AnV (p < 0.001), but not An, words (p = 0.09). These results are consistent with a model in which visual speech suppresses responses to auditory phonemes that are incompatible with the viewed speech (e.g., a closed mouth shape is incompatible with the sound "da"; Karas et al., 2019).
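A minimal sketch of the second mixed-effects analysis (BHA predicted by accuracy, format, and their interaction, with electrode as a random effect) is shown below using statsmodels; the simulated trial data, effect sizes, and random-intercept-only structure are assumptions, not the authors' exact model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# toy single-trial data (names, sizes, and effect directions are illustrative only)
rng = np.random.default_rng(2)
n_elec, n_words = 30, 50
rows = []
for e in range(n_elec):
    elec_offset = rng.normal(0, 0.5)                      # random electrode intercept
    for fmt, slope in [("An", 0.8), ("AnV", -0.6)]:        # opposite-sign correlations
        acc = rng.uniform(0, 1, n_words)
        bha = elec_offset + slope * acc + rng.normal(0, 1, n_words)
        for a, b in zip(acc, bha):
            rows.append({"electrode": e, "format": fmt, "accuracy": a, "bha": b})
df = pd.DataFrame(rows)

# BHA ~ accuracy x format, with a random intercept per electrode
m = smf.mixedlm("bha ~ accuracy * C(format)", data=df, groups=df["electrode"]).fit()
print(m.summary())   # look for the accuracy:C(format)[T.AnV] interaction term
```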
Baher Ibrahim, Austin Douglas, Yoshitaka Shinagawa, Alexander Asilador and Daniel Llano
Fri, 10/4 10:15AM - 12:00PM | A18
Abstract
The inferior colliculus (IC) is an information processing hub that receives widespread convergent auditory projections. While the dorsal cortex (DC), a non-lemniscal division of the IC, receives major auditory cortical projections, several reports have shown that the DC is tonotopically organized, indicating that the structure can integrate basic spectral features of sound while processing complex auditory information. However, it is unclear whether the DC has another level of mapping for integrating the different spectral and temporal features of complex sounds across different sound levels. Therefore, two-photon calcium imaging was used to track the neuronal responses of the DC to sounds of different degrees of spectral and temporal complexity, such as pure tones (PT), unmodulated noise (UN), and amplitude-modulated noise (AMN). In addition to the tonotopic map, the DC showed a periodotopic organization whereby the cells of a medial rostrocaudal area were best tuned to UN, separating medial and lateral regions where the cells were best tuned to AMN. The neuronal responses to each tested sound were used to generate spectral and temporal indices for each neuron, which were then used to map the DC based on the dynamics of the neuronal responses across different sound amplitudes. The DC showed a cellular organization that divided its surface into two main regions: the dorsomedial (DMC) and dorsolateral (DLC) cortices. At the lowest tested sound level (40 dB SPL), the DMC was more responsive to simple tones (i.e., PT) and less responsive to complex sounds (i.e., UN and AMN) compared to the DLC. Although increasing the sound level increased the percentage of responsive cells in both the DMC and the DLC, it dynamically modulated the cells of the DMC to become responsive mostly to UN without changing the response profile of the DLC. These maps were consistent across males and females at different estrous phases. These data suggest that the DC is mapped to process the different spectrotemporal features of sound based on sound intensity, enhancing the segregation of different sound sources.
Hugo Caffaratti, Yaneri A. Ayala, Ryan Calmus, Joel I. Berger, David Christianson, Federico De Martino, Essa Yacoub, Luca Vizioli, Lucia Melloni, Taufik Valiante, Kamil Uludag, Timothy D. Griffiths, Snehajyoti Chatterjee, Ted Abel, Matthew A. Howard III, Jeremy Greenlee and Christopher I. Petkov
Fri, 10/4 10:15AM - 12:00PM | A19
Abstract
Human-unique cognitive abilities such as language require the maintenance and manipulation of auditory content in short-term memory, but it remains unclear how this auditory content is processed across cortical layers when humans perceive, manipulate or produce words. Studies of dorsolateral prefrontal cortex (DLPFC) in nonhuman primates have implicated layer 3 neurons in a recurrent circuit for working memory (WM), and a canonical microcircuit model describes the superficial and deeper layer involvement in feedforward and feedback interactions between brain areas. Here, we focus on DLPFC, often implicated in WM but not language function, to test the hypothesis of a neural entanglement between WM and language combinatorial semantics across layers. To study these functional dimensions in humans, we employed an auditory task incorporating high-fidelity synthesized speech sounds and spoken responses from the participant. In our task, participants heard two or three words, and then either alphabetized or maintained the word order during a delay period, after which they verbally reported their mental order. The presented words and their order were manipulated on a trial-by-trial basis, such that alphabetization by the participants rearranged the words either into or out of a grammatical order. We analyzed high-density laminar array recordings obtained during deep-brain stimulation procedures in neurosurgery patients at the University of Iowa. Extracellular recordings were compared with laminar fMRI (0.8 mm) in healthy participants and in patients scanned preoperatively. Laminar array recordings from DLPFC showed single neurons, including those in the approximate location of layer 3, modulated by all task components. WM manipulation-specific effects were strongest in layers 3 and 6. The language grammatical effects, although expectedly weaker than those for WM, engaged both superficial and deeper layers in DLPFC. Laminar fMRI is capable of resolving sets of layers, allowing us to study interactions across sites and laminae of the language network, such as within auditory and motor cortex, anterior temporal lobe and inferior frontal gyrus. The neuroimaging results recapitulated many of the laminar array recording effects, revealing overlapping significant clusters associated with grammatical and WM effects in the aforementioned network, including DLPFC. As a complement to these neuroimaging studies, single-nucleus multiomics (RNA+ATAC sequencing) analyses are also presently being conducted on clinical tissue samples obtained from the recorded area after task performance and compared to control samples taken from the same patient prior to task performance. Collectively, these results provide initial insights into the interplay of working memory and language across cortical layers from an area in the human brain that, though often implicated in cognitive domain-general function, is not commonly associated with language.
Shiyi Fang, Fei Peng, Bruno Castellaro, Muhammad Zeeshan, Nicole Rosskothen-Kuhl and Jan Schnupp
Fri, 10/4 10:15AM - 12:00PM | A20
Abstract
Binaural cues, such as interaural time differences (ITDs), play a crucial role in sound source localization by the auditory system. However, contemporary cochlear implant (CI) processors use a coding strategy that conveys only the ITD information contained in the envelope of the sound (envelope ITD) to the CI user. As a result, the ITD information contained in the temporal fine structure of the sound (pulse-timing ITD) is not transmitted, which may contribute to the poor spatial hearing of CI users. To investigate the sensitivity of CI-implanted rats to envelope and pulse-timing ITDs, we designed a stimulus comprising a 900 pps pulse train modulated by a 20 Hz sine envelope, in which the pulse-timing ITD (PT_ITD) and envelope ITD (ENV_ITD) could vary independently over the values {-0.1, 0, 0.1} ms. We recorded neural activity from the inferior colliculus (IC) of anesthetized, neonatally deafened rats using a multi-channel silicon probe. For each multi-unit, we first applied a Wiener filter method to remove electrical artifacts and then computed the analog multi-unit activity (AMUA) over the onset response window (0-50 ms) and the baseline window (150-200 ms). Any multi-unit whose peak AMUA amplitude in the onset window exceeded the baseline-window mean plus 5 times the baseline-window standard deviation was identified as responsive to CI stimulation. For every responsive multi-unit, the proportion of variance explained by PT_ITD and by ENV_ITD was computed to quantify the effect of envelope and pulse-timing ITD on AMUA intensity. We recorded a total of 332 responsive multi-units, 83% of which were sensitive to PT_ITD, while only one multi-unit was found to be sensitive to ENV_ITD. This indicates that CI-implanted rats exhibit far greater sensitivity to pulse-timing ITD than to envelope ITD. These findings suggest that the current CI stimulation strategy does not provide effective ITD information of the kind CI users are sensitive to, and that CI users have the potential for better sound localization ability.
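The responsiveness criterion and variance-partitioning step described above can be sketched as follows; the sample rate, trial counts, and simulated tuning are illustrative assumptions, and the simple eta-squared-style variance ratio stands in for whatever exact decomposition the authors used.

```python
import numpy as np

rng = np.random.default_rng(3)
fs = 1000  # assumed AMUA sample rate (Hz); window boundaries follow the abstract

def is_responsive(amua):
    """amua: [trials x samples] aligned to stimulus onset. Responsive if the peak of
    the trial-averaged onset-window (0-50 ms) AMUA exceeds the baseline-window
    (150-200 ms) mean + 5 SD."""
    onset = amua[:, :int(0.05 * fs)].mean(axis=0)
    base = amua[:, int(0.15 * fs):int(0.20 * fs)].mean(axis=0)
    return onset.max() > base.mean() + 5 * base.std()

def variance_explained(resp, labels):
    """Fraction of across-trial response variance explained by the group means
    of one factor (e.g. PT_ITD or ENV_ITD)."""
    fitted = np.array([resp[labels == v].mean() for v in labels])
    return fitted.var() / resp.var()

# toy multi-unit: clear onset response, driven by pulse-timing ITD only
amua = rng.normal(0, 1, (300, 200))
amua[:, 5:40] += 8
print("responsive:", is_responsive(amua))

itds = np.array([-0.1, 0.0, 0.1])
pt_itd, env_itd = rng.choice(itds, 300), rng.choice(itds, 300)
resp = 10 + 30 * (pt_itd == 0.1) + rng.normal(0, 2, 300)   # onset-window AMUA metric
print("variance explained by PT_ITD :", round(variance_explained(resp, pt_itd), 2))
print("variance explained by ENV_ITD:", round(variance_explained(resp, env_itd), 2))
```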
Derek Nguyen and Yi Zhou
Fri, 10/4 10:15AM - 12:00PM | A21
Abstract
Animal vocalizations are critical communication signals for a species to engage in social interactions such as mate selection, predator warning, and territorial control. Marmoset calls typically exhibit harmonic spectral structures covering the frequency range of 5-40 kHz and contain dynamic envelope modulation and repetitive temporal structures (e.g., twitters). In the primary auditory cortex (A1) of marmosets, neurons respond in a phase-locked manner to envelope features of sounds, akin to time-division multiplexing (Zhou and Wang, 2010). In this study, we investigated whether time-division multiplexing extends to encoding the harmonic spectral structures observed in marmoset calls. We collected single-unit responses from the A1 of two marmoset monkeys using high-density silicon probes to sample neural activity simultaneously across cortical layers. We investigated how temporal response features (onset, offset, sustained, and their combinations) manifest in the time-multiplexing space when comparing intact vocalizations with spectrally altered variants that retain identical envelope modulation. Our findings showed that temporal response signatures, as defined, are not static properties of A1 neurons. Spectral variations induce shifts between the onset, offset, and sustained patterns, thereby altering the temporal sequence of unit responses within the neuronal population. These results suggest that, instead of recruiting distinct neurons for different stimuli (a place code), the same group of A1 neurons may encode varied stimuli by modifying their temporal response order. This form of time-division multiplexing code, which has been observed in other animal species, likely plays an important role in auditory pattern recognition.
Xindong Song, Yueqi Guo, Chenggang Chen, Jong Hoon Lee and Xiaoqin Wang
Fri, 10/4 10:15AM - 12:00PM | A22
Abstract
Tonotopic organization of the auditory cortex has been extensively studied in many mammalian species using various methodologies and physiological preparations. Tonotopy mapping in primates, however, is more limited due to constraints such as cortical folding, the use of anesthetized subjects, and mapping methodology. Here we applied a combination of through-skull and through-window intrinsic optical signal imaging, wide-field calcium imaging, and neural probe recording techniques in awake marmosets (Callithrix jacchus), a New World monkey with most of its auditory cortex located on a flat brain surface. Coarse tonotopic gradients, including a recently described rostral-temporal (RT) to parabelt gradient, were revealed by through-skull imaging of intrinsic optical signals and were subsequently validated by single-unit recording. These tonotopic gradients were then observed in more detail through chronically implanted cranial windows, with additional verification of the experimental design. Moreover, the tonotopy mapped by the intrinsic-signal imaging methods was verified by wide-field calcium imaging in an AAV-GCaMP-labeled subject. After these validations, and with further effort to expand the field of view more rostrally in both windowed and through-skull subjects, an additional putative tonotopic gradient was observed rostral to area RT that has not previously been described in the standard model of tonotopic organization of the primate auditory cortex. Together, these results provide the most comprehensive tonotopy mapping to date in an awake primate species, with unprecedented coverage and detail in the rostral portion, and support a caudorostrally arranged mesoscale organization of at least three repeated functional gradients in the primate auditory cortex, similar to the ventral stream of the primate visual cortex.
Satyabrata Parida, Jereme Wingert, Sam Norman-Haignere and Stephen David
Fri, 10/4 10:15AM - 12:00PM | A23
Abstract
Neural manifolds (NMs) provide compact, low-dimensional representations of heterogeneous, high-dimensional neural activity and can be used to relate neural circuit dynamics to sensory/motor coding as well as cognitive function. This functional relationship can generalize across subjects and across behavioral contexts within subjects, providing a general characterization of representation that is independent of the specific neurons included in its measurement. The current study sought to characterize the NM for the representation of natural sounds by single units in auditory cortex. We performed multichannel neurophysiological recordings from 1470 neurons in primary (A1) and secondary (PEG) fields of ferret auditory cortex. Data were collected during presentation of a large sound corpus comprising two sets: 1) an NM set (~100 s long), repeated for all neurons and used to identify the NM, and 2) an encoding model set (~10 hours), played cumulatively across recordings and used to train encoding models. The NM was estimated using PCA and revealed that only about 360 dimensions capture 80% of the variance for the entire neural population. Representational similarity analysis revealed that the NM was strongly correlated between estimations from different animals. To understand how sound is represented in the NM, we hypothesized that deep neural networks (DNNs) trained to predict NM representations from auditory stimuli can serve as a cortical encoding model. The encoding model can be used to thoroughly characterize the NM, such as the spectrotemporal tuning of its individual dimensions. DNNs with a wide range of architectures were trained as encoding models in order to optimize model hyperparameters. Preliminary results show a significant correlation between NM validation accuracy and neural PSTH prediction accuracy. These results point to the existence of a general NM for auditory cortex and suggest that this NM can be leveraged to train accurate auditory cortical encoding models. Future work includes: 1) comparing the accuracy of NM-based encoding models to that of other state-of-the-art encoding models and 2) employing subspace analysis to characterize the spectrotemporal properties of the distinct NM dimensions.
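A rough sketch of the manifold-dimensionality and cross-animal similarity analyses might look like the following; the simulated populations, the PCA defaults, and the binned correlation-matrix form of representational similarity analysis are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)

def manifold_dims(pop, var_thresh=0.80):
    """pop: [time x neurons] responses to the repeated NM sound set.
    Returns the PCA model and the number of dimensions reaching the variance threshold."""
    pca = PCA().fit(pop)
    n = int(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), var_thresh) + 1)
    return pca, n

# two toy "animals" sharing latent dynamics but recorded with different neurons
T, k = 2000, 8
latent = rng.standard_normal((T, k))
popA = latent @ rng.standard_normal((k, 600)) + 0.5 * rng.standard_normal((T, 600))
popB = latent @ rng.standard_normal((k, 500)) + 0.5 * rng.standard_normal((T, 500))

pcaA, nA = manifold_dims(popA)
pcaB, nB = manifold_dims(popB)
print(f"animal A: {nA} dims for 80% variance; animal B: {nB} dims")

# representational similarity: correlate time-bin-by-time-bin similarity matrices
def rdm(pop, n_bins=100):
    x = pop[:n_bins * (len(pop) // n_bins)].reshape(n_bins, -1, pop.shape[1]).mean(1)
    return np.corrcoef(x)

ra, rb = rdm(popA), rdm(popB)
iu = np.triu_indices_from(ra, k=1)
print("cross-animal RSA r =", round(np.corrcoef(ra[iu], rb[iu])[0, 1], 2))
```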
Muhammad Zeeshan, Fei Peng, Bruno Castellaro, Shiyi Fang, Nicole Rosskothen-Kuhl and Jan Schnupp
Fri, 10/4 10:15AM - 12:00PM | A24
Abstract
Bilateral cochlear implants (biCIs) are increasingly used to treat severe hearing loss. However, human biCI users usually exhibit relatively poor binaural cue sensitivity, with interaural time difference (ITD) sensitivity in prelingually deaf patients being particularly poor. To better understand these shortcomings in prosthetic binaural hearing, it would be helpful to know what the “innate” ITD and interaural level difference (ILD) sensitivity of the neonatally deafened (ND), mature mammalian auditory pathway is like, but this cannot easily be investigated in humans. We therefore recorded neural responses in the inferior colliculus (IC) of rats deafened by i.p. kanamycin injection. When the deaf rats reached maturity (>P60), they were urethane-anesthetized and implanted with biCIs. IC multiunit responses to pulse train stimuli at rates of 1, 100, and 900 pps with combinations of ITD ∈ ±{0, 0.04, 0.08, 0.12} ms and ILD ∈ ±{0, 1, 4} dB were recorded extracellularly and analyzed for ITD and ILD sensitivity. At pulse rates of 1, 100, and 900 pps, 85.6%, 99.7%, and 97.2% of multiunits, respectively, were significantly ITD sensitive (Kruskal-Wallis tests); 88.5%, 96.4%, and 88% were ILD sensitive; and 76.8%, 96.1%, and 85.5% were sensitive to both. Sensitivity to small electrical-stimulus ITDs and ILDs was therefore very widespread in the IC of adult, hearing-inexperienced, acutely CI-stimulated ND rats. While most multiunits showed significant sensitivity to both cues, examining the proportions of variance explained by ITD or ILD revealed that multiunits in the naive IC nevertheless form two distinct clusters that are either predominantly ITD sensitive or predominantly ILD sensitive.
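The per-multiunit Kruskal-Wallis screen for ITD sensitivity could be sketched as below; the toy tuning curves, trial counts, and significance threshold are illustrative assumptions.

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(5)
itd_levels = [-0.12, -0.08, -0.04, 0.0, 0.04, 0.08, 0.12]

def itd_sensitive(rates_by_itd, alpha=0.05):
    """rates_by_itd: list of per-trial response arrays, one per ITD value.
    Kruskal-Wallis test across ITD conditions, as in the abstract."""
    h, p = kruskal(*rates_by_itd)
    return p < alpha

# toy population: 100 multiunits, most with ITD-dependent responses
n_sensitive = 0
for unit in range(100):
    gain = rng.uniform(0, 20)
    rates = [gain * np.exp(-((itd - 0.04) / 0.08) ** 2) + rng.normal(5, 1, 20)
             for itd in itd_levels]
    n_sensitive += itd_sensitive(rates)
print(f"{n_sensitive}% of toy multiunits ITD sensitive at p < 0.05")
```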
Anna Liu and Christina Vanden Bosch der Nederlanden
Fri, 10/4 10:15AM - 12:00PM | A25
Abstract
Every day, we individuate and attend to the many sounds in our environments, but we do not attend to all sounds equally. Previous findings have shown that when we listen to busy auditory scenes, we are biased to attend to speech over other meaningful real-world sounds. However, it is unknown what specific factors contribute to this attentional speech bias. This study examines whether listeners’ language backgrounds influence their attention toward speech in auditory scenes, guiding attention toward speech in native rather than foreign languages. We tested three groups of adult listeners from a linguistically diverse community (Toronto, Canada): English-Mandarin bilinguals (N=19), English-Other Language bilinguals (N=34), and English-only monolinguals (N=21). We aim to collect 34 participants per language group for the full dataset. Both bilingual groups were native or early bilinguals who acquired their languages between 0 and 6 years of age and were proficient in both. Monolinguals acquired English as their first language and had little to no knowledge of or exposure to other languages. Listeners completed an auditory change detection task with scenes consisting of sounds drawn from four categories: speech (English and Mandarin pseudo-sentences), animal calls, environmental sounds, and musical instruments. In this task, listeners heard pairs of back-to-back auditory scenes, each composed of four sounds played simultaneously. The scenes heard in each trial were either exactly the same (Same trials), or one sound from the first scene changed to another sound from a different category in the second scene (Different trials). After each scene pair, listeners were asked whether the two scenes were the same or different, and their accuracy rates were analyzed. Preliminary findings show that all listeners are more accurate at detecting changes to speech than to other sound categories in Different trials, replicating the presence of an attentional speech bias. All three groups performed similarly when detecting changes to English speech, but English-Mandarin bilinguals were more accurate at detecting changes to Mandarin speech than the other groups. Our findings suggest that our linguistic experiences contribute to an attentional bias toward speech over other sounds in auditory scenes: when listeners hear speech in native or known languages, they are more likely to attend to that speech, whereas foreign languages are not attended to equally. This work will allow us to better understand which features of speech and experiences with language predict its prioritization over other sounds in busy environments. The results of this research may then allow us to investigate ways to accommodate a more diverse range of listeners from different hearing backgrounds and improve communication in busy environments.
Guoyang Liao, Dana Boebinger, Christopher Garcia, Kirill Nourski, Matthew Howard, Thomas Wychowski, Webster Pilcher and Sam Norman-Haignere
Fri, 10/4 10:15AM - 12:00PM | A26
Abstract
Neural responses in the auditory system have been classically modeled as a weighted sum of a time-frequency image of sound (the “spectrotemporal receptive field” or STRF), analogous to simple-cell receptive fields in the visual cortex. STRFs are interpretable and easy to fit with limited neural data but have limited predictive power in the auditory cortex, particularly in non-primary regions of the human brain that respond selectively to complex natural sounds (e.g., speech, music). Here, we show that a simple modification of this framework, inspired by complex cells in the visual system, substantially enhances the ability of STRF-based acoustic models to predict human cortical responses to natural sounds, measured using spatiotemporally precise intracranial recordings from neurosurgical patients. Specifically, we model neural responses as the weighted sum of spectrotemporal envelopes, computed by convolving a cochleagram representation of sound with a set of fixed spectrotemporal receptive fields and then computing the envelope of these subbands. We show that the envelope representation substantially and reliably increases the variance explained by the model across a diverse set of natural sounds, with the largest improvements in non-primary regions. Our two-stage spectrotemporal envelope model retains the key advantages of standard STRF models, while substantially enhancing their ability to predict neural responses to complex, ecologically relevant sounds.
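As a rough illustration of the two-stage model described above (fixed spectrotemporal filters applied to a cochleagram, subband envelopes, then a learned weighted sum), here is a minimal sketch; the filter bank, Hilbert-based envelope extraction, and all sizes are simplifying assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.signal import convolve2d, hilbert

rng = np.random.default_rng(6)

def spectrotemporal_envelopes(cochleagram, filters):
    """Stage 1: convolve the cochleagram with a bank of fixed spectrotemporal
    filters and take the envelope of each subband (Hilbert transform along time)."""
    envs = []
    for f in filters:
        sub = convolve2d(cochleagram, f, mode="same")
        envs.append(np.abs(hilbert(sub, axis=1)).mean(axis=0))  # collapse frequency
    return np.stack(envs, axis=1)                               # [time x n_filters]

# toy cochleagram (freq x time) and a small bank of separable ripple-like filters
n_freq, n_time, n_filt = 40, 3000, 12
coch = np.abs(rng.standard_normal((n_freq, n_time)))
filters = [np.outer(np.cos(np.linspace(0, (i % 4 + 1) * np.pi, 9)),
                    np.cos(np.linspace(0, (i // 4 + 1) * np.pi, 25)))
           for i in range(n_filt)]

E = spectrotemporal_envelopes(coch, filters)

# Stage 2: model the neural response as a weighted sum of the subband envelopes
neural = E @ rng.standard_normal(n_filt) + 0.1 * rng.standard_normal(n_time)
w, *_ = np.linalg.lstsq(E, neural, rcond=None)
pred = E @ w
print("prediction r =", round(np.corrcoef(pred, neural)[0, 1], 2))
```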
Yinuo Liu, Ja Young Choi and Tyler Perrachione
Fri, 10/4 10:15AM - 12:00PM | A27
Abstract
The human auditory cortex has long been associated with language function, and it is widely believed that the leftward structural asymmetry of the auditory cortex may underlie or reflect the leftward functional lateralization of language (e.g., Meyer et al., 2014). However, studies using automated surface-based parcellation and analysis packages (e.g., FreeSurfer) yield implausibly identical results when measuring hemispheric lateralization of surface area, regardless of sample size and composition. In this study, we investigated the extent to which surface-area estimates of Heschl’s gyrus (HG) and other key language areas are affected by automatic parcellation. Using both natural and left-right flipped structural MRI volumes, we ran the default FreeSurfer processing pipeline and calculated the surface-area lateralization of HG and other language areas. We found that both natural and flipped brains showed significant leftward lateralization in the primary auditory cortex and in some key language areas such as the pars opercularis of the inferior frontal gyrus, indicating a systematic bias in the automatic processing stream. Manual labelling of HG in both natural and flipped brains revealed no such bias. A step-by-step investigation of the FreeSurfer processing stream suggested that the bias arises in the parcellation atlas rather than in the surface reconstruction. Our results suggest that relying on the automatic cortical parcellation provided by FreeSurfer can misrepresent the degree of leftward surface-area lateralization in key structures important to audition and language.
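The surface-area lateralization comparison between natural and flipped volumes reduces to a simple index; a minimal sketch is below, with purely illustrative area values (in an unbiased pipeline, flipping the volume should flip the sign of the index, so persistent leftward values in flipped brains would indicate an atlas bias).

```python
import numpy as np

def lateralization_index(lh_area, rh_area):
    """Standard surface-area lateralization index: positive = leftward."""
    return (lh_area - rh_area) / (lh_area + rh_area)

# toy Heschl's gyrus surface areas (mm^2) from natural and left-right flipped
# volumes of the same subjects; values are illustrative only
natural_lh, natural_rh = np.array([520, 480, 610]), np.array([450, 430, 520])
flipped_lh, flipped_rh = np.array([515, 470, 600]), np.array([455, 440, 530])

print("natural LI:", np.round(lateralization_index(natural_lh, natural_rh), 3))
print("flipped LI:", np.round(lateralization_index(flipped_lh, flipped_rh), 3))
```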
Yukai Xu and Joji Tsunada
Fri, 10/4 10:15AM - 12:00PM | A28
Abstract
A key aspect of human vocal communication is our ability to flexibly modify and control our speech depending upon social context as well as environmental conditions (context-dependent vocal production). Whereas recent progress in human and monkey neurophysiological studies has identified differential roles within the frontal cortex in vocal control, spanning the planning, initiation, and articulation of vocalizations, the specific neural computations underlying context-dependent vocal control, particularly at the single-neuron level, remain unknown. To address this gap, we recorded local field potentials (LFPs) and single-unit activity from the prefrontal cortex of marmoset monkeys, a species known for its sophisticated vocal control abilities, while the marmosets voluntarily produced vocalizations in different social and environmental contexts. The social context manipulations varied the distance from a partner monkey (close: 1 m, far: 3.5 m), and the environmental manipulations changed the ambient noise level (standard: 30 dB, noise: 60 dB). Consistent with human findings, monkeys altered their vocal behavior, including the use of different call types and acoustics, in response to the contextual manipulations. Theta-band and high-gamma-band LFP activity increased immediately before vocal production (pre-vocal activity) and further encoded call-type information. Interestingly, single-unit activity exhibited similar modulations, with a subset of neurons encoding call types only in a specific context. To further understand whether and how neural populations integrate call types with contextual information, we analyzed low-dimensional neural trajectories of population spiking activity. This analysis revealed distinct pre-vocal trajectories that varied based on the interplay between call types and contexts. Our findings suggest that unique neural population dynamics link vocal production with specific contexts, offering insights into context-dependent vocal behavior.
Elie Partouche, Victor Adenis, Chloe Huetz and Jean-Marc Edeline
Fri, 10/4 10:15AM - 12:00PM | A29
Abstract
For several decades, the cochlear implant (CI) has been the most successful neuroprosthetic device, allowing thousands of patients to recover hearing sensation and speech understanding. Performance is usually good in silence, but CI patients have more difficulty in the presence of background noise. These limitations potentially stem from the large spread of current diffusing in the cochlea’s perilymph when the different electrodes are activated. One potential strategy to reduce this spread is to change the shape of the electrical pulses. In humans, studies have used asymmetric rectangular pulses (Macherey et al., 2006, 2008). A new pulse shape, the ramped pulse, has been proposed (Ballestero et al., 2015), but only one study has shown that such pulses elicit eABR responses with lower thresholds and steeper growth functions than rectangular pulses (Navntoft et al., 2020). Here, we report the consequences of using ramped pulses on the discriminative abilities of auditory cortex (ACx) neurons in anesthetized guinea pigs. The intracochlear stimulating array was a shortened version of the EVO electrode array used by Oticon Medical (Smørum, Denmark). It was composed of 6 ring-shaped platinum-iridium electrodes with a 0.0046 mm² surface. Center-to-center inter-electrode distance was 600 µm. Four shapes of ramped pulses were tested, and 20 charge levels were used (3-31.5 nC) to obtain the growth functions of ACx neurons. Recordings were collected with 16-channel probes composed of two rows of 8 electrodes separated by 1000 µm (350 µm between electrodes within a row). Mutual information (MI) was used as a metric to determine to what extent ACx neurons discriminate between the different charge levels. The four shapes of ramped pulses elicited cortical responses with (i) lower thresholds and (ii) higher maximal firing rates than the rectangular pulses, with both anodic and cathodic first phases. On average, the dynamic range was unchanged, as the whole growth functions of most ACx neurons were shifted toward lower injected charges. MI values were computed for each recording (MIind) and also for the entire population of recordings (MIpop) available with a given strategy (around 100 recordings). Based on individual recordings, the mean MI values were similar between ramped and rectangular pulse shapes. However, based on MIpop, the ramped pulses led to higher discrimination abilities than the rectangular pulses, but only when the first phase was cathodic. These results suggest that ramped pulses can be considered a good alternative to rectangular pulses, but the polarity of the first phase matters for benefiting from this shape.
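A plug-in estimate of the mutual information between injected charge level and a binned cortical response, in the spirit of the MI metric described above, might be sketched as follows; the binning choice and simulated growth function are assumptions for illustration.

```python
import numpy as np

def mutual_information(stim, resp, n_resp_bins=8):
    """Plug-in MI estimate (bits) between a discrete stimulus (charge level)
    and a binned scalar response (e.g. spike count per trial)."""
    edges = np.quantile(resp, np.linspace(0, 1, n_resp_bins + 1)[1:-1])
    r_bins = np.digitize(resp, edges)
    stim_ids = {s: i for i, s in enumerate(np.unique(stim))}
    joint = np.zeros((len(stim_ids), n_resp_bins))
    for s, r in zip(stim, r_bins):
        joint[stim_ids[s], r] += 1
    joint /= joint.sum()
    ps, pr = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (ps @ pr)[nz])))

# toy recording: firing rate grows with injected charge (20 levels, 3-31.5 nC)
rng = np.random.default_rng(7)
charges = np.repeat(np.linspace(3, 31.5, 20), 30)
rates = 2 + 1.5 * charges + rng.normal(0, 8, charges.size)
print("MI (bits):", round(mutual_information(charges, rates), 2))
```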
Jusung Ham, Jinhee Kim, Hwan Shim, Kyogu Lee and Inyong Choi
Fri, 10/4 10:15AM - 12:00PM | A30
Abstract
Difficulty hearing in noise is consistently reported by listeners with and without hearing loss, suggesting that peripheral amplification is not sufficient to mitigate this problem. Auditory selective attention (ASA) is thought to play a key role in speech-in-noise perception by enhancing the cortical representation of target speech over noise. Our group has previously developed a perceptual training protocol that provides feedback based on the strength of attentional modulation of cortical auditory evoked potentials, which improved speech-in-noise perception in normal-hearing (NH) listeners. To improve and extend such neurofeedback training to a larger population, including cochlear implant (CI) listeners, we aimed to develop a new algorithm that can decode ASA from single-trial EEG. The attention decoder needed to be 1) adaptable to different speech stimuli and subjects, 2) compact enough to run in real time on stimuli a few seconds in duration, and 3) explainable, to reveal the neural process behind the attention decoding. We tested the combination of a temporal response function (TRF) and a template-matching algorithm for this purpose. Twenty-eight NH and 37 CI listeners were instructed to attend to one of two simultaneous speech streams: a female saying "up" 5 times and a male saying "down" 4 times within 3 seconds. We recorded 64-channel EEG during the experiment. The TRF, a linear model relating the attended speech envelope to the single-trial EEG, was computed. By convolving the "up" and "down" speech envelopes with the TRF, we generated an EEG template: an idealized EEG assuming perfect attentional modulation. We used the correlation coefficients between the template and the single-trial EEG across the 64 channels as a feature set for binary classification. Classification performance was evaluated using leave-one-subject-out cross-validation with logistic regression and support vector machine classifiers. Using the TRF, we were able to generate a stimulus- and subject-general EEG template. With this template, decoding ASA from 3-s single-trial EEG achieved 60% accuracy for NH and 58% accuracy for CI listeners. The generated templates were well aligned with the grand-averaged event-related potentials, indicating that the temporal signature of the neural response is critical for ASA decoding. This compact and explainable attention decoder can be used for neurofeedback training of ASA in NH and CI listeners. Fine-tuning the algorithm will improve participants' engagement for better training, and the interpretable decoding scheme will help verify that the proposed training targets the neural circuitry of ASA.
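The template-matching decoder described above can be sketched roughly as follows; the TRF, envelopes, and trial data are simulated, and using correlations with both stimulus templates as features (rather than a single 64-value set) is a small illustrative variant of the authors' feature set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(8)
fs, dur = 128, 3                     # 3-s trials
n_ch, n_subj, n_trials = 64, 10, 40  # sizes are illustrative only

# assumed inputs: a generic per-channel TRF and the two stimulus envelopes
trf = rng.standard_normal((n_ch, int(0.3 * fs)))        # 300-ms kernels
env_up, env_down = rng.random((2, dur * fs))            # "up" / "down" envelopes

def template(envelope):
    """Idealized EEG: convolve the attended envelope with the TRF on each channel."""
    return np.stack([np.convolve(envelope, trf[c])[:dur * fs] for c in range(n_ch)])

tmpl = {0: template(env_up), 1: template(env_down)}     # 0 = attend "up"

def features(eeg_trial):
    """Per-channel correlation between each template and the single-trial EEG."""
    return np.concatenate([
        [np.corrcoef(tmpl[k][c], eeg_trial[c])[0, 1] for c in range(n_ch)]
        for k in (0, 1)])

X, y, groups = [], [], []
for subj in range(n_subj):
    for _ in range(n_trials):
        label = int(rng.integers(2))
        eeg = 0.3 * tmpl[label] + rng.standard_normal((n_ch, dur * fs))
        X.append(features(eeg)); y.append(label); groups.append(subj)

# leave-one-subject-out cross-validation with a logistic-regression classifier
acc = cross_val_score(LogisticRegression(max_iter=1000), np.array(X), np.array(y),
                      cv=LeaveOneGroupOut(), groups=groups)
print("leave-one-subject-out accuracy:", round(acc.mean(), 2))
```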
Viviana Hernandez Castanon and Silvio Macias Herrera
Fri, 10/4 10:15AM - 12:00PM | A31
Abstract
We investigated the neural basis of auditory processing for communication calls in the Mexican free-tailed bat, Tadarida brasiliensis, a species known for its complex and highly elaborate vocal repertoire. We addressed the question of whether the encoding of conspecific call types depends on the response selectivity of individual neurons or on the collective response of multiple neurons within a structured laminar organization in the primary auditory cortex. We analyzed the selectivity, decoding properties, and laminar organization of neurons in response to ten types of communication calls and two types of sonar pulses used as acoustic stimuli. We found that neurons in the primary auditory cortex exhibit a wide range of selectivity to these social calls, indicating diverse neural tuning. Moreover, we identified collective processing mechanisms in auditory neurons for decoding call types. This was achieved using supervised classification based on each neuron’s firing rate relative to baseline activity (measured as a Z score) and on temporal components defined by the interspike intervals (ISIs) of the neuron’s action potentials. This approach provided an understanding of how auditory neurons process and differentiate between call types, emphasizing the importance of both the rate and the timing of neuronal firing in auditory decoding. This mechanism indicates that the distinction among communication calls relies on a neural ensemble capable of modulating its response to a spectrum of sound features across different cortical layers. Our findings contribute to a deeper understanding of the neural mechanisms underlying auditory processing in the Tadarida brasiliensis bat model, highlighting the complexity of neural encoding in bat communication.
Manuel Roth and Assaf Breska
Fri, 10/4 10:15AM - 12:00PM | A32
Abstract
Knowledge about the features of an upcoming stimulus often aids our perceptual system in detecting that stimulus, as expressed by increased speed and/or accuracy of detection. According to current theories, this is achieved by using such knowledge, extracted from stimulus regularities, to form predictions that prepare the central nervous system for the upcoming stimulus, for example by increasing the gain or sensitivity of circuits with receptive fields tuned to the predicted features. This mechanism is essential for dealing with our dynamic sensory environment. In the auditory system, a large body of work has shown that anticipating, and therefore focusing on, a particular frequency increases auditory sensitivity to that frequency in the presence of background noise. Often, the prediction of the target frequency in such studies was based on tone repetition, on symbolic cues associated with specific frequencies, or on increased global probability, e.g., presenting the target frequency on the majority of trials. Similar effects have also been found when the target frequency was cued by more complex regularities, such as sequences of tones. However, whether such facilitation also occurs when the target frequency has to be predicted according to a rule on each trial remains unclear. In this experiment, we tested whether cuing a target frequency by a pitch interval, which served as a rule to be applied to the subsequent stimulus, would facilitate detection of the predicted frequency relative to deviant frequencies. We presented human participants with a standard pitch interval on each trial, followed after a short delay by a third tone that differed from the tones making up the standard interval. Following this third tone, participants’ task was to detect a fourth tone, the target tone, whose frequency was, in most trials, defined by the same relative interval between tones 3 and 4 as between tones 1 and 2 of the standard interval, encouraging participants to use this knowledge to predict the target frequency. To test whether this prediction aids detection of the cued frequency, we presented either the correct or a deviant frequency (above or below the target frequency) at threshold in the presence of auditory white noise in a two-interval forced-choice paradigm (one interval with, one without a tone). The standard interval was either ascending or descending (across participants). We hypothesized that participants’ predictions would confer a perceptual advantage when the presented target frequency matched the cued frequency compared to when a deviant frequency was presented. In line with our hypothesis, we found a benefit in detecting the correct target tone compared to the deviant tones. Interestingly, detection performance depended on the direction of the standard interval (increasing or decreasing in pitch) and on the location of the target tones relative to the standard interval tones. Overall, our results show that humans are able to form interval-based predictions and use them to facilitate auditory perception. However, they also point to complex interactions among the acoustic properties of stimuli that influence how prediction facilitates perception. Whether this is based on differences in perceived consonance/dissonance or on other factors not considered here should be the focus of future studies.
Ghattas Bisharat and Jennifer Resnik
Fri, 10/4 10:15AM - 12:00PM | A33
Abstract
Chronic stress, a prevalent experience in modern society, is a major risk factor for many psychiatric and sensory disorders. Despite the prevalence of perceptual abnormalities in these disorders, little is known about how stress affects sensory processing and perception. In this study, we combined chronic stress, longitudinal measurement of cortical activity, and auditory-guided behaviors to test whether the processing and perception of neutral sounds in adults are modulated by chronic stress. We found that chronic stress induces changes in sound processing, reducing sound-evoked activity in an intensity-dependent manner (N = 5 mice, n = 520 tracked neurons). These changes in sound processing led to specific changes in perception, assessed through behavior: loudness perception was modulated while tone-in-noise detection remained unaffected (N = 11). Additionally, our work reveals that the impact of stress on perception evolves gradually as the stressor persists over time, emphasizing the dynamic and evolving nature of this mechanism. Our findings challenge the notion that chronic stress primarily modulates responses to stimuli with existing emotional valence, shedding light on a novel mechanism through which chronic stress influences physiology and behavior.
Amber Kline, Brooke Holey and David Schneider
Fri, 10/4 10:15AM - 12:00PM | A34
Abstract
Many of the sensations we perceive are caused by our own actions, which we can distinguish from externally generated stimuli. In the auditory system, the ability to differentiate between external and self-generated sounds is crucial for vocal communication, musical training, and general auditory perception. The auditory system leverages the tight correlation between movements and the timing of incoming sensory information to discern whether a sound is self-generated, and through experience, animals form expectations of what each movement will sound like. Neural responses to expected self-generated sounds are suppressed in the primary auditory cortex (A1), and unexpected sounds elicit error-like responses. During sound-generating behaviors, the secondary motor cortex (M2) sends movement-related signals to A1 and is a potential source for establishing specific associations between sounds and their corresponding movements. Recent work suggests that M2 activity encodes a combination of movement-, sound-, and expectation-related signals, yet it remains unknown how M2 activity changes with experience as mice learn and update auditory-motor expectations. Here, we test the hypothesis that corollary discharge signals sent from M2 to A1 do not simply encode action, but instead convey rich information about movements and their expected acoustic consequences. To investigate motor cortical dynamics in response to expected and unexpected sounds, we trained mice to push a lever to receive a reward. At a reproducible time in the lever trajectory, a sound is played, such that mice learn to associate it with their own movements. We show that M2 neurons reliably encode lever movements and also respond to unexpected self-generated sounds. Following extensive experience with a sound-generating lever, M2 responses to the expected self-generated sound become weak, but M2 retains the ability to respond to sounds that violate the mouse’s expectation, suggesting that M2 neurons are sensitive to specific sensory outcomes. Ongoing two-photon calcium imaging experiments aim to understand how M2 ensembles change their activity as animals learn the acoustic consequences of their movements and how these ensembles reorganize when unexpected sounds are presented. These experiments will provide valuable insights into the brain’s mechanisms for predicting and updating the acoustic consequences of our actions in real time and could uncover fundamental principles underlying dynamic information flow between sensory and motor regions of the brain.
Jules Erkens, Ram Kumar Pari, Marina Inyutina, Mathieu Marx, Florian Kasten and Benedikt Zoefel
Fri, 10/4 10:15AM - 12:00PM | A35
Abstract
Neural activity aligns to the syllable rhythm of speech, an effect often termed “neural entrainment”. This process has become ubiquitous in current theories of speech processing, given that neural entrainment correlates with successful perception of speech. Indeed, studies that used transcranial alternating current stimulation (tACS) to manipulate neural entrainment have demonstrated a causal role of entrainment for speech processing. Nevertheless, the relative contribution of neural entrainment to target and distractors for speech perception in a setting with multiple competing speakers remained unclear. Moreover, it has been shown that transcutaneous stimulation of peripheral nerves can explain some tACS effects, but this confound has never been explored in tACS studies on speech perception. We here used tACS to manipulate entrainment to two simultaneously presented sequences of rhythmic speech, whilst participants attended to one of them. A random temporal relationship between speech streams allowed us to disentangle effects of tACS on target- and distractor processing, and to examine their combined effect on a behavioural measure of speech perception. tACS was applied bilaterally to target entrainment in two regions shown to be instrumental for speech perception (superior temporal gyrus and inferior frontal gyrus) as well as in a control condition designed to produce similar cutaneous stimulation, but which significantly reduced direct brain stimulation. We found that the phase relation between tACS and both target and distracting speech modulated word report accuracy, and to a similar degree. The strength of phasic modulation of target processing correlated with that of the distractor across subjects, and their combined effect on speech perception was stronger than each of the two alone. Importantly, these effects were also observed in the cutaneous control condition. Our findings therefore show (1) that entrainment to target and distracting speech jointly and causally contributes to speech perception, and (2) that tACS entrainment effects on speech perception might not be driven by direct cortical stimulation, but rather follow an indirect somatosensory pathway. These results have major implications for the development of tACS as a stimulation method, as they illustrate the urgent need for cutaneous control conditions, but also suggest somatosensory stimulation as an alternative for a manipulation of speech perception.
Sagarika Alavilli and Josh McDermott
Fri, 10/4 10:15AM - 12:00PM | A36
Abstract
Background: Was that my phone ringing? Humans perform environmental sound recognition throughout their daily lives. Despite being a routine task, environmental sound recognition remains poorly understood, partly due to limited research on human sound recognition in natural scenes with multiple sources. We aimed to better characterize this aspect of human perception with a large-scale experiment measuring sound recognition in naturalistic multi-source scenes. In parallel, we built models of environmental sound recognition by combining machine learning with biologically inspired models of peripheral auditory processing. The models were then compared to the human behavioral results. Methods: Human participants performed an online sound recognition task. Participants listened to auditory scenes composed of between 1 and 5 natural sounds. Sounds were drawn from an existing dataset (GISE) consisting of 51 different sound classes (e.g., applause, tearing, laughter). Following each scene presentation, participants judged whether a specific queried sound class was present in the scene. To build candidate models of environmental sound recognition, we trained deep neural network models on a sound classification task using the same 51 sound classes. The models received a 2-second sound clip as input. The sound was passed through a fixed model of the cochlea (bandpass filters, half-wave rectification, and lowpass filtering) followed by a neural network whose output was interpreted as the probability that each sound class was present in the scene. Results: On average, human listeners performed well above chance, but performance decreased as scene size increased. The models replicated both overall human performance and the dependence on scene size. Humans also consistently recognized some classes of sounds better than others, and the models' performance for individual sound classes captured much of the variance in human performance across classes. Conclusions: We collected human environmental sound recognition judgments that will form part of a benchmark for evaluating models in this domain. The results illustrate that models optimized for environmental sound recognition capture aspects of human perception and set the stage for future explorations of auditory scene perception involving salience, attention, and memory.
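The fixed cochlear front-end described in the Methods (bandpass filterbank, half-wave rectification, lowpass filtering) might be sketched as below; the band spacing, filter orders, and cutoff are illustrative assumptions, not the authors' exact parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def cochleagram(wave, fs, n_bands=30, fmin=50., fmax=8000., env_cutoff=200.):
    """Simplified fixed cochlear model: log-spaced bandpass filterbank,
    half-wave rectification, then lowpass smoothing of each band's output."""
    edges = np.geomspace(fmin, fmax, n_bands + 1)
    lowpass = butter(2, env_cutoff, btype="low", fs=fs, output="sos")
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        sub = sosfiltfilt(sos, wave)
        env = sosfiltfilt(lowpass, np.maximum(sub, 0.0))   # rectify + smooth
        bands.append(env)
    return np.stack(bands)            # [n_bands x time], input to the classifier

# toy 2-s clip at 20 kHz: a tone in noise
fs = 20000
t = np.arange(2 * fs) / fs
clip = np.sin(2 * np.pi * 440 * t) + 0.5 * np.random.default_rng(9).standard_normal(t.size)
coch = cochleagram(clip, fs)
print(coch.shape)   # (30, 40000); downsample before feeding the neural network
```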
Jereme Wingert, Brad Buran and Stephen David
Fri, 10/4 10:15AM - 12:00PM | A37
Abstract
Auditory encoding is traditionally studied in head-fixed preparations, largely out of the necessity to control acoustic conditions. This constraint has created a gap in knowledge about auditory processing under complex naturalistic conditions in which a listener can dynamically change position with respect to sound sources. Recent work in auditory cortex and other modalities has shown that sensory cortical neurons encode a broad range of non-sensory information, including location and task-related variables. Studies of non-sensory encoding, however, typically do not consider the impact of these variables on the concurrent sensory representation. To address this gap, we performed semi-chronic Neuropixels recordings in auditory cortex of two ferrets while they freely navigated a behavioral arena and performed a go/no-go tone detection task. During the task, a continuous background of natural sounds was played, maximizing the diversity of natural stimuli presented to the animal. Simultaneous tracking of head position and angle permitted virtual head fixation by modeling the stimuli reaching the ears after transformation by the head-related transfer function. This framework allowed robust characterization of auditory neural selectivity while the animals behaved freely. Decoding from populations of auditory neurons revealed representation of non-auditory variables, including position, velocity, and head direction, consistent with findings from other groups using rodent models. For single neurons, a multimodal auditory and spatial-motor encoding model predicted spiking activity better than models including only auditory signals. Surprisingly, most of the improvement in prediction accuracy could be accounted for by a gain/offset model in which spatial-motor variables modulate the gain and offset of predictions from an auditory-only model. These gain/offset effects were spatially localized within the behavioral arena. The output of the encoding model, along with positional decoding and simple 2D tuning-curve representations of neural activity, suggests that position within the behavioral arena is the dominant component impacting auditory responses. This observation contrasts with studies of movement-related activity in auditory cortex of head-fixed rodents moving on treadmills or spheres, which report suppression related to motor activity. Together, these findings demonstrate that the representation of position is tightly interwoven with the representation of relatively low-level sound features in auditory cortex. How this multimodal code is utilized by downstream perceptual and behavioral processes remains an open question for future study.
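The gain/offset formulation, in which spatial-motor variables rescale and shift the predictions of an auditory-only model, can be sketched per spatial bin as follows; the arena binning, simulated auditory drive, and least-squares fit are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def fit_gain_offset(aud_pred, spikes, spatial_bin, n_bins):
    """Per spatial bin, fit spikes ~ gain * auditory-only prediction + offset."""
    gains, offsets = np.ones(n_bins), np.zeros(n_bins)
    for b in range(n_bins):
        idx = spatial_bin == b
        if idx.sum() < 10:
            continue
        A = np.column_stack([aud_pred[idx], np.ones(idx.sum())])
        (g, o), *_ = np.linalg.lstsq(A, spikes[idx], rcond=None)
        gains[b], offsets[b] = g, o
    return gains, offsets

# toy data: auditory drive plus position-dependent gain/offset in a 4x4 arena grid
rng = np.random.default_rng(10)
T, n_bins = 20000, 16
aud = rng.gamma(2.0, 1.0, T)                       # auditory-only model prediction
pos = rng.integers(0, n_bins, T)                   # binned position in the arena
true_gain, true_off = rng.uniform(0.5, 1.5, n_bins), rng.uniform(0, 2, n_bins)
spikes = true_gain[pos] * aud + true_off[pos] + rng.normal(0, 0.5, T)

gains, offsets = fit_gain_offset(aud, spikes, pos, n_bins)
pred = gains[pos] * aud + offsets[pos]
print("gain/offset model r =", round(np.corrcoef(pred, spikes)[0, 1], 3))
print("auditory-only r     =", round(np.corrcoef(aud, spikes)[0, 1], 3))
```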
Sudan Duwadi, De'Ja Rogers, Alex D. Boyd, Laura Carlton, Yiwen Zhang, Anna Gaona, Bernhard Zimmermann, W.Joe O'Brien, Alexander Von Luhmann, David Boas, Meryem Yucel and Kamal Sen
Fri, 10/4 10:15AM - 12:00PM | A38
Abstract
Complex Scene Analysis (CSA) enables the brain to focus on a single auditory or visual object in crowded environments. While this occurs effortlessly in a healthy brain, many hearing-impaired individuals and those with neurodivergent conditions such as ADHD and autism experience difficulty with CSA, impacting speech intelligibility (Dunlop et al., 2016). Whole-head brain imaging during active CSA has the potential to reveal critical insights into the underlying cortical activity patterns. Although electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) are common brain imaging modalities, their limitations have restricted ecologically valid CSA studies: motion constraints and persistent background noise during scanning make fMRI a less ecologically valid modality for CSA, while EEG, despite its high temporal resolution, is limited by comparatively low spatial resolution. To overcome these limitations, we propose using high-density (HD) functional near-infrared spectroscopy (fNIRS) for whole-head brain imaging during CSA in naturalistic settings. This approach allows analysis of cortical activity patterns, with potential applications in enhancing brain-computer interface technologies. Our experimental design mimics an ecologically valid cocktail party scenario in both overt and covert contexts. In the overt scenario, 3-second audiovisual movie clips are presented simultaneously at 30 degrees to the left and right. Prior to each clip, a 2-second spatialized white-noise cue is paired with a white crosshair on the corresponding screen, guiding subjects on which direction to focus, with eye movements allowed. In the covert scenario, subjects are exposed solely to spatialized audio from the same set of movies. Here, the 2-second spatialized white noise serves as the cue directing their attention, while they maintain gaze on a central screen displaying a static white crosshair. fNIRS data were collected from 3 subjects using the NinjaNIRS22 system with a whole-head, high-density cap. We analyzed whole-brain evoked responses to understand cortical activity patterns in both scenarios. Moreover, using these brain signals and machine learning techniques, we also decoded the attended spatial location. Such an approach holds potential for brain-computer interface applications, such as enhancing hearing aid algorithms to selectively focus on specific sound sources (Ning et al., 2024). Our results show robust evoked responses in the frontal eye fields (FEF), primary auditory regions, superior and medial temporal gyri, primary sensory regions, supramarginal gyrus, and angular gyrus in both overt and covert conditions. References: Dunlop, W. A., Enticott, P. G., & Rajan, R. (2016). Speech Discrimination Difficulties in High-Functioning Autism Spectrum Disorder Are Likely Independent of Auditory Hypersensitivity. Frontiers in Human Neuroscience, 10. https://doi.org/10.3389/fnhum.2016.00401; Ning, M., Duwadi, S., Yücel, M. A., von Lühmann, A., Boas, D. A., & Sen, K. (2024). fNIRS dataset during complex scene analysis. Frontiers in Human Neuroscience, 18. https://doi.org/10.3389/fnhum.2024.1329086
Letitia Schneider, Isma Zulfiqar, Yaël Balbastre, Lori Holt, Martina Callaghan and Frederic Dick
Fri, 10/4 10:15AM - 12:00PM | A39
Abstract
Thanks to advances in receive coils, parallel imaging, and pulse sequence design, fMRI temporal resolution has increased several fold, with repetition times (TRs) of
Tobias Reichenbach, Alina Schüller and Jasmin Riegel
Fri, 10/4 10:15AM - 12:00PM | A40
Abstract
Oral communication is of fundamental importance to our social interactions. However, understanding a speaker in a conversation is a highly complex task that requires attentional filtering as well as rapid neural processing of acoustic and linguistic aspects of the speech signal. The specific mechanisms behind these processes still remain largely unclear, not least because the investigation of the neural processing of continuous natural speech has begun only relatively recently. An important feature of speech is the fundamental frequency of its voiced parts, arising from the opening and closing of the glottis. Together with its many higher harmonics, the fundamental frequency carries most of the energy of voiced speech. In the presence of background noise such as other speech signals, a listener can employ this speech feature to single out a target voice. The fundamental frequency is tracked by high-frequency neural activity. Due to the similarity to the frequency-following response (FFR) to a pure tone, we refer to this high-frequency neural tracking as speech-FFR. We and others have recently shown that the speech-FFR can be measured in response to continuous speech, opening up a window of opportunity for investigating the involvement of the speech-FFR in different aspects of speech processing [1]. Using electroencephalography (EEG), we have already shown that subcortical contributions to the speech-FFR are modulated through selective attention as well as through acoustic and linguistic information, reflecting top-down mechanisms. However, additional cortical contributions to the speech-FFR have recently been identified, mostly using magnetoencephalography (MEG). Here we investigate the characteristics of this cortical contribution and its role for speech processing. To this end, we employ MEG measurements of the neural responses to continuous speech. The speech-FFR is determined through source reconstruction followed by the computation of temporal response functions (TRFs). We thereby find that the subcortical and the cortical contribution are temporally well separated [2]. Although the subcortical contribution measured through MEG is small, we can reliably detect it in a large dataset of long MEG recordings. It occurs at a latency of around 10 ms, while the cortical contributions arise at a longer delay of about 35 ms. We then investigate if, similar to the subcortical contribution, the cortical responses are modulated by selective attention [3]. In an experiment in which participants listen to two competing male speakers, alternatingly attending one of them, we find that the cortical contribution to the speech-FFR is larger when the corresponding speaker is attended than when they are ignored. Musical training has been found to influence the subcortical contribution to the speech-FFR, with musicians experiencing larger responses than non-musicians. We wondered if the cortical contribution exhibited this behavior as well. We therefore measured MEG responses to two competing speakers in a large dataset of 52 subjects, 18 of which had substantial musical training, 9 some and 25 no or almost no musical training. In contrast to previous findings on the subcortical response, we did not observe a systematic influence of the type or amount of musical training on the cortical speech-FFR, neither on its amplitude, latency or attentional modulation. 
Taken together, our results show that the cortical contribution to the speech-FFR can be reliably measured through MEG, occurs significantly later than the subcortical contribution, and plays a role in the processing of speech in background noise due to its involvement in selective attention. Moreover, the cortical speech-FFR plays at least partly a different role from the subcortical response, as it is not significantly affected by musical training. We therefore believe that further investigation of the interplay between the subcortical and cortical contributions to the speech-FFR, as well as their top-down modulation through higher cognitive factors such as selective attention, will yield a more complete understanding of the neural processing of speech in noise. [1] A. E. Forte, O. Etard and T. Reichenbach, The human auditory brainstem response to running speech reveals a subcortical mechanism for selective attention, eLife 6:e27203 (2017). [2] A. Schüller, A. Schilling, P. Krauss, T. Reichenbach, The early subcortical response at the fundamental frequency of speech is temporally separated from later cortical contributions, J. Cogn. Neurosci. 36:475 (2024). [3] A. Schüller, A. Schilling, P. Krauss, S. Rampp, T. Reichenbach, Attentional modulation of the cortical contribution to the frequency-following response evoked by continuous speech, J. Neurosci. 43:7429 (2023).
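As an illustration of the analysis style described above, the sketch below (synthetic data, assumed 1 kHz sampling rate; not the authors' pipeline) estimates a temporal response function by time-lagged ridge regression and reads early (~10 ms) and later (~35 ms) peaks off the lag axis.

```python
# Minimal sketch: estimating a temporal response function (TRF) by
# time-lagged ridge regression, the kind of forward model used to separate
# early and late contributions by their latency.
import numpy as np

rng = np.random.default_rng(1)
fs = 1000                          # Hz (assumed)
n = 20 * fs                        # 20 s of data
stim = rng.standard_normal(n)      # stand-in for an f0-band stimulus feature

# Simulate a response with an early (10 ms) and a later (35 ms) component.
true_trf = np.zeros(60)
true_trf[10] = 1.0                 # "subcortical-like" peak at 10 ms
true_trf[35] = 0.6                 # "cortical-like" peak at 35 ms
resp = np.convolve(stim, true_trf)[:n] + 0.5 * rng.standard_normal(n)

# Lagged design matrix (lags 0..59 ms, one column per lag) and ridge solution.
lags = np.arange(60)
X = np.column_stack([np.roll(stim, lag) for lag in lags])
X[:59] = 0                         # drop wrap-around samples
lam = 1e2
trf = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ resp)

print("estimated peak latencies (ms):", np.argsort(trf)[-2:][::-1])
```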
Alexander Billig, Joel Berger, Richard McWalter, Tobias Teichert, Christopher M. Garcia, Christopher K. Kovach, Christopher I. Petkov, Hiroto Kawasaki, Matthew A. Howard, Josh McDermott and Timothy Griffiths
Fri, 10/4 10:15AM - 12:00PM | A41
Abstract
Many of the sound scenes we encounter consist of a large number of stochastically timed similar events that together form acoustic textures, such as the sound of a rain shower. These can typically be characterised by a relatively sparse set of summary statistics, such as correlations between modulation envelopes across frequency bands (McDermott & Simoncelli, Neuron 2011). Listeners are thought to automatically extract such statistics when perceiving and remembering textures (McDermott et al., Nat Neurosci 2013). Like other sounds, textures can be perceived to continue through sufficiently loud interrupting white noise, even when the texture is physically absent (McWalter & McDermott, Nat Comms 2019). In the case of textures with stable summary statistics, this illusion can last for several seconds. We tested the hypothesis that such persistence draws on neural circuits beyond auditory cortex, including hippocampus. We presented many exemplars of two different textures to six neurosurgical participants. In each trial, two seconds of texture were followed by two seconds of white noise and then a final second of texture. In a control condition, 200-ms silent gaps were inserted either side of the noise. All participants reported perceiving the illusion in the continuous case, with a significant drop in such reports when gaps were present. Only when the texture was physically present could its identity be decoded from single-neuron firing and high-gamma power in auditory cortex. In contrast, a decoder using theta power across multiple electrodes was able to identify not only the physically presented texture, but also the texture perceived during the white noise. Channels in regions beyond primary auditory cortex, including planum polare and hippocampus, carried the greatest weight in these decoders. In the substantial majority of participants, this decodability based on theta power dropped to chance level when silent gaps buttressed the noise to suppress the illusion. We continue to study the extent to which the extraction and persistence of summary statistics through several seconds of interruption draws on brain areas with longer intrinsic timescales than primary auditory cortex, and interactions between these regions.
Deanna Garcia, Ariadna Corredera Asensio, Suha Chang and David Schneider
Fri, 10/4 10:15AM - 12:00PM | A42
Abstract
Mice vocalize during courtship and parental behavior, suggesting that real-time social behavior is influenced by acoustic cues. Many non-vocal behaviors also produce sounds, such as locomotion, digging, eating, and drinking. Recent work suggests that the acoustic landscape of socializing rodents is dominated by these non-vocal, rodent-produced sounds, yet it remains largely unknown whether or how non-vocal sounds influence social behavior. Here, we performed two experiments to explore how mouse social behavior changes when mice are noisy compared to when they are quiet. We first placed pairs of mice in adjacent linear tracks, the floors of which were covered with either noisy materials (leaves, bedding) or quiet materials (rubber). The linear tracks shared an opaque wall such that the mice could hear one another but could not see one another and we used markerless keypoint tracking (SLEAP) to quantify the behavior of both mice relative to one another. Over the course of tens of minutes, we observed a notable absence of vocalizations. Despite this lack of vocalizations, mice often synchronized their locomotion, moving up and down their linear tracks in the same direction and at the same time, and often pausing their locomotion with synchrony that exceeded chance levels. Having observed synchrony with spatially separated mice, we next asked how freely interacting mice adjust their behavior on noisy surfaces and in the dark. In preliminary experiments, pairs of mice tend to occupy similar positions within an arena covered in dry leaves, suggesting that mice might use acoustic cues to locate social partners in the dark. Ongoing experiments and analyses are aimed at determining how auditory, tactile, and olfactory cues distinctly contribute to these synchronous social behaviors. Collectively, these experiments indicate that mouse social interactions change when mice are noisy, suggesting that non-vocal, mouse-generated sounds may play a role in naturalistic social interactions.
Ana Sanchez Jimenez, Victoria Bajo, Ben Db Willmore, Andrew J King and Fernando Rodriguez Nodal
Fri, 10/4 10:15AM - 12:00PM | A43
Abstract
The ability, through training, to overcome the impairment in sound localization caused by asymmetric conductive hearing loss is well documented. Research in ferrets wearing an earplug has established that such experience-dependent plasticity requires a functioning auditory cortex and the integrity of its descending circuits to the inferior colliculus (IC). Here, we examined whether behavioral adaptation to asymmetric hearing loss is associated with changes in IC response properties. We recorded from the IC bilaterally using high-density Neuropixels probes over several weeks in three ferrets that were performing a sound localization task in the azimuthal plane. Ferrets were rewarded for approaching and licking a spout below the target speaker, and the spatial tuning properties of IC neurons were explored using broadband sounds. The location of the probes inserted dorsoventrally into the IC was determined physiologically and anatomically by recording neuronal frequency response areas under sedation and histological inspection, respectively. Most of the recordings were in the central nucleus of the IC, with some recording sites in its dorsal and lateral cortices. Under normal hearing conditions, the most common responses were primary-like and sustained. Most neurons (77.12 ± 14.89%) had a contralateral preference (mean centroid for left IC: 54.08 ± 12.84°, right IC -42.99 ± 45.48°), with a broad equivalent rectangular receptive field (ERRF) (145.77 ± 7.83°). No differences between left and right IC were observed. Moreover, a population linear decoding model was able to decode the stimulus azimuth from IC activity highly accurately, with greater accuracy in the left-right axis than in the front-back axis. The contralateral preference and broad spatial tuning of IC neurons are consistent with opponent two-channel coding of sound location. Plugging one ear produced a marked change in the response properties of IC neurons. Neurons ipsilateral to the earplug exhibited a broadening of their spatial tuning (larger ERRF) and a reduced contralateral preference, whereas neurons contralateral to the earplug exhibited a profound suppression of their activity. During behavioral adaptation to unilateral hearing loss, we observed a small reduction in ERRF in the IC ipsilateral to the earplug and increased spatial modulation of responses in the contralateral IC, though these changes were not sufficient to restore normal spatial tuning. Nevertheless, the population decoding model showed a progressive improvement in decoding performance over the course of training, indicating that a neural correlate of behavioral adaptation is found in the IC.
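For readers unfamiliar with the tuning metrics quoted above, the sketch below shows one common convention (an assumption, not necessarily the authors' exact definition) for computing a spatial centroid and an equivalent rectangular receptive field (ERRF) width from an azimuth tuning curve.

```python
# Minimal sketch: spatial centroid and ERRF width from an azimuth tuning
# curve sampled every 15 degrees.
import numpy as np

az = np.arange(-180, 180, 15)                       # speaker azimuths (deg)
rates = np.exp(-0.5 * ((az - (-55)) / 40.0) ** 2)   # hypothetical tuning curve
rates += 0.05                                       # spontaneous rate

# Centroid: circular vector average of responses over azimuth (in degrees).
theta = np.deg2rad(az)
centroid = np.rad2deg(np.angle(np.sum(rates * np.exp(1j * theta))))

# ERRF: width of a rectangle with the same area and the same peak height
# as the tuning curve.
bin_width = 15.0
errf = np.sum(rates) * bin_width / np.max(rates)

print(f"centroid: {centroid:.1f} deg, ERRF width: {errf:.1f} deg")
```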
Joseph Pinkl
Fri, 10/4 10:15AM - 12:00PM | A44
Abstract
Subjective tinnitus, the perception of sound in the absence of an external stimulus, is a serious health issue for hundreds of millions of people worldwide. Currently, there are no approved drugs to prevent or treat tinnitus, partly due to a lack of reliable detection methods for animal studies. Previously, we developed a behavioral-based paradigm for tinnitus detection in mice called sound-based avoidance detection (SBAD). SBAD is a dual paradigm that uses negative reinforcement (electrical shocks) to infer tinnitus (silent trial) while monitoring potential confounding variables including alertness, motivation, motor functioning, and memory (sound trial). In a previous study, tinnitus detection via SBAD testing was validated by functional assays of the midbrain that demonstrated abnormal increases in neuronal activity in the inferior colliculus in mice that received traumatic noise exposure. Here, we aimed to further validate the SBAD method by directly modulating neuronal activity of the inferior colliculus (IC) in mice using Designer Receptors Exclusively Activated by Designer Drugs (DREADDs)-based chemogenetic tools, a class of engineered proteins that are activated by synthetic ligands. Eight adult C57BL/6J mice received bilateral stereotaxic micro-injections of AAV8-CaMKIIa-hM3D(Gq)-mCherry (AAV) at either 50 nL (n = 4) or 150 nL (n = 4) doses in the central nucleus of the IC (stereotaxic coordinates: AP: -5.00 mm, ML: +/- 1.00 mm, DV: -1.50 mm). After one month of recovery, animals underwent 15 days of SBAD training followed by 7 consecutive days of SBAD testing. On every other day of testing starting on the second day, animals received intraperitoneal injections of clozapine-N-oxide dihydrochloride (CNO), a chemogenetic actuator, at a dose of 5 mg/kg, one hour before testing. On all other days of testing, animals received 0.5 mL injections of 0.9% saline as a control. For SBAD silent trials, there was a significant main effect of injectant type (saline vs CNO; p = .013); however, there was no significant main effect of AAV dose and no significant interaction between injectant type and AAV dose. Silent trial scores post-CNO injection ranged from approximately 74-100%, with five of the eight animals yielding scores that indicated tinnitus. SBAD scores suggest that selective modulation of the IC in AAV-injected mice via post-operative CNO injections can lead to a tinnitus percept; however, the dose of AAV did not affect SBAD test scores. Future research will assess the three-dimensional localization of AAV-injected sites. Because IC hyperactivity is implicated in many tinnitus studies, these findings support excitatory DREADD activation of the IC as a reliable experimental model for tinnitus research.
Ilina Bhaya-Grossman, Yulia Oganian, Emily Grabowski and Edward Chang
Fri, 10/4 10:15AM - 12:00PM | A45
Abstract
Lexical stress, or the emphasis placed on syllables within words, critically facilitates word recognition and comprehension processes. For instance, it enables listeners to distinguish between the noun “a present” (PRE-sent) and the verb “to present” (pre-SENT). In English, lexical stress is prominently signaled by relative speech intensity, with the stressed syllable exhibiting the greatest intensity relative to other syllables in the word. Prior work has shown that the human speech cortex on the superior temporal gyrus (STG) encodes speech intensity as a series of discrete acoustic landmarks marking moments of peak intensity change (peakRate). Building on this finding, a key question arises: Is there a neural encoding of relative intensity in the STG that supports the perception of lexical stress? To address this question, we performed intracranial recordings (n=9 ECoG patients) while English-speaking participants performed two experiments. In Experiment 1, participants performed a forced-choice task, identifying whether the first or second syllable was stressed in a set of synthesized two-syllable pseudo-words (e.g. hu-ka, ma-lu). The intensity of the first syllable in each pseudo-word varied while the intensity of the second syllable was fixed, allowing us to experimentally test whether neural responses to the second syllable depended on the intensity of the first. We found that a subset of cortical sites on the human STG encoded relative intensity, that is, these sites showed activation in response to the second syllable only when its intensity was greater than that of the first syllable. Critically, we found that cortical sites that encoded relative intensity were distinct from those that encoded peakRate. Neither population encoded which syllable participants perceived as stressed when they were presented with ambiguous pseudo-words, where both syllables had identical intensity. In Experiment 2, we used a passive listening paradigm to extend our findings to a naturalistic speech stimulus. Our results indicate that relative and absolute intensity of speech are encoded in two distinct neural populations on the STG and further, that these populations do not encode stress percepts in cases where the intensity cue to lexical stress is removed. Our results reveal the multiple, distinct neural representations that work in concert to give rise to lexical stress perception.
Charlie Fisher, I.M. Dushyanthi Karunathilake, Michael A. Johns, Allison Vance, Stefanie E. Kuchinsky, Samira Anderson and Jonathan Z. Simon
Fri, 10/4 10:15AM - 12:00PM | A46
Abstract
Listening in noisy environments is a common challenge for older adults, even those with clinically normal hearing. Moreover, compared to younger adults, older adults report higher listening effort when listening to competing speakers. This suggests that auditory scene segregation is affected by more than just hearing loss and may also be influenced by cognitive processing, both of which decline with age. In this study, we aim to determine if auditory-cognitive training (of two types, with different levels of required cognition) can improve speech-in-noise listening in normal-hearing, older adults. To analyze the effects of auditory-cognitive training, we collected behavioral and neural data from older adults pre- and post-training, along with younger adults who did not undergo training. Magnetoencephalography (MEG) was used to record brain responses while subjects listened to narrated audiobooks under four different noise conditions. For each audio presentation, we logged listener-reported intelligibility and listening effort. All neural data was analyzed using the temporal response function (TRF) framework. Additional behavioral data obtained include various tasks of working memory and audio segregation, such as stochastic figure-ground (“tone cloud”) detection, the quick speech in noise test (QuickSIN), a speech perception in noise task (SPIN), and tests of working memory (RSPAN and N-back). Preliminary results for older adults show a post-training reduction in listening effort with competing speakers. Additionally, some neural measures, such as the reconstruction of stimulus speech features, were generally reduced—an indication that maladaptive overcompensation typically observed in older adults may be decreased. This reduction of stimulus speech feature reconstruction demonstrates neuroplasticity that brings the older adults closer to the younger adults. Critically, one pre-training behavioral measure may predict the level of neuroplasticity benefit, performance in the tone cloud detection task: lower tone cloud detection pre-training scores were associated with larger reductions in stimulus reconstruction measures post-training. These results are promising for the incorporation of auditory-cognitive training in older adults who experience difficulty understanding speech in noise.
David Skrill, Dana Boebinger, Chris Garcia, Kirill Nourski, Matthew Howard, Thomas Wychowski, Webster Pilcher and Sam Norman-Haignere
Fri, 10/4 10:15AM - 12:00PM | A47
Abstract
A central goal of sensory neuroscience is to build parsimonious computational models that can both predict neural responses to natural stimuli, such as speech and music, and reveal interpretable functional organization in the brain. Statistical “component” models can learn interpretable, low-dimensional structure across different brain regions and individuals. For example, prior studies have revealed that human cortical responses to natural sounds in non-primary regions can be approximated as the weighted sum of speech, music, and song-selective neural responses. Component models, however, cannot generate predictions for new stimuli or generalize across different experiments because they lack an explicit “encoding model” that links these components to the stimuli that drive them. Modern encoding models derived from deep neural networks, on the other hand, often have strong predictive power, but deriving simple and generalizable insights from these models can be challenging in part because they are constructed by mapping a high-dimensional feature set to a high-dimensional neural response. To overcome these limitations, we develop "component-encoding models" (CEMs) which approximate neural responses as a weighted sum of a small number of component response dimensions, each approximated by a stimulus-computable encoding model derived from a predictive deep neural network. We show using simulations, fMRI data, and human intracranial responses to natural sounds that our CEM framework can infer a small number of interpretable response dimensions across different experiments with non-overlapping stimuli and subjects while maintaining and even improving the prediction accuracy of standard encoding models.
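The sketch below illustrates the general idea of a component-encoding model on synthetic data: high-dimensional stimulus features are mapped through a small number of shared components to many recording channels. Reduced-rank ridge regression is used here purely for illustration and is an assumption, not the authors' estimation procedure.

```python
# Minimal sketch: a component-encoding model that maps stimulus features F
# (e.g. DNN activations) through K shared components to many channels,
# Y ≈ F @ W @ C, estimated here by ridge regression followed by a rank-K SVD.
import numpy as np

rng = np.random.default_rng(2)
n_stim, n_feat, n_chan, K = 200, 500, 60, 3

F = rng.standard_normal((n_stim, n_feat))           # stimulus features
W_true = rng.standard_normal((n_feat, K)) / np.sqrt(n_feat)
C_true = rng.standard_normal((K, n_chan))
Y = F @ W_true @ C_true + 0.1 * rng.standard_normal((n_stim, n_chan))

# Step 1: full ridge regression from features to channels.
lam = 10.0
B = np.linalg.solve(F.T @ F + lam * np.eye(n_feat), F.T @ Y)

# Step 2: reduce the fitted responses to rank K via SVD, yielding component
# response dimensions (per stimulus) and per-channel component weights.
U, s, Vt = np.linalg.svd(F @ B, full_matrices=False)
components = U[:, :K] * s[:K]
channel_weights = Vt[:K]

Y_hat = components @ channel_weights
r_vals = [np.corrcoef(Y[:, i], Y_hat[:, i])[0, 1] for i in range(n_chan)]
print("in-sample prediction r per channel (median):", round(np.median(r_vals), 3))
```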
Moved to B70
Fri, 10/4 10:15AM - 12:00PM | A48
Abstract
Moved to B70
Ralph Peterson, Violet Ivan, Alex Williams, David Schneider and Dan Sanes
Fri, 10/4 10:15AM - 12:00PM | A49
Abstract
Interactive vocal communication is a complex sensorimotor phenomenon that requires coordination between the auditory and motor systems, as well as coordination between animals. Work in Scotinomys teguina, the singing mouse (Okobi et al. 2019, Banerjee et al. 2024), has revealed how motor cortical dynamics subserve vocal production during antiphonal communication (i.e. turn-taking), however less is known about the role of auditory processing in generating appropriate vocal outputs. Here, we report that Mongolian gerbils — a highly social fossorial species that live in large multigenerational families — engage in ultrasonic antiphonal communication that depends on auditory cortex (AC) activity. Male and female gerbils (postnatal days 36-99) were exposed to an ultrasonic vocalization (USV) bout (2 seconds, 20 trials, 30-60 second inter-trial interval) in an isolated testing chamber while both video and audio recordings were obtained. We found that gerbils (n=21) responded to this auditory stimulus with seconds-long sequences of stereotyped USVs. Given the robust vocal response, we used this behavior to test whether AC activity was required for antiphonal responses. Bilateral muscimol infusion into AC through chronically implanted cannulae significantly reduced antiphonal responses and vocalization-induced postural orientation responses, as compared to saline-infused controls (n=5). Unilateral inactivation of either AC reduced antiphonal responses as compared to saline controls, however vocal responses were significantly higher than bilateral inactivation. This suggests that there may be coordination between hemispheres to process conspecific vocalizations critical for antiphonal calling. Finally, preliminary chronic wireless silicon probe recordings showed that vocalization playback activated AC neurons, even on playback trials where the animal did not respond vocally. Therefore, AC signals that encode ongoing social auditory experience are necessary for antiphonal vocalizing, and are likely transmitted to downstream social/motor structures that coordinate vocal turn-taking behavior.
Zyan Wang, Sharlen Moore, Ziyi Zhu, Ruolan Sun, Angel Lee, Adam Charles and Kishore V. Kuchibhotla
Fri, 10/4 10:15AM - 12:00PM | A50
Abstract
A fundamental tenet of animal behavior is that decision-making involves multiple ‘controllers.’ Initially, behavior is goal-directed, driven by desired outcomes, shifting later to habitual control, where cues trigger actions independent of the motivational state. Clark Hull’s question from 1943 still resonates today: “Is this transition [to habit] abrupt, or is it gradual and progressive?” Despite a century-long belief in gradual transitions, this question remains unanswered as current methods cannot disambiguate goal-directed versus habitual control in real time. Here, we introduce a novel ‘volitional engagement’ approach, motivating animals by palatability rather than biological need. Providing less palatable water in the home cage reduced motivation to ‘work’ for plain water in an auditory discrimination task compared to water-restricted animals. Using quantitative behavior and computational modeling, we found that palatability-driven animals learned to discriminate as quickly as water-restricted animals but exhibited state-like fluctuations when responding to the reward-predicting cue, reflecting goal-directed behavior. After thousands of trials, these fluctuations spontaneously and abruptly ceased, with animals always responding to the reward-predicting cue. In line with habitual control, post-transition behavior displayed motor automaticity, decreased error sensitivity (assessed via pupillary responses), and insensitivity to sensory-specific outcome devaluation. Bilateral lesions of the habit-related dorsolateral striatum (DLS) blocked transitions to habitual behavior. Finally, we used bilateral fiber photometry in the putative controllers of goal-directed (dorsomedial striatum, DMS) and habitual (DLS) behavior to monitor the evolution of neural activity across learning. Both the DMS and DLS exhibited learning-related plasticity in cue-, lick-, and outcome-related signaling at similar timescales in parallel. Immediately after transitioning to habitual behavior, outcome-related signaling was suppressed in the DLS and, to a lesser extent, in the DMS, while cue-evoked responses further sharpened. This abrupt shift (reduction in outcome signaling and sharpening of cue-evoked responses) indicated that sensory cues rather than outcomes drive habitual responding. Our results demonstrate that both controllers (DMS and DLS) exhibit learning-related plasticity in parallel but that the behavioral manifestation of habits emerges spontaneously and abruptly in a DLS-dependent manner, suggesting the involvement of a higher-level process that arbitrates between the two.
Steven Eliades and Joji Tsunada
Fri, 10/4 10:15AM - 12:00PM | A51
Abstract
Vocalization is a sensory-motor process requiring auditory self-monitoring to detect and correct errors in vocal production. This process is thought to involve an error signal encoding the difference between vocal motor predictions and sensory feedback, but direct evidence for the existence of such an error signal is lacking, and the underlying coding mechanisms remain uncertain. One potential mechanism for this feedback error detection is a well-described suppression of the auditory cortex, seen during both human speech and animal vocalization. Past studies, however, have been limited to single manipulations of auditory feedback and have thus been unable to fully test the error signal hypothesis or give insight into the underlying computations. In this study, we investigated vocal responses in the auditory cortex of marmoset monkeys, testing frequency-shifted feedback of varying magnitudes and directions. Consistent with an error signal, we found population-level activity that scaled with the magnitude of feedback shifts. Feedback sensitivity was greatest in vocally suppressed units and in units whose frequency tuning overlapped vocal acoustic ranges. Individual units often exhibited preferences for either positive or negative frequency changes, with many responding to shifts in both directions, as well as sensitivity to feedback shifts of different magnitudes. Comparisons between vocal production and passive listening revealed a change in the population of units encoding frequency shifts. These results suggest that vocal responses and feedback sensitivity in the auditory cortex are consistent with an error calculation during vocal production, both at the individual unit and population level, with this error coded by changes in both firing rates and the population of units involved.
Anna-Lena Krause and Lars Hausfeld
Fri, 10/4 10:15AM - 12:00PM | A52
Abstract
In two-talker situations, listeners need to segregate a stream of interest from a second stream. Our auditory system utilizes several cues to support segregation and intelligibility of the relevant speech stream. Most cues were investigated using short and anechoic speech and the effects of reverberation on the underlying neural processing in naturalistic listening situations remain unclear. In this study, we presented participants (N = 18) with two simultaneous speakers while their neural responses were measured with high-density EEG. Using the image method, we computed binaural room impulse responses (BRIRs) to simulate mild and strong reverberation for speakers that were co-located or separated in azimuth. In addition, speech signals were manipulated to create small or large pitch separation between speakers. The analyses of behavioral responses (intelligibility and difficulty) showed main effects of reverberation and azimuth separation. To investigate cortical responses to reverberant, continuous speech, we employed a linear systems approach using multivariate temporal response functions (mTRFs). In line with behavioral results, decoding of the non-reverberant (‘dry‘) speech in the reverberant mixture showed a strong main effect of reverberation in addition to smaller main effects of location and pitch showing that the auditory system is sensitive to these cues, in particular reverberation, with the current experimental setup. Using speech encoding, we found that non-reverberant speech was more strongly represented compared to reverberant speech signals already at early processing stages (80-100 ms) suggesting early ‘de-reverberation’ in the processing hierarchy. In addition, dry speech was more strongly represented in mild vs. strong reverberation conditions at a later stage (> 200 ms) indicating stronger linguistic processing likely due to successful speech extraction and segregation. These results and ongoing analyses help to shed more light on neural processes underlying speech perception in naturalistic, reverberant situations.
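The stimulus construction described above can be illustrated with a toy renderer: each dry speech signal is convolved with left- and right-ear room impulse responses and the two talkers are summed. The sketch below uses synthetic exponentially decaying impulse responses as stand-ins for the image-method BRIRs used in the study.

```python
# Minimal sketch: rendering a reverberant, spatialized two-talker mixture by
# convolving dry signals with toy left/right room impulse responses.
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(3)
fs = 16000
dry_a = rng.standard_normal(fs * 2)      # stand-ins for two dry speech signals
dry_b = rng.standard_normal(fs * 2)

def toy_brir(rt60, delay_samples, n=8000):
    """Exponentially decaying noise tail after an onset delay (toy BRIR)."""
    h = rng.standard_normal(n) * np.exp(-6.91 * np.arange(n) / (rt60 * fs))
    return np.concatenate([np.zeros(delay_samples), h])

def render(dry, delay_l, delay_r, rt60):
    left = fftconvolve(dry, toy_brir(rt60, delay_l))[:len(dry)]
    right = fftconvolve(dry, toy_brir(rt60, delay_r))[:len(dry)]
    return np.stack([left, right])

# Talker A slightly to the left (earlier at the left ear), B to the right;
# rt60 sets mild (0.3 s) vs strong (1.0 s) reverberation.
mix = render(dry_a, 0, 8, rt60=0.3) + render(dry_b, 8, 0, rt60=0.3)
print("binaural mixture shape:", mix.shape)   # (2 ears, samples)
```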
Nicholas Audette and David Schneider
Fri, 10/4 10:15AM - 12:00PM | A53
Abstract
Many of the sensations experienced by an organism are caused by their own actions, and accurately anticipating both the sensory features and timing of self-generated stimuli is crucial to a variety of behaviors. In the auditory cortex, neural responses to self-generated sounds exhibit frequency-specific suppression, suggesting that movement-based predictions may be implemented early in sensory processing. By training mice to make sound-generating forelimb movements, we recorded detailed neural responses while mice produced and experienced sounds that met or violated their expectations. We identified suppression of responses to self-generated sounds that was specific across multiple acoustic dimensions and to a precise position within the trained movement. Prediction-based suppression was concentrated in L2/3 and L5, where deviations from expectation also recruited a population of prediction-error neurons. Prediction error responses were short latency, stimulus-specific, and dependent on a learned sensory-motor expectation. Recording when expected sounds were omitted revealed expectation signals that were present across the cortical depth and peaked at the time of expected auditory feedback. Building on these findings, we are pursuing the substrate of prediction-based suppression by recording neural activity from identified cell types and auditory brain regions.
Jie Wan, Jonathan Venezia and Gregory Hickok
Fri, 10/4 10:15AM - 12:00PM | A54
Abstract
Engaging in rhythmic movements, such as grooving or dancing, in response to music is a ubiquitous human behavior. Particularly noteworthy are behaviors involving motor synchronization to auditory rhythms, which are distinctively observed in humans and parrots. From the speech perspective, rhythm synchronization is hypothesized to be a prerequisite for speech coordination between the dorsolateral system involving pitch-related functions and the ventrolateral system involving phonetic articulation functions. This study aims to explore the cognitive rhythmic mechanisms underpinning these phenomena, particularly focusing on how different motor effectors may exhibit preferences for specific rhythm rates. We chose to compare the supralaryngeal vocal tract and the finger as representative motor effectors for speech/singing and body movement, respectively. Given that speech typically occurs at a syllabic rate of approximately 4 Hz, while spontaneous finger-tapping or walking tends to occur around 2 Hz, we investigated motor synchronization to auditory rhythms ranging from 2 Hz to 4 Hz. Participants were instructed to either tap their fingers or vocalize along with auditory rhythms presented at varying rates. Preliminary results indicate that both motor effectors exhibit a preference for higher frequencies over lower ones, with accuracies improving from 2 Hz to 4 Hz. Interestingly, we observed individual differences in preferred motor effector, which may be due to participants' musical training backgrounds. These findings shed light on the intricate interplay between motor coordination and rhythmic processing, offering insights into how humans synchronize their movements with auditory rhythms. Further analysis of individual differences and the influence of musical training may provide a deeper understanding of the cognitive mechanisms governing rhythmic behavior.
Sahil Luthra, Raha Razin, Chisom Obasih, Adam Tierney, Fred Dick and Lori Holt
Fri, 10/4 10:15AM - 12:00PM | A55
Abstract
Humans and other animals develop remarkable behavioral specializations for identifying, differentiating, and acting on classes of ecologically important signals. This emergent expertise permits facial and vocal identification, object recognition, and at least in humans, speech comprehension. Such behavioral specializations are often associated with preferential, regionalized neural responses to a particular class of stimuli. How are these stimulus-class-specific responses related to the processes that emerge to enable these behaviors? One theoretical approach emphasizes the importance of learned attention along perceptual dimensions that provide critical information for differentiating or grouping complex signals into behaviorally relevant categories. The local increase in neural selectivity and response associated with learning is hypothesized to be driven by attentional gain at diagnostic regions along the dimension. Here we take advantage of the spatialized representation of auditory frequency to make a direct test of this theory by directing novel, category-diagnostic auditory information into spectrally-delimited bands. Prior to a single fMRI session, adult human participants (N=49) trained for 5 days to develop expertise in rapidly and accurately categorizing complex, multidimensional sounds they had never previously encountered. Four sound categories were associated with the identity of different 'space aliens'. Key to our approach, each sound possessed acoustic energy in two non-overlapping frequency bands (~80-750 Hz and ~1000-9000 Hz), each populated with three 400-ms ‘chirps’ varying in frequency contour. In the category-diagnostic band, these acoustically variable chirps possessed an underlying regularity that defined category membership (the identity of the space alien). Simultaneous and temporally aligned chirps in the other, non-diagnostic frequency band were acoustically variable and possessed no coherent regularity associated with category identity. Category learning thus depended on discovering the diagnostic frequency band and learning the patterned information within it. Two categories carried diagnostic information in the high-frequency band; two carried information in the low-frequency band. This allowed us to capitalize on bilateral tonotopic organization to establish cortical regions potentially relevant to the spectrally-delimited category-diagnostic information signaling category identity. When listeners become expert in these novel categories, we should observe commensurate increases in neural activation in regions tuned to the frequencies that convey the category-disambiguating information. Indeed, in the context of categorization we observed greater cortical activation within tonotopically mapped regions associated with the category-diagnostic frequency band, compared to the simultaneous non-diagnostic band. Moreover, we expected that category-diagnostic information should become more salient with expertise: Even when listeners are not making category decisions dependent on spectrally-delimited information, the most expert listeners should show greater activation for the frequency band they have learned carries information diagnostic of category identity. We observed this effect: There was increased activation for the frequency band associated with category identity, even in a different categorization task for which amplitude, and not frequency, was diagnostic of whether aliens were “big” or “small” independent of their identity. 
This effect was observed even though scanning took place an average of 9 days post-training. This is consistent with stable changes in cortical activation elicited by category exemplars that generalize across task demands; prioritization of diagnostic dimensions persists even in contexts that do not demand reliance on them. As a test of the 'attentional gain' hypothesis, we compared auditory-category-associated activation with cortical activation driven by explicit sustained attention directed at sinewave tones situated within the high- versus low-frequency bands. Across a number of auditory cortical regions, these activation patterns align with the diagnostic-band prioritization that arose implicitly in categorization. In other words, regions recruited when listeners explicitly attend to a delimited region of the frequency spectrum are correspondingly more active when categorization implicitly hinges on information situated in these frequency bands. We further explore the way that preferential activation for category-relevant frequency bands interacts with the degree of category expertise, and particularly with the rate of learning across training. In all, learning auditory categories results in prioritization of auditory-category-relevant perceptual information across a representational dimension — a prioritization reflected in systematic changes in neural response to complex sounds. This category-selective difference in activation persists even when the dimension is irrelevant to the task at hand, and aligns closely with patterns of activation that emerge with explicit, sustained attention to the dimension across unrelated sounds. Together, this suggests that acquiring new auditory categories can drive the emergence of acquired attentional salience to dimensions of acoustic input.
Ziyi Zhu, Adam Charles and Kishore Kuchibhotla
Fri, 10/4 10:15AM - 12:00PM | A56
Abstract
Humans and other animals can learn and execute many different tasks throughout their lifespan, a process known as continual learning. However, this biological ability challenges many artificial neural networks that suffer from catastrophic forgetting, unless these networks are regularized or expanded. Specifically, unique information about new tasks can be encoded through expansion (adding ‘neurons’ into the network for a new task), while shared information between old and new tasks can be integrated into shared representations. Here, we aimed to test how the biological brain naturally solves this problem. We trained mice on a class of tasks involving the learning of multiple, related, sensorimotor associations, specifically multiple distinct auditory two-choice tasks using a moveable wheel. We exploited two training curricula where mice learned these tasks either sequentially or simultaneously. In both configurations, mice expertly performed both tasks in a block-based manner at the final stage of training. We tracked neural activity of L2/3 pyramidal cells in the auditory cortex (AC) and the posterior parietal cortex (PPC) using multi-area two-photon mesoscopic calcium imaging. This recording method allowed us to longitudinally track expansion and integration of neural representation at single-cell resolution throughout different stages of multi-task learning. Surprisingly, a sub-area in PPC showed both reliable auditory responses even in naïve animals and dynamic response patterns during learning, indicating its importance in learning of multiple auditory tasks. Together, our behavioral and neural approach promises to help us better understand the precise computations used by biological neural networks for continual learning and how this depends on the learning curriculum.
Ryohei Tomioka, Naoki Shigematsu, Toshio Miyashita, Yumiko Yoshimura, Kenta Kobayashi, Yuchio Yanagawa, Nobuaki Tamamaki, Takaichi Fukuda and Wen-Jie Song
Fri, 10/4 10:15AM - 12:00PM | A57
Abstract
The cortico-basal ganglia loop has traditionally been conceptualized as consisting of three distinct information networks: motor, limbic, and associative. However, this three-loop concept is insufficient to comprehensively explain the diverse functions of the cortico-basal ganglia system, as emerging evidence suggests its involvement in sensory processing, including the auditory systems. In the present study, we demonstrate the auditory cortico-basal ganglia loop by using transgenic mice and viral-assisted labelings. The caudal part of the external globus pallidus (GPe) emerged as a major output nucleus of the auditory cortico-basal ganglia loop with the cortico-striato-pallidal projections as its input pathway and pallido-cortical and pallido-thalamo-cortical projections as its output pathway. GABAergic neurons in the caudal GPe dominantly innervated the non-lemniscal auditory pathway. They also projected to various regions, including the substantia nigra pars lateralis, cuneiform nucleus, and periaqueductal gray. Considering the functions associated with these GPe-projecting regions, auditory cortico-basal ganglia circuits may play a pivotal role in eliciting defensive behaviors against acoustic stimuli.
Megan Warren, Larry Young and Robert Liu
Fri, 10/4 10:15AM - 12:00PM | A58
Abstract
Recognizing individuals based on their vocalizations would presumably be adaptive for social species, yet evidence for this ability is thin in common mammalian models for studying social neuroscience. For example, within many rodent species, the fleeting nature of social relationships may be one reason it has been difficult to demonstrate vocal recognition – if it exists for them. One rodent model in which vocal recognition may be highly beneficial is the prairie vole (Microtus ochrogaster), which forms long-lasting pair bonds between mates. Thus, we aimed to determine whether prairie voles can use vocalizations as a means of individual recognition. We recorded vocalizations of prairie voles in three social contexts: free interaction between a male (n=7) and a stranger female (n=1), free interaction between a male and his to-be partner (n=7), and free interaction between a male and his established partner. Using sinusoidally frequency modulated (sinFM) features of the vocalizations emitted during these encounters, we found that we could computationally identify both the individual identity (mean accuracy = 0.74±0.11, t = 128.7, p < 10e-3) and the experience level (3.89 < z < 421.25, all p < 0.009) of the males based solely upon their vocal emissions. Next, to determine whether female voles were able to distinguish the calls of their mates from the calls of unfamiliar males, we designed a novel playback paradigm in which 10-minute audio streams were built out of 30-second intervals of either vocalizations or background noise. A mated female was placed into an arena with two competing speakers, one playing back her mate’s ultrasonic vocalizations (USVs), and another playing back the USVs of a stranger male. We discovered that females spent more time actively investigating the speaker playing back their partner's USVs than the speaker emitting stranger USVs (t = -6.44, p < 10e-3). Thus, prairie voles emit vocalizations with distinguishable features, and females are able to distinguish the USVs of their mates from those of an unfamiliar male. Additionally, female prairie voles choose to inspect their male partners' USVs more than the USVs of a stranger male. Together, these results lay the foundation for the prairie vole as a tractable rodent model to study the neural basis of individual vocal recognition.
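The identity-classification analysis described above is, in outline, a cross-validated classifier applied to per-call acoustic features. The sketch below uses simulated features and a random forest purely for illustration; the feature set and classifier are assumptions, not the authors' pipeline.

```python
# Minimal sketch: testing whether per-call acoustic features (e.g. sinFM
# depth/rate, duration, mean frequency) support above-chance classification
# of caller identity.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n_males, calls_per_male, n_features = 7, 120, 6

# Simulate caller-specific feature distributions.
male_means = rng.standard_normal((n_males, n_features))
X = np.vstack([m + 0.8 * rng.standard_normal((calls_per_male, n_features))
               for m in male_means])
y = np.repeat(np.arange(n_males), calls_per_male)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc = cross_val_score(clf, X, y, cv=5)
print(f"caller-ID decoding accuracy: {acc.mean():.2f} "
      f"(chance = {1 / n_males:.2f})")
```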
Benedikt Zoefel, Omid Abbasi, Joachim Gross and Sonja Kotz
Fri, 10/4 10:15AM - 12:00PM | A59
Abstract
Evidence accumulates that the cerebellum's role in the brain is not restricted to motor functions. Rather, cerebellar activity seems to be crucial for a variety of tasks that rely on precise event timing and prediction. Due to its complex structure and importance in communication, human speech requires a particularly precise and predictive coordination of neural processes to be successfully comprehended. Recent studies proposed that the cerebellum is indeed a major contributor to speech processing, but how this contribution is achieved mechanistically remains poorly understood. We here aimed to reveal a mechanism underlying cortico-cerebellar coordination and demonstrate its speech-specificity. In a re-analysis of magnetoencephalography (MEG) data, we found that activity in the cerebellum aligned to rhythmic sequences of noise-vocoded speech, irrespective of its intelligibility. We then tested whether these "entrained" responses persist, and how they interact with other brain regions, when a rhythmic stimulus stopped and temporal predictions had to be updated. We found that only intelligible speech produced sustained rhythmic responses in the cerebellum. During this "entrainment echo", but not during rhythmic speech itself, cerebellar activity was coupled with that in the left inferior frontal gyrus (IFG), and specifically at rates corresponding to the preceding stimulus rhythm. This finding represents unprecedented evidence for specific cerebellum-driven temporal predictions in speech processing and their relay to cortical regions.
Jasmine L Hect, Kyle Rupp, Mike Sobol and Taylor Abel
Fri, 10/4 10:15AM - 12:00PM | A60
Abstract
Prior work suggests frontal regions facilitate encoding of higher-level sound categories such as speaker identity or affect. fMRI studies report a highly correlated response of parabelt auditory cortex with the inferior frontal gyrus (IFG) during voice perception, even in the absence of speech. What features are encoded by this activity in IFG remain unknown. Here we use intracerebral electroencephalography (iEEG) to investigate the spatiotemporal dynamics of the neural response to natural sounds in frontal and temporal regions. We collected iEEG data in 27 patient-participants during completion of an auditory 1-back repetition task of 72 voice (8 speech stimuli) and 72 non-voice natural sounds (Belin voice localizer adapted for iEEG, 17% repeats). Anatomic electrode localization was performed using Freesurfer cortical surface reconstruction and automatic labeling of gyri and sulci individually for each patient. iEEG channels were assigned to a gyral or sulcal region if it was the label of the nearest pial surface vertex (baseline, t-stat p
Shinichi Kumagai, Tomoyo Isoguchi Shiramatsu, Karin Oshima, Kensuke Kawai and Hirokazu Takahashi
Fri, 10/4 10:15AM - 12:00PM | A61
Abstract
Vagus nerve stimulation (VNS) modulates neural activities in the auditory cortex. Previous studies suggest that VNS can enhance feedforward (FF) auditory processing from lower to higher-order areas. However, it remains unclear how VNS modulates neural entrainment to rhythmic sounds. In this study, we hypothesized that VNS enhances the auditory steady-state response (ASSR), a form of entrainment, in the cerebral cortex receiving FF thalamocortical input. We investigated the effect of VNS using electrocorticography that covered the auditory and insular cortex of 11 Wistar rats under isoflurane anesthesia. Electrophysiological recordings were performed after implantation of the VNS system. The auditory cortex and insular auditory field were characterized by auditory-evoked potentials while presenting click stimuli. We presented rapid and periodic click trains in a session containing 300 trials before and more than 3 hours after VNS. Click trains were 500ms in duration at rates of 20- and 40-Hz (20 and 40 clicks/s). The intertrain interval was 500ms. ASSR was estimated using inter-trial phase clustering (ITPC) at the rate of presented click trains. We assessed, using analysis of variance, how VNS modulates ITPC. ITPC increases were observed at the corresponding frequencies of the click trains in both the auditory cortex and insular auditory field: at 20 Hz during 20-Hz click trains and at 40 Hz during 40-Hz click trains. In the auditory cortex, VNS significantly enhanced ITPC around 40 Hz and 20 Hz in response to the 40-Hz and 20-Hz click trains, respectively. Furthermore, in the insular auditory field, VNS increased ITPC around 40 Hz in response to the 40-Hz click trains. On the other hand, VNS did not change ITPC in spontaneous activities in the intertrain intervals. These results suggest that VNS can enhance ASSR in the auditory and insular cortex, consistent with its effect on FF processing in previous studies. VNS may strengthen the entrainment of cortical oscillations induced by repetitive auditory stimulation.
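Inter-trial phase clustering as used above has a compact definition: the magnitude of the trial-averaged unit phasor at the stimulation frequency. The sketch below computes it on synthetic 40-Hz steady-state data (a toy example, not the recording pipeline used in the study).

```python
# Minimal sketch: inter-trial phase clustering (ITPC) at the click-train rate.
# The phase per trial is taken from the FFT bin nearest the stimulation
# frequency; ITPC = |mean over trials of exp(i*phase)| (0 = random phase,
# 1 = perfectly consistent phase across trials).
import numpy as np

rng = np.random.default_rng(5)
fs, dur, n_trials, f_stim = 1000, 0.5, 300, 40      # Hz, s, trials, Hz
t = np.arange(int(fs * dur)) / fs

# Synthetic single-channel trials: a 40-Hz steady-state response with a
# trial-consistent phase, buried in noise.
trials = (0.5 * np.sin(2 * np.pi * f_stim * t + 0.3)
          + 2.0 * rng.standard_normal((n_trials, t.size)))

def itpc(trials, fs, freq):
    spec = np.fft.rfft(trials, axis=1)
    freqs = np.fft.rfftfreq(trials.shape[1], 1 / fs)
    k = np.argmin(np.abs(freqs - freq))
    phases = np.angle(spec[:, k])
    return np.abs(np.mean(np.exp(1j * phases)))

print("ITPC at 40 Hz:", round(itpc(trials, fs, f_stim), 3))
print("ITPC at 23 Hz (control):", round(itpc(trials, fs, 23), 3))
```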
David O Sorensen, Jenna Sugai, Kenneth E Hancock and Daniel B Polley
Fri, 10/4 10:15AM - 12:00PM | A62
Abstract
Individuals with chronic tinnitus either hear an indefatigable and irrepressible phantom sound every hour of their waking day or a phantom sound that benignly fades into the background when unattended. The difference between benign and burdensome tinnitus may be explained by a combination of sensory insults (e.g. excess cortical gain), limbic arousal, or executive ability to suppress the sound. Previous results from our lab show that the degree of central gain does not correlate with tinnitus burden, but measures of affective processing are associated with increased burden. To test the hypothesis that individuals with burdensome tinnitus exhibit broader deficits in inhibitory control over external auditory sounds, we developed a paradigm to probe the neural and behavioral effects of auditory distraction. Subjects were presented with a target stimulus organized along four nested timescales, including temporal fine structure (~500 Hz), envelope (~25-80 Hz), envelope changes (~5 Hz), and embedded context (~0.5 Hz). EEG was recorded to capture following responses as participants reported perceptual judgments about the embedded context. The target was paired with two sets of competitor stimuli—melodies and matched noise—which share low-level features but differ in their level of distraction. Results in participants with normal hearing (n=15) showed that synchronization to rapid features was insensitive to the difference in distraction, whereas synchronization to the slower envelope changes was reduced when the target was accompanied by the more distracting melodies. In participants with tinnitus (n=11) and matched low-frequency hearing, behavioral analysis confirmed no gross difference in distraction performance. Our ongoing work will focus on how each distractor degrades neural synchronization to envelope changes in individuals with high tinnitus burden against individuals with low tinnitus burden. Our results build on work that shows that individuals with tinnitus perform as well as individuals with normal hearing on listening tasks in noisy environments and expand the work into the neural representation of sounds in distracting environments. We also demonstrate that our novel paradigm for measuring auditory distraction can be applied in clinical populations.
Josefine Hebisch, Anna-Christin Ghassemieh, Emanuela Zhecheva, Myrthe Brouwer, Simon van Gaal, Lars Schwabe, Tobias H. Donner and Jan Willem de Gee
Fri, 10/4 10:15AM - 12:00PM | A63
Abstract
The arousal systems of the brainstem, specifically the locus coeruleus-noradrenaline system, respond “phasically” during decisions. These central arousal transients are accompanied by dilations of the pupil. Attempts to understand the impact of phasic arousal on cognition would benefit from the ability to manipulate arousal in a temporally precise manner. Here, we evaluated a candidate approach for such a manipulation: task-irrelevant sounds presented during the execution of a challenging task. Such sounds drive responses of brainstem nuclei involved in the control of pupil size, but it is unknown whether the sound-evoked responses mimic the central arousal transients recruited by cognitive computations. We aimed to test to which level of temporal precision and reliability task-irrelevant sounds can be used to systematically manipulate pupil responses, and how these responses relate to phasic arousal boosts that occur naturally during decisions. We tested a total of 97 participants in three challenging perceptual decision-making tasks. In each experiment, we compared pupil responses evoked by the task with the pupil responses evoked by task-irrelevant white noise sounds of varying onset latencies or durations. Participants were asked to judge whether a low-contrast visual grating stimulus was superimposed onto dynamic visual noise (Exp. 1 and 2) or whether the average orientation of a stream of eight gratings belonged to a “diagonal” or “cardinal” category (Exp. 3). The pupil dilated in response to both task engagement and the task-irrelevant sounds. The latter consistently drove robust and precisely timed pupil responses that superimposed onto the task-evoked pupil responses and were therefore separable through a linear subtraction. We replicated a negative correlation between the amplitude of task-evoked pupil responses and choice bias observed in previous studies. Yet, in neither experiment did the task-irrelevant sounds affect bias, nor any other aspect of choice behavior. Our findings suggest that task engagement and task-irrelevant sounds may differentially recruit neural systems that are all involved in the control of pupil size but have distinct influences on cognitive computation.
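One simple way to realize the linear subtraction described above is to estimate the task-evoked time course from trials without a sound and subtract it from sound trials; the sketch below does this on synthetic pupil traces (an illustration of the general logic, not necessarily the authors' exact procedure).

```python
# Minimal sketch: isolating the sound-evoked pupil response by subtracting
# the average task-evoked response (from no-sound trials) from sound trials,
# valid if the two components superimpose linearly.
import numpy as np

rng = np.random.default_rng(6)
fs, dur = 50, 6                              # Hz, seconds per trial
t = np.arange(fs * dur) / fs

def bump(center, width, amp):
    return amp * np.exp(-0.5 * ((t - center) / width) ** 2)

task_evoked = bump(1.5, 0.8, 1.0)            # dilation locked to task events
sound_evoked = bump(3.2, 0.5, 0.4)           # dilation locked to the sound

def simulate(n, with_sound):
    base = task_evoked + (sound_evoked if with_sound else 0)
    return base + 0.2 * rng.standard_normal((n, t.size))

sound_trials = simulate(100, True)
no_sound_trials = simulate(100, False)

# Linear subtraction: remove the mean task-evoked time course.
isolated = sound_trials.mean(0) - no_sound_trials.mean(0)
print("recovered sound-response peak (s):", t[np.argmax(isolated)])
```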
Amy LeMessurier, Ayat Agha, Gurket Kaur, Janaye Stephens and Robert Froemke
Fri, 10/4 10:15AM - 12:00PM | A64
Abstract
How does the brain derive meaning from social vocalizations and drive behavior in response? Vocalizations are complex stimuli that must require integration across multiple time-scales and appropriate contextualization. Thus it is likely that top-down modulation of feed-forward auditory processing is needed for these computations. To test this hypothesis, we took advantage of a robust mouse behavior, in which parental animals search for lost pups in response to infant ultrasonic vocalizations (USVs). Experienced moms are experts at this behavior, but nulliparous females can also learn to respond to infant distress calls after cohousing with lactating moms and litters, and emergence of pup retrieval behavior is correlated with oxytocin-dependent plasticity in the left auditory cortex (Marlin et al. 2015, Schiavo et al. 2020). We first asked whether projections from auditory cortex to subcortical areas are important for pup retrieval behavior in experienced maternal mice by chemogenetically silencing activity in left auditory cortex layer 5 during retrieval. Pup retrieval was reduced after CNO treatment vs vehicle control sessions (N=6 mice, p
Cedric Bowe, Jessica Mai, Valentina Esho, Rowan Gargiullo, Eliana Pollay, Lucas Williamson, Karinne Cobb and Chris Rodgers
Fri, 10/4 10:15AM - 12:00PM | A65
Abstract
Research in Alzheimer's disease (AD) has primarily focused on the presence of amyloid beta and tau neurofibrillary tangles in affected neurons, with little focus on how the pathology alters neural network activity. Currently there is evidence of impaired communication within brain networks, described as changes in “functional connectivity”, resulting in improper coordination and integration of information between brain regions in networks such as the default mode network. However, the studies that describe these findings mostly use fMRI, which lacks the temporal and spatial resolution to accurately measure the changes in neural activity that relate behavior to functional connectivity. Using an animal model with similar findings will allow for the use of more invasive and accurate methods of recording neural activity. To investigate how these changes in neural activity affect behavior, we train a mouse model of amyloidosis (5xFAD) to perform an auditory spatial processing task. This task is designed to test their ability to integrate auditory, navigational, and motor information. During this task, we use a video tracking algorithm (SLEAP) to record their movements and use these data to compare differences in motor behavior between the 5xFAD models and controls. Preliminary data show that 5xFAD mice perform worse on the task than their control counterparts, although we have also observed strain differences between B6SJL and a hybrid of C57BL/6J with CBA/CaJ. Average speed and distance traveled do not significantly differ between 5xFAD and control mice, meaning that this performance difference cannot be explained by gross changes in locomotor ability. We are currently characterizing the behavioral patterns in both groups of mice using an unsupervised machine learning tool known as Keypoint MoSeq. In the future we plan to implant tetrodes into the auditory cortex, motor cortex and hippocampus as the mice perform their task and compare the neural activity and functional connectivity of 5xFAD mice and controls. Enhancing our understanding of how AD pathology translates to the aberrant neural activity that leads to AD symptoms will allow us to explore new therapeutic targets for neural modulation methods.
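As a sketch of the locomotor summary described above (total distance and average speed from tracked keypoints), the following uses an assumed frame rate, pixel scale, and array layout; it is an illustration, not the authors' SLEAP pipeline.

```python
import numpy as np

def locomotion_summary(xy, fps=30.0, px_per_cm=10.0):
    """Total distance (cm) and mean speed (cm/s) from one tracked body point.

    xy : (n_frames, 2) array of x, y pixel coordinates (e.g., a SLEAP keypoint)
    """
    steps = np.linalg.norm(np.diff(xy, axis=0), axis=1) / px_per_cm  # cm per frame
    total_distance = steps.sum()
    mean_speed = steps.mean() * fps
    return total_distance, mean_speed

# Hypothetical trajectory: one minute of tracking at 30 fps
rng = np.random.default_rng(1)
traj = np.cumsum(rng.normal(scale=2.0, size=(1800, 2)), axis=0)
print(locomotion_summary(traj))
```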
Nathiya Vaithiyalingam Chandra Sekaran, Mathew R Lowerison, Pengfei Song, Susan L. Schantz and Daniel A Llano
Fri, 10/4 10:15AM - 12:00PM | A66
Abstract
Exposure to environmental toxins such as polychlorinated biphenyls (PCBs) is widespread via multiple routes such as air, soil and water because of their stability and resistance to degradation. Previous work on developmental PCB exposure demonstrated that PCBs impact both the central and peripheral auditory systems independently, and that PCB exposure can combine with later noise exposure to produce supra-additive effects on the auditory system. Previous work has also shown that developmental PCB exposure impacts endothelial cell function. Thus, the current goal is to study how combined sequential exposure to PCBs and environmental noise stress impacts the microvasculature of the auditory system. The current study used super-resolution ultrasound localization microscopy (ULM) as a novel tool to measure microvascular dynamics in the auditory system. Female CBA/CaJ mice were dosed orally with 6 mg/kg/day of the PCB mixture dissolved in corn oil vehicle 4 weeks prior to breeding, and dosing was continued through gestation and until postnatal day (PND) 21. On PND 21, pups were weaned, and two males from each litter were randomly selected for the study. As adults at the age of P90, the male mice were exposed to high-intensity noise for 45 mins at 110 dB. We assessed hearing using auditory brainstem responses (ABRs) after the developmental PCB exposure, both before and after noise exposure. Hearing thresholds were tested again on day 7 post noise exposure to determine whether PCB exposure blocks hearing recovery after noise exposure. We established our capacity to image the microvasculature of the inferior colliculus (IC) of this model, which will allow us to assess PCB and noise effects on this structure. The first IC imaging using ULM was performed through a cranial window, which allows us to image longitudinally. We applied ULM imaging to a mouse model exposed to PCB and quantified differences in cerebral vascularity, blood velocity, and vessel tortuosity across the midbrain and other brain regions. Vascular structures identified with ULM were validated with histology of the same brain with FITC. The blood supply to the medial IC flows from a paramedian branch of the basilar artery. The lateral IC receives blood supply from additional pathways. This work will provide new insights into the microvascular effects of an environmental toxin and subsequent stressor exposure on the central auditory system.
Dana Boebinger, Guoyang Liao, Christopher Garcia, Kirill Nourski, Matthew Howard, Thomas Wychowski, Webster Pilcher and Sam Norman-Haignere
Fri, 10/4 10:15AM - 12:00PM | A67
Abstract
Information coding in speech is computationally challenging because sounds with shared information (e.g., word) vary enormously in their acoustics. Successful communication requires listeners to encode information in a manner that is “invariant” to such acoustic variation, but currently little is known about the neural computations that underlie this ability in the human auditory cortex, in part due to the coarse resolution of standard neuroimaging methods. Prior research has shown that auditory cortex can adaptively suppress acoustic variation that is stationary over time (e.g., a stationary background sound), which might support invariant representations of non-stationary sounds such as speech. In addition, human non-primary auditory cortex contains neural populations specifically tuned to speech structure, which might support acoustically tolerant speech representations even in the absence of generic adaptation mechanisms. We used spatiotemporally precise human intracranial recordings to measure the tolerance of cortical speech representations to many different types of acoustic variation (e.g., reverb, background noise, spectral filtering), leveraging a novel experimental paradigm that allowed us to measure the time needed for invariant speech representations to emerge, relative to the onset of speech information. Our results show that representations of speech become nearly completely invariant to a wide range of different types of acoustic variation in speech-selective neural populations of non-primary human auditory cortex. These acoustically invariant speech representations emerge rapidly, within ~250 ms of the onset of speech information, but on average require longer computation times than non-invariant representations. We show that these effects cannot be explained by standard, linear spectrotemporal filtering models or adaptive suppression of stationary acoustic information, suggesting that invariant information coding is accomplished by rapid, nonlinear computations that are tuned to speech-specific structure.
Aramis Tanelus, Ralph Peterson, Aman Choudhri, Chris Ick, Violet Ivan, Niegil Francis, David Schneider, Dan Sanes and Alex Williams
Fri, 10/4 10:15AM - 12:00PM | A68
Abstract
Social animals congregate in groups and communicate with vocalizations. To study the dynamics of natural vocal communication and their neural basis, one must reliably determine the sender and receiver of the vocal signal. Existing approaches to address this problem rely on estimating source positions using time delays between microphones in an array (e.g. beamforming), or by surgically affixing miniature microphones to the animal. Although effective in some contexts, these approaches are not robust to reverberant environments (beamforming) or not scalable to large social groups (mini microphones). Thus, there is considerable interest in developing non-invasive sound source localization and vocal call attribution methods that work off-the-shelf in typical laboratory settings. To this end, we developed (1) a supervised deep learning framework with calibrated uncertainty estimates that achieves state-of-the-art sound source localization performance in reverberant environments, (2) novel hardware solutions to generate benchmark datasets for training/evaluating sound source localization models across labs, and (3) curated and released the first large-scale benchmark datasets for vocal call localization in social rodents. In addition, we detail a procedure to generate synthetic training data with acoustic simulations for pre-training sound source localization models.
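For orientation, the classical array-based approach contrasted above rests on inter-microphone time delays. The sketch below estimates a time difference of arrival by cross-correlation between two microphone channels; the sampling rate and signals are fabricated, and this is a baseline illustration rather than the authors' deep learning framework.

```python
import numpy as np

def tdoa_crosscorr(sig_a, sig_b, fs):
    """Lag (s) at which sig_a best aligns with sig_b; negative when sig_b lags sig_a."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lags = np.arange(-(len(sig_b) - 1), len(sig_a))
    return lags[np.argmax(corr)] / fs

# Hypothetical two-microphone example: the same click reaches mic B 25 samples later
fs = 125_000
click = np.r_[np.hanning(64), np.zeros(1000)]
mic_a = click
mic_b = np.roll(click, 25)
print(tdoa_crosscorr(mic_a, mic_b, fs) * fs)  # -25.0 (mic B lags mic A)
```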
D. Walker Gauthier, Noelle James and Benjamin D. Auerbach
Fri, 10/4 10:15AM - 12:00PM | A48
Abstract
Atypical auditory processing is one of the most common and quality-of-life-affecting symptoms seen in autism spectrum disorders (ASD). ASD individuals often exhibit altered sound sensitivity and feature discrimination, contributing to sensory overload and disrupted language comprehension. Fragile X Syndrome (FXS) is the leading inherited cause of ASD, and a majority of FXS individuals present with these auditory processing alterations. We have shown previously that a Fmr1 KO rat model of FXS exhibits altered sound sensitivity that coincides with abnormal perceptual integration of sound duration and frequency. Here we further characterized auditory processing deficits in Fmr1 KO rats using an operant conditioning tone discrimination assay and in vivo electrophysiology recordings from the auditory cortex (ACx) and inferior colliculus (IC). We found that Fmr1 KO rats exhibit poorer frequency resolution, which corresponded with sound-evoked hyperactivity and broader frequency tuning in auditory cortical but not collicular neurons. These findings suggest that cortical hyperexcitability may account for a range of auditory behavioral phenotypes in FXS, providing a potential locus for development of novel biomarker and treatment strategies that could extend to other forms of ASD.
Zahra Ghasemahmad, Elliot Chin, Maryse Thomas, Carolyn Sweeney, Jeffery Wenstrup and Anne Takesian
Fri, 10/4 4:15PM - 6:00PM | B01
Abstract
Mice emit a repertoire of complex vocalizations during different behavioral contexts, including courtship and aggressive interactions. Playback of these vocalizations can elicit distinct behavioral responses and neuromodulatory release patterns in the brains of listening mice that depend upon behavioral state and experience. Auditory cortex is thought to provide information about the identity of the vocalization to the motor and emotion centers of the brain involved in shaping these behavioral reactions. However, the representation of these salient vocalizations within neuronal subpopulations across auditory cortical regions is not well understood. To address this, we focused on pyramidal excitatory neurons and a subpopulation of inhibitory interneurons expressing Neuron-Derived Neurotrophic Factor (NDNF) which are believed to shape cortical responses in a state-dependent manner. Using transgenic mouse lines that express genetically-encoded calcium indicators (GCaMP) in either cortical pyramidal neurons or NDNF interneurons, we performed widefield and two-photon calcium imaging from the auditory cortical regions in awake head-fixed mice. Playback of distinct vocalizations induced robust activity across the primary auditory cortex (A1), anterior auditory field (AAF), and multiple higher-order auditory cortical fields. Furthermore, subsets of L2/3 and L4 pyramidal neurons and NDNF interneurons show strong selectivity to specific vocalizations or vocalization classes (mating versus stress calls). Many of these neurons show robust responses to these vocalizations, but not to sound stimuli with the same spectral content. A small number of vocalization-selective cortical neurons can accurately decode the vocalization identity. Our ongoing experiments are examining the behavioral state- and experience-dependent responses to these vocal stimuli within specific auditory cortical regions. Together, these studies will provide insight into the processing of salient vocalizations within specific circuits across auditory cortical fields.
Przemyslaw Jarzebowski and Daniel Bendor
Fri, 10/4 4:15PM - 6:00PM | B02
Abstract
The ability to predict future sensory inputs is a crucial part of the brain’s function. While some theories suggest that predictions and their effects are limited to higher-order brain areas, others propose that predictive processes are fundamental across the brain. Distinguishing these two and other possibilities would help us understand how perceptions emerge and change with expectations. In the sensory cortex, neuronal activity is influenced by the predictability of sensory stimuli. However, it is often challenging to separate predictive signals from sensory-evoked responses occurring at the same time. To address this, we examine neuronal responses to the omission of expected sounds. Using single-unit recordings in awake mice, we investigate how auditory cortex neurons respond when an anticipated sound is omitted from a predictable sequence. Our findings show that a subset of neurons responds to an unexpected omission of sounds. These omission-responsive neurons fall into two distinct categories: (1) neurons whose activity mimics that during the played sound and (2) neurons that exhibit error-like responses, firing only when an expected sound is omitted, not when it is presented. Multiple lines of evidence support the idea that both types of responses are specific to predictions: omission responses are consistent across different sound sequences and are not explained by delayed sound offset responses or the animal's movement during the unexpected omission. Additionally, the neurons adjust their response to changing expectations, ceasing to respond when sound omission becomes expected. Our results demonstrate that predictive signals can elicit precisely timed responses in the auditory cortex. The responses we observe diverge from the ones proposed by existing theoretical frameworks, suggesting the potential for novel insights into mechanisms of predictive processing.
Nishan Shettigar, Jessica Teran and Emily Dennis
Fri, 10/4 4:15PM - 6:00PM | B03
Abstract
Drawing inspiration from nature, we ask mice to hunt for crickets, guided by sensory cues, in a large arena. This hexagonal arena (2 meters wide) comprises modular hexagon tiles and felt walls, creating a repeated Y-maze structure. Most tiles contain nothing except the floor. For 16 tiles, there is a speaker and a trap door on the floor, under which is a cricket. As the animal navigates through the arena, whenever the animal stops moving (pauses), a speaker in one of the release tiles provides an auditory stimulus (chirp noises). If the animal enters the tile that is producing chirps, a trap door opens and a cricket becomes available. After the animal eats, the cricket chirps come from another tile until the animal enters that tile and receives another cricket. This process continues for the duration of the experimental session. Most mice become very successful at finding crickets in this complex task, earning 4-8 crickets per hour. Here we report our findings on how the animals explore and hunt for crickets using ambiguous sensory cues. To gain insights into the strategies used by the animals, we perform global and local targeted sensory perturbations in this task and demonstrate that most animals predominantly use sound cues to find crickets under these conditions. Further, using a comparative neuroethological approach, behavioral modeling, and neuronal recordings, we aim to identify the strategies animals use to successfully hunt in this complex environment using auditory cues.
Samira Souffi and Israel Nelken
Fri, 10/4 4:15PM - 6:00PM | B04
Abstract
Dopaminergic release from neurons of the ventral tegmental area (VTA) has been widely studied for its role in reward, prediction error encoding, behavioral reinforcement, motivational salience and learning processes. Dopamine release has been implicated in determining sound preferences as well, but this role is less well characterized. We exposed male and female mice in their homecage to human music (1st movement of Beethoven’s 9th symphony) or silence from P7 to P40, covering both early and late auditory critical periods. At early adulthood, we performed a free-choice behavioral test in which mice could choose to dwell in a music zone or in a silence zone of the test box. Following the test, we performed fiber photometry in the VTA of these mice, while they moved freely and heard simple (broadband noise and pure tones) and complex (exposed and unexposed music excerpts) sounds. On average, we found that exposed mice (music- or silence-exposed) spent a longer time in the music zone than in the silence zone compared to naive mice, with noticeable differences between males and females: in males, the music-exposed mice spent more time in music than in silence compared to the silence-exposed mice, whereas in females, the music-exposed mice spent less time in music than in silence compared to the silence-exposed mice. All sounds were associated with both increasing and decreasing calcium transients. The music-exposed mice showed a large decrease in VTA activity to all sounds compared to naive mice; silence-exposed mice showed a smaller decrease in activity (males) or no significant change (females). Further experiments are currently being conducted on naive TH-cre mice using a Cre-dependent GCaMP virus (AAV9-Syn-DIO-jGCaMP8m), allowing chronic recordings of the dopaminergic neurons only.
Ole Bialas, Aaron Nidiffer, Lori Holt, Fred Dick, Sahil Luthra, Erin Smith and Edmund Lalor
Fri, 10/4 4:15PM - 6:00PM | B05
Abstract
Selectively listening to acoustic signals is crucial in acoustically complex environments. Traditionally, attention is understood as an interplay between goal-directed top-down modulation and bottom-up stimulus salience. However, attention can also be guided by the observer's prior expectations. A potentially important driver of such expectations is the presence of statistical regularities in the acoustic environment. Temporal statistical regularities encountered as syllabic transitional probabilities have long been shown to affect perception and learning. Notably, acoustic regularities can affect performance and learning even if they occur along a dimension that is irrelevant for the task at hand. The venerable 'probe signal' psychoacoustic paradigm has demonstrated that listeners asked to detect pure tones in noise show markedly different perceptual thresholds for tone frequencies whose probability of occurrence differs - even though tone frequency is task-irrelevant. The listener's expectation - based on the global probability of each tone frequency - can dramatically affect low-level perception. While this may suggest an exaggerated response to the expected signal, error-based accounts predict the opposite pattern: enhancement of unexpected, rarely occurring stimuli. Here, we use human EEG and a simple go/no-go task to investigate how stimulus probability along a task-irrelevant dimension, pitch, affects detection of tones in noise and their neural correlates. We show that tones occurring with higher probability are detected faster and more accurately, and that the evoked potentials to those tones have larger amplitudes and shorter latencies. Thus, response amplitude and latency may be considered neural correlates of auditory statistical learning.
Christopher Conroy, Yale E. Cohen and Robert M. McPeek
Fri, 10/4 4:15PM - 6:00PM | B06
Abstract
When perceptual decisions about visual stimuli are reported with eye movements, signals related to decision formation are evident in the superior colliculus (SC). Whether or not signals related to decision formation are evident in the SC in the context of saccadic decisions based on nonvisual sensory inputs, however, is unknown. The SC receives sensory inputs from multiple auditory areas, specifically, auditory spatial inputs thought to mediate reflexive and overt orienting behaviors. It is possible, therefore, that the SC plays a role in the evaluation of auditory inputs in the context of saccadic decisions based on auditory spatial cues. To investigate this possibility, we recorded individual SC neurons in a rhesus monkey that was making decisions about where to look based on auditory spatial cues. On each trial, the monkey fixated a visual stimulus located at the center of a visual display, and a sequence of brief, auditory-white-noise bursts (with noisy spatial locations) was presented along the frontal-horizontal plane. The monkey's task was to decide if the acoustic source that generated the sequence was located to the left or right of the frontal midline and to report that decision by making a saccade to one of two visual targets. The location of the correct target was spatially dissociated from the acoustic source (within a hemifield), thus requiring a flexible transformation of the auditory cues into an overt action plan (i.e., the monkey did not make a saccade directly to the perceived location of the acoustic source but rather interpreted that location to select the appropriate saccade plan). The monkey's performance on this task varied systematically with evidence strength: sensitivity (d') increased as the location of the acoustic source moved away from the frontal midline, providing more robust auditory cues for the selection of a saccade plan. A subset of SC neurons exhibited activity during task performance that suggested a potential involvement of those neurons in saccadic-decision formation, meaning modulations of their activity patterns by auditory-evidence strength and/or nonsensory factors related to task performance. This suggests that the SC may play a role in the evaluation of auditory cues when the evaluation of such cues is required for saccade planning. It also suggests a more general role for the SC, and perhaps other oculomotor structures, in the formation of saccadic decisions based on nonvisual sensory inputs.
Keshov Sharma, Mark Diltz, Matthieu Fuchs, Yunshan Cai and Lizabeth Romanski
Fri, 10/4 4:15PM - 6:00PM | B07
Abstract
Facial gestures and vocalizations combine in many brain regions to form meaningful exchanges during social communication. In the macaque, the ventrolateral prefrontal cortex (VLPFC) is a site of multisensory integration that shows high regional activation (via fMRI) and selective single unit responses to social stimuli. Most recently, our group has shown that populations of neurons, rather than single units, in the VLPFC persistently encode the identity of a conspecific while viewing naturalistic, communicative expressions. Given that approximately 70% of VLPFC neurons are multisensory, frequently combining auditory and visual information non-linearly, we sought to understand whether population encoding of identity is a result of multisensory integration or could be supported by a single sensory modality. For the current study we used the same naturalistic audiovisual movie clips of macaques vocalizing as previously employed and compared population responses in VLPFC to multisensory as well as the unimodal auditory and visual components separately. Ensembles of single units were recorded from multi-electrode arrays in the VLPFC of one macaque. We utilized 9 short audiovisual movie clips that consisted of 3 unfamiliar macaques (3 identities) each producing 3 vocalizations accompanied by prototypical facial expressions (3 vocalization/expressions). Each recording session involved the presentation of 3 sets of stimuli: (1) the audiovisual movie clips, (2) the silent movie clips (same movie, with audio track removed) and (3) the audio track/vocalization alone. We recorded the activity of a small population of neurons (39-42 units) during each task and then performed a sliding bin decoding analysis using the population activity to predict the identity of conspecifics presented in the stimuli of each trial (high-dimensional discriminant analysis, LOOCV, typically 107 training trials). We found that decoding accuracy for identity increased quickly after stimulus onset for audiovisual and visual trials, peaking between 60-80% at 300ms post stimulus onset and remaining above 45% at 800ms (chance = 33%, n=10 populations per modality). In contrast, identity decoding accuracy for auditory (vocalization alone) trials ranged between 30-40% and did not vary across the time of stimulus presentation (n = 9). Further analyses are focused on decoding of expression, i.e., the type of vocalization presented in the stimuli. These results suggest that, in nonhuman primates, visual face and contextual features are critical components of identity processing during social communication that may precede or guide accompanying vocal perception.
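A compact way to express the sliding-bin, leave-one-out decoding described above is sketched below. It uses scikit-learn's linear discriminant analysis as a stand-in for the high-dimensional discriminant analysis named in the abstract, and all data shapes and numbers are fabricated.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

def sliding_bin_decoding(spike_counts, labels):
    """Leave-one-out identity decoding in each time bin.

    spike_counts : (n_trials, n_units, n_bins) binned population activity
    labels       : (n_trials,) identity label per trial
    Returns mean LOOCV accuracy per time bin.
    """
    n_trials, n_units, n_bins = spike_counts.shape
    acc = np.zeros(n_bins)
    for b in range(n_bins):
        X = spike_counts[:, :, b]                       # population vector in bin b
        acc[b] = cross_val_score(LinearDiscriminantAnalysis(), X, labels,
                                 cv=LeaveOneOut()).mean()
    return acc

# Hypothetical data: 108 trials, 40 units, 20 bins, 3 identities
rng = np.random.default_rng(2)
X = rng.poisson(3.0, size=(108, 40, 20)).astype(float)
y = np.repeat([0, 1, 2], 36)
print(sliding_bin_decoding(X, y)[:5])
```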
Karl Lerud, Vrishab Commuri, Charlie Fisher, Samira Anderson, Behtash Babadi, Stefanie Kuchinsky and Jonathan Simon
Fri, 10/4 4:15PM - 6:00PM | B08
Abstract
Listening to speech in noise is an everyday occurrence made possible, if conditions allow, by complex neural processes. Specifically, listening to one person’s voice in the midst of one or more competing voices is often called the cocktail party problem, and is a task that lends itself to experimental investigation because of its naturalistic nature and the ease of parametric control. Here, we record simultaneous EEG and MEG, as well as multiple behavioral measures, from normal-hearing younger adults as they attend to one of two competing speakers, at several different signal-to-noise ratios (SNRs), reading a narrative text. Using the temporal response function (TRF) paradigm with respect to the EEG and MEG responses, we analyze auditory responses from both the brainstem and cortex. Stimulus regressors are constructed to represent a hierarchy of auditory and language-based features from both the target and distractor speakers, from which multiple TRFs are calculated. These TRFs are also analyzed to determine which aspects of auditory brain responses are modulated by selective attention, and to what extent. EEG, which is sensitive to deep auditory sources, allows estimation of faster time-scale TRFs, calculated with regressors corresponding to a cochlea and auditory nerve model. MEG allows estimation of slower time-scale TRFs, calculated with slower regressors corresponding to features of the speech signal such as the envelope and envelope onsets, as well as linguistic features at the phoneme and word level. A hierarchy of speech-related TRFs and their corresponding sources is thus measured concurrently, including at the latencies of the auditory brainstem response (ABR, 0 – 15 ms), middle latency response (MLR, 15 – 60 ms), N1-P2 complex (60 – 200 ms), and slower linguistic responses (120 – 800 ms). We find little evidence that faster TRFs from early auditory areas depend on the speaker identity regressor (target vs. distractor), but can demonstrate that later auditory and linguistic cortical TRFs exhibit a wide range of selective attention effects. Some results in the recent literature are mixed with regard to attentional modulation of ABR- or frequency following response (FFR)-type responses; this novel approach, which combines concurrent source-space EEG and MEG with separate families of TRF regressors for each speaker, may help to shed further light not only on how the cortex differentially tracks an attended speaker, but also on how the earlier auditory system may or may not do the same.
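As background for the TRF framework referenced above, a TRF can be estimated by regularized (ridge) regression of the neural signal on time-lagged copies of a stimulus feature. The sketch below is a generic single-channel illustration with an assumed sampling rate, lag range, and regularization strength; it is not the authors' EEG/MEG pipeline.

```python
import numpy as np

def ridge_trf(stimulus, response, fs, tmin, tmax, alpha=1.0):
    """Estimate a temporal response function by ridge regression.

    stimulus : (n_samples,) regressor (e.g., a speech envelope)
    response : (n_samples,) neural signal (e.g., one EEG/MEG channel)
    tmin, tmax : lag range in seconds
    Returns (lags_s, trf_weights).
    """
    lags = np.arange(int(round(tmin * fs)), int(round(tmax * fs)) + 1)
    X = np.zeros((len(stimulus), len(lags)))
    for j, lag in enumerate(lags):                      # column j = stimulus shifted by lags[j]
        if lag >= 0:
            X[lag:, j] = stimulus[:len(stimulus) - lag]
        else:
            X[:lag, j] = stimulus[-lag:]
    w = np.linalg.solve(X.T @ X + alpha * np.eye(len(lags)), X.T @ response)
    return lags / fs, w

# Hypothetical example: recover a known 100 ms kernel from simulated data
fs = 128
rng = np.random.default_rng(3)
stim = rng.normal(size=fs * 60)
kernel = np.hanning(int(0.1 * fs))
resp = np.convolve(stim, kernel, mode="full")[:len(stim)] + rng.normal(size=len(stim))
lags_s, trf = ridge_trf(stim, resp, fs, 0.0, 0.3)
print(lags_s[np.argmax(trf)])  # peak lag near the middle of the simulated kernel
```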
Sharlen Moore, Zihao 'Travis' You and Kishore Kuchibhotla
Fri, 10/4 4:15PM - 6:00PM | B09
Abstract
Astrocytes, the predominant glial cell type in the brain, have emerged as active participants in neural information processing and plasticity. However, their functional dynamics across learning remain poorly understood. Here, we used chronic two-photon calcium imaging to longitudinally track individual astrocyte Ca2+ dynamics in the auditory cortex of awake mice (expressing Aldh1l1-dependent GCaMP6s in an inducible manner) across the acquisition of an auditory discrimination task. We trained mice to lick to a tone for water reward (S+) and withhold from licking to another tone (S−) to avoid a timeout. Over several days of training, astrocytes exhibited learning-related modulation of their Ca2+ dynamics, with cells showing enhancement of their evoked responses during rewarded trials. This increased activity was not due to licking as the same astrocytes exhibited suppressed activity on errors of action (incorrect licking to the S-, false alarm). Omitting reward on correct trials (hits, S+), however, led to biphasic responses where a transient increase in activity was followed by a profound suppression, suggesting that reward consumption may drive extended increases in astrocyte activity. Interestingly, reward-related astrocyte calcium signals extended beyond individual trials, potentially playing a role in maintaining a signature of reward and trial history for upcoming choices. Our data suggest that coordinated astrocyte ensembles may provide a scaffold for integrating reward signals with sensory processing to facilitate learning, potentially bridging trial-level and inter-trial computations. This study expands our understanding of astrocyte contributions to neural circuit dynamics underlying adaptive behavior.
Amit Khandhadia, Stephen Town, Soraya Dunn and Jennifer Bizley
Fri, 10/4 4:15PM - 6:00PM | B10
Abstract
Natural auditory sensation in mammals must resolve the position and distance of auditory stimuli, often while the listener is moving. However, experiments examining spatial responses in auditory cortex often restrain animals and present sounds from a limited set of locations. Experiments that have expanded the number of spatial positions and allowed free movement have discovered that primary auditory regions can encode sound not only in egocentric, head-centered space but also in allocentric, world-centered space (Town, Brimijoin et al. 2017, Amaro, Ferreiro et al. 2021). But these experiments still presented sound from the edges of an enclosure rather than in an immersive soundscape, limiting our ability to understand how spatial representations are formed in the auditory pathway. To better understand the structure of world-centered representations, we recorded from ferrets as they freely roamed within a speaker grid environment. Within this speaker grid, forty individually controlled speakers were located within a 2m by 4m arena, placed beneath an acoustically transparent mesh floor. We recorded from ferrets while presenting click stimuli at intervals of 500-750 milliseconds (ms) from all locations in a pseudorandom sequence while subjects freely explored. Ferrets were previously implanted with 32-channel chronic electrode arrays over primary and some secondary auditory regions of the cortex. While recording, we also captured video and used this footage to train a deep neural network model in DeepLabCut (DLC) to track the position of the head, ears, and body of the subjects. From this tracking, we determined the physical position and head angle of the ferret. From the neural recordings, we extracted both local field potentials (LFP) and neural spiking data and aligned both to the timing of clicks from each speaker location. Using the head angle information, we can map the angle of each click relative to the head and construct head-centered spatial receptive fields, while with the location of each click in the arena, we can construct world-centered spatial receptive fields. Our preliminary results indicate that event-related potentials (ERPs) in the LFP to clicks remained at a similar level regardless of distance to the sound source, with tuning to sound angle remaining largely similar across the range of distances. We will further combine these results with spike rates and a deeper investigation of the responses to head angle, sound position, and speed. Together, this will unlock further understanding of the spatial encoding of auditory stimuli during natural behaviour.
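The mapping from tracked head pose and speaker location to a head-centered sound angle, as described above, reduces to a small geometric computation. The following is a minimal sketch with assumed coordinate conventions (angles in radians, heading measured from the arena's +x axis); it is not the authors' analysis code.

```python
import numpy as np

def head_centered_angle(head_xy, head_angle_rad, speaker_xy):
    """Angle of a speaker relative to the animal's head, wrapped to [-pi, pi].

    head_xy        : (2,) head position in arena coordinates
    head_angle_rad : heading direction in arena coordinates (0 = +x axis)
    speaker_xy     : (2,) speaker position in arena coordinates
    """
    dx, dy = np.asarray(speaker_xy) - np.asarray(head_xy)
    world_angle = np.arctan2(dy, dx)            # allocentric angle to the source
    rel = world_angle - head_angle_rad          # rotate into head-centered coordinates
    return (rel + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi]

# Hypothetical click: animal at (1, 2) facing +x, speaker directly to its left
print(np.degrees(head_centered_angle([1.0, 2.0], 0.0, [1.0, 3.0])))  # 90.0
```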
James Webb, Paul Steffan, Benjamin Hayden, Daeyeol Lee, Caleb Kemere and Matthew McGinley
Fri, 10/4 4:15PM - 6:00PM | B11
Abstract
In natural environments, animals must make informed choices regarding nutrition, shelter, or predation. One ubiquitous task across species, and consequently impactful in shaping the evolution of neural circuitry related to decision-making, is patch-based foraging. In patch-based foraging, animals must harvest resources that are finite and aggregated in confined geographical areas, called patches, by processing information about the rate of intake, the cost of traveling to the next patch, and, importantly, the uncertainty inherent in natural settings. The optimal strategy in the absence of uncertainty, termed the marginal value theorem (MVT), has been well-demonstrated in many ethological studies, but naturalistic patch-based foraging with uncertainty has been difficult to model in an experimental setting. Here, we developed a patch-based foraging task in acoustic environments for freely moving and head-fixed mice in order to investigate decision-making in the face of multiple layers of uncertainty. Briefly, the mice ran on a physical or virtual track until they arrived at a patch, at which point a tone cloud stimulus was played to indicate that they could lick for fixed-volume sucrose solution rewards. The timing of rewards followed a modified inhomogeneous Poisson process to mimic both patch depletion and natural stochasticity. A pure tone stimulus began to play when a reward became available, the frequency of which increased if additional rewards accrued. At any time, mice could leave the patch (residence time) and travel towards the next, replenished patch, during which a pink noise stimulus was played. Once they arrived at the next patch, the process was repeated as before. In both freely moving and head-fixed experiments, MVT-based models outperformed heuristic strategies (such as leaving after a fixed number of rewards or elapsed time without a reward), suggesting that our experimental paradigms replicate naturalistic patch-based foraging and that, particularly in the virtual task, mice processed auditory information about patch location and reward timing to inform decision-making. In freely moving environments with highly stochastic rewards, MVT-based models of residence times failed to account for the multiple layers of uncertainty. Rather, a hierarchical Bayesian model of the reward rate best explained foraging behavior by allowing prior probabilities of the environment to be influenced by recent observations. In the best performing model, priors displayed moderate uncertainty about environmental parameters, and the likelihood incorporated reward times from only the current patch. Taken together, our results suggest that mice handled the multi-layered uncertainty, or meta-uncertainty, pervasive in naturalistic settings, by independently modeling its sources. This hierarchical model has important implications for the study of neural circuits underlying decision-making processes.
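The marginal value theorem referenced above prescribes leaving a patch when the instantaneous intake rate falls below the long-run average rate of the environment. The sketch below solves this numerically for a hypothetical exponentially depleting patch; the depletion model and parameter values are illustrative assumptions, not the authors' task parameters.

```python
import numpy as np

def mvt_residence_time(r0, tau, travel_time, dt=0.01, t_max=200.0):
    """Optimal patch residence time under the marginal value theorem.

    Patch intake rate depletes as r(t) = r0 * exp(-t / tau). The MVT optimum is
    where the instantaneous rate equals the whole-environment average rate
    gain(T) / (T + travel_time); found here by a numerical search.
    """
    t = np.arange(dt, t_max, dt)
    gain = r0 * tau * (1 - np.exp(-t / tau))      # cumulative reward in the patch
    instantaneous = r0 * np.exp(-t / tau)
    average = gain / (t + travel_time)
    return t[np.argmin(np.abs(instantaneous - average))]

# Hypothetical parameters: longer travel times should yield longer residence times
for travel in (5.0, 20.0):
    print(travel, mvt_residence_time(r0=1.0, tau=10.0, travel_time=travel))
```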
Po-Ting and Bertram Liu
Fri, 10/4 4:15PM - 6:00PM | B12
Abstract
In human auditory pathways, the Auditory Periphery (AP) system encodes sound from the ear to the cochlear nuclei. The AP system bidirectionally interacts with the higher auditory system, including the Medial OlivoCochlear (MOC) system and the inferior colliculus, and projects toward the auditory cortices through Auditory Nerve Fibers (ANFs). ANFs phase-lock to the temporal structure of complex sound stimuli below 2-5 kHz. However, it is unclear how AN phase-locking determines speech perception. Simulation studies were conducted utilizing 1) a simulated human AP model and 2) our decoding model to simulate human speech perception. The decoding model consisted of deep artificial neural networks that reconstruct high-fidelity sounds from ANFs’ spiking activities. To simulate the human AP model under the Normal Hearing (NH) condition, Ray Meddis’ Matlab Auditory Periphery (MAP) model was used. The Acoustic Reflex and MOC Reflex in the efferent pathways were simulated by MAP. The representations of simulated ANFs’ spiking activities were called auditory neurograms. In phase-locking elimination experiments, there were NH and two Limited Hearing (LH) conditions. To create LH conditions, low-pass filters with cutoff frequencies at 50 Hz and 1 kHz were applied to the NH auditory neurograms. Thus, the filtered neurograms had limited information about the speech stimuli. To simulate human speech perception, the LJSpeech dataset was used for training models under NH and LH conditions using our decoding model, and its first 20 wav files served as the test set. The NH model was constructed using the decoding model from our previous study. To test the speech reconstruction performance of these models, the Structural Similarity Index Measure (SSIM) was computed to examine the similarity between the spectrograms of original and reconstructed sound stimuli. The NH model achieved a mean SSIM score of 0.9212. On the other hand, the LH models with eliminated AN phase-locking were trained on the filtered neurograms using our decoding model. The results showed mean SSIM scores of 0.654 and 0.823 for the 50 Hz and 1 kHz LH models, respectively. A two-sample t-test showed significant differences between each model pair among the NH and two LH conditions (p
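The SSIM comparison between original and reconstructed stimuli described above can be sketched as follows, using scipy spectrograms and scikit-image's SSIM implementation. The waveforms, sampling rate, and spectrogram parameters are assumptions; the abstract does not specify the exact settings used.

```python
import numpy as np
from scipy.signal import spectrogram
from skimage.metrics import structural_similarity

def spectrogram_ssim(original, reconstructed, fs, nperseg=512):
    """SSIM between log-magnitude spectrograms of two equal-length waveforms."""
    _, _, S_orig = spectrogram(original, fs=fs, nperseg=nperseg)
    _, _, S_reco = spectrogram(reconstructed, fs=fs, nperseg=nperseg)
    log_orig = np.log10(S_orig + 1e-10)
    log_reco = np.log10(S_reco + 1e-10)
    data_range = max(log_orig.max(), log_reco.max()) - min(log_orig.min(), log_reco.min())
    return structural_similarity(log_orig, log_reco, data_range=data_range)

# Hypothetical check: a waveform compared with itself and with a noisier copy
fs = 22_050
rng = np.random.default_rng(4)
x = rng.normal(size=fs * 2)
print(spectrogram_ssim(x, x, fs))                                   # 1.0 (identical)
print(spectrogram_ssim(x, x + 0.5 * rng.normal(size=x.size), fs))   # < 1.0
```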
Rebecca Belisle, Kendrick Tak and Tyler Perrachione
Fri, 10/4 4:15PM - 6:00PM | B13
Abstract
Difficulties with auditory processing, including hearing impairments, multisensory integration, and language processing are widely reported in autism spectrum disorders (ASD) (Bougeard et al., 2021; Rosenhall et al., 1999; DePape et al, 2012; O’Connor, 2021; Deemyad, 2022). Neuroimaging studies have found neural correlates of these auditory impairments, including reduced fMRI adaptation to repeated audio-visual stimulus presentation in the auditory cortex (Millin et al., 2018) and reduced functional connectivity of temporal regions (Anderson et al., 2010; Wilson et al., 2022). Furthermore, studies of structural connectivity and autism have found that microstructural measures of association fibers indicate altered tract integrity compared to neurotypical individuals (Rane et al., 2015). In light of these findings, the present project investigated differences in structural connectivity between auditory cortical regions – Heschl’s gyrus and planum temporale – and frontal and parietal cortical regions between children with ASD and controls. We used MRI data acquired from children ages 5 to 18. Of these children, 37 were diagnosed with autism (31 M, 6 F), and 64 were typically developing children (32 M, 32 F). For each individual, Heschl’s gyrus and planum temporale were manually delineated on the MRI surface, then projected into the volume. Probabilistic tractography was run with FSL’s ProbtrackX (Behrens et al., 2003; Behrens et al., 2007) with Euler streamlining for increased accuracy. For each run, the seed was either Heschl’s gyrus or planum temporale and the waypoint mask and termination mask were both set to one of the cortical target regions. For each seed-target pair, we calculated the connectivity index as the logarithm of the number of streamlines that reached the target divided by the logarithm of the number of streamlines sampled from each seed region (Tschentscher et al., 2019). This index serves as a measure of connectivity strength that is adjusted for the size of the seed volume. Then, we investigated group differences in connectivity index for each seed-target pair using a regression model and controlling for effects of language task performance, age, sex, and total intracranial volume. The connectivity strength between the right Heschl's gyrus and frontal and parietal regions implicated in language and sensory perception was significantly greater for the typically developing group compared to the group with autism. This finding of reduced structural connectivity strength supports prior findings of reduced functional connectivity between Heschl’s gyrus and frontal and parietal sensorimotor regions (Wilson et al., 2022). Overall, these results provide potential structural bases for the widely shown differences in sensory responsivity and multisensory integration in autism. References: [1] Bougeard, C., Picarel-Blanchot, F., Schmid, R., Campbell, R., & Buitelaar, J. (2021). Prevalence of autism spectrum disorder and co-morbidities in children and adolescents: a systematic literature review. Frontiers in psychiatry, 12, 744709. [2] Rosenhall, U., Nordin, V., Sandström, M., Ahlsen, G., & Gillberg, C. (1999). Autism and hearing loss. Journal of autism and developmental disorders, 29, 349-357. [3] DePape, A. M. R., Hall, G. B., Tillmann, B., & Trainor, L. J. (2012). Auditory processing in high-functioning adolescents with autism spectrum disorder. PLoS ONE, 7(9). [4] O’Connor, K. (2012). Auditory processing in autism spectrum disorder: a review. 
Neuroscience & Biobehavioral Reviews, 36(2), 836-854. [5] Deemyad, T. (2022). Lateralized changes in language associated auditory and somatosensory cortices in autism. Frontiers in Systems Neuroscience, 16, 787448. [6] Millin, R., Kolodny, T., Flevaris, A. V., Kale, A. M., Schallmo, M. P., Gerdts, J., ... & Murray, S. (2018). Reduced auditory cortical adaptation in autism spectrum disorder. ELife, 7, e36493. [7] Anderson, J. S., Druzgal, T. J., Froehlich, A., DuBray, M. B., Lange, N., Alexander, A. L., ... & Lainhart, J. E. (2011). Decreased interhemispheric functional connectivity in autism. Cerebral cortex, 21(5), 1134-1146. [8] Wilson, K. C., Kornisch, M., & Ikuta, T. (2022). Disrupted functional connectivity of the primary auditory cortex in autism. Psychiatry Research: Neuroimaging, 324, 111490. [9] Rane, P., Cochran, D., Hodge, S. M., Haselgrove, C., Kennedy, D. N., & Frazier, J. A. (2015). Connectivity in autism: a review of MRI connectivity studies. Harvard review of psychiatry, 23(4), 223-244. [10] Behrens, T. E., Johansen-Berg, H., Woolrich, M. W., Smith, S. M., Wheeler-Kingshott, C. A., Boulby, P. A., ... & Matthews, P. M. (2003). Non-invasive mapping of connections between human thalamus and cortex using diffusion imaging. Nature neuroscience, 6(7), 750-757. [11] Behrens, T. E., Berg, H. J., Jbabdi, S., Rushworth, M. F., & Woolrich, M. W. (2007). Probabilistic diffusion tractography with multiple fibre orientations: What can we gain?. neuroimage, 34(1), 144-155. [12] Tschentscher, N., Ruisinger, A., Blank, H., Díaz, B., & Von Kriegstein, K. (2019). Reduced structural connectivity between left auditory thalamus and the motion-sensitive planum temporale in developmental dyslexia. Journal of Neuroscience, 39(9), 1720-1732.
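The connectivity index defined above (the logarithm of the number of streamlines reaching the target divided by the logarithm of the number of streamlines sampled from the seed) reduces to a one-line computation. In the sketch below, the per-voxel sample count and the example numbers are assumptions for illustration.

```python
import numpy as np

def connectivity_index(n_streamlines_to_target, n_seed_voxels, samples_per_voxel=5000):
    """Connectivity index: log(streamlines reaching the target) /
    log(streamlines sampled from the seed), adjusting for seed size."""
    n_sampled = n_seed_voxels * samples_per_voxel
    return np.log(n_streamlines_to_target) / np.log(n_sampled)

# Hypothetical tractography output: 1,200-voxel seed, 5,000 samples per voxel
print(connectivity_index(n_streamlines_to_target=85_000, n_seed_voxels=1_200))
```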
Jan Willem de Gee, Laia Alonso-Marmelstein, Kate Schwarz-Roman and Romke Rouw
Fri, 10/4 4:15PM - 6:00PM | B14
Abstract
Individuals with misophonia (Swedo et al., 2022; Jastreboff & Jastreboff, 2001) experience strong negative emotions like rage or disgust in response to everyday, often human-made, sounds (e.g., chewing or throat clearing). The ubiquitous nature of these “trigger sounds” makes misophonia a devastating condition. Furthermore, a remarkably large number of individuals in the general population report 'mild' or 'moderate' misophonic complaints (Wu et al., 2014; Rouw & Erfanian, 2018; Naylor et al., 2021). This points to a contentious topic of debate, not only in misophonia research but also in the pluriform and extensive research on sound sensitivity in related areas such as autism, tinnitus, hyperacusis, and PTSD (Stiegler & Davis, 2010; Greenberg & Carlos, 2018; Jüris et al., 2013): what underlies the large individual differences in sound sensitivity? To address this question, we sought to establish a sensitive measurement tool to objectively quantify the physiological response to trigger sounds. We hypothesized that pupillometry might be that tool. Pupil size fluctuations at constant luminance have previously been shown to reflect neuromodulatory activity (Joshi & Gold, 2020; de Gee et al., 2017) and the ensuing cortical arousal state (Larsen & Waters, 2018; McGinley et al., 2015); importantly, pupil size is also highly correlated with activity of the anterior insula (de Gee et al., 2017), a key brain region in misophonia (Kumar et al., 2017). Thirty participants were recruited from the general population. Their misophonic complaints, as measured with the Amsterdam Misophonia Scale (AMS; Schröder et al., 2013), ranged from sub-clinical to extreme. On each trial they listened to either a misophonia trigger sound or a generally unpleasant sound and rated their experienced annoyance level on a 4-point scale. We observed larger pupil dilation time-locked to the trigger sounds versus generally unpleasant sounds. Within participants, pupil dilation correlated with the subjective (rated) severity of the negative experience of both trigger and generally unpleasant sounds. Across participants, pupil response magnitude, but not subjective ratings, significantly predicted AMS scores. A leave-one-out cross-validation analysis furthermore showed that pupil data predicted AMS scores within the granularity of its severity categories. In summary, this study demonstrates, for the first time, the sensitivity of pupillometry to variations in the strength of the misophonic response. The method holds promise as an additional instrument for investigating between-group or inter-individual differences and may potentially add a diagnostic tool at the level of a single individual.
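The leave-one-out prediction of AMS scores from pupil response magnitude described above can be sketched with scikit-learn. The variable names, data, and the choice of a simple linear regression are assumptions for illustration, not the authors' analysis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def loo_predict_ams(pupil_magnitude, ams_score):
    """Leave-one-participant-out prediction of AMS scores from pupil response magnitude."""
    X = np.asarray(pupil_magnitude).reshape(-1, 1)
    y = np.asarray(ams_score, dtype=float)
    pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
    return pred, np.corrcoef(pred, y)[0, 1]

# Hypothetical data for 30 participants
rng = np.random.default_rng(5)
pupil = rng.normal(1.0, 0.3, size=30)
ams = 3 + 8 * pupil + rng.normal(0, 1.5, size=30)
pred, r = loo_predict_ams(pupil, ams)
print(round(r, 2))  # correlation between held-out predictions and observed scores
```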
Meredith Ziliak, Jax Marrone, Andres Navarro, Sahil Desai, Emily Bell, and Edward Bartlett
Fri, 10/4 4:15PM - 6:00PM | B15
Abstract
Auditory thresholds, distortion product otoacoustic emissions (DPOAEs), and auditory brainstem responses (ABRs) are widely used to assess hearing loss in patients. Their broad application often leads to a one-size-fits-all treatment approach, primarily addressing loss of hearing sensitivity. While hearing sensitivity is commonly an issue across hearing loss types, understanding the underlying causes of a patient's loss of hearing is crucial for selecting optimal treatments for each individual. For example, blast exposure, aging, and noise exposure all result in increased thresholds, indicative of loss of hearing sensitivity, yet are likely to possess distinct diagnostic profiles. In 2019, Altschuler et al. identified threshold changes along with persistent damage to outer hair cells (OHCs) and a reduction in cochlear synapses following exposure to noise resembling small arms fire (SAF). SAF noise is common in military, law enforcement, and civilian recreational settings (fireworks, hunting, etc.). However, little is known about how this type of damage influences early (
Lorenz Fiedler, Torben Christiansen, Ingrid Johnsrude and Dorothea Wendt
Fri, 10/4 4:15PM - 6:00PM | B16
Abstract
Auditory attention can be voluntarily focused on a sound source, but it may also be automatically captured by off-focus sounds. The latter can be either relevant or irrelevant for a listener. The ability to switch attention towards a relevant sound source (but to suppress an irrelevant one) requires attentional control and is crucial for navigating a complex auditory scene. In a dual-task paradigm, we investigated whether pupil responses reflect relevance-dependent selectivity in the processing of background sounds and whether this selectivity correlates with behavioral performance. We asked 21 participants with self-reported normal hearing (N = 21, Age: 27 to 66 years, pure tone average: -4 to +26 dB HL) to listen to continuous speech presented from the front (primary task). In random order and at unpredictable times, additional speech sounds were presented from the left or right side (background sounds). Each of these background sounds consisted of a name followed by a two-digit number. The name served as a cue; the secondary task was to memorize the numbers from either the right or left side (i.e., the relevant side), which was instructed before each one-minute trial. Afterwards, participants were asked to pick the three relevant numbers from a board of nine, which also contained three irrelevant numbers and some random numbers that were not presented on that trial. We found increased pupil responses to relevant background sounds compared to irrelevant ones (i.e., selectivity). This selectivity predicted behavioral performance in the secondary task: participants who exhibited stronger selectivity were able to recall more numbers correctly. Interestingly, pupil responses did not significantly differ between correctly recalled and missed relevant background sounds. However, they were stronger for stream-confused compared to correctly rejected irrelevant background sounds. This suggests that participants were more challenged by suppressing irrelevant sounds than by switching attention to relevant sounds. Importantly, neither hearing thresholds nor age predicted behavioral performance in the secondary task. Our findings demonstrate that pupillometry reflects auditory attentional control abilities, which are meaningful for hearing diagnostics as well as for the development of intelligent noise management in hearing aids and communication devices.
Estelle In'T Zandt and Dan Sanes
Fri, 10/4 4:15PM - 6:00PM | B17
Abstract
The individual vocalizations of most species display acoustic variability. Therefore, an essential question is how the central auditory system represents vocal categories. If specific call types are associated with distinct information or behaviors, then the brain should be able to form a representation that distinguishes each call type despite this acoustic variability. Although there is a rich understanding of vocalization representations throughout the auditory neuraxis, the general approach has been to probe auditory neurons with relatively few exemplars, and often from vocalizers with an unknown relationship to the receiver. Here, we address the issue of call categorization by analyzing the response of auditory cortex (AC) populations to a large array of exemplars recorded from the animal’s own family and those of two other families. We investigated the ability of AC neural populations to categorize 4 vocalization types in awake, freely-moving adult Mongolian gerbils (Meriones unguiculatus). Gerbils are a highly social rodent species that lives in multi-generational families and produces a rich vocal repertoire. We used chronically-implanted high-density silicon probes to record wirelessly from single AC neurons while presenting a large set of variants for each of 4 vocalization categories (n=1200 exemplars). The vocalizations were obtained from overnight recordings of individual gerbil families, one of which was always the implanted animal’s own family. The response of each neuron to pure tones and amplitude modulated (AM) white noise was also used to characterize spectral and envelope responses. Initial analyses focused on AC neuron rate coding using a population decoder (support vector machine). Single-unit responses were generally highly variable across the 300 exemplar calls within a category. However, despite this within-category variance at the single-neuron level, AC populations were able to decode categories significantly above chance. Decoding sensitivity for each call type was quantified as the d’ between a given category and all other categories. The ability of AC populations to decode the family identity of each of the 4 call types was weak, suggesting that this information is not represented at the single-syllable level. Current research is addressing whether family identity is present at the level of vocalization bouts or in cortical areas downstream of AC. Taken together, our preliminary results suggest that small auditory cortex neuron populations can robustly represent vocalization categories despite the large acoustic variance across exemplars.
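The population decoding and per-category sensitivity analysis described above can be sketched with scikit-learn, computing a one-versus-rest d' from cross-validated predictions as z(hit rate) minus z(false-alarm rate). All data shapes and numbers below are fabricated, and the cross-validation scheme is an assumption, since the abstract does not specify one.

```python
import numpy as np
from scipy.stats import norm
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_predict

def category_dprime(spike_counts, labels, n_splits=5):
    """Cross-validated SVM decoding of call category with per-category d'.

    spike_counts : (n_trials, n_units) population rate vectors
    labels       : (n_trials,) category label per trial
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    pred = cross_val_predict(SVC(kernel="linear"), spike_counts, labels, cv=cv)
    dprimes = {}
    for cat in np.unique(labels):
        hit = np.clip(np.mean(pred[labels == cat] == cat), 1e-3, 1 - 1e-3)  # cat called cat
        fa = np.clip(np.mean(pred[labels != cat] == cat), 1e-3, 1 - 1e-3)   # others called cat
        dprimes[cat] = norm.ppf(hit) - norm.ppf(fa)
    return dprimes

# Hypothetical data: 400 trials (4 categories x 100), 60 units, weak category signal
rng = np.random.default_rng(6)
y = np.repeat(np.arange(4), 100)
X = rng.poisson(5.0, size=(400, 60)) + y[:, None]
print(category_dprime(X, y))
```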
Jenna Blain, Monty Escabi and Ian Stevenson
Fri, 10/4 4:15PM - 6:00PM | B18
Abstract
Spectro-temporal receptive fields (STRFs) are widely used in auditory neuroscience to model the time-frequency sensitivity of auditory neurons. In many instances, STRFs are derived using unbiased synthetic stimulus ensembles, such as dynamic ripples or random chords, from which they can easily be estimated using spike-triggered averaging. When natural sounds are used, however, decorrelation and regularization techniques are needed to remove residual stimulus correlations that can distort the estimated STRFs. Furthermore, nonlinearities and non-stationarities (such as adaptation) make it difficult to predict neural responses to natural sounds. We obtained neural recordings from the inferior colliculus of unanesthetized rabbits in response to a sequence of natural sounds, dynamic moving ripple sounds (DMR), and speech in varying background noises. We developed a model-based approach for deriving auditory STRFs and predicting single-trial spike trains to these sounds. The model consists of a nine-parameter Gabor STRF (gSTRF; Qiu et al. 2003), which accounts for the neuron’s spectro-temporal integration of the stimulus, and a nine-parameter contrast STRF, which dynamically changes the threshold of the neuron. Additionally, there is a four-parameter nonlinear integrate-and-fire output compartment that incorporates intrinsic noise, cell membrane integration, and nonlinear thresholding to generate simulated output spikes, and a four-parameter gain control component that accounts for the adaptation of the neuron. We used Bayesian optimization to fit neural data and derive optimal model parameters by maximizing the model’s log-likelihood. To validate our spiking gSTRF model, we compared the optimal gSTRFs to those obtained with more common approaches such as a generalized linear model (GLM). We found that STRFs derived via regression were spectrally smeared, indicating that stimulus correlations were not effectively removed despite implementation of decorrelation techniques. In comparison, our gSTRF was compact and provided biologically feasible estimates of parameters such as the neuron’s best frequency, delay, and best temporal and spectral modulation frequency. We also carried out these comparisons with simulated data where the “ground truth” STRF and spiking activity were known a priori. For these simulations, we demonstrate that the gSTRF converges to the original simulation parameters and replicates the spiking activity from the original simulations down to millisecond precision. Collectively, this new approach allows one to derive auditory STRFs and predict neural spiking activity to natural sounds using functionally interpretable basis functions. The small number of parameters makes exploration of nonlinear and nonstationary effects due to natural sound statistics more feasible.
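For intuition, a Gabor STRF models the receptive field as a Gaussian-windowed ripple in time and in (log-)frequency. The sketch below is a generic separable parameterization that happens to use nine parameters; it is offered as an illustration and may differ from the exact form of Qiu et al. (2003) used by the authors.

```python
import numpy as np

def gabor_strf(t, x, t0=0.01, x0=2.0, sigma_t=0.005, sigma_x=0.5,
               Fm=20.0, Om=0.5, phase_t=0.0, phase_x=0.0, gain=1.0):
    """Separable Gabor STRF: a Gaussian-windowed ripple in time and log-frequency.

    t : (n_t,) time lags in s;  x : (n_x,) frequency axis in octaves
    t0, x0        : temporal delay and best frequency (envelope centres)
    sigma_t/x     : envelope widths;  Fm, Om : best temporal/spectral modulation
    """
    g_t = np.exp(-0.5 * ((t - t0) / sigma_t) ** 2) * np.cos(2 * np.pi * Fm * (t - t0) + phase_t)
    g_x = np.exp(-0.5 * ((x - x0) / sigma_x) ** 2) * np.cos(2 * np.pi * Om * (x - x0) + phase_x)
    return gain * np.outer(g_x, g_t)   # (n_x, n_t) receptive field

# Hypothetical axes: 50 ms of lags, 5 octaves of log-frequency
t = np.arange(0, 0.05, 0.001)
x = np.arange(0, 5, 0.1)
strf = gabor_strf(t, x)
print(strf.shape)  # (50, 50)
```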
Irene Onorato, Livia de Hoz and David McAlpine
Fri, 10/4 4:15PM - 6:00PM | B19
Abstract
The auditory system's sensitivity to stimulus statistics is essential for sound discrimination in noisy environments and the temporal binding of auditory objects – both critical for complex scene analysis. Feedforward and feedback interactions within the auditory pathway dynamically modulate these properties, but the underlying circuit mechanisms remain poorly understood. We investigate these interactions by simultaneously recording from the inferior colliculus (IC) and auditory cortex (A1) in awake and anesthetized mice. Using a broadband noise stimulus with modulated amplitude, we examine how IC and A1 neurons encode sound statistics. We varied the intensity range, the complexity of the sound, and the speed of amplitude modulation, and investigated how these parameters are represented in the dynamics of neuronal adaptation and the timescales of integration, and how these representations are modulated by feedforward and feedback interactions. Furthermore, we use optogenetics to dissect the specific roles of excitatory and inhibitory sub-types in mediating these neuronal dynamics. In summary, we characterise how cortical and subcortical neurons are modulated by feedforward and feedback information, supporting reliable stimulus encoding across different sound contexts. These findings are critical for better understanding how the auditory system represents a noisy and constantly changing sound environment, ultimately helping to elucidate the mechanisms underlying the emergence of listening dysfunctions.
Bshara Awwad and Daniel Polley
Fri, 10/4 4:15PM - 6:00PM | B20
Abstract
Acoustic trauma, characterized by sudden noise-induced injury to cochlear hair cells or primary afferent neurons, triggers a complex series of structural, transcriptional, and physiological responses in the central auditory system of adult rodents. While these compensatory mechanisms aim to restore sensitivity, they frequently produce neural hyperactivity, which can give rise to auditory disorders such as tinnitus, hyperacusis, and impaired hearing in noisy settings. Sensorineural hearing loss (SNHL) caused by acoustic trauma is typically studied as a sensory disorder, though patient complaints emphasize significant aversion, discomfort, and anxiety triggered by moderate-intensity innocuous sounds, often leading to social withdrawal and depression. The affective dimensions of hearing loss have not been modeled in laboratory animals, so very little is known about the underlying neural substrates and mechanisms. Here, we tested the hypothesis that hyperactive efferent projections from the auditory forebrain induce hyper-responsive, non-habituating sound processing in a key sensory gateway to the limbic system, the lateral amygdala (LA). To do so, we expressed GCaMP in excitatory LA neurons and monitored bulk calcium activity over several weeks with implanted fibers in awake, head-fixed mice. We found that sound-evoked activity in LA habituated over several days in sham-exposed mice but became hyper-responsive and non-habituating in mice with noise-induced high-frequency SNHL. Mice then underwent a Pavlovian discriminative auditory threat conditioning protocol, which produced discriminative fear recall and selective LA plasticity in sham-exposed mice but generalized, non-extinguishing threat learning and LA plasticity in SNHL mice. Our ongoing experiments are directly testing the contribution of auditory cortex (ACtx) hyperactivity to LA plasticity, with a particular focus on whether direct activation of parvalbumin-expressing interneurons in ACtx can sustainably reinstate normal LA sound processing and auditory threat evaluation.
Jimmy Dion, Ian Stevenson and Monty Escabí
Fri, 10/4 4:15PM - 6:00PM | B21
Abstract
Humans and animals are challenged when real-world sounds occur in background noise. These scenarios also challenge the hearing impaired and speech recognition systems. Perceptual studies have shown how the spectrum and modulation statistics of a background sound influence the perception of a foreground target, yet how the brain separates sound mixtures is poorly understood. We recorded multi-unit population activity from the inferior colliculus of head-fixed unanesthetized rabbits via linear 64-channel arrays. Speech sentences or zebra finch song motif foregrounds were presented in the presence of seven natural backgrounds at multiple signal-to-noise ratios (SNRs). These backgrounds included speech babble, bird babble, and construction noise. The backgrounds were delivered in the original unmodified (OR) condition or in the perturbed phase-randomized (PR) or spectrum-equalized (SE) conditions. PR preserves the original spectrum but distorts (whitens) the original modulations, whereas SE distorts the spectrum but not the modulations. Via a shuffled spectrum estimation, we separated the foreground- and background-driven neural responses for each sound mixture and condition (OR, PR, and SE), which allowed us to separately compute the foreground- and background-driven power spectra. To assess the fidelity of neural encoding for each background and condition, we estimated the neural SNR by dividing the foreground-driven by the background-driven spectra. Results show that neural SNRs depend on the background and its spectrum and modulations. Compared to original backgrounds, PR backgrounds enhance or reduce the neural SNR depending on the background, which implies that the original background modulations can improve or degrade the foreground representation. Similarly, SE backgrounds enhance or reduce the neural SNR, suggesting that the original background spectra can beneficially or detrimentally impact foreground encoding. For some backgrounds, the spectrum dominates the neural SNR, while for others the modulations have a greater impact. We also demonstrate that spectrum or modulation interference is most prominent for modulation frequencies < 10 Hz, overlapping the temporal fluctuations of individual words or syllables in the song motif. Finally, preliminary comparisons to human perceptual data using the same backgrounds in speech recognition tasks suggest that neural SNR correlates with recognition accuracy. Collectively, the findings indicate that the spectra and modulations of backgrounds influence and interfere with the representation of foreground vocalizations, and that these statistics critically underlie masking of real-world natural sounds.
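For readers unfamiliar with the neural SNR metric used here, the short sketch below shows the final ratio step, assuming the foreground- and background-driven response power spectra have already been estimated (the shuffled spectrum estimation itself is not reproduced); the function name and the optional low-frequency restriction are illustrative.

```python
# Neural SNR (dB) as the ratio of foreground- to background-driven response power.
import numpy as np

def neural_snr_db(fg_power, bg_power, freqs, fmax=None):
    """fg_power, bg_power : response power at each modulation frequency
    freqs                : modulation frequencies (Hz) for those spectra
    fmax                 : optionally restrict to frequencies below fmax (e.g. 10 Hz)
    """
    fg_power, bg_power, freqs = map(np.asarray, (fg_power, bg_power, freqs))
    if fmax is not None:
        keep = freqs < fmax
        fg_power, bg_power = fg_power[keep], bg_power[keep]
    return 10.0 * np.log10(fg_power.sum() / bg_power.sum())
```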
Isaac Boyd, Sanket Srivastava, Zhili Qiu, Howard Gritton and Kamal Sen
Fri, 10/4 4:15PM - 6:00PM | B22
Abstract
Cortical circuits are thought to play a critical role in solving the cocktail party problem. A previous model by Dong, Colburn and Sen proposed how such computations may be carried out in the songbird. However, relatively little is known about the underlying cortical circuit mechanisms in mammals. Here we present a model cortical circuit for mouse auditory cortex that is similar in architecture to the model by Dong et al. but incorporates recent experimental data from mouse auditory cortex. The model utilizes experimentally measured spatial tuning curves in mouse auditory cortex to model distinct spatial channels that reflect directional sound processing. The input layer uses auditory stimuli and spectrotemporal receptive fields to generate spiking activity. The intermediate layer consists of excitatory (E) neurons and two types of inhibitory interneurons (P and S). P interneurons mediate two types of stimulus-driven inhibition: within-channel feedforward inhibition, which modulates the temporal dynamics of E neurons in the same spatial channel, and lateral inhibition, which sharpens spatial tuning curves. S neurons mediate cross-channel surround suppression via horizontal connections. Finally, the spatial channels converge in the output layer onto a single output neuron. We use this model to explain recent experimental measurements of the neural discriminability of target sounds from multiple spatial locations presented in isolation (“Clean” condition), as well as the neural discriminability of competing target and masker sounds from different locations (“Masked” condition), which display “hotspots” of high discrimination performance at specific spatial configurations. The model explains the experimental observations and reveals distinct functional contributions of different interneuron types in helping solve the cocktail party problem.
Huaizhen Cai and Yale Cohen
Fri, 10/4 4:15PM - 6:00PM | B23
Abstract
Because we live in a multisensory environment, it is reasonable to speculate that our brain has evolved to preferentially take advantage of multisensory information. However, despite a large literature examining multisensory processing, we still do not have a full understanding of how cortical activity (e.g., in the primary auditory cortex [A1]) contributes to multisensory perception. Here, we recorded A1 neural activity in non-human primates while they performed an ethologically relevant multisensory detection task that utilized monkey vocalizations and a video of a vocalizing monkey. We manipulated task difficulty by varying the signal-to-noise ratio (SNR) between an auditory target stimulus (i.e., a monkey “coo” vocalization) and a background noisy “chorus” of monkey vocalizations. We found that a temporally and contextually congruent video of a vocalizing monkey improved the monkeys’ ability to detect the target vocalization. Our analyses of A1 activity indicated that 1) it was modulated more by visual stimuli in lower SNR conditions than in higher SNR conditions; 2) visual stimuli improved the capacity of linear classifiers to decode target responses from noise responses; and 3) A1 population neural trajectories encoded the target’s SNR. Further, we found that neurons with significant spectro-temporal tuning properties were less likely to be modulated by task parameters than neurons that did not have significant tuning properties. Overall, we found that visual stimuli modulated A1 activity and improved the encoding of auditory stimuli by A1 neurons, which might facilitate auditory perception.
Celine Drieu, Ziyi Zhu, Ziyun Wang, Kylie Fuller, Aaron Wang, Sarah Elnozahy and Kishore Kuchibhotla
Fri, 10/4 4:15PM - 6:00PM | B24
Abstract
Rapid learning confers significant advantages to animals in ecological environments. Despite the need for speed, animals appear to only slowly learn to associate rewarded actions with predictive cues. This slow learning is thought to be supported by a gradual expansion of the predictive cue representation in the sensory cortex. However, evidence is growing that animals learn more rapidly than classical performance measures suggest, challenging the prevailing model of sensory cortical plasticity. Here, we investigated the relationship between learning and sensory cortical representations. We trained mice on an auditory go/no-go task that dissociated the rapid acquisition of task contingencies (learning) from its slower expression (performance). Optogenetic silencing demonstrated that the auditory cortex (AC) drives both rapid learning and slower performance gains but becomes dispensable at expert performance. Rather than enhancement or expansion of cue representations, two-photon calcium imaging of AC excitatory neurons over weeks and unsupervised tensor decomposition revealed two higher-order signals that were causally linked to learning and performance. First, a reward prediction (RP) signal emerged within tens of trials, was present after action-related errors only early in training, and faded at expert levels. Strikingly, silencing at the time of the RP signal impaired rapid learning, suggesting that it serves an associative, teaching role. Second, a distinct cell ensemble encoded and controlled licking suppression that drove the slower performance improvements. These two ensembles were spatially clustered but uncoupled from the underlying sensory representations, indicating a higher-order functional segregation within AC. Our results provide a mechanistic dissociation between learning and performance and reshape our understanding of the fundamental role of the sensory cortex.
Kirill Nourski, Mitchell Steinschneider, Ariane Rhone, Rashmi Mueller and Matthew Banks
Fri, 10/4 4:15PM - 6:00PM | B25
Abstract
Neural responses to novel sensory stimuli represent electrophysiologic signatures of predictive coding. These responses are promising biomarkers of consciousness and have clinical importance for assessing depth of general anesthesia, diagnosis, and prognosis of disorders of consciousness. Loss of responsiveness (LOR) induced by the general anesthetic propofol is associated with suppressed responses to short-term auditory novelty (local deviance, LD) outside canonical auditory cortex (Nourski et al, J Neurosci 2018, 38:8441-52). This suppression may represent a biomarker of loss of consciousness. By contrast, responses to long-term novelty (global deviance, GD) are more sensitive to propofol, becoming abolished at subhypnotic doses of the drug. Unlike propofol, which modifies GABA-ergic inhibition, the alpha-2 adrenergic agonist dexmedetomidine modulates neural circuits involved in non-REM sleep. The present study examined whether the changes in cortical auditory processing observed with propofol could be generalized to dexmedetomidine and to natural sleep. Intracranial recordings were obtained in neurosurgical patients undergoing monitoring for refractory epilepsy. Stimuli were vowel sequences incorporating within- and across-sequence deviants (LD and GD, respectively). The stimuli were presented to the participants while they were awake, during administration of dexmedetomidine, and during daytime sleep. Dexmedetomidine infusion was titrated to reach sedation with responsiveness to command and then to LOR. Neural activity was examined in auditory cortex and other brain regions as averaged evoked potential (AEP) and high gamma (70-150 Hz) band power. AEP LD and GD effects had a higher prevalence and were more broadly distributed compared to high gamma effects in the awake state. The timing of the AEP LD effect was consistent with that previously observed for the mismatch negativity (MMN) response. GD effects emerged later and had a timing consistent with the P3b novelty response. As observed previously for propofol, subhypnotic doses of dexmedetomidine reduced LD effects in medial temporal and prefrontal cortex and nearly completely eliminated GD effects both within and outside of auditory cortex. LOR was associated with loss of LD effects in prefrontal cortex and the temporal lobe beyond auditory cortex. Similar changes were observed during daytime sleep. LD effects within canonical auditory cortex were preserved following dexmedetomidine-induced LOR and during sleep. The results expand previous work on mechanisms underlying anesthesia-induced loss of consciousness. The current data support the generalizability of changes in auditory cortical processing from propofol to dexmedetomidine and sleep. LD effects outside canonical auditory cortex may represent a biomarker of conscious auditory novelty processing and highlight the clinical utility of MMN. The resilience of LD effects in auditory cortex following anesthesia-induced LOR and during sleep demonstrates preservation of low-level novelty monitoring of the acoustic environment at the cortical level.
Rajvi Agravat, Maansi Desai, Garbielle Foox, Alyssa Field, Anne Anderson, Dave Clarke, Elizabeth Tyler-Kabara, Andrew Watrous, Howard Weiner and Liberty Hamilton
Fri, 10/4 4:15PM - 6:00PM | B26
Abstract
The human brain’s ability to process complex auditory signals, such as speech and music, relies on extracting and representing various kinds of embedded information. Speech contains linguistic information (phonology, semantics, lexical properties) and acoustic information (spectral patterns, pitch structures, rhythmic elements). This study aims to advance our understanding of the neural representations of speech and music in the auditory cortex by addressing two key questions: 1) How do the neural representations of acoustic features in speech compare to those in music within the auditory cortical hierarchy? 2) How does the neural representation of pitch and spectral features in speech and music change with development from early childhood to late adolescence? We recorded brain activity via stereo-electroencephalography (sEEG) from 24 participants (14M/10F) while they listened to movie trailer stimuli containing both speech and music. We extracted high-gamma band (70-150 Hz) activity in bilateral auditory-related regions, including Heschl’s gyrus (HG), superior temporal gyrus (STG), and middle temporal gyrus (MTG). The movie trailers were then split into speech-only and music-only content for analysis, using a neural network audio separation algorithm. We then fit linear encoding models that predicted high-gamma activity from either the original spectrogram (mixed speech and music), the music-only spectrogram, or the speech-only spectrogram. In all cases, the patients heard the original mixture of speech and music, so if the separated spectrograms more effectively predict neural activity, this suggests a preferential representation of that information. Preliminary findings indicate higher pitch-related selectivity for speech than for music in areas such as HG, STG, and MTG. We also examined spectral features for speech and music and where these are encoded in the brain. The data points for this analysis follow a general positive correlation, suggesting that electrodes with higher spectral responses to music tend also to have higher responses to speech, and vice versa. The model demonstrates enhanced acoustic selectivity for speech when evaluating speech-only spectrograms compared to the original spectrograms extracted from the movie trailers. This suggests that removing non-speech auditory sources, such as background music or sound effects, can improve the model’s performance in representing and processing the speech components. Further work could examine the roles of auditory attention and encoding differences between sung speech and regular speech.
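The encoding analysis described here follows a standard lagged linear regression recipe; the sketch below is a minimal version of that recipe, with the lag count and ridge penalty as illustrative assumptions rather than the study's actual settings.

```python
# Lagged ridge-regression encoding model: predict one electrode's high-gamma
# from a (time x frequency) stimulus spectrogram; lags and penalty illustrative.
import numpy as np
from numpy.linalg import solve

def lagged_design(spectrogram, n_lags):
    """Stack time-lagged copies of the spectrogram as predictors."""
    n_t, n_f = spectrogram.shape
    X = np.zeros((n_t, n_lags * n_f))
    for lag in range(n_lags):
        X[lag:, lag * n_f:(lag + 1) * n_f] = spectrogram[:n_t - lag]
    return X

def fit_encoding_model(spectrogram, high_gamma, n_lags=30, alpha=1.0):
    """Ridge weights mapping the lagged spectrogram to the neural response."""
    X = lagged_design(spectrogram, n_lags)
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    return solve(XtX, X.T @ high_gamma)

# Fitting this model three times -- with the mixture, speech-only, and
# music-only spectrograms -- and comparing held-out prediction accuracy is the
# comparison described above.
```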
Arielle Moore, Tyler Perrachione and Emily Stephen
Fri, 10/4 4:15PM - 6:00PM | B27
Abstract
Recent electrocorticography (ECoG) studies have revealed response latencies in the superior temporal gyrus (STG) that suggest parallel processing of speech information between auditory cortex and posterior STG (pSTG). While speech information is classically thought to arrive first in auditory cortex in Heschl’s gyrus (HG) before progressing to downstream speech areas in pSTG, ECoG has revealed neural responses in pSTG to the onset of speech signals with response latencies similar to those in HG, suggesting early parallel processing of speech information between pSTG and auditory cortex. Here, we aimed to investigate whether there are differences in the latencies of the hemodynamic response to speech onsets across auditory areas using fMRI. We examined the hemodynamic responses and their temporal derivatives to speech onsets in a passive-listening functional magnetic resonance imaging paradigm. Using a mass univariate modeling approach, we found higher values for the temporal derivative coefficient in speech onset responses in pSTG, suggesting faster peak times in pSTG than in the rest of the STG. The results of this study support prior ECoG findings of low-latency neural activity in pSTG on par with primary auditory areas, and suggest that fMRI may be a useful tool for measuring rapid temporal aspects of speech processing in cortex.
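For intuition about the temporal-derivative coefficient, the sketch below assembles a GLM design matrix from a canonical double-gamma HRF and its derivative; a larger positive derivative weight corresponds to an earlier fitted peak. The SPM-style HRF parameters and design details are assumptions, not the study's exact pipeline.

```python
# Hemodynamic design matrix with canonical HRF + temporal derivative (assumed
# SPM-style double-gamma parameters; illustrative, not the study's model).
import numpy as np
from scipy.stats import gamma

def canonical_hrf(t):
    """Double-gamma HRF sampled at times t (seconds)."""
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

def hrf_and_derivative(tr=1.0, duration=32.0):
    t = np.arange(0, duration, tr)
    hrf = canonical_hrf(t)
    dhrf = np.gradient(hrf, tr)          # temporal-derivative regressor
    return hrf, dhrf

def design_matrix(onsets, n_scans, tr=1.0):
    """Convolve a speech-onset stick function with the HRF and its derivative."""
    sticks = np.zeros(n_scans)
    sticks[(np.asarray(onsets) / tr).astype(int)] = 1.0
    hrf, dhrf = hrf_and_derivative(tr)
    X = np.column_stack([np.convolve(sticks, h)[:n_scans] for h in (hrf, dhrf)])
    return np.column_stack([X, np.ones(n_scans)])   # add an intercept column
```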
Annesya Banerjee, Mark Saddler and Josh McDermott
Fri, 10/4 4:15PM - 6:00PM | B28
Abstract
Introduction: Cochlear implants (CI) are a standard treatment for severe sensorineural hearing loss but fail to restore fully normal hearing. These shortcomings could arise from suboptimal stimulation strategies, degeneration in the auditory system, or limits on the brain’s ability to adapt to CI input. Computational models that predict auditory behavior from CI input may clarify how these different factors shape CI outcomes. Here, we developed artificial neural network models of hearing with CIs and investigated the effects of these different factors on speech recognition and sound localization performance. Methods: We modeled normal hearing by training a feedforward convolutional neural network to recognize speech or localize sounds in noise given simulated auditory nerve input from an intact cochlea. We modeled CI hearing by testing this trained network on simulated nerve input from CI stimulation. To simulate learning to hear through a CI, we re-optimized all or part of the network for CI input. To assess the potential consequences of neural degeneration, we silenced different fractions of simulated nerve fibers and network model units (simulating peripheral and central degeneration, respectively). Results: Models trained with CIs exhibited impaired speech recognition and sound localization relative to the normal hearing model. When the entire neural network was optimized for CI input, speech recognition was substantially better than that of typical CI users, even with substantial simulated peripheral and central degeneration. Performance on par with CI users was achieved when re-optimization was limited to the late stages of the network, consistent with plasticity constraints limiting speech recognition in CI users. By contrast, for the task of sound localization, large deficits remained even after reoptimizing the full model, suggesting device-related factors limiting performance. Results differed only modestly between stimulation strategies currently in use. The results point to central plasticity as limiting CI outcomes while also identifying limitations in existing stimulation strategies. Conclusion: Our results help clarify the roles of impoverished peripheral information and incomplete central plasticity in limiting CI users’ performance of realistic auditory tasks.
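A minimal sketch of the "limit re-optimization to the late stages" manipulation described in the Methods is shown below, using a generic PyTorch classifier as a stand-in; the architecture, layer split, channel counts, and class count are all illustrative assumptions rather than the authors' model.

```python
# Generic stand-in for "re-optimize only the late stages of the network for CI
# input"; architecture, channel counts, and class count are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(32, 64, kernel_size=9, padding=4), nn.ReLU(),   # "early" stages
    nn.Conv1d(64, 64, kernel_size=9, padding=4), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(64, 128), nn.ReLU(),                            # "late" stages
    nn.Linear(128, 40),                                       # hypothetical word classes
)

for p in model[:4].parameters():          # freeze the early stages
    p.requires_grad = False

# Only the still-trainable (late) parameters are re-optimized on simulated CI input.
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad),
                             lr=1e-4)
```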
Tomas Suarez Omedas and Ross Williamson
Fri, 10/4 4:15PM - 6:00PM | B29
Abstract
Discriminating relevant auditory signals from background noise (BN) poses a fundamental challenge to sensory systems. Primary auditory cortex (ACtx) plays a key role in disentangling auditory signals from BN by creating noise-invariant representations; however, the neural subpopulations and circuit computations that give rise to noise invariance remain elusive. We studied how the sensory responses of three excitatory neuronal subpopulations, layer (L)2/3 intratelencephalic (IT), L5 IT, and L5 extratelencephalic (ET), are involved in generating noise-invariant representations. We utilized two-photon calcium imaging in ACtx of awake, head-fixed mice to monitor each excitatory subpopulation during passive presentation of pure tones with and without a constant white BN. Using a combination of information-theoretic, manifold-geometric, and stimulus decoding analyses, we found that the presence of BN differentially modulated the responses of each ACtx subpopulation at both the single-neuron and population level. In the presence of BN, L2/3 responses were largely suppressed, and the encoding of tones was less accurate. In contrast, L5 IT and ET each had similar responses with and without BN and encoded tones with similar accuracy. To relate this representational characterization to the underlying functional connectivity, we designed a holographic optogenetics paradigm to excite groups of neurons that share similar modulation motifs and to quantify their influence on the rest of the neural population. We aim to test the hypothesis that groups of neurons sharing similar modulation motifs will influence the population in a manner similar to groups of neurons that share similar sensory tuning curves. Taken together, these experiments show how noise-invariant representations are constructed across the cortical column.
Zhili Qu, Jian Nocon, Kamal Sen and Howard Gritton
Fri, 10/4 4:15PM - 6:00PM | B30
Abstract
Complex auditory scenes are composed of dynamic stimuli that can originate from multiple spatial directions and compete for listener attention. Despite the known complexity of these auditory environments, the specific mechanisms by which the auditory cortex successfully suppresses competing streams remain largely unknown. Suppression of competing streams within auditory circuits is thought to depend on inhibitory networks, although the cell types and circuit connectivity that support spatial attention are not well resolved. To address this gap, we examined the role of somatostatin (SST) neurons in contributing to the neural discrimination of auditory streams in complex scenes. SST neurons are crucial for modulating pyramidal excitation and are important contributors to surround inhibition, thereby influencing the dynamics of cortical activity. To understand the role of SST neurons in complex scene analysis, we created a multi-speaker environment with overlapping and competing signals, allowing us to simulate a "cocktail party"-like environment. Utilizing electrophysiology, optogenetics, and classifiers that use spike-timing information to discriminate different target stimuli, we reveal that SST neurons contribute to neural sensitivity to the spatial configuration of competing sounds. These findings reveal a specialized role for SST neurons in enabling cortical circuits to discriminate auditory targets from background noise in complex auditory scenes.
Tiange Hou, Blake Sidleck, Danyall Saeed and Michele Insanally
Fri, 10/4 4:15PM - 6:00PM | B31
Abstract
The mammalian auditory system is remarkably adaptable, allowing animals to flexibly respond to sensory information in dynamic environments. Cortical responses from behaving animals are highly variable, ranging from cells that are strongly modulated by sensory stimuli (i.e., classically responsive) to those that appear to fire randomly (non-classically responsive). While classically responsive cells have been extensively studied for decades, the contribution of non-classically responsive cells to behavior has remained underexplored despite their prevalence. Flexible behaviors such as perceptual learning involve the engagement of various neural circuits, including top-down regions such as frontal areas. To determine how frontal and sensory cortical areas interact to enable flexible behavior, we used silicon probes to record single-unit responses simultaneously from auditory cortex (AC, n=1,365 cells) and a downstream region in frontal cortex, secondary motor cortex (M2, n=1,343 cells), while mice performed a go/no-go auditory reversal learning task (N=12 mice, d’=1.8±0.18). Mice were first trained to respond to a target tone for a water reward and to withhold from responding to a non-target tone (i.e., pre-reversal). Once animals reached expert criteria, a rule switch was introduced and the rewarded tone was reversed (i.e., post-reversal). Both chemogenetic and optogenetic silencing experiments indicated that both regions are required for reversal learning (chemogenetics, d’=0.29±0.46, N=9 animals; optogenetics, d’=-0.08±0.10, N=7 animals). Single-unit responses in both regions were highly diverse, spanning the range from classically to non-classically responsive. However, there were notable differences between AC and M2. While sensory-responsive AC neurons mainly exhibited onset responses, the vast majority of sensory-responsive neurons in M2 exhibited offset responses, whose latencies shortened as animals learned post-reversal. While evoked responses to the target tone were suppressed in AC during post-reversal, they were enhanced in M2. Using a previously published spike-timing-dependent Bayesian decoder, we found that the number of task-encoding cells increased during both pre- and post-reversal learning in both AC and M2. While task-encoding non-classically responsive cells were preferentially recruited in AC during periods of rapid learning, classically responsive neurons in M2 were recruited when animals achieved expert performance levels, suggesting a dissociable role for these neural response types. These results indicate that distinct neural dynamics in both regions drive flexible auditory behavior.
Fernando Nodal, Inesh Sood, Andrew King and Victoria Bajo
Fri, 10/4 4:15PM - 6:00PM | B32
Abstract
The descending projection from the auditory cortex to the inferior colliculus (IC) has been implicated in auditory spatial learning, the passive processing of speech-in-noise, innate defensive fear behavior, and processing of behaviorally-salient stimuli. However, stimulation of auditory cortical inputs has both excitatory and inhibitory effects on the activity of IC neurons and some neural properties, such as the spectrotemporal tuning properties of these neurons and their contrast gain control, seem to be independent of cortical inputs. How auditory cortex influences the response properties of IC neurons is therefore far from being understood. We optogenetically activated corticocollicular neurons in anesthetized ferrets and recorded IC activity using Neuropixels probes to explore the effect of these descending inputs on the frequency selectivity of IC neurons and their spatial sensitivity. Channelrhodopsin (ChR2) was expressed in corticocollicular neurons by injecting retrograde virus encoding Cre recombinase (Retro2 AAV.Cre.GFP) at multiple sites in the IC and a viral construct encoding ChR2 in reversed fashion under the FLEX cassette (AAV.Flex.ChR2.mCherry) at multiple locations in the primary auditory cortex (A1) 4-6 weeks before the recordings. Activation of ChR2 expressed in corticocollicular neurons was achieved by blue laser light illumination delivered by an optic fibre with an intensity of 10 mW at the tip placed in the center of the middle ectosylvian gyrus, where A1 is located. Activation of corticocollicular inputs increased the magnitude of the driven responses to pure tones of most neurons recorded in the central nucleus (CNIC) and the dorsal (DCIC) and external cortex (ECIC) of the IC, while decreasing their frequency selectivity in DCIC and ECIC but not in the CNIC. By contrast, other neurons (24%) showed the opposite effect. In addition, activation of the corticocollicular pathway shifted the frequency tuning of IC neurons, increasing the representation of the most sensitive mid-frequency region (around 10 kHz) in the ferret audiogram. The responses of IC neurons to broadband noise presented in virtual acoustic space also became stronger and more broadly tuned following optogenetic activation of corticocollicular inputs. Our results demonstrate pronounced effects of cortical activation on the spectral and spatial response properties of IC neurons, paving the way for future experiments in behaving animals to explore the role of this descending circuit in active listening.
Andrea Santi, Sharlen Moore, Jennifer Lawlor, Aaron Wang, Kelly Fogelson, Jordan Amato, Kali Burke, Amanda M Lauer and Kishore Kuchibhotla
Fri, 10/4 4:15PM - 6:00PM | B33
Abstract
Alzheimer’s disease (AD) is a progressive form of dementia in which memory loss is thought to arise from underlying neurodegeneration. An enduring question, however, is the extent to which memories are permanently lost or, instead, remain largely intact but become inaccessible. Here, we trained APP/PS1+ mice with significant amyloid accumulation to lick in response to one tone (S+) and withhold licking to another tone (S-). We exploited a recent finding that performance on non-reinforced (‘probe’) trials provides a more accurate measure of the strength of the associative memory, since performance on reinforced trials is impacted by non-associative, contextual factors. We found that behavioral performance in APP/PS1+ mice was impaired on reinforced trials. Surprisingly, however, this impairment was not due to memory loss, as APP/PS1+ mice performed as well as controls on probe trials. A biologically plausible reinforcement learning model showed that synaptic weights from sensory-to-decision neurons were preserved (i.e., intact sensorimotor memories) but that performance suffered on reinforced trials due to insufficient contextual scaling (i.e., impaired contextual expression). To test this possibility, we performed large-scale two-photon imaging of 6,216 excitatory neurons in 8 mice in the auditory cortex, the first site of amyloid deposition in APP/PS1+ mice. On reinforced trials, cortical networks in APP/PS1+ mice were more suppressed, less selective to the sensory cues, and exhibited aberrant behavioral encoding compared to control mice. A small sub-population of neurons, however, displayed the opposite phenotype, reflecting a potential compensatory mechanism. Volumetric analysis demonstrated that deficits were concentrated near Aβ plaques. Strikingly, these network deficits were reversed almost instantaneously on probe trials, providing neural evidence for intact sensorimotor memories. Our results suggest that amyloid deposition initially impacts contextual expression rather than degrading the underlying memory trace, providing a roadmap for using circuit-level interventions to reveal hidden memories in early AD.
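The distinction the model draws between an intact associative weight and impaired contextual expression can be illustrated with a minimal sketch like the one below; the sigmoid decision rule and all parameter values are illustrative assumptions, not the authors' reinforcement learning model.

```python
# Minimal illustration: same learned weights, different context-dependent gain.
import numpy as np

def p_lick(stimulus, w, context_gain, bias=0.0):
    """Lick probability: sigmoid of the context-scaled sensory-to-decision drive."""
    drive = context_gain * np.dot(w, stimulus) + bias
    return 1.0 / (1.0 + np.exp(-drive))

s_plus, s_minus = np.array([1.0, 0.0]), np.array([0.0, 1.0])
w_learned = np.array([4.0, -4.0])          # intact associative weights (S+ vs S-)

# Adequate contextual scaling: the memory is expressed in behavior.
print(p_lick(s_plus, w_learned, context_gain=1.0),   # ~0.98 lick to S+
      p_lick(s_minus, w_learned, context_gain=1.0))  # ~0.02 lick to S-

# Insufficient contextual scaling: identical weights, but expressed
# discrimination is degraded even though the underlying memory is intact.
print(p_lick(s_plus, w_learned, context_gain=0.2),   # ~0.69
      p_lick(s_minus, w_learned, context_gain=0.2))  # ~0.31
```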
Elizabeth Flad, Rachel Gatewood, Onaedo Adigwe, Irmak Gokcen and Yan Gai
Fri, 10/4 4:15PM - 6:00PM | B34
Abstract
The ability of the mammalian auditory system to locate sound sources relies on localization cues that vary with time, intensity, and frequency. Current generations of hearing aids do not provide users with comparable performance in localizing and segregating concurrent sources. We have previously developed a robotics algorithm that identifies multiple azimuthal sources in a rapid manner by using a 2D spiral model created at a single sound frequency (Orr et al., Hear Res, 2023). The present study implemented the algorithm in a real-time system, measured its localization errors, and compared the results to human performance obtained under the same conditions. A total of 36 loudspeakers were evenly spaced on a large metal ring, and a dummy head wearing two in-ear microphones was placed at the center to record the single or mixed sounds. Once the locations had been determined by our algorithm running on a computer, the dummy head, with a laser pointer fixed to its forehead, rotated to face the detected locations. In addition, 15 human subjects were recruited to replace the dummy head and perform the same tasks. In general, humans significantly outperformed (t-test) the robotics system based on a single-frequency model (such as 600 Hz), especially when two or more sounds coexisted. Combining the model across multiple frequencies using a simple machine-learning classifier significantly (t-test) improved the algorithm’s performance. Our study lays the foundation for next-generation hearing aids that can automatically localize and segregate spatial sounds.
Peili Chen, Shiji Xiang, Edward Chang and Yuanning Li
Fri, 10/4 4:15PM - 6:00PM | B35
Abstract
Recent computational cognitive neuroscience advances highlight the parallel between deep neural network (DNN) processing of speech/text and human brain activity. However, most studies have examined how single-modality DNN models (speech or text) correspond with activity in particular brain networks. The critical factors driving these correlations remain unknown, especially whether DNNs of different modalities share these factors. It is also not clear how these driving factors evolve in space and time in the brain's language network. To address these questions, we analyzed the representational similarity between self-supervised learning (SSL) models for speech (Wav2Vec2) and language (GPT-2) and neural responses to naturalistic speech captured via high-density electrocorticography from 16 participants. We developed both time-invariant sentence-level and time-dependent single-word-level neural encoding models. These models helped delineate the overall correspondence and fine-grained temporal dynamics in brain-DNN interactions. Our results indicate high prediction accuracy of both types of SSL models relative to neural activity before and after word onsets. We observed distinct spatiotemporal dynamics: both models showed high encoding accuracy 40 milliseconds before word onset, especially in the mid-superior temporal gyrus (mid-STG), with Wav2Vec2 also peaking 200 milliseconds after word onset. Applying clustering analysis to the time course of the word-level encoding scores of the SSL models, we found two distinct clusters that mainly correspond to mid-STG and posterior STG (pSTG). The mid-STG cluster contributed to the -40 ms peak, while the pSTG cluster contributed to the 200 ms peak. Using canonical correlation analysis, we discovered that shared components between Wav2Vec2 and GPT-2 explain a substantial portion of the SSL-brain similarity. Further decomposition of the DNN representations indicated that contextual information encoded in SSL models contributed more to the brain alignment in mid-STG and before word onsets (-40 ms), while acoustic-phonetic and static semantic information encoded in SSL models contributed more to the brain alignment in pSTG and after word onsets (200 ms). In summary, we demonstrate that both speech and language DNNs share neural correlates driven by context and acoustic-phonetic cues, aligning with distinct neural activity patterns over space and time. Our findings suggest that key aspects of neural coding in response to speech are captured by self-supervised DNNs of different modalities, reflecting a convergence of artificial and biological information processing systems.
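The shared-component analysis mentioned here is standard canonical correlation analysis between time-aligned model embeddings; the sketch below assumes feature extraction and word-level alignment have already been done, and uses random stand-in matrices with illustrative dimensions.

```python
# CCA between time-aligned word-level embeddings from a speech model and a
# language model; the random matrices below are stand-ins for real features.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_words = 500
speech_feats = rng.standard_normal((n_words, 128))    # stand-in Wav2Vec2 features
text_feats = rng.standard_normal((n_words, 128))      # stand-in GPT-2 features

cca = CCA(n_components=10)
speech_shared, text_shared = cca.fit_transform(speech_feats, text_feats)

# Correlation of each pair of canonical variates; the shared components can then
# be used as regressors in the neural encoding models and compared against the
# residual, model-specific parts of each representation.
canonical_corrs = [np.corrcoef(speech_shared[:, k], text_shared[:, k])[0, 1]
                   for k in range(10)]
```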
Baher Ibrahim and Daniel Llano
Fri, 10/4 4:15PM - 6:00PM | B36
Abstract
Despite the critical role of the thalamus in transmitting sensory information from lower sensory structures to higher cortical centers, how the thalamus modifies ascending information is still unclear. The diversity of thalamic interactions with sensory and non-sensory brain areas indicates that thalamic function is modulated by multiple brain regions. For instance, the GABAergic sheet of neurons called the thalamic reticular nucleus (TRN), which surrounds most of the dorsal part of the thalamus, has synaptic connections with thalamocortical and corticothalamic projections as well as with non-sensory parts of the forebrain. Most studies of the TRN have focused on the reciprocal thalamic-TRN connection through a closed-loop model, which explains the well-known oscillatory phenomena in the thalamus. However, the diverse connections between the TRN and other non-sensory brain regions could indicate the presence of a nonreciprocal, or open-loop, organization, which could provide a platform for modulating the thalamocortical transmission of sensory information by varying the relative timing between TRN stimulation and thalamic sensory stimulation. The colliculo-thalamo-cortical (CTC) slice, which preserves the synaptic connections between the inferior colliculus (IC), medial geniculate body (MGB), and auditory cortex (AC), was used to examine how AC activity changes with the relative timing between electrical stimulation of the IC and laser photostimulation of caged glutamate at the TRN. Initially, the connection between the TRN and MGB in the CTC slice was confirmed by mapping the inhibitory postsynaptic currents (IPSCs) recorded from MGB cells during laser photostimulation of caged glutamate across a grid spanning the two structures. The presence of thalamic IPSCs produced by stimulating the TRN (with shorter latency) or the MGB (with longer latency) indicated an intact thalamic-TRN connection in the CTC slice. AC activity was then monitored by imaging Ca signals from the CTC slice following IC electrical stimulation and laser photostimulation of caged glutamate at the TRN. While simultaneous stimulation of the IC and TRN resulted in suppression of AC activity, AC activity was enhanced when the TRN stimulation preceded the IC stimulation by 100 ms. These data indicate a possible role for the TRN in modulating the thalamocortical transmission of sensory information, which could be explained by the open-loop model of TRN connections.
Gibbeum Kim, Rafay Khan, Yihsin Tai, Somayeh Shahsavarani and Fatima T Husain
Fri, 10/4 4:15PM - 6:00PM | B37
Abstract
Tinnitus often coexists with hearing loss, but not all individuals with hearing loss experience tinnitus. While previous studies have highlighted structural alterations associated with hearing loss and/or tinnitus, the fundamental neural mechanisms underpinning tinnitus severity remain poorly understood. In this study, we conducted large-scale voxel-based morphometry (VBM) to investigate gray matter (GM) volume differences among participants with varying tinnitus severity and hearing status, compared to controls. T1-weighted structural MR images were collected from 100 participants, who were divided into four groups: normal hearing controls (CONNH, n=20), hearing loss controls (CONHL, n=18), normal hearing tinnitus sufferers (TINNH, n=17), and tinnitus sufferers with hearing loss (TINHL, n=45). The tinnitus group was further divided into two groups based on the severity of their tinnitus: bothersome tinnitus (BTIN, n=19) and mild tinnitus (MTIN, n=43). Two-way ANOVA results showed the main effects of tinnitus and hearing loss. We found GM volume reduction in the auditory cortices (planum polare) and insular cortices in CONHL compared to CONNH, regardless of the presence of tinnitus. Furthermore, tinnitus with normal hearing (TINNH) compared to controls with normal hearing (CONNH) showed reduced GM volume in the anterior and posterior insula. The auditory cortex and frontal gyrus exhibited decreased GM volume in the TINHL group compared to CONNH, suggesting a combined effect of tinnitus and hearing loss. While tinnitus severity did not show a significant overall effect, there was a significant positive correlation between tinnitus distress and GM volume in bilateral planum polare. Additionally, our findings suggested distinct patterns of GM volume reduction in several brain regions between MTIN and BTIN. This study contributes to the understanding of anatomical changes in the brain related to hearing loss and tinnitus by highlighting that the reduction in GM volume in several brain regions may show distinct patterns based on tinnitus severity. This advances the knowledge of tinnitus pathophysiology and can contribute to the development of treatments for tinnitus.
William Dai and Daniel Llano
Fri, 10/4 4:15PM - 6:00PM | B38
Abstract
Perineuronal nets (PNNs) have been shown to serve critical roles in neuroplasticity and neuroprotection. Despite their wide distribution across the mammalian central nervous system, PNNs are not well characterized within non-lemniscal portions of the inferior colliculus (IC). The IC is the primary integration hub of the central auditory pathway, receiving and sending multimodal information across various brain regions. The lateral cortex of the IC (LC) shows a unique level of organization in the form of modules, periodic clusters of neurons with distinct neurochemical and connectional properties. In particular, the modules stain strongly for parvalbumin, cytochrome oxidase, and other markers of metabolic activity, suggesting that they may be more susceptible to oxidative stress than the surrounding area, or matrix. The present study seeks to determine whether PNNs are found more commonly in modules than in matrix as a potential defense mechanism against increased oxidative stress, and whether PNNs are upregulated as an animal ages due to the accumulation of oxidative species. To answer these questions, transgenic mice expressing GFP in GAD67-expressing neurons were processed, and LC-containing sections were stained for parvalbumin and PNNs. Neuronal somata were quantified based on the combination of markers for which they stained positively and on their location within the module-matrix organization. Statistical analyses were conducted to identify significant differences in the proportion of cell types between modules and matrix, as well as significant trends with age. We found that for all GABAergic cell types, the modules contained a greater percentage of neurons that stained for other markers, including PNNs, than the matrix. The only exception was neurons that expressed GABA only, which made up a greater percentage of neurons in the matrix. Additionally, we found that the only cell type showing a significant trend with age was GABAergic, PNN-expressing neurons in the modules, whose proportion increased as the mice aged. Together, these data suggest that PNNs are upregulated in the modules of the LC because of the increased metabolic activity and risk of oxidative stress there, further elucidating the functional differences between modules and matrix, the role of PNNs in the auditory pathway, and their potential involvement in diseases with disturbances in oxidative stress.
Jasmine Alyssa Robinson, Mark Presker, Sarah Rajan, Madison Yip and Kasia Bieszczad
Fri, 10/4 4:15PM - 6:00PM | B39
Abstract
Enhanced behavioral reactivity to drug-associated cues is a primary feature of addiction that contributes to relapse vulnerability, a primary therapeutic target in addiction treatment. Substantial evidence supports a central role for mesolimbic system plasticity in addiction, but little is known about the involvement of sensory systems. Here we report neuroplasticity in sensory neural representations of drug-associated cues, offering a fresh perspective to explain the heightened behavioral response to drug-associated sensory stimuli. In recent years, the centrality of sensory systems in learning and memory has emerged; for example, experience-dependent plasticity in the auditory system occurs over a lifetime of salient experiences with sound. Early auditory processing is intertwined with brainstem neuromodulatory structures like the locus coeruleus (LC), a prime candidate circuit mechanism for adaptive processes such as focused attention, or for maladaptive disease states, like addiction. Altered neural processing of sensory stimuli linked to drug-taking may be fundamental to relapse risk. Here, we use the auditory brainstem response (ABR) in rats to investigate whether basic sensory processing can be affected by experience with sound cues paired with cocaine administration. We report a novel behavioral assay showing that neutral sound cues can be selectively conditioned with cocaine (auditory cocaine conditioning, AuCC: 6 daily 30-min conditioning sessions of tone exposure (e.g., 5 kHz, 70 dB SPL) paired with cocaine (20 mg/kg, i.p. injection)) to later control free exploratory or operant behavior. We also report sound-specific changes to auditory neural processing (using minimally invasive surface neurophysiology before vs. after AuCC) in early tone-evoked ABR timing and magnitude for sounds associated with cocaine (vs. tones associated with saline i.p. injection) that persist for at least weeks. Further, auditory-cued behavior and neural responses correlate with LC activity (measured using fos activity upon cocaine-cue vs. saline-cue exposure). Importantly, all behavioral and neural assessments report activity without cocaine on board, so they are interpreted to be driven by the cocaine-conditioned properties of the paired tone cue. Indeed, auditory cocaine conditioning leads to sound-specific neuroplasticity in the auditory system that may maladaptively drive attention and behavior towards drug-associated cues. Overall, the findings are consistent with an emerging hypothesis that drug-induced neural plasticity in sensory systems may underpin the altered reactivity to drug cues in cocaine addiction and in cue-induced relapse.
Ediz Sohoglu, James Webb and Connor Doyle
Fri, 10/4 4:15PM - 6:00PM | B40
Abstract
Prediction facilitates language comprehension, but how are predictions combined with sensory input during perception? Previous work suggests that cortical speech representations are best explained by prediction error computations rather than the alternative ‘sharpened signal’ account. The key signature of prediction error is an interaction between bottom-up signal quality and top-down predictions. When predictions are uninformative, increasing bottom-up signal quality results in enhanced neural representations. However, the opposite occurs when predictions are informative (suppressed neural representations with increasing signal quality). Here we explore a listening situation more naturalistic than in previous work, in which listeners heard sentences and predictions were obtained directly from the speech signal, i.e., from sentence context. In Experiment 1, listeners (N=30) heard degraded (16-channel noise-vocoded) sentences in which the context was strongly or weakly predictive of the last word, based on cloze probability. All sentences were semantically coherent. We also manipulated the signal quality of the final word (2/4/8 channels). Using Temporal Response Function (TRF) analysis of EEG responses to the final word, we measured cortical representations of speech acoustic features (spectral and temporal modulations). We observed a significant interaction between context predictiveness and signal quality (F-test, p = .04). However, follow-up tests showed that there was only a marginal effect of signal quality on TRF model accuracies within the weakly predictive condition. In follow-up Experiment 2, listeners (N=31) heard final words varying more strongly in signal quality (4/8/16 channels). We also included a control condition in which sentence context was unintelligible (1-channel noise-vocoded) and therefore completely uninformative about the final word. Here we observed a more robust interaction between context predictiveness and signal quality (F-test, p < .001). For the unintelligible context, increasing signal quality led to increased TRF model accuracies (F-test, p < .001), while for the strongly predictive context, increasing signal quality led to reduced model accuracies (F-test, p = .008). These findings are more consistent with the prediction error account and show that previous findings extend to more naturalistic listening situations. In ongoing work, we are extending these findings to further naturalistic materials (story listening).
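For readers unfamiliar with noise-vocoding, the sketch below implements a generic channel vocoder of the kind referred to above; the channel count, filter design, and envelope cutoff are illustrative choices (and assume a sampling rate above twice the highest band edge), not the study's exact stimulus parameters.

```python
# Generic N-channel noise vocoder; parameters are illustrative and assume
# fs > 2 * fmax.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def noise_vocode(x, fs, n_channels=16, fmin=100.0, fmax=8000.0, env_cutoff=30.0):
    edges = np.geomspace(fmin, fmax, n_channels + 1)     # log-spaced band edges
    noise = np.random.randn(len(x))
    out = np.zeros(len(x))
    b_env, a_env = butter(2, env_cutoff / (fs / 2), btype="low")
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(3, [lo / (fs / 2), hi / (fs / 2)], btype="bandpass")
        band = filtfilt(b, a, x)
        env = filtfilt(b_env, a_env, np.abs(hilbert(band)))   # smoothed band envelope
        carrier = filtfilt(b, a, noise)                       # band-limited noise
        out += np.clip(env, 0, None) * carrier
    return out / (np.abs(out).max() + 1e-12)                  # normalize peak level
```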
Adele Simon, Maria Chait and Jennifer Linden
Fri, 10/4 4:15PM - 6:00PM | B41
Abstract
Cortical processing underlying speech perception can be studied through the exploration of “speech tracking”, which relies on the idea that the slow temporal variations in a speech signal are represented in the theta-delta band of cortical recordings. Analysis of speech tracking involves estimating the linear relationship between auditory input and cortical response, allowing for the reconstruction of continuous speech from human EEG responses or the prediction of continuous EEG responses from speech. The representation of speech used as a regressor is typically the acoustic envelope. This representation does not take into account temporal non-linearities in auditory processing that may influence cortical drive. We postulated that human speech tracking might be improved with a simple non-linear model of auditory temporal processing originally developed to predict brain activity in the auditory thalamus of anesthetized mice [1]. An essential part of this model is normalization by recent sound level history, which transforms the acoustic envelope into an “adaptive gain” signal representing stimulus-dependent central auditory responsiveness. Here we asked whether modelling of the human auditory response to continuous speech can be improved by using the adaptive gain as a regressor rather than the acoustic envelope. As the adaptive gain calculation relies on the sound pressure level (SPL), a logarithmic representation of the envelope, the SPL representation was also included in the analysis in order to dissociate effects specific to adaptive gain and temporal normalization from those related to the use of the logarithmic scale. We analysed two datasets from different laboratories [2,3], both consisting of continuous EEG recordings from either British (n=18) or Danish (n=22) participants listening to audiobooks. Cortical speech-tracking performance was measured with both encoding and decoding models: the encoding model predicts EEG from the audio representation, and the decoding model reconstructs the audio from the cortical recordings. In both cases, modelling performance was assessed with either the reconstruction or the prediction accuracy, quantified by the Pearson’s correlation between the reconstructed audio or predicted EEG and the actual representation or recording. Subject-dependent models were trained and tested on segments of 30 s of continuous recording, using a leave-one-out approach. To assess whether different auditory representations yielded different modelling performance, Wilcoxon tests were used to compare mean reconstruction accuracy and prediction accuracy across participants. Results indicated that using the SPL representation of the speech signal significantly improved modelling performance compared to using the acoustic envelope, for both the decoding model (p
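The two alternative regressors discussed here can be illustrated as follows; the dB transform is standard, while the "adaptive gain" shown is only a simple normalization by recent level history that stands in for, and does not reproduce, the thalamic model of reference [1].

```python
# Envelope -> SPL (dB) regressor, and a simple "adaptive gain" that normalizes
# by an exponentially weighted history of recent level (an assumption standing
# in for the model of [1], not a reproduction of it).
import numpy as np

def envelope_to_spl(env, eps=1e-6, ref=1.0):
    """Logarithmic (dB-scale) representation of the acoustic envelope."""
    return 20.0 * np.log10(np.maximum(env, eps) / ref)

def adaptive_gain(spl, fs, tau=0.5):
    """SPL relative to an exponentially weighted running average of recent level."""
    alpha = 1.0 / (tau * fs)
    history = np.empty_like(spl)
    history[0] = spl[0]
    for n in range(1, len(spl)):
        history[n] = (1 - alpha) * history[n - 1] + alpha * spl[n]
    return spl - history    # subtraction in dB = divisive gain in linear units

# Either regressor (SPL or adaptive gain) can be fed into the same TRF-style
# encoding/decoding analysis in place of the raw acoustic envelope.
```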
James Bigelow, Yulang Wu, Ying Hu and Andrea Hasenstaub
Fri, 10/4 4:15PM - 6:00PM | B42
Abstract
Studies of auditory processing in hippocampus (HC) have mostly focused on sounds with learned behavioral meaning (e.g., tones predicting reward). Recent insights into the relationship between hearing and cognitive health have generated interest in clarifying the nature and extent of auditory processing in HC, including passive responses to sound without behavioral training. For instance, hearing loss (HL) can cause spatial memory impairment and HC abnormalities (e.g., dendritic simplification) in rodents. HL is also a risk factor for Alzheimer's disease (AD), which appears first in entorhinal cortex and HC. Finally, sensory stimulation protocols in which tone pips and/or light flashes are delivered at 40 Hz have recently attracted attention as therapeutic interventions for restoring AD-related deficiencies in HC gamma oscillations. In naïve AD mice, these tone pips evoked responses in some HC units, reduced amyloid-β plaques, and rescued spatial memory impairment. Thus, HC is affected by hearing manipulations and may respond to sounds without behavior training. Little else is known about the auditory properties of HC. How many HC neurons respond to sound? Which sound features do they prefer? Do they interact with other auditory pathway neurons? Answering these questions could clarify the relationship between hearing and cognition and provide insights relevant to hearing-based cognitive therapeutics. In the present study, we recorded single-unit activity in awake mice using Neuropixels probes spanning HC, auditory cortex (ACtx), and medial geniculate body (MGB). We presented diverse sounds including pure tones, broadband noise bursts, and dynamic moving ripples (DMRs). Roughly 1/25 HC units responded robustly to tones (FDR < 0.001), which had longer response latencies and narrower bandwidths than ACtx and MGB units. Responses were much more common for noise bursts (~1/10 units). Although we observed responses to DMR onsets in ~1/20 HC units, none showed consistent preference for spectrotemporal features in continuous DMRs using spike-triggered averaging techniques. Ensemble analysis during spontaneous recordings revealed that over 1/3 of all HC units showed synchronous activity patterns with AC and/or MGB units. Of these, most tone-responsive HC units were synchronized with both structures, whereas most others were synchronized with ACtx only. Our results suggest responses to passively presented sounds are not uncommon in HC (>1/10 units), that sound onsets (especially broadband noise) most effectively drive these units, and that they are more likely synchronized with auditory pathway structures including cortex and thalamus.
Madan Ghimire and Ross Williamson
Fri, 10/4 4:15PM - 6:00PM | B43
Abstract
Extratelencephalic (ET) neurons of the auditory cortex (ACtx) receive both ascending and intra/inter-cortical inputs, forming an intricate feedback circuit to higher-order subcortical targets. Located in both layer (L)5 and L6b of ACtx, they play important roles in learning-induced plasticity. L5 and L6b ET neurons are morphologically and physiologically distinct. Unlike L5 ET neurons, which have pyramidal-shaped cell bodies with a single prominent apical dendrite, L6b ET neurons have radially oriented somata with profusely branched dendrites extending over a millimeter, allowing them to integrate information across longer distances. In vitro studies have found that a significant fraction of L5 ET neurons are burst spiking while L6b ET neurons are largely regular spiking. The in vivo ramifications and functional significance of this ET diversity remain poorly understood. To address this, our current study focused on characterizing single-cell and population-level sound processing by both L5 and L6b ET neurons. Using an intersectional viral strategy, we expressed GCaMP8s in both L5 and L6b ET neurons and used two-photon microscopy to record calcium activity in response to pure tones and sinusoidally amplitude-modulated (sAM) noise. We then used hierarchical clustering to identify and characterize distinct neural response motifs in an unsupervised manner. Pure tones elicited a response in over 33% of the L5 ET neurons (24% enhanced, 9% suppressed), while only 23% of L6b ET neurons showed a response (10% enhanced, 13% suppressed). In contrast, sAM responses demonstrated greater diversity, with clusters representing distinct patterns of both enhanced and suppressed firing. Most L5 ET neurons exhibited excitatory responses (30% enhanced, 8% suppressed), while the dominant response motif in L6b ET neurons was suppression (19% enhanced, 25% suppressed). These results demonstrate a dichotomy in stimulus processing between L5 and L6b ET neurons, potentially leading to differential impacts on downstream targets that modulate auditory-guided behavior.
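The unsupervised motif analysis described here can be sketched with standard hierarchical clustering tools; the distance metric, linkage method, and cluster count below are illustrative assumptions rather than the authors' exact settings.

```python
# Hierarchical clustering of trial-averaged calcium response time courses into
# response "motifs"; metric, linkage, and cluster count are illustrative.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def cluster_response_motifs(responses, n_clusters=8):
    """responses: (n_neurons x n_timepoints) trial-averaged, z-scored traces."""
    responses = np.asarray(responses, dtype=float)
    # correlation distance groups neurons by response shape rather than amplitude
    dist = pdist(responses, metric="correlation")
    tree = linkage(dist, method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    motifs = np.vstack([responses[labels == k].mean(axis=0)
                        for k in np.unique(labels)])
    return labels, motifs
```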
Alex Clonan, Xiu Zhai, Ian Stevenson and Monty Escabi
Fri, 10/4 4:15PM - 6:00PM | B44
Abstract
Recognizing speech in noisy environments is a critical task for the human auditory system, where the unique spectrum and modulation content of speech and backgrounds interfere to influence perception. Natural backgrounds can be quite diverse, with high degrees of spectrotemporal variability arising from highly varied environmental acoustic generators. For speech, articulation imposes unique acoustic idiosyncrasies (e.g., fundamental frequency, intonation) that influence our vocal quality, pronunciation, and phonetic implementation. Here we assess how the spectrum and modulation statistics of natural sounds mask the recognition of spoken digits (0 to 9) and investigate speaker- and word-driven foreground effects. We enrolled participants in a psychoacoustic study in which digits were presented in various natural background sounds (e.g., water, construction noise, speaker babble; tested at SNRs from -18 to 0 dB) and their perturbed variants. We perturbed the backgrounds by either 1) phase randomizing (PR) or 2) spectrum equalizing (SE) them. PR retains the power spectrum but distorts the modulation statistics, while SE distorts the power spectrum and retains the modulation statistics. Even at a constant noise level (-9 dB SNR), the ability to recognize foreground digits was substantially helped or harmed by these background perturbations, depending on the original background sound. Yet each word had a unique interaction with the delivered background’s statistics, varying across spectro-temporal perturbations. This indicates that word-level acoustics interfere with the background statistics in ways unique to each digit. To identify the acoustic cues that underlie speech and noise interference, we next developed a physiologically inspired model of the auditory system to predict perceptual trends with a bottom-up representation of word acoustics. Sounds were decomposed through a cochlear filter bank (cochlear stage) and a subsequent set of spectrotemporal receptive fields that model modulation selectivity in the auditory midbrain (mid-level stage). Using logistic regression, we then estimated perceptual transfer functions that provide insight into the acoustic cues driving speech perception from the vantage point of different speakers and vocalized words. Preliminary results suggest that these word-specific transfer functions interfere with background sounds in predictable ways, explaining ~60% of the perceptual variance for single digit-in-noise identification.
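The regression step that yields the perceptual transfer functions can be sketched as a logistic model from mid-level features to trial accuracy; the feature dimensions, random stand-in data, and regularization below are illustrative assumptions, not the authors' fitted model.

```python
# Logistic "perceptual transfer function": map mid-level (modulation-domain)
# features of each digit-in-noise trial to the probability of correct report.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_trials, n_features = 2000, 200                   # trials x (freq x modulation) channels
X = rng.standard_normal((n_trials, n_features))    # stand-in mid-level features
y = rng.integers(0, 2, n_trials)                   # 1 = digit reported correctly

clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)

# Reshaped to the (frequency x modulation) grid, the fitted weights indicate
# which spectrotemporal cues help or hurt recognition for a given word/speaker.
transfer_function = clf.coef_.reshape(20, 10)      # e.g. 20 freq x 10 mod channels
```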
Andre Palacios Duran, Aaron Nidiffer and Edmund Lalor
Fri, 10/4 4:15PM - 6:00PM | B45
Abstract
Low-frequency cortical activity tracks the dynamics of natural speech. However, the field still lacks consensus regarding the precise physiological mechanisms of this tracking. One prominent model proposes that the quasi-rhythmic structure of speech entrains ongoing cortical oscillations, mediating the segmentation of sounds into phonetic-linguistic units (for example, phonemes or syllables). This falls in line with the widely accepted view that the brain extracts information from speech by passing neuronal representations of speech along hierarchically connected brain regions, each of which parses the speech at different timescales. However, the fact that cortical activity tracks the dynamics of speech is also compatible with an alternative model – one that assumes that such tracking reflects the summation of a series of transient evoked responses originating from neural networks tuned to various acoustic and linguistic features of speech. Here, we aim to compare these two potential explanatory mechanisms. In particular, we begin by recognizing the fact that regressing neurophysiological data against the amplitude envelope of speech produces a temporal response function (TRF) which can reliably predict responses to novel speech stimuli. Then we ask the question: can candidate oscillatory models of speech tracking explain the existence of speech-generated TRFs? We do this by driving two oscillatory models with speech stimuli, attempting to fit TRFs to the resulting simulated brain activity, and then assessing whether the resulting TRFs can predict held-out simulated responses. One of the models is biologically implausible but produces reasonable TRFs, while the other produces simulated neural activity with atypical characteristics. We then extend our approach by modeling EEG datasets recorded while healthy, neurotypical adults listened to naturalistic speech with modulated dynamic properties. Specifically, we model the EEG using the TRF framework (which is implicitly an evoked response model) and by fitting the above-mentioned oscillator models using nonlinear optimization. This allows us to compare the temporal dynamics of the data simulated by each model against the dynamics of the EEG. In addition, by determining oscillator model parameters via optimization to predict real EEG data, this work goes a step beyond prior studies using heuristically chosen parameters. This study establishes a framework for resolving an important debate in the field of speech – and indeed perceptual – neurophysiology.
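For readers unfamiliar with the TRF framework referenced above, the sketch below shows the core idea, regressing a response onto time-lagged copies of the stimulus envelope with ridge regularization. It is a toy stand-in (synthetic data, hypothetical parameters), not the authors' pipeline; published analyses typically use dedicated TRF toolboxes.

```python
# Toy TRF estimation: ridge regression of one response channel onto lagged
# copies of the speech envelope. Note that np.roll wraps around the edges,
# which is acceptable for a sketch but not for a real analysis.
import numpy as np

def fit_trf(envelope, response, fs, tmin=-0.1, tmax=0.4, alpha=100.0):
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    X = np.column_stack([np.roll(envelope, lag) for lag in lags])
    w = np.linalg.solve(X.T @ X + alpha * np.eye(len(lags)), X.T @ response)
    return lags / fs, w                               # lag times (s) and TRF weights

fs = 128
rng = np.random.default_rng(0)
envelope = rng.random(60 * fs)                        # 60 s of synthetic envelope
response = np.convolve(envelope, rng.standard_normal(20), mode="same")
lag_times, trf = fit_trf(envelope, response, fs)
```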
Shagun Ajmera, Ivan Abraham and Fatima Husain
Fri, 10/4 4:15PM - 6:00PM | B46
Abstract
To understand the cognitive networks of auditory disorders in humans, neuroimaging methods are crucial. Tinnitus is one of the most bothersome hearing disorders, in which patients suffer from a continuous, inescapable phantom sound percept. Yet the neuroscientific understanding of tinnitus is limited at present, partly due to a lack of sufficient neuroimaging data to account for the large variability inherent in tinnitus conditions. More often than not, data acquisition is impeded by low patient counts and the added cost of scanning ample controls. Thanks to the growing trend toward data sharing, however, data scarcity can be overcome with ‘harmonization’ methods if scans from independent sites and studies are combined systematically and successfully. We attempt to optimize data harmonization with a limited ‘n’ of tinnitus participants by merging our modest MRI dataset (n=87) with matched samples from another large dataset of controls, the Lifespan Human Connectome Project (HCP; n=725). We exploit whole-brain functional connectivity (FC) measured on resting-state brain activity and use artificial intelligence-based techniques to methodically pool participant FCs across the two datasets. Through multiple independent iterations, we train the model on several independent HCP samples along with our tinnitus and control samples to achieve a robust architecture for FC reconstruction and tinnitus classification. We assemble and train the pipeline using deep neural network modules to purge dataset-dependent factors from individual FCs while simultaneously preserving connectivity patterns that effectively distinguish tinnitus from controls. Preprocessed resting-state functional MRI scans were used to measure correlation-based FC between brain regions, for tinnitus and control participants in our dataset as well as age- and gender-matched controls in HCP. A dataset-identifier module was pre-trained using multivariate logistic regression to predict the data source from individual FC irrespective of tinnitus status. Next, a variational autoencoder was trained to encode and decode individual FC, with the loss function set to penalize the error in the estimated FC, along with the dataset identifier’s prediction on the estimated FC. Training was performed over 500 epochs and 20 repetitions on randomly drawn and matched HCP data, with class balance maintained with our data. Concurrently, the estimated FC matrices were input to a binary tinnitus classifier, a feedforward fully-connected neural network module, which was trained to predict class labels (tinnitus/control) based on individuals’ group identity. In addition to training and testing data, performance was tracked for the left-out tinnitus and control data, as well as for other individuals having hearing loss with or without tinnitus. The figure in the attached file illustrates the deep learning architecture used. The dataset identifier predicted the source data on FC with 99.4% accuracy for participants in our dataset, and 93.3% for HCP. Over the training period of the autoencoder, the estimated FC showed steadily decreasing reconstruction error in conjunction with low dataset identifiability. The dataset identifier’s predictions were closer to chance accuracy for estimated FC, in contrast to high accuracy for the actual FC. Lastly, the tinnitus classifier could predict labels with better accuracy on estimated FC than actual FC.
Although the classifier was trained only on tinnitus and control data, the predicted probabilities for controls’ estimated FC were significantly lower not only than those for tinnitus participants, but also than those for other non-control groups, such as hearing loss with and without tinnitus. Leveraging multiple databases can be a useful way around insufficient patient data in neuroimaging tinnitus studies - not only in relieving statistical demands, but also in introducing participant heterogeneity in the data. While combining independent datasets is not straightforward due to variability in a range of complex parameters across sites, deep learning-based methods are capable of eliminating irrelevant site-specific noise in the data and amplifying the neural markers of interest in tinnitus.
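The harmonization logic can be sketched compactly. The PyTorch toy below is a simplified stand-in for the pipeline described above: a plain (non-variational) autoencoder, a frozen dataset identifier whose confident predictions on reconstructed FC are penalized, and a tinnitus classifier; all module sizes and the exact loss weighting are assumptions for illustration only.

```python
# Simplified sketch of harmonization with an identifiability penalty.
import torch
import torch.nn as nn

n_fc = 4950                                        # e.g., upper triangle of a 100x100 FC matrix

encoder = nn.Sequential(nn.Linear(n_fc, 256), nn.ReLU(), nn.Linear(256, 64))
decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, n_fc))
site_identifier = nn.Linear(n_fc, 1)               # stand-in for the pre-trained dataset identifier
tinnitus_classifier = nn.Linear(n_fc, 1)           # tinnitus vs. control readout

mse = nn.MSELoss()
bce = nn.BCEWithLogitsLoss()

def harmonization_loss(fc, tinnitus_label, lam=1.0):
    fc_hat = decoder(encoder(fc))                  # reconstructed FC
    loss_recon = mse(fc_hat, fc)                   # penalize reconstruction error
    chance = torch.full((fc.shape[0], 1), 0.5)     # push the identifier toward chance
    loss_site = bce(site_identifier(fc_hat), chance)
    loss_tin = bce(tinnitus_classifier(fc_hat), tinnitus_label)
    return loss_recon + lam * loss_site + loss_tin

fc = torch.randn(16, n_fc)                         # placeholder FC vectors
labels = torch.randint(0, 2, (16, 1)).float()
loss = harmonization_loss(fc, labels)
```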
Nathan Vogler, Ruoyi Chen, Alister Virkler, Violet Tu, Jay Gottfried and Maria Geffen
Fri, 10/4 4:15PM - 6:00PM | B47
Abstract
In a real-world environment, the brain must integrate information from multiple sensory modalities, including the auditory and olfactory systems. However, little is known about the neuronal circuits governing how odors influence and modulate sound processing. Here, we investigated the mechanisms underlying auditory-olfactory integration using anatomical, electrophysiological, and optogenetic approaches, focusing on the auditory cortex as a key locus for cross-modal integration. First, retrograde and anterograde viral tracing strategies revealed a direct projection from the piriform cortex to the auditory cortex. Next, using in vivo electrophysiological recordings of neuronal activity in the auditory cortex of awake mice, we found that odor stimuli modulate auditory cortical responses to sound. Finally, we used in vivo optogenetic manipulations during electrophysiology to demonstrate that olfactory modulation in auditory cortex, specifically, odor-driven enhancement of sound responses, depends on direct input from the piriform cortex. Together, our results identify a novel cortical circuit shaping olfactory modulation in the auditory cortex, shedding new light on the neuronal mechanisms underlying auditory-olfactory integration.
Jade Toth, Blake Sidleck, Olivia Lombardi, Tiange Hou, Abraham Eldo, Madelyn Kerlin, Xiangjian Zeng, Danyall Saeeed, Luz Andrino, Tal Inbar, Michael Malina and Michele Insanally
Fri, 10/4 4:15PM - 6:00PM | B48
Abstract
Flexible responses to sensory cues in dynamic environments are essential for adaptive auditory-guided behaviors such as navigation and communication. How do neural circuits flexibly gate sensory information to select appropriate behavioral strategies based on sensory input and context? Auditory neural responses during behavior are diverse, ranging from highly reliable ‘classical’ responses (i.e., robust, frequency-tuned cells) to irregular or seemingly random ‘non-classically responsive’ firing patterns (i.e., nominally non-responsive cells) that fail to demonstrate significant trial-averaged responses to sensory inputs or other behavioral factors. While classically responsive cells have been studied for decades, the contribution of non-classically responsive cells to behavior remains underexplored despite their prevalence. Our previous work has shown that non-classically responsive cells in auditory cortex (AC) and secondary motor cortex (M2) contain significant stimulus and choice information and encode flexible task rules. Both classically and non-classically responsive units are essential for asymptotic task performance, yet their role during learning is unknown. Here, we explore how diverse cortical responses emerge and evolve during flexible behavior. We trained mice on a go/no-go auditory reversal learning task to respond to a target tone (11.2 kHz) by licking for a water reward, and to withhold their response to a single non-target tone (5.6 kHz), referred to as the ‘pre-reversal’ stage. Once mice reached behavioral criteria (percent correct ≥ 70% and d’ ≥ 1.5), a rule switch was implemented by ceasing to reward the 11.2 kHz tone and rewarding the previously non-rewarded tone (5.6 kHz), referred to as the ‘post-reversal’ stage. This paradigm required animals to relearn the task with different reward contingencies while ensuring they had learned the core task structure. Mice learned this task within a few weeks, performing at high d’ values (d’ = 2.5±0.1, N = 15 mice). We identified three key learning phases: ‘early learning’, when animals performed near chance (< 30% progress towards max d’ for that animal); ‘late learning’, when behavioral performance rapidly improves (≥ 30% progress towards max d’); and ‘expert’ performance (d’ ≥ 1.5 and percent correct ≥ 70%). Bilateral chemogenetic inactivation of auditory cortex during post-reversal significantly impaired behavioral performance, demonstrating that reversal learning depends on auditory cortex. We used high-channel-count silicon probes to record from single units in AC (n=1,327 cells) during all learning phases, including pre- and post-reversal. Cortical response profiles during learning were heterogeneous, spanning the continuum from classically to non-classically responsive. We found that the proportion of non-classically responsive neurons significantly increased during late learning, when the greatest gains in behavioral performance occur. Surprisingly, single-cell Bayesian decoding performance was highest for classically responsive cells during the early and expert learning phases but significantly decreased during late learning. To examine how task information is represented at the population level, we decoded from ensembles of simultaneously recorded cells that included varying fractions of classically and non-classically responsive cells.
We found that during late learning, mixed ensembles composed of both classically and non-classically responsive cells encode significantly more task information than homogeneous ensembles and emerge as a functional unit critical for learning. Emerging evidence suggests that AC and a downstream area, M2, form a central auditory axis critical for auditory perception. M2 is directly connected to AC and suppresses sensory responses in AC during locomotion. Strikingly, we found that optogenetically silencing inputs from M2 selectively modulated non-classically responsive cells in AC, prevented mixed ensemble recruitment, and impaired reversal learning (n = 570 cells, N = 7 animals). Our findings demonstrate that top-down inputs recruit non-classically responsive neurons into diverse ensembles in auditory cortex to enable behavioral flexibility.
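As a small worked example of the d’ criterion used above to define learning phases, the snippet below computes d’ from hit and false-alarm rates; the rates are illustrative, not values from the study.

```python
# d-prime from hit and false-alarm rates, with clipping so that perfect
# sessions do not produce infinite z-scores.
from scipy.stats import norm

def d_prime(hit_rate, fa_rate, eps=1e-3):
    hit_rate = min(max(hit_rate, eps), 1 - eps)
    fa_rate = min(max(fa_rate, eps), 1 - eps)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

print(d_prime(0.85, 0.20))   # ~1.88, above the d' >= 1.5 expert criterion
```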
Naho Konoike, Miki Miwa, Kosuke Itoh and Katsuki Nakamura
Fri, 10/4 4:15PM - 6:00PM | B49
Abstract
Vocal communication plays a crucial role in primate social interactions, particularly in survival and social organization. Humans have developed a unique and advanced vocal communication strategy in the form of language. To study the evolution of human language, it is necessary to investigate the neural mechanisms underlying vocal processing in humans and to understand how brain mechanisms have evolved by comparing them to those in non-human primates. Here, we examined brain activity in frontal and temporal areas of marmoset monkeys while the animals listened to conspecific calls. We used 14 adult common marmosets (6 males and 8 females, 2-12 years old, 330-490 g) for electrocorticogram (ECoG) or electroencephalogram (EEG) recordings. All experiments were approved by the Animal Care and Use Committee of Kyoto University. The animals were presented with three types of species-specific vocalizations (Phee, Tsik, and Tsik-String calls) and white noise lasting approximately one second during the recording. In four marmosets, a 16-channel sheet electrode array was implanted in the auditory cortex or prefrontal cortex. In the remaining 9 animals, electrodes were placed non-invasively on the scalp according to the international 10-20 system. We found that event-related spectral power increased at lower frequencies of approximately 10-30 Hz immediately after stimulus onset. These responses were observed in EEG and ECoG data and were prominent in older marmosets over 8 years of age. In contrast, power in the gamma range (above 30 Hz) decreased after stimulus onset compared to the baseline period and lasted for one second. This decrease in event-related power was ambiguous in the older marmosets. We found that in the middle frontal area (Fz), initial transient responses to call and noise stimuli varied with age, and sustained responses were suppressed in younger animals. These changes may reflect the functional maturation of Fz. When decoding event-related potentials (ERPs) by multivariate pattern classification, the decoding accuracy of the four auditory stimuli was transiently high at 200 ms after stimulus onset, indicating that ERPs represent noise and call information. This study provides novel insights by using scalp EEG and ECoG to capture widespread neural representations in marmosets during vocal perception, filling gaps in existing knowledge.
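The multivariate pattern classification mentioned above can be sketched as a time-resolved decoder; this is a generic illustration with synthetic data and placeholder shapes, not the authors' analysis code.

```python
# Time-resolved decoding of four stimulus classes (e.g., Phee, Tsik,
# Tsik-String, noise) from ERP amplitudes across channels, time point by
# time point. Chance accuracy is 0.25.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
erps = rng.standard_normal((200, 16, 100))     # trials x channels x time (placeholder)
labels = rng.integers(0, 4, size=200)

accuracy = []
for t in range(erps.shape[-1]):
    clf = LogisticRegression(max_iter=1000)
    accuracy.append(cross_val_score(clf, erps[:, :, t], labels, cv=5).mean())
```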
Ian Griffith, R. Preston Hess and Josh McDermott
Fri, 10/4 4:15PM - 6:00PM | B50
Abstract
Introduction. Attention enables communication in settings with multiple talkers, allowing us to select sources of interest based on their features. Decades of research have left two gaps in our understanding of feature-based attention. First, humans succeed at attentional selection in some conditions but fail in others, for reasons that remain unclear. Second, neurophysiology experiments implicate multiplicative gains in selective attention, but it remains unclear whether such gains are sufficient to account for real-world attention-driven behavior. To address these gaps, we optimized an artificial neural network with stimulus-computable feature-based gains for the task of recognizing a cued talker’s speech, using binaural audio input (a “cocktail party” setting).
Results. Feature-based attention task. Humans and models reported the middle word in a speech excerpt spoken by a cued talker within a mixture of talkers. The cued talker was indicated by a different excerpt of the cued talker’s voice, presented prior to the mixture. Both cue and mixture were presented as binaural audio.
Model training. Training examples were constructed by sampling from a set of 48,000 voices, with mixtures composed of a randomly selected target talker excerpt superimposed with between 1 and 5 competing voices or natural sounds, at a randomly sampled signal-to-noise ratio (-10 to 10 dB). Audio was spatially rendered at locations within simulated reverberant rooms using human head-related transfer functions.
Feature-based attention model. We supplemented a convolutional neural network (CNN) model of the auditory system with feature gains derived from the representation of the “cue” sound. The representation of the cue was derived from the same CNN by averaging the activations over time, yielding a memory representation of the cued source’s features. These average cue activations were the input to sigmoidal gain functions. Intuitively, the gains should be high for features present in the cue, passing the target talker’s features through the auditory system while attenuating other features. The sigmoid parameters were optimized along with the CNN parameters to maximize correct reports of the cued source.
Models attend on par with humans. We measured task performance in 81 human participants for several types of distractors, and then simulated the same experiment on the model. Stimuli used in this experiment were new to both participants and the model. The model approximately replicated both the overall performance of human listeners and the dependence on SNR and distractor type. The model also exhibited failures of attention (reports of words uttered by distractor talkers) at a similar rate to humans, and closely matched the characteristic deficits produced in humans when listening to inharmonic and whispered speech.
Models learn spatial tuning in azimuth. Humans benefit from spatial separation between sources (“spatial release from masking”). To test if the models similarly exploited spatial separation, we evaluated performance as a function of target-distractor separation in azimuth for spatially rendered sound sources. Target word recognition increased and confusion rates decreased as a function of spatial separation for the model, as in humans. To assess whether the model had merely learned to select the ear with the higher SNR, we measured recognition thresholds (the SNR granting 50% of ceiling performance) with distractors placed symmetrically in azimuth (to eliminate “better-ear listening”). The model displayed thresholds that varied with spatial separation on par with humans.
Models learn late selection. Human auditory attention tends to enhance the neural representation of a target source at late stages of the auditory system. Attentional selection in the model could in principle occur at any model stage. To assess the locus of model attentional selection, at each CNN stage we measured correlations between the activations of target-distractor mixtures and of either the target or distractor alone. Differences between target-mixture and distractor-mixture correlations emerged only at later model stages, consistent with human neuroscience evidence for late selection.
Baseline models. To investigate the importance of the architectural constraint imposed by the gain functions, we trained a version of the model without this constraint, instead taking the mixture and cue as separate input channels. This Baseline CNN provided a worse match to human behavior, reporting words from “distractor” talkers more frequently, and explaining less variance in human performance than the feature-based attention model. The Baseline model also had worse absolute performance in spatialized configurations, with higher thresholds than humans.
Conclusions. Despite not being fit to match humans, a model equipped with multiplicative feature gains and then optimized for word recognition replicated human performance across a wide range of real-world conditions, showing signs of selection based both on the voice’s timbre and spatial location. The results suggest that human-like attentional strategies emerge as an optimized solution to the cocktail party problem, providing a normative explanation for the limits of human performance in this domain. The model also provides hypotheses for how attention might be expected to modulate neural responses at different stages of the auditory system.
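To illustrate the gain mechanism outlined under "Feature-based attention model", here is a toy PyTorch sketch in which time-averaged cue activations drive learned sigmoidal gains that multiply the mixture's activations at one stage. The module and its sizes are invented for illustration and do not reproduce the authors' architecture.

```python
# Toy feature-gain block: gains computed from the cue gate the mixture.
import torch
import torch.nn as nn

class FeatureGainBlock(nn.Module):
    def __init__(self, n_channels):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(n_channels))   # sigmoid slope per channel
        self.bias = nn.Parameter(torch.zeros(n_channels))   # sigmoid offset per channel

    def forward(self, mixture_act, cue_act):
        cue_memory = cue_act.mean(dim=-1)                    # time-averaged cue activations
        gains = torch.sigmoid(self.scale * cue_memory + self.bias)
        return mixture_act * gains.unsqueeze(-1)             # attenuate non-cue features

block = FeatureGainBlock(n_channels=64)
mixture = torch.randn(8, 64, 200)                            # batch x channels x time
cue = torch.randn(8, 64, 200)
gated = block(mixture, cue)
```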
Namitha Jain, Shagun Ajmera, Gibbeum Kim, Howard Berenbaum and Fatima Husain
Fri, 10/4 4:15PM - 6:00PM | B51
Abstract
Hyperacusis and misophonia are sound tolerance disorders in which everyday sounds, usually innocuous to most people, become extremely bothersome. These disorders are distinguished by the specific types of sounds that trigger discomfort. In hyperacusis, a wide range of sounds become intolerable at certain loudness levels. Conversely, misophonia involves a reduced tolerance to very specific sounds, also known as trigger sounds, such as chewing, sniffing, and slurping. Despite these differences, both disorders share overlapping symptoms, such as decreased sound tolerance and emotional and physical reactions to sounds. Moreover, hyperactivity and hyperconnectivity in similar neural regions, including the auditory-limbic and attentional networks, are implicated in both disorders. However, these neurophysiological studies typically examine hyperacusis or misophonia separately, neglecting their overlapping symptomatology. Recent findings suggest that these disorders frequently co-occur (Brennan et al., 2024), making it unclear whether the implicated neural regions are specific to hyperacusis or misophonia. Our study aimed to explore the neural correlates of hyperacusis, misophonia, and their comorbidity using a task-based functional magnetic resonance imaging (fMRI) paradigm. We collected task-based fMRI data from 92 participants aged 18 to 25 years, categorized into four groups: hyperacusis (H: N=19), misophonia (M: N=29), comorbid hyperacusis and misophonia (MH: N=19), and controls (C: N=25). Participants were classified based on hyperacusis questionnaire scores and structured clinical interviews. During the fMRI experiment, participants were exposed to 90 emotionally valent sounds while inside the MR scanner. These sounds were selected from the International Affective Digitized Sounds-2 database, with 30 sounds from each emotional valence category: unpleasant, neutral, and pleasant. The sounds were presented in a pseudorandomized order, and each sound stimulus lasted 6 seconds. Participants were instructed to perceptually categorize each sound as unpleasant, neutral, or pleasant as soon as they felt confident in their rating. Only individuals with normal hearing thresholds (
Madhumitha Manjunath, Sahil Luthra, Wusheng Liang, Barbara Shinn-Cunningham and Abigail Noyce
Fri, 10/4 4:15PM - 6:00PM | B52
Abstract
Lateral prefrontal cortex (PFC) is recruited in a variety of cognitive functions, leading to its characterization as a key player in a “multiple demand” (MD) network (Fedorenko et al., 2013, PNAS). However, recent evidence suggests that lateral PFC contains discrete, interdigitated structures with significant preferences for auditory versus visual cognition (Michalka et al., 2015, Neuron; Noyce et al., 2017, Journal of Neuroscience; Noyce et al., 2022, Cerebral Cortex). The transverse gyrus intersecting precentral sulcus (tgPCS) and caudal inferior frontal sulcus/gyrus (cIFS/G) are active during auditory tasks and are preferentially connected to the auditory cortex. Their visual-biased counterparts (superior and inferior precentral sulcus, sPCS and iPCS) are active during visual tasks and are preferentially connected to the visual cortex. In vision, spatiotopic organization is dominant, and spatial maps are observed in visual-biased regions of PFC (Michalka et al., 2015, Neuron). However, in audition, tonotopic organization dominates in the early auditory cortex (e.g., Dick et al., 2017, Journal of Neuroscience). Here, we use functional magnetic resonance imaging (fMRI) to evaluate whether tonotopic organization occurs in auditory-biased regions of PFC. MRI data were collected on a Siemens MAGNETOM Prisma 3T MRI scanner at the CMU-Pitt BRIDGE Center. Preprocessing was done using fMRIPrep pipeline (Esteban et al., 2019, Nature Methods); General Linear model (GLM) analyses were conducted in Python using the nilearn package; phase-encoded analyses were implemented in csurf. We adapted the audio-visual (AV) localizer from Noyce et al. (2017) to identify auditory-biased regions of PFC. We directly contrasted auditory working memory (2-back on cat and dog vocalizations) against visual working memory (2-back on male and female face photographs), and used the resulting maps to functionally define bilateral tgPCS and cIFS/G regions of interest for each subject. In our experience, mapping PFC requires cognitively demanding tasks. We thus developed a task in which 2-back working memory for short 4-note melodies is embedded in phase-encoded tonotopy. In four tonotopy runs, participants hear a series of four-tone melodies and are asked to detect 2-back repetitions; over the course of a scanning run, tones sweep upward or downward in frequency, with tone frequencies ranging from 175 Hz to 5286 Hz. We employ a phase-encoded tonotopy design, such that particular frequency ranges occur at regularly spaced intervals during the run; in this way, a Fourier-based analysis can recover the phase lag (and therefore frequency band) that elicits the maximal response across cortex. Future planned analyses include measurement of language-specific recruitment across the auditory-biased PFC, as well as investigation into a possible role of functional connectivity among PFC and temporal lobe regions. Previous studies have identified auditory and visually-biased subregions of lateral PFC using eight runs of the AV localizer task; here, we demonstrate that these regions can be reliably recovered using only four runs. Participants exhibited high level behavioral performance across all tasks, despite the high working memory demands. The 2-back tonotopy task robustly recruits auditory-biased and visual-biased regions of PFC, and yields strong tonotopic organization in the auditory cortex. However, preliminary analyses (n=6) show weak to no tonotopic organization across auditory-biased PFC regions. 
This suggests that the cognitive role these regions play in hearing may differ substantially from the role visual-biased PFC regions play in vision, consistent with other asymmetries between sensory modalities that have previously been observed in PFC.
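The Fourier-based phase-encoded analysis referred to above can be sketched as follows, assuming a hypothetical voxels × timepoints BOLD array; the sweep parameters and data are placeholders.

```python
# Phase-encoded analysis: for each voxel, read out the amplitude and phase of
# the Fourier component at the sweep repetition frequency; the phase lag maps
# onto the preferred frequency band.
import numpy as np

n_vols = 300                    # volumes per run (placeholder)
n_cycles = 10                   # frequency sweeps per run (placeholder)

rng = np.random.default_rng(0)
bold = rng.standard_normal((5000, n_vols))          # voxels x timepoints

spectrum = np.fft.rfft(bold, axis=1)
amp = np.abs(spectrum[:, n_cycles])                 # bin k = k cycles per run
phase = np.angle(spectrum[:, n_cycles])             # phase lag -> preferred frequency
coherence = amp / np.sqrt((np.abs(spectrum[:, 1:]) ** 2).sum(axis=1))
```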
Zhengjie Yang and Livia de Hoz
Fri, 10/4 4:15PM - 6:00PM | B53
Abstract
To make sense of the acoustic environment, the auditory system must segregate, based on their history, streams of sounds arriving at the ear simultaneously. This requires sensitivity to both the probability and the predictability of the different sounds in the stream. That the auditory system is sensitive to the probability of appearance of a sound is well established. Here we presented complex sound protocols with varying predictability to anesthetized mice. Neuronal responses in the mouse inferior colliculus (IC) were recorded using Cambridge NeuroTech and Neuropixels probes. Neurons exhibited suppression that was specific to unpredictable sounds and could not be explained by tuning, probability of appearance, or adaptation triggered by the immediately preceding sounds. Notably, the magnitude of the suppression depended on the tuning of the neurons relative to the predictable sounds. Furthermore, the effect was insensitive to temporal expansions of the sound sequence resulting from increasing the inter-tone interval up to 4 times. Therefore, in complex auditory environments, neurons reflect unsupervised learning of the relative predictability of various sounds, in a process that might be relevant for stream segregation.
Emily Han, Yukai Xu, Zheng Pan and Joji Tsunada
Fri, 10/4 4:15PM - 6:00PM | B54
Abstract
Exploring the adaptation of the brain to the statistical properties of natural sensory inputs is crucial for understanding the evolutionary shaping of sensory systems and for the development of bio-inspired algorithms in artificial intelligence. In group-living primates, species-specific vocalizations play a pivotal role in communication, and the auditory cortex may have adapted to process these sounds efficiently. While previous research has demonstrated that the marmoset monkey auditory cortex exhibits both temporally synchronized responses (temporal coding) and non-synchronized rate-coding responses to time-varying auditory stimuli, the optimization of these coding strategies for processing the statistical characteristics of monkey vocalizations has yet to be elucidated. Our study first involved a detailed analysis of the structure and acoustic properties of marmoset vocalizations recorded in a colony setting. We focused on three primary types of social calls—Phee, Trillphee, and Trill—and measured key parameters such as fundamental frequency (f0), f0 contour, modulation frequency and depth, and frequency modulation (FM) transition time. Utilizing these metrics, we synthesized vocalizations that either conformed to or deviated from the natural parameter range. Subsequently, we investigated how these synthesized sounds are represented in the auditory cortex. Our preliminary analysis revealed that the synchronized responses of neurons to FM-rich trill calls depended upon the modulation frequency and depth, as well as the f0 contour, with no clear preference for acoustics in the natural range. Consistent with the opponent model, rate-coding neurons typically exhibited monotonically changing firing rates depending upon modulation frequency. However, a subset of neurons showed specificity for acoustic features falling within the natural parameter range. These findings suggest that the marmoset auditory cortex employs specific adaptations to process auditory features in their species-specific vocalizations.
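As an illustration of the call parameterization described above, the toy synthesis below generates a trill-like tone with a given fundamental frequency, modulation frequency, and modulation depth; the numerical values are invented and are not the study's measured ranges.

```python
# Toy frequency-modulated call: instantaneous frequency
# f(t) = f0 + fm_depth * sin(2*pi*fm_rate*t).
import numpy as np

fs = 48000
t = np.arange(int(0.5 * fs)) / fs        # 0.5 s

f0 = 7000.0                              # fundamental frequency (Hz), placeholder
fm_rate = 30.0                           # modulation frequency (Hz), placeholder
fm_depth = 500.0                         # modulation depth (Hz), placeholder

phase = 2 * np.pi * (f0 * t - (fm_depth / fm_rate) * np.cos(2 * np.pi * fm_rate * t))
call = np.sin(phase)
```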
Kevin Sitek, Noirrit Chandra, Bharath Chandrasekaran and Abhra Sarkar
Fri, 10/4 4:15PM - 6:00PM | B55
Abstract
The auditory system comprises multiple subcortical brain structures that process and refine incoming acoustic signals along the primary auditory pathway. Due to technical limitations of imaging small structures deep inside the brain, most of our knowledge of the subcortical auditory system is based on research in animal models using invasive methodologies. Advances in ultra-high field functional magnetic resonance imaging (fMRI) acquisition have enabled novel non-invasive investigations of the human auditory subcortex, including fundamental features of auditory representation. However, functional connectivity across subcortical networks is still underexplored in humans, with ongoing development of related methods. Traditionally, functional connectivity is estimated from fMRI data with full correlation matrices. However, partial correlations reveal the relationship between two regions after removing the effects of all other regions, reflecting more direct connectivity. Partial correlation analysis is particularly promising in the ascending auditory system, where sensory information is passed in an obligatory manner, from nucleus to nucleus up the primary auditory pathway, providing redundant but also increasingly abstract representations of auditory stimuli. While most existing methods for learning conditional dependency structures based on partial correlations assume independent and identically distributed Gaussian data, fMRI data exhibit significant deviations from Gaussianity as well as high temporal autocorrelation. Here, we developed an autoregressive matrix-Gaussian copula graphical model to estimate the partial correlations and infer the functional connectivity patterns within the auditory system while appropriately accounting for autocorrelations between successive fMRI scans. Our results show strong positive partial correlations between successive structures in the primary auditory pathway on each side (left and right), including between auditory midbrain and thalamus, and between primary and associative auditory cortex. These results are highly stable when splitting the data in halves and computing partial correlations separately for each half of the data, as well as across cross-validation folds. In contrast, full correlation-based analysis identified a rich network of interconnectivity that was not specific to adjacent nodes along the pathway. Overall, our results demonstrate that unique functional connectivity patterns along the auditory pathway are recoverable using novel connectivity approaches and that our connectivity methods are reliable across multiple acquisitions.
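The contrast between full and partial correlation that motivates the model above can be illustrated with a toy example: partial correlations follow from the inverse covariance (precision) matrix. This sketch deliberately ignores the autocorrelation and copula machinery of the full model, and the data are synthetic.

```python
# Full vs. partial correlation for a small set of regions.
import numpy as np

rng = np.random.default_rng(0)
ts = rng.standard_normal((500, 8))                 # timepoints x ROIs (placeholder)

full_corr = np.corrcoef(ts, rowvar=False)

precision = np.linalg.inv(np.cov(ts, rowvar=False))
d = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(d, d)         # standard precision-to-partial formula
np.fill_diagonal(partial_corr, 1.0)
```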
Maansi Desai, Alyssa Field, Gabrielle Foox, Nancy Nussbaum, Rosario DeLeon, Andrew Watrous, William Schraegle, Dave Clarke, Elizabeth Tyler-Kabara, Howard Weiner, Anne Anderson and Liberty Hamilton
Fri, 10/4 4:15PM - 6:00PM | B56
Abstract
Intracranial recordings have provided valuable insights into investigating the neural circuitry of speech perception in adults. Similar research in pediatric populations is rare due to the difficulty of recordings and, in many cases, the inability of younger patient participants to tolerate monotonous experimental sessions. We addressed this gap by determining whether movie trailer stimuli could be used to replace more typical, less engaging sentence stimuli to derive auditory receptive fields in children undergoing invasive surgical monitoring for epilepsy. We additionally incorporate age and neuropsychological measures in an effort to identify speech and language responses in the brain across the lifespan. We recorded stereoelectroencephalography (sEEG) from 30 patients (age 4-21, 17M/13F) at Dell Children’s Medical Center in Austin, Texas and Texas Children’s Hospital in Houston, Texas. Electrode coverage included right hemisphere or bilateral coverage of auditory and language-related areas. All patients listened to and watched audiovisual children’s movie trailer stimuli. A subset of patients also listened to sentences from the TIMIT acoustic-phonetic database. We fit linear encoding models to describe the relationship between acoustic and linguistic stimulus features and the high gamma power of the local field potential (70-150 Hz). Predicting neural activity from phonological features and the spectrogram demonstrated robust model performance in bilateral primary and secondary auditory regions such as Heschl’s gyrus, superior temporal gyrus and middle temporal gyrus (ravg_phn=0.12, rmax_phn=0.68, ravg_spec=0.11, rmax_spec=0.53). To determine whether phonetic and spectral selectivity changes with development, we categorized our patient population into young childhood (age 4-5), middle childhood (6-11), early adolescence (12-17), and late adolescence (18-21). We found that phonetic selectivity was more robust in the early and late adolescent groups compared to younger ages, whereas spectral tuning emerged earlier in development. In addition, the latency of speech responses decreased with age. These differences appear to be age-related rather than related to speech and language ability, as we observed no correlation of pre-operative neuropsychological measures of receptive language or attention with phonetic and spectral selectivity. Overall, our use of an engaging audiovisual stimulus allowed us to derive acoustic and phonetic selectivity using a task that is appropriate for a wide age range.
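For context on the response measure used above, the sketch below extracts a high gamma (70-150 Hz) analytic amplitude from a single channel; the filter order, data, and sampling rate are placeholders rather than the authors' preprocessing choices.

```python
# High gamma envelope via band-pass filtering and the Hilbert transform.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def high_gamma_envelope(signal, fs, band=(70.0, 150.0), order=4):
    b, a = butter(order, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    return np.abs(hilbert(filtfilt(b, a, signal)))

fs = 1000
lfp = np.random.default_rng(0).standard_normal(10 * fs)   # 10 s placeholder trace
hg = high_gamma_envelope(lfp, fs)
```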
Shreya Nandi, Andrea Bae, Roland Ferger and José Luis Peña
Fri, 10/4 4:15PM - 6:00PM | B57
Abstract
Sound localization is inherent to the survival of many species. Barn owls (Tyto furcata) are sound localization specialists and utilize the binaural cues of interaural time difference (ITD) and interaural level difference (ILD) to infer horizontal and vertical locations in space, respectively. These cues construct a topographic map of space in the midbrain, but the readout of this map can be corrupted by a dearth of frequency information. Specifically, pure tone and narrowband frequencies originating from one location in space are perceived as originating from multiple locations, a result of the phase-locking and frequency-specific nature of ITD-inferring neurons. These additional perceived locations of a pure tone or narrowband sound are known as phantom sound sources. Behavioral studies have demonstrated that owls will turn their heads towards both true and phantom sources when the sound’s bandwidth is less than 3 kHz. Similarly, electrophysiological studies of the optic tectum (OT), a part of the midbrain containing the topographic map of space, have demonstrated that side peak suppression, a phenomenon necessary for accurate sound localization, does not occur when the sound’s bandwidth is less than 3 kHz. However, these studies were performed using a single electrode in a single part of the map, while pure tones and narrowband sounds activate multiple parts of the map. Additionally, activation of multiple components of the map invokes the midbrain stimulus selection network to determine the most salient stimulus for further processing. Further, information across the OT is biased by upstream frequency-ITD preferences, calling into question the equality of responses to pure tones and narrowband sounds across the map. Thus, we propose using a multi-electrode array to concurrently observe responses to pure tones and narrowband sounds across the OT map, and hypothesize excitation of responses at ITDs whose frequency preferences include the frequency of the stimulus, and suppression of responses at ITDs whose frequency preferences do not.
Adam Hockley, Laura H Bohorquez and Manuel S Malmierca
Fri, 10/4 4:15PM - 6:00PM | B58
Abstract
Under the Bayesian brain hypothesis, the brain continuously generates a model of the environment based on predictions gained from previous experiences. Incoming sensory information is compared to this model and either confirms the prediction or, if significantly different enough, produces prediction errors that update the generative model. Using the auditory “oddball” paradigm, neural correlates of prediction errors are observed as early as the auditory midbrain, with the strength of error increasing up the auditory hierarchy to the auditory cortex (AC). The medial prefrontal cortex (mPFC), which is involved in planning complex behaviour and decision making, exhibits strong and long-lasting responses solely to auditory deviants, consistent with coding of prediction error. This raises the hypothesis that the mPFC exerts top-down control of deviance detection in sensory cortices by transmitting prediction signals. To test this, we injected female Long-Evans rats with 1 μl of AAV5-hSyn-eNpHR3.0-EYFP into the mPFC to allow optogenetic suppression of mPFC neurons. After 7-11 days of recovery, 64-channel neural recordings were conducted in the AC under urethane anesthesia. An auditory “oddball” paradigm composed of standard (STD) repeating stimuli, deviants (DEV) and no-repetition controls (CTR) was presented monaurally. This allowed decomposition of the neural mismatch effect into two components, repetition suppression and prediction error, which were measured during suppression of mPFC neurons. Rats showed expression of EYFP throughout the mPFC, demonstrating successful optogenetic virus transfection. Further, neural recordings using an optrode in the mPFC showed that local LED illumination reduced the activity of mPFC neurons during spontaneous and auditory-evoked activity. LFP and single-unit recordings in the AC during the auditory “oddball” paradigm showed robust neural mismatch, with responses to DEV stimuli greater than to CTR stimuli, and limited responses to STD stimuli. Inhibition of the mPFC had no effect on neural responses during STD or CTR stimuli, but reduced LFP amplitudes and single-unit responses to DEV stimuli, providing evidence for top-down predictive transmission from the mPFC that enhances AC responses to unpredicted stimuli. The reduced deviant response during mPFC inhibition was accompanied by reduced neural synchrony in the auditory cortex, suggesting weakening of the previously described cortical deviant-detector ensembles.
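For readers unfamiliar with the decomposition mentioned above, the toy example below shows one common convention in this literature for splitting the neuronal mismatch into repetition-suppression and prediction-error components from responses to the same tone as standard, deviant, and control; the normalization and the response values are illustrative assumptions, not the study's exact procedure.

```python
# Toy decomposition of neuronal mismatch into repetition suppression (iRS)
# and prediction error (iPE) from STD, DEV, and CTR responses.
import numpy as np

def mismatch_components(std, dev, ctr):
    std, dev, ctr = np.array([std, dev, ctr], float) / np.linalg.norm([std, dev, ctr])
    i_pe = dev - ctr      # response beyond the no-repetition control
    i_rs = ctr - std      # response lost to repetition
    return i_rs, i_pe, i_rs + i_pe    # the sum is the neuronal mismatch index

print(mismatch_components(std=5.0, dev=12.0, ctr=9.0))
```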
Carolyn Sweeney, Maryse Thomas, Lucas Vattino, Kasey Smith and Anne Takesian
Fri, 10/4 4:15PM - 6:00PM | B59
Abstract
The GABAergic inhibitory interneurons that populate the outermost layer of sensory cortex are sites of convergence for bottom-up sensory and top-down contextual signals that powerfully regulate cortical network activity and plasticity. However, little is known about how these interneurons process sensory information. We performed two-photon calcium imaging in awake mice to record the responses of two interneuron populations in superficial auditory cortex, VIP and NDNF interneurons, to simple and complex sound stimuli, and compared these responses to those recorded in layer 2/3 pyramidal neurons. We find that VIP and NDNF interneurons respond robustly to sound, albeit with less trial-to-trial reliability than pyramidal neurons. Interestingly, VIP and NDNF interneurons display sharp tuning for the stimuli to which they show reliable responses. VIP and NDNF interneurons exhibit stronger within-population noise correlations than pyramidal neurons, suggesting that robust non-auditory inputs to these neurons or strong interactions within these interneuron networks may underlie their reduced reliability. Indeed, the magnitude and reliability of sound-evoked responses of these interneurons are increased during locomotion, suggesting that their activity is modulated by top-down inputs that convey behavioral state. This study demonstrates the diversity of the in vivo responses of VIP and NDNF interneurons, and points to the state-dependent modulation of reliability as a potential mechanism by which superficial cortex relays contextual cues.
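The within-population noise-correlation measure referred to above can be illustrated as follows, assuming a hypothetical trials × neurons response matrix for a single stimulus.

```python
# Noise correlations: correlate trial-to-trial fluctuations after removing
# each neuron's mean response to the stimulus.
import numpy as np

rng = np.random.default_rng(0)
responses = rng.standard_normal((40, 30))          # trials x neurons (placeholder)

residuals = responses - responses.mean(axis=0)     # subtract the stimulus-driven mean
noise_corr = np.corrcoef(residuals, rowvar=False)  # neurons x neurons
pairwise = noise_corr[np.triu_indices_from(noise_corr, k=1)]
print(pairwise.mean())                             # average pairwise noise correlation
```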
Vrishab Commuri, Dushyanthi Karunathilake, Stefanie Kuchinsky, Behtash Babadi and Jonathan Simon
Fri, 10/4 4:15PM - 6:00PM | B60
Abstract
Listening in difficult, noisy conditions affects the cortical neural circuits that underlie speech comprehension. These directional circuits convey neural signals between cortical regions, encode information related to processing of the stimulus, and are characterized by their dominant frequency band, e.g., delta band or theta band. Here we elucidate how these circuits change as listening conditions become increasingly adverse, and we reveal differences in regional recruitment between older and younger individuals. We utilize the Network Localized Granger Causality (NLGC) framework applied to magnetoencephalography (MEG) data to simultaneously estimate neural currents in cortex and the graph network that connects current sources to one another. This directional connectivity is analyzed in multiple non-overlapping regions that span the entire cortex. Additionally, a Temporal Response Function (TRF) analysis is performed on the estimated current sources to probe hierarchical processing of speech features among network-connected current sources and to determine to what extent these circuits convey signals that temporally track the stimulus. Broadly, we estimate the connectivity of the cortical neural circuits in physiological frequency bands that are involved in processing speech, and we examine how the circuits change with age and listening difficulty. We also demonstrate how to combine these circuits with established TRF analysis to localize hierarchical processing of speech. We present results on a listening data set, but note that the methods are widely applicable to most MEG data sets. This work was supported by NIH grants R01-DC019394 and T32-DC00046.
Jacie McHaney, Zhe-Chen Guo, Nike Gnanateja, Aravindakshan Parthasarathy and Bharath Chandrasekaran
Fri, 10/4 4:15PM - 6:00PM | B61
Abstract
Speech perception abilities decline with age and have been partially attributed to a number of factors beyond hearing loss, such as temporal processing of voice pitch. Temporal processing of voice pitch, or the fundamental frequency (F0) in speech, helps listeners identify the target speaker, detect prosody, and tag important information for speech perception. Middle-aged adults without overt hearing loss report increased rates of speech perception difficulties, but the extent to which temporal processing of F0 goes awry in middle age is unclear. Using electroencephalography (EEG) in younger and middle-aged adults, we measured neural phase locking to static F0 in the syllables /ba/, /da/, and /ga/, which are commonly used to examine temporal processing abilities. We also measured temporal processing of the dynamic F0 waveform from continuous speech stimuli to better capture temporal processing abilities in more naturalistic listening conditions, as speech is rarely composed solely of static F0. Finally, we examined neural encoding of linguistic pitch accents. Pitch accents are distinct changes in F0 that signal crucial prosodic and communicative information from the speaker for speech perception. The extent to which temporal processing of F0 impacts higher-level linguistic encoding of pitch accents remains unclear. We found that temporal processing of both static and dynamic F0 was reduced in middle-aged compared to younger adults, even though comprehension of continuous speech remained intact. Averaged EEG responses to pitch accents were classified using a convolutional neural network, EEGnet. Pitch accent classification did not differ between younger and middle-aged adults. However, the model entropy, which can be interpreted as the uncertainty in classifying pitch accents, was significantly higher for middle-aged adults. Additionally, age groups showed distinct topographical differences in the electrodes relevant for pitch accent classification, with middle-aged adults showing more relevance distributed across the parietal region, while younger adults showed more temporally-oriented topography. Together, these results indicate that reduced temporal processing of F0 may introduce noise into the auditory system that impacts higher-level pitch accent encoding in middle age. To overcome pitch encoding difficulties, middle-aged adults may exert more listening effort or employ additional strategies to maintain speech perception performance. These compensatory mechanisms likely require the recruitment of additional cortical regions for speech perception and may lead to greater self-perceived listening difficulties in middle age, even in the absence of hearing loss.
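The model-entropy measure described above is the Shannon entropy of the classifier's output probabilities for each trial; the sketch below shows the computation with illustrative probabilities.

```python
# Entropy of a classifier's predicted probabilities as an uncertainty measure.
import numpy as np

def prediction_entropy(probs, eps=1e-12):
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs + eps), axis=-1)

print(prediction_entropy([0.95, 0.05]))   # confident prediction, low entropy (~0.20 nats)
print(prediction_entropy([0.55, 0.45]))   # uncertain prediction, high entropy (~0.69 nats)
```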
Ryan M. Calmus, Zsuzsanna Kocsis, Joel I. Berger, Hiroto Kawasaki, Timothy D. Griffiths, Matthew A. Howard and Christopher I. Petkov
Fri, 10/4 4:15PM - 6:00PM | B62
Abstract
To understand how the human auditory-cognitive system establishes internal models of the world, there is substantial interest in identifying neuronal signals that carry traces of the auditory sensory past and predictions about the future. However, outside of animal models, we lack insights into how site-specific neurophysiological activity within the auditory cortical mnemonic system carries information reflecting maintenance activity to sounds over delays, and on the signals that reflect prospective preplay or retrospective replay of a learned sensory sequence. To study these signals in a controlled manner, we conducted an auditory statistical learning task with a cohort of neurosurgery patients during presurgical intracranial monitoring of refractory epilepsy. Patients listened to perceptual sequences of 3 nonsense words containing a dependency between two sounds in each sequence. Words were drawn from sets (X, A and B), with regularities between pairs of relevant sounds (A-B) often separated in time by uninformative (X) words, forming either an adjacent or non-adjacent dependency. We first analyzed site-specific single-unit activity and local field potentials (LFPs) from auditory cortex, hippocampus and frontal cortex using traditional methods, demonstrating engagement of fronto-temporal auditory sites and the hippocampus in the processing of the sequencing regularities. In addition to univariate analyses, a novel multivariate decoding analysis applied to both single-unit and LFP responses revealed evidence of auditory hippocampal replay, suggesting that time-compressed replay occurs after key sounds in the sequence. Building on these findings, we characterized a variety of single unit responses that, in concert with our decoding analyses, provide evidence for a distributed neural code underlying prospective and retrospective auditory sequence item representation in the human hippocampus. Our results elucidate critical roles for the auditory mnemonic system in transforming sensory events into mental structures, providing insights into the single-neuron and mesoscale contributions to the maintenance and replay of sequential information in the human brain.
Yunshu Li, Victoria Figarola, Abigail Noyce, Adam Tierney, Ross Maddox, Fred Dick and Barbara Shinn-Cunningham
Fri, 10/4 4:15PM - 6:00PM | B63
Abstract
Auditory selective attention, the ability to focus on specific sounds while ignoring competing sounds, enables communication in complex auditory environments. Previous studies have demonstrated that attention strongly modulates cortical representations of sound, but whether and where this modulation occurs in subcortical structures remains unclear. Here, we use electroencephalography (EEG) to record event-related potentials (ERPs), an index of cortical activity, as well as auditory brainstem responses (ABRs, subcortical responses to sound) during a selective listening task. Using a previously developed paradigm (Laffere et al 2020; 2021), subjects attend to a 3-note melody presented to one ear in one range of pitches while ignoring an interleaved, competing melody played to the other ear in a different pitch range. To test brainstem attention modulation, we utilized pitch-evoking pseudo-tones formed by convolving a periodic impulse train with a tone pip, after Polonenko et al (2019; 2021). With these stimuli, each individual tone pip within a pseudo-note elicits one ABR, while the pseudo-note onset elicits a strong cortical response. An earlier version of this combined paradigm presented notes at a rate of 4 Hz; however, overlap of the cortical ERPs from each note hindered quantification of cortical responses. In this current study, we present the stimuli at an across-melody presentation rate of 3 Hz so that cortical responses are temporally isolated, allowing us to better assess cortical neural activity. Initial results produced clear cortical ERPs to each note and subcortical ABRs to each pip. From the cortical responses, we analyzed how attention modulated both the ERP phase and the inter-trial phase coherence (ITPC) at 1.5 Hz. With the slower stimulus presentation, we could see that attention enhanced the ERP evoked by the note onset when it was the “target” stream (as quantified by the cortical P1-N1 peak difference). Additionally, the ITPC showed peaks at the within-melody repetition rate of 1.5 Hz. Importantly, the best performing listeners showed nearly a 180 degree phase separation between conditions. Similar to results from the companion study using the faster stimulus rate, we also found robust ABRs evoked by each tone pip. Consistent with our earlier results, we also see a post wave V peak in the ABR that is modulated by attention. By simultaneously recording ERPs and ABRs, these results allow us to track attention-mediated changes in the neural signals in the cortex and brainstem, respectively.
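The ITPC measure at the 1.5 Hz within-melody rate can be sketched as follows, assuming a hypothetical trials × time epoch array from one electrode; the epoch length and sampling rate are placeholders.

```python
# Inter-trial phase coherence (ITPC) at 1.5 Hz: extract each trial's phase at
# that frequency and measure the consistency of phase across trials.
import numpy as np

fs = 250
n_trials, n_times = 60, 4 * fs                      # 4-s epochs (placeholder)
epochs = np.random.default_rng(0).standard_normal((n_trials, n_times))

freqs = np.fft.rfftfreq(n_times, 1 / fs)
target_bin = np.argmin(np.abs(freqs - 1.5))
phases = np.angle(np.fft.rfft(epochs, axis=1)[:, target_bin])
itpc = np.abs(np.mean(np.exp(1j * phases)))         # 0 = random phase, 1 = perfect locking
print(itpc)
```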
Ja Young Choi, Shengyue Xiong, Jacie McHaney and Bharath Chandrasekaran
Fri, 10/4 4:15PM - 6:00PM | B64
Abstract
Spoken language contains not only the message that the speaker is trying to convey but also a great deal of information about the person speaking. Previous studies have established that processing talker-related information and processing linguistic information are closely intertwined. One of the phenomena that shows this relationship between talker information and linguistic processing is the language familiarity effect, whereby listeners are better able to identify talkers who speak in the listeners’ native language than talkers who speak in a language unfamiliar to the listeners. Here, we investigate the neural signatures of the language familiarity effect by measuring pupillary responses while listeners identify talkers’ voices in native and non-native languages, as pupillometry provides a well-established measure of processing effort, with greater pupil dilation indicating more processing demands and surprisal. Native English-speaking adults listened to audio recordings of 10 English sentences and 10 Mandarin sentences, spoken by 4 native English speakers and 4 native Mandarin speakers, respectively. The participants learned to identify talkers over 4 training blocks of 40 trials each and were tested on a new set of sentences spoken by the same talkers in the final generalization test block. In each trial, listeners identified the talkers by pressing the number key that corresponds to each of the 4 voices in each language, and they were given feedback in the form of “correct” or “incorrect”. During the task, their left-eye pupil sizes were recorded at 1000 Hz. For each trial, pupil size data from stimulus onset to 4000 ms after stimulus offset were analyzed after being normalized against each participant’s baseline pupil size prior to stimulus onset. Participants’ talker identification accuracies and response times were also analyzed using drift diffusion models (DDMs), which take into account both accuracy and response time data, allowing us to integrate the two performance metrics and examine the dynamic decision-making process underlying the task. The behavioral results showed that listeners identified voices in their native language significantly more accurately than voices in the unfamiliar language, replicating previous studies’ results. Accuracy increased from the first training block to the last in both languages, but the accuracy difference between the two languages persisted throughout the last block and the generalization block, suggesting a robust advantage of language familiarity. DDM results showed a faster evidence accumulation rate and a lower decision threshold in English compared to Mandarin, demonstrating the inefficiency of gathering relevant sensory information and increased cautiousness in decision-making in the unfamiliar language. Pupillometry data showed greater pupil dilation in Mandarin than in English in the earlier blocks of training, while the pupil size difference between the two languages diminished in the later blocks. This reduction in pupil size difference may indicate a learning effect, where repeated exposure to the non-native language reduces the cognitive load required for processing. This pattern aligns with theories of neural plasticity, where the brain becomes more efficient in processing unfamiliar exemplars with increased exposure and practice.
The pupillometry results reflect the behavioral results as greater pupil dilation is associated with more difficulty in learning, possibly due to more arousal and more cognitive effort. These results demonstrate the possibility of using pupillometry and drift diffusion models to probe the cognitive processes underlying voice identification in familiar and unfamiliar languages, highlighting the intricate interplay between language familiarity and cognitive effort and how it manifests as physiological change.
Gang Xiao, Daniel Llano, Brandon Li and Kush Nandani
Fri, 10/4 4:15PM - 6:00PM | B65
Abstract
Sensory perception can vary depending on higher-level context, such as behavioral context. It has been shown that task engagement can modulate the activity of neurons throughout the mammalian brain in many sensory modalities, including the auditory system. The inferior colliculus (IC), the midbrain integration hub of the auditory pathway, has been found to show task-engagement-related modulation. The dorsal nucleus of the IC receives dense top-down corticocollicular projections from the auditory cortex, and these axons and their target neurons might be involved in the top-down processing of auditory responses, including behavioral modulation. In this study, we used two-photon microscopy to record neural activity in the dorsal IC while mice either performed a sound discrimination task or listened to the sounds passively. We found that the activity of neurons in the IC was modulated by task engagement, with more than 50% of modulated neurons showing increased activity, and no spatial pattern was observed. Population activity in the dorsal IC could also indicate the task status of the mouse.
Lorenzo Mazzaschi, Andrew J. King, Ben D. B. Willmore and Nicol S. Harper
Fri, 10/4 4:15PM - 6:00PM | B66
Abstract
Much of the activity of the brain that has been recorded in response to dynamic natural sounds, and consequently the processing that gives rise to it, remains to be explained. Encoding models consisting of simple spectrotemporal filters and feedforward neural networks have had some success at predicting the responses of neurons in ferret auditory cortex to natural sounds. These include the spectrotemporal receptive field (STRF), the linear-nonlinear (LN) model, the network receptive field (NRF), and convolution-based neural networks. However, the proportion of the explainable variance in cortical responses to natural sounds that can be captured by the current best models is still under 50%. In addition, these models all require the use of unrealistically long delay lines to reach their best levels of prediction performance. We set out to improve on these models by introducing recurrent connections. Recurrency provides a form of memory, and, in principle, enables models to capture neural sensitivity to sequential dependencies, the latter being central to most natural sounds. Of particular interest to us was gated recurrency. Gates are sigmoidal functions that learn to recognize relevant information within memory and incoming data, controlling its flow through time. As a result, gated recurrent neural networks are well-suited for modeling long sequences. We explored two gated recurrent architectures, based on Long Short-Term Memory (LSTM) units and Gated Recurrent Units (GRU). We found that both of these network architectures offer better neural prediction performance in awake ferret primary auditory cortex than comparable feedforward neural networks, while also eliminating the latter’s need for unrealistically long delay lines. Examination of the behavior of the gated recurrent models suggests some of the improvement they provide may be due to better capturing neural responses to silence. In particular, some neurons that benefit from the use of gated recurrency seem to be sensitive to the duration of periods of silence within natural sounds. We also found that some neurons were well-correlated with the activity of gates. Whether gating-like mechanisms actually exist in the brain, or the model gating approximates some other memory process, remains to be determined.
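For concreteness, a gated recurrent encoding model of the kind described above could be written as follows; the layer sizes, input representation, and rectified readout are illustrative assumptions, not the architecture used in the study:

```python
import torch
import torch.nn as nn

class GRUEncodingModel(nn.Module):
    """Sketch of a gated recurrent encoding model: a GRU runs over the
    cochleagram frame by frame (no explicit delay lines) and a linear
    readout predicts each neuron's firing rate at every time step."""
    def __init__(self, n_freq=32, hidden=64, n_neurons=1):
        super().__init__()
        self.gru = nn.GRU(input_size=n_freq, hidden_size=hidden, batch_first=True)
        self.readout = nn.Linear(hidden, n_neurons)

    def forward(self, cochleagram):               # (batch, time, n_freq)
        states, _ = self.gru(cochleagram)         # (batch, time, hidden)
        return torch.relu(self.readout(states))   # non-negative rate predictions

# Example: predict responses to a batch of 500-frame sound snippets.
model = GRUEncodingModel()
rates = model(torch.randn(8, 500, 32))            # (8, 500, 1)
```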
Muneshwar Mehra, Amber Kline, Michellee Garcia and Hiroyuki Kato
Fri, 10/4 4:15PM - 6:00PM | B67
Abstract
Atypical behavioral responses to sensory inputs are observed in 60-96% of individuals with autism spectrum disorder (ASD). A dominant theory, informed by human functional imaging, suggests that sensory symptoms in ASD result from local hyperconnectivity and long-range hypoconnectivity within the neocortical network. This reduced long-range connectivity is thought to impair top-down modulation of sensory cortices, leading to inflexible sensory processing. However, the neuronal circuit-level consequences of this long-range hypoconnectivity remain poorly understood. In this study, we address this gap using a mouse model of Angelman syndrome (AS), which exhibits high ASD comorbidity. We conducted in vivo extracellular recordings from the primary auditory cortex (A1) in AS and control mice during a sound offset-lick detection task that requires sustained attention. During the learning phase, AS mice showed impaired performance, characterized by slower learning curves and longer lick latencies than their control littermates. On the recording day, local field potential (LFP) analysis of spontaneous activity in A1 showed significantly higher low-frequency (delta and theta) power in AS mice. During behavioral engagement, wild-type mice exhibited suppression of low-frequency oscillation power, indicative of effective top-down regulation of A1 sensory processing. In contrast, AS mice showed diminished behavior-dependent modulation. Auditory brainstem response (ABR) and current source density (CSD) analyses indicated no significant differences in peripheral sensory inputs between AS and control groups, further supporting an impairment in top-down control. Our ongoing research aims to link these electrophysiological findings with the poor behavioral performance of AS mice. Together, these results bridge our understanding of ASD between the anatomical and perceptual levels, potentially guiding the identification of therapeutic targets to normalize altered sensory processing.
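The low-frequency LFP power comparison described above could, in outline, be computed as follows; the sampling rate, segment length, and band edges are illustrative assumptions rather than the study's parameters:

```python
import numpy as np
from scipy.signal import welch

def band_power(lfp, fs, band):
    """Mean power spectral density of an LFP trace within a frequency band."""
    freqs, psd = welch(lfp, fs=fs, nperseg=2 * fs)
    mask = (freqs >= band[0]) & (freqs < band[1])
    return psd[mask].mean()

fs = 1000                                   # assumed LFP sampling rate (Hz)
lfp = np.random.randn(60 * fs)              # placeholder: 60 s of spontaneous A1 LFP
delta_power = band_power(lfp, fs, (1, 4))   # delta band (assumed 1-4 Hz)
theta_power = band_power(lfp, fs, (4, 8))   # theta band (assumed 4-8 Hz)
```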
Kyle Rupp, Jasmine Hect and Taylor Abel
Fri, 10/4 4:15PM - 6:00PM | B68
Abstract
Humans rapidly and seamlessly categorize sounds in their environment, with voice representing a special category of sound due to its social and behavioral importance. Converging evidence from imaging and electrophysiology studies suggests a specialized network for voice processing exists in higher-order auditory cortex. However, it is unknown where and when in the cortical auditory hierarchy voice category-level representation (i.e. encoding) emerges. We recorded from intracerebral electrodes implanted in the context of epilepsy surgery while 20 patient-participants listened to voice and non-voice sounds. We built encoding models with both acoustic features and a binary category feature to indicate whether a sound was a human vocalization. Comparing this full model to a nested model that included only acoustic features, we used model improvement to characterize the strength of voice encoding at individual recording sites. We found that voice encoding strength improved across the auditory hierarchy from core to lateral belt to parabelt regions. Next, we used a sliding window approach to investigate how voice encoding evolves across time. First, we found that relative to stimulus onset, encoding models peaked early in core (~90 ms) and lateral belt (~110 ms) and substantially later in parabelt (~360 ms). Second, we found that in core and lateral belt, there was no difference between acoustic-only and full models through this early peak, suggesting that these areas strongly encode only acoustic features during the initial stage of sound processing. Third, the full model performed substantially better in lateral belt starting around 120 ms, suggesting voice category-level encoding emerges around this time. Parabelt sites exhibited strong voice encoding that emerged around 160 ms, with a modest early acoustic encoding peak around 130 ms. Lastly, a weak category encoding emerged late in core (~200 ms), suggesting a potential role of feedback between areas. Together, these results suggest that auditory cortex engages in a dynamic interplay of acoustic and voice-category feature representation both between and within areas, elucidating the complex spatiotemporal dynamics of voice encoding in auditory cortex.
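The full-versus-nested model comparison described above can be sketched as below; ridge regression, the feature dimensions, and the synthetic responses are stand-in assumptions, not the estimator or data used in the study:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Placeholder design matrices for one recording site: acoustic features
# plus a binary column flagging whether the sound is a human vocalization.
rng = np.random.default_rng(1)
n_stimuli = 400
X_acoustic = rng.normal(size=(n_stimuli, 50))               # hypothetical acoustic features
is_voice = rng.integers(0, 2, size=(n_stimuli, 1)).astype(float)
X_full = np.hstack([X_acoustic, is_voice])
y = rng.normal(size=n_stimuli)                              # placeholder neural response

r2_nested = cross_val_score(Ridge(alpha=1.0), X_acoustic, y, cv=5, scoring="r2").mean()
r2_full = cross_val_score(Ridge(alpha=1.0), X_full, y, cv=5, scoring="r2").mean()
voice_encoding_strength = r2_full - r2_nested   # improvement attributable to the category feature
```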
Yaneri A. Ayala, Yuki Kikuchi, Ryan M. Calmus, Joel I. Berger, Christopher K. Kovach, Hiroto Kawasaki, Timothy D. Griffiths, Matthew A. Howard III and Christopher I. Petkov
Fri, 10/4 4:15PM - 6:00PM | B69
Abstract
A fundamental aspect of human communication and cognition is the capacity to extract ordering relationships between auditory events in a sequence. Studies from our group identified segregable neural processes in human and monkey auditory cortex that are differentially sensitive to high- and low-probability transitions following incidental auditory statistical learning (Kikuchi et al., 2017, PLoS Biology). However, single-neuron data could previously be obtained only in monkeys. Here, we report human single-unit activity from multiple brain areas, including auditory cortex and hippocampus, recorded across eleven epilepsy patients during clinical intracranial monitoring. The patient participants were first exposed to the statistical learning paradigm, listening to regularities in the ordering relationships within sequences of nonsense words. In a subsequent testing phase, they listened either to sequences consistent with the ordering regularities heard during the exposure phase (high-probability transitions) or to sequences containing sequencing violations (low-probability transitions). As expected, strongly driven responses to each sound in the sequence were observed in Heschl’s gyrus and, to a lesser extent, in the hippocampus and other areas. We analytically contrasted responses to acoustically matched elements within the sequences that differed only in their prior sequencing context (i.e., high- or low-probability transitions). Neuronal signals that were stronger to high-probability transitions were categorized as ‘prediction-weighted’ (Pw) signals and those stronger to low-probability transitions as ‘prediction-error-weighted’ (PEw) signals. We observed human single-unit sensitivity to both types of signals, not only in the auditory cortex, where they were first found in monkeys, but also in the broader system. A greater proportion of auditory cortical neurons exhibited PEw than Pw signals, while these proportions were reversed in the hippocampus and other brain areas. Further work is underway to disentangle these signals in neuronal population spaces and to study inter-areal connectivity via spike-field coherence. The results indicate that incidental auditory statistical learning differentially engages a broad neuronal network in sequencing prediction and prediction-error processing.
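A simplified version of the Pw/PEw categorization could contrast each neuron's responses to acoustically matched elements in the two sequencing contexts; the statistical test, threshold, and example firing rates below are illustrative assumptions, not the study's analysis:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def classify_pw_pew(rates_high, rates_low, alpha=0.05):
    """Label a neuron 'Pw' if it responds more strongly to high-probability
    transitions, 'PEw' if it responds more strongly to low-probability
    transitions, and 'unclassified' if the contrast is not significant."""
    _, p = mannwhitneyu(rates_high, rates_low)
    if p >= alpha:
        return "unclassified"
    return "Pw" if np.mean(rates_high) > np.mean(rates_low) else "PEw"

# Example: per-trial firing rates (Hz) to the same nonsense word in the two contexts.
print(classify_pw_pew(np.array([12, 15, 14, 13, 16]), np.array([8, 9, 7, 10, 9])))
```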
Stuart Washington, Kyle Shattuck, Jan Steckel, Herbert Peremans, Elisabeth Jonckers, Rukun Hinz, Tom Venneman, Monica Van den Berg, Lisbeth Van Ruijssevelt, Thomas Verellen, Dominique Pritchett, Jan Scholliers, Sayuan Liang, Sönke Von der Berg, Stephen Savoia, Partha Mitra, Stephen Lin, Paul Wang, Marleen Verhoye, Annemie Van der Linden, Karl-Heinz Esser and Georgios Keliris
Fri, 10/4 4:15PM - 6:00PM | B70
Abstract
Echolocating bats live huddled together in colonies comprising hundreds of individuals and use complex sounds to communicate and to navigate. These highly social and vocal species make ideal subjects for functional magnetic resonance imaging (fMRI) studies of auditory social communication, given their relatively hypertrophic limbic and auditory neural structures and their reduced ability to hear MRI gradient noise. Establishing the existence of neural networks related to social cognition (e.g., default mode-like networks, or DMLNs) in the order Chiroptera could pave the way toward a new frontier in the study of mammalian socialization and communication. We measured blood oxygenation level dependent (BOLD) signal at 7T from nine lightly anesthetized pale spear-nosed bats (Phyllostomus discolor). Specifically, we performed independent component analysis (ICA) and revealed 15 resting-state networks. We also measured neural activity elicited by noise ripples (on: 10 ms; off: 10 ms) spanning the ultrasonic hearing range (20-130 kHz) of this species. Resting-state networks intersected parietal, occipital, and auditory cortices, along with the auditory brainstem, basal ganglia, cerebellum, and hippocampus. We determined that two of a possible four midline networks were the best candidates for a DMLN. We also found two predominantly left and two predominantly right auditory/parietal cortical networks. Regions within all four auditory/parietal cortical networks have been shown to respond to social calls. As expected from the emergence of side-band inhibition in the inferior colliculus, ultrasonic noise ripples significantly activated the auditory brainstem (NOISE > SILENCE; cluster-level p = 5.27 × 10⁻⁵, FWE correction, kE = 7613) yet deactivated the auditory/parietal cortex (SILENCE > NOISE; cluster-level p = 2.08 × 10⁻⁹, FWE correction, kE = 17452). Iterative (“jackknife”) analyses revealed consistent, significant functional connections between left, but not right, auditory/parietal cortical networks and DMLN nodes, especially the anterior-most cingulate cortex. Thus, a resting-state network implicated in social cognition displays more distributed functional connectivity across left, relative to right, hemispheric cortical substrates of audition and communication in this echolocating bat species. The application of advanced histological methods to 12 ex-vivo Phyllostomus discolor brain samples that have also undergone structural imaging (i.e., T2-weighted 3D rapid spin echo and spin-echo diffusion weighted imaging) increases the likelihood of generating detailed, 3D population-based atlases as a computerized anatomical reference for these and future chiropteran functional neuroimaging results.
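Group spatial ICA of the kind described above can be sketched with a generic ICA routine; the random placeholder data, matrix dimensions, and use of scikit-learn's FastICA (rather than an fMRI-specific pipeline) are all assumptions for illustration only:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Placeholder preprocessed BOLD data: voxels x time points, assumed to be
# motion-corrected, normalized, and concatenated across the nine animals.
bold = np.random.randn(2000, 600)        # 2000 voxels, 600 time points (hypothetical)

# Spatial ICA: with voxels as samples, the recovered sources are 15 spatially
# independent maps (candidate resting-state networks), and the mixing matrix
# holds their associated time courses.
ica = FastICA(n_components=15, random_state=0, max_iter=1000)
spatial_maps = ica.fit_transform(bold)   # (voxels, 15) independent spatial maps
timecourses = ica.mixing_                # (time, 15) component time courses
```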