Deciphering the transformation of sounds into meaning: Insights from disentangling intermediate representations in sound-to-event DNNs

01 December 2025, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Neural representations estimated from functional MRI (fMRI) responses to natural sounds in non-primary auditory cortical areas resemble those in intermediate layers of deep neural networks (DNNs) trained to recognize sounds. However, the nature of these representations remains poorly understood. In the current study, a convolutional DNN (YAMNet), pre-trained to map sound spectrograms to semantic categories, is used as a computer simulation of the human brain’s processing of natural sounds. A novel sound dataset is introduced and employed to test the hypothesis that sound-to-event DNNs represent basic mechanisms of sound generation (here, human actions) and physical properties of the sources (here, object materials) in their intermediate layers. Systematic changes to those latent representations are made with the help of a disentangling flow model. The manipulations are shown to cause a predictable effect on the DNN’s semantic output. By demonstrating this mechanism in silico, the current study paves the way for neuroscientific experiments aiming to verify it in vivo. Code is available at https://github.com/TimHenry1995/LatentAudio.
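
The manipulation described in the abstract can be illustrated with a minimal sketch. It is not taken from the LatentAudio repository: a random orthogonal matrix stands in for the trained disentangling flow, and the names `flow_forward`, `flow_inverse`, `swap_factor`, `LATENT_DIM` and `FACTOR_DIMS` are hypothetical. The sketch only shows the general idea of mapping an intermediate activation into a disentangled space, exchanging the dimensions assumed to encode one factor (for example, material), and mapping the result back so that it could be passed on to the remaining DNN layers.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8              # assumed size of the intermediate representation
FACTOR_DIMS = slice(0, 2)   # assumed: flow dimensions reserved for one factor

# Stand-in for a trained invertible flow: an orthogonal matrix, whose inverse
# is simply its transpose. A real normalizing flow would be nonlinear.
Q, _ = np.linalg.qr(rng.normal(size=(LATENT_DIM, LATENT_DIM)))


def flow_forward(h):
    """Map a DNN activation into the (assumed) disentangled space."""
    return h @ Q


def flow_inverse(z):
    """Map a disentangled code back into the DNN's activation space."""
    return z @ Q.T


def swap_factor(h_source, h_target):
    """Give h_source the factor dimensions of h_target, keep the rest."""
    z_source = flow_forward(h_source)
    z_target = flow_forward(h_target)
    z_edit = z_source.copy()
    z_edit[..., FACTOR_DIMS] = z_target[..., FACTOR_DIMS]
    return flow_inverse(z_edit)


# Two hypothetical intermediate activations, e.g. from two different sounds.
h_a = rng.normal(size=LATENT_DIM)
h_b = rng.normal(size=LATENT_DIM)
h_edit = swap_factor(h_a, h_b)
```

In the setting of the study, the edited activation would be fed into the layers following the manipulated one, and the semantic output would be compared with the output for the unmodified activation.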

Keywords

Machine learning
Latent space disentanglement
YAMNet
Auditory processing
Invertible neural network
Normalizing flow
