Psych 129 - Sensory Processes
Auditory grouping
Why grouping?
- At any given moment, the air molecules around us are in constant motion that is caused by the movements and vibrations of objects in the environment. Typically, more than one object, or sound source, will be contributing to these fluctuations. The tympanic membrane simply moves back and forth according to the net fluctuations in air pressure - i.e., the fluctuations in air pressure due to different sound sources are combined together into a single waveform at the tympanic membrane. The challenge that the brain and auditory system face is to separate this waveform into its different constituent components, in order to form an explicit representation of the actual sound sources present in the environment. This is a very difficult problem.
- The first stage in sound source separation is the frequency analysis performed by the basilar membrane. This allows for an explicit representation of a sound in terms of the different frequencies of oscillation embedded within it. Thus, a high frequency sound source and a low frequency sound source would be easily separated by the tonotopic representation provided by hair cells and/or the auditory nerve.
- However, many sound sources are fairly broadband, meaning that they contain a broad range of frequencies. Moreover, different sources often overlap in their spectra, meaning that they will excite common portions of the basilar membrane. Thus, it is often impossible to separate sound sources based purely on the tonotopic representation provided by the cochlea. Additional processing must be performed in order to separate sounds with overlapping spectra.
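The points above can be illustrated with a minimal sketch. It assumes two hypothetical pure-tone sources (200 Hz and 3000 Hz, chosen only for illustration), sums them into the single waveform that would reach the tympanic membrane, and then measures the energy at a few frequencies with a discrete Fourier transform. The Fourier transform stands in here for the frequency analysis of the basilar membrane; it is not a model of the cochlea, and the sample rate and function names are assumptions of this example.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (illustrative choice)
N = 800              # 0.1 s of signal, giving 10 Hz frequency resolution

def tone(freq_hz, amplitude=1.0):
    """Return N samples of a sine wave at freq_hz."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(N)]

def magnitude_at(signal, freq_hz):
    """Amplitude of the Fourier component of `signal` at freq_hz."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / SAMPLE_RATE)
              for n, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)

low_source = tone(200)          # hypothetical low-frequency source
high_source = tone(3000, 0.5)   # hypothetical high-frequency source

# The eardrum only receives the sum of the two pressure waveforms.
mixture = [a + b for a, b in zip(low_source, high_source)]

# Frequency analysis pulls the two components apart again.
for f in (200, 1000, 3000):
    print(f, round(magnitude_at(mixture, f), 3))
```

The mixture shows energy at 200 Hz and 3000 Hz and essentially none at 1000 Hz, so these two sources are separable by frequency alone. If both sources were broadband, their energy would land at the same frequencies and this analysis by itself could not say which source contributed what.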
Grouping rules
- We know from psychophysical experiments that the brain uses a number of different grouping rules to attribute the different components of a sound to a common source. Some of the known rules are:
proximity in frequency - group together sound components in the same frequency range
harmonic content - group together sound components whose frequencies are integer multiples of a common fundamental frequency
common time course - group together sound components with a common onset or common temporal fluctuations
location - group together sound components coming from the same location in space, for example via interaural time difference (ITD) and interaural intensity difference (IID) cues
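As an illustration of the harmonic content rule, here is a minimal sketch that assigns frequency components to candidate fundamental frequencies. The function names, the 3% tolerance, and the example frequencies are all assumptions made for this example; it is not a claim about how the brain performs the grouping.

```python
def harmonic_number(freq_hz, f0_hz, tolerance=0.03):
    """Return n if freq_hz lies within `tolerance` (relative) of n * f0_hz, else None."""
    n = round(freq_hz / f0_hz)
    if n < 1:
        return None
    if abs(freq_hz - n * f0_hz) <= tolerance * n * f0_hz:
        return n
    return None

def group_by_harmonicity(components_hz, candidate_f0s_hz, tolerance=0.03):
    """Map each candidate fundamental to the components that are its harmonics."""
    groups = {f0: [] for f0 in candidate_f0s_hz}
    for freq in components_hz:
        for f0 in candidate_f0s_hz:
            if harmonic_number(freq, f0, tolerance) is not None:
                groups[f0].append(freq)
    return groups

# Hypothetical mixture: one source with a 200 Hz fundamental, one with 300 Hz.
components = [200, 300, 400, 600, 800, 900]
groups = group_by_harmonicity(components, [200, 300])
print(groups)
```

In this example the 600 Hz component is a harmonic of both fundamentals, so it ends up in both groups. Harmonicity alone leaves that component ambiguous, which is one reason other cues such as common onset or location are also needed.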
Auditory cortex
- Where in the brain the above grouping rules are implemented is currently unknown. A likely candidate, though, is the auditory cortex.
- Our knowledge of the response properties of neurons in the auditory cortex is quite scant. What we do know is that these neurons exhibit frequency tuning similar to that of auditory nerve fibers (although somewhat narrower), and that they are organized spatially within the cortex so as to form a tonotopic representation.
- Neurons in the auditory cortex also exhibit plasticity, meaning that they change their response properties (i.e., tuning curves) with experience. Experiments with monkeys trained to discriminate between tones within a narrow frequency range (e.g., 3000-3010 Hz) show that the tonotopic representation expands around that range of frequencies. In other words, more neurons are devoted to representing a given frequency range according to task demands.
Speech
- Speech is produced by air being forced through the vocal folds (membranes in the throat), creating vibrations that are filtered by resonating structures within the mouth and nasal cavity. As the mouth is moved into different configurations, the sound is filtered differently, which changes the harmonic content or timbre.
- A common method of analyzing speech is with a spectrogram, which displays the frequency content of a sound as a function of time.
- The spectrogram of a speech utterance shows that spoken words are composed of distinct bands of concentrated energy in frequency, or formants.
- The spectrogram may be parsed in time into phonemes, which are considered the smallest indivisible (atomic) units of speech.
- Most computer algorithms that perform speech recognition attempt to extract phonemes from the spectrogram and then piece these phonemes together into words. This is a very difficult and still unsolved problem. The brain is the only device we know of that can do this.
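The spectrogram described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the frame length, hop size, Hann window, and 8000 Hz sample rate are choices made for this example, and the input is a toy signal (a 500 Hz tone followed by a 1500 Hz tone), not real speech. It only computes the frequency content over time; it does not attempt phoneme extraction.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of Fourier magnitudes (bins 0..frame_len//2) per time frame."""
    # Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = [abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len)))
                for k in range(frame_len // 2 + 1)]
        frames.append(mags)
    return frames

SAMPLE_RATE = 8000  # samples per second (illustrative choice)

# Toy signal: 256 samples of a 500 Hz tone, then 256 samples of a 1500 Hz tone.
signal = ([math.sin(2 * math.pi * 500 * n / SAMPLE_RATE) for n in range(256)] +
          [math.sin(2 * math.pi * 1500 * n / SAMPLE_RATE) for n in range(256)])

frames = spectrogram(signal)
bin_hz = SAMPLE_RATE / 64  # width of one frequency bin in Hz

# Strongest frequency in each time frame.
peak_hz = [max(range(len(m)), key=m.__getitem__) * bin_hz for m in frames]
print(peak_hz)
```

The strongest frequency is 500 Hz in the early frames and 1500 Hz in the late frames, which is the kind of change over time that a spectrogram makes visible. In real speech the energy bands (formants) shift continuously and overlap, which is part of what makes parsing them into phonemes difficult.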