Yan Karklin
Center for Neural Science, NYU
Efficient coding of images and sounds: Hierarchical processing and biological constraints
Friday, November 30, 2012, at 11:00am
560 Evans
Efficient coding provides a powerful principle for explaining early
sensory processing. Among the successful applications of this theory
are models that provide functional explanations for neural responses
in the primary visual cortex (Bell & Sejnowski, 1995; Olshausen &
Field, 1996) and in the auditory nerve (Smith & Lewicki, 2006). Two
of the challenges facing these models are mapping abstracted
computations onto noisy, nonlinear, multistage neural implementations,
and capturing the complexity of natural signals that necessitates such
hierarchical neural processing. For example, statistical models of
vision often suggest a direct transformation from the image to a set
of oriented features, but this seems to bypass earlier stages -- the
retinal output, organized into intricate mosaics of center-surround
receptive fields -- and to ignore their nonlinear response properties.
Auditory models of efficient coding yield filters consistent with the
cochlear output, but little work has been done on learning
hierarchical representations that can explain downstream processing of
complex sounds.
In this talk I will describe two recent projects that address some of
these issues. First, I will show that an efficient coding model that
incorporates ingredients critical to biological computation -- input
and output noise, nonlinear response functions, and metabolic
constraints -- can predict the basic properties of retinal processing.
Specifically, we develop numerical methods for simultaneously
optimizing linear filters and response nonlinearities of a population
of model neurons so as to maximize information transmission in the
presence of noise and metabolic costs. When the model includes
biologically realistic levels of noise, the predicted filters are
center-surround and the nonlinearities are rectifying, consistent with
properties of retinal ganglion cells. The model yields two
populations of neurons, characterized by On- and Off-center responses,
which independently tile the visual space, and even predicts an
asymmetry observed in the primate retina: Off-center neurons are more
numerous and have filters with smaller spatial extent.
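To make the optimization concrete, the following is a minimal NumPy sketch of the idea under strong simplifying assumptions (my illustration, not the authors' code): a Gaussian information bound stands in for the full noisy, nonlinear model, the signal is a 1-D slice with 1/f^2 statistics, and the metabolic cost is a quadratic penalty on response power; the learned nonlinearities are omitted, and all parameter values are made up.

    import numpy as np

    # Toy setup: D "pixels" along a 1-D slice of an image, J model neurons.
    D, J = 32, 16
    rng = np.random.default_rng(0)

    # Stand-in for natural image statistics: a 1/f^2 power spectrum,
    # converted to a circulant signal covariance and normalized.
    freqs = np.fft.rfftfreq(D) + 1.0 / D
    c = np.fft.irfft(1.0 / freqs**2, n=D)
    C = np.array([np.roll(c, i) for i in range(D)])
    C = 0.5 * (C + C.T) / c[0]

    sig_in, sig_out, lam = 0.3, 0.5, 0.1   # input noise, output noise, metabolic weight
    W = 0.1 * rng.standard_normal((J, D))  # linear filters to optimize

    def objective_and_grad(W):
        """Gaussian information bound minus metabolic cost, with analytic gradient.
        Responses: r = W (x + n_in) + n_out, so
        I(x; r) = 0.5 [logdet(cov of r) - logdet(noise-only cov)]."""
        A = C + sig_in**2 * np.eye(D)                       # cov of x + n_in
        M = W @ A @ W.T + sig_out**2 * np.eye(J)            # total response cov
        N = sig_in**2 * (W @ W.T) + sig_out**2 * np.eye(J)  # noise-only cov
        info = 0.5 * (np.linalg.slogdet(M)[1] - np.linalg.slogdet(N)[1])
        cost = lam * np.trace(W @ A @ W.T)                  # response-power penalty
        # d/dW logdet(W A W^T + B) = 2 (W A W^T + B)^{-1} W A
        grad = (np.linalg.solve(M, W @ A)
                - sig_in**2 * np.linalg.solve(N, W)
                - 2 * lam * (W @ A))
        return info - cost, grad

    for _ in range(3000):
        val, grad = objective_and_grad(W)
        W += 0.01 * grad  # plain gradient ascent
    print(f"objective after optimization: {val:.3f}")

Raising the noise levels pushes the optimal filters away from pure decorrelation toward spatial averaging; in 2-D, the analogous trade-off yields the center-surround filters described above.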
In the case of auditory coding, I will present Hierarchical Spike
Coding, a two-layer probabilistic generative model for complex
acoustic structure. The first layer consists of a sparse spiking
representation that encodes the sound using kernels positioned
precisely in time and frequency. Patterns in the positions of
first-layer spikes are learned from the data: on a coarse scale, statistical
regularities are encoded by a second-layer spiking representation,
while fine-scale structure is captured by recurrent interactions
within the first layer. When fitted to speech data, the second-layer
acoustic features include harmonic stacks, sweeps, frequency
modulations, and precise temporal onsets, which can be composed to
represent complex acoustic events. Unlike spectrogram-based methods,
the model gives a probability distribution over sound pressure
waveforms. This allows us to use the second-layer representation to
synthesize sounds directly, and to perform model-based denoising, on
which we demonstrate a significant improvement over standard methods.
(This is joint work with Chaitanya Ekanadham and Eero Simoncelli.)
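The first-layer spikegram is in the spirit of matching-pursuit encodings of sound (cf. Smith & Lewicki, 2006). Below is a minimal sketch of such an encoder, using made-up gammatone-like kernels rather than the model's learned ones (make_kernels and matching_pursuit are hypothetical names):

    import numpy as np

    def make_kernels(n_kernels=8, width=64, fs=8000.0):
        """Hypothetical stand-ins for learned kernels: gammatone-like
        bursts at log-spaced center frequencies, unit norm."""
        t = np.arange(width) / fs
        kernels = []
        for f in np.geomspace(200.0, 3000.0, n_kernels):
            k = t**2 * np.exp(-2 * np.pi * 50.0 * t) * np.sin(2 * np.pi * f * t)
            kernels.append(k / np.linalg.norm(k))
        return np.array(kernels)

    def matching_pursuit(signal, kernels, n_spikes=100):
        """Greedy sparse spike code: repeatedly place the best-matching
        kernel at the best time, record (time, kernel, amplitude), subtract."""
        residual = signal.copy()
        spikes = []
        for _ in range(n_spikes):
            # Correlate the residual with every kernel; since kernels are
            # unit norm, the correlation value is the optimal amplitude.
            corrs = np.array([np.correlate(residual, k, mode="valid")
                              for k in kernels])
            j, t = np.unravel_index(np.argmax(np.abs(corrs)), corrs.shape)
            a = corrs[j, t]
            residual[t:t + kernels.shape[1]] -= a * kernels[j]
            spikes.append((t, j, a))
        return spikes, residual

    fs = 8000.0
    t = np.arange(4000) / fs
    # Toy input: two harmonics plus noise.
    x = (np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 600 * t)
         + 0.1 * np.random.default_rng(1).standard_normal(t.size))
    spikes, res = matching_pursuit(x, make_kernels(fs=fs), n_spikes=200)
    print(f"{len(spikes)} spikes, residual energy {np.sum(res**2) / np.sum(x**2):.2%}")

The model's second layer then captures statistical regularities in these (time, kernel, amplitude) triples; that layer, and the recurrent first-layer interactions, are beyond this sketch.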