NSF Funded Research
Collaborative Research: Hierarchical Models of Time-Varying Natural Images (Bruno A. Olshausen/David K. Warland, PIs)
funded under IIS-0625717, 6/1/06-5/31/07
continued under RI-0705939
The overarching goal of this project is to advance the state of the art in image analysis and computer vision by building models that capture the robust intelligence exhibited by the mammalian visual system. Our approach is based on modeling the structure of time-varying natural images, and developing model neural systems capable of efficiently representing this structure. With this approach we aim to shed light on the underlying neural mechanisms involved in visual perception while at the same time bringing these models to bear on practical problems in image analysis and computer vision.
The models to be developed will allow the invariant structure in images (form, shape) to be described independently of its variations (position, size, rotation). The models are composed of multiple layers that capture progressively more complex forms of scene structure while also modeling the transformations of that structure. Mathematically, these multi-layer models have a bilinear form in which the variables representing shape and form interact multiplicatively with the variables representing position, size or other variations. The parameters of the model are learned from the statistics of time-varying natural images using the principles of sparse and efficient coding.
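As a rough illustration of the bilinear form described above, the sketch below synthesizes an image from two sets of coefficients that interact multiplicatively, and evaluates a sparsity-penalized reconstruction cost. It is a generic textbook-style bilinear model under stated assumptions, not the specific model developed in this project: the names `bilinear_image`, `sparse_cost`, `basis`, `a`, `c` and `lam`, the random basis, and the L1 penalty on the shape coefficients are all illustrative choices.

```python
import random


def bilinear_image(basis, shape_coeffs, transform_coeffs):
    """Return pixels I[p] = sum_i sum_j basis[p][i][j] * a[i] * c[j]."""
    return [
        sum(
            basis_p[i][j] * a_i * c_j
            for i, a_i in enumerate(shape_coeffs)
            for j, c_j in enumerate(transform_coeffs)
        )
        for basis_p in basis
    ]


def sparse_cost(image, basis, shape_coeffs, transform_coeffs, lam=0.1):
    """Squared reconstruction error plus an L1 sparsity penalty (assumed form)."""
    recon = bilinear_image(basis, shape_coeffs, transform_coeffs)
    error = sum((x - r) ** 2 for x, r in zip(image, recon))
    penalty = sum(abs(v) for v in shape_coeffs)
    return error + lam * penalty


# Hypothetical sizes and a random basis, chosen only for illustration.
rng = random.Random(0)
n_pixels, n_shape, n_transform = 16, 4, 3
basis = [
    [[rng.gauss(0.0, 1.0) for _ in range(n_transform)] for _ in range(n_shape)]
    for _ in range(n_pixels)
]

a = [0.0, 1.5, 0.0, -0.5]  # sparse "shape/form" coefficients
c = [1.0, 0.0, 0.2]        # "position/size" (transformation) coefficients
image = bilinear_image(basis, a, c)
```

With the transformation coefficients `c` held fixed, the image is linear in `a` (and vice versa), which is what makes alternating inference over the two sets of variables tractable in models of this kind. In a learned model the basis would be fit to natural image sequences rather than drawn at random.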
Early measurements and models of natural image structure have had a profound impact on a wide variety of disciplines, including visual neuroscience (e.g. predictions of receptive field properties of retinal ganglion cells and cortical simple cells in visual cortex) and image processing (e.g. wavelets, multi-scale representations, image denoising). The approach outlined in this proposal extends this interdisciplinary work by learning higher-order scene structure from sequences of time-varying natural images. Given the evolutionary pressures on the visual cortex to process time-varying images efficiently, it is plausible that the computations performed by the cortex can be understood in part from the constraints imposed by efficient processing. Modeling the higher-order structure will also advance the development of practical image-processing algorithms by finding good representations for tasks such as video search and indexing. Completion of the specific goals described in this proposal will yield mathematical models that help to elucidate the neural mechanisms underlying visual perception, along with new generative models of time-varying images that offer better ways of describing their structure.
The explosion of digital images and video has made it a national priority to provide better tools for tasks such as object recognition and search, navigation, surveillance, and image analysis. The models developed as part of this proposal are broadly applicable to these tasks. Results from this research program will be integrated into a new neural computation course at UC Berkeley, presented at national multi-disciplinary conferences, and published in a timely manner in leading peer-reviewed journals. Participation in the proposed research is open to both graduate and undergraduate students, and the PI will advise Ph.D. students in both neuroscience and engineering as part of this project.
Publications arising from this research:
- Rozell CJ, Johnson DH, Baraniuk RG, Olshausen BA (2007) Neurally plausible sparse coding via thresholding and local competition. Neural Computation, (submitted).
- Olshausen BA, Cadieu C, Culpepper J, Warland DK (2007) Bilinear models of natural images. SPIE Proceedings vol. 6492: Human Vision and Electronic Imaging XII, (B.E. Rogowitz, T.N. Pappas, S.J. Daly, Eds.), Jan 28-Feb 1, 2007, San Jose, California.