NPB 261B - Lab #2
Due Monday, February 21
In this assignment you will investigate the power spectrum and
higher-order statistics of natural images, and you will examine some models
for how neurons in the visual system may have been adapted to this structure.
Power spectrum
The first step is to measure the power spectrum of natural images. Get
some images, either with a digital camera, or over the internet, etc. Then,
load an image into Matlab using the imread command, which can
read images of many different formats. To see how to use the
imread command and what types of images it can read, type help imread.
For example, to read the einstein image from the last assignment
you type
im=double(imread('einstein.jpg'));
It will be easiest for this exercise if the image is square, so if it's
not already square you should truncate the larger dimension. Now,
to compute the power spectrum of the image you use the Fourier transform,
which converts a function in the space domain to a function in the frequency
domain. You can compute the Fourier transform of an image in Matlab
using fft2 as follows:
imf=fftshift(fft2(im));
The function fftshift is needed to put the DC component (frequency
= 0) in the center. The power spectrum is just the square of the modulus
of the Fourier transform, which is obtained as follows:
impf=abs(imf).^2;
To display the power spectrum you will need to define some frequency coordinates.
If the image is of size N x N, then the frequencies run from -N/2 to N/2-1
(assuming N is even):
f=-N/2:N/2-1;
Then display the log power spectrum using imagesc:
imagesc(f,f,log10(impf)), axis xy
You will note that the power falls off with frequency as you move away
from the center. Ignore the vertical and horizontal streaks for now
- they are artifacts of the image boundaries (the FFT treats the image
as periodic, so the discontinuity where the edges wrap around injects
power along the axes). Now, to get a
better idea of how the power falls off, we can do a rotational average of
the power spectrum:
Pf=rotavg(impf);
Now define a frequency coordinate from 0 to N/2:
f1=0:N/2;
and plot the rotationally averaged spectrum on a log-log plot:
loglog(f1,Pf)
You should see that the power tends to fall as 1/f^2
- approximately with a slope of -2 on a log-log plot. You should try
doing this for a couple different images. Try to find an image which deviates
strongly from this trend and see if you can explain what's happening.
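If you don't have the course's rotavg function handy, its effect is easy to sketch: bin each frequency-domain pixel by its (rounded) distance from the center and average within each bin. Below is a hypothetical Python/NumPy version (not the course's rotavg.m), checked against a synthetic 1/f^2 spectrum, where the fitted log-log slope should come out near -2:

```python
import numpy as np

def rotavg(P):
    """Rotationally average a square 2-D array about its center
    (a hypothetical stand-in for the course's rotavg.m)."""
    N = P.shape[0]
    f = np.arange(-N // 2, N // 2)              # frequency coordinates
    fx, fy = np.meshgrid(f, f)
    r = np.round(np.sqrt(fx**2 + fy**2)).astype(int)
    return np.array([P[r == k].mean() for k in range(N // 2 + 1)])

# Sanity check on a synthetic 1/f^2 power spectrum
N = 128
f = np.arange(-N // 2, N // 2)
fx, fy = np.meshgrid(f, f)
rho = np.sqrt(fx**2 + fy**2)
rho[N // 2, N // 2] = 1.0                       # avoid dividing by zero at DC
P = 1.0 / rho**2
Pf = rotavg(P)
f1 = np.arange(1, N // 2 + 1)                   # skip f = 0 for the log-log fit
slope = np.polyfit(np.log10(f1), np.log10(Pf[1:]), 1)[0]   # should be close to -2
```

The same binning logic is what a Matlab rotavg would do with accumarray or a loop over radii.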
Whitening
Now let's whiten the image by designing a filter in the frequency domain
that will flatten the spectrum of a natural image (on average). Since
the power spectrum tends to fall as 1/f^2, the amplitude
spectrum will fall as 1/f. So we want to design the amplitude
spectrum of our filter so that it rises linearly with frequency, to compensate
for the 1/f amplitude spectrum of natural images. To do this
we need to first define a grid of frequency coordinates:
[fx fy] = meshgrid(f);
Then convert to polar coordinates:
[theta rho]=cart2pol(fx,fy);
The filter is then
filtf = rho;
Plot the filter using
imagesc(f,f,filtf), axis xy
and you will see that it is a function that simply rises with frequency,
independent of orientation. But remember that due to noise, we don't
want to whiten all the way to the highest spatial-frequencies. Also,
because we are working in rectangular coordinates, there is a larger range
of spatial frequencies along the diagonals, so we'd like to constrict the
whitening function to lie within a constant radius just short of the Nyquist
frequency. To do this, we multiply by a circular, Gaussian filter as
follows:
filtf = rho.*exp(-0.5*(rho/(0.7*N/2)).^2);
Now plot out the filter again and you will see that it rises linearly to
a point and then begins to fall again. Take a rotational average and
plot as a function of one dimension and you will see more clearly how the
filter is shaped. To see what the filter looks like in the space domain,
you need to calculate the inverse Fourier transform:
filt = fftshift(real(ifft2(fftshift(filtf))));
You can view the filter as follows:
imagesc(filt,[-1 1]*max(filt(:)))
axis image, colorbar
You will need to zoom into the center with the magnifying glass to see the
filter. Note that it is basically a center-surround type filter. The
inhibitory surround may be a bit hard to see because its values are small
negative numbers, but it's basically analogous to what is found in the retina and LGN.
To get the full scoop on this story you should read the following paper:
Atick JJ (1992) Could information theory provide an ecological
theory of sensory processing? Network, 3, 213-251
Now let's see what happens when we apply our whitening filter to the image.
Remember that convolution in the space domain is multiplication in the
frequency domain, so we simply multiply the spectrum of the image with the
spectrum of the filter and take the inverse fft as follows:
imwf = filtf.*imf;
imw = real(ifft2(fftshift(imwf))); % real() discards tiny imaginary round-off
View the image
imagesc(imw), axis image
colormap gray
And look at its spectrum
Pwf=rotavg(abs(imwf).^2);
loglog(f1,Pwf)
Note that the spectrum is now pretty much flat, with simply a lowpass roll-off
at the high end of the spectrum.
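As an aside, the fact used above - convolution in the space domain equals multiplication in the frequency domain - is easy to check numerically. Here is a small Python/NumPy sketch (note that the DFT implements circular convolution, i.e., with wraparound at the boundaries):

```python
import numpy as np

rng = np.random.default_rng(0)
im = rng.standard_normal((8, 8))        # toy "image"
filt = rng.standard_normal((8, 8))      # toy filter

# Circular convolution computed directly in the space domain
direct = np.zeros((8, 8))
for dy in range(8):
    for dx in range(8):
        # shift the image by (dy, dx) with wraparound, weighted by the filter tap
        direct += filt[dy, dx] * np.roll(np.roll(im, dy, axis=0), dx, axis=1)

# The same thing via pointwise multiplication of the two spectra
via_fft = np.real(np.fft.ifft2(np.fft.fft2(im) * np.fft.fft2(filt)))

print(np.allclose(direct, via_fft))      # True
```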
There's one thing left to do: contrast normalization. It has been
shown that neurons in the retina and LGN already do some form of local contrast
normalization. One simple model of this process is to divide the output
of a neuron by some measure of the total activity in the neighborhood - for
example, standard deviation. Here's one way of doing this using a Gaussian
neighborhood with a diameter of 16 sample nodes:
D=16;
[x y] = meshgrid(-D/2:D/2-1);
G = exp(-0.5*((x.^2+y.^2)/(D/2)^2));
G = G/sum(G(:));
imv = conv2(imw.^2,G,'same');
imn = imw./sqrt(imv);
Now look at the contrast normalized image imn. You will note
that it brings out many of the details in the image.
Sparse coding
You will see that the whitened, contrast-normalized image above is far
from being structureless. The reason is that whitening is
simply a process for removing pairwise, linear correlations.
Structures such as lines and edges are characterized by their higher-order
statistics - i.e., non-linear statistical dependencies among three or more
pixels - and so these forms of structure will remain even after whitening.
So the idea is that the cortex focuses on representing these higher-order
forms of structure that are contained in the images coming from the LGN. One
way of learning higher-order structure is through sparse coding -
i.e., by finding a lossless representation of the image in which only a few
neurons need to be active at any given point in time. It is closely
related to independent components analysis (ICA) and projection
pursuit. Here we will follow the connection to projection pursuit.
The goal of projection pursuit is to find a low dimensional projection of
the data that yields a non-Gaussian distribution. In general, a random
projection of high-dimensional data will yield an approximately Gaussian
distribution (a consequence of the central limit theorem). But when
the linear subspace defining the projection is aligned with the structure
of the data, then the projection will be non-Gaussian. Sparse coding
can be thought of as a special case of projection pursuit in which the form
of non-Gaussianity is sparse - i.e., a distribution peaked at zero with heavy
tails with respect to a Gaussian. This means that the value of the projection
is typically around zero, but not in the trivial sense that it simply has
low variance. A population of neurons with such distributions would
exhibit a sparse code, because only a small fraction of the neurons will
be active.
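This claim - random projections look Gaussian, projections aligned with sparse structure look heavy-tailed - can be illustrated on synthetic data. The Python/NumPy sketch below mixes independent sparse (Laplacian) sources with an orthogonal matrix and uses excess kurtosis as the measure of non-Gaussianity (0 for a Gaussian, about 3 for a Laplacian); the mixing matrix and dimensions are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def excess_kurtosis(x):
    """0 for a Gaussian; positive for sparse (peaked, heavy-tailed) data."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2)**2 - 3.0

n, dim = 100_000, 32
S = rng.laplace(size=(n, dim))          # independent sparse (Laplacian) sources
A = np.linalg.qr(rng.standard_normal((dim, dim)))[0]  # orthogonal mixing matrix
X = S @ A                               # observed data: mixtures of the sources

# A random projection sums many independent sources -> approximately Gaussian
w = rng.standard_normal(dim)
w /= np.linalg.norm(w)
k_random = excess_kurtosis(X @ w)       # close to 0

# A projection aligned with the structure recovers one sparse source
k_aligned = excess_kurtosis(X @ A[0])   # A[0] unmixes source 0 exactly; ~3
```

The aligned projection comes out far more kurtotic (sparser) than the random one, which is exactly the signal that projection pursuit and sparse coding search for.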
We can think of a cortical simple cell as computing a one-dimensional projection
of the data if its instantaneous activity is essentially the result of computing
the inner-product between its weight vector (receptive field) and the image.
So, let's try designing different receptive fields for a model cortical
neuron and see what their projections look like. For starters, let's
make the receptive field an array of weight values with the same size D
as the neighborhood defined above for contrast normalization. We
can simulate the output of such a neuron when the image is scanned over its
receptive field by convolving the rf with the image. For example, consider
a neuron with initially random receptive field weights, w:
w = randn(D);
w = w - mean(w(:)); % zero mean
w = w/sqrt(sum(w(:).^2)); % normalizes the weight vector
You can do the convolution as follows:
imfilt = conv2(imn,w,'same');
Note that we use the contrast-normalized, whitened image, because we are
simulating a weighted sum of LGN inputs to the cortex. Look at the resulting
image as viewed through the rf of this neuron using imagesc. Then
compute the histogram using the hist function:
pvar=[-1:.01:1]*max(abs(imfilt(:)));
H=hist(imfilt(:),pvar);
Then compute a Gaussian of the same variance
sigma=std(imfilt(:));
mu=mean(imfilt(:));
P=exp(-0.5*((pvar-mu)/sigma).^2);
P=P/sum(P);
and plot both on a semilog plot as follows:
semilogy(pvar,H/N^2,pvar,P,'k--')
(You may have to use the axis command to get the dynamic range
right.) You should see that with random weights the distribution is
more or less Gaussian - i.e., an upside-down parabola on these semilog axes.
A random projection of the data (random receptive field) yields a Gaussian
distribution, as predicted from projection pursuit. So, what kind of
receptive field yields a non-Gaussian distribution? Let's try a Gabor
filter model of a cortical simple cell. A Gabor filter is simply a Gaussian
modulated sinusoid. We can make a vertically oriented Gabor as follows:
sigma_x = D/8;
sigma_y = D/8;
f_x = 1/(D/4);
g = exp(-0.5*((x/sigma_x).^2 + (y/sigma_y).^2)).*sin(2*pi*f_x*x);
g = g/sqrt(sum(g(:).^2)); % normalization
You can see what this looks like using imagesc. Then convolve
with the contrast normalized image:
imfilt = conv2(imn,g,'same');
Now look at the resulting image as viewed through the rf of this neuron,
and plot the histogram and compare to a Gaussian as before. You should
see that it is now distinctly non-Gaussian, and the specific way in which
it is non-Gaussian is such that it is peaked at zero with heavy tails. Another
way of saying this is that the neuron spends more time at zero than expected
from a Gaussian distribution. Try different parameter settings for the
Gabor filter to see if you can either accentuate or degrade the degree of
non-Gaussianity (sparsity) exhibited by the model neuron.
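Rather than judging sparsity by eye from the histograms, you can give each parameter setting a numerical score: the excess kurtosis of the response distribution, which is 0 for a Gaussian and grows as the distribution becomes more peaked and heavy-tailed. A hypothetical Python/NumPy version of the measure is below (in Matlab, kurtosis(imfilt(:)) - 3 does the same job if you have the Statistics Toolbox):

```python
import numpy as np

def excess_kurtosis(x):
    """Excess kurtosis: 0 for a Gaussian, positive for sparse distributions."""
    x = np.asarray(x, dtype=float).ravel()
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2)**2 - 3.0

# Quick calibration on known distributions
rng = np.random.default_rng(0)
k_gauss = excess_kurtosis(rng.standard_normal(200_000))   # near 0
k_laplace = excess_kurtosis(rng.laplace(size=200_000))    # near 3
```

Applying this to the filtered image for each Gabor parameter setting gives you an objective way to report which parameters maximize or minimize sparsity.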
For further reading on these issues see
Simoncelli EP, Olshausen BA (2001) Natural Image
Statistics and Neural Representation. Annual Review of Neuroscience,
24, 1193-1215.
http://www.cns.nyu.edu/pub/eero/simoncelli01-reprint.pdf
What you should turn in
You should turn in a lab writeup that contains:
- A set of images and their corresponding power spectra, with at least
one that demonstrates strong deviation from the 1/f^2 trend,
along with an explanation of why.
- The whitened, contrast-normalized image computed for a "good" 1/f^2 image
above. Use this same image for parts 3-5 below.
- The resulting response histogram, plus super-imposed Gaussian fit,
obtained using the random receptive field model.
- The resulting response histogram, plus super-imposed Gaussian fit,
obtained using a Gabor filter for which you have optimized the parameters
for sparsity. Show the Gabor and report what parameters you used.
- Same thing for Gabor filter parameters that minimize sparsity.
Make sure to use subplot to put multiple plots in one figure. Don't
turn in one plot per page. Make your report legible.