Lab #1 - MATLAB and modeling
2. This problem is designed to show the perils of over-fitting data, similar
to what was discussed in class. We provide a simulated dataset of "characteristic
frequency" (cf) and "bandwidth" (bw) for 200 auditory cells. With these data,
"cf" is considered the independent variable (i.e., the x-axis) and "bw" the
dependent variable (y-axis). You can download the dataset, hw1.mat, from the
class homepage. Start by plotting the data as individual data points.
Here are some commands to get started:
>> load hw1.mat
>> whos
>> plot(cf,bw,'.')
(a) The variables cfsub and bwsub contain a subset of seven points extracted from the total dataset. Create a linear fit to this subsample of the data using the MATLAB routine polyfit.
(b) Now create a 5th-order polynomial fit to the subsample.
(c) Plot both fits, shown as solid lines, along with the subsample, shown as individual data points, on the same plot. Which of these fits the subsample better?
(d) Now plot both of your function fits from (a) and (b) along with the entire dataset. Which of these fits the entire dataset better? What's going on?
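
If polyfit and polyval are new to you, the sketch below may help with the mechanics of parts (a) through (d). It is only an illustration: the vectors x and y are made-up values, not taken from hw1.mat, and the variable names p1, p5, and xx are arbitrary. For the assignment you would presumably substitute cfsub and bwsub (and cf and bw for the full dataset).

```matlab
% Hypothetical data for illustration only (NOT from hw1.mat)
x = [1 2 3 4 5 6 7];
y = [2.1 3.9 6.2 7.8 10.1 12.2 13.8];

% polyfit returns coefficients in descending powers of x
p1 = polyfit(x, y, 1);   % linear fit: [slope intercept]
p5 = polyfit(x, y, 5);   % 5th-order fit: six coefficients

% Evaluate each fit on a dense grid so the curves are drawn smoothly
xx = linspace(min(x), max(x), 200);
y1 = polyval(p1, xx);
y5 = polyval(p5, xx);

% Data as individual points, fits as solid lines, all on one plot
plot(x, y, 'o');
hold on
plot(xx, y1, '-');
plot(xx, y5, '-');
hold off
xlabel('x'); ylabel('y');
legend('data', 'linear fit', '5th-order fit');
```

Evaluating the fits on a dense grid (xx) rather than only at the data points matters here: a high-order polynomial can behave very differently between and beyond the sample points, and that behavior is invisible if the curve is only drawn at the samples.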