"Say we have sampled 10 points, and want to find the equation of the straight line that passes through all 10 points. The equation of a line has two parameters: $y = a_0 + a_1 x$. Each sampled data point can be used to give us an equation that could be used to find what the parameters of a straight line passing through that point are. But, we only have 2 parameters, but will obtain 10 equations from our sampled points. This means that our system is overdetermined. The chance that 10 sampled points will fall on the same line is effectively zero unless they have been chosen to do so (and so are not in fact sampled), so there is no way to solve this exactly.\n",

"Say we have sampled 10 points, and want to find the equation of the straight line that passes through all 10 points. The equation of a line has two parameters: $y = a_0 + a_1 x$. Each sampled data point gives us one equation that constrains the parameters of a straight line passing through that point. We only have 2 parameters, but will obtain 10 equations from our sampled points. This means that our system is overdetermined. The chance that 10 sampled points will fall on the same line is effectively zero unless they have been chosen to do so (and so are not in fact sampled), so there is no way to solve this exactly.\n",

"\n",

"Instead, we want to do the best we can and find the line that passes most closely to each of the sampled points. There are a number of ways we could choose to do this. The most commonly used approach is to find the deviation from the line at each of our sampled points, and find the set of parameters that minimizes the sum of the squares of these deviations, hence the term least squares fitting. This particular approach was first used by Gauss in determining the orbits of comets, and he showed that the least squares estimate coincides with maximum likelihood estimates for independent normally distributed errors.\n",

"\n",

...

...

@@ -52,7 +52,9 @@

{

"cell_type": "code",

"execution_count": null,

"metadata": {},

"metadata": {

"collapsed": true

},

"outputs": [],

"source": [

"# Let's create some M random data points as we did for the interpolation\n",

"When we fill in our x and y values in the expression for $D$, so we'll have two equations in two unknowns ($a_0$ and $a_1$). This is a linear system which we have already learned how to solve.\n",

"When we fill in our x and y values in the expression for $D$, we'll have two equations in two unknowns ($a_0$ and $a_1$). This is a linear system which we have already learned how to solve.\n",

"\n",

"Gathering the coefficients and simplifying (leaving out the summation limits to declutter) we get\n",

"$$ M a_0 + a_1 \\Sigma x_i = \\Sigma y_i $$\n",

...

...

@@ -88,7 +90,9 @@

{

"cell_type": "code",

"execution_count": null,

"metadata": {},

"metadata": {

"collapsed": true

},

"outputs": [],

"source": [

"# Recall we know how to solve the linear system Ax=b\n",

...

...

@@ -186,7 +190,9 @@

{

"cell_type": "code",

"execution_count": null,

"metadata": {},

"metadata": {

"collapsed": true

},

"outputs": [],

"source": [

"# We need to first get our data in the form expected by our function.\n",

...

...

@@ -316,7 +322,9 @@

{

"cell_type": "code",

"execution_count": null,

"metadata": {},

"metadata": {

"collapsed": true

},

"outputs": [],

"source": [

"# Let's reproduce our fit with np.polyfit.\n",

...

...

@@ -421,7 +429,9 @@

{

"cell_type": "code",

"execution_count": null,

"metadata": {},

"metadata": {

"collapsed": true

},

"outputs": [],

"source": [

"# First let's load our data and plot it.\n",

...

...

@@ -452,7 +462,9 @@

{

"cell_type": "code",

"execution_count": null,

"metadata": {},

"metadata": {

"collapsed": true

},

"outputs": [],

"source": [

"# We'll need some initial guesses for the fit.\n",