LEAST SQUARE METHOD FOR LINEAR REGRESSION

Article Summary by: writerwrites. Original Author: WRITERWRITES
In regression analysis, least squares, also known as ordinary least squares, is a method for linear regression that determines the values of unknown quantities in a statistical model by minimizing the sum of the squared residuals (the differences between the observed and predicted values). The method was first described by Carl Friedrich Gauss around 1794 (Linear Algebra With Applications, 3rd Edition, by Otto Bretscher).

The objective is to adjust a model function so that it best fits a data set. The chosen model function has adjustable parameters. The data set consists of n points $(x_i, y_i)$ with $i = 1, \dots, n$. The model function has the form $y = f(x, a)$, where $y$ is the dependent variable, $x$ are the independent variables, and $a$ are the adjustable parameters of the model. We wish to find the parameter values for which the model best fits the data according to a defined error criterion. The least-squares method minimizes the sum of squared errors $S = \sum_{i=1}^{n} (y_i - f(x_i, a))^2$ with respect to the adjustable parameters $a$.

For example, suppose the data are height measurements over a surface. We choose to model the data by a plane with parameters for the plane's mean height, tip angle, and tilt angle. The model equation is then $y = f(x_1, x_2) = a_1 + a_2 x_1 + a_3 x_2$, the independent variables are $x_1, x_2$, and the adjustable parameters are $a_1, a_2, a_3$.

In regression analysis, one replaces the relation $f(x_i) \approx y_i$ by $y_i = f(x_i) + \varepsilon_i$, where the noise term $\varepsilon_i$ is a random variable with mean zero. Note that we are assuming that the x values are exact and that all the errors are in the y values. We distinguish between linear regression, in which the function f is linear in the parameters to be determined (e.g., $f(x) = ax^2 + bx + c$), and nonlinear regression. Linear regression is much simpler than nonlinear regression. (It is tempting to think that the reason for the name linear regression is that the graph of the function $f(x) = ax + b$ is a line. But fitting a curve like $f(x) = ax^2 + bx + c$, when a, b, and c are estimated by least squares, is also an instance of linear regression, because the vector of least-squares estimates of a, b, and c is a linear transformation of the vector whose components are $f(x_i) + \varepsilon_i$.)
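As a rough illustration of the two models mentioned above, the plane $y = a_1 + a_2 x_1 + a_3 x_2$ and the quadratic $f(x) = ax^2 + bx + c$, the following sketch fits both with the same routine, since each is linear in its parameters. It is only a minimal sketch: the function names (`solve_linear_system`, `least_squares`, `sum_squared_error`) and all sample data are made up for illustration, and it solves the normal equations directly with plain Python, where a linear-algebra library would normally be used.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: the data do not determine the parameters")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(design_rows, y):
    """Return the parameters a minimizing sum_i (y_i - row_i . a)^2.

    Each design row holds the model's basis functions evaluated at one sample.
    """
    k = len(design_rows[0])
    # Normal equations: (X^T X) a = X^T y
    xtx = [[sum(r[i] * r[j] for r in design_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design_rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def sum_squared_error(design_rows, y, a):
    """Sum of squared differences between observed and predicted values."""
    return sum((yi - sum(rj * aj for rj, aj in zip(r, a))) ** 2
               for r, yi in zip(design_rows, y))


# Plane model y = a1 + a2*x1 + a3*x2 on made-up, noisy height measurements.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
heights = [1.1, 2.9, 0.6, 2.4, 4.6]
plane_rows = [[1.0, x1, x2] for x1, x2 in points]
a_plane = least_squares(plane_rows, heights)

# Quadratic model f(x) = a*x^2 + b*x + c on made-up, noise-free data.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3.0 * x * x - x + 2.0 for x in xs]
quad_rows = [[x * x, x, 1.0] for x in xs]
a_quad = least_squares(quad_rows, ys)

print("plane parameters (a1, a2, a3):", a_plane)
print("quadratic parameters (a, b, c):", a_quad)
```

The only difference between the two fits is how each design row is built, which is the sense in which the quadratic fit is still a linear regression.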
By recognizing that the regression model is a system of linear equations, we can express the model using a data matrix X, a target vector Y, and a parameter vector δ. The ith rows of X and Y contain the x and y values for the ith data sample. For the straight-line model with intercept $\delta_1$ and slope $\delta_2$, the model can be written as $y_i = \delta_1 + \delta_2 x_i + \varepsilon_i$ for $i = 1, \dots, n$, which in pure matrix notation becomes $Y = X\delta + \varepsilon$, where the ith row of X is $(1, x_i)$ and ε is normally distributed with expected value 0 (i.e., a column vector of 0s) and variance $\sigma^2 I_n$, where $I_n$ is the n×n identity matrix.

The least-squares estimator for δ is $\hat{\delta} = (X^T X)^{-1} X^T Y$ (where $X^T$ is the transpose of X), and the sum of squares of residuals is $Y^T (I_n - X (X^T X)^{-1} X^T) Y$. One of the properties of least squares is that the vector $X\hat{\delta}$ is the orthogonal projection of Y onto the column space of X. The fact that the matrix $X (X^T X)^{-1} X^T$ is a symmetric idempotent matrix is relied on repeatedly in proofs of theorems. The linearity of $\hat{\delta}$ as a function of the vector Y, expressed above by $\hat{\delta} = (X^T X)^{-1} X^T Y$, is the reason why this is called "linear" regression. Nonlinear regression uses nonlinear methods of estimation.

The matrix $I_n - X (X^T X)^{-1} X^T$ that appears above is a symmetric idempotent matrix of rank n − 2, because X has two linearly independent columns in the straight-line model. Here is an example of the use of that fact in the theory of linear regression. The finite-dimensional spectral theorem of linear algebra says that any real symmetric matrix M can be diagonalized by an orthogonal matrix G, i.e., the matrix $G^T M G$ is a diagonal matrix. If the matrix M is also idempotent, then the diagonal entries of $G^T M G$ must be idempotent numbers. Only two real numbers are idempotent: 0 and 1. So $I_n - X (X^T X)^{-1} X^T$, after diagonalization, has n − 2 ones and two zeros on the diagonal. That is most of the work in showing that the sum of squares of residuals, divided by $\sigma^2$, has a chi-square distribution with n − 2 degrees of freedom.
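The properties claimed for the matrix $I_n - X (X^T X)^{-1} X^T$ can be checked numerically for a small straight-line fit. The sketch below assumes a design matrix X whose ith row is $(1, x_i)$, so that $X^T X$ is 2×2 and can be inverted in closed form. It uses the fact that the rank of an idempotent matrix equals its trace. The function names (`hat_matrix`, `residual_projector`, `matmul`) and the sample data are invented for this illustration.

```python
def hat_matrix(xs):
    """H = X (X^T X)^{-1} X^T for the design matrix with rows (1, x_i)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx  # determinant of X^T X = [[n, sx], [sx, sxx]]
    if det == 0:
        raise ValueError("all x values are equal; slope and intercept are not identifiable")
    # (X^T X)^{-1} = (1/det) * [[sxx, -sx], [-sx, n]], hence
    # H[i][j] = (1, x_i) (X^T X)^{-1} (1, x_j)^T
    return [[(sxx - sx * (xi + xj) + n * xi * xj) / det for xj in xs] for xi in xs]


def residual_projector(xs):
    """M = I_n - H, the matrix that maps Y to the vector of residuals."""
    n = len(xs)
    H = hat_matrix(xs)
    return [[(1.0 if i == j else 0.0) - H[i][j] for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Made-up sample data for a straight-line fit.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.9, 5.2, 6.8, 9.1]
n = len(xs)

M = residual_projector(xs)
MM = matmul(M, M)

is_symmetric = all(abs(M[i][j] - M[j][i]) < 1e-12 for i in range(n) for j in range(n))
is_idempotent = all(abs(MM[i][j] - M[i][j]) < 1e-12 for i in range(n) for j in range(n))
trace_M = sum(M[i][i] for i in range(n))  # equals rank(M) because M is idempotent

# Sum of squared residuals as the quadratic form Y^T M Y ...
rss_quadratic_form = sum(ys[i] * M[i][j] * ys[j] for i in range(n) for j in range(n))

# ... and directly from the fitted line, with (intercept, slope) = (X^T X)^{-1} X^T Y.
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
det = n * sxx - sx * sx
intercept = (sxx * sy - sx * sxy) / det
slope = (n * sxy - sx * sy) / det
rss_direct = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

print("symmetric:", is_symmetric, "idempotent:", is_idempotent)
print("trace of I - H:", trace_M, "expected n - 2 =", n - 2)
print("residual sum of squares:", rss_quadratic_form, rss_direct)
```

The two residual sums agree up to rounding, which shows the quadratic form and the direct computation describe the same quantity.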
Published: September 30, 2007   