
Multiple linear regression concepts

So far, we have solved simple linear regression problems, which study the relationship between a dependent variable, y, and an independent variable, x, based on the following regression equation:

$$y = \beta_0 + \beta_1 x + \epsilon$$

In this equation, the explanatory variable is represented by x and the response variable is represented by y. To solve this problem, we used the least squares method, which finds the best fit by minimizing the sum of the squared vertical distances from each data point to the line. As mentioned previously, a variable rarely depends on a single other variable; usually, the response variable depends on at least two predictors. In practice, we will have to create models in which the response variable depends on more than one predictor. These models are known as multiple linear regression models, a straightforward generalization of single-predictor models, in which the dependent variable is related to two or more independent variables.
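
To make the single-predictor case concrete before generalizing it, here is a minimal NumPy sketch of the least squares fit; the data values are made up purely for illustration:

```python
import numpy as np

# Made-up sample data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least squares estimates for the slope and intercept
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(slope, intercept)  # close to 2 and 0 for this data
```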

The general model for n predictor variables has the following form:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon$$

Here, $x_1, x_2, \dots, x_n$ are the n predictors and y is the only response variable. Each coefficient $\beta_i$ measures the change in y associated with a unit change in $x_i$, keeping all the other variables constant. The simple linear regression model finds the straight line that best fits the data. A multiple linear regression model with two independent variables, on the other hand, finds the plane that best fits the data; with more predictors, this generalizes to a hyperplane. The goal is to find the surface that minimizes the overall squared distance between itself and the observed values of the response variable.
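
As a rough sketch of this idea in Keras, a single Dense unit with no activation is equivalent to a multiple linear regression model: it learns one weight per predictor plus a bias term that plays the role of the intercept. The synthetic data and hyperparameters below are illustrative assumptions, not values from the text:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Synthetic data: y = 3*x1 - 2*x2 + 1 plus a little noise (made up for illustration)
rng = np.random.RandomState(42)
X = rng.uniform(-1, 1, size=(500, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 1 + rng.normal(0, 0.05, 500)

# One Dense unit with a linear output learns one weight per predictor plus a bias
model = Sequential()
model.add(Dense(1, input_dim=2))
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(X, y, epochs=200, batch_size=32, verbose=0)

weights, bias = model.get_weights()
print(weights.ravel(), bias)  # should approach [3, -2] and [1]
```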

To estimate β, similarly to what we did in the simple linear regression case, we want to minimize the following sum of squared residuals over the m available observations, with respect to all possible values of the intercept and slopes:

$$\sum_{i=1}^{m} \left( y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \dots - \beta_n x_{in} \right)^2$$

Just as we did in the case of simple linear regression, we can represent the model in matrix form by writing one equation per observation, as follows:

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1n} \\ 1 & x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_m \end{pmatrix}$$

We can name the terms contained in this formula as follows:

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1n} \\ 1 & x_{21} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m1} & \cdots & x_{mn} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{pmatrix}, \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_m \end{pmatrix}$$

This can be reexpressed using a condensed formulation:

$$Y = X\beta + \epsilon$$

Finally, to determine the intercept and slopes through the least squares method, we have to solve the previous equation with respect to β. This yields the normal equation for the coefficient estimates:

$$\hat{\beta} = (X^T X)^{-1} X^T Y$$

Basically, this represents, in matrix form, the same least squares solution that we looked at previously. From it, we can calculate the intercept and the slopes of the regression surface.
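
As an illustration, the normal equation can be evaluated directly with NumPy; the synthetic data below is assumed purely for demonstration:

```python
import numpy as np

# Made-up design: 100 observations, 2 predictors, true coefficients [1, 3, -2]
rng = np.random.RandomState(0)
predictors = rng.uniform(-1, 1, size=(100, 2))
X = np.column_stack([np.ones(100), predictors])  # column of ones for the intercept
y = X @ np.array([1.0, 3.0, -2.0]) + rng.normal(0, 0.1, 100)

# Normal equation: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta_hat)  # close to [1, 3, -2]

# In practice, a least squares solver is preferred over an explicit inverse
# for numerical stability; it produces the same estimates
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```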