The methods that we have discussed so far in this chapter have involved fitting linear regression models, via least squares or a shrunken approach, using the original predictors $X_1, X_2, \ldots, X_p$.
We now explore a class of approaches that transform the predictors and then fit a least squares model using the transformed variables. We will refer to these techniques as dimension reduction methods.
- Let $Z_1, Z_2, \ldots, Z_M$ represent $M < p$ linear combinations of our original $p$ predictors. That is, $Z_m = \sum_{j=1}^{p} \phi_{jm} X_j$ for some constants $\phi_{1m}, \phi_{2m}, \ldots, \phi_{pm}$, $m = 1, \ldots, M$.
- We can then fit the linear regression model $y_i = \theta_0 + \sum_{m=1}^{M} \theta_m z_{im} + \epsilon_i$, $i = 1, \ldots, n$, using ordinary least squares.
Note that in this model, the regression coefficients are given by $\theta_0, \theta_1, \ldots, \theta_M$. If the constants $\phi_{jm}$ are chosen wisely, then such dimension reduction approaches can often outperform ordinary least squares (OLS) regression on the original predictors.
Notice that from the definition of $Z_m$, $\sum_{m=1}^{M} \theta_m z_{im} = \sum_{m=1}^{M} \theta_m \sum_{j=1}^{p} \phi_{jm} x_{ij} = \sum_{j=1}^{p} \beta_j x_{ij}$, where $\beta_j = \sum_{m=1}^{M} \theta_m \phi_{jm}$. Hence, the dimension reduction model can be thought of as a special case of the original linear regression model.
Dimension reduction serves to constrain the estimated coefficients, since now they must take the form $\beta_j = \sum_{m=1}^{M} \theta_m \phi_{jm}$.
This constraint has the potential to bias the coefficient estimates, but when $p$ is large relative to $n$, choosing $M \ll p$ can substantially reduce the variance of the fit, so dimension reduction can win in the bias-variance tradeoff.
All dimension reduction methods work in two steps. First, the transformed predictors $Z_1, Z_2, \ldots, Z_M$ are obtained. Second, the model is fit using these $M$ predictors. However, the choice of $Z_1, Z_2, \ldots, Z_M$, or equivalently the selection of the $\phi_{jm}$'s, can be achieved in different ways. We will consider two approaches for this task: principal components and partial least squares.
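A minimal sketch of this two-step recipe in Python with NumPy; the matrix `Phi` of constants $\phi_{jm}$ is random purely as a placeholder, since the two methods below differ only in how those constants are chosen:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, M = 100, 10, 3

# Toy data: n observations of p predictors and a response y.
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

# Step 1: transform the predictors. Phi is a p x M matrix of constants phi_{jm};
# here it is random as a stand-in -- PCR and PLS differ only in how it is chosen.
Phi = rng.normal(size=(p, M))
Z = X @ Phi                                     # n x M transformed predictors

# Step 2: fit y ~ Z by ordinary least squares.
Z1 = np.column_stack([np.ones(n), Z])           # add an intercept column
theta, *_ = np.linalg.lstsq(Z1, y, rcond=None)  # theta_0, theta_1, ..., theta_M

# Implied coefficients on the original predictors: beta_j = sum_m theta_m * phi_{jm}.
beta = Phi @ theta[1:]
print(theta.shape, beta.shape)                  # (M + 1,), (p,)
```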
Principal Components Regression
- Here we apply principal components analysis (PCA) to define the linear combinations of the predictors, for use in our regression.
- The first principal component is that (normalized) linear combination of the variables with the largest variance (written out explicitly after this list).
- The second principal component has largest variance, subject to being uncorrelated with the first.
- And so on.
- Hence with many correlated original variables, we replace them with a small set of principal components that capture their joint variation.
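Written out explicitly, using the same constants $\phi_{jm}$ as above, the first principal component is

$$Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \cdots + \phi_{p1} X_p,$$

where the loadings $\phi_{11}, \ldots, \phi_{p1}$ are chosen to maximize $\operatorname{Var}(Z_1)$ subject to the normalization $\sum_{j=1}^{p} \phi_{j1}^2 = 1$; the second component maximizes variance under the same normalization plus the constraint of being uncorrelated with $Z_1$.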
With two-dimensional data, we can construct at most two principal components. However, if we had more predictors, then additional components could be constructed. These would successively maximize variance, subject to the constraint of being uncorrelated with the preceding components.
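A minimal PCR sketch using scikit-learn's PCA and LinearRegression in a Pipeline; the toy data and the choice $M = 3$ are placeholders, and in practice $M$ would be chosen by cross-validation:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Toy data (same shape as the sketch above).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(size=100)

# PCR = standardize, keep the first M principal components, then fit OLS on them.
M = 3
pcr = Pipeline([
    ("scale", StandardScaler()),    # PCA is sensitive to the scale of the predictors
    ("pca", PCA(n_components=M)),   # step 1: unsupervised directions Z_1, ..., Z_M
    ("ols", LinearRegression()),    # step 2: least squares on the components
])

# Cross-validated error for this M; in practice one would loop over M and pick the best.
scores = cross_val_score(pcr, X, y, cv=5, scoring="neg_mean_squared_error")
print(f"M={M}: CV MSE = {-scores.mean():.3f}")
```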
Partial Least Squares
PCR identifies linear combinations, or directions, that best represent the predictors $X_1, \ldots, X_p$. These directions are identified in an unsupervised way, since the response $Y$ is not used to help determine the principal component directions.
Consequently, PCR suffers from a potentially serious drawback: there is no guarantee that the directions that best explain the predictors will also be the best directions to use for predicting the response.
- Like PCR, PLS is a dimension reduction method, which first identifies a new set of features $Z_1, \ldots, Z_M$ that are linear combinations of the original features, and then fits a linear model via OLS using these $M$ new features.
- But unlike PCR, PLS identifies these new features in a supervised way; that is, it makes use of the response $Y$ in order to identify new features that not only approximate the old features well, but also are related to the response.
- Roughly speaking, the PLS approach attempts to find directions that help explain both the response and predictors.
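A matching sketch for PLS using scikit-learn's PLSRegression, again with placeholder toy data and $M = 3$:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

# Same toy data as in the PCR sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(size=100)

# PLS finds M directions using both X and y, then fits a linear model on them.
# PLSRegression standardizes the predictors internally by default (scale=True),
# and n_components plays the role of M.
M = 3
pls = PLSRegression(n_components=M)

scores = cross_val_score(pls, X, y, cv=5, scoring="neg_mean_squared_error")
print(f"M={M}: CV MSE = {-scores.mean():.3f}")
```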