plsregress
Calculate partial least squares regression using the SIMPLS algorithm.
plsregress first centers X and
Y by subtracting their column means. However,
it does not rescale the columns. To perform partial least squares regression
with standardized variables, use zscore to normalize X and
Y before calling plsregress.
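As a minimal sketch of the standardization step described above, the following fits a model on z-scored variables. The data are synthetic and the variable names (Xz, Yz) are chosen here for illustration only; the example assumes the statistics package is installed.

```octave
pkg load statistics

## Synthetic data, used here only for illustration
randn ("seed", 42);
X = randn (30, 4) .* [1, 10, 100, 1000];   # columns on very different scales
Y = X * [1; 0.1; 0.01; 0.001] + 0.1 * randn (30, 1);

## Standardize every column to zero mean and unit standard deviation
Xz = zscore (X);
Yz = zscore (Y);

## Fit the PLS model on the standardized variables with 2 components
[xload, yload] = plsregress (Xz, Yz, 2);
```

Without this step, predictors measured on large scales tend to dominate the components, because plsregress only centers the columns.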
[xload, yload] = plsregress (X, Y) computes a
partial least squares regression of Y on X using NCOMP
PLS components, where NCOMP defaults to
min (size (X, 1) - 1, size (X, 2)), and returns
the predictor and response loadings in xload and yload,
respectively.
[xload, yload] = plsregress (X, Y,
NCOMP) defines the desired number of PLS components to use in the
regression. NCOMP, a scalar positive integer, must not exceed the
default calculated value.
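The bound on NCOMP can be checked before calling the function. The sketch below uses synthetic data with more predictors than observations, so the limit comes from the number of rows; max_ncomp is a name introduced here for illustration.

```octave
pkg load statistics

randn ("seed", 1);
X = randn (8, 12);          # 8 observations, 12 predictors
Y = randn (8, 1);

## Upper bound on NCOMP, as given in the description above
max_ncomp = min (size (X, 1) - 1, size (X, 2));   # min (7, 12)

## Request fewer components than the bound
ncomp = 3;
[xload, yload] = plsregress (X, Y, ncomp);

## Each loading matrix has one column per PLS component
size (xload)
size (yload)
```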
[xload, yload, xscore, yscore, coef,
pctVar, mse, stats] = plsregress (X, Y,
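NCOMP) also returns the following arguments:
| Name | Description |
|---|---|
| xscore | Predictor scores. |
| yscore | Response scores. |
| coef | Matrix of coefficients for the PLS regression, with the intercept terms in the first row. |
| pctVar | Two-row matrix with the proportion of variance explained by each PLS component, with the first row for the predictor variables in X and the second row for the response variable(s) in Y. |
| mse | Two-row matrix of estimated mean squared errors for PLS models with 0:NCOMP components, with the first row containing the mean squared errors for the predictor variables in X and the second row containing the mean squared errors for the response variable(s) in Y. |
| stats | Structure with the fields listed below. |
stats.W is a matrix of PLS weights.
stats.T2 is the T^2 statistic for each point in
xscore.
stats.Xresiduals is a matrix with the
predictor residuals.
stats.Yresiduals is a matrix with the
response residuals.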
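To make the shapes of these outputs concrete, here is a small sketch on synthetic data. It relies only on what is described above and on the usage in the examples on this page (the intercept in the first row of coef, and the second row of pctVar referring to the response); Yfit is a name introduced here.

```octave
pkg load statistics

randn ("seed", 2);
X = randn (40, 6);
Y = X(:, 1:2) * [2; -1] + 0.2 * randn (40, 1);

ncomp = 4;
[xload, yload, xscore, yscore, coef, pctVar, mse, stats] = ...
  plsregress (X, Y, ncomp);

## mse has one column per model size, from 0 to ncomp components
size (mse)

## Cumulative proportion of variance in Y explained by 1..ncomp components
cumsum (pctVar(2,:))

## Fitted response: prepend a column of ones for the intercept row of coef
Yfit = [ones(rows (X), 1), X] * coef;
```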
[…] = plsregress (…, Name, Value, …)
specifies one or more of the following Name/Value pairs:
| Name | Value |
|---|---|
| "CV" | The method used to compute mse. When Value is a positive integer k, plsregress uses k-fold cross-validation. Set Value to a cross-validation partition, created using cvpartition, to use other forms of cross-validation. Set Value to "resubstitution" to use both X and Y to fit the model and to estimate the mean squared errors, without cross-validation. By default, Value = "resubstitution". |
| "MCReps" | A positive integer indicating the number of Monte-Carlo repetitions for cross-validation. By default, Value = 1. A different "MCReps" value is only meaningful when using the "HoldOut" method for cross-validation, previously set by a cvpartition object. If no cross-validation method is used, then "MCReps" must be 1. |
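One common use of the "CV" option is to pick the number of components from the cross-validated error. The following is a sketch under stated assumptions: the data are synthetic, 5 folds is an arbitrary choice, and mse_cv and best_ncomp are names introduced here. Because the columns of mse correspond to 0:NCOMP components, the column index is shifted by one.

```octave
pkg load statistics

randn ("seed", 3);
rand ("seed", 3);
X = randn (50, 8);
Y = X(:, 1:3) * [1; -2; 0.5] + 0.3 * randn (50, 1);

## Estimate the mean squared errors with 5-fold cross-validation
ncomp = 6;
[~, ~, ~, ~, ~, ~, mse_cv] = plsregress (X, Y, ncomp, "CV", 5);

## Second row holds the response errors; column k corresponds to k-1 components
[~, idx] = min (mse_cv(2,:));
best_ncomp = idx - 1;
```

With the default "resubstitution" setting the response error generally keeps shrinking as components are added, so this kind of selection is usually only informative with cross-validation.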
Further information about the PLS regression can be found at https://en.wikipedia.org/wiki/Partial_least_squares_regression
Source Code: plsregress
## Perform Partial Least-Squares Regression
## Load the spectra data set and use the near infrared (NIR) spectral
## intensities (NIR) as the predictor and the corresponding octane
## ratings (octane) as the response.
load spectra
## Perform PLS regression with 10 components
[xload, yload, xscore, yscore, coef, pctVar] = plsregress (NIR, octane, 10);
## Plot the percentage of explained variance in the response variable
## (PCTVAR) as a function of the number of components.
plot (1:10, cumsum (100 * pctVar(2,:)), "-ro");
xlim ([1, 10]);
xlabel ("Number of PLS components");
ylabel ("Percentage of Explained Variance in octane");
title ("Explained Variance per PLS component");
## Compute the fitted response and display the residuals.
octane_fitted = [ones(size(NIR,1),1), NIR] * coef;
residuals = octane - octane_fitted;
figure
stem (residuals, "color", "r", "markersize", 4, "markeredgecolor", "r")
xlabel ("Observations");
ylabel ("Residuals");
title ("Residuals in octane's fitted response");
## Calculate Variable Importance in Projection (VIP) for PLS Regression
## Load the spectra data set and use the near infrared (NIR) spectral
## intensities (NIR) as the predictor and the corresponding octane
## ratings (octane) as the response. Variables with a VIP score greater than
## 1 are considered important for the projection of the PLS regression model.
load spectra
## Perform PLS regression with 10 components
[xload, yload, xscore, yscore, coef, pctVar, mse, stats] = ...
plsregress (NIR, octane, 10);
## Calculate the normalized PLS weights
W0 = stats.W ./ sqrt(sum(stats.W.^2,1));
## Calculate the VIP scores for 10 components
npred = size (xload, 1);   # number of predictor variables (rows of xload)
SS = sum (xscore .^ 2, 1) .* sum (yload .^ 2, 1);
VIPscore = sqrt (npred * sum (SS .* (W0 .^ 2), 2) ./ sum (SS, 2));
## Find variables with a VIP score greater than or equal to 1
VIPidx = find (VIPscore >= 1);
## Plot the VIP scores
scatter (1:length (VIPscore), VIPscore, "xb");
hold on
scatter (VIPidx, VIPscore (VIPidx), "xr");
plot ([1, length(VIPscore)], [1, 1], "--k");
hold off
axis ("tight");
xlabel ("Predictor Variables");
ylabel ("VIP scores");
title ("VIP scores for each predictor variable with 10 components");
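
For reference, the VIP computation in the example above can be written as a formula. The notation is introduced here and is not part of the function's documentation: p is the number of predictor variables (size (xload, 1)), A is the number of components, w_a is the a-th column of stats.W, t_{ia} are the entries of xscore, and q_{ka} are the entries of yload.

```latex
\mathrm{VIP}_j
  = \sqrt{\, p \,
      \frac{\sum_{a=1}^{A} SS_a \left( \dfrac{w_{ja}}{\lVert w_a \rVert} \right)^{2}}
           {\sum_{a=1}^{A} SS_a} },
\qquad
SS_a = \Big( \sum_{i} t_{ia}^{2} \Big) \Big( \sum_{k} q_{ka}^{2} \Big)
```

Here SS_a corresponds to the variable SS in the code, and the ratio w_{ja} / \lVert w_a \rVert corresponds to the entries of W0.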