# SuanShu, a Java numerical and statistical library

com.numericalmethod.suanshu.stats.pca

## Class PCAbyEigen

• All Implemented Interfaces:
PCA

public class PCAbyEigen
extends Object
This class performs Principal Component Analysis (PCA) on a data matrix by eigen decomposition of its correlation or covariance matrix. Each eigenvalue is proportional to the portion of the "variance" (more precisely, of the sum of the squared distances of the points from their multidimensional mean) that is associated with the corresponding eigenvector. The sum of all the eigenvalues equals the sum of the squared distances of the points from their multidimensional mean, divided by the divisor used for the covariance matrix.
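
To make the eigenvalue statement concrete, the following minimal sketch computes the eigenvalues of a 2 × 2 symmetric matrix in closed form and compares their sum with the trace of the matrix (the total variance). It is plain Java written for illustration, not SuanShu code; the class name EigenSumSketch, the method eigenvalues2x2, and the example matrix are assumptions made for this example.

```java
// Illustrative sketch only; not SuanShu code.
final class EigenSumSketch {

    /** Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first. */
    static double[] eigenvalues2x2(double a, double b, double c) {
        double mid = (a + c) / 2.0;                    // half of the trace
        double radius = Math.hypot((a - c) / 2.0, b);  // distance of each eigenvalue from mid
        return new double[] {mid + radius, mid - radius};
    }

    public static void main(String[] args) {
        // Hypothetical covariance matrix: var(x) = 4, var(y) = 1, cov(x, y) = 1.5.
        double[] lambda = eigenvalues2x2(4.0, 1.5, 1.0);
        double totalVariance = 4.0 + 1.0; // trace of the covariance matrix
        System.out.println("eigenvalues:        " + lambda[0] + ", " + lambda[1]);
        System.out.println("sum of eigenvalues: " + (lambda[0] + lambda[1]));
        System.out.println("total variance:     " + totalVariance);
    }
}
```

The sum of the two eigenvalues always equals a + c, so the decomposition redistributes the total variance among the components without changing it.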

PCA essentially rotates the set of points around their mean in order to align with the principal components. This moves as much of the variance as possible (using an orthogonal transformation) into the first few dimensions. The values in the remaining dimensions, therefore, tend to be small and may be dropped with minimal loss of information.

The equivalent R function is princomp. The main difference is that this class uses the divisor (nObs - 1), rather than nObs, for the sample covariance matrix.
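
The effect of the divisor can be seen on a single variable. The sketch below is a standalone illustration, not SuanShu code; the class name DivisorSketch, the method variance, and the sample values are made up for this example.

```java
// Illustrative sketch only; not SuanShu code.
final class DivisorSketch {

    /** Sum of squared deviations from the mean, divided by the given divisor. */
    static double variance(double[] values, int divisor) {
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;

        double sumOfSquares = 0.0;
        for (double v : values) {
            sumOfSquares += (v - mean) * (v - mean);
        }
        return sumOfSquares / divisor;
    }

    public static void main(String[] args) {
        double[] x = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0}; // made-up sample
        int nObs = x.length;
        double withNMinus1 = variance(x, nObs - 1); // divisor described for this class
        double withN = variance(x, nObs);           // divisor used by R's princomp
        System.out.println("divisor nObs - 1: " + withNMinus1);
        System.out.println("divisor nObs:     " + withN);
        System.out.println("ratio:            " + withNMinus1 / withN);
    }
}
```

The two variances differ by the factor nObs / (nObs - 1), so standard deviations computed from them differ by the square root of that factor. For large samples the difference is small.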

• K. V. Mardia, J. T. Kent and J. M. Bibby, "Multivariate Analysis," London, Academic Press, 1979.
• W. N. Venables and B. D. Ripley, "Modern Applied Statistics with S," New York, Springer-Verlag, 2002.
• Wikipedia: Principal component analysis
• ### Constructor Summary

| Constructor | Description |
|---|---|
| PCAbyEigen(Matrix data) | Performs Principal Component Analysis, using the eigen method and the correlation matrix computed from the data, on a data matrix. |
| PCAbyEigen(Matrix data, boolean correlation) | Performs Principal Component Analysis, using the eigen method, on a data matrix. |
| PCAbyEigen(Matrix data, boolean correlation, Matrix V) | Performs Principal Component Analysis, using the eigen method, on a data matrix with an optional correlation (or covariance) matrix provided. |
• ### Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| DenseVector | cumulativeProportionVar() | Gets the cumulative proportion of overall variance explained by the principal components. |
| ImmutableMatrix | data() | Gets the original data matrix. |
| Eigen | eigen() | Gets the eigenvalue decomposition of the correlation (or covariance) matrix. |
| Vector | loading(int i) | Gets the loading vector of the i-th principal component. |
| Matrix | loadings() | Gets the matrix of variable loadings. |
| Vector | mean() | Gets the sample means that were subtracted. |
| int | nFactors() | Gets the number of variables in the original data. |
| int | nObs() | Gets the number of observations in the original data; sample size. |
| Vector | proportionVar() | Gets the proportion of overall variance explained by each of the principal components. |
| double | proportionVar(int i) | Gets the proportion of overall variance explained by the i-th principal component. |
| Vector | scale() | Gets the scalings applied to each variable. |
| Matrix | scores() | Gets the scores of the supplied data on the principal components. |
| double | sdPrincipalComponent(int i) | Gets the standard deviation of the i-th principal component. |
| Vector | sdPrincipalComponents() | Gets the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the correlation (or covariance) matrix). |
| Matrix | V() | Gets the correlation (or covariance) matrix used by the PCA. |
| Matrix | X() | Gets the (possibly centered and/or scaled) data matrix X used for the PCA. |
• ### Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
• ### Constructor Detail

• #### PCAbyEigen

public PCAbyEigen(Matrix data,
boolean correlation,
Matrix V)
Performs Principal Component Analysis, using the eigen method, on a data matrix with an optional correlation (or covariance) matrix provided.
Parameters:
data - an (nObs × nFactors) numeric matrix that represents the original data
correlation - true to use the correlation matrix (preferred), false to use the covariance matrix (N.B. the correlation matrix can only be used if there is no constant variable)
V - an optional correlation (or covariance) matrix; if supplied, it is used instead of the correlation (or covariance) matrix computed from the centered (and possibly scaled) data
• #### PCAbyEigen

public PCAbyEigen(Matrix data,
boolean correlation)
Performs Principal Component Analysis, using the eigen method, on a data matrix.
Parameters:
data - an (nObs × nFactors) numeric matrix that represents the original data
correlation - true to use the correlation matrix (preferred), false to use the covariance matrix (N.B. the correlation matrix can only be used if there is no constant variable)
• #### PCAbyEigen

public PCAbyEigen(Matrix data)
Performs Principal Component Analysis, using the eigen method and the correlation matrix computed from the data, on a data matrix.
Parameters:
data - an (nObs × nFactors) numeric matrix that represents the original data
• ### Method Detail

• #### scale

public Vector scale()
Gets the scalings applied to each variable. If the covariance matrix is used instead of the (preferred) correlation matrix, no scaling is performed.
Specified by:
scale in interface PCA
Returns:
the scalings applied to each variable in the original data
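
As a rough illustration of centering and scaling, the sketch below standardizes each column of a small data matrix. It assumes that the scalings are the sample standard deviations with divisor (nObs - 1); this page does not state how SuanShu computes them, so treat that as an assumption. The class ScaleSketch and its methods are hypothetical names, not part of the library.

```java
// Illustrative sketch only; not SuanShu code.
final class ScaleSketch {

    /** Column means of a data matrix whose rows are observations. */
    static double[] columnMeans(double[][] data) {
        int nFactors = data[0].length;
        double[] means = new double[nFactors];
        for (double[] row : data) {
            for (int j = 0; j < nFactors; j++) {
                means[j] += row[j];
            }
        }
        for (int j = 0; j < nFactors; j++) {
            means[j] /= data.length;
        }
        return means;
    }

    /** Column standard deviations with divisor (nObs - 1); an assumed choice of scaling. */
    static double[] columnSds(double[][] data, double[] means) {
        double[] sds = new double[means.length];
        for (double[] row : data) {
            for (int j = 0; j < means.length; j++) {
                double d = row[j] - means[j];
                sds[j] += d * d;
            }
        }
        for (int j = 0; j < means.length; j++) {
            sds[j] = Math.sqrt(sds[j] / (data.length - 1));
        }
        return sds;
    }

    /** Subtracts the column mean and divides by the column scaling. */
    static double[][] standardize(double[][] data, double[] means, double[] scales) {
        double[][] x = new double[data.length][means.length];
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < means.length; j++) {
                x[i][j] = (data[i][j] - means[j]) / scales[j];
            }
        }
        return x;
    }
}
```

A constant column has a standard deviation of zero, so dividing by it is undefined. This is consistent with the constructor note that the correlation matrix can only be used if there is no constant variable.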
• #### V

public Matrix V()
Gets the correlation (or covariance) matrix used by the PCA.
Returns:
the correlation (or covariance) matrix
• #### eigen

public Eigen eigen()
Gets the eigenvalue decomposition of the correlation (or covariance) matrix.
Returns:
the eigenvalue decomposition of the correlation (or covariance) matrix
• #### sdPrincipalComponents

public Vector sdPrincipalComponents()
Gets the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the correlation (or covariance) matrix).
Returns:
the standard deviations of the principal components
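
The relationship stated above, standard deviation equals the square root of the eigenvalue, can be sketched in a few lines. This is a standalone illustration with hypothetical names (SdSketch, sdFromEigenvalues), not the library's implementation; clamping small negative eigenvalues to zero is a defensive choice made for this sketch.

```java
// Illustrative sketch only; not SuanShu code.
final class SdSketch {

    /** Square roots of the eigenvalues, in the same order as the input. */
    static double[] sdFromEigenvalues(double[] eigenvalues) {
        double[] sd = new double[eigenvalues.length];
        for (int i = 0; i < eigenvalues.length; i++) {
            // Round-off can make a zero eigenvalue slightly negative; clamp before the root.
            sd[i] = Math.sqrt(Math.max(eigenvalues[i], 0.0));
        }
        return sd;
    }

    public static void main(String[] args) {
        double[] eigenvalues = {4.0, 1.0, 0.25}; // made-up eigenvalues, largest first
        for (double sd : sdFromEigenvalues(eigenvalues)) {
            System.out.println(sd);
        }
    }
}
```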

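• #### loadings

public Matrix loadings()
Description copied from interface: PCA
Gets the matrix of variable loadings. The signs of the columns of the loadings are arbitrary.
Specified by:
loadings in interface PCA
Returns:
the matrix of variable loadings
• #### loading

public Vector loading(int i)
Description copied from interface: PCA
Gets the loading vector of the i-th principal component.
Specified by:
loading in interface PCA
Parameters:
i - an index, counting from 1
Returns:
the loading vector of the i-th principal component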
• #### proportionVar

public Vector proportionVar()
Description copied from interface: PCA
Gets the proportion of overall variance explained by each of the principal components.
Specified by:
proportionVar in interface PCA
Returns:
the proportion of overall variance explained by each of the principal components
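
A minimal sketch of the conventional calculation: each proportion is an eigenvalue divided by the sum of all eigenvalues. The class ProportionSketch and its method are hypothetical names used for illustration; this page does not show how the library computes the values.

```java
// Illustrative sketch only; not SuanShu code.
final class ProportionSketch {

    /** Divides each eigenvalue by the sum of all eigenvalues. */
    static double[] proportionVar(double[] eigenvalues) {
        double total = 0.0;
        for (double lambda : eigenvalues) {
            total += lambda;
        }
        double[] proportions = new double[eigenvalues.length];
        for (int i = 0; i < eigenvalues.length; i++) {
            proportions[i] = eigenvalues[i] / total;
        }
        return proportions;
    }

    public static void main(String[] args) {
        double[] eigenvalues = {3.0, 1.0}; // made-up eigenvalues
        for (double p : proportionVar(eigenvalues)) {
            System.out.println(p);
        }
    }
}
```

By construction the proportions are non-negative for non-negative eigenvalues and sum to 1.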
• #### data

public ImmutableMatrix data()
Gets the original data matrix.
Returns:
the original data matrix
• #### nObs

public int nObs()
Description copied from interface: PCA
Gets the number of observations in the original data; sample size.
Specified by:
nObs in interface PCA
Returns:
nObs, the number of observations in the original data
• #### nFactors

public int nFactors()
Description copied from interface: PCA
Gets the number of variables in the original data.
Specified by:
nFactors in interface PCA
Returns:
nFactors, the number of variables in the original data
• #### mean

public Vector mean()
Description copied from interface: PCA
Gets the sample means that were subtracted.
Specified by:
mean in interface PCA
Returns:
the sample means of each variable in the original data
• #### X

public Matrix X()
Description copied from interface: PCA
Gets the (possibly centered and/or scaled) data matrix X used for the PCA.
Specified by:
X in interface PCA
Returns:
the (possibly centered and/or scaled) data matrix X
• #### sdPrincipalComponent

public double sdPrincipalComponent(int i)
Description copied from interface: PCA
Gets the standard deviation of the i-th principal component.
Specified by:
sdPrincipalComponent in interface PCA
Parameters:
i - an index, counting from 1
Returns:
the standard deviation of the i-th principal component
• #### proportionVar

public double proportionVar(int i)
Description copied from interface: PCA
Gets the proportion of overall variance explained by the i-th principal component.
Specified by:
proportionVar in interface PCA
Parameters:
i - an index, counting from 1
Returns:
the proportion of overall variance explained by the i-th principal component
• #### cumulativeProportionVar

public DenseVector cumulativeProportionVar()
Description copied from interface: PCA
Gets the cumulative proportion of overall variance explained by the principal components.
Specified by:
cumulativeProportionVar in interface PCA
Returns:
the cumulative proportion of overall variance explained by the principal components
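
Cumulative proportions are commonly used to decide how many components to keep. The sketch below forms running sums of per-component proportions and finds the smallest number of components that reaches a chosen threshold. The names CumulativeSketch, cumulative, and componentsNeeded, as well as the threshold, are assumptions for illustration and are not part of SuanShu.

```java
// Illustrative sketch only; not SuanShu code.
final class CumulativeSketch {

    /** Running sums of the per-component proportions. */
    static double[] cumulative(double[] proportions) {
        double[] result = new double[proportions.length];
        double runningTotal = 0.0;
        for (int i = 0; i < proportions.length; i++) {
            runningTotal += proportions[i];
            result[i] = runningTotal;
        }
        return result;
    }

    /** Smallest number of leading components whose cumulative proportion reaches the threshold. */
    static int componentsNeeded(double[] cumulative, double threshold) {
        for (int i = 0; i < cumulative.length; i++) {
            if (cumulative[i] >= threshold) {
                return i + 1; // component counts start at 1
            }
        }
        return cumulative.length; // threshold not reached; keep every component
    }

    public static void main(String[] args) {
        double[] proportions = {0.5, 0.25, 0.125, 0.125}; // made-up proportions
        double[] cum = cumulative(proportions);
        System.out.println("components for 80%: " + componentsNeeded(cum, 0.8));
    }
}
```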
• #### scores

public Matrix scores()
Description copied from interface: PCA
Gets the scores of the supplied data on the principal components. The signs of the columns of the scores are arbitrary.
Specified by:
scores in interface PCA
Returns:
the scores of the supplied data on the principal components
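
In the conventional definition of PCA, the scores are the centered (and possibly scaled) data matrix multiplied by the matrix of loadings; this page does not spell out the formula, so the sketch below rests on that assumption. It also shows why the signs are arbitrary: negating a loading column negates the matching score column. ScoresSketch and multiply are hypothetical names, and the data are made up.

```java
// Illustrative sketch only; not SuanShu code.
final class ScoresSketch {

    /** Multiplies an (nObs x nFactors) matrix by an (nFactors x nComponents) matrix. */
    static double[][] multiply(double[][] x, double[][] w) {
        int rows = x.length;
        int inner = w.length;
        int cols = w[0].length;
        double[][] product = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int k = 0; k < cols; k++) {
                double sum = 0.0;
                for (int j = 0; j < inner; j++) {
                    sum += x[i][j] * w[j][k];
                }
                product[i][k] = sum;
            }
        }
        return product;
    }

    public static void main(String[] args) {
        double s = Math.sqrt(0.5);
        // Already centered: both columns sum to zero; most spread lies along (1, 1).
        double[][] x = {{2, 2}, {-2, -2}, {1, -1}, {-1, 1}};
        // Columns are unit eigenvectors of the covariance matrix of x.
        double[][] loadings = {{s, -s}, {s, s}};
        // Same loadings with the first column negated; equally valid.
        double[][] flipped = {{-s, -s}, {-s, s}};

        double[][] scores = multiply(x, loadings);
        double[][] scoresFlipped = multiply(x, flipped);
        for (int i = 0; i < x.length; i++) {
            System.out.println(scores[i][0] + " " + scores[i][1]
                    + "  |  " + scoresFlipped[i][0] + " " + scoresFlipped[i][1]);
        }
    }
}
```

When comparing scores across libraries or runs, compare columns up to sign rather than expecting identical values.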