SuanShu, a Java numerical and statistical library

com.numericalmethod.suanshu.stats.pca
Class PCAbySVD

java.lang.Object
  extended by com.numericalmethod.suanshu.stats.pca.PCAbySVD
All Implemented Interfaces:
PCA

public class PCAbySVD
extends java.lang.Object

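This class performs Principal Component Analysis (PCA) on a data matrix using Singular Value Decomposition (SVD). SVD of the data matrix is generally preferred over an eigendecomposition of the covariance matrix for numerical accuracy.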

PCA essentially rotates the set of points around their mean in order to align with the principal components. This moves as much of the variance as possible (using an orthogonal transformation) into the first few dimensions. The values in the remaining dimensions, therefore, tend to be small and may be dropped with minimal loss of information.

The R equivalent function is prcomp.
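
The rotation described above can be illustrated in two dimensions, where the angle of the first principal axis has a closed form. The following is a minimal sketch under that assumption, not the SVD-based computation this class performs; the class and method names are hypothetical and do not belong to SuanShu.

```java
// Illustrative 2-D sketch only; PCAbySVD itself works via SVD on any number of variables.
final class Pca2DSketch {

    /** Centers the points on their mean and rotates them onto their principal axes. */
    static double[][] rotateToPrincipalAxes(double[][] pts) {
        int n = pts.length;
        double mx = 0.0, my = 0.0;
        for (double[] p : pts) { mx += p[0]; my += p[1]; }
        mx /= n;
        my /= n;

        // Sums of squares and cross-products of the centered data.
        double sxx = 0.0, syy = 0.0, sxy = 0.0;
        for (double[] p : pts) {
            double dx = p[0] - mx, dy = p[1] - my;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }

        // Angle of the direction of maximum variance (closed form for the 2x2 case).
        double theta = 0.5 * Math.atan2(2.0 * sxy, sxx - syy);
        double c = Math.cos(theta), s = Math.sin(theta);

        double[][] out = new double[n][2];
        for (int i = 0; i < n; i++) {
            double dx = pts[i][0] - mx, dy = pts[i][1] - my;
            out[i][0] = c * dx + s * dy;   // coordinate along the first principal axis
            out[i][1] = -s * dx + c * dy;  // coordinate along the second principal axis
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] pts = {{1, 1}, {2, 2.1}, {3, 2.9}, {4, 4.2}, {5, 4.8}};
        double v0 = 0.0, v1 = 0.0;
        for (double[] r : rotateToPrincipalAxes(pts)) {
            v0 += r[0] * r[0];
            v1 += r[1] * r[1];
        }
        System.out.println("sum of squares along axis 1: " + v0);
        System.out.println("sum of squares along axis 2: " + v1);
    }
}
```

After the rotation the two coordinates are uncorrelated and nearly all of the spread sits in the first one, which is why the second could be dropped with little loss of information.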

Constructor Summary
PCAbySVD(Matrix data)
          Perform Principal Component Analysis, using the preferred SVD method, on a centered and scaled data matrix.
PCAbySVD(Matrix data, boolean centered, boolean scaled)
          Perform Principal Component Analysis, using the preferred SVD method, on a data matrix (possibly centered and/or scaled).
PCAbySVD(Matrix data, boolean centered, boolean scaled, Vector mean, Vector scale)
          Perform Principal Component Analysis, using the preferred SVD method, on a data matrix with (optional) mean vector and scaling vector provided.
 
Method Summary
 DenseVector cumulativeProportionVar()
          Get the cumulative proportion of overall variance explained by the principal components.
 ImmutableMatrix data()
          Get the original data matrix.
 Vector loading(int i)
          Get the loading vector of the i-th principal component.
 Matrix loadings()
          Get the matrix of variable loadings.
 Vector mean()
          Get the sample means that were subtracted.
 int nFactors()
          Get the number of variables in the original data.
 int nObs()
          Get the number of observations in the original data; sample size.
 Vector proportionVar()
          Get the proportion of overall variance explained by each of the principal components.
 double proportionVar(int i)
          Get the proportion of overall variance explained by the i-th principal component.
 Vector scale()
          Get the scalings applied to each variable.
 Matrix scores()
          Get the scores of supplied data on the principal components.
 double sdPrincipalComponent(int i)
          Get the standard deviation of the i-th principal component.
 DenseVector sdPrincipalComponents()
          Get the standard deviations of the principal components, i.e., the square roots of the eigenvalues of the correlation (or covariance) matrix. The calculation is actually done with the singular values of the data matrix.
 SVD svd()
          Get the Singular Value Decomposition (SVD) of matrix X.
 DenseMatrix X()
          Get the (possibly centered and/or scaled) data matrix X used for the PCA.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PCAbySVD

public PCAbySVD(Matrix data,
                boolean centered,
                boolean scaled,
                Vector mean,
                Vector scale)
Perform Principal Component Analysis, using the preferred SVD method, on a data matrix with (optional) mean vector and scaling vector provided.

Parameters:
data - a matrix that represents the original data
centered - a logical value indicating whether the variables should be shifted to be zero centered
scaled - a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place (N.B. scaling is generally advisable, but it requires that no variable be constant, since a constant variable has zero variance and cannot be rescaled)
mean - an optional mean vector (of length equal to nFactors) to be subtracted from the variables, regardless of the centered flag
scale - an optional scaling vector (of length equal to nFactors) by which the variables are divided, regardless of the scaled flag
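
One reading of how the flags and the optional vectors interact can be sketched as follows. This is an assumption-laden illustration, not the library's code: the class and method names are hypothetical, plain arrays stand in for Matrix and Vector, null stands in for "not provided", and the sample variance is assumed to use an n - 1 denominator as in R.

```java
// Hypothetical helper illustrating centering and scaling; not part of SuanShu.
final class CenterScaleSketch {

    /**
     * Returns a copy of data (rows = observations, columns = variables; at least two rows)
     * after optional centering and scaling. A non-null mean or scale vector takes
     * precedence over the corresponding flag.
     */
    static double[][] preprocess(double[][] data, boolean centered, boolean scaled,
                                 double[] mean, double[] scale) {
        int n = data.length, p = data[0].length;
        double[][] x = new double[n][p];
        for (int j = 0; j < p; j++) {
            // Amount subtracted from column j.
            double shift = 0.0;
            if (mean != null) {
                shift = mean[j];
            } else if (centered) {
                for (int i = 0; i < n; i++) shift += data[i][j];
                shift /= n;
            }
            double sumSq = 0.0;
            for (int i = 0; i < n; i++) {
                x[i][j] = data[i][j] - shift;
                sumSq += x[i][j] * x[i][j];
            }
            // Amount column j is divided by.
            double divisor = 1.0;
            if (scale != null) {
                divisor = scale[j];
            } else if (scaled) {
                divisor = Math.sqrt(sumSq / (n - 1)); // assumed n - 1 denominator
            }
            if (divisor == 0.0) {
                // A constant variable has zero spread after centering and cannot be rescaled.
                throw new IllegalArgumentException("cannot scale column " + j + ": zero divisor");
            }
            for (int i = 0; i < n; i++) x[i][j] /= divisor;
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] z = preprocess(new double[][] {{1, 10}, {2, 20}, {3, 30}}, true, true, null, null);
        for (double[] row : z) System.out.println(java.util.Arrays.toString(row));
    }
}
```

The exception on a zero divisor shows why scaling should only be requested when no variable is constant.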

PCAbySVD

public PCAbySVD(Matrix data,
                boolean centered,
                boolean scaled)
Perform Principal Component Analysis, using the preferred SVD method, on a data matrix (possibly centered and/or scaled).

Parameters:
data - a matrix that represents the original data
centered - a logical value indicating whether the variables should be shifted to be zero centered
scaled - a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place (N.B. scaling is generally advisable, but it requires that no variable be constant, since a constant variable has zero variance and cannot be rescaled)

PCAbySVD

public PCAbySVD(Matrix data)
Perform Principal Component Analysis, using the preferred SVD method, on a centered and scaled data matrix.

Parameters:
data - a matrix that represents the original data

Method Detail

mean

public Vector mean()
Description copied from interface: PCA
Get the sample means that were subtracted.

Specified by:
mean in interface PCA
Returns:
the sample means of each variable in the original data

scale

public Vector scale()
Description copied from interface: PCA
Get the scalings applied to each variable.

Specified by:
scale in interface PCA
Returns:
the scalings applied to each variable in the original data

svd

public SVD svd()
Get the Singular Value Decomposition (SVD) of matrix X.

Returns:
the Singular Value Decomposition (SVD) of matrix X

sdPrincipalComponents

public DenseVector sdPrincipalComponents()
Get the standard deviations of the principal components, i.e., the square roots of the eigenvalues of the correlation (or covariance) matrix. The calculation is actually done with the singular values of the data matrix.

Returns:
the standard deviations of the principal components
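
The link between singular values and standard deviations can be sketched as below. It assumes the convention used by R's prcomp, in which each singular value of X is divided by the square root of nObs - 1; whether this class uses exactly that denominator is an assumption, and the class and method names are hypothetical.

```java
// Hypothetical helper; assumes the prcomp-style n - 1 normalisation.
final class SdFromSingularValues {

    /** Converts the singular values of the data matrix X into component standard deviations. */
    static double[] sd(double[] singularValues, int nObs) {
        double denom = Math.sqrt(Math.max(1, nObs - 1));
        double[] sd = new double[singularValues.length];
        for (int i = 0; i < sd.length; i++) {
            sd[i] = singularValues[i] / denom;
        }
        return sd;
    }

    public static void main(String[] args) {
        // Singular values 6 and 3 from 10 observations give standard deviations 2 and 1.
        System.out.println(java.util.Arrays.toString(sd(new double[] {6, 3}, 10)));
    }
}
```

Squaring these values gives the eigenvalues of the covariance (or correlation) matrix without ever forming that matrix.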

loadings

public Matrix loadings()
Description copied from interface: PCA
Get the matrix of variable loadings. The signs of the columns of the loading matrix are arbitrary.

Returns:
the matrix of variable loadings

data

public ImmutableMatrix data()
Get the original data matrix.

Returns:
the original data matrix

nObs

public int nObs()
Description copied from interface: PCA
Get the number of observations in the original data; sample size.

Specified by:
nObs in interface PCA
Returns:
nObs, the number of observations in the original data

nFactors

public int nFactors()
Description copied from interface: PCA
Get the number of variables in the original data.

Specified by:
nFactors in interface PCA
Returns:
nFactors, the number of variables in the original data

X

public DenseMatrix X()
Description copied from interface: PCA
Get the (possibly centered and/or scaled) data matrix X used for the PCA.

Specified by:
X in interface PCA
Returns:
the (possibly centered and/or scaled) data matrix X

sdPrincipalComponent

public double sdPrincipalComponent(int i)
Description copied from interface: PCA
Get the standard deviation of the i-th principal component.

Specified by:
sdPrincipalComponent in interface PCA
Parameters:
i - an index, counting from 1
Returns:
the standard deviation of the i-th principal component

loading

public Vector loading(int i)
Description copied from interface: PCA
Get the loading vector of the i-th principal component.

Specified by:
loading in interface PCA
Parameters:
i - an index, counting from 1
Returns:
the loading vector of the i-th principal component

proportionVar

public Vector proportionVar()
Description copied from interface: PCA
Get the proportion of overall variance explained by each of the principal components.

Specified by:
proportionVar in interface PCA
Returns:
the proportion of overall variance explained by each of the principal components

proportionVar

public double proportionVar(int i)
Description copied from interface: PCA
Get the proportion of overall variance explained by the i-th principal component.

Specified by:
proportionVar in interface PCA
Parameters:
i - an index, counting from 1
Returns:
the proportion of overall variance explained by the i-th principal component
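
The proportion of variance is conventionally the squared standard deviation of a component divided by the sum of all squared standard deviations. The sketch below assumes that definition; the class and method names are hypothetical, and the 1-based index mirrors the convention stated above.

```java
// Hypothetical helper illustrating the usual definition of "proportion of variance".
final class ProportionVarSketch {

    /** Proportion of total variance for every component, given their standard deviations. */
    static double[] proportions(double[] sd) {
        double total = 0.0;
        for (double s : sd) total += s * s;
        double[] result = new double[sd.length];
        for (int i = 0; i < sd.length; i++) {
            result[i] = sd[i] * sd[i] / total;
        }
        return result;
    }

    /** Proportion of total variance for the i-th component; i counts from 1. */
    static double proportion(double[] sd, int i) {
        return proportions(sd)[i - 1];
    }

    public static void main(String[] args) {
        double[] sd = {2.0, 1.0};
        System.out.println(java.util.Arrays.toString(proportions(sd)));
        System.out.println(proportion(sd, 1));
    }
}
```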

cumulativeProportionVar

public DenseVector cumulativeProportionVar()
Description copied from interface: PCA
Get the cumulative proportion of overall variance explained by the principal components.

Specified by:
cumulativeProportionVar in interface PCA
Returns:
the cumulative proportion of overall variance explained by the principal components
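
A common use of the cumulative proportions is deciding how many components to keep. The sketch below assumes the cumulative vector is a running sum of the per-component proportions; the class, the method names, and the threshold rule are hypothetical illustrations, not part of the library.

```java
// Hypothetical helper; shows a running sum and a simple threshold rule.
final class CumulativeVarSketch {

    /** Running sum of the per-component proportions of variance. */
    static double[] cumulative(double[] proportions) {
        double[] cum = new double[proportions.length];
        double running = 0.0;
        for (int i = 0; i < proportions.length; i++) {
            running += proportions[i];
            cum[i] = running;
        }
        return cum;
    }

    /** Smallest number of leading components whose cumulative proportion reaches the threshold. */
    static int componentsNeeded(double[] cumulative, double threshold) {
        for (int i = 0; i < cumulative.length; i++) {
            if (cumulative[i] >= threshold) return i + 1;
        }
        return cumulative.length; // threshold not reached: keep every component
    }

    public static void main(String[] args) {
        double[] cum = cumulative(new double[] {0.8, 0.15, 0.05});
        System.out.println(java.util.Arrays.toString(cum));
        System.out.println("components for 90%: " + componentsNeeded(cum, 0.9));
    }
}
```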

scores

public Matrix scores()
Description copied from interface: PCA
Get the scores of supplied data on the principal components. The signs of the columns of the scores are arbitrary.

Specified by:
scores in interface PCA
Returns:
the scores of supplied data on the principal components
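
In R's prcomp the scores are the preprocessed data matrix X multiplied by the loading matrix. Assuming this class follows the same definition, the sketch below shows that product and why the column signs are arbitrary: flipping the sign of a loading column simply flips the matching score column. The class and method names are hypothetical, and plain arrays stand in for the library's Matrix type.

```java
// Hypothetical helper; assumes scores = X * loadings as in R's prcomp.
final class ScoresSketch {

    /** Multiplies the n x p data matrix by the p x k loading matrix. */
    static double[][] scores(double[][] x, double[][] loadings) {
        int n = x.length, p = loadings.length, k = loadings[0].length;
        double[][] s = new double[n][k];
        for (int i = 0; i < n; i++) {
            for (int c = 0; c < k; c++) {
                for (int j = 0; j < p; j++) {
                    s[i][c] += x[i][j] * loadings[j][c];
                }
            }
        }
        return s;
    }

    /** Negates one column in place; an equally valid loading vector results. */
    static void flipSign(double[][] m, int col) {
        for (double[] row : m) row[col] = -row[col];
    }

    public static void main(String[] args) {
        double[][] x = {{1, 0}, {0, 1}, {-1, -1}};
        double[][] loadings = {{0, 1}, {1, 0}}; // orthonormal columns, chosen for illustration
        System.out.println(java.util.Arrays.deepToString(scores(x, loadings)));
        flipSign(loadings, 0);
        System.out.println(java.util.Arrays.deepToString(scores(x, loadings)));
    }
}
```

Because of this sign freedom, scores and loadings from two implementations should be compared column by column up to a factor of -1.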

Copyright © 2012 Numerical Method Inc. Ltd. All Rights Reserved.