very quick summary of PCA
## This may not all make sense on a first read,
## but if you can't wait to get started doing PCA, take a look at the code below.
## All the R code comes first; the annotated version follows.
dataset.PCAcor <- princomp(dataset, cor = TRUE)
summary(dataset.PCAcor)
loadings(dataset.PCAcor)
biplot(dataset.PCAcor)
biplot(dataset.PCAcor, col = c("azure4", "black"), cex = c(0.8, 1), expand = 0.9)
title("Biplot based on correlation matrix")

## I'd look at bivariate correlations first (i.e., pairwise correlations between variables).
## Your data frame should contain only the variables you want to summarize, one per column.
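
## A minimal sketch of that first look. It assumes the built-in USArrests data
## as a stand-in for your own data frame; example_data and example_cor are
## made-up names used only for illustration.
example_data <- USArrests
## keep numeric columns only, since princomp() needs numeric input
example_data <- example_data[, sapply(example_data, is.numeric), drop = FALSE]
## pairwise correlations between variables
example_cor <- cor(example_data)
round(example_cor, 2)
## scatterplot matrix of every pair of variables
pairs(example_data)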

## To run the PCA:
## if the variables are already standardized (or all have the same units),
## use cor = FALSE (PCA on the covariance matrix);
## if the variables have different units (lbs, miles, hrs, etc.),
## use cor = TRUE (PCA on the correlation matrix).
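
## A hedged illustration of why the choice matters, again assuming the built-in
## USArrests data (its columns are on different scales). The names pca_cov,
## pca_cor, pca_scaled and prop_* are made up for this sketch.
pca_cov    <- princomp(USArrests, cor = FALSE)        # covariance matrix
pca_cor    <- princomp(USArrests, cor = TRUE)         # correlation matrix
pca_scaled <- princomp(scale(USArrests), cor = FALSE) # standardize by hand first
## proportion of variance explained by each component
prop_cov    <- pca_cov$sdev^2 / sum(pca_cov$sdev^2)
prop_cor    <- pca_cor$sdev^2 / sum(pca_cor$sdev^2)
prop_scaled <- pca_scaled$sdev^2 / sum(pca_scaled$sdev^2)
## without standardizing, the variable with the largest variance dominates
## the first component
round(rbind(prop_cov, prop_cor, prop_scaled), 3)
## cor = TRUE and standardizing by hand should give the same proportions
all.equal(prop_cor, prop_scaled)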
dataset.PCAcor <- princomp(dataset, cor = TRUE)
summary(dataset.PCAcor)
loadings(dataset.PCAcor)
### summary() reports "Proportion of Variance": the share of the total variance
### explained by each principal component.
### Each component is a weighted combination of the variables;
### loadings() shows how much weight each variable gets in each component.
### The maximum number of components equals the number of variables.
### You want a small number of components to explain a large share of the variance
### (look at "Cumulative Proportion").
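
### A minimal sketch of where those numbers come from, assuming the built-in
### USArrests data. example_pca, component_var, prop_var, cum_var and n_keep
### are made-up names, and the 80% cutoff is an arbitrary choice for illustration.
example_pca   <- princomp(USArrests, cor = TRUE)
component_var <- example_pca$sdev^2                 # variance of each component
prop_var      <- component_var / sum(component_var) # proportion of variance
cum_var       <- cumsum(prop_var)                   # cumulative proportion
round(rbind(prop_var, cum_var), 3)
### smallest number of components whose cumulative proportion reaches 80%
n_keep <- which(cum_var >= 0.80)[1]
n_keep
### scree plot: component variances in decreasing order
screeplot(example_pca, type = "lines")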

# a 2-D projection onto the first two components
biplot(dataset.PCAcor)
# biplot(dataset.PCAcor, col = c("azure4", "black"), cex = c(0.8, 1), expand = 0.9)
title("Biplot based on correlation matrix")
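
# The points in a biplot are the observations' component scores (rescaled for
# display). A minimal sketch of pulling out the first two scores directly,
# assuming the built-in USArrests data; score_pca and example_scores are
# made-up names.
score_pca      <- princomp(USArrests, cor = TRUE)
example_scores <- score_pca$scores[, 1:2]
head(example_scores)
# scores on different components are uncorrelated
round(cor(example_scores), 3)
plot(example_scores, xlab = "Comp.1", ylab = "Comp.2")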