Wednesday, April 12, 2017

Animated 3D PCA plots in R!

Principle component analysis (PCA) is a great way to visualize the overall architecture of RNAseq data. I routinely run PCA on my sample table to see if my replicates group together or after averaging my replicates to visualize how similar one condition is to another.

The algorithm itself reduces the dimensionality of the data while maintaining the overall variance. It does this by finding orthogonal axes that maximize the variation in the dataset. These axes are referred to as principle components. We can then use these principle components to visualize the variation between datasets by plotting said components, usually the first two. Let’s start with that.

First we pull down some test RNAseq data and take just the first 1000 genes:

mydat <- read.table(url("http://bowtie-bio.sourceforge.net/recount/countTables/trapnell_count_table.txt"), sep="\t", header =T, row.names = 1)
#drop low counts and pull the first 1000 records
mydat <- mydat[rowSums(mydat > 1) >=4,]
mydat <- mydat[1:1000,]
y <- as.matrix((mydat))

Next we perform the PCA. Log transformation is optional but I usually do it for RNAseq data.

log_data = log(y+1,2)
group <- factor(c(colnames(y)))
#this puts samples as rows, genes as columns
transpose <- t(log_data)
transpose_df <- as.data.frame(transpose)
#this is the function that does the PCA
pca.data <- prcomp(transpose_df)

You can inspect the output of the PCA. Notice there are 4 PCA loadings since we used 4 samples. The report below shows us the standard deviation of each component. Note that it decreases with each loading since each PC captures less variation. This is reflected in the proportion of variance; the percentage of the variation captured by each PC.

scores = as.data.frame(pca.data\$x)
summary(pca.data)
## Importance of components:
##                            PC1     PC2    PC3       PC4
## Standard deviation     25.7222 14.0236 9.9563 2.812e-14
## Proportion of Variance  0.6911  0.2054 0.1035 0.000e+00
## Cumulative Proportion   0.6911  0.8965 1.0000 1.000e+00

In most presentations you will see the PCA plotted in 2D space using the first two PCs since they are the most important. Let’s do that here (or you can make something nicer in ggplot):

library(rgl)
plot(scores[,1:2], col=c(1:4),cex=2, pch=16,
xlim = c(-50,50), ylim=c(-50,50), main="PCA-2D")
text(scores[,1], scores[,2],
labels=c(rownames(scores)), cex= 0.7, pos=3)

This is a good way to represent the data since the first two PCs are the most important. But we know that there are more components not visualized (up to the number of samples). So in theory we can plot the first 3 PCs to get a better idea of the relationship of these these samples. Note I use scores[,1:3] to get the first three loadings.

plot3d(scores[,1:3], col=c(1:4), size=10, type='p',
xlim = c(-50,50), ylim=c(-50,50), zlim=c(-50,50))
text3d(scores[,1]+2, scores[,2]+10, scores[,3]+2,
texts=c(rownames(scores)), cex= 0.7, pos=3)

Then you can save the image:

#output image
rgl.snapshot("3DPCA-Merge.png", "png")
#ouput pdf:
rgl.postscript("3DPCA-Merge.pdf", "pdf")

Awesome! Rotate the frame to check out the relationship of the data! But how do you use this for a presentation? Well one idea is to make a gif by rotating the image and saving it to make an image series. This great code chunk came from the Filion Lab:

dir.create("animation_merge")
for (i in 1:360) {
view3d(userMatrix=rotationMatrix(2*pi * i/360, 0, 1, 0))
rgl.snapshot(filename=paste("animation_merge/frame-",
sprintf("%03d", i), ".png", sep=""))}

I’ll leave it to you to decide how to convert the image sequence to an animated format. There’s lots of tools online. I use an image processing tool called Fiji. File > Import > Image Sequence… Then… Image > type > 8-bit color… Then… File >Save as > Animated gif.

Niiceee.

sessionInfo()
## R version 3.3.3 (2017-03-06)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Sierra 10.12.3
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] rgl_0.98.1
##
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.10    digest_0.6.12   rprojroot_1.2   mime_0.5
##  [5] R6_2.2.0        xtable_1.8-2    jsonlite_1.3    backports_1.0.5
##  [9] magrittr_1.5    evaluate_0.10   stringi_1.1.3   rmarkdown_1.4
## [13] tools_3.3.3     stringr_1.2.0   htmlwidgets_0.8 shiny_1.0.1
## [17] httpuv_1.3.3    yaml_2.1.14     htmltools_0.3.5 knitr_1.15.1