Thursday, December 21, 2017

Clickable Volcano Plots in Shiny!

Clickable volcano plots (and clickable MA plots) using ggplot and shiny!

One of the goals of any bioinformatics project is to distill a large amount of data into an easily digestable graphic or table. Examples are the volcano plot, and it’s cousin the MA plot, which are used to represent the results of a differential expression test. Volcano plots show fold chance as a function of pvalue/FDR while the MA plot shows fold change as a function of expression. These plots are paradoxical in that they are both simulataneously incredibly useful and completely worthless. They’re useful because with just a glance we can quickly pick out genes that are up or down in a given comparison e.g. treatment versus control. They’re worthless because aside from the most highly up or down points we have no idea what those genes are!
These plot are much much more useful when theyre interactive. Enter shiny! Let’s make a clickable volcano plot where the user can click points and get the gene information from them. First we will download some data and perform a simple DEtest. The data is from BCL6-depleted OCI-LY1 cells rescued with either WT or RD mutant BCL6 (N=3 for each group) and downloaded from the recount project.
library(edgeR)
library(SummarizedExperiment)

#get the data from recount
load(url("http://duffel.rail.bio/recount/SRP043078/rse_gene.Rdata"))
counts <- assays(rse_gene)$counts

#get the phenotypic data
phenotypes <- read.table(file='http://duffel.rail.bio/recount/SRP043078/SRP043078.tsv',header=T, sep='\t',fill=T)

#edgeR block for normalization and DE testing:
y <- as.matrix((counts))
#the last three samples are the mutants (see pheno data)
y <- DGEList(counts = y, group=c('WT','WT','WT','mut','mut','mut'))
y <- calcNormFactors(y)
y <- estimateCommonDisp(y)
y <- estimateTagwiseDisp(y)
#This is the DE test:
de.WT.mut <- exactTest(y, pair=c('WT','mut'))
tags.de.WT.mut <- data.frame(topTags(de.WT.mut, n=(58037)))
Now we have the results of the DE test with logFC of wildtype versus mutant, pvalue for the test, expression of the gene as logCPM, and the corrected pvalue expressed as the FDR.
head(tags.de.WT.mut)
##                        logFC   logCPM        PValue           FDR
## ENSG00000170989.8   6.286462 3.949995 3.588356e-159 2.082574e-154
## ENSG00000121440.14 -4.895898 2.652711 1.244436e-138 3.611166e-134
## ENSG00000106624.9   4.807209 3.067900 5.623195e-135 1.087845e-130
## ENSG00000067798.14  4.741035 3.468668 3.007754e-132 4.364025e-128
## ENSG00000116299.16  4.555373 2.736529 3.055612e-124 3.546771e-120
## ENSG00000144711.13  4.463181 3.795915 1.374557e-120 1.329586e-116
We have everything we need to make a volcano plot. First, we add one column to the dataframe to indicate significantly DE expressed genes. This will be used by ggplot to color the significant points. You can use whatever threshold you like but here we use logFC > 2 and FDR < 0.05. We also add a column to calculate the negative log of the FDR so it all fits nicely on the plot:
# significant is coded as TRUE, not sig as FALSE
tags.de.WT.mut$sig <- as.factor(abs(tags.de.WT.mut$logFC) > 2 & tags.de.WT.mut$FDR < 0.05)
#convert FDR to -log(FDR)
tags.de.WT.mut$negLogFDR <- -log10(tags.de.WT.mut$FDR)
#save for later:
write.table(tags.de.WT.mut, file="tags.de.WT.mut.txt", row.names = T, sep='\t', quote=F)
Now we’re all set for ggplot. Here’s the volcano:
library(ggplot2)
ggplot(tags.de.WT.mut,aes(x=logFC, y=negLogFDR, color=sig)) +
  geom_point() +
  coord_cartesian() +
  ylab("-log10 FDR") +
  xlab("log2 fold change")

And here’s the MA:
ggplot(tags.de.WT.mut,aes(x=logCPM, y=logFC, color=sig)) +
  geom_point() +
  coord_cartesian() +
  ylab("log2 FC") +
  xlab("log2 CPM")

These plots are a great way to see in general how many genes are up or down in a given experiment. Aside from possibly adding labels a few points they tell us very little about which genes are up or down. So let’s make a shiny app to interact with the results.
The following assumes at least a cursory knowledge of shiny. We’ll set up an extremely basic interface which just has the a select bar for the DE test, the plot, and slider for filtering the results. For the volcano plot, we can filter using CPM therefore combining the useable information from both the volcano and MA plots into one awesome interactive volcano plot.
library(shiny)

# Define UI for application that draws a histogram
ui <- shinyUI(fluidPage(
  h1("Clickable Volcano Plots!"),
  selectInput('inputFile', label='select a file:',choice='WT.mut'),
  plotOutput('volcanoPlot',click='plot_click'),
  sliderInput('cpmCut', label="log(CPM) cutoff",0,10,0, width="200px")
))
The server side logic is also simple. We create reactive objects for the file name, dataframe, and plot.
server <- function(input, output, session){
  
  #read in the table as a function of the select input
  dataFrame <- reactive({
    filename <- paste0("tags.de.",input$inputFile,".txt")
    read.table(file=filename, header=T, sep='\t')
  })
  
  # filter the table by CPM:
  dataFilter <- reactive({
    dataFrame()[dataFrame()$logCPM > input$cpmCut,]
  })
  
  #plot it normally with ggplot:
  output$volcanoPlot <- renderPlot({ 
    ggplot(dataFilter(),aes(x=logFC, y=negLogFDR, color=sig)) +
      geom_point() +
      coord_cartesian() +
      ylab("-log10 FDR") +
      xlab("log2 fold change")
  })
}

Making the plot clickable!

So now that our app is working nicely we can make it clickable. In the ui we add a data table to output the results of the click:
# Define UI for application that draws a histogram
ui <- shinyUI(fluidPage(
  h1("Clickable Volcano Plots!"),
  selectInput('inputFile', label='select a file:',choice='WT.mut'),
  plotOutput('volcanoPlot',click='plot_click'),
  sliderInput('cpmCut', label="log(CPM) cutoff",-10,10,-10, width="200px"),
  
  #here the table for the clicked points:
  tableOutput('clickedPoints')
))
Then on the server side we add the logic to caputre the points. We do this with the nearPoints function:
server <- function(input, output, session){
  
  #read in the table as a function of the select input
  dataFrame <- reactive({
    filename <- paste0("tags.de.",input$inputFile,".txt")
    read.table(file=filename, header=T, sep='\t')
  })
  
  # filter the table by CPM:
  dataFilter <- reactive({
    dataFrame()[dataFrame()$logCPM > input$cpmCut,]
  })
  
  #plot it normally with ggplot:
  output$volcanoPlot <- renderPlot({ 
    ggplot(dataFilter(),aes(x=logFC, y=negLogFDR, color=sig)) +
      geom_point() +
      coord_cartesian() +
      ylab("-log10 FDR") +
      xlab("log2 fold change")
  })
  
  #get the clicked points!
  clicked <- reactive({
    # We need to tell it what the x and y variables are:
    nearPoints(dataFilter(), input$plot_click, xvar = "logFC", yvar = "negLogFDR")
  })
  
  #output those points into a table
  
  output$clickedPoints <- renderTable({
    clicked()
  }, rownames = T)
}

Et voila! Now you can share the plots with colaborators and labmates for easy differential expression exploration! You could spiff things up even more by letting users click and drag to select points. To do this add brush=plot_brush to the plotOuput function and add the appropriate logic on the server side.
Here’s the code for making a clickable MA plots. Exatcly the same with slight modification to the plot and clicked function:
server <- function(input, output, session){
  
  #read in the table as a function of the select input
  dataFrame <- reactive({
    filename <- paste0("tags.de.",input$inputFile,".txt")
    read.table(file=filename, header=T, sep='\t')
  })
  
  # filter the table by CPM:
  dataFilter <- reactive({
    dataFrame()[dataFrame()$logCPM > input$cpmCut,]
  })
  
  #plot it normally with ggplot:
  output$volcanoPlot <- renderPlot({ 
    ggplot(dataFilter(),aes(x=logCPM, y=logFC, color=sig)) +
     geom_point() +
     coord_cartesian() +
     ylab("log2 FC") +
     xlab("log2 CPM")
  })
  
  #get the clicked points!
  clicked <- reactive({
    # We need to tell it what the x and y variables are:
    nearPoints(dataFilter(), input$plot_click, xvar = "logCPM", yvar = "logFC")
  })
  
  #output those points into a table
  output$clickedPoints <- renderTable({
    clicked()
  }, rownames = T)
}
With the MA plot it could be better with a different slider for pVlaue instead of CPM.
Tune in next time and we will try this whole thing again using plotly!

session:

sessionInfo()
## R version 3.3.0 (2016-05-03)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10 (Yosemite)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] shiny_1.0.5                SummarizedExperiment_1.2.3
##  [3] Biobase_2.32.0             GenomicRanges_1.24.3      
##  [5] GenomeInfoDb_1.8.7         IRanges_2.6.1             
##  [7] S4Vectors_0.10.3           BiocGenerics_0.18.0       
##  [9] edgeR_3.14.0               limma_3.28.21             
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.14    knitr_1.17      XVector_0.12.1  magrittr_1.5   
##  [5] zlibbioc_1.18.0 xtable_1.8-2    R6_2.2.2        stringr_1.2.0  
##  [9] tools_3.3.0     htmltools_0.3.6 yaml_2.1.16     rprojroot_1.3-1
## [13] digest_0.6.13   mime_0.5        evaluate_0.10.1 rmarkdown_1.8  
## [17] stringi_1.1.6   backports_1.1.2 httpuv_1.3.5

No comments:

Post a Comment

xkcd plots!

xkcd plots: One of my favorite things about R is the massive number of communi...