Wednesday, May 10, 2017

Using Shiny to make a web plotter for RNAseq timeseries data

Shiny!

Let’s say you have finished a nice transcriptomics study and want to make your results accessable to the rest of the lab. One option is to email collaborators large datatables and let them find and plot the data they need but this has major drawbacks:
  1. The datatables are usually quite large.
  2. After each update (if you for example add new annotations) the previously distributed tables are out of date.
  3. Automating plotting functions makes it easy to repeat for different genes.
To get around these it’s a good idea to put the data in a database and create a simple user interface. This can be done many ways using python, mysql, javascript or some combination thereof. Here we’re going to do it in R using the shiny package.
Shiny is a framework that lets you create web apps while writing them entirely in R. It consists of two parts, a server script, which is essentially your R code written in Shiny’s reactive syntax, and a user interface script, which tells Shiny how to render the html and output the results.
The syntax of shiny is a little tricky at first but is easy to use once you get the hang of it. Shiny relies on a new (to R) concept: Reactivity. Reactivity is when a variable changes and all variables downstream of it change accordingly. It is essentially like re running your R-script over and over each time an input is changed. The major difference is that in Shiny it only runs the relevant code chunks that are ‘linked’ to the variable that has changed.
So what does this look like as a syntax? We can make a variable reactive by taking input from the user as ‘input$variable’.
For example to assign ‘gene’ using a user input 'genename' we would write:
gene <- input$genename
The variable ‘gene’ is reactive because each time the user changes the input 'genename', the value of 'gene' will change accordingly.
Now that ‘gene’ is reactive. There are two important things to remember:
  1. gene will now be used in downstream functions as ‘gene()’.
  2. Since ‘gene’ is reactive, we need to place all downstream functions into reactive expressions.
The latter point is the cause of most shiny errors so its worth reading about in the shiny documentation. A basic reactive expression is simply ‘reactive({})’. For example if we wanted to append ‘gene’ with ‘ID’:
geneID <- reactive({
  paste0(gene(),"ID")
})

Write the code

Ok let’s learn by doing! We’re going to build up the ui and server files separately then run them together at the very end but first we will write the code ‘normally’ then make it to reactive for the server.
The code below reads in a table of RNAseq data, takes just the first 1000 entries, subsets the data using the gene ‘ENSMUSG00000000028’, converts the subset to long format, and plots that gene using ggplot.
The data is timeseries RNAseq data from Trapnell et al. using the recount resource.
## Non-shiny example
mydat <- read.table(url("http://bowtie-bio.sourceforge.net/recount/countTables/trapnell_count_table.txt"), sep="\t", header =T, row.names = 1)
#drop low counts and pull the first 1000 records
mydat <- mydat[rowSums(mydat > 1) >=4,]
mydat <- mydat[1:1000,]
#re-arrange so the samples are increasing with time
#I know this order from the meta file
mydat <- mydat[c(4,3,1,2)]
#add an ID column we'll use for plotting
mydat$ID <- rownames(mydat)

#subset the dataframe by gene(s) of interest
gene1 <- c("ENSMUSG00000000028")
geneCounts <- mydat[rownames(mydat)%in%gene1,]

#add a vector with the genenames
geneCounts$ID <- rownames(geneCounts)

#put in long form for plotting
library(reshape)
geneCountsLong <- melt(geneCounts)
## Using ID as id variables
#plot using ggplot
library(ggplot2)
p <- ggplot(geneCountsLong, aes(x=geneCountsLong$variable, y=geneCountsLong$value, group=ID, colour=ID)) + 
  geom_line() +
  xlab("Sample") +
  ylab("counts")
print(p)

Build the server code

So now let’s build this into a reactive session using input$gene as the input. mydat stays the same so i don’t include those calls here. Remember, once a variable is reactive, we need to access it inside a reactive expression. So in the chunk below, gene1 becomes reactive when we assign it to the user input ‘gene’. Then in the geneCounts assignment we put that entire expression inside a ‘reactive({})’ wrapper function and call ‘gene1’ as ‘gene1()’.
library(shiny)
# server
gene1 <- c(input$gene)
#now need to make the next function reactive:
geneCounts <- reactive({
  mydat[rownames(mydat)%in%gene1(),]
})
Now we do the same with the ggplot chunk:
# server 
library(reshape)
geneCountsLong <- reactive({
  melt(geneCounts())
})

library(ggplot2)
p <- reactive({
  ggplot(geneCountsLong(), aes(x=geneCountsLong()$variable, y=geneCountsLong()$value, group=ID, colour=ID)) + 
  geom_line() +
  xlab("Sample") +
  ylab("counts")
})
To get the plot ‘p’ to show up on the user interface we use the ‘renderPlot({})’ wrapper and put it into an output variable that we can access on the ui side.
# server
output$plot1 <- renderPlot({
  print(p())
})

Build the UI

Great! Now it’s time to build a user interface. You can read up about customizing the ui. Here we make a sidebar with a textbox input to enter a gene name. Note that this creates the variable ‘input$gene’ that we can then use on the server side. The variable name is the first argument in ‘textInput()’. Then in the mainpanel we output ‘plot1’ from the server side we developed above.
#ui
ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      textInput('gene', 'Input gene name', value = "", width = NULL, placeholder = 'e.g. ENSMUSG00000000028')
    ),
    mainPanel(plotOutput("plot1"))
  )
)

Put it all together

Now we can build all of this into our app to see how it runs. You can build a shiny app using the following template:
  1. All ‘library’ calls (important if you host on shinyapps.io)
  2. Non-reactive expressions like reading data etc.
  3. Server side expressions inside the server function wrapper: ‘server <- function(input, output) {}’
  4. User interface expressions insde the ui fluid page function ‘ui <- fluidPage()’
  5. A call to run the app with ‘shinyApp(ui = ui, server = server)’
Below I’ve pasted those elements in that order:
#library calls
library(reshape)
library(ggplot2)
library(shiny)

# Non- reactive elements go outside the server function
mydat <- read.table(url("http://bowtie-bio.sourceforge.net/recount/countTables/trapnell_count_table.txt"), sep="\t", header =T, row.names = 1)
mydat <- mydat[rowSums(mydat > 1) >=4,]
mydat <- mydat[1:1000,]
mydat <- mydat[c(4,1,2,2)]
mydat$ID <- rownames(mydat)

#server pasted together from above
server <- function(input, output) {
  gene <- reactive({
    c(input$gene)
  })
  geneCounts <- reactive({
  mydat[rownames(mydat)%in%gene(),]
  })
  geneCountsLong <- reactive({
    melt(geneCounts())
  })
  p <- reactive({
    ggplot(geneCountsLong(), aes(x=geneCountsLong()$variable, y=geneCountsLong()$value, group=ID, colour=ID)) + 
    geom_line() +
    xlab("Sample") +
    ylab("counts")
  })
  output$plot1 <- renderPlot({
  print(p())
  })
}

#ui pasted from above
ui <- fluidPage(
    headerPanel('Simple Gene Plotter!'),
    sidebarPanel(
      textAreaInput('gene', 'Input gene name', value = "", width = NULL, placeholder = 'e.g. ENSMUSG00000000001')
    ),
    mainPanel(plotOutput("plot1")
    )
  )

#this runs the app
shinyApp(ui = ui, server = server)
Note that its sometimes useful to have your ui.R and server.R files separated. Especially if you are hosting it at shinyapps.io.
It works! Here’s screen cap of our app:

Make it nice

Now we can add all the bells and whistles you’d expect from a normal webb app including images, css, even javascript. First though I usually tweak the core functions to be robust.
Here we want to be able to input multiple genes separated by a comma. Also rather than plotting as we type we want the app to wait. To solve the former we will add some code to split the input by comma. To solve the latter we will add a button that runs the app after it’s clicked.
Here we’re adding a ‘stringsplit()’ function to parse the user input so that multiple gene names can be used for plotting. We also changed the ‘reactive({})’ to ‘eventReactive({})’ This will wait until the user activates the code using a botton on the ui side. The ‘event’ to activate the code is ‘input$do’.
#server update
gene <- eventReactive(input$do, {
    unlist(strsplit(as.character(input$gene), ',', fixed=TRUE))
  }, ignoreNULL= T)
On the ui side, we’ll change the prompt dialogue to tell the user to enter a comma separated list to work with our ‘strsplit()’ function on the server side. We also change the example prompt showing the expected input using ‘textAreaInput(.., placeholder= )’. Finally we add an action button below the input that activates ‘input$do’ on the server side with ‘actionButton()’
#ui update
ui <- fluidPage(
    headerPanel('Simple Gene Plotter!'),
    sidebarPanel(
      textAreaInput('gene', 'Input gene names separated by comma:', value = "", width = NULL, placeholder = 'e.g. ENSMUSG00000000028,ENSMUSG00000000001'),
      actionButton("do", "Evaluate!")
    ),
    mainPanel(plotOutput("plot1")
    )
  )
This is what it looks like all together. You can copy and paste this direectly into your R terminal to see it live.
library(reshape)
library(ggplot2)
library(shiny)

mydat <- read.table(url("http://bowtie-bio.sourceforge.net/recount/countTables/trapnell_count_table.txt"), sep="\t", header =T, row.names = 1)
mydat <- mydat[rowSums(mydat > 1) >=4,]
mydat <- mydat[1:1000,]
mydat <- mydat[c(4,1,2,2)]
mydat$ID <- rownames(mydat)

server <- function(input, output) {
  gene <- eventReactive(input$do, {
    unlist(strsplit(as.character(input$gene), ',', fixed=TRUE))
  }, ignoreNULL= T)
  geneCounts <- reactive({
  mydat[rownames(mydat) %in% c(gene()),]
  })
  geneCountsLong <- reactive({
    melt(geneCounts())
  })
  p <- reactive({
    ggplot(geneCountsLong(), aes(x=geneCountsLong()$variable, y=geneCountsLong()$value, group=ID, colour=ID)) + 
    geom_line() +
    xlab("Sample") +
    ylab("counts")
  })
  output$plot1 <- renderPlot({
  print(p())
  })
}

ui <- fluidPage(
    headerPanel('Simple Gene Plotter!'),
    sidebarPanel(
      textAreaInput('gene', 'Input gene names separated by comma:', value = "", width = NULL, placeholder = 'e.g. ENSMUSG00000000028,ENSMUSG00000000001'),
      actionButton("do", "Evaluate!")
    ),
    mainPanel(plotOutput("plot1")
    )
  )

shinyApp(ui = ui, server = server)
Sweet! Here’s a screen cap of the app:

Building on this

There’s a lot to expand with on this little example. Shiny supports custom css, and even javascript. We may want to allow users to downlaod the image by adding a download widget. We could also output a table of the results to see the actual values using ‘renderTable()’. For inspiration you can always check out the shiny gallery!

Hosting your app

Finally you’ll want to distribute your app. You can host your app for free at shinyapps.io. Or you can run it internally on a lab computer. Lot’s more information on this shiny tutorial

Session

sessionInfo()
## R version 3.3.0 (2016-05-03)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10 (Yosemite)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] shiny_0.14.2  ggplot2_2.2.0 reshape_0.8.6
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.7      knitr_1.15       magrittr_1.5     munsell_0.4.3   
##  [5] xtable_1.8-2     colorspace_1.3-0 R6_2.2.0         stringr_1.1.0   
##  [9] plyr_1.8.4       tools_3.3.0      grid_3.3.0       gtable_0.2.0    
## [13] htmltools_0.3.5  yaml_2.1.14      lazyeval_0.2.0   rprojroot_1.2   
## [17] digest_0.6.10    assertthat_0.1   tibble_1.2       mime_0.5        
## [21] evaluate_0.10    rmarkdown_1.4    labeling_0.3     stringi_1.1.2   
## [25] scales_0.4.1     backports_1.0.4  httpuv_1.3.3

No comments:

Post a Comment

Clustering RNAseq data using K-means: how many clusters?

Clustering RNAseq data using K-means: how many clusters? Note this is part 2 of a ser...