Tuesday, June 27, 2017

Running BLAST in a shiny web application

Well the past few posts have been very Shiny-centric and today is no exception (this is afterall the majority of my workload right now). This week i encountered an interesting problem: How to set up a BLAST server for a Shiny web app. It took some tinkering but I finally arrived at a pretty decent app!
If you want to jump right in, feel free to grab a copy of everything from the Github repository.

For any of the following to work you need to have installed the latest BLAST executables and have these programs be available in your $PATH.

The first challenge was to generate a user interface. This wasn't terribly difficult. For inputs I knew we needed a large text input box to handle the fasta, then a few options for evalue and database. It's also essential to have a "go button" in this case; since BLASTing is slow and computationally heavy, we don't want the BLAST to be called until everything is set. Finally as output, I wanted a simple table with the hits and below that, the alignment information for any selected hit. Here's what that looks like:


The code to build the ui is pretty straight forward. Just two panels, one for the input and one for the results. A little minor formatting here and there for the buttons etc.

library(shinythemes)
library(DT)

ui <- fluidPage(theme = shinytheme("cerulean"),
 tagList(
  tags$head(
  tags$link(rel="stylesheet", type="text/css",href="style.css"),
  tags$script(type="text/javascript", src = "busy.js")
 )
 ),
 
 #This block gives us all the inputs:
 mainPanel(
   headerPanel('Shiny Blast!'),
   textAreaInput('query', 'Input sequence:', value = "", placeholder = "", width = "600px", height="200px"),
   selectInput("db", "Databse:", choices=c("NvERTx.4","nr"), width="120px"),
   div(style="display:inline-block",
   selectInput("program", "Program:", choices=c("blastn","tblastn"), width="100px")),
   div(style="display:inline-block",
   selectInput("eval", "e-value:", choices=c(1,0.001,1e-4,1e-5,1e-10), width="120px")),
   actionButton("blast", "BLAST!")
 ),
 
 #this snippet generates a 'busy' indicator for long BLASTs
 div(class = "busy", 
   p("Calculation in progress.."), 
   img(src="https://i.stack.imgur.com/8puiO.gif", height = 100, width = 100,align = "center")
 ),
 
 #Basic results output
 mainPanel(
   h4("Results"),
   DT::dataTableOutput("blastResults"),
   p("Alignment:", tableOutput("clicked") ),
   verbatimTextOutput("alignment")
 )
)

One thing that might pop out in the above is the busy indicator. This is important since BLASTing against a large database like nr takes a looooong time. The indicator above is a really elegant solution from github user withr. Then when the user clicks BLAST the following icon pops up and the results pane greys out:



Be sure to include the js and css files linked for it to work!

The hard part of this project was making a 'reasonably' robust BLAST execution and results parse. The execution was pretty simple. To do that the app creates a temporary file for the fasta, then generates a system command to run the blast. The results (in xml) are then captured using the intern =T option of the system call.

I debated using and tested different blast outputs but settled on xml because it includes some useful information like the alignments and the full name of the hits (not just the gene symbol). The downside to this is we need to work with xml which if you're not used to it like me can be quite difficult.

Here's an example of the blastoutput in xml:

xmltop
##<BlastOutput>
## <BlastOutput_program>blastn</BlastOutput_program>
## <BlastOutput_version>BLASTN 2.6.0+</BlastOutput_version>
## <BlastOutput_reference>Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), "A greedy algorithm for aligning DNA sequences", J Comput Biol 2000; 7(1-2):203-14.</BlastOutput_reference>
## <BlastOutput_db>/Users/Jake/Documents/GitHub_repos/NvER_plotter_django/nemVec_ER/blast_db/NvERTx.4</BlastOutput_db>
## <BlastOutput_query-ID>Query_1</BlastOutput_query-ID>
## <BlastOutput_query-def>Query</BlastOutput_query-def>
## <BlastOutput_query-len>206</BlastOutput_query-len>
## <BlastOutput_param>
##  <Parameters>
##  <Parameters_expect>10</Parameters_expect>
##  <Parameters_sc-match>1</Parameters_sc-match>
##  <Parameters_sc-mismatch>-2</Parameters_sc-mismatch>
##  <Parameters_gap-open>0</Parameters_gap-open>
##  <Parameters_gap-extend>0</Parameters_gap-extend>
##  <Parameters_filter>m;</Parameters_filter>
##  </Parameters>
## </BlastOutput_param>
## <BlastOutput_iterations>
##  <Iteration>
##   <Iteration_iter-num>1</Iteration_iter-num>
##    <Iteration_query-ID>Query_1</Iteration_query-ID>
##    <Iteration_query-def>Query</Iteration_query-def>
##    <Iteration_query-len>206</Iteration_query-len>
##    <Iteration_hits>
##     <Hit>
##      <Hit_num>1</Hit_num>
##      <Hit_id>TRINITY_DN73914_c8_g1_i1</Hit_id>
##      <Hit_def>NvERTx.4.172576</Hit_def>
##      <Hit_accession>TRINITY_DN73914_c8_g1_i1</Hit_accession>
##      <Hit_len>206</Hit_len>
##      <Hit_hsps>
##       <Hsp>
##        <Hsp_num>1</Hsp_num>
##        <Hsp_bit-score>381.53</Hsp_bit-score>
##        <Hsp_score>206</Hsp_score>
##        <Hsp_evalue>4.84514e-105</Hsp_evalue>
##        <Hsp_query-from>1</Hsp_query-from>
##        <Hsp_query-to>206</Hsp_query-to>
##        <Hsp_hit-from>1</Hsp_hit-from>
##        <Hsp_hit-to>206</Hsp_hit-to>
##        <Hsp_query-frame>1</Hsp_query-frame>
##        <Hsp_hit-frame>1</Hsp_hit-frame>
##        <Hsp_identity>206</Hsp_identity>
##        <Hsp_positive>206</Hsp_positive>
##        <Hsp_gaps>0</Hsp_gaps>
##        <Hsp_align-len>206</Hsp_align-len>
##...
So for these results I needed to loop over them using xpathApply. You can see below where I extract the different elements and reference them to the tree above. I also use:
if(is.null(something){} else {...}
quite liberally since it really makes shiny apps run a lot more cleanly.

So without further ado here's the server code. I don't go through it here chunk by chunk but hopefully you can follow along with the comments!


require(XML)
library(plyr)
library(dplyr)
library(DT)

server <- function(input, output, session){
 
 blastresults <- eventReactive(input$blast, {
 
   #gather input and set up temp file
   query <- input$query
   tmp <- tempfile(fileext = ".fa")
 
   #if else chooses the right database
   #here you could add more databases
   if (input$db =="NvERTx.4"){
     db <- c("/Users/Jake/blast_db/NvERTx.4")
     remote <- c("")
   } else {
     db <- c("nr")
     #add remote option for nr since we don't have a local copy
     remote <- c("-remote")
   }
 
   #this makes sure the fasta is formatted properly
   if (startsWith(query, ">")){
     writeLines(query, tmp)
   } else {
     writeLines(paste0(">Query\n",query), tmp)
   }
 
   #calls the blast
   data <- system(paste0(input$program," -query ",tmp," -db ",db," -dust no -evalue ",input$eval," -outfmt 5 -max_hsps 1 -max_target_seqs 10 ",remote), intern = T)
   xmlParse(data)
 }, ignoreNULL= T)

 #Now to parse the results...
 parsedresults <- reactive({
   if (is.null(blastresults())){}
   else {
     xmltop = xmlRoot(blastresults())
 
     #the first chunk extracts elements from the xml
     results <- xpathApply(blastresults(), '//Iteration',function(row){
       query_ID <- getNodeSet(row, 'Iteration_query-def') %>% sapply(., xmlValue)
       hit_IDs <- getNodeSet(row, 'Iteration_hits//Hit//Hit_id') %>% sapply(., xmlValue)
       hit_length <- getNodeSet(row, 'Iteration_hits//Hit//Hit_len') %>% sapply(., xmlValue)
       bitscore <- getNodeSet(row, 'Iteration_hits//Hit//Hit_hsps//Hsp//Hsp_bit-score') %>% sapply(., xmlValue)
       eval <- getNodeSet(row, 'Iteration_hits//Hit//Hit_hsps//Hsp//Hsp_evalue') %>% sapply(., xmlValue)
       cbind(query_ID,hit_IDs,hit_length,bitscore,eval)
     })
     #this ensures that NAs get added for no hits
     results <- 40="" a="" align="" alignment="" alignments="" alignx="" as.data.frame="" blastresults="" blastresults_rows_selected="" bottom="" br="" breaks="" c="" carachters="" cbind="" chunk="" clicked="" colnames="" data.frame="" datatable="" do.call="" e-value="" else="" ength="" every="" for="" function="" get="" getnodeset="" gsub="" gt="" i="" id="" if="" input="" is.null="" it="" it_hsps="" lapply="" length="" look="" loop="" lt="" makes="" mid="" mini="" n="" names="" null="" on="" output="" over="" parsedresults="" paste0="" paste="" phew...="" print="" rbind.fill="" rbind="" renderdatatable="" rendertable="" rendertext="" results="" returns="" row="" rownames="T,colnames" rows="" sapply="" score="" selection="single" sp="" sp_hseq="" sp_midline="" sp_qseq="" split="" split_out="" splits="" stringsasfactors="FALSE)}))" strsplit="" t="" table="" tableout="" teration="" teration_hits="" the="" them="" this="" to="" together="" top="" uery="" unlist="" with="" wrapped="" xml="" xmltop="xmlRoot(blastresults())" xmlvalue="" xpathapply="" y="">

Okay that wasn't so bad. But I could make the reporting more robust instead of having two code chunks for single and multi fasta queries. The most important thing is it works! In the future I'd like to roll this into a package but for now I'll just include everything else we need to make it run.

This is the js for the busy indicator:

setInterval(function(){
 if ($('html').attr('class')=='shiny-busy') {
   setTimeout(function() {
     if ($('html').attr('class')=='shiny-busy') {
       $('div.busy').show()
     }
   }, 1000)
 } else {
   $('div.busy').hide()
 }
}, 100) 


And this is the css:


div.busy { 
 position:absolute;
 top: 40%;
 left: 50%;
 margin-top: -100px;
 margin-left: -50px;
 display:none;
 background: rgba(230, 230, 230, .8);
 text-align: center;
 padding-top: 20px;
 padding-left: 30px;
 padding-bottom: 40px;
 padding-right: 30px;
 border-radius: 5px;
} 


And here's the session:

sessionInfo()
##R version 3.3.0 (2016-05-03)
##Platform: x86_64-apple-darwin13.4.0 (64-bit)
##Running under: OS X 10.10 (Yosemite)
##
##locale:
##[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
##attached base packages:
##[1] stats graphics grDevices utils datasets methods base 
##
##other attached packages:
##[1] dplyr_0.7.0 XML_3.98-1.9 DT_0.2 shinythemes_1.1.1 shiny_1.0.3 
##
##loaded via a namespace (and not attached):
## [1] Rcpp_0.12.11 assertthat_0.2.0 digest_0.6.12 mime_0.5 R6_2.2.2 xtable_1.8-2 
## [7] jsonlite_1.5 magrittr_1.5 rlang_0.1.1 tools_3.3.0 glue_1.1.1 htmlwidgets_0.8 
##[13] httpuv_1.3.3 yaml_2.1.14 rsconnect_0.8 htmltools_0.3.6 tibble_1.3.3

No comments:

Post a Comment

Bulk download stock data from Yahoo finance with R

stocks StockScraper So a slow weekend means working on some of my non-bioinf...