Wednesday, September 27, 2017

Pokemods! An educational outreach initiative

This post originally appeared on The Node.

Getting the next generation of scientists excited about biology is an important part of our jobs as researchers. To that end, Karine Nedoncelle, Aurelien Doucet and I created PokeMod cards. Each card features a model organism and highlights some of its contributions to the field of biology. The cards are easy to deploy at your next outreach event or in the classroom. All you have to do is download them and have them printed at your local or university print shop (we provide tiled front and back files in both A3 and A4).


Model organisms have become indispensable tools in biological research and have enabled innumerable advances in our understanding of life. But while many people are versed in core concepts involving cells, DNA and genes (my mother can give a pretty good explanation of CRISPR!), they are sometimes unaware that the majority of the research behind these concepts came from a handful of unique, sometimes exotic, organisms. Occasionally, particularly ignorant politicians have attacked such research as frivolous! Clearly there is a need to familiarize the wider public with the existence of our beloved model systems. We made PokeMods as an introduction and tribute to some of our most productive model organisms. We hope that after interacting with the cards, someone will walk away with a greater appreciation for the utility of, say, a fruit fly or sea urchin in biology research.

Learning Objective

A basic objective is the awareness that biological research is carried out using a variety of interesting organisms. This objective can be built upon in the classroom to include knowing which models are good for which types of research.

Target Audience

These cards are intended for children aged 8+. Younger children can certainly enjoy the cards but may not benefit from the learning objectives above. We also hope to reach a secondary audience that includes the ecosystem around the children (parents, educators, etc.).

Suggested activities

There are lots of ways these cards can be distributed. In a classroom they can be provided as a reward for correct responses or good behaviour. Then when students collect all of them they get a prize.

A more advanced classroom activity could be to create a game by assigning biological problems to groups of students. They then 'attack' the problem using their model systems and explain why.

During the 2016 Fête de la Science, an open house for the university, we 'hid' the cards at different exhibitions. For example, the zebrafish card was found at a stall that was highlighting zebrafish research. When the students found all of the cards they got a prize, in our case a 3D printed DNA molecule. We also provided a flyer that displayed all the model organisms to find, connected by a phylogenetic tree. This aspect can be used with a more advanced audience to highlight the evolutionary relationships between the model organisms and the benefits of studying them.

Monday, September 11, 2017

Bulk download stock data from Yahoo Finance with R



So a slow weekend means working on some of my non-bioinformatics projects. This time it was writing an R script to scrape historical stock data from Yahoo Finance. This comes after Yahoo broke everyone’s scripts (including one I had written in bash) by changing their API to require a cookie/crumb pair. I won’t go into detail about solving that problem (regex) or much about the script itself, but feel free to have a look at it here. The important thing is that it works.
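For the curious, the regex part can be sketched roughly as follows. This is not the code from my script: the sample page fragment, the `CrumbStore` layout, the `\u002F` unescaping and the helper name `extract_crumb` are all assumptions for illustration, and the real page may look different.

```r
# Made-up fragment standing in for the HTML of a Yahoo Finance page.
# The "CrumbStore" layout is an assumption, not a documented format.
page <- 'window.data = {"CrumbStore":{"crumb":"AbC123xYz\\u002F9"},"other":1};'

# Hypothetical helper: pull the crumb token out of the page text.
extract_crumb <- function(html) {
  hit <- regmatches(html, regexpr('"CrumbStore":\\{"crumb":"[^"]+"', html))
  if (length(hit) == 0) return(NA_character_)
  crumb <- sub('.*"crumb":"([^"]+)"$', "\\1", hit)
  # Assumption: a slash inside the crumb arrives escaped as \u002F.
  gsub("\\u002F", "/", crumb, fixed = TRUE)
}

crumb <- extract_crumb(page)
print(crumb)
```

The crumb would then be sent along with the matching session cookie on each download request, and a page without a crumb simply yields `NA`.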

I should note that another R package that I like very much, quantmod, includes the similar function getSymbols(). There are a few things I dislike about getSymbols(), namely that when downloading multiple stocks it loads each one as a separate variable in the environment by default. This sucks if you’re downloading an entire exchange like NYSE. The other downside is not being able to limit the date range, which again is useful when dealing with large numbers of stocks.

With that let’s take the StockScraper for a spin. First we need to source it since I’m too lazy to package it:
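Sourcing might look like the sketch below. The file name `stockscraper.R` is an assumption, so point `script_path` at wherever you saved the script.

```r
# Hypothetical local path to the downloaded script; adjust as needed.
script_path <- "stockscraper.R"

if (file.exists(script_path)) {
  source(script_path)
  # After sourcing, the two functions discussed in this post should exist.
  print(exists("stockhistoricals"))
  print(exists("get_stocklists"))
} else {
  message("Script not found at ", script_path, "; download it first.")
}
```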


As you can see, the script includes two functions. The primary function, stockhistoricals(), downloads the data. The helper function, get_stocklists(), retrieves the stock lists for NYSE, AMEX and NASDAQ. It retrieves a lot of good stock metadata too, which could be useful later when building correlation analyses.

Let’s bulk download NASDAQ as an example. Here I’m being explicit in the arguments but you can run them with their defaults listed at the top of the function. I usually run it with verbose=TRUE to monitor it but that would look like crap in this markdown!

NASDAQ <- stockhistoricals(stocklist="NASDAQ", start_date = "2016-09-11", end_date = "2017-09-11", verbose = FALSE)

So now we have a year’s worth of stock price historicals for the entire NASDAQ exchange. The data is stored as a list of data frames named for the stock tickers. Check it out:

#list the first ten stocks
names(NASDAQ)[1:10]

We can retrieve individual stock data using standard R notation:

#check out GOOG
head(NASDAQ$GOOG)
##         Date   Open   High     Low  Close Adj.Close  Volume
## 1 2016-09-12 755.13 770.29 754.000 769.02    769.02 1311000
## 2 2016-09-13 764.48 766.22 755.800 759.69    759.69 1395000
## 3 2016-09-14 759.61 767.68 759.110 762.49    762.49 1087400
## 4 2016-09-15 762.89 773.80 759.960 771.76    771.76 1305100
## 5 2016-09-16 769.75 769.75 764.660 768.88    768.88 2049300
## 6 2016-09-19 772.42 774.00 764.441 765.70    765.70 1172800

If you’re comfortable with lists, you can work directly with the list for simple analyses:

#Get the average adjusted close price for GOOG
mean(NASDAQ$GOOG$Adj.Close)
## [1] 852.4343
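The same idea scales to every ticker at once. Here is a minimal sketch using a toy list with made-up numbers in place of the real download; the tickers `AAA` and `BBB` and their prices are invented, and only the list-of-data-frames shape mirrors what stockhistoricals() returns.

```r
# Toy stand-in for the downloaded list: one data frame per ticker.
# Tickers and prices are made up for illustration.
toy <- list(
  AAA = data.frame(Date = as.Date("2017-01-02") + 0:2,
                   Adj.Close = c(10, 11, 12)),
  BBB = data.frame(Date = as.Date("2017-01-02") + 0:2,
                   Adj.Close = c(5, NA, 7))
)

# Average adjusted close per ticker, skipping missing values.
avg_close <- sapply(toy, function(df) mean(df$Adj.Close, na.rm = TRUE))
print(avg_close)
```

sapply() returns a named numeric vector keyed by ticker, which is handy for sorting or filtering an entire exchange in one go.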

Or we can extract stocks and do fun things like plot them.

#extract GOOG
GOOG <- data.frame(NASDAQ$GOOG)
names(GOOG) <- c("Date","Open","High","Low","Close","Adj.Close","Volume")

#plot it out! (ggplot2 needs to be attached)
library(ggplot2)
ggplot(GOOG, aes(x = Date, y = Close)) +
  geom_line() +
  labs(title = "GOOG Price", y = "Closing Price", x = "")

In the future I might bundle stockscraper into an R package along with some of my favorite plotting and clustering wrappers. But for now I'll leave it at that.

Good luck and happy data-mining!


## R version 3.3.0 (2016-05-03)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10 (Yosemite)
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## other attached packages:
## [1] ggplot2_2.2.1  readr_1.1.1    httr_1.2.1     RCurl_1.95-4.8
## [5] bitops_1.0-6   XML_3.98-1.9  
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.11     knitr_1.16       magrittr_1.5     hms_0.3         
##  [5] munsell_0.4.3    colorspace_1.3-2 R6_2.2.2         rlang_0.1.2     
##  [9] plyr_1.8.4       stringr_1.2.0    tools_3.3.0      grid_3.3.0      
## [13] gtable_0.2.0     htmltools_0.3.6  lazyeval_0.2.0   yaml_2.1.14     
## [17] rprojroot_1.2    digest_0.6.12    tibble_1.3.3     curl_2.6        
## [21] evaluate_0.10    mime_0.5         rmarkdown_1.6    labeling_0.3    
## [25] stringi_1.1.5    scales_0.4.1     backports_1.1.0
