Friday, June 1, 2018

One hot encode a DNA sequence using Python and scikit-learn



Machine learning (in the informatics world) is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are too. Juvenile comparisons aside, the power of these tools can't be ignored. What really piqued my interest was reading Adam Geitgey's post on Medium where he builds a Super Mario level using a neural network. His entire eight-part series, by the way, is an awesome primer on machine learning. More recently I read a paper by Wang et al. that applied deep learning to transcription factor binding, and I was inspired to learn more. Using deep learning tools for DNA analysis requires first converting DNA sequences to numbers. We can do this by one hot encoding our DNA sequence.

One Hot Encoding and DNA sequences:


One hot encoding is a way to represent categorical data as binary vectors. For DNA, we have four categories: A, T, G, and C.

Thus a one hot code for DNA could be:
A = [1, 0, 0, 0]
T = [0, 1, 0, 0]
G = [0, 0, 1, 0]
C = [0, 0, 0, 1]

So the sequence AATTC would be:
[[1, 0, 0, 0],
[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 1, 0, 0],
[0, 0, 0, 1]]

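You might be asking 'why not just use A=1, T=2, G=3, C=4?'
The answer is that of course we can. This is called integer encoding, and we need it as an intermediate step in our encoding solution below. On its own, though, integer encoding implies an ordering (C > G > T > A) that has no biological meaning, which is why we go on to one hot encode.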

To do this in Python, I found a great one hot encoding tutorial by Jason Brownlee that takes advantage of the scikit-learn library and adapted it to work with DNA sequences. Note that scikit-learn comes pre-installed with the Anaconda distribution. It made the most sense to me to build this as a Python class so it can be used repeatedly for many sequences, storing their attributes for later use. The class hot_dna takes a fasta string as its argument. The first chunk checks for and stores the sequence name (the header line starting with '>'). Then the sequence is converted to an array for integer encoding. The integer encoding is carried out using LabelEncoder(). Next, the integer encoded DNA is one hot encoded using OneHotEncoder(). Finally, these encodings and the original sequence along with its name get loaded as attributes. Note that LabelEncoder() sorts the labels alphabetically, so for a sequence containing all four bases the one hot columns come out in the order A, C, G, T rather than the A, T, G, C order used in the example above. Check it out below:

from numpy import array
from numpy import argmax
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
import re

class hot_dna:
    def __init__(self, fasta):

        # check for and grab the sequence name (the header line starting with '>')
        if re.search(">", fasta):
            name = re.split("\n", fasta)[0]
            sequence = re.split("\n", fasta)[1]
        else:
            name = 'unknown_sequence'
            sequence = fasta

        # get the sequence into an array, one base per element
        seq_array = array(list(sequence))

        # integer encode the sequence (labels are sorted alphabetically)
        label_encoder = LabelEncoder()
        integer_encoded_seq = label_encoder.fit_transform(seq_array)

        # one hot the sequence
        # note: scikit-learn 1.2+ calls this argument sparse_output; sparse was removed in 1.4
        onehot_encoder = OneHotEncoder(sparse=False)
        # reshape to a single column because OneHotEncoder expects a 2D array
        integer_encoded_seq = integer_encoded_seq.reshape(len(integer_encoded_seq), 1)
        onehot_encoded_seq = onehot_encoder.fit_transform(integer_encoded_seq)

        # add the attributes to self
        self.name = name
        # store the bare sequence (not the whole fasta) so it matches the example output
        self.sequence = sequence
        self.integer = integer_encoded_seq
        self.onehot = onehot_encoded_seq
And here's what a fasta looks like going through:
  
# EXAMPLE
fasta = ">fake_sequence\nATGTGTCGTAGTCGTACG"
my_hottie = hot_dna(fasta)

print(my_hottie.name)
>fake_sequence
print(my_hottie.sequence)
ATGTGTCGTAGTCGTACG
print(my_hottie.integer)
[[0]
 [3]
 [2]
 [3]
 [2]
 [3]
 [1]
 [2]
 [3]
 [0]
 [2]
 [3]
 [1]
 [2]
 [3]
 [0]
 [1]
 [2]]
print(my_hottie.onehot)
[[ 1.  0.  0.  0.]
 [ 0.  0.  0.  1.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]
 [ 1.  0.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]
 [ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]]

All set for your machine learning algorithms! Which may or may not be the topic of a future post.

LearnMeSomeBiology(my_hottie.onehot)


Until next time!

Tuesday, January 30, 2018

Making interactive plots using javascript and D3 (v4)

There comes a time in every scientist's career when it's time to leave behind the Excel chart tools. There are many options spanning a broad range of difficulty and customisability. Check out this post on Source for a good breakdown of the available programs and libraries. Personally, I am a huge fan of the R library ggplot2 because of its flexibility and many add-ons including my favorite, the XKCD add-on. Lately though I've been working on some data visualisation projects and needed to generate plots on the fly for display in a website.

Enter: D3.js and its offspring plotly.js.

D3 is a javascript library that takes input data and uses it to draw, edit and render elements of a website. The results are stunning data visualisations built right into the html itself. Plotly is a plotting library built on top of D3 and is basically a series of wrappers to make the most common chart types.

So why use plotly when we can go straight to the D3 source? Because D3 is difficult. Especially if you aren't completely comfortable in a js/html/css environment.

Then why use D3 when plotly can do (mostly) what we need? Because with D3 the possibilities are virtually endless. If you can dream it, D3 can probably make it.
Want to have a customised interactive plot? D3.
Want to make an interactive network diagram? D3.
How about a heatmap? D3.
Maybe you fancy a scatter plot with poop emojis?
You get the idea...

So let's give it a try. Below we'll make a simple scatter - line plot using D3 and add some interactivity.
To skip the tutorial, jump to the full code at the end of this post.
Here's what we're making:

The plot is a basic line and scatter plot. Note the nifty hover labels and clickable points! Let's go through the code used to make it. First we need to make a simple web layout. Here I'm making a basic html page with two divs, one for the plot with the class D3Div, and one which will display some data with the id `clickTable`:

Document Setup


<html> 
<head>
<style>
.D3Div{
padding-top: 50px;
padding-right: 30px;
padding-bottom: 50px;
padding-left: 10px;
height: 50vh;
width:50vw;
}
.clickTableContainer{
padding-top: 50px;
padding-right: 30px;
padding-bottom: 50px;
padding-left: 10px;
}
</style>
</head>
<body>
<h1>D3 Scatter - Line Chart</h1>
<div id='container' class='D3Div'><!-- D3 chart will appear here! --></div>
<div id='clickTable' class='clickTableContainer'><!-- Clicked points will appear here! --></div>
</body>
</html>

Now we are ready to write our javascript. We have some dependencies to load. First is the D3 library (we're using version 4, which has a different syntax from v3). We will also use the jQuery plugin tipsy for hover labels, so we need to load jQuery as well. D3 itself does not depend on jQuery.


  <!-- D3 -->
<script src="js/d3.v4.min.js"></script>
<!-- tipsy requires jquery -->
<script src="js/jquery.min.2.1.3.js"></script>
<!-- Tipsy will make the hover label-->
<script type="text/javascript" src="js/jquery.tipsy.js"></script>
<link href="css/tipsy.css" rel="stylesheet" type="text/css" />

Data and D3 basics

First we need some data. Here's an array of RNAseq data for a time series. Each entry is a [counts, time] pair:


    var dataset = [
[ 13.88, -1 ],
[ 8.28, 0 ],
[ 45.29, 2 ],
[ 25.71, 4 ],
[ 24.13, 8 ],
[ 101.49, 12 ],
[ 155.75, 16 ],
[ 142.93, 20 ],
[ 100.38, 24 ],
[ 51.73, 36 ],
[ 41.24, 48 ],
[ 33.8, 60 ],
[ 30.28, 72 ],
[ 25.38, 96 ],
[ 22.73, 120 ],
[ 19.11, 144 ],
]

Now for some D3 action. The basic workflow in D3 is to select an html element then modify it, adding drawing elements or what have you, according to our data. If we get it right, the drawing elements will show up in the correct spot on the screen to make a plot or whatever it is you're trying to create.

Here let's start by selecting the D3Div div element, adding an svg layer to it, and setting the height and width. This is the layer we'll draw the plot in. Note that first I'm taking the height and width of the D3Div div and using these as attributes for the svg layer. We use the d3.select() method to select D3Div, then we use .append() to add an svg element. Finally we set the height and width with .attr and store everything in the var svg.


// get the height and width of the target container
// and set the padding for the graph
var container = document.querySelector(".D3Div");

var width = container.offsetWidth;
var height = container.offsetHeight;
var padding = 50;

// select the D3Div container and add an <svg> tag to it, sized to match
var svg = d3.select(".D3Div")
.append("svg")
.attr("width", width)
.attr("height", height);

Did you notice the "chained" commands? The commands separated by "." run in series, each one acting on the result of the one before, which lets us string multiple functions into one d3 call.
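
Chaining works because .attr() hands back the selection it was called on, while .append() hands back a new selection holding the element it just added. Here is a minimal sketch of that idea in plain javascript. FakeSelection is a made-up stand-in for illustration only and is not part of D3:

```javascript
// Hypothetical stand-in for a D3 selection, only to show why chaining works.
class FakeSelection {
  constructor(tag) {
    this.tag = tag;
    this.attrs = {};
  }
  // Setting an attribute returns the same object, so calls can be chained.
  attr(name, value) {
    this.attrs[name] = value;
    return this;
  }
  // Appending returns a NEW object for the child, so later calls apply to the child.
  append(tag) {
    return new FakeSelection(tag);
  }
}

const fakeSvg = new FakeSelection("div")
  .append("svg")
  .attr("width", 400)
  .attr("height", 300);

console.log(fakeSvg.tag, fakeSvg.attrs);
```

This is why the var svg in the tutorial code ends up pointing at the svg element rather than at the div: the last thing returned in the chain is the appended svg.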

Scales

Now before we can map data to our svg layer we need to set up a scale. This will scale the data to fit the x and y range of our drawing layer which is itself established by the div size. The scale takes two arguments, domain and range. Domain is the interval of our input data, and range is the interval it is scaled to. Check out this image from Jerome Cukier



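For our purposes we want the input interval, domain, to span our dataset, and the output to map to the div size minus a little padding. To calculate the span of our input data we're using the built-in d3.max() function to loop through the array and find the maximum. The x domain starts at -1, the earliest time point in the dataset. The y domain is written from the maximum down to 0 because svg y coordinates increase downward, so the largest count should map to the top of the plot. Then for the range, we pass our height, width and padding variables.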


//This makes a scale. It takes the max of each column and maps the data linearly onto the pixel range
//Use padding in the range so it's not up on the edge
 var xScale = d3.scaleLinear()
.domain([-1, d3.max(dataset, function(d) { return d[1]; })])
.range([padding, width- padding]);
var yScale = d3.scaleLinear()
.domain([d3.max(dataset, function(d) { return d[0]; }),0])
.range([padding, height- padding]);
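
To see what a linear scale is doing, here is a rough sketch in plain javascript. The helper linearScale is hypothetical and is not how D3 implements d3.scaleLinear(), and the pixel numbers assume a 500 by 300 drawing area with 50px padding purely for illustration:

```javascript
// Hypothetical helper, not D3's implementation: a bare-bones linear scale.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  // Map a data value to a pixel position by linear interpolation.
  const scale = function (value) {
    return r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
  };
  // Map a pixel position back to a data value.
  scale.invert = function (pixel) {
    return d0 + ((pixel - r0) / (r1 - r0)) * (d1 - d0);
  };
  return scale;
}

// Assumed sizes for illustration: width 500, height 300, padding 50.
const demoX = linearScale([-1, 144], [50, 450]);
const demoY = linearScale([155.75, 0], [50, 250]);

console.log(demoX(-1), demoX(144));    // left and right edges of the plot area
console.log(demoY(155.75), demoY(0));  // top and bottom edges of the plot area
console.log(demoX.invert(250));        // pixel 250 back to a time value
```

Because the y domain runs from the maximum down to 0, the largest count lands on the smallest pixel value, which is the top of the svg.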

Let's draw some stuff!!

So now we have variables which store our svg selection, the X and Y scales and our dataset. Time to put them together to make something! We use the .selectAll() function to select all the circle drawing elements (even though they don't exist yet). Then we bind the data to that selection with .data(), and .enter() gives us a placeholder for every data point that doesn't have a circle yet. Then we use .append to draw a circle for each placeholder and .attr to set the x and y positions, termed 'cx' and 'cy'. The functions we pass to the cx and cy attributes call our scale functions on the appropriate 'column' of the array. The 'r' attribute defines the radius of the circles as 6. This is all repeated for each element of the data.


    svg.selectAll("circle")
.data(dataset)
.enter()
.append("circle")
.attr("cx", function(d) {
return xScale(d[1]);
})
.attr("cy", function(d) {
return yScale(d[0]);
})
.attr("r", 6);

Not bad! The dots (circles) have been arranged by their cx and cy attributes according to the scaled data. Go ahead and right click and 'inspect element' on the plot to see how it's broken down.

Next we want to add a line. To do that we'll make another variable to store a function using d3.line() to set the x and y coordinates of the path. Then we append that line to svg using svg.append("path"). We pass the dataset using .datum() this time since we are only drawing one element. Confused? More on .datum() versus .data() by the developer here. Finally, we set the "d" attribute (an svg attribute that defines the path to follow) by calling our line function with .attr("d", line):


    var line = d3.line()
.x(function(d) { return xScale(d[1]); })
.y(function(d) { return yScale(d[0]); });
svg.append("path")
.datum(dataset)
.attr("fill", "none")
.attr("stroke", "steelblue")
.attr("stroke-width", 1.5)
.attr("d", line);
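
If the "d" attribute feels like magic, it is just a string of drawing commands. The sketch below builds one by hand from points that are already in pixel coordinates. The helper pathFromPoints is hypothetical, and the exact string d3.line() generates may be formatted differently:

```javascript
// Hypothetical helper: build an SVG path string from already-scaled [x, y] pixel pairs.
function pathFromPoints(points) {
  return points
    .map(function (p, i) {
      // "M" moves the pen to the first point, "L" draws a straight line to each later one.
      return (i === 0 ? "M" : "L") + p[0] + "," + p[1];
    })
    .join("");
}

const demoPath = pathFromPoints([[50, 200], [100, 120], [150, 160]]);
console.log(demoPath);
```

In the tutorial code the line function does the scaling and the string building in one go, using the x and y accessors we gave it.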

Axes

Now it's time for some axes. We should be used to storing functions in variables by now, and here we make variables of the d3 functions d3.axisLeft() and d3.axisBottom() and pass our scale functions so they're the correct size. Then we need to position them. We can use our previously defined height and padding variables to tell them where to sit in the div. Finally, we add them to the plot using .append() and .call(). We set their position by passing xAxisPosition and yAxisPosition to the transform attribute. Note that translate(x,y) shifts an element by x pixels across and y pixels down:


//Set up the axis with the scale we made above
var yAxis = d3.axisLeft(yScale);
var xAxis = d3.axisBottom(xScale);

//position them. Use the padding to stay inside the div
var xAxisPosition = height-padding
var yAxisPosition = padding-10

//add the x axis
svg.append("g")
.attr("transform", "translate(0,"+xAxisPosition+")")
.call(xAxis);

//add the y axis
svg.append("g")
.attr("transform", "translate("+yAxisPosition+",0)")
.call(yAxis)

Starting to look more like a plot and less like modern art!

Axis Labels

Now we just need some labels. We simply set their position using the height, width, and padding variables and add them to the plot using the now familiar .append and .attr functions. Here we can also add a bit of styling using .style.


//set the positions
var xLabelPositionY = height
var xLabelPositionX = width/2
var yLabelPositionY = padding -35
var yLabelPositionX = 0-(height/2)

//add them along with their text
//can specify styling as well
svg.append("text")
.attr("y", xLabelPositionY)
.attr("x", xLabelPositionX)
.style("text-anchor", "middle")
.style("font-family", "Monospace")
.text("Time");

//Same for the Y axis.
//NOTE: the x and y are inverted because of the rotate!
svg.append("text")
.attr("y", yLabelPositionY)
.attr("x", yLabelPositionX)
.attr("transform", "rotate(-90)")
.style("text-anchor", "middle")
.style("font-family", "Monospace")
.text("Counts");

Hey, that's a proper plot right there! But wait there's more! Since this is javascript we can do all the fun javascripty things like write callback functions!

Making plots interactive:

Here let's make a function that makes the dots (circles) clickable. To do that, we make a function 'clicked'. The function has four major steps:

1 : Select and set all the circles to black (to undo previous color changes).

2 : Invert the scale to go from pixels to data and call findNearest() to get the closest data point. This function from Andy Aiken on scottlogic.com is defined just below. It loops through the data and compares each data point to the clicked data and finds the smallest difference i.e. the closest data point. Clever huh?

3 : Use nearest to set the div clickTable to include those points.

4 : Change the color of the clicked dot to red (and kind of blink while doing so).

We invoke the function clicked by adding it to the end of the circle drawing code chunk (since it targets only the circles).


svg.selectAll("circle")
.data(dataset)
.enter()
.append("circle")
.attr("cx", function(d) {
return xScale(d[1]);
})
.attr("cy", function(d) {
return yScale(d[0]);
})
.attr("r", 6)
.on("click", clicked);

function clicked(d, i) {
// set all to black to 'undo' previous clicks
d3.selectAll("circle")
.style("fill", "black")
.attr("r", 6)

// invert the scale to use the find nearest
var xMouse = xScale.invert(d3.mouse(this)[0]),
nearest = findNearest(xMouse);

// show the clicked data point in the clickTable div
document.getElementById("clickTable").innerHTML = "Clicked Datapoints: "+nearest;

//change color of the dot
d3.select(this).transition()
.style("fill", "red")
.attr("r", 12)
.transition()
.attr("r", 6)
}

// function to get the nearest data
//loops through dataset and compares the difference to mouse data
//then finds the closest datapoint
function findNearest(xMouse) {
var nearest = null,
dx = Number.MAX_VALUE;
dataset.forEach(function(data) {
var xData = data[1],
xDiff = Math.abs(xMouse - xData);
if (xDiff < dx) {
dx = xDiff;
nearest = data;
}
});
return nearest;
}
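
To try the nearest-point logic outside the browser, here is a small sketch that takes the data as an argument instead of reading the global dataset. The name nearestByTime and the three sample points are only for illustration:

```javascript
// Same idea as findNearest(), but taking the data as an argument so it can be
// tried outside the browser. Each entry is [counts, time], as in the post.
function nearestByTime(data, xValue) {
  let nearest = null;
  let smallestDiff = Number.MAX_VALUE;
  data.forEach(function (point) {
    // Compare the (inverted) mouse position against the time column.
    const diff = Math.abs(xValue - point[1]);
    if (diff < smallestDiff) {
      smallestDiff = diff;
      nearest = point;
    }
  });
  return nearest;
}

const sample = [[101.49, 12], [155.75, 16], [142.93, 20]];
console.log(nearestByTime(sample, 16.8)); // closest time is 16
console.log(nearestByTime(sample, 18.5)); // closest time is 20
```

Since the click listener is attached to each circle, the first argument d of clicked already holds that circle's [counts, time] pair in D3 v4, so the lookup is mostly useful if you ever move the listener onto the whole svg.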

What about those hover labels?

They're rendered by tipsy. Here's the relevant block:


    //tipsy handles the tool tip:
    $('svg circle').tipsy({ 
gravity: 'w',
html: true,
title: function() {
var d = this.__data__
return 'Time:'+d[1]+'<br>Counts:'+d[0];
}
});

Great! But doesn't plotly do all this?

As discussed in the introduction, plotly is great for routine charting. If you want to do anything outside the bounds of plotly though, D3 can make it happen. For example, here's the same plot drawn with poop emojis!

Putting it all together:

In conclusion, I usually don't recommend re-inventing the wheel. So if plotly provides what you're looking for, that might be the way to go. If, however, you want to make something completely new, or just like a challenge, D3 is worth taking the time to look into. The learning curve for D3 is steep but once you get a handle on the workflow it comes quite naturally. Also check out bl.ocks.org for inspiration and code. Good luck!

Here's all the code in one block for copy pasting:


  <!-- D3 -->
<script src="https://d3js.org/d3.v4.min.js"></script>
<!-- tipsy requires jquery -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script>
<!-- Tipsy will make the hover label-->
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery.tipsy/1.0.3/jquery.tipsy.min.js"></script>
<link href="https://cdnjs.cloudflare.com/ajax/libs/jquery.tipsy/1.0.3/jquery.tipsy.min.css" rel="stylesheet" type="text/css" />
<script>

/*
* The Data
*/

var dataset = [
[ 13.88, -1 ],
[ 8.28, 0 ],
[ 45.29, 2 ],
[ 25.71, 4 ],
[ 24.13, 8 ],
[ 101.49, 12 ],
[ 155.75, 16 ],
[ 142.93, 20 ],
[ 100.38, 24 ],
[ 51.73, 36 ],
[ 41.24, 48 ],
[ 33.8, 60 ],
[ 30.28, 72 ],
[ 25.38, 96 ],
[ 22.73, 120 ],
[ 19.11, 144 ],
]

/*
* The Setup
*/

// get the height and width of the target container
//set padding for the graph
var container = document.querySelector(".D3Div");

var width = container.offsetWidth
var height = container.offsetHeight
var padding = 50;


//This makes a scale. It takes the max of each column and maps the data linearly onto the pixel range
//Use padding in the range so it's not up on the edge
var xScale = d3.scaleLinear()
.domain([-1, d3.max(dataset, function(d) { return d[1]; })])
.range([padding, width- padding]);
var yScale = d3.scaleLinear()
.domain([d3.max(dataset, function(d) { return d[0]; }),0])
.range([padding, height- padding]);

// select the container D3Div and add an <svg> tag to it
var svg = d3.select(".D3Div")
.append("svg")
.attr("width", width)
.attr("height", height);

/*
* The Plot Drawing Elements
*/

/*
You have to select the circles even before they're created
the data function binds data to the circles
the enter function creates the circles
the append function sticks them on
then we set the attributes for each. x,y and r=radius
*/

svg.selectAll("circle")
.data(dataset)
.enter()
.append("circle")
.attr("cx", function(d) {
return xScale(d[1]);
})
.attr("cy", function(d) {
return yScale(d[0]);
})
.attr("r", 6)
.on("click", clicked);

//Replace the chunk above with this commented chunk for a poop plot:

/*
svg.selectAll("image")
.data(dataset)
.enter()
.append("svg:image")
.attr('x', function(d) {
return xScale(d[1]);
})
.attr('y', function(d) {
return yScale(d[0]);
})
.attr('width', 20)
.attr('height', 20)
.attr("xlink:href", "https://cdn.shopify.com/s/files/1/1061/1924/products/Poop_Emoji_7b204f05-eec6-4496-91b1-351acc03d2c7_grande.png?v=1480481059")
*/

//tipsy handles the tool tip:
$('svg circle').tipsy({
gravity: 'w',
html: true,
title: function() {
var d = this.__data__
return 'Time:'+d[1]+'<br>Counts:'+d[0];
}
});

//create a line function
//use the same scales
var line = d3.line()
.x(function(d) { return xScale(d[1]); })
.y(function(d) { return yScale(d[0]); });

/*
append the line function to the svg.
note you could put some of these attributes in a class .line and call:
.attr("class", "line")
*/

svg.append("path")
.datum(dataset)
.attr("fill", "none")
.attr("stroke", "steelblue")
.attr("stroke-width", 1.5)
.attr("d", line);

/*
* Axes
*/

//Set up the axis with the scale we made above
var yAxis = d3.axisLeft(yScale);
var xAxis = d3.axisBottom(xScale);

//position them. Use the padding to stay inside the div
var xAxisPosition = height-padding
var yAxisPosition = padding-10

//add the x axis
svg.append("g")
.attr("transform", "translate(0,"+xAxisPosition+")")
.call(xAxis);

//add the y axis
svg.append("g")
.attr("transform", "translate("+yAxisPosition+",0)")
.call(yAxis)

/*
* Axis Labels
*/

//set the positions
var xLabelPositionY = height
var xLabelPositionX = width/2
var yLabelPositionY = padding -35
var yLabelPositionX = 0-(height/2)

//add them along with their text
//can specify styling as well
svg.append("text")
.attr("y", xLabelPositionY)
.attr("x", xLabelPositionX)
.style("text-anchor", "middle")
.style("font-family", "Monospace")
.text("Time");

//Same for the Y axis.
//NOTE: the x and y are inverted because of the rotate!
svg.append("text")
.attr("y", yLabelPositionY)
.attr("x", yLabelPositionX)
.attr("transform", "rotate(-90)")
.style("text-anchor", "middle")
.style("font-family", "Monospace")
.text("Counts");

function clicked(d, i) {
// set all to black to 'undo' previous clicks
d3.selectAll("circle")
.style("fill", "black")
.attr("r", 6)

// invert the scale to use the find nearest
var xMouse = xScale.invert(d3.mouse(this)[0]),
nearest = findNearest(xMouse);

// show the clicked data point in the clickTable div
document.getElementById("clickTable").innerHTML = "Clicked Data Points: "+ nearest;

//change color of the dot
d3.select(this).transition()
.style("fill", "red")
.attr("r", 12)
.transition()
.attr("r", 6)
}

// function to get the nearest data
//loops through dataset and compares the difference to mouse data
//then finds the closest datapoint
function findNearest(xMouse) {
var nearest = null,
dx = Number.MAX_VALUE;
dataset.forEach(function(data) {
var xData = data[1],
xDiff = Math.abs(xMouse - xData);
if (xDiff < dx) {
dx = xDiff;
nearest = data;
}
});
return nearest;
}

Friday, December 22, 2017

xkcd plots!

xkcd plots:

One of my favorite things about R is the massive number of community-built packages. Besides the data science essentials like tidyverse, there are some quirky fun ones like emoGG for when you want :hankey: instead of the normal pch icons. So it was to my nerdy elation that I found the ggplot add-on xkcd to draw ggplots in the style of the legendary webcomic. Let’s give it a try:

#the following block is used to install the fonts:

#install.packages('xkcd')
# download.file("http://simonsoftware.se/other/xkcd.ttf", dest="xkcd.ttf", mode="wb")
# system("mkdir ~/Library/Fonts/")
# system("cp xkcd.ttf  ~/Library/Fonts/")
# font_import(pattern = "[X/x]kcd", prompt=FALSE)
# fonts()
# fonttable()
# if(.Platform$OS.type != "unix") {
#   ## Register fonts for Windows bitmap output
#   loadfonts(device="win")
# } else {
#   loadfonts()
# }

The trickiest part is getting the xkcd font downloaded and put in the right directory. Once you have that figured out, the plot is very straightforward:

#dataframe:
df <- data.frame(c(2,3,4,16,32,64,128,356,0,0,0,0,0),c(10,9,8,7,6,5,4,1.25,1.21,1.20,1.19,1.1,1))
colnames(df) <- c('proj','days')

library(xkcd)
library(ggplot2)
library(extrafont)

# here's just the plot:
xrange <- range(df$days)
yrange <- range(df$proj)
set.seed(123) # for reproducibility
p <- ggplot() + geom_smooth(aes(days, proj), data=df, method = 'loess',se=F) + xkcdaxis(xrange,yrange) +
  xlab('Days til deadline') +
  ylab('Time spent on silly plots') +
  geom_vline(xintercept = 2,colour = "red") +
  annotate("text", 3.0, 250, label = "Crisis Point",colour = "red",family='xkcd') +
  scale_x_reverse()
p

You can also include stick figures! They are a bit tough to draw though. I recommend experimenting with the angles until you get it right. The vignettes do a good job of explaining how the different parameters map.

#this block makes the stick figure:
#play around with the angles to see how they change the drawing:
mapping <- aes(x, y, scale, ratioxy, angleofspine,
               anglerighthumerus, anglelefthumerus,
               anglerightradius, angleleftradius,
               anglerightleg, angleleftleg, angleofneck)
ratioxy <- diff(xrange) / diff(yrange)
dataman <- data.frame( x= c(8), y=c(250),
                       scale = 75 ,
                       ratioxy = ratioxy,
                       angleofspine =  -pi/2  ,
                       anglerighthumerus = c(-pi/6),
                       anglelefthumerus = c(-pi/2 - pi/6),
                       anglerightradius = c(pi*1.25),
                       angleleftradius = c(pi*1.25),
                       angleleftleg = 3*pi/2  + pi / 12 ,
                       anglerightleg = 3*pi/2  - pi / 12,
                       angleofneck = runif(1, 3*pi/2-pi/10, 3*pi/2+pi/10))
datalines <- data.frame(xbegin=c(7),ybegin=c(250),
                        xend=c(6.5), yend=c(300))

# plot out the first plot with the stick figure:
p <- ggplot() + geom_smooth(aes(days, proj), data=df, method = 'loess',se=F) + xkcdaxis(xrange,yrange) +
  xlab('Days til deadline') +
  ylab('Time spent on silly plots') +
  geom_vline(xintercept = 2,colour = "red") +
  annotate("text", 3.0, 250, label = "Crisis Point",colour = "red",family='xkcd') +
  scale_x_reverse() +
  xkcdman(mapping, dataman) +
  annotate("text", x=6, y = 320,
           label = "I think I have a problem...", family="xkcd" ) +
  xkcdline(aes(xbegin=xbegin,ybegin=ybegin,xend=xend,yend=yend),datalines, xjitteramount = 0.12)
p