LaTeX sparklines from R

Why?

Standard deviations and inter-quartile ranges help give a general impression of the distribution of a variable, but sometimes I want to show what a variable looks like, without devoting half the page to add a figure. In my PhD, I needed a quick way to take a variable in R, and produce a histogram or line graph that could slot directly into the text alongside the mean/median and SD/IQR. This way I could provide much more information on what the variable spread look liked, quickly.

My solution

Sparklines are tiny charts, usually line or histographs, that can be embedded in text into documents. Below are two examples of sparklines from my PhD. The top line has a histogram, the second line has two linegraphs. The two sentences are from different parts of my PhD, which is why it’s confusing to read the lines together. A latex function exists for plotting these values, but there was no easy way to take a variable in R, and quickly get LaTeX ready code. To speed things up I wrote the function below.

Examples of what the code produces in LaTeX.

My function

In the most basic application, all that is required is the variable you want (a vector or column of a dataframe), as the only input. For instance jb_CutIntoBins(variable) is sufficient to get a sparkline histogram of a variable in an R dataframe. My function will output all the LaTeX code required to plot a sparkline histogram with 10 bins (bars) and a width of 6. If you provide two pieces of data, then the function will assume you want a linegraph (first variable will be on the y-axis, second on the x-axis) The number of bins can be edited via the options bins= and width=. You can also name the sparkline with name=. I put a proper help file into the package, that can be called from R with ?jb_CutIntoBins, if you install it via github (instructions below).

Example use

Load in some test data

This is only needed to recreate the demo sparklines I make below.

url <- 'https://gist.githubusercontent.com/epijim/8819934/raw/6c76df80eb095065a9ce0fa4b8f94410ad528fed/college_data.csv'
 library(RCurl)
 data <- getURL(url, ssl.verifypeer=FALSE)
 data <- read.csv(text = data)

Load in my function.

Assuming you have the devtools package installed you can load my function via the line of code below. Installing it this way will load up all my functions, and will also load up the help documentation. Otherwise, at the bottom of the post is the full code for the function.

install_github('epijim/EpiJimFunctions')

Test the function

Firstly - you have to add the package sparklines to the front matter in LaTeX to render this code.

Typing jb_CutIntoBins(data$Wine_budget) will result in the following output. This code can then be added to the front matter, and then you can call it in text with \sparkline:

\newcommand{\sparkline}{\begin{sparkline}{6}
 \sparkspike 0 1
 \sparkspike 0.111111111111111 0.5
 \sparkspike 0.222222222222222 0.7
 \sparkspike 0.333333333333333 0.3
 \sparkspike 0.444444444444444 0
 \sparkspike 0.555555555555556 0
 \sparkspike 0.666666666666667 0.2
 \sparkspike 0.777777777777778 0.1
 \sparkspike 0.888888888888889 0
 \sparkspike 1 0.1
\end{sparkline}}

As an side, I prefer to keep the LaTeX code for the sparklines separate from the main text, which is why I use \newcommand to define the sparkline in the front matter, then I can just call it via a single word in the text.

A more complicated example using all the options to make a line graph is below.

#quickly make an age variable from my example data
  data$collegeage <- 2015 - data$Founded

# Using my function
jb_CutIntoBins(data$Fixed_assets, # First variable is the thing I'm interested in
  data$collegeage, # Adding a second variable switches from histograph to linegraph
  bins=5, # Number of points I'm collapsing the data into
  width=4, # How wide the in text plot should be
  name="wineandage") # I can name the sparkline, if I make more than one.

This will produce the following LaTeX code.

\newcommand{\sparkoldmoney}{\begin{sparkline}{4}
 \spark
   0   0.285653902704857
   0.25   0.524112445240158
   0.5   0.656947333939176
   0.75   0.692538305471606
   1   1
/ \end{sparkline}}

Which can be added to the front matter of the LaTeX document, then in text you can call it like below.

The following line is slope-y \sparkoldmoney.

And it LaTeX will render the following.

An example linegraph, AKA the original sparkline.

Although the variables I used are a little confusing, it’s much more instinctive to read if you have linear time as the x-axis and are tracking something. I should also note that the linegraph function takes the median of each bin.

The function code

I wrote this function in under an hour, in the hopes of it saving me from a job that would only take a couple of hours. So this is not polished code and I’m keen to hear how it can be improved. If I do revisit this code I’ll add an option to use means or maybe a smoother rather than medians, and will incorporate all the other sparkline options available in the LaTeX package, like boxes and coloured points on the linegraphs.

jb_CutIntoBins <- function(variable,time="potato",name="line",bins=10,width=6){
  # Make histograph ###################### if clause ###########################
  if(length(time)==1){
    temp_ranges <- range(variable, na.rm=T) # Get range
    temp_breaks <- seq(temp_ranges[1],temp_ranges[2], # From first to last
                       length=(bins+1)) # Number of bins
    temp_cut <- as.data.frame(cut(variable,
                                  breaks=temp_breaks,labels=F))
    # Set up dataset
    temp_data <- data.frame(bin=1:bins,
                            value=c(NA)
    )
    temp_cut <- as.data.frame(table(temp_cut),stringsAsFactors=F) #
    temp_cut$temp_cut <- as.numeric(temp_cut$temp_cut)
    # Fill dataset
    for(i in 1:bins){
      tryCatch(
        if(i==temp_data[i,1]){
          temp_data[i,2] <- temp_cut[temp_cut$temp_cut==i,2]
        },error=function(e){})
    }
    # Make sparkline
    binlocations <- seq(0,1,length=bins)
    middle<-NA
    for(i in 1:nrow(temp_data)){
      ifelse(is.na(temp_data[i,2]),
             middle[i] <- paste0(" \\sparkspike ",binlocations[i]," ",0),
             middle[i] <- paste0(" \\sparkspike ",binlocations[i]," ",
                                 temp_data[i,2]/max(temp_data$value,na.rm=T))
      )
    }
  }
  # Make line ###################### if clause ###########################
  if(length(time)>1){
    temp_ranges <- range(time, na.rm=T) # Get range
    temp_breaks <- seq(temp_ranges[1],temp_ranges[2], # From first to last
                       length=(bins+1)) # Number of bins
    temp_cut <- as.data.frame(cut(time,
                                  breaks=temp_breaks,labels=F))
    temp_data <- data.frame(temp_cut,time=time,value=variable)
    names(temp_data)[1] <- "group" # messy as temp_cut is a df
    temp_data <- aggregate(temp_data$value,by=list(temp_data$group),FUN=median,na.rm=T)
    names(temp_data)[1] <- "group"
    temp_data$x <- temp_data$x/max(temp_data$x)
    temp_output <- data.frame(group=1:bins,value=c(0))
    for(i in 1:bins){
      tryCatch(
        if(i==temp_data[temp_data$group==i,1]){
          temp_output[i,2] <- temp_data[temp_data$group==i,2]
        },error=function(e){})
    }
    temp_output$datalocations <- seq(0,1,length=bins)
    middle <- c(" \\spark",
                paste0("   ",temp_output$datalocations,"   ",temp_output$value)
    )
  }
################### OUTPUT
  opening <- paste0("\\newcommand{\\spark",
                    name,"}{\\begin{sparkline}{",
                    width,"}")
  close <- "/ \\end{sparkline}}"
  output <- c(opening,middle,close)
  cat(output,sep="\n")
}
A post about: , , and

You May Also Enjoy