Monday, May 9, 2016

Outputting a Data Table to PDF from R

Outputting a Data Table to PDF from R
I’ve recently developed a workflow for writing manuscripts with co-authors lately where I:
  1. Perform all possible analyses in R in a single Rmarkdown script in Rstudio, knitted to a PDF. Use the notebook aspect of Rmarkdown extensively, so all my thoughts of what I’m doing and why are in there.
  2. Save the final workspace from the Rmarkdown script (with all the datasets input, transformed and further analyzed) to an external .Rdata file.
  3. Produce all the necessary figures for publication by sourcing a separate R script that contains code for producing those figures, generally as TIFF image files, at the necessary sizes for submission.
  4. Transfer all my miscellanous thoughts on methodology and results to actual submittable-manuscript form in Word as a .docx because many of my collaborators don’t use Markdown or Latex or even Google Docs, and plus tracking revisions is easiest in Word, as opposed to Google Docs or PDF.
Steps 2+3 are semi-necessary because the Rmarkdown script may take very long to run (i.e. multiple days on my machine), which sometiems is just lengthy data transformations (cough dating phylogenies in R cough) and so its useful to tweak figures slightly by just sourcing a script in a saved environment.

But recently I ran into an issue where I wanted R to spit out an auto-generated table as a PDF that looked nice, for the sake of including as a supplemental table to a manuscript.

So I has a bunch of symmetric pair-wise distances with some long (but necessary) labels, which meant one really ugly distance matrix. Imagine we have such a thing and we asked R to print it out.
##            ezjibmlvpy hiefdsurjx dpigheqcnr nphmejaiqg ebijmurcsf
## ezjibmlvpy          0          4          4          3          5
## hiefdsurjx          4          0          2          3          1
## dpigheqcnr          4          2          0          1         NA
## nphmejaiqg          3          3          1          0          5
## ebijmurcsf          5          1         NA          5          0
## csvhynkzjw          1          4         NA          2         NA
## celqhvdntm         NA          4         NA          4          5
## vecxfatldq         NA          2          3         NA          1
## meavfdgznh          3         NA          4          4         NA
## sjctdwmxki          1          1         NA          5         NA
##            csvhynkzjw celqhvdntm vecxfatldq meavfdgznh sjctdwmxki
## ezjibmlvpy          1         NA         NA          3          1
## hiefdsurjx          4          4          2         NA          1
## dpigheqcnr         NA         NA          3          4         NA
## nphmejaiqg          2          4         NA          4          5
## ebijmurcsf         NA          5          1         NA         NA
## csvhynkzjw          0          4          1         NA          1
## celqhvdntm          4          0         NA          5          1
## vecxfatldq          1         NA          0          1          4
## meavfdgznh         NA          5          1          0          1
## sjctdwmxki          1          1          4          1          0

Truly a very ugly distance matrix. We have a zero diagonal we don’t want to show, and showing both the upper and lower triangles is rather redundant. So, we can clean it up a little, mainly by erasing values we don’t want to look at…
table[lower.tri(table,diag=TRUE)]<-NA
table[is.na(table)]<-" "
table<-table[-ncol(table),-1]
##            hiefdsurjx dpigheqcnr nphmejaiqg ebijmurcsf csvhynkzjw
## ezjibmlvpy "4"        "4"        "3"        "5"        "1"       
## hiefdsurjx " "        "2"        "3"        "1"        "4"       
## dpigheqcnr " "        " "        "1"        " "        " "       
## nphmejaiqg " "        " "        " "        "5"        "2"       
## ebijmurcsf " "        " "        " "        " "        " "       
## csvhynkzjw " "        " "        " "        " "        " "       
## celqhvdntm " "        " "        " "        " "        " "       
## vecxfatldq " "        " "        " "        " "        " "       
## meavfdgznh " "        " "        " "        " "        " "       
##            celqhvdntm vecxfatldq meavfdgznh sjctdwmxki
## ezjibmlvpy " "        " "        "3"        "1"       
## hiefdsurjx "4"        "2"        " "        "1"       
## dpigheqcnr " "        "3"        "4"        " "       
## nphmejaiqg "4"        " "        "4"        "5"       
## ebijmurcsf "5"        "1"        " "        " "       
## csvhynkzjw "4"        "1"        " "        "1"       
## celqhvdntm " "        " "        "5"        "1"       
## vecxfatldq " "        " "        "1"        "4"       
## meavfdgznh " "        " "        " "        "1"
Still, printing it in R is pretty ugly, and copy-pasting it to Excel or Word or something and making it prettier for a manuscript seems rather time-consuming, especially if I’m going to be tweaking things upstream and these values might be changing (or even the number of comparisons might change. It would be nice to automize converting this to a servicable PDF.

Now, I just wanted to quickly generate a decent-looking PDF of this table from an environment just by sourcing a script, like I do with figures. As someone who only learned Rmarkdown two years ago and still mainly uses it via Rstudio, this was a bit of a challenge. So I put a call out to Twitter, and I got a helpful response from François Michonneau that put me on the right track:
Hmm, well, it did turn out to be a little more involved than that. Using packages knitr and rmarkdown, we can convert a table using kable, output the table to a separate .Rmd file and then render that file.
library(knitr)
library(rmarkdown)

table<-kable(table, format="markdown")
cat(table, sep="\n", file="Table.Rmd")
render("table.Rmd",output_format = "pdf_document")
And here’s a screen shot of what that produces.


The table is trailing off the page, because its wider than your typical LaTeX page size. Usually we could just rotate the table to avoid this, but in this particular case, we have a symmetric table with long column and row labels. So we need to rotate the page, and then we can plot our table comfortably…
format<-"---\noutput: pdf_document\npapersize: landscape\n---\n\n"
cat(c(format,table),
    sep="\n",file="tableRotate.Rmd")
render("tableRotate.Rmd")
And here is what that looks like:


The page is a little bigger than I’d like, but close enough! I’ll call it a success.