[[!meta title="Adding a Table of Contents to PDFs from R"]]

I routinely generate very large PDFs from R with hundreds (or
thousands) of pages, and navigating them can be very difficult.
Unfortunately, neither R's `pdf()` nor the Cairo package's `CairoPDF()`
driver supports creating a table of contents (or index) while plots are
being written out. In the case of cairo, the underlying library doesn't
[support it either](http://osdir.com/ml/lib.cairo/2005-08/msg00506.html),
so this isn't something that can easily be added to R directly. For
months I had been thinking about sitting down and writing the support
into cairo and R's Cairo package... but real life kept getting in the way.

Fast forward to a week ago, when I realized that `pdftk` supports both
dumping and updating a PDF's bookmarks (its table of contents) using
`dump_data_utf8` and `update_info_utf8`! Armed with that knowledge
and a bit of hackery, we can record bookmarks while plotting and then
update the PDF once the device has been closed.
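
To get a feel for what `dump_data_utf8` hands back, here is a minimal sketch that parses the `Bookmark*` records into a data frame. `parse.bookmarks` is a hypothetical helper of my own naming, not part of pdftk or R, and the sample text is made up in the layout pdftk uses; it assumes every record carries all three fields.

```r
# Hypothetical helper: pull the Bookmark* fields out of the text that
# `pdftk <file> dump_data_utf8` prints (one "Key: value" pair per line).
parse.bookmarks <- function(lines) {
    field <- function(key) {
        prefix <- paste0("^", key, ": ")
        sub(prefix, "", grep(prefix, lines, value = TRUE))
    }
    data.frame(title = field("BookmarkTitle"),
               level = as.integer(field("BookmarkLevel")),
               page  = as.integer(field("BookmarkPageNumber")),
               stringsAsFactors = FALSE)
}

# Made-up sample in the same layout as a pdftk dump
dump <- c("NumberOfPages: 2",
          "BookmarkBegin",
          "BookmarkTitle: First plot",
          "BookmarkLevel: 1",
          "BookmarkPageNumber: 1",
          "BookmarkBegin",
          "BookmarkTitle: Second plot",
          "BookmarkLevel: 1",
          "BookmarkPageNumber: 2")
print(parse.bookmarks(dump))
```

With pdftk installed, the same parser should work on a real file via `system2("pdftk", c("testing.pdf", "dump_data_utf8"), stdout = TRUE)`.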

The R code then looks like the following:

    library(Cairo)  # provides CairoPDF() and Cairo.onSave()

    # State shared with the onSave hook below
    ..device.set.up <- FALSE
    ..current.page <- 0

    # Append a bookmark for the page about to be drawn and return the
    # updated list; the page defaults to the one after the last saved page.
    save.bookmark <- function(text,bookmarks=list(),level=1,page=NULL) {
        if (!..device.set.up) {
            # Have Cairo report each page number as the page is finished
            Cairo.onSave(device = dev.cur(),
                         onSave=function(device,page){
                             ..current.page <<- page
                         })
            ..device.set.up <<- TRUE
        }
        if (is.null(page)) {
            page <- ..current.page+1
        }
        bookmarks[[length(bookmarks)+1]] <-
            list(text=text,
                 level=level,
                 page=page)
        return(bookmarks)
    }

    # Write the collected bookmarks into pdf.file using pdftk
    write.bookmarks <- function(pdf.file,bookmarks=list()) {
        pdf.bookmarks <- ""
        # seq_along() copes with an empty list, unlike 1:length()
        for (bookmark in seq_along(bookmarks)) {
            pdf.bookmarks <-
                paste0(pdf.bookmarks,
                       "BookmarkBegin\n",
                       "BookmarkTitle: ",bookmarks[[bookmark]]$text,"\n",
                       "BookmarkLevel: ",bookmarks[[bookmark]]$level,"\n",
                       "BookmarkPageNumber: ",bookmarks[[bookmark]]$page,"\n")
        }
        temp.pdf <- tempfile(pattern=basename(pdf.file))
        temp.pdf.info <- tempfile(pattern=paste0(basename(pdf.file),"info_utf8"))
        cat(file=temp.pdf.info,pdf.bookmarks)
        system2("pdftk",c(pdf.file,'update_info_utf8',temp.pdf.info,'output',temp.pdf))
        if (file.exists(temp.pdf)) {
            # copy rather than rename: tempdir() may be on another filesystem
            file.copy(temp.pdf,pdf.file,overwrite=TRUE)
        } else {
            warning("unable to properly create bookmarks")
        }
        unlink(c(temp.pdf,temp.pdf.info))
    }

and can be used like so:

    CairoPDF(file="testing.pdf")
    bookmarks <- list()
    bookmarks <- save.bookmark("First plot",bookmarks)
    plot(1:5,6:10)
    bookmarks <- save.bookmark("Second plot",bookmarks)
    plot(6:10,1:5)
    dev.off()
    write.bookmarks("testing.pdf",bookmarks)

Et voilà: bookmarks and a table of contents for PDFs.
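
The example only uses top-level bookmarks, but the `level` argument ends up in `BookmarkLevel`, so nested outlines should be possible too. As a rough sketch, here is a bookmark list in the same list-of-lists shape that `save.bookmark` builds, plus a hypothetical `outline.text` helper (my naming, with made-up titles and page numbers) for eyeballing the nesting before handing the list to `write.bookmarks`.

```r
# Bookmarks in the shape save.bookmark() produces; titles and pages
# here are invented purely for illustration.
bookmarks <- list(
    list(text = "Group A",      level = 1, page = 1),
    list(text = "Scatter plot", level = 2, page = 1),
    list(text = "Histogram",    level = 2, page = 2),
    list(text = "Group B",      level = 1, page = 3))

# Hypothetical helper: render the outline as indented text, two spaces
# of indent per level below the first.
outline.text <- function(bookmarks) {
    vapply(bookmarks, function(b) {
        paste0(strrep("  ", b$level - 1), b$text, " (p. ", b$page, ")")
    }, character(1))
}

writeLines(outline.text(bookmarks))
```

A level 2 entry is expected to nest under the closest preceding level 1 entry, so the order of the list matters.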

This basic methodology can be extended to any language which writes
PDFs and does not have a built-in method for generating a table of
contents. Currently, the use of `Cairo.onSave` is a horrible hack
and may conflict with anything else that uses the onSave hook, but
hopefully R will report the current page number from Cairo in the
future.
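
If the onSave hook is already taken, one way around it might be to count pages by hand and record the page number explicitly. The sketch below is only that: `new.page.counter` is a hypothetical helper, and it assumes each plotting call starts exactly one new page, which won't hold with `par(mfrow=...)` layouts or functions that emit several pages.

```r
# Hypothetical alternative to the onSave hook: a closure that hands out
# consecutive page numbers, one per call.
new.page.counter <- function() {
    page <- 0
    list(next.page = function() { page <<- page + 1; page },
         current   = function() page)
}

counter <- new.page.counter()
bookmarks <- list()
for (title in c("First plot", "Second plot")) {
    # Ask for the page number just before drawing the plot for it
    p <- counter$next.page()
    bookmarks[[length(bookmarks) + 1]] <-
        list(text = title, level = 1, page = p)
    # plot(...) for this page would go here
}
str(bookmarks)
```

The resulting list has the same shape as the one `save.bookmark` returns, so it can be passed to `write.bookmarks` unchanged.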

[[!tag tech r]]