%% add 'handout' option for handouts, and pgfpages for 2-on-1
\documentclass[smaller,compress]{beamer}
%\pgfpagesuselayout{2 on 1}[letterpaper,border shrink=5mm]
%\pgfpagesuselayout{4 on 1}[letterpaper,border shrink=5mm]
%\pgfpagesuselayout{2 on 1}[a4,border shrink=5mm]
\include{setup} %% has all definitions etc
%\title[cran2deb: Automated CRAN to Debian packages generation]{cran2deb: A
% system to automatically provide 1700+ CRAN packages as Debian binaries}
\title[cran2deb: CRAN to Debian packages]{cran2deb: A fully automated CRAN to \\
Debian package generation system}
\subtitle{\textsl{UseR! 2009 Presentation}}
\subject{UseR! 2009 Presentation}
\author[Charles Blundell \and Dirk Eddelbuettel]{Charles Blundell\inst{1} \and Dirk Eddelbuettel\inst{2}}
\institute[Gatsby \and Debian]{\inst{1}Gatsby Computational Neuroscience Unit
\\ University College London, UK \and \inst{2}Debian and R Projects \\ Chicago,
%\date[UseR! 2009]{Presentation at UseR! 2009 \\ Rennes, France \\ July 2009}
\date[UseR! 2009 Presentation]{Universit\'{e} Rennes II, Agrocampus Ouest \\ Laboratoire de
Math\'{e}matiques Appliqu\'{e}es \\ 8--10 July 2009}
\section[Why]{Why: Background and Motivation}
\frametitle{About R -- and its repos}
\framesubtitle{An open statistical language / environment -- with lots of
excellent code contributions}
A few key facts that are non-controversial at a \textsl{useR!} conference:
\item \R\ is now a standard for statistical applications and research
\item \textit{``Success has many fathers''}: several key drivers can
be identified as to why \R\ has done so well
\item We would like to stress \textsl{repos} and available packages here:
CRAN, as well as BioConductor and Omegahat.
\item CRAN has been one of the drivers: an open yet rigorously QA'ed
repository which has experienced tremendous growth
\frametitle{CRAN Packages} %% NB Or shall we merge this with the preceding slide?
\framesubtitle{Exponential Growth}
\includegraphics[height=6cm,transparent]{figures/Packages}
Source: Fox (2008, 2009), our calculations
\item CRAN archive network growing by 40\% p.a., now at around 1750 packages
\item John Fox provided this chart in an invited lecture at the last
\emph{useR!} meeting.
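
As a rough sketch of what this growth rate implies, assume (our
simplification, not a claim taken from the chart) that the 40\% annual rate
stays constant. The package count $t$ years from now would then be
\[
  N(t) = 1750 \times 1.4^{t},
\]
so the archive would double in size every
\[
  t_{2} = \frac{\ln 2}{\ln 1.4} \approx 2.06 \mbox{ years.}
\]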
\begin{column}{0.25in}
\frametitle{Debian and Ubuntu} % NB Maybe skip this slide?
\framesubtitle{Open Linux distributions}
\item Debian is \textsl{the} community-driven Linux distribution where
numerous volunteers provide over twenty thousand packages for around
a dozen architectures.
\item Packages and package management ``just work'': with arguably the most
advanced and robust package management system, and a tremendous
build and test infrastructure.
\item Ubuntu has taken Debian, added a fair amount of spit and polish, as
well as regular semi-annual releases, and has rapidly gained mind- and
market-share as the Linux distribution to beat.
\item We also note that the CRAN backend is implemented on Debian.
\frametitle{Why build Debian R packages?}
\framesubtitle{Combining R and Debian}
Bates, Eddelbuettel and Gebhard (UseR! 2004) listed a number of reasons
\item \textbf{Dependencies} are resolved automatically: \textsl{it just
\item \textbf{Convenience} of installing binary packages via
%easier than building from source
\item \textbf{Quality control} as build daemons, automated rebuilds,
porting, ... all ensure that everything is pretty much buildable all the
\item \textbf{Scalability} as building one binary package and scripting
installation on a cluster beats doing lots of manual installations
\item \textbf{Common platform} as Debian forms the base for Ubuntu and
several other derivative or single-focus distributions
\item \textbf{Different architectures} ranging from small arm- or mips-based
systems to amd64, sparc64, hppa or even s390 mainframes
\item \textbf{Audience} given the reach of Debian and Ubuntu, a large number
of users can be reached with little effort
%\section{What is behind it?}
\frametitle{So what is a Debian package?} % NB Maybe skip this?
\framesubtitle{And how do I build it?}
Building a Debian package is similar to using \texttt{R
\item Meta-information is read from the files in the debian/ directory
\item debian/control (similar to R's DESCRIPTION) lists names,
maintainers, build- and run-time dependencies
\item debian/copyright lists all authors, license holders and copyright
\item debian/changelog provides current and past version numbers with a
list of all changes in chronological fashion
\item debian/rules is a Makefile containing all steps to configure,
build, install, package-create and clean
\item Employs a number of external scripts and tools, can be used
interactively or in batch mode in chroot'ed `clean rooms'
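
\begin{frame}
\frametitle{Sketch: a minimal debian/control}
\framesubtitle{Illustrative only}

To make the comparison with R's DESCRIPTION concrete, here is a hand-written
sketch of a debian/control file for a hypothetical CRAN package \texttt{foo}.
The package name, maintainer and dependency lists are assumptions made for
illustration; this is not output generated by cran2deb.

\medskip
{\scriptsize
\texttt{Source: foo} \\
\texttt{Maintainer: Jane Doe <jane@example.org>} \\
\texttt{Build-Depends: debhelper, r-base-dev} \\[1ex]
\texttt{Package: r-cran-foo} \\
\texttt{Architecture: any} \\
\texttt{Depends: r-base-core} \\
\texttt{Description: GNU R package foo (hypothetical example)}
}

\medskip
The first stanza describes the source package and what is needed to build
it; the second describes the resulting binary package and what it needs at
run-time.
\end{frame}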
\section[How]{How: Key aspects of the approach and implementation}
\frametitle{Comparing two approaches}
\framesubtitle{What have we learned?}
Eddelbuettel, Vernazobres, Gebhard and M\"{o}ller (UseR! 2007) presented a first
\item Top-down approach
\item Monolithic and large Perl program
\item Meta-information encoded directly as Perl hashes in program
\item Re-implementing chunks of what \R\ does in parsing archives
\item Not very robust
\item Bottom-up approach
\item Collection of \R\ and shell scripts, also lots of SQL
\item Re-using \R\ internal infrastructure as much as possible
\item Influenced by %Eddelbuettel's
\href{http://dirk.eddelbuettel.com/cranberries/}{CRANberries} and its
200 lines of \R\ code to monitor and summarize CRAN changes
\frametitle{Technology Overview: Big Picture}
\framesubtitle{Key components}
% \textsc{Charles: Can you fill something in here, if I haven't stolen
% all nuggets on the previous slide?}
cran2deb is implemented as a collection of small tools:
\item cran2deb is just a wrapper script calling out to twenty-one other
`worker' scripts implementing the twenty-one basic high-level commands
\item `worker' scripts are written in \R\ (for littler), Korn/Bash shell,
and in the Plan9 shell rc
\item all these scripts are small: the largest is 4 kB and only seven
\item this is recursive: `help' is one of these scripts scanning for
doc-strings in the other scripts
\item cran2deb is also an R package that is called by some of the R
scripts; the R package has just over 1500 lines of code, and it calls out
to R functionality from the packages utils and tools.
\item SQL is used fairly extensively via nine tables containing everything
from meta-information and blacklists to build logs.
\frametitle{Technology Overview}
\framesubtitle{Continued}
Re-use, re-duce, re-cycle:
\item All this makes use of Debian build infrastructure, notably the
pbuilder chroot environment and the package management system
\item cran2deb sets the build environment up by invoking the proper Debian
\item the `production' use is fully automated via cron and reports status
\item per-package patches are allowed (currently eleven packages have
mostly trivial patches)
\item source code is available via the R-Forge subversion repository and archive
\section[Status]{Status: Where are we now?}
\frametitle{Building 1700+ packages}
\framesubtitle{Summary from a package view}
It's easy: basically \textsl{everything} builds and is available as a
Debian package (complete with full dependencies) --- apart from:
\item 17 packages that are \textsl{not free enough}:\footnote{We should
provide a longer discussion of the various licenses.}
mclust, mclust02, ConvCalendar, SDDA, conf.design, isa2, optmatch,
rankreg, realized, rngwell19937, tnet, spatialkernel, Bhat, PTAk,
PredictiveRegression, RLadyBug, mapproj
\item 1 package that is obsolete: xgobi
\item 2 packages that break building packages via cran2deb:\footnote{They
take down the cron job; we are stumped as to why.} dprep, EngrExpt
\item 1 package that is not built for `other' reasons:\footnote{It contains
\frametitle{Building 1700+ packages}
\framesubtitle{Continued}
\item 47 packages that have \textsl{unsatisfied
dependencies}:\footnote{Some require other commercial software, some
require software we classified\newline as non-free, some require BioConductor packages.}
ROracle, Rlsf, Rsge, CarbonEL, VhayuR, gputools, klaR, wgaim, svGUI,
RScaLAPACK, caMassClass, Rcplex, ADaCGH, DAAGbio, GFMaps, GOSim,
Metabonomic, classGraph, gcExplorer, logilasso, pcalg, celsius, multtest,
hopach, GExMap, LMGene, PCS, SubpathwayMiner, gene2pathway, PhViD,
SNPMaP, qdg, lsa, mpm, sisus, metaMA, clustTool, clustvarsel,
SpectralGEM, bayesCGH, crosshybDetector
\item 7 packages that (as of end of June) fail for unclassified reasons:
IDPmisc, Rsymphony, SuppDists, aroma.apd, aroma.core, cmprskContin, mvgraph
\textsl{But everything else}---currently 1768 packages---builds and is
available via \texttt{apt-get} and other package management frontends!
\frametitle{Status and credits}
\framesubtitle{Ready for wider deployment and testing}
Whom we owe, and where things stand:
\item The groundwork was laid during Google Summer of Code (GSoC) 2008 under the
umbrella of the \R\ Foundation. We thank Google for the GSoC support.
\item Currently we are using a (small) Xen instance on a server at WU Wien to host
two Debian pbuilder chroots and an archive. We thank WU Wien/CRAN for
hosting and CPU cycles.
\item 1700+ packages for i386 and amd64 on Debian testing
\item In daily use for the last few weeks!
So just add one of these URLs:\newline
i386 \phantom{xx} : { \SmallSkip \scriptsize
\texttt{deb http://xmcorsairs.wu.ac.at/cran2deb/debian-i386 testing/}
amd64 : { \SmallSkip \scriptsize
\texttt{deb http://xmcorsairs.wu.ac.at/cran2deb/debian-amd64 testing/}
\section{Open Issues}
\frametitle{Questions to be addressed}
\framesubtitle{These may not be showstoppers}
Things that still need to be sorted out:
\item What can or cannot be (re-)distributed by CRAN and its mirrors?
\item What can or cannot be used by all users?
\item Remaining external dependencies:
\item BioConductor is the single largest source: Biobase, Rgraphviz, etc.
\item Other external libraries or tools not in Debian
\item Commercial external dependencies: SGE, LSF, Oracle, Vhayu
\item Builds for other architectures?
\item Builds for other Debian flavours such as Ubuntu?