2 %% add 'handout' option for handouts, and pgfpages for 2-on-1
3 \documentclass[smaller,compress]{beamer}
5 %\pgfpagesuselayout{2 on 1}[letterpaper,border shrink=5mm]
6 %\pgfpagesuselayout{4 on 1}[letterpaper,border shrink=5mm]
7 %\pgfpagesuselayout{2 on 1}[a4,border shrink=5mm]
9 \include{setup} %% has all definitions etc
11 %\title[cran2deb: Automated CRAN to Debian packages generation]{cran2deb: A
12 % system to automatically provide 1700+ CRAN packages as Debian binaries}
13 \title[cran2deb: CRAN to Debian packages]{cran2deb: A fully automated CRAN to \\
14 Debian package generation system}
15 \subtitle{\textsl{UseR! 2009 Presentation}}
16 \subject{UseR! 2009 Presentation}
17 \author[Charles Blundell \and Dirk Eddelbuettel]{Charles Blundell\inst{1} \and Dirk Eddelbuettel\inst{2}}
18 \institute[Gatsby \and Debian]{\inst{1}Gatsby Computational Neuroscience Unit
19 \\ University College London, UK \and \inst{2}Debian and R Projects \\ Chicago,
21 %\date[UseR! 2009]{Presentation at UseR! 2009 \\ Rennes, France \\ July 2009}
22 \date[UseR! 2009 Presentation]{Universit\'{e} Rennes II, Agrocampus Ouest \\ Laboratoire de
23 Math\'{e}matiques Appliqu\'{e}es \\ 8-10 July 2009}
37 \section[Why]{Why: Background and Motivation}
39 \frametitle{About R -- and its repositories}
40 \framesubtitle{An open statistical language / environment -- with lots of
41 excellent code contributions}
43 A few key facts that are non-controversial at a \textsl{useR!} conference:
45 \item \R\ is now a standard for statistical applications and research
46 \item \textit{``Success has many fathers''}: several key drivers can
47 be identified as to why \R has done so well
48 \item We would like to stress \textsl{repositories} and available packages here:
49 CRAN, as well as BioConductor and Omegahat.
50 \item CRAN has been one of the drivers: an open yet rigorously QA'ed
51 repository which has experienced tremendous growth
56 \frametitle{CRAN Packages} %% NB Or shall we merge this with the preceding slide?
57 \framesubtitle{Exponential Growth}
62 \includegraphics[height=6cm,transparent]{figures/Packages}
65 Source: Fox (2008, 2009), our calculations
71 \item CRAN archive network growing by 40\% p.a., now at around 1750 packages
73 \item John Fox provided this chart in an invited lecture at the last
74 \emph{useR!} meeting.
77 \begin{column}{0.25in}
84 \frametitle{Debian and Ubuntu} % NB Maybe skip this slide?
85 \framesubtitle{Open Linux distributions}
89 \item Debian is \textsl{the} community-driven Linux distribution where
90 numerous volunteers provide over twenty thousand packages for around
91 a dozen architectures.
92 \item Packages and package management ``just work'': with arguably the most
93 advanced and robust package management system, and a tremendous
94 build and test infrastructure.
95 \item Ubuntu has taken Debian, added a fair amount of spit and polish, as
96 well as regular bi-annual releases, and has rapidly gained mind- and
97 market-share as the Linux distribution to beat.
98 \item We also note that the CRAN backend is implemented on Debian.
103 \frametitle{Why build Debian R packages?}
104 \framesubtitle{Combining R and Debian}
105 Bates, Eddelbuettel and Gebhard (UseR! 2004) listed a number of reasons
108 \item \textbf{Dependencies} are resolved automatically: \textsl{it just
110 \item \textbf{Convenience} of installing binary packages via
112 %easier than building from source
113 \item \textbf{Quality control} as build daemons, automated rebuilds,
114 porting, ... all ensure that everything is pretty much buildable all the
116 \item \textbf{Scalability} as building one binary package and scripting
117 installation on a cluster beats doing lots of manual installations
118 \item \textbf{Common platform} as Debian forms the base for Ubuntu and
119 several other derivative or single-focus distributions
120 \item \textbf{Different architectures} ranging from small arm or MIPS based
121 systems to amd64, sparc64, hppa or even s390 mainframes
122 \item \textbf{Audience} given the reach of Debian and Ubuntu, a large number
123 of users can be reached with little effort
128 %\section{What is behind it?}
130 % \frametitle{So what is a Debian package?} % NB Maybe skip this?
131 % \framesubtitle{And how do I build it?}
133 % Building a Debian package is similar to using \texttt{R
136 % \item Meta-information is read from the files in the debian/ directory
138 % \item debian/control (similar to R's DESCRIPTION) lists names,
139 % maintainers, build- and run-time dependencies
140 % \item debian/copyright lists all authors, license holders and copyright
142 % \item debian/changelog provides current and past version numbers with a
143 % list of all changes in chronological fashion
144 % \item debian/rules is a Makefile containing all steps to configure,
145 % build, install, package-create and clean
147 % \item Employs a number of external scripts and tools, can be used
148 % interactively or in batch mode in chroot'ed 'clean rooms'
153 \section[How]{How: Key aspects of the approach and implementation}
155 \frametitle{Comparing two approaches}
156 \framesubtitle{What have we learned?}
158 Eddelbuettel, Vernazobres, Gebhard and M\"{o}ller (UseR! 2007) implemented a
159 system which provides a basis for comparison:
167 \item Top-down approach
168 \item Monolithic and large Perl program
169 \item Meta-information encoded directly as Perl hashes in program
170 \item Re-implementing chunks of what \R does in parsing archives
171 \item Not very robust
178 \item Bottom-up approach
179 \item Collection of \R and shell scripts, also lots of SQL
180 \item Re-using \R internal infrastructure as much as possible
181 \item Influenced by %Eddelbuettel's
182 \href{http://dirk.eddelbuettel.com/cranberries/}{CRANberries} and its
183 200 lines of \R code to monitor and summarize CRAN changes
190 \frametitle{Technology Overview: Big Picture}
191 \framesubtitle{Key components}
193 Our cran2deb system is implemented as a collection of small tools:
195 \item cran2deb itself is a wrapper script calling out to about twenty other
196 'worker' scripts implementing the principal commands
198 \item 'worker' scripts are written in \R (for littler), Korn/Bash shell,
199 and in the Plan9 shell rc
200 \item these scripts are small: the largest is 4 kb and only seven
202 \item this is recursive: 'help' is one of these scripts scanning for
203 doc-strings in the other scripts
205 \item cran2deb is also an R package that is being called by some of the R
206 scripts; the R package has just over 1500 lines of code, and it calls out
207 to R functionality from package utils and tools.
212 \frametitle{Technology Overview}
213 \framesubtitle{A walk through: some details}
215 What does cran2deb do:
217 \item pulls new meta-data from CRAN via \texttt{available.packages()}
218 \item detects new (or changed) packages and builds each one via:
220 \item map declared \R dependencies onto cran2deb packages
221 \item map free-form SystemRequirements onto Debian packages
223 \item Rules for this are shared among packages---many packages ``just work''.
225 \item add any undeclared dependencies (this applies to just 36 packages
226 and often entails only loading, say, MASS).
227 \item build each package in its own isolated, clean, fresh, up-to-date
228 build environment via pbuilder: this looks like a fresh install of
229 Debian and ensures correctness of dependencies.
231 \item checks package quality via Debian's lintian.
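\begin{frame}[fragile]
  \frametitle{Technology Overview}
  \framesubtitle{Illustration: detecting new or changed packages}

  A minimal sketch of the detection step, not cran2deb's actual code:
  cran2deb does this in R via \texttt{available.packages()} and keeps its
  state in SQLite. Python is used here purely for illustration; the
  function name and the sample versions are hypothetical.

{\scriptsize
\begin{verbatim}
def new_or_changed(available, built):
    """Return names whose CRAN version differs from the built one."""
    return sorted(name for name, version in available.items()
                  if built.get(name) != version)

# Hypothetical data: the CRAN index vs. versions already built.
available = {"MASS": "7.2-47", "zoo": "1.5-6", "xts": "0.6-5"}
built = {"MASS": "7.2-47", "zoo": "1.5-5"}

# zoo has a newer version on CRAN and xts was never built.
print(new_or_changed(available, built))
\end{verbatim}
}

  Only the packages returned here would need to go through the
  dependency-mapping and build steps.
\end{frame}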
236 \frametitle{Technology Overview}
237 \framesubtitle{A walk through: some more details}
239 What does cran2deb do (cont.):
241 \item uses RSQLite backend for cran2deb state: everything from package
242 meta-information, blacklist of bad packages, to build logs.
243 \item checks for a free license of a package before it is built:
245 \item initially: handcrafted regular expressions to match
247 \item some packages ignore ``Writing R extensions'' guidelines
248 concerning the License: field: how many ways to write GPL?
250 \item initialism vs. its expansion (GPL vs. GNU general public license)
251 \item license vs. licence
252 \item see \texttt{http://www.gnu.org/GPL}
253 \item (v, version) (2.0, 2) or (higher, later, newer, greater, above)
254 \item typos of the above
255 \item file LICENSE: contents reformatted in arbitrary ways
257 \item now: strip white space and perform other harmless transforms
258 and match SHA1 checksums to determine license; likewise for contents of LICENSE
265 \frametitle{Technology Overview}
266 \framesubtitle{Continued}
268 Re-use, re-duce, re-cycle:
271 \item \R's infrastructure is used to obtain the \R view of the world:
272 what packages and where, first approximation to dependencies.
273 \item All this uses the Debian build infrastructure, notably the
274 pbuilder chroot environment and the package management system
275 \item cran2deb sets the build environment up by invoking the proper Debian
277 \item the `production line' of packages is fully automated via cron and report status
279 \item per-package patches are allowed (currently eleven packages have
280 mostly trivial patches)
281 \item source code is available via the r-forge subversion repository and archive
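\begin{frame}[fragile]
  \frametitle{Technology Overview}
  \framesubtitle{Illustration: checksum-based license matching}

  A minimal sketch of the license check described earlier: normalise the
  text, then compare SHA1 checksums against a table of accepted licenses.
  This is not cran2deb's code (which is written in R and shell). Python is
  used purely for illustration; the helper names, the table, and the choice
  of lower-casing as a ``harmless transform'' are all assumptions.

{\scriptsize
\begin{verbatim}
import hashlib
import re

def license_key(text):
    """Strip all white space, lower-case, and hash (hypothetical helper)."""
    squashed = re.sub(r"\s+", "", text).lower()
    return hashlib.sha1(squashed.encode("utf-8")).hexdigest()

# Hypothetical table of accepted licenses, keyed by checksum.
FREE = {license_key("GPL (>= 2)"): "GPL-2+"}

def classify(field):
    """Return the accepted license name, or None if nothing matches."""
    return FREE.get(license_key(field))
\end{verbatim}
}

  With this table, differently spaced or capitalised spellings of the same
  License: field map to one entry, while unknown text yields no match and
  can be flagged for manual review.
\end{frame}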
286 \section[Status]{Status: Where are we now?}
289 \frametitle{Building 1700+ packages}
290 \framesubtitle{Summary from a package view}
292 It's easy: basically \textsl{everything} builds and is available as a
293 Debian package (complete with full dependencies) --- apart from:
296 \item 17 packages that are \textsl{not free enough}:\footnote{Generally these
297 do not allow commercial use, modification and/or distribution with the
298 exception of ConvCalendar which gives no modification or distribution rights.}
299 mclust, mclust02, ConvCalendar, SDDA, conf.design, isa2, optmatch,
300 rankreg, realized, rngwell19937, tnet, spatialkernel, Bhat, PTAk,
301 PredictiveRegression, RLadyBug, mapproj
302 \item 1 package that is obsolete: xgobi
303 \item 2 packages that break building packages via cran2deb:\footnote{They
304 take down the cronjob; we are stumped as to why.} dprep, EngrExpt
305 \item 1 package that is not built for 'other' reasons:\footnote{It contains
311 \frametitle{Building 1700+ packages}
312 \framesubtitle{Continued}
315 \item 47 packages that have \textsl{unsatisfied
316 dependencies}:\footnote{Some require other commercial software, some
317 require software we classified\newline as non-free, some require BioConductor packages.}
318 ROracle, Rlsf, Rsge, CarbonEL, VhayuR, gputools, klaR, wgaim, svGUI,
319 RScaLAPACK, caMassClass, Rcplex, ADaCGH, DAAGbio, GFMaps, GOSim,
320 Metabonomic, classGraph, gcExplorer, logilasso, pcalg, celsius, multtest,
321 hopach, GExMap, LMGene, PCS, SubpathwayMiner, gene2pathway, PhViD,
322 SNPMaP, qdg, lsa, mpm, sisus, metaMA, clustTool, clustvarsel,
323 SpectralGEM, bayesCGH, crosshybDetector
324 \item 8 packages that (as of end of June) fail for unclassified reasons:
325 IDPmisc, Rsymphony, SuppDists, aroma.apd, aroma.core, aroma.affymetrix, cmprskContin, mvgraph
329 \textsl{But everything else}---currently 1770 packages---builds and is
330 available via \texttt{apt-get} and other package management frontends!
334 \frametitle{Status and credits}
335 \framesubtitle{Ready for wider deployment and testing}
337 Whom we owe, and where things stand:
340 \item The ground-work was provided during Google Summer of Code (GSoC) 2008 under the
341 umbrella of the Debian project. We thank Google for the GSoC support.
342 \item Currently we are using a (small) Xen-instance on a server at WU Wien to host
343 two Debian pbuilder chroots and an archive. We thank WU Wien/CRAN for
344 hosting and cpu cycles.
345 \item 1700+ packages for i386 and amd64 on Debian testing
346 \item In daily use for the last few weeks!
350 So just add one of these URLs:\newline
353 \texttt{deb http://debian.cran.r-project.org/cran2deb/debian-i386 testing/}
355 \texttt{deb http://debian.cran.r-project.org/cran2deb/debian-amd64 testing/}
359 \section[Next]{Next: Open Issues}
361 \frametitle{Questions to be addressed}
362 \framesubtitle{For cran2deb to migrate out of beta testing}
364 %Things that may need to be sorted out:
366 \item \textbf{Licenses:}
368 \item What can or cannot be (re-)distributed by CRAN and its mirrors?
369 \item What can or cannot be used (and/or modified) by all users?
371 \item \textbf{External dependencies:} % Remaining external dependencies:
373 \item BioConductor is the single largest source: Biobase, Rgraphviz, etc.
374 \item Other external libraries or tools not in Debian
375 \item Commercial external dependencies: SGE, LSF, Oracle, Vhayu
379 \item Builds for other architectures?
380 \item Builds for other Debian flavours such as Ubuntu?
381 \item Builds of other repositories: BioConductor? R-Forge?