%\title[cran2deb: Automated CRAN to Debian packages generation]{cran2deb: A
% system to automatically provide 1700+ CRAN packages as Debian binaries}
-\title[cran2deb: CRAN to Debian packages]{cran2deb: Automating CRAN to Debian packages generation}
-
-%\subtitle{\textsl{Tutorial at R/Finance 2009}}
+\title[cran2deb: CRAN to Debian packages]{cran2deb: A fully automated CRAN to \\
+ Debian package generation system}
+\subtitle{\textsl{UseR! 2009 Presentation}}
\subject{UseR! 2009 Presentation}
\author[Charles Blundell \and Dirk Eddelbuettel]{Charles Blundell\inst{1} \and Dirk Eddelbuettel\inst{2}}
\institute[Gatsby \and Debian]{\inst{1}Gatsby Computational Neuroscience Unit
- \\ University College London \and \inst{2}Debian / R Projects}
-\date[UseR! 2009]{Presentation at UseR! 2009 \\ Rennnes, France \\ July 2009}
+ \\ University College London, UK \and \inst{2}Debian and R Projects \\ Chicago,
+IL, USA}
+%\date[UseR! 2009]{Presentation at UseR! 2009 \\ Rennnes, France \\ July 2009}
+\date[UseR! 2009 Presentation]{Universit\'{e} Rennes II, Agrocampus Ouest \\ Laboratoire de
+ Math\'{e}matiques Appliqu\'{e}es \\ 8-10 July 2009}
\begin{document}
\section[Why]{Why: Background and Motivation}
\begin{frame}
- \frametitle{About R -- and its repos}
+ \frametitle{About R -- and its repositories}
\framesubtitle{An open statistical language / environment -- with lots of
excellent code contributions}
\begin{itemize}
\item \R\ is now a standard for statistical applications and research
\item \textit{``Success has many fathers''}: several key drivers can
- be identified as to why \R\ has done so well
- \item We would like to stress \textsl{repos} and available packages here:
+ be identified as to why \R has done so well
+ \item We would like to stress \textsl{repositories} and available packages here:
CRAN, as well as BioConductor and Omegahat.
- \item CRAN has been one of the drivers: an open yet rigourously QA'ed
- repostory which has experienced tremendous growth
+ \item CRAN has been one of the drivers: an open yet rigorously QA'ed
+ repository which has experienced tremendous growth
\end{itemize}
\end{frame}
\item Ubuntu has taken Debian, added a fair amount of spit and polish, as
well as regular bi-annual releases, and has rapidly gained mind- and
well as market-share as the Linux distribution to beat.
- \item Lastly, we note that the CRAN backend is also implemented on Debian.
+ \item We also note that the CRAN backend is implemented on Debian.
\end{itemize}
\end{frame}
installation on a cluster beats doing lots of manual installations
\item \textbf{Common platform} as Debian forms the base for Ubuntu and
several other derivative or single-focus distributions
- \item \textbf{Different architectures} ranging from small arm or mips based
+ \item \textbf{Different architectures} ranging from small arm or MIPS based
systems to amd64, sparc64, hppa or even s390 mainframes
\item \textbf{Audience} given the reach of Debian and Ubuntu, large number
of users can be reached with little effort
\end{frame}
%\section{What is behind it?}
-\begin{frame}
- \frametitle{So what is a Debian package?} % NB Maybe skip this?
- \framesubtitle{And how do I build it?}
-
- Building a Debian package is similar to using \texttt{R
- CMD binary} etc:
- \begin{itemize}
- \item Reads meta-information is read from the files in the debian/ directory
- \begin{itemize}
- \item debian/control (similar to R's DESCRIPTION) lists names,
- maintainers, build- and run-time dependencies
- \item debian/copyright lists all author, license holders and copyright
- statements
- \item debian/changelog provides current and past version numbers with a
- list of all changes in chronological fashion
- \item debian/rules is a Makefile containing all steps to configure,
- build, install, package-create and clean
- \end{itemize}
- \item Employs a number of external scripts and tools tie into this,
- similar to what R has below \texttt{\$RHOME/share}
- \end{itemize}
-\end{frame}
+%\begin{frame}
+% \frametitle{So what is a Debian package?} % NB Maybe skip this?
+% \framesubtitle{And how do I build it?}
+%
+% Building a Debian package is similar to using \texttt{R
+% CMD binary} etc:
+% \begin{itemize}
+% \item Reads meta-information is read from the files in the debian/ directory
+% \begin{itemize}
+% \item debian/control (similar to R's DESCRIPTION) lists names,
+% maintainers, build- and run-time dependencies
+% \item debian/copyright lists all author, license holders and copyright
+% statements
+% \item debian/changelog provides current and past version numbers with a
+% list of all changes in chronological fashion
+% \item debian/rules is a Makefile containing all steps to configure,
+% build, install, package-create and clean
+% \end{itemize}
+% \item Employs a number of external tools scripts and tools, can be used
+% interactively or in batch mode in chroot'ed 'clean rooms'
+% \end{itemize}
+%\end{frame}
\section[How]{How: Key aspects of the approach and implementation}
\frametitle{Comparing two approaches}
\framesubtitle{What have we learned?}
- Eddelbuettel, Vernazobres, Gebhard and M\"{o}ller (UseR 2007) presented a first
- approach.
+ Eddelbuettel, Vernazobres, Gebhard and M\"{o}ller (UseR 2007) implemented a
+ system which provides a basis for comparison:
\MedSkip
\begin{itemize}
\item Top-down approach
\item Monolithic and large Perl program
+ \item Meta-information encode directly as Perl hashes in program
\item Re-implementing chunks of what \R does in parsing archives
\item Not very robust
\end{itemize}
\end{frame}
\begin{frame}
- \frametitle{Technology Overview}
- \framesubtitle{Charles: Can you fill something in here, if I haven't stolen
- all nuggets on the previous slide?}
+ \frametitle{Technology Overview: Big Picture}
+ \framesubtitle{Key components}
+
+ Our cran2deb system is implemented as a collection of small tools:
+ \begin{itemize}
+ \item cran2deb itself is a wrapper script calling out to about twenty other
+ 'worker' scripts implementing the principal commands
+ \begin{itemize}
+ \item 'worker' scripts are written in \R (for littler), Korn/Bash shell,
+ and in the Plan9 shell rc
+ \item these scripts are small: the largest is 4 kb and only seven
+ are larger than 1 kb
+ \item this is recursive: 'help' is one of these scripts scanning for
+ doc-strings in the other scripts
+ \end{itemize}
+ \item cran2deb is also an R package that is being called by some of the R
+ scripts; the R package has just over 1500 lines of code, and it calls out
+ to R functionality from package utils and tools.
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Technology Overview}
+ \framesubtitle{A walk through: some details}
+
+ What does cran2deb do:
+ \begin{itemize}
+ \item pulls new meta-data from CRAN via \texttt{available.packages()}
+ \item detects new (or changed) packages and builds each one via:
+ \begin{itemize}
+ \item map declared \R dependencies onto cran2deb packages
+ \item map free-form SystemRequirements onto Debian packages
+ \begin{itemize}
+ \item Rules for this shared among packages---many packages ``just work''.
+ \end{itemize}
+ \item add any undeclared dependencies (this applies to just 36 packages
+ and often entails only loading, say, MASS).
+ \item build each package in its own isolated, clean, fresh, up to date
+ build environment via pbuilder: this looks like a fresh install of
+ Debian and ensures correctness of dependencies.
+ \end{itemize}
+ \item checks package quality via Debian's lintian.
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Technology Overview}
+ \framesubtitle{A walk through: some more details}
+
+ What does cran2deb do (cont.):
+ \begin{itemize}
+ \item uses RSQLite backend for cran2deb state: everything from package
+ meta-information, blacklist of bad packages, to build logs.
+ \item checks for a free license of a package before its built:
+ \begin{itemize}
+ \item initially: handcrafted regular expressions to match
+ licenses.
+ \item some packages ignore ``Writing R extensions'' guidelines
+ concerning the License: field: how many ways to write GPL?
+ \begin{itemize}
+ \item initialised vs. its expansion (GPL vs. GNU general public license)
+ \item license vs. licence
+ \item see \texttt{http://www.gnu.org/GPL}
+ \item (v, version) (2.0, 2) or (higher, later, newer, greater, above)
+ \item typos of the above
+ \item file LICENSE: contents reformatted in arbitrary ways
+ \end{itemize}
+ \item now: strip white space and perform other harmless transforms
+ and match SHA1 checksums to determine license; likewise for contents of LICENSE
+ file.
+ \end{itemize}
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Technology Overview}
+ \framesubtitle{Continued}
+
+ Re-use, re-duce, re-cycle:
+
+ \begin{itemize}
+ \item \R's infrastructure is used to obtain the \R view of the world:
+ what packages and where, first approximation to dependencies.
+ \item All this uses the Debian build infrastructure, notably the
+ pbuilder chroot environment and the package management system
+ \item cran2deb sets the build environment up by invoking the proper Debian
+ scripts
+ \item the `production line' of packages is fully automated via cron and report status
+ summaries by email
+ \item per-package patches are allowed (currently eleven packages have
+ mostly trivial patches)
+ \item source code is available via the r-forge subversion repository and archive
+ \end{itemize}
\end{frame}
\section[Status]{Status: Where are we now?}
+
+\begin{frame}
+ \frametitle{Building 1700+ package}
+ \framesubtitle{Summary from a package views}
+
+ It's easy: basically \textsl{everything} builds and is available as a
+ Debian package (complete with full dependencies) --- apart from:
+
+ \begin{itemize}
+ \item 17 packages that are \textsl{not free enough}:\footnote{Generally these
+do not allow commercial use, modification and/or distribution with the
+exception of ConvCalendar which gives no modification or distribution rights.}
+ mclust, mclust02, ConvCalendar, SDDA, conf.design, isa2, optmatch,
+ rankreg, realized, rngwell19937, tnet, spatialkernel, Bhat, PTAk,
+ PredictiveRegression, RLadyBug, mapproj
+ \item 1 package that is obsolete: xgobi
+ \item 2 package that break building packages via cran2deb:\footnote{They
+ take down the cronjob; we are stumped as to why.} dprep, EngrExpt
+ \item 1 package that is not built for 'other' reasons:\footnote{It contains
+ binary code.} sabreR
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Building 1700+ package}
+ \framesubtitle{Continued}
+
+ \begin{itemize}
+ \item 47 packages that have \textsl{unsatisfied
+ dependencies}:\footnote{Some require other commercial software, some
+ require software we classified\newline as non-free, some require BioConductor packages.}
+ ROracle, Rlsf, Rsge, CarbonEL, VhayuR, gputools, klaR, wgaim, svGUI,
+ RScaLAPACK, caMassClass, Rcplex, ADaCGH, DAAGbio, GFMaps, GOSim,
+ Metabonomic, classGraph, gcExplorer, logilasso, pcalg, celsius, multtest,
+ hopach, GExMap, LMGene, PCS, SubpathwayMiner, gene2pathway, PhViD,
+ SNPMaP, qdg, lsa, mpm, sisus, metaMA, clustTool, clustvarsel,
+ SpectralGEM, bayesCGH, crosshybDetector
+ \item 8 package that (as of end of June) fail for unclassified reasons:
+ IDPmisc, Rsymphony, SuppDists, aroma.apd, aroma.core, aroma.affymetrix, cmprskContin, mvgraph
+ \end{itemize}
+
+ \MedSkip
+ \textsl{But everything else}---currently 1770 packages---builds and is
+ available via \texttt{apt-get} and other package management frontends!
+\end{frame}
+
\begin{frame}
- \frametitle{Current Status}
+ \frametitle{Status and credits}
\framesubtitle{Ready for wider deployment and testing}
+ Who do we owe, and where is it at:
+
\begin{itemize}
- \item Ground-work provided during Google Summer of Code 2008 under the
- umbrella of the \R Foundation
- \item Currently using a (small) Xen-instance on a server at WU Wien to host
- two Debian pbuilder chroots and an archive
+ \item The ground-work was provided during Google Summer of Code (GSoC) 2008 under the
+ umbrella of the Debian project. We thank Google for the GSoC support.
+ \item Currently we are using a (small) Xen-instance on a server at WU Wien to host
+ two Debian pbuilder chroots and an archive. We thank WU Wien/CRAN for
+ hosting and cpu cycles.
\item 1700+ packages for i386 and amd64 on Debian testing
\item In daily use for the last few weeks!
\end{itemize}
\MedSkip
- Just add the following URL (with -amd64 for 64-bit) \newline
- { \SmallSkip \scriptsize
- \texttt{deb http://xmcorsairs.wu.ac.at/cran2deb/debian-i386 testing/}
- }
+ So just add one of these URLs:\newline
+ { \scriptsize
+
+ \texttt{deb http://debian.cran.r-project.org/cran2deb/debian-i386 testing/}
+ \texttt{deb http://debian.cran.r-project.org/cran2deb/debian-amd64 testing/}
+ }
\end{frame}
-\section{Open Issues}
+\section[Next]{Next: Open Issues}
\begin{frame}
\frametitle{Question to be addressed}
- \framesubtitle{These may not be showstoppers}
+ \framesubtitle{For cran2deb to migrate out of beta testing}
+ %Things that may need to be sorted out:
\begin{itemize}
- \item What can or cannot be (re-)distributed by CRAN and its mirrors?
- \item What can or cannot be used by all users?
- \item Remaining external dependencies:
+ \item \textbf{Licenses:}
+ \begin{itemize}
+ \item What can or cannot be (re-)distributed by CRAN and its mirrors?
+ \item What can or cannot be used (and/or modified) by all users?
+ \end{itemize}
+ \item \textbf{Externtal dependencies} % Remaining external dependencies:
+ \begin{itemize}
+ \item BioConductor is the single largest source: BioBase, RGraphviz, etc
+ \item Other external libraries or tools not in Debian
+ \item Commercial external dependencies: SGE, LSF, Oracle, Vhayu
+ \end{itemize}
+ \item \textbf{Scope}
\begin{itemize}
- \item BioConductor is the single largest source: BioBase, RGraphviz, etc
- \item Other external libraries or tools not in Debian
- \item Commercial external dependencies: SGE, LSF, Oracle, Vhayu
+ \item Builds for other architectures?
+ \item Builds for other Debian flavours such as Ubuntu?
+ \item Builds of other repositories: BioConductor? R-Forge?
\end{itemize}
- \item Builds for other architectures ?
- \item Builds for other Debian flavours such as Ubuntu ?
\end{itemize}
\end{frame}