use NULL instead of NA. include git revision and date stamp in build log. generate changelog.
NA has length 1 and conveys perhaps-ok information. NULL was often more
appropriate.
a configure script edits R/zzz.R to add a new global 'git_revision'
prior to R building the package. This assumes that the current working
directory is in a git repository and will fail if it is not.
changelog generated according to Debian guidelines---latest builds to
the top. Included: the git revision of cran2deb, the time/date, and the
database version.
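A minimal sketch of the changelog layout described above, in Python rather than the R that cran2deb itself uses. The function names, the maintainer string, the "unstable" distribution and the wording of the change line are all assumptions for illustration; only the "newest entry first" ordering and the three recorded fields come from this commit.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def changelog_entry(package, version, git_revision, db_version, when,
                    maintainer="cran2deb autobuilder <cran2deb@example.org>"):
    """Format one Debian changelog stanza (field values are illustrative)."""
    return (
        f"{package} ({version}) unstable; urgency=low\n"
        "\n"
        f"  * cran2deb {git_revision} with DB version {db_version}\n"
        "\n"
        # Debian changelogs use an RFC 2822 style date after two spaces.
        f" -- {maintainer}  {format_datetime(when)}\n"
    )


def prepend_entry(existing_changelog, entry):
    """Put the newest stanza at the top of the changelog text."""
    if not existing_changelog:
        return entry
    return entry + "\n" + existing_changelog
```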
getrpkg: invoke curl rather than using download.packages.
curl has the options necessary to make a mass build possible with my
current unreliable internet connection. some tcp connections hang
indefinitely, which indefinitely halts any build. retrying works. R has
no suitable timeout mechanism.
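The commit does not say which curl options are used. The sketch below assumes the standard timeout and retry flags, which are the ones that address hanging connections; the function name, the default values and the URL in the usage note are hypothetical.

```python
def curl_command(url, dest, connect_timeout=30, max_time=600, retries=5):
    """Build a curl argv that gives up on hung connections and retries.

    The numeric defaults are placeholders, not values taken from cran2deb.
    """
    return [
        "curl",
        "--fail",                                  # treat HTTP errors as failures
        "--location",                              # follow redirects
        "--connect-timeout", str(connect_timeout), # bound the TCP connect phase
        "--max-time", str(max_time),               # bound the whole transfer
        "--retry", str(retries),                   # retry transient failures
        "--output", dest,
        url,
    ]
```

Such a list could be passed to subprocess.run(cmd, check=True), for example with curl_command("http://cran.example.org/src/contrib/zoo_1.5.tar.gz", "zoo_1.5.tar.gz").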
version: increment revision when a new build is attempted and the previous build was successful.
every successful build will have its own revision; code to be introduced
ensures that builds only occur when 'things change', hence revisions
correspond to successful builds of things significantly changing, except
the greatest revision, which may be a failed build.
exec: drop bioc, more verbose version mismatch error, switch hoc to awk (more portable)
some packages are in bioc and cran (e.g., graph). this confused cran2deb
as One True Package is expected. to solve this, we could filter for
common tables and take the highest version number. but who's to say
these aren't actually distinct packages? so for now, assume all cran
dependencies are satisfiable in cran (they aren't).
misc fixes: update database fully rather than just licenses, initialise a variable in autobuild, correct some corruption(?!), don't include orig.tar.gz if the package revision > 1.
slightly worrying how the 5 changed to a 1.
R's error reporting is pretty foul. this missing extra_deps error
manifested itself 3-4 layers down the stack trace. R doesn't give a
stack trace either. not good for large software.
build: automatically version package builds, record results in build log, record all cran2deb generated messages in log.
automatic versioning works as follows:
- if there is no previous build in the database, use R version with
epoch=0 (will probably change to base_epoch in the DB), revision=1.
- if there is a previous build, and the R version of that build is the
same as the R version of the to-be-built, then increment the revision
by one.
- otherwise use the previous epoch and revision=1 with the new R version.
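The three rules above can be sketched as a small pure function. This is Python for illustration only (cran2deb is written in R), and the Build tuple and next_build name are hypothetical.

```python
from typing import NamedTuple, Optional


class Build(NamedTuple):
    r_version: str   # upstream (R package) version
    epoch: int       # Debian epoch
    revision: int    # Debian revision


def next_build(prev: Optional[Build], r_version: str) -> Build:
    """Choose the version of the next build from the previous one, if any."""
    if prev is None:
        # no previous build in the database: epoch 0, revision 1
        return Build(r_version, 0, 1)
    if prev.r_version == r_version:
        # same R version as last time: bump the revision
        return Build(r_version, prev.epoch, prev.revision + 1)
    # new R version: keep the epoch, restart the revision at 1
    return Build(r_version, prev.epoch, 1)
```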
TODO: grab the output from system()s into the log too.
TODO: whilst version changes make sense, the Debian revision number
probably creeps up a little bit too quickly. No point in versioning
failed builds or repeat builds where nothing changed.
note: database needs to be completely recreated with this commit.
packages are flagged for build iff:
- there is no latest build
- the database (or cran2deb) has changed since the last build
- the debian epoch for cran2deb has changed since the last build
- the R version of the last build does not match that of the very
latest R package.
it should be easy to add further conditions in as this is all done with
one moderately-sized SQL SELECT.
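A sketch of what such a SELECT might look like, run through Python's sqlite3 module. The table and column names (packages, builds, db_version, epoch, r_version) are invented for this example and are not the cran2deb schema.

```python
import sqlite3

FLAG_SQL = """
SELECT p.name
FROM packages AS p
LEFT JOIN (SELECT package, MAX(id) AS id FROM builds GROUP BY package) AS last
       ON last.package = p.name
LEFT JOIN builds AS b ON b.id = last.id
WHERE b.id IS NULL                    -- there is no latest build
   OR b.db_version <> :db_version     -- database (or cran2deb) has changed
   OR b.epoch <> :epoch               -- cran2deb Debian epoch has changed
   OR b.r_version <> p.r_version      -- last build is of an older R package
ORDER BY p.name
"""


def packages_to_build(conn, db_version, epoch):
    """Return the names of packages flagged for a build (hypothetical schema)."""
    rows = conn.execute(FLAG_SQL, {"db_version": db_version, "epoch": epoch})
    return [name for (name,) in rows]
```

A further condition would be one more OR clause in the WHERE.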
depends: add forced dependencies, separate run/build depends.
changes sufficient to make failures in rJava and RBGL external: openjdk
is not ready yet (latest crashes during configure), and
libboost-graph-dev 1.34.1-11 is not compatible with gcc 4.3 (see BioC
list archives).
also limit the frequency of pbuilder update and cache update.
note: database must be deleted and rebuilt with this patch.
cran2deb: use /var/cache/cran2deb as a permanent cache between installs.
Previously, when a new cran2deb was installed, cran2deb update would
re-generate the database and cache in their entirety, as well as lose
all previously generated .debs. Instead they are stored outside the R
package hierarchy and so persist.
The cache is intended to contain everything that should be kept
in-memory for cran2deb (e.g., common data structures like the list of
all available packages), whilst the database is for all other data to be
stored on disk.
license: split out some of the heuristics for reducing poorly formed license fields.
in particular, do not remove numbers in just one step. it seems common
for people to use:
License: X11 (http://blahblah/dfgdfg/)
which will be reduced to just 'x' if these steps are not split.
Also separate license hashing into getting the license text and then
separately hashing via digest. (Useful for the 'view' command about to
be added to cran2deb license.)
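To illustrate why the reduction steps are split, here is a Python sketch that checks for a known license after every step instead of only at the end. The particular steps, their order and the KNOWN_LICENSES set are assumptions; the real heuristics live in the R code.

```python
import re

KNOWN_LICENSES = {"gpl", "lgpl", "x11", "mit", "bsd"}  # illustrative only

# Each step is applied to the output of the previous one.
REDUCTIONS = [
    lambda s: s.lower().strip(),                  # normalise case
    lambda s: re.sub(r"\(.*?\)", "", s).strip(),  # drop parenthesised asides (URLs)
    lambda s: re.sub(r"[^a-z0-9]+", "", s),       # drop spaces and punctuation
    lambda s: re.sub(r"[0-9]+", "", s),           # drop numbers last of all
]


def reduce_license(field):
    """Return the first known license reached while reducing, else None."""
    candidate = field
    for step in REDUCTIONS:
        candidate = step(candidate)
        if candidate in KNOWN_LICENSES:
            return candidate
    return None
```

With these steps, "X11 (http://blahblah/dfgdfg/)" is recognised as soon as the parenthesised URL is gone; running every step before checking would leave only "x".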
license: hashes of freeform licenses are stored in the database and these hashes used for auto-acceptance.
freeform licenses may be files, or may be the contents of the License:
field in the R DESCRIPTION. such text is mapped to lower case and all
space characters are compressed and mapped to a single space.
a nicer interface for adding these freeform licenses is introduced.
after reviewing the license, its hash may be added as follows:
$ cran2deb license
license> add uroot gpl
(maps hash of whatever freeform license uroot has to gpl)
...
$ cran2deb build uroot
(success is assured!)
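The normalise-then-hash step for freeform licenses can be sketched as follows. This is Python for illustration; the function names are hypothetical, and SHA1 is assumed here because a related commit mentions storing an SHA1 hash of license files.

```python
import hashlib
import re


def normalise_license_text(text):
    """Lower-case the text and squeeze each run of whitespace to one space."""
    return re.sub(r"\s+", " ", text.lower())


def license_hash(text):
    """Hash the normalised text so that layout differences do not matter."""
    normalised = normalise_license_text(text)
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()
```

Two copies of a license that differ only in case or line wrapping then map to the same database key.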
littler sets tempdir() to /tmp (or whatever is in one of the many TMP
environment variables) whereas Rscript behaves as R does: tempdir() returns a
per-session temporary directory. Rscript seems more sane on this one.
Manifested as a bug:
cran2deb update
cran2deb build DPpackage
-> 404 no such package found in CRAN
-> but DPpackage in available.packages?!
-> available.packages using old cache (hence old version of package
metadata)
Rscript cleans up each time. littler doesn't.
ctv: make CRAN task view mass building work again.
this just means you can now do:
$ cran2deb build_ctv
to build every task view, whereas previously the code had not
been adapted to use the cran2deb wrapper script.
depends+license: use boolean masking on array of rownames instead of directly on available when reversing arcs. fix typo in license simplification.
since R, like MATLAB, treats containers of size 1 as scalars (at least
in some cases), it seems that foo[x,drop = F] where x is a container has
sometimes surprising results (e.g., forgetting rownames), particularly
when x is of size 1 and/or foo is of size 1. instead only deal with the
form where foo is one dimensional -- this at least seems ok.
sysreq: parse SystemRequirements and check each one against database, after preprocessing.
``Writing R extensions'' does not actually specify a format for
SystemRequirements. Fortunately most package authors seem to use
a notation similar to that of Depends/Imports, but also include URLs
and ad-hoc information.
This is enough to make the R package Matrix depend correctly on 'make'
and likely more after a bit of leg work.
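A rough Python sketch of the preprocessing and lookup, assuming a comma- or semicolon-separated field with optional parenthesised version constraints and stray URLs. The SYSREQ_TO_DEB table stands in for the database and its contents are made up.

```python
import re

SYSREQ_TO_DEB = {"gnu make": "make", "libxml2": "libxml2-dev"}  # hypothetical


def parse_system_requirements(field):
    """Split a SystemRequirements field into lower-cased requirement names."""
    field = re.sub(r"\([^)]*\)", " ", field)             # version constraints, asides
    field = re.sub(r"https?://[^\s,;)]+", " ", field)    # bare URLs
    names = []
    for part in re.split(r"[,;]", field):
        name = " ".join(part.lower().split())            # tidy whitespace
        if name:
            names.append(name)
    return names


def resolve_system_requirements(field, table=SYSREQ_TO_DEB):
    """Return (Debian packages found, requirement names with no mapping)."""
    found, unknown = [], []
    for name in parse_system_requirements(field):
        if name in table:
            found.append(table[name])
        else:
            unknown.append(name)
    return found, unknown
```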
licenses+db: delegate license acceptance to the database. add license management interface.
``License: file FOO'' support is nearly done; just need to plug bits
together on the acceptance side of things. An SHA1 hash of the license
file is stored in the DB for matching. Unsure how effective this will
be. Might want to remove all whitespace prior to hashing.
needs testing; probably has a few bugs lurking. still need to work out
appropriate place for database so that R does not wipe it each time.
Perhaps it's time to delve into /var.
also a fix to r.dependency.closure --- it previously used
levels() on the wrong type.
cran2deb: put base_pkgs into the cache. generate cache so that R is happy. use separate base.tgz for cran2deb.
base_pkgs is the list of all packages that are provided in the basic
install of R. It is found by listing all installed packages in the
pbuilder environment.
Previously the cache of available packages lived in sysdata.rda.
Unfortunately it appears that R does not like it when sysdata.rda is
updated after package installation (I think this is something to do with
lazy loading, but disabling this did not seem to help). Instead the
cache is maintained separately in the data/ package directory.
pbuilder now uses base-cran2deb.tgz for the cran2deb pbuilder
environment -- this should help keep cran2deb from interfering with
other uses of pbuilder.
cran2deb: correct cran2deb dependencies. make cran2deb a meta-executable.
'cran2deb' is a script that determines the root of the cran2deb R
package installation and then invokes some other executable with this
root as the first argument. e.g.,
$ cran2deb update
$ cran2deb build zoo
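The dispatch step might look roughly like this in Python. How the root is found is not shown, and the "exec" subdirectory is an assumption about where the sub-commands live.

```python
import os


def dispatch_argv(root, argv):
    """Build the argv for the sub-command, passing the root as first argument.

    `root` is the cran2deb R package installation directory; `argv` is what
    followed `cran2deb` on the command line.
    """
    if not argv:
        raise SystemExit("usage: cran2deb <command> [args...]")
    command, rest = argv[0], argv[1:]
    # hypothetical layout: <root>/exec/<command>
    return [os.path.join(root, "exec", command), root, *rest]
```

The wrapper would then hand this list to something like os.execv(cmd[0], cmd).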
The README file now includes details of what must be done to use
cran2deb.
also filter out SystemRequirement fails; this shows whether there are
still some C header and other failures attributable to things other than
declared SystemRequirements (short answer: there are, annoyingly).
cran2deb: extra dependencies on command line; fix nasty bug in cross-repo dependencies.
accept a common typo of 'version' in License:.
bail out on SystemRequirements
correct a nasty bug: dependencies[r,]$name displayed like a string in R,
but was actually treated as a number; hence some cross-repo dependencies
did not work correctly since the wrong available[] entry was being used.
allow some extra dependencies to be specified on the command line.
cran2deb: cache available packages. support cross-repo dependencies[1]. basic understanding of bundles (no building yet).
If an R package name is needed, and cannot be found in the available
packages, then try to resolve it into a bundle. If this works, then
substitute the name of the bundle for the package name and proceed. This
is enough to get dependency resolution working and R source packages
downloaded, but still to do is the generation of debian/ for bundles.
[1] Appears to be some problems during building of bioc packages -- even
though the package is called r-bioc-XXX, a directory r-cran-XXX is
expected by some part. Suspect need to change generation of
debian/rules.
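The bundle fallback in this commit amounts to a two-stage lookup. A Python sketch, with hypothetical data structures (a set of available package names and a dict from bundle name to member packages):

```python
def resolve_name(name, available, bundles):
    """Return `name` if it is an available package, else the bundle holding it."""
    if name in available:
        return name
    for bundle, members in bundles.items():
        if name in members:
            return bundle          # substitute the bundle for the package
    raise KeyError(f"{name} is neither an available package nor in a bundle")
```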
cran2deb: satisfy immediate R dependencies iff the requisite package has already been built.
Works by running apt-get update before each build, pulling in the
declared R dependencies from a local repository.
Still to do is actually read off a topological order and deal with
transitivity. Otherwise, this appears to work fine where the only
dependencies are those in Depends or Imports of the R DESCRIPTION.
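For the outstanding topological-order step, one possible approach is sketched below using Python's standard graphlib (Python 3.9+). This is not what cran2deb does, and the dependency graph in the usage note is illustrative.

```python
from graphlib import TopologicalSorter


def build_order(depends):
    """Order packages so that every dependency is built before its dependents.

    `depends` maps a package name to the set of packages it needs.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends).static_order())
```

For example, build_order({"stashR": {"filehash", "digest"}, "filehash": set(), "digest": set()}) lists filehash and digest somewhere before stashR. Indirect dependencies are ordered correctly as long as every edge is present in the graph.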
Imports is now used as well as Depends when generating R
dependencies... this was necessary for, for example, stashR --- not
quite sure if this is technically correct, but it allows this package to
work.
Constructs an archive that pbuilder can apt-get dependencies from when
necessary. This requires an http server to serve var/archive (can be a
symlink).
cran2deb: rebuild the source tarball each time removing file exec bit.
The subdirectory of an R package is 'pkgname'. Debian typically has
'pkgname-upstreamversion'. Hence the tarball is rebuilt after renaming
the 'pkgname' directory appropriately.
Whilst this is all going on, every file in the R package has its
executable bit removed. There would appear to be little correct need
for it at the moment (this will change when #! is handled better) and
this gets rid of some inappropriately executable files.
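A sketch of the rename-and-strip pass using Python's tarfile module, working on in-memory bytes. The function name is hypothetical, and link members and other corner cases are ignored.

```python
import io
import tarfile


def rebuild_tarball(src: bytes, pkgname: str, version: str) -> bytes:
    """Rename the top-level 'pkgname' directory to 'pkgname-version' and
    clear the executable bits on regular files."""
    new_root = f"{pkgname}-{version}"
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(src), mode="r:*") as tin, \
         tarfile.open(fileobj=out, mode="w:gz") as tout:
        for member in tin.getmembers():
            head, sep, rest = member.name.partition("/")
            if head == pkgname:
                member.name = new_root + sep + rest
            if member.isfile():
                member.mode &= ~0o111      # directories keep their x bits
                tout.addfile(member, tin.extractfile(member))
            else:
                tout.addfile(member)
    return out.getvalue()
```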
cran2deb: catch a few more licenses found 'in the wild'.
If the license does not look like a properly formatted one, strip away
anything harmless (space, punctuation, numbers, etc) and see if it
can be made to match a known-good license exactly.
cran2deb: use iconv(1) to convert debian/{control,changelog,copyright} to utf8
lintian(1) likes these files to be utf8. R does not specify a character
set, so this code just hopes that iconv(1) will do the right thing. In
the only case where it matters so far (lspls) this allows the package to
build and pass lintian(1).
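The commit uses iconv(1). As a stand-in for the same idea, here is a Python sketch that leaves valid UTF-8 alone and otherwise assumes a single-byte fallback encoding. The latin-1 fallback is a guess made for this example, not something the commit states.

```python
def to_utf8(raw: bytes, fallback: str = "latin-1") -> bytes:
    """Return `raw` re-encoded as UTF-8, guessing `fallback` if it is not
    already valid UTF-8."""
    try:
        raw.decode("utf-8")
        return raw                       # already UTF-8, leave untouched
    except UnicodeDecodeError:
        return raw.decode(fallback).encode("utf-8")
```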
cranpkgs & build_some: build a random subset of all available packages.
current result:
  100 packages tried
   23 failed -- all due to license problems: not considered
      (MIT, BSD) or malformed field (e.g., GPL 2.0 or later)
  'some' (uncounted) have significant lintian warnings (mostly
      seems to be incorrectly +x files)
   77 .deb files produced
License handling code now deals with version numbers and also the more
common malformed license fields found in the wild. License version
information is discarded completely at the moment (though retained in
debian/copyright).
debian/copyright also includes a copyright notice formed from the Author
field of DESCRIPTION.
Debian package names must be lower case; the Debian source package
of an R package is the lowercase form of the R package name, whilst the
binary package is r-<lower case repo>-<lower case R name>.
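The naming rule can be written down directly. A Python sketch with a hypothetical function name:

```python
def debian_names(repo, r_name):
    """Return (source package name, binary package name) for an R package."""
    source = r_name.lower()
    binary = f"r-{repo.lower()}-{r_name.lower()}"
    return source, binary
```

For example, debian_names("CRAN", "DPpackage") gives ("dppackage", "r-cran-dppackage").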