sandbox/proposal_regressiontestframwork.moin

   1 ## This will eventually go to wiki.debian.org/DebTestFramework
   2
   3  * '''Created''': <<Date(2010-10-07)>>
   4  * '''Contributors''': MichaelHanke, YaroslavHalchenko
   5  * '''Packages affected''':
   6  * '''See also''':
   7
   8 == Summary ==
   9
  10 This specification describes DebTest -- a framework with conventions and tools
  11 that allow Debian to distribute test batteries developed by upstream or Debian
  12 developers.  DebTest aims to enable developers and users to perform extensive
  13 testing of a deployed Debian system or a particular piece of software of interest in a
  14 uniform fashion.
  15
  16 == Rationale ==
  17
  18 Ideally software packaged for Debian comes with an exhaustive test suite that
  19 can be used to determine whether this particular software works as expected on
  20 the Debian platform. However, especially for complex software, these test
  21 suites are often resource hungry (CPU time, memory, disk space, network
  22 bandwidth) and cannot be run at package build time by buildds. Consequently,
  23 test suites are typically utilized manually and only by the respective packager
  24 on a particular machine, before uploading a new version to the archive.
  25
  26 However, Debian is an integrated system and packaged software typically relies
  27 on functionality provided by other Debian packages (e.g. shared libraries)
  28 instead of shipping duplicates with different versions in every package -- for
  29 many good reasons. Unfortunately, there is also a downside to this: Debian
  30 packages often use versions of 3rd-party tools that are different from those
  31 tested by upstream, and moreover, the actual versions of dependencies might
  32 change frequently between subsequent uploads of a dependent package.  Currently
  33 a change in a dependency that introduces an incompatibility cannot be detected
  34 reliably even if upstream provides a test suite that would have caught the
  35 breakage.  Therefore integration testing heavily relies on users to detect
  36 incorrect functioning and file bug reports. Although there are archive-wide QA
  37 efforts (e.g. constantly rebuilding all packages) these tests can only detect
  38 API/ABI breakage or functionality tested during build-time checks -- they are
  39 not exhaustive for the aforementioned reasons.
  40
  41 This is a proposal to, first of all, package upstream test suites in a way that
  42 they can be used to run expensive archive-wide QA tests. However, this is also
  43 a proposal to establish means to test interactions between software from
  44 multiple Debian packages to provide more thorough continuous integration and
  45 regression testing for the Debian systems.
  46
  47 == Use Cases ==
  48
  49   * Moritz is a member of the security team. Whenever he applies a patch to fix
  50     a security issue he wants to make sure that the generic behavior of the software
  51     remains unchanged. However, in general he only has access to test cases that
  52     are included in the source package (if any). In the absence of proper tests
  53     he can only either assume that it would work (bad by design), or rely on the
  54     respective package maintainer to run the appropriate tests (introduces
  55     delays). A packaged exhaustive regression test suite would allow Moritz to
  56     perform comprehensive testing on his own and release the fixed package as
  57     soon as the tests pass.
  58
  59   * Michael is a Debian package maintainer that takes care of three
  60     packages each providing a data format conversion utility. While
  61     all three tools have their merits there is also lots of
  62     overlap. For example, given a particular data file they should all
  63     generate identical output. With a DebTest framework, Michael can
  64     write and package cross-package test suites to ensure that this
  65     promise is fulfilled at any time.  Moreover, Michael can also
  66     develop/package "pipeline" tests that ensure proper functioning of
  67     multi-stage/package processing pipelines (from raw data format
  68     conversion to visualization), where some stages could be
  69     (re)processed using alternative tools from different software
  70     packages promising to provide the same functionality.  By testing
  71     a whole processing stream while changing the alternative
  72     implementations, breakage of the compatibility compliance could be
  73     detected.
  74
  75   * Yarik is a Debian maintainer of a package where upstream provides
  76     a complete analysis pipeline which was used for an article
  77     publication.  Such an analysis requires a relatively large array of
  78     data and a range of tools from other packages to produce a
  79     publication-ready summary of the results. Therefore such an analysis
  80     cannot be carried out at package build time.  Upstream aims to
  81     assure the reproducibility of the published results and encourages
  82     Yarik to promise correct functioning of the research product on
  83     Debian systems.  Within the DebTest framework, Yarik can package
  84     the upstream analysis pipeline along with the target results to assure
  85     reproducibility of the scientific findings.
  86
  87   * Albert is a scientist using Debian for his research activities. The
  88     developers of his favorite software tell him to rather use the GreenPants
  89     distribution, because they cannot guarantee that their software works
  90     properly on Debian. They reason that Debian has a different
  91     version of a numerical library that hasn't been "tested" by the authors.
  92     With packaged regression test suites Albert can install and run, at any given point,
  93     a complete test of his Debian system to ensure that everything is working
  94     properly given the exact set of base libraries installed at this very moment.
  95     This includes the test suite of the authors of his favorite software, but
  96     also all distribution test suites provided by Debian developers (see above).
  97
  98   * Sylvestre maintains a core computational library in Debian.
  99     A new version (or other modification) of this library promises performance
 100     advantages.  Using DebTest he could not only verify the absence of
 101     regressions but also obtain a direct performance comparison
 102     against the previous version across a range of applications.
 103
 104   * Joerg maintains a repository of backports of Debian packages to be
 105     installed in a stable environment.  He wants to assure that
 106     backporting of the packages has not caused a deviation in their
 107     intended functioning.  By using existing DebTest test suites he
 108     could verify that backported versions of the packages do not break
 109     the stability and function as promised within the stable
 110     environment.
 111
 112   * Mark wants to create a Debian-derived distribution and needs to
 113     modify a number of essential packages in order to achieve the desired
 114     improvements. He hopes that these changes do not break other Debian
 115     packages, but he is not really sure. A comprehensive test battery for the
 116     whole Debian system would offer him a way to verify proper functioning
 117     of his modified snapshot of Debian -- without having to manually replicate
 118     the testing efforts done by thousands of Debian contributors.
 119
 120   * Linus is an upstream developer. He just loves the fact that he can tell any
 121     of his Debian-based users to just 'apt-get install' something and send him
 122     the output of a debtest command, whenever they claim that his software
 123     doesn't work properly. It pleases him to see his carefully developed test
 124     suite conveniently accessible to users.
 125
 126   * Finally, Lucas has access to a powerful computing facility and
 127     likes to run all kinds of tests on all packages in the Debian archive.
 128     A Debian-wide regression test framework would allow Lucas to execute
 129     complex test collections (suites for individual packages,
 130     interoperability tests, or comparative tests) in an automated fashion,
 131     and file bug reports against the respective packages whenever a
 132     malfunction is detected. Some of Lucas's friends are not brave enough to file
 133     bugs, but still want to contribute. They simply run (selected) tests
 134     on their local machines that in turn report results/logs to a Debian
 135     dashboard server, where interested parties can get a weather report of
 136     Debian's status.
 137
 138 == Scope ==
 139
 140 This specification is applicable to all Debian packages, and Debian as a whole.
 141
 142 == Design ==
 143
 144 A specification should be built with the following considerations:
 145
 146   * The person implementing it may not be the person writing it. Specification should be
 147     clear enough for someone to be able to read it and have a clear path
 148     towards implementing it. If it is not straightforward, it needs more detail.
 149
 150   * Use cases covered in the specification should be practical
 151     situations, not contrived issues.
 152
 153   * Limitations and issues discovered during the creation of a specification
 154     should be clearly pointed out so that they can be dealt with explicitly.
 155
 156   * If you don't know enough to be able to competently write a spec, you should
 157     either get help or research the problem further. Avoid spending time making
 158     up a solution: base yourself on your peers' opinions and prior work.
 159
 160 Specific issues related to particular sections are described further below.
 161
 162
 163 === Core components ===
 164
 165  * Organization of the framework
 166    - packages might register ways to run basic tests against installed
 167      versions
 168    register:
 169     - executable?
 170
 171
 172 ==== Packaged tests ====
 173
 174  * Metainformation:
 175    * duration: ....
 176    * resources:
 177    * suites:
 178
 179  * Debug symbols: ....
 180    * do not strip symbols from test binary
 181
 182  * Packages that register tests might provide a virtual package
 183    'test-<packagename>' to allow easy test discovery and retrieval via
 184    debtest tools.
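
The metainformation listed above could be carried in a small record per packaged test battery. The following is only a minimal sketch in Python: the class name, the field names and the units (seconds, megabytes) are assumptions made for illustration, since this specification does not define them yet. Only the 'test-<packagename>' naming convention is taken from the text above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PackagedTestInfo:
    """Hypothetical metainformation record for one packaged test battery."""
    package: str
    duration_s: int = 0   # assumed unit: expected run time in seconds
    memory_mb: int = 0    # assumed unit: peak memory in megabytes
    disk_mb: int = 0      # assumed unit: scratch disk space in megabytes
    suites: List[str] = field(default_factory=list)

    @property
    def virtual_package(self) -> str:
        # Naming convention from the specification: 'test-<packagename>'
        return f"test-{self.package}"


# "foo" is a placeholder package name, not a real Debian package.
info = PackagedTestInfo(package="foo", duration_s=600, memory_mb=2048,
                        suites=["unit", "regression"])
print(info.virtual_package)
```

A record like this would let the debtest tools decide whether a battery is affordable on a given machine before running it.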
 185
 186
 187 ==== debtest tools ====
 188
 189  * Invocation::
 190    * single package tests
 191    * all (with -f to force even if resources are not sufficient)
 192    * tests of dependent packages (discovered via rdepends,
 193      "rrecommends" and "rsuggests")
 194    * given specific resource demands, run only the tests
 195      matching those demands
 196  * Customization/Output::
 197    plugins::
 198      * job resource requirement adjustments
 199        . manual customization
 200        . request from dashboard for the system (or alike)
 201      * executors
 202        . local execution (monitor resources)
 203        . submit to cluster/cloud
 204      * output/reports
 205        . some structured output
 206        . interfaces to dashboards
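
The resource-based invocation modes above (run only what fits, or force everything with -f) could be sketched as a simple filter. This is an illustrative sketch, not a proposed interface: `Job`, `select_jobs`, the resource keys and the job names are all hypothetical, and a resource that is missing from the available set is assumed to be unavailable.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    """Hypothetical description of one runnable test job."""
    name: str
    needs: Dict[str, int]  # resource name -> required amount


def select_jobs(jobs: List[Job], available: Dict[str, int],
                force: bool = False) -> List[Job]:
    """Return the jobs whose demands fit into the available resources.

    With force=True (the '-f' case) every job is returned regardless
    of whether the resources are sufficient.
    """
    if force:
        return list(jobs)
    return [
        job for job in jobs
        if all(available.get(res, 0) >= amount
               for res, amount in job.needs.items())
    ]


jobs = [
    Job("foo-unit", {"memory_mb": 512}),
    Job("foo-pipeline", {"memory_mb": 8192, "disk_mb": 20000}),
]
available = {"memory_mb": 4096, "disk_mb": 50000}

print([job.name for job in select_jobs(jobs, available)])
print([job.name for job in select_jobs(jobs, available, force=True)])
```

Keeping the selection separate from the execution would allow the same filter to serve local runs and submissions to a cluster or cloud.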
 207
 208
 209 ==== Maintainer helpers ====
 210
 211    Helpers:
 212    - assess resources/performance:
 213
 214
 215 === Supplementary infrastructure ===
 216
 217 ==== Dashboard server ====
 218
 219 === Implementation Plan ===
 220
 221 This section is usually broken down into subsections, such as the packages
 222 being affected, data and system migration where necessary, user interface
 223 requirements and pictures (photographs of drawings on paper work well).
 224
 225 == Implementation ==
 226
 227 To implement a specification, the developer should observe the use cases
 228 carefully, and follow the design specified. He should make note of places in
 229 which he has strayed from the design section, adding rationale describing why
 230 this happened. This is important so that next iterations of this specification
 231 (and new specifications that touch upon this subject) can use the specification
 232 as a reference.
 233
 234 The implementation is very dependent on the type of feature to be implemented.
 235 Refer to the team leader for further suggestions and guidance on this topic.
 236
 237  * Implementation language:
 238    - Python, unless someone takes on the burden of developing and
 239      maintaining an implementation in another language for the upcoming years.
 240
 241 == Outstanding Issues ==
 242
 243 The specification process requires experienced people to drive it. More
 244 documentation on the process should be produced.
 245
 246 The drafting of a specification requires English skills and a very good
 247 understanding of the problem. It must also describe things to an extent that
 248 someone else could implement. This is a difficult set of conditions to ensure
 249 throughout all the specifications added.
 250
 251 There is a lot of difficulty in gardening obsolete, unwanted and abandoned
 252 specifications in the Wiki.
 253
 254 == BoF agenda and discussion ==
 255
 256 Possible meetings where this specification will be discussed.
 257 ----
 258 CategorySpec
 259