sandbox/proposal_regressiontestframwork.moin

## This will eventually go to wiki.debian.org/DebTestFramework

 * '''Created''': <<Date(2010-10-07)>>
 * '''Contributors''': MichaelHanke, YaroslavHalchenko
 * '''Packages affected''':
 * '''See also''':

== Summary ==

This specification describes DebTest -- a framework of conventions and tools
that allows Debian to distribute test batteries developed by upstream or Debian
developers.  DebTest will enable extensive testing of a deployed Debian
system, or of particular software of interest, in a uniform fashion.

== Rationale ==

Ideally, software packaged for Debian comes with an exhaustive test suite that
can be used to determine whether this software works as expected on the Debian
platform. However, especially for complex software, these test suites are often
resource hungry (CPU time, memory, disk space, network bandwidth) and cannot be
run at package build time by buildds. Consequently, test suites are typically
run only manually by the respective packager on a particular machine, before
uploading a new version to the archive.

However, Debian is an integrated system and packaged software typically
relies on functionality provided by other Debian packages (e.g. shared
libraries) instead of shipping duplicates with different versions in every
package -- for many good reasons. Unfortunately, there is also a downside to
this: Debian packages often use versions of third-party tools different from
those tested by upstream, and moreover, the actual versions of dependencies
might change frequently between subsequent uploads of a dependent package.  Currently,
a change in a dependency that introduces an incompatibility cannot be detected
reliably, even if upstream provides a test suite that would have caught
the breakage.  Therefore, integration testing relies heavily on users to detect
incorrect functioning and file bug reports. Although there are archive-wide
QA efforts (e.g. constantly rebuilding all packages), these tests can only
detect API/ABI breakage or functionality tested during build-time checks --
they are not exhaustive for the aforementioned reasons.

This is a proposal to, first of all, package upstream test suites in such a way that
they can be used to run expensive archive-wide QA tests. However, this is also
a proposal to establish means to test interactions between software from multiple
Debian packages, to provide more thorough continuous integration and regression testing
for Debian systems.

== Use Cases ==

  * Moritz is a member of the security team. Whenever he applies a patch to fix
    a security issue he wants to make sure that the generic behavior of the software
    remains unchanged. However, in general he only has access to test cases that
    are included in the source package (if any). In the absence of proper tests
    he can only either assume that it would work (bad by design), or rely on the
    respective package maintainer to run the appropriate tests (introduces
    delays). A packaged exhaustive regression test suite would allow Moritz to
    perform comprehensive testing on his own and release the fixed package as
    soon as the tests pass.

  * Michael is a Debian package maintainer who takes care of three
    packages, each providing a data format conversion utility. While
    all three tools have their merits, there is also lots of
    overlap. For example, given a particular data file they should all
    generate identical output. With a DebTest framework, Michael can
    write and package cross-package test suites to ensure that this
    promise is fulfilled at any time.  Moreover, Michael can also
    develop/package "pipeline" tests that ensure proper functioning of
    multi-stage/package processing pipelines (from raw data format
    conversion to visualization), where some stages could be
    (re)processed using alternative tools from different software
    packages promising to provide the same functionality.  By testing
    a whole processing stream while swapping the alternative
    implementations, a break in compatibility could be
    detected.

  * Yarik is a Debian maintainer of a package where upstream provides
    a complete analysis pipeline which was used for an article
    publication.  Such an analysis requires a relatively large array of
    data and a range of tools from other packages to produce a
    publication-ready summary of the results. Therefore, such an analysis
    cannot be carried out at package build time.  Upstream aims to
    assure the reproducibility of the published results and encourages
    Yarik to promise correct functioning of the research product on
    Debian systems.  Within the DebTest framework, Yarik can package
    the upstream analysis pipeline along with the target results to assure
    reproducibility of the scientific findings.

  * Albert is a scientist using Debian for his research activities. The
    developers of his favorite software tell him to rather use the GreenPants
    distribution, because they cannot guarantee that their software works
    properly on Debian. They reason that Debian has a different
    version of a numerical library that hasn't been "tested" by the authors.
    With packaged regression test suites, Albert can install and run, at any given point,
    a complete test of his Debian system to ensure that everything is working
    properly given the exact set of base libraries installed at this very moment.
    This includes the test suite of the authors of his favorite software, but
    also all distribution test suites provided by Debian developers (see above).

  * Sylvestre is the Debian developer of a core computational library,
    a new version (or a custom build) of which promises performance
    advantages.  Using DebTest he could not only verify the absence of
    regressions but also obtain a direct performance comparison
    against the previous version across a range of applications.

  * Joerg maintains a repository of backports of Debian packages to be
    installed in a stable environment.  He wants to assure that
    backporting the packages has not caused a deviation in their
    intended functioning.  By using existing DebTest test suites he
    could verify that backported versions of the packages do not break
    stability and function as promised within the stable
    environment.

  * Mark wants to create a Debian-derived distribution and needs to
    modify a number of essential packages in order to achieve the desired
    improvements. He hopes that these changes do not break other Debian
    packages, but he is not really sure. A comprehensive test battery for the
    whole Debian system would offer him a way to verify proper functioning
    of his modified snapshot of Debian -- without having to manually replicate
    the testing efforts done by thousands of Debian contributors.

  * Linus is an upstream developer. He just loves the fact that he can tell any
    of his Debian-based users to just 'apt-get install' something and send him
    the output of a debtest command, whenever they claim that his software
    doesn't work properly. It pleases him to see his carefully developed test
    suite conveniently accessible to users.

  * Finally, Lucas has access to a powerful computing facility and
    likes to run all kinds of tests on all packages in the Debian archive.
    A Debian-wide regression test framework would allow Lucas to execute
    complex test collections (suites for individual packages,
    interoperability tests, or comparative tests) in an automated fashion,
    and file bug reports against the respective packages whenever a
    malfunction is detected. Some of Lucas's friends are not brave enough to file
    bugs, but still want to contribute. They simply run (selected) tests
    on their local machines that in turn report results/logs to a Debian
    dashboard server, where interested parties can get a weather report of
    Debian's status.
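The cross-package consistency check from Michael's use case could be sketched roughly as follows. This is only an illustration: the function name `outputs_agree` and the three converter stand-ins are hypothetical, and a real test would invoke the packaged conversion utilities instead of in-process callables.

```python
def outputs_agree(converters, data):
    """Run every converter on the same input and check that all outputs match.

    converters: mapping name -> callable taking bytes and returning bytes
                (hypothetical stand-ins for packaged conversion tools)
    Returns (agree, results) so a report can show which tool deviates.
    """
    results = {name: convert(data) for name, convert in converters.items()}
    agree = len(set(results.values())) <= 1
    return agree, results


# Hypothetical stand-ins for three packaged conversion utilities.
# "tool-c" deliberately drops the trailing newline to simulate a deviation.
converters = {
    "tool-a": lambda raw: raw.upper(),
    "tool-b": lambda raw: raw.upper(),
    "tool-c": lambda raw: raw.upper().strip(),
}

ok, results = outputs_agree(converters, b"sample data\n")
print(ok)
for name, output in sorted(results.items()):
    print(name, output)
```

Returning the per-tool results alongside the verdict keeps the check usable for bug reports, since the deviating implementation can be named directly.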

== Scope ==

This specification is applicable to all Debian packages, and Debian as a whole.

== Design ==

A specification should be built with the following considerations:

  * The person implementing it may not be the person writing it. The specification should be
    clear enough for someone to be able to read it and have a clear path
    towards implementing it. If it is not straightforward, it needs more detail.

  * Use cases covered in the specification should be practical
    situations, not contrived issues.

  * Limitations and issues discovered during the creation of a specification
    should be clearly pointed out so that they can be dealt with explicitly.

  * If you don't know enough to be able to competently write a spec, you should
    either get help or research the problem further. Avoid spending time making
    up a solution: base yourself on your peers' opinions and prior work.

Specific issues related to particular sections are described further below.


=== Core components ===

 * Organization of the framework
   * packages might register ways to run basic tests against installed
     versions
   * what to register: an executable?


==== Packaged tests ====

 * Metainformation:
   * duration: ....
   * resources:
   * suites:

 * Debug symbols: ....
   * do not strip symbols from test binary

 * Packages that register tests might provide a virtual package
   'test-<packagename>' to allow easy test discovery and retrieval via
   debtest tools.
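As a rough illustration of what such metainformation might look like in code, the sketch below models one packaged test battery. The class name `TestMetadata`, the field names, the units, and the example values are all assumptions made for illustration; only the three categories (duration, resources, suites) and the 'test-<packagename>' naming come from the outline above.

```python
from dataclasses import dataclass, field


@dataclass
class TestMetadata:
    """Hypothetical metainformation record for one packaged test battery."""
    package: str
    duration_s: float                               # assumed unit: seconds
    resources: dict = field(default_factory=dict)   # e.g. {"memory_mb": 512}
    suites: list = field(default_factory=list)      # names of provided suites

    @property
    def virtual_package(self) -> str:
        # Naming convention from the proposal: 'test-<packagename>'
        return f"test-{self.package}"


# Example values are made up for illustration.
meta = TestMetadata(
    package="foo",
    duration_s=1800,
    resources={"memory_mb": 2048, "disk_mb": 500},
    suites=["unit", "regression"],
)
print(meta.virtual_package)
```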


==== debtest tools ====

 * Invocation:
   * single package tests
   * all (with -f to force even if resources are not sufficient)
   * tests of dependent packages (discovered via rdepends,
     "rrecommends" and "rsuggests")
   * given specific resource demands, just run
     the ones matching those
 * Customization/Output:
   * plugins:
     * job resource requirement adjustments
       . manual customization
       . request from dashboard for the system (or alike)
     * executors
       . local execution (monitor resources)
       . submit to cluster/cloud
     * output/reports
       . some structured output
       . interfaces to dashboards
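The resource-based selection and the forcing switch listed under Invocation might work along these lines. This is a minimal sketch under stated assumptions: the functions `fits` and `select_tests`, the resource names, and the dictionary layout are hypothetical, and the `force` parameter merely mirrors the proposed -f option.

```python
def fits(required, available):
    """True if every required resource is available in sufficient amount."""
    return all(available.get(name, 0) >= amount
               for name, amount in required.items())


def select_tests(tests, available, force=False):
    """Return the sorted names of tests to run.

    tests:     mapping test name -> dict of required resources (assumed layout)
    available: dict of resources offered by this machine
    force:     mirrors the proposed -f switch; select a test even if the
               available resources are not sufficient
    """
    return sorted(name for name, required in tests.items()
                  if force or fits(required, available))


# Made-up example data.
tests = {
    "test-foo": {"memory_mb": 512},
    "test-bar": {"memory_mb": 8192, "disk_mb": 1000},
}
machine = {"memory_mb": 4096, "disk_mb": 20000}

print(select_tests(tests, machine))              # ['test-foo']
print(select_tests(tests, machine, force=True))  # ['test-bar', 'test-foo']
```

A resource that a machine does not report is treated as unavailable here, so a test demanding it is skipped unless forced.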


==== Maintainer helpers ====

 * Helpers:
   * assess resources/performance:


=== Supplementary infrastructure ===

==== Dashboard server ====

=== Implementation Plan ===

This section is usually broken down into subsections, such as the packages
being affected, data and system migration where necessary, user interface
requirements and pictures (photographs of drawings on paper work well).

== Implementation ==

To implement a specification, the developer should observe the use cases
carefully, and follow the design specified. He should make note of places in
which he has strayed from the design section, adding rationale describing why
this happened. This is important so that next iterations of this specification
(and new specifications that touch upon this subject) can use the specification
as a reference.

The implementation is very dependent on the type of feature to be implemented.
Refer to the team leader for further suggestions and guidance on this topic.

 * Implementation language:
   * Python, unless someone takes on the burden of developing and
     maintaining an implementation in another language for the coming years.

== Outstanding Issues ==

The specification process requires experienced people to drive it. More
documentation on the process should be produced.

The drafting of a specification requires English skills and a very good
understanding of the problem. It must also describe things to an extent that
someone else could implement. This is a difficult set of conditions to ensure
throughout all the specifications added.

There is a lot of difficulty in gardening obsolete, unwanted and abandoned
specifications in the Wiki.

== BoF agenda and discussion ==

Possible meetings where this specification will be discussed.
----
CategorySpec
