Also for stats report which repo and which job number use our setup

[neurodebian.git] / sandbox / proposal_regressiontestframwork.moin
diff --git a/sandbox/proposal_regressiontestframwork.moin b/sandbox/proposal_regressiontestframwork.moin

index fdfec3bc7250a1afe7f421150211a2c16f76fd4e..983a624c5d91550e3f7c12deaec304a560a8f373 100644
--- a/sandbox/proposal_regressiontestframwork.moin
+++ b/sandbox/proposal_regressiontestframwork.moin
@@ -1,49 +1,53 @@
-## This will eventually do to wiki.debian.org/RegressionTestFramework
+## This will eventually move to wiki.debian.org/DebTestFramework
  
   * '''Created''': <<Date(2010-10-07)>>
- * '''Contributors''': MichaelHanke
+ * '''Contributors''': MichaelHanke, YaroslavHalchenko
   * '''Packages affected''': 
   * '''See also''': 
  
  == Summary ==
  
-This specification describes conventions and tools that allow Debian to
-distribute and run regression test batteries developed by upstream or
-Debian developers in a uniform fashion.
+This specification describes DebTest -- a framework with conventions and tools
+that allow Debian to distribute test batteries developed by upstream or Debian
+developers.  DebTest aims to enable developers and users to perform extensive
+testing of a deployed Debian system, or of particular software of interest, in
+a uniform fashion.
  
  == Rationale ==
  
  Ideally software packaged for Debian comes with an exhaustive test suite that
-can be used to determine whether this software works as expected on the Debian
-platform. However, especially for complex software, these test suites are often
-resource hungry (CPU time, memory, diskspace, network bandwidth) and cannot be
-ran at package build time by buildds. Consequently, test suites are only
-utilized by the packager on a particular machine, before uploading a new version
-to the archive.
-
-However, Debian is an integrated system and packaged software is typically made
-to rely on functionality provided by other Debian packages (e.g. shared
-libraries) instead of shipping duplicates with different versions in every
-package -- for many good reasons. Unfortunately, there is also a downside to
-this: Debian packages often use 3rd-party tools with different versions than
-those tested by upstream, and moreover, the actual versions might change
-frequently between to subsequent uploads of a package.  Currently a change in a
-dependency that introduces an incompatibility cannot be detected reliably
-(before users have filed a bug report) -- even if upstream provides a testsuite
-that would have caught the breakage. Although there are archive-wide QA efforts
-(e.g. constantly rebuilding all packages) these tests can only detect API/ABI
-breakage or functionality tested during build-time checks -- they are not
-exhaustive for the aforementioned reasons.
+can be used to determine whether this particular software works as expected on
+the Debian platform. However, especially for complex software, these test
+suites are often resource hungry (CPU time, memory, disk space, network
+bandwidth) and cannot be run at package build time by buildds. Consequently,
+test suites are typically utilized manually and only by the respective packager
+on a particular machine, before uploading a new version to the archive.
+
+However, Debian is an integrated system and packaged software typically relies
+on functionality provided by other Debian packages (e.g. shared libraries)
+instead of shipping duplicates with different versions in every package -- for
+many good reasons. Unfortunately, there is also a downside to this: Debian
+packages often use versions of 3rd-party tools that are different from those
+tested by upstream, and moreover, the actual versions of dependencies might
+change frequently between subsequent uploads of a dependent package.  Currently
+a change in a dependency that introduces an incompatibility cannot be detected
+reliably even if upstream provides a test suite that would have caught the
+breakage.  Therefore integration testing heavily relies on users to detect
+incorrect functioning and file bug reports. Although there are archive-wide QA
+efforts (e.g. constantly rebuilding all packages) these tests can only detect
+API/ABI breakage or functionality tested during build-time checks -- they are
+not exhaustive for the aforementioned reasons.
  
  This is a proposal to, first of all, package upstream test suites in a way that
  they can be used to run expensive archive-wide QA tests. However, this is also
-a proposal to establish means to test interactions of software from multiple
-Debian packages and test proper, continued, integration into the Debian system.
+a proposal to establish means to test interactions between software from
+multiple Debian packages to provide more thorough continued integration and
+regression testing of Debian systems.
  
  == Use Cases ==
  
    * Moritz is a member of the security team. Whenever he applies a patch to fix
-    a security issue he wants to make sure that the behavior of the software
+    a security issue he wants to make sure that the generic behavior of the software
      remains unchanged. However, in general he only has access to test cases that
      are included in the source package (if any). In the absence of proper tests
      he can only either assume that it would work (bad by design), or rely on the
@@ -52,34 +56,84 @@ Debian packages and test proper, continued, integration into the Debian system.
      perform comprehensive testing on his own and release the fixed package as
      soon as the tests pass.
  
-  * Michael is a Debian package maintainer that takes care of three packages
-    each providing a data format conversion utility. While all three tools have
-    their merits there is also lots of overlap. For example, given a particular
-    data file they should all generate identical output. With a Debian
-    regression test framework, Michael can write and package cross-package test
-    suites to ensure that this promise is fulfilled at any time. Moreover,
-    Michael can also develop/package "pipeline" tests that ensure proper
-    functioning of multi-stage/package processing pipelines. For example,
-    testing a whole processing stream from a raw data format conversion, feeding
-    into an analysis to into a visualization package.
+  * Michael is a Debian package maintainer who takes care of three
+    packages, each providing a data format conversion utility. While
+    all three tools have their merits, there is also a lot of
+    overlap. For example, given a particular data file they should all
+    generate identical output. With the DebTest framework, Michael can
+    write and package cross-package test suites to ensure that this
+    promise is fulfilled at any time.  Moreover, Michael can also
+    develop/package "pipeline" tests that ensure proper functioning of
+    multi-stage/package processing pipelines (from raw data format
+    conversion to visualization), where some stages could be
+    (re)processed using alternative tools from different software
+    packages promising to provide the same functionality.  By testing
+    a whole processing stream while swapping the alternative
+    implementations, any loss of compatibility between them could be
+    detected.
+
+  * Yarik is a Debian maintainer of a package whose upstream provides
+    a complete analysis pipeline that was used for a published
+    article.  Such an analysis requires a relatively large amount of
+    data and a range of tools from other packages to produce a
+    publication-ready summary of the results. Therefore, such an analysis
+    cannot be carried out at package build time.  Upstream aims to
+    ensure the reproducibility of the published results and encourages
+    Yarik to guarantee correct functioning of the research product on
+    Debian systems.  Within the DebTest framework, Yarik can package the
+    upstream analysis pipeline along with the target results to ensure
+    reproducibility of the scientific findings.
  
    * Albert is a scientist using Debian for his research activities. The
      developers of his favorite software tell him to use the GreenPants
      distribution instead, because they cannot guarantee that their software works
-    properly on Debian. The reason their giving is that Debian has a different
+    properly on Debian. They reason that Debian has a different
      version of a numerical library that hasn't been "tested" by the authors.
-    With packaged regression test suites Albert can run, at any given point,
+    With packaged regression test suites Albert can install and run, at any given point,
      a complete test of his Debian system to ensure that everything is working
-    properly given the exact set of base library installed at this very moment.
+    properly given the exact set of base libraries installed at this very moment.
      This includes the test suite of the authors of his favorite software, but
-    also all distribution test suites written by Debian developers (see above).
-
-  * Finally, Lucas likes to run all kinds of tests on all packages in the Debian
-    archive. However, they are mostly concerned with individual packages. A
-    Debian-wide regression test framework would allow Lucas to execute complex
-    tests (suites for individual packages, interoperability tests, or
-    comparative) in an automated fashion, and file bug reports against the
-    respective packages whenever something fails.
+    also all distribution test suites provided by Debian developers (see above).
+
+  * Sylvestre maintains a core computational library in Debian.
+    A new version (or other modification) of this library promises performance
+    advantages.  Using DebTest he could not only verify the absence of
+    regressions but also obtain a direct performance comparison
+    against the previous version across a range of applications.
+
+  * Joerg maintains a repository of backports of Debian packages to be
+    installed in a stable environment.  He wants to ensure that
+    backporting the packages has not altered their
+    intended functioning.  By using existing DebTest test suites he
+    could verify that backported versions of the packages do not break
+    the stability and function as promised within the stable
+    environment.
+
+  * Mark wants to create a Debian-derived distribution and needs to
+    modify a number of essential packages in order to achieve the desired
+    improvements. He hopes that these changes do not break other Debian
+    packages, but he is not really sure. A comprehensive test battery for the
+    whole Debian system would offer him a way to verify proper functioning
+    of his modified snapshot of Debian -- without having to manually replicate
+    the testing efforts done by thousands of Debian contributors.
+
+  * Linus is an upstream developer. He just loves the fact that he can tell any
+    of his Debian-based users to just 'apt-get install' something and send him
+    the output of a debtest command, whenever they claim that his software
+    doesn't work properly. It pleases him to see his carefully developed test
+    suite conveniently accessible to users.
+
+  * Finally, Lucas has access to a powerful computing facility and
+    likes to run all kinds of tests on all packages in the Debian archive.
+    A Debian-wide regression test framework would allow Lucas to execute
+    complex test collections (suites for individual packages,
+    interoperability tests, or comparative tests) in an automated fashion,
+    and file bug reports against the respective packages whenever a
+    malfunction is detected. Some of Lucas's friends are not brave enough to file
+    bugs, but still want to contribute. They simply run (selected) tests
+    on their local machines, which in turn report results/logs to a Debian
+    dashboard server, where interested parties can get a weather report of
+    Debian's status.
  
  == Scope ==
  
@@ -89,11 +143,11 @@ This specification is applicable to all Debian packages, and Debian as a whole.
  
  A specification should be built with the following considerations:
  
-  * The person implementing it may not be the person writing it. It should be
+  * The person implementing it may not be the person writing it. The specification should be
    * clear enough for someone to be able to read it and have a clear path
-  * towards implementing it. If it doesn't, it needs more detail.
+  * towards implementing it. If it is not straightforward, it needs more detail.
  
-  * That the use cases covered in the specification should be practical
+  * Use cases covered in the specification should be practical
    * situations, not contrived issues.
  
    * Limitations and issues discovered during the creation of a specification
@@ -105,20 +159,62 @@ A specification should be built with the following considerations:
  
  Specific issues related to particular sections are described further below.
  
-=== Summary ===
  
-The summary should not attempt to say '''why''' the spec is being defined, just
-'''what''' is being specified.
+=== Core components ===
+
+ * Organization of the framework
+   - packages might register ways to run basic tests against installed
+     versions
+   register:
+    - executable?
+
+
+==== Packaged tests ====
+
+ * Metainformation:
+   * duration: ....
+   * resources:
+   * suites:
+
+ * Debug symbols: ....
+   * do not strip symbols from test binaries
+
+ * Packages that register tests might provide a virtual package
+   'test-<packagename>' to allow easy test discovery and retrieval via
+   debtest tools.
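The metadata fields and discovery convention above are still open. As a minimal sketch, assuming a hypothetical control-file-style stanza with `Package`, `Duration`, `Resources` and `Suites` fields (none of these names are defined by this proposal), a debtest tool could read per-package test metadata and derive the name of the proposed `test-<packagename>` virtual package like this:

```python
from dataclasses import dataclass, field


@dataclass
class PackagedTestInfo:
    """Metadata a package could ship alongside its tests (hypothetical fields)."""
    package: str
    duration: int = 0                               # expected run time, seconds
    resources: dict = field(default_factory=dict)   # e.g. {"memory_mb": 2048}
    suites: list = field(default_factory=list)      # e.g. ["unit", "integration"]


def parse_stanza(text):
    """Parse a 'Key: value' stanza (control-file style) into PackagedTestInfo."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.strip()
    # Resources are assumed to be a comma-separated list of name=amount pairs.
    resources = {}
    for item in fields.get("resources", "").split(","):
        name, sep, amount = item.partition("=")
        if sep:
            resources[name.strip()] = int(amount)
    return PackagedTestInfo(
        package=fields["package"],
        duration=int(fields.get("duration", "0")),
        resources=resources,
        suites=fields.get("suites", "").split(),
    )


def virtual_test_package(package):
    """Name of the virtual package that would announce tests for `package`."""
    return f"test-{package}"


if __name__ == "__main__":
    info = parse_stanza("""
Package: python-foo
Duration: 3600
Resources: memory_mb=2048, disk_mb=500
Suites: unit integration
""")
    print(info)
    print(virtual_test_package(info.package))  # test-python-foo
```

A real implementation would more likely reuse an existing deb822 parser than this hand-rolled one; the stand-alone version only keeps the example self-contained.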
  
-=== Rationale ===
  
-This should be the description of '''why''' this spec is being defined.
+==== debtest tools ====
  
-=== Scope and Use Cases ===
+ * Invocation::
+   * single package tests
+   * all (with -f to force even if resources are not sufficient)
+   * tests of dependent packages (discovered via rdepends,
+     "rrecommends" and "rsuggests")
+   * given specific resource constraints, run only
+     the tests matching them
+ * Customization/Output::
+   plugins::
+     * job resource requirement adjustments
+       . manual customization
+       . request from dashboard for the system (or similar)
+     * executors
+       . local execution (monitor resources)
+       . submit to cluster/cloud
+     * output/reports
+       . some structured output
+       . interfaces to dashboards
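To illustrate how resource-based selection and the plugin split (executors, output/reports) might fit together, here is a minimal Python sketch. The `Job` structure, the `run`/`report` plugin methods and the resource keys are hypothetical; `force=True` mirrors the proposed `-f` option. A cluster/cloud executor would provide the same `run` method but hand the command to a batch system instead of running it locally.

```python
import json
import subprocess
import sys
import time
from dataclasses import dataclass, field


@dataclass
class Job:
    """One registered test battery (hypothetical structure)."""
    name: str
    command: list                                   # argv of the test runner
    resources: dict = field(default_factory=dict)   # e.g. {"memory_mb": 2048}


def select_jobs(jobs, available, force=False):
    """Keep jobs whose declared needs fit `available`; force keeps all (cf. -f)."""
    if force:
        return list(jobs)
    return [
        job for job in jobs
        if all(available.get(name, 0) >= amount
               for name, amount in job.resources.items())
    ]


class LocalExecutor:
    """Executor plugin: run a job as a local subprocess."""

    def run(self, job):
        start = time.perf_counter()
        proc = subprocess.run(job.command, capture_output=True, text=True)
        return {
            "job": job.name,
            "passed": proc.returncode == 0,
            "returncode": proc.returncode,
            "seconds": round(time.perf_counter() - start, 3),
            "log": proc.stdout + proc.stderr,
        }


class JsonReporter:
    """Output plugin: structured results that a dashboard could ingest."""

    def report(self, results):
        return json.dumps(results, indent=2)


def run_tests(jobs, available, executor, reporter, force=False):
    """Select jobs, execute them with the executor plugin, format the results."""
    selected = select_jobs(jobs, available, force)
    return reporter.report([executor.run(job) for job in selected])


if __name__ == "__main__":
    jobs = [
        Job("quick", [sys.executable, "-c", "print('ok')"], {"memory_mb": 100}),
        Job("huge", [sys.executable, "-c", "pass"], {"memory_mb": 64000}),
    ]
    # Only "quick" fits into 4096 MB; pass force=True to run "huge" as well.
    print(run_tests(jobs, {"memory_mb": 4096}, LocalExecutor(), JsonReporter()))
```

Keeping selection, execution and reporting behind separate objects is what would let the "executors" and "output/reports" plugins above be swapped independently.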
  
-While not always required, but in many cases they bring much better clarity to
-the scope and scale of the specification than could be obtained by talking in
-abstract terms.
+
+==== Maintainer helpers ====
+
+   Helpers:
+   - assess resources/performance:
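One way such a helper might assess resources is to run a test battery once and record wall-clock time, CPU time and peak memory, which could then seed the `duration` and `resources` metadata. This sketch uses Python's standard `resource` module (Unix only); the `assess` helper and its output keys are hypothetical.

```python
import resource
import subprocess
import sys
import time


def assess(command):
    """Run a test command once and record its resource usage (Unix only).

    CPU time is the difference in RUSAGE_CHILDREN before and after the run.
    ru_maxrss cannot be differenced: it is the peak RSS of the largest child
    waited for so far (kilobytes on Linux), so call this from a fresh process
    to get a clean figure for a single test battery."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    proc = subprocess.run(command)          # waits, so the child is accounted
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "returncode": proc.returncode,
        "wall_seconds": round(wall, 2),
        "cpu_seconds": round(cpu, 2),
        "max_rss_kb": after.ru_maxrss,
    }


if __name__ == "__main__":
    # Allocate ~50 MB in a child interpreter so the peak RSS is visible.
    print(assess([sys.executable, "-c", "x = b'x' * 50_000_000"]))
```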
+
+
+=== Supplementary infrastructure ===
+
+==== Dashboard server ====
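The dashboard server itself is not specified yet. Assuming a hypothetical HTTP endpoint that accepts JSON, a client could bundle its results with the versions of the installed packages (results are only meaningful for that exact set of packages) and prepare a submission as sketched below. The URL, payload layout and function names are placeholders, not a defined protocol.

```python
import json
import platform
import urllib.request

# Hypothetical endpoint; this proposal does not define a dashboard URL or API.
DASHBOARD_URL = "https://dashboard.example.org/api/submit"


def build_submission(results, packages):
    """Bundle test results with the context needed to compare systems.

    `packages` maps installed package names to versions."""
    return {
        "host": {
            "machine": platform.machine(),
            "kernel": platform.release(),
        },
        "packages": packages,
        "results": results,
        "summary": {
            "total": len(results),
            "failed": sum(1 for r in results if not r["passed"]),
        },
    }


def make_request(payload, url=DASHBOARD_URL):
    """Prepare (but do not send) an HTTP POST carrying the JSON payload."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending would then be a matter of `urllib.request.urlopen(make_request(payload))` once such a server exists.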
  
  === Implementation Plan ===
  
@@ -138,6 +234,10 @@ as a reference.
  The implementation is very dependent on the type of feature to be implemented.
  Refer to the team leader for further suggestions and guidance on this topic.
  
+ * Implementation language:
+   - Python, unless someone else takes on the burden of developing
+     and maintaining an implementation in another language for the upcoming years.
+
  == Outstanding Issues ==
  
  The specification process requires experienced people to drive it. More